Multiclet: First Practical Tests and Performance
A Multiclet debug board came into my hands, and I want to share the results of testing it. I'll also describe several pitfalls that may at first fray the nerves of anyone who wants to get hands-on with the Multiclet.
I should note right away that I only consider development in C (not assembler), because nowadays programmers' working hours cost more than megahertz and memory. Multiclet's C compiler has had a difficult history and is currently in its infancy (in particular, no optimizations have been implemented). The situation promises to improve by the middle or end of the year.
Hardware
This is the НW1-MCp04 development kit (the higher-end, more expensive one). The processor here runs at 80 MHz. The RS232 interfaces are routed out, and USB (1.1, Full-Speed, 12 Mbit/s) and LAN (10/100 Mbit/s) controllers are installed. There is currently no software support for USB and LAN.
IDE
Multiclet IDE is PSPad with hotkeys bound to compiling and loading the binary onto the debug board. Such an IDE seemed useless to me, but fortunately you can build the project and upload the firmware to the board with scripts.
Compilation:
MultiClet\SDK\shell\MultiClet\build_project.cmd <source directory>
The compilation script didn't work out of the box: the path to the platform-dependent includes was not specified in it. You can add it yourself:
rem C compiler preprocessor options
set CPP_KEYS=%CPP_KEYS% -Wp-I.. -Wp-I"%INCDIR%" -Wp-I"[..your path..]\MultiClet\Projects\inc\c"
Apparently, the authors of the supplied C examples also ran into this problem, because some of the examples either copy in the contents of the needed includes or carry their own copies of the SDK header files.
Uploading firmware to the board:
MultiClet\SDK\bin\mc-ploader <built binary file>
You need to upload while holding down the reset button on the board. For the upload to work, you need to install the drivers for PicoTap. Unfortunately, the PicoTap drivers are not signed (!?!), so getting them to load is difficult. Even if you disable driver signature verification, Windows (8 x64) blocks them from loading due to a known incompatibility.
The solution is to manually select the FTDI driver for PicoTap and ignore the Windows warning that the driver does not appear to suit the device. This gives hope that building a homemade adapter on an FTDI chip is quite feasible.
There is no in-circuit debugging.
Writing Hello World
We will output messages over RS232. When connecting to a computer with a standard male-to-male cable, you need to swap the RX and TX pins (2 and 3). We take the standard uart example and reconfigure it for 115200 baud:
void uart_init(UART_TypeDef *UART)
{
    int port, bitrate, control;
    port = 0x00000300; //alternative port function for uart0
    bitrate = 0x56; //115200 bps
    control = 0x00000003; //rx, tx enable
    GPIOB->BPS = port;
    UART->BDR = bitrate;
    UART->CR = control;
}
The bitrate value is calculated as 80 MHz / 115200 / 8 ≈ 86.8, so either 86 (0x56) or 87 (0x57) can be used.
We add a function that sends a character, waiting while the UART FIFO is full, and a function that sends a string:
void uart_send_with_delay(char byte, UART_TypeDef *UART)
{
    //wait while the FIFO of the port passed in is full (the original polled a hard-coded UART0)
    while(uart_fifo_full(UART) == 1);
    uart_send_byte(byte, UART);
}
void uart_puts(char *msg, UART_TypeDef *UART)
{
    //send characters one by one up to the terminating zero
    while(*msg) uart_send_with_delay(*msg++, UART);
}
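The divider used in uart_init can be checked on a PC. This is only a minimal sketch: it assumes the divider is simply clock / (8 * baud), as the formula above suggests, and the function names are hypothetical, not part of the Multiclet SDK.

```c
#include <stdint.h>

/* Assumption: divider = core clock / (8 * baud), as in 80 MHz / 115200 / 8.
   These helpers are hypothetical and are not part of the Multiclet SDK. */

/* Divider rounded to the nearest integer. */
uint32_t uart_divider(uint32_t clock_hz, uint32_t baud)
{
    uint32_t denom = 8u * baud;
    return (clock_hz + denom / 2u) / denom;
}

/* Baud rate that results from a given divider, under the same assumption. */
double uart_actual_baud(uint32_t clock_hz, uint32_t divider)
{
    return (double)clock_hz / (8.0 * (double)divider);
}

/* Relative error of the resulting baud rate, in percent. */
double uart_baud_error_percent(uint32_t clock_hz, uint32_t divider, uint32_t baud)
{
    return (uart_actual_baud(clock_hz, divider) - (double)baud) / (double)baud * 100.0;
}
```

Under this assumption, rounding to nearest gives 87. The value 86 (0x56) yields a rate about 0.9% too high and 87 about 0.2% too low, which would explain why both values work.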
and finally:
void main()
{
    uart_init(UART0); //config uart0
    uart_puts("Hello world from Multiclet!!\r\n", UART0);
}
We connect to the COM port with any convenient terminal and get the expected result.
Practical performance
Take the simplest test program:
float i,j,result=0; //accumulator starts from zero
for(i=0;i<1;i+=0.0002)
    for(j=0;j<1;j+=0.0002)
    {
        result+=i*j;
    }
The body of the inner loop runs 25 million times (5000 * 5000 iterations). On a Multiclet at 80 MHz this code takes 20.3 seconds with the current compiler. Let's estimate from these numbers the performance of an equivalent abstract classic processor: 25 million iterations * ~5 operations per iteration / 20.3 seconds ≈ 6.1 million operations per second.
That is, performance is currently at the level of an abstract non-superscalar processor running at 5-10 MHz. Of course, performance should improve significantly as the compiler matures.
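The estimate above can be reproduced with a few lines of C. This is only a sketch: the ~5 operations per iteration is my rough guess rather than a measured figure, the helper names are hypothetical, and the one-operation-per-cycle model is an assumption used to turn throughput into an equivalent clock rate.

```c
/* Hypothetical helper: effective operations per second of a timed loop. */
double effective_ops_per_second(double iterations, double ops_per_iteration, double seconds)
{
    return iterations * ops_per_iteration / seconds;
}

/* Clock rate in MHz of an idealized processor with the same throughput,
   assuming it completes ops_per_cycle operations every cycle. */
double equivalent_clock_mhz(double ops_per_second, double ops_per_cycle)
{
    return ops_per_second / ops_per_cycle / 1e6;
}
```

With 25e6 iterations, 5 operations per iteration and 20.3 seconds, the first function gives roughly 6.16 million operations per second (quoted as 6.1 in the text). At one operation per cycle that corresponds to a clock of about 6 MHz, which falls inside the 5-10 MHz range.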
If we help the compiler a bit and unroll the loop by hand:
for(i=0;i<1;i+=0.0002) //8 seconds
    for(j=0;j<1;j+=0.0008)
    {
        result+=i*j+i*(j+0.0002)+i*(j+0.0004)+i*(j+0.0006);
    }
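The four products in the unrolled body share the factor i, so algebraically their sum equals i*(4*j + 0.0012). A small PC-side sketch to check that the two forms agree; it uses double instead of the board's float to keep rounding noise out of the comparison, and the function names are hypothetical.

```c
/* Unrolled loop body: four products for j, j+0.0002, j+0.0004, j+0.0006. */
double unrolled_term(double i, double j)
{
    return i * j + i * (j + 0.0002) + i * (j + 0.0004) + i * (j + 0.0006);
}

/* Factored form of the same sum: i * (4*j + 0.0012). */
double factored_term(double i, double j)
{
    return i * (j * 4 + 0.0012);
}
```

The two functions return the same value up to floating-point rounding, while the factored form needs one multiplication by i instead of four.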
This version runs in 8 seconds; if you simplify the expression to result += i*(j*4+0.0012), it takes 6.8 seconds.
Theoretical performance
It has finally become completely clear what the theoretically achievable performance of the Multiclet is with perfect parallelization at 100 MHz:
2.4 GFLOPS: if all 4 cells perform only complex multiplication and nothing else.
800 MFLOPS: if all 4 cells perform the remaining arithmetic operations in packed form (i.e. the same operation is performed on both 32-bit halves).
400 MFLOPS: if we need to perform operations singly rather than in pairs (as is usually the case with non-computational code).
Finally, if we cannot keep all 4 cells busy in parallel, we can count on only 150-300 MFLOPS.
Hand-optimized assembler code for the fast Fourier transform with almost perfect parallelization, with the save/load blocks cut out (the developers claim they will be able to optimize them in the future), gives ~1.2 GFLOPS (less than 2.4 precisely because not all operations are complex multiplications; additions, subtractions and others are still needed).
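These figures are consistent with a simple model, sketched below. The model is my assumption, not something the developers state: each cell completes one operation per cycle, a complex multiplication counts as 6 real floating-point operations (4 multiplications and 2 additions), a packed operation as 2 and a single operation as 1. The function names are hypothetical.

```c
/* Peak rate in MFLOPS = cells * FLOP per cell per cycle * clock in MHz. */
double peak_mflops(int cells, double flop_per_cell_per_cycle, double clock_mhz)
{
    return cells * flop_per_cell_per_cycle * clock_mhz;
}

/* FLOP per cell per cycle implied by a quoted rate in MFLOPS. */
double implied_flop_per_cell_per_cycle(double mflops, int cells, double clock_mhz)
{
    return mflops / (cells * clock_mhz);
}
```

For 4 cells at 100 MHz the model gives 2400, 800 and 400 MFLOPS for 6, 2 and 1 FLOP per cell per cycle. The ~1.2 GFLOPS of the FFT code corresponds to 3 FLOP per cell per cycle, i.e. half of the peak.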
Power consumption
At maximum load:
1.8 V - 0.39 A
3.3 V - 7.2 mA
Accordingly, the power consumption is 0.725 W at 80 MHz (it will be higher at 100 MHz).
When reset is held down, consumption drops to 0.3 A on the 1.8 V rail and 0.8 mA on the 3.3 V rail.
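The total is just the sum of voltage times current over the two rails. A minimal sketch of that arithmetic; the function name is hypothetical, and the reset-state wattage it yields is derived here rather than measured directly.

```c
/* Total power in watts drawn from the 1.8 V and 3.3 V rails,
   given the current on each rail in amperes. */
double board_power_w(double amps_1v8, double amps_3v3)
{
    return 1.8 * amps_1v8 + 3.3 * amps_3v3;
}
```

With 0.39 A and 0.0072 A this gives about 0.726 W under load; with 0.3 A and 0.0008 A it gives about 0.543 W while reset is held, so roughly three quarters of the full-load power is drawn even when the processor is not running.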
Summary
With the C compiler in its current state, performance is fatally low (equivalent to an abstract non-superscalar processor at 5-10 MHz) because the code is not optimized. The hope is that the compiler developers will finish it during 2013, and then Multiclet will be able to compete with other domestic developments.
P.S. Please don't count me among Multiclet's opponents: I am all for everything working perfectly and for Multiclet beating all its rivals, and I am all for domestic microelectronics too.
P.P.S. I ask the question of how remote access should work quite seriously: so far I have no idea other than flashing whatever binary is sent in and sending back everything that comes out of RS232.
Is it worth organizing public remote access to the Multiclet debug board?
- 58.3% (160 votes): Yes, I'd like to dig into it myself (and I'll write in the comments how I think it should work).
- 41.6% (114 votes): No, everything is already clear to me.