Embox begins climbing Elbrus
Those who follow our project may have noticed that an e2k folder has appeared in the architecture directory, containing support for the domestic Elbrus processors. A series of articles about porting Embox to domestic platforms would be incomplete without a story about this architecture. A couple of remarks on the content of this article. First, we are only at the initial stage of mastering this architecture: we have managed to launch Embox on the platform, but many necessary parts are not implemented yet and will be discussed in future publications. Second, the architecture is complex, and a detailed description requires much more text than fits into one article. We therefore suggest treating this article as an introduction that contains a minimum of technical information about the architecture itself.
Let's get started
Object of study: a prototype embedded system on Elbrus
Since we work on Embox (which, in case someone does not know, is aimed at embedded systems), we were primarily interested in the option that MCST positions for embedded systems, among other uses. After contacting MCST, we learned that the company is interested in having its processors used in embedded systems. One of the latest solutions for this segment is the E4C-COM board. In the course of our communication with MCST it became clear that any of the available machines can be used for porting and for learning the architecture, and we were lent a computer called Monokub. In general, the monocube is not quite what we are used to in embedded systems. Embedded systems typically use single-board computers, a system on a chip, or even a microcontroller, whereas the monocube is a full-fledged computer. However, since it has passed climatic and mechanical testing, it can still be considered an embedded system.
Compiler, build, loading the image
After receiving the system unit, the first question was, of course, how to load an image onto it. MCST uses its own BIOS (a first-stage bootloader). The default OS is Elbrus (that is, Debian with modifications), but we wanted to run our own image. Fortunately, the MCST loader can boot images over the network, using the ATA over Ethernet protocol.
After we were helped to set up the test stand and boot an external image over the network, we started developing our own image. For this we needed a compiler. The compiler is not publicly available, but since we had signed an NDA, we were given Linux binaries. The compiler turned out to be quite gcc-compatible, and we did not have to change anything except the compilation flags, which we keep in a separate configuration file. This is quite predictable, because Linux, albeit with modifications, is built with this compiler.
A couple of technical issues
Anyone engaged in such a specific activity as porting an OS to a new platform knows that the first thing to do is to place the program code in memory correctly, that is, to write a linker script (lds) and implement the start code. We figured out the linker script fairly quickly, but while implementing the start code we ran into the first piece of magic that we did not understand. The thing is that Elbrus has an x86-compatibility mode, and at address 0x00FF0000 lies code that I will simply link to, since we borrowed it from an MCST example. The linker script contains:
.bootinfo : {
    _bootinfo_start = .;
    /* . = 0x1000000 - 0x10000; */ /* 0x00FF0000 */
    *(.x86_boot)
    . = _bootinfo_start + 0x10000;
    _bootinfo_end = .;
} SECTION_REGION(bootinfo)

.text : {
    _start = .;
    /* 0x01000000 */
    *(.e2k_entry);

The start code itself is not even written in assembly language, but simply in C. It is placed in the section located at address 0x01000000, which, by the way, corresponds to the starting address on conventional x86 machines: there is a multiboot header or another header at this address.
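As an illustration of how C code can end up in that entry section, here is a minimal sketch that assumes a GCC-compatible compiler and an ELF target. The function name e2k_start_sketch and its behaviour are hypothetical and are not taken from the Embox sources; only the section name .e2k_entry comes from the linker script above.

```c
#include <assert.h>

/*
 * GCC-style section attribute: a rule such as *(.e2k_entry) in the
 * linker script collects this function into the output section that
 * begins at _start. The function itself is a made-up stand-in.
 */
__attribute__((section(".e2k_entry")))
int e2k_start_sketch(int cpu)
{
	assert(cpu >= 0);   /* core numbers are never negative */
	return cpu == 0;    /* in this sketch only core 0 would go on to boot */
}
```

The point is only that no assembly is needed to control placement: the attribute names the input section, and the linker script decides at which address that section lands.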
To make sure that the start code and the addresses are correct, you need to get some kind of output. If you manage to print a single character, there will most likely be no problems printing strings, and with that output in place you can use the familiar printf() for debugging. Moreover, most platforms let you print a character with a simple write to a specific register (since the bootloader has most likely already configured the UART as needed).
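The idea of "one register write per character" can be sketched as follows. This is not the am85c30 driver: the transmit register here is an ordinary variable (fake_tx_reg) so that the sketch runs on a host machine, and the names diag_putc, diag_puts and captured are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for a memory-mapped transmit register. On real hardware this
 * would be a fixed address taken from the controller documentation.
 */
volatile uint8_t fake_tx_reg;

/* Host-side capture buffer, present only so the output can be inspected. */
char captured[64];
size_t captured_len;

/* Print one character: a single store to the transmit register. */
void diag_putc(char c)
{
	fake_tx_reg = (uint8_t)c;
	assert(captured_len < sizeof(captured) - 1);
	captured[captured_len++] = (char)fake_tx_reg;
	captured[captured_len] = '\0';
}

/* Once a single character works, strings follow trivially. */
void diag_puts(const char *s)
{
	while (*s)
		diag_putc(*s++);
}
```

A real driver would normally also poll a transmitter-ready status bit before each store; the sketch leaves that out because it depends on the specific controller.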
Our computer uses the am85c30 serial port controller (aka z85c30). We quickly found out how to print one character, and that is enough for our printf to work. We immediately ran into a strange problem: some of the characters printed by printf appeared duplicated, and sometimes shuffled. For example, an attempt to print Hello, world! produced something like Hhelellloo ,, woworrlldd. Now it seems obvious that multi-core was to blame, but at first we spent a long time poking at the driver. Our monocube is equipped with a dual-core Elbrus-2C+ (1891ВМ7Я) (the four DSP cores do not count), and the bootloader activates all processor cores. So, in order not to deal with multi-core (SMP) for now, we send all cores except the first one into an infinite loop. To do this, we introduced a variable holding the processor number and increment it with an atomic addition. Core zero continues to work, while the other cores loop.
    cpuid = __e2k_atomic32_add(1, &last_cpuid);
    if (cpuid > 1) {
        /* XXX currently we support only single core */
        while(1);
    }
    /* copy of trap table */
    memcpy((void*)0, &_t_entry, 0x1800);
    kernel_start();

The call to kernel_start() passes control to our own code.
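For readers without an Elbrus at hand, the core-numbering trick can be reproduced with portable C11 atomics. This is a sketch under assumptions, not the code Embox uses: claim_cpuid, is_boot_core and last_cpuid_sketch are hypothetical names, and atomic_fetch_add from <stdatomic.h> stands in for the __e2k_atomic32_add macro used above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared counter; static storage makes it zero before any core reports in. */
static atomic_int last_cpuid_sketch;

/*
 * Returns the counter value after adding 1, so the first caller gets 1,
 * the second gets 2, and so on, with no two callers seeing the same value.
 */
int claim_cpuid(void)
{
	return atomic_fetch_add(&last_cpuid_sketch, 1) + 1;
}

/* Only the core that received 1 may continue booting. */
int is_boot_core(int cpuid)
{
	assert(cpuid >= 1);
	return cpuid == 1;
}
```

In the start code this corresponds to the check cpuid > 1: every core that fails is_boot_core() would be parked in the infinite loop.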
We also borrowed the atomic addition; to us it looks like magic. But as they say: if it works, don't touch it!
#define WMB_AFTER_ATOMIC    ".word 0x00008001\n" \
                            ".word 0x30000084\n"

#define __e2k_atomic32_add(__val, __addr) \
({ \
    int __rval; \
    asm volatile ("\n1:" \
   		   "\n\tldw,0 %[addr], %[rval], mas=0x7" \
   		   "\n\tadds %[rval], %[val], %[rval]" \
   		   "\n\t{"\
   		   "\n\tstw,2 %[addr], %[rval], mas=0x2" \
   		   "\n\tibranch 1b ? %%MLOCK" \
   		   "\n\t}" \
   		   WMB_AFTER_ATOMIC \
   		   : [rval] "=&r" (__rval), [addr] "+m" (*(__addr)) \
   		   : [val] "ir" (__val) \
   		   : "memory"); \
    __rval; \
})

Another piece of magic that had to be borrowed is some code that is required on all cores, namely:
static inline void e2k_wait_all(void) {
    _Pragma ("no_asm_inline")
    asm volatile ("wait \ttrap = %0, ma_c = %1, fl_c = %2, ld_c = %3, "
                  "st_c = %4, all_e = %5, all_c = %6"
                  : : "i" (0), "i" (1), "i" (1), "i" (0), "i" (0), "i" (1), "i" (1) : "memory");
}

As a result, once the start code was written, we not only got messages printed via printk, but modules also started loading, which in general is far from trivial with non-standard compilers. So let me note once again that the gcc compatibility was a very pleasant surprise.
The next step is usually to bring up the interrupt controller and the timer, but realizing that we would have to implement not only support for these devices but also the architectural code for the interrupt handlers, we decided to start with the peripherals instead. The monocube has a PCIe bus, which looks like regular PCI to the programmer. We were primarily interested in two devices: the display controller and the network controller.
The monocube uses a graphics controller from the sm750 series. It is a graphics controller for embedded applications with on-board support for 2D graphics. As far as I understand, the chip is soldered directly onto the motherboard. The source code of its Linux driver is available.
After finding the driver it seemed that our problems were over and all that remained was to implement PCI access, or more precisely the read and write operations on the PCI configuration space needed to discover the device parameters. The implementation of these functions had to be borrowed once again. In the end, reads and writes boiled down to macros like:
/*
 * Do load with specified MAS
 */
#define _E2K_READ_MAS(addr, mas, type, size_letter, chan_letter) \
({ \
    register type res; \
    asm volatile ("ld" #size_letter "," #chan_letter " \t0x0, [%1] %2, %0" \
                  : "=r" (res) \
                  : "r" ((__e2k_ptr_t) (addr)), \
                    "i" (mas)); \
    res; \
})

#define _E2K_WRITE_MAS(addr, val, mas, type, size_letter, chan_letter) \
({ \
    asm volatile ("st" #size_letter "," #chan_letter " \t0x0, [%0] %2, %1" \
                  : \
                  : "r" ((__e2k_ptr_t) (addr)), \
                    "r" ((type) (val)), \
                    "i" (mas) \
                  : "memory"); \
})

Here we do have some understanding of what is going on. Elbrus has several alternative address spaces, as, for example, the SPARC architecture does. A space is selected by an address space identifier. That is, the same ld instruction reaches different internal addresses and also generates reads of different widths (8, 16, 32, 64 bits). Where SPARC has separate lda/sta instructions, Elbrus uses the ordinary ld instruction with parameters. Register windows were borrowed from the SPARC architecture as well. I will leave a more detailed story for subsequent articles.
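To show what "finding out the parameters" from the PCI configuration space amounts to, here is a host-runnable sketch. The standard header layout is real (vendor ID in the low 16 bits at offset 0x00, device ID in the high 16 bits, BAR0 at offset 0x10), but everything else is an assumption for the example: pci_cfg_read32 is a hypothetical accessor backed by a simulated header, and the IDs and the BAR value are made up. On the real board the accessor is where an architecture-specific load such as _E2K_READ_MAS would go.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VENDOR_ID_OFF  0x00u    /* vendor ID: low 16 bits, device ID: high 16 */
#define PCI_BAR0_OFF       0x10u    /* first base address register */
#define PCI_NO_DEVICE      0xFFFFu  /* vendor ID read from an empty slot */

/* Simulated 64-byte header of one device at bus 0, device 2, function 0. */
static const uint32_t fake_cfg[16] = {
	[0] = 0x56781234u,  /* device ID 0x5678, vendor ID 0x1234 (made up) */
	[4] = 0xE0000000u,  /* BAR0: made-up memory window */
};

/* Hypothetical accessor; it only consults the simulated header. */
uint32_t pci_cfg_read32(unsigned bus, unsigned dev, unsigned fn, unsigned off)
{
	assert((off & 3u) == 0 && off < sizeof(fake_cfg));
	if (bus != 0 || dev != 2 || fn != 0)
		return 0xFFFFFFFFu;  /* an empty slot reads as all ones */
	return fake_cfg[off / 4];
}

uint16_t pci_vendor_id(unsigned bus, unsigned dev, unsigned fn)
{
	return (uint16_t)(pci_cfg_read32(bus, dev, fn, PCI_VENDOR_ID_OFF) & 0xFFFFu);
}

uint16_t pci_device_id(unsigned bus, unsigned dev, unsigned fn)
{
	return (uint16_t)(pci_cfg_read32(bus, dev, fn, PCI_VENDOR_ID_OFF) >> 16);
}

/* For a memory BAR the low four bits are flags, not address bits. */
uint32_t pci_bar0_base(unsigned bus, unsigned dev, unsigned fn)
{
	return pci_cfg_read32(bus, dev, fn, PCI_BAR0_OFF) & ~0xFu;
}
```

A driver would scan the bus for the vendor and device IDs it supports and then take the base address of the device's registers or video memory from the BARs.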
In the end, everything worked out with PCI. We were able to obtain all the necessary data and ported the graphics driver, but then we ran into the next problem: to draw a picture in video memory, it had to be written twice. Everything pointed to the cache. To solve this problem we would have to deal with the MMU, and that, as they say, cannot be done in a hurry, just like many other problems that we have encountered and will encounter more than once while mastering this architecture.
We have also made progress in other directions, interrupts and system calls, but we will cover these in the following articles of this subseries. To conclude this article, I will just show the console output (via the serial port).

Conclusions
As I said in the preface, I want to focus first of all not on technical details but on general impressions. The impressions are mixed, though on the whole more positive. On the one hand, the processor exists and is very interesting in terms of its architectural features. Computing systems based on it are being produced, and there is software of fairly high quality. As I said, we had no complaints about the compiler (up to a certain point, which I describe below), and there is a full-fledged Linux (Elbrus OS). I personally saw a developer at MCST working directly on a desktop with the Elbrus architecture.
On the other hand, I do not understand why there is such a persistent effort to make this processor a plain replacement for Intel's x86. After all, nowhere in the world are processors based on the VLIW architecture used in general-purpose personal computers. By virtue of its architectural features, VLIW makes an excellent number cruncher: DSPs are built on it, Itanium servers are built on it, graphics cards are built on it. Of course, you can dig a hole for planting a tree with an excavator, but is it worth it?
But the main problem hindering the development of the architecture, in my opinion, is the closed nature of the entire ecosystem. Yes, to get a description of the instruction set you only need to sign an NDA, but that is not enough. The architecture is unfamiliar and very complex. Yes, I have always believed that some basic software should be developed by the processor manufacturer itself or in close cooperation with it. Following this principle, PCs on Elbrus come with a software set that includes the Elbrus OS. But it is still too naive to assume that one company, even a large one, can provide quality support for all the components: the processor, the compiler, development and debugging tools, system software (various operating systems), application software and so on. Even Intel cannot do that. The world has long been moving towards collaboration, or joint development.
Here is an example of a compiler problem that we stumbled upon. At some point the serial port driver stopped printing characters, although at first glance nothing had changed. It turned out that we had removed an unused debugging function, which we had added to find out through the disassembler how arguments are passed to a function in assembly. That is, if the function is present, everything is fine; if not, the output disappears. At first we blamed alignment, but it turned out that the function could be moved to the end of the C file, so that all the driver's symbols stayed in the same places as before, and the problem still reproduced. So far this problem has not been solved; we continue to investigate so that we can either show it to the compiler developers at MCST or understand where we made a mistake.
This problem, in my opinion, could have been avoided if there were more third-party users. At the very least it would have surfaced earlier, or we could simply have googled what we did wrong.
MCST also recognizes the problem of closedness, since the process of opening up non-classified material has nonetheless begun. For example, I have seen Alt Linux running on an Elbrus PC at several conferences. Here are photos of the display from one of this year's conferences (sorry that it is hard to see, it was dark). We have joined the development as well. We hope to be useful to MCST since, as it turns out, some of the distinctive features of the Elbrus architecture, such as tagged memory, cannot be supported in Linux (or only at very high cost).

Another important point: when we discussed the problem of closedness with the MCST developers, they objected that, for example, the source code of their Linux kernel had been open for a long time, yet only we and the Dolomant developers asked questions and made any use of it. In addition, according to my information, MCST is going to set up a remotely accessible stand on which you can build and run software on a PC with the Elbrus architecture. If you are interested and want to use the stand, contact me and describe, for example, how you plan to use it and for how long, because sharing the stand among several users requires a schedule. I will pass the information on to MCST or put those interested in touch with the organizers.
Useful links:
Mailing address for users - user [at] mcst.ru
A brief description of the architecture of Elbrus
Embox
Sources
P.S. We will show everything we have done at the TechTrain IT festival on September 1-2.