Climbing Elbrus - Reconnaissance in battle. Technical Part 1. Registers, stacks and other technical details

As promised, we continue our story about Elbrus processors. This article is technical. The information in it is not official documentation, because it was obtained by studying Elbrus much like a black box. Still, it should be useful for a better understanding of the Elbrus architecture: although we had the official documentation, many details became clear only after lengthy experiments, once Embox was finally running.

Recall that in the previous article we talked about the basic system boot and the serial port driver. Embox started, but to move further we needed interrupts, a system timer and, of course, some set of unit tests, and for those we need setjmp. This article focuses on registers, stacks, and other technical details needed to implement all these things.

Let's start with a brief introduction to architecture, which is the minimum information needed to understand what will be discussed later. In the future, we will refer to the information from this section.

Brief Introduction: Stacks

There are three stacks in Elbrus:

Procedure Stack (PS)
Procedure Chain Stack (PCS)
User Stack (US)

Let's analyze them in more detail. The addresses in the figure are arbitrary and only show the direction in which each stack grows: from higher addresses to lower ones or vice versa.

The procedure stack (PS) is intended for data placed in “operational” registers.

For example, these may be function arguments; in “ordinary” architectures, the closest concept is general-purpose registers. Unlike “ordinary” processor architectures, in E2K the registers used by functions are saved on a separate stack.

The procedure chain stack (PCS), also called the stack of binding information, holds information about the previous (calling) procedure and is used when returning. The return address, like the registers, is stored in a separate place. Therefore, stack unwinding (for example, when leaving a function through an exception in C++) is a more laborious process than in “ordinary” architectures. On the other hand, this eliminates the problems caused by stack overflows.

Both of these stacks (PS and PCS) are characterized by a base address, a size, and a current offset. These parameters are held in the PSP and PCSP registers; the registers are 128 bits wide, so in assembler you have to address their individual halves (for example, hi or lo). In addition, the operation of these stacks is closely tied to the concept of a register file, more on that below. The stacks and the register file interact through the mechanism of spilling and filling registers. An active role in this mechanism is played by the so-called “hardware pointer to the top of the stack” of the procedure stack or of the stack of binding information, respectively. More on this below as well. The important point is that at any moment the data of these stacks resides either in RAM or in the register file.

It is also worth noting that these two stacks (the procedure stack and the stack of binding information) grow upward. We ran into this when implementing context_switch.

The user stack is also described by a base address and a size. The current pointer is held in the USD.lo register. At its core it is a classic stack that grows downward. The only difference from “ordinary” architectures is that the information kept on the other two stacks (registers and return addresses) is not stored there.
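The difference in growth direction can be sketched in C. This is only a toy model, not Elbrus code: the structures and function names are invented for illustration, and each stack is reduced to a base, a size and a current offset, as in the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a hardware stack (PS or PCS): grows toward higher addresses. */
struct hw_stack {
    uint64_t *base;   /* lowest address of the area */
    size_t size;      /* capacity in 64-bit words */
    size_t ind;       /* current offset from base, starts at 0 */
};

/* Toy model of the user stack: grows toward lower addresses. */
struct user_stack {
    uint64_t *base;   /* lowest address of the area */
    size_t size;      /* capacity in 64-bit words */
    size_t sp;        /* current offset from base, starts at size */
};

/* Push onto the upward-growing stack; returns 0 on success, -1 if full. */
int hw_push(struct hw_stack *s, uint64_t v) {
    assert(s != NULL && s->base != NULL);
    if (s->ind >= s->size)
        return -1;
    s->base[s->ind++] = v;
    return 0;
}

/* Push onto the downward-growing stack; returns 0 on success, -1 if full. */
int user_push(struct user_stack *s, uint64_t v) {
    assert(s != NULL && s->base != NULL);
    if (s->sp == 0)
        return -1;
    s->base[--s->sp] = v;
    return 0;
}
```

In this model the first value pushed onto the hardware stack lands in the lowest word of its area, while the first value pushed onto the user stack lands in the highest word.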

One requirement on the boundaries and sizes of the stacks is, in my opinion, unusual: both the base address of a stack and its size must be aligned to 4 KB. I have not come across such a restriction in other architectures. Again, we ran into this detail when implementing context_switch.

Brief introduction: Registers. Register files. Register windows

Now that we’ve figured out the stacks a bit, we need to understand how the information is presented in them. To do this, we need to introduce a few more concepts.

A register file (RF) is the set of all registers. We need two register files: one holds the binding information (the chain file, CF); the other is called simply the register file (RF) and holds the “operational” registers, which are saved on the procedure stack.

Register window is the area (set of registers) of the register file that is currently available.

I will explain in more detail. What is a set of registers, I think, no one needs to explain.

It is well known that one of the bottlenecks of the x86 architecture is its small number of registers. RISC architectures have it easier: they usually have around 16 registers, of which a few (2-3) are reserved for system needs. Why not simply make 128 registers, since that would seem to increase performance? The answer is quite simple: a processor instruction needs room to encode the register number, and the more registers there are, the more bits this takes. So designers resort to all sorts of tricks: shadow registers, register banks, windows and so on. By shadow registers I mean the way registers are organized in ARM. If an interrupt or a similar event occurs, a different set of registers with the same names (numbers) becomes available, while the information in the original set stays where it was.

Register windows were invented to optimize work with the stack. As you probably know, in a “normal” architecture you enter a procedure, save registers to the stack (or the caller saves them, depending on the calling convention), and then you can use the registers, because their contents are already on the stack. But memory access is slow and should be avoided. So on entry to a procedure let's simply make a new set of registers available: the data in the old set is preserved, which means it does not need to be dumped into memory. Moreover, when you return to the calling procedure, the previous register window comes back too, so all the data in its registers is still valid. This is the concept of a register window.

It is clear that the registers still have to be saved to the stack (in memory) at some point, but this can be postponed until the free register windows run out.

And what about the input and output registers (the arguments on entry to a function and the returned result)? Let a window include some of the registers visible from the previous window; more precisely, let some registers be accessible from both windows. Then, in general, calling a function requires no memory access at all. Suppose our registers look like this.

That is, r0 in window one is the same register as r2 in window zero, and r1 in window one is the same register as r3. So by writing to r2 before calling the procedure (that is, before the window number changes), we get that value in r0 of the called procedure. This principle is called the mechanism of rotating windows.
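The overlap can be modelled in a few lines of C. This is a toy sketch with fixed-size windows, not a model of any real processor: the sizes (four visible registers, a shift of two per call) are chosen only to match the r0/r2 example, and all names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RF_SIZE     16  /* physical registers in the toy register file */
#define WIN_SIZE     4  /* registers visible in one window: r0..r3 */
#define WIN_STRIDE   2  /* a call shifts the window base by two registers */

struct regfile {
    uint64_t phys[RF_SIZE];
    unsigned base;          /* physical index of r0 in the current window */
};

/* Return a pointer to register rN of the current window. */
uint64_t *reg(struct regfile *rf, unsigned n) {
    assert(n < WIN_SIZE);
    return &rf->phys[(rf->base + n) % RF_SIZE];
}

/* Entering a procedure only moves the window base; no data is copied. */
void window_call(struct regfile *rf)   { rf->base += WIN_STRIDE; }

/* Returning moves the base back, so the caller sees its registers again. */
void window_return(struct regfile *rf) { rf->base -= WIN_STRIDE; }
```

Writing 42 to r2, calling `window_call()` and reading r0 yields 42 again, because both names refer to the same physical register. The same overlap carries the return value back to the caller.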

Let's optimize a little more, because the creators of Elbrus did just that. Let the windows be of variable rather than fixed size, with the size set on entry to the procedure. We will do the same with the number of rotated registers. This, of course, leads to a problem: classic rotating windows have a window index, which is used to decide when data must be saved from the register file to the stack or loaded back, and with variable-size windows such an index no longer identifies a position in the register file. But if we keep not a window index but the index of the register at which the current window starts, the problem goes away. In Elbrus, these indices are held in the registers PSHTP (for the procedure stack, PS) and PCSHTP (for the procedure chain stack, PCS). The documentation calls them “hardware pointers to the top of the stack”. Now you can try rereading the part about stacks; I think it will be clearer.

As you understand, such a mechanism implies that you are able to control what is in memory, that is, to synchronize the register file with the stack. By “you” I mean a system programmer. If you are an application programmer, the hardware makes entering and leaving a procedure transparent. That is, if there are not enough registers when a new window is requested, the register window is spilled automatically. In this case all the data from the register file is saved to the corresponding stack (in memory), and the “hardware pointer to the top of the stack” (the offset index) is reset to zero. Filling the register file back from the stack also happens automatically. But if you are developing, say, context switching, which is exactly what we did, then you need a mechanism for working with the hidden part of the register file. In Elbrus, the FLUSHR and FLUSHC commands serve this purpose. FLUSHR flushes the register file: all windows except the current one are written to the procedure stack, and the PSHTP index is accordingly reset to zero. FLUSHC flushes the binding information file: everything except the current window is written to the stack of binding information, and the PCSHTP index is likewise reset to zero.
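The bookkeeping behind automatic spilling and an explicit flush can be sketched as follows. This is a deliberately simplified model under stated assumptions: windows do not overlap, a spill writes out everything held in the file, and the counters `htp` and `ind` only loosely mirror the roles of PSHTP and PSP.ind. All names are invented for illustration.

```c
#include <assert.h>

struct rf_model {
    unsigned capacity;  /* registers that fit in the register file */
    unsigned htp;       /* registers held in the file, not yet in memory */
    unsigned ind;       /* registers already written to the stack in memory */
};

/* Explicit flush: everything held in the file goes to memory. */
void rf_flush(struct rf_model *m) {
    m->ind += m->htp;
    m->htp = 0;
}

/* Enter a procedure that needs wsz registers.
 * Returns 1 if the hardware had to spill first, 0 otherwise. */
int rf_enter(struct rf_model *m, unsigned wsz) {
    int spilled = 0;
    assert(wsz <= m->capacity);
    if (m->htp + wsz > m->capacity) {
        rf_flush(m);    /* automatic spill, invisible to application code */
        spilled = 1;
    }
    m->htp += wsz;
    return spilled;
}

/* The logical stack depth never depends on where the data currently lives. */
unsigned rf_depth(const struct rf_model *m) {
    return m->ind + m->htp;
}
```

The point of the model is the invariant in `rf_depth()`: the true top of the stack is always the in-memory index plus the part still held in the register file, whether or not a spill has happened.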

Brief Introduction: Implementation in Elbrus

Now that we have discussed the non-obvious work with registers and stacks, we will talk more specifically about various situations in Elbrus.

When we enter the next function, the processor creates two windows: a window on the PS stack and a window on the PCS stack.

A window in the PCS stack contains the information necessary to return from a function: for example, IP (Instruction Pointer) of the instruction where you will need to return from the function. With this, everything is more or less clear.

The window on the PS stack is a little trickier. It introduces the concept of the registers of the current window. In this window you have access to the registers of the current window: %dr0, %dr1, ..., %dr15, ... That is, for us as users they are always numbered from 0, but the numbering is relative to the base address of the current window. These registers are used to pass arguments when a function is called and to return its value, and inside the function they serve as general-purpose registers. All of this was explained above, in the discussion of rotating register windows.

The size of the register window in Elbrus can be controlled. This, as I said, is necessary for optimization. For example, in a function we need only 4 registers for passing arguments and some calculations, in this case the programmer (or compiler) decides how many registers to allocate for the function, and based on this sets the window size. The window size is set by the setwd operation:

	setwd wsz=0x10

Here wsz is the window size in quad registers (128-bit registers).

Now, let's say you want to call a function from a function. For this, the concept of a rotated register window described earlier is used. The picture above shows a fragment of the register file in which a function with window 1 (green) calls a function with window 2 (orange). In each of these two functions you have access to your own %dr0, %dr1, ... But the arguments are passed through the so-called rotated registers. In other words, some of the registers of window 1 become registers of window 2 (note that the two windows overlap). These registers are also described by a window (see Rotary registers in the picture) and are addressed as %db[0], %db[1], ... Thus, the %dr0 register in window 2 is nothing other than the %db[0] register in window 1.

The rotation register window is set by the setbn operation:

	setbn   rbs = 0x8, rsz = 0x3

rbs sets the base of the rotated window relative to the current register window, and rsz sets its size. That is, here we have allocated 3 registers, starting from the 8th.

Based on the foregoing, we show how the function call looks. For simplicity, we assume that the function takes one argument:

void my_func(uint64_t a) {
}

Then, to call this function, you need to prepare the window of rotated registers (we have already done this with setbn). Next, we put the value to be passed to my_func into the %db[0] register. After that, issue the call instruction, not forgetting to tell it where the window of rotated registers begins. We will not dwell on the preparation for the call (the disp command), since it is not essential here. As a result, in assembler a call to this function looks like this:

	addd 0, %dr9, %db[0]
	disp %ctpr1, my_func
	call %ctpr1, wbs = 0x8

So, we have more or less figured out the registers. Now let's look at the stack of binding information. It stores the so-called CR registers, two of them in fact: CR0 and CR1. They contain the information needed to return from a function.

In the figure, the CR0 and CR1 registers shown in green belong to the window of the function that called the function whose registers are marked in orange. The CR0 registers contain the Instruction Pointer of the calling function and the predicate file (PF); a discussion of the latter is definitely beyond the scope of this article.

The CR1 registers contain data such as the PSR (processor status word), the window number, window sizes, and so on. Elbrus is flexible enough that each procedure keeps in CR1 even the information on whether floating-point operations are enabled in the procedure, as well as a register describing software exceptions. The price, of course, is the extra information that has to be saved.

It is very important not to forget that the register file and the binding information file can be spilled to main memory and filled back from it (that is, to and from the PS and PCS stacks described above). This point matters for the implementation of setjmp described below.

SETJMP / LONGJMP

Finally, with at least some understanding of how stacks and registers are arranged in Elbrus, we can start doing something useful, that is, adding new functionality to Embox.

In Embox, the unit testing system requires setjmp / longjmp, so we had to implement these functions.
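To show why a test framework wants these functions, here is a minimal sketch in standard C of aborting a failing test case with longjmp. It is not the Embox test framework: the names `run_test` and `test_fail` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf test_env;        /* where a failing check jumps back to */
static volatile int fail_code;  /* failure code, set before the jump */

/* Abort the current test case from any call depth. */
void test_fail(int code) {
    assert(code != 0);          /* 0 is reserved for "test passed" */
    fail_code = code;
    longjmp(test_env, 1);
}

/* Run one test case. Returns 0 if it ran to completion,
 * or the code passed to test_fail() otherwise. */
int run_test(void (*test_case)(void)) {
    if (setjmp(test_env) == 0) {
        test_case();            /* may never return normally */
        return 0;
    }
    return fail_code;           /* reached only through longjmp() */
}

/* Two sample test cases. */
void passing_case(void) { }
void failing_case(void) { test_fail(3); }
```

With these definitions, `run_test(passing_case)` returns 0 and `run_test(failing_case)` returns 3, and the runner can go on to the next test instead of crashing with the failed one.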

The implementation has to save and restore the registers CR0, CR1, PSP, PCSP and USD, already familiar from the brief introduction. In essence, saving and restoring is done in the most straightforward way, but there is an important nuance that was hinted at repeatedly in the description of stacks and registers: the stacks must be synchronized, because they live not only in memory but also in the register file. This nuance means you have to take care of several things, without which nothing will work.

The first is disabling interrupts during save and restore. During restore, disabling interrupts is mandatory; otherwise we may enter an interrupt handler with half-switched stacks (this refers to the spilling and filling of the register file described in the brief introduction). During save, the problem is that after an interrupt is entered and left, the processor may again fill part of the register file from RAM (and this breaks the invariants PSHTP = 0 and PCSHTP = 0, more on them below). That is why interrupts must be disabled in both setjmp and longjmp. It should also be noted that the specialists from MCST advised us to use atomic brackets instead of disabling interrupts, but for now we use the simplest implementation (the one we understand).

The second is related to spilling the register file to memory and filling it back. The register file has a limited size and is therefore often spilled to memory and filled back. So if we simply store the values of the PSP and PSHTP registers, we record the current pointer in memory and the current pointer in the register file. But since the register file keeps changing, by the time the context is restored it will point to the wrong data (not the data we “saved”). To avoid this, the entire register file has to end up in memory. At the moment setjmp runs, PCSP.ind worth of data is already in memory and another PCSHTP.ind worth is still in the register file, so the value to save is the sum PCSP.ind + PCSHTP.ind. The following function performs this calculation:

/* First arg is PCSP, 2nd arg is PCSHTP
 * Returns new PCSP value with updated PCSP.ind
 */
.type update_pcsp_ind,@function
$update_pcsp_ind:
	setwd wsz = 0x4, nfx = 0x0
	/* Here and below, 10 is size of PCSHTP.ind. Here we
	 * extend the sign of PCSHTP.ind */
	shld %dr1, (64 - 10), %dr1
	shrd %dr1, (64 - 10), %dr1
	/* Finally, PCSP.ind += PCSHTP.ind */
	addd %dr1, %dr0, %dr0
	E2K_ASM_RETURN

It is also worth clarifying the detail mentioned in the code comment: the PCSHTP.ind index has to be sign-extended in software, because it can be negative and is stored in two's complement. To do this, we first shift the 64-bit register left by (64 - 10), so that the 10-bit field ends up in the top bits, and then shift it back.

The same goes for the PSP (procedure stack):

/* First arg is PSP, 2nd arg is PSHTP
 * Returns new PSP value with updated PSP.ind
 */
.type update_psp_ind,@function
$update_psp_ind:
	setwd wsz = 0x4, nfx = 0x0
	/* Here and below, 12 is size of PSHTP.ind. Here we
	 * extend the sign of PSHTP.ind as stated in documentation */
	shld %dr1, (64 - 12), %dr1
	shrd %dr1, (64 - 12), %dr1
	muld %dr1, 2, %dr1
	/* Finally, PSP.ind += PSHTP.ind */
	addd %dr1, %dr0, %dr0
	E2K_ASM_RETURN

The difference from the previous function is slight: the field is 12 bits wide, and it counts registers in 128-bit units, so the value has to be multiplied by 2.
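The arithmetic of both helpers can be checked with a small C model. This is a sketch under assumptions, not a translation of the assembler: it works on the bare .ind values rather than on whole 64-bit register halves, and it uses the XOR-and-subtract idiom for sign extension, which gives the same result as a left shift followed by an arithmetic right shift (the shift pair propagates the sign only if the right shift is arithmetic).

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of value, treated as two's complement. */
int64_t sign_extend(uint64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    uint64_t mask  = (UINT64_C(1) << bits) - 1;
    uint64_t sign  = UINT64_C(1) << (bits - 1);
    uint64_t field = value & mask;
    /* Flipping the sign bit and subtracting it back maps the field
     * from [0, 2^bits) to [-2^(bits-1), 2^(bits-1)). */
    return (int64_t)(field ^ sign) - (int64_t)sign;
}

/* C model of update_pcsp_ind: PCSP.ind += PCSHTP.ind (10-bit signed). */
uint64_t update_pcsp_ind(uint64_t pcsp_ind, uint64_t pcshtp) {
    return pcsp_ind + (uint64_t)sign_extend(pcshtp, 10);
}

/* C model of update_psp_ind: PSP.ind += 2 * PSHTP.ind (12-bit signed,
 * counted in 128-bit units, hence the factor of two). */
uint64_t update_psp_ind(uint64_t psp_ind, uint64_t pshtp) {
    return psp_ind + (uint64_t)(2 * sign_extend(pshtp, 12));
}
```

For example, a PCSHTP.ind field of 0x3ff is -1 as a 10-bit two's complement number, so `update_pcsp_ind(0x100, 0x3ff)` gives 0xff, while a PSHTP.ind of 8 moves PSP.ind forward by 16.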

The setjmp code itself:

C_ENTRY(setjmp):
	setwd wsz = 0x14, nfx = 0x0
	/* It's for db[N] registers */
	setbn rsz = 0x3, rbs = 0x10, rcur = 0x0
	/* We must disable interrupts here */
	disp %ctpr1, ipl_save
	ipd  3
	call %ctpr1, wbs = 0x10
	/* Store current IPL to dr9 */
	addd 0, %db[0], %dr9
	/* Store some registers to jmp_buf */
	rrd %cr0.hi, %dr1
	rrd %cr1.lo, %dr2
	rrd %cr1.hi, %dr3
	rrd %usd.lo, %dr4
	rrd %usd.hi, %dr5
	/* Prepare RF stack to flush in longjmp */
	rrd %psp.hi, %dr6
	rrd %pshtp,  %dr7
	addd 0, %dr6, %db[0]
	addd 0, %dr7, %db[1]
	disp %ctpr1, update_psp_ind
	ipd  3
	call %ctpr1, wbs = 0x10
	addd 0, %db[0], %dr6
	/* Prepare CF stack to flush in longjmp */
	rrd %pcsp.hi, %dr7
	rrd %pcshtp,  %dr8
	addd 0, %dr7, %db[0]
	addd 0, %dr8, %db[1]
	disp %ctpr1, update_pcsp_ind
	ipd  3
	call %ctpr1, wbs = 0x10
	addd 0, %db[0], %dr7
	std %dr1, [%dr0 + E2K_JMBBUFF_CR0_HI]
	std %dr2, [%dr0 + E2K_JMBBUFF_CR1_LO]
	std %dr3, [%dr0 + E2K_JMBBUFF_CR1_HI]
	std %dr4, [%dr0 + E2K_JMBBUFF_USD_LO]
	std %dr5, [%dr0 + E2K_JMBBUFF_USD_HI]
	std %dr6, [%dr0 + E2K_JMBBUFF_PSP_HI]
	std %dr7, [%dr0 + E2K_JMBBUFF_PCSP_HI]
	/* Enable interrupts */
	addd 0, %dr9, %db[0]
	disp %ctpr1, ipl_restore
	ipd  3
	call %ctpr1, wbs = 0x10
	/* return 0 */
	adds 0, 0, %r0
	E2K_ASM_RETURN

When implementing longjmp, it is important not to forget to synchronize both register files: you need to flush not only the register file (flushr) but also the binding information file (flushc). Let's define a macro:

#define E2K_ASM_FLUSH_CPU \
	flushr; \
	nop 2;  \
	flushc; \
	nop 3;

Now that all the information is in memory, we can safely do register recovery in longjmp.

C_ENTRY(longjmp):
	setwd wsz = 0x14, nfx = 0x0
	setbn rsz = 0x3, rbs = 0x10, rcur = 0x0
	/* We must disable interrupts here */
	disp %ctpr1, ipl_save
	ipd  3
	call %ctpr1, wbs = 0x10
	/* Store current IPL to dr9 */
	addd 0, %db[0], %dr9
	/* We have to flush both RF and CF to memory because saved values
 	* of P[C]SHTP can be not valid here. */
	E2K_ASM_FLUSH_CPU
	/* Load registers previously saved in setjmp. */
	ldd [%dr0 + E2K_JMBBUFF_CR0_HI], %dr2
	ldd [%dr0 + E2K_JMBBUFF_CR1_LO], %dr3
	ldd [%dr0 + E2K_JMBBUFF_CR1_HI], %dr4
	ldd [%dr0 + E2K_JMBBUFF_USD_LO], %dr5
	ldd [%dr0 + E2K_JMBBUFF_USD_HI], %dr6
	ldd [%dr0 + E2K_JMBBUFF_PSP_HI], %dr7
	ldd [%dr0 + E2K_JMBBUFF_PCSP_HI], %dr8
	rwd %dr2, %cr0.hi
	rwd %dr3, %cr1.lo
	rwd %dr4, %cr1.hi
	rwd %dr5, %usd.lo
	rwd %dr6, %usd.hi
	rwd %dr7, %psp.hi
	rwd %dr8, %pcsp.hi
	/* Enable interrupts */
	addd 0, %dr9, %db[0]
	disp %ctpr1, ipl_restore
	ipd  3
	call %ctpr1, wbs = 0x10
	/* Actually, we return to setjmp caller with second
	 * argument of longjmp stored on r1 register. */
	adds 0, %r1, %r0
	E2K_ASM_RETURN

Context switch

Once we had figured out setjmp / longjmp, the basic implementation of context_switch seemed clear enough. Indeed, as in the first case, we need to save and restore the binding information registers and the stacks, plus we need to correctly restore the user processor status register (UPSR).

Let me explain. As in the case of setjmp, before saving registers you first need to flush the register file and the binding information file to memory (flushr + flushc). After that, we save the current values of the CR0 and CR1 registers so that on return we jump to exactly the place from which the current thread was switched out. Next, we save the descriptors of the PS, PCS, and US stacks. And finally, the interrupt mode must be restored correctly; for this purpose we also save the UPSR register.

Assembler code context_switch:

C_ENTRY(context_switch):
    setwd wsz = 0x10, nfx = 0x0
    /* Save prev UPSR */
    rrd %upsr, %dr2
    std %dr2, [%dr0 + E2K_CTX_UPSR]
    /* Disable interrupts before saving/restoring context */
    rrd   %upsr, %dr2
    andnd %dr2, (UPSR_IE | UPSR_NMIE), %dr2
    rwd   %dr2, %upsr
    E2K_ASM_FLUSH_CPU
    /* Save prev CRs */
    rrd %cr0.lo, %dr2
    rrd %cr0.hi, %dr3
    rrd %cr1.lo, %dr4
    rrd %cr1.hi, %dr5
    std %dr2, [%dr0 + E2K_CTX_CR0_LO]
    std %dr3, [%dr0 + E2K_CTX_CR0_HI]
    std %dr4, [%dr0 + E2K_CTX_CR1_LO]
    std %dr5, [%dr0 + E2K_CTX_CR1_HI]
    /* Save prev stacks */
    rrd %usd.lo,  %dr3
    rrd %usd.hi,  %dr4
    rrd %psp.lo,  %dr5
    rrd %psp.hi,  %dr6
    rrd %pcsp.lo, %dr7
    rrd %pcsp.hi, %dr8
    std %dr3, [%dr0 + E2K_CTX_USD_LO]
    std %dr4, [%dr0 + E2K_CTX_USD_HI]
    std %dr5, [%dr0 + E2K_CTX_PSP_LO]
    std %dr6, [%dr0 + E2K_CTX_PSP_HI]
    std %dr7, [%dr0 + E2K_CTX_PCSP_LO]
    std %dr8, [%dr0 + E2K_CTX_PCSP_HI]
    /* Load next CRs */
    ldd [%dr1 + E2K_CTX_CR0_LO], %dr2
    ldd [%dr1 + E2K_CTX_CR0_HI], %dr3
    ldd [%dr1 + E2K_CTX_CR1_LO], %dr4
    ldd [%dr1 + E2K_CTX_CR1_HI], %dr5
    rwd %dr2, %cr0.lo
    rwd %dr3, %cr0.hi
    rwd %dr4, %cr1.lo
    rwd %dr5, %cr1.hi
    /* Load next stacks */
    ldd [%dr1 + E2K_CTX_USD_LO],  %dr3
    ldd [%dr1 + E2K_CTX_USD_HI],  %dr4
    ldd [%dr1 + E2K_CTX_PSP_LO],  %dr5
    ldd [%dr1 + E2K_CTX_PSP_HI],  %dr6
    ldd [%dr1 + E2K_CTX_PCSP_LO], %dr7
    ldd [%dr1 + E2K_CTX_PCSP_HI], %dr8
    rwd %dr3, %usd.lo
    rwd %dr4, %usd.hi
    rwd %dr5, %psp.lo
    rwd %dr6, %psp.hi
    rwd %dr7, %pcsp.lo
    rwd %dr8, %pcsp.hi
    /* Restore next UPSR */
    ldd [%dr1 + E2K_CTX_UPSR],	%dr2
    rwd %dr2, %upsr
    E2K_ASM_RETURN
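The E2K_CTX_* constants used in the assembler code have to match the layout of struct context on the C side. The article does not show that structure, so the following is only a sketch: the field order and the offset values are assumptions made for illustration, and the real Embox definitions may differ. What the sketch shows is the technique of pinning assembler offsets to a C structure with compile-time checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the field order is an assumption of this sketch. */
struct context {
    uint64_t cr0_lo, cr0_hi;    /* chain registers */
    uint64_t cr1_lo, cr1_hi;
    uint64_t usd_lo, usd_hi;    /* user stack descriptor */
    uint64_t psp_lo, psp_hi;    /* procedure stack descriptor */
    uint64_t pcsp_lo, pcsp_hi;  /* procedure chain stack descriptor */
    uint64_t upsr;              /* interrupt mode of the thread */
};

/* Byte offsets as the assembler would see them (hypothetical values). */
#define E2K_CTX_CR0_LO   0x00
#define E2K_CTX_CR0_HI   0x08
#define E2K_CTX_CR1_LO   0x10
#define E2K_CTX_CR1_HI   0x18
#define E2K_CTX_USD_LO   0x20
#define E2K_CTX_USD_HI   0x28
#define E2K_CTX_PSP_LO   0x30
#define E2K_CTX_PSP_HI   0x38
#define E2K_CTX_PCSP_LO  0x40
#define E2K_CTX_PCSP_HI  0x48
#define E2K_CTX_UPSR     0x50

/* Fail the build if a constant drifts away from the structure. */
#define CHECK_OFFSET(field, off) \
    static_assert(offsetof(struct context, field) == (off), \
                  #field " offset mismatch")

CHECK_OFFSET(cr0_lo,  E2K_CTX_CR0_LO);
CHECK_OFFSET(cr0_hi,  E2K_CTX_CR0_HI);
CHECK_OFFSET(cr1_lo,  E2K_CTX_CR1_LO);
CHECK_OFFSET(cr1_hi,  E2K_CTX_CR1_HI);
CHECK_OFFSET(usd_lo,  E2K_CTX_USD_LO);
CHECK_OFFSET(usd_hi,  E2K_CTX_USD_HI);
CHECK_OFFSET(psp_lo,  E2K_CTX_PSP_LO);
CHECK_OFFSET(psp_hi,  E2K_CTX_PSP_HI);
CHECK_OFFSET(pcsp_lo, E2K_CTX_PCSP_LO);
CHECK_OFFSET(pcsp_hi, E2K_CTX_PCSP_HI);
CHECK_OFFSET(upsr,    E2K_CTX_UPSR);
```

With checks like these, reordering the fields of the structure without updating the constants breaks the build instead of silently corrupting a thread's context.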

Another important point is the initialization of the OS thread. In Embox, each thread has a certain primary procedure

void _NORETURN thread_trampoline(void);

in which all further work of the thread is performed. So we need to somehow prepare the stacks for calling this function, and this is where we run into the fact that there are three stacks and they do not all grow in the same direction. By design, Embox creates a thread with a single stack, or rather with a single region for the stack, at the top of which sits the structure describing the thread itself, and so on. Here we had to take care of the separate stacks, remember that they must be aligned to 4 KB, not forget the access rights, and so on.

As a result, for now we decided to divide the stack region into three parts: a quarter for the stack of binding information, a quarter for the procedure stack, and half for the user stack.

Here is the code so that you can judge how large it is; keep in mind that this is minimal initialization.

/* This value is used for both stack base and size align. */
#define E2K_STACK_ALIGN (1UL << 12)

#define round_down(x, bound) ((x) & ~((bound) - 1))

/* Reserve 1/4 for PSP stack, 1/4 for PCSP stack, and 1/2 for USD stack */
#define PSP_CALC_STACK_BASE(sp, size) \
    binalign_bound((sp) - (size), E2K_STACK_ALIGN)
#define PSP_CALC_STACK_SIZE(sp, size) \
    binalign_bound((size) / 4, E2K_STACK_ALIGN)

#define PCSP_CALC_STACK_BASE(sp, size) \
    (PSP_CALC_STACK_BASE(sp, size) + PSP_CALC_STACK_SIZE(sp, size))
#define PCSP_CALC_STACK_SIZE(sp, size) \
    binalign_bound((size) / 4, E2K_STACK_ALIGN)

#define USD_CALC_STACK_BASE(sp, size) round_down(sp, E2K_STACK_ALIGN)
/* The user stack gets what is left above the end of the PCSP area,
 * so that the two do not overlap. */
#define USD_CALC_STACK_SIZE(sp, size) \
    round_down(USD_CALC_STACK_BASE(sp, size) - \
        (PCSP_CALC_STACK_BASE(sp, size) + PCSP_CALC_STACK_SIZE(sp, size)), \
        E2K_STACK_ALIGN)

static void e2k_calculate_stacks(struct context *ctx, uint64_t sp,
        uint64_t size) {
    uint64_t psp_size, pcsp_size, usd_size;

    log_debug("Stacks:\n");

    /* Procedure stack */
    ctx->psp_lo |= PSP_CALC_STACK_BASE(sp, size) << PSP_BASE;
    ctx->psp_lo |= E2_RWAR_RW_ENABLE << PSP_RW;
    psp_size = PSP_CALC_STACK_SIZE(sp, size);
    assert(psp_size);
    ctx->psp_hi |= psp_size << PSP_SIZE;
    log_debug("  PSP.base=0x%lx, PSP.size=0x%lx\n",
        PSP_CALC_STACK_BASE(sp, size), psp_size);

    /* Procedure chain stack */
    ctx->pcsp_lo |= PCSP_CALC_STACK_BASE(sp, size) << PCSP_BASE;
    ctx->pcsp_lo |= E2_RWAR_RW_ENABLE << PCSP_RW;
    pcsp_size = PCSP_CALC_STACK_SIZE(sp, size);
    assert(pcsp_size);
    ctx->pcsp_hi |= pcsp_size << PCSP_SIZE;
    log_debug("  PCSP.base=0x%lx, PCSP.size=0x%lx\n",
        PCSP_CALC_STACK_BASE(sp, size), pcsp_size);

    /* User stack */
    ctx->usd_lo |= USD_CALC_STACK_BASE(sp, size) << USD_BASE;
    usd_size = USD_CALC_STACK_SIZE(sp, size);
    assert(usd_size);
    ctx->usd_hi |= usd_size << USD_SIZE;
    log_debug("  USD.base=0x%lx, USD.size=0x%lx\n",
        USD_CALC_STACK_BASE(sp, size), usd_size);
}

static void e2k_calculate_crs(struct context *ctx, uint64_t routine_addr) {
    uint64_t usd_size = (ctx->usd_hi >> USD_SIZE) & USD_SIZE_MASK;

    /* Reserve space in hardware stacks for @routine_addr */
    /* Remark: We do not update psp.hi to reserve space for arguments,
     * since routine do not accepts any arguments. */
    ctx->pcsp_hi |= SZ_OF_CR0_CR1 << PCSP_IND;

    ctx->cr0_hi |= (routine_addr >> CR0_IP) << CR0_IP;
    ctx->cr1_lo |= PSR_ALL_IRQ_ENABLED << CR1_PSR;
    /* Divide on 16 because it field contains size in terms
     * of 128 bit values. */
    ctx->cr1_hi |= (usd_size >> 4) << CR1_USSZ;
}

void context_init(struct context *ctx, unsigned int flags,
        void (*routine_fn)(void), void *sp, unsigned int stack_size) {
    memset(ctx, 0, sizeof(*ctx));

    /* sp arrives as a pointer; the helpers work with plain addresses */
    e2k_calculate_stacks(ctx, (uint64_t) sp, stack_size);
    e2k_calculate_crs(ctx, (uint64_t) routine_fn);

    if (!(flags & CONTEXT_IRQDISABLE)) {
        ctx->upsr |= (UPSR_IE | UPSR_NMIE);
    }
}
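To see what such a split produces for concrete numbers, here is a self-contained sketch of the scheme described in the text: a quarter for the procedure stack, a quarter for the stack of binding information, and the rest for the user stack, all aligned to 4 KB. It does not use the Embox macros; the helper names and the structures are invented for illustration, and rounding the lower boundary up to the alignment is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ALIGN UINT64_C(0x1000)    /* 4 KB */

static uint64_t align_down(uint64_t x) { return x & ~(STACK_ALIGN - 1); }
static uint64_t align_up(uint64_t x)   { return align_down(x + STACK_ALIGN - 1); }

struct region { uint64_t base, size; };
struct stack_layout { struct region ps, pcs, us; };

/* sp is the top (highest address) of the thread's stack area,
 * size is the length of that area in bytes. */
struct stack_layout split_stack(uint64_t sp, uint64_t size) {
    struct stack_layout l;
    assert(size <= sp);
    /* The hardware stacks grow upward, so they sit at the bottom. */
    l.ps.base  = align_up(sp - size);
    l.ps.size  = align_up(size / 4);
    l.pcs.base = l.ps.base + l.ps.size;
    l.pcs.size = align_up(size / 4);
    /* The user stack grows downward from the aligned top and gets
     * whatever is left above the end of the PCS area. */
    l.us.base  = align_down(sp);
    l.us.size  = align_down(l.us.base - (l.pcs.base + l.pcs.size));
    return l;
}
```

For a 64 KB area ending at 0x110000, this gives PS at 0x100000 (0x4000 bytes), PCS at 0x104000 (0x4000 bytes) and a user stack of 0x8000 bytes growing down from 0x110000. If the area is not aligned, the user stack shrinks: with the top at 0x110800 it is only 0x7000 bytes.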

This article was also supposed to cover interrupts, exceptions and timers, but since it turned out so long, we decided to talk about them in the next part.

Just in case, I repeat: this material is not official documentation! For official support, documentation, and everything else, you need to contact MCST directly. The Embox code is, of course, open, but to build it you will need a cross-compiler, which, again, can be obtained from MCST.
