Developing your own core for embedding in an FPGA-based processor system



    In the first article of the series I argued that equipment implemented in the FPGA of the Redd complex is best controlled by a processor system, and the first two articles showed how to build such a system. That system now exists, and we can even add ready-made cores from the library to it, but the ultimate goal is to control our own custom cores. The time has come to look at how to include an arbitrary core in the processor system.

    All articles in the series:
    Development of the simplest “firmware” for FPGAs installed in Redd, and debugging using the memory test as an example
    Development of the simplest “firmware” for FPGAs installed in Redd. Part 2. Program code

    To follow today's theory, find and download the Avalon Interface Specifications document, since Avalon is the base bus of a Nios II system. I will refer to sections, tables and figures of the revision dated September 26, 2018.

    Open section 3, devoted to Memory Mapped Interfaces, or more precisely section 3.2. Table 9 lists the bus signals. Note that all of these signals are optional: I did not find a single one with “Yes” in the Required column. We are free not to route any particular signal to our device, so in the simplest case the bus is extremely easy to implement. The beginning of the table looks like this:



    As you can see, all the signals are well described (albeit in English). The timing diagrams for various cases follow below. The very simplest case raises no questions. I will take the timing diagram from the document and cover some of the lines with a translucent fill (they are all optional, so we are entitled to leave any of them out of consideration).



    Scary? It is actually simple: we are given the address and the read strobe, and we must put the data on the readdata bus. And vice versa: we are given the address, the data on the writedata bus and the write strobe, and we must latch the data. Nothing frightening, just a typical synchronous bus.

    The covered byteenable lines are needed when memory is accessed in units other than 32-bit words. This is extremely important when designing universal cores. But when we design a throwaway core, we simply state in its documentation (I am against keeping such notes only in one's head, though some may stop at that) that it must be accessed with 32-bit words, and that is it. As for the response signal, it is very specialized and does not concern us at all.

    Sometimes it is important to be able to stall the bus for several clock cycles when the hardware is not ready. In that case the waitrequest signal should be added. The timing diagram changes as follows:



    While waitrequest is asserted, the master knows that our device is busy. Be careful: if this signal is never deasserted, the whole system will hang on the first access to the device, and only reloading the FPGA will recover it. JTAG hangs along with the system. The last time I observed this was while preparing this article, so the memories are still vivid.

    Further on, the vendor's document covers higher-performance cases with pipelined and burst transfers, but the goal of this article is not to cover every option. It is to show the reader the way of working and to stress that none of this is scary, so we will restrict ourselves to these two simple variants.

    Let's design a simple device that periodically becomes unavailable on the bus. The first thing that comes to mind is a serial interface: while a transmission is in progress, we will make the system wait. In real designs I strongly advise against this, because the processor stalls until the busy transaction ends, but it is ideal for an article, since the implementation is easy to follow and not bulky. So we will build a serial transmitter that can send data, plus chip select signals for two devices.



    Let's start with the simplest bus variant: a parallel output port that drives the chip select signals.



    For this I will take the project from the previous article but, to avoid confusion, place it in the AVALON_DEMO directory. I will not rename the other files. In this directory, create a my_cores directory (the name can be anything). We will keep our cores in it, although today there will be only one. Create a file CrazySerial.sv with the following contents:
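    module CrazySerial
    (
       input clk,                // bus clock
       input reset,              // asynchronous reset, active high
       input [1:0] address,      // register number (addressed in 32-bit words)
       input write,              // write strobe
       input [31:0] writedata,   // data from the master
       output reg [1:0] cs       // chip select lines routed out of the FPGA
    );
    always @(posedge clk, posedge reset)
    begin
       if (reset == 1)
       begin
          cs <= 0;
       end else
       begin
          if (write)
             case (address)
                2'h00: cs <= writedata [1:0];   // register 0: chip selects
                default:;
             endcase
       end
    end
    endmodule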
    

    Let's go through it. First, the interface lines. clk and reset are the clock and reset lines. The names address, write and writedata are taken from the signal table in the Memory Mapped Interfaces section of the document.





    In fact, I could have chosen any names, since logical lines are bound to physical ones later. But if the names match the table, the development environment connects them by itself, so it is better to take the names from the table.

    And cs are the chip select lines that will leave the FPGA.

    The implementation itself is trivial. On reset the outputs are zeroed. Otherwise, on every clock cycle we check the write signal, and if it is asserted and the address is zero, we latch the data. Of course, a decoder could be added here to prevent two devices from being selected at once, but what is good in practice would overload the article. The article shows only the essential steps, with a note that a real design can be more elaborate.

    Excellent. We are ready to bring this code into the processor system. Open Platform Designer and select the system built in our previous experiments as the input file:



    Pay attention to the New Component item in the upper left corner:



    To add your own component, click this item. In the dialog that opens, fill in the fields. For the article, it is enough to fill in the component name:



    Now go to the Files tab and click Add File:



    Add the previously created file, select it in the list and click Analyze Synthesis Files:



    SystemVerilog parsing produced no errors, but there are several conceptual ones, caused by the development environment connecting some lines incorrectly. Go to the Signals & Interfaces tab and look here:



    The cs lines were wrongly assigned to the avalon_slave_0 interface as the readdata signal. All the other lines, however, were recognized correctly, because we gave them names from the document's table. What to do with the problem lines? They should be assigned to an interface of type conduit. To do this, click the “add interface” item.



    In the drop-down menu, select conduit :



    We get a new interface:



    It can be renamed if desired, and renaming certainly becomes necessary when there are several external interfaces. For the article we keep the name conduit_end. Now grab the cs line with the mouse and drag it into this interface. The signal has to be dropped right under the conduit_end line, only then is the drop allowed. Elsewhere the cursor turns into a crossed-out circle. The result should look like this:



    Change the signal type from readdata to, say, chipselect. The final picture:



    But errors remain: the Avalon interface has no reset signal assigned. Select avalon_slave_0 in the list and look at its properties.



    Replace none with reset. While we are here, let's look at the other properties of the interface.



    We can see that addressing is in words. A number of other things from the documentation are configured here as well. The timing diagrams that result from these settings are drawn at the very bottom of the properties:



    Now there are no more errors and we can click Finish. The module we created appears in the device tree:



    Add it to the processor system and connect the clock and reset signals. Connect the data bus to the processor's Data Master. Double-click conduit_end and give the external signal a name, say, lines. The result looks something like this:



    Remember that, since we added a block to the system, we must make sure it does not conflict with anything in the address space. In this case the figure shows no conflicts, but I will still choose the menu item System -> Assign Base Addresses.

    That's all. The block is created, configured and added to the system. Click the Generate HDL button, then Finish.

    Run a preliminary compilation of the project, then go to Pin Planner and assign the pins. Mine came out like this:



    This corresponds to pins B22 and C22 of the interface connector.

    Run the final compilation and load the processor system into the FPGA. Now the program code needs to be updated. Launch Eclipse.

    Recall that I am now working with a project located in a different directory than in my previous work with Redd. To avoid confusion, I will delete the old projects from the tree (only from the tree, without erasing the files themselves).



    Next, right-click in the empty tree and choose Import from the menu:



    Next, General -> Existing Projects into Workspace:



    And just select the directory where the project files are stored:





    Both projects inherited from the earlier experiments are now attached to the development environment.



    I will highlight the next point in a frame:
    Each time the hardware configuration changes, select the Nios II -> Generate BSP menu item for the BSP project again.




    After this operation, a new block appears in the file \AVALON_DEMO\software\SDRAMtest_bsp\system.h:
    /*
     * CrazySerial_0 configuration
     *
     */
    #define ALT_MODULE_CLASS_CrazySerial_0 CrazySerial
    #define CRAZYSERIAL_0_BASE 0x4011020
    #define CRAZYSERIAL_0_IRQ -1
    #define CRAZYSERIAL_0_IRQ_INTERRUPT_CONTROLLER_ID -1
    #define CRAZYSERIAL_0_NAME "/dev/CrazySerial_0"
    #define CRAZYSERIAL_0_SPAN 16
    #define CRAZYSERIAL_0_TYPE "CrazySerial"
    

    First of all, we are interested in the constant CRAZYSERIAL_0_BASE .

    Add the following code to the main() function:
    		while (true)
    		{
    			IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE,0x00);
    			IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE,0x01);
    			IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE,0x02);
    			IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE,0x03);
    		}
    

    Start debugging and look at the lines with an oscilloscope. There should be an incrementing binary code, and there it is.



    Moreover, the port access rate is simply wonderful:



    About 25 MHz, half the bus frequency (2 clock cycles per access). Sometimes an access takes longer than 2 cycles, which is due to branch instructions in the program. Overall, the simplest bus access works.

    Now let's add the serial port functionality for the example. To do this, add the bus-related interface signal waitrequest and a pair of serial port signals, sck and sdo. Altogether we get the following SystemVerilog fragment:



    The same as text:
    module CrazySerial
    (
       input clk,
       input reset,
       input [1:0] address,
       input write,
       input [31:0] writedata,
       output waitrequest,
       output reg [1:0] cs,
       output reg sck,
       output sdo
    );
    


    Good practice calls for a simple state machine that transmits the data. Unfortunately, even the simplest state machine would look complicated in an article. But in fact, unless I extend its functionality (and I am not going to within this article), it has only two states: transmission in progress and transmission not in progress. So I can encode the state with a single signal:
    reg sending = 0;

    During transmission I need a bit counter, a clock divider (I am deliberately making a slow device) and a shift register for the data being transmitted. Add the corresponding registers:
       reg [2:0] bit_cnt = 0;
       reg [3:0] clk_div = 0;
       reg [7:0] shifter = 0;
    

    I will divide the frequency by 10 (on the principle of “why not?”). Accordingly, on the fifth cycle I raise SCK, and on the tenth I drop it and move on to the next data bit. On all other cycles I simply increment the divider counter. It is important not to forget that the counter must also be incremented when it equals 4 (the fifth cycle) and zeroed when it equals 9 (the tenth cycle). Leaving out the transition to the next bit, this logic looks like this:
          if (sending)
          begin
             case (clk_div)
                4: begin 
                   sck <= 1;
                   clk_div <= clk_div + 1;
                end
                9: begin
                   sck <= 0;
                   clk_div <= 0;
    //             <go to the next bit>
                end
                default: clk_div <= clk_div + 1;
             endcase
          end else
    

    Moving to the next bit is easy. We shift the shift register; then, if the current bit is the seventh, we stop by switching the state of the machine, otherwise we increment the bit counter.
                   shifter <= {shifter[6:0],1'b0};
                   if (bit_cnt == 7)
                   begin
                      sending <= 0;
                   end else
                   begin
                      bit_cnt <= bit_cnt + 1;
                   end
    

    That is all. The output bit is always taken from the most significant bit of the shift register:
       assign sdo = shifter [7];
    

    And now the most important line of this revision. The waitrequest signal is asserted whenever serial data is being transmitted. That is, it is a copy of the sending signal that holds the state of the machine:
       assign waitrequest = sending;
    

    Finally, on a write to address 1 (recall that addressing here is in 32-bit words), we latch the data into the shift register, zero the counters and start the transfer:
          if (write)
    //...
             2'h01: begin
                bit_cnt <= 0;
                clk_div <= 0;
                sending <= 1;
                shifter <= writedata [7:0];
             end
             default:;
             endcase
          end
    

    Here are all the fragments described above as a single text:
    module CrazySerial
    (
       input clk,
       input reset,
       input [1:0] address,
       input write,
       input [31:0] writedata,
       output waitrequest,
       output reg [1:0] cs,
       output reg sck,
       output sdo
    );
       reg sending = 0;          // state: 1 while a byte is being shifted out
       reg [2:0] bit_cnt = 0;    // number of the bit being transmitted
       reg [3:0] clk_div = 0;    // divide-by-10 counter
       reg [7:0] shifter = 0;    // shift register, MSB goes out first
    always @(posedge clk, posedge reset)
    begin
       if (reset == 1)
       begin
          cs <= 0;
          sck <= 0;
          sending <= 0;
       end else
       begin
          if (sending)
          begin
             case (clk_div)
                4: begin                 // fifth cycle: raise SCK
                   sck <= 1;
                   clk_div <= clk_div + 1;
                end
                9: begin                 // tenth cycle: drop SCK, next bit
                   clk_div <= 0;
                   shifter <= {shifter[6:0],1'b0};
                   sck <= 0;
                   if (bit_cnt == 7)
                   begin
                      sending <= 0;      // last bit done
                   end else
                   begin
                      bit_cnt <= bit_cnt + 1;
                   end
                end
                default: clk_div <= clk_div + 1;
             endcase
          end else
          if (write)
             case (address)
                2'h00: cs <= writedata [1:0];   // register 0: chip selects
                2'h01: begin                    // register 1: start transmission
                   bit_cnt <= 0;
                   clk_div <= 0;
                   sending <= 1;
                   shifter <= writedata [7:0];
                end
                default:;
             endcase
       end
    end
       assign sdo = shifter [7];
       assign waitrequest = sending;
    endmodule
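
    To see the timing this module produces without an oscilloscope, the always block can be mirrored in a small C model. This is only a sketch meant to run on a desktop machine: the names crazy_serial_model, model_result, model_tick and model_send_byte are made up for the illustration, and the model ignores reset and the cs register. It steps the state machine one clock edge at a time, samples sdo on each rising edge of sck, and counts the clock edges at which waitrequest is asserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the registers in CrazySerial (illustrative model, not HDL). */
typedef struct {
    bool    sending;   /* also drives waitrequest */
    bool    sck;
    uint8_t bit_cnt;
    uint8_t clk_div;
    uint8_t shifter;   /* sdo is bit 7 of this register */
} crazy_serial_model;

typedef struct {
    unsigned busy_cycles;  /* clock edges seen with waitrequest high */
    uint8_t  received;     /* bits sampled on rising edges of sck */
} model_result;

/* One rising clock edge, same branch order as the always block. */
void model_tick(crazy_serial_model *m, bool write, unsigned address,
                uint32_t writedata)
{
    assert(m->clk_div <= 9);  /* the divider never runs past 9 */
    if (m->sending) {
        if (m->clk_div == 4) {
            m->sck = true;
            m->clk_div++;
        } else if (m->clk_div == 9) {
            m->clk_div = 0;
            m->sck = false;
            m->shifter = (uint8_t)(m->shifter << 1);
            if (m->bit_cnt == 7)
                m->sending = false;
            else
                m->bit_cnt++;
        } else {
            m->clk_div++;
        }
    } else if (write && address == 1) {
        m->bit_cnt = 0;
        m->clk_div = 0;
        m->sending = true;
        m->shifter = (uint8_t)writedata;
    }
}

/* Write one byte to register 1 and clock the model until it is idle. */
model_result model_send_byte(uint8_t value)
{
    crazy_serial_model m = { false, false, 0, 0, 0 };
    model_result r = { 0, 0 };

    model_tick(&m, true, 1, value);      /* write accepted while idle */
    while (m.sending) {
        bool    sck_before = m.sck;
        uint8_t sdo = (uint8_t)((m.shifter >> 7) & 1u);
        model_tick(&m, false, 0, 0);
        if (!sck_before && m.sck)        /* receiver samples here */
            r.received = (uint8_t)((r.received << 1) | sdo);
        r.busy_cycles++;
    }
    return r;
}
```

    For any byte, model_send_byte reports 80 busy cycles (10 clocks per bit times 8 bits) and returns the same byte in received, most significant bit first.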
    


    Now we bring the new code into the system. The path is the same as when creating the component, but some of the steps can be skipped, so this time we just get acquainted with the modification process. Go to Platform Designer. Had we changed only the Verilog code, it would be enough to run Generate HDL for the finished system. But since the module has new lines (that is, its interface has changed), the component must be reworked. To do this, select it in the tree, right-click and choose Edit.



    We are editing an existing component, so just go to the Files tab and click Analyze Synthesis Files:



    As expected, errors appear. But we already know that wrongly assigned lines are to blame. So go to the Signals & Interfaces tab and drag sck and sdo, in the same way as before, from the avalon_slave_0 interface to the conduit_end interface:



    We also change the Signal Type fields for them. The result should be the following:



    That is all. Click Finish, run Generate HDL for the processor system, run a preliminary compilation of the project in Quartus and assign the new pins:



    These are pins A21 and A22 of the interface connector. Run the final compilation and load the “firmware” into the FPGA.

    The hardware is updated. Now the program. Switch to Eclipse. What must we remember to do there? That's right: do not forget to select Generate BSP.

    That is all that is needed there. It remains to add functionality to the program. Let's send a pair of bytes to the serial port, the first byte to the device selected by line cs[0] and the second to the one selected by cs[1].
    		IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE,0x01);
    		IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE+4,0x12);
    		IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE,0x02);
    		IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE+4,0x34);
    		IOWR_ALTERA_AVALON_PIO_DATA (CRAZYSERIAL_0_BASE,0x00);
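
    The +4 in these calls is the byte offset of register 1: the component is addressed in 32-bit words, so word address n on the Avalon side is byte offset 4*n from the base as seen by the processor. A minimal sketch of that mapping follows. The names CRAZYSERIAL_REG_CS, CRAZYSERIAL_REG_DATA, CRAZYSERIAL_REG_COUNT and crazyserial_reg_addr are hypothetical, and the base value is simply the one from the system.h listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Value copied from the generated system.h shown earlier. */
#ifndef CRAZYSERIAL_0_BASE
#define CRAZYSERIAL_0_BASE 0x4011020u
#endif

/* Hypothetical register names for the word addresses the core decodes. */
enum {
    CRAZYSERIAL_REG_CS    = 0,  /* address 2'h00: chip select lines */
    CRAZYSERIAL_REG_DATA  = 1,  /* address 2'h01: byte to transmit  */
    CRAZYSERIAL_REG_COUNT = 4   /* 2-bit address, SPAN 16 = 4 words */
};

/* Byte address the processor must use for a word-addressed register. */
uint32_t crazyserial_reg_addr(uint32_t base, unsigned word_index)
{
    assert(word_index < CRAZYSERIAL_REG_COUNT);
    return base + 4u * word_index;
}
```

    On the target, the HAL macro IOWR(base, regnum, data) from io.h performs the same word-to-byte scaling, so IOWR(CRAZYSERIAL_0_BASE, 1, 0x12) should address the same register as the +4 form above.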
    

    Note that there are no readiness checks here. The transfers simply follow one another, and the bus itself holds each following write while the core keeps waitrequest asserted. As a result, everything comes out in proper sequence on the oscilloscope.



    Yellow trace - cs[0], green - sdo, violet - sck, blue - cs[1]. You can see that code 0x12 went to the first device and 0x34 to the second.

    Reading is done similarly, but I cannot come up with a nice example other than the banal reading of a connector pin's state. That example is so degenerate that it is not even interesting to do. It is worth noting, though, that for reading this bus setting can be extremely important:



    If the interface has a read line, the read timing diagram appears in the settings dialog and shows the effect of this parameter. When reading connector pins it makes no noticeable difference, but when reading from a FIFO or RAM it matters a lot. RAM can be configured to output data immediately after the address is applied, or to output it synchronously. In the second case latency is added: the bus has set the address and the strobe, but there is no data yet at the nearest clock edge. It appears only after that edge. That is, the system has one clock cycle of latency, and it must be accounted for by setting this parameter. In short, if you read something other than what you expected, first check whether the latency needs to be configured. Otherwise, reading is no different from writing.
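
    The effect of the read latency setting can be illustrated with a toy C model of a synchronous RAM. This is a sketch under simplifying assumptions, not Avalon-accurate code: sync_ram, sync_ram_tick and master_read are invented names, and the model only captures the fact that a registered memory output changes on the clock edge after the address is applied.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 4u

/* Synchronous RAM: the output register q is updated only on a clock edge. */
typedef struct {
    uint32_t mem[RAM_WORDS];
    uint32_t q;            /* what the slave drives on readdata */
} sync_ram;

/* One rising clock edge with the given address applied. */
void sync_ram_tick(sync_ram *ram, unsigned addr)
{
    assert(addr < RAM_WORDS);
    ram->q = ram->mem[addr];
}

/*
 * The master applies addr and captures readdata `latency` clock edges
 * after the edge on which the read is accepted. With latency 0 it takes
 * whatever q held before that edge, which is stale for a synchronous RAM.
 */
uint32_t master_read(sync_ram *ram, unsigned addr, unsigned latency)
{
    uint32_t captured = ram->q;      /* value visible at the accepting edge */
    sync_ram_tick(ram, addr);        /* RAM registers the addressed word */
    for (unsigned i = 0; i < latency; i++) {
        captured = ram->q;           /* value visible one edge later */
        sync_ram_tick(ram, addr);
    }
    return captured;
}
```

    With mem = {10, 20, 30, 40} and q initially 0, master_read(&ram, 2, 0) returns the stale 0, while a read with latency 1 returns the addressed word.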

    Let me remind you once more that it is better not to hold the bus in the not-ready state for long operations, otherwise system performance can drop drastically. The wait signal is good for holding a transaction for a couple of clock cycles, not for up to 80 cycles as in my example. But first, any other example would be unwieldy for the article, and second, for throwaway cores this is quite acceptable: you are fully aware of what you are doing and will avoid situations where the bus is blocked. True, if the core outlives the time allotted to it, such a shortcut can spoil life later, when everyone has forgotten about it and it slows everything down. But that will be later.

    Nevertheless, we have learned to make the processor core control our own cores. Everything is clear with the addressed world, so now it is time to deal with the streaming world. We will do that in the next article, and possibly in several.
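
    The cost of such a stall is easy to estimate. The sketch below uses hypothetical helper names and assumes a 50 MHz bus clock, a figure inferred from the earlier remark that about 25 MHz is half the bus frequency, so treat it as an assumption rather than a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles the bus is held: one divider period per transmitted bit. */
uint32_t stall_cycles(uint32_t clocks_per_bit, uint32_t bits)
{
    return clocks_per_bit * bits;
}

/* Convert a cycle count to nanoseconds for a clock given in hertz. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t clk_hz)
{
    assert(clk_hz > 0);
    return (uint64_t)cycles * 1000000000ull / clk_hz;
}
```

    With the divider of 10 and 8 bits this gives 80 cycles, or 1600 ns at the assumed 50 MHz, during which any further access to this core holds the processor.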

    Conclusion


    The article shows how an arbitrary Verilog core can be connected so that it is controlled by the Nios II processor system. It covers the simplest connection to the Avalon bus as well as a connection in which the bus can be held in a busy state. References are given to the documentation that describes the other operating modes of the Avalon bus in Memory Mapped mode.

    The resulting project can be downloaded here .
