Experience using the FPGA board DE10-Standard and DMA PL330



    I got a Terasic DE10-Standard board at my disposal. It has a lot of interesting things on it: a built-in JTAG programmer, LEDs, switches, buttons, and Audio / VGA / USB / Ethernet connectors. There is no particular need to list all its features, because anyone can read the board specification on the manufacturer’s website.

    What matters to me is that the board carries a Cyclone V SX FPGA, part number 5CSXFC6D6F31C6N. This chip contains two ARM Cortex-A9 processors and 110K FPGA logic elements. This is a real SoC with an HPS: System-on-Chip, Hard Processor System. With such resources you can attempt fairly complex projects. Below I describe my experience with the board.

    It is very simple to download a Linux image from the Terasic website and prepare a bootable SD card for the DE10-Standard board. Linux boots from the SD card, the board comes to life and demonstrates its capabilities.



    Once Linux has booted, you can find the icon of a test application on the desktop. This GUI test application lets you turn individual board LEDs on and off, watch the state of the switches and buttons, set values on the 7-segment indicators, and so on. There are many interesting things. After playing around with this program, you get the feeling that you can now quickly build and launch your own project.

    I do not consider myself a novice FPGA developer. I have done projects with soft processors and have an idea of what the Linux kernel is. I am myself involved in designing development boards with Altera / Intel FPGAs. However, to be honest, this is my first experience with an HPS-FPGA SoC. When I started making my own project for this board, specifically for this FPGA, I realized that it would not be easy at all.



    Of course, the DE10-Standard itself is not to blame. There is a certain illusion of easy development: here is the SD card image from Terasic, here are the sources of an example FPGA project and the sources of the test programs. It would seem that you just take them, adapt them to your needs, and everything will work. But no.

    You need to understand that the first impression of easy development is deceptive. What you see at first glance is just the tip of the iceberg.

    I had to read and study a lot, for example:

    1. The DE10-Standard_v.1.2.0_SystemCD disk that comes with the board contains 38 PDF files of various documentation, schematics and manuals.

    2. Intel’s “Cyclone V Hard Processor System Technical Reference Manual” is a technical description that alone runs to 3,536 pages.

    Of course, it is not necessary to read all of them at once. I thought I would not read them at all and would get by on my knowledge and experience, but I still had to read and understand them.

    There is also a good resource, RocketBoards, which contains even more documentation, source code examples and even a forum. On the one hand this makes life easier, and on the other hand it makes it harder, because you have to read and absorb even more information. Unfortunately, even on the RocketBoards forum it is not always possible to find answers to your questions.

    Thus, the development of the project is quite complicated, because the subject matter of the SoC HPS is complex.



    Imagine a clockwork with many gears. Each gear has to mesh exactly with the next one, otherwise nothing will spin. The same is true for an HPS-FPGA system. The system consists of many software and hardware components: the Preloader, U-BOOT, the Linux kernel and drivers, and a DTB file generated from a DTS file. Then you still need to create a RootFS and, of course, develop the hardware system in your FPGA: an FPGA SoC project will contain several IP blocks, memory-mapped hardware registers, clock frequencies and clock domains, I/O signals and so on and so forth.

    Suppose I know how to create an FPGA project for my SoC, and I think it should work correctly with a probability of about 80%, since I do not see obvious errors in the project. I also think that I roughly know how to write a DTS file that describes my hardware platform. Suppose I am 80% sure that I wrote the DTS file correctly. A DTB file is generated from the DTS file. Then I have to write a kernel driver for my FPGA hardware. It is not easy, but I can write drivers. I hope I did not make many mistakes there, so let my driver be 80% correct as well. But what about the Preloader? The Preloader is the first program read and launched from the SD card, and it must program the necessary hardware configuration registers of the system on a chip. Did I build the Preloader correctly? Well, let's say I am 80% sure. Now, if you think about it, what is the likelihood that my system will work? I think somewhere around this: 0.8 * 0.8 * 0.8 * 0.8 = 0.4096. The more components in the system, the worse. If something does not work, or nothing works at all (for example, a kernel panic), then it is rather difficult to understand where the problem is: it can be anywhere.
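
    The estimate above can be sketched in a few lines of C. This is only an illustration under the assumption that the components fail independently of each other; the function name system_success_probability is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Probability that every component works, assuming the components
 * fail independently of each other (a simplification for this estimate). */
double system_success_probability(const double *p, size_t count)
{
    assert(p != NULL || count == 0);

    double total = 1.0;
    for (size_t i = 0; i < count; i++)
        total *= p[i];   /* each extra component can only lower the result */
    return total;
}
```

    With four components at 0.8 each (FPGA project, DTS file, driver, Preloader) the function returns roughly 0.41, and every additional component multiplies the result by another factor below one.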

    The purpose of my work was to make an HPS-FPGA project that uses DMA transactions to transfer data from system memory to the FPGA and back from the FPGA to system memory. Using DMA should offload the processor. There was already an article on Habr about implementing DMA in Cyclone V FPGAs. However, I did not want to go down the path of creating my own controller, as Des333 did. I wanted to use the PL330 controller that is already in the system.

    Working with the DE10-Standard board for some time, I gained invaluable experience. If you will allow me, I want to give some advice to those who decide to start developing for a SoC HPS in FPGAs.

    1. Prepare the board for development


    This is probably advice from the obvious category. There is an SD card image that contains the files needed to start the system: the FPGA image, the DTB file, U-BOOT and the Linux kernel zImage. Additional partitions contain the Preloader and the RootFS. When I develop a SoC HPS project for the FPGA, I compile it in Quartus Prime, and the result (RBF, Raw Binary File) has to be written to the SD card. Then I compile the Linux kernel with my driver as part of the kernel. The resulting files also have to be written to the SD card.

    It makes no sense to remove the SD card from the board and insert it into the card reader of a computer or laptop every time you need to write files to it. This takes too long. In addition, frequent plugging and unplugging can damage the SD card slot on the board or in the laptop. I recommend configuring U-BOOT so that the necessary files are downloaded over the network from a TFTP server.

    The board has a UART-to-USB connector for connecting it to the developer's computer with a USB cable. I open a terminal program, for example PuTTY, and turn on the board. You can immediately see messages from U-BOOT scrolling in the terminal. You can interrupt the boot by pressing any key in the terminal right away.

    I have added several variables to the U-BOOT environment:

    ethaddr=fe:cd:12:34:56:67
    ipaddr=10.8.0.97
    serverip=10.8.0.36
    xfpga=tftpboot 100 socfpga.rbf; fpga load 0 100 $filesize; run bridge_enable_handoff; tftpboot 100 socfpga.dtb
    xload=run xfpga; tftpboot 8000 zImage; bootz 8000 - 100

    On the development computer, which has the IP address 10.8.0.36, I installed a TFTP server. In the /tftpboot folder I store socfpga.rbf (Raw Binary File), the result of compiling the SoC FPGA project in Quartus Prime. In the same folder I also store socfpga.dtb, the corresponding Device Tree Blob, and the Linux kernel zImage file.

    Now, when I power on the board, I immediately interrupt the normal boot by pressing any key in the terminal and enter the command:
    >run xload

    With this command, U-BOOT downloads the necessary files from the TFTP server, configures the FPGA with the most recently compiled image of the project and boots my latest zImage. Quick and easy. When I make a change in the FPGA project, I compile the project with Quartus and copy the result to the /tftpboot folder. In the same way, I compile the Linux kernel and copy the result to the /tftpboot folder. I reboot the board, do “run xload”, and I can start debugging the new system.

    2. Try to find an open-source example SoC HPS project that is as similar as possible to what you are going to do.


    Captain Obvious again. Of course, a competent engineer can do everything from scratch. However, you can save some time if you find the sources of a project similar to the one you are going to make.

    Out of the box, DE10-Standard_v.1.2.0_SystemCD contains two sample projects for the HPS-FPGA. The first project is DE10_Standard_GHRD, which offers minimal features: the Linux console and simple memory-mapped peripherals such as input / output ports for LEDs, buttons and switches. The second example, DE10_Standard_FB, is more complicated. Here the FPGA already implements a framebuffer, a video controller, a device for capturing and decoding a video signal and a number of other features. This allows you to run a full desktop Linux. If these examples are enough for you, then everything is fine: take them and use them.

    Personally, I wanted to find an example that uses a DMA controller, since I wanted to offload the CPU while transferring data from system memory to the FPGA and back from the FPGA to system memory. I looked for such an example and found it on the RocketBoards website.

    The example is actually not very good, but it is at least something to start from. The Cyclone V HPS has a built-in PL330 DMA controller, and I wanted to try using it. I took the IP core from the Loopback_FIFO sample project and, using Qsys in Quartus Prime, inserted it into my clone of the DE10_Standard_GHRD project. Unfortunately, I spent a lot of time writing a correct DTS file for my project, because there was no DTS file in the example archive. Also, I did not immediately realize that the Linux kernel already has an example DMA driver in arch/arm/mach-socfpga/fpga-dma.c. I realized this too late, when I had almost written my own driver.

    Despite these difficulties, I still advise starting development with a search for existing examples, projects and solutions. Find a few examples, choose the one that suits you best, and start development from it.

    3. Use Linux kernel sources as documentation


    With the Cyclone V HPS FPGA, I am developing a new hardware platform. It is very likely that I will have to write my own driver for the new hardware in the FPGA. There are many articles on the Internet about writing Linux kernel drivers. But keep in mind that many of these articles became outdated long ago, contain incorrect examples and call old kernel APIs.

    If you select a specific version of the Linux kernel for the project, then all the information on how to write drivers can be taken directly from the sources of that kernel version, and it will be the most relevant information. There are driver examples in the ./drivers folder, valid documentation in the ./Documentation folder, and examples of *.dts files in the ./arch/arm/boot/dts folder.

    In my case, for my project with a DMA controller, the documentation on writing DMA drivers came from the ./Documentation/dmaengine/* files.

    Kernel sources can also help with writing a DTS file, which for me was a very big problem. The DTS file describes the hardware resources of the system in text form. The DTS is then compiled into a DTB file, which the kernel uses so that the drivers know which resources belong to which devices.

    As I understand it, theoretically, development should go like this:

    1. We develop a hardware system in Quartus Prime Qsys: configure the HPS parameters, add components and IP cores to the system, and connect the components. We generate the system with Qsys and get the soc_system.qsys and soc_system.sopcinfo files as a result.
    2. Create a DTS file from the *.sopcinfo file using the command line:
      >sopc2dts --input soc_system.sopcinfo --output socfpga.dts --board soc_system_board_info.xml --board hps_clock_info.xml
    3. Create a DTB from a DTS file:
      >dtc -I dts -O dtb -o socfpga.dtb socfpga.dts

    I read these instructions on the RocketBoards pages, but this method somehow does not work very well (in fact, it does not work at all). For myself, I concluded that the only working method is to manually edit an existing example DTS file, adapting it to your hardware project.

    As I already wrote, the kernel sources can help in writing a DTS file. I did not understand this right away, but once I did, things went faster. You need to use the kernel sources as documentation!

    Let’s look at an example of a DMA driver from ./driver/dma/fpga-dma.c

    The driver calls the platform_driver_probe() API function and passes it a pointer to the structure as an argument:

    #ifdef CONFIG_OF
    static const struct of_device_id fpga_dma_of_match[] = {
    	{.compatible = "altr,fpga-dma",},
    	{},
    };
    MODULE_DEVICE_TABLE(of, fpga_dma_of_match);
    #endif

    static struct platform_driver fpga_dma_driver = {
    	.probe = fpga_dma_probe,
    	.remove = fpga_dma_remove,
    	.driver = {
    		   .name = "fpga_dma",
    		   .owner = THIS_MODULE,
    		   .of_match_table = of_match_ptr(fpga_dma_of_match),
    		   },
    };

    static int __init fpga_dma_init(void)
    {
    	return platform_driver_probe(&fpga_dma_driver, fpga_dma_probe);
    }

    This means that the DTS file should contain a corresponding section with a matching compatible device name:

    fpga_dma: fpga_dma@0x10033000 {
    	compatible = "altr,fpga-dma";

    That is, the platform_driver_probe function will apparently scan the DTB file and look for a device named fpga-dma from the manufacturer altr.

    If the driver calls the functions

    csr_reg  = platform_get_resource_byname(pdev, IORESOURCE_MEM, "csr");
    data_reg = platform_get_resource_byname(pdev, IORESOURCE_MEM, "data");

    this means that the DTS file must contain named registers with exactly the same names “csr” and “data”. Otherwise, the driver will not be able to start.

    Similarly, the kernel driver can request DMA channels by name:

    static int fpga_dma_dma_init(struct fpga_dma_pdata *pdata)
    {
    	struct platform_device *pdev = pdata->pdev;

    	pdata->txchan = dma_request_slave_channel(&pdev->dev, "tx");
    	if (pdata->txchan)
    		dev_dbg(&pdev->dev, "TX channel %s %d selected\n",
    			dma_chan_name(pdata->txchan), pdata->txchan->chan_id);
    	else
    		dev_err(&pdev->dev, "could not get TX dma channel\n");

    	pdata->rxchan = dma_request_slave_channel(&pdev->dev, "rx");
    	if (pdata->rxchan)
    		/* ... the excerpt ends here ... */

    And here is the corresponding fragment of the DTS file, which shows how the kernel driver sources and the DTS file relate to each other:

    fpga_dma: fpga_dma@0x10033000 {
    	compatible = "altr,fpga-dma";
    	reg = <0x00000001 0x00033000 0x00000020>,
    		<0x00000000 0x00034000 0x00000010>;
    	reg-names = "csr", "data";
    	dmas = <&hps_0_dma 0 &hps_0_dma 1>;
    	dma-names = "rx", "tx";

    Thus, the DTS file must be written with the way the driver requests resources in mind. If named registers and named DMA channels are used, then the names must match in both the kernel sources and the DTS file. Only then can these two gears of the system, the kernel driver and the DTS / DTB, work together.
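
    The matching rule can be illustrated with a small user-space sketch. This is not kernel code: find_name_index and count_missing_names are hypothetical helpers that only mimic the idea of a by-name lookup, where a name from "reg-names" or "dma-names" selects the entry at the same position in "reg" or "dmas".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of `wanted` in a DTS-style name list, or -1 if it is absent.
 * The comparison is exact and case-sensitive. */
int find_name_index(const char *const *names, size_t count, const char *wanted)
{
    assert(names != NULL && wanted != NULL);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], wanted) == 0)
            return (int)i;
    }
    return -1;
}

/* Number of names the driver asks for that the DTS node does not provide. */
size_t count_missing_names(const char *const *dts_names, size_t dts_count,
                           const char *const *driver_names, size_t driver_count)
{
    size_t missing = 0;

    for (size_t i = 0; i < driver_count; i++) {
        if (find_name_index(dts_names, dts_count, driver_names[i]) < 0)
            missing++;
    }
    return missing;
}
```

    For the fragment above, the driver names "csr" and "data" are both found in reg-names, so nothing is missing. A single misspelled name on either side is enough for a lookup to fail, which in the real driver means the probe cannot complete.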

    4. Remember that your source may not be the most recent.


    I needed a compiler and the Linux kernel sources so that I could start developing my driver and building the kernel for my FPGA system. That is why I downloaded the latest (at that time) Intel SoC FPGA Embedded Development Suite v17.0 and installed it.

    After the full installation, I saw a new folder, ~/intelFPGA/17.0/embedded/embeddedsw/sources, where the git_clone.sh script was located. I ran this script and got the kernel sources in ~/intelFPGA/17.0/embedded/embeddedsw/sources/linux-sources.

    The Git branch turned out to be sockfpga-4.1.22-ltsi-16.1-release. Kernel version 4.1.22: well, so be it.

    I took version 4.1.22 for granted and started working with this branch of the sources. I built the kernel and found that there is a DMA driver called fpga-dma, and that this driver basically works with the LoopbackFIFO IP core in my FPGA project. However, I noticed that the performance of transferring data from system memory to the FPGA and back was very low: the data was moved in single transfers, one word every several cycles. I rechecked my FPGA project a hundred times and rechecked the fpga-dma.c driver a hundred times, but I still could not understand why there were no burst transfers on the bus. I started digging into the source code of the DMA driver for the PL330 controller itself. I also had to read the section of the Cyclone V Hard Processor System Technical Reference Manual about the HPS PL330 DMA controller. This DMA controller is very complex: it has its own instruction set, and you need to write a program for it. An assembly language program for the PL330 DMA controller might look like this:

    DMAMOV CCR, SB4 SS64 DB4 DS64 
    DMAMOV SAR, 0x1000 
    DMAMOV DAR, 0x4000
    DMALP 16 
    DMALD
    DMAST
    DMALPEND
    DMAEND

    As a result of all my research, I realized that the driver ./drivers/dma/pl330.c never initializes the CCR register of the DMA controller for burst transfers. I did not know what to do about it, but later I discovered that newer versions of the kernel already contain a fix for this problem.
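
    To get a feeling for what the burst settings in CCR change, here is a small arithmetic sketch. It assumes the usual reading of the PL330 assembler mnemonics in the program above: SB4 / DB4 mean a burst length of 4 beats, and SS64 / DS64 mean 64 bits per beat. The helper names burst_bytes and loop_bytes are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved by one DMALD or DMAST: burst length (in beats) times beat width. */
uint32_t burst_bytes(uint32_t burst_len, uint32_t beat_bits)
{
    assert(beat_bits % 8 == 0);
    return burst_len * (beat_bits / 8);
}

/* Bytes moved by a DMALP loop that runs one load/store pair per iteration. */
uint32_t loop_bytes(uint32_t iterations, uint32_t burst_len, uint32_t beat_bits)
{
    return iterations * burst_bytes(burst_len, beat_bits);
}
```

    Under these assumptions, the 16-iteration loop above moves 16 * 4 * 8 = 512 bytes, while the same loop with a burst length of 1 moves only 128 bytes and needs four times as many bus transactions for the same amount of data.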

    I manually applied the patch to the sources of the DMA driver and got burst transfers! Here is a SignalTap screenshot where I capture a DMA Mem-to-Device transfer:



    Thus, if one day you run into a technical problem that you do not know how to solve, double-check whether it has already been fixed in more recent Linux kernel sources. As I understand it, the problem with burst transfers on the PL330 DMA controller was resolved in kernel 4.6.

    5. Be careful about the individual parts of the system.


    Of course, developing an FPGA SoC system requires specific knowledge. I do not want to go into the features and development methods of IP cores or Verilog / VHDL syntax here. Of course, the developer must know a lot. However, I want to draw attention to the fact that making all parts of the system work together is not a simple task. There are too many gears that must rotate in sync.

    I will try to give an example from my practice.

    I tried to make the PL330 DMA controller driver work with my IP core. I ran into the following problem: write operations to the device succeed, but read operations always end with a timeout. I tried to find a solution on the Internet and saw that many developers ask about this problem, but there is no answer. In the system log, I see a message from the fpga-dma driver: “Timeout waiting for RX DMA!”. But what is the problem? Unclear. Why is everything fine with TX transfers, but not with RX transfers? I swapped the RX and TX channels in the FPGA project and got the opposite: “Timeout waiting for TX DMA!”. What is wrong with my second DMA channel?

    I am using Quartus Prime Qsys to edit my SoC. One of the most important components of the SoC system is hps_0, “Arria V / Cyclone V Hard Processor System”. I edited the properties of this component and made sure that both DMA channels, RX and TX, are turned on:



    Is this enough? Of course not! Qsys generates the soc_system components for Quartus Prime, but it also creates software handoff components in the ./hps_isw_handoff/soc_system_hps_0 folder.

    There is a hps.xml file which shows the following:

    <hps>
    <system>
    <config name='DEVICE_FAMILY' value='Cyclone V' />
    <config name='DMA_Enable' value='Yes Yes No No No No No No' />
    <config name='dbctrl_stayosc1' value='true' />
    <config name='main_pll_m' value='73' />
    <config name='main_pll_n' value='0' />
    <config name='main_pll_c0_internal' value='1' />

    This means that later I have to generate the Preloader, and this XML file is used when compiling it. The compiled Preloader must be written to a special partition of the SD card. When the system starts, the Preloader runs first. It enables all the necessary components of the system by writing the required values to special hardware registers.

    The Cyclone V HPS Reset Manager registers are located at the physical address 0xFFD05000 (see the Cyclone V Hard Processor System Technical Reference Manual). Some bits in the Reset Manager registers must be cleared to enable DMA on individual channels.
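
    As a quick sanity check of the handoff data, the DMA_Enable value from hps.xml can be turned into a bitmask. The sketch below is only an illustration: dma_enable_mask is a hypothetical helper, and it assumes that the first Yes/No token describes channel 0, the second describes channel 1, and so on. It does not touch any hardware registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a handoff value such as "Yes Yes No No No No No No" into a
 * bitmask with one bit per DMA channel. Assumption: the first token is
 * channel 0. Tokens other than "Yes" are treated as disabled. */
uint8_t dma_enable_mask(const char *value)
{
    assert(value != NULL);

    uint8_t mask = 0;
    unsigned channel = 0;

    while (*value != '\0' && channel < 8) {
        size_t len = strcspn(value, " ");   /* length of the current token */

        if (len == 3 && strncmp(value, "Yes", 3) == 0)
            mask |= (uint8_t)(1u << channel);
        if (len > 0)
            channel++;                      /* empty tokens do not count */

        value += len;
        value += strspn(value, " ");        /* skip the separators */
    }
    return mask;
}
```

    For the value shown in the XML excerpt the result is 0x03, that is, only the first two channels are enabled. If the mask does not match what the driver expects, the Preloader was probably built from stale handoff files.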

    Fine. When I change the properties of the hps_0 component in Qsys, I now know that I probably have to recompile the Preloader and write it to the SD card.

    But this is not the whole story!

    If I use two DMA channels, then I need two interrupts for these two channels, and they still have to be declared manually in the DTS file:

    hps_0_dma: dma@0xffe01000 {
    	compatible = "arm,pl330-16.1", "arm,pl330", "arm,primecell";
    	reg = <0xffe01000 0x00001000>;
    	interrupt-parent = <&hps_0_arm_gic_0>;
    	interrupts = <0 104 4>, <0 105 4>;
    	clocks = <&l4_main_clk>;

    Why such strange numbers, 104 and 105? I had to read the Cyclone V HPS Reference Manual. There I see that the Generic Interrupt Controller has reserved IRQ 136 and 137 for the DMA request lines.

    However, for some reason, the numbering starts at 32. So I decided that 136 - 32 = 104 and 137 - 32 = 105 are the correct numbers. These magic calculations give the correct values for the interrupts property in the DTS file. Without declaring the second IRQ for the PL330 in the DTS file, the second DMA channel always got a timeout error in the kernel driver. It turns out that when I change the HPS properties in Qsys, I may need to change both the Preloader and the DTS file at the same time, and this is something you have to keep in mind all the time.
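
    The subtraction has a simple explanation: in the ARM Generic Interrupt Controller, interrupt IDs 0 to 31 are reserved for per-CPU interrupts (SGIs and PPIs), so shared peripheral interrupts (SPIs) start at ID 32, and the device tree counts SPIs from zero. A minimal sketch of the conversion, with the hypothetical helper name gic_id_to_spi:

```c
#include <assert.h>

#define GIC_SPI_BASE 32  /* GIC interrupt IDs 0-31 are SGIs and PPIs */

/* Convert a GIC interrupt ID, as listed in the reference manual, to the
 * SPI number used in the second cell of a DTS "interrupts" entry. */
int gic_id_to_spi(int gic_id)
{
    assert(gic_id >= GIC_SPI_BASE);
    return gic_id - GIC_SPI_BASE;
}
```

    In an entry such as <0 104 4>, the first cell 0 marks the interrupt as an SPI, the second cell is the number computed above, and the third cell 4 means an active-high level-triggered interrupt.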

    Conclusion


    I started from an example DMA project that I found on the RocketBoards website. I adapted it and made it work on the DE10-Standard board with the Linux 4.1 kernel.

    This is probably not a great achievement, however:

    1. I wrote a DTS file that was not in the original project. It was pretty hard.
    2. I realized that a kernel patch is needed to get burst transfers.
    3. I connected the SignalTap analyzer to the FPGA project, and now I can see the signals on the bus during a DMA transfer.
    4. I learned how to write a DMA kernel driver.
    5. I hope I now understand the whole developer road map for the Cyclone V HPS.

    If someone wants to experiment with DMA in a SoC, I recommend starting with the Altera fpga-dma driver. It uses DebugFS, which lets you use the simple “cat” and “echo” commands directly in the terminal console to perform transactions on the DMA channel:



    I hope this article will be useful to those who are just starting to work with FPGA SoC HPS Cyclone V.

    You can view the project sources here.
