PCI-Express Core in Achronix FPGA - Quick Start



    This post is meant to show FPGA designers how to start working with the PCI Express bus on the Achronix Speedster22i platform with minimal time and effort. It describes a project whose adaptation to a developer's specific requirements comes down to modifying the source of just one module, which makes it possible to connect to the host computer's PCIe bus in about an hour. I hope developers on other platforms will also find it interesting.


    The Speedster22i HD1000 FPGA has two hardware PCIe cores certified by PCI-SIG for the PCIe 3.0 specification, and on the Speedster22i HD1000 Development Kit board (which I wrote about in a previous post), one of these cores is routed to the PCIe connector. PCIe is a very convenient way for the host computer to interact with the board. In fact, it is the only high-speed option for this purpose. The alternative is the built-in COM port, which is several orders of magnitude slower. All other solutions need extra hardware of some kind; at the very least, signal level converters.
    Achronix provides a reference design that demonstrates the PCIe hardware core in full: the core works in target mode, with both direct CPU access and DMA reads and writes. I checked it, and everything works fine. But this design turned out to be quite difficult to adapt for our own purposes because of insufficient modularity and overly complex Verilog code. We therefore decided to create our own version based on the vendor design, removing everything related to DMA transfers and structuring it so that modules with fixed code are clearly separated from modules whose code has to be modified for the developer's specific tasks. The result is a simple, well-structured project that can be adapted by changing the code of just one module.
    Achronix FPGAs feature hardware IP cores for interface controllers such as PCIe, DDR3, 100/40/10G Ethernet and Interlaken. These hardware cores provide everything needed for the interfaces to function; the only thing required from the developer is to write the modules that interface with these controllers. As a result, the amount of work is dramatically reduced, and meeting timing becomes much easier. In the case of the PCIe design, only a few interface modules were needed, most of which were taken from the vendor reference design.


    Brief Project Description


    The project implements access to three 128-bit registers. The PCIe core is configured for 3 BARs: BAR0 - 64KB, BAR1 and BAR2 - 8 KB each. Access to the registers is through BAR1. The presence of 3 BARs is due to compatibility requirements with the driver used. A description of the registers is given below:

    Name  Offset in BAR1  Bits     Access  Description
    R0    0               [127:0]  RO      {4{32'hDEADBEEF}}
    R1    20h             [7:0]    RW      Output to the LED line
                          [127:8]  RW      Not used
    R2    40h             [7:0]    RO      Reading the line of switches
                          [127:8]  RW      Not used


    When reworking the reference design, the first step was to remove the code associated with DMA data transfers. After that, the target_write and target_read channels were used to connect to the core. Next, the module structure was defined, as shown in the figure:


    In total, there are 4 modules (some of them include submodules).

    Composition of modules:
    • pcie_g3x4.v - wrapper for the PCIe hardware core. It sets the core parameters, such as VendorID, the number of lanes, the local bus width, etc. This module is generated with the IP core generator of the ACE development environment.
    • pci_target_bus_ctrl.v - a wrapper module that bridges the target channel of the hardware core to the local bus, on which the registers accessible over PCIe are located. Since the target channel consists of two independent subchannels, write and read, this module combines two submodules: pci_target_bus_write_ctrl.v and pci_target_bus_read_ctrl.v, which implement write and read operations, respectively.
    • lbus_registers.v - the module containing the user registers themselves. It is the only module whose code needs to be modified for a specific project.
    • ACX_SNAPSHOT.v - an auxiliary module for in-circuit debugging. Once debugging is finished, it can be removed from the project.


    To get the functionality a developer needs in this project, only one module has to change: lbus_registers.v. All other modules are used as is, without any alteration. The lbus_registers.v module also serves as a template to which the required functionality is added. As a result, getting a working interface with several registers on the PCIe bus takes no more than an hour of editing this module.


    PCIe Core Generation


    To generate the core, you can use the IP core generator of the ACE environment. All specified parameters are saved in a file with the .axip extension, which can be edited at any time. The generator outputs source files in Verilog and VHDL. A screenshot of the core generation process is shown in the figure:



    PCIe core target interface


    The PCIe hardware core includes several interfaces, but we are interested in the target interface. Registers are connected through this interface as passive devices, while the host processor acts as the active device. The target interface consists of 4 channels: write address, write data, read address and read data. The write and read channels work independently of each other. Timing diagrams of write and read transactions follow; the same diagrams also show the local bus signals.


    Local bus


    The local bus has a very simple structure. It consists of two independent channels - writing and reading, and can be configured for different word widths. This project uses words with a width of 128 bits.
    The local bus interface implemented in the lbus_registers.v module provides writing to registers without delay and reading with a delay of 1 clock cycle. Real delays, however, are slightly larger, as the submodules included in the pci_target_bus_ctrl.v module contribute to the latency of write and read transactions.
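
    To illustrate where the one-clock read delay comes from, here is a minimal sketch of a registered read stage. It is not taken from lbus_registers.v: the module name lbus_read_stage is hypothetical, and the sketch simply assumes that a combinational read value (named c_reg_rd_data, as in the customization section) is registered before it is driven onto lbus_rd_data.

```verilog
// Illustrative sketch only, not the author's code.
// Assumption: the read mux output is registered once before leaving the module,
// which gives the one-clock read latency described in the text.
module lbus_read_stage #(
    parameter AXI_DATA_WIDTH = 128
)(
    input  wire                      clk,
    input  wire                      rst_n,
    input  wire [AXI_DATA_WIDTH-1:0] c_reg_rd_data, // combinational read mux output
    output reg  [AXI_DATA_WIDTH-1:0] lbus_rd_data   // valid one clock after the address
);
    always @(posedge clk or negedge rst_n)
        if (!rst_n) lbus_rd_data <= {AXI_DATA_WIDTH{1'b0}};
        else        lbus_rd_data <= c_reg_rd_data;
endmodule
```

    Writes need no such stage: the data is captured by the register on the same clock edge on which the write strobe is active.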


    Implementation


    The implementation of the project consists of two stages: synthesis and place and route (tracing).

    Directory structure


    The following directory organization was chosen for implementation:
    pci_simple
        | --- src
        | --- syn
        | --- tr
        | --- tools
    


    The src directory contains the Verilog source files. The syn directory contains the files needed for synthesis with Synplify, and the tr directory contains the files needed for place and route (tracing). Generated cores are also placed in this directory by default. The tools directory contains the drivers and the PciExpress program, which reads and writes the registers connected to the PCIe bus.

    Synthesis


    The syn directory contains the pcie_simple_design.prj project file. Open this file in Synplify Pro, the synthesis tool developed by Synopsys. The tool produces the pcie_simple_design.vma file in the syn/rev_1 subdirectory. This file is the input for the next step, place and route. A screenshot of the synthesis step is shown below:



    Place and route (trace)


    Place and route is carried out by Achronix's proprietary ACE tool. The tr directory contains the pci-simple.prj project file, which must be opened in ACE. At the end of this step, the pci-simple-design.jam bitstream file appears in the tr/impl_1/output subdirectory; it is downloaded directly to the FPGA. A screenshot of this step:



    Constraints


    There are only two constraint files: one describes the clocks, and the other defines the I/O pins used. The files are in the tr directory and are named pcie_simple_design.sdc and pcie_simple_design.pdc, respectively. They are already referenced by the project files of the synthesis and place-and-route tools.


    Results



    Timing


    Place-and-route results, frequency in MHz:
    Clock / Group  Target  Achieved  Meets timing
    user_clk       212.5   308.5     yes (+45.2%)
    core_clk       212.5   433.5     yes (+104.0%)
    sbus_clk       50.0    138.7     yes (+177.5%)
    Tck            10.0    175.4     yes (+1653.6%)


    We are interested in the user_clk clock group, which clocks the user registers. As you can see, for a target frequency of 212.5 MHz the achieved frequency was 308.5 MHz, i.e. 45% higher than required.

    Resource utilization


    Resource            Used
    RLBs                0.520%
      LUT4 Sites        0.410%
      DFF Sites         0.520%
      MUX2 Sites        0.010%
      ALU Sites         0.170%
    LRAM Sites          1.280%
    BRAM Sites          0.190%
    BMULT Sites         0.000%
    I/O Pad Sites       1.980%
      Data pads         1.740%
      Clock pads        12.50%
      Reset pads        0.000%



    Host Connection


    A driver is required to connect to the host computer. Under certain conditions, you can use the driver from the vendor reference design. The PciExpress.exe application works with this driver and lets you access the registers connected to the PCIe bus. To be able to use these tools, you must preserve the BAR structure of the original design and keep its VendorID and DeviceID values.

    To start working with a host computer running Windows, perform the following steps:
    • Connect the board to the computer via the PCIe bus. A PCIe x8 or wider slot is required. Make the connection with both devices switched off and observe anti-static precautions. Power the board from an external power source.
    • Turn on the power of the computer and the board. The power-up order does not matter.
    • Download the firmware to the FPGA.
    • In Device Manager, find the new device on the PCI bus and install the driver for it.
    • Reboot the computer.
    • After rebooting, the PciExpress program can write and read the registers.


    The following figure shows the result of reading the register at offset 0 in the BAR1 address space:




    Customization of the lbus_registers.v module


    To use the source code in your own projects, you need to add the registers you require to the design. All user registers are in the lbus_registers.v module, and customizing it takes the following simple steps:
    1. Write the code for each user register
    2. Set the address of each register in the parameter list
    3. Write the address decoder code for each register
    4. Connect each register to the write and read buses


    Here is how to put these steps into practice.
    • Declare the register and its width:
    reg     [AXI_DATA_WIDTH-1:0]        my_register;

    • Define the write and read strobes for this register:
    wire			selw_my_register;
    wire			selr_my_register;
    


    • Write an always block for this register. This is conveniently done with a generate statement.
    In the simplest case, the code looks like this:
    genvar i;
    generate
        for (i = 0; i < AXI_BE_WIDTH; i = i + 1)
        begin: leds_lanes
            // One always block per byte lane, gated by the matching byte enable
            always @(posedge clk or negedge rst_n)
                if (!rst_n)
                    my_register[7 + 8*i : 8*i] <= 8'h0;
                else if (selw_my_register && lbus_wr_be[i])
                    my_register[7 + 8*i : 8*i] <= lbus_wr_data[7 + 8*i : 8*i];
                // otherwise the lane keeps its previous value
        end
    endgenerate
    


    If more complex processing of individual bits is required, the always block will naturally become more complicated, and it may be easier to write the code explicitly without the generate statement.
    • Add a line to the parameter list:
    parameter	ADDR_MY_REGISTER	= 32'h1234_5678,
    where 32'h1234_5678 is replaced by the real byte offset in the required address space.
    • Write the expressions for the register select signals:
    assign selw_my_register = reg_wr_hit & (lbus_wr_addr[REG_ADDR_WIDTH-1:0] == ADDR_MY_REGISTER[REG_ADDR_WIDTH+AXI_REMAIN_WIDTH-1:AXI_REMAIN_WIDTH]);
    assign selr_my_register = reg_rd_hit & (lbus_rd_addr[REG_ADDR_WIDTH-1:0] == ADDR_MY_REGISTER[REG_ADDR_WIDTH+AXI_REMAIN_WIDTH-1:AXI_REMAIN_WIDTH]);


    • In the always_comb block
    always_comb
    	begin
    		case (1'b1)
    	                   …
    		endcase
    	end
    


    add a new branch inside the case statement:
    selr_my_register:	c_reg_rd_data = my_register;


    Repeat the above steps for each user register.
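
    The steps above can be put together in one place. The following is a minimal sketch for a single register, not the author's lbus_registers.v. The module name my_register_sketch is hypothetical, and three things are assumptions rather than facts from the project: that AXI_REMAIN_WIDTH equals log2 of the number of bytes per bus word, that reg_wr_hit and reg_rd_hit compare the region inputs with BAR_NMB, and the default value returned when no register is selected. The output register stage that adds the read latency is left out.

```verilog
// Sketch under stated assumptions; see the lead-in for what is hypothetical.
module my_register_sketch #(
    parameter        BAR_NMB          = 3'd1,
    parameter        AXI_DATA_WIDTH   = 128,
    parameter        AXI_BE_WIDTH     = AXI_DATA_WIDTH/8,
    parameter        LBUS_ADDR_WIDTH  = 12,
    parameter        REG_ADDR_WIDTH   = LBUS_ADDR_WIDTH,
    parameter [31:0] ADDR_MY_REGISTER = 32'h0000_0020   // byte offset inside the BAR
)(
    input  wire                       clk,
    input  wire                       rst_n,
    input  wire [LBUS_ADDR_WIDTH-1:0] lbus_wr_addr,
    input  wire [2:0]                 lbus_wr_region,
    input  wire                       lbus_wr_en,
    input  wire [AXI_BE_WIDTH-1:0]    lbus_wr_be,
    input  wire [AXI_DATA_WIDTH-1:0]  lbus_wr_data,
    input  wire [LBUS_ADDR_WIDTH-1:0] lbus_rd_addr,
    input  wire [2:0]                 lbus_rd_region,
    output reg  [AXI_DATA_WIDTH-1:0]  c_reg_rd_data    // combinational read value
);
    // Assumed: low byte-address bits that select a byte inside one bus word
    localparam AXI_REMAIN_WIDTH = $clog2(AXI_BE_WIDTH);

    // Assumed hit logic: the access targets the BAR this module serves
    wire reg_wr_hit = lbus_wr_en & (lbus_wr_region == BAR_NMB);
    wire reg_rd_hit = (lbus_rd_region == BAR_NMB);

    // Step 3: address decoder, comparing word addresses
    wire selw_my_register = reg_wr_hit & (lbus_wr_addr[REG_ADDR_WIDTH-1:0] ==
        ADDR_MY_REGISTER[REG_ADDR_WIDTH+AXI_REMAIN_WIDTH-1:AXI_REMAIN_WIDTH]);
    wire selr_my_register = reg_rd_hit & (lbus_rd_addr[REG_ADDR_WIDTH-1:0] ==
        ADDR_MY_REGISTER[REG_ADDR_WIDTH+AXI_REMAIN_WIDTH-1:AXI_REMAIN_WIDTH]);

    // Step 1: the register itself, written one byte lane at a time
    reg [AXI_DATA_WIDTH-1:0] my_register;
    genvar i;
    generate
        for (i = 0; i < AXI_BE_WIDTH; i = i + 1)
        begin: my_register_lanes
            always @(posedge clk or negedge rst_n)
                if (!rst_n)
                    my_register[7 + 8*i : 8*i] <= 8'h0;
                else if (selw_my_register && lbus_wr_be[i])
                    my_register[7 + 8*i : 8*i] <= lbus_wr_data[7 + 8*i : 8*i];
        end
    endgenerate

    // Step 4: read mux; plain always @* is used here in place of always_comb
    always @* begin
        case (1'b1)
            selr_my_register: c_reg_rd_data = my_register;
            default:          c_reg_rd_data = {AXI_DATA_WIDTH{1'b0}};
        endcase
    end
endmodule
```

    With the defaults shown, a 128-bit write to byte offset 20h in the selected BAR arrives on the local bus as word address 2, which is what the decoder compares against.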

    Module interface


    The module interface is defined as follows:

    module lbus_registers #(
    	parameter   BAR_NMB			= 3'd0,
    	parameter   AXI_DATA_WIDTH		= 128,
    	parameter   AXI_BE_WIDTH		= AXI_DATA_WIDTH/8,	// byte-enable width: one bit per data byte
    	parameter   LBUS_ADDR_WIDTH	= 12,	// 64 KB expected for NWL Reference Design
    	parameter   REG_ADDR_WIDTH		= LBUS_ADDR_WIDTH,	// 64 KB expected for NWL Reference Design
    	parameter	ADDR_R0		= 32'h000_0000,
    	parameter	ADDR_R1		= 32'h000_0020,
    	parameter	ADDR_R2		= 32'h000_0040
    )
    (
    	input  wire				rst_n,
    	input  wire				clk,
    //
    	input  wire [7:0]			switches,
    	output wire [AXI_DATA_WIDTH-1: 0]	rg1_out,
    	output wire [AXI_DATA_WIDTH-1: 0]	rg2_out,
    	output wire [71: 0]			debug_bus,
    // Local Bus channel
    	input  wire [LBUS_ADDR_WIDTH-1:0]	lbus_wr_addr,
    	input  wire [2:0]			lbus_wr_region,
    	input  wire				lbus_wr_en,
    	input  wire [AXI_BE_WIDTH-1:0]	lbus_wr_be,
    	input  wire [AXI_DATA_WIDTH-1:0]	lbus_wr_data,
    //
    	input  wire [LBUS_ADDR_WIDTH-1:0]	lbus_rd_addr,
    	input  wire  [2:0]			lbus_rd_region,
    	output wire [AXI_DATA_WIDTH-1:0]	lbus_rd_data
    );
    


    Settings


    The parameters of the lbus_registers.v module are listed in the table:
    Parameter name             Default value      Value range          Description
    BAR_NMB                    3'd0               3'd0-3'd7            Number of the BAR that the address selector is set to
    AXI_DATA_WIDTH             128                128, 256             Data bus width
    AXI_BE_WIDTH               AXI_DATA_WIDTH/8                        Should not be changed manually
    LBUS_ADDR_WIDTH            12                 8-15                 Width of the local address bus. Usually matches the address space size of the largest BAR
    REG_ADDR_WIDTH             LBUS_ADDR_WIDTH    <=LBUS_ADDR_WIDTH    Width of the local-bus address space that corresponds to the selected BAR
    ADDR_R0, ADDR_R1, ADDR_R2  32'h000_0000       Depends on BAR size  Address of register R0 (R1, R2). Register addresses are always given in bytes and equal their offset in the BAR address space
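
    As a rough illustration of how the address widths relate to the BAR sizes, the sketch below derives them from the sizes used in this project. It rests on one assumption that the article does not state outright: that the local bus address counts whole bus words rather than bytes (this is suggested by the "64 KB" comment next to LBUS_ADDR_WIDTH = 12 and by the select-signal expressions, which drop the low address bits). The module name lbus_width_sketch and the constants BAR0_BYTES, BAR1_BYTES and BAR1_ADDR_WIDTH are hypothetical.

```verilog
// Sketch only: derives address widths from BAR sizes.
// Assumption: the local bus address counts whole bus words, not bytes.
module lbus_width_sketch;
    localparam AXI_DATA_WIDTH  = 128;
    localparam AXI_BE_WIDTH    = AXI_DATA_WIDTH / 8;   // 16 bytes per bus word
    localparam BAR0_BYTES      = 64 * 1024;            // largest BAR in this project
    localparam BAR1_BYTES      = 8 * 1024;             // BAR that holds the registers

    // Number of address bits needed to index every bus word of a BAR
    localparam LBUS_ADDR_WIDTH = $clog2(BAR0_BYTES / AXI_BE_WIDTH); // 4096 words -> 12
    localparam BAR1_ADDR_WIDTH = $clog2(BAR1_BYTES / AXI_BE_WIDTH); // 512 words  -> 9

    initial begin
        $display("LBUS_ADDR_WIDTH = %0d", LBUS_ADDR_WIDTH);
        $display("BAR1_ADDR_WIDTH = %0d", BAR1_ADDR_WIDTH);
    end
endmodule
```

    Under this assumption, REG_ADDR_WIDTH could be set as low as 9 for an 8 KB BAR, while the default simply reuses LBUS_ADDR_WIDTH.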



    Debugging


    Debugging is carried out with an internal signal analyzer. For this, the project uses the ACX_SNAPSHOT.v module, which is enabled by the `define USE_SNAPSHOT conditional compilation directive. Documentation on setting up in-circuit debugging is available on the Achronix website in the Snapshot User Guide.
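
    The conditional compilation pattern can be sketched as follows. This only shows how a macro such as USE_SNAPSHOT switches debug logic in and out; it does not reproduce the ACX_SNAPSHOT instantiation, whose ports are described in the Snapshot User Guide. The module name debug_tap_sketch and the probe output are hypothetical stand-ins.

```verilog
`define USE_SNAPSHOT   // comment this line out to build without the debug logic

// Hypothetical stand-in: the real project instantiates ACX_SNAPSHOT instead.
module debug_tap_sketch (
    input  wire        clk,
    input  wire [71:0] debug_bus,   // same width as debug_bus in lbus_registers.v
    output wire [71:0] probe
);
`ifdef USE_SNAPSHOT
    // Debug build: sample the debug bus so it can be observed
    reg [71:0] probe_r;
    always @(posedge clk)
        probe_r <= debug_bus;
    assign probe = probe_r;
`else
    // Release build: the debug path is compiled out
    assign probe = 72'h0;
`endif
endmodule
```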


    Conclusions


    Even a task as difficult as connecting to the PCI Express bus is solved on the Achronix Speedster22i platform easily and, most importantly, quickly. Creating a working project based on the PCIe hardware core turned out to be not just simple, but very simple.
    Posts about the other hardware cores of the Achronix Speedster22i FPGA are planned as we master them. Subsequent posts will cover the DDR3 and 100G Ethernet cores.


    References


    1. Achronix announcement of PCI-SIG compliance of the PCI Express hardware cores in the Speedster22i FPGA: www.achronix.com/wp-content/uploads/pr/2014_May_PCI-SIG.pdf
    2. HD1000 development kit board schematic: 22iHD1000_Development_Board_Schematic.pdf
    3. Speedster22i PCIe User Guide (UG030): www.achronix.com/wp-content/uploads/docs/Speedster22i_PCIe_User_Guide_UG030.pdf
    4. Speedster22i Snapshot User Guide (UG016): www.achronix.com/wp-content/uploads/docs/Speedster22i_Snapshot_User_Guide_UG016.pdf
    5. Original reference design: Speedster22i_PCIe_Demo_Design.zip
    6. Source files of the described project: drive.google.com/file/d/0B9Gt8fTYH6s-VRWhfbbb4
