PCI-Express Core in Achronix FPGA - Quick Start
This post shows FPGA designers how to start working with the PCI Express bus on the Achronix Speedster22i platform with minimal time and effort. It describes a project organized so that adapting it to a developer's specific requirements comes down to modifying the source of a single module, which makes it possible to connect to the PCIe bus of a host computer in about an hour. I hope that developers on other platforms will also find this article interesting.
The Speedster22i HD1000 FPGA has two PCI-SIG certified hardware cores for the PCIe 3.0 specification, and on the Speedster22i HD1000 Development Kit board (which I wrote about in a previous post), one of these cores is routed to a PCIe slot. PCIe is a very convenient way for the host computer to interact with the board. In fact, it is the only high-speed option for this purpose. The alternative is the built-in COM port, which is several orders of magnitude slower. All other solutions require additional hardware of varying complexity; at the very least, signal level converters are needed.
Achronix provides a reference design that demonstrates the PCIe hardware core in full: the core works in target mode, with direct access from the CPU and with DMA for reading and writing. I checked it, and everything works fine. However, this design turned out to be quite difficult to adapt because of insufficient modularity and overly complex Verilog code. It was therefore decided to create our own version based on the reference design, removing everything related to DMA transfers and restructuring it to explicitly separate modules whose code never changes from modules whose code must be modified for the developer's tasks. The result is a simple, well-structured project that can be adapted by changing the code of just one module.
Achronix FPGAs feature hard IP cores for interface controllers such as PCIe, DDR3, 100/40/10G Ethernet and Interlaken. These cores provide everything needed for the interfaces to function; the developer only has to write the modules that interface with these controllers. As a result, the amount of work is dramatically reduced, and achieving the required timing is greatly simplified. In the case of the PCIe design, only a few interface modules were needed, most of which were taken from the reference design.
Brief Project Description
The project implements access to three 128-bit registers. The PCIe core is configured for three BARs: BAR0 is 64 KB, BAR1 and BAR2 are 8 KB each. The registers are accessed through BAR1. Three BARs are kept for compatibility with the driver used. The registers are described below:
Name | Offset in BAR1 address space | Type | Description |
---|---|---|---|
R0 | 0 | RO | {4{32'hDEADBEEF}} |
R1 | 20h | RW / RW | Bits [7:0]: output to the LED line / Bits [127:8]: not used |
R2 | 40h | RO / RW | Bits [7:0]: state of the switches / Bits [127:8]: not used |
When reworking the project, the first step was to remove the code associated with DMA data exchange. After that, the target_read and target_write channels were used to connect to the core. Next, the module structure was defined, as shown in the figure:
In total there are 4 modules (some of them include submodules).
Composition of modules:
- pcie_g3x4.v - wrapper for the PCIe hardware core. It defines the core's parameters, such as VendorID, the number of lanes, the local bus width, etc. This module is generated with the ACE core generator.
- pci_target_bus_ctrl.v - a wrapper module that bridges the target channel of the hardware core and the local bus, on which the registers are accessible over PCIe. Since the target channel consists of two independent subchannels, write and read, this module combines two modules: pci_target_bus_write_ctrl.v and pci_target_bus_read_ctrl.v, which implement write and read operations, respectively.
- lbus_registers.v - the module containing the user registers themselves. It is the only module whose code has to be modified for a specific project.
- ACX_SNAPSHOT.v - an auxiliary module for in-circuit debugging. Once debugging is finished, it can be excluded from the project.
To obtain the functionality a developer needs, only one module, lbus_registers.v, has to be changed. All other modules are used as is, without a single alteration. The lbus_registers.v module can also serve as a template to which the required functionality is added. Thus, getting a working interface with several registers on the PCIe bus takes no more than an hour of work on this module's code.
PCIe Core Generation
To generate the core, you can use the core generator in the ACE shell. All specified parameters are saved in a file with the .axip extension, which can be edited at any time. The generator produces text files in Verilog and VHDL. A screenshot of the core generation process is shown in the figure:
PCIe core target interface
The PCIe hardware core includes several interfaces, but we are interested in the target interface. Registers are connected through this interface as passive devices, and the processor acts as the active device. The target interface consists of 4 channels: write address, write data, read address and read data. The write and read channels work independently of each other. The following timing diagrams show write and read transactions, together with the local bus signals.
Local bus
The local bus has a very simple structure. It consists of two independent channels - writing and reading, and can be configured for different word widths. This project uses words with a width of 128 bits.
The local bus interface implemented in the lbus_registers.v module provides writing to registers without delay and reading with a delay of 1 clock cycle. Real delays, however, are slightly larger, as the submodules included in the pci_target_bus_ctrl.v module contribute to the latency of write and read transactions.
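The latency behaviour described above can be illustrated with a minimal sketch. This is not the article's lbus_registers.v: the module name lbus_latency_sketch is hypothetical, address decoding and byte enables are left out, and only the signal names follow the module interface shown later in the article.

```verilog
// Sketch, assuming a single 128-bit register and no address decoding.
module lbus_latency_sketch #(
    parameter DATA_WIDTH = 128
) (
    input  wire                  clk,
    input  wire                  rst_n,
    input  wire                  lbus_wr_en,
    input  wire [DATA_WIDTH-1:0] lbus_wr_data,
    output reg  [DATA_WIDTH-1:0] lbus_rd_data
);
    reg [DATA_WIDTH-1:0] my_register;

    // Write: data is captured on the same clock edge on which
    // lbus_wr_en is sampled high, so no extra cycle is inserted.
    always @(posedge clk or negedge rst_n)
        if (!rst_n)
            my_register <= {DATA_WIDTH{1'b0}};
        else if (lbus_wr_en)
            my_register <= lbus_wr_data;

    // Read: the output register adds one clock cycle between the
    // register contents and lbus_rd_data.
    always @(posedge clk or negedge rst_n)
        if (!rst_n)
            lbus_rd_data <= {DATA_WIDTH{1'b0}};
        else
            lbus_rd_data <= my_register;
endmodule
```

Registering the read data costs a cycle but keeps the read multiplexer off the path into the bus controller, which usually helps timing.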
Implementation
The implementation of the project consists of two stages: synthesis and place and route (the trace phase).
Directory structure
The following directory organization was chosen for implementation:
pci_simple
|--- src
|--- syn
|--- tr
|--- tools
The src directory contains the Verilog source files. The syn directory contains the files needed for synthesis with Synplify, and the tr directory contains the files needed for the trace phase. By default, generated cores are also placed in this directory. The tools directory contains drivers and the PciExpress program, which can read and write the registers connected to the PCIe bus.
Synthesis
The syn directory contains the pcie_simple_design.prj project file. Open this file in the Synplify Pro synthesis tool from Synopsys. The result is the pcie_simple_design.vma file in the syn/rev_1 subdirectory, which is the input for the next step, tracing. A screenshot of the synthesis step is shown below:
Trace (place and route)
The trace phase is carried out by ACE, Achronix's proprietary tool. The tr directory contains the pci-simple.prj project file, which must be opened in ACE. When the step completes, the pci-simple-design.jam firmware file appears in the tr/impl_1/output subdirectory; this file is downloaded directly to the FPGA. Screenshot of the trace phase:
Constraints
There are only two constraint files: one describes the clocks, and the other defines the I/O pins used. The files are in the tr directory and are named pcie_simple_design.sdc and pcie_simple_design.pdc, respectively. They are already referenced by the project files for the synthesis and trace tools.
Results
Timing
Trace results, frequency in MHz:
Clock / Group | Target | Achieved | Meets timing |
---|---|---|---|
user_clk | 212.5 | 308.5 | yes (+45.2%) |
core_clk | 212.5 | 433.5 | yes (+104.0%) |
sbus_clk | 50.0 | 138.7 | yes (+177.5%) |
Tck | 10.0 | 175.4 | yes (+1653.6%) |
We are interested in the user_clk clock group, which clocks the user registers. With a target of 212.5 MHz, the achieved frequency is 308.5 MHz, i.e. 45% higher than required.
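As a quick check, the margin for user_clk can be recomputed from the two frequencies in the table. The clock periods below are derived values, not figures reported by the tool.

```latex
\[
\text{margin} = \frac{f_{\text{achieved}}}{f_{\text{target}}} - 1
              = \frac{308.5}{212.5} - 1 \approx 0.452 = 45.2\%
\]
\[
T_{\text{target}} = \frac{1}{212.5\ \text{MHz}} \approx 4.706\ \text{ns},
\qquad
T_{\text{achieved}} = \frac{1}{308.5\ \text{MHz}} \approx 3.241\ \text{ns}
\]
```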
Utilization
Resource | Utilization |
---|---|
RLBs | 0.520% |
LUT4 Sites | 0.410% |
DFF Sites | 0.520% |
MUX2 Sites | 0.010% |
ALU Sites | 0.170% |
LRAM Sites | 1.280% |
BRAM Sites | 0.190% |
BMULT Sites | 0.000% |
I/O Pad Sites | 1.980% |
Data Pads | 1.740% |
Clock Pads | 12.50% |
Reset Pads | 0.000% |
Host Connection
A driver is required to connect to the host computer. Under certain conditions, the driver from the Achronix reference design can be used. The PciExpress.exe application works with this driver and gives access to the registers connected to the PCIe bus. To use these tools, you must preserve the BAR structure of the original design and keep its VendorID and DeviceID values.
To start working with a host computer running Windows, perform the following steps:
- Connect the board to the computer via the PCIe bus. A PCIe x8 or wider slot is required. Make the connection with both devices switched off and observe anti-static precautions. Power the board from an external power source.
- Turn on the power of the computer and the board. The power-up order does not matter.
- Download the firmware to the FPGA.
- Using Device Manager, find the new device on the PCI bus and install a driver for it.
- Reboot.
- After rebooting, the PciExpress program can write and read the registers.
The following figure shows the result of reading the register at offset 0 in the BAR1 address space:
Customization of the lbus_registers.v module
To use the source code in your own projects, add the registers you need to the design. All user registers are in the lbus_registers.v module, and customizing it takes the following simple steps:
- Write the code for each user register
- Set the address of each register in the parameter list
- Write the address decoder code for each register
- Connect each register to the write and read buses
Here is how to put these steps into practice.
• Declare the register with its name and width:
reg [AXI_DATA_WIDTH-1:0] my_register;
• Define the write and read strobes for this register:
wire selw_my_register;
wire selr_my_register;
• Write an always block for this register. This is conveniently done with a generate statement.
In the simplest case, the code looks like this:
genvar i;
generate
    // One always block per byte lane, gated by the matching byte enable.
    for (i = 0; i < AXI_BE_WIDTH; i = i + 1)
    begin: leds_lanes
        always @(posedge clk or negedge rst_n)
            if (!rst_n)
                my_register[7+8*i : 8*i] <= 8'h0;
            else if (selw_my_register && lbus_wr_be[i])
                my_register[7+8*i : 8*i] <= lbus_wr_data[7+8*i : 8*i];
            else
                my_register[7+8*i : 8*i] <= my_register[7+8*i : 8*i]; // hold
    end
endgenerate
If individual bits need more complex handling, the always block naturally becomes more complicated, and it may be easier to write the code explicitly without the generate statement.
• Add a line to the parameter list:
parameter ADDR_MY_REGISTER = 32'h1234_5678
where 32'h1234_5678 is replaced with the actual byte offset in the required address space.
• Write the expressions for the register select signals:
assign selw_my_register = reg_wr_hit & (lbus_wr_addr[REG_ADDR_WIDTH-1:0] == ADDR_MY_REGISTER[REG_ADDR_WIDTH+AXI_REMAIN_WIDTH-1:AXI_REMAIN_WIDTH]);
assign selr_my_register = reg_rd_hit & (lbus_rd_addr[REG_ADDR_WIDTH-1:0] == ADDR_MY_REGISTER[REG_ADDR_WIDTH+AXI_REMAIN_WIDTH-1:AXI_REMAIN_WIDTH]);
• In the always_comb block
always_comb
begin
case (1'b1)
…
endcase
end
add a new branch inside the case statement:
selr_my_register: c_reg_rd_data = my_register;
Repeat the above steps for each user register.
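The steps above can be put together as one self-contained sketch. It is an illustration, not the article's lbus_registers.v: the module name my_register_sketch and the offset 32'h0000_0060 are hypothetical, and AXI_REMAIN_WIDTH, reg_wr_hit and reg_rd_hit appear in the select expressions but are not defined in the text, so here they are assumed to be the number of byte-offset bits (4 for a 128-bit bus) and plain inputs.

```verilog
// Sketch only: one user register wired up following the steps above.
module my_register_sketch #(
    parameter        AXI_DATA_WIDTH   = 128,
    parameter        AXI_BE_WIDTH     = AXI_DATA_WIDTH/8,
    parameter        REG_ADDR_WIDTH   = 12,
    parameter        AXI_REMAIN_WIDTH = 4,             // assumed: log2(AXI_BE_WIDTH)
    parameter [31:0] ADDR_MY_REGISTER = 32'h0000_0060  // hypothetical byte offset
) (
    input  wire                      clk,
    input  wire                      rst_n,
    input  wire                      reg_wr_hit,   // assumed: write hits this block
    input  wire                      reg_rd_hit,   // assumed: read hits this block
    input  wire [REG_ADDR_WIDTH-1:0] lbus_wr_addr,
    input  wire [AXI_BE_WIDTH-1:0]   lbus_wr_be,
    input  wire [AXI_DATA_WIDTH-1:0] lbus_wr_data,
    input  wire [REG_ADDR_WIDTH-1:0] lbus_rd_addr,
    output reg  [AXI_DATA_WIDTH-1:0] c_reg_rd_data
);
    // Word address of the register: the byte offset with the low
    // byte-lane bits stripped (32'h60 becomes 12'h006 here).
    localparam [REG_ADDR_WIDTH-1:0] WORD_ADDR =
        ADDR_MY_REGISTER[REG_ADDR_WIDTH+AXI_REMAIN_WIDTH-1:AXI_REMAIN_WIDTH];

    // Register and its write/read strobes.
    reg  [AXI_DATA_WIDTH-1:0] my_register;
    wire selw_my_register = reg_wr_hit & (lbus_wr_addr == WORD_ADDR);
    wire selr_my_register = reg_rd_hit & (lbus_rd_addr == WORD_ADDR);

    // Byte-lane write: one always block per lane.
    genvar i;
    generate
        for (i = 0; i < AXI_BE_WIDTH; i = i + 1) begin : my_register_lanes
            always @(posedge clk or negedge rst_n)
                if (!rst_n)
                    my_register[8*i +: 8] <= 8'h0;
                else if (selw_my_register && lbus_wr_be[i])
                    my_register[8*i +: 8] <= lbus_wr_data[8*i +: 8];
        end
    endgenerate

    // Read multiplexer: each further register adds one case branch.
    always @* begin
        case (1'b1)
            selr_my_register: c_reg_rd_data = my_register;
            default:          c_reg_rd_data = {AXI_DATA_WIDTH{1'b0}};
        endcase
    end
endmodule
```

The sketch uses the indexed part-select form [8*i +: 8], which selects the same bits as [7+8*i : 8*i] in the article's code, and always @* in place of always_comb so that it also compiles as plain Verilog-2001.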
Module interface
The module interface is defined as follows:
module lbus_registers #(
parameter BAR_NMB = 3'd0,
parameter AXI_DATA_WIDTH = 128,
parameter AXI_BE_WIDTH = AXI_DATA_WIDTH/8, // byte-enable width, one bit per data byte
parameter LBUS_ADDR_WIDTH = 12, // 64 KB expected for NWL Reference Design
parameter REG_ADDR_WIDTH = LBUS_ADDR_WIDTH, // 64 KB expected for NWL Reference Design
parameter ADDR_R0 = 32'h000_0000,
parameter ADDR_R1 = 32'h000_0020,
parameter ADDR_R2 = 32'h000_0040
)
(
input wire rst_n,
input wire clk,
//
input wire [7:0] switches,
output wire [AXI_DATA_WIDTH-1: 0] rg1_out,
output wire [AXI_DATA_WIDTH-1: 0] rg2_out,
output wire [71: 0] debug_bus,
// Local Bus channel
input wire [LBUS_ADDR_WIDTH-1:0] lbus_wr_addr,
input wire [2:0] lbus_wr_region,
input wire lbus_wr_en,
input wire [AXI_BE_WIDTH-1:0] lbus_wr_be,
input wire [AXI_DATA_WIDTH-1:0] lbus_wr_data,
//
input wire [LBUS_ADDR_WIDTH-1:0] lbus_rd_addr,
input wire [2:0] lbus_rd_region,
output wire [AXI_DATA_WIDTH-1:0] lbus_rd_data
);
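A possible instantiation of this interface is sketched below. The parameter and port names are taken from the declaration above; everything else is an assumption for illustration: the wrapper module name, BAR_NMB set to 3'd1 (the text says the registers are accessed through BAR1), and the idea that rg1_out carries the LED bits of R1. The sketch needs lbus_registers.v from the project sources to elaborate.

```verilog
// Usage sketch: wiring lbus_registers between the local bus and the board I/O.
module lbus_registers_usage_sketch (
    input  wire         clk,
    input  wire         rst_n,
    input  wire [7:0]   switches,
    output wire [7:0]   leds,
    // Local bus signals, assumed to be driven by pci_target_bus_ctrl.
    input  wire [11:0]  lbus_wr_addr,
    input  wire [2:0]   lbus_wr_region,
    input  wire         lbus_wr_en,
    input  wire [15:0]  lbus_wr_be,
    input  wire [127:0] lbus_wr_data,
    input  wire [11:0]  lbus_rd_addr,
    input  wire [2:0]   lbus_rd_region,
    output wire [127:0] lbus_rd_data
);
    wire [127:0] rg1_out;
    wire [127:0] rg2_out;
    wire [71:0]  debug_bus;   // left for the in-circuit debugger

    lbus_registers #(
        .BAR_NMB         (3'd1),          // assumed: registers live in BAR1
        .AXI_DATA_WIDTH  (128),
        .LBUS_ADDR_WIDTH (12),
        .ADDR_R0         (32'h0000_0000),
        .ADDR_R1         (32'h0000_0020),
        .ADDR_R2         (32'h0000_0040)
    ) u_lbus_registers (
        .rst_n          (rst_n),
        .clk            (clk),
        .switches       (switches),
        .rg1_out        (rg1_out),
        .rg2_out        (rg2_out),
        .debug_bus      (debug_bus),
        .lbus_wr_addr   (lbus_wr_addr),
        .lbus_wr_region (lbus_wr_region),
        .lbus_wr_en     (lbus_wr_en),
        .lbus_wr_be     (lbus_wr_be),
        .lbus_wr_data   (lbus_wr_data),
        .lbus_rd_addr   (lbus_rd_addr),
        .lbus_rd_region (lbus_rd_region),
        .lbus_rd_data   (lbus_rd_data)
    );

    // Assumed mapping: the low byte of R1 drives the LED line.
    assign leds = rg1_out[7:0];
endmodule
```

AXI_BE_WIDTH and REG_ADDR_WIDTH are left at their defaults, since both are derived from other parameters.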
Parameters
The parameters of the lbus_registers.v module are listed in the table:
Parameter name | Default value | Value range | Description |
---|---|---|---|
BAR_NMB | 3'd0 | 3'd0-3'd7 | Number of the BAR that the address selector is configured for |
AXI_DATA_WIDTH | 128 | 128, 256 | Data bus width |
AXI_BE_WIDTH | AXI_DATA_WIDTH/8 | - | Should not be changed manually |
LBUS_ADDR_WIDTH | 12 | 8-15 | Sets the width of the local address bus. Usually matches the address space size of the largest BAR |
REG_ADDR_WIDTH | LBUS_ADDR_WIDTH | <=LBUS_ADDR_WIDTH | Sets the address width of the local bus address space that corresponds to the selected BAR |
ADDR_R0 ADDR_R1 ADDR_R2 | 32'h000_0000, 32'h000_0020, 32'h000_0040 | Depends on the BAR size | Address of register R0 (R1, R2). Register addresses are always given in bytes and correspond to their offset in the BAR address space |
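The default LBUS_ADDR_WIDTH of 12 is consistent with the 64 KB BAR0 if the local bus addresses 128-bit (16-byte) words rather than bytes. That is an assumption here, suggested by the way the register select expressions drop the low AXI_REMAIN_WIDTH bits of the byte offset.

```latex
\[
\frac{64\ \text{KB}}{16\ \text{B per word}} = \frac{2^{16}}{2^{4}} = 2^{12}\ \text{words}
\quad\Rightarrow\quad 12\ \text{address bits}
\]
\[
\frac{8\ \text{KB}}{16\ \text{B per word}} = \frac{2^{13}}{2^{4}} = 2^{9}\ \text{words}
\quad\Rightarrow\quad 9\ \text{address bits for an 8 KB BAR}
\]
```

Under the same assumption, a REG_ADDR_WIDTH as small as 9 would cover the whole of BAR1 or BAR2.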
Debugging
Debugging uses an internal signal analyzer: the project includes the ACX_SNAPSHOT.v module, which is enabled by the `define USE_SNAPSHOT conditional compilation directive. Documentation on in-circuit debugging is available on the Achronix website in the Snapshot User Guide.
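The conditional compilation pattern looks roughly as follows. This sketch only shows the `ifdef structure: the module name snapshot_guard_sketch is hypothetical, and the ACX_SNAPSHOT instance itself is replaced by a placeholder register, because its port list is documented in the Snapshot User Guide and is not reproduced in this article.

```verilog
// Comment this line out to build the design without the debug logic.
`define USE_SNAPSHOT

module snapshot_guard_sketch (
    input wire        clk,
    input wire [71:0] debug_bus   // same width as debug_bus of lbus_registers
);
`ifdef USE_SNAPSHOT
    // In the real project the ACX_SNAPSHOT instance goes here.
    // Placeholder: register the monitored signals once.
    reg [71:0] debug_bus_q;
    always @(posedge clk)
        debug_bus_q <= debug_bus;
`endif
endmodule
```

With the `define removed, everything between `ifdef and `endif disappears from the build, so the debug logic costs no FPGA resources in the final design.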
Conclusion
Even a task as difficult as connecting to the PCI Express bus is solved easily and, most importantly, quickly on the Achronix Speedster22i platform. Creating a working project based on the PCIe hardware core turned out to be not just simple, but very simple.
Posts about the other hardware cores of the Achronix Speedster22i FPGA are planned as they are mastered. Subsequent posts will cover the DDR3 and 100G Ethernet cores.
References
1. Achronix announcement of PCI-SIG compliance of the PCI Express hardware cores in Speedster22i FPGAs (in English): www.achronix.com/wp-content/uploads/pr/2014_May_PCI-SIG.pdf
2. HD1000 development kit board schematic (in English): 22iHD1000_Development_Board_Schematic.pdf
3. Speedster22i PCIe User Guide (in English): www.achronix.com/wp-content/uploads/docs/Speedster22i_PCIe_User_Guide_UG030.pdf
4. Speedster22i Snapshot User Guide (in English): www.achronix.com/wp-content/uploads/docs/Speedster22i_Snapshot_User_Guide_UG016.pdf
5. Original reference design: Speedster22i_PCIe_Demo_Design.zip
6. Source files of the described project: drive.google.com/file/d/0B9Gt8fTYH6s-VRWhfbbb4