Transaction execution on the PCI bus. VHDL implementation

    Not long ago I asked here about the mechanism for enumerating PCI devices. I needed it for a test assignment; once I had completed it and got the job, I happily forgot all about it. Recently, however, I was handed a new project and had to recall everything, so I decided to write it down here as well.

    The PCI bus defines many kinds of transactions; only the following are covered in this article:
    • Configuration transactions
    • I/O transactions
    • Memory access transactions


    Two cases are possible when a transaction is performed:
    • The master is the south bridge
    • The master is a device connected to the PCI bus

    As is customary when discussing the PCI bus, I will call the initiating device the master and the responding device the target.
    This article deals only with transactions in which the master is the south bridge; transactions in which the master is a device connected to the PCI bus deserve a separate article.

    So, to work with the bus we need the following signals:
    library IEEE;
    use IEEE.std_logic_1164.all;

    entity main is
        port (
            clk    : in    std_logic;
            AD     : inout std_logic_vector(31 downto 0);
            IDSEL  : in    std_logic;
            CBE    : in    std_logic_vector(3 downto 0);
            FRAME  : in    std_logic;
            IRDY   : in    std_logic;
            TRDY   : inout std_logic;
            STOP   : inout std_logic;
            PAR    : inout std_logic;
            RST    : in    std_logic;
            DEVSEL : inout std_logic
        );
    end main;
    


    clk (Clock) - synchronizes all transactions on the PCI bus and is an input to every PCI device.
    AD (Address and Data) - multiplexed address and data bus.
    IDSEL (Initialization Device Select) - selects the device during configuration read and write transactions, acting as a chip select.
    CBE (Bus Command and Byte Enables) - the bus command during the address phase and the byte enables during the data phases.
    FRAME (Frame) - asserted by the master at the start of a transaction; it defines the duration of the transaction. For single-phase transactions FRAME always lasts one clock cycle. In multi-phase transactions FRAME is deasserted one cycle before the transaction completes.
    IRDY (Initiator Ready) - master ready signal. It indicates that the master is ready to complete the current data phase.
    TRDY (Target Ready) - target ready signal. It indicates that the target is ready to complete the current data phase.
    STOP (Stop) - asserted by the target when it wants to stop the current transaction.
    PAR (Parity) - even parity over the AD and CBE lines.
    RST (Reset) - reset signal. It is asynchronous.
    DEVSEL (Device Select) - driven by the target to indicate that it has decoded the address and claims the transaction.
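
    The PAR line can be illustrated with a small helper. This is only a sketch: the package and function names (pci_parity_pkg, pci_par) are made up for this example and do not come from the project. It assumes the usual PCI rule that the number of ones across AD[31:0], CBE[3:0] and PAR together is even.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical helper package, not part of the original project.
package pci_parity_pkg is
    function pci_par(ad  : std_logic_vector(31 downto 0);
                     cbe : std_logic_vector(3 downto 0)) return std_logic;
end package pci_parity_pkg;

package body pci_parity_pkg is
    function pci_par(ad  : std_logic_vector(31 downto 0);
                     cbe : std_logic_vector(3 downto 0)) return std_logic is
        variable p : std_logic := '0';
    begin
        -- XOR of all 36 bits: '1' when AD and CBE hold an odd number of ones
        for i in ad'range loop
            p := p xor ad(i);
        end loop;
        for i in cbe'range loop
            p := p xor cbe(i);
        end loop;
        -- Driving this value on PAR makes the overall count of ones even
        return p;
    end function pci_par;
end package body pci_parity_pkg;
```

    On the real bus PAR is driven one clock after the address or data it covers, so a design using such a function would register its result before driving the pin.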

    Before a device can be used, it must be initialized. We therefore start with the specifics of configuration transactions.

    Configuration transactions. General information


    Configuration transactions as seen from the OS
    To generate PCI configuration transactions on a PC, software accesses two I/O ports named CONFIG_ADDRESS and CONFIG_DATA, located at addresses 0CF8h and 0CFCh respectively. Both belong to the Host-PCI bridge, through which the PCI bus is connected, directly or indirectly, to the processor.
    The CONFIG_ADDRESS port is a doubleword in size and is accessible only as a whole. Smaller accesses to the addresses it occupies are passed on to the PCI bus as ordinary I/O transactions. The port can be read and written and has the following format:

    When a configuration transaction has to be performed, the address of a PCI configuration space register is written to this port. The address consists of the bus number (bits 23-16), the device number (bits 15-11), the function number (bits 10-8) and the register number itself (bits 7-2). Bits 1 and 0 must always be zero, and the most significant bit must be one, which enables the configuration transaction. Bits 30-24 are reserved and must be zero.

    The configuration transaction itself is generated when the CONFIG_DATA port is read or written, provided that the address written to CONFIG_ADDRESS has its most significant bit set and carries a bus number that matches either the bus attached to the Host-PCI bridge or any PCI bus lying below that bus and connected to it through one or more PCI-PCI bridges (the allowed range of bus numbers is given to the Host-PCI bridge when it is configured). The access to CONFIG_DATA must have the same size as the configuration register being read or written, whose address is held in CONFIG_ADDRESS.
    If the bus number given in CONFIG_ADDRESS matches the number of the bus attached directly to the Host-PCI bridge, a configuration transaction with a type 0 address is generated. The device number held in bits 15-11 of CONFIG_ADDRESS is used to assert one of the IDSEL signals, which select the particular device. In addition, the decoded device number (one bit set, the rest cleared) is transmitted in bits 31-11 of the address during the address phase of the configuration transaction.
    If the address in CONFIG_ADDRESS refers to a bus other than the one attached directly to the Host-PCI bridge, the bridge generates a configuration transaction with a type 1 address. It is handled by the PCI-PCI bridge that recognizes the bus number contained in the address. That bridge either performs a configuration transaction with a type 0 address (if the addressed device sits on the bus attached directly to this bridge) or generates a transaction with a type 1 address, passing it on through the next bridge. The length of this chain is theoretically limited only by the width of the bus number field (8 bits).

    If it turns out during the transaction that the addressed configuration register does not exist (a nonexistent bus, device, function or register number was given), a write has no effect and a read returns to the processor a value with every bit set to one.
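
    The CONFIG_ADDRESS layout described above can be written down as a small packing function, which is handy in a testbench when the expected address has to be computed. This is a sketch only; the package name pci_cfg_addr_pkg, the function make_config_address and its parameter names are invented for this example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical helper package, not part of the original project.
package pci_cfg_addr_pkg is
    function make_config_address(bus_num  : natural range 0 to 255;
                                 dev_num  : natural range 0 to 31;
                                 func_num : natural range 0 to 7;
                                 reg_off  : natural range 0 to 255)
        return std_logic_vector;
end package pci_cfg_addr_pkg;

package body pci_cfg_addr_pkg is
    function make_config_address(bus_num  : natural range 0 to 255;
                                 dev_num  : natural range 0 to 31;
                                 func_num : natural range 0 to 7;
                                 reg_off  : natural range 0 to 255)
        return std_logic_vector is
        -- Bits 30-24 (reserved) and bits 1-0 stay at zero
        variable a : std_logic_vector(31 downto 0) := (others => '0');
    begin
        a(31)           := '1';  -- enable bit
        a(23 downto 16) := std_logic_vector(to_unsigned(bus_num, 8));
        a(15 downto 11) := std_logic_vector(to_unsigned(dev_num, 5));
        a(10 downto 8)  := std_logic_vector(to_unsigned(func_num, 3));
        -- reg_off is a byte offset; the register number is the doubleword index
        a(7 downto 2)   := std_logic_vector(to_unsigned(reg_off / 4, 6));
        return a;
    end function make_config_address;
end package body pci_cfg_addr_pkg;
```

    For example, bus 0, device 3, function 0 and register offset 10h pack into 80001810h.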

    Address format for a type 1 transaction.

    Address format for a type 0 transaction.
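
    As an illustration of the type 0 address, here is a sketch of how a bridge model in a testbench might build the value driven on AD during the address phase. The mapping of device number n to address bit 11 + n is an assumption made for this example (the actual IDSEL wiring depends on the bridge and the board), and the names pci_type0_pkg and type0_address are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical helper package, not part of the original project.
package pci_type0_pkg is
    function type0_address(dev_num  : natural range 0 to 20;
                           func_num : natural range 0 to 7;
                           reg_off  : natural range 0 to 255)
        return std_logic_vector;
end package pci_type0_pkg;

package body pci_type0_pkg is
    function type0_address(dev_num  : natural range 0 to 20;
                           func_num : natural range 0 to 7;
                           reg_off  : natural range 0 to 255)
        return std_logic_vector is
        variable a : std_logic_vector(31 downto 0) := (others => '0');
    begin
        -- Assumption: one-hot device select, device 0 on bit 11, device 20 on bit 31
        a(11 + dev_num) := '1';
        a(10 downto 8)  := std_logic_vector(to_unsigned(func_num, 3));
        a(7 downto 2)   := std_logic_vector(to_unsigned(reg_off / 4, 6));
        -- Bits 1-0 remain "00", which marks a type 0 address
        return a;
    end function type0_address;
end package body pci_type0_pkg;
```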

    Layout of the configuration registers:

    Minimum set of registers:
    • Vendor ID - identifies the manufacturer of the device. The value 0xFFFF is not allowed.
    • Device ID - identifies the specific type of device. The value 0xFFFF is not allowed.
    • Revision ID - an extension of the device identifier. May be zero.
    • Header Type - used for multifunction devices. If bit 7 is 0, the device is single-function; otherwise it is multifunction.
    • Class Code - read-only. Identifies the general function of the device. The high byte (address 0Bh) holds the base class, the middle byte the subclass, and the low byte the programming interface (if one is standardized).
    • Subsystem ID, Subsystem Vendor ID - set by the manufacturer, read-only. They hold identifiers that allow cards and devices to be identified precisely (several cards with the same Device ID and Vendor ID may be installed in one system).
    • BAR0 - BAR5 - describe the memory regions and I/O port ranges used by the device.

    The descriptors for memory regions and for I/O ports differ:
    • Bit 0 = 0 marks a memory region. Its size is at most 2 GB.
    • Bit 0 = 1 marks an I/O port range. Its size is at most 256 bytes.

    The size of a region is determined as follows. 0xFFFFFFFF is written to the BAR and the value is read back. The low-order flag bits are cleared (the one in bit 0 does not take part in the calculation), the result is subtracted from 0xFFFFFFFF, and one is added. The result is the size of the region. For example, an I/O BAR that reads back 0xFFFFFF01 gives 0xFFFFFFFF - 0xFFFFFF00 + 1 = 0x100, that is, 256 bytes.
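
    The sizing procedure can be sketched as a function that takes the value read back after writing all ones. The names pci_bar_pkg and bar_size are made up for this example. The sketch clears the type bits (bits 1..0 for an I/O BAR, bits 3..0 for a memory BAR), inverts the result and adds one; the + 1 is what turns the inverted mask into a byte count.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical helper package, not part of the original project.
package pci_bar_pkg is
    function bar_size(readback : std_logic_vector(31 downto 0)) return unsigned;
end package pci_bar_pkg;

package body pci_bar_pkg is
    function bar_size(readback : std_logic_vector(31 downto 0)) return unsigned is
        variable masked : unsigned(31 downto 0) := unsigned(readback);
    begin
        if readback(0) = '1' then
            masked(1 downto 0) := "00";    -- I/O BAR: bits 1..0 are not address bits
        else
            masked(3 downto 0) := "0000";  -- memory BAR: bits 3..0 are not address bits
        end if;
        -- Equivalent to (0xFFFFFFFF - masked) + 1; wraps to 0 for an unimplemented BAR
        return (not masked) + 1;
    end function bar_size;
end package body pci_bar_pkg;
```

    With the register values used later in this article, BAR0 reads back FFFFFF01h (256 bytes of I/O space) and BAR1 reads back FFFF0000h (65,536 bytes of memory space).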

    General transaction algorithm


    The master drives the device address onto the AD bus and the command to be executed onto the CBE bus, and sets the FRAME signal to 0 and the IRDY signal to 0. The master then waits for the target to assert the TRDY and DEVSEL signals. For a read, the target also drives the requested data onto the AD bus. Data is considered valid when IRDY, TRDY and DEVSEL are all at logic zero.
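
    The validity condition at the end of this description is simple enough to show directly. A minimal sketch, with a hypothetical entity name (data_phase_detect) and output name (xfer); it only decodes the three active-low handshake lines and is not taken from the project.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical illustration, not part of the original project.
entity data_phase_detect is
    port (
        IRDY   : in  std_logic;  -- active low
        TRDY   : in  std_logic;  -- active low
        DEVSEL : in  std_logic;  -- active low
        xfer   : out std_logic   -- '1' when data on AD is valid
    );
end entity data_phase_detect;

architecture rtl of data_phase_detect is
begin
    -- Data is transferred on a clock edge where all three lines are low
    xfer <= '1' when (IRDY = '0' and TRDY = '0' and DEVSEL = '0') else '0';
end architecture rtl;
```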

    Implementation


    To access the FPGA pins you need special components: I/O buffers that support the high-impedance (Z) state.
    For the AD bus, the connection looks like this:
    signal AD_I : std_logic_vector(AD'range);
    signal AD_O : std_logic_vector(AD'range);
    signal AD_T : std_logic;

    AD_BUF:
    for iCount in AD'low to AD'high generate
    begin
        IOBUF_AD : IOBUF
        generic map (
            DRIVE      => 12,
            IOSTANDARD => "PCI33_3",
            SLEW       => "SLOW")
        port map (
            O  => AD_I(iCount),
            IO => AD(iCount),
            I  => AD_O(iCount),
            T  => AD_T
        );
    end generate;
    

    Where:
    • O - buffer output.
    • IO - bidirectional port of the buffer, connected directly to the FPGA pin.
    • I - buffer input.
    • T - direction control: logic one selects input (the driver is in the Z state), logic zero selects output.

    The remaining signals are connected in the same way; I do not list them so as not to clutter the article.
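
    For simulation without the vendor library, the behavior of such a buffer can be approximated with a small model. This is a sketch under the T convention listed above ('1' = input, '0' = output); the entity name iobuf_model is made up, and this is not the vendor's IOBUF primitive itself.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical behavioral stand-in for a bidirectional I/O buffer.
entity iobuf_model is
    port (
        I  : in    std_logic;  -- value to drive onto the pin
        T  : in    std_logic;  -- '1' = high-Z (input), '0' = drive (output)
        O  : out   std_logic;  -- value seen on the pin
        IO : inout std_logic   -- the pin itself
    );
end entity iobuf_model;

architecture behavioral of iobuf_model is
begin
    -- Drive the pin only while T is low; otherwise release it
    IO <= I when T = '0' else 'Z';
    -- The input path always follows the pin
    O  <= IO;
end architecture behavioral;
```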

    As I wrote above, at the start of a transaction, when the address is present on the AD bus, the FRAME signal is always zero. Below is the code that generates the AdrPhASE signal, during which the address bus and the command bus must be latched for later use. In effect, AdrPhASE is nothing more than the falling edge of the FRAME signal, which unambiguously marks the start of a transaction.
    signal AdrPhASE : std_logic;
    signal FRAME_D  : std_logic;
    signal Address  : std_logic_vector(AD_I'range);
    signal Command  : std_logic_vector(CBE'range);
    signal bCfgTr   : boolean;

    -- Delay FRAME by one clock to detect its falling edge
    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            FRAME_D <= '1' after cTCQ;
        elsif (rising_edge(clk_I)) then
            FRAME_D <= FRAME_I after cTCQ;
        end if;
    end process;

    AdrPhASE <= not FRAME_I and FRAME_D;

    -- Latch the address, the command and IDSEL during the address phase
    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            Address <= (others => '0') after cTCQ;
            Command <= (others => '0') after cTCQ;
            bCfgTr  <= false after cTCQ;
        elsif (rising_edge(clk_I)) then
            if (AdrPhASE = '1') then
                Address <= AD_I after cTCQ;
                Command <= CBE_I after cTCQ;
                bCfgTr  <= (IDSEL_I = '1') after cTCQ;
            end if;
        end if;
    end process;
    

    From here on, the operation of the whole device can be described with a finite state machine.
    State machine code
    type TSM_PCI_T is (sIDLE, sDECODE, sCFG_READ, sCFG_WRITE, sIO_READ, sIO_WRITE, sMEM_READ, sMEM_WRITE);
    signal smPCI_T : TSM_PCI_T;

    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            smPCI_T <= sIDLE after cTCQ;
        elsif (rising_edge(clk_I)) then
            case (smPCI_T) is
                when sIDLE =>
                    if (AdrPhASE = '1') then smPCI_T <= sDECODE after cTCQ; end if;
                when sDECODE =>
                    -- Configuration access: IDSEL was set, function 0, command 101x
                    if (bCfgTr and Address(10 downto 8) = b"000" and Command(3 downto 1) = b"101") then
                        if (Command(0) = '0') then smPCI_T <= sCFG_READ after cTCQ;
                        else smPCI_T <= sCFG_WRITE after cTCQ; end if;
                    -- I/O access: command 001x, address matches BAR0
                    elsif (Command(3 downto 1) = b"001") and (Address(31 downto 8) = BAR0(31 downto 8)) then
                        if (Command(0) = '0') then smPCI_T <= sIO_READ after cTCQ;
                        else smPCI_T <= sIO_WRITE after cTCQ; end if;
                    -- Memory access: command 011x, address matches BAR1
                    elsif (Command(3 downto 1) = b"011") and (Address(31 downto 16) = BAR1(31 downto 16)) then
                        if (Command(0) = '0') then smPCI_T <= sMEM_READ after cTCQ;
                        else smPCI_T <= sMEM_WRITE after cTCQ; end if;
                    else
                        smPCI_T <= sIDLE after cTCQ;
                    end if;
                when sCFG_READ  => if (IRDY_I = '0') then smPCI_T <= sIDLE after cTCQ; end if;
                when sCFG_WRITE => if (IRDY_I = '0') then smPCI_T <= sIDLE after cTCQ; end if;
                when sIO_WRITE  => if (IRDY_I = '0') then smPCI_T <= sIDLE after cTCQ; end if;
                when sIO_READ   => if (IRDY_I = '0') then smPCI_T <= sIDLE after cTCQ; end if;
                when sMEM_READ  => if (IRDY_I = '0') then smPCI_T <= sIDLE after cTCQ; end if;
                when sMEM_WRITE => if (IRDY_I = '0') then smPCI_T <= sIDLE after cTCQ; end if;
                when others     => smPCI_T <= sIDLE after cTCQ;
            end case;
        end if;
    end process;

    To make the code above easier to follow, here are the commands transmitted over the CBE bus that the device handles:
    • 0010 I/O Read
    • 0011 I/O Write
    • 0110 Memory Read
    • 0111 Memory Write
    • 1010 Configuration Read
    • 1011 Configuration Write

    Each command has its own state in the machine. The transition into it depends on the command on the CBE bus and, for memory and I/O port accesses, on the address on the AD bus. The machine returns to the initial state when the master asserts IRDY.
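
    The command codes can also be given names, which makes a decoder easier to read than comparing against bit patterns. A sketch only: the package pci_cmd_pkg, the constant names and the helper is_write are invented for this example and are not used in the project code.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical helper package, not part of the original project.
package pci_cmd_pkg is
    subtype pci_cmd_t is std_logic_vector(3 downto 0);

    constant cCMD_IO_READ   : pci_cmd_t := "0010";
    constant cCMD_IO_WRITE  : pci_cmd_t := "0011";
    constant cCMD_MEM_READ  : pci_cmd_t := "0110";
    constant cCMD_MEM_WRITE : pci_cmd_t := "0111";
    constant cCMD_CFG_READ  : pci_cmd_t := "1010";
    constant cCMD_CFG_WRITE : pci_cmd_t := "1011";

    -- For the six commands above, bit 0 separates writes from reads
    function is_write(cmd : pci_cmd_t) return boolean;
end package pci_cmd_pkg;

package body pci_cmd_pkg is
    function is_write(cmd : pci_cmd_t) return boolean is
    begin
        return cmd(0) = '1';
    end function is_write;
end package body pci_cmd_pkg;
```

    This mirrors what the state machine does: bits 3..1 select the address space and bit 0 selects between the read state and the write state.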

    Configuration read

    As described above, the device handles type 0 transactions. Since the device is single-function, the function number is 000, which is checked in the state machine. Depending on the register number (bits 7..0 of the address latched from the AD bus), the corresponding register is driven onto the AD bus, following the register layout shown above.
    Configuration read code
    signal CfgRData      : std_logic_vector(31 downto 0) := x"00000000";
    signal CommandReg    : std_logic_vector(15 downto 0) := x"0000";
    signal StatusReg     : std_logic_vector(15 downto 0) := x"0200";
    signal LatencyTimer  : std_logic_vector(7 downto 0)  := x"00";
    signal CacheLineSize : std_logic_vector(7 downto 0)  := x"00";
    signal BAR0          : std_logic_vector(31 downto 0) := x"00000001";
    signal BAR1          : std_logic_vector(31 downto 0) := x"00000000";
    signal InterruptLine : std_logic_vector(7 downto 0);

    process (clk_I)
    begin
        if (rising_edge(clk_I)) then
            case (Address(7 downto 0)) is
                when x"00"  => CfgRData <= x"00017788"; -- Device ID and Vendor ID
                when x"04"  => CfgRData <= StatusReg & CommandReg; -- Status Register, Command Register
                when x"08"  => CfgRData <= x"10000001"; -- Class Code and Revision ID
                when x"0C"  => CfgRData <= x"0000" & LatencyTimer & CacheLineSize; -- BIST, Header Type (bit 7 = 0: single-function, bits 6-0 = 0: type 0), Latency Timer (for masters), Cache Line Size (bit 2 in 1)
                when x"10"  => CfgRData <= BAR0; -- Base Address 0 (register I/O address decoder)
                when x"14"  => CfgRData <= BAR1; -- Base Address 1
                when x"28"  => CfgRData <= x"00000000"; -- CardBus CIS Pointer
                when x"2C"  => CfgRData <= x"00017788"; -- Subsystem ID, Subsystem Vendor ID
                when x"30"  => CfgRData <= x"00000000"; -- Expansion ROM Base Address
                when x"34"  => CfgRData <= x"00000000"; -- Reserved, Capabilities Pointer
                when x"38"  => CfgRData <= x"00000000"; -- Reserved
                when x"3C"  => CfgRData <= x"004001" & InterruptLine; -- Max_Lat (bus masters only), Min_Gnt, Interrupt Pin, Interrupt Line
                when others => CfgRData <= (others => '0');
            end case;
        end if;
    end process;

    This is what a configuration read looks like in the simulator:


    Configuration write

    On the AD bus, the master first drives the address of the register to be written and, in the next clock cycle, the data to write. In BAR0, bits 7..0 are read-only; in BAR1, bits 15..0 are read-only. The device therefore requests 256 I/O addresses and 65,536 memory addresses.
    Configuration write code
    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            CommandReg    <= x"0000" after cTCQ;
            StatusReg     <= x"0200" after cTCQ;
            LatencyTimer  <= x"00" after cTCQ;
            CacheLineSize <= x"00" after cTCQ;
            BAR0          <= x"00000001" after cTCQ;
            BAR1          <= x"00000000" after cTCQ;
        elsif (rising_edge(clk_I)) then
            if (smPCI_T = sCFG_WRITE) then
                case (Address(7 downto 0)) is
                    when x"04" =>
                        if (CBE_I(1) = '0') then CommandReg(15 downto 8) <= AD_I(15 downto 8) after cTCQ; end if;
                        if (CBE_I(0) = '0') then CommandReg(7 downto 0) <= AD_I(7 downto 0) after cTCQ; end if;
                    when x"0C" =>
                        if (CBE_I(1) = '0') then LatencyTimer <= AD_I(15 downto 8) after cTCQ; end if;
                        if (CBE_I(0) = '0') then CacheLineSize <= AD_I(7 downto 0) after cTCQ; end if;
                    when x"10" =>
                        if (CBE_I(3) = '0') then BAR0(31 downto 24) <= AD_I(31 downto 24) after cTCQ; end if;
                        if (CBE_I(2) = '0') then BAR0(23 downto 16) <= AD_I(23 downto 16) after cTCQ; end if;
                        if (CBE_I(1) = '0') then BAR0(15 downto 8) <= AD_I(15 downto 8) after cTCQ; end if;
                    when x"14" =>
                        if (CBE_I(3) = '0') then BAR1(31 downto 24) <= AD_I(31 downto 24) after cTCQ; end if;
                        if (CBE_I(2) = '0') then BAR1(23 downto 16) <= AD_I(23 downto 16) after cTCQ; end if;
                    when x"3C" =>
                        if (CBE_I(0) = '0') then InterruptLine <= AD_I(7 downto 0) after cTCQ; end if;
                    when others =>
                        null;
                end case;
            end if;
        end if;
    end process;


    Writing to a port

    On the AD bus, the master first drives the number of the register to be written and, in the next clock cycle, the data to write.
    The example shows the write to one register only; the others are written in the same way.
    signal IOReg0 : std_logic_vector(31 downto 0);

    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            IOReg0 <= x"00000000" after cTCQ;
        elsif (rising_edge(clk_I)) then
            if (smPCI_T = sIO_WRITE and Address(7 downto 0) = x"00") then
                if (CBE_I(0) = '0') then IOReg0(7 downto 0) <= AD_I(7 downto 0) after cTCQ; end if;
                if (CBE_I(1) = '0') then IOReg0(15 downto 8) <= AD_I(15 downto 8) after cTCQ; end if;
                if (CBE_I(2) = '0') then IOReg0(23 downto 16) <= AD_I(23 downto 16) after cTCQ; end if;
                if (CBE_I(3) = '0') then IOReg0(31 downto 24) <= AD_I(31 downto 24) after cTCQ; end if;
            end if;
        end if;
    end process;
    

    Reading from a port

    On the AD bus, the master drives the number of the register to be read. The device then drives the requested data onto the AD bus.
    Port read code
    signal IORDate : std_logic_vector(31 downto 0);

    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            IORDate <= x"00000000";
        elsif (rising_edge(clk_I)) then
            case (Address(7 downto 0)) is
                when x"00"  => IORDate <= IOReg0 after cTCQ;
                when x"04"  => IORDate <= IOReg1 after cTCQ;
                when x"08"  => IORDate <= IOReg2 after cTCQ;
                when x"0C"  => IORDate <= IOReg3 after cTCQ;
                when x"10"  => IORDate <= IOReg4 after cTCQ;
                when x"14"  => IORDate <= IOReg5 after cTCQ;
                when x"18"  => IORDate <= IOReg6 after cTCQ;
                when x"1C"  => IORDate <= IOReg7 after cTCQ;
                when x"20"  => IORDate <= IOReg8 after cTCQ;
                when x"24"  => IORDate <= IOReg9 after cTCQ;
                when others => IORDate <= (others => '0');
            end case;
        end if;
    end process;

    This is what writing and reading an I/O port looks like in the simulator:


    Writing and reading memory

    For a write, the master drives the address to be written onto the AD bus and, in the next clock cycle, the data itself. For a read, the master drives the address to be read onto the AD bus, after which the target drives the data onto the AD bus.
    Data is written to the RAM through port A and read through port B.
    Memory access code
    signal RamWrEn       : std_logic;
    signal RamOutputDate : std_logic_vector(31 downto 0);
    signal RamInputDate  : std_logic_vector(31 downto 0);
    signal RamRst        : std_logic := '0';

    RAMB16_S36_S36_inst : RAMB16_S36_S36
    port map (
        DOA   => open,                  -- Port A 32-bit Data Output
        DOB   => RamOutputDate,         -- Port B 32-bit Data Output
        DOPA  => open,                  -- Port A 4-bit Parity Output
        DOPB  => open,                  -- Port B 4-bit Parity Output
        ADDRA => Address(8 downto 0),   -- Port A 9-bit Address Input
        ADDRB => Address(8 downto 0),   -- Port B 9-bit Address Input
        CLKA  => clk_I,                 -- Port A Clock
        CLKB  => clk_I,                 -- Port B Clock
        DIA   => RamInputDate,          -- Port A 32-bit Data Input
        DIB   => x"00000000",           -- Port B 32-bit Data Input
        DIPA  => x"0",                  -- Port A 4-bit Parity Input
        DIPB  => x"0",                  -- Port B 4-bit Parity Input
        ENA   => '1',                   -- Port A RAM Enable Input
        ENB   => '1',                   -- Port B RAM Enable Input
        SSRA  => '0',                   -- Port A Synchronous Set/Reset Input
        SSRB  => '0',                   -- Port B Synchronous Set/Reset Input
        WEA   => RamWrEn,               -- Port A Write Enable Input
        WEB   => '0'                    -- Port B Write Enable Input
    );

    process (clk_I)
    begin
        if (rising_edge(clk_I)) then
            if (RST_I = '1') then
                RamRst <= '0';
            else
                RamRst <= '1';
            end if;
        end if;
    end process;

    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            RamInputDate <= (others => '0') after cTCQ;
            RamWrEn      <= '0' after cTCQ;
        elsif (rising_edge(clk_I)) then
            if (smPCI_T = sMEM_WRITE) then
                if (CBE_I(0) = '0') then RamInputDate(7 downto 0) <= AD_I(7 downto 0) after cTCQ; end if;
                if (CBE_I(1) = '0') then RamInputDate(15 downto 8) <= AD_I(15 downto 8) after cTCQ; end if;
                if (CBE_I(2) = '0') then RamInputDate(23 downto 16) <= AD_I(23 downto 16) after cTCQ; end if;
                if (CBE_I(3) = '0') then RamInputDate(31 downto 24) <= AD_I(31 downto 24) after cTCQ; end if;
                RamWrEn <= '1' after cTCQ;
            else
                RamWrEn <= '0' after cTCQ;
            end if;
        end if;
    end process;

    This is what writing and reading memory looks like in the simulator:

    Data is driven onto the AD bus as follows: depending on the state of the machine, the corresponding register is connected to the output buffer.



    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            AD_O <= (others => '0') after cTCQ;
        elsif (rising_edge(clk_I)) then
            if (smPCI_T = sCFG_READ) then
                AD_O <= CfgRData after cTCQ;
            elsif (smPCI_T = sIO_READ) then
                AD_O <= IORDate after cTCQ;
            elsif (smPCI_T = sMEM_READ) then
                AD_O <= RamOutputDate after cTCQ;
            end if;
        end if;
    end process;
    

    The enable signal for issuing data to the AD bus is generated as follows:
    process (clk_I, RST_I)
    begin
        if (RST_I = '0') then
            AD_T <= '1' after cTCQ;
        elsif (rising_edge(clk_I)) then
            AD_T <= not b2l(smPCI_T = sCFG_READ or smPCI_T = sIO_READ or smPCI_T = sMEM_READ) after cTCQ;
        end if;
    end process;
    


    I would also like to mention the STS component, using the generation of the DEVSEL signal as an example:
    DEVSEL_STS : entity WORK.componentIO(Behavioral)
    port map (iD => DEVSEL_O, oT => DEVSEL_T, clk => clk_I, rst => RST_I);

    Since the control signals are active low, a line cannot simply be released into the Z state: the device must first drive it to logic one and only then switch the driver to the Z state.
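
    The idea can be sketched as follows. This is not the componentIO entity from the project, whose source is not shown here; the entity name sts_driver and its port names are hypothetical. The sketch assumes that req_n is a registered, active-low request to assert the line, and that pad_o and pad_t feed the I and T inputs of an I/O buffer like the one used for AD.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical illustration of "drive high for one clock, then release".
entity sts_driver is
    port (
        clk   : in  std_logic;
        rst_n : in  std_logic;  -- asynchronous reset, active low
        req_n : in  std_logic;  -- '0' while the line must be asserted
        pad_o : out std_logic;  -- value for the I input of the I/O buffer
        pad_t : out std_logic   -- T input of the I/O buffer: '1' = Z state
    );
end entity sts_driver;

architecture rtl of sts_driver is
    signal prev_n : std_logic;  -- req_n delayed by one clock
begin
    process (clk, rst_n)
    begin
        if rst_n = '0' then
            prev_n <= '1';
        elsif rising_edge(clk) then
            prev_n <= req_n;
        end if;
    end process;

    -- The pad value follows the request: '0' while asserted, '1' afterwards
    pad_o <= req_n;

    -- The driver stays enabled while the line is asserted and for one more
    -- clock after deassertion, so the line is driven to '1' before release
    pad_t <= '0' when (req_n = '0' or prev_n = '0') else '1';
end architecture rtl;
```

    In the clock cycle after req_n returns to '1', prev_n is still '0', so the buffer actively drives a one; one clock later both are '1' and the line goes to the Z state.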

    Conclusion


    In conclusion, I want to say that executing transactions on the PCI bus is not as difficult as it seems. The finished design was loaded into the FPGA, the board with the FPGA was inserted into a PCI slot, and the computer was switched on. The system found the board and asked for a driver. It works! :) The project itself is available at drive.google.com/file/d/0B-i4aT8Q0ZNxc1VkV3J2b0dTRlU/view?usp=sharing and opens in Xilinx ISE 14.2.




