SparF February 10, 2017 at 00:44

MIPSfpga and SDRAM. Part 2

The configuration we are considering consists of the following elements:

AHB-Lite bus

The AHB-Lite bus is the MIPSfpga core's main interface to the outside world. Read and write commands reach the SDRAM access module over it, and the read and write data travel over it as well. Its main feature is pipelining: the address phase of the next transfer coincides in time with the data phase of the current one. This is best seen in the following diagram:

A brief description of the signals shown:

HCLK - clock signal;
HADDR - address of the data to be written or read in the following data phase, driven by the master;
HWRITE - when high, a write operation is to be performed in the following data phase, driven by the master;
HRDATA - read data;
HREADY - flag indicating that the current operation has completed;
HWDATA - write data, driven by the master.

The bus documentation, including a description of all signals and their valid combinations, is included in the MIPSfpga package.
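The overlap of address and data phases can be illustrated with a toy model. This is only a sketch in Python, not part of the MIPSfpga sources: the function name `ahb_timeline` and the tuple layout are made up for illustration, and wait states (HREADY low) are ignored.

```python
# Toy model of AHB-Lite pipelining (assumption: no wait states, HREADY always high).
# In every clock cycle the master drives the address of transfer N+1
# while the data of transfer N is being moved.
def ahb_timeline(transfers):
    """transfers: list of (kind, addr) tuples, kind is 'R' or 'W'.
    Returns one row per clock: (cycle, address_phase, data_phase)."""
    rows = []
    for cycle in range(len(transfers) + 1):
        addr_phase = transfers[cycle] if cycle < len(transfers) else None
        data_phase = transfers[cycle - 1] if cycle > 0 else None
        rows.append((cycle, addr_phase, data_phase))
    return rows

if __name__ == "__main__":
    for row in ahb_timeline([("W", 0x10), ("R", 0x14), ("R", 0x18)]):
        print(row)
```

Each row after the first shows two different transfers in flight at once, which is exactly what the memory access module has to cope with.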

SDRAM

The basic principles behind SDRAM are described very well in Chapter 5 of the Harris & Harris textbook [1]. The main points:

one bit is stored as the presence or absence of charge on a capacitor;
the memory is organized as arrays of capacitors with rows and columns, plus control logic;
a read drains the charge of the cell (capacitor), so the cell has to be recharged after reading;
while idle, the stored charge also leaks away (albeit more slowly), so memory cells need periodic recharging (the so-called refresh).

We continue the discussion using the Micron MT48LC64M8A2 chip as an example. In addition to a very convenient and detailed datasheet, the company provides a Verilog model for simulating this memory chip. On the one hand, this greatly simplifies development; on the other, it lets you run MIPSfpga in a simulator without a development board and see how the core interacts with SDRAM.

The block diagram of the memory chip is shown in the figure below.

Essential elements:

memory banks (4x bank memory array) - this is where the data we care about is stored. This chip has 4 banks, each with 8192 rows and 1024 columns of 16 bits. The total chip capacity is 4 x 8192 x 1024 x 16 bit = 512 Mbit = 64 MB;
control logic (control logic, bank control logic) - decodes the received command and issues the appropriate control signals to the other elements;
address bus multiplexers, latches and decoders (row-address mux, 4x bank row-address latch & decoder, column-address counter/latch, column decoder) - store the row, column and bank address information received in different commands;
data bus registers and logic (data output register, data input register, I/O gating, DQM mask logic) - handle data input and output during read and write operations, support masking (when only the upper or lower byte of the 16 bits is needed), and switch the data bus pins to the Z-state, since the bus is bidirectional.
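The capacity arithmetic and the address widths that follow from it can be checked with a few lines of Python. The constant names are chosen here for illustration; the numbers are the ones quoted above.

```python
# Geometry quoted in the text: 4 banks, 8192 rows, 1024 columns, 16-bit words.
BANKS, ROWS, COLS, WIDTH = 4, 8192, 1024, 16

total_bits = BANKS * ROWS * COLS * WIDTH
megabits = total_bits // 2**20        # capacity in Mbit
megabytes = total_bits // 8 // 2**20  # capacity in MB

# Number of address bits needed to select a bank, a row and a column.
bank_bits = (BANKS - 1).bit_length()
row_bits = (ROWS - 1).bit_length()
col_bits = (COLS - 1).bit_length()

if __name__ == "__main__":
    print(megabits, "Mbit =", megabytes, "MB")
    print("bank/row/column address bits:", bank_bits, row_bits, col_bits)
```

The 13 row bits and 2 bank bits correspond to the A[12:0] and BA[1:0] buses mentioned later in the read sequence.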

Operating conditions

For the RAM to operate correctly, a number of conditions must be met. We will not consider some of them: temperature conditions, frequency and power stability, signal levels (static discipline), correct routing on the board. What remains in scope:

issuing the correct control signals for each command;
meeting the requirements of the dynamic discipline (Chapter 3 of the Harris & Harris textbook [1]), taking into account the requirements of the chip documentation [2].

To make this concrete, consider what the memory access module has to do when reading data from RAM. We take the case of the so-called READ With Auto Precharge, where after the read operation the chip itself takes care of precharging the row we accessed. Module initialization (INIT), write operations (WRITE) and auto refresh (AUTO_REFRESH) are performed in the same way; they differ in the commands issued and the timing constraints imposed.

The following are excerpts from the datasheet: a truth table for the commands and a timing diagram that shows how to read data correctly.

Note: L - low level, H - high level, X - does not matter, High-Z - high impedance.
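For readers without the datasheet at hand, the command truth table can be written down as a lookup table. The sketch below uses the usual SDR SDRAM pin encoding as I understand it; treat it as an assumption and check it against the truth table in the chip documentation [2] before relying on it. The names `COMMANDS` and `decode` are illustrative and do not come from the module sources.

```python
# Pin order: (CS#, RAS#, CAS#, WE#); 0 = L (low), 1 = H (high).
# Assumption: standard SDR SDRAM encoding, to be verified against the datasheet.
COMMANDS = {
    "LOAD_MODE_REGISTER": (0, 0, 0, 0),
    "AUTO_REFRESH":       (0, 0, 0, 1),
    "PRECHARGE":          (0, 0, 1, 0),
    "ACTIVE":             (0, 0, 1, 1),
    "WRITE":              (0, 1, 0, 0),
    "READ":               (0, 1, 0, 1),  # auto precharge is selected by A10, not by these pins
    "BURST_TERMINATE":    (0, 1, 1, 0),
    "NOP":                (0, 1, 1, 1),
}

# Reverse lookup: pin levels -> command name.
BY_PINS = {pins: name for name, pins in COMMANDS.items()}

def decode(cs_n, ras_n, cas_n, we_n):
    """Return the command name for the given pin levels."""
    if cs_n == 1:
        return "COMMAND_INHIBIT"  # CS# high: the other pins do not matter (X)
    return BY_PINS[(cs_n, ras_n, cas_n, we_n)]

if __name__ == "__main__":
    print(decode(0, 0, 1, 1))
    print(decode(1, 0, 0, 0))
```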

Note: tCMS - command setup time, tCMH - command hold time, tAS - address setup time, tAH - address hold time, tRCD - ACTIVE-to-READ delay, tRAS - command period (ACT to PRE), tRC - command period (ACT to ACT), tLZ - output low-impedance time, tAC - access time from clock, tOH - output data hold time, tRP - command period (PRE to ACT). The minimum values of these and other parameters under different conditions are given in the memory chip documentation.

The sequence of actions when reading data (by clock cycle)

T0. No later than tCMS before the CLK edge, the signals on pins CS#, RAS#, CAS#, WE# and DQM (hereinafter, the command) must be settled at the values corresponding to the ACTIVE command. These signals must not change for tCMH after the CLK edge. No later than tAS before the CLK edge, put the row address on the address bus (A[12:0]) and the bank address on the bank address bus (BA[1:0]). These signals must remain stable for tAH after the CLK edge.

T1. Issue the NOP command for (tRCD - 1 clock). Once this interval has elapsed, the previously transmitted row address is guaranteed to be stored in the row-address latch & decoder of the corresponding bank, and one of the 8192 rows is selected (see the block diagram of the chip).

T2. No later than tCMS before the CLK edge, apply the READ command, and do not change it for tCMH after the CLK edge. No later than tAS before the CLK edge, put the column address on the address bus and the bank address on the bank address bus. Address bit A10 is set to 1 to indicate that Auto Precharge must be performed after the read.

T3-T7. Keep issuing the NOP command for as long as data is being read, and for at least (tRC - 1 clock) after the ACTIVE command was issued.

T4. After CL clock cycles (the so-called CAS Latency, CL), the read data is guaranteed to be present on the DQ data bus. More precisely, for CL = 2 it appears on the bus (1 clock + tAC) after the READ command and remains stable for at least tOH after the CLK edge. The data must be read from the bus within this window.

Inside the chip, during this (1 clock + tAC) interval the column address is stored in the column-address counter/latch, the column decoder of the corresponding bank drives the signals that select the 16 bits of the required column, and this data passes to the data output register and finally appears on the data bus (DQ[15:0]).

T5-T7. The example assumes that the memory chip has been configured for burst operation with a burst length of BL = 4 (the burst length is set, among other parameters, by the LOAD MODE REGISTER command; the current implementation of the memory access module uses BL = 2 to obtain 32 bits of data). For this reason, over the next three clock cycles the column-address counter/latch is automatically incremented by one each cycle, and another 3 x 16 bits appear on the data bus.

Note that the number of clock cycles will not necessarily be 8, as shown in the diagram (T0-T7): it must be increased as needed to satisfy all the timing constraints: tRCD, tRC, etc.
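The T0-T7 sequence above can be sketched as a small schedule generator. This is an illustrative Python model, not the module's actual logic: the function name `read_schedule`, the label `READ_AP` (READ with A10 = 1) and the default cycle counts are assumptions chosen to reproduce the diagram described in the text.

```python
def read_schedule(trcd_cycles=2, cas_latency=2, burst_length=4, trc_cycles=8):
    """Return (commands, data_cycles) for one READ With Auto Precharge.

    commands[i] is the command issued in clock cycle Ti;
    data_cycles lists the cycles in which a word is valid on DQ."""
    # T0: ACTIVE, then NOPs until tRCD has elapsed, then READ with auto precharge.
    commands = ["ACTIVE"] + ["NOP"] * (trcd_cycles - 1) + ["READ_AP"]
    read_at = trcd_cycles
    first_data = read_at + cas_latency
    last_data = first_data + burst_length - 1
    # Keep issuing NOP until the burst is over and tRC has elapsed.
    total = max(trc_cycles, last_data + 1)
    commands += ["NOP"] * (total - len(commands))
    return commands, list(range(first_data, last_data + 1))

if __name__ == "__main__":
    cmds, data = read_schedule()
    for t, cmd in enumerate(cmds):
        marker = " <- data on DQ" if t in data else ""
        print(f"T{t}: {cmd}{marker}")
```

With the defaults the model gives ACTIVE at T0, READ at T2 and data on T4-T7; increasing `trcd_cycles` or `trc_cycles` stretches the sequence, which is the point made in the paragraph above.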

The timing constraints are met by

shifting the phase of the clock on which the memory operates (CLK) relative to the clock on which the memory access module operates - for the short intervals (tCMS, tCMH, tAS, tAH, tAC, tOH);
issuing empty commands (NOP) for the long intervals (tRCD, tRC, tRP), where the required delay exceeds one clock period. The corresponding states are added to the module's finite state machine for this purpose.
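How many NOP cycles a long interval needs follows from rounding the datasheet value up to whole clock periods. A minimal sketch, assuming times are given in picoseconds to keep the arithmetic in integers; the timing values in the example are placeholders, not figures taken from the datasheet.

```python
def cycles(t_ps, clk_period_ps):
    """Smallest number of whole clock periods that covers t_ps (ceiling division)."""
    return -(-t_ps // clk_period_ps)

def nops_after_command(t_ps, clk_period_ps):
    """NOP cycles to insert after a command; the command itself takes one clock."""
    return max(cycles(t_ps, clk_period_ps) - 1, 0)

if __name__ == "__main__":
    clk_period_ps = 10_000  # placeholder: 100 MHz clock
    # Placeholder timings; take the real minimums from the chip documentation.
    for name, t_ps in {"tRCD": 20_000, "tRP": 20_000, "tRC": 66_000}.items():
        print(name, cycles(t_ps, clk_period_ps), "clocks,",
              nops_after_command(t_ps, clk_period_ps), "NOPs")
```

If the clock frequency changes, only `clk_period_ps` changes and the cycle counts are recomputed, which is the reason for keeping the delays configurable.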

Clock phase offset

There are several good sources ([3] and [4]) that convincingly contrast the "scientific" approach to determining the clock phase shift with the "trial and error" method. These documents give a number of formulas for calculating the boundaries of the "safe windows", into which the delay values must be substituted. They then propose shifting the clock signals so that their edges fall as close as possible to the centers of these "windows". While I agree that the described technique works, I want to draw attention to a slightly "lazier" version of the same approach (I think it is what is depicted on pages 12 and 20 of the presentation, but since there are no comments there, I am not sure):

take two sheets or strips of squared paper (graph paper also works);
keeping to scale, draw several clock cycles on each of them: one strip for the FPGA, the other for the memory chip;
keeping to scale, mark on each of them:
forbidden zones in which the sampled input signal must not change (A);
zones in which the value of the output signal is undefined;
zones in which the output signal is valid (B);
place the strips of paper side by side and slide them relative to each other (like a slide rule) so that zones A are as close as possible to the centers of zones B and never extend beyond their boundaries;
using a ruler, measure the resulting clock offset and convert it to ns according to the scale.
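The paper-strip procedure can also be approximated numerically. The sketch below covers only one direction (the FPGA drives a signal, the memory samples it) and ignores board delays, so it is a simplification and not the formulas from [3] or [4]. All parameter names and the numbers in the example are hypothetical.

```python
def safe_shift_window(period_ps, t_co_min_ps, t_co_max_ps, t_su_ps, t_h_ps):
    """Range of memory-clock delays (relative to the FPGA clock) for which the
    memory's forbidden zone A = [edge - t_su, edge + t_h] lies inside the
    FPGA output's valid zone B = [t_co_max, period + t_co_min].

    Returns (lo, hi, center) in ps, or None if no safe shift exists."""
    lo = t_co_max_ps + t_su_ps               # A must start after the output has settled
    hi = period_ps + t_co_min_ps - t_h_ps    # A must end before the output changes again
    if lo > hi:
        return None
    return lo, hi, (lo + hi) // 2

if __name__ == "__main__":
    # Hypothetical numbers: 50 MHz clock, FPGA clock-to-output 1..5 ns,
    # memory setup 1.5 ns and hold 0.8 ns.
    print(safe_shift_window(20_000, 1_000, 5_000, 1_500, 800))
```

The read direction (memory drives DQ, FPGA samples) gives a second window of the same kind built from tAC, tOH and the FPGA input setup and hold times; the final shift has to lie in both.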

To obtain an accurate and stable phase shift, a PLL module must be included in the system. I usually add a third clock signal with a frequency 4 times higher than the others and a small phase shift, to use as the sampling clock for a logic analyzer (SignalTap) when debugging the memory interface in hardware.

Memory access module

This section contains the state diagram of the memory access module's state machine, as well as individual lines of the module code that describe the read procedure (with code line numbers given to simplify navigation). The full source code of the module: mfp_ahb_ram_sdram.v. If reading code from screenshots is uncomfortable, the source code fragments from the article (including the comments on them) are duplicated on github.

The states of the finite state machine that describe the read procedure correspond exactly to what was described above using the READ With Auto Precharge diagram.

The transition rules between these states:

Where a delay is needed, it is loaded into the delay_n register; a zero value of the register corresponds to the DelayFinished flag. In the states S_READ4_RD0 and S_READ4_RD1 the data is read from the DQ bus:

Command encoding and command output depending on the current state:

All delays are configurable and set through module parameters, which should simplify porting to other boards as well as adjusting the settings if the clock frequency changes.
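The delay mechanism can be modelled in a few lines. This Python sketch only mirrors the idea described above (a counter named after the delay_n register, with "finished" meaning the counter is zero); the class and its methods are illustrative assumptions, and the actual Verilog in mfp_ahb_ram_sdram.v may load and decrement the register differently.

```python
class DelayCounter:
    """Software model of a down-counter used to wait a number of clocks."""

    def __init__(self):
        self.delay_n = 0

    @property
    def finished(self):
        # Counterpart of the DelayFinished flag: true when the register is zero.
        return self.delay_n == 0

    def load(self, n):
        # Called when entering a state that needs a delay of n clocks.
        self.delay_n = n

    def tick(self):
        # One clock edge: count down, but never below zero.
        if self.delay_n > 0:
            self.delay_n -= 1

def clocks_until_finished(n):
    """Load n and count the clock edges needed until the counter reports finished."""
    counter = DelayCounter()
    counter.load(n)
    clocks = 0
    while not counter.finished:
        counter.tick()
        clocks += 1
    return clocks

if __name__ == "__main__":
    print(clocks_until_finished(3))
```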

References

[1] David Harris and Sarah Harris. "Digital Design and Computer Architecture" (textbook)
[2] Documentation for the Micron MT48LC64M8A2 memory chip
[3] Quartus documentation. SDRAM Controller Core (translation)
[4] SDRAM PLL Tuning (presentation)
[5] Ryan Donohue. Synchronization in Digital Logic Circuits (presentation)
[6] Documentation for the IS42S16320D memory chip

All datasheets, articles and presentations referenced in the article are available on github.
