ysdn March 5, 2015 at 10:22

Timing constraints and static timing analysis for FPGAs, using Microsemi SmartTime as an example


Back at university, while designing various test gadgets and doing lab work on digital circuits, I several times found myself in a situation where a seemingly correct, double-checked project refused to work in hardware. At that time, at the dawn of my study of programmable logic, I very rarely got to the last steps of the Design Flow, which was probably the trouble. If I accidentally clicked on the Timing Analyzer, I got bored after a few seconds of a quick look and went back to tormenting the development board and composing new madness in VHDL.

When the time came for more or less serious projects, the problems multiplied, so I started to use Google more intensively to look for answers. More and more often I came across such frightening phrases as “timing analysis” and “design constraints”. Once I had read a little and gained some insight, I realized that I had been missing something very important. At first I was terrified of these unknown constructs; after all, my first projects had worked without them, since their clock frequency was no more than a couple of tens of MHz. But at higher frequencies and in more complex projects, you cannot do without thorough timing analysis and optimization. Talking to other people, I was surprised to find that not all of our developers are sufficiently familiar with these processes, probably because there is very little documentation and explanation in Russian. So I decided to share what I have accumulated while working with FPGAs using tools from Microsemi (probably better known as Actel). This post in no way claims to be 100% complete and accurate; it is just the result of a desire to sort out my knowledge and, perhaps, help someone else do the same. All comments and suggestions are welcome.

Synchronous circuits and basic definitions

As a rule, we deal with synchronous circuits. Such circuits consist of the following elements:

I/O ports;
sequential elements (flip-flops);
combinational logic (gates).

The connections between these elements form the paths that signals travel through the device during operation. This is the key concept: paths. They determine the performance of the device, in particular the maximum clock frequency, which is one of the main project requirements and the thing developers struggle with the most.

Signals start at the input pins of the chip, pass through sequential and combinational elements, and arrive at the output pins. The clock source (CLK) clocks all the flip-flops of the circuit, each of which captures the state of its input on a clock edge (most often the rising edge). Between flip-flops (and between flip-flops and I/O ports) lies combinational logic. There are two types of delay in a signal path:

delays in the elements (cell delay);
delays in the interconnect between them (propagation delay).

Their ratio is usually about 50/50, that is, the time a signal spends deep in combinational logic is split roughly in half between the delay from the input of each gate to its output and the propagation of the signal along the interconnect. The maximum delay in the circuit corresponds to the critical path, that is, the longest path, which determines the minimum clock period and, accordingly, the maximum frequency of the device. A few basic concepts need to be introduced here.

When a signal is transmitted, there are naturally two sides to the interaction: the source and the receiver. These are the endpoints of the path. Endpoints can be I/O ports or flip-flops. Let us focus on flip-flops. In our case they are clocked by the same clock signal, and the path runs from the Q output of one flip-flop to the D input of another. Although there is only one clock signal, in this example we give it two names:

Launch clock - on its edge, new data appears at the Q output of flip-flop 1;
Latch clock - on its edge, flip-flop 2 captures whatever is at its D input at that moment.

Since the data propagates with a delay caused by the factors above, the signal at the D input of flip-flop 2 does not appear immediately. This gives rise to the following characteristics:

Setup time (t_su) - the time for which the signal must be stable before the receiver’s clock edge;
Hold time (t_h) - the time for which the signal must be held after the receiver’s clock edge;
Slack - the timing margin with respect to t_su and t_h.

t_su and t_h form a kind of corridor around the latch clock edge. The requirement for the signal at the D input of the receiver is now simple: it must not change within this corridor. Ideally, it settles well before the left boundary and changes to a new value some time after the right boundary. This margin is called slack. If the slack is positive, everything is in order and the data arrives at the receiver input at the required time. If it is negative, the path violates its timing requirements: the data arrives at the input outside the required interval, which means that the device will not work correctly.
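The corridor can also be written down as a pair of formulas. This is a simplified sketch of the conventional model, assuming a single clock of period T and ignoring clock skew and uncertainty. The symbols t_cq (clock-to-Q delay of the source flip-flop), t_cell and t_net (summed cell and interconnect delays along the path) are introduced here for illustration only.

```latex
\begin{align}
t_{\text{arrival}} &= t_{cq} + t_{\text{cell}} + t_{\text{net}} \\
\text{Slack}_{\text{setup}} &= (T - t_{su}) - t_{\text{arrival}}^{\max} \\
\text{Slack}_{\text{hold}} &= t_{\text{arrival}}^{\min} - t_{h}
\end{align}
```

Both values must be non-negative. Setting the setup slack of the critical path to zero gives the smallest usable period, T_min = t_cq + t_cell + t_net + t_su, and hence f_max = 1 / T_min.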

This is where the difficulties begin. If you have not a training circuit with a couple of dozen flip-flops but a complex HDL description whose graphical RTL view gives you a headache, the likelihood of long paths that sharply reduce performance increases significantly. To keep this under control and to tell the IDE what timing characteristics you want from the project, design environments provide several convenient tools.

Setting Timing Constraints

Before starting to design a new device, the developer must have as complete information as possible about the requirements for the device and its performance. First of all, this means the timing characteristics of the system. Once the developer knows them, they must be communicated to the design tool, and this is where timing constraints come to the rescue. Timing constraints are the requirements for the project’s timing characteristics, written in a language the tool understands, most often Synopsys Design Constraints (SDC). This is the de facto standard for describing timing (and other) constraints for FPGAs. It is based on Tcl, which is itself widely used to automate hardware design.

These descriptions are placed in a *.sdc file and attached to the project. The consumers of this file are the various optimizers, which try to place and route the chip so that it meets the developer’s requirements, and the timing analyzers, which are discussed later. SDC files are simple: in essence they are a list of commands with arguments and their values. In them you can (and should) use Tcl syntax, including special characters, for example to split one command across several lines.
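As a small illustration of the Tcl syntax just mentioned, here is a sketch of one constraint split across several lines. The clock name my_clock and the port name CLK are placeholders, not names taken from a real design.

```tcl
# Lines starting with '#' are Tcl comments.
# A trailing backslash continues the command on the next line.
create_clock -name {my_clock} \
    -period 6 \
    -waveform {0 3} \
    {CLK}
```

The backslash must be the very last character on the line; a space after it breaks the continuation.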

Let us list a few basic commands and see what they describe. The first command is a must-have for absolutely any design:

create_clock -name name -period period_value [-waveform edge_list] source

With this command we define a clock signal in the circuit and describe its characteristics:

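Name of the constraint

-name name

Period, in nanoseconds

-period period_value

Waveform: the times of the rising and falling edges within the period, which set the duty cycle (default 50%); square brackets indicate an optional argument

[-waveform edge_list]

Signal source (pin, port)

source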

For the CAD tool, knowing about the clock signal is the most important thing, since without it there can be no analysis or optimization at all. The clock description can be refined further with commands such as set_clock_latency and set_clock_uncertainty, but we will not consider them here and will rely on the default values set in the environment. As an example:

create_clock -name {my_clock} -period 6 -waveform {0 3} {CLK}

This command creates a clock signal with a period of 6 ns, with the rising edge at 0 ns and the falling edge at 3 ns.

Another useful command related to the clock signal:

create_generated_clock -name name -source reference_pin [-divide_by divide_factor] [-multiply_by multiply_factor] [-invert] source

It describes a clock signal that is generated inside the chip, usually by a phase-locked loop (PLL). The arguments mostly repeat the settings specified for the PLL: the source of the original signal, the division and multiplication factors, signal inversion, and so on. Since PLLs are used everywhere, this is also an important and commonly used command.
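A minimal sketch of how the two clock commands might be combined, assuming a 50 MHz reference that a PLL doubles to 100 MHz. All object names here (CLK_IN, ref_clk, pll_clk, my_pll:GLA) are hypothetical, and the exact way a PLL output pin is named depends on the netlist and the tool version.

```tcl
# Reference clock on the hypothetical input port CLK_IN: 50 MHz (20 ns period).
create_clock -name {ref_clk} -period 20 -waveform {0 10} {CLK_IN}

# Clock derived from the reference by a PLL: 50 MHz * 2 = 100 MHz.
# {my_pll:GLA} is a placeholder for the PLL output pin in the netlist.
create_generated_clock -name {pll_clk} \
    -source {CLK_IN} \
    -multiply_by 2 \
    {my_pll:GLA}
```

Because the generated clock is expressed relative to its source, changing the period of ref_clk automatically rescales pll_clk.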

Let's move on to the commands that set the actual constraints and design requirements. The first pair of commands:

set_input_delay delay_value -clock clock_ref [-max] [-min] [-clock_fall] input_list
set_output_delay delay_value -clock clock_ref [-max] [-min] [-clock_fall] output_list

These constraints are important if the design interacts with external devices (and it always does). They set the delay of a signal outside the FPGA (input or output) relative to the clock signal. When interacting with other devices we must take their timing characteristics into account, and that is what both commands are for. For example, suppose there is our FPGA, some device that exchanges data with it, and a clock generator that serves as a common clock source for both. To perform timing analysis and routing effectively, the tool needs to know when the signal arrives at our inputs and when our outputs must be ready. Such information is usually given in the datasheets of the devices, so the task usually comes down to reading the documentation and copying the characteristics into our *.sdc file.

The arguments are simple: the delay value in nanoseconds, the reference clock, optional flags that say whether the delay is a maximum or a minimum, an optional flag that references the delay to the falling edge of the clock, and finally the list of ports to which the constraint applies.
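To make the datasheet-to-SDC step concrete, here is a sketch under invented numbers: an external device with a clock-to-output delay of 1.0 to 4.0 ns, a setup time of 2.0 ns, a hold time of 0.3 ns, and a board trace delay of 0.5 ns. The values and the names sys_clk, CLK, DATA_IN and DATA_OUT are assumptions for illustration, not figures from a real datasheet.

```tcl
# Common 10 ns clock on the hypothetical port CLK.
create_clock -name {sys_clk} -period 10 -waveform {0 5} {CLK}

# Inputs: external clock-to-output delay plus board trace delay.
#   max = 4.0 + 0.5 = 4.5 ns, min = 1.0 + 0.5 = 1.5 ns
set_input_delay 4.5 -clock {sys_clk} -max [get_ports {DATA_IN}]
set_input_delay 1.5 -clock {sys_clk} -min [get_ports {DATA_IN}]

# Outputs: what the external device needs around its clock edge.
#   max = external setup time + board delay = 2.0 + 0.5 = 2.5 ns
#   min = board delay - external hold time  = 0.5 - 0.3 = 0.2 ns
set_output_delay 2.5 -clock {sys_clk} -max [get_ports {DATA_OUT}]
set_output_delay 0.2 -clock {sys_clk} -min [get_ports {DATA_OUT}]
```

With these values the tool knows that, of the 10 ns period, only 5.5 ns remain for the input path inside the FPGA and 7.5 ns for the output path.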

The following pair of commands sets the minimum and maximum delay on the internal path, respectively:

set_min_delay delay_value [-from from_list] [-to to_list]
set_max_delay delay_value [-from from_list] [-to to_list]

The arguments are, again, simple: the delay value in nanoseconds, the start point, and the end point. Such constraints are typically applied to purely combinational paths from chip inputs to outputs. If at least one of the endpoints is a synchronous element, the set_input_delay, set_output_delay, and create_clock constraints are taken into account as well. These commands can also be used in circuits with several clock domains to bound the delay of paths that cross between them.
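A short sketch of the typical use on a purely combinational input-to-output path. The port names SEL and Y and the delay values are hypothetical.

```tcl
# Combinational path from the hypothetical input port SEL to output port Y:
# the signal must take at least 1 ns and at most 8 ns.
set_min_delay 1 -from [get_ports {SEL}] -to [get_ports {Y}]
set_max_delay 8 -from [get_ports {SEL}] -to [get_ports {Y}]
```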

We finish our review of timing constraints with two commands that identify paths that take more than one clock cycle to traverse and false paths. Here we need to digress again to explain what these paths are.

Multicycle path - a path whose endpoints are flip-flops and on which the data needs more than one clock period to reach its destination. It is very important to identify such paths, because by default all optimization tools treat the circuit as single-cycle, that is, they try to fit every flip-flop-to-flip-flop path into one clock cycle. For example, suppose a source produces data at half the clock frequency. Then there is no point in capturing data on every cycle, so this path is marked as multicycle and the signals passing through it are allowed to take 2 cycles. If this is not done, the tool will try in vain to optimize this path, possibly at the expense of other paths that really do have to fit into one clock cycle.

False path - a path that physically exists but that we have a reason to exclude from optimization and timing analysis, for example because no signal ever travels along it during operation of the device. A simple example: we have a 4-bit counter, but we only need to count to 9, after which the counter is always reset. It turns out that incrementing at higher values involves paths with significant delays. They are present but, in fact, never used. Such paths are marked as false paths and are thereby excluded from optimization and timing analysis. As in the multicycle example, if you leave everything as it is, these paths will be optimized, with all the ensuing consequences for the remaining paths.

Commands to tame the above paths:

set_multicycle_path ncycles [-from from_list] [-through through_list] [-to to_list]
set_false_path [-from from_list] [-through through_list] [-to to_list]

In both commands the arguments specify the endpoints (and, optionally, intermediate points) of the path; for multicycle paths the first argument is the number of clock cycles the signal is given to traverse the path.
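A hedged sketch of both commands in use. The register and port names (slow_src_reg, capture_reg, MODE_SEL) are invented, the static configuration input is an illustrative case rather than one from the text, and the pin notation with a colon is an assumption about how the netlist names flip-flop pins.

```tcl
# Data from slow_src_reg changes only every second clock cycle,
# so give the path to capture_reg two cycles instead of one.
set_multicycle_path 2 \
    -from [get_pins {slow_src_reg:CLK}] \
    -to [get_pins {capture_reg:D}]

# Configuration input that never changes during operation:
# exclude every path starting at it from optimization and analysis.
set_false_path -from [get_ports {MODE_SEL}]
```

Both constraints relax the analysis, so an incorrectly marked path will hide a real violation; it is worth double-checking each one against the actual behavior of the design.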

So, we have examined some of the commands with which you can set timing constraints and describe the requirements for the project’s timing characteristics. Specifying them correctly and carefully is the key to success in FPGA development, but if the requirements described are unrealistic, they can just as easily cause significant difficulties and errors that are hard to track down and eliminate. Of course, the above is only a drop in the ocean, but it gives an initial idea and a foundation for further study. More details about these commands and their options can be found, for example, in [2]. Now let's move on to the practical part and see what timing analysis looks like in Libero SoC, the design environment for Microsemi / Actel FPGAs.

Timing Analysis in Libero SoC SmartTime

Drawing up the requirements and entering the timing constraints is only half the battle. This is where the long and complex process of timing analysis and the fight for megahertz begins. For a project of any real complexity, the desired result is not achieved on the first try, so you will have to revise the requirements, change the constraints file, and modify the project itself. Sometimes you can switch to another FPGA, for example the same device in a higher speed grade. Static timing analysis tools exist so that you can understand which chip satisfies the needs of the project right away, without swapping parts.

Nowadays a timing analyzer is included in every modern CAD suite for hardware design. With it, the developer can find out whether his aspirations match the capabilities of the newly born (or perhaps hundred-times-recompiled) device even before programming the FPGA and testing on real hardware. In modern CAD systems these analyzers have a convenient graphical interface and can be mastered quickly.

Let us look at a timing analyzer using SmartTime, which is included in Libero SoC, as an example. To do this, we create a variation of the classic FPGA hello-world project, a counter, in the Libero SoC environment and use it to see what the timing analyzer can do.

The project targets a simple third-generation Microsemi FPGA, the ProASIC3 A3P600, with the standard speed grade, in the PQ208 package. To begin with, let us run the project through the Design Flow as it is. In the Place & Route settings, timing-driven optimization (Timing-driven) must be selected.

After that we get access to the Designer tool, which, among other things, contains SmartTime, the environment for managing timing constraints and performing timing analysis. It consists of two subsystems: Constraints Editor and Timing Analyzer.

In the Constraints Editor, we can use the convenient graphical interface to set the requirements and constraints mentioned above and then export the *.sdc file. Let's do that. As noted above, the first and absolutely necessary constraint is the creation of clock signals with the required characteristics. We have only one clock; to describe it, follow the menu: Actions -> Constraint -> Clock.

We specify the pin from which the signal comes and suppose that we need the project to run at 200 MHz. After clicking OK, we see the new constraint appear in the editor.

For the changes to take effect, click File -> Commit, and then export the constraints file from the Designer window via File -> Export -> Constraint Files .... By default it is placed in the constraint folder in the project root. Let's go back to the Design Flow, mark the newly created top.sdc file as used in the Synthesize and Compile steps, and open it.

################################################################################
#  SDC WRITER VERSION "3.1";
#  DESIGN "top";
#  Timing constraints scenario: "Primary";
#  DATE "Mon Feb 16 10:48:26 2015";
#  VENDOR "Actel";
#  PROGRAM "Microsemi Libero Software Release v11.4 SP1";
#  VERSION "11.4.1.17"  Copyright (C) 1989-2014 Actel Corp. 
################################################################################
set sdc_version 1.7
########  Clock Constraints  ########
create_clock  -name { Clock } -period 5.000 -waveform { 0.000 2.500  }  { Clock  } 
########  Generated Clock Constraints  ########
########  Clock Source Latency Constraints #########
########  Input Delay Constraints  ########
########  Output Delay Constraints  ########
########   Delay Constraints  ########
########   Delay Constraints  ########
########   Multicycle Constraints  ########
########   False Path Constraints  ########
########   Output load Constraints  ########
########  Disable Timing Constraints #########
########  Clock Uncertainty Constraints #########

We see a specially formatted file that contains our create_clock; the remaining sections are empty (they would hold the corresponding commands if any were set). Now run the Design Flow again, up to Verify Timing. Open Designer again and launch the second subsystem, Timing Analyzer. By default, the Maximum Delay Analysis View opens, that is, the delays calculated for worst-case conditions. Let's look at the results.

Many people have long had the reflex that red is bad. There are exceptions, but this is not one of them. Let's go to the Register-to-Register section, which lists, in tabular form, the paths between flip-flops within the single clock domain we created. Some of these paths have bad results: negative slack has appeared, meaning that the signal arrives at the receiving flip-flop later than the calculated maximum allowable time. What this leads to is described in the theoretical part at the beginning of the post. Fortunately, things are not so bad here: only five paths show a negative result. The slack distribution can be seen in the histogram in the lower left of the window. Let's sort this out. To begin with, recall the conditions we set and look at what the analyzer reported.

It turns out that we were too optimistic: instead of our 200 MHz, the maximum frequency f_max for this project is 172 MHz. Now let's take a closer look at one of the failing paths by double-clicking on it.
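The gap between the requested and the achieved frequency can be turned into a rough slack estimate. This is a back-of-the-envelope sketch that assumes the reported f_max is set by the single worst register-to-register path. The exact slack values shown by SmartTime are not given in the text and may differ slightly, because the frequencies are rounded.

```latex
\begin{align}
T_{\text{req}} &= \frac{1}{200\ \text{MHz}} = 5.00\ \text{ns} \\
T_{\min} &= \frac{1}{f_{\max}} = \frac{1}{172\ \text{MHz}} \approx 5.81\ \text{ns} \\
\text{Slack}_{\text{worst}} &\approx T_{\text{req}} - T_{\min} \approx -0.81\ \text{ns}
\end{align}
```

In other words, the worst path needs to become roughly 0.8 ns shorter for the design to meet the 5 ns period.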

Detailed path information opens. We are shown the required arrival time of the data (Data Required Time), the actual arrival time of the data (Data Arrival Time), and the margin (Slack). The path is also shown as a table detailing where and by how much the signal is delayed, and as a schematic of the connections between the source flip-flop and the receiving flip-flop. The tool also shows how it calculates the Data Required Time. In the upper right corner there is a pie chart showing the ratio of gate delays to interconnect delays.

Analyzing the results, we conclude that the gate delays in the combinational chain on the way from the flip-flop of bit 2 to the flip-flop of bit 7 of the counter prevent the circuit as a whole from operating at the specified frequency. The combinational path is too long and complex, the data does not arrive in time, and the slack is negative. This happens in the high-order bits for an obvious reason: before bit 7 can be set to one, it must make sure that all the other bits are already set, so there are 8 paths leading to it (from all bits, including its own feedback), and some of them are likely to be too slow.

Thus, because of just a few paths, the design does not run at the required frequency. It's a shame. How do we deal with this? The most common way to increase the performance of synchronous circuits is to avoid large amounts of combinational logic between flip-flops by dividing the processing into stages; this is called pipelining. In general, with this approach the input data stream arrives as usual, passes through several pipeline stages, and appears at the output after a delay that depends on the depth of the pipeline. The depth, that is, the number of stages, is chosen based on the performance requirements.

Let's go back to our project and try to apply this approach to reach the goal we set. We split the single 8-bit counter into two 4-bit counters, each with a carry output and a clock-enable input. We connect the carry output of the first counter to the clock-enable input of the second counter through a D flip-flop. The result is a two-stage pipeline in which the first counter holds the low-order bits and the second holds the high-order bits.

We run the compilation and go to SmartTime. Voila: the negative slack is gone, there are no errors, and the frequency has risen to 227 MHz, which is even considerably more than we needed.

So, by applying pipelining, we raised the maximum frequency of the counter project from 172 MHz to 227 MHz, while the functionality was fully preserved and the same chip was used.
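As a rough check of the margin gained, the new frequency can be converted back into a period and compared with the 5 ns requirement of the 200 MHz clock. This sketch assumes that f_max is set by the worst register-to-register path and uses the rounded frequencies quoted above, so the figures are approximate.

```latex
\begin{align}
T_{\min} &= \frac{1}{227\ \text{MHz}} \approx 4.41\ \text{ns} \\
\text{Slack}_{\text{worst}} &\approx 5.00\ \text{ns} - 4.41\ \text{ns} \approx +0.59\ \text{ns} \\
\frac{227\ \text{MHz}}{172\ \text{MHz}} &\approx 1.32
\end{align}
```

So the pipelined version has about 0.6 ns of margin at 200 MHz, and its maximum frequency is roughly a third higher than before.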

Conclusion

Of course, we have considered a very simple case, which is very far from real projects and the real optimization process, when your head starts to ache from all the red in the timing analyzer window and debugging the project takes days. As soon as the example becomes a little more complicated, a host of new questions appears. How do you effectively catch multicycle and false paths? What do you do with multiple clock domains? Can you somehow fix the placement of some elements to lock in their timing characteristics?
But this is a good starting point for beginners setting out to master this difficult task. And, of course, it is worth repeating all this with your own hands and trying to optimize a more complex project.

References:

1. www.microsemi.com , actel.ru - the official Microsemi website with documentation, the website of the official distributor (information in Russian)
2. www.microsemi.com/index.php?option=com_docman&task=doc_download&gid=131597 - about constraints.
3. www.vlsi-expert.com/p/static-timing-analysis.html - about static timing analysis.
4. vhdlguru.blogspot.ru/2011/01/what-is-pipelining-explanation-with.html - about pipelining.
5. www.microsemi.com/index.php?option=com_docman&task=doc_download&gid=130940 - SmartTime Guide.
