Modeling mixed-signal circuits in SystemVerilog


Once upon a time... no, not like that. One morning, coming to work early once again, I found out that our server room has only one power input, and it had burned out. There was nothing to do all day, so I decided to write an article for Habr. The article is aimed at beginners and the idly curious.

CMOS technology has reached the point where modern chips are huge, very complex structures: systems assembled from systems. At the same time, the cost of launching production grows exponentially as process nodes shrink. Therefore, everything has to be simulated and verified as thoroughly as possible during development. The ideal case, which is sometimes even achieved in practice, is a chip that works from the very first run.

Since we live in an analog world, even a digital chip must be able to communicate with it. Digital chips contain dozens of large analog blocks on the die, such as ADCs, DACs, PLLs, secondary power supplies, etc. The exception to this rule is probably only large processors such as Intel Core i, where all of this is moved out to the chipset.

Traditionally, SPICE simulators such as PSpice, MMSIM, HSPICE, etc. are used to simulate analog blocks. In such simulators, a circuit is described by a system of differential equations of enormous dimension (or by the matrix representing it). At each calculation step, a SPICE simulator solves this system numerically. Techniques that accelerate these calculations are of course used: partitioning the matrix into submatrices, parallelizing across threads and computational cores, a variable time step, etc.

Unfortunately, numerical methods are inherently iterative and parallelize poorly, so this kind of simulation remains too slow for simulating the system as a whole. It is nevertheless widely used in developing the analog blocks themselves and analog chips. Our story is about (generally) digital chips that contain analog blocks, and about mixed-signal systems, where we would like to describe our blocks as formulas and equations and solve these Navier-Stokes equations (joke) analytically. This technique does not replace the much more accurate SPICE simulation. It complements it, speeding up development and modeling.

Representation of analog signals

A floating-point type is well suited to representing analog signals. In SystemVerilog these are the types shortreal (equivalent to float in C) and real (equivalent to double). Note that these are data types with storage: the value is updated only when it is assigned. In other words, the type behaves like reg, except that the stored value is not 0 or 1 but a voltage or current, represented as a floating-point number.
What we really need, though, is a type similar to wire, which is updated continuously rather than only at the moment of assignment. I must say that there is no such basic type in SystemVerilog. Rumor has it that while the standard was being discussed there was some push to add this functionality, but it never turned into anything concrete. (IEEE 1800-2012 did add user-defined nettypes that can carry real values; check whether your simulator supports them.) Nevertheless, if you use the ncsim simulator, it has a var modifier that turns real and other types into an analog of wire. Example:

real a;
var real b;
assign a = in1+in2; // this will be an error
assign b = in1+in2; // this works: b always equals in1+in2

Lyrical digression for pure programmers

A Verilog program is a parallel program. All lines of code are, in principle, independent and are executed either sequentially or in parallel, depending on certain conditions. In this case, the assign takes effect when the program starts and keeps working until the very end, computing the sum continuously.

If your simulator does not support var, then you can do this:

real b;
always @( * ) // the simulator enters this always block whenever in1 or in2 changes
          b = in1+in2;

This notation is less convenient, but it works just as well.

Data type conversion

Verilog has the following built-in functions for data conversion:

$itor()       // integer to real
$rtoi()       // real to integer (truncates toward zero)
$bitstoreal() // reg [63:0] holding an IEEE 754 bit pattern to real
$realtobits() // real to reg [63:0] holding its IEEE 754 bit pattern
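$realtobits and $bitstoreal move raw IEEE 754 bit patterns between a real and a 64-bit vector. They do not convert numeric values, so passing an ADC code straight to $bitstoreal produces nonsense. As a minimal sketch of these semantics, here are C analogues. The function names are ours and only mirror the Verilog system functions. memcpy is the portable way to reinterpret bits in C.

```c
#include <stdint.h>
#include <string.h>

/* Analogue of $realtobits: copy the 64 bits of a double without numeric conversion. */
uint64_t realtobits(double r)
{
    uint64_t bits;
    memcpy(&bits, &r, sizeof bits);
    return bits;
}

/* Analogue of $bitstoreal: the inverse bit reinterpretation. */
double bitstoreal(uint64_t bits)
{
    double r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Analogue of $rtoi: a C cast truncates toward zero. */
int rtoi(double r) { return (int)r; }

/* Analogue of $itor: exact for any 32-bit int. */
double itor(int i) { return (double)i; }
```

For example, realtobits(1.0) is 0x3FF0000000000000, while bitstoreal(5) is a tiny denormal number rather than 5.0.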

If the code you want to convert to real is signed and represented in two's complement, be careful with these functions: you may need to convert the code or sign-extend it first. If for some reason you do not want to use these functions, you can use the following technique.

reg [7:0] code;
int       a;
real      voltage;
always @( * )
begin
        a       = {{24{code[7]}}, code[7:0]}; // sign-extend to the width of int
        voltage = a;
end
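Here is the same sign extension in C, as a minimal sketch. sign_extend8 and code_to_real are hypothetical names. An 8-bit two's-complement code with its top bit set represents the code minus 256, which is exactly what replicating code[7] into the upper 24 bits achieves.

```c
#include <stdint.h>

/* Sign-extend an 8-bit two's-complement code to int,
   like {{24{code[7]}}, code[7:0]} in the Verilog example. */
int sign_extend8(uint8_t code)
{
    int a = code;          /* 0..255 */
    if (code & 0x80)       /* sign bit set: the value is negative */
        a -= 256;          /* equivalent to filling bits 31..8 with ones */
    return a;
}

/* The "voltage = a" step: convert the sign-extended code to floating point. */
double code_to_real(uint8_t code)
{
    return (double)sign_extend8(code);
}
```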

Simplified analog block models in Verilog

Amplifier with additive white noise

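module amp(input var real in, output real out);
   parameter real k           = 10.0;   // gain
   parameter int  seed        = 60;     // initial seed of the noise generator
   parameter real noise_power = -20.0;  // noise power, dB (real, so /10.0 is not integer division)
   integer seed_state = seed;           // $dist_normal updates its seed argument, so it must be a variable
   real noise;
   always @(in)
   begin
          noise = $sqrt(10.0**(noise_power/10.0)) * $itor($dist_normal(seed_state, 0, 100_000))/100_000.0;
          out   = in * k + noise;
   end
endmodule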

DAC with low-pass filter

`timescale 1ns / 1ps
module DAC(input signed [7:0] DAC_code, output real out);
   parameter real fs     = 10e-9;    // DAC sample period, s
   parameter real ffilt  = fs/64;    // filter update time step, s
   parameter real CUTOFF = 100e6;    // filter cutoff frequency, Hz
   parameter real a      = ffilt/(ffilt+(1.0/(2*3.141592*CUTOFF))); // a = dt/(dt + RC)
   real DAC_out;
   // DAC: ideal conversion of the signed code to a level
   always @(DAC_code)
           DAC_out = $itor(DAC_code);
   // 1st-order low-pass filter; delays are in timescale units (1 ns), so seconds are scaled by 1e9
   always #(ffilt*1e9) out <= a*DAC_out + (1-a)*out;
endmodule

ADC with nonlinearity

module ADC (input real in, input clk, output reg [7:0] ADC_code);
  real adc_tf[0:255];   // ADC transfer characteristic: input level for each code
  real min_dist, delta_abs;
  int  i, j;
  int  dnl_file;
  initial
  begin
      dnl_file = $fopen("DNL_file", "r");
      if (dnl_file == 0)
        $stop;                                        // stop if the file cannot be opened
      for (i = 0; i < 256; i = i + 1)
        void'($fscanf(dnl_file, "%f;", adc_tf[i]));  // read the ADC characteristic from the file
      $fclose(dnl_file);
  end
  always @(posedge clk)
  begin
    min_dist = 10;
    for (j = 0; j < 256; j = j + 1)                   // find the code closest to the input signal
    begin
      delta_abs = (in > adc_tf[j]) ? (in - adc_tf[j]) : (adc_tf[j] - in);
      if (delta_abs < min_dist)
      begin
        min_dist = delta_abs;
        ADC_code = j;
      end
    end
  end
endmodule
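The nearest-code search is easy to check outside the simulator. Below is a minimal C sketch; adc_convert is a hypothetical name, and tf[] stands for the characteristic read from DNL_file. It keeps the model's behavior of holding the previous code when no table entry lies within 10 units of the input.

```c
/* Nearest-code search over an ADC transfer characteristic:
   tf[k] is the input level at which code k is produced. */
int adc_convert(double in, const double tf[], int n_codes, int prev_code)
{
    double min_dist = 10.0;   /* same initial bound as the Verilog model */
    int code = prev_code;     /* output keeps its value if nothing is closer */
    for (int j = 0; j < n_codes; j++) {
        double d = in > tf[j] ? in - tf[j] : tf[j] - in;
        if (d < min_dist) {
            min_dist = d;
            code = j;
        }
    }
    return code;
}
```

If the table is monotonic, a binary search would replace the 256 comparisons with about 8. The linear scan is kept here to match the Verilog model.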

Multiphase Clock Source (PLL)

`timescale 1ns / 1ps
module MPLL (input en, input [5:0] phase, output clk_out);
  parameter real REFERENCE_CLOCK_PERIOD = 10e-6;  // reference clock period, s
  parameter int  PHASES_NUMBER = 64;
  reg [PHASES_NUMBER-1:0] PLL_phase = 64'h00000000_FFFFFFFF; // VCO modeled as a ring oscillator
  // delays are in timescale units (1 ns), so seconds are scaled by 1e9
  always #(REFERENCE_CLOCK_PERIOD/PHASES_NUMBER*1e9) if (en === 1)
           PLL_phase[PHASES_NUMBER-1:0] <= {PLL_phase[PHASES_NUMBER-2:0], PLL_phase[PHASES_NUMBER-1]}; // rotate the ring oscillator
  assign clk_out = PLL_phase[phase]; // clock multiplexer
endmodule
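Each shift takes REFERENCE_CLOCK_PERIOD/64, so the pattern of 32 ones and 32 zeros returns to its starting position after one full reference period. Neighboring taps are 1/64 of a period (5.625 degrees) apart. Here is a minimal C sketch of the same ring using a 64-bit word; ring_step and ring_tap are hypothetical names.

```c
#include <stdint.h>

/* One shift of the 64-stage ring, {PLL_phase[62:0], PLL_phase[63]}: rotate left by 1. */
uint64_t ring_step(uint64_t phase)
{
    return (phase << 1) | (phase >> 63);
}

/* Value of tap sel (0..63), i.e. the clock multiplexer output. */
int ring_tap(uint64_t phase, unsigned sel)
{
    return (int)((phase >> sel) & 1u);
}
```

After a rotation, bit k holds what bit k-1 held one step earlier. Tap k therefore reproduces tap 0 delayed by k steps.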

Using such models, and similar but more complex analytical ones, speeds up simulation by orders of magnitude compared with SPICE. This makes it practical to simulate and verify the complete system in SystemVerilog.

Accelerating further

Unfortunately, modern systems are already so complex that this acceleration is not enough, and you have to resort to parallelization. As far as I know, multithreaded Verilog simulators have not been invented yet, so this has to be done by hand.
SystemVerilog introduced a new mechanism for calling external software modules, the Direct Programming Interface (DPI). Because it is simpler than the other two mechanisms (PLI and VPI), we will use it.

At the beginning of the module where we want to call an external function, add an import declaration:
import "DPI-C" function int some_funct (input string file_name, input int in, output real out);
After that, it can be called from Verilog like an ordinary function, for example:

always @(posedge clk)
     res1 <= some_funct("file.name", in1, out1);

How to compile the C code and where to put the resulting libraries is described in your simulator's documentation.
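On the C side, the imported function is an ordinary C function. Its signature follows the standard DPI type mapping: input string becomes const char *, input int becomes int, output real becomes double *, and the int result stays int. Below is a minimal sketch. The body is purely illustrative (a hypothetical scaling by 0.5), since the article does not say what some_funct computes.

```c
#include <stddef.h>

/* C implementation matching
     import "DPI-C" function int some_funct(input string file_name,
                                            input int in, output real out);
   The body is a hypothetical placeholder. */
int some_funct(const char *file_name, int in, double *out)
{
    *out = in * 0.5;   /* illustrative computation written to the output argument */
    return (file_name != NULL && file_name[0] != '\0') ? 0 : -1;  /* status code */
}
```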
Below is an example of a program that runs in multiple threads.

Example

#include <pthread.h>

#define POOL_SIZE 12              // assumed: matches the 12 threads used in the functions below

typedef struct
{
   // work specific
   double in;                     // input data for the calculation
   double out;                    // calculation result
   …
   // thread specific
   char processing;               // calculation-enable flag
   pthread_mutex_t mutex;
   pthread_cond_t  cond_start;
   pthread_cond_t  cond_finish;
   void *next_th_params;
   pthread_t tid;
} th_params;
static th_params th_pool[POOL_SIZE];

The worker (calculation) function:

void* worker_thread(void *x_void_ptr){
  th_params *x_ptr = (th_params *)x_void_ptr;
  while(1)  // infinite loop
  {
      // wait for new data
      pthread_mutex_lock(&x_ptr->mutex);            // lock
      x_ptr->processing = 0;                        // okay, ready for work
      pthread_cond_signal(&x_ptr->cond_finish);     // signal that we have finished
      while(x_ptr->processing == 0)
          pthread_cond_wait(&x_ptr->cond_start, &x_ptr->mutex);  // wait for the answering signal
      x_ptr->processing = 1;                        // set the flag: busy
      pthread_mutex_unlock(&x_ptr->mutex);          // unlock
      // compute something here, probably an SSE2 assembly insert
      …
  }
}

The function that launches the worker threads:

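void init(th_params *tp){             // call once as init(th_pool) before the first ch() call
    int i=0;
    for(;i<12;i++)
    {
        pthread_attr_t attr;
        pthread_mutex_init(&tp[i].mutex, NULL);          // mutexes and condition variables must be initialized
        pthread_cond_init(&tp[i].cond_start, NULL);
        pthread_cond_init(&tp[i].cond_finish, NULL);
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
        pthread_create(&tp[i].tid, &attr, &worker_thread, &tp[i]);  // each thread gets its own parameter block
        pthread_attr_destroy(&attr);
    }
}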

The function that hands out work to the worker threads (we will call it from Verilog all the time):

int ch(double in, double *out){
   int i;
   for(i=0;i<12;i+=1)
   {
         // wait if the workers have not finished yet
         pthread_mutex_lock(&th_pool[i].mutex);                              // lock
         while(th_pool[i].processing == 1)
                pthread_cond_wait(&th_pool[i].cond_finish, &th_pool[i].mutex); // wait for the signal
         pthread_mutex_unlock(&th_pool[i].mutex);                            // unlock
   }
   // copy the results to the output array to pass them to Verilog
   for(i=0;i<12;i+=1)
        out[i] = th_pool[i].out;
   for(i=0;i<12;i+=1)
   {
        pthread_mutex_lock   (&th_pool[i].mutex);       // lock
        th_pool[i].in          = in;                    // pass the data to the worker
        th_pool[i].processing  = 1;                     // set the calculation-enable flag
        pthread_cond_signal  (&th_pool[i].cond_start);  // answer with a signal so that the worker wakes up
        pthread_mutex_unlock (&th_pool[i].mutex);       // unlock
   }
   return 0;
}
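To try the handshake outside the simulator, here is a compact, self-contained variant with a hypothetical pool of 4 workers, each multiplying the input by its own gain. It restructures the loop slightly. The worker clears the processing flag only after it has produced a result, and it checks the flag before waiting, so a job posted before the thread starts running is not lost. (In the version above, the worker resets the flag at the top of its loop, which, as far as I can tell, can drop the very first job.) The waiting and posting loops are also merged. As in ch(), each call returns the results of the previous call. All names (worker_t, pool, init_pool, dispatch, gain) are hypothetical. On older glibc versions, compile with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

#define N_WORKERS 4                 /* hypothetical pool size for this sketch */

typedef struct {
    double in, out;                 /* work data */
    double gain;                    /* hypothetical per-worker parameter */
    int processing;                 /* 1 = job posted or running, 0 = done */
    pthread_mutex_t mutex;
    pthread_cond_t cond_start, cond_finish;
    pthread_t tid;
} worker_t;

static worker_t pool[N_WORKERS];

static void *worker(void *arg)
{
    worker_t *w = arg;
    pthread_mutex_lock(&w->mutex);
    for (;;) {
        while (w->processing == 0)                      /* wait for a job */
            pthread_cond_wait(&w->cond_start, &w->mutex);
        double in = w->in;
        pthread_mutex_unlock(&w->mutex);
        double result = in * w->gain;                   /* the "calculation" */
        pthread_mutex_lock(&w->mutex);
        w->out = result;
        w->processing = 0;                              /* job finished */
        pthread_cond_signal(&w->cond_finish);
    }
    return NULL;
}

int init_pool(void)
{
    for (int i = 0; i < N_WORKERS; i++) {
        worker_t *w = &pool[i];
        w->gain = i + 1;
        w->processing = 0;
        w->out = 0.0;
        pthread_mutex_init(&w->mutex, NULL);
        pthread_cond_init(&w->cond_start, NULL);
        pthread_cond_init(&w->cond_finish, NULL);
        if (pthread_create(&w->tid, NULL, worker, w) != 0)
            return -1;
        pthread_detach(w->tid);
    }
    return 0;
}

/* For each worker: wait until its previous job is done, collect the result,
   then post the new input. Results lag one call behind the inputs. */
int dispatch(double in, double *out)
{
    for (int i = 0; i < N_WORKERS; i++) {
        pthread_mutex_lock(&pool[i].mutex);
        while (pool[i].processing == 1)
            pthread_cond_wait(&pool[i].cond_finish, &pool[i].mutex);
        out[i] = pool[i].out;
        pool[i].in = in;
        pool[i].processing = 1;
        pthread_cond_signal(&pool[i].cond_start);
        pthread_mutex_unlock(&pool[i].mutex);
    }
    return 0;
}
```

After init_pool(), the first dispatch(2.0, out) returns zeros. The second call returns 2.0 times each worker's gain.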

Unfortunately, modern systems are already so complex that even this acceleration is not enough. In that case, you have to resort to OpenCL computations on a graphics card (no more complicated than DPI), or to computations on a cluster or in the cloud. In all these cases, the transport component becomes significant, i.e. the time needed to move data to and from the computing device. The best-suited problems are therefore those that require a lot of computation on a relatively small amount of data, both input and results. The same applies to the program presented above, though to a lesser extent. If this condition is not met, computing on a single processor is often faster.
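The trade-off can be expressed as simple arithmetic. Offloading wins when the transfer time plus the remote compute time is smaller than the local compute time, i.e. bytes/link_rate + ops/remote_rate < ops/local_rate. Below is a minimal C sketch of this estimate. All rates are hypothetical parameters, and the model ignores latency, overlap of transfer with computation, and setup costs.

```c
/* Rough offload model (hypothetical parameters):
   local time  = ops / local_rate
   remote time = bytes / link_rate + ops / remote_rate */
typedef struct {
    double ops;     /* amount of computation, operations */
    double bytes;   /* data moved to and from the device, bytes */
} job_t;

double local_time(job_t j, double local_rate)
{
    return j.ops / local_rate;
}

double offload_time(job_t j, double remote_rate, double link_rate)
{
    return j.bytes / link_rate + j.ops / remote_rate;
}

/* Returns 1 if offloading is estimated to be faster, 0 otherwise. */
int offload_pays_off(job_t j, double local_rate, double remote_rate, double link_rate)
{
    return offload_time(j, remote_rate, link_rate) < local_time(j, local_rate);
}
```

For example, take a local rate of 1e10 operations/s, a remote rate of 1e12 operations/s, and a 1e9 bytes/s link. A job of 1e12 operations on 1 MB of data takes about 1 s remotely versus 100 s locally. A job of 1e8 operations on 1 GB of data takes 0.01 s locally but over 1 s remotely.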

Keep in mind that none of these methods works when there is no power in the server room. Fortunately, the power has just been restored and YouTube works again. On this happy note, I hasten to finish my story: work is waiting.
