How I made a mistake when writing a hash table, and what conclusions I drew from it

Published on February 25, 2016


    There is no better way to reach a clear theoretical understanding than to learn from your own mistakes, from your own bitter experience. (Friedrich Engels)

    Hello!


    A few weeks ago a colleague wrote to me on LinkedIn and said that the hash table in my GitHub project doesn't work correctly.


    He sent me tests and a fix, and they really did produce a situation where the system "hung". While investigating the problem, I realized that I had made several mistakes during verification. Verification of RTL code isn't covered in much detail on Habr, so I decided to write an article.


    From this article you will learn:


    • how a hash table can be organized on an FPGA;
    • how the verification was built;
    • what mistakes I made (they led to the bug going unnoticed);
    • how all of this can be fixed.

    Welcome under the cut!


    Briefly about the project


    Goals and background


    The goals that I set before the start of the project:


    • if the hash table is declared to hold N elements, then any N elements are guaranteed to fit;
    • the table resolves collisions itself; there is no limit on the number of elements in a bucket (other than the size of the table, of course);
    • the table supports insertion, search and deletion by key;
    • if there are no collisions during a search, the table is ready to accept a new search task every clock cycle (maximum performance);
    • I don't worry about resources or clock frequency.

    Internal structure


    I want maximum performance, so I organize a pipeline:



    Input (ht_cmd):


    typedef enum logic [1:0] {
      OP_SEARCH,
      OP_INSERT,
      OP_DELETE
    } ht_opcode_t;
    typedef struct packed {
      logic        [KEY_WIDTH-1:0]    key;
      logic        [VALUE_WIDTH-1:0]  value;
      ht_opcode_t                     opcode;
    } ht_command_t;

    Output (ht_res):


    typedef enum int unsigned {
      SEARCH_FOUND,
      SEARCH_NOT_SUCCESS_NO_ENTRY,
      INSERT_SUCCESS,
      INSERT_SUCCESS_SAME_KEY,
      INSERT_NOT_SUCCESS_TABLE_IS_FULL,
      DELETE_SUCCESS,
      DELETE_NOT_SUCCESS_NO_ENTRY
    } ht_rescode_t;
    typedef struct packed {
      ht_command_t                cmd;
      ht_rescode_t                rescode;
      logic  [BUCKET_WIDTH-1:0]   bucket;
      // valid only for opcode = OP_SEARCH
      logic [VALUE_WIDTH-1:0]     found_value;
    } ht_result_t;

    Note:
    Actually, in ht_result_t we only care about rescode and found_value, but looking a little ahead: without the other fields verification would be harder. In any case, if they are not used in hardware, the synthesizer will optimize them away and they will not consume resources.


    What and where is located:


    • All data (key, value) is stored in the data_table memory.
    • The information for organizing a linked list (next_ptr, next_ptr_val) is stored next to the data.
    • The number of the cell holding the head of a bucket's linked list is stored in head_table.
    • empty_ptr_storage stores the numbers of the empty rows in data_table.

    Word in head_table:


    typedef struct packed {
      logic [HEAD_PTR_WIDTH-1:0] ptr;
      logic                      ptr_val;
    } head_ram_data_t;

    Word in data_table:


    typedef struct packed {
      logic [KEY_WIDTH-1:0]      key;
      logic [VALUE_WIDTH-1:0]    value;
      logic [HEAD_PTR_WIDTH-1:0] next_ptr;
      logic                      next_ptr_val;
    } ram_data_t;

    The algorithm:


    • calc_hash uses key to compute the hash; its value is the bucket number (bucket_num).
    • bucket_num is used as the address into head_table to get a pointer to the head of the bucket's linked list.
    • a multiplexer dispatches the job to the appropriate module (data_table_search, data_table_insert, data_table_delete), based on opcode.
    • the modules receive their tasks (find, insert, delete), read from / write to data_table (walking the linked list), and form the ht_res result.
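    The chain walk in the last step can be sketched as a hypothetical Python model (not the RTL; the table layouts and field names mirror the structs above):

```python
# Hypothetical model of the search walk: head_table maps a bucket number to
# (ptr, ptr_val); each data_table cell carries key/value plus the next_ptr link.
def search(key, head_table, data_table, bucket_num):
    ptr, ptr_val = head_table[bucket_num]
    while ptr_val:                      # follow the chain until the link ends
        entry = data_table[ptr]
        if entry["key"] == key:
            return ("SEARCH_FOUND", entry["value"])
        ptr, ptr_val = entry["next_ptr"], entry["next_ptr_val"]
    return ("SEARCH_NOT_SUCCESS_NO_ENTRY", None)
```

    Insert and delete walk the same chain, additionally rewriting next_ptr links and head_table.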

    Interesting nuances:


    • the data_table memory has a two-cycle read latency, so to be able to search every clock cycle, several (five) parallel search modules are instantiated, with tasks distributed between them round-robin.
    • empty_ptr_storage is provided to speed up the search for a free cell. At the time of writing it was implemented very inefficiently: as a vector empty_ptr_mask (its length is the number of cells in data_table) stored in registers, with the search for an empty cell done by a combinational scan for a zero bit. In terms of resources and frequency this is not the best solution.
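    In software terms, that free-cell lookup amounts to a linear scan (a sketch; the bit polarity of empty_ptr_mask is an assumption of this model):

```python
# Sketch of the inefficient free-cell lookup: one mask bit per data_table cell;
# assumption here: a 0 bit marks a free cell.
TABLE_SIZE = 8   # illustrative size

def find_empty_ptr(empty_ptr_mask):
    for i in range(TABLE_SIZE):
        if not (empty_ptr_mask >> i) & 1:   # combinational scan for a zero bit
            return i
    return None   # no free cell -> INSERT_NOT_SUCCESS_TABLE_IS_FULL
```

    In hardware this scan becomes a wide priority encoder over the whole mask, which is exactly what hurts resources and frequency.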

    It looks like this:


    What I did to avoid errors / simplify debugging


    Before starting on the hash table, I understood that the project was not a simple one, so I planned some things in advance that sped up development and simplified debugging.


    SystemVerilog & Coding Style


    SystemVerilog was used as the HDL, together with a corporate coding style.


    SystemVerilog makes it convenient to describe structures (see above) and create your own data types, for example to describe an FSM:


    enum int unsigned {
      IDLE_S,
      NO_VALID_HEAD_PTR_S,
      READ_HEAD_S,
      GO_ON_CHAIN_S,
      KEY_MATCH_S,
      ON_TAIL_WITHOUT_MATCH_S
    } state, next_state;

    Streaming Interface (Avalon-ST)


    When developing a pipeline, you need to understand well when it will stall.
    You can invent a lot of things yourself, but it is best to use ready-made, well-developed interfaces.


    At work I develop for Altera FPGAs, so I am better acquainted with the interfaces they offer / promote. In this project I used the Avalon family.


    A command is just a stream of data (one word per command), so I used the standard Avalon-Streaming (Avalon-ST) interface (signals data, ready, valid).


    This is what an Avalon-ST transaction with readyLatency = 0 looks like (taken from the standard):



    You can see in this picture how the ready signal, which the slave drives, throttles the transaction and the data transfer.


    Dummy hash


    When I thought about how I would build this hash table, I realized that the hardest part would be verifying it. Obviously, the most interesting corner cases appear when you try to insert into a chain that already holds data, delete from a long chain, and so on.


    This means that when applying stimulus to the system, it must be easy to generate a key (key) for a predefined bucket number.


    With real hash functions this is problematic, so a HASH_TYPE parameter with the value "dummy hash" is provided.
    If this hash type is selected, the bucket number is simply the BUCKET_WIDTH most significant bits of the key.


    So, if key = 0x92123456 and BUCKET_WIDTH is 8, then bucket_num = 0x92.
    This makes it easy to compose the stimulus needed to produce specific corner cases.
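    The dummy hash is literally a shift (a one-line Python sketch; the widths match the example above):

```python
# "Dummy hash": the bucket number is just the BUCKET_WIDTH most significant
# bits of the key. Widths below match the example in the text.
KEY_WIDTH = 32
BUCKET_WIDTH = 8

def dummy_hash(key):
    return key >> (KEY_WIDTH - BUCKET_WIDTH)

print(hex(dummy_hash(0x92123456)))   # 0x92
```

    Any key of the form 0x05_xx_xx_xx therefore lands in bucket 0x05, which is exactly what the tests below exploit.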


    Simulation Logging


    Sometimes developers debug their RTL modules directly on hardware (read: boards) using SignalTap or ChipScope. This approach is not always the fastest or most productive: it requires rebuilding the whole project (from 10 minutes to several hours, and sometimes more than once), a board at hand, a debugger, generating input stimulus, and so on.


    To speed up development, simulators such as ModelSim, VCS, Icarus Verilog, etc. are used. They let you track the values of all (or selected) signals / variables during debugging by drawing waveforms (timing diagrams). Staring at those waveforms can eat up a huge amount of time.


    Solution:
    log all the actions that occur, so that they can be scanned quickly by eye.


    For this purpose I added functions to data_table_insert, data_table_delete and data_table_search that print to the log:


    function void print( string s );
      $display("%08t: %m: %s", $time, s);
    endfunction

    The display format is similar to printf (you can use %d, %f, etc.):


    • %08t displays the simulation time (convenient for jumping to the right moment later).
    • %m prints the hierarchical name of the module where the call happened. (Note: it takes no arguments!)
    • %s prints a string.

    Logging FSM transitions:


    function void print_state_transition( );
      string msg;
      if( next_state != state )
        begin
          $sformat( msg, "%s -> %s", state, next_state );
          print( msg );
        end
    endfunction

    We print the reception of a new task:


    function string pdata2str( input ht_pdata_t pdata );                                                      
      string s;                                                                                               
      $sformat( s, "opcode = %s key = 0x%x value = 0x%x head_ptr = 0x%x head_ptr_val = 0x%x",                 
          pdata.cmd.opcode, pdata.cmd.key, pdata.cmd.value, pdata.head_ptr, pdata.head_ptr_val );   
      return s;             
    endfunction                                                                                               
    function void print_new_task( ht_pdata_t pdata );                                                         
      print( pdata2str( pdata ) );                                                                            
    endfunction

    And so on...


    For simulation I use ModelSim. Its log, which is shown on the screen (and by default also goes to the transcript file), contains lines like these:


    1465: top_tb.dut.d_tbl.sea_eng.g_s_eng[3].search.print: opcode = OP_SEARCH key = 0x04000000 value = 0x0000 head_ptr = 0x000 head_ptr_val = 0x0
    1465: top_tb.dut.d_tbl.sea_eng.g_s_eng[3].search.print: IDLE_S -> NO_VALID_HEAD_PTR_S
    1475: top_tb.dut.d_tbl.sea_eng.g_s_eng[3].search.print: RES: key = 0x04000000 value = 0x0000 rescode = SEARCH_NOT_SUCCESS_NO_ENTRY
    1475: top_tb.dut.d_tbl.sea_eng.g_s_eng[3].search.print: NO_VALID_HEAD_PTR_S -> IDLE_S
    1475: ht_tb.print: IN_MONITOR: key = 0x04000000 value = 0x0000 rescode = SEARCH_NOT_SUCCESS_NO_ENTRY
    1485: top_tb.dut.d_tbl.del_eng.print: opcode = OP_DELETE key = 0x04111111 value = 0x0000 head_ptr = 0x000 head_ptr_val = 0x0
    1485: top_tb.dut.d_tbl.del_eng.print: IDLE_S -> NO_VALID_HEAD_PTR_S
    1495: top_tb.dut.d_tbl.del_eng.print: RES: key = 0x04111111 value = 0x0000 rescode = DELETE_NOT_SUCCESS_NO_ENTRY
    1495: top_tb.dut.d_tbl.del_eng.print: NO_VALID_HEAD_PTR_S -> IDLE_S
    1495: ht_tb.print: IN_MONITOR: key = 0x04111111 value = 0x0000 rescode = DELETE_NOT_SUCCESS_NO_ENTRY
    1505: top_tb.dut.d_tbl.ins_eng.print: opcode = OP_INSERT key = 0x04000000 value = 0xb95f head_ptr = 0x000 head_ptr_val = 0x0
    1505: top_tb.dut.d_tbl.ins_eng.print: IDLE_S -> NO_HEAD_PTR_WR_HEAD_PTR_S
    1515: top_tb.dut.h_tbl.print: addr = 0x04 wr_data.ptr = 0x003 wr_data.ptr_val = 0x1
    1515: top_tb.dut.d_tbl.ins_eng.print: NO_HEAD_PTR_WR_HEAD_PTR_S -> NO_HEAD_PTR_WR_DATA_S
    1525: top_tb.dut.d_tbl.ins_eng.print: WR: addr = 0x003 key = 0x04000000 value = 0xb95f next_ptr = 0x000, next_ptr_val = 0x0
    1525: top_tb.dut.d_tbl.ins_eng.print: RES: key = 0x04000000 value = 0xb95f rescode = INSERT_SUCCESS
    1525: top_tb.dut.d_tbl.ins_eng.print: NO_HEAD_PTR_WR_DATA_S -> IDLE_S

    Such a text log is easy to grep, or to search through (for example, in vim).


    The logs saved a huge amount of time: when feeding in simple examples, I knew what should appear in what order and just read the log. If an error occurred, I went straight to analyzing the code, not the timing diagrams.


    As a challenge, I advise everyone to try debugging RTL code for a week without waveforms (to step out of the comfort zone).


    Verification


    If you turn to good literature, for example SystemVerilog for Verification, you will find the following diagram of a good, correct testbench:



    In this article I'm not going to take the bread out of Chris Spear's mouth, so I won't describe in detail what all these components mean.


    Scheme of my testbench:



    top_tb


    Top module.


    DUT (Device / Design Under Test)


    Our test subject: an instance of the hash_table_top module.


    ht_driver


    • contains a mailbox gen2drv holding the commands to be sent to the DUT.
    • as soon as a command appears in this queue, it is sent to the DUT.
    • after a command is sent to the DUT, it is also copied to ht_scoreboard.

    To put a command into this mailbox, hierarchical access is used:


    task send_to_dut_c( input ht_command_t c );                                                               
      // using hierarchical access to put the command into the mailbox                                        
      env.drv.gen2drv.put( c );                                                                               
    endtask                                                        

    tests


    To check the functionality, three tests / stimulus sets were written.


    Macros to simplify the work:


    `define CMD( _OP, _KEY, _VALUE ) cmds.push_back( '{ opcode : _OP, key : _KEY, value : _VALUE } );
    `define CMD_SEARCH( _KEY )         `CMD( OP_SEARCH, _KEY, 0 )
    `define CMD_INSERT( _KEY, _VALUE ) `CMD( OP_INSERT, _KEY, _VALUE )
    `define CMD_INSERT_RAND( _KEY )    `CMD_INSERT( _KEY, $urandom() )
    `define CMD_DELETE( _KEY )         `CMD( OP_DELETE, _KEY, 0 )

    Test example:


    task test_01( );
      ht_command_t cmds[$];
      $display("%m:");
      `CMD_INSERT( 32'h01_00_00_00, 16'h1234 )
      `CMD_INSERT( 32'h01_00_10_00, 16'h1235 )
      `CMD_INSERT_RAND( 32'h01_00_00_00 )
      `CMD_INSERT_RAND( 32'h01_00_00_01 )
      `CMD_DELETE     ( 32'h01_00_00_00 )
      `CMD_INSERT_RAND( 32'h01_00_00_02 )
      `CMD_SEARCH( 32'h01_00_00_00 )
      `CMD_SEARCH( 32'h01_00_00_01 )
      `CMD_SEARCH( 32'h01_00_00_01 )
      `CMD_SEARCH( 32'h01_00_00_03 )
      foreach( cmds[i] )
        begin
          send_to_dut_c( cmds[i] );
        end
    endtask

    ht_monitor


    • monitors the output interface ht_res.
    • as soon as a result appears, it captures it and sends it to ht_scoreboard.

    ht_scoreboard


    Checks the correct operation of the DUT.


    It contains:


    • two mailboxes, into which ht_driver and ht_monitor put the commands and the results, respectively.
    • a reference model of the hash table, ref_hash_table.

    The algorithm:


    • as soon as a result arrives on ht_res, it and the corresponding request are taken out of the queues. Here we rely on there being exactly one answer for each command.
    • the check function is called: it feeds the command into the reference model and compares the results from the DUT and the reference model.
    • if they don't match, $error prints a message about it to the log, and a red marker appears in the ModelSim GUI at the moment in time when it happened.
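    The scoreboard idea can be sketched in a few lines of Python (a hedged model, not the actual testbench: the reference "table" here is a plain dict, and the table-full case is ignored for brevity):

```python
# Minimal reference model: a dict stands in for the hash table.
def ref_step(ref, opcode, key, value):
    if opcode == "OP_SEARCH":
        return ("SEARCH_FOUND", ref[key]) if key in ref else ("SEARCH_NOT_SUCCESS_NO_ENTRY", None)
    if opcode == "OP_INSERT":
        rescode = "INSERT_SUCCESS_SAME_KEY" if key in ref else "INSERT_SUCCESS"
        ref[key] = value
        return (rescode, None)
    if opcode == "OP_DELETE":
        return ("DELETE_SUCCESS", ref.pop(key)) if key in ref else ("DELETE_NOT_SUCCESS_NO_ENTRY", None)

# check(): feed the command to the model and compare with the DUT result.
def check(ref, cmd, dut_rescode, dut_found_value=None):
    exp_code, exp_value = ref_step(ref, *cmd)
    assert dut_rescode == exp_code, f"rescode mismatch: {dut_rescode} != {exp_code}"
    if exp_code == "SEARCH_FOUND":
        assert dut_found_value == exp_value, "found_value mismatch"
```

    The real scoreboard does the same thing, only with the ht_command_t / ht_result_t structs and $error instead of assert.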

    Coverage


    So, we already have a system (a framework, if you like) that can apply various input stimuli and check that the DUT reacts correctly. To make sure there are no bugs, we need to cover "all possible" scenarios.


    To evaluate coverage, SystemVerilog provides the covergroup and coverpoint constructs. With them you can describe the points where we want to sample, and what statistics to collect.


    covergroup cg();
      option.per_instance = 1;
      CMDOP:    coverpoint result_locked.cmd.opcode;
      CMDRES:   coverpoint result_locked.rescode;
      BUCKOCUP: coverpoint bucket_occup[ result_locked.bucket ] {
                  bins zero   = { 0 };
                  bins one    = { 1 };
                  bins two    = { 2 };
                  bins three  = { 3 };
                  bins four   = { 4 };
                  bins other  = { [5:$] };
                }
      CMDOP_BUCKOCUP: cross CMDOP, BUCKOCUP;
      CMDRES_BUCKOCUP: cross CMDRES, BUCKOCUP {
                         // we should ignore SEARCH_FOUND, INSERT_SUCCESS_SAME_KEY, DELETE_SUCCESS 
                         // when in bucket was zero elements, because it's not real situation
                         ignore_bins not_real = binsof( CMDRES   ) intersect{ SEARCH_FOUND, INSERT_SUCCESS_SAME_KEY, DELETE_SUCCESS  } &&
                           binsof( BUCKOCUP ) intersect{ 0 };
                       }
    endgroup

    Explanation:


    • CMDOP and CMDRES track which operations (ht_cmd) and results (ht_res) occurred.
    • the bucket_occup array stores the number of elements that were in the bucket at the time of the operation.
    • CMDOP_BUCKOCUP "crosses" the command with the number of elements in the bucket: we get events of the form "command X was applied to the bucket that key belonged to, which held Y elements".
    • CMDRES_BUCKOCUP "crosses" the result with the number of elements in the bucket.

    After the simulation ends, you can get a report in the ModelSim console:


    coverage save 1.ucdb
    vcover report 1.ucdb -verbose -cvg

    Report:


    COVERGROUP COVERAGE:
    ----------------------------------------------------------------------------------------------------
    Covergroup                                             Metric      Goal/ Status                    
                                                                    At Least                           
    ----------------------------------------------------------------------------------------------------
     TYPE /top_tb/dut/resm/cg                               94.0%        100 Uncovered                 
        Coverpoint cg::CMDOP                               100.0%        100 Covered                   
        Coverpoint cg::CMDRES                               85.7%        100 Uncovered                 
        Coverpoint cg::BUCKOCUP                            100.0%        100 Covered                   
        Cross cg::CMDOP_BUCKOCUP                           100.0%        100 Covered                   
        Cross cg::CMDRES_BUCKOCUP                           84.6%        100 Uncovered                 
     Covergroup instance \/top_tb/dut/resm/cg1              94.0%        100 Uncovered                 
        Coverpoint CMDOP                                   100.0%        100 Covered                   
            covered/total bins:                                 3          3                           
            missing/total bins:                                 0          3                           
            bin auto[OP_SEARCH]                                21          1 Covered                   
            bin auto[OP_INSERT]                                21          1 Covered                   
            bin auto[OP_DELETE]                                18          1 Covered                   
        Coverpoint CMDRES                                   85.7%        100 Uncovered                 
            covered/total bins:                                 6          7                           
            missing/total bins:                                 1          7                           
            bin auto[SEARCH_FOUND]                             12          1 Covered                   
            bin auto[SEARCH_NOT_SUCCESS_NO_ENTRY]               9          1 Covered                   
            bin auto[INSERT_SUCCESS]                           14          1 Covered                   
            bin auto[INSERT_SUCCESS_SAME_KEY]                   7          1 Covered                   
            bin auto[INSERT_NOT_SUCCESS_TABLE_IS_FULL]          0          1 ZERO                      
            bin auto[DELETE_SUCCESS]                           11          1 Covered                   
            bin auto[DELETE_NOT_SUCCESS_NO_ENTRY]               7          1 Covered                   
        Coverpoint BUCKOCUP                                100.0%        100 Covered                   
            covered/total bins:                                 6          6                           
            missing/total bins:                                 0          6                           
            bin zero                                            7          1 Covered                   
            bin one                                            13          1 Covered                   
            bin two                                             9          1 Covered                   
            bin three                                          12          1 Covered                   
            bin four                                            8          1 Covered                   
            bin other                                          11          1 Covered                   
        Cross CMDOP_BUCKOCUP                               100.0%        100 Covered                   
            covered/total bins:                                18         18                           
            missing/total bins:                                 0         18                           
            bin <auto[OP_SEARCH],zero>                          1          1 Covered                   
            bin <auto[OP_INSERT],zero>                          5          1 Covered                             
            bin <auto[OP_SEARCH],one>                           5          1 Covered                   
            ...                
        Cross CMDRES_BUCKOCUP                               84.6%        100 Uncovered                 
            covered/total bins:                                33         39                           
            missing/total bins:                                 6         39                           
            bin <auto[SEARCH_NOT_SUCCESS_NO_ENTRY],zero> 
                                                                1          1 Covered                   
            bin <auto[INSERT_SUCCESS],zero>                     5          1 Covered                   
            bin <auto[DELETE_NOT_SUCCESS_NO_ENTRY],zero> 
                                                                1          1 Covered                                 
            ...

    All the possible crosses were generated automatically; we didn't have to write anything extra.


    After three tests, the report shows that:


    • there were 21 OP_INSERT commands and 18 OP_DELETE commands;
    • a search on an empty bucket was tried only once;
    • the INSERT_NOT_SUCCESS_TABLE_IS_FULL event never occurred.

    Summing up


    • There is a system that checks whether the DUT works correctly by comparing its output with a reference model.
    • There is a small set of tests that generate input stimuli.
    • There is feedback on the quality of the tests (coverage).
    • Debugging and simulation were simplified in advance by the "dummy hash" and by logging.

    What was the bug


    It turned out that applying this stimulus:


    `CMD_INSERT_RAND( 32'h05_00_00_00 )
    `CMD_INSERT_RAND( 32'h05_00_00_01 )
    `CMD_DELETE     ( 32'h05_00_00_01 )
    `CMD_INSERT_RAND( 32'h05_00_00_02 )
    `CMD_INSERT_RAND( 32'h05_00_00_03 )

    causes the data_table_insert module to hang when inserting the key 0x05000003:


    • it constantly reads from address 0x001;
    • the FSM state hangs in GO_ON_CHAIN_S (and never leaves it again).




    The following messages appeared in the log:


    385: top_tb.dut.d_tbl.ins_eng.print: opcode = OP_INSERT key = 0x05000003 value = 0x7e7e head_ptr = 0x000 head_ptr_val = 0x1
    385: top_tb.dut.d_tbl.ins_eng.print: IDLE_S -> READ_HEAD_S
    415: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x000 key = 0x05000000 value = 0x1f62 next_ptr = 0x001, next_ptr_val = 0x1
    415: top_tb.dut.d_tbl.ins_eng.print: READ_HEAD_S -> GO_ON_CHAIN_S
    445: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x001, next_ptr_val = 0x1
    475: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x001, next_ptr_val = 0x1
    505: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x001, next_ptr_val = 0x1
    535: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x001, next_ptr_val = 0x1
    ...

    Let's rewind the log a little and analyze it. I quote only the lines of interest (what was read from and written to data_table):


     75: top_tb.dut.d_tbl.ins_eng.print: opcode = OP_INSERT key = 0x05000000 value = 0x1f62 head_ptr = 0x000 head_ptr_val = 0x0
     95: top_tb.dut.d_tbl.ins_eng.print: WR: addr = 0x000 key = 0x05000000 value = 0x1f62 next_ptr = 0x000, next_ptr_val = 0x0
    115: top_tb.dut.d_tbl.ins_eng.print: opcode = OP_INSERT key = 0x05000001 value = 0x3ff2 head_ptr = 0x000 head_ptr_val = 0x1
    145: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x000 key = 0x05000000 value = 0x1f62 next_ptr = 0x000, next_ptr_val = 0x0
    155: top_tb.dut.d_tbl.ins_eng.print: WR: addr = 0x001 key = 0x05000001 value = 0x3ff2 next_ptr = 0x000, next_ptr_val = 0x0
    165: top_tb.dut.d_tbl.ins_eng.print: WR: addr = 0x000 key = 0x05000000 value = 0x1f62 next_ptr = 0x001, next_ptr_val = 0x1
    185: top_tb.dut.d_tbl.del_eng.print: opcode = OP_DELETE key = 0x05000001 value = 0x0000 head_ptr = 0x000 head_ptr_val = 0x1
    215: top_tb.dut.d_tbl.del_eng.print: RD: addr = 0x000 key = 0x05000000 value = 0x1f62 next_ptr = 0x001, next_ptr_val = 0x1
    245: top_tb.dut.d_tbl.del_eng.print: RD: addr = 0x001 key = 0x05000001 value = 0x3ff2 next_ptr = 0x000, next_ptr_val = 0x0
    255: top_tb.dut.d_tbl.del_eng.print: WR: addr = 0x001 key = 0x05000001 value = 0x3ff2 next_ptr = 0x000, next_ptr_val = 0x0
    265: top_tb.dut.d_tbl.del_eng.print: WR: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x000, next_ptr_val = 0x0
    285: top_tb.dut.d_tbl.ins_eng.print: opcode = OP_INSERT key = 0x05000002 value = 0x5429 head_ptr = 0x000 head_ptr_val = 0x1
    315: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x000 key = 0x05000000 value = 0x1f62 next_ptr = 0x001, next_ptr_val = 0x1
    345: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x000, next_ptr_val = 0x0
    355: top_tb.dut.d_tbl.ins_eng.print: WR: addr = 0x001 key = 0x05000002 value = 0x5429 next_ptr = 0x000, next_ptr_val = 0x0
    365: top_tb.dut.d_tbl.ins_eng.print: WR: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x001, next_ptr_val = 0x1
    385: top_tb.dut.d_tbl.ins_eng.print: opcode = OP_INSERT key = 0x05000003 value = 0x7e7e head_ptr = 0x000 head_ptr_val = 0x1
    415: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x000 key = 0x05000000 value = 0x1f62 next_ptr = 0x001, next_ptr_val = 0x1
    445: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x001, next_ptr_val = 0x1
    475: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x001, next_ptr_val = 0x1
    505: top_tb.dut.d_tbl.ins_eng.print: RD: addr = 0x001 key = 0x00000000 value = 0x0000 next_ptr = 0x001, next_ptr_val = 0x1

    Note that at 255-265 ns something strange happened: first one value was written to addr = 0x001, and then another.


    As a result, data_table contains incorrect data:


    • cell 0x000 says that the chain continues in cell 0x001 (next_ptr = 0x001, next_ptr_val = 0x1);
    • cell 0x001 stores the key 0x00000000, which cannot belong to bucket 0x05, because the simulation uses the dummy hash.

    When the key 0x05000002 is added, an interesting situation occurs: again, cell 0x001 is written twice.


    • the write at 355 ns records the new value, because the empty_ptr_storage module reported that 0x001 is free (we are "lucky" that the current algorithm of that module returns the lowest address it considers free);
    • the write at 365 ns updates the cell that is the previous one in the chain, which the module believes is 0x001. As a result, 0x001 now points to 0x001.

    When we try to add the key 0x05000003, walking the chain loops: starting from 445 ns we endlessly run along the chain, reading the same address over and over.
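    The corrupted state from the log can be reproduced as a toy model: cell 0x001 points to itself, so a naive chain walk never terminates. A bounded walk (a sketch, not the RTL) makes the loop visible:

```python
# The table state after the corrupting delete/insert, taken from the log.
data_table = {
    0x000: {"key": 0x05000000, "next_ptr": 0x001, "next_ptr_val": 1},
    0x001: {"key": 0x00000000, "next_ptr": 0x001, "next_ptr_val": 1},  # points to itself!
}

def walk_chain(head_ptr, max_steps=10):
    visited, ptr = [], head_ptr
    for _ in range(max_steps):
        if ptr in visited:
            return visited, True          # revisited an address: loop detected
        visited.append(ptr)
        entry = data_table[ptr]
        if not entry["next_ptr_val"]:
            return visited, False         # reached the tail normally
        ptr = entry["next_ptr"]
    return visited, True

print(walk_chain(0x000))                  # ([0, 1], True)
```

    The RTL has no such step limit, which is exactly why the FSM stays in GO_ON_CHAIN_S forever.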


    What was the mistake


    Obviously, the error was introduced by the module that deletes data (data_table_delete).


    At 255 ns it should have cleared the next_ptr_val flag in cell 0x000, and at 265 ns it should have written all zeros to cell 0x001. That is what the module's code intended, but, as we can see, that is not what happened.


    The problem was that the rd_addr and rd_data read at the end of the chain had to be saved separately, whereas we used the values from the read that had just completed.
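    In software terms, the essence of the fix can be sketched like this (a hypothetical Python model; updating head_table when deleting the chain head is omitted for brevity):

```python
# table: addr -> [key, value, next_ptr, next_ptr_val]; head: (ptr, ptr_val).
# The point of the fix: prev_addr is saved while walking the chain, so the
# unlink uses the previous cell's saved address, not the latest read.
def delete_key(head, key, table):
    ptr, ptr_val = head
    prev_addr = None
    while ptr_val:
        entry = table[ptr]
        if entry[0] == key:
            if prev_addr is not None:
                # relink the previous cell around the deleted one
                table[prev_addr][2:4] = entry[2:4]
            table[ptr] = [0, 0, 0, 0]      # clear the freed cell
            return "DELETE_SUCCESS"
        prev_addr = ptr                     # remember the cell we just read
        ptr, ptr_val = entry[2], entry[3]
    return "DELETE_NOT_SUCCESS_NO_ENTRY"
```

    In the buggy RTL, the equivalent of prev_addr was taken from the freshest read instead, which is how the deleted cell ended up rewriting itself.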


    Such errors (an extra clock cycle of delay, data overwritten at the wrong time, etc.) are very typical of RTL code. They are not particularly interesting in themselves, so I won't dwell on them in this article.


    What mistakes were made (during development)


    Project housekeeping was "imperfect"


    The fact is that I never brought the project to the logical conclusion I had imagined. Why? By now I don't remember.


    For example, the project README does not say where or how this module was used / tested.


    Compare two phrases:


    1. This project is used in production in a 100G switch.
    2. This project was written for fun over a few weekends, over cans of tomato juice.

    If I had explicitly stated that I wrote this module just for fun on weekends and never deployed it anywhere, then perhaps third-party developers would not have used it and would have saved their time (then again, I would never have learned about the bug, and you would not be reading this article).


    When I started digging into where the problem was in the code, I was upset to see that it was with the variable prev_rd_addr. The block that assigns its value looks like this:


    always_ff @( posedge clk_i or posedge rst_i )
      if( rst_i )
        prev_rd_addr <= '0;
      else
        if( rd_en_o ) //FIXME
          prev_rd_addr <= rd_addr;

    A FIXME without an explanation is bad. The extra five minutes spent describing the problem always pay off later.


    Conclusions:


    • if you publish something as open source, state explicitly how and where it was tested on hardware (if it was). If it has not been tested, say so explicitly.
    • if you are working on a project alone, ask someone (possibly an outsider, for example on electronix.ru) to review the code. They may give useful advice or comments.
    • even if there are problems you could not (or were too lazy to) solve, state them explicitly in a file like KNOWN_PROBLEMS or KNOWN_BUGS.

    Coverage


    It is easy to see that the points the coverage looks at do not give a complete picture:


    • the "history" is not taken into account: how requests followed one another and which buckets they touched;
    • it does not take into account in which part of the chain the search stopped or the data was deleted (head / middle / tail).

    Let's fix that:


    We introduce a type ht_chain_state_t that shows where in the chain we ended up during the operation:


    typedef enum int unsigned {
      NO_CHAIN,
      IN_HEAD,
      IN_MIDDLE,
      IN_TAIL,
      IN_TAIL_NO_MATCH
    } ht_chain_state_t;
    // add it to ht_result_t
    ...
    // only for verification
      ht_chain_state_t            chain_state;
    } ht_result_t;

    In the corresponding modules we add this analysis. For data_table_delete it looks like this:


    ht_chain_state_t chain_state;
    always_ff @( posedge clk_i or posedge rst_i )
      if( rst_i )
        chain_state <= NO_CHAIN;
      else
        if( state != next_state )
          begin
            case( next_state )
              NO_VALID_HEAD_PTR_S     : chain_state <= NO_CHAIN;
              IN_TAIL_WITHOUT_MATCH_S : chain_state <= IN_TAIL_NO_MATCH;
              KEY_MATCH_IN_HEAD_S     : chain_state <= IN_HEAD;
              KEY_MATCH_IN_MIDDLE_S   : chain_state <= IN_MIDDLE;
              KEY_MATCH_IN_TAIL_S     : chain_state <= IN_TAIL;
              // no default: just keep old value
            endcase
          end
    // put it into the result so it can be analyzed in ht_res_monitor
    assign result_o.chain_state = chain_state;

    Changes to ht_res_monitor:


    
    // result history
    ht_result_t result_history [HISTORY_DELAY:1];
    always_ff @( posedge clk_i )
      begin
        if( result_locked_val )
          begin
            result_history[1] <= result_locked;
            for( int i = 2; i <= HISTORY_DELAY; i++ )
              begin
                result_history[i] <= result_history[i-1];
              end
          end
      end
    // a 1 in the mask means the bucket in history matched the current one
    logic [HISTORY_DELAY:1] bucket_hist_mask;
    always_comb
      begin
        for( int i = 1; i <= HISTORY_DELAY; i++ )
          bucket_hist_mask[i] = ( result_history[i].bucket == result_locked.bucket );
      end

    Add to covergroup:


    ...
      CMDOP_D1:   coverpoint result_history[1].cmd.opcode;
      CMDOP_D2:   coverpoint result_history[2].cmd.opcode;
      CMDRES_D1:  coverpoint result_history[1].rescode;
      CMDRES_D2:  coverpoint result_history[2].rescode;
      CHAIN:      coverpoint result_locked.chain_state;
      BUCK_HIST_MASK: coverpoint bucket_hist_mask;
      CMDOP_HISTORY_D2: cross CMDOP_D2, CMDOP_D1, CMDOP, BUCK_HIST_MASK;
      CMDRES_HISTORY_D2: cross CMDRES_D2, CMDRES_D1, CMDRES, BUCK_HIST_MASK {
        ignore_bins not_check_now = binsof( CMDRES    ) intersect{ INSERT_NOT_SUCCESS_TABLE_IS_FULL } ||
                                    binsof( CMDRES_D1 ) intersect{ INSERT_NOT_SUCCESS_TABLE_IS_FULL } ||
                                    binsof( CMDRES_D2 ) intersect{ INSERT_NOT_SUCCESS_TABLE_IS_FULL };
      }
      CMDOP_CHAIN: cross CMDOP, CHAIN {
        ignore_bins insert_in_middle        = binsof( CMDOP ) intersect { OP_INSERT        } &&
                                              binsof( CHAIN ) intersect { IN_MIDDLE        };
        ignore_bins insert_in_tail_no_match = binsof( CMDOP ) intersect { OP_INSERT        } &&
                                              binsof( CHAIN ) intersect { IN_TAIL_NO_MATCH };
      }

    To make clear what I am going to analyze, here is an example bin for CMDOP_HISTORY_D2:


    bin <auto[OP_DELETE],auto[OP_SEARCH],auto[OP_SEARCH],auto['b10]>

    A hit occurs if:


    • the current command is OP_SEARCH,
    • the previous command was OP_SEARCH in a bucket different from the current one,
    • the command before last was OP_DELETE in a bucket that coincides with the current one.

    Before all these fixes I had three simple hand-written tests. Let's run them:


    Coverpoint cg::CMDOP                               100.0%        100 Covered                   
    Coverpoint cg::CMDRES                               85.7%        100 Uncovered                 
    Coverpoint cg::CMDOP_D1                            100.0%        100 Covered                   
    Coverpoint cg::CMDOP_D2                            100.0%        100 Covered                   
    Coverpoint cg::CMDRES_D1                            85.7%        100 Uncovered                 
    Coverpoint cg::CMDRES_D2                            85.7%        100 Uncovered                 
    Coverpoint cg::CHAIN                               100.0%        100 Covered                   
    Coverpoint cg::BUCK_HIST_MASK                      100.0%        100 Covered                   
    Coverpoint cg::BUCKOCUP                            100.0%        100 Covered                   
    Cross cg::CMDOP_BUCKOCUP                           100.0%        100 Covered                   
    Cross cg::CMDRES_BUCKOCUP                           84.6%        100 Uncovered                 
    Cross cg::CMDOP_HISTORY_D2                          18.5%        100 Uncovered 
      covered/total bins:                                20        108                           
      missing/total bins:                                88        108 
    Cross cg::CMDRES_HISTORY_D2                          3.1%        100 Uncovered                 
      covered/total bins:                                27        864                           
      missing/total bins:                               837        864
    Cross cg::CMDOP_CHAIN                               84.6%        100 Uncovered   

    (I removed the rest of the log because it is very large)


    As you can see, the HISTORY points have disgusting coverage (18.5% and 3.1%). Three hand-written tests could not produce the desired variety.


    Why do I analyze only the last three results, including the current one?
    This number was picked arbitrarily. Of course, the more the better, but the more combinations you get, the more tests you will need for 100% coverage.


    Where the line is and what number is optimal for analysis, I do not know. This value should probably equal the module's worst-case latency measured in commands (I did not count it exactly; about 5 or 6).


    Conclusions:


    • coverage is your friend and ally. At no extra cost it watches the points you have specified and gathers statistics. You can automatically cross different variables and check that all possible combinations of those variables actually occur.
    • the more situations the module visits, the greater the likelihood of finding errors.
    • if you have access to the RTL that you are testing (white box), think about what corner cases exist there and capture them in bins for analysis.

    No randomized test


    When verifying large, complex systems you need to make sure that almost all scenarios are covered.
    Creating all of them by hand is hard. The way out is randomized testing. But simply generating random input data is not the best idea: there is the Constrained Random Verification technique.


    Yes, we feed in random values, but they are constrained, or obey some simple (or not so simple) law, so as to reproduce exactly what we need.


    Let's write a function that returns a random key in the desired range of buckets and key values:


    function bit [KEY_WIDTH-1:0] gen_rand_key( int min_bucket_num  = 0,
                                               int max_bucket_num = ( 2**BUCKET_WIDTH - 1 ),
                                               int max_key_value  = ( 2**( KEY_WIDTH - BUCKET_WIDTH ) - 1 ) );
      bit [BUCKET_WIDTH-1:0] bucket_num;
      bit [KEY_WIDTH-1:0]    gen_key;
      if( hash_table::HASH_TYPE != "dummy" )
        begin
          $display("%m: hash_type = %s not supported here!", hash_table::HASH_TYPE );
          $fatal();
        end
      bucket_num = $urandom_range( max_bucket_num, min_bucket_num );
      gen_key    = $urandom_range( max_key_value,  0              );
      // replace the high bits with bucket_num (needed for the dummy hash)
      gen_key[ KEY_WIDTH - 1 : KEY_WIDTH - BUCKET_WIDTH ] = bucket_num;
      return gen_key;
    endfunction

    Now we will generate random transactions for buckets [0;15] and key values [0;7].


    // testing small amount of buckets with random commands
    task test_05( );
      ht_command_t cmds[$];
      $display("%m:");
      for( int c = 0; c < 5000; c++ )
        begin
          `CMD_SEARCH      ( gen_rand_key( 0, 15, 7 ) )
          `CMD_INSERT_RAND ( gen_rand_key( 0, 15, 7 ) )
          `CMD_DELETE      ( gen_rand_key( 0, 15, 7 ) )
        end
      // shaken, not stirred :)
      cmds.shuffle( );
      foreach( cmds[i] )
        begin
          send_to_dut_c( cmds[i] );
        end
    endtask

    As you can see, even this does not give full coverage:


    Cross cg::CMDOP_HISTORY_D2                          98.1%        100 Uncovered           
      covered/total bins:                               106        108                           
      missing/total bins:                                 2        108   
    Cross cg::CMDRES_HISTORY_D2                         81.1%        100 Uncovered 
      covered/total bins:                               701        864                           
      missing/total bins:                               163        864

    If you look at which bins were not covered, it becomes clear that a separate test for a single bucket is needed:


    // testing only one bucket with random commands
    task test_06( );
      ht_command_t cmds[$];
      $display("%m:");
      for( int c = 0; c < 1000; c++ )
        begin
          `CMD_SEARCH     ( gen_rand_key( 0, 0, 7 ) )
          `CMD_INSERT_RAND( gen_rand_key( 0, 0, 7 ) )
          `CMD_DELETE     ( gen_rand_key( 0, 0, 7 ) )
        end
      cmds.shuffle( );
      foreach( cmds[i] )
        begin
          send_to_dut_c( cmds[i] );
        end
    endtask

    Result:


    Cross cg::CMDOP_HISTORY_D2                         100.0%        100 Covered                
      covered/total bins:                               108        108                           
      missing/total bins:                                 0        108    
    Cross cg::CMDRES_HISTORY_D2                         99.1%        100 Uncovered 
      covered/total bins:                               857        864                           
      missing/total bins:                                 7        864 

    I additionally added a test that inserts a huge amount of data (in order to produce rescode=INSERT_NOT_SUCCESS_TABLE_IS_FULL).


    Note:


    • of course, random generation plus shuffle is not the most elegant idea. It is effective and easy to implement, but not elegant. SystemVerilog has a special construct, the constraint block: it can be used to create meaningful transactions with the necessary restrictions.
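    To illustrate, a constraint block could look something like this. This is only a sketch: the class and constraint names are hypothetical and not from the project; it assumes ht_opcode_t, KEY_WIDTH, and BUCKET_WIDTH are visible from the hash_table package, and that the dummy hash takes the high bits of the key.

```systemverilog
// Hypothetical transaction class, a sketch only. Instead of picking
// random numbers by hand, we declare what a "legal" command looks
// like and let the constraint solver do the rest.
class ht_rand_cmd;
  rand ht_opcode_t            opcode;
  rand bit [BUCKET_WIDTH-1:0] bucket_num;
  rand bit [KEY_WIDTH-1:0]    key;

  // keep traffic inside a few buckets to provoke collisions
  constraint c_bucket { bucket_num inside { [0:15] }; }

  // limit the low key bits and force the high bits to carry
  // the bucket number (the dummy hash hashes by the high bits)
  constraint c_key {
    key[KEY_WIDTH-BUCKET_WIDTH-1:0] < 8;
    key[KEY_WIDTH-1 -: BUCKET_WIDTH] == bucket_num;
  }
endclass

// usage: ht_rand_cmd c = new(); assert( c.randomize() );
```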

    Conclusions:


    • don't forget about randomized testing. Do not feed in purely random data; it is better to provide several tests in which different parameters are randomized.

    No real-time problem tracking


    The nuance of this project is that the cause of an error and its manifestation can be separated by a huge amount of simulation time.


    If an error occurs (and it is detected at the ht_res output), you will have to “rewind” simulation time, logs, and so on. With large, complex systems this can be a time-consuming process.


    Our task as developers (and verification engineers) is to catch the error as close as possible, in time and in place (module), to where it actually originates.


    There are several storage blocks in this project:


    • head_table
    • data_table
    • empty_ptr_storage

    This bug meant that at some point in time the data in the table became incorrect.
    If we find out about it immediately (in terms of simulation time), the bug will be easier to debug and fix.


    Let's derive the rules that the data in the tables must obey.
    head_table:


    • it cannot contain two identical pointers with ptr_val set
    • a pointer with ptr_val set cannot point to a cell that is marked as empty

    data_table:


    • a key must belong to its chain / bucket
    • a chain must not form a ring
    • a next_ptr with next_ptr_val set must not point to a cell that is marked as empty
    • there must be no cells that nothing points to (neither head_table nor any chain)
    • there must be no reads of cells that are marked as empty

    empty_ptr_storage:


    • the data_table_* modules must not ask to free a cell that is already empty
    • the module has no right to report a cell as empty if it is not (if it has not been freed before)

    A note about “at any point in time”:


    • there is a subtle nuance: at some moments the table may legitimately be inconsistent (for example, an insert requires modifying two cells, so after the first write the data is not yet consistent). We will check the table only when nothing is being written to it.
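    As a sketch of how one of these rules can be mechanized, here is how the “a chain must not form a ring” check might look. The names head_model and data_model are my own for this sketch: they are assumed shadow copies of the RTL memories kept up to date by the monitor; the real implementation lives in tables_monitor.sv and differs.

```systemverilog
// Sketch only: head_model/data_model are hypothetical shadow copies
// of the RTL memories, TABLE_ADDR_WIDTH is the data_table address width.
function automatic void check_no_ring( input int bucket );
  bit [TABLE_ADDR_WIDTH-1:0] ptr;
  int steps = 0;

  if( !head_model[bucket].ptr_val )
    return;

  ptr = head_model[bucket].ptr;
  while( data_model[ptr].next_ptr_val )
    begin
      ptr = data_model[ptr].next_ptr;
      steps++;
      // a chain can never contain more cells than the whole table,
      // so exceeding that count means we are walking in a circle
      if( steps > 2**TABLE_ADDR_WIDTH )
        begin
          $error( "chain in bucket %0d forms a ring", bucket );
          return;
        end
    end
endfunction
```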

    I wrote tables_monitor, which contains reference models of head_table, data_table, and empty_ptr.
    It also has a function that walks the tables and checks the conditions we described earlier. The full file can be viewed here.


    Let's roll back the fix to reproduce the bug and look at the log:


    ...
    1195: top_tb.dut.d_tbl.del_eng.print: opcode = OP_DELETE key = 0x01000000 value = 0x0000 head_ptr = 0x001 head_ptr_val = 0x1
    1195: top_tb.dut.d_tbl.del_eng.print: IDLE_S -> READ_HEAD_S
    1225: top_tb.dut.d_tbl.del_eng.print: RD: addr = 0x001 key = 0x01001000 value = 0x1235 next_ptr = 0x002 next_ptr_val = 0x1
    1225: top_tb.dut.d_tbl.del_eng.print: READ_HEAD_S -> GO_ON_CHAIN_S
    1255: top_tb.dut.d_tbl.del_eng.print: RD: addr = 0x002 key = 0x01000001 value = 0x3ff2 next_ptr = 0x000 next_ptr_val = 0x1
    1285: top_tb.dut.d_tbl.del_eng.print: RD: addr = 0x000 key = 0x01000002 value = 0x5429 next_ptr = 0x004 next_ptr_val = 0x1
    1315: top_tb.dut.d_tbl.del_eng.print: RD: addr = 0x004 key = 0x01000000 value = 0x1cc0 next_ptr = 0x000 next_ptr_val = 0x0
    1315: top_tb.dut.d_tbl.del_eng.print: GO_ON_CHAIN_S -> KEY_MATCH_IN_TAIL_S
    1325: top_tb.dut.d_tbl.del_eng.print: WR: addr = 0x004 key = 0x01000000 value = 0x1cc0 next_ptr = 0x000 next_ptr_val = 0x0
    1325: top_tb.dut.d_tbl.del_eng.print: KEY_MATCH_IN_TAIL_S -> CLEAR_RAM_AND_PTR_S
    1335: top_tb.dut.d_tbl.del_eng.print: WR: addr = 0x004 key = 0x00000000 value = 0x0000 next_ptr = 0x000 next_ptr_val = 0x0
    1335: top_tb.dut.d_tbl.del_eng.print: RES: key = 0x01000000 value = 0x0000 rescode = DELETE_SUCCESS chain_state = IN_TAIL
    1335: top_tb.dut.d_tbl.del_eng.print: CLEAR_RAM_AND_PTR_S -> IDLE_S
    1335: ht_tb.print: IN_MONITOR: key = 0x01000000 value = 0x0000 rescode = DELETE_SUCCESS chain_state = IN_TAIL
    1335: top_tb.dut.d_tbl.empty_ptr_storage.print: add_empty_ptr: 0x004
    ** Error: ERROR: addr = 0x004. This addr is empty, but ptr is val
    Time: 1340 ns  Scope: top_tb.tm.check_one_addr File: ../tb/tables_monitor.sv Line: 119
    ** Error: ERROR: addr = 0x004 key=0x00000000 don't match bucket_num = 0x01
    Time: 1340 ns  Scope: top_tb.tm.check_one_addr File: ../tb/tables_monitor.sv Line: 127

    It turns out that the error was already caught by those three manual tests (without any randomized ones)!


    Walking over the cells showed:


    • something points to address 0x004, although the reference model believes this cell is empty, because it was freed at 1335 ns.
    • cell 0x004 contains key = 0x00000000, which cannot belong to bucket 0x01 (since a dummy hash is used).

    Conclusions:


    • think about what situations must not occur during correct operation. Write down the rules and check them! Any deviation from the norm clearly indicates incorrect (or unexpected) behavior.
    • very often such checks are built into the communication interfaces between modules; in that case it is better to use a construct like an assertion.
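    For example, the empty_ptr_storage rule “do not free a cell that is already empty” could be expressed as a concurrent assertion on the module's interface. This is a sketch under assumptions: empty_map is a hypothetical shadow bitmap of free cells, and add_empty_ptr_en / add_empty_ptr are assumed signal names, not the project's actual ports.

```systemverilog
// Sketch: empty_map is a hypothetical shadow bitmap of free cells,
// add_empty_ptr_en/add_empty_ptr are assumed interface signal names.
logic [2**TABLE_ADDR_WIDTH-1:0] empty_map;

// a cell being freed must not already be marked as free
assert property (
  @( posedge clk_i ) disable iff( rst_i )
    add_empty_ptr_en |-> !empty_map[ add_empty_ptr ]
) else
  $error( "trying to free ptr 0x%0x twice", add_empty_ptr );
```

    The advantage over a monitor-side check is that the assertion fires on the exact clock edge of the violation, right at the module boundary where it happened.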

    Conclusion


    Fools say that they learn from their own experience, I prefer to learn from the experience of others. (Otto von Bismarck)

    In this article, using the hash table project as an example, I described what a simple testbench for verifying simple ASIC/FPGA IP cores can consist of.


    Sources:



    My main mistake, I think, is that I did not finish the project at the time. I had a verification plan, and some of the ideas we implemented in this article were already noted there.


    I hope this article was interesting for beginner RTL developers who want to improve their testbenches.


    Share in the comments your own painful experiences verifying RTL code :)


    Thanks for your attention! I will be glad to answer questions and comments, here or by private message.


    PS:
    This is my first post on Habré written in markdown.
    Tell me please:


    • why is the code not highlighted properly, even though I marked it as systemverilog? GitHub Flavored Markdown is supposed to know it...
    • why does the spoiler tag cause all subsequent text to stop being recognized as markdown?