Verilog. Digital filter on RAM
What if you need to fit a large digital filter into an FPGA? What if the board is already routed, the hardware is old, and there is little space left in the project? This post looks at one possible implementation of a digital FIR filter on an Altera Cyclone II EP2C15 FPGA. It is a continuation of an earlier post on the same theme from the sandbox.
This post describes how to build a shift register in RAM, which saves logic elements (LEs), and how to turn it into a digital filter.
How does a filter work? The basic operation is multiply-accumulate: the filter coefficients are multiplied by the values in the shift register and the products are summed. That is all there is to it, if you do not go into details. With the ingredients named, let's get down to business.
Multiply-accumulate
Assume we have already chosen the desired frequency response and the filter order, computed the coefficients, and know the input data rate. It is even better to make all of these parameters of the design, so I try to do that. Here is my implementation of the multiply-accumulate unit:
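module mult
#(parameter COEF_WIDTH = 24,
  parameter DATA_WIDTH = 16,
  parameter ADDR_WIDTH = 9,
  parameter MULT_WIDTH = COEF_WIDTH + DATA_WIDTH)
(
    input  wire clk,
    input  wire en,
    input  wire [ (ADDR_WIDTH-1) : 0 ] ad,          // current tap address
    input  wire signed [ (COEF_WIDTH-1) : 0 ] coe,  // coefficient from the coef module
    input  wire signed [ (DATA_WIDTH-1) : 0 ] pip,  // sample from the shift-register RAM
    output wire signed [ (DATA_WIDTH-1) : 0 ] dout
);
    wire signed [ (MULT_WIDTH-1) : 0 ] mu = coe * pip;               // full-width product
    reg signed [ (MULT_WIDTH-1) : 0 ] rac = {(MULT_WIDTH){1'b0}};    // accumulator
    reg signed [ (DATA_WIDTH-1) : 0 ] ro = {DATA_WIDTH{1'b0}};       // output register
    assign dout = ro;

    // coe and pip come from registered RAM outputs, so they arrive one clock
    // after the address that selected them.
    always @(posedge clk)
        if(en)
            if(ad == {ADDR_WIDTH{1'b0}})
            begin
                // Start of a new pass: restart the accumulator and latch the
                // finished sum, taking the DATA_WIDTH bits just below the MSB.
                rac <= mu;
                ro <= rac[ (MULT_WIDTH-2) -: (DATA_WIDTH) ];
            end
            else
                rac <= rac + mu;
endmodule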
Why is ADDR_WIDTH = 9? Because the filter was chosen to have 2^9 = 512 taps. First, a power of two makes it easy to derive the clock from a divider or PLL. Second, I could afford to raise the clock by a factor of 512, because the sample rate was only 16 kHz (16 kHz x 512 = 8.192 MHz). More on that later. The code is not very readable because of the parameterization, but it can be figured out.
Filter coefficients
Did you read the sandbox post linked at the top? Remember the RAM template there? That template no longer suits us: I could not get that RAM to read and write in a single clock cycle. Maybe that is just my lack of knowledge, but the filter coefficients are now stored in this module:
module coef
#(parameter DATA_WIDTH=24, parameter ADDR_WIDTH=9)
(
    input wire [(DATA_WIDTH-1):0] data,
    input wire [(ADDR_WIDTH-1):0] addr,
    input wire we,
    input wire clk,
    output wire [(DATA_WIDTH-1):0] coef_rom
);
    reg [DATA_WIDTH-1:0] rom[2**ADDR_WIDTH-1:0];
    reg [(DATA_WIDTH-1):0] data_out;
    assign coef_rom = data_out;

    initial
    begin
        rom[0  ] = 24'b000000000000000000000000;
        rom[1  ] = 24'b000000000000000000000001;
        // ... remaining coefficients omitted ...
        rom[510] = 24'b000000000000000000000001;
        rom[511] = 24'b000000000000000000000000;
    end

    always @ (posedge clk)
    begin
        data_out <= rom[addr];   // registered read: data appears one clock after addr
        if (we)
            rom[addr] <= data;
    end
endmodule
About 508 coefficients were omitted so as not to bore the reader. Why 24 bits rather than 16? I like the resulting spectrum better, but this is not important. Changing the coefficients does not take long. Alternatively, you can load a memory initialization file with the $readmemb or $readmemh system task inside the initial block.
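As a rough sketch of the file-based alternative, the module below loads the coefficients with $readmemh instead of listing them in the source. The module name coef_file, the INIT_FILE parameter and the file name coef.hex are made up for illustration; the file is assumed to hold one 24-bit hexadecimal word per line. The write port is dropped here on the assumption that the coefficients are fixed.

```verilog
// Sketch only: coefficient ROM initialized from a text file.
// coef_file, INIT_FILE and "coef.hex" are hypothetical names.
module coef_file
#(parameter DATA_WIDTH = 24,
  parameter ADDR_WIDTH = 9,
  parameter INIT_FILE  = "coef.hex")
(
    input  wire [(ADDR_WIDTH-1):0] addr,
    input  wire                    clk,
    output reg  [(DATA_WIDTH-1):0] coef_rom
);
    // Ascending index range, so the first word in the file goes to address 0.
    reg [(DATA_WIDTH-1):0] rom [0:2**ADDR_WIDTH-1];

    // Words are read in order, one per address, starting at address 0.
    initial
        $readmemh(INIT_FILE, rom);

    // Same registered read as in the coef module: one clock of latency.
    always @(posedge clk)
        coef_rom <= rom[addr];
endmodule
```

With $readmemb the file would contain binary words instead, which matches the 24'b literals used above.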
Shift register
This is actually the main reason I am writing this. Some readers will think they already knew all of it. Others may think something unkind about the author, something about reinventing the wheel.
This section shows how to make a shift register out of RAM using a wrapper. Probably everyone has read in the handbook for their FPGA that RAM can work as a shift register. How? I did it, and there is nothing complicated about it. But why bother? The Cyclone family is positioned as memory-rich: "devices feature embedded memory structures to address the on-chip memory needs of FPGA designs." You need to be able to use this memory. The problem is solved in two parts: the RAM and the wrapper. The RAM is similar to the one that stores the filter coefficients:
module pip
#(parameter DATA_WIDTH=16, parameter ADDR_WIDTH=9)
(
    input wire [(DATA_WIDTH-1):0] data,
    input wire [(ADDR_WIDTH-1):0] read_addr, write_addr,
    input wire we,
    input wire clk,
    output wire [(DATA_WIDTH-1):0] pip_ram
);
    reg [DATA_WIDTH-1:0] ram[2**ADDR_WIDTH-1:0];
    reg [(DATA_WIDTH-1):0] data_out;
    assign pip_ram = data_out;

    always @ (posedge clk)
    begin
        data_out <= ram[read_addr];
        if (we)
            ram[write_addr] <= data;
    end
endmodule
The only difference is that this RAM has no initial block, so it is automatically filled with zeros. By the way, the same trick can be used for the filter coefficients if there are fewer than 2^N of them: the locations that are not initialized stay zero.
Now the wrapper itself:
module upr
#(parameter COEF_WIDTH = 24, parameter DATA_WIDTH = 16, parameter ADDR_WIDTH = 9)
(
    input wire clk,
    input wire en,
    input wire [ (DATA_WIDTH-1) : 0 ] ram_upr,    // value read back from the RAM
    input wire [ (DATA_WIDTH-1) : 0 ] data_in,    // new input sample
    output wire [ (DATA_WIDTH-1) : 0 ] upr_ram,   // value written to the RAM
    output wire we_ram,
    output wire [ (ADDR_WIDTH-1) : 0 ] adr_out
);
    // Declarations are placed before their first use; strict tools reject the reverse order.
    localparam state0 = 3'b001,
               state1 = 3'b010,
               state2 = 3'b100;
    reg [ 2 : 0 ] r_state = state0;
    reg [ (ADDR_WIDTH-1) : 0 ] r_adr = {ADDR_WIDTH{1'b0}};

    // Address 0 takes the new sample; every other address takes the previous RAM word.
    assign upr_ram = (r_adr == {ADDR_WIDTH{1'b0}}) ? data_in : ram_upr;
    assign we_ram = (r_state == state1) ? 1'b1 : 1'b0;
    assign adr_out = r_adr;

    always @(posedge clk)
        if(en)
        begin
            case(r_state)
                state0:
                    r_state <= state1;
                state1:
                    r_state <= state1;
                state2:
                begin
                end
            endcase
        end

    always @(posedge clk)
        case(r_state)
            state0:
                r_adr <= {ADDR_WIDTH{1'b0}};
            state1:
                r_adr <= r_adr + 1'b1;   // wraps to 0 after the last address
            state2:
            begin
            end
        endcase
endmodule
The same address drives both the coefficient RAM and the shift-register RAM. The value read from the shift-register RAM on the previous clock is fed back through the wrapper and written at the current address. So the shift is not done in a single clock cycle; one value moves per clock, and the whole register shifts by one position per pass through the addresses. A new input word is written each time the address is zero.
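A small simulation can make the shifting visible. The testbench below is only a sketch: it assumes the pip and upr modules above are compiled together with it, shrinks the register to four words (ADDR_WIDTH = 2), and feeds the sample numbers 1, 2, 3, ... as input. The names tb_shift, u_ram, u_ctl and sample are made up for the example.

```verilog
`timescale 1ns/1ps
// Sketch of a simulation-only testbench for the RAM shift register.
// Assumes the pip and upr modules from this post are compiled with it.
module tb_shift;
    localparam DW = 16, AW = 2;        // 4-word register keeps the trace short

    reg           clk    = 1'b0;
    reg  [DW-1:0] sample = 16'd1;      // input word, numbered 1, 2, 3, ...
    wire [DW-1:0] ram_q, ram_d;
    wire [AW-1:0] addr;
    wire          we;

    pip #(.DATA_WIDTH(DW), .ADDR_WIDTH(AW)) u_ram (
        .clk(clk), .we(we), .data(ram_d),
        .read_addr(addr), .write_addr(addr), .pip_ram(ram_q));

    upr #(.DATA_WIDTH(DW), .ADDR_WIDTH(AW)) u_ctl (
        .clk(clk), .en(1'b1), .data_in(sample), .ram_upr(ram_q),
        .we_ram(we), .adr_out(addr), .upr_ram(ram_d));

    always #5 clk = ~clk;

    // Present the next sample at the end of each pass, so that it is stable
    // while the address is 0 and gets written there.
    always @(posedge clk)
        if (we && addr == {AW{1'b1}})
            sample <= sample + 1'b1;

    initial begin
        wait (sample == 16'd7);        // six full passes have completed
        @(negedge clk);                // let the last write settle
        $display("ram = %0d %0d %0d %0d",
                 u_ram.ram[0], u_ram.ram[1], u_ram.ram[2], u_ram.ram[3]);
        // Newest sample at address 0, older samples at higher addresses.
        if (u_ram.ram[0] === 16'd6 && u_ram.ram[1] === 16'd5 &&
            u_ram.ram[2] === 16'd4 && u_ram.ram[3] === 16'd3)
            $display("PASS");
        else
            $display("FAIL");
        $finish;
    end
endmodule
```

In simulation the RAM starts as X rather than zero, which is why the check waits until every location has been overwritten by a real sample.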
Why do I stubbornly use a state machine, even though some states are never used? Recall the post linked at the very beginning. This module now runs twice as fast, which means that, all else being equal, it is idle half the time. In theory that half can be used for something: recomputing the filter coefficients for adaptive filtering, or running a second filter (something like time slots). None of that is done here and the FSM is not needed, but I left this atavism in anyway. It is always easier to remove an FSM than to add one.
Summary
Here is the top-level file generated from the schematic:
module filtr_ram(
    CLK,
    D_IN,
    MULT
);
    input CLK;
    input [15:0] D_IN;
    output [15:0] MULT;

    wire SYNTHESIZED_WIRE_13;
    wire [15:0] SYNTHESIZED_WIRE_1;
    wire [8:0] SYNTHESIZED_WIRE_14;
    wire SYNTHESIZED_WIRE_4;
    wire [15:0] SYNTHESIZED_WIRE_15;
    wire SYNTHESIZED_WIRE_6;
    wire [0:23] SYNTHESIZED_WIRE_8;
    wire [23:0] SYNTHESIZED_WIRE_11;

    assign SYNTHESIZED_WIRE_4 = 1;
    assign SYNTHESIZED_WIRE_6 = 0;
    assign SYNTHESIZED_WIRE_8 = 0;

    pip b2v_inst(
        .we(SYNTHESIZED_WIRE_13),
        .clk(CLK),
        .data(SYNTHESIZED_WIRE_1),
        .read_addr(SYNTHESIZED_WIRE_14),
        .write_addr(SYNTHESIZED_WIRE_14),
        .pip_ram(SYNTHESIZED_WIRE_15));
    defparam b2v_inst.ADDR_WIDTH = 9;
    defparam b2v_inst.DATA_WIDTH = 16;

    upr b2v_inst1(
        .clk(CLK),
        .en(SYNTHESIZED_WIRE_4),
        .data_in(D_IN),
        .ram_upr(SYNTHESIZED_WIRE_15),
        .we_ram(SYNTHESIZED_WIRE_13),
        .adr_out(SYNTHESIZED_WIRE_14),
        .upr_ram(SYNTHESIZED_WIRE_1));
    defparam b2v_inst1.ADDR_WIDTH = 9;
    defparam b2v_inst1.COEF_WIDTH = 24;
    defparam b2v_inst1.DATA_WIDTH = 16;

    coef b2v_inst3(
        .we(SYNTHESIZED_WIRE_6),
        .clk(CLK),
        .addr(SYNTHESIZED_WIRE_14),
        .data(SYNTHESIZED_WIRE_8),
        .coef_rom(SYNTHESIZED_WIRE_11));
    defparam b2v_inst3.ADDR_WIDTH = 9;
    defparam b2v_inst3.DATA_WIDTH = 24;

    mult b2v_inst5(
        .clk(CLK),
        .en(SYNTHESIZED_WIRE_13),
        .ad(SYNTHESIZED_WIRE_14),
        .coe(SYNTHESIZED_WIRE_11),
        .pip(SYNTHESIZED_WIRE_15),
        .dout(MULT));
    defparam b2v_inst5.ADDR_WIDTH = 9;
    defparam b2v_inst5.COEF_WIDTH = 24;
    defparam b2v_inst5.DATA_WIDTH = 16;
endmodule
It is immediately clear what could be tidied up to make it prettier.
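One possible hand-tidied version of the generated file is sketched below. It assumes the four modules above (pip, upr, coef, mult) are unchanged, replaces the SYNTHESIZED_WIRE names with descriptive ones, and passes parameters with the #(...) syntax instead of defparam. The module name filtr_ram_clean and the wire and instance names are chosen for the example and are not from the original project.

```verilog
// Sketch: the same connections as filtr_ram, written by hand.
// All names other than the pip, upr, coef and mult modules and their
// ports are illustrative.
module filtr_ram_clean
#(parameter COEF_WIDTH = 24,
  parameter DATA_WIDTH = 16,
  parameter ADDR_WIDTH = 9)
(
    input  wire                    CLK,
    input  wire [(DATA_WIDTH-1):0] D_IN,
    output wire [(DATA_WIDTH-1):0] MULT
);
    wire                    we;      // RAM write enable, also enables the accumulator
    wire [(ADDR_WIDTH-1):0] addr;    // address shared by both memories
    wire [(DATA_WIDTH-1):0] ram_q;   // word read from the shift-register RAM
    wire [(DATA_WIDTH-1):0] ram_d;   // word written back to it
    wire [(COEF_WIDTH-1):0] coef_q;  // coefficient read from the coef module

    pip #(.DATA_WIDTH(DATA_WIDTH), .ADDR_WIDTH(ADDR_WIDTH)) shift_ram (
        .clk(CLK), .we(we), .data(ram_d),
        .read_addr(addr), .write_addr(addr), .pip_ram(ram_q));

    upr #(.COEF_WIDTH(COEF_WIDTH), .DATA_WIDTH(DATA_WIDTH),
          .ADDR_WIDTH(ADDR_WIDTH)) control (
        .clk(CLK), .en(1'b1), .data_in(D_IN), .ram_upr(ram_q),
        .we_ram(we), .adr_out(addr), .upr_ram(ram_d));

    // The write port of the coefficient memory is tied off, as in the original.
    coef #(.DATA_WIDTH(COEF_WIDTH), .ADDR_WIDTH(ADDR_WIDTH)) coef_mem (
        .clk(CLK), .we(1'b0), .addr(addr),
        .data({COEF_WIDTH{1'b0}}), .coef_rom(coef_q));

    mult #(.COEF_WIDTH(COEF_WIDTH), .DATA_WIDTH(DATA_WIDTH),
           .ADDR_WIDTH(ADDR_WIDTH)) mac (
        .clk(CLK), .en(we), .ad(addr),
        .coe(coef_q), .pip(ram_q), .dout(MULT));
endmodule
```

Note that the coef module still hard-codes addresses up to 511, so changing ADDR_WIDTH here would also require a matching coefficient table.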
Now, once more, about the result. The main drawback is that this is a fully serial filter: its clock must be 2^ADDR_WIDTH times faster than the input data rate. This can be eased if the impulse response of the filter is symmetric. The shift-register RAM then has to be split into two modules addressed by two separate addresses; the two values read are added first, and the sum is multiplied by the coefficient in the mult module, which needs one more input. The clock then only has to be 2^(ADDR_WIDTH-1) times the input rate.
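The arithmetic part of that symmetric variant might look roughly like the sketch below. It is not part of the original project: the module name mult_sym, the second data input and the SUM_WIDTH parameter are assumptions, and the two shift-register RAMs with their separate addresses are not shown. Here ADDR_WIDTH counts the coefficient addresses, so the default is 8 for the same 512 taps.

```verilog
// Sketch of a multiply-accumulate unit for a symmetric FIR filter.
// The two samples that share one coefficient are added before the
// multiplication, so only half of the products are needed.
module mult_sym
#(parameter COEF_WIDTH = 24,
  parameter DATA_WIDTH = 16,
  parameter ADDR_WIDTH = 8,                      // half as many addresses
  parameter SUM_WIDTH  = DATA_WIDTH + 1,         // pre-adder needs one extra bit
  parameter MULT_WIDTH = COEF_WIDTH + SUM_WIDTH)
(
    input  wire                           clk,
    input  wire                           en,
    input  wire        [(ADDR_WIDTH-1):0] ad,
    input  wire signed [(COEF_WIDTH-1):0] coe,
    input  wire signed [(DATA_WIDTH-1):0] pip_a,  // sample from the first RAM
    input  wire signed [(DATA_WIDTH-1):0] pip_b,  // mirrored sample from the second RAM
    output wire signed [(DATA_WIDTH-1):0] dout
);
    wire signed [(SUM_WIDTH-1):0]  pre = pip_a + pip_b;   // pre-adder
    wire signed [(MULT_WIDTH-1):0] mu  = coe * pre;       // full-width product
    reg  signed [(MULT_WIDTH-1):0] rac = {MULT_WIDTH{1'b0}};
    reg  signed [(DATA_WIDTH-1):0] ro  = {DATA_WIDTH{1'b0}};
    assign dout = ro;

    always @(posedge clk)
        if (en) begin
            if (ad == {ADDR_WIDTH{1'b0}}) begin
                rac <= mu;
                // Same bit positions as in mult, so the output scaling is unchanged.
                ro  <= rac[(COEF_WIDTH+DATA_WIDTH-2) -: DATA_WIDTH];
            end else
                rac <= rac + mu;
        end
endmodule
```

The extra bit from the pre-adder ends up as a spare bit at the top of the accumulator; the output slice is taken from the same positions as in the original mult module.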
Sources and project in Quartus 9.0
ifolder.ru/27556340