ubobrov March 30, 2017 at 17:57

Image rotation on FPGA

Half a year ago, I came across this video on the net .

The first thought was that it was very cool and I could never do that again. Time passed, articles were read, methods were studied, and I was looking for examples of the implementation of this, but to my chagrin, there was nothing concrete on the network. Having stumbled upon the calculation of trigonometric functions using CORDIC algorithms once, I decided to try to create my own image rotator on the FPGA.

CORDIC

Thus, CORDIC - an abbreviation of CO ordinate R otation D Igital C omputer.

This is a powerful tool for calculating hyperbolic and trigonometric functions. Most CORDIC algorithms work by the method of sequential approximation and are not very difficult to implement both in high-level programming languages and in HDL. I will not focus on the mathematics of the method, the reader can familiarize themselves with it on the network or via the links below.

In the public domain, I came across this implementation of the CORDIC algorithm in verilog. This kernel works in 2 modes: Rotate and Vector. For our purposes, the Rotate mode is suitable. It allows you to calculate the values of the functions sin and cos from a given angle in radians or degrees. The library can be configured both in the conveyor and in the combination version. The conveyor is suitable for our purposes, it has the largest Fmax . It will give out the sine and cosine values with a delay of 16 measures.

In RTL Viewer, the CORDIC module is displayed consisting of 16 blocks of the same type:

Each of which receives input from the previous one and the outputs are connected to the inputs of the next. It looks like this:

The core of the library works only in the first quadrant, which means that we have to calculate the remaining three by subtracting pi / 2 ourselves and changing the sign.

The approach I have chosen is not very correct because the quality of the rotated image is poor. This is due to the calculation of coordinates on the fly, without the use of additional data buffering and sequential calculation of coordinates for several passes, as is done in Shear .

The first instance of our rotator is the unit for calculating the quadrant and angle of rotation. The rotation angle is incremented every new frame by 1 degree. Upon reaching an angle of 90 degrees, the quadrant changes to the next one in turn, and the angle is either reset to zero or decremented by 1 degree every new frame.

It looks like this:

always @(posedge clk) begin
    if (!nRst) begin
        cordic_angle <= 17'd0;
        cordic_quadrant <= 2'd0;
        rotator_state <= 2'd0;
    end else begin
        if (frame_changed) begin
            case (rotator_state)
            2'd0: begin
                if (cordic_angle[15:8] == 8'd89) begin
                    cordic_quadrant <= cordic_quadrant + 1'b1;
                    rotator_state <= 2'd1;
                end    else
                    cordic_angle[15:8] <= cordic_angle[15:8] + 1'b1;
            end
            2'd1: begin
                if (cordic_angle[15:8] == 8'd1) begin
                    cordic_quadrant <= cordic_quadrant + 1'b1;
                    rotator_state <= 2'd0;
                end    else
                    cordic_angle[15:8] <= cordic_angle[15:8] - 1'b1;
            end
            default: rotator_state <= 2'd0;
            endcase
        end
    end    
end

Next, the angle value is fed to the CORDIC module, which calculates the values of sin and cos.

cordic CORDIC(
    .clk(clk),
    .rst(~nRst),
    .x_i(17'd19896),
    .y_i(16'd0),
    .theta_i(cordic_angle),
    .x_o(COS),
    .y_o(SIN),
    .theta_o(),
    .valid_in(),
    .valid_out()
    );

Further, it is not difficult to guess that the coordinates of each subsequent pixel will be calculated by the formula:

x '= cos (angle) * x - sin (angle) * y;
y '= sin (angle) * x + cos (angle) * y;

If you leave everything in this form, then the rotation will be centered at the origin. This rotation does not suit us, we need the picture to rotate around its axis with the center in the middle of the image. To do this, we need to carry out calculations relative to the center of the image.

parameter PRECISION   = 15;
parameter OUTPUT      = 12;
parameter INPUT       = 12;
parameter OUT_SIZE    = PRECISION + OUTPUT;
parameter BUS_MSB     = OUT_SIZE + 2;
wire [15:0] res_x = RES_X - 1'b1;
wire [15:0] res_y = RES_Y - 1'b1;
assign    dx = {1'b0, RES_X[11:1]};
assign    dy = {1'b0, RES_Y[11:1]};
always @(posedge clk) begin
    delta_x <= dx << PRECISION;
    delta_y <= dy << PRECISION;
еnd

Next, we calculate the values cos (angle) * x, sin (angle) * x, cos (angle) * y, sin (angle) * y.
You can calculate it like this:

always @(posedge clk) begin
    mult_xcos <= (xi - dx) * COS;
    mult_xsin <= (xi - dx) * SIN;
    mult_ycos <= (yi - dy) * COS;
    mult_ysin <= (yi - dy) * SIN;
end

But I decided to use the lpm_mult megafunctions . Their use significantly increases Fmax .

reg signed [BUS_MSB: 0] tmp_x, tmp_y, mult_xsin, mult_xcos, mult_ysin, mult_ycos;
reg signed [BUS_MSB: 0] delta_x = 0, delta_y = 0;
wire signed [11:0] dx, dy;
reg signed [BUS_MSB: 0] mxsin, mxcos, mysin, mycos;
reg signed [11:0] ddx, ddy;
always @(posedge clk) begin
    ddx <= xi - dx;
    ddy <= yi - dy;
end
wire signed [BUS_MSB-1: 0] mult_xcos1;
wire signed [BUS_MSB-1: 0] mult_xsin1;
wire signed [BUS_MSB-1: 0] mult_ycos1;
wire signed [BUS_MSB-1: 0] mult_ysin1;
lpm_mult M1(.clock(clk), .dataa(COS), .datab(ddx), .result(mult_xcos1));
defparam M1.lpm_widtha = 17;
defparam M1.lpm_widthb = 12;
defparam M1.lpm_pipeline = 1;
defparam M1.lpm_representation = "SIGNED";
lpm_mult M2(.clock(clk), .dataa(SIN), .datab(ddx), .result(mult_xsin1));
defparam M2.lpm_widtha = 17;
defparam M2.lpm_widthb = 12;
defparam M2.lpm_pipeline = 1;
defparam M2.lpm_representation = "SIGNED";
lpm_mult M3(.clock(clk), .dataa(COS), .datab(ddy), .result(mult_ycos1));
defparam M3.lpm_widtha = 17;
defparam M3.lpm_widthb = 12;
defparam M3.lpm_pipeline = 1;
defparam M3.lpm_representation = "SIGNED";
lpm_mult M4(.clock(clk), .dataa(SIN), .datab(ddy), .result(mult_ysin1));
defparam M4.lpm_widtha = 17;
defparam M4.lpm_widthb = 12;
defparam M4.lpm_pipeline = 1;
defparam M4.lpm_representation = "SIGNED";

After multiplication, we get the products whose sign we need to change in each of the following quadrants:

always @(posedge clk) begin
    mxcos <= mult_xcos1;
    mxsin <= mult_xsin1;
    mycos <= mult_ycos1;
    mysin <= mult_ysin1;
    case (cordic_quadrant)
    2'd0: begin
        mxsin <= -mult_xsin1;
    end
    2'd1: begin
        mxcos <= -mult_xcos1;
        mxsin <= -mult_xsin1;
        mycos <= -mult_ycos1;
    end
    2'd2: begin
        mxcos <= -mult_xcos1;
        mysin <= -mult_ysin1;
        mycos <= -mult_ycos1;
    end
    2'd3: begin
        mysin <= -mult_ysin1;
    end
    endcase
end

Now the matter is left to the small thing - to calculate the pixel coordinates themselves:

/*
             I          II         III       IV
           +  +       +  -        -  -      -  -
           +  -       +  +        +  -      -  +
*/
always @(posedge clk) begin
    tmp_x <= delta_x + mxcos + mysin;
    tmp_y <= delta_y + mycos + mxsin;
end
wire [15:0] xo = tmp_x[BUS_MSB] ? 12'd0: tmp_x[OUT_SIZE-1:PRECISION];
wire [15:0] yo = tmp_y[BUS_MSB] ? 12'd0: tmp_y[OUT_SIZE-1:PRECISION];

Cut off pixels that go beyond the boundaries of the image:

wire [11:0] xo_t = (xo[11:0] > res_x[11:0]) ? 12'd0 : xo[11:0];
wire [11:0] yo_t = (yo[11:0] > res_y[11:0]) ? 12'd0 : yo[11:0];

And his address in memory:

//addr_out <= yo[11:0] * RES_X + xo[11:0];

And again, use lpm_mult:

reg [11:0] xo_r, yo_r;
always @(posedge clk) begin
	xo_r <= xo_t;
	yo_r <= yo_t;
end
wire [28:0] result;
lpm_mult M5(.clock(clk), .dataa(RES_X[11:0]), .datab(yo_r[11:0]), .result(result));
	defparam M5.lpm_widtha = 12;
	defparam M5.lpm_widthb = 12;
	defparam M5.lpm_pipeline = 1;
	defparam M5.lpm_representation = "UNSIGNED";
always @(posedge clk) addr_out <= result[22:0] + xo_r[11:0];

That's all, actually!

Method Issues

As I mentioned above, this approach has many drawbacks. Due to the calculation error, holes appear in the output picture; the larger the rotation angle, the more holes. This also happens because the dimensions of the new image are larger than the original. This effect is started by aliasing and there are methods to deal with it, for example, the median filter discussed in my previous article .

Before each subsequent frame, it would not hurt to clean the memory from the previous frame so that a new image is obtained on a clean background, but it takes time and you have to skip one frame.

The only advantage of the method is its ease of implementation and processing speed. coordinates are calculated on the fly.

That's what came out of it

Project Archive in Quartus

→ Link to Yandex disk.

Tags:

Image rotation on FPGA

CORDIC

Method Issues

That's what came out of it

Related Links

Project Archive in Quartus

Also popular now: