Best way to convert for-loops into an FPGA

橙三吉。 提交于 2019-12-06 11:25:22

To convert a piece of code coming from an algorithm with loops, conditionals, et al, into a synthesizable form of Verilog, you need to translate it in to a FSM.

For example, a for loop to do something similar you are asking for would be:

int sample_I[N], sync_I[N]; // assume 32-bit ints, 2-complement numbers.
int sample_Q[N], sync_Q[N];
int i, corsum, abscorsum = 0;

for (i=0;i<N;i++)
{
  corsum = sample_I[i] * sync_I[i] + sample_Q[i] * sync_Q[i];
  abscorsum += abs(corsum);
}

First, group sentences into time slots, so you can see which actions can be done in the same clock cycle (same state), and assign a state to each slot:


1)

i = 0
abscorsum = 0
goto 2)

2)

if i!=N    
  corsum = sample_I[i] * sync_I[i]
  goto 3)
else
  goto 5)

3)

corsum = corsum + sample_Q[i] * sync_Q[i]
i = i + 1
goto 4)

4)

if (corsum >= 0)
  abscorsum = abscorsum + corsum
else
  abscorsum = abscorsum + (-corsum)
goto 2)

5)

STOP

States 2 and 3 may be merged into a single state, but that would force the synthesizer to infer two multipliers, and besides, the propagation delay of the resulting combinatorial path could be very high, limiting the clock frequency allowable for this design. So, I have split the dot product calculation into two parts, each one them using a single multiplication operation. The synthesizer, if instructed so, can use one multiplier and share it for the two operations, as both happen in different clock cycles.

which translates to this module: http://www.edaplayground.com/x/MEG

Signal rst is used to signal the module to start operation. finish is raised by the module to signal end of operation and validness of output (abscorrsum)

sample_I, sync_i, sample_Q and sync_Q are modeled using memory blocks, with i being the address of the element to read. Most synthesizers will infer block RAMs for these vectors, as each of them is read only in one state, and always with the same address signal.

module corrdotprod #(N=4) (
  input wire clk,
  input wire rst,
  output reg [31:0] i,
  input wire [31:0] sample_i,
  input wire [31:0] sync_i,
  input wire [31:0] sample_q,
  input wire [31:0] sync_q,
  output reg [31:0] abscorrsum,
  output reg finish
);

  parameter
    STATE1 = 3'd1,
    STATE2 = 3'd2,
    STATE3 = 3'd3,
    STATE4 = 3'd4,
    STATE5 = 3'd5;

  reg [31:0] corrsum;
  reg [2:0] state;

  always @(posedge clk) begin
    if (rst == 1'b1) begin
      state <= STATE1;
    end
    else begin
      case (state)
        STATE1:
          begin
            i <= 0;
            abscorrsum <= 0;
            finish <= 1'b0;
            state <= STATE2;
          end
        STATE2:
          begin
            if (i!=N) begin
              corrsum <= sample_i * sync_i; // synthesizer deals with multiplication
              state <= STATE3;
            end
            else begin
              state <= STATE5;
            end
          end
        STATE3:
          begin
            corrsum <= corrsum + sample_q * sync_q; // this product can share the multiplier as above
            i <= i + 1;
            state <= STATE4;
          end
        STATE4:
          begin
            if (corrsum[31] == 1'b0) // remember: 2-complement
              abscorrsum <= abscorrsum + corrsum;
            else
              abscorrsum <= abscorrsum + (~corrsum+1);
            state <= STATE2;
          end
        STATE5:
          finish <= 1'b1;
      endcase      
    end
  end
endmodule

Which can be tested with this simple test bench:

module tb;
  reg clk;
  reg rst;
  reg [31:0] sample_i[0:3];
  reg [31:0] sync_i[0:3];
  reg [31:0] sample_q[0:3];
  reg [31:0] sync_q[0:3];
  wire [31:0] i;
  wire [31:0] abscorrsum;

  corrdotprod #(.N(4)) uut  (clk, rst, i, sample_i[i], sync_i[i], sample_q[i], sync_q[i], abscorrsum, finish);

  integer tb_i, tb_corrsum, tb_abscorrsum;
  initial begin
    $dumpfile ("dump.vcd");
    $dumpvars (0, tb.uut);    

    sample_i[0] = 1;
    sample_i[1] = 2;
    sample_i[2] = 3;
    sample_i[3] = 4;

    sync_i[0] = 2;
    sync_i[1] = -2;
    sync_i[2] = 2;
    sync_i[3] = -2;

    sample_q[0] = -1;
    sample_q[1] = -2;
    sample_q[2] = -3;
    sample_q[3] = -4;

    sync_q[0] = 3;
    sync_q[1] = -3;
    sync_q[2] = 3;
    sync_q[3] = -3;

    clk = 0;

    rst = 1;
    #30;
    rst = 0;
    wait (finish == 1);
    $display ("ABSCORRSUM    = %d\n", abscorrsum);

    // Testing result from module
    tb_abscorrsum = 0;
    for (tb_i = 0; tb_i < 4; tb_i = tb_i + 1) begin
      tb_corrsum = sample_i[tb_i] * sync_i[tb_i] + sample_q[tb_i] * sync_q[tb_i];
      if (tb_corrsum<0)
        tb_corrsum = -tb_corrsum;
      tb_abscorrsum = tb_abscorrsum + tb_corrsum;
    end
    $display ("TB_ABSCORRSUM = %d\n", tb_abscorrsum);

    $finish;
  end

  always begin
    clk = #5 !clk;
  end
endmodule
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!