# EE2071 Microelectronic Workshop:

Gate-level systolic multiplier

Example of a chip to implement our multiplier

## 0 Purpose

This laboratory report is our introduction to the principles of manual synthesis of a digital system.
We reproduce a design written in the Verilog Hardware Description Language at Register Transfer Level (RTL), simulated with the Cadence Simucad Silos software, as a graphical gate-level design in Altera Maxplus. We meet this challenge by designing a systolic multiplier with two 8-bit inputs and a 16-bit output. This multiplier is intended for a digital signal processor, so it must be able either to load both inputs or to keep one input (A) and multiply it by different values of the other.

First, we present the internal components of our design block by block; then we implement the gate-level equivalent by reproducing the behaviour of each component.

## 1 Verilog multiplier

First of all, let's see how this component is interfaced:


## Details:

As said previously, A and B are 8 bits wide and C is 16 bits wide.
The signal Im (input mode) selects whether 1 or 2 inputs are loaded (A is not loaded when Im = 0).
The output C is in the high-impedance state when CEz = 1.
The output Halt is set when the multiplier has completely finished its calculation, i.e. when the result is ready.
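To pin down the expected behaviour before looking at the blocks, here is a minimal Python reference model of the interface. It is a sketch under our own assumptions: the class and method names are hypothetical, and clocks, timing and the Halt handshake are ignored. Im = 1 loads A, Im = 0 keeps the previous A, CEz = 1 returns None to stand for the high-impedance state, and C is the 16-bit two's-complement product.

```python
from typing import Optional


def to_signed8(x: int) -> int:
    """Interpret the low 8 bits of x as a two's-complement value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


class MultiplierModel:
    """Hypothetical behavioural model of the interface (no timing, no gates)."""

    def __init__(self) -> None:
        self.a = 0  # value held in regA between operations

    def multiply(self, a: int, b: int, im: int, cez: int) -> Optional[int]:
        if im:                          # Im = 1: load A; Im = 0: keep previous A
            self.a = to_signed8(a)
        product = self.a * to_signed8(b)
        if cez:                         # CEz = 1: C is in high impedance
            return None
        return product & 0xFFFF         # 16-bit two's-complement value on C


m = MultiplierModel()
print(hex(m.multiply(-127, -127, im=1, cez=0)))  # 0x3f01
print(hex(m.multiply(0, 127, im=0, cez=0)))      # 0xc0ff (A = -127 is kept)
```

Such a model can serve as a golden reference when checking the values printed by the Silos test file.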

## Internal components



Verilog code:

The next three components are quite straightforward, so I do not detail them much.

```
// regA.v
module regA(wire_A, A, loadA);
    output [7:0] wire_A;
    input  [7:0] A;
    input        loadA;
    reg    [7:0] wire_A;
    // A is latched on the falling edge of loadA
    always @(negedge loadA)
        wire_A = A;
endmodule
```


```
// b_piso.v
module b_piso(wire_B, B, loadB, clkB);
    output wire_B;
    reg wire_B;
    input [7:0] B;
    input loadB, clkB;
    reg [7:0] B_reg;

    // Load B in parallel on the falling edge of loadB
    always @(negedge loadB) begin
        B_reg = B;
        wire_B = B_reg[0];
    end

    // Shift right on each clkB edge; B_reg[7] is kept, so the sign propagates
    always @(posedge clkB) begin
        B_reg[6:0] = B_reg[7:1];
        wire_B = B_reg[0];
    end
endmodule
```
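The effect of keeping B_reg[7] during the shift can be checked with a short Python sketch (the function name b_piso_bits is ours, and one output bit per clkB edge after the load is assumed, as in the listing):

```python
def b_piso_bits(b: int, clocks: int = 16) -> list[int]:
    """Bits seen on wire_B: one right after the load, then one per clkB edge."""
    reg = b & 0xFF                  # B_reg = B on the falling edge of loadB
    bits = [reg & 1]                # wire_B = B_reg[0]
    for _ in range(clocks - 1):
        # B_reg[6:0] = B_reg[7:1] leaves B_reg[7] unchanged, so the sign
        # bit is replicated: this is an arithmetic shift right
        reg = (reg & 0x80) | (reg >> 1)
        bits.append(reg & 1)
    return bits


print(b_piso_bits(-127))  # [1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Read LSB first, these 16 bits form 0xFF81, i.e. -127 sign-extended to 16 bits, which is what a 16-cycle accumulation needs.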


```
// REG_mult.v
module REG_mult(wire_mult, wire_A, wire_B);
```
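The body of REG_mult is not reproduced here. Since the gate-level version is later described as purely combinational and as the only block performing a real multiplication, a natural reading (an assumption, not the author's listing) is a row of eight AND gates producing one partial product. A hypothetical Python model of that reading:

```python
def reg_mult(wire_a: int, wire_b: int) -> int:
    """Assumed behaviour: wire_mult = wire_A & {8{wire_B}} (eight AND gates)."""
    mask = 0xFF if wire_b & 1 else 0x00   # replicate the single bit wire_B
    return (wire_a & 0xFF) & mask


print(bin(reg_mult(0b01100110, 1)))  # 0b1100110
print(bin(reg_mult(0b01100110, 0)))  # 0b0
```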

In the beginning, I designed a really simple RTL multiplier, but I never managed to make it work, and I still don't know why. I therefore started again, using the same principle as the gate-level version:
=> It is composed of an adder block (a specialised full adder) whose carry out is fed back into its carry in at the next clock pulse. This is an implicit way of rippling the carry quickly and efficiently.

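```
// adder_block.v
module adder_block(So, mult, Si, Resetz, clkP);
    output So;
    input  mult, Si, Resetz, clkP;
    reg    carry, So;
    // Asynchronous reset: clear the stored carry and the sum bit
    always @(negedge Resetz) begin
        carry = 0;
        So = 0;
    end
    // Each clock: add the incoming bit, the partial-product bit and the
    // carry stored at the previous clock edge
    always @(posedge clkP) if (Resetz)
        {carry, So} = Si + mult + carry;
endmodule
```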
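The stored-carry idea can be illustrated in Python by using one adder block on its own as a bit-serial adder, LSB first. This is a sketch: the helper names are ours, and the full adder is written with the usual boolean equations.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out


def serial_add(x_bits: list[int], y_bits: list[int]) -> list[int]:
    """Bit-serial addition, LSB first: the carry is kept in a register
    and reused on the next clock edge, as in adder_block."""
    carry = 0                            # cleared by the reset
    out = []
    for x, y in zip(x_bits, y_bits):     # one iteration = one posedge clkP
        s, carry = full_adder(x, y, carry)
        out.append(s)
    return out


def to_bits(v: int, n: int) -> list[int]:
    return [(v >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


print(from_bits(serial_add(to_bits(5, 5), to_bits(3, 5))))  # 8
```

Only one full adder is needed per bit position, and no carry chain has to settle within a single clock period.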

...and a module that instantiates it 8 times:

```
module C_SIPO(C, HaltP, wire_S, Resetz, CEz, clkP);
    output [15:0] C;
    output HaltP;
    input  wire_S, clkP, Resetz, CEz;
    reg [15:0] C, regC;
    reg HaltP;
    initial begin
        regC = 0;
        C = 0;
        HaltP = 0;
    end
    // Reset: marker bit in the MSB (it counts the 16 shifts)
    always @(negedge Resetz) begin
        regC = 16'h8000;
        HaltP = 0;
    end
    // Shift right, insert the new result bit in the MSB,
    // and flag the marker when it reaches bit 0
    always @(posedge clkP)
        if (Resetz) begin
            regC = regC >> 1;
            #1 regC[15] = wire_S;
            HaltP = regC[0];
        end
    // Tri-state output: C is driven only when CEz = 0
    always @(CEz or regC)
        begin
            if (!CEz)
                C = regC;
            else
                C = 16'hzzzz;
        end
endmodule
```
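The marker mechanism of C_SIPO can be modelled in a few lines of Python. In this sketch (the function name is ours), the returned clock number is the first edge at which HaltP = regC[0] becomes 1; the CNTR then latches HaltP one edge later, which would explain why Halt appears after the 16th clock edge in the log.

```python
from typing import Optional


def c_sipo(serial_bits: list[int]) -> tuple[int, Optional[int]]:
    """Returns (regC after all shifts, clock edge at which HaltP first rose)."""
    reg_c = 0x8000                  # reset value: marker bit in the MSB
    halt_p_clock = None
    for clock, bit in enumerate(serial_bits, start=1):
        reg_c >>= 1                 # regC = regC >> 1
        reg_c |= (bit & 1) << 15    # regC[15] = wire_S
        # HaltP = regC[0]; the marker entered before any result bit,
        # so it is the first 1 to reach bit 0 (after 15 shifts)
        if reg_c & 1 and halt_p_clock is None:
            halt_p_clock = clock
    return reg_c, halt_p_clock


bits = [(0x3F01 >> i) & 1 for i in range(16)]   # LSB first, as So delivers them
reg_c, halt_clock = c_sipo(bits)
print(hex(reg_c), halt_clock)  # 0x3f01 15
```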

```
module CNTR(loadA, loadB, cez, clkP, Halt, clkB,   // outputs
            Im, clk, CEz, Resetz, HaltP);          // inputs
    output loadA, loadB, cez, clkP, Halt, clkB;
    input  clk, Resetz, Im, CEz, HaltP;
    reg    Halt, halt_tmp;
    assign loadA = Resetz & Im,    // A is (re)loaded only when Im = 1
           loadB = Resetz,
           clkP  = ~Halt & clk,    // processing clock, stopped once Halt = 1
           clkB  = ~clk,           // b_piso shifts on the opposite clock phase
           cez   = CEz;
    initial begin
        Halt = 0;
        halt_tmp = 0;
    end
    always @(negedge Resetz) Halt = 0;
    always @(posedge clkP) halt_tmp = HaltP;
    always @(negedge clkP) Halt = halt_tmp;
endmodule
```

```
module Systolic_multiplier(C, Halt, A, B, Im, Resetz, CEz, clk);
    input  Im, clk, CEz, Resetz;
    input  [7:0] A, B;
    output [15:0] C;
    output Halt;
    wire loadA, loadB, cez, clkP, clkB, HaltP, wire_B, So;
    wire [7:0] wire_A, wire_mult;
    CNTR     inst1(loadA, loadB, cez, clkP, Halt, clkB,   // outputs
                   Im, clk, CEz, Resetz, HaltP);          // inputs
    regA     inst2(wire_A, A, loadA);
    b_piso   inst3(wire_B, B, loadB, clkB);
    REG_mult inst4(wire_mult, wire_A, wire_B);
    summ     inst5(So, wire_mult, Resetz, clkP);
    C_SIPO   inst6(C, HaltP, So, Resetz, cez, clkP);
endmodule
```
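The listing of summ is not shown, so the sketch below does not reproduce its carry-save netlist. It is a word-level Python model of the shift-and-add principle that the blocks above implement together, under our assumptions: A is treated as signed when accumulated, B is delivered LSB first and sign-extended by b_piso, and one result bit leaves the adder per clock towards C_SIPO, for the 16 cycles counted by the marker. The function name is ours.

```python
def shift_add_multiply(a: int, b: int, cycles: int = 16) -> int:
    """Word-level shift-and-add model: returns the 16-bit two's-complement product."""
    a_signed = ((a & 0xFF) ^ 0x80) - 0x80     # A as a signed 8-bit value
    b_reg = b & 0xFF                           # b_piso register
    acc = 0                                    # running partial sum
    result = 0                                 # bits collected by C_SIPO
    for i in range(cycles):
        if b_reg & 1:                          # wire_B = 1: add A
            acc += a_signed
        result |= (acc & 1) << i               # LSB leaves the adder, LSB first
        acc >>= 1                              # Python's >> floors: arithmetic shift
        b_reg = (b_reg & 0x80) | (b_reg >> 1)  # b_piso: arithmetic shift right
    return result


for a, b in [(-127, -127), (-127, 127), (127, -127), (127, 127)]:
    print(a, b, hex(shift_add_multiply(a, b)))
# -127 -127 0x3f01
# -127 127 0xc0ff
# 127 -127 0xc0ff
# 127 127 0x3f01
```

The four results match the values expected in the test files below.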

Test file (simplified so that the output can be monitored in the result text file shown below):

```
// Systolic_multiplier_test.v
module Systolic_multiplier_test;
    reg [7:0] A, B;
    reg Im, Resetz, CEz, clk;
    wire [15:0] C;
    wire Halt;
    Systolic_multiplier inst(C, Halt, A, B, Im, Resetz, CEz, clk);
    initial begin
        $monitor($time, " clk = %b, C = %b, Halt = %b", clk, C, Halt);
        clk = 0;
        Im = 1;
        CEz = 0;            // output always enabled, to watch C evolve
        Resetz = 1;
        A = -127;
        B = -127;           // result expected: 3F01
        #1 Resetz = 0;      // Resetz pulse: load the operands and restart
        #1 Resetz = 1;
        wait(Halt);
        Im = 0;             // keep A = -127
        B = 127;            // result expected: C0FF
        #1 Resetz = 0;
        #1 Resetz = 1;
        wait(Halt);
        Im = 1;
        A = 127;
        B = -127;           // result expected: C0FF
        #1 Resetz = 0;
        #1 Resetz = 1;
        wait(Halt);
        Im = 0;             // keep A = 127
        B = 127;            // result expected: 3F01
        #1 Resetz = 0;
        #1 Resetz = 1;
        wait(Halt);
        #30 $finish;
    end
    always #5 clk = ~clk;
endmodule
```

For this result text file, I kept the chip output enabled (CEz = 0, so C is never in state Z) to be able to follow its evolution:

```
  1 clk = 0, C = 1000000000000000, Halt = 0
  5 clk = 1, C = 0100000000000000, Halt = 0
  6 clk = 1, C = 1100000000000000, Halt = 0
 10 clk = 0, C = 1100000000000000, Halt = 0
 15 clk = 1, C = 0110000000000000, Halt = 0
 20 clk = 0, C = 0110000000000000, Halt = 0
 25 clk = 1, C = 0011000000000000, Halt = 0
 30 clk = 0, C = 0011000000000000, Halt = 0
 35 clk = 1, C = 0001100000000000, Halt = 0
 40 clk = 0, C = 0001100000000000, Halt = 0
 45 clk = 1, C = 0000110000000000, Halt = 0
 50 clk = 0, C = 0000110000000000, Halt = 0
 55 clk = 1, C = 0000011000000000, Halt = 0
 60 clk = 0, C = 0000011000000000, Halt = 0
 65 clk = 1, C = 0000001100000000, Halt = 0
 70 clk = 0, C = 0000001100000000, Halt = 0
 75 clk = 1, C = 0000000110000000, Halt = 0
 80 clk = 0, C = 0000000110000000, Halt = 0
 85 clk = 1, C = 0000000011000000, Halt = 0
 86 clk = 1, C = 1000000011000000, Halt = 0
 90 clk = 0, C = 1000000011000000, Halt = 0
 95 clk = 1, C = 0100000001100000, Halt = 0
 96 clk = 1, C = 1100000001100000, Halt = 0
100 clk = 0, C = 1100000001100000, Halt = 0
105 clk = 1, C = 0110000000110000, Halt = 0
106 clk = 1, C = 1110000000110000, Halt = 0
110 clk = 0, C = 1110000000110000, Halt = 0
115 clk = 1, C = 0111000000011000, Halt = 0
116 clk = 1, C = 1111000000011000, Halt = 0
120 clk = 0, C = 1111000000011000, Halt = 0
125 clk = 1, C = 0111100000001100, Halt = 0
126 clk = 1, C = 1111100000001100, Halt = 0
130 clk = 0, C = 1111100000001100, Halt = 0
135 clk = 1, C = 0111110000000110, Halt = 0
136 clk = 1, C = 1111110000000110, Halt = 0
140 clk = 0, C = 1111110000000110, Halt = 0
145 clk = 1, C = 0111111000000011, Halt = 0
150 clk = 0, C = 0111111000000011, Halt = 0
155 clk = 1, C = 0011111100000001, Halt = 0
160 clk = 0, C = 0011111100000001, Halt = 1  => 3F01

161 clk = 0, C = 1000000000000000, Halt = 0
165 clk = 1, C = 0100000000000000, Halt = 0
166 clk = 1, C = 1100000000000000, Halt = 0
170 clk = 0, C = 1100000000000000, Halt = 0
175 clk = 1, C = 0110000000000000, Halt = 0
176 clk = 1, C = 1110000000000000, Halt = 0
180 clk = 0, C = 1110000000000000, Halt = 0
185 clk = 1, C = 0111000000000000, Halt = 0
186 clk = 1, C = 1111000000000000, Halt = 0
190 clk = 0, C = 1111000000000000, Halt = 0
195 clk = 1, C = 0111100000000000, Halt = 0
196 clk = 1, C = 1111100000000000, Halt = 0
200 clk = 0, C = 1111100000000000, Halt = 0
205 clk = 1, C = 0111110000000000, Halt = 0
206 clk = 1, C = 1111110000000000, Halt = 0
210 clk = 0, C = 1111110000000000, Halt = 0
215 clk = 1, C = 0111111000000000, Halt = 0
216 clk = 1, C = 1111111000000000, Halt = 0
220 clk = 0, C = 1111111000000000, Halt = 0
225 clk = 1, C = 0111111100000000, Halt = 0
226 clk = 1, C = 1111111100000000, Halt = 0
230 clk = 0, C = 1111111100000000, Halt = 0
235 clk = 1, C = 0111111110000000, Halt = 0
236 clk = 1, C = 1111111110000000, Halt = 0
240 clk = 0, C = 1111111110000000, Halt = 0
245 clk = 1, C = 0111111111000000, Halt = 0
250 clk = 0, C = 0111111111000000, Halt = 0
255 clk = 1, C = 0011111111100000, Halt = 0
260 clk = 0, C = 0011111111100000, Halt = 0
265 clk = 1, C = 0001111111110000, Halt = 0
270 clk = 0, C = 0001111111110000, Halt = 0
275 clk = 1, C = 0000111111111000, Halt = 0
280 clk = 0, C = 0000111111111000, Halt = 0
285 clk = 1, C = 0000011111111100, Halt = 0
290 clk = 0, C = 0000011111111100, Halt = 0
295 clk = 1, C = 0000001111111110, Halt = 0
300 clk = 0, C = 0000001111111110, Halt = 0
305 clk = 1, C = 0000000111111111, Halt = 0
306 clk = 1, C = 1000000111111111, Halt = 0
310 clk = 0, C = 1000000111111111, Halt = 0
315 clk = 1, C = 0100000011111111, Halt = 0
316 clk = 1, C = 1100000011111111, Halt = 0
320 clk = 0, C = 1100000011111111, Halt = 1  => C0FF

321 clk = 0, C = 1000000000000000, Halt = 0
325 clk = 1, C = 0100000000000000, Halt = 0
326 clk = 1, C = 1100000000000000, Halt = 0
330 clk = 0, C = 1100000000000000, Halt = 0
335 clk = 1, C = 0110000000000000, Halt = 0
336 clk = 1, C = 1110000000000000, Halt = 0
340 clk = 0, C = 1110000000000000, Halt = 0
345 clk = 1, C = 0111000000000000, Halt = 0
346 clk = 1, C = 1111000000000000, Halt = 0
350 clk = 0, C = 1111000000000000, Halt = 0
355 clk = 1, C = 0111100000000000, Halt = 0
356 clk = 1, C = 1111100000000000, Halt = 0
360 clk = 0, C = 1111100000000000, Halt = 0
365 clk = 1, C = 0111110000000000, Halt = 0
366 clk = 1, C = 1111110000000000, Halt = 0
370 clk = 0, C = 1111110000000000, Halt = 0
375 clk = 1, C = 0111111000000000, Halt = 0
376 clk = 1, C = 1111111000000000, Halt = 0
380 clk = 0, C = 1111111000000000, Halt = 0
385 clk = 1, C = 0111111100000000, Halt = 0
386 clk = 1, C = 1111111100000000, Halt = 0
390 clk = 0, C = 1111111100000000, Halt = 0
395 clk = 1, C = 0111111110000000, Halt = 0
396 clk = 1, C = 1111111110000000, Halt = 0
400 clk = 0, C = 1111111110000000, Halt = 0
405 clk = 1, C = 0111111111000000, Halt = 0
410 clk = 0, C = 0111111111000000, Halt = 0
415 clk = 1, C = 0011111111100000, Halt = 0
420 clk = 0, C = 0011111111100000, Halt = 0
425 clk = 1, C = 0001111111110000, Halt = 0
430 clk = 0, C = 0001111111110000, Halt = 0
435 clk = 1, C = 0000111111111000, Halt = 0
440 clk = 0, C = 0000111111111000, Halt = 0
445 clk = 1, C = 0000011111111100, Halt = 0
450 clk = 0, C = 0000011111111100, Halt = 0
455 clk = 1, C = 0000001111111110, Halt = 0
460 clk = 0, C = 0000001111111110, Halt = 0
465 clk = 1, C = 0000000111111111, Halt = 0
466 clk = 1, C = 1000000111111111, Halt = 0
470 clk = 0, C = 1000000111111111, Halt = 0
475 clk = 1, C = 0100000011111111, Halt = 0
476 clk = 1, C = 1100000011111111, Halt = 0
480 clk = 0, C = 1100000011111111, Halt = 1  => C0FF

481 clk = 0, C = 1000000000000000, Halt = 0
485 clk = 1, C = 0100000000000000, Halt = 0
486 clk = 1, C = 1100000000000000, Halt = 0
490 clk = 0, C = 1100000000000000, Halt = 0
495 clk = 1, C = 0110000000000000, Halt = 0
500 clk = 0, C = 0110000000000000, Halt = 0
505 clk = 1, C = 0011000000000000, Halt = 0
510 clk = 0, C = 0011000000000000, Halt = 0
515 clk = 1, C = 0001100000000000, Halt = 0
520 clk = 0, C = 0001100000000000, Halt = 0
525 clk = 1, C = 0000110000000000, Halt = 0
530 clk = 0, C = 0000110000000000, Halt = 0
535 clk = 1, C = 0000011000000000, Halt = 0
540 clk = 0, C = 0000011000000000, Halt = 0
545 clk = 1, C = 0000001100000000, Halt = 0
550 clk = 0, C = 0000001100000000, Halt = 0
555 clk = 1, C = 0000000110000000, Halt = 0
560 clk = 0, C = 0000000110000000, Halt = 0
565 clk = 1, C = 0000000011000000, Halt = 0
566 clk = 1, C = 1000000011000000, Halt = 0
570 clk = 0, C = 1000000011000000, Halt = 0
575 clk = 1, C = 0100000001100000, Halt = 0
576 clk = 1, C = 1100000001100000, Halt = 0
580 clk = 0, C = 1100000001100000, Halt = 0
585 clk = 1, C = 0110000000110000, Halt = 0
586 clk = 1, C = 1110000000110000, Halt = 0
590 clk = 0, C = 1110000000110000, Halt = 0
595 clk = 1, C = 0111000000011000, Halt = 0
596 clk = 1, C = 1111000000011000, Halt = 0
600 clk = 0, C = 1111000000011000, Halt = 0
605 clk = 1, C = 0111100000001100, Halt = 0
606 clk = 1, C = 1111100000001100, Halt = 0
610 clk = 0, C = 1111100000001100, Halt = 0
615 clk = 1, C = 0111110000000110, Halt = 0
616 clk = 1, C = 1111110000000110, Halt = 0
620 clk = 0, C = 1111110000000110, Halt = 0
625 clk = 1, C = 0111111000000011, Halt = 0
630 clk = 0, C = 0111111000000011, Halt = 0
635 clk = 1, C = 0011111100000001, Halt = 0  => 3F01
```
...However, since the multiplier is supposed to disable its output while the result is not ready, I modified the test file accordingly: CEz is set to 1 while the device is busy, which places C in the high-impedance state.

```
// Systolic_multiplier_test.v
module Systolic_multiplier_test;
    reg [7:0] A, B;
    reg Im, Resetz, CEz, clk;
    wire [15:0] C;
    wire Halt;
    Systolic_multiplier inst(C, Halt, A, B, Im, Resetz, CEz, clk);
    initial begin
        $monitor($time, " clk = %b, C = %b, Halt = %b", clk, C, Halt);
        clk = 0;
        Im = 1;
        CEz = 1;            // output disabled while the device is busy
        Resetz = 1;
        A = -127;
        B = -127;           // result expected: 3F01
        #1 Resetz = 0;
        #1 Resetz = 1;
        @(Halt) CEz = 0;    // enable C only once the result is ready
        #9 CEz = 1;
        Im = 0;
        B = 127;            // result expected: C0FF
        #1 Resetz = 0;
        #1 Resetz = 1;
        @(Halt) CEz = 0;
        #9 CEz = 1;
        Im = 1;
        A = 127;
        B = -127;           // result expected: C0FF
        #1 Resetz = 0;
        #1 Resetz = 1;
        @(Halt) CEz = 0;
        #9 CEz = 1;
        Im = 0;
        B = 127;            // result expected: 3F01
        #1 Resetz = 0;
        #1 Resetz = 1;
        //wait(Halt);
        @(Halt) CEz = 0;
        #9 $finish;
    end
    always #5 clk = ~clk;
endmodule
```

Chronogram result for the 1st multiplication: $-127 \times (-127) = 16129$ ($\mathtt{0x81} \times \mathtt{0x81} = \mathtt{0x3F01}$)

Chronogram result for the 2nd multiplication: $-127 \times 127 = -16129$ ($\mathtt{0x81} \times \mathtt{0x7F} = \mathtt{0xC0FF}$)

Chronogram result for the 3rd multiplication: $127 \times (-127) = -16129$ ($\mathtt{0x7F} \times \mathtt{0x81} = \mathtt{0xC0FF}$)

Chronogram result for the 4th multiplication: $127 \times 127 = 16129$ ($\mathtt{0x7F} \times \mathtt{0x7F} = \mathtt{0x3F01}$)
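As a quick check of these hexadecimal values, they follow from two's-complement encoding on 8 bits for the operands and on 16 bits for the product:

$$
\begin{aligned}
-127 &\equiv 2^{8} - 127 = 129 = \mathtt{0x81} \pmod{2^{8}} \\
127^{2} &= 16129 = 63 \times 256 + 1 = \mathtt{0x3F01} \\
-16129 &\equiv 2^{16} - 16129 = 49407 = 192 \times 256 + 255 = \mathtt{0xC0FF} \pmod{2^{16}}
\end{aligned}
$$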


## 2 Gate level multiplier

Now that we have seen that the Verilog design works, we keep its principle. However, the virtual time management of Silos sometimes becomes more complex at gate level, because gate delays are no longer zero and behavioural descriptions are not always easy to translate (synthesize).
"RegA" block diagram:



"RegA" test chronogram:

| Name | Value (binary) | Waveform (0 to 900 us) |
| :--- | :---: | :--- |
| loadA |  |  |
| A[8..1] | 01100110 | 00000000, then 01100110 from about 400 us |
| wire_A[8..1] | 00000000 | 00000000, then 01100110 from about 500 us |

We obtain the expected result: on the loadA edge, the input A is copied to the output wire_A.
"RegA" time analysis:

Timing Analyzer, delay matrix (rows: source, columns: destination):

| Source | wire_A.1 | wire_A.2 | wire_A.3 | wire_A.4 | wire_A.5 | wire_A.6 | wire_A.7 | wire_A.8 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| A1 ... A8 |  |  |  |  |  |  |  |  |
| loadA | 6.5 ns | 6.5 ns | 6.5 ns | 6.5 ns | 6.5 ns | 6.5 ns | 6.5 ns | 6.5 ns |

We keep these results for later, to see the speed limit of our final component.
"b_piso" block diagram: (parallel in serial out)

"b_piso" test chronogram:


Note: as we can see, I added 2 extra outputs to monitor the DFF values and the internal clock (clk_int). The sign bit is correctly propagated, and the output of the module is the LSB, as intended.

The internal clock was not a piece of cake to create: it seems simple, but it is an RS flip-flop connected to another logic block that disables the clock during loading.
"b_piso" time analysis: for some reason, the Timing Analyzer would not analyse this component:

...but nothing can stop me!
=> I zoomed in (a lot) on every transition of the "b_piso" test chronogram and found the longest time delay:


It thus seems that the longest time delay is 13.6 ns
"REG_mult" block diagram:

"REG_mult" test chronogram:


Note: this component, purely combinational, is definitely the simplest part of the multiplier; the paradox is that it is the only one to perform a real multiplication!
"REG_mult" time analysis


As explained in the Verilog design, the sum is built from a full adder that I duplicated, with the carry fed back synchronously through a DFF.
"fa" (Full Adder) block diagram:

"fa" (Full Adder) time analysis:

Timing Analyzer, delay matrix (rows: source, columns: destination):

| Source | c_out | s |
| :--- | :---: | :---: |
| c_in | 6.0 ns | 6.0 ns |
| input | 6.0 ns | 6.0 ns |
| inter_val | 6.0 ns | 6.0 ns |

"summ" block diagram (the component that instantiates the full adder):

"fa" (Full Adder) test chronogram:


To make the test more readable, I used the "group" function, which combines several pins into a bus. I displayed the inputs in binary to see the number of "1"s in the created bus, and the output in decimal to see the result directly.
"summ" test chronogram:


The extra output "S" gives half of the sum of the input "wire_mult" and the previous "S" state: half because of the shift (more precisely, the integer part of the half).
We thus obtain, as expected:

$$
\begin{aligned}
0 + 4 &= 4 && \text{shift} \Rightarrow 4/2 = 2 \\
2 + 4 &= 6 && \text{shift} \Rightarrow 6/2 = 3 \\
3 + 4 &= 7
\end{aligned}
$$
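The same sequence can be reproduced with a tiny Python model of the visible "S" output, assuming (as described above) that each clock computes S <- floor((S + wire_mult) / 2); the bit shifted out at each step goes to C_SIPO and is not shown here. The function name is ours.

```python
def summ_sequence(inputs: list[int], s0: int = 0) -> list[int]:
    """Value of S after each clock: S <- floor((S + wire_mult) / 2)."""
    s = s0
    history = []
    for m in inputs:
        s = (s + m) // 2    # add, then the shift keeps the integer part of the half
        history.append(s)
    return history


print(summ_sequence([4, 4, 4]))  # [2, 3, 3]
```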

" summ " time analysis:


Before choosing this design, I had the idea of using a Carry-Look-Ahead (CLA) adder (to go faster).
In my first attempt, I had not realised that the carry can be "sequentially rippled" and then does not take that much time! Moreover, the CLA adder was really complex and did not save much time; it only becomes interesting for additions of more than 16 bits. But since I implemented it anyway (in a long night), I show it here as a keepsake:


Cla_sum_block.gdf (Graphic Editor)

Here is part of the CLA adder design (4 bits only); for more bits, the tree is simply doubled.
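For comparison, here is a textbook 4-bit carry-lookahead adder in Python: generate $G_i = a_i b_i$, propagate $P_i = a_i \oplus b_i$, and every carry expanded as a two-level sum of products of the $G_i$, $P_i$ and $c_0$ instead of rippling. This is the generic formulation, assumed rather than transcribed from Cla_sum_block.gdf; the function name cla4 is ours.

```python
def cla4(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """4-bit carry-lookahead adder: returns (sum, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # generate  G_i = a_i b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # propagate P_i = a_i xor b_i
    # Each carry comes directly from G, P and c0 (two-level logic, no ripple)
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))
    return s, c4


print(cla4(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 = 0b10001
```

The product terms grow with the width (c4 already needs five), which illustrates why the CLA tree quickly becomes complex.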

"C_SIPO" block diagram: (serial in parallel out)



This component implements the high-impedance state with tri-state gates. It also resets with the MSB at 1: this marker bit counts the 16 clock edges and raises the halt signal when the multiplication is finished. Finally, it contains the halt memory cell.

"C_SIPO" test chronogram:

Just to show how it works, I set CEz to 1 to place the output in the high-impedance state...

...and we can see the halt signal raised at the 16th clock edge.
"C_SIPO" time analysis:

"CNTR" block diagram (control unit):

"CNTR" time analysis:

Timing Analyzer, delay matrix (rows: source, columns: destination):

| Source | clkB | clkP | Halt | loadA | loadB |
| :--- | :---: | :---: | :---: | :---: | :---: |
| clk | 6.0 ns | 6.0 ns | 4.0 ns |  |  |
| HaltP |  | 6.0 ns |  |  |  |
| Im |  |  |  | 6.0 ns |  |
| Resetz |  |  |  | 6.0 ns | 6.0 ns |

"Systolic multiplier" block diagram: (instantiation of all the components)
"Systolic multiplier" block diagram: (instantiation of all the components)




This analysis gives a maximum delay of 12.5 ns in hot conditions, but I found 13.6 ns in the b_piso component. The maximum clock frequency is thus around $1 / 13.6\ \mathrm{ns} \approx 73\ \mathrm{MHz}$ (this value is only an estimate).
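As a rough worked estimate, assuming the 13.6 ns b_piso path is the critical one and that one multiplication needs the 16 clock cycles counted by the C_SIPO marker (the load phase is ignored):

$$
f_{\max} \approx \frac{1}{t_{\text{crit}}} = \frac{1}{13.6\ \text{ns}} \approx 73.5\ \text{MHz},
\qquad
t_{\text{mult}} \approx 16 \times 13.6\ \text{ns} \approx 218\ \text{ns}
$$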
"Systolic multiplier" chronogram: I used simple values to show the result, for the positive multiplication I display in decimal and for the negative one, I used the hexadecimal display.



Note: in this report file (*.rpt) we can see, among other things, the resources used in the device:

| Logic Array Block | Logic Cells | I/O Pins | Shareable Expanders | External Interconnect |
| :---: | :---: | :---: | :---: | :---: |
| A: LC1 - LC16 | 4/16 (25%) | 12/12 (100%) | 1/16 (6%) | 6/36 (16%) |
| B: LC17 - LC32 | 16/16 (100%) | 8/12 (66%) | 3/16 (18%) | 28/36 (77%) |
| C: LC33 - LC48 | 16/16 (100%) | 4/12 (33%) | 12/16 (75%) | 26/36 (72%) |
| D: LC49 - LC64 | 16/16 (100%) | 11/12 (91%) | 4/16 (25%) | 20/36 (55%) |

| Resource | Value |
| :--- | :--- |
| Total dedicated input pins used | 2/4 (50%) |
| Total I/O pins used | 35/48 (72%) |
| Total logic cells used | 52/64 (81%) |
| Total shareable expanders used | 4/64 (6%) |
| Total Turbo logic cells used | 52/64 (81%) |
| Total shareable expanders not available (n/a) | 16/64 (25%) |
| Average fan-in | 5.11 |
| Total fan-in | 266 |
| Total input pins required | 20 |
| Total output pins required | 17 |
| Total bidirectional pins required | 5 |
| Total logic cells required | 52 |
| Total flipflops required | 51 |
| Total product terms required | 180 |
| Total logic cells lending parallel expanders | 0 |
| Total shareable expanders in database | 3 |

...and we can see the device selected by Maxplus (the speed limit also depends on the selected device):

| Chip/POF | Input Pins | Output Pins | Bidir Pins | LCs | Shareable Expanders | % Utilized |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| multiplier |  |  |  |  |  |  |
| EPM7064LC68-7 | 20 | 17 | 9 | 52 | 4 | 81% |
| User Pins: | 20 | 17 | 0 |  |  |  |

BONUS: As I still had a few "seconds" left before handing in this assignment, I simulated the Verilog code (modified in places) in Maxplus.





```
// c_sipo.v (Maxplus version)
module C_SIPO(C, HaltP, wire_S, Resetz, CEz, clkP);
    output [15:0] C;
    output HaltP;
    input  wire_S, clkP, Resetz, CEz;
    reg [15:0] C, regC;
    reg HaltP;
    // The reset is now synchronous: it is handled inside the clocked block
    always @(posedge clkP)
        if (~Resetz) begin
            regC = 16'h8000;
            HaltP = 0;
        end
        else begin
            regC = regC >> 1;
            #1 regC[15] = wire_S;
            HaltP = regC[0];
        end
    always @(CEz or regC) begin
        if (!CEz)
            C = regC;
        else
            C = 16'hzzzz;
    end
endmodule
```



```
// cntr.v (Maxplus version)
module CNTR(loadA, loadB, clkP, Halt, clkB,   // outputs
            Im, clk, Resetz, HaltP);          // inputs
    // no more cez!
    output loadA, loadB, clkP, Halt, clkB;
    input  clk, Resetz, Im, HaltP;
    reg    Halt, halt_tmp;
    assign loadA = Resetz & Im,
           loadB = Resetz,
           clkP  = ~Halt & clk,
           clkB  = ~clk;
    always @(posedge clkP) begin
        if (~Resetz) halt_tmp = 0;
        else         halt_tmp = HaltP;
    end
    always @(negedge clkP) begin
        if (~Resetz) Halt = 0;
        else         Halt = halt_tmp;
    end
endmodule
```

SIMPLE EXAMPLE VALUES FOR THE SIMULATION:





AS WE CAN SEE, THE AUTOMATIC SYNTHESIS FROM THE VERILOG FILES TAKES MORE SPACE (86% OF THE SAME DEVICE INSTEAD OF 81%)

I'm really happy to have found the time to compare the Verilog synthesis by Maxplus with the gate-level synthesis, also by Maxplus. This is a good conclusion to the comparison of automatic and manual synthesis. I already knew that the occupation ratio is better at gate level, but now I have seen it for real.

It would have been interesting to test the synthesis size of the behavioural component, but time has run out.

## 3 Conclusion

This report was a really good introduction to manual synthesis and the gate-level challenge. The main difficulties remain the delay problems, the requirements analysis and full testing, but we now have a good overview of them.

This assignment made me discover many tricks of the Maxplus simulator, the Silos simulator and Verilog HDL (definitely confusing, given that I learned VHSIC HDL last year). I now have a good overview of these complex tools, but I am obviously still far from their full potential.

My programs are definitely not the only way to reach the goal, and improvements obviously exist. Anyway, I'm really proud to have discovered a new design technique and a new HDL. I know that SystemC, AHDL (Altera HDL) and about ten others also exist, but I still have time...

