MU
fpga-division
Division Algorithms in FPGAs
Algorithms
The following two algorithms are being examined:
Repeated subtraction (Rept. -)
C style pseudocode:
uint<N>_t numerator = num; // input
uint<N>_t denominator = den; // input
uint<N>_t quotient = 0;
while (numerator >= denominator) {
numerator -= denominator;
++quotient;
}
uint<N>_t quo = quotient; // output
uint<N>_t rmn = numerator; // outputSRT division
The implemented algorithm is documented in a .ppt about computer arithmetic from the UCCS.
C style pseudocode:
uint<2*N + 1>_t numerator = num; // input
uint<N>_t denominator = den; // input
uint<N>_t quotient = 0;
uint<N>_t remainder = 0;
uint<log2(N-1)>_t k;
uint<N>_t ap = 0;
uint<N>_t an = 0;
k = leading_zeros(denominator);
denominator <<= k;
numerator <<= k;
for (int i = 0; i < N; ++i) {
switch (top_3_bits(numerator)) {
case 0:
case 7:
ap <<= 1;
an <<= 1;
numerator <<= 1;
break;
case 4:
case 5:
case 6:
ap <<= 1;
an = (an << 1) | 1;
numerator <<= 1;
numerator += (denominator << N);
break;
default: // case 1, 2, 3
ap = (ap << 1) | 1;
an <<= 1;
numerator <<= 1;
numerator -= (denominator << N);
break;
}
}
quotient = ap - an;
remainder = numerator >> N;
if (msb(numerator) == 1) {
remainder += denominator;
--quotient;
}
uint<N>_t quo = quotient; // output
uint<N>_t rmn = remainder >> k; // outputNotes:
apcan be integrated into the N LSBs of thenumeratorto save N flip-flops (see VHDL code)numeratoris namedpin the VHDL code (named(P,A)in the.ppt)numis namedAanddenis namedBin the.ppt
Performance (ATTL)
The performance of the algorithms is measured in termns of area, timing, throughput and latency (ATTL).
Iterative Approach
| 10 bits | 12 bits | N bits | ||||
|---|---|---|---|---|---|---|
| Rept. - | SRT | Rept. - | SRT | Rept. - | SRT | |
| Area (#ff) | a | b | c | d | ? | ? |
| Timing (MHz) | x | y | x | y | x | y |
| Throughput (bits/clk) | 2.3 | 1 | 2.4 | 1 | ? | 1 |
| Avg. Latency (#clk) | 4.3 *) | 10 | 5.0 *) | 12 | ? | N |
*) Determined by Simulation
a, b, c, d, x, y: TBD
Unrolled Approach
| 10 bits | 12 bits | N bits | ||||
|---|---|---|---|---|---|---|
| Rept. - | SRT | Rept. - | SRT | Rept. - | SRT | |
| Area (#ff) | a | b | c | d | ? | ? |
| Timing (MHz) | u | v | w | x | ? | ? |
| Throughput (bits/clk) | 2.3 * M | M | 2.4 * M | M | ? | M |
| Avg. Latency (#clk) | 4.3 / M | 10 / M | 5.0 / M | 12 / M | ? | N / M |
M: Iterations per Cycle
a, b, c, d, u, v, w, x: TBD
License
Copyright © 2019 Dominik Müller and Nico Canzani
This project is licensed under the terms of the Apache License 2.0 - see the LICENSE file for details
On this page
Languages
VHDL79.6%Stata15.5%Python4.9%
Contributors
Apache License 2.0
Created November 16, 2019
Updated November 21, 2023