-
Notifications
You must be signed in to change notification settings - Fork 44
/
README
336 lines (265 loc) · 13.5 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
Verilog BCH encoder/decoder.
This is a Verilog based BCH encoder and decoder for single bit, dual bit, and
3 or more bit error correction. The equations and layout for the encoder
and decoders is taken from "The design of a vhdl based synthesis tool for bch
codecs." by Ernest Jamro.
The decoder/encoder is completely parameterizable. The two main parameter
are:
DATA_BITS - The number of data bits
T - The number of bits that can be corrected
The module bch_param generates a BCH parameter set:
`include "bch_params.vh"
localparam [`BCH_PARAM_SZ-1:0] P = bch_params(DATA_BITS, T);
The generated parameters are passed to the BCH modules and can be access
with the following:
`BCH_M(P) - Degree of the field
`BCH_N(P) - Actual code size (padded)
`BCH_K(P) - Actual data size (padded)
`BCH_T(P) - Actual bits corrected (may be more than specified)
`BCH_DATA_BITS(P) - Data bits
`BCH_ECC_BITS(P) - Generate ECC bits
`BCH_CODE_BITS(P) - Data bits + ECC bits
`BCH_SYNDROMES_SZ(P) - Bits required for syndromes passed between modules
`BCH_SIGMA_SZ(P) - Bits required for sigma equation passed between
modules
`BCH_ERR_SZ(P) - Bits required for number of errors
`BCH_PARAM_SZ - Bits required to hold parameters
Note that the number of errors correctable for a given polynomial is sparse.
The search function will choose the next highest number of correctable
errors rather than trying to move to the next polynomial. For instance, if
DATA_BITS=64, T=8 is seleceted, (127, 71, 9) is chosen rather than
(255, 191, 8).
Many modules accept a PIPELINE_STAGES and/or a REG_RATIO parameter. These
parameters relate to area/speed/latency optimizations. There is not a hard
and fast rule of which setting will give the best speed or area result.
PIPELINE_STAGES adds additional pipelining stages to operation, increasing
register count and latency, but possibly increasing speed.
For modules that accept input in parallel and have duplicated register
sets for dealing with the parallel input, REG_RATIO reduces the number of
registers used. REG_RATIO=1 will include all registers, REG_RATIO=2 will
calculate every other register asyncronously from the previous register,
REG_RATIO=3 will calculate every third register asyncrounously and so on.
The maximum allowable REG_RATIO is REG_RATIO=BITS (the input width).
Note that when compiling with Xilinx tools, the '-loop_iteration_limit'
option to XST is required to be set above the `BCH_N constant. Choosing
a value that is DATA_BITS is usually safe.
The code currently compiles under Icarus Verilog, Xilinx Isim and Xilinx XST.
Bit Order:
When operating with multi-bit input/output, the first bit in each sequence is
the high bit of the first word. If padding is added to the last word is a
sequence, it is added to the lowest bits of the word.
Modules:
The encoder consists of a single module, but the decoder consists of several
modules that are meant to be used together. Depending on requirements,
certain modules can be shared between decoders. Slice counts and max
frequencies are show below each module based on the Xilinx Virtex-6 LX240T-3,
built with all default options except for '-opt_level 2' passed to XST.
Additional optimizations may yield lower area/higher speeds.
bch_encode - Performs encoding of input data via a large LFSR. Takes a BITS
wide input. For the first ceil(`BCH_DATA_BITS / BITS) cycles (including
cycle 0) the output mirrors the input. Then, for ceil(`BCH_ECC_BITS / BITS)
cycles the output contains the ECC bits. In the case that `BCH_DATA_BITS %
BITS is non-zero, the low bits of the last data word will be ignored.
A pipeline stage can be added by setting the PIPELINE_STAGES=1 parameter.
T=1,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=0 12 slices 503 MHz
BITS= 8,PIPELINE_STAGES=1 18 slices 543 MHz
BITS=16,PIPELINE_STAGES=0 18 slices 543 MHz
T=1,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=0 22 slices 543 MHz
T=3,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=0 72 slices 390 MHz
T=8,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=1 125 slices 447 MHz
T=12,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=1 183 slices 387 MHz
T=12,DATA_BITS=4096:
BITS= 8,PIPELINE_STAGES=0 150 slices 375 MHz
BITS=16,PIPELINE_STAGES=0 295 slices 318 MHz
bch_blank_ecc - Allows the ECC on erased flash to be valid. It does this by
precomputing the ECC of an all 1's input and then inverting it. This module
provides this data so it can be XOR'd with the ECC bit output from the encode
module and XOR'd again before input to the bch_syndrome module.
bch_syndrome - Calculates the syndrome equations. Takes a BITS wide input.
Operates for ceil(`BCH_CODE_BITS / BITS) cycles to produce the compact
syndrome equations. Note that bch_syndrome expects the data and ecc input
to be merged together without pad bits in the case of `BCH_DATA_BITS %
BITS being non-zero, which is not what bch_encode produces. It is
recommended to make BITS divisible by `BCH_DATA_BITS. In the event that
`BCH_CODE_BITS % BITS is non-zero, the upper bits of the final word are
ignored. Up to two pipeline stages can be added by setting the
PIPELINE_STAGES parameter.
T=1,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=0,REG_RATIO=1 8 slices 464 MHz
BITS= 4,PIPELINE_STAGES=1,REG_RATIO=4 13 slices 506 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=1 30 slices 485 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=4 48 slices 408 MHz
T=1,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=4 52 slices 386 MHz
T=3,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 66 slices 451 MHz
T=8,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=1,REG_RATIO=16 43 slices 508 MHz
T=12,DATA_BITS=256:
BITS= 4,PIPELINE_STAGES=2,REG_RATIO=4 21 slices 488 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8 22 slices 512 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 42 slices 485 MHz
T=12,DATA_BITS=4096:
BITS= 8,PIPELINE_STAGES=1,REG_RATIO=8 25 slices 512 MHz
BITS=16,PIPELINE_STAGES=1,REG_RATIO=16 42 slices 453 MHz
bch_errors_present - Determines based on syndromes if any errors are present
(Not required for decoding). This module can be used to allow several
bch_syndrome modules to share bch_sigma_* and bch_error_* modules or to
determine that the error is data free before allowing bch_sigma_* to complete.
Up to two pipeline stages can be added by setting the PIPELINE_STAGES
parameter.
bch_sigma_* - Key equation solvers (sigma). Takes syndrome equations as input
and produces the key equation, as well as the number of bit errors detected.
Solving the key equation is required for T > 2 and optional for T == 2, and
not supported for T == 1.
bch_sigma_bma_serial - Serial Berlekamp–Massey algorithm with inversion.
This option takes longer but requires less space than the parallel option.
This option takes `BCH_T(P) * (`BCH_M(P) + 2) - 2 cycles to solve the key
equation. Because of variability in the basis conversion circuits, the
performance can vary widely for different values of `BCH_M.
T= 2,DATA_BITS=64 63 slices 376 MHz
T= 2,DATA_BITS=256 80 slices 345 MHz
T= 2,DATA_BITS=1024 95 slices 263 MHz
T= 2,DATA_BITS=4096 120 slices 276 MHz
T= 8,DATA_BITS=256 192 slices 217 MHz
T= 8,DATA_BITS=1024 225 slices 262 MHz
T= 8,DATA_BITS=4096 272 slices 190 MHz
T=12,DATA_BITS=256 266 slices 208 MHz
T=12,DATA_BITS=1024 299 slices 205 MHz
T=12,DATA_BITS=4096 429 slices 194 MHz
bch_sigma_bma_parallel - Parallel Berlekamp–Massey algorithm without
inversion. This option operates in less cycles than the serial option, but
requires more gates. This option takes `BCH_T(P) * 2 - 1 cycles to solve the
key equation.
T= 2,DATA_BITS=64 92 slices 330 MHz
T= 2,DATA_BITS=256 138 slices 327 MHz
T= 2,DATA_BITS=1024 222 slices 284 MHz
T= 2,DATA_BITS=4096 335 slices 236 MHz
T= 8,DATA_BITS=256 444 slices 223 MHz
T= 8,DATA_BITS=1024 586 slices 251 MHz
T= 8,DATA_BITS=4096 754 slices 212 MHz
T=12,DATA_BITS=256 576 slices 230 MHz
T=12,DATA_BITS=1024 761 slices 229 MHz
T=12,DATA_BITS=4096 1025 slices 191 MHz
bch_sigma_bma_noinv - Serial Berlekamp–Massey algorithm without inversion.
This option takes `BCH_T(P) * (`BCH_M(P) * 2 + 1) cycles to solve the key
equation.
T= 3,DATA_BITS=256 19 slices 421 MHz
T= 3,DATA_BITS=1024 107 slices 348 MHz
T= 3,DATA_BITS=4096 128 slices 320 MHz
T= 8,DATA_BITS=256 19 slices 307 MHz
T= 8,DATA_BITS=1024 231 slices 261 MHz
T= 8,DATA_BITS=4096 305 slices 232 MHz
T=12,DATA_BITS=256 23 slices 489 MHz
T=12,DATA_BITS=1024 342 slices 196 MHz
T=12,DATA_BITS=4096 445 slices 203 MHz
bch_error_* - Error locator. After 2 cycles, it clocks out a BITS wide error
bits word ceil(`BCH_DATA_BITS / BITS) cycles, each cycle representing a set
of error locations. If `BCH_DATA_BITS % BITS is non-zero, the low bits
of the last word in the output are masked.
bch_error_dec - Error location function for T < 3. Takes syndromes directly
as input rather than a solved key equation. Same output as bch_error_*
above, but also produces the error_count. Up to two pipeline stages for
T==2 or one for T==1 can be added by setting the PIPELINE_STAGES parameter.
T=1,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=1,REG_RATIO=1 17 slices 491 MHz
BITS= 8,PIPELINE_STAGES=1,REG_RATIO=1 45 slices 510 MHz
BITS= 8,PIPELINE_STAGES=1,REG_RATIO=8 23 slices 490 MHz
BITS=16,PIPELINE_STAGES=0,REG_RATIO=8 32 slices 482 MHz
T=1,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=0,REG_RATIO=16 31 slices 466 MHz
T=1,DATA_BITS=4096:
BITS=16,PIPELINE_STAGES=0,REG_RATIO=4 88 slices 423 MHz
BITS=16,PIPELINE_STAGES=1,REG_RATIO=16 64 slices 385 MHz
T=2,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=2,REG_RATIO=1 36 slices 442 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8 105 slices 504 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=4 84 slices 488 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=1 183 slices 424 MHz
T=2,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 367 slices 291 MHz
T=2,DATA_BITS=4096:
BITS=16,PIPELINE_STAGES=1,REG_RATIO=4 821 slices 230 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 637 slices 215 MHz
bch_error_tmec - Error location function for T > 1. Requires the solved key
equation as input. Same output as bch_error_* above. Up to two pipeline
stages can be added by setting the PIPELINE_STAGES parameter.
T=3,DATA_BITS=64:
BITS= 1,PIPELINE_STAGES=0,REG_RATIO=1 26 slices 485 MHz
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=4 75 slices 457 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 111 slices 393 MHz
T=3,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=4 152 slices 416 MHz
T=3,DATA_BITS=4096:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 314 slices 352 MHz
T=8,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=8 373 slices 337 MHz
T=8,DATA_BITS=4096:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 737 slices 251 MHz
T=12,DATA_BITS=256:
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 595 slices 297 MHz
T=12,DATA_BITS=4096:
BITS= 8,PIPELINE_STAGES=2,REG_RATIO=8 585 slices 262 MHz
BITS=16,PIPELINE_STAGES=2,REG_RATIO=16 1247 slices 183 MHz
sim - Test bench core and example code for connecting together the different
modules. Takes an additional parameter to specify the type of key equation
solver:
OPTION - The type of key equation solver:
OPTION == NONE - For T < 3, skip the key solving process
OPTION == PARALLEL - For T > 1, use bch_sigma_bma_parallel
OPTION == SERIAL - For T > 1, use bch_sigma_bma_serial
OPTION == NOINV - For T > 1, use bch_sigma_bma_noinv
BITS - The word width for bch_encode, bch_syndrome, and bch_error_*.
REG_RATIO - When using multi-bit datastreams, bch_syndrome and bch_error_*
create a duplicated register set for each bit. This causes them to instead
only create a register for every REG_RATIOth bit. The additional values
are calculated asyncronously which may create timing issues, but will
reduce register usage.
TODO:
Improve simulation times.
Shared syndrome/chien stages between data streams?
Forney algorithm for error location?
BTZ algorithm for error location?
Encoder Example Output:
Note: This includes XOR'ing the ECC data with bch_blank_ecc. The output
matches the ECC generated by Linux software BCH.
DATA_BITS=4096, T=3, BITS=8, GP(2^13)
0x000: 24 ac a8 48 81 7a 91 03 90 6b 21 21 c6 6a 0f 8d
0x010: b0 de 9d 60 b1 99 fb 62 a6 21 43 4d 48 4a ec 91
0x020: 80 c7 cf 00 b0 b3 eb 60 6e 8f c4 dc bc b3 b1 78
0x030: b7 ea 2b 6e 31 ef b4 62 9f 97 bf 3e 63 24 ec c7
0x040: a3 21 ef 47 55 4b c8 ab a7 ef f5 4e ee 4e 4b dd
0x050: 48 6b 4c 91 f1 db 37 e2 4f 96 0c 9e 73 69 6e e7
0x060: 17 5e 00 2f a3 72 53 47 3a 92 1a 74 bd 14 71 7b
0x070: b4 1a d5 69 a6 64 e5 4d c6 46 f7 8d 50 e1 ce a0
0x080: 01 44 dc 03 04 84 26 08 55 a2 1e aa b9 33 33 73
0x090: 69 7c 04 d3 c8 1d c1 91 b0 1c 3d 61 ca 6b e5 95
0x0a0: d6 bb f9 ac ab 54 65 57 40 52 e8 81 75 7c 6a eb
0x0b0: b8 97 ff 70 f3 4e a3 e7 c4 92 21 88 50 a6 90 a0
0x0c0: 53 50 ce a7 3c 32 8c 79 4f bd 16 9e 51 86 74 a2
0x0d0: 82 cd 23 04 1b 2c d2 37 1e cf 04 3c 91 48 51 23
0x0e0: d7 b0 a3 ae 6d a6 40 da 63 9f d2 c6 75 c8 a8 ea
0x0f0: 3d 40 ae 7b 54 bb f0 a8 d0 59 eb a1 8e 7d e9 1d
0x100: a1 82 4f 42 f2 aa 4f e4 dc 06 33 b9 5c cc 60 b8
0x110: 7e 4c c4 fd a8 d2 55 50 8f d9 9b 1e 9b d2 39 36
0x120: 46 e0 c4 8c 32 df ac 64 f9 aa 85 f2 f1 85 9d e2
0x130: 1f 95 b6 3e ed 06 b3 db f9 6a 23 f3 3a 29 be 75
0x140: da 03 d3 b5 91 ec 7b 22 92 4c 75 25 0b 7c c4 17
0x150: eb 03 7f d7 8a 3f e3 15 69 f4 84 d2 30 fe b8 60
0x160: 43 73 34 87 13 b7 ea 26 ee 5a 41 dd bc b7 4f 78
0x170: 48 db 30 90 7e 00 a0 fd b6 81 a7 6c 9c e7 9b 38
0x180: f8 f1 4b f0 cb 6f d7 97 a1 f4 9b 42 1e 11 28 3d
0x190: da 9a 5b b4 92 d2 d1 24 fc 75 4f f9 54 f5 30 a8
0x1a0: 1a c6 a8 34 61 1c fc c3 71 30 58 e3 39 4f f4 73
0x1b0: 5f 99 ac be 64 3d be c9 ce e8 69 9c c5 2d d1 8b
0x1c0: f4 15 99 e9 cd 49 4b 9b fa 6a 1b f5 22 b0 94 44
0x1d0: ef bf 1f de d3 6c d9 a7 67 58 90 cf 5a db 14 b4
0x1e0: 94 83 f3 28 b7 47 41 6f 5b 36 ca b7 a6 6d f9 4d
0x1f0: ad 47 51 5b fb 29 7d f7 9e 98 8f 3c 22 ff 8c 44
ECC: 24 a2 6b 4d 5b
-- Russ Dill <[email protected]>