|
Computation of cyclic redundancy checks
Computation of a cyclic redundancy check is derived from the mathematics of polynomial division, modulo two. In practice, it resembles long division of the binary message string, with a fixed number of zeroes appended, by the "generator polynomial" string except that exclusive or operations replace subtractions. Division of this type is efficiently realised in hardware by a modified shift register, and in software by a series of equivalent algorithms, starting with simple code close to the mathematics and becoming faster (and arguably more obfuscated) through byte-wise parallelism and space–time tradeoffs.
Various CRC standards extend the polynomial division algorithm by specifying an initial shift register value, a final exclusive OR step and, most critically, a bit ordering (endianness). As a result, the code seen in practice deviates confusingly from "pure" division, Most commonly, a 256-entry lookup table is used, replacing the inner loop over j
with:
// Msbit-first rem = (rem leftShift 8) xor big_endian_table[string[i] xor ((leftmost 8 bits of rem) rightShift (n-8))] // Lsbit-first rem = (rem rightShift 8) xor little_endian_table[string[i] xor (rightmost 8 bits of rem)]
- Code fragment 6: Cores of table based division
One of the most commonly encountered CRC algorithms is known as CRC-32, used by (among others) Ethernet, FDDI, ZIP and other archive formats, and PNG image format. Its polynomial can be written msbit-first as 0x04C11DB7, or lsbit-first as 0xEDB88320. The W3C webpage on PNG includes an appendix with a short and simple table-driven implementation in C of CRC-32. You will note that the code corresponds to the lsbit-first byte-at-a-time pseudocode presented here, and the table is generated using the bit-at-a-time code.
Using a 256-entry table is usually most convenient, but other sizes can be used. In small microcontrollers, using a 16-entry table to process four bits at a time gives a useful speed improvement while keeping the table small. On computers with ample storage, a -entry table can be used to process 16 bits at a time.
Contents
Generating the tables
The software to generate the tables is so small and fast that it is usually faster to compute them on program startup than to load precomputed tables from storage. One popular technique is to use the bit-at-a-time code 256 times to generate the CRCs of the 256 possible 8-bit bytes. However, this can be optimized significantly by taking advantage of the property that table[i xor j] == table[i] xor table[j]
. Only the table entries corresponding to powers of two need to be computed directly.
In the following example code, crc
holds the value of table[i]
:
big_endian_table[0] := 0 crc := 0x8000 // Assuming a 16-bit polynomial i := 1 do { if crc and 0x8000 { crc := (crc leftShift 1) xor 0x1021 // The CRC polynomial } else { crc := crc leftShift 1 } // crc is the value of big_endian_table[i]; let j iterate over the already-initialized entries for j from 0 to i−1 { big_endian_table[i + j] := crc xor big_endian_table[j]; } i := i leftshift 1 } while i < 256
- Code fragment 7: Byte-at-a-time CRC table generation, MSB first
little_endian_table[0] := 0 crc := 1; i := 128 do { if crc and 1 { crc := (crc rightShift 1) xor 0x8408 // The CRC polynomial } else { crc := crc rightShift 1 } // crc is the value of little_endian_table[i]; let j iterate over the already-initialized entries for j from 0 to 255 by 2 × i { little_endian_table[i + j] := crc xor little_endian_table[j]; } i := i rightshift 1 } while i > 0
- Code fragment 8: Byte-at-a-time CRC table generation, LSB first
In these code samples, the table index i + j
is equivalent to i xor j
; you may use whichever form is more convenient.
Byte-Slicing using multiple tables
Parallel computation without table
Parallel update for a byte or a word at a time can also be done explicitly, without a table. This is normally used in high-speed hardware implementations. For each bit an equation is solved after 8 bits have been shifted in. The following tables list the equations for some commonly used polynomials, using following symbols:
ci | CRC bit 7…0 (or 15…0) before update |
ri | CRC bit 7…0 (or 15…0) after update |
di | input data bit 7…0 |
ei = di + ci | ep = e7 + e6 + … + e1 + e0 (parity bit) |
si = di + ci+8 | sp = s7 + s6 + … + s1 + s0 (parity bit) |
Polynomial: | (x7 + x3 + 1) × x (left-shifted CRC-7-CCITT) | x8 + x5 + x4 + 1 (CRC-8-Dallas/Maxim) |
---|---|---|
Coefficients: | 0x12 = (0x09 << 1) (MSBF/normal) | 0x8c (LSBF/reverse) |
r0 ← r1 ← r2 ← r3 ← r4 ← r5 ← r6 ← r7 ← |
0 e0 + e4 + e7 e1 + e5 e2 + e6 e3 + e7 + e0 + e4 + e7 e4 + e1 + e5 e5 + e2 + e6 e6 + e3 + e7 |
e0 + e4 + e1 + e0 + e5 + e2 + e1 e1 + e5 + e2 + e1 + e6 + e3 + e2 + e0 e2 + e6 + e3 + e2 + e0 + e7 + e4 + e3 + e1 e3 + e0 + e7 + e4 + e3 + e1 e4 + e1 + e0 e5 + e2 + e1 e6 + e3 + e2 + e0 e7 + e4 + e3 + e1 |
C code fragments: |
uint8_t c, d, e, f, r;
…
e = c ^ d;
f = e ^ (e >> 4) ^ (e >> 7);
r = (f << 1) ^ (f << 4);
|
uint8_t c, d, e, f, r;
…
e = c ^ d;
f = e ^ (e << 3) ^ (e << 4) ^ (e << 6);
r = f ^ (f >> 4) ^ (f >> 5);
|
Polynomial: | x16 + x12 + x5 + 1 (CRC-16-CCITT) | x16 + x15 + x2 + 1 (CRC-16-ANSI) | ||
---|---|---|---|---|
Coefficients: | 0x1021 (MSBF/normal) | 0x8408 (LSBF/reverse) | 0x8005 (MSBF/normal) | 0xa001 (LSBF/reverse) |
r0 ← r1 ← r2 ← r3 ← r4 ← r5 ← r6 ← r7 ← r8 ← r9 ← r10 ← r11 ← r12 ← r13 ← r14 ← r15 ← |
s4 + s0 s5 + s1 s6 + s2 s7 + s3 s4 s5 + s4 + s0 s6 + s5 + s1 s7 + s6 + s2 c0 + s7 + s3 c1 + s4 c2 + s5 c3 + s6 c4 + s7 + s4 + s0 c5 + s5 + s1 c6 + s6 + s2 c7 + s7 + s3 |
c8 + e4 + e0 c9 + e5 + e1 c10 + e6 + e2 c11 + e0 + e7 + e3 c12 + e1 c13 + e2 c14 + e3 c15 + e4 + e0 e0 + e5 + e1 e1 + e6 + e2 e2 + e7 + e3 e3 e4 + e0 e5 + e1 e6 + e2 e7 + e3 |
sp s0 + sp s1 + s0 s2 + s1 s3 + s2 s4 + s3 s5 + s4 s6 + s5 c0 + s7 + s6 c1 + s7 c2 c3 c4 c5 c6 c7 + sp |
c8 + ep c9 c10 c11 c12 c13 c14 + e0 c15 + e1 + e0 e2 + e1 e3 + e2 e4 + e3 e5 + e4 e6 + e5 e7 + e6 ep + e7 ep |
C code fragments: |
uint8_t d, s, t;
uint16_t c, r;
…
s = d ^ (c >> 8);
t = s ^ (s >> 4);
r = (c << 8) ^
t ^
(t << 5) ^
(t << 12);
|
uint8_t d, e, f;
uint16_t c, r;
…
e = c ^ d;
f = e ^ (e << 4);
r = (c >> 8) ^
(f << 8) ^
(f << 3) ^
(f >> 4);
|
uint8_t d, s, p;
uint16_t c, r, t;
…
s = d ^ (c >> 8);
p = s ^ (s >> 4);
p = p ^ (p >> 2);
p = p ^ (p >> 1);
p = p & 1;
t = p | (s << 1);
r = (c << 8) ^
(t << 15) ^
t ^
(t << 1);
|
uint8_t d, e, p;
uint16_t c, r, f;
…
e = c ^ d;
p = e ^ (e >> 4);
p = p ^ (p >> 2);
p = p ^ (p >> 1);
p = p & 1;
f = e | (p << 8);
r = (c >> 8) ^
(f << 6) ^
(f << 7) ^
(f >> 8);
|
Two-step computation
As the CRC-32 polynomial has a large number of terms, when computing the remainder a byte at a time each bit depends on several bits of the previous iteration. In byte-parallel hardware implementations this calls for either multiple-input or cascaded XOR gates which increases propagation delay.
To maximise computation speed, an intermediate remainder can be calculated by passing the message through a 123-bit shift register. The polynomial is a carefully selected multiple of the standard polynomial such that the terms (feedback taps) are widely spaced, and no bit of the remainder is XORed more than once per byte iteration. Thus only two-input XOR gates, the fastest possible, are needed. Finally the intermediate remainder is divided by the standard polynomial in a second shift register to yield the CRC-32 remainder.
One-pass checking
When appending a CRC to a message, it is possible to detach the transmitted CRC, recompute it, and verify the recomputed value against the transmitted one. However, a simpler technique is commonly used in hardware.
When the CRC is transmitted with the correct byte order (matching the chosen bit-ordering convention), a receiver can compute an overall CRC, over the message and the CRC, and if they are correct, the result will be zero. This possibility is the reason that most network protocols that include a CRC do so before the ending delimiter; it is not necessary to know whether the end of the packet is imminent to check the CRC.
CRC variants
In practice, most standards specify presetting the register to all-ones and inverting the CRC before transmission. This has no effect on the ability of a CRC to detect changed bits, but gives it the ability to notice bits that are added to the message.
Preset to −1
The basic mathematics of a CRC accepts (considers as correctly transmitted) messages which, when interpreted as a polynomial, are a multiple of the CRC polynomial. If some leading 0 bits are prepended to such a message, they will not change its interpretation as a polynomial. This is equivalent to the fact that 0001 and 1 are the same number.
But if the message being transmitted does care about leading 0 bits, the inability of the basic CRC algorithm to detect such a change is undesirable. If it is possible that a transmission error could add such bits, a simple solution is to start with the rem
shift register set to some non-zero value; for convenience, the all-ones value is typically used. This is mathematically equivalent to complementing (binary NOT) the first n bits of the message, where n is the number of bits in the CRC register.
This does not affect CRC generation and checking in any way, as long as both generator and checker use the same initial value. Any non-zero initial value will do, and a few standards specify unusual values, but the all-ones value (−1 in twos complement binary) is by far the most common. Note that a one-pass CRC generate/check will still produce a result of zero when the message is correct, regardless of the preset value.
Post-invert
The same sort of error can occur at the end of a message. Appending 0 bits to a message is equivalent to multiplying its polynomial by x, and if it was previously a multiple of the CRC polynomial, the result of that multiplication will be, as well. This is equivalent to the fact that, since 726 is a multiple of 11, so is 7260.
A similar solution can be applied at the end of the message, inverting the CRC register before it is appended to the message. Again, any non-zero change will do; inverting all the bits (XORing with an all-ones pattern) is simply the most common.
This has an effect on one-pass CRC checking: instead of producing a result of zero when the message is correct, it produces a constant non-zero result. (To be precise, the result is the CRC (without non-zero preset, but with post-invert) of the inversion pattern.) Once this constant has been obtained (most easily by performing a one-pass CRC generate/check on an arbitrary message), it can be used directly to verify the correctness of any other message checked using the same CRC algorithm.