Implementing CRC-32 in Decaf

CRC-32 is a used to detect data corruption during a transmission – one example is the use of CRC-32 to provide a CRC hash to the Frame Check Sequence in Ethernet transmissions.

It is a straightforward algorithm to implement in most languages, and code samples can be easily found online.

The challenge is to implement in a high-level language which does not come with bitwise abstractions!

You may ask, why would I spend time implementing something like this, writing an article to go with it, all in a language nobody will ever use or even hear of? The answer is because I can¹. What are you, the reader, going to do? Call the software police? Call my boss? Not likely.

What is Decaf (DeXML)?

Decaf is a language that is visually similar to CoffeeScript, however it is vastly different under the hood. It’s actually a scripting language which is executed inside a PHP context. It’s how Deputy is able to deliver customer implementations of the product at a high velocity.

Decaf has no import, require or include. There are no custom data types, developers are limited to number, string, array (actually PHP associative arrays), boolean and null. In Decaf, you cannot pass functions as arguments, so that means no callbacks. You cannot define functions while inside the scope of another function. Most importantly for the following challenge, Decaf has no bitwise operators.

The reason for these limitations is that Decaf is first transpiled into XML (or DeXML as it is known within the company). So really, the PHP application is reading XML to execute a series of commands – this is in contrast to other languages which actually get compiled.

Is it possible to implement a CRC-32 algorithm in Decaf, despite all the limitations of the language? Of course! In the spirit of Deputy CX Engineering, some workarounds are required. Read on to find out how.

Work Around The Limitations

We’re going to need some way of converting between an array of bits and base-10 integers.

First, let’s build a function which converts integers to binary. The function should take an integer argument and return an array of bits. Each bit is the integer 0 or 1.

The following function will iterate over each “column” of the binary number (1s, 2s, 4s, 8s etc) and determine whether the column should be 1 or 0.

First take modulo of the input value and 2 – this will either be a 0 or 1. Save this value into the array of bits, in the current position. Finally, subtract the value from the input. Repeat.

We have assumed we are dealing with 32 bit numbers.

intToBinary = (intA) ->
  arrBit = []
  intLength = 32
  intIndex = 0

  while intIndex < intLength
    arrBit[intIndex] = 0

    if intA > 0
      intMod = intA % 2
      arrBit[intIndex] = intMod
      intA = (intA - intMod) / 2

    intIndex = intIndex + 1

  arrOut = []
  for intBit in sort({var: arrBit, with: "reverse"})
    arrOut[length({var: arrOut})] = intBit

  return arrOut

To write a function which converts binary back into integer, iterate through each column of the binary number, evaluate the expression b * 2^p, where b is the value in the current position, and 2^p is the place value (or position) of the bit. This means that the value p is actually equal to log_2(2^p). When we loop through the array of bits, p initially is len(bits) - 1 and is finally 0 when the loop completes.

To simplify: Set p to be len(bits) - 1 and then decrement p at the end of the loop.

To calculate the base-10 value, sum the expression b * 2^p for every value of b in bits with the corresponding place value 2^p.

binaryToInt = (arrBit) ->
  intInt = 0
  intPower = length({var: arrBit}) - 1
  for intBit in arrBit
    intInt = intInt + math({op: "pow", arg1: 2, arg2: intPower}) * intBit
    intPower = intPower - 1

  return intInt

Algorithm Summary

With the workarounds in place, let’s get on with the crc32 implementation.

Start by generating the lookup table (or fast table).
- Create an array of length 256 to store the table.
- Iterate over each element of the table and note the index.
  - Perform logical right shift on the index to get right shifted index.
  - (In binary) if the value of the 1s column in the index is 1:
  - Then, perform logical XOR on the right shifted index with the magic value 0xEDB88320 to get the XORed index.
  - Update the current element of table with the XORed index.
- Return the filled the 256-element table.
- Notice that the result of this is constant! It only needs to be generated once. Or not at all, as long as you can find a it on the internet.
Take the input as a string of 8-bit symbols.
Set the crc value as 0.
For each symbol in the input
- Truncate (mask) the crc value by setting the upper 24 bits to 0 (just like the inverse of a /8, if you have a computer networks background).
- Logical XOR the symbol with the truncated crc value.
- Use this value as the index in the lookup table. Get the value.
- Right shift the original crc value by 8 bits.
- Set the crc value to be the logical XOR of the lookup table value and the right-shifted original crc value.
- This way, each symbol changes the crc value.
Once all symbols are processed from input, return the logical NOT of the crc value

Phew! If you’re confused at this point, so am I. Stay with me, code samples up next.

Wait, what’s with this “magic value” 0xEDB88320?

The value 0xEDB88320 is actually hexadecimal value for a generator polynomial which was mathematically determined² to optimise error detection, compatibility with hardware, and performance during calculation of the CRC-32 checksum.

The procedure above where the lookup table is generated is actually mathematically equivalent taking a polynomial function g(x) and calculating the each element of the array as g(current index).

The “magic value” is a way to store this polynomial as a constant, and then generate the values for the lookup table using a set of bitwise operations.

Implemented in Decaf

Here’s the main crc32 function (with the lookup table truncated for brevity):

crc32 = (strData) ->
  arrFastTable = [
    0, 1996959894, 3993919788, 
    # ... 250 values omitted ...  
    3272380065, 1510334235, 755167117
  ]

  arrBitZero = intToBinary({intA: 0})
  arrBit24Zeros = [
    0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,
    1,1,1,1,1,1,1,1,
  ]

  arrCRCBit = logicalNot({arrBitA: arrBitZero})
  arrOrd = []
  intLength = length({var: strData})
  intIndex = 0
  while intIndex < intLength
    strChar = string({function: "substr", input: strData, arg1: intIndex, arg2: 1})
    strOrd = string({function: "ord", input: strChar})
    arrOrd[intIndex] = strOrd
    intIndex = intIndex + 1

  for intOrd in arrOrd
    arrOrdBit = intToBinary({intA: intOrd})
    arrCRCBitTruncated = logicalAnd({arrBitA: arrCRCBit, arrBitB: arrBit24Zeros})
    arrCRCXOROrd = logicalXOR({arrBitA: arrOrdBit, arrBitB: arrCRCBitTruncated})
    intCRCXOROrd = binaryToInt({arrBit: arrCRCXOROrd})
    intFastTableValue = arrFastTable[intCRCXOROrd]
    arrFastTableValue = intToBinary({intA: intFastTableValue})
    arrCRCRightShift8 = logicalRightShift({arrBit: arrCRCBit, intShift: 8})
    arrCRCBit = logicalXOR({arrBitA: arrFastTableValue, arrBitB: arrCRCRightShift8})

  arrCRCBit = logicalNot({arrBitA: arrCRCBit})

  return binaryToHex({arrBit: arrCRCBit})

You might notice that in order to process integers in binary, they are converted to arrays of bits. All the “logical” functions will operate on arrays of bits.

This maintains the correct length for each value.
This also avoids constant conversion between binary and base-10.

Yes, I also wrote a bunch of bitwise abstractions to go with this, and make the code a bit easier to follow. Will anyone ever need these abstractions? Maybe, for IP addresses or something. If you know any applications of bitwise operations which apply to workforce management, then leave a comment.

Validation

The following table shows a few validations of crc32.coffee, which has been compared against crc32.online which uses the same magic value.

Input	crc32.online	crc32.coffee
Hello World	4a17b156	4a17b156
Arie is epic	def989f5	def989f5
The script of Bee Movie	1ec6c44a (instant)	timeout :(

This is super slow (>1s)! It has to perform a lot of looping and string splitting. For this reason, it was never used in production.

Still, I had a lot of fun solving this puzzle. Sometimes you just have to do things to see if it’s possible.

Appendix A

Explanation of each function

crc32(strData)
- generate a CRC32 hexadecimal string using input string strData.
makeFastTable()
- generate the polynomial lookup table (256 element array).
- only need to execute this once, manually - the lookup table is constant. I copied the result inside crc32 as a constant.
intToBinary(intA)
- convert integer to an array of bits (integer 0 or 1).
binaryToInt(arrBit)
- convert array of bits into integer.
logicalRightShift(arrBit, intShift)
- perform a bitwise right-shift operation on an array of bits arrBit by the amount intShift.
logicalAnd(arrBitA, arrBitB)
- perform bitwise AND operation using arrBitA and arrBitB.
logicalXOR(arrBitA, arrBitB)
- perform bitwise XOR (exclusive-OR) operation using arrBitA and arrBitB.
logicalNot(arrBitA)
- perform bitwise NOT on arrBitA.
binaryToHex(arrBit)
- convert array of bits into hexadecimal number (stored as string).

Appendix B

Source code for crc32.coffee

View crc32.coffee here

References

The original reason for creating this in Decaf was that I needed to hash some environment variable names because the database field only accepted 32 characters. At the time, there was a way to generate an MD5 checksum, but the implementation was broken. ↩︎
https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Designing_polynomials ↩︎