Implementing CRC-32 in Decaf

CRC-32 is a used to detect data corruption during a transmission – one example is the use of CRC-32 to provide a CRC hash to the Frame Check Sequence in Ethernet transmissions.

It is a straightforward algorithm to implement in most languages, and code samples can be easily found online.

The challenge is to implement in a high-level language which does not come with bitwise abstractions!

You may ask, why would I spend time implementing something like this, writing an article to go with it, all in a language nobody will ever use or even hear of? The answer is because I can1. What are you, the reader, going to do? Call the software police? Call my boss? Not likely.

Algorithm Summary

  1. Start by generating the lookup table (or fast table).
    • Create an array of length 256 to store the table.
    • Iterate over each element of the table and note the index.
      • Perform logical right shift on the index to get right shifted index.
      • (In binary) if the value of the 1s column in the index is 1:
      • Then, perform logical XOR on the right shifted index with the magic value 0xEDB88320 to get the XORed index.
      • Update the current element of table with the XORed index.
    • Return the filled the 256-element table.
    • Notice that the result of this is constant! It only needs to be generated once. Or not at all, as long as you can find a it on the internet.
  2. Take the input as a string of 8-bit symbols.
  3. Set the crc value as 0.
  4. For each symbol in the input
    • Truncate (mask) the crc value by setting the upper 24 bits to 0 (just like the inverse of a /8, if you have a computer networks background).
    • Logical XOR the symbol with the truncated crc value.
    • Use this value as the index in the lookup table. Get the value.
    • Right shift the original crc value by 8 bits.
    • Set the crc value to be the logical XOR of the lookup table value and the right-shifted original crc value.
    • This way, each symbol changes the crc value.
  5. Once all symbols are processed from input, return the logical NOT of the crc value

Phew! If you’re confused at this point, so am I. Stay with me, code samples up next.

Wait, what’s with this “magic value” 0xEDB88320?

The value 0xEDB88320 is actually hexadecimal value for a generator polynomial which was mathematically determined2 to optimise error detection, compatibility with hardware, and performance during calculation of the CRC-32 checksum.

The procedure above where the lookup table is generated is actually mathematically equivalent taking a polynomial function g(x) and calculating the each element of the array as g(current index).

The “magic value” is a way to store this polynomial as a constant, and then generate the values for the lookup table using a set of bitwise operations.

Implemented in Decaf

Here’s the main crc32 function (with the lookup table truncated for brevity):

crc32 = (strData) ->
  arrFastTable = [
    0, 1996959894, 3993919788, 
    # ... 250 values omitted ...  
    3272380065, 1510334235, 755167117
  ]

  arrBitZero = intToBinary({intA: 0})
  arrBit24Zeros = [
    0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,
    1,1,1,1,1,1,1,1,
  ]

  arrCRCBit = logicalNot({arrBitA: arrBitZero})
  arrOrd = []
  intLength = length({var: strData})
  intIndex = 0
  while intIndex < intLength
    strChar = string({function: "substr", input: strData, arg1: intIndex, arg2: 1})
    strOrd = string({function: "ord", input: strChar})
    arrOrd[intIndex] = strOrd
    intIndex = intIndex + 1

  for intOrd in arrOrd
    arrOrdBit = intToBinary({intA: intOrd})
    arrCRCBitTruncated = logicalAnd({arrBitA: arrCRCBit, arrBitB: arrBit24Zeros})
    arrCRCXOROrd = logicalXOR({arrBitA: arrOrdBit, arrBitB: arrCRCBitTruncated})
    intCRCXOROrd = binaryToInt({arrBit: arrCRCXOROrd})
    intFastTableValue = arrFastTable[intCRCXOROrd]
    arrFastTableValue = intToBinary({intA: intFastTableValue})
    arrCRCRightShift8 = logicalRightShift({arrBit: arrCRCBit, intShift: 8})
    arrCRCBit = logicalXOR({arrBitA: arrFastTableValue, arrBitB: arrCRCRightShift8})

  arrCRCBit = logicalNot({arrBitA: arrCRCBit})

  return binaryToHex({arrBit: arrCRCBit})

You might notice that in order to process integers in binary, they are converted to arrays of bits. All the “logical” functions will operate on arrays of bits.

Yes, I also wrote a bunch of bitwise abstractions to go with this, and make the code a bit easier to follow. Will anyone ever need these abstractions? Maybe, for IP addresses or something. If you know any applications of bitwise operations which apply to workforce management, then please contact me.

Validation

The following table shows a few validations of crc32.coffee, which has been compared against crc32.online which uses the same magic value.

Input crc32.online crc32.coffee
Hello World 4a17b156 4a17b156
Arie is epic def989f5 def989f5
The script of Bee Movie 1ec6c44a (instant) timeout :(

Performance

This is super slow (>1s)! It has to perform a lot of looping and string splitting. For this reason, it was never used in production.

You Haven’t Heard of Decaf?

Decaf is a language similar to CoffeeScript, however it is vastly different under the hood. It’s actually a scripting language which is executed inside a PHP context. It’s how Deputy is able to deliver customer implementations of the product at a high velocity.

Decaf has no import, require or include. There are no custom data types, developers are limited to number, string, array (actually PHP associative arrays), boolean and null. In Decaf, you cannot pass functions as arguments, so that means no callbacks. You cannot define functions while inside the scope of another function. Most importantly for the following challenge, Decaf has no bitwise operators.

The reason for these limitations is that Decaf is first transpiled into XML (or DeXML as it is known within the company). So really, the PHP application is reading XML to execute a series of commands – this is in contrast to other languages which actually get compiled.

Is it possible to implement a CRC-32 algorithm in Decaf, despite all the limitations of the language? Of course! In the spirit of Deputy CX Engineering, some workarounds are required. Read on to find out how.

Appendix A

Explanation of each function

Appendix B

Source code for crc32.coffee

View crc32.coffee here

References


  1. The original reason for creating this in Decaf was that I needed to hash some environment variable names because the database field only accepted 32 characters. At the time, there was a way to generate an MD5 checksum, but the implementation was broken. ↩︎

  2. https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Designing_polynomials ↩︎