Variable Length Quantity

  • Antony Pinchbeck
  • 2020-04-21
  • Elixir

From wikipedia:

A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer. A VLQ is essentially a base-128 representation of an unsigned integer with the addition of the eighth bit to mark continuation of bytes.

When working with some IoT devices they require the use of VLQ to encode certain integer values to provide and efficient “on the wire” data packet.

Basically, in each encoded byte you have seven bits to represent the value. The eighth bit is used to show a continuation of the encoding into the next byte.

Encoding values from 0 to 127 will use only a single byte, for example encoding:

127 will result in the binary value of 0 111111. That is no continuation as the value will be contained in a single byte. 127 is encoded as 0x7F

128 will result in the binary value of 1 000001 0 000000. The first byte has the continuation bit set. 128 is encoded as 0x8100

255 will result in the binary value of 1 000001 0 111111. The first byte has the continuation bit set. 128 is encoded as 0x817F

Encoding an arbitrary integer

To encode a value

  1. Convert the value to base 128
  2. For the lowest byte, set the continuation bit to zero
  3. For all other bytes, set the continuation bit to one

Step 1.

defp break_into_seven_bits(0, acc) do
  acc
end

defp break_into_seven_bits(data, acc) do
  value = 0x7F &&& data
  data = data >>> 0x7

  acc = [value | acc]

  break_into_seven_bits(data, acc)
end

Here we are and’ing the value to encode with 0x7F, then shifting the value right by 7 bits. The resultant value is stored in a list of base 128 values.

Step 2.

defp encode_uint_values(acc) do
  length = length(acc)

  {acc, _} =
    Enum.reduce(acc, {<<>>, length}, fn entry, {acc, length} ->
      acc =
        if length > 1 do
          value = 0x80 ||| entry
          acc <> <<value::size(8)>>
        else
          acc <> <<entry::size(8)>>
        end

      {acc, length - 1}
    end)

  acc
end

The list of values is then walked to produce a 8 bit binary values ready for transmission. For all values in the list except the last value the continuation bit is set by or’ing the value with 0x80

Decoding an arbitrary integer

To decode a value

  1. Extract the first byte
  2. Test if this has the continuation value set
  3. If it is set, extract the next byte and go to setp 2
  4. If it is not set, calculate the base 10 value
def decode_uint(data) do
  <<value::size(8), data::binary>> = data

  continuation = 0x80 &&& value

  decode_uint(continuation, value, data, [])
end

Here we use pattern matching to extract the first byte from the passed data. The continuation bit is then extracted. We then pass the tail of the data along with the initial accumulator to continue the decoding.

# continuation is set - more data
defp decode_uint(0x80, value, data, acc) do
  value = 0x7F &&& value

  <<next_value::size(8), data::binary>> = data
  continuation = 0x80 &&& next_value

  decode_uint(continuation, next_value, data, [value | acc])
end

# continuation is unset - no more data
defp decode_uint(0, value, _data, acc) do
  acc = [value | acc]
  acc = Enum.reverse(acc)
  length = length(acc) - 1

  # sum values shifting by 7 bits for each entry
  {total, _} =
    Enum.reduce(acc, {0, length}, fn entry, {total, index} ->
      entry = entry <<< (index * 7)

      {total + entry, index - 1}
    end)

  total
end

While there is more data, continuation bit set, continue to extract values from the passed data and add the value to the accumulator.

When the input list has been exhausted, walk the accumulator converting from base 128 to base 10.

Testing

Because the API is symmetric, we can easily leverage the property testing to ensure there is good coverage of potential values.

property "Ensure it encodes and decodes over a range of values" do
  check all value <- resize(positive_integer(), 5_000_000), max_runs: 10_000 do
    assert value == value |> Vlq.encode() |> Vlq.decode()
  end
end

By providing a few doctests, some explicit values can be tested.

You can explore the code on github

© 2001 - 2024 Component X Software Limited All rights reserved.

2020-11-04