Variable Length Quantity
From wikipedia:
A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer. A VLQ is essentially a base-128 representation of an unsigned integer with the addition of the eighth bit to mark continuation of bytes.
When working with some IoT devices they require the use of VLQ to encode certain integer values to provide and efficient "on the wire" data packet.
Basically, in each encoded byte you have seven bits to represent the value. The eighth bit is used to show a continuation of the encoding into the next byte.
Encoding values from 0 to 127 will use only a single byte, for example encoding:
127 will result in the binary value of 0 111111
. That is no continuation as the value will be contained in a single byte. 127 is encoded as 0x7F
128 will result in the binary value of 1 000001 0 000000
. The first byte has the continuation bit set. 128 is encoded as 0x8100
255 will result in the binary value of 1 000001 0 111111
. The first byte has the continuation bit set. 128 is encoded as 0x817F
Encoding an arbitrary integer
To encode a value
- Convert the value to base 128
- For the lowest byte, set the continuation bit to zero
- For all other bytes, set the continuation bit to one
Step 1.
defp break_into_seven_bits(0, acc) do
acc
end
defp break_into_seven_bits(data, acc) do
value = 0x7F &&& data
data = data >>> 0x7
acc = [value | acc]
break_into_seven_bits(data, acc)
end
Here we are and'ing the value to encode with 0x7F, then shifting the value right by 7 bits. The resultant value is stored in a list of base 128 values.
Step 2.
defp encode_uint_values(acc) do
length = length(acc)
{acc, _} =
Enum.reduce(acc, {<<>>, length}, fn entry, {acc, length} ->
acc =
if length > 1 do
value = 0x80 ||| entry
acc <> <<value::size(8)>>
else
acc <> <<entry::size(8)>>
end
{acc, length - 1}
end)
acc
end
The list of values is then walked to produce a 8 bit binary values ready for transmission. For all values in the list except the last value the continuation bit is set by or'ing the value with 0x80
Decoding an arbitrary integer
To decode a value
- Extract the first byte
- Test if this has the continuation value set
- If it is set, extract the next byte and go to setp 2
- If it is not set, calculate the base 10 value
def decode_uint(data) do
<<value::size(8), data::binary>> = data
continuation = 0x80 &&& value
decode_uint(continuation, value, data, [])
end
Here we use pattern matching to extract the first byte from the passed data. The continuation bit is then extracted. We then pass the tail of the data along with the initial accumulator to continue the decoding.
# continuation is set - more data
defp decode_uint(0x80, value, data, acc) do
value = 0x7F &&& value
<<next_value::size(8), data::binary>> = data
continuation = 0x80 &&& next_value
decode_uint(continuation, next_value, data, [value | acc])
end
# continuation is unset - no more data
defp decode_uint(0, value, _data, acc) do
acc = [value | acc]
acc = Enum.reverse(acc)
length = length(acc) - 1
# sum values shifting by 7 bits for each entry
{total, _} =
Enum.reduce(acc, {0, length}, fn entry, {total, index} ->
entry = entry <<< (index * 7)
{total + entry, index - 1}
end)
total
end
While there is more data, continuation bit set, continue to extract values from the passed data and add the value to the accumulator.
When the input list has been exhausted, walk the accumulator converting from base 128 to base 10.
Testing
Because the API is symmetric, we can easily leverage the property testing to ensure there is good coverage of potential values.
property "Ensure it encodes and decodes over a range of values" do
check all value <- resize(positive_integer(), 5_000_000), max_runs: 10_000 do
assert value == value |> Vlq.encode() |> Vlq.decode()
end
end
By providing a few doctests, some explicit values can be tested.
You can explore the code on github