Unraveling the Power of Binary Matching in Elixir for Efficient Data Processing

Paulo Valente

Software Engineer



Introduction

Elixir has quite a few different ways of representing collections of data: lists for ordered collections, MapSets for set manipulation, Maps and Keyword Lists for key-value pair association, and so on. However, there’s another important data structure: BEAM binaries.

A binary is a sequence of bytes, and a bitstring is its more general form: a sequence of bits whose length need not be a multiple of eight. Much like lists, both are ordered. As a bonus, binaries also have known sizes and provide some powerful tools for writing and reading data.
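To make the distinction concrete, here is a minimal sketch using the built-in guards and size functions. The variable names are chosen only for this illustration.

```elixir
# A binary is a bitstring whose bit count is a multiple of 8.
whole_bytes = <<1, 2, 3>>
seven_bits = <<5::size(7)>>

IO.inspect(is_binary(whole_bytes))     # true
IO.inspect(is_bitstring(whole_bytes))  # true
IO.inspect(is_binary(seven_bits))      # false
IO.inspect(is_bitstring(seven_bits))   # true

# Sizes are known without walking the data.
IO.inspect(byte_size(whole_bytes))  # 3
IO.inspect(bit_size(whole_bytes))   # 24
IO.inspect(bit_size(seven_bits))    # 7
IO.inspect(byte_size(seven_bits))   # 1 (rounded up to whole bytes)
```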

How to Create a Binary

In short, Elixir provides the <<>> special form for creating a binary. There are a few ways of using it, such as:

iex(1)> x = 123
123
iex(2)> <<x::8>>
"{"
iex(3)> <<x::7>>
<<123::size(7)>>
iex(4)> <<0::1, x::7>>
"{"

In this first example, we already notice something strange: the command in iex(2) returned a string! Elixir strings are actually binaries under the hood, and 123 is the code for the character {. x::8 means that we’re writing 123 as an eight-bit number into the bitstring. Likewise, <<x::7>> means that we’re representing 123 as a seven-bit bitstring, which isn’t a string because its size is not a whole number of bytes. The last command, however, adds the missing leading zero bit, which produces the same binary as iex(2).
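The relationship between strings and binaries can be checked directly. The following is a small sketch, not an exhaustive tour. It relies on the é in "José" being the single precomposed character, which is an assumption about how the source file is encoded.

```elixir
# Strings and binaries are the same data type.
IO.inspect(<<104, 105>> == "hi")  # true
IO.inspect(is_binary("hi"))       # true

# Appending a byte that is not valid UTF-8 makes inspect fall back
# to the raw byte representation.
IO.inspect("hi" <> <<255>>)       # <<104, 105, 255>>

# byte_size counts bytes, String.length counts graphemes.
name = "José"
IO.inspect(byte_size(name))       # 5, because é takes two bytes in UTF-8
IO.inspect(String.length(name))   # 4
```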

There are a few different ways of embedding values in a binary, as in the example below. For a more in-depth explanation of the valid types, see the official documentation for <<>>, which is pretty extensive. Note that 50089::utf16 writes the bytes 195 and 169, which happen to be the UTF-8 encoding of é, so the result prints as a valid string. The example also shows that the same syntax works for matching, which lets us extract data from binaries in a structured way.

iex(1)> <<?J::utf8, ?o::utf8, ?s::utf8, 50089::utf16>>
"José"
iex(2)> embed = <<1024.25::float-32-little>>
<<0, 8, 128, 68>>
iex(3)> <<extracted::float-32-little>> = embed
<<0, 8, 128, 68>>
iex(4)> extracted
1024.25
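Type modifiers also control how integers are interpreted. As a small sketch (the variable names are illustrative), the same byte can be read as a signed or an unsigned integer:

```elixir
# The same byte read with different signedness modifiers.
byte = <<255>>

<<as_unsigned::unsigned-integer-size(8)>> = byte
<<as_signed::signed-integer-size(8)>> = byte

IO.inspect(as_unsigned)  # 255
IO.inspect(as_signed)    # -1

# Negative integers are written in two's complement.
IO.inspect(<<-1::signed-integer-size(16)>>)  # <<255, 255>>
```

Integers are unsigned by default, so the signed modifier is only needed when a field can hold negative values.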

The Basics of Binary Pattern Matching

Let’s take a look at the following example:

iex> binary = <<1::1, 2::3, 3::4, 4::8>>
<<163, 4>>

The statement is encoding four numbers into a bitstring in such a way that they only occupy two bytes. We can extract them back by using pattern matching with the same size specifications:

iex> <<a::1, b::3, c::4, d::8>> = binary
<<163, 4>>
iex> a
1
iex> b
2
iex> c
3
iex> d
4
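Sub-byte fields like these show up in real protocols. As a sketch, the first four bytes of an IPv4 header pack a 4-bit version, a 4-bit header length (counted in 32-bit words), a 6-bit DSCP field, a 2-bit ECN field, and a 16-bit total length. The module and function names below are hypothetical and exist only for this example.

```elixir
defmodule IPv4Peek do
  # Hypothetical helper for illustration: reads only the first four
  # bytes of an IPv4 header and ignores everything after them.
  def summary(<<version::4, ihl::4, _dscp::6, _ecn::2, total_length::16, _rest::binary>>) do
    # ihl is expressed in 32-bit words, so multiply by 4 to get bytes.
    %{version: version, header_bytes: ihl * 4, total_length: total_length}
  end
end

info = IPv4Peek.summary(<<0x45, 0x00, 0x00, 0x54, 0, 0, 0, 0>>)
IO.inspect(info.version)       # 4
IO.inspect(info.header_bytes)  # 20
IO.inspect(info.total_length)  # 84
```

Putting the pattern in the function head means input that is too short to contain these fields simply fails to match the clause.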

Now for a more complete example, let’s encode a float:

<<1024.25::float-32-little>>

This statement says that we want to write a number (1024.25) using the float type, 32 bits as the size, and little-endian byte order. As a comparison, here’s the same number written in both little and big endian:

iex> <<1024.25::float-32-little>>
<<0, 8, 128, 68>>
iex> <<1024.25::float-32-big>>
<<68, 128, 8, 0>>
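Endianness matters just as much when reading. The sketch below reads the same two bytes as a 16-bit integer in both byte orders. The variable names are illustrative.

```elixir
bytes = <<1, 0>>

<<as_big::integer-16-big>> = bytes
<<as_little::integer-16-little>> = bytes

IO.inspect(as_big)     # 256
IO.inspect(as_little)  # 1

# Big-endian is the default when no endianness is given.
<<default::16>> = bytes
IO.inspect(default == as_big)  # true
```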

Finally, let’s look into binary comprehension and recursion. Binary comprehension is a way for us to iterate over a binary, extracting and manipulating data.

binary = "Elixir"
for <<char::utf8 <- binary>>, into: <<>> do
  case_changed =
    cond do
      char in ?a..?z -> char + ?A - ?a
      char in ?A..?Z -> char + ?a - ?A
      true -> char
    end

  <<case_changed::utf8>>
end
# returns "eLIXIR"

In this more complex example, we’re swapping the case of every ASCII letter in the string and splicing the string back together. All features of the regular for comprehension, such as filters and the :into option, work here.
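Binary generators are not limited to characters. As a sketch with made-up pixel data, the comprehension below reads three bytes at a time, and a second one adds a filter and collects the result back into a binary:

```elixir
# Three RGB pixels, one byte per channel (sample data for illustration).
pixels = <<213, 45, 132, 64, 76, 32, 76, 0, 0>>

rgb = for <<r::8, g::8, b::8 <- pixels>>, do: {r, g, b}
IO.inspect(rgb)  # [{213, 45, 132}, {64, 76, 32}, {76, 0, 0}]

# The filter keeps only pixels whose red channel is above 100,
# and into: <<>> collects the output as a binary instead of a list.
bright = for <<r::8, g::8, b::8 <- pixels>>, r > 100, into: <<>>, do: <<r, g, b>>
IO.inspect(bright)  # <<213, 45, 132>>
```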

For recursion, we just need to treat pattern matching as we normally do. The following example encodes and then decodes a key-length-value encoded binary.

defmodule KLVCodec do
  # Field widths, in bits, for the key code and the value length.
  @key_bits 8
  @length_bits 32

  # A key's code is its index in this list.
  @keys [:first_name, :last_name]

  # Encodes a list of %{key: key, value: value} maps into a single binary.
  def encode(data) do
    for %{key: key, value: value} <- data, into: <<>> do
      encoded = :erlang.term_to_binary(value)
      encoded_size = byte_size(encoded)
      <<to_key_code(key)::@key_bits, encoded_size::@length_bits, encoded::bytes-size(encoded_size)>>
    end
  end

  defp to_key_code(key) do
    Enum.find_index(@keys, & &1 == key)
  end

  def decode(encoded, acc \\ [])

  # Base case: the whole binary has been consumed.
  def decode(<<>>, acc), do: Enum.reverse(acc)

  # Recursive case: read one key-length-value record, then decode the rest.
  def decode(<<key_code::@key_bits, length::@length_bits, encoded::bytes-size(length), rest::bitstring>>, acc) do
    key = from_key_code(key_code)
    # For untrusted input, consider :erlang.binary_to_term(encoded, [:safe]).
    data = :erlang.binary_to_term(encoded)

    decode(rest, [%{key: key, value: data} | acc])
  end

  defp from_key_code(idx), do: Enum.at(@keys, idx)
end

iex(1)> encoded = KLVCodec.encode([%{key: :first_name, value: "Paulo"}, %{key: :last_name, value: "Valente"}])
<<0, 0, 0, 0, 11, 131, 109, 0, 0, 0, 5, 80, 97, 117, 108, 111, 1, 0, 0, 0, 13, 131, 109, 0, 0, 0, 7, 86, 97, 108, 101, 110, 116, 101>>
iex(2)> KLVCodec.decode(encoded)
[%{value: "Paulo", key: :first_name}, %{value: "Valente", key: :last_name}]

Although this is a more complex example, it showcases the power of binary matching. A special highlight is the following pattern, which uses one of the matched fields (length) as the size of the next matched field (bytes-size(length)):

<<key_code::@key_bits, length::@length_bits, encoded::bytes-size(length), rest::bitstring>>
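The same pattern also helps when data arrives in pieces. KLVCodec.decode/2 above raises a FunctionClauseError if the binary ends in the middle of a record. The sketch below is a hypothetical variant, not part of the original module, that returns the unconsumed bytes instead. To stay small, it keeps values as raw bytes and keys as integer codes, and it hardcodes the same 8-bit key and 32-bit length widths.

```elixir
defmodule KLVStream do
  # Hypothetical helper for illustration only.
  def take(buffer, acc \\ [])

  # A complete record is available: consume it and keep going.
  def take(<<key::8, len::32, value::binary-size(len), rest::binary>>, acc) do
    take(rest, [{key, value} | acc])
  end

  # Anything else is either empty or an incomplete record, so it is
  # handed back to the caller to be retried once more bytes arrive.
  def take(leftover, acc), do: {Enum.reverse(acc), leftover}
end

{records, leftover} = KLVStream.take(<<0, 5::32, "Paulo", 1, 7::32, "Val">>)
IO.inspect(records)   # [{0, "Paulo"}]
IO.inspect(leftover)  # <<1, 0, 0, 0, 7, 86, 97, 108>>
```

The caller can prepend the leftover bytes to the next chunk and call take/2 again. Because the size check is part of the pattern, no explicit length comparison is needed.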

Conclusion

Elixir’s binary matching is a powerful tool for efficient data processing, offering both simplicity and performance. With a single syntax, we can build binaries, pattern match on fields of arbitrary bit sizes, iterate with comprehensions, and decode length-prefixed formats recursively. This versatility makes it an essential skill for Elixir developers working on data-intensive tasks.
