One of the more powerful features of the Elixir language when building web applications and other high-performance software is the ability to use the Erlang Term Storage library, aka :ets. :ets can often single-handedly replace entire caching products and strategies in your tech stack. The Elixir core language currently does not have a wrapper around the :ets library, so users interact directly with the Erlang interface. As is the case with Erlang itself, the developer experience is a bit rough around the edges and can be confusing to developers trying to learn the library.
I worked through these issues recently, and want to share what I learned and the library I put together to help myself and other developers work with :ets. There are many great blog posts on what :ets is and how to use it; this post instead focuses on some of the common issues you might encounter when using :ets, and introduces Ets, a library designed to improve the Elixir developer experience when working with :ets.
Let’s first dive into some of the common pitfalls of :ets.
Creating :ets tables
One often confusing aspect of working with :ets is the experience around creating tables. While Elixir developers are used to keyword lists for options, :ets takes a regular list containing a mix of single atoms (e.g. :private) and key/value tuples (e.g. {:write_concurrency, false}). The format of the options isn’t consistent. Some boolean options, such as :named_table or :compressed, are taken as a single atom flag. Other boolean options, such as {:write_concurrency, true} and {:read_concurrency, true}, are taken as key/value tuples. Non-boolean options are similarly inconsistent. Some are taken as single atoms: for example, the protection level is specified as :private, :protected, or :public. Others, such as {:keypos, 1}, appear as key/value tuples. Finally, :ets.new() takes a table name as its first parameter even when you are creating an unnamed table, in which case the name cannot be used to reference the table. This mixture of option styles means that even developers who have used :ets for a while often end up back in the documentation when creating a new table.
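To make the mix of option styles concrete, here is a minimal sketch of two :ets.new/2 calls. The table names (:session_cache, :ignored_name) are made up for illustration; the options themselves are the ones discussed above.

```elixir
# Flags are bare atoms, tuning options are two-element tuples, all in one list.
named =
  :ets.new(:session_cache, [
    :set,
    :public,
    :named_table,
    {:read_concurrency, true},
    {:write_concurrency, true},
    {:keypos, 1}
  ])

# With :named_table, :ets.new/2 returns the name itself.
IO.inspect(named)                         # :session_cache
IO.inspect(:ets.info(named, :protection)) # :public

# Without :named_table, a name is still required, but it is not registered:
# the table can only be reached through the returned reference.
unnamed = :ets.new(:ignored_name, [:set, :private])

IO.inspect(is_reference(unnamed))              # true
IO.inspect(:ets.info(unnamed, :named_table))   # false
IO.inspect(:ets.whereis(:ignored_name))        # :undefined
```

Note that :ets.whereis/1 requires a reasonably recent OTP release (21 or later).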
Bags and sets, ordered and duplicate
There are four different types of :ets tables: :set, :bag, :ordered_set, and :duplicate_bag. What exactly each of these does is a source of confusion even among longtime Elixir developers. In practice, the two bags act pretty much the same as each other most of the time, as do the two sets, and ordered/duplicate probably should have been configuration flags on :set and :bag respectively.
:set and :ordered_set both allow only one record for any single key. Inserting a second record with the same key will overwrite the first record with the second. The only difference between :set and :ordered_set is that :ordered_set keeps the records in term order of the keys. This is useful if you want to use first/last/next/prev, but it adds overhead to insert compared to a simple :set, so it should only be used if you definitely need order. Side note: using those four functions on anything other than an :ordered_set might result in an ArgumentError, and will almost certainly give inconsistent results. Additionally, even though lookup on a set can only ever return a single record, the function returns that record in a list (or an empty list if none is found), which has to be taken into account every time you call lookup.
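A short sketch of the set behavior described above, using made-up table names and records: overwriting on a duplicate key, lookup always returning a list, and key-ordered traversal on an :ordered_set.

```elixir
set = :ets.new(:scores, [:set])
:ets.insert(set, {:alice, 1})
:ets.insert(set, {:alice, 2}) # same key, so the first record is overwritten

set_lookup = :ets.lookup(set, :alice) # [{:alice, 2}], a list even for a set
missing = :ets.lookup(set, :bob)      # []

ordered = :ets.new(:ordered_scores, [:ordered_set])
:ets.insert(ordered, [{3, :c}, {1, :a}, {2, :b}]) # inserted out of order

first_key = :ets.first(ordered)          # 1
next_key = :ets.next(ordered, first_key) # 2
last_key = :ets.last(ordered)            # 3
```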
:bag and :duplicate_bag both allow multiple records with the same key. The only difference between the two is that :bag does not allow two records in which all values are the same (e.g. inserting {:a, :b, :c} twice would result in a single entry in a :bag, but two entries in a :duplicate_bag). :bag comes with the implementation overhead of checking for duplicates on insert, so unless you explicitly need to prevent duplicate full records, you should use :duplicate_bag over :bag.
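The difference between the two bag types can be sketched by inserting the same records into one table of each type (the table names here are arbitrary):

```elixir
bag = :ets.new(:tags, [:bag])
dup = :ets.new(:tags_dup, [:duplicate_bag])

# Insert the same three records into both tables; one of them is an exact repeat.
Enum.each([bag, dup], fn table ->
  :ets.insert(table, {:a, :b, :c})
  :ets.insert(table, {:a, :b, :c})
  :ets.insert(table, {:a, :d})
end)

bag_records = :ets.lookup(bag, :a) # the repeated {:a, :b, :c} is stored once
dup_records = :ets.lookup(dup, :a) # the repeated {:a, :b, :c} is stored twice

IO.inspect(length(bag_records)) # 2
IO.inspect(length(dup_records)) # 3
```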
What is a record?
Another thing that confused me about :ets is that I initially heard it described as a key-value store like Redis. This, plus the available examples, had me thinking that one element in the tuple was the key and the other elements were all the values associated with that key. I thought that calling insert with {:a, :b} and then {:a, :c} would result in {:a, :b, :c}, with :a being the key and :b/:c being the two values associated with it. Instead, it’s more like a relational database where each row is a single record, and one of the values in the row (specified by the :keypos option) serves as the key. By default, it’s the first value in the inserted tuple.
For example, if our record is {email, name, phone}, then by default email is the key:
table = :ets.new(:users, [:set])
:ets.insert(table, {"me@example.com", "Mike Binns", "555-867-5309"})
:ets.lookup(table, "me@example.com") # => [{"me@example.com", "Mike Binns", "555-867-5309"}]
:ets.lookup(table, "Mike Binns") # => []
but if we set :keypos to 2, then the name column is the key:
table = :ets.new(:users, [:set, {:keypos, 2}])
:ets.insert(table, {"me@example.com", "Mike Binns", "555-867-5309"})
:ets.lookup(table, "me@example.com") # => []
:ets.lookup(table, "Mike Binns") # => [{"me@example.com", "Mike Binns", "555-867-5309"}]
As an added (somewhat confusing) feature, :ets records don’t all have to be the same size, so you can add {:a, :b, :c} and {:c, :d} to the same table. The only caveat is that a record cannot be smaller than the keypos (e.g. {:a} cannot be inserted into a table with {:keypos, 2}), or you will end up with…
ArgumentError
You don’t get far using :ets before you run into your first ArgumentError. There are many ways you can mess up when using :ets, from passing incorrect args, to attempting an operation on a table that doesn’t exist or attempting to create a named table that already exists, to inserting invalid values such as non-tuples or records smaller than the keypos, to attempting lookup_element on a key that doesn’t exist. The issue with :ets is that regardless of what you do wrong, the result is a raised ArgumentError with no additional information. You have to know where to look and what may have caused the error, which is difficult when you are learning :ets and don’t know the common pitfalls. The raise also doesn’t allow the standard Elixir {:ok, value} | {:error, reason} matching, so you have to wrap everything in try/rescue if you want to be safe.
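As a rough illustration of what that wrapping looks like, here is a minimal sketch that rescues the ArgumentError and then checks for a few likely causes. The SafeEts module, its function names, and the error atoms are hypothetical, invented for this example; this is not the implementation of any existing library. It also assumes the table argument is an atom or a table reference, and that the calling process is allowed to read the table.

```elixir
defmodule SafeEts do
  # Hypothetical helper for illustration; not part of :ets or any library.

  # Wraps :ets.insert/2 for a single tuple record.
  def insert(table, record) when is_tuple(record) do
    true = :ets.insert(table, record)
    :ok
  rescue
    ArgumentError ->
      cond do
        :ets.info(table) == :undefined -> {:error, :table_not_found}
        tuple_size(record) < :ets.info(table, :keypos) -> {:error, :record_too_small}
        true -> {:error, :unknown_error}
      end
  end

  # Wraps :ets.lookup_element/3.
  def lookup_element(table, key, pos) do
    {:ok, :ets.lookup_element(table, key, pos)}
  rescue
    ArgumentError ->
      cond do
        :ets.info(table) == :undefined -> {:error, :table_not_found}
        not :ets.member(table, key) -> {:error, :key_not_found}
        true -> {:error, :unknown_error}
      end
  end
end

table = :ets.new(:users, [:set, {:keypos, 2}])

inserted = SafeEts.insert(table, {"me@example.com", "Mike Binns"})  # :ok
too_small = SafeEts.insert(table, {"me@example.com"})               # {:error, :record_too_small}
found = SafeEts.lookup_element(table, "Mike Binns", 1)              # {:ok, "me@example.com"}
no_key = SafeEts.lookup_element(table, "Nobody", 1)                 # {:error, :key_not_found}
no_table = SafeEts.lookup_element(:no_such_table, :key, 1)          # {:error, :table_not_found}
```

Diagnosing only after the rescue keeps the happy path as fast as a bare :ets call, at the cost of a few extra lookups when something goes wrong.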
:"$end_of_table"
Another interesting quirk of :ets is the :"$end_of_table" atom. The dollar sign in the atom necessitates quotes, hence the awkward :"$ at the beginning of the atom. This atom shows up as a return value in a number of calls, including first, last, next, and prev. It is also returned by match when called with a limit or a continuation, either solo (if there are no more rows that match) or as part of a tuple (if the current page of results is the last page and isn’t a full page). When working with any of the functions that may return it, you have to specifically check for the atom in your pattern matches, or risk passing it on to your running code.
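The sketch below shows one way to pattern match on the sentinel, both when walking keys with first/next and when paging through match results with a limit. The EtsWalk module and its function names are hypothetical helpers written for this example.

```elixir
defmodule EtsWalk do
  # Hypothetical helper for illustration; not part of :ets or any library.

  # Collects every key by following first/next until the sentinel appears.
  def keys(table), do: collect_keys(table, :ets.first(table), [])

  defp collect_keys(_table, :"$end_of_table", acc), do: Enum.reverse(acc)

  defp collect_keys(table, key, acc) do
    collect_keys(table, :ets.next(table, key), [key | acc])
  end

  # Pages through :ets.match/3 results, `limit` rows at a time.
  def match_all(table, pattern, limit) do
    drain(:ets.match(table, pattern, limit), [])
  end

  # The sentinel can arrive on its own...
  defp drain(:"$end_of_table", acc), do: acc
  # ...or in the continuation slot of the final page.
  defp drain({matches, :"$end_of_table"}, acc), do: acc ++ matches

  defp drain({matches, continuation}, acc) do
    drain(:ets.match(continuation), acc ++ matches)
  end
end

table = :ets.new(:letters, [:ordered_set])
:ets.insert(table, [{1, :a}, {2, :b}, {3, :c}])

keys = EtsWalk.keys(table)                                  # [1, 2, 3]
empty_first = :ets.first(:ets.new(:empty, [:ordered_set]))  # :"$end_of_table"

# Each match is a list of the bound variables, here one value per record.
values = EtsWalk.match_all(table, {:_, :"$1"}, 2)
```

Putting the sentinel clause first in each function keeps the atom from ever being treated as a real key or a real page of results.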
Introducing the Ets library
When I began working with :ets, I didn’t get far before looking for a nice Elixir wrapper. Unfortunately, the handful of existing wrappers were limited in scope and didn’t address the issues I was dealing with, so the idea of writing a more comprehensive wrapper came up. One of the many benefits of working at DockYard is that client work is done Monday through Thursday, and Fridays are “DockYard Days,” during which DockYarders can work on things like professional development, mentoring, or Open Source contributions. The Elixir library Ets is the result of a number of my DockYard Days over the past months. The design goals for the Ets library, outlined in the README, are listed below. As you can see, Ets is designed to eliminate or avoid the pitfalls I have described in this post.
From Ets README.md
The purpose of this package is to improve the developer experience when both learning and interacting with Erlang Term Storage.
This will be accomplished by:
- Conforming to Elixir standards:
  - Two versions of all functions:
    - Main function (e.g. get) returns {:ok, return}/{:error, reason} tuples.
    - Bang function (e.g. get!) returns unwrapped value or raises on :error.
  - All options specified via keyword list.
- Wrapping unhelpful ArgumentError’s with appropriate error returns.
  - Avoid adding performance overhead by using try/rescue instead of pre-validation
  - On rescue, try to determine what went wrong (e.g. missing table) and return appropriate error
  - Fall back to {:error, :unknown_error} (logging details) if unable to determine reason.
- Appropriate error returns/raises when encountering $end_of_table.
- Providing Elixir friendly documentation.
- Providing Ets.Set and Ets.Bag modules with appropriate function signatures and error handling.
  - Ets.Set.get returns a single item (or nil/provided default) instead of list as sets never have multiple records for a key.
- Providing abstractions on top of the two base modules for specific usages
  - Ets.Set.KeyValueSet abstracts away the concept of tuple records, replacing it with standard key/value interactions.
Try it out
You can add Ets to your Elixir project by adding {:ets, "~> 0.6.0"} to your mix.exs dependencies. Check for the latest published version on hex. The documentation is available on hexdocs. Please take a look, give it a shot, and let me know how it goes.