Caching In Ecto v3.0

By: Scott Newcomer

Chris McCord recently outlined what’s new in Phoenix 1.4. Alongside the official release, Ecto v3.0 was published! The latest major Ecto release moved the SQL adapters into a separate repository, ecto_sql. This was a welcome improvement: some projects don’t use Ecto to talk to a database at all, but rather as a data mapping and validation layer. You can see the changelog here.

However, one improvement in particular caught my eye: this commit to the PostgreSQL driver for Elixir, which effectively caches the planning phase of INSERT, UPDATE, and DELETE operations. Although caching of SELECT query plans predates Ecto v3.0, this is a good opportunity to go over why and how this strategy improves performance for Phoenix applications.

Database Performance Improvements

In theory, it makes sense that caching access to a database resource would be prudent. Request the same resource thousands of times, and one naturally asks: why make a thousand round trips to the database? As we know, caching comes with a whole host of other problems, such as invalidating the cache. Moreover, tracking a SELECT statement as it works its way through the database internals is complicated. In PostgreSQL, a query passes through four stages: Parse, Analyze & Rewrite, Plan, and Execute. In this article, I will collectively refer to the Parse, Analyze & Rewrite, and Plan steps as the planning phase.

The planning phase is important because this is when PostgreSQL starts doing heavy work. For example, it may ask: should an index be used, and if so, which one? What about joining tables? All of this can get expensive.

So, my first question was “How does Ecto cache CRUD operations?” I knew about PostgreSQL views and materialized views, so let’s start there to give us a little background on the subject.

DATABASE VIEWS

A database TABLE is an organized source for your data. A database VIEW is the lens through which you look at your data. This is usually done by querying the VIEW, which in turn has a query stored in the VIEW definition. Lastly, a MATERIALIZED VIEW is a VIEW that also has its own storage mechanism to copy the data from the underlying TABLE, effectively taking a snapshot of the query results. This can be very effective in read-heavy environments. I believe that, at the time of this writing, to make use of these features in Ecto you simply create a migration file and write raw SQL.

-- cache the query
CREATE VIEW comments AS
  SELECT comment, count(id) AS count FROM posts GROUP BY comment;

-- cache the data as well (note: needs its own name, since "comments" is taken)
CREATE MATERIALIZED VIEW comments_snapshot AS
  SELECT comment, count(id) AS count FROM posts GROUP BY comment;

PREPARED STATEMENTS

With that background on database access and storage in mind, how does Ecto do it? It works similarly to PREPARE statements in PostgreSQL.

Prepared statements save the query plan for repeated execution within the same session. This is different from a VIEW, which must construct the query plan every time the view is used, and from a MATERIALIZED VIEW, which stores the query results.

PREPARE: parse, analyze, plan, optimize the SQL statement and store without executing it

EXECUTE: bind values to the parameters and execute the statement

So, as you can see, by breaking the work up we avoid repeating the parse and analysis steps. By most measures, using PREPARE can give you at minimum a ~15% performance boost on trivial SELECT statements, and the gains only grow when you have to JOIN tables.

PREPARE find_comment (text) AS
  SELECT * FROM posts WHERE comment = $1;

EXECUTE find_comment('Narwin');

The downside is that prepared statements are not shared across sessions, whereas a VIEW is persisted much like a plain table. Recall that Ecto (the client) maintains a pool of sessions/connections to the database; with each request, Ecto grabs a connection from the pool and returns it when done.

If we have a loop where the same parameterized query, Repo.get(Post, id), is executed hundreds or thousands of times, prepared statements reduce the work needed to retrieve the resource. If you have set up a PREPARE statement yourself, you can list the session's prepared statements in PostgreSQL via SELECT * FROM pg_prepared_statements;. However, if you run this command against a live Phoenix application, it doesn't return anything. So what gives?
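For example, setting one up by hand in psql and then checking the catalog view (using the posts table from the earlier examples):

```sql
PREPARE find_comment (text) AS
  SELECT * FROM posts WHERE comment = $1;

-- pg_prepared_statements lists prepared statements
-- for the *current session* only:
SELECT name, statement FROM pg_prepared_statements;
```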

The next question is, by what mechanisms do we use Elixir to implement this caching behaviour?

How To Cache Queries Using Elixir

ETS tables in Elixir are an in-memory, key-value storage mechanism. If you want to learn more about ETS tables, see Chris McCord’s article here or the official docs here.
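As a quick illustration (plain Erlang/Elixir, nothing Ecto-specific), here is the insert/lookup cycle in a nutshell:

```elixir
# Create an in-memory ETS table and exercise insert/lookup,
# the same primitives Ecto's planner cache relies on.
table = :ets.new(:query_cache, [:set, :public])

# insert_new/2 only writes when the key is absent, returning a boolean.
true = :ets.insert_new(table, {:my_key, "planned query"})
false = :ets.insert_new(table, {:my_key, "won't overwrite"})

# lookup/2 returns a list of matching tuples (empty on a miss).
[{:my_key, "planned query"}] = :ets.lookup(table, :my_key)
[] = :ets.lookup(table, :missing_key)
```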

Digging around the Ecto repository, you will find ETS tables in use to cache prepared statements. Here is the best explanation I’ve found on how Ecto utilizes the ETS table to cache queries:

  # ecto/lib/ecto/query/planner.ex

  @doc """
  Plans the query for execution.
  Planning happens in multiple steps:
    1. First the query is planned by retrieving
       its cache key, casting and merging parameters
    2. Then a cache lookup is done, if the query is
       cached, we are done
    3. If there is no cache, we need to actually
       normalize and validate the query, asking the
       adapter to prepare it
    4. The query is sent to the adapter to be generated

  ## Cache
  All entries in the query, except the preload and sources
  field, should be part of the cache key.
  The cache value is the compiled query by the adapter
  along-side the select expression.
  """

Ecto caches the result of the prepare step, and when a subsequent Repo.get(User, id) is called, Ecto finds the previously planned query via :ets.lookup(cache, key), bypassing the entire planning phase.
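The lookup-or-prepare flow can be sketched roughly like this. This is a simplified illustration, not Ecto's actual code; `prepare/1` here is a stand-in for the adapter's normalize/validate/prepare work:

```elixir
defmodule CacheSketch do
  # Simplified illustration of Ecto's cache-or-plan decision.
  # `cache` is the Repo's ETS table; `key` is derived from the query structure.
  def fetch_prepared(cache, key, query) do
    case :ets.lookup(cache, key) do
      [{^key, :cache, select, prepared}] ->
        # Cache hit: reuse the already-planned SQL string.
        {select, prepared}

      [] ->
        # Cache miss: plan the query, then store the result for next time.
        {select, prepared} = prepare(query)
        :ets.insert_new(cache, {key, :cache, select, prepared})
        {select, prepared}
    end
  end

  # Stand-in for the adapter's normalize/validate/prepare step.
  defp prepare(_query), do: {:select_expr, "SELECT ... FROM ..."}
end
```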

Here is the tuple that is inserted into the ETS table via :ets.insert_new(MyProject.Repo, {key, :cache, select, prepared}). Notice the prepared element of the tuple is a SQL string ready for execution.

{[
   :all,
   0,
   {:where,
    [and: {:==, [], [{{:., [], [{:&, [], [0]}, :id]}, [], []}, {:^, [], [0]}]}]},
   {"users", MyProject.Accounts.User, 20747221, nil},
   {:select,
    {:{}, [], [{{:., [], [{:&, [], [0]}, :id]}, [], []}, {:&, [], [0]}]}}
 ],
 :cache,
 %{
   assocs: [],
   from: {:any,
    {:source, {"users", MyProject.Accounts.User}, nil,
     [
       id: :id,
       email: :string,
       name: :string,
       signed_in_at: :utc_datetime,
       username: :string,
       inserted_at: :naive_datetime,
       updated_at: :naive_datetime
     ]}},
   postprocess: {:tuple, [value: :id, source: :from]},
   preprocess: [source: :from],
   take: []
 },
 {133058,
  "SELECT u0.\"id\", u0.\"email\", u0.\"name\", u0.\"signed_in_at\", u0.\"username\", u0.\"inserted_at\", u0.\"updated_at\", u0.\"id\" FROM \"users\" AS u0 WHERE (u0.\"id\" = $1)"}
}

To reiterate, another query such as Repo.get(User, "another-id") will first try :ets.lookup(cache, key) and successfully find an entry, as long as it is a simple WHERE clause on id. If you order or limit the query, another ETS entry is created, and Ecto has to prepare the new query before executing it.

Caching Insert/Update/Delete Operations

A point that boggled my mind even further was, “Why would I cache an insert operation?” Well, INSERT INTO books (id, title) VALUES (332, 'Learning about Ecto') can still be prepared before being executed.

PREPARE insert_book (int, text) AS
  INSERT INTO books VALUES($1, $2);
EXECUTE insert_book(332, 'Learning About Ecto');

However, as we now know, the PREPARE statement itself isn't necessarily used; instead, the result of the planning phase is cached in ETS tables. Either way, caching the planning step can still provide performance benefits.

Wrapping Up

So what did I (and, I hope, you) learn? Ecto has a prepare step, and it is very important in order to generate a PREPARE-like statement that can be cached. For applications that are READ heavy, Ecto has, for quite a while, cached prepared statements in ETS tables. With Ecto v3.0, WRITE heavy apps should also see performance improvements with the extension of this caching strategy to INSERT, UPDATE, and DELETE operations.

Lastly, is there anything you need to do? Nope. Just follow some of the update guides and articles.
