Semantic Search with Phoenix, Axon, and Elastic

Introduction

Recent advancements in the field of natural language processing have produced models seemingly capable of capturing the semantic meaning of text.

Transformer-based language models are a type of neural network architecture capable of modeling complex relationships in natural language. Google researchers introduced the transformer architecture in 2017 and followed it in 2018 with BERT, one of the best-known transformer models. Google has used BERT in its search engine since 2019.

BERT is powerful enough to capture context and meaning. Its ability to effectively model text makes it a natural fit for semantic search. In a semantic search, the goal is to match documents and queries based on context and intent, rather than just matching on keywords. In other words, semantic search allows you to express queries in natural language, and retrieve documents that closely match the intent of your query.

Transformer models like BERT are a powerful tool and, fortunately, you can leverage this power directly from Elixir. In this post, I’ll walk you through a simple semantic search application that plays the role of sommelier–matching natural language requests to wines.

Setting up the application

Let’s start by creating a new Phoenix application:

$ mix phx.new wine --no-ecto

Next, navigate here to download a newline-delimited JSON (JSON Lines) file of wines scraped from Wine.com. You’ll need these later on. Save the file to a directory of your choosing. I have mine in priv as wine_documents.jsonl, which is the path the embedding script later in this post expects.

Now you’ll want to set up Elasticsearch. If you’re not familiar with Elasticsearch, it is a search and analytics engine that enables integration of search capabilities into existing applications. This application makes use of a local Elasticsearch setup; however, you can easily make use of a hosted instance from AWS or elsewhere.

To set up Elasticsearch, start by creating a new Docker network:

docker network create elastic

Next, start the official Elasticsearch container:

docker run --name es01 \
  --net elastic \
  -p 9200:9200 \
  -p 9300:9300 \
  -m 4gb \
  -it docker.elastic.co/elasticsearch/elasticsearch:8.3.0

It’s important to use version 8.0 or later, as that’s the minimum version that supports the functionality required for semantic search. If the container fails on startup with an error about vm.max_map_count, raise the value on the Docker host (as root) with:

sysctl -w vm.max_map_count=262144

Or by changing the value in /etc/sysctl.conf.

Once the container starts, you’ll see a bunch of metadata logged including a password for the elastic user. You’ll want to save the password. I’ve got mine stored in an ELASTICSEARCH_PASSWORD environment variable.

Elasticsearch makes use of SSL by default, so you’ll need to grab the certificate file from the running container. You can do this by opening a new terminal and running:

docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

Now you can verify the server is running with:

curl --cacert http_ca.crt -u elastic https://localhost:9200

You will be prompted for a password for the user elastic–this is the password you saved in the previous step. If the query was successful, you should get a 200 response with some metadata about the server:

{
  "name" : "a60143889402",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "GyaCxRKHSoOFbLXsza7q1A",
  "version" : {
    "number" : "8.3.0",
    "build_type" : "docker",
    "build_hash" : "5b8b981647acdf1ba1d88751646b49d1b461b4cc",
    "build_date" : "2022-06-23T22:48:49.607492124Z",
    "build_snapshot" : false,
    "lucene_version" : "9.2.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

Configuring the Elasticsearch index

In Elasticsearch, a collection of documents is called an index.

An index is somewhat similar to a table in a relational database. In an index, you define a mapping that describes the properties (think columns) of each document. Older Elasticsearch versions also allowed several mapping types per index, but mapping types were removed in 8.0.

In this application, you want a user to be able to enter a natural language wine description and receive a list of potential wines back. That means you’ll just need a single index of wines. But what kind of properties does each wine document need to have?

Realistically, you can define any number of properties you’d like; however, the most important property of a wine document is its document-vector.

The document-vector is a dense vector representation of a document. BERT and other transformer models work by embedding text into a dense numerical representation.

These numerical representations exist in a high-dimensional space, where texts with similar meanings lie close together. For example, if you were to embed the words “king” and “queen” and visualize where they are in space, you’d notice they’re relatively close together. They’d probably also be near other words like “royalty”, “throne”, “prince”, and “princess”.

BERT can embed long strings of text into high-dimensional space, such that sentences with similar meanings lie close together in space. For example, the embeddings for “I like dogs” and “I like puppies” should be very similar as the meanings of the sentences are essentially identical.

So what does this have to do with search? Well, if you pre-compute the embeddings for some fixed number of documents, you can use the same technique for an input query, and compute the distance between the input query and all documents on hand. In the end, you’ll have a ranking of documents in order of their similarity to the input query.
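To make the ranking idea concrete, here is a minimal sketch in plain Elixir. It assumes embeddings are ordinary lists of floats and uses made-up three-dimensional vectors; the module name RankSketch and the sample data are hypothetical and not part of the application.

```elixir
defmodule RankSketch do
  # Euclidean (L2) distance between two equal-length lists of numbers.
  def l2_distance(a, b) do
    a
    |> Enum.zip(b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + (x - y) * (x - y) end)
    |> :math.sqrt()
  end

  # Returns document ids ordered from most to least similar to the query.
  def rank(query_vector, documents) do
    documents
    |> Enum.sort_by(fn {_id, vector} -> l2_distance(query_vector, vector) end)
    |> Enum.map(fn {id, _vector} -> id end)
  end
end

# Toy 3-dimensional "embeddings" (hypothetical values for illustration).
documents = [
  {"bold red", [0.9, 0.1, 0.0]},
  {"crisp white", [0.1, 0.9, 0.0]},
  {"sweet dessert", [0.0, 0.2, 0.9]}
]

query = [0.8, 0.2, 0.1]

IO.inspect(RankSketch.rank(query, documents))
# prints ["bold red", "crisp white", "sweet dessert"]
```

This brute-force version compares the query against every document, which is fine for a handful of vectors but becomes slow as the collection grows.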

Elasticsearch supports this kind of similarity search with dense_vector types, which allow you to store embeddings from models like BERT. Elasticsearch can then perform an approximate K-Nearest Neighbors search to determine the most similar documents to your query.

Knowing that, create a new file wine_index.json in a priv/elastic directory and copy the following:

{
  "mappings": {
    "properties": {
      "document-vector": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "l2_norm"
      }
    }
  }
}

Next, run the following to create a new index in Elasticsearch:

curl --cacert http_ca.crt -XPUT -d @priv/elastic/wine_index.json -u elastic https://localhost:9200/wine -H 'Content-Type: application/json'

You should get a 200 response and a success message. You’ve successfully created an Elasticsearch index. Now you need to add documents to it!

Embedding scraped wine products

With your index configured, you need to compute embeddings for each of the wines you have and add them to your index.

Start by creating a new Elixir script in priv called embed_wine_documents.exs. This script will house the code for embedding wine documents–in a real application, you might choose to move this code into your application. For example, you’d probably want to periodically scrape the web for more wine products and update the index regularly.

Fortunately, the logic for embedding documents and adding them to an index is the same whether it lives in a script or in your application.

To embed documents and add them to the Elasticsearch index, you’ll need a few dependencies:

Mix.install([
  {:httpoison, "~> 1.8"},
  {:jason, "~> 1.3"},
  {:axon_onnx, "~> 0.2.0-dev", github: "elixir-nx/axon_onnx"},
  {:axon, "~> 0.2.0-dev", github: "elixir-nx/axon", override: true},
  {:exla, "~> 0.3.0-dev", github: "elixir-nx/nx", sparse: "exla"},
  {:nx, "~> 0.3.0-dev", github: "elixir-nx/nx", sparse: "nx", override: true},
  {:tokenizers, "~> 0.1.0-dev", github: "elixir-nx/tokenizers", branch: "main"},
  {:rustler, ">= 0.0.0", optional: true}
])

Some of these might be familiar.

You’re probably familiar with httpoison and jason–both of these libraries are necessary to make requests to the Elasticsearch API.

I’ve written about nx, exla, and axon in previous posts.

nx and exla are the foundations of the Nx ecosystem, and axon is a deep learning library in Elixir. axon_onnx is a new library which allows you to import ONNX neural networks.

ONNX is an open neural network serialization format supported by most of the Python ecosystem. With axon_onnx, you can leverage the massive Python ecosystem of pre-trained models directly in Elixir.

Finally, tokenizers is a library that offers bindings around HuggingFace tokenizers. Most transformer models like BERT rely on a model-specific tokenizer that splits text into subword tokens from a learned vocabulary. You’ll need tokenizers to pre-process input data for use in a pre-trained BERT model.

Now, create a module called EmbedWineDocuments and add the following code:

defmodule EmbedWineDocuments do
  alias Tokenizers.{Tokenizer, Encoding}

  def format_document(document) do
    "Name: #{document["name"]}\n" <>
      "Varietal: #{document["varietal"]}\n" <>
      "Location: #{document["location"]}\n" <>
      "Alcohol Volume: #{document["alcohol_volume"]}\n" <>
      "Alcohol Percent: #{document["alcohol_percent"]}\n" <>
      "Price: #{document["price"]}\n" <>
      "Winemaker Notes: #{document["notes"]}\n" <>
      "Reviews:\n#{format_reviews(document["reviews"])}"
  end

  defp format_reviews(reviews) do
    # List.wrap/1 turns a missing (nil) reviews field into an empty list
    reviews
    |> List.wrap()
    |> Enum.map_join("\n", fn review ->
      "Reviewer: #{review["author"]}\n" <>
        "Review: #{review["review"]}\n" <>
        "Rating: #{review["rating"]}"
    end)
  end
end

The wine documents are JSON objects, but BERT only works on text, so this code formats a document into a string representation. Next, you’ll need to tokenize that text to generate the inputs for an Axon model. Add the following code to the module:

  def encode_text(tokenizer, text, max_sequence_length) do
    {:ok, encoding} = Tokenizer.encode(tokenizer, text)

    encoded_seq =
      encoding
      |> Enum.map(&Encoding.pad(&1, max_sequence_length))
      |> Enum.map(&Encoding.truncate(&1, max_sequence_length))

    input_ids = encoded_seq |> Enum.map(&Encoding.get_ids/1) |> Nx.tensor()
    token_type_ids = encoded_seq |> Enum.map(&Encoding.get_type_ids/1) |> Nx.tensor()
    attention_mask = encoded_seq |> Enum.map(&Encoding.get_attention_mask/1) |> Nx.tensor()

    %{
      "input_ids" => input_ids,
      "token_type_ids" => token_type_ids,
      "attention_mask" => attention_mask
    }
  end

This function computes the inputs required for the BERT model from a tokenizer and input text. Notice you also must provide a max sequence length. Axon requires fixed input shapes so, to handle variable sequence lengths, it’s common to pad or truncate inputs to a certain number of tokens.
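As a rough illustration of what padding and truncation do, here is a sketch that operates on plain lists of token ids rather than on Tokenizers encodings. The PadSketch module and the pad id of 0 are assumptions made for the example, and a real tokenizer also takes care of keeping special tokens in place when it truncates.

```elixir
defmodule PadSketch do
  # Pads or truncates a list of token ids to exactly `max_length` entries
  # and returns the ids together with the matching attention mask
  # (1 for real tokens, 0 for padding).
  def pad_or_truncate(ids, max_length, pad_id \\ 0) do
    kept = Enum.take(ids, max_length)
    padding = max_length - length(kept)

    %{
      ids: kept ++ List.duplicate(pad_id, padding),
      attention_mask: List.duplicate(1, length(kept)) ++ List.duplicate(0, padding)
    }
  end
end

# A short sequence is padded up to the target length...
IO.inspect(PadSketch.pad_or_truncate([101, 7592, 102], 5))

# ...and a long one is cut down to it.
IO.inspect(PadSketch.pad_or_truncate([1, 2, 3, 4, 5, 6], 4))
```

Every sequence comes out with the same length, which is what lets you stack them into a single tensor with a fixed shape.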

Finally, you’ll want to implement a function for computing embeddings using Axon:

  def compute_embedding(model, params, inputs) do
    Axon.predict(model, params, inputs, compiler: EXLA)
  end

The next step is to create a pipeline that iterates through each entry in your collection of wine documents, computes its embedding, and adds it to your existing Elasticsearch index. Start by adding the following code immediately after EmbedWineDocuments:

max_sequence_length = 120
batch_size = 128

{bert, bert_params} =
  AxonOnnx.import("priv/models/model.onnx", batch: batch_size, sequence: max_sequence_length)

bert = Axon.nx(bert, fn {_, out} -> out end)

{:ok, tokenizer} = Tokenizers.Tokenizer.from_pretrained("bert-base-uncased")

AxonOnnx.import/2 imports an existing ONNX model into an Axon model and parameters. It takes a path to an ONNX file and values for the model’s named dimensions, here batch and sequence. You’ll generate this model file in a little bit. The additional Axon.nx layer just extracts the desired output from the original model. Tokenizers.Tokenizer.from_pretrained loads a pre-trained tokenizer for use with a pre-trained model.

In this example, you’re using the bert-base-uncased model, so you need to use its accompanying tokenizer.

Now, add the following code to complete your embedding pipeline:

path_to_wines = "priv/wine_documents.jsonl"
endpoint = "https://localhost:9200/wine/_doc/"
password = System.get_env("ELASTICSEARCH_PASSWORD")
credentials = "elastic:#{password}"
headers = [
  Authorization: "Basic #{Base.encode64(credentials)}",
  "Content-Type": "application/json"
]
options = [ssl: [cacertfile: "http_ca.crt"]]

document_stream =
  path_to_wines
  |> File.stream!()
  |> Stream.map(&Jason.decode!/1)
  |> Stream.map(fn document -> {document["url"], EmbedWineDocuments.format_document(document)} end)
  |> Stream.chunk_every(batch_size)
  |> Stream.flat_map(fn batches ->
    {urls, texts} = Enum.unzip(batches)
    inputs = EmbedWineDocuments.encode_text(tokenizer, texts, max_sequence_length)
    embedded = EmbedWineDocuments.compute_embedding(bert, bert_params, inputs)

    embedded
    |> Nx.to_batched(1)
    |> Enum.map(&Nx.to_flat_list(Nx.squeeze(&1)))
    |> Enum.zip_with(urls, fn vec, url -> %{"url" => url, "document-vector" => vec} end)
    |> Enum.map(&Jason.encode!/1)
  end)
  |> Stream.map(fn data ->
    {:ok, _} = HTTPoison.post(endpoint, data, headers, options)
    :ok
  end)

Enum.reduce(document_stream, 0, fn :ok, counter ->
  # Increment first so the progress line includes the document just indexed
  counter = counter + 1
  IO.write("\rDocuments Embedded: #{counter}")
  counter
end)

The first few lines are just metadata for loading wine documents and sending requests to the Elasticsearch server.

The document_stream streams lines from the wine document file, parses them using Jason, and then turns the formatted text into a batched tensor using the convenience functions you defined earlier. The embedding is computed by EmbedWineDocuments.compute_embedding/3, which calls Axon.predict(model, params, inputs, compiler: EXLA). Notice the embedding computation runs on batches of inputs, as that’s more efficient for inference than predicting on one input at a time.
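The batching pattern itself can be tried without a model. The sketch below swaps BERT for a stand-in embed_batch/1 that returns the byte length of each text; BatchSketch and its functions are hypothetical names used only to show how chunk_every and flat_map fit together.

```elixir
defmodule BatchSketch do
  # Stand-in for the real model: "embeds" a whole batch of texts in one call.
  # Each fake embedding is just [byte length of the text].
  def embed_batch(texts), do: Enum.map(texts, fn text -> [byte_size(text)] end)

  # Groups {url, text} pairs into batches, embeds each batch in one call,
  # and flattens the results back into one entry per document.
  def embed_all(documents, batch_size) do
    documents
    |> Stream.chunk_every(batch_size)
    |> Stream.flat_map(fn batch ->
      {urls, texts} = Enum.unzip(batch)

      texts
      |> embed_batch()
      |> Enum.zip_with(urls, fn vector, url ->
        %{"url" => url, "document-vector" => vector}
      end)
    end)
    |> Enum.to_list()
  end
end

documents = [{"a", "red"}, {"b", "white"}, {"c", "rose"}]
IO.inspect(BatchSketch.embed_all(documents, 2))
```

Note that Stream.chunk_every/2 emits a final chunk that can be smaller than batch_size (one document here), which is worth keeping in mind when the model expects a fixed batch dimension.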

This code sends just the URL and document vector to the index. You can store additional information if you’d like as long as you at least store the document-vector field.
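If you do want to store more than the URL, one possible approach is to copy a few fields from the parsed document into the request body. This is only a sketch: DocSketch is a hypothetical module, and the choice of name, varietal, and price as extra fields is an assumption based on the keys used in format_document.

```elixir
defmodule DocSketch do
  # Extra fields to copy from the scraped document (an illustrative choice).
  @extra_fields ["name", "varietal", "price"]

  # Builds the map to index: the selected fields plus the embedding.
  # Fields missing from the document are simply left out.
  def to_index_body(document, vector) do
    document
    |> Map.take(["url" | @extra_fields])
    |> Map.put("document-vector", vector)
  end
end

document = %{
  "url" => "https://example.com/wine-1",
  "name" => "Example Cabernet",
  "notes" => "Long winemaker notes that are not stored"
}

IO.inspect(DocSketch.to_index_body(document, [0.1, 0.2]))
```

Because the index mapping only declares document-vector, Elasticsearch would infer the types of any extra fields through its default dynamic mapping.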

Before running your script, you need to actually create the ONNX model for import. The HuggingFace Transformers library is a Python library that implements a large number of state-of-the-art pre-trained models. It has an out-of-the-box conversion tool that converts existing models from PyTorch to ONNX. Assuming you don’t already have it installed, you’ll need to install Python and pip, and then install the transformers library along with its ONNX extras and PyTorch:

pip install "transformers[onnx]" torch

Next, you’ll want to export the bert-base-uncased model to ONNX from the command line using:

python -m transformers.onnx --model=bert-base-uncased priv/models/

This will save a model.onnx file in a priv/models directory. The ONNX file contains the pre-trained BERT model.

Now you can run this script with:

$ elixir priv/embed_wine_documents.exs

It will take a few minutes to run. After it’s completed, you will have a populated index ready for search!

Implementing search in Phoenix

Now that you have a populated index, you need to handle user queries in your Phoenix application. Your application will need to:

Accept user queries from an input form
Compute embeddings from the text query
Search for similar embeddings with Elasticsearch
Return the top N similar results

Start by opening up mix.exs and adding some additional dependencies:

{:httpoison, "~> 1.8"},
{:jason, "~> 1.3"},
{:axon_onnx, "~> 0.2.0-dev", github: "elixir-nx/axon_onnx"},
{:axon, "~> 0.2.0-dev", github: "elixir-nx/axon", override: true},
{:exla, "~> 0.3.0-dev", github: "elixir-nx/nx", sparse: "exla"},
{:nx, "~> 0.3.0-dev", github: "elixir-nx/nx", sparse: "nx", override: true},
{:tokenizers, "~> 0.1.0-dev", github: "elixir-nx/tokenizers", branch: "main"},
{:rustler, ">= 0.0.0", optional: true}

You should also change floki to not only be a test dependency:

{:floki, ">= 0.30.0"},

Notice these are the same dependencies you used when computing the wine embeddings for each document ahead of time. You can run mix deps.get to make sure you have everything you need. Next, open up your application.ex file and add the following lines near the top of start/2:

# Load the model into memory on startup
:ok = Wine.Model.load()

The Wine.Model.load/0 function will load the model on application start-up so you can handle inference requests from the Phoenix application. Next, create a model.ex file in the lib/wine/ directory and add the following code:

defmodule Wine.Model do
  @max_sequence_length 120

  def load() do
    {model, params} =
      AxonOnnx.import("priv/models/model.onnx", batch: 1, sequence: max_sequence_length())

    {:ok, tokenizer} = Tokenizers.Tokenizer.from_pretrained("bert-base-uncased")

    {_, predict_fn} = Axon.compile(model, compiler: EXLA)

    predict_fn =
      EXLA.compile(
        fn params, inputs ->
          {_, pooled} = predict_fn.(params, inputs)
          Nx.squeeze(pooled)
        end,
        [params, inputs()]
      )

    :persistent_term.put({__MODULE__, :model}, {predict_fn, params})
    # Load the tokenizer as well
    :persistent_term.put({__MODULE__, :tokenizer}, tokenizer)

    :ok
  end

  def max_sequence_length(), do: @max_sequence_length

  defp inputs() do
    # Templates must match the batch size and sequence length used at import
    shape = {1, @max_sequence_length}

    %{
      "input_ids" => Nx.template(shape, {:s, 64}),
      "token_type_ids" => Nx.template(shape, {:s, 64}),
      "attention_mask" => Nx.template(shape, {:s, 64})
    }
  end
end

The most important function in this module is load. It’s responsible for loading the model for use in later inference requests.

You do this by first loading a model and tokenizer in the same way you did in your embedding script. Next, you compile the model into its initialization and prediction functions using Axon.compile/2. Finally, you make use of the new EXLA.compile/2 function, which compiles and caches a version of your function to avoid compilation overhead on execution.

This function only compiles one version of the model’s predict function. In a production setting, you might want to compile additional versions for various batch sizes to handle overlapping requests.

After the model is compiled, you store the predict function and the model parameters using Erlang’s :persistent_term. :persistent_term provides global storage. The reason for using :persistent_term over other mechanisms is that :persistent_term avoids copying data when accessing elements–in other words, you won’t be repeatedly copying your model’s massive parameters every time you perform an inference.
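Here is a minimal sketch of the store-once, read-many pattern with :persistent_term, using a placeholder tuple in place of the compiled predict function and parameters. TermSketch and its function names are hypothetical.

```elixir
defmodule TermSketch do
  # Namespacing the key with the module name avoids clashes with other terms.
  @key {__MODULE__, :model}

  # Stores the value once, typically at application start.
  def load(value) do
    :persistent_term.put(@key, value)
    :ok
  end

  # Reads the value back; returns nil if load/1 has not been called yet.
  def fetch do
    :persistent_term.get(@key, nil)
  end
end

:ok = TermSketch.load({:fake_predict_fn, %{"weights" => [1, 2, 3]}})
IO.inspect(TermSketch.fetch())
```

Reads are cheap, but updating or erasing a persistent term is expensive, so this storage suits data that is written once and read on every request.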

Next, open up your router.ex and add a new search endpoint:

get "/:query", PageController, :index

Now, update the index clause in your PageController to look like:

alias Tokenizers.{Tokenizer, Encoding}

def index(conn, %{"query" => query}) do
  {predict_fn, params} = get_model()

  inputs = get_inputs_from_query(query)
  embedded_vector = predict_fn.(params, inputs) |> Nx.to_flat_list()

  case get_closest_results(embedded_vector) do
    {:ok, documents} ->
      render(conn, "index.html", wine_documents: documents, query: %{"query" => query})

    _error ->
      conn
      |> put_flash(:error, "Something went wrong")
      |> render("index.html", wine_documents: [], query: %{})
  end
end

def index(conn, _params) do
  render(conn, "index.html", wine_documents: [], query: %{})
end

You haven’t implemented most of these functions yet, but the logic is relatively straightforward. When the user sends a query to the server, you access the model using a get_model function.

Next, you parse the query into encoded inputs using a get_inputs_from_query function. Using the model and encoded inputs, you compute an embedded vector representation of the query and pass it to get_closest_results.

Finally, you handle error and success cases in the process of retrieving similar documents.

Now, copy the following code to implement these missing functions:

defp get_tokenizer() do
  :persistent_term.get({Wine.Model, :tokenizer})
end

defp get_model() do
  :persistent_term.get({Wine.Model, :model})
end

defp get_inputs_from_query(query) do
  tokenizer = get_tokenizer()

  {:ok, encoded_seq} = Tokenizer.encode(tokenizer, query)

  encoded_seq =
    encoded_seq
    |> Encoding.pad(Wine.Model.max_sequence_length())
    |> Encoding.truncate(Wine.Model.max_sequence_length())

  input_ids = encoded_seq |> Encoding.get_ids() |> Nx.tensor()
  token_type_ids = encoded_seq |> Encoding.get_type_ids() |> Nx.tensor()
  attention_mask = encoded_seq |> Encoding.get_attention_mask() |> Nx.tensor()

  %{
    "input_ids" => Nx.new_axis(input_ids, 0),
    "token_type_ids" => Nx.new_axis(token_type_ids, 0),
    "attention_mask" => Nx.new_axis(attention_mask, 0)
  }
end

defp get_closest_results(embedded_vector) do
  options = [ssl: [cacertfile: @cacertfile_path], recv_timeout: 60_000]

  password = System.get_env("ELASTICSEARCH_PASSWORD")
  credentials = "elastic:#{password}"

  headers = [
    Authorization: "Basic #{Base.encode64(credentials)}",
    "Content-Type": "application/json"
  ]

  query = format_query(embedded_vector)

  with {:ok, data} <- Jason.encode(query),
       {:ok, response} <- HTTPoison.post(@elasticsearch_endpoint, data, headers, options),
       {:ok, results} <- Jason.decode(response.body) do
    parse_response(results)
  else
    _error ->
      :error
  end
end

Most of these functions should be relatively straightforward.

get_inputs_from_query is almost identical to the encode_text function you used to compute document embeddings.

The get_closest_results function implements the logic of querying Elasticsearch and handling the response.

Both options and headers are similar to the options and headers you used to embed documents originally.
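The Authorization header is plain HTTP Basic authentication: the user name and password joined by a colon and Base64-encoded. A small sketch of that step, with a hypothetical AuthSketch module and a made-up password:

```elixir
defmodule AuthSketch do
  # Builds the value of an HTTP Basic Authorization header.
  def basic_auth(user, password) do
    "Basic " <> Base.encode64("#{user}:#{password}")
  end
end

header = AuthSketch.basic_auth("elastic", "example-password")

# Decoding the part after "Basic " recovers the original credentials.
"Basic " <> encoded = header
IO.inspect(Base.decode64!(encoded))
```

Keep in mind that System.get_env/1 returns nil when the variable is unset, which would silently produce the credentials "elastic:". System.fetch_env!/1 raises instead, which can make a missing password easier to spot.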

Next, you use a not-yet-implemented format_query function to convert the dense vector into a valid Elasticsearch query. Then, you encode the query, send the request to your running Elasticsearch server, and decode the response. If any of the steps fail along the way, you return an error.

Now you just need to implement the remaining helper functions. Add the following code to PageController:

defp format_query(vector) do
  %{
    "knn" => %{
      "field" => "document-vector",
      "query_vector" => vector,
      "k" => @top_k,
      "num_candidates" => @num_candidates
    },
    "_source" => ["url"]
  }
end

defp parse_response(response) do
  hits = get_in(response, ["hits", "hits"])
  case hits do
    nil ->
      :error

    hits ->
      results = Enum.map(hits, fn
        %{"_source" => result} ->
          url = result["url"]
          get_wine_preview(url)
      end)

      {:ok, results}
  end
end

defp get_wine_preview(url) do
  with {:ok, %{body: body}} <- HTTPoison.get(url),
       {:ok, page} <- Floki.parse_document(body) do
    title = page |> Floki.find(".pipName") |> Floki.text()
    %{url: url, title: title}
  else
    _error ->
      %{url: url, title: "Generic wine"}
  end
end

format_query/1 builds an Elasticsearch K-Nearest Neighbors query from a given dense vector. Two parameters control how the search works: k and num_candidates. k is the number of results to return, and num_candidates is the number of nearest-neighbor candidates to consider on each shard.

The approximate search collects num_candidates candidates per shard and then returns the k most similar of them. The higher num_candidates, the more accurate the results tend to be, at the cost of a slower search.

parse_response/1 and get_wine_preview/1 both handle rendering useful information about returned wines to the user.

parse_response grabs the relevant information from a successful response–returning an error if it’s not present. get_wine_preview/1 uses HTTPoison and Floki to generate a preview from the URL of a given wine. You can make use of a database or Elasticsearch to track relevant information for each document as well.
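To see the response handling in isolation, the sketch below runs the same get_in lookup against a hand-written response map. HitsSketch and the sample data are hypothetical; the "hits", "_source", and "_score" keys follow the usual shape of an Elasticsearch search response, and unlike the controller code this version does not fetch a preview for each URL.

```elixir
defmodule HitsSketch do
  # Pulls the url and score out of each hit. Returns :error when the
  # response has no "hits" section (for example, an error body).
  def parse(response) do
    case get_in(response, ["hits", "hits"]) do
      nil ->
        :error

      hits ->
        results =
          Enum.map(hits, fn hit ->
            %{url: get_in(hit, ["_source", "url"]), score: hit["_score"]}
          end)

        {:ok, results}
    end
  end
end

response = %{
  "hits" => %{
    "hits" => [
      %{"_score" => 0.5, "_source" => %{"url" => "https://example.com/wine-1"}}
    ]
  }
}

IO.inspect(HitsSketch.parse(response))
IO.inspect(HitsSketch.parse(%{"error" => "something went wrong"}))
```

Keeping the score alongside the URL is optional, but it can be handy if you want to display or threshold results by similarity.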

Finally, you might have noticed a few constants present in some controller functions. You should declare each of them at the top of your module, like this:

@elasticsearch_endpoint "https://localhost:9200/wine/_knn_search"
@cacertfile_path "http_ca.crt"
@top_k 5
@num_candidates 100

With your controller complete, all you have left to do is adjust your index.html.heex template to handle search and search results. Update the file to look like:

<section class="row">
  <form method="get" action="/">
    <input type="text" name="query" />
    <input type="submit" />
  </form>
</section>

<section class="row">
  <%= unless @wine_documents == [] do %>
    <h2>Results: </h2>
    <div class="container">
      <ul>
        <%= for wine <- @wine_documents do %>
          <li><a href={wine.url}><%= wine.title %></a></li>
        <% end %>
      </ul>
    </div>
  <% end %>
</section>

This will render a simple form and any search results that are present. That’s all you need! You’re ready to search for wines.

Searching for wines

With your application complete, you can fire up your server with:

mix phx.server

Navigate to the browser and enter a search like, “I want a Cabernet Sauvignon that’s under $25”. The first query might take a little while to run; subsequent queries should be much faster. After some time you will see a list of matching wines under “Results:”.

Congratulations, you’ve just implemented a Semantic Search tool in pure Elixir!

Conclusion

In this post, you learned how to create a semantic search tool for matching wines to users from natural language queries.

Notice that you didn’t need to learn how to train any complex models. Axon and AxonOnnx enable you to take advantage of the powerful pre-trained models from other ecosystems natively in an Elixir application.

You should also notice that integrating Nx, EXLA, and Axon into an existing Phoenix application is relatively painless.

I hope this gives you some ideas about how the budding Elixir machine learning ecosystem can benefit you–even if you don’t want to learn how to train models.

Until next time!