Traditional Machine Learning with Scholar

Sean Moriarity

Machine Learning Advisor


Introduction

Most of my posts have been focused on Axon, Bumblebee, and deep learning in the Elixir ecosystem.

A lot of early work in the Nx ecosystem centered around deep learning. While deep learning has resulted in incredible progress in many fields, there are a lot of cases where deep learning either doesn’t cut it, or just ends up being overkill for the task at hand.

In particular, machine learning for forecasting and time-series analysis, along with machine learning on tabular data (which comprises a significant share of business intelligence applications), are two domains where deep learning has not yet superseded traditional approaches. For tabular and structured data especially, approaches based on gradient boosting often outperform significantly larger deep learning models.

Aside from raw metrics, there are many other considerations when choosing a model to deploy in production. Even though a large transformer model might outperform a simple regression model, the regression model can run significantly faster on significantly cheaper hardware. Some traditional models also have an upper hand in terms of interpretability, which can make it easier to sell to business leaders when determining how to safely deploy a model to production.

There are also arguments for why using deep learning over more traditional approaches actually leads to a simpler technical solution and, in the long term, reduced costs (technical and otherwise). Despite this, I would argue that you should choose the simplest tool (deep learning or not) to ship and move on from there.

In this post, I will walk you through the basics of traditional machine learning in Elixir with Scholar. This should give you a better idea of the other tools available to you in the ecosystem, as well as some areas where we're still lacking (and could use some help!).

What and Why is Scholar?

Scholar is a set of machine learning tools built on top of Nx. All of the library functions are implemented as numerical definitions, which means they can be JIT-compiled to both CPUs and GPUs. It also means you can use Scholar interchangeably with the other libraries in the Nx ecosystem.
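For example, in a Livebook you can route everything through EXLA to get JIT compilation. This is just an illustrative setup; the exact dependency versions below are assumptions:

Mix.install([
  {:exla, "~> 0.5"},
  {:scholar, github: "elixir-nx/scholar"}
])

# Run tensor operations and numerical definitions through EXLA,
# so Scholar's functions are JIT-compiled for your CPU or GPU.
Nx.global_default_backend(EXLA.Backend)
Nx.Defn.global_default_options(compiler: EXLA)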

Scholar is meant to be a scikit-learn-like library. This means it offers implementations of non-deep-learning models like linear and logistic regression, k-nearest neighbors, k-means, and more. It also offers a plethora of machine learning utilities such as loss functions, distance functions, and more. That means Scholar might be useful to you, even if you’re working with Axon or Bumblebee.
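As a small taste of those utilities, here's a sketch. The accuracy call reappears later in this post; the Scholar.Metrics.Distance module is my assumption of where the distance functions live, so check the docs for your version:

y_true = Nx.tensor([1, 0, 1, 1])
y_pred = Nx.tensor([1, 0, 0, 1])

# 3 of 4 predictions match, so this returns an Nx scalar of 0.75
Scholar.Metrics.accuracy(y_true, y_pred)

# Euclidean distance between two points; returns 5.0
Scholar.Metrics.Distance.euclidean(Nx.tensor([0.0, 0.0]), Nx.tensor([3.0, 4.0]))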

A lot of Scholar's functionality, specifically some of the unsupervised bits, can also be useful in conjunction with Explorer and VegaLite for conducting exploratory data analysis.

As of this writing, Scholar is still in pre-0.1 release, which means you might find some rough edges. There’s also still a lot of work left to fill the gap between Scholar and scikit-learn. If you’re interested in helping bring the Elixir ecosystem further along, contributions are welcome!

Linear Regression

Perhaps the simplest machine learning model, and likely the first one anybody is introduced to, is linear regression. A linear regression model tries to fit a function y = mx + b (where m and x can be arbitrarily high-dimensional). Linear regression is great because:

  1. It’s simple
  2. It’s fast
  3. It’s interpretable
  4. A lot of things can be modeled with a line

While a lot of relationships are not necessarily linear, you can go a really long way with a simple linear regression model. A (made up) rule of thumb is that 80% of machine learning problems can be solved reasonably well with a linear regression model.

With Scholar, you can fit a linear regression model in a few lines of code. Start by creating some synthetic data:

# True slope and intercept for the synthetic data
m = :rand.uniform() * 10
b = :rand.uniform() * 10

# Draw 100 x values and two noise tensors from a seeded PRNG
key = Nx.Random.key(42)
size = 100
{x, new_key} = Nx.Random.normal(key, 0.0, 1.0, shape: {size, 1})
{noise_x, new_key} = Nx.Random.normal(new_key, 0.0, 1.0, shape: {size, 1})
{noise_b, _} = Nx.Random.normal(new_key, 0.0, 1.0, shape: {size, 1})

y =
  m
  |> Nx.multiply(Nx.add(x, noise_x))
  |> Nx.add(b)
  |> Nx.add(noise_b)

:ok

This code block creates some synthetic data with a linear relationship. Rather than produce a perfectly straight line, we add some noise so the relationship is not perfectly linear, which better simulates the data we'd see in the real world. You can visualize this data with VegaLite:

alias VegaLite, as: Vl

Vl.new(title: "Scatterplot Distribution", width: 720, height: 480)
|> Vl.data_from_values(%{
  x: Nx.to_flat_list(x),
  y: Nx.to_flat_list(y)
})
|> Vl.mark(:point)
|> Vl.encode_field(:x, "x", type: :quantitative)
|> Vl.encode_field(:y, "y", type: :quantitative)

Notice how the relationship contained in this scatter plot is relatively linear, but not perfectly so. This is much closer to what real-world data looks like.

The goal of linear regression is to find a line that captures the relationship present in the scatter plot above. Specifically, you want the line that minimizes the total squared distance between the line and the points in the training set. In practice, that means the line you draw might not intersect any of the points in your dataset. The idea is that the curve fit by a linear regression model is a good average, or generalization, of the true relationship present.
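This is the classic ordinary least-squares problem, which even has a closed-form solution: beta = (X^T X)^-1 X^T y. Just for intuition, here's a minimal sketch of that solution written directly in Nx (this is not Scholar's actual implementation):

defmodule OLS do
  def fit(x, y) do
    # Prepend a column of ones so the intercept is learned as a coefficient
    ones = Nx.broadcast(1.0, {Nx.axis_size(x, 0), 1})
    x = Nx.concatenate([ones, x], axis: 1)
    xt = Nx.transpose(x)

    # beta = (X^T X)^-1 X^T y
    xt |> Nx.dot(x) |> Nx.LinAlg.invert() |> Nx.dot(xt) |> Nx.dot(y)
  end
end

Scholar does all of this for you, and handles the details and edge cases along the way.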

With Scholar, you can implement linear regression in a single line of code:

model = Scholar.Linear.LinearRegression.fit(x, y)
%Scholar.Linear.LinearRegression{
  coefficients: #Nx.Tensor<
    f32[1][1]
    [
      [2.406812906265259]
    ]
  >,
  intercept: #Nx.Tensor<
    f32[1]
    [4.426097869873047]
  >
}

After running, you'll see the output, which is a %LinearRegression{} struct containing the parameters of your model. In this case, the coefficients correspond to m and the intercept corresponds to b. You can inspect both m and b to see how close the model came to the true values:

IO.inspect(m)
IO.inspect(b)
:ok
2.181240456860072
4.435846174457203
:ok
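If you'd rather have the fitted parameters as plain Elixir floats, say for logging, you can squeeze the tensors down to scalars and convert them:

# Collapse the {1, 1} coefficient tensor and {1} intercept tensor to scalars
fitted_m = model.coefficients |> Nx.squeeze() |> Nx.to_number()
fitted_b = model.intercept |> Nx.squeeze() |> Nx.to_number()

IO.puts("fitted m: #{fitted_m}, true m: #{m}")
IO.puts("fitted b: #{fitted_b}, true b: #{b}")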

Overall, not bad! You can see how well your model truly fits by visualizing it overlaid with your original data. First, you need to generate some predictions over the entire distribution. Notice that the graphic covers x values from -3.0 to 3.0, so you can generate 100 points in that range and predict their values using your model:

pred_xs = Nx.linspace(-3.0, 3.0, n: 100) |> Nx.new_axis(-1)
pred_ys = Scholar.Linear.LinearRegression.predict(model, pred_xs)
:ok

Next, you can use VegaLite to overlay the predicted plots and the actual distribution on top of one another:

Vl.new(title: "Scatterplot Distribution and Fit Curve", width: 720, height: 480)
|> Vl.data_from_values(%{
  x: Nx.to_flat_list(x),
  y: Nx.to_flat_list(y),
  pred_x: Nx.to_flat_list(pred_xs),
  pred_y: Nx.to_flat_list(pred_ys)
})
|> Vl.layers([
  Vl.new()
  |> Vl.mark(:point)
  |> Vl.encode_field(:x, "x", type: :quantitative)
  |> Vl.encode_field(:y, "y", type: :quantitative),
  Vl.new()
  |> Vl.mark(:line)
  |> Vl.encode_field(:x, "pred_x", type: :quantitative)
  |> Vl.encode_field(:y, "pred_y", type: :quantitative)
])

Not bad!

Beyond Linear Regression

Scholar supports a number of other machine learning algorithms including Naive Bayes, k-nearest neighbors, and logistic regression. In addition to machine learning algorithms, Scholar supports interpolation routines, principal component analysis (PCA), and distance functions. Scholar is pretty general purpose, but most of the APIs follow the same pattern as what you saw with the linear regression model.
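For instance, here's a PCA sketch. Treat the function and option names (num_components in particular) as assumptions on my part; check the docs of the Scholar version you're using:

# Hypothetical sketch: project data onto its top principal component
pca = Scholar.Decomposition.PCA.fit(x, num_components: 1)
reduced = Scholar.Decomposition.PCA.transform(pca, x)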

Back to models: if you wanted to create a binary classifier using a logistic regression model, you'd use essentially the same two lines of code from the previous section to fit the model and make predictions. For a simple example, you can create synthetic data by binarizing your original data:

binarized_y = Nx.greater(y, 5) |> Nx.squeeze()
:ok

This will convert all of your target values to either 0 or 1. A logistic regression model is very similar to a linear regression model; however, it applies a logistic function to the output to squash it into the range (0, 1), which you can read as the probability of the positive class.
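For intuition, the logistic function here is the sigmoid, sigmoid(z) = 1 / (1 + e^-z), which Nx ships as Nx.sigmoid/1:

# The sigmoid squashes any real number into the range (0, 1)
Nx.sigmoid(Nx.tensor([-4.0, 0.0, 4.0]))
# => roughly [0.018, 0.5, 0.982]

You can fit this model the same way you fit your linear regression model: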

model = Scholar.Linear.LogisticRegression.fit(x, binarized_y, num_classes: 2)
%Scholar.Linear.LogisticRegression{
  coefficients: #Nx.Tensor<
    f32[1]
    [1.0982909202575684]
  >,
  bias: #Nx.Tensor<
    f32
    -0.3180295526981354
  >,
  mode: :binary
}

Your coefficients are a bit different from those of your original model. You can check your work by computing your accuracy on the original dataset:

pred_y = Scholar.Linear.LogisticRegression.predict(model, x)
Scholar.Metrics.accuracy(binarized_y, pred_y)
#Nx.Tensor<
  f32
  0.7300000190734863
>

So we get around 73% accuracy. Not bad! Let's try another type of model:

model = Scholar.NaiveBayes.Gaussian.fit(x, binarized_y, num_classes: 2)
pred_y = Scholar.NaiveBayes.Gaussian.predict(model, x)
Scholar.Metrics.accuracy(binarized_y, pred_y)
#Nx.Tensor<
  f32
  0.7300000190734863
>

And even another:

model = Scholar.Neighbors.KNearestNeighbors.fit(x, binarized_y, num_classes: 2)
pred_y = Scholar.Neighbors.KNearestNeighbors.predict(model, x)
Scholar.Metrics.accuracy(binarized_y, pred_y)
#Nx.Tensor<
  f32
  0.7599999904632568
>

As you can see, it's easy to quickly interchange model types. You don't really need a deep understanding of machine learning to get started. You just need to know how to write some Elixir!
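In fact, since the APIs mirror one another, you could sketch a quick comparison loop using the same calls from above:

# Fit each model on the same data and collect its training accuracy
models = [
  Scholar.Linear.LogisticRegression,
  Scholar.NaiveBayes.Gaussian,
  Scholar.Neighbors.KNearestNeighbors
]

for module <- models do
  model = module.fit(x, binarized_y, num_classes: 2)
  accuracy = Scholar.Metrics.accuracy(binarized_y, module.predict(model, x))
  {module, Nx.to_number(accuracy)}
end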

Conclusion

I hope this served as a short introduction to the Scholar library. There’s still a lot more to explore. As I mentioned before, contributions to Scholar are welcome. It’s a great way to get involved in the ecosystem. Implementing a function for Scholar is also a great way to get introduced to Nx itself!

Finally, I have to give a shout-out to Mateusz Sluszniak, who has done a lot of work on Scholar thus far.

Until next time!
