
May 21, 2020

What you might want to know about GenServer

Estimated Reading Time: 13 minutes (2562 words)

I once gave a talk about GenServer at a local Elixir meetup in 2019. To prepare for the talk, I did a lot of research and reading. With additional experience working with GenServer in a production environment, I have come to realize that there are a lot of caveats when using GenServer.

While GenServer is easy to use, there are actually a couple of challenges when using it in a production environment. So, in this post, I'll attempt to write down my findings about GenServer.

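The post is broken down into the following sections:

1. Quick introduction to GenServer
2. When you should and shouldn't use GenServer
3. Limitations of GenServer
4. Dos and don'ts of GenServer
5. Wrap up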

Disclaimer: I am no expert in Elixir, Erlang or GenServer, and what I wrote might be wrong. However, I tried my best to cross-check multiple sources to ensure correctness, and I have attached the relevant links I referred to while writing this article for further reference. Some of the points are purely my opinion, based on my limited knowledge and experience. Take them with a grain of salt.

Quick Introduction to GenServer

What is GenServer? For someone new to Elixir, GenServer usually comes to mind when they need to implement a server process or a stateful process.

However, diving a bit deeper, GenServer is actually an OTP behaviour that implements a client-server relationship.

But what is an OTP behaviour? A behaviour is basically a common pattern that splits generic logic and specific logic into different modules: the behaviour module and the callback module. These behaviours are derived from years of battle-tested production systems, and they also take care of some of the edge cases for you.

When we use GenServer, we are using the GenServer behaviour and implementing our own callback module.

# Our own callback module
defmodule MyServer do
  use GenServer

  def init(_) do
    {:ok, []}
  end

  def handle_call(:ping, _from, state) do
    {:reply, :pong, state}
  end
end

# Using GenServer generic behaviour to interact with our own callback modules
{:ok, pid} = GenServer.start_link(MyServer, [])
GenServer.call(pid, :ping)
#=> :pong

When using GenServer, we don't need to know its internal implementation. All we need to do is implement the behaviour callbacks. The callbacks act as the interface between GenServer and our module. This decouples the logic of managing the process (what GenServer does) from our business logic (what our callback module does).

Why not implement our own GenServer behaviour?

If GenServer is just a pattern that decouples the logic, can't we write our own and use it?

You can, but GenServer does a lot more than just decoupling this logic. GenServer takes care of some cases in a concurrent system that we might not be aware of. A few notable ones are:

The edge cases handled by GenServer are worth a separate post. If you would like to know more, there is this article by DockYard about how GenServer handles some of these concurrency conditions and edge cases. More details are also laid out at the end of Chapter 3 of the book "Designing for Scalability with Erlang/OTP".

I only learned how OTP behaviours are designed to separate generic logic from business logic, and about the edge cases of implementing your own GenServer, after reading "Designing for Scalability with Erlang/OTP".
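To make those edge cases a bit more concrete, here is a minimal hand-rolled request/reply sketch. The `NaiveServer` module and its message format are hypothetical, and this is not how GenServer is actually implemented; it only mimics two things GenServer.call takes care of for us: tagging each request with a unique monitor reference so the reply can't be confused with other messages, and monitoring the server so the caller gets an error (or a timeout) instead of waiting forever when the server dies.

```elixir
defmodule NaiveServer do
  # Hypothetical hand-rolled server loop, for illustration only.
  def start do
    spawn(fn -> loop() end)
  end

  defp loop do
    receive do
      {:call, from, ref, :ping} ->
        send(from, {ref, :pong})
        loop()

      {:call, _from, _ref, :crash} ->
        exit(:boom)
    end
  end

  # Synchronous call: monitor the server, use the monitor reference as a
  # unique tag for the request, and stop waiting after a timeout.
  def call(pid, request, timeout \\ 5_000) do
    ref = Process.monitor(pid)
    send(pid, {:call, self(), ref, request})

    receive do
      {^ref, reply} ->
        Process.demonitor(ref, [:flush])
        {:ok, reply}

      {:DOWN, ^ref, :process, ^pid, reason} ->
        {:error, {:server_down, reason}}
    after
      timeout ->
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end
end

pid = NaiveServer.start()
NaiveServer.call(pid, :ping)
#=> {:ok, :pong}
NaiveServer.call(pid, :crash)
#=> {:error, {:server_down, :boom}}
```

Without the monitor, the `:crash` call above would simply block until the timeout. GenServer handles this and more (system messages, debugging hooks, parent exits) for us.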

When should and shouldn't you use GenServer?

Coming from a Ruby/Rails background, when I first learned about GenServer, I had no idea how I could use it in my applications, especially in a web application.

It was a cool and amazing new concept to me,

but how could I utilize it?

That question often came to mind when I first started learning Elixir.

Before we talk about when it’s a good time to use GenServer, let’s focus on when we shouldn’t use GenServer.

On a side note, there is also an article, "To spawn, or not to spawn?", that talks about when you should spawn a process. Its lessons still apply to GenServer (since, inherently, a GenServer is just a process).

When you shouldn’t use GenServer

If you have read through the Elixir documentation for GenServer, you might have come across this:

A GenServer, or a process in general, must be used to model runtime characteristics of your system. A GenServer must never be used for code organization purposes.

As mentioned above, a GenServer is just another stateful process; it's a pattern for writing stateful processes. Hence, it should never be used for code organization. Use modules for that instead.

While it is possible to do the following with GenServer too, it’s not really recommended as there are better alternatives:

1. Using it to execute simple asynchronous tasks or jobs

For this, the Task module is recommended instead. Depending on your requirements, rolling your own GenServer just to execute an asynchronous job can be overkill.

Implementing a GenServer with Task.Supervisor is reasonable when you need more control over the task execution such as error handling, monitoring and job retry.

However, note that a GenServer is a single process and can inherently become a bottleneck as the load increases.

On a side note, there is an article from DockYard where the author demonstrates how to implement job retries with GenServer and Task.Supervisor.
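As a rough sketch of the Task route, assuming a hypothetical `Jobs.send_welcome_email/1` function that stands in for real work, running a one-off job (or a batch of them) concurrently needs no GenServer at all:

```elixir
defmodule Jobs do
  # Hypothetical job; stands in for real work such as sending an email.
  def send_welcome_email(user_id) do
    Process.sleep(10)
    {:sent, user_id}
  end
end

# Run a single job concurrently and wait for its result when needed.
task = Task.async(fn -> Jobs.send_welcome_email(42) end)
Task.await(task)
#=> {:sent, 42}

# Run several jobs concurrently, at most two at a time, keeping input order.
1..3
|> Task.async_stream(&Jobs.send_welcome_email/1, max_concurrency: 2)
|> Enum.to_list()
#=> [{:ok, {:sent, 1}}, {:ok, {:sent, 2}}, {:ok, {:sent, 3}}]
```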

2. Storing state.

Start by using Agent. If it provides what you need, that's great. A rule of thumb is to reach for the tool with the higher-level abstraction first, and only reach for GenServer when you need extra customization.
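For example, a simple counter kept in an Agent could look like this (the `Counter` module is a hypothetical example):

```elixir
defmodule Counter do
  use Agent

  # Start an Agent holding an integer, registered under the module name.
  def start_link(initial \\ 0) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  def increment, do: Agent.update(__MODULE__, &(&1 + 1))

  def value, do: Agent.get(__MODULE__, & &1)
end

{:ok, _pid} = Counter.start_link()
Counter.increment()
Counter.increment()
Counter.value()
#=> 2
```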

If Agent doesn't fit your requirements, look into combining GenServer and ETS instead. Avoid writing your own GenServer to track key-value pairs or other kinds of state, unless the state is short-lived, e.g. the state of a game match.

Also, try to avoid storing global state in a GenServer unless you are 100% sure that you won't run the application on multiple nodes. Once you run GenServer processes that store state on multiple nodes, things get really tricky. Chris Keathley explains the reasons really well in his article "The dangers of the Single Global Process".

Well again, it really depends on your system requirements and you’ll have to make the design decision.

When should you use GenServer?

It's a bit ironic, isn't it? We have just gone through a few cases where you shouldn't use GenServer, and some of them show up again below.

But that's the reality of design decisions. There is no silver bullet for every problem; it really depends on the context. The same applies to when you should and shouldn't use GenServer.

So here is a list of scenarios where it makes sense to bring in GenServer:

1. To send periodic messages or schedule tasks

When you need to send periodic messages, a GenServer makes sense because it lets you use Process.send_after to send messages to itself periodically or to schedule one-off tasks.

Depending on your needs, consider using an existing library like periodic by Saša Jurić (the author of Elixir in Action) instead of rolling your own solution. (You can also read this article on the design behind the library.)

If you need a more full-fledged solution for scheduling jobs, consider quantum, which lets you schedule jobs using cron-like syntax.
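If you do decide to roll your own, the core pattern is a GenServer that schedules a message to itself with Process.send_after and reschedules on every tick. A minimal sketch, where the `Heartbeat` module, its `:interval` option and the `:tick` message are all hypothetical names:

```elixir
defmodule Heartbeat do
  # Hypothetical periodic worker built on Process.send_after/3.
  use GenServer

  @default_interval 1_000

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def count, do: GenServer.call(__MODULE__, :count)

  @impl true
  def init(opts) do
    interval = Keyword.get(opts, :interval, @default_interval)
    schedule_tick(interval)
    {:ok, %{interval: interval, count: 0}}
  end

  @impl true
  def handle_info(:tick, state) do
    # Do the periodic work here, then schedule the next tick.
    schedule_tick(state.interval)
    {:noreply, %{state | count: state.count + 1}}
  end

  @impl true
  def handle_call(:count, _from, state), do: {:reply, state.count, state}

  defp schedule_tick(interval), do: Process.send_after(self(), :tick, interval)
end

{:ok, _pid} = Heartbeat.start_link(interval: 50)
Process.sleep(180)
# Number of ticks handled so far.
Heartbeat.count()
```

Scheduling the next tick after handling the current one means a slow run delays the next one instead of piling up `:tick` messages in the mailbox.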

2. To gain more control over task execution with the Task module

As mentioned in the previous section, bring in GenServer with Task.Supervisor when you need more control over task execution. For example, to ensure that a task is really executed and to retry it on failure (e.g. network failures, where retrying makes sense).
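A minimal sketch of that combination, assuming a hypothetical `RetryRunner` module and a fixed limit of three attempts. The GenServer starts each job with Task.Supervisor.async_nolink, so a crashing task does not take the GenServer down with it, and it uses the task's monitor to notice failures and retry. This is only an illustration, not the implementation from the DockYard article mentioned earlier.

```elixir
defmodule RetryRunner do
  # Hypothetical module: runs jobs under a Task.Supervisor and retries failures.
  use GenServer

  @max_attempts 3

  def start_link(task_sup), do: GenServer.start_link(__MODULE__, task_sup)

  # Blocks the caller until the job succeeds or runs out of attempts.
  def run(server, fun), do: GenServer.call(server, {:run, fun})

  @impl true
  def init(task_sup), do: {:ok, %{sup: task_sup, tasks: %{}}}

  @impl true
  def handle_call({:run, fun}, from, state) do
    # Don't reply yet; GenServer.reply/2 is called once the task is done.
    {:noreply, start_task(state, fun, from, 1)}
  end

  # The task returned normally: reply to the waiting caller.
  @impl true
  def handle_info({ref, result}, state) when is_reference(ref) do
    Process.demonitor(ref, [:flush])
    {{_fun, from, _attempt}, tasks} = Map.pop(state.tasks, ref)
    GenServer.reply(from, {:ok, result})
    {:noreply, %{state | tasks: tasks}}
  end

  # The task crashed: retry until @max_attempts is reached.
  def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
    {{fun, from, attempt}, tasks} = Map.pop(state.tasks, ref)
    state = %{state | tasks: tasks}

    if attempt < @max_attempts do
      {:noreply, start_task(state, fun, from, attempt + 1)}
    else
      GenServer.reply(from, {:error, reason})
      {:noreply, state}
    end
  end

  defp start_task(state, fun, from, attempt) do
    # async_nolink: the task is monitored but not linked to this GenServer.
    task = Task.Supervisor.async_nolink(state.sup, fun)
    %{state | tasks: Map.put(state.tasks, task.ref, {fun, from, attempt})}
  end
end

{:ok, sup} = Task.Supervisor.start_link()
{:ok, runner} = RetryRunner.start_link(sup)

# A flaky job that fails twice before succeeding, counting attempts in an Agent.
{:ok, attempts} = Agent.start_link(fn -> 0 end)

flaky = fn ->
  n = Agent.get_and_update(attempts, &{&1 + 1, &1 + 1})
  if n < 3, do: raise("network error"), else: :done
end

RetryRunner.run(runner, flaky)
#=> {:ok, :done}
```

The crashed attempts are logged by the task supervisor, but the caller only sees the final result. A real implementation would likely add a backoff between retries.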

3. To use ETS as a store

ETS is a good built-in in-memory store in the BEAM. There will likely be times when you need it in a production application, and creating the ETS table from a GenServer is definitely the way to go.

This is because an ETS table is owned by the process that creates it: if that process terminates, the table is deleted too. However, avoid wrapping ETS calls in GenServer callbacks like this:

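defmodule MyETS do
  use GenServer
  @table_name :table

  def start_link(_opts) do
    # Register under the module name so GenServer.call(__MODULE__, ...) works
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end

  def init(_) do
    # The table is owned by this process and is deleted if it terminates
    :ets.new(@table_name, [:set, :named_table, :public, read_concurrency: true])
    {:ok, []}
  end

  # AVOID THIS
  # Wrapping look up in a GenServer.call
  def lookup(key) do
    GenServer.call(__MODULE__, {:lookup, key})
  end

  # DO THIS instead
  # Call :ets.lookup directly from the caller's process
  def lookup_v2(key) do
    case :ets.lookup(@table_name, key) do
      [{^key, value}] -> {:ok, value}
      _ -> {:error, :not_found}
    end
  end

  def handle_call({:lookup, key}, _from, state) do
    reply =
      case :ets.lookup(@table_name, key) do
        [{^key, value}] -> {:ok, value}
        _ -> {:error, :not_found}
      end

    # handle_call must return {:reply, reply, new_state}
    {:reply, reply, state}
  end
end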

This is because wrapping :ets.lookup in a GenServer.call throws away the performance gained from ETS and prevents the concurrent reads and writes that a :public table with read_concurrency allows.

The GenServer.call will become the bottleneck, as every lookup goes through the single GenServer process. Avoid that, unless you are doing it intentionally as a back pressure mechanism.

Limitations of GenServer

As mentioned, a GenServer is just a process. Every process in the BEAM has a single mailbox, and its messages are handled one at a time. This is why a GenServer can become the bottleneck of your system as the load increases.

As messages pile up in the mailbox, the GenServer can get slower still. Each new message waits behind everything queued before it, and any selective receive inside the process (a receive that only matches specific messages) may have to scan past all the queued messages before it finds a match.

Here is how the Erlang documentation describes the way a process handles its messages:

Each process has its own input queue for messages it receives. New messages received are put at the end of the queue. When a process executes a receive, the first message in the queue is matched against the first pattern in the receive. If this matches, the message is removed from the queue and the actions corresponding to the pattern are executed.

However, if the first pattern does not match, the second pattern is tested. If this matches, the message is removed from the queue and the actions corresponding to the second pattern are executed. If the second pattern does not match, the third is tried and so on until there are no more patterns to test.
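The following sketch, runnable as a plain script, shows that mechanism in action: a receive that only matches `:important` has to skip over every other message, and the skipped messages stay in the mailbox afterwards.

```elixir
# Fill our own mailbox with messages the receive below does not match.
for i <- 1..1_000, do: send(self(), {:noise, i})
send(self(), :important)

# This receive has to scan past all 1,000 :noise messages to find :important.
receive do
  :important -> :got_it
end
#=> :got_it

# The unmatched messages are still sitting in the mailbox.
Process.info(self(), :message_queue_len)
#=> {:message_queue_len, 1000}
```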

Using a GenServer is fine until the load increases and it becomes the bottleneck. People commonly use ETS or a pool of GenServer processes to cope with high load.

But how do you know whether your GenServer process has too many messages in its mailbox? A quick way to check is Process.info(genserver_process, :message_queue_len), which returns {:message_queue_len, count}, where count is the number of messages currently in the process mailbox.
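As a quick illustration, assuming a hypothetical `SlowServer` whose cast handler is deliberately slow, casts return immediately and pile up in its mailbox faster than they are handled:

```elixir
defmodule SlowServer do
  # Hypothetical server with a deliberately slow handler.
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_cast(:work, state) do
    Process.sleep(100)
    {:noreply, state}
  end
end

{:ok, pid} = SlowServer.start_link()

# Each cast returns :ok right away, so 50 of them queue up almost instantly.
for _ <- 1..50, do: GenServer.cast(pid, :work)

# Returns {:message_queue_len, count}; count is close to 50 at this point.
Process.info(pid, :message_queue_len)
```

Checking this value periodically (or exporting it as a metric) is a cheap way to spot a GenServer that is falling behind.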

If you would like to know more, here are some related resources that I referred to:

Dos and Don'ts of GenServer

Here are some dos and don'ts when using GenServer:

1. Do have a separate supervisor for your GenServer process, instead of using the root supervisor.

Ideally, give your GenServer process its own Supervisor instead of placing it directly under the root application Supervisor. This helps avoid scenarios where repeated failures of your GenServer process bring down your whole application.

The idea behind this is to always design your supervision tree and think about how your system should behave when things go wrong.

According to the Erlang documentation, the OTP design principles define how to structure code in terms of processes, modules, and directories, and supervision trees are introduced to help us model our processes as workers and supervisors.

I guess the takeaway is: think about the supervision tree of your GenServer whenever you use one.

I recently came across this article on why processes should be supervised, which I think is relevant to this point too.
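A minimal sketch, with hypothetical `MyApp.Worker` and `MyApp.WorkerSupervisor` modules: the worker sits under its own supervisor with its own restart limits, and only that supervisor is started by the root supervisor. If the worker keeps crashing, the dedicated supervisor's restart intensity is exhausted first, and the failure then reaches the root supervisor as a single child failure.

```elixir
defmodule MyApp.Worker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts), do: {:ok, opts}
end

defmodule MyApp.WorkerSupervisor do
  # Dedicated supervisor for the worker.
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [{MyApp.Worker, []}]
    # At most 3 restarts within 5 seconds (also the defaults) before giving up.
    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end

# In a real application this list would live in your Application's start/2.
# The root supervisor starts the dedicated supervisor, not the worker itself.
children = [MyApp.WorkerSupervisor]
{:ok, _root} = Supervisor.start_link(children, strategy: :one_for_one)
```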

2. Do add a catch-all clause to your custom handle_info callback.

When we use GenServer, Elixir actually includes a default catch-all handle_info implementation that logs unexpected messages (you can see it in the GenServer source code). However, once you define your own callback:

def handle_info(...) do
  ...
end

The default callback is then overridden.

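Any message your own clauses don't match will then raise a FunctionClauseError and crash the GenServer. If you don't want unmatched messages to crash your GenServer, don't forget to implement a catch-all handle_info clause.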
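A minimal sketch of such a catch-all, using a hypothetical `MyWorker` module that only cares about `:tick` messages and logs everything else (Logger.warning needs Elixir 1.11 or later):

```elixir
defmodule MyWorker do
  use GenServer
  require Logger

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts), do: {:ok, %{ticks: 0}}

  @impl true
  def handle_info(:tick, state) do
    {:noreply, %{state | ticks: state.ticks + 1}}
  end

  # Catch-all: without this clause, any other message would raise a
  # FunctionClauseError and crash the process.
  def handle_info(msg, state) do
    Logger.warning("#{inspect(__MODULE__)} received unexpected message: #{inspect(msg)}")
    {:noreply, state}
  end
end

{:ok, pid} = MyWorker.start_link()
send(pid, :tick)
send(pid, :something_unexpected)
:sys.get_state(pid)
#=> %{ticks: 1}
```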

3. Do understand when to use cast and when to use call.

As a newcomer to Elixir, the only difference I knew between cast and call was that cast is asynchronous, while call is synchronous and waits for a reply.

But when I dug deeper, I found that calling cast on a GenServer process that doesn't exist still returns :ok. With cast, there is no guarantee that the message is ever handled. (It's actually written clearly in the docs, but I had never read them in detail…)

There is also an Elixir Forum thread that discusses why, according to the documentation, we should use cast sparingly. Some people recommend always using call and avoiding cast even when you don't need the reply, so that it acts as back pressure, prevents clients from overloading the server, and ensures the message has really been processed.

Again, it really depends on the nature of your system, but keep the trade-offs of the decision in mind. And when in doubt, use call. (I'm not the inventor of this advice; I probably read it somewhere on the internet.)
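A quick way to see the difference for yourself, using a hypothetical name that no process is registered under:

```elixir
# No process is registered under this (hypothetical) name.
GenServer.cast(:no_such_server, :hello)
#=> :ok

# call exits instead, because there is no process that could reply.
try do
  GenServer.call(:no_such_server, :hello)
catch
  :exit, reason -> {:exited, reason}
end
#=> {:exited, {:noproc, {GenServer, :call, [:no_such_server, :hello, 5000]}}}
```

The cast silently succeeds, while the call fails loudly, which is exactly why call is the safer default when you need to know the message was handled.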

4. Don't use atoms for dynamically generated GenServer names.

This is also mentioned clearly in the Elixir GenServer documentation:

If there is an interest to register dynamic names locally, do not use atoms, as atoms are never garbage-collected and therefore dynamically generated atoms won’t be garbage-collected.

As mentioned, atoms are never garbage-collected. So you could end up crashing the BEAM VM (by exhausting its atom table) if your code creates too many dynamically named GenServers.

The documentation suggests setting up your own local registry with the Registry module instead. I don't have much experience with it, so I'll stop here.
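For reference, a minimal sketch of that approach, where `GameServer` and `GameRegistry` are hypothetical names: each process is registered under a `{:via, Registry, {GameRegistry, id}}` tuple, so the dynamic part of the name can be any term (here a string) and no atoms are created.

```elixir
defmodule GameServer do
  use GenServer

  # Register each game under {GameRegistry, game_id} instead of an atom.
  def start_link(game_id) do
    GenServer.start_link(__MODULE__, game_id, name: via(game_id))
  end

  def state(game_id), do: GenServer.call(via(game_id), :state)

  defp via(game_id), do: {:via, Registry, {GameRegistry, game_id}}

  @impl true
  def init(game_id), do: {:ok, %{id: game_id}}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}
end

{:ok, _} = Registry.start_link(keys: :unique, name: GameRegistry)
{:ok, _} = GameServer.start_link("game-123")
GameServer.state("game-123")
#=> %{id: "game-123"}
```

When a registered process exits, Registry removes its entry automatically, so names don't leak the way dynamically created atoms would.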

Wrap Up

Before I wrap up, there are a couple of well-known Elixir libraries built on top of GenServer. To name a few:

All these libraries are built on top of plain OTP behaviours like GenServer and Supervisor and target more specific use cases. The authors take care of the generic behaviour and let us implement application-specific logic and code.

I also looked up a few real-world use cases of the plain GenServer behaviour, and here is what I found on the internet:

Hopefully, this post covers all the things you need to know about GenServer before using it in production. Again, I am no expert in this area and I am just presenting my findings (which could be wrong).

Different systems have different requirements. It is important to understand why someone took a particular approach in their context before following it blindly. The same applies to the opinions here. I might say don't do this and that, but you probably know your system better than I do and can make a better decision.