Writing a simple Redis Protocol parser in Elixir

Today, we are going to write a parser that parse Redis Protocol in Elixir.

There are tons of supported commands in Redis. Since this is our first attempt on implementing it, we will only be focusing on the GET and SET.

At the end of this post, you should be able to write a simple parser to parse request/response by Redis client/server. Here’s how this post is structured:

Redis Protocol Specification
Writing our Parser

Before getting into the implementation, let’s learn about the Redis Protocol Specification first.

If you want to get hands on, I have also written a Livebook version of it. Just click the button below to run it:

Else, this post assume you have prior knowledge of setting up Elixir project with mix as the post only includes code snippets instead of full fledge working example.

This post is inspired by Rust Tokio Mini-Redis Tutorial, where it walks through the reader to implement a mini Redis with tokio. This post is part of the series of implementing mini Redis in Elixir:

Part 1: Writing a simple Redis Protocol parser in Elixir
Part 2: Writing a mini Redis server in Elixir
Part 3: Benchmarking and writing a concurrent mini Redis in Elixir

Redis Protocol Specification

Redis Protocol Specification (RESP) is the protocol Redis client and server used to communicate with each other.

As mentioned aboved, it consists of multiple commands, and it’s not ideal to go through every of them in this post. Hence, I’ll just cover the basic we need to know for this post.

Supported Types

RESP basically support the following types. To allow us to differentiate different types, the protocol use the first byte as an identifier:

Simple Strings, where the first byte of the reply is “+”
Errors, where the first byte of the reply is “-”
Integers, where the first byte of the reply is “:”
Bulk Strings, where the first byte of the reply is “$”
Arrays, where the first byte of the reply is “*”

Most importantly, all of it will be ending with \r\n.

Types such as array and bulk strings might contain multiple lines and \r\n. In order to get the full data, we will need to parse through multiple lines.

In general, this is what you need to know about each type:

Type	Description
Simple Strings	Used by the server as it’s reply.
Errors	Used by the server to indicate errors.
Integers	Used by the server to return integer results such as `INCR`.
Bulk Strings	Used by the client/server for string inputs and results.
Arrays	Used by client to send commands to server or server to return a collections to clients

To know more, read the documentation from the Redis website.

Simple Strings vs Bulk Strings

Simple Strings is used to represent non binary safe string. It has less overhead since it doesn’t require the length of the string to be known. Normally this is used as a return result from the server. For example, many result of Redis command return +OK\r\n on success.

Bulk Strings, on the other hand, is used to represent binary safe string, up to 512MB in length. To aids with parsing the string, a length is required to be included. This mean that, we are able to parse the string correctly as the length is provided. An example of bulk string is as follow:

$6\r\nfoobar\r\n

To represent an empty string, the following is used:

$0\r\n\r\n

In the case where you need to represent nil, the following format can be used:

$-1\r\n

where a -1 length is used, which is also called Null Bulk String. This is what we need to return from the server if a GET command should return nil.

But what is binary safe and non binary safe string?

Binary safe string means that the string can contain any character, including characters like \0 that used to indicate a string is terminated in C.

Non binary safe string means the string cannot contain those character that might be used to indicate a termination.

To understand more, you can refer to the answers in this StackOverflow question.

How to send commands to server?

Now that we know the basic structure of how our reply would looks like, let’s talk about how a command is represented in RESP, especially GET and SET command.

`GET` command

From the GET command documentation, we know that the command support one argument. So to represent it as arrays, it will look something as follow:

["GET", "key"]

But, what would the raw command looks like if we were to send to the server as a client encoded it as RESP?

Let’s apply what we know about RESP. We know that we need to:

Encode the above to represent 2 data type, Array and Bulk String. Hence, it will contain multiple parts.

This is what we need to send to the server to represent a GET command:

*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n

To make it easier to understand, let’s break down each parts into separate lines:

# Indicate an array with 2 elements
*2\r\n

# Indicate the first element is a Bulk String of 3 character
$3\r\n

# Actual content of the first element
GET\r\n

# Indicate the second element is a Bulk String of 3 character
$3\r\n

# Actual content of the second element
key\r\n

`SET` command

For SET command, we will be focusing on the basic one without supporting the additional options:

["SET", "key", "value"]

I’ll leave it as an exercise for you to come up with the raw reply. If you have trouble coming up with it, try to break it into multiple parts first, and then encode it with RESP.

Divide and Conquer.

Are you correct?

How do we know if we have the correct answer? Let’s pull in Elixir Redis client redix.

Mix.install([:redix])

Since redix library include the Redix.Protocol module that is in charge of parsing the protocol, we will use it to verify our answer.

["SET", "key", "value"]
|> Redix.Protocol.pack()
|> IO.iodata_to_binary()

#=> "*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n"

By now, we should able to understand how RESP works. So let’s proceed to the next stage where we write our code to encode and decode the raw input to a data structure and vice versa.

Writing our Parser

Before we started to write our own parser, let’s write some test case to help with us to verify our implementation easily.

defmodule ParserTest do
  use ExUnit.Case, async: true

  test "encode" do
    list = ["SET", "key", "value"]
    assert "*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n" == Parser.encode(list)
  end

  test "decode" do
    reply = "*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n"
    assert ["GET", "key"] == Parser.decode(reply)
  end

  test "encode and decode" do
    reply = "*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n"
    assert reply == reply |> Parser.decode() |> Parser.encode()

    list = ["GET", "KEY"]
    assert list == list |> Parser.encode() |> Parser.decode()
  end
end

If we run the code above, we’ll get Parser.decode/1 undefined error since we haven’t implement our module yet. So let’s write some code!

defmodule Parser do
  def encode(_commands) do
    # your impl
  end

  def decode(_string) do
    # your impl
  end
end

If you’re up to the challenge, implement the above first and utilize the existing test to verify your implementation.

Purposely left blank for those who want to implement it themselves first

…

Are you ready for my version of implmentation? Here it is.

defmodule Parser do
  def encode(commands) when is_list(commands) do
    result = "*#{length(commands)}\r\n"

    Enum.reduce(commands, result, fn command, result ->
      result <> "$#{String.length(command)}\r\n#{command}\r\n"
    end)
  end

  def decode(string) when is_binary(string) do
    %{commands: commands} =
      string
      |> String.trim()
      |> String.split("\r\n")
      |> Enum.reduce(%{}, fn reply, state ->
        case reply do
          "*" <> length ->
            state
            |> Map.put(:type, "array")
            |> Map.put(:array_length, String.to_integer(length))

          "$" <> length ->
            state
            |> Map.put(:type, "bulk_string")
            |> Map.put(:bulk_string_length, String.to_integer(length))

          value ->
            value = String.trim(value)
            Map.update(state, :commands, [value], fn list -> [value | list] end)
        end
      end)

    Enum.reverse(commands)
  end
end

Encode

Writing the encode/1 is fairly straightforward since we already know how to represent arrays and bulk string in RESP. All we need to do is to first specify the Array type and then concantenate it with all the Bulk String type we have by looping through it with Enum.reduce/2.

Decode

decode/1 can be quite tricky to write. Here, we are not utilizing the type and length information from our input.

I am kind of making a lot of assumption in the implementation as we know that we are only interested in any value that is not starting with type definition (*, $ and etc).

The implementation turn the following from:

*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$3\r\nfoo\r\n

into:

*3
$3
SET <-- What we want
$5
mykey <-- What we want
$3
foo <-- What we want

and further convert it into:

["SET", "mykey", "foo"]

by ignoring any line that indicate the type.

It can also be implemented as a recursive function that might result in a better readability. Is this the ideal implementation? Definitely not! But this is a good start.

Wrap Up

It isn’t as hard as you think, right? Obviously, there are space of improvements of our parser. But once you understand the building blocks of how Redis Protocol works it wouldn’t be hard to write a simple one.

However, what we have done here is not perfect. There are tons of use cases and potential error cases that we didn’t handle. My implementation could be improved to utilize the length included in Array and Bulk String type.

Next up, we would be integrating this parser into our TCP server so that we can write a mini Redis server. With that, do you think our current parser implementation would still work?

Let’s find it out on the next post: Writing a mini Redis server in Elixir