This is a quick overview of a system that I’ve been working on for a little while. At the current time it is not functional, but various pieces are coming into shape. This is a means to write down some of my thoughts.
The first part is to convey a basic overview of the whole system. There is nothing inherently necessary for this system to be gRPC specific. The basic concepts are transferable to other implementations (Thrift, REST, etc). I’ve chosen to implement this in gRPC as it is the RPC system I am most familiar with. As such, there will be terminology and specifics that assume implementation in a gRPC based universe.
If we look at a simple “request/response” service (say, `Service A`), we can generalize it as a service (implemented somehow) that takes requests of some sort, and answers those requests with some response. In the course of computing a response, `Service A` will likely send requests to other services and consume their responses to formulate its own response. The picture will look something like this:
If we zoom into the interaction and implementation of a single backend service, we will endeavour to have this service be provided by multiple backends; for example, multiple “jobs”. These jobs are the replicated dimension of the service. In a GCP (Google Cloud Platform) deployment, these jobs could each be located in different Cloud Zones or Regions. At this point, our picture starts to look a little like the following:
As we can see here, `Service A` could talk to any of the backend jobs that implement `Service B`. There may be reasons for a particular task to talk to a specific backend job (availability, latency, load, etc.), but from a functional point of view, any of the jobs (`B-1`, `B-2`, or `B-3`) is fine to send traffic to.
The question now becomes: how does `Service A` indicate which service to send traffic to, and how does it know what jobs comprise that service? This means we first need to be able to name a service. Having a naming convention, or template, means that all RPC originators will talk the same language, and all RPC API providers can also indicate (using the same language) which service(s) they are able to provide. This concept is largely known as “Service Discovery”. Conceptually, we can think of this as something similar to the following:
As we can see, we’ve chosen a URL-like schema, where we use `svc` as the scheme, followed by no authority (as of now), with the first part of the URL path being the service name. Hence all service names will look like `svc:///<service-name>`. The service name is a short, human-consumable and recognizable name. It is not a DNS (Domain Name System) name. Now the question is: how do we translate this generic-looking name into something that code can use (IP addresses & ports, etc.)? This is where we introduce the first Charon RPC System Service, the Discovery Service. This image should help show what this looks like:
As we can see, our service (`Service A`) will contact `svc:///discovery-service` and ask it, via a `GetService` API call, for the `Service` information object for the requested service (here `svc:///service-b`). The Discovery Service will then return the information about the requested service. Note, we will cover how the Discovery Service knows about any specific service a bit later. (Also, how the Discovery Service itself is found will be discussed later.)
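To make the name format concrete, here is a minimal sketch in Go of how a client might validate and split such a name using the standard `net/url` package. The `serviceName` helper and its error messages are my own illustration, not part of any real Charon API:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// serviceName extracts the short service name from an svc:/// URL.
// It enforces the conventions above: scheme must be "svc", the
// authority must be empty, and the name must be non-empty.
func serviceName(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "svc" {
		return "", fmt.Errorf("expected svc scheme, got %q", u.Scheme)
	}
	if u.Host != "" {
		return "", fmt.Errorf("authority must be empty, got %q", u.Host)
	}
	name := strings.TrimPrefix(u.Path, "/")
	if name == "" {
		return "", fmt.Errorf("missing service name in %q", raw)
	}
	return name, nil
}

func main() {
	name, err := serviceName("svc:///service-b")
	fmt.Println(name, err) // service-b <nil>
}
```

This mirrors how gRPC itself treats resolver targets as `scheme://authority/path` URIs, which is why the `svc:///` form with an empty authority slots in naturally.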
Now that we have information about the service we wish to send traffic to, how do we know how much traffic to send to each job that implements the service? For this we ask the Balancing Service to provide us with weights for each of the places we can send traffic. Now the picture looks more like the following:
As you can see here, our service sends a `GetWeights` RPC call to the Balancing Service, which then responds with a set of weights, one for each of the backends where our traffic can be sent. There are several things of note here.
- The Balancing Service is not part of the dataplane. It is merely part of the control plane. If it becomes unavailable, our service could continue to operate, albeit without updated load balancing weights.
- The Balancing Service could “drain” one of the backend job(s) by setting the weights of their traffic assignments to zero.
- The Discovery Service knows nothing of weights to assign, only of possible locations for backends.
- The Balancing Service could assign weights differently based on the portion of `Service A` that is asking for weights, possibly matching portions of our service to “closer” portions of the backend service.
Once we have assigned load proportions to backends, how does the Balancing Service know how much load all the services sharing the same backend service have actually generated? There are multiple things that the Balancing Service will need to know, such as the “Usage”, “Load”, and “Health” of the backend service, to make informed load assignments. The first of these is “front end” usage reporting. These are reported as shown here:
As you can see, our service will periodically call `ReportUsage` on the Balancing Service to report the actual usage for each backend of a service it has sent traffic to. Note that it is the “front end”, our service, that sends these usage reports about the backends it is sending RPC traffic to.
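Client-side, this only requires cheap bookkeeping between reports. A minimal sketch, assuming Go; the `UsageTracker` type and its methods are invented for illustration, standing in for whatever the real client library does before each `ReportUsage` call:

```go
package main

import (
	"fmt"
	"sync"
)

// UsageTracker counts RPCs sent to each backend between reporting
// intervals. Flush drains the counters so each ReportUsage call
// carries only the usage accumulated since the previous report.
type UsageTracker struct {
	mu     sync.Mutex
	counts map[string]int64
}

func NewUsageTracker() *UsageTracker {
	return &UsageTracker{counts: make(map[string]int64)}
}

// Record notes one RPC sent to the given backend.
func (u *UsageTracker) Record(backend string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.counts[backend]++
}

// Flush returns the accumulated counts and resets them.
func (u *UsageTracker) Flush() map[string]int64 {
	u.mu.Lock()
	defer u.mu.Unlock()
	out := u.counts
	u.counts = make(map[string]int64)
	return out
}

func main() {
	u := NewUsageTracker()
	u.Record("b-1")
	u.Record("b-1")
	u.Record("b-2")
	fmt.Println(u.Flush()) // counts since the last report
}
```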
However, so far we have not discussed at any length how load and health of the backends (implementations of a service) are transmitted or observed. For this we introduce another Charon RPC Service component, the Reporting Service. This service collects both load and health reports, aggregates them, and then ultimately makes this information available to the Balancing Service to modify load assignments as desired. This looks much like the following:
As we can see here, load and health reporting can come from the backend service itself, or from some form of external monitoring (e.g. K8S or Linux Kernel stats). These reports complete a feedback cycle with the Balancing Service as the control element. How do we configure all these parts of the Charon RPC System Services? You guessed it, we introduce yet another service, the Config Service. This service implements a CRUD-like gRPC API and will configure the rest of the Charon services. This looks a bit like the following:
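To give a feel for the Reporting Service's aggregation step, here is a hypothetical sketch in Go. The report shape and field names are my own; the idea is simply that per-task load and health reports are rolled up per job into figures the Balancing Service can consume:

```go
package main

import "fmt"

// TaskReport is one load/health observation for a single task,
// whether self-reported or gathered by external monitoring.
type TaskReport struct {
	Job     string  // e.g. "k8s-usa/default/service-b"
	Load    float64 // normalized utilization, 0..1
	Healthy bool
}

// JobStats is the per-job rollup handed to the Balancing Service.
type JobStats struct {
	AvgLoad     float64 // mean load across the job's tasks
	HealthyFrac float64 // fraction of tasks reporting healthy
}

// aggregate rolls per-task reports up to per-job statistics.
func aggregate(reports []TaskReport) map[string]JobStats {
	type acc struct {
		load    float64
		healthy int
		n       int
	}
	sums := map[string]acc{}
	for _, r := range reports {
		s := sums[r.Job]
		s.load += r.Load
		if r.Healthy {
			s.healthy++
		}
		s.n++
		sums[r.Job] = s
	}
	out := make(map[string]JobStats, len(sums))
	for job, s := range sums {
		out[job] = JobStats{
			AvgLoad:     s.load / float64(s.n),
			HealthyFrac: float64(s.healthy) / float64(s.n),
		}
	}
	return out
}

func main() {
	stats := aggregate([]TaskReport{
		{"k8s-usa/default/service-b", 0.8, true},
		{"k8s-usa/default/service-b", 0.4, false},
	})
	fmt.Printf("%+v\n", stats["k8s-usa/default/service-b"])
}
```

A real implementation would also age out stale reports and smooth over time; this sketch shows only the spatial rollup.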
So far we’ve mostly talked about “service names”, and some sort of abstract “backend service” and “jobs implementing the service”. These are all abstract items in our universe. We now need slightly more concrete manifestations of these concepts. As such, we define the following:
- Service: A collection of “Roots”
- Root: A “Coord” within our universe
- Coord: A vector, or an ordered set of dimension values
- Dimension: A named vector, ordered
Yup, clear as mud. However, I think that an example will make things much clearer. Imagine that we have infrastructure where we are using K8S to run our “jobs” (say, replica sets). We are using multiple K8S clusters for redundancy and reliability; say, one in Asia, one in Europe, and a couple in the USA. For example’s sake, let’s call these `k8s-asia`, `k8s-europe`, `k8s-usa`, and `k8s-usb`. Within each of these K8S clusters we have a namespace and a job name, as well as each task’s name within the respective job. We could address (virtually) any task using a string similar to `<k8s-cluster>/<namespace-name>/<job-name>/<task-name>`. We’ve defined a “universe” with four dimensions: `k8s-cluster`, `namespace-name`, `job-name`, and `task-name`. We’ve also ordered each of these dimensions, where `task-name` only has meaning within the `job-name` that it is contained within. I.e., there could be multiple `task-name = "name-a"` throughout our universe that are not related. We also stipulate that any dimension’s value is unique; i.e., there are not two different `k8s-usa/default/some-job` “things”.
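The example above maps onto the earlier definitions fairly directly. A minimal sketch, assuming Go; the type names mirror the definitions (Dimension, Coord, Root) but the code itself is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// The ordered dimensions of this example universe.
var dimensions = []string{"k8s-cluster", "namespace-name", "job-name", "task-name"}

// A Coord is one value per dimension, in dimension order. A Root is a
// (possibly partial) Coord: because dimensions are ordered, everything
// "inside" a root shares it as a prefix.
type Coord []string

// String renders the coord in the <k8s-cluster>/<namespace-name>/... form.
func (c Coord) String() string { return strings.Join(c, "/") }

// contains reports whether root is a prefix of c, i.e. whether the
// coord lies inside the root.
func contains(root, c Coord) bool {
	if len(root) > len(c) {
		return false
	}
	for i, v := range root {
		if c[i] != v {
			return false
		}
	}
	return true
}

func main() {
	root := Coord{"k8s-usa", "default", "some-job"}
	task := Coord{"k8s-usa", "default", "some-job", "task-0"}
	fmt.Println(root, contains(root, task)) // the job root contains its task
}
```

Under this reading, a Service is just a set of such roots, e.g. the `some-job` job in each of the four clusters.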
The configuration of the universe is really part of how your infrastructure is defined. There are likely common options depending on where and how you are building your infrastructure. Different options will feel natural depending on whether you’re implementing this in AWS, Azure, GCP, on-prem, or a combination of these. There is a tradeoff in the definition of this universe: using a large set of highly granular dimensions will significantly increase the complexity of the computations required for balancing load, while having too few will leave you without the control you desire. As a general rule, somewhere between 3 and 5 dimensions (inclusive) should be sufficient for most use cases. A picture of this would look something like the following:
We’re getting to the end of this portion of my brain dump. The last part for this article is a small sample of what a configuration document could look like for a single service:
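Since the schema is still in flux, here is a purely hypothetical sketch of what such a document might contain; every field name below is invented for illustration and does not reflect a finalized format:

```yaml
# Hypothetical Charon configuration for a single service.
service:
  name: svc:///service-b
  # Roots: coords in the universe where this service's jobs live.
  roots:
    - [k8s-usa, default, service-b]
    - [k8s-usb, default, service-b]
  balancing:
    policy: weighted
    drained: []          # roots temporarily weighted to zero
  reporting:
    usage_interval: 30s
    load_sources: [self-report, k8s-metrics]
```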
The next steps are implementing a simple binary to prove the concepts out. At this point, most of the base protobuf definitions are complete, along with some very MVP parts of the code.