This is a quick overview of a system that I’ve been working on for a little while. At the current time it is not functional, but various pieces are coming into shape. This is a means to write down some of my thoughts.
The first part is to convey a basic overview of the whole system. There is nothing inherently necessary for this system to be gRPC specific. The basic concepts are transferable to other implementations (Thrift, REST, etc). I’ve chosen to implement this in gRPC as it is the RPC system I am most familiar with. As such, there will be terminology and specifics that assume implementation in a gRPC based universe.
If we look at a simple “request/response” service (say, `Service A`), we can generalize it as a service (implemented somehow) that takes requests of some sort, and answers those requests with some response. In the course of computing a response, `Service A` will likely send requests to other services and consume their responses to formulate its own response. The picture will look something like this:
If we zoom into the interaction and implementation of a single backend service, we will endeavour to have this service be provided by multiple backends; for example, multiple “jobs”. These jobs are the replicated dimension of the service. In a GCP (Google Cloud Platform) deployment, these jobs could each be located in different Cloud Zones or Regions. At this point, our picture starts to look a little like the following:
As we can see here, `Service A` could talk to any of the backend jobs that implement `Service B`. There may be reasons for a particular task to talk to a specific backend job (availability, latency, load, etc.), but from a functional point of view, any of the jobs (`B-1`, `B-2`, or `B-3`) is fine to send traffic to.
The question now becomes: how does `Service A` indicate which service to send traffic to, and how does it know what jobs comprise that service? This means we first need to be able to name a service. Having a naming convention, or template, means that all RPC originators will talk the same language, and all RPC API providers can also indicate (using the same language) which service(s) they are able to provide. This concept is largely known as “Service Discovery”. Conceptually, we can think of this as something similar to the following:
As we can see, we’ve chosen a URL-like schema, where we use `svc` as the scheme, followed by no authority (as of now), with the first part of the URL path being the service name. Hence all service names will look like `svc:///<service-name>`. The service name is a short, human-consumable and recognizable name. It is not a DNS (Domain Name System) name. Now the question is: how do we translate this generic-looking name into something that code can use (IP addresses & ports, etc.)? This is where we introduce the first Charon RPC System Service, the Discovery Service. This image should help show what this looks like:
As we can see, our service (`Service A`) will contact `svc:///discovery-service` and ask it, via a `GetService` API call, for the `Service` information object for the requested service (here `svc:///service-b`). The Discovery Service will then return the information about the requested service. Note, we will cover how the Discovery Service knows about any specific service a bit later. (Also, how the Discovery Service itself is found will be discussed later.)
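To make the name format concrete, here is a minimal sketch in Go of how a client might validate and split such a name using the standard `net/url` package. The `serviceName` helper and its error messages are my own illustration, not part of any real Charon API:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// serviceName extracts the short service name from an svc:/// URL.
// It enforces the conventions above: scheme must be "svc", the
// authority must be empty, and the name must be non-empty.
func serviceName(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "svc" {
		return "", fmt.Errorf("expected svc scheme, got %q", u.Scheme)
	}
	if u.Host != "" {
		return "", fmt.Errorf("authority must be empty, got %q", u.Host)
	}
	name := strings.TrimPrefix(u.Path, "/")
	if name == "" {
		return "", fmt.Errorf("missing service name in %q", raw)
	}
	return name, nil
}

func main() {
	name, err := serviceName("svc:///service-b")
	fmt.Println(name, err) // service-b <nil>
}
```

This mirrors how gRPC itself treats resolver targets as `scheme://authority/path` URIs, which is why the `svc:///` form with an empty authority slots in naturally.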
Now that we have information about the service we wish to send traffic to, how do we know how much traffic to send to each job that implements the service? For this we ask the Balancing Service to provide us with weights for each of the places we can send traffic. Now the picture looks more like the following:
As you can see here, our service sends a `GetWeights` RPC call to the Balancing Service, which then responds with a set of weights, one for each of the backends where our traffic can be sent. There are several things of note here.
- The Balancing Service is not part of the dataplane. It is merely part of the control plane. If it becomes unavailable, our service could continue to operate, albeit without updated load balancing weights.
- The Balancing Service could “drain” one of the backend job(s) by setting the weights of their traffic assignments to zero.
- The Discovery Service knows nothing of weights to assign, only of possible locations for backends.
- The Balancing Service could assign weights differently based on the portion of `Service A` that is asking for weights, possibly matching portions of our service to “closer” portions of the backend service.
Once we have assigned load proportions to backends, how does the Balancing Service know how much load all the services sharing the same backend service have actually generated? There are multiple things that the Balancing Service will need to know, such as the “Usage”, “Load”, and “Health” of the backend service, to make informed load assignments. The first of these is “front end” usage reporting. These are reported as shown here:
As you can see, our service will periodically call `ReportUsage` on the Balancing Service to report the actual usage for each backend of a service it has sent traffic to. Note that it is the “front end”, our service, that sends these usage reports about the backends it is sending RPC traffic to.
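Client-side, this only requires cheap bookkeeping between reports. A minimal sketch, assuming Go; the `UsageTracker` type and its methods are invented for illustration, standing in for whatever the real client library does before each `ReportUsage` call:

```go
package main

import (
	"fmt"
	"sync"
)

// UsageTracker counts RPCs sent to each backend between reporting
// intervals. Flush drains the counters so each ReportUsage call
// carries only the usage accumulated since the previous report.
type UsageTracker struct {
	mu     sync.Mutex
	counts map[string]int64
}

func NewUsageTracker() *UsageTracker {
	return &UsageTracker{counts: make(map[string]int64)}
}

// Record notes one RPC sent to the given backend.
func (u *UsageTracker) Record(backend string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.counts[backend]++
}

// Flush returns the accumulated counts and resets them.
func (u *UsageTracker) Flush() map[string]int64 {
	u.mu.Lock()
	defer u.mu.Unlock()
	out := u.counts
	u.counts = make(map[string]int64)
	return out
}

func main() {
	u := NewUsageTracker()
	u.Record("b-1")
	u.Record("b-1")
	u.Record("b-2")
	fmt.Println(u.Flush()) // counts since the last report
}
```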
However, so far we have not discussed at any length how load and health of the backends (implementations of a service) are transmitted or observed. For this we introduce another Charon RPC Service component, the Reporting Service. This service collects both load and health reports, aggregates them, and then ultimately makes this information available to the Balancing Service to modify load assignments as desired. This looks much like the following:
As we can see here, load and health reporting can come from the backend service itself, or from some form of external monitoring (e.g. K8S or Linux Kernel stats). These reports complete a feedback cycle with the Balancing Service as the control element. How do we configure all these parts of the Charon RPC System Services? You guessed it, we introduce yet another service, the Config Service. This service implements a CRUD-like gRPC API and will configure the rest of the Charon services. This looks a bit like the following:
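To give a feel for the Reporting Service's aggregation step, here is a hypothetical sketch in Go. The report shape and field names are my own; the idea is simply that per-task load and health reports are rolled up per job into figures the Balancing Service can consume:

```go
package main

import "fmt"

// TaskReport is one load/health observation for a single task,
// whether self-reported or gathered by external monitoring.
type TaskReport struct {
	Job     string  // e.g. "k8s-usa/default/service-b"
	Load    float64 // normalized utilization, 0..1
	Healthy bool
}

// JobStats is the per-job rollup handed to the Balancing Service.
type JobStats struct {
	AvgLoad     float64 // mean load across the job's tasks
	HealthyFrac float64 // fraction of tasks reporting healthy
}

// aggregate rolls per-task reports up to per-job statistics.
func aggregate(reports []TaskReport) map[string]JobStats {
	type acc struct {
		load    float64
		healthy int
		n       int
	}
	sums := map[string]acc{}
	for _, r := range reports {
		s := sums[r.Job]
		s.load += r.Load
		if r.Healthy {
			s.healthy++
		}
		s.n++
		sums[r.Job] = s
	}
	out := make(map[string]JobStats, len(sums))
	for job, s := range sums {
		out[job] = JobStats{
			AvgLoad:     s.load / float64(s.n),
			HealthyFrac: float64(s.healthy) / float64(s.n),
		}
	}
	return out
}

func main() {
	stats := aggregate([]TaskReport{
		{"k8s-usa/default/service-b", 0.8, true},
		{"k8s-usa/default/service-b", 0.4, false},
	})
	fmt.Printf("%+v\n", stats["k8s-usa/default/service-b"])
}
```

A real implementation would also age out stale reports and smooth over time; this sketch shows only the spatial rollup.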
So far we’ve mostly talked about “service names”, and some sort of abstract “backend service” and “jobs implementing the service”. These are all abstract items in our universe. We now need slightly more concrete manifestations of these concepts. As such, we define the following:
- Service: A collection of “Roots”
- Root: A “Coord” within our universe
- Coord: A vector, or an ordered set of dimension values
- Dimension: A named vector, ordered
Yup, clear as mud. However, I think that an example will make things much clearer. Imagine that we have infrastructure where we are using K8S to run our “jobs” (say, replica sets). We are using multiple K8S clusters for redundancy and reliability; say, one in Asia, one in Europe, and a couple in the USA. For example’s sake, let’s call these `k8s-asia`, `k8s-europe`, `k8s-usa`, and `k8s-usb`. Within each of these K8S clusters we have a namespace and a job name, as well as each task’s name within the respective job. We could address (virtually) any task using a string similar to `<k8s-cluster>/<namespace-name>/<job-name>/<task-name>`. We’ve defined a “universe” with four dimensions: `k8s-cluster`, `namespace-name`, `job-name`, and `task-name`. We’ve also ordered each of these dimensions, where `task-name` only has meaning within the `job-name` that it is contained within. I.e., there could be multiple `task-name = "name-a"` throughout our universe that are not related. We also stipulate that any dimension’s value is unique; i.e., there are not two different `k8s-usa/default/some-job` “things”.
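The example above maps onto the earlier definitions fairly directly. A minimal sketch, assuming Go; the type names mirror the definitions (Dimension, Coord, Root) but the code itself is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// The ordered dimensions of this example universe.
var dimensions = []string{"k8s-cluster", "namespace-name", "job-name", "task-name"}

// A Coord is one value per dimension, in dimension order. A Root is a
// (possibly partial) Coord: because dimensions are ordered, everything
// "inside" a root shares it as a prefix.
type Coord []string

// String renders the coord in the <k8s-cluster>/<namespace-name>/... form.
func (c Coord) String() string { return strings.Join(c, "/") }

// contains reports whether root is a prefix of c, i.e. whether the
// coord lies inside the root.
func contains(root, c Coord) bool {
	if len(root) > len(c) {
		return false
	}
	for i, v := range root {
		if c[i] != v {
			return false
		}
	}
	return true
}

func main() {
	root := Coord{"k8s-usa", "default", "some-job"}
	task := Coord{"k8s-usa", "default", "some-job", "task-0"}
	fmt.Println(root, contains(root, task)) // the job root contains its task
}
```

Under this reading, a Service is just a set of such roots, e.g. the `some-job` job in each of the four clusters.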
The configuration of the universe is really part of how your infrastructure is defined. There are likely common options depending on where and how you are building your infrastructure. Different options will feel natural depending on whether you’re implementing this in AWS, Azure, GCP, on-prem, or a combination of these. There is a tradeoff in the definition of this universe: using a large set of highly granular dimensions will significantly increase the complexity of the computations required for balancing load, while having too few will leave you without the control you desire. As a general rule, somewhere between 3 and 5 dimensions (inclusive) should be sufficient for most use cases. A picture of this would look something like the following:
We’re getting to the end of this portion of my brain dump. The last part for this article is a small sample of what a configuration document could look like for a single service:
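Since the schema is still in flux, here is a purely hypothetical sketch of what such a document might contain; every field name below is invented for illustration and does not reflect a finalized format:

```yaml
# Hypothetical Charon configuration for a single service.
service:
  name: svc:///service-b
  # Roots: coords in the universe where this service's jobs live.
  roots:
    - [k8s-usa, default, service-b]
    - [k8s-usb, default, service-b]
  balancing:
    policy: weighted
    drained: []          # roots temporarily weighted to zero
  reporting:
    usage_interval: 30s
    load_sources: [self-report, k8s-metrics]
```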
The next steps are implementing a simple binary to prove the concepts out. At this point, most of the base protobuf definitions are complete, along with some very MVP parts of the code.