Envoy as an Inter DC Traffic Manager (Part 1) — Mocking multi DC traffic flow in a local K8s cluster

Envoy is a very powerful and versatile service proxy. It has a plethora of use cases, ranging from a simple load balancer for a horizontally scaled deployment to a sidecar in a feature-rich service mesh. This multipart blog post talks about one such scenario: using Envoy as a traffic flow controller that manages the flow of requests between two different data centers (DCs).

Why Envoy?

I was lucky enough to put my knowledge of Envoy to good use when the team I was working for decided to move a part of their cluster from EKS to GKE. This decision was based on multiple reasons, the most important one being the proximity of the DC to the actual place of business. The team wanted a solution capable of redirecting the traffic flowing to their existing cluster to a newly replicated one in GKE, in a very controlled way.

We explored multiple approaches before zeroing in on Envoy. One of them was writing a custom solution, tailor-made for all our use cases. We discarded it due to limited time, the increased surface area for bugs, and most importantly because, with Envoy available, it felt like reinventing the wheel. We also had a few other service proxies as contenders. But Envoy was a clear winner for us owing to its rich feature set and its popularity both inside and outside our organization.

Most of our org was already familiar with Envoy due to its heavy usage across deployments. We had initially started using it because it was ahead of its time with support for HTTP/2 and gRPC. It also supports dynamic configuration using a control plane. We made heavy use of both these features. We already had pipelines, charts, dashboards and control planes set up for it. This made it easier for us to adapt it to our new scenario of inter DC traffic splitting.

Use Case

We had the following use cases for our inter DC traffic manager:

2. Conditional TLS termination and origination

3. Support for HTTP2/gRPC

4. Observability — Metrics and Logs

5. Scalability

Mock Environment Design

There was an obvious need for mocking the two separate data centers. Deploying and testing on even a staging environment would be painfully time consuming since we wanted to play with both Envoy’s bootstrap config as well as the Helm chart with which we deploy it.

To mock the DCs, we first need to mock at least one REST API (running on HTTP/1.1) and one gRPC app (using HTTP/2) to ensure that our traffic manager Envoy handles both versions of HTTP. The plan is to run two sets of deployments for each of these apps in a local K8s cluster (Docker Desktop or Minikube). We will name the apps with a DC name suffix (e.g. rest-dc-1) and configure them to emit logs mentioning their respective DCs.

Mock Environment

Each app will have a K8s service (denoted as SVC). Each set of apps (1 REST + 1 gRPC) will be fronted by a simple Envoy deployment acting as a single point of access to that set of services (our mock DC). These Envoys will mimic the Ingress gateways of the K8s cluster in each mock DC.

Finally, each of the simple Envoy deployments will have its own service whose ports will be exposed outside the cluster so that our main “DC Traffic Manager Envoy” can access them. We will be using Envoy version v1.17.0.
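To make the "exposed outside the cluster" part concrete, here is a minimal sketch of what such a service could look like. It is only an illustration: the names, labels, and port numbers are assumptions made for this example, and the resources in the repository may use different values or a different service type (for instance LoadBalancer).

```yaml
# Hypothetical Service exposing the access point Envoy of mock DC 1.
# All names and ports below are assumptions, not taken from the repo.
apiVersion: v1
kind: Service
metadata:
  name: envoy-dc-1            # assumed name, following the DC suffix convention
spec:
  type: NodePort              # makes the port reachable from outside the cluster
  selector:
    app: envoy-dc-1           # assumed pod label of the DC 1 Envoy deployment
  ports:
  - name: http
    port: 8080                # assumed Service port
    targetPort: 8080          # assumed Envoy listener port inside the pod
    nodePort: 30001           # assumed; must fall in the default 30000-32767 range
```

A second Service of the same shape (for example `envoy-dc-2` on a different `nodePort`) would expose the other mock DC. On Docker Desktop a NodePort is typically reachable on localhost; on Minikube you would normally go through the node IP instead.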

In reality, our main Envoy can be deployed anywhere: within the source cluster, within one of the target clusters, or even outside them. In our mock setup, we are placing it outside the clusters. In our actual deployment, we placed it within the source cluster, which also happened to be one of the target clusters. This reduced latency and removed the need to secure the traffic to that DC, since it stays within the cluster.

Setting Up the Mock Environment

I have created a dummy REST service and a dummy gRPC service and published them as public Docker images in my Docker Hub registry. I have also created K8s resources that you can directly apply to your local cluster to spin up the entire mock environment as per the above design.

All the required resources for running and testing the mock environment are available in this GitHub repo.

All you have to do is:

After all the images are pulled and the deployments are ready, you should see the K8s artifacts created similar to the output shown below:

To test the environment setup, use the following commands:

Make sure to run the above commands from within the cloned repo’s directory or update the ./proto/hello path appropriately while testing the gRPC services.

I have added references to the code and tools used for this environment in the above repository’s README file.

Bootstrapping the Traffic Manager Envoy

Now that we have the environment setup out of the way, we can proceed to starting up the main Envoy instance to be used as the traffic manager for the mocked DCs (from here on referred to as the ‘Envoy’). To begin with, we will configure it to forward all requests for the REST API to DC1, and all requests for the gRPC app to DC2. Once you have installed Envoy on your local machine and added it to your PATH, you can use the following command to start it up as the traffic manager with the config sourced from this blog’s repository directory.

Let’s take a look at the route config and the cluster config for the Envoy:

Route config
Cluster config
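
For readers who cannot see the embedded config, the following is a minimal sketch of a bootstrap that sends REST traffic to DC1 and gRPC traffic to DC2, written against the Envoy v3 API. It is not the config from the repository: the listener port, the upstream addresses and ports, the cluster names, and the choice to tell gRPC apart with the `grpc` route match are all assumptions made for this example.

```yaml
# Sketch only: ports, addresses and names are assumptions for illustration.
static_resources:
  listeners:
  - name: traffic_manager
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }   # assumed port
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: traffic_manager
          route_config:
            name: inter_dc_routes
            virtual_hosts:
            - name: all_hosts
              domains: ["*"]
              routes:
              # gRPC requests (matched on their content type) go to DC2.
              - match: { prefix: "/", grpc: {} }
                route: { cluster: dc2 }
              # Everything else, i.e. the REST API, goes to DC1.
              - match: { prefix: "/" }
                route: { cluster: dc1 }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: dc1
    connect_timeout: 1s
    type: STRICT_DNS
    load_assignment:
      cluster_name: dc1
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: localhost, port_value: 30001 }  # assumed DC1 port
  - name: dc2
    connect_timeout: 1s
    type: STRICT_DNS
    # gRPC needs HTTP/2 towards the upstream.
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: dc2
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: localhost, port_value: 30002 }  # assumed DC2 port
```

Routes are evaluated in order, so the more specific gRPC match is placed before the catch-all prefix match. Matching on a path prefix or a header would work equally well if your services are distinguishable that way.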

This is a very simple configuration. We have added two upstream clusters to the Envoy config, one for each exposed port of the mocked DCs. These ports are the listener ports of the access point Envoys of the individual mock DCs, so both clusters are capable of serving both REST and gRPC requests. But since we forward only REST requests to DC1 and only gRPC requests to DC2, the responses to both requests on our main Envoy would be as shown below.

The Envoy logs would show something similar to the following:

Envoy log

The purpose of using such a config is just to demonstrate the basic usage of our Envoy in its role as an ‘inter DC traffic manager’ for the mock environment we have created. We will use this as the foundation to add more capabilities that satisfy the complete set of use cases mentioned at the start of this blog. Right now we have accomplished:

As you may have noticed, the individual access log statements of Envoy are in JSON format and their keys are not always printed in the same order, although every statement has the same set of keys. This is because we have chosen to use json_format for our access logs, which helps in aggregating and filtering logs when coupled with a log aggregation service like Kibana.
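
As a rough illustration of the json_format option mentioned above, an access log section of the HTTP connection manager could look like the sketch below. The set of keys and the output path are assumptions chosen for this example, not the ones used in the repository; the `%...%` values are standard Envoy access log command operators.

```yaml
# Fragment of the HttpConnectionManager config (sketch; keys are illustrative).
access_log:
- name: envoy.access_loggers.file
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    path: /dev/stdout                      # assumed: log to the console
    log_format:
      json_format:
        start_time: "%START_TIME%"
        method: "%REQ(:METHOD)%"
        path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
        protocol: "%PROTOCOL%"
        response_code: "%RESPONSE_CODE%"
        upstream_cluster: "%UPSTREAM_CLUSTER%"   # shows which mock DC served the request
        upstream_host: "%UPSTREAM_HOST%"
        duration_ms: "%DURATION%"
```

Including the upstream cluster in each entry makes it easy to filter the logs per DC once they reach the aggregation service.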

In the next part of this blog, we will proceed to solve the rest of our use cases by leveraging the same mock environment setup. I will add a link here once the next part is published. Thanks for reading!

Link to Part 2 as promised.
