Envoy as an Inter-DC Traffic Manager (Part 1) — Mocking multi-DC traffic flow in a local K8s cluster

Envoy is a very powerful and versatile service proxy with a plethora of use cases, ranging from a simple load balancer in front of a horizontally scaled deployment to a sidecar in a feature-rich service mesh. This multi-part blog post covers one such scenario: using Envoy as a traffic flow controller that manages the flow of requests between two different data centers (DCs).

Why Envoy?

We explored multiple approaches before zeroing in on Envoy. One was writing a custom solution, tailor-made for all our use cases; we discarded it due to time constraints, the increased surface area for bugs, and, most importantly, because with Envoy available it felt like reinventing the wheel. We also had a few other service proxies as contenders, but Envoy was a clear winner for us owing to its rich feature set and its popularity both within and outside our organization.

Most of our org was already familiar with Envoy due to its heavy usage across deployments. We had initially adopted it because it was ahead of its time in supporting HTTP/2 and gRPC, and because it supports dynamic configuration through a control plane; we made heavy use of both features. We already had pipelines, charts, dashboards and control planes set up for it, which made it easier to adapt it to our new scenario of inter-DC traffic splitting.

Use Cases

  1. Routing requests based on:
  • Upstream service
  • Individual API path/route
  • Percentage of traffic
  2. Conditional TLS termination and origination
  3. Support for HTTP/2 and gRPC
  4. Observability: metrics and logs
  5. Scalability

Mock Environment Design

To mock the DCs, we first need at least one REST API (running on HTTP/1.1) and one gRPC app (using HTTP/2), so that our traffic manager Envoy has to handle both HTTP versions. The plan is to run two sets of deployments of these apps in a local K8s cluster (Docker Desktop or Minikube). We name the apps with a DC-name suffix (e.g. rest-dc-1) and configure them to emit logs mentioning their respective DCs.
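As a rough sketch, one of these mock app deployments could look like the following. The image, labels, env var and port here are placeholders; the actual manifests are in the repo linked in the setup section below.

```yaml
# Hypothetical manifest for the DC-1 REST app; image, labels and ports are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rest-dc-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rest-dc-1
  template:
    metadata:
      labels:
        app: rest-dc-1
    spec:
      containers:
        - name: rest-dc-1
          image: your-rest-app-image:latest   # placeholder image
          env:
            - name: DC_NAME                   # illustrative way to tag logs/responses with the DC
              value: dc-1
          ports:
            - containerPort: 8080
```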

Mock Environment

Each app will have a K8s service (denoted as SVC). Each set of apps (1 REST + 1 gRPC) will be fronted by a simple Envoy deployment acting as a single point of access to that set of services (our mock DC). These Envoys mimic the ingress gateways of the K8s cluster in each mock DC.

Finally, each of the simple Envoy deployments will have its own service whose port is exposed outside the cluster so that our main “DC Traffic Manager Envoy” can access it. We will be using Envoy version v1.17.0.
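For example, the DC-1 access-point Envoy could be exposed to the host with a NodePort service along these lines. The names, selector and ports are illustrative, not the exact values used in the repo.

```yaml
# Hypothetical NodePort service exposing the DC-1 access-point Envoy outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: envoy-dc-1
spec:
  type: NodePort
  selector:
    app: envoy-dc-1        # must match the labels of the DC-1 Envoy deployment
  ports:
    - name: http
      port: 10000          # the Envoy listener port inside the pod
      targetPort: 10000
      nodePort: 30001      # reachable from outside the cluster on <node-ip>:30001
```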

In reality, our main Envoy can be deployed anywhere: within the source cluster, within one of the target clusters, or even outside them. In our mock setup, we place it outside the clusters. In our actual deployment, we placed it within the source cluster, which also happened to be one of the target clusters, thereby reducing latency and avoiding the need to secure intra-cluster traffic to that DC.

Setting Up the Mock Environment

All the required resources for running and testing the mock environment are available in this GitHub repo.

All you have to do is:

  1. Install a local K8s cluster (Minikube or Docker Desktop) and kubectl
  2. Clone this repository
  3. Change directory into the repository and run kubectl apply -f ./k8s as shown below
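Assuming the repository has been cloned into the current directory, that boils down to:

```sh
cd <cloned-repo-directory>   # placeholder for the actual directory name
kubectl apply -f ./k8s       # creates all the K8s resources defined in the repo
```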

After all the images are pulled and the deployments are ready, you should see the K8s artifacts created; you can list them as shown below:
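The exact resource names depend on the repo's manifests, but listing the deployments, services and pods should show both mock DCs and their access-point Envoys:

```sh
kubectl get deployments,services,pods
```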

To test the environment setup, use the following commands:
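The exact ports, routes and gRPC service names come from the repo's manifests and proto files; the commands below only show the general shape of the tests, with placeholder values.

```sh
# REST app via the DC-1 access-point Envoy (NodePort and route are placeholders).
# On Minikube, replace localhost with the output of `minikube ip`.
curl -v http://localhost:30001/hello

# gRPC app via the DC-2 access-point Envoy, using grpcurl and the repo's proto definitions.
# The proto file, service and method names are placeholders.
grpcurl -plaintext -import-path ./proto/hello -proto hello.proto \
  -d '{"name": "envoy"}' localhost:30002 hello.HelloService/SayHello
```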

Make sure to run the above commands from within the cloned repo’s directory, or update the ./proto/hello path appropriately when testing the gRPC services.

I have added references to the code and tools used for this environment in the above repository’s README file.

Bootstrapping the Traffic Manager Envoy

Let’s take a look at the route config and the cluster config for the Envoy:

Route config
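The actual config lives in the repo; a minimal sketch of the shape of the route config, with illustrative path matches and cluster names, would be:

```yaml
# Illustrative route config (embedded in the listener's http_connection_manager):
# REST paths go to the DC1 cluster, the gRPC method path goes to the DC2 cluster.
route_config:
  name: dc_routes
  virtual_hosts:
    - name: all_hosts
      domains: ["*"]
      routes:
        - match: { path: "/hello" }                        # REST API path (placeholder)
          route: { cluster: dc-1 }
        - match: { path: "/hello.HelloService/SayHello" }  # gRPC method path (placeholder)
          route: { cluster: dc-2 }
```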
Cluster config
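And a matching sketch of the cluster config, with placeholder addresses and ports pointing at the two exposed DC Envoy ports:

```yaml
# Illustrative clusters: one per mock DC, pointing at the exposed access-point Envoy ports.
# Addresses and ports are placeholders for wherever your DC Envoy services are reachable.
clusters:
  - name: dc-1
    connect_timeout: 1s
    type: STATIC
    lb_policy: ROUND_ROBIN
    # Reuse the downstream HTTP version (1.1 or 2) on the upstream connection
    protocol_selection: USE_DOWNSTREAM_PROTOCOL
    load_assignment:
      cluster_name: dc-1
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: 127.0.0.1, port_value: 30001 }
  - name: dc-2
    connect_timeout: 1s
    type: STATIC
    lb_policy: ROUND_ROBIN
    protocol_selection: USE_DOWNSTREAM_PROTOCOL
    load_assignment:
      cluster_name: dc-2
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: 127.0.0.1, port_value: 30002 }
```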

This is a very simple configuration. We have added two upstream clusters to the Envoy config, one for each exposed port of the mocked DCs. These ports are the listener ports of the access-point Envoys of the individual mock DCs, so both clusters are capable of serving both REST and gRPC requests. But since we forward only REST requests to DC1 and only gRPC requests to DC2, the REST response on our main Envoy comes from DC1 and the gRPC response from DC2, as shown below.
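One way to verify this from the host is to send one request of each kind to the traffic manager Envoy's listener (the port below is a placeholder for whatever the bootstrap config uses):

```sh
# Both requests hit the same traffic manager Envoy listener (placeholder port 8080).
curl -v http://localhost:8080/hello                        # answered by the REST app in DC1
grpcurl -plaintext -import-path ./proto/hello -proto hello.proto \
  localhost:8080 hello.HelloService/SayHello               # answered by the gRPC app in DC2
```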

The Envoy logs would show something similar to the following:

Envoy log

The purpose of such a config is simply to demonstrate the basic usage of our Envoy in its role as an ‘inter-DC traffic manager’ for the mock environment we have created. We will use it as the foundation to add more capabilities that satisfy the complete set of use cases mentioned at the start of this blog. Right now we have accomplished:

  1. Route/path-level routing, using the routes -> [match -> path] config.
  2. Support for HTTP/2 and gRPC, by setting clusters -> [cluster -> protocol_selection] to USE_DOWNSTREAM_PROTOCOL, which makes Envoy use the same HTTP version upstream as the one used by the incoming request.
  3. Observability, by configuring Envoy’s file access log with json_format (a sketch follows this list).
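As a sketch, such a JSON file access log can be configured on the traffic manager's HTTP connection manager roughly as follows; the key names are illustrative, and you can pick whichever fields you want to aggregate on.

```yaml
# Illustrative json_format file access log for the HTTP connection manager.
access_log:
  - name: envoy.access_loggers.file
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: /dev/stdout
      log_format:
        json_format:
          start_time: "%START_TIME%"
          method: "%REQ(:METHOD)%"
          path: "%REQ(:PATH)%"
          protocol: "%PROTOCOL%"
          response_code: "%RESPONSE_CODE%"
          upstream_cluster: "%UPSTREAM_CLUSTER%"
          upstream_host: "%UPSTREAM_HOST%"
          duration_ms: "%DURATION%"
```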

As you may have noticed, the individual access log statements of Envoy are in JSON format and their keys are not printed in the same order, but they all have the same set of keys. This is because we chose json_format for our access logs, which helps with aggregating and filtering logs when coupled with a log aggregation stack like Kibana.

In the next part of this blog, we will proceed to solve the rest of our use cases by leveraging the same mock environment setup. I will add a link here once the next part is published. Thanks for reading!

Link to Part 2 as promised.
