Mirror Maker is one of the built-in tools shipped with Kafka. It allows for easy replication of topics between Kafka clusters. We will be working with Kafka 2.4.1. The Apache Kafka documentation gives a brief description of how to use Mirror Maker.
This blog describes how to set up Mirror Maker as a Kubernetes deployment. There are a few options available for a containerized Mirror Maker setup. This article shows how to set one up from scratch. We will also discuss how to make Mirror Maker fault tolerant. This will be a simple setup with just a single consumer and a producer. It is also possible to have multiple consumers if necessary.
For this setup we will:
- Identify configurations
- Containerize Mirror Maker
- Deploy it on Kubernetes
Configuring Mirror Maker
Let’s identify the configurations by listing what this Mirror Maker setup will facilitate:
- Mirror records from a configurable source to a configurable sink
This is specified through consumer and producer properties files, which are passed using the
--consumer.config and --producer.config command line arguments when starting the Mirror Maker process.
We also configure a group id to identify the consumer component of Mirror Maker.
- Specify the topics to be mirrored
We can configure which topics should be mirrored using the
--whitelist command line argument when starting the Mirror Maker process. This argument accepts a regular expression that is matched against topic names.
- Configure logging
The default logging configuration can be overridden by modifying the config/tools-log4j.properties file inside the Kafka installation directory. This config applies to all Kafka command line tools.
- Optionally make other configurations of interest dynamic
In all the above configurations, we have used environment variables to stand in for the actual configuration values. These variable names will later be replaced with their actual values from the environment by the entrypoint script. We use the
envsubst command from the gettext package for this. So the actual entrypoint script will look like this:
Any additional properties in the above configurations can be added in the same way as key=value pairs, with the key being a valid property name in the corresponding configuration and the value constructed using environment variables. Command line arguments in the entrypoint script can also be substituted in the same way. For example:
fetch.min.bytes=$MIN_FETCH_BYTES in consumer.properties
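The templating and entrypoint flow described above can be sketched roughly as follows. This is a minimal illustration, not the script from the original setup: the function names and the variable names SOURCE_BOOTSTRAP_SERVERS, SINK_BOOTSTRAP_SERVERS, GROUP_ID and WHITELIST are assumptions chosen for the example. Only MIN_FETCH_BYTES appears in the text above.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are illustrative assumptions.

# Write template files whose values are environment-variable placeholders.
# The quoted heredoc delimiter ('EOF') keeps each $VAR literal in the file.
write_templates() {
  local dir="$1"
  cat > "$dir/consumer.properties" <<'EOF'
bootstrap.servers=$SOURCE_BOOTSTRAP_SERVERS
group.id=$GROUP_ID
fetch.min.bytes=$MIN_FETCH_BYTES
EOF
  cat > "$dir/producer.properties" <<'EOF'
bootstrap.servers=$SINK_BOOTSTRAP_SERVERS
EOF
}

# Replace the placeholders in one file with values from the environment.
# envsubst only sees exported variables; unset ones become empty strings.
render_config() {
  local file="$1" rendered
  rendered="$(envsubst < "$file")"
  printf '%s\n' "$rendered" > "$file"
}

# Print the Mirror Maker command line that an entrypoint would exec.
mirror_maker_command() {
  local kafka_home="$1"
  echo "$kafka_home/bin/kafka-mirror-maker.sh" \
    "--consumer.config $kafka_home/config/consumer.properties" \
    "--producer.config $kafka_home/config/producer.properties" \
    "--whitelist $WHITELIST"
}

# A real entrypoint would then do something like:
#   render_config "$KAFKA_HOME/config/consumer.properties"
#   render_config "$KAFKA_HOME/config/producer.properties"
#   exec $(mirror_maker_command "$KAFKA_HOME")
```

Keeping the rendering in a small function means any new placeholder added to a template is picked up without touching the script, as long as the matching variable is exported in the container environment.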
Containerizing Mirror Maker
Let’s first take a look at the Dockerfile used to build the Mirror Maker image:
Let me explain what this Dockerfile does. For creating the Docker image, we use Ubuntu as the base image and install openjdk-8-jre, which is needed for Kafka to run. Kafka is downloaded using wget from one of its download mirrors, and gpg is used to verify its integrity. gettext is installed for use at runtime to initialize the configurations.
Kafka is then extracted into the
/opt/Kafka directory. The config files required to run Mirror Maker (consumer.properties, producer.properties and tools-log4j.properties) are also copied into this directory. Lastly, we copy the entrypoint.sh script, which initializes the config files and starts the Mirror Maker process.
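The download, verify and extract steps described above could be sketched as the shell commands that the Dockerfile's RUN instructions would execute. This is an assumption-laden outline rather than the original Dockerfile: the base URL (the Apache archive), the location of the KEYS file next to it, the default Scala version and the function names are all illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch only: defaults and function names are assumptions for illustration.
: "${KAFKA_VERSION:=2.4.1}"
: "${SCALA_VERSION:=2.12}"
: "${KAFKA_BASE_URL:=https://archive.apache.org/dist/kafka}"

# Kafka release archives are named kafka_<scala version>-<kafka version>.tgz.
kafka_archive_name() {
  echo "kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"
}

# Full download URL for the selected versions.
kafka_download_url() {
  echo "${KAFKA_BASE_URL}/${KAFKA_VERSION}/$(kafka_archive_name)"
}

# Download the archive, its signature and the project KEYS file,
# verify the signature with gpg, then unpack into the target directory.
install_kafka() {
  local dest="$1" archive url
  archive="$(kafka_archive_name)"
  url="$(kafka_download_url)"
  wget -q "$url" "$url.asc" "${KAFKA_BASE_URL}/KEYS"
  gpg --import KEYS
  gpg --verify "$archive.asc" "$archive"
  mkdir -p "$dest"
  # --strip-components=1 drops the top-level kafka_<...> folder.
  tar -xzf "$archive" -C "$dest" --strip-components=1
}
```

Deriving the archive name from two variables is what lets the versions be overridden at build time without editing any other step.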
The Kafka and Scala versions, and the URIs to be used in the docker build, are made configurable. They have default values but can be overridden using
--build-arg as shown below.
docker build -t <tag> . --build-arg KAFKA_VERSION=2.5.0
For this example we will use
kafka-mirror-maker as the docker image tag.
Deploying to Kubernetes
We will use a simple k8s Deployment resource to deploy Mirror Maker. In case you are working with a remote k8s cluster, you will need to add the registry name in the tag during docker build and push it to that registry. This registry should be accessible from your k8s cluster.
docker build -t <your-registry/image-name> . && docker push <your-registry/image-name>
All the pieces come together in the deployment resource, when we add the necessary environment variables in the container specifications. We use
imagePullPolicy: IfNotPresent so that the local image is used. This is useful if you are using a local k8s cluster and have not pushed the image to a remote registry.
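A minimal Deployment along the lines described above might look like the manifest generated by this sketch. The environment variable names and values, the labels and the write_deployment helper are hypothetical placeholders. Only the kafka-mirror-maker image tag and the imagePullPolicy setting come from the text.

```shell
#!/usr/bin/env bash
# Sketch only: env var names/values and labels are hypothetical placeholders.

# Write a minimal single-replica Deployment manifest to the given path.
write_deployment() {
  local out="$1"
  cat > "$out" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-mirror-maker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-mirror-maker
  template:
    metadata:
      labels:
        app: kafka-mirror-maker
    spec:
      containers:
        - name: kafka-mirror-maker
          image: kafka-mirror-maker
          imagePullPolicy: IfNotPresent
          env:
            - name: SOURCE_BOOTSTRAP_SERVERS
              value: "source-kafka:9092"
            - name: SINK_BOOTSTRAP_SERVERS
              value: "sink-kafka:9092"
            - name: GROUP_ID
              value: "mirror-maker"
            - name: WHITELIST
              value: "orders.*"
EOF
}

# Usage: write_deployment deployment.yaml
```

The variables listed under env are the ones the entrypoint script would substitute into the configuration files when the container starts.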
We deploy by running:
kubectl apply -f deployment.yaml
To tail the logs:
kubectl logs -f <pod-name>
Deployments can be done using tools like Helm to ease configuration management.
Fault Tolerant Configuration for Mirror Maker
Mirror maker replication can be made fault tolerant by adding these additional configurations.
This makes sure that offsets are committed in the source cluster only after the record is successfully replicated to the sink cluster. If Mirror Maker crashes, it will resume right where it left off on restart.
The first config will make sure that only one record is produced at a time. The second config tells Mirror Maker to wait for the record to be acknowledged by the minimum in-sync replicas for that topic before marking the record send as successful and moving on to the next record.
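The settings being described are most likely the standard Kafka consumer and producer properties shown in this sketch. The property names are real Kafka settings, but their mapping to the descriptions above is an assumption based on the behaviour described, and the add_fault_tolerance helper is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes these are the properties the text refers to.

# Append the fault-tolerance settings to the config files in a directory.
add_fault_tolerance() {
  local config_dir="$1"
  # Consumer: disable auto commit so offsets are committed only after
  # the records have been sent to the sink cluster.
  cat >> "$config_dir/consumer.properties" <<'EOF'
enable.auto.commit=false
EOF
  # Producer: one in-flight request per connection, and wait for
  # acknowledgement from all in-sync replicas.
  cat >> "$config_dir/producer.properties" <<'EOF'
max.in.flight.requests.per.connection=1
acks=all
EOF
}

# Usage: add_fault_tolerance /path/to/kafka/config
```

With acks=all, how many replicas must acknowledge a write is governed by the sink topic's min.insync.replicas setting, so that value is worth checking on the sink cluster as well.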
Mirror Maker also has a command line configuration:
--abort.on.send.failure, which is set to true by default. This will make Mirror Maker stop and exit if it fails to send a record to the sink cluster. The abort happens after retrying for a configured number of times (default is 2147483647). This abort lets us debug the actual issue causing the send failure before replicating the remaining records from the source cluster.
Note: When we are replicating records that do not have a key, i.e. records with a null key, the target broker will reject the record and cause the send to fail if the topic’s
cleanup.policy is set to
compact. More specifically, it results in an
InvalidRecordException. This is because, for compacting records, Kafka brokers require every record in a topic to have a key. This validation is done when records are written to topics for which the cleanup policy is set to compact. To fix this, either cleanup should be disabled or cleanup policy should be changed to
You can access the complete setup used for this demo at https://github.com/amalaruja/kafka-mirror-maker-demo