A Kafkaesque Experiment

As part of my interview prep, last night I challenged myself to do the following:

Make a Kubernetes cluster (on Google Cloud Platform)
…running Dockerized Zookeeper (1) and Kafka (2)
…with Kafka reporting stats into Datadog
Send in synthetic load from a bunch of Go programs moving messages around on Kafka
Then run an experiment to kill the Kafka master and watch how the throughput/latencies change.

Since thats a lot of that stuff I’ve never touched before (though I’ve read up on it, and it uses all the same general concepts I’ve worked with for 15 years) it should not be too surprising that I didn’t get it done. Yet.

The surprising thing is where I got stuck. I found a nice pair of Docker containers for Zookeeper and Kafka. I got Zookeeper up and running, and I could see it’s name in the Kubernetes DNS. My two Kafkas were up and running, and they found the Zookeeper via service discovery. So far so good. But then something went wrong with the place where I was going to run clients from; it could not talk to either of the Kafkas via TCP, connection timed out. What’s more, I couldn’t be sure that both of my Kafkas were even being advertised by Kubernetes DNS.

(Shower thought after writing this: perhaps my client container was started before the Kafka one, and as a result, it didn’t have the correct container-to-container networking magic set up. It would be interesting to read up on how that works and then debug it to see if I can see the exact problem. Or it might go away the next time I start the containers, this time in the right order. But… how can order matter? This would make it very difficult to operate these things.)

Learning how to debug in the container environment is one of the hardest things. It’s like walking around in a brewery in the dark armed only with a keychain flashlight and your nose, looking for the beer leak.

I think it is time to take a break from container-ville and use small, local Kafka on my Mac to develop the synthetic load generator. That will also be interesting, because I’m hoping to be able to generate spiky, floody flows of messages using feedback from producers to consumers. It is actually something I’ve had in mind for years, and never had the right situation calling on me to finally try it out.

Update: Well the load generator was fun hacking/learning. The final step would be to put it all together. That may come in the future, but for now I’m busy with a trip to New York.