A Kafkaesque Experiment

As part of my interview prep, last night I challenged myself to do the following:

  • Make a Kubernetes cluster (on Google Cloud Platform)
  • …running Dockerized Zookeeper (1) and Kafka (2)
  • …with Kafka reporting stats into Datadog
  • Send in synthetic load from a bunch of Go programs moving messages around on Kafka
  • Then run an experiment to kill the Kafka master and watch how the throughput/latencies change.

Since thats a lot of that stuff I’ve never touched before (though I’ve read up on it, and it uses all the same general concepts I’ve worked with for 15 years) it should not be too surprising that I didn’t get it done. Yet.

The surprising thing is where I got stuck. I found a nice pair of Docker containers for Zookeeper and Kafka. I got Zookeeper up and running, and I could see it’s name in the Kubernetes DNS. My two Kafkas were up and running, and they found the Zookeeper via service discovery. So far so good. But then something went wrong with the place where I was going to run clients from; it could not talk to either of the Kafkas via TCP, connection timed out. What’s more, I couldn’t be sure that both of my Kafkas were even being advertised by Kubernetes DNS.

(Shower thought after writing this: perhaps my client container was started before the Kafka one, and as a result, it didn’t have the correct container-to-container networking magic set up. It would be interesting to read up on how that works and then debug it to see if I can see the exact problem. Or it might go away the next time I start the containers, this time in the right order. But… how can order matter? This would make it very difficult to operate these things.)

Learning how to debug in the container environment is one of the hardest things. It’s like walking around in a brewery in the dark armed only with a keychain flashlight and your nose, looking for the beer leak.

I think it is time to take a break from container-ville and use small, local Kafka on my Mac to develop the synthetic load generator. That will also be interesting, because I’m hoping to be able to generate spiky, floody flows of messages using feedback from producers to consumers. It is actually something I’ve had in mind for years, and never had the right situation calling on me to finally try it out.

Update: Well the load generator was fun hacking/learning. The final step would be to put it all together. That may come in the future, but for now I’m busy with a trip to New York.

Interview Questions I Hope I Get

I have an interview coming up, and so my “keep in shape hacking time” has been recently devoted to interview preparation. I thought I would make a post about what’s in my head, both as a way to solidify it (no better way to learn something than by teaching it) and in case this interview goes bad, so that my next prospective employer can see what I’m thinking about.

If you, my current prospective employer are reading this, would you please not take advantage of this by removing these questions from your list? Come on guys, give me a break. If I’m going to be transparent in my thought processes, the least you can do is throw me a bone and ask at least one of these in person!

Question 1: Make something faster.

I think this would be an interesting question. Especially if it was literally asked in these words. The interviewee asks, “make what faster?” The interviewer says, “Whatever. Tell me about making something faster. Whatever you want to tell me about.”

This would be a fun question because it would test the interviewer’s ability to organize his/her thoughts, select the relevant ones, and present them. In fact, it is checking the interviewee’s ability to teach, which is a required skill for someone with like me (with a grey beard). You deliver the most value on a team not by pulling 80 hour weeks yourself, but by mentoring people so that the 4 people in your team manage (together) a 160 hour week, instead of the 60-80 hours of useful progress they might have managed without your (theoretically) sage advice.

My answer would be the following:

There are three possibilities for how to make something faster. First, you can not do it at all. Not doing it is an infinite speedup over doing it. How? Caching, usually. And hey, caching is easy. As easy as naming.

Second, you can do it better, by choosing a better algorithm, representation, or by doing less work. I might retell a story Átila told me about a challenge in which the right answer could be got with less than half the data sorted (he wrote his own quicksort that let him find the answer before finishing the sort, thereby blowing other contestants out of the water; they all did quicksort to completion, then looked for the answer in the sorted list). I also happen to have spotted a place in my prospective employer’s product where I could propose to change the representation, in order to reduce the compute power necessary to parse something. It takes an arrogant little prick to use the company’s own software as an example of what not to do, but depending on the feeling I have with the interviewer, I might do it.

Third, you can take the solution you have now, and optimize it. To do that, you measure, profile, tweak, and then measure again. I checked some code into the Go standard library that does that, and one of those would make a nice story.

Question 2: How would you add load shedding to the Go standard library’s HTTP server?

I thought of this question while reading chapter 22 of the new SRE book from Google and O’Reilly.

There are two layers where you could implement it. The “right” way to do it is the arrange that the server sends back a properly formatted error 503 via HTTP. Another way that you could do it would be to accept and close the connection immediately. Which you choose depends on who you think your clients are, and how you think they will respond to the two kinds of errors.

The “accept and close” technique should be capable of more “!QPS” (non-answered queries per second) because it can theoretically handle each reject without spawning a new goroutine. Goroutines are cheap, but they imply at least one mutex (to add the goroutine into the scheduler), some memory pressure (for the stack), and some context switches. Implementing this would be a simple matter of wrapping the net.Listener with our own, which intercepted Accept to do “accept and close” when load shedding is needed.

I just went to read the source, and the way Accept is called, it must return a valid connection and no error. So our load-shedding Accept needs to actually be a loop, where we accept and close until the logic decides it is time to finally let a connection through (i.e. “if load shedding then allow 1 in 100 connections” or whatever), and then we need to return with that connection.

So how to implement the other one? What you want is an http.HandlerFunc wrapper. Except: what if libraries you don’t control are registering handlers? How can you be sure your load shedding wrapper is always the first one called, so that it can send an error 503? I thought I had a solution to this, but it turns out that http.ServeMux is not an interface, but a concrete type. So it is not possible to trap http.ServeMux.Handle, and ensure that the load shedder is always on the front. This one is going to take some more thinking.

Of course there’s always monkey patching in Go which could be used to arrange for HandleFunc and friends to always arrange to put a load shedder on the front of the call chain. But not in production. Please, God, not in production.

Question 3: A coding challenge I found on Glassdoor

This one is a legitimate coding challenge, and I don’t want to post either the question or the answer here, because that’s just not cool. But it did lead me to an interesting observation…

After a little workout in the coding dojo (I slayed the challenge!) I hit the showers. And in the shower, I had an idea: what if you could interpret the []byte that you get from go-fuzz as input to this code challenge? Then I could fuzz my routine, looking for ways to crash it. I quickly got it working, but ran up against the fundamental problem of fuzzing. When your input is random, for lots of problems, it is hard to check the output to know if it is the right output for the given input. You have one routine right there that could check it, but it is the routine-under-test, and thus using it would be cheating (and pointless). The bottom line, which I already knew, is that fuzzers can only find crashes, not incorrect behavior.

git log ––grep “Résumé”

For a while now, it’s become clear that a useful and important piece of data about how a future colleague might work out is their open source contributions. While the conditions of open source work are often somewhat different than paid work, a person’s manner of expressing themselves (both interpersonally, on issue trackers for example and in code) is likely to tell you more about their personality than you can learn in the fake environment of an interview.

I am currently starting a job search for a full time job in the Lausanne, Switzerland area (including as far as Geneva and Neuchâtel) with a start date of September 1, 2016. Touching up my résumé is of course on my todo list. But it is more fun to look at and talk about code, so as a productive procrastination exercise, here’s a guided tour of my open source contributions.

I currently work with Vanadium, a new way of making secure RPC calls with interesting naming, delegation and network traversal properties. I believe wide-scale adoption of Vanadium would make IoT devices less broken, which seems like an important investment we need to be making in infrastructure right now.

I have been an occasional contributor to the Go standard library since 2011. Some examples I am particularly proud of include:

  • I profiled the bzip2 decoder and realized that the implementation was cache thrashing. The fixed code is smaller and faster.
  • I noticed that the HTTP implementation (both client and server) was generating more trash than was necessary, and prototyped several solutions. The chosen solution was is in this checkin. This was, for a while, the most intensely exercised code I’ve written in my career; it saved one allocation per header for most headers processed by the billions of HTTP transactions that Go programs do all around the world. It was later replaced with a simpler implementation that was made possible by an optimization in the compiler. The same allocation reduction technique was carried over into Go’s HTTP/2 implementation too, it appears.
  • I improved the TLS Client Authentication to actually work. I needed this for a project at work.

I wrote an MQTT server in Go in order to learn the protocol for work. It is by no means the fastest, but several people have found that the simplicity and idiomaticity of its implementation is attractive. It ships as part of the device software for the Ninja Sphere.

  • I found an interesting change to reduce network overhead in the MQTT serializer I used. Copies = good, more syscalls and more packets = bad.

I worked on control software for a drone in Go for a while. It still crashes instead of flying, but it was an interesting way to learn about control systems, soft realtime systems, websockets, and Go on ARM.

I have written articles about Go on this blog since I started using it in 2010. Some interesting ones in particular are:

I am also active on the Golang sub-reddit, where I like answering newbie requests for code reviews. I also posed a challenge for the Golang Challenge.

My earliest open source project that’s still interesting to look at today was a more scalable version of MRTG called Cricket. Today tools like OpenTSDB and Graphite are used for the same job, it seems. Cricket still works to the best of my knowledge, but development is dead because of lack of interest from users. Cricket is written in Perl.

My earliest open source project was Digger, from Bunyip. If you can unearth archeological evidence of it from the 1996 strata of the Internet geology, you’re a better digger than I am. It was written in C.

Dynamic DNS circa 2016

In the old days, if you had an ISP that changed your IP address all the time but you wanted to run a server, you used dynamic DNS, i.e. a hacky script talking to a hacky API on an hacky DNS provider.

These days, if you bring up a cloud server from time to time to work, it is likely to get a different IP address. But you might want a DNS record pointing at it so that it is convenient to talk to.

Same problem, different century.

Here’s my solution for a GCE server and a domain fronted by CloudFlare.

It has nella.org hard coded in it, so YMWDV (your mileage will definitely vary).

Learning Swift, sans Xcode

Say you are learning Swift. And like a good fanboi, the first thing you do is update to the latest and greatest because that’s like what you do when you are a nerd.

But you live in Osh, Kyrgyzstan. You have bitchin’ FTTH from Unilink, but access outside of Kyrgyzstan is still limited by the great firewall that Putin has put up in Moscow or whatever. I don’t know, but it’s slow as hell.

So you want to learn Swift, but Xcode is out of commission because it is upgrading. Well, sort of out of commission. It gives an “I’m upgrading” message, but it is still there in /Applications/Xcode.app.

So you can use this script to call Swift as an interpreter, and then you can learn Swift in Emacs, where you should be programming anyway, YOU FOOL.



$xc/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift \
 -frontend \
 -interpret $1 \
 -target x86_64-apple-darwin15.2.0 \
 -enable-objc-interop \
 -sdk $xc/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk \
 -module-name `basename $1 .m`

echo "Exit code: $rc"

HTTP/2: Thanks Cloudflare and Go!

Look what happened today:

2015/12/04 11:38:07 fetching https://nella.org
2015/12/04 11:38:08 {200 OK 200 HTTP/2.0 2 0 map[Server:[cloudflare-nginx] Date:[Fri, 04 Dec 2015 05:38:08 GMT] Content-Type:[text/html] Set-Cookie:[__cfduid=d3a3ea49ee46eb6a6803e2eb7f597e26e1449207488; expires=Sat, 03-Dec-16 05:38:08 GMT; path=/; domain=.nella.org; HttpOnly] Vary:[Accept-Encoding] Cf-Ray:[24f529d18893372c-ARN]] 0xc8203bbf60 -1 [] false map[] 0xc8200be000 0xc8206cc420}

Thank you Go 1.6 and Cloudflare. You guys are bringing my website into the bright future of 2016 with no help at all from me. 🙂

Industrial-scale power storage and waste heat

There will, eventually, be a giant wind farm above my house. I say eventually because though Switzerland is not immune from NIMBYism, our court system deals efficiently enough with oppositions so that if something is allowed by law (zoning laws, eco-protection laws, etc) then it does go through. The opposition (and there’s always opposition) does a few court challenges, it goes up a couple layers, sometimes to the supreme court, and the court rather quickly says, “It’s legal, shut up. If you don’t like it, change the laws, don’t come begging us to do so.”

The opposition have two complaints. They choose which one to talk about depending on the context. If they are going for “shock and awe” they use Photoshopped pictures to show how “ugly” the windmills will be. I’m suspicious of their Photoshopping, because though I think the relative sizes are correct, I don’t think the visibility (i.e. brightness of the windmills themselves and the reduction in visibility due to natural haze) in their pictures is correct. Whatever, it’s true, they are giant industrial installations in areas previously used only for grazing and milking cows. But we should recall that raising and milking cows it itself a giant industrial operation with a twice-daily milk run with a diesel-powered truck through this scenic wonderland. Some of the milking barns even run off of polluting diesel generators…

If you tell them, “I don’t mind the windmills, they look like progress to me” (and I have!), then the opposition falls back onto their second line of defense. They say, “windmills do not produce energy when it is needed, so they can’t replace nuclear”.

So, first, that’s a straw-man. No one is talking about nuclear here; if we were we wouldn’t agree anyway because I’m pro-nuclear. I consider nuclear power to be green energy (and the founder of Greenpeace does as well). What I’m interested in is eliminating fossil fuels from electrical generation, and from transport use.

The Tesla battery technology for utilities is the missing part of the equation (update July 2017: I’ve just found out that Eaton, in my backyard at Le Lieu, make them too). They shift energy from the peak generation time to the time when the energy is needed, making it possible for windmills and solar to contribute to the baseline load that existing dirty electric plants provide. But they have so far been tested in giant, ugly, industrial installations, which even I would not like to see here in my backyard.

So that got me thinking about how the Mollendruz windmills could be hooked up to batteries.

Batteries heat up when they are charged, and part of what’s special about Tesla’s innovations is to integrate cooling and fire protection into the heart of their battery packs. So a utility-scale battery installation will create utility-scale waste heat. In the current utility scale batteries, this is appears to be dumped into the atmosphere. But remote heating is a mature and well respected technology in Switzerland. Wouldn’t it be interesting to put that waste heat to use heating our schools and government buildings?

I don’t know how near batteries need to be to windmills to be efficient. Windmills generate alternating current, because they are a rotative power source. And AC travels at lower loss than DC. So it seems that putting the batteries where the heat is needed would be ok.

But the batteries are still ugly. What can we do about that? If you are harvesting the heat from them, and they are already engineered to be installed into moving cars and houses, the utility scale batteries can probably be installed indoors. In Switzerland, when we need to put things indoors and we want the land to remain pretty, we put them underground. Near the Col du Mollendruz there’s an old military fort called Petra Felix. I wonder if there’s enough room inside of it to hold the batteries?

Hacking cars and fixing them

A few years ago, I read an academic paper on how to hack cars. Today news came out that what was previously demonstrated via direct access is also possible over the air.

I thought it would be fun to look at the firmware update file that fixes this, to see what format it is in, what’s in it, etc. To get an update for 2014 Jeep Cherokees, you need a VIN. It turns out a used car sales website posted the VINs of their inventory on their website, so I found one: 1C4PJMDB6EW255433

Then you put it into the UConnect website, which is a typical late 2000’s travesty of over-engineering. It wants you to use some plugin from Akami to download the file, but in small print tells you that you can also click on this link. But of course, there’s javascript insanity to prevent you from finding out what the link is. It is delivered via TLS, which is interesting. It is a 456 meg zip file. It also has a user-specific token on the end of it, and without that you get a 404 when you try to fetch it.

The zip file has an ISO inside of it:

$ unzip -l MY13_MY14_RA4_15_26_1.zip 
Archive:  MY13_MY14_RA4_15_26_1.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
583661568  06-23-15 14:48   swdl.iso
 --------                   -------
583661568                   1 file

The ISO file is slightly bigger than the zip file, at 583 megs:

$ ls -l swdl.iso 
-rw-r--r--  1 jra  staff  583661568 Jun 23 14:48 swdl.iso

Inside the ISO file is:

dr-xr-xr-x  2 jra  staff  2048 Jun 23 16:47 bin
dr-xr-xr-x  2 jra  staff  2048 Jun 23 16:47 etc
dr-xr-xr-x  2 jra  staff  2048 Jun 23 16:47 lib
-r-xr-xr-x  1 jra  staff  1716 Jun 23 16:47 manifest
dr-xr-xr-x  4 jra  staff  2048 Jun 23 16:47 usr

And that manifest file? It is Lua, which is apparently read into the updater via execution.

So right. The updater itself apparently gives an attacker execute privs in the address space of the Lua interpreter via an unsigned file.

Jeeze, Chrysler, that’s like Game Set and Match, and I haven’t even looked into bin/ yet. WTF?

Update after reading some more…

Well something interesting happens in ioschk.lua, where the second block of 64 bytes from the ISO is read and then fed to “openssl rsautl”, using a public key that is on the device. But ioschk.lua is loaded from the ISO itself, and is called by install.sh, from the ISO. So it seems like if you want to make your own ISO, you need to remember to make install.sh’s call to isochk.lua a no-op.

Other interesting things I found while trolling around… they have the Helvetica Neue font, and right next to it a license file saying, “for evaluation only”. Jeeze, sure hope that Harman have paid up, or else they might have a bill in the mail.

There’s a file called cisco.sh which does the necessary to put the device on the Ethernet if a Linksys USB300M adapter is plugged in. It has some checks in it for an internal development mode, but those would be easy to bypass if you can in fact edit the ISO image at will.

So, all in all, it would be fun to play if I had a Jeep. But I’m still planning on getting a Tesla.