A while ago at work, I needed to learn a little about MQTT. After reading the spec and finding this package that handles reading and writing the messages themselves, I decided to write a server.
The result is here: https://github.com/jeffallen/mqtt.
After I finished, I wanted to know if my server was comparable to Mosquitto, so I wrote some load testing tools. I found that my Go server was generally comparable to Mosquitto, in that it had the same order of magnitude for transaction rate and latency (but it was strictly slower than Mosquitto; there is a cost to pay for the benefits Go gives you). As usual, Go programs compare quite favorably to C programs in terms of memory safety, overall length, and readability. It is also useful to consider that my implementation took only a couple of days to get finished. A C program with equivalent safety, correctness, and speed would take much longer to write. I’ve written them, I know.
Then I mentioned to my coworker Atila what I’d done. Knowing that Atila is incredibly competitive and a total D fanatic (and Go refusnik) I might have casually implied that D would have a hard time competing with my incredibly short, fast, correct, and beautiful Go version. Heh heh heh.
And so the code golf began. It only ended yesterday when Atila came back to me and begged me not to write any more benchmarks because he was tired of beating my sorry ass into the ground with his blazing fast D implementation. Fair ’nuff.
If I still have your attention…
Here are some interesting things from the experience and the code I wanted to point out.
It is often said that the solution to a problem in a domain specific language will be shorter and more correct than the solution in a general purpose language. Of course, the drawback is that you’ve got to write the DSL first, and then people who come along after you have to learn it. Implementing an MQTT server in Go really felt like writing in the correct domain specific language. It basically wrote itself, because channels and garbage collection meant that the hardest parts of a message passing server happen in the runtime instead of in the code.
Though Atila had a huge amount of fun tuning his D implementation, I didn’t do much at all to tune my implementation.
I used “go tool pprof” to fetch a CPU profile from it while it was under load from the pingtest program. But the resulting trace showed that it was spending most of its time jumping on and off of system threads. In other words the program is I/O bound, not CPU bound. It is making “too many” kernel calls, and thus speeding it up by looking for a hot spot or reducing garbage collection overhead isn’t interesting. The solution would have been to add some kind of buffering layer between the net.Conn and the message encoder/decoder. But the problem with an intervention like that is that you’d need to do it carefully so that the buffer would cut down on kernel/user context switches while there were very high transaction rates, but not introduce latency on messages when the transaction rate is low. And if the goal is massive scalability, with 1 million or more connections, each of which have a very low transaction rate, then this buffering technique wouldn’t help anything.
I did notice one nice optimization I was able to make when I was looking at the traffic with Wireshark. It is documented in this checkin. The lesson here is fewer calls to the kernel is not only a good idea for kernel/user context switch time, but may also save a significant amount of bandwidth.
As for Go techniques, I created something called a “receipt”, which lets you get synchronous feedback about a job sent in via a asynchronous mechanism. Take a look at how submitSync is used when we need to know when a certain message has departed. A receipt is a channel. The only two operations done on the channel are “make” and “close”. This makes it work like a condition variable. It flowed out of my fingers at the time; I don’t know if it is an idiom I’ll use again. It certainly didn’t give me any problems, so I guess I’ll keep it!
I don’t know or care where this code will go. If I receive pull requests, I’ll act on them. If someone needs to depend on this for a production-quality system, they might want to fork it and take long-term ownership of it. That’s OK with me too.