Taking an MQTT server out for a few holes on the code golf course

A while ago at work, I needed to learn a little about MQTT. After reading the spec and finding this package that handles reading and writing the messages themselves, I decided to write a server.

The result is here: https://github.com/jeffallen/mqtt.

After I finished, I wanted to know if my server was comparable to Mosquitto, so I wrote some load testing tools. I found that my Go server was generally comparable to Mosquitto, in that it had the same order of magnitude for transaction rate and latency (but it was strictly slower than Mosquitto; there is a cost to pay for the benefits Go gives you). As usual, Go programs compare quite favorably to C programs in terms of memory safety, overall length, and readability. It is also useful to consider that my implementation took only a couple of days to get finished. A C program with equivalent safety, correctness, and speed would take much longer to write. I’ve written them, I know.

Then I mentioned to my coworker Atila what I’d done. Knowing that Atila is incredibly competitive and a total D fanatic (and Go refusnik) I might have casually implied that D would have a hard time competing with my incredibly short, fast, correct, and beautiful Go version. Heh heh heh.

FORE!

And so the code golf began. It only ended yesterday when Atila came back to me and begged me not to write any more benchmarks because he was tired of beating my sorry ass into the ground with his blazing fast D implementation. Fair ’nuff.

If I still have your attention…

Here are some interesting things from the experience and the code I wanted to point out.

It is often said that the solution to a problem in a domain specific language will be shorter and more correct than the solution in a general purpose language. Of course, the drawback is that you’ve got to write the DSL first, and then people who come along after you have to learn it. Implementing an MQTT server in Go really felt like writing in the correct domain specific language. It basically wrote itself, because channels and garbage collection meant that the hardest parts of a message passing server happen in the runtime instead of in the code.

Though Atila had a huge amount of fun tuning his D implementation, I didn’t do much at all to tune my implementation.

I used “go tool pprof” to fetch a CPU profile from it while it was under load from the pingtest program. But the resulting trace showed that it was spending most of its time jumping on and off of system threads. In other words the program is I/O bound, not CPU bound. It is making “too many” kernel calls, and thus speeding it up by looking for a hot spot or reducing garbage collection overhead isn’t interesting. The solution would have been to add some kind of buffering layer between the net.Conn and the message encoder/decoder. But the problem with an intervention like that is that you’d need to do it carefully so that the buffer would cut down on kernel/user context switches while there were very high transaction rates, but not introduce latency on messages when the transaction rate is low. And if the goal is massive scalability, with 1 million or more connections, each of which have a very low transaction rate, then this buffering technique wouldn’t help anything.

I did notice one nice optimization I was able to make when I was looking at the traffic with Wireshark. It is documented in this checkin. The lesson here is fewer calls to the kernel is not only a good idea for kernel/user context switch time, but may also save a significant amount of bandwidth.

As for Go techniques, I created something called a “receipt”, which lets you get synchronous feedback about a job sent in via a asynchronous mechanism. Take a look at how submitSync is used when we need to know when a certain message has departed. A receipt is a channel. The only two operations done on the channel are “make” and “close”. This makes it work like a condition variable. It flowed out of my fingers at the time; I don’t know if it is an idiom I’ll use again. It certainly didn’t give me any problems, so I guess I’ll keep it!

I don’t know or care where this code will go. If I receive pull requests, I’ll act on them. If someone needs to depend on this for a production-quality system, they might want to fork it and take long-term ownership of it. That’s OK with me too.

Qz for Mobile Phones

I love Quartz. however, I find that the best time/place for me to read it is on the train during my commute. And Quartz’s overly clever and overly dynamic website hates my phone’s browser. So I made Mobile Quartz to help me read it. The front page has the current top articles from Quartz. Click the links to see the article.

I used this as a chance to write some Go on Google Appspot code, and it was really a pleasure.

Zero Downtime upgrades of TCP servers in Go

A recent post on the golang-nuts mailing list mentioned that Nginx can upgrade on the fly without ever stopping listening to it’s listen socket. The trick is to unset close-on-exec on the listen socket, then fork/exec a new copy of the server (on the upgraded binary) with an argument to tell it to use the inherited file descriptor instead of calling socket() and listen(s).

I wanted to see if I could achieve the same thing with Go, and what changes would be necessary to the standard libraries to make this possible. I got it working, without changing the standard library, so I wanted to explain what I did here.

The code for this post is here.

There are several interesting things I’d like to point out in this program. Twice, I used the pattern of “implement an interface in order to intercept method calls”. This is an important pattern in Go that I don’t think is widely understood and documented.

When I started thinking about this job I knew one of my problems would be to dig down inside of http.(*Server).Serve in order to get it to stop calling Accept() when the old server should shutdown gracefully. The problem is that there’s no hooks in there; the only way out of the loop (“accept, start go routine to process, do it again”) is for Accept to return an error. But if you think of Accept as a system call, you might think, “I can’t get in there and inject an error”. But Accept() is not a system call: it’s a method in the interface net.Listener. Which means that if you make your own object which satisfies net.Listener, you can pass that in to http.(*Server).Serve and do what you want in Accept().

The first time I read about embedding types in structures I got lost and confused. And when I tried it, I had pointers in the mix and I had lots of unexplained nil pointer errors. This time, I read it again and it made sense. Type embedding is essential when you want to interpose one of the methods of an interface. It lets you inherit all the implementations of the underlying object except for the one that you redefine. Take a look at stoppableListener in upgradable.go. The net.Listener interface requires three methods including Accept, Close, and Addr. But I only defined one of those, Accept(). How is it that stoppableListener still implements net.Listener? Because the other two methods “show through” from where they were embedded in it. Only Accept() has a more precise definition. When I wrote Accept(), then I needed to figure out how to talk to the underlying object, in order to pass on the Accept() call. The trick here is to understand that embedding a type creates a new field in your structure with the unqualified name of the type. So I can refer to the net.Listener inside of stoppableListener sl as sl.Listener, and I can call the underlying Accept() as sl.Listener.Accept().

Next I started wondering how to handle the “stopped” error code from Serve(). Immediately exiting with os.Exit(0) isn’t right, because there can still be go routines servicing HTTP clients. We need some way to know when each client is done. Interposition to the rescue again, since we can wrap up the net.Conn returned by Accept() and use it to decrement a count of the number of currently running connections. This technique of interposing the net.Conn object could have other interesting uses. For example, by trapping the Read() or Write() calls, you could enforce rate limiting on the connection, without needing the protocol implementation to know anything about it. You could even do other zany things like implement opportunistic encryption, again without the protocol knowing it was happening.

Once I knew that I would be able to stop my server gracefully, I needed to figure out how to start the new one on the correct file descriptor. Rog Peppe pointed me to the net.FileListener object, which one can construct from an *os.File (which you can make from a raw file descriptor in an int using os.NewFile).

The final problem is that net always sets the close-on-exec flag on file descriptors for sockets it opens. So I needed to turn that off on the listen socket, so that the file descriptor would still be valid in the new process. Unfortunately syscall.CloseOnExec does not take a boolean to tell it what you want (do close or don’t close). So instead, I pulled out the stuff from syscall that I needed and put it directly in upgradable.go. Not pretty, but nicer than hacking the standard library. (This is a nice thing about Go: there’s almost always a way to do what you want, even if you have to hack around just a bit.)

I tested that it works manually (using command-line GETs from another window). I also tested it under load using http_load. It is really cool to be able to set up a load test for 20 seconds and get 3937 fetches/sec, then do the test again, adding in a few “GET http://localhost:8000/upgrade” from another window and still getting 3880 fetches/sec, even as I replaced the running binary a few times during the load test!

Testing Go’s HTTP server for CVE-2011-3192 vulnerability

The recent DoS attack on Apache is caused by sending in a malformed Range header. I decided to send the same header into Go’s range header parser and see what happened. It passed with flying colors, giving the “invalid range” error, which would result in the Go webserver sending back HTTP response code 416 to the HTTP client.

I didn’t test this under load (which is part of what the DoS was doing), but my quick reading of parseRange leads me to believe that the only effect of sending Range headers in like this is that garbage is created under control of the attacker (for example, due to strings.Split(s[len(b):], “,”)). Of course, that’s bad and should be limited. However, this risk is no greater than other places in the server. For example an attacker could send an unlimited number of headers, or headers of unlimited length. This is already mentioned in the net/textproto package’s reader.go.

I did notice something strange while looking into this, which is that there’s a constant named maxHeaderLines (1024) that is never referenced. This is probably because the (incorrectly non-plurally-named) textproto.ReadMIMEHeader is now in charge of reading all the headers, but it does not accept a max headers parameter.

The fact that this particular constructed header causes no problem is no great surprise, because this attack was highly targeted at Apache. But it is nice to see that Go’s webserver is already (somewhat) hardened against this type of attack.

PS: Here’s a diff showing how I tested this. It might be interesting for beginner Go folks to see how to generate test data at test time.

diff --git a/src/pkg/http/range_test.go b/src/pkg/http/range_test.go
--- a/src/pkg/http/range_test.go
+++ b/src/pkg/http/range_test.go
@@ -6,13 +6,16 @@
 
 import (
 	"testing"
+	"fmt"
 )
 
-var ParseRangeTests = []struct {
+type testDef struct {
 	s      string
 	length int64
 	r      []httpRange
-}{
+}
+
+var ParseRangeTests = []testDef{
 	{"", 0, nil},
 	{"foo", 0, nil},
 	{"bytes=", 0, nil},
@@ -34,6 +37,16 @@
 	{"bytes=500-700,601-999", 10000, []httpRange{{500, 201}, {601, 399}}},
 }
 
+func init() {
+  var p string
+  for k:=0 ; k<1300 ; k++ {
+	p += ",5-"
+	p += fmt.Sprintf("%d", k)
+  }
+  x := "bytes=0-" + p
+  ParseRangeTests = append(ParseRangeTests, testDef{ x, 10, []httpRange{} })
+}
+
 func TestParseRange(t *testing.T) {
 	for _, test := range ParseRangeTests {
 		r := test.r

As set up right now, it won't pass. This is because I wanted to see the exact error it was returning. If you change the []httpRange{} to nil, it will pass.

PPS: Looks like someone else has been thinking about this as well.

A trip down the (split) rabbithole

Note: This post is out of date, and will become increasingly out of date when Go’s new contiguous stacks are implemented. I’m leaving it here because it is still interesting, even if out of date.

Go uses split stacks (also called segmented stacks in the literature) in order to allow thousands of stacks in the space that would normally be taken by hundreds of C-style contiguous stacks. There’s a discussion of how to add split stacks to GCC here. Note: it was written by the author of gccgo, around the time he started porting Go to Gcc, so it’s clear that he’s adding split stacks to Gcc to help Go. But if it’s helpful to C programmers as well, then so much the better!

Ian’s article is dedicated to how he planned to add split stacks to GCC. I want to dig down to the lowest level and trace every single step of the process of maintaining split stacks in the native Go runtime. I hope you, dear reader, do too. Here we go…

First, we need a program that’s going to use up all it’s stack. Since the point of this article is not to learn about Go’s stack-based object allocation we’ll use up the stack with return addresses, specifically to main:

package main

func main() {
	main()
}

Now, don’t go running that just yet, or else you’ll have the out of memory killer on your back. Instead, go run that under gdb, setting a breakpoint on runtime.morestack:

$ gdb 8.out
(gdb) b runtime.morestack
Breakpoint 1 at 0x80494de: file /home/jra/go/src/pkg/runtime/386/asm.s, line 150.
(gdb) r
Starting program: /home/jra/go-stuff/8.out 

Breakpoint 1, runtime.morestack ()
    at /home/jra/go/src/pkg/runtime/386/asm.s:150
150		get_tls(CX)
Current language:  auto; currently asm
(gdb) bt
#0  runtime.morestack () at /home/jra/go/src/pkg/runtime/386/asm.s:150
#1  0x08048c17 in main.main () at /home/jra/go-stuff/stack.go:3
#2  0x08048c1c in main.main () at /home/jra/go-stuff/stack.go:4
#3  0x08048c1c in main.main () at /home/jra/go-stuff/stack.go:4
#4  0x08048c1c in main.main () at /home/jra/go-stuff/stack.go:4
... same thing for lots and lots of pages ...
#1014 0x08048c1c in main.main () at /home/jra/go-stuff/stack.go:4
#1015 0x08049473 in runtime.mainstart ()
    at /home/jra/go/src/pkg/runtime/386/asm.s:85
#1016 0x0804e5cd in runtime.initdone ()
    at /home/jra/go/src/pkg/runtime/proc.c:145

Interesting stuff happens before and after this, so let’s try to sort through it. First, why did morestack get called? Because we are out of stack space. But how did we know? Ian’s explanation of split stacks lays this out pretty well: on the entry into every subroutine, we need to check how much space is left, and if it’s “not enough”, get a new stack to run on.

When you take a look at the assembly listing for main as it leaves 8g, there’s nothing to see:

$ 8g -S stack.go | head -10
--- prog list "main" ---
0000 (stack.go:3) TEXT    main+0(SB),$0-0
0001 (stack.go:4) CALL    ,main+0(SB)
0002 (stack.go:5) RET     ,

But when you take a look at the assembly listing for main generated by 8l, you see something else:

$ 8l -a stack.8 | head -8
codeblk [0x8048c00,0x805332a) at offset 0xc00
8048c00	main.main            | (3)	TEXT	main.main+0(SB),$0
8048c00	658b0d00000000       | (3)	MOVL	(GS),CX
8048c07	8b49f8               | (3)	MOVL	-8(CX),CX
8048c0a	3b21                 | (3)	CMPL	SP,(CX)
8048c0c	7709                 | (3)	JHI	,8048c17
8048c0e	31d2                 | (3)	MOVL	$0,DX
8048c10	31c0                 | (3)	MOVL	$0,AX
8048c12	e8c7080000           | (3)	CALL	,80494de+runtime.morestack
8048c17	e8e4ffffff           | (4)	CALL	,8048c00+main.main
8048c1c	c3                   | (5)	RET	,

So already we’ve found something interesting, which is that part of split stacks is implemented in the linker, not in the compiler. I don’t know why this is, but it’s probably another nice example of Broken Abstractions in Go.

And what’s that code doing? It says, “find the current goroutine’s struct G, and compare the current stack pointer to g.stackguard”. In the Go runtime (Linux, x86), the GS segment register points to a segment descriptor table entry which was set using modify_ldt(2). The segment descriptor arranges that references like 0(GS) refer to the base of the struct Gobuf, which in turn has a pointer to the current struct G for the goroutine. The comparison is checking to see if we are about to run off the end of the stack. But what are the actual contents of these things? Let’s have a look, with a newly started copy of the program:

$ gdb 8.out
(gdb) b main.main
Breakpoint 1 at 0x8048c00: file /home/jra/go-stuff/stack.go, line 3.
(gdb) r
Starting program: /home/jra/go-stuff/8.out 

Breakpoint 1, main.main () at /home/jra/go-stuff/stack.go:3
3	func main() {
Current language:  auto; currently minimal
(gdb) disas
Dump of assembler code for function main.main:
0x08048c00 :	mov    %gs:0x0,%ecx
0x08048c07 :	mov    -0x8(%ecx),%ecx
0x08048c0a :	cmp    (%ecx),%esp
0x08048c0c :	ja     0x8048c17
0x08048c0e :	xor    %edx,%edx
0x08048c10 :	xor    %eax,%eax
0x08048c12 :	call   0x80494de
0x08048c17 :	call   0x8048c00
0x08048c1c :	ret
End of assembler dump.
(gdb) si
0x08048c07	3	func main() {
(gdb) si
0x08048c0a	3	func main() {
(gdb) info reg
eax            0x0	0
ecx            0xb7dc3000	-1210306560
edx            0x1	1
ebx            0x806a55c	134653276
esp            0xb7dc50d4	0xb7dc50d4
ebp            0xbf8feedc	0xbf8feedc
esi            0xb7dc3060	-1210306464
edi            0xb7dd2100	-1210244864
eip            0x8048c0a	0x8048c0a
eflags         0x200296	[ PF AF SF IF ID ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x3f	63
(gdb) x 0xb7dc3000              <-- find out what (cx) is (i.e. stack guard)
0xb7dc3000:	0xb7dc4100

Voila! This time through, we are not going to end up in runtime.morestack because the current stack pointer (0xb7dc50d4) is greater than the stack guard (0xb7dc4100).

And how far above the end of the stack is the stack guard? This is explained in a giant comment in runtime.h, at least giant for the Go team’s standards, which is “less is better, even for comments”. 🙂 We can take a look at it for ourselves as well. According to the definition of struct G, the stack base is the next pointer down from the stackguard:

(gdb) x 0xb7dc3004               <-- cx + 4
0xb7dc3004:	0xb7dc50dc   <-- the stack base

Now, stack base in this context is “the place where the stack starts growing down from”. So we have the following interesting places:

  • 0xb7dc50dc: stack base
  • 0xb7dc50d4: current stack pointer
  • 0xb7dc4100: stack guard
  • 0xb7dc4000: end of the stack (see the constant StackGuard defined in runtime.h)

If we fast forward a few thousand iterations of main, the current stack pointer is going to move down, one return address at a time, until the current stack pointer is less than the stack guard. At that point, we know we need more stack. We still have 128 bytes to work with, but some routines in the Go runtime can’t or don’t want to check for stack underflow, so we set the stack guard high enough to leave room for them to run. On Windows, the user-space exception handler could always get called, so the stack guard is much higher (2048 bytes) to leave room for it to work, since we don’t get a chance to hook it and teach it about split stacks.

As an aside, when you are reading the sources and you see “#pragma textflag 7” or when you see an assembly routine that starts like “TEXT runtime·exit(SB),7,$0”, it’s a routine that will not be checking for possible stack underflow. So it, and any sub-calls, had better require a maximum of 128 bytes of stack. This “7 flag” is a note from the compiler (via the assembler) to the linker asking it to refrain from adding the stack underflow check.

To take a look at the next step of the process, we need to turn our attention to morestack, which is found in runtime/386/asm.s. Things get complicated here, but there’s two steps that are important for our investigation: first, morestack puts some information away in the current struct M so that it can use it later including the size of the current stack frame and argument list of the current function call and the return address of the caller of the function that just got pre-empted. It is going to need this information to link the new stack to the old one.

Note that morestack is in runtime/386/asm.s. It is processor-dependent, because it is grabbing stuff off the stack in a processor-dependent way. But it is not OS-dependent, which is interesting. On the other hand, the code that sets the segment registers (mentioned above) is OS dependent, because changing the global descriptor table requires ring-0 privileges.

When morestack has saved away what it needs, it changes stacks to the scheduler’s stack, and then transfers control to runtime.newstack. Allocating a new stack is a sizeable chunk of work, because stacks are allocated using exactly the same mechanism as any other piece of memory allocated by the Go runtime. So we need to make sure we have a full stack available for runtime.mallocgc to run on. Note that morestack leaves via a CALL, but kills itself if it returns, because it knows that newstack is going to retun in a funky way (keep reading…). We’ll have to remember to take a look at newstack to find out how the return address pushed by CALL gets cleaned up.

Newstack has some complexity related to reflection, so before we start let’s just agree to ignore that ok? We’ll come back to reflection low-level details some other day. Once we ignore the reflection stuff, newstack ends up relatively simple: allocate a new range of memory, decorate the top of it with something called a “struct Stktop”, then exit in a funky way.

The allocation step is fairly straightforward: calculate the size of the new stack, then . The size calculation is a bit interesting: the new stack will be the maximum of StackBig or of the amount of space needed by this function. StackBig is 4096 (8192 on Windows, see the comments for why). So if you’ve got one function in a call-chain that wants more stack space than the stack currently has, and even more than the stack is normally extended, it will get a stack of it’s own, and then when it calls another function, it may cause a second trip through morestack. Some magic happens inside of stackalloc to decide if it is currently safe to allocate a new stack with mallocgc, or if the new stack should come from a fixed-size allocator instead. Also, since the most commonly used stack is StackBig + StackExtra bytes long, there’s a shurtcut to avoid the heavier mallocgc route for that as well. So while the shorthand explanation of split stacks is that it is a technique to make stacks as easy to deal with as garbage collected objects, it’s not true that all stacks come directly from and go back into, the garbage collected pool.

Once the new stack is allocated, newstack “decorates” the top of it. It fills in a struct Stktop with information that will be used later when we are cleaning up the stack and transferring stacks from this one back to the last one. Next, newstack moves the frame and the arguments from the call that was preempted onto the new stack. With that we are ready to put the new stack to use.

This is where the funky exit comes. newstack uses gogocall, which is basically how the Go scheduler implements context switches. It switches onto the new stack and then arranges that the next RET that we hit (i.e. the RET at the end of the function that was preempted) will result in a jump to a certain function, called runtime.lessstack. gogocall exits with a JMP to the preempted function, so as not to mess up the stack it’s just put in order, and the preempted function starts running again on it’s new stack!

lessstack, as the name implies, is the counterpart to morestack. It runs in the context of the old stack, but its job is to save things and prepare for the call to oldstack (the counterpart to newstack). It needs to save the return value, then switch onto the scheduling stack, for the same reason that morestack needed to. oldstack, as the counterpart to newstack does what you’d expect: it gets us off of the current stack (freeing it) and starts executing on the old stack. We need to carry two pieces of state from the returning function, the return value (in AX) and the “return arguments”, i.e. the results of the computation that was left on the stack. I haven’t looked into the Go call semantics enough to understand the difference between AX and the stuff returned on the stack, but regardless of wether I understand it or not, oldstack saves it. Then oldstack pulls a trick like newstack, exiting via gogo, which moves to the old stack by setting SP and PC to the values that morestack staved from the caller way back when.

Open question: if morestack is called twice before oldstack is called, why isn’t the original caller’s SP/PC stomped?

So, now you’ve been all the way down the rabbit hole with me. Did you have fun? Got questions, or answers to my open question?

How to control your HTTP transactions in Go

The Go http pacakge has http.Get and http.Post, which make it easy to do GET and POST operations. They are meant for client use. They implement things from the point of view of a naïve client, one that just wants to give a URL and get back the results. They don’t want to chase redirects, they don’t want to set their headers specially, they just want to, in one line, get the results.

When you are writing an HTTP proxy server, you can’t be so naïve. For example, the HTTP client in a proxy server must not do redirect chasing. Instead, it needs to get the redirect status code and Location header, and then send it back down to the proxy’s client. If it wants to chase the redirect, fine.

If you read the guts of http.Get and http.Post, they prepare http.Request structures with the right stuff inside, then pass them to an internal function called send. So you might think you should do something like this as well, for instance writing something like what’s inside the http.send function yourself. However, this is the wrong way to go, there’s a better way.

The trick is to make an http.ClientConn, then feed the http.Request into it, and read the http.Response out of it. If you want to keep the connection open, you can (set req.Close to false), then you can do more HTTP requests on the same connection later. The ServeHTTP function in my proxy demonstrates how to do this, in the case where you close the connection. Creating a connection pool is on the to-do list, though my first guess on how to do it is to let a single goroutine act as the warehouse of open connections, checking them out and back in to the central cache in response to messages on a channel.

An HTTP proxy needs to copy through the headers from the proxy client to the origin server. However, some headers, like those related to the transfer format and connection closure should not be copied through. This is sort of a weakness in the HTTP protocol, that it confuses transport and content like this, but it is what it is and we’ve got to deal with it.

One thing that gave me a bit of trouble was oreq.RawURL. I originally set it to creq.RawURL, then used http.ParseURL(oreq.RawURL) to fill in oreq.URL (with an error check, of course). The problem is that http.Request.Write() will prefer the contents of RawURL to URL. That means that you send the full URL to the origin server, instead of only the path part. With some servers, they don’t care. With others, it results in wildly incorrect redirects, for instance fetching “http://blog.nella.org” (no trailing slash: should result in a redirect to /) gives a Location of “blog.nella.org/”, which in turn results in your browser “http://blog.nella.org/blog.nella.org/”. Redundantly, repetitively false and incorrect.

I also wondered if it even makes sense to re-parse the URL

With this addition to the proxy code mentioned yesterday, you’ve got a basic HTTP proxy that I’ve used to surf many different static and dynamic sites. It seems to work, at least well enough as a base to experiment with. Next stop, the cachability check, and adding the right code to ServeHTTP to redirect cachable requests to the object on disk instead of getting it from the origin server.

A rate-limiting HTTP proxy in Go

Hello Go-fans. Missed a week due to a nice little ski vacation, but I promise I was dreaming of Go while riding the ski lifts, so I’ve got something interesting to share with you this week.

I’ve worked in Africa and Indonesia in the past. There, I saw first-hand the possibilities of the Internet, but also the difficulties of using it in remote areas, over slow links, etc. I came up with ideas years ago for what I’d like in a proxy in the field that would make the limited bandwidth on a satellite connection go further. I’ve never had a chance to implement it until now, because hacking on Squid or other proxies just involved too much C hacking and core dumping to suit my patience, especially for a prototype to test ideas on. But Go, on the other hand; it’s just made for playing with this kind of thing!

So today what I have to show you is an HTTP proxy that simulates what happens when all the requests are squeezed through a tiny little channel. It does not simulate things down to the TCP level (i.e. congestion detection and backoff); nor does it simulate the long round-trip times that make satellite connections difficult to optimize. But it simulates low bandwidth, and when I later add features related to semi-offline use, it will help me show the benefits of them better, I think.

The HTTP package in Go seems to be a just a web client and a web server. The tutorials show how to use it for both. Making a server is easy: you call http.ListenAndServe(“:1234”, nil) and you end up with a web server on port 1234. But did you ever wonder what that extra argument is for? The nil at the end? It’s the secret to how to make a proxy, and it’s the first stop on our journey.

http.Handler

The second argument is an http.Handler. When you set it to nil, the http package provides a default, called http.ServeMux. It is designed to let you register callbacks for various sub-URLs inside one server. There’s also the http.FileServer handler, which finds files on disk matching the URL, and serves them up.

The trick to writing an HTTP proxy with the Go http package is to make your own http.Handler. Here’s something like what my first shot at it was:

type Proxy struct {
}

func NewProxy() *Proxy { return &Proxy{} }

func (p *Proxy) ServeHTTP(wr http.ResponseWriter, r *http.Request) {
  var resp *http.Response
  var err os.Error

  switch r.Method {
  default: {
    log.Print("Cannot handle method ", r.Method)
    http.Error(wr, "501 I only handle GET and POST", http.StatusNotImplemented)
    return
  }
  case "GET": {
    log.Printf("getting %v", r.RawURL)
    resp, _, err = http.Get(r.RawURL)
  }
  case "POST": {
    resp, err = http.Post(r.RawURL, r.Header["Content-Type"], r.Body)
    r.Body.Close()
  }
  }

  // combined for GET/POST
  if err != nil {
    http.Error(wr, err.String(), http.StatusInternalServerError)
    loghit(r, http.StatusInternalServerError, false)
    return
  }
  wr.SetHeader("Content-Type", resp.Header["Content-Type"])
  wr.WriteHeader(resp.StatusCode)

  io.Copy(wr, resp.Body)

  resp.Body.Close()
  loghit(r, resp.StatusCode, false)
}

And I use it like this:

func main() {
  proxy := NewProxy()
  err := http.ListenAndServe(":12345", proxy)
  if err != nil {
    log.Exit("ListenAndServe: ", err.String())
  }
}

Note that my new type, Proxy, doesn’t have anything in it. Though it might pick up more stuff later, for now, it’s job is to have the ServeHTTP method attached to it, so that *Proxy can satisfy the http.Handler interface. The real proxy behavior happens in ServeHTTP, where we turn incoming GETs and POSTs into outgoing GETs and POSTs, then arrange that the answers from the origin server go back to the requester. The key line that does that is io.Copy, where we copy the bytes arriving on the response from the origin server onto the request that came from the HTTP client (i.e. the web browser configured with localhost:12345 as it’s proxy server).

Now, it doesn’t take a rocket scientist to see that we can implement a caching proxy by looking at the request arriving in the http.Request, and either take something off of disk or get something from the network to fill in the http.ResponseWriter. That’s one direction we can expand this proxy, and I hope to do that in coming weeks, to implement a proxy that can use packages of offline content to dramatically reduce the amount of network traffic.

It does, however, take a rocket scientist to get HTTP proxying and caching right. For caching, there are several headers that have to be parsed and consulted, including Expires, E-tag, Cache-control and If-modified-since. Even getting proxying right, without caching is not so easy. To get anything working at all, the code above had to copy across the Content-Type header. But by not copying headers from the client inbound to the proxy, and by not copying all the headers back from the origin server, we are breaking cookies and HTTP redirect.

In fact, this issue of broken redirects lead me on a big side-trip where I learned more about the http package, and improved the proxy. I’ll write about that in a second post, so that we can keep going here to the next interesting thing.

You can see the source to the proxy here, but beware, ServeHTTP has morphed quite a bit from the super-simple GET/POST thing above. But you’ll have to read the next article to find out why.

linkio

The next thing I have to show you, the reader, is something called linkio. This is a package I wrote which wraps an io.Reader so that when you read from it, you get the bytes back at the same speed they would arrive if they were coming over a fixed capacity link to you, sharing bandwidth with all the other traffic on the link.

linkio shows two interesting Go design patterns. First, it is a wrapper for the io.Reader interface. The constructor takes an io.Reader as input, then stores it away, along with some other information. The object returned by the constructor satisfies io.Reader, so it can be plugged in exactly where the previous io.Reader was. Take a look at the code that needs to change to make use of linkio:

import (
  ...
  "jra-go.googlecode.com/hg/linkio"
)

var gLink *linkio.Link

func init() {
  gLink = linkio.NewLink(56 /* kbps */)
}

...

// then in ServeHttp, change this:
  io.Copy(wr, resp.Body)
// to this:
  io.Copy(wr, gLink.NewLinkReader(resp.Body))

Note how the real change in the program is one line! That’s because all the timing stuff, and the way to multiplex all the transactions onto a single link is put inside of linkio, and the goroutine launched inside the NewLink constructor.

The second interesting thing is that linkio uses a buffered channel to simulate the link itself. Each LinkReader wants to release it’s data in dribs and drabs to the reader, according to the current shared state of the link. The defining characteristic of a link is that it can send data for only one stream at a time (when we zoom in the the level of individual packets, anyway). So we simulate it with a goroutine that loops on the input channel (representing the interleaved flow of packets) and sleeps according to how long each packet would occupy the link in real life. If the link is unoccupied, then the loop gets blocked on the channel read. Once a simulated packet arrives (called a linkRequest in the code), the loop then becomes blocked on the sleep. When the sleep is done, the link goroutine needs to tell the stream that requested that this packet be simulated that the sleep is done. It does that by sending a bool on the channel that arrived with the simulated packet. (This pattern of sending a channel with a request to another goroutine is the Go idiom for asynchronous notification. A very useful trick: learn it!)

This scheme only works if the LinkReaders let loose their data a little at a time. If one sent in a simulated packet that was 10 megabytes, it would ruin the effect. That one “packet” would hog the link to the exclusion of all others, making that one stream perceive a bit rate of 100% of the link, instead of it’s equal share. The place where the LinkReaders enforce this fairness is in func (* linkio.LinkReader)Read. The prototype is Read([]byte) n int, os.Error err. When len(buf) is 1000, there’s no requirement to actually return 1000 bytes. You can return any number of bytes less than or equal to cap(buf), using n to tell the caller how many of the bytes in buf are valid, new bytes that they should read.

So, there you have it. A nice little example of a multi-threaded, rate-limiting HTTP proxy in Go. Stay tuned to the next article, where I explain why http.Get and http.Post are not really what you want in an HTTP proxy, and how to get what you DO want from the http package.

Gobs on the wire

This week, I want to talk about how to use Go to write a client/server system with both synchronous requests and asynchronous event notifications.

To help learn Go, I wrote a clone of the Conserver console server. Now, of course, the world didn’t need another console server, but it turned out to be an interesting experiment, because what a console server does is well suited to the tools Go gives a programmer. A console server’s job is to gather the output from one or more serial cables (or, when there’s a terminal server in the mix, one or more TCP connections implementing the Rtelnet “protocol”). It logs that output to files, and makes the output available in realtime to zero or more watchers. It allows one and only one of the watchers to be read-write, to actually control the device.

The Go version of the console server is called gocons. Go have a look at it, and then come back here. As usual, I want to point out some things I learned about Go while writing it.

My first attempt at making this thing work was with netchan. I thought it would be cool if the client could send in a request to the server asking to be notified on a certain channel when events happened on a certain console. But that crashed and burned because netchan can’t send channels. I think it would be theoretically possible to put the right stuff together to make it possible for netchan to send channels, but it would obviously be really hard, and the semantics of the channels would never really be the same. So, moving right along…

The next idea was to use the rpc package to implement the protocol between the client and the server. But that ran up onto another problem; RPC, by its very definition is synchronous. There is no clear way to use the RPC package to let the server send down asynchronous events to the client, like “these bytes arrived”. What I needed was to write my own wire protocol, one that could do both synchronous calls like “Can I listen to this? Yes.” and also asynchronous events like, “the console you are watching just output these bytes”. I’ve written protocols before, and it’s not hard, just a bit finicky. But Go provides the gob package, and that’s all you need to make a protocol.

So what you do is define types that represent the protocol messages and then let gob take responsibility for marshalling and unmarshalling, as well as figuring out the message delimiters. In our case, the types are called connReq and connReply. In a perfect world, these would be public types in some library, and the client and the server would both use them. In gocons, I got lazy and just copied and pasted them. The client does a gob.Decode on the net.TCPConn, and the result has to be a connReq (if it isn’t, something went wrong, and the client could either kill the connection, or try to decode the next thing on the connection). Because Go doesn’t have unions (they aren’t type safe) the connReq and connReply have to have all the fields in them that could be needed, even if a given protocol message doesn’t use them all. I haven’t calculated the protocol overhead of this, but because the potentially unused fields are things like byte slices and strings, it can’t be much; and empty byte slice will be encoded as a nil, not as a bunch of zeros for the whole underlying byte buffer.

A more sophisticated version of this idea might be to make a heirarchy of types, the base one being the simplest (with only an int in it to indicate the type) and the more complicated ones embedding the base one. But it’s hard to know what type you’d give to gob.Decode to write into; it seems like you’d have to separate out the two halves of the protocol message and send them with two calls to go.Encode. The first would be the int telling what kind it is, then the second gob would be precisely the right “shape” for the data. In any case, I didn’t do it that way for gocons. Simpler is better!

In the server, there are two pieces of code that are kind of interesting. One is the use of JSON for the config file format. The other is how incoming data it sent out to all the listeners.

First the simple one. It’s just an example of how to grab data out of a JSON file without using json.Unmarshall. I didn’t understand it, so while playing with json.Decode, I managed to get this working, and just left it at that. I don’t claim this is nice or pretty, but it works, and it might be useful to other people looking for examples on how to read JSON in Go.

The expected input is something like this:

{
    "consoles": {
        "firewall": "ts.company.com:2070",
        "web01": "ts.company.com:2071",
        "web02": "ts.company.com:2072",
        "web03": "ts.company.com:2073"
    }
}

The goal is to call addConsole once for each of the key/value pairs in the consoles hash.

Here’s how, if you don’t want to (or know how to) use json.Unmarshal:

  r, err := os.Open(*config, os.O_RDONLY, 0)
  if err != nil {
    log.Exitf("Cannot read config file %v: %v", *config, err)
  }
  dec := json.NewDecoder(r)
  var conf interface{}
  err = dec.Decode(&conf)
  if err != nil {
    log.Exit("JSON decode: ", err)
  }
  hash, ok := conf.(map[string]interface{})
  if !ok {
    log.Exit("JSON format error: got %T", conf)
  }
  consoles, ok := hash["consoles"]
  if !ok {
    log.Exit("JSON format error: key consoles not found")
  }
  c2, ok := consoles.(map[string]interface{})
  if !ok {
    log.Exitf("JSON format error: consoles key wrong type, %T", consoles)
  }
  for k, v := range c2 {
    s, ok := v.(string)
    if ok {
      addConsole(k, s)
    } else {
      log.Exit("Dial string for console %v is not a string.", k)
    }
  }

The general pattern here is that json.Decode gives you interface{}, and you need to use type selectors to work your way down the structure, and actually get the stuff you expect to be there out of it.

An infinitely prettier way to do this is to use json.Unmarshal. It’s a bit hard to understand from the docs, but this blog post makes it crystal clear.

The server is made up of a series of goroutines each running in blocking loops to handle their bit of i/o. Each console that is monitored has a read goroutine and a write one. The reader brings bytes in from it, and dispatches them to all the gocons clients who are listening. It maintains the list of clients in a linked list, but another data structure would work as well. The key is that the clients are not stored in the list as net.TCPConn’s, but as channels. What’s watching the other end of those channels for new data is the client’s proxy goroutine. Each time a client connects, a pair of goroutines are created, one for read, one for write. This allows us to do blocking reads on the input (search for dec.Decode for an example) without worrying about blocking other jobs in the server.

By keeping one single goroutine in charge of writing over the TCP connection, you don’t have to do any locking. You could, in principle, have multiple console managers all saying, “I’ve got something that should be multiplexed onto that TCP connection!” but not worry about them stomping on each other while writing to the connection. (The current implementation only listens to one console at a time.)

Here’s a snippet showing how we package up and send the notification that something new arrived on to all the console watchers:

    select {
      // a bunch of other channels to monitor here...
      case data := <-m.dataCh:
        if closed(m.dataCh) {
          break L
        }
        // multicast the data to the listeners
        for l := listeners; l != nil; l = l.next {
          if closed(l.ch) {
            // TODO: need to remove this node from the list, not just mark nil
            l.ch = nil
            log.Print("Marking listener ", l, " no longer active.")
          }
          if l.ch != nil {
            ok := l.ch <- consoleEvent{data}
            if !ok {
              log.Print("Listener ", l, " lost an event.")
            }
          }
        }

So, we make a new consoleEvent and send it to each listener. This is a bit wasteful: it makes a lot of garbage, which means the garbage collector will need to work harder. It might have been possible to make one consoleEvent, then send the same one to all the listeners. But if you are going to share memory like that, it is up to the programmer to insure that the receivers of the shared memory either treat it as read only, or use a mutex to control access to it. In our case, it is used in a read-only way, but far away in the code:

    // new event arrived from the console manager, so send it down the TCP connection
    case ev := <-evCh:
      reply.code = Data
      reply.data = ev.data
      err := enc.Encode(reply)
      if err != nil {
        log.Print("connection ", c.rw, ", failed to send data: ", err)
        break L
      }

This model, of two goroutines per thing that needs reading and writing, is magic. It dramatically reduced the code needed to implement gocons. The original console server has hundreds of lines of complicated code worrying about setting up the select mask, waiting on select, deciding if the signaled fd needs an accept(), or a read() or whatever (and finding the right data structures that correspond to the fd that just became usable). In gocons, and other idiomatic Go programs like the http server implemented in package http, you use blocking reads, and let the Go runtime arrange for the system has a whole to not block.

It is interesting to think about what would happen if writing to a client TCP connection were to block, though. As the blockingness is transitive in this system as written, it would eventually back up and block the thing that's supposed to be reading from the console, blocking all other clients. To protect against that, you need to put some firewalls in place; shared resources should not let individuals block them. You'd need to put short queues on the channels between the client proxy goroutines and the console manager reading goroutine, then have it do non-blocking writes on the channel, and if one would block, do something about it. For example, it could close the channel, in order to say, "hey, you're not draining things, and you need to clean up your act and come back to me later".

Writing and debugging this server in Go made me learn a bunch of things. I've still got lots to learn; the code still has some mysteries in it, like why I need a runtime.Gosched() to keep things from blocking, and how to deal with the fact that a closed channel causes a select to always fire on it. There's another workaround for a mystery hidden in setOwner: my first approach uncovered a bug (in the Go runtime, or in my understanding of it) that I had to workaround with a "pump goroutine" that forwards data from one place to another.

The impatient producer

Last week I showed how a channel could link a producer and a consumer, and I idly speculated on how you’d set the depth of the queue, relative to the rate and variability of the producer and the consumer’s need for new input.

This weekend, I got to thinking about my next interesting Go project. It will be an HTTP proxy that can do some nifty tricks to offload a slow Internet link. One of the features I thought I might add would be something to send all the new HTML it sees through a secondary processing system in order to find extra assets it might want to cache. This is a case of a producer that wants to act totally independently of the consumer. If the secondary processing goroutine is slow, or hung inserting something into a database, or whatever, the producer (the HTTP reply) can’t hang — he needs to reply to the client.

The solution is to hook up a dispatcher between the producer and one or more consumers. If you just wanted several consumers working, they can just all read from the same channel, and safely take the next item. But if you want to give some kind of guarantee to the producer that it won’t hang, you need to put the dispatcher in the loop. (Of course, the producer can also check for itself that it’s not going to hang by doing a non-blocking write on the channel, then throw away the work unit if it would block. But if you don’t want things getting thrown away, the dispatcher solution is better.) The dispatcher is written so that it can’t block. It finds a consumer ready to take the work unit, and if it doesn’t, then it makes a new one and sends the work unit to the new consumer. If the consumers are too slow, instead of making more of them (risking to make the system suffer collapse) it panics. Not nice, but better to fail fast, and clearly.

The result is in the jra-go.googlecode.com/hg repository as a module named autocon. Have a look.

As usual, I want to point out some things I liked/disliked about how this turned out.

First, take a look at the autoConsumer type: it’s private. Why? Because the interface this thing gives to the outside world (the producer) is just a channel. The producer doesn’t get a pointer to an object, it just gets the channel it will be talking to. If you wanted to tune or monitor the dispatcher, this wouldn’t work. But I decided to start out with the simplest thing, and this is it. In fact, when it came time to write the tests cases in autocon_test.go I had a problem; I wanted to know what was happening inside, but I didn’t even have a pointer to the autoConsumer to go dig into it, because once NewAutoConsumer returns, the only thing in the entire system that has a pointer to the autoConsumer is run(). I solved that problem with lexically bound channels, so that my workers had access to the same channel that the test routine did. Then the test routine empties the channel looking to see if the right amount of garbage comes out of it (nothing more, nothing less).

One thing that’s not too nice is I had to add a runtime.Gosched in the test routines to get them to work. And as usual with scheduling problems, it’s sensitive. When I was debugging scheduling problems using fmt.Print, I got different behavior than when I debugged using println. Why? Because the fmt package uses a channel to implement a leaky bucket of unused pp structures, and the scheduling decisions are disrupted by popping and pushing them inside of fmt.Println. Lesson: beware of fmt when debugging scheduler problems.

A better solution to finessing the scheduler with runtime.Gosched would be to arrange for a positive signal to come back that the workers had run and done their job and were ready to have the work checked. But because channels don’t have a “wait until 10 are outstanding” kind of primitive, I would have instead needed to implement some kind of condition variable. It didn’t seem like the right place, down in a test routine, to be doing that!

Another thing I really don’t like about Go, but don’t have a solution for, is the boilerplate “if closed()” after “for x := range ch”. When a goroutine is blocked on a read on a channel, and another goroutine closes it, the channel becomes readable. The result it returns is a nil. If you don’t want to litter your code with a bunch of “if x != nil”, you need to catch that case early and bail out. Luckily, the closed() primitive tells you what you need to know. I think the behavior of closed channels is just right in Go as it is; it is the unavoidable reality check you get when you go from beautiful theory into ugly practice.

And while I am beating up on myself, let me point out that I’m using a label and a labeled goto. Using labels always seems to me like some kind of failure; like it’s just four little characters short of a goto. But the need is undeniable; when you call break inside of nested for loops, the language spec says that the innermost loop will be broken. In this case, I don’t need it, but it somehow improves readability, in that the label explains exactly which loop we are exiting and why. I am deeply conflicted about loop labels, and I suppose the only answer is to keep using them until I get over it, or until I break the habit.

Using a channel as a queue

I try to write a Go article every Monday, but the holiday season has disrupted me a bit. So here’s a little something I dug out of my stocking to get things going again.

I thought an interesting way to play with the exp/draw package, the http client, and the image decoder would be a program to animate the tiles from Google Maps, showing the world scrolling by, as though you were in an airplane looking straight down and flying in a straight line. It came to me in one of those flying dreams… (not really).

The general idea is simple: fetch a tile, and display it, animating it by sliding it to the left while fetching the next one to the east and ensuring that you either get it before the last one scrolled off, or stop the scrolling. That’s a perfect job for goroutines; one should be fetching and one should be in charge of drawing, and should block if the other one is not done. And they should talk via a channel, like all well behaved goroutines do. Because channels are ordered and can be buffered, the channel between the fetcher and the displayer can play the part of a queue in this producer/consumer system. Simple and elegant, just how we like it to be in Go-land.

Here’s the program. You can also get by running “goinstall jra-go.googlecode.com/hg”, then go look in your $GOROOT/src/pkg/jra-go.googlecode.com/hg/cmd/mapscroll directory.

package main

import (
  "log"
  "fmt"
  "time"
  "http"
  "exp/draw"
  "exp/draw/x11"
  "image"
  "image/png"
  "image/jpeg"
)

func imagePuller(urls chan string, imgs chan *image.Image) {
  for url := range urls {
    r, _, err := http.Get(url)
    if err == nil {
      log.Print("Fetched ", url)
      ctype, found := r.Header["Content-Type"]
      if found {
        switch {
        default:
          log.Print("For ", url, ", unknown type: ", ctype)
        case ctype == "image/png":
          img, err := png.Decode(r.Body)
          if err == nil {
            imgs <- &img
          } else {
            log.Print("For ", url, ", decode error: ", err)
          }
        case ctype == "image/jpeg" || ctype == "image/jpg":
          img, err := jpeg.Decode(r.Body)
          if err == nil {
            imgs <- &img
          } else {
            log.Print("For ", url, ", decode error: ", err)
            return
          }
        }
      } else {
        log.Print("For ", url, ", no content type.")
      }
      r.Body.Close()
    } else {
      log.Print("Error fetching ", url, ": ", err)
    }
  }
}

func urlGen(urls chan string) {
  x := 33981
  y := 23179
  for {
    url := fmt.Sprintf("http://khm1.google.com/kh/v=74&x=%d&s=&y=%d&z=16&s=Ga", x, y)
    urls <- url
    x++
  }
}

func processEvent(ch <-chan interface{}) {
  for {
    if closed(ch) {
      log.Exit("X display closed.")
    }
    ev, ok := <-ch
    if !ok {
      // no events, return
      return
    }

    switch ev.(type) {
    case draw.ErrEvent:
      log.Exit("X11 err: ", ev.(draw.ErrEvent).Err)
    }
  }
}

func main() {
  urls := make(chan string, 4)
  imgReady := make(chan *image.Image, 4)

  go imagePuller(urls, imgReady)
  go urlGen(urls)

  xdisp, err := x11.NewWindow()
  if err != nil {
    log.Exit("X11 error: ", err)
  }
  screen := xdisp.Screen()

  tileHeight := 256
  tileWidth := 256
  numTiles := screen.Bounds().Dx()/tileWidth + 2
  tileStrip := image.NewRGBA(numTiles*tileWidth, tileHeight)

  // pre-load the tile strip
  for i := 0; i < numTiles; i++ {
    iptr := <-imgReady
    img := *iptr
    draw.Draw(tileStrip, image.Rect(i*tileWidth, 0, i*tileWidth+tileWidth, tileHeight), img, image.ZP)
  }

  topBlack := (screen.Bounds().Dy() - tileHeight) / 2
  for {
    for x := 0; x < tileWidth; x += 15 {
      then := time.Nanoseconds()
      draw.Draw(screen, image.Rect(0, topBlack, screen.Bounds().Dx(), topBlack+tileHeight), tileStrip, image.Pt(x, 0))
      now := time.Nanoseconds()
      frameTime := now - then

      // a flush is just a write on a channel, so it takes negligible time
      xdisp.FlushImage()

      toSleep := 0.1*1e9 - frameTime
      //      log.Print("Took ", frameTime, " ns to draw, will sleep ", toSleep, " ns")
      time.Sleep(toSleep)
      processEvent(xdisp.EventChan())
    }

    // shift tiles in tileStrip and put in new last one
    draw.Draw(tileStrip, image.Rect(0, 0, (numTiles-1)*tileWidth, tileHeight), tileStrip, image.Pt(tileWidth, 0))
    iptr := <-imgReady
    img := *iptr
    draw.Draw(tileStrip, image.Rect((numTiles-1)*tileWidth, 0, numTiles*tileWidth, tileHeight), img, image.ZP)
  }
}

Some little details to point out:

Note how I made the goroutine that fetches urls read from a chan string to get its URLs instead of making them up itself. This compartmentalizes the logic, and has nothing to do with parallelism. A good practice in Go is to use channels to improve clarity, not only as part of a scheme to increase parallelism. Your goal is to express the intrinsic "this hands to this" nature of your program, and if that results in a speedup with multiple cores, fine. If it just results in clearer code, even better!

The imgReady channel is buffered, 4 deep. That means I'll have 4 images queued up and ready at all times. When the head is popped, there are 3 left. If each tile took 1 second to scroll by, then the maximum delay for any one fetch would be 3 seconds, and the fetches should average under one second. For this little app, it's over thinking it; but the same analysis would hold for other asynchronous producer/consumer systems of this type.

The images themselves are not sent through the channel, only pointers to them. This is to reduce the amount of copying that the runtime has to do. For this application the difference would probably be almost undetectable, just a minor increase in CPU time (the dominant user of CPU time is the draw.Draw which moves the strip across the screen). But it seemed like a perfect place to be sending pointers instead of the objects themselves; in Go we share memory by communicating, not communicate by sharing. But a good compromise is where one goroutine says to the other "I'm done with the thing we can both access that is at the end of this pointer, you take it".

The animation in this beast is definitely not the smoothest and most beautiful. The way exp/draw/x11 puts stuff on the screen is pretty costly, and it is too far removed from the X server itself to be able to synchronize updates and avoid flashing artifacts. Remember that the "exp" in exp/draw/x11 means "experimental" and it probably won't stay around for the long haul. Perhaps linking against an OpenGL library using cgo would let me make the animation smoother. Perhaps I'll look into that one day...

Take a look at that unassuming processEvents hidden down in the middle of the inner loop. It is important to get things to close down nicely, and even to keep your window running without blocking on unhanded events. Without it, if you close the window, the system keeps fetching and updating the in-memory copy of the screen image. Every time through, processEvents empties the event channel, ignoring stuff it doesn't care about. This is important, because if the event channels fill up, the X11 window hangs because the sends to the event channel are blocking (see $GOROOT/src/pkg/exp/draw/x11/conn.go, search for eventc). Even if you have no UI, you still need to empty that sucker, because just moving the mouse over the window creates events. (I can't reproduce the theoretical hang I'm talking about now, so I could be wrong on this part. But empty the channel, just to be safe.)

Update: Andrew tells me that not bailing early is non-idiomatic (see imagePuller, "if found") and results in more indents than necessary. I tend to agree, but obviously I'm not quite yet in the habit of doing things the Go way. Thanks for the feedback, Andrew.

PS: The starting location is Lausanne, my beautiful city. The x++ causes it to fly due east from Lausanne. If you wait until you get to 180 degrees east, it will likely fail, because I didn't handle global wraparound right. 🙂