Gobs on the wire

This week, I want to talk about how to use Go to write a client/server system with both synchronous requests and asynchronous event notifications.

To help learn Go, I wrote a clone of the Conserver console server. Now, of course, the world didn’t need another console server, but it turned out to be an interesting experiment, because what a console server does is well suited to the tools Go gives a programmer. A console server’s job is to gather the output from one or more serial cables (or, when there’s a terminal server in the mix, one or more TCP connections implementing the Rtelnet “protocol”). It logs that output to files, and makes the output available in realtime to zero or more watchers. It allows one and only one of the watchers to be read-write, to actually control the device.

The Go version of the console server is called gocons. Go have a look at it, and then come back here. As usual, I want to point out some things I learned about Go while writing it.

My first attempt at making this thing work was with netchan. I thought it would be cool if the client could send in a request to the server asking to be notified on a certain channel when events happened on a certain console. But that crashed and burned because netchan can’t send channels. I think it would be theoretically possible to put the right stuff together to make it possible for netchan to send channels, but it would obviously be really hard, and the semantics of the channels would never really be the same. So, moving right along…

The next idea was to use the rpc package to implement the protocol between the client and the server. But that ran up onto another problem; RPC, by its very definition is synchronous. There is no clear way to use the RPC package to let the server send down asynchronous events to the client, like “these bytes arrived”. What I needed was to write my own wire protocol, one that could do both synchronous calls like “Can I listen to this? Yes.” and also asynchronous events like, “the console you are watching just output these bytes”. I’ve written protocols before, and it’s not hard, just a bit finicky. But Go provides the gob package, and that’s all you need to make a protocol.

So what you do is define types that represent the protocol messages and then let gob take responsibility for marshalling and unmarshalling, as well as figuring out the message delimiters. In our case, the types are called connReq and connReply. In a perfect world, these would be public types in some library, and the client and the server would both use them. In gocons, I got lazy and just copied and pasted them. The client does a gob.Decode on the net.TCPConn, and the result has to be a connReq (if it isn’t, something went wrong, and the client could either kill the connection, or try to decode the next thing on the connection). Because Go doesn’t have unions (they aren’t type safe) the connReq and connReply have to have all the fields in them that could be needed, even if a given protocol message doesn’t use them all. I haven’t calculated the protocol overhead of this, but because the potentially unused fields are things like byte slices and strings, it can’t be much; and empty byte slice will be encoded as a nil, not as a bunch of zeros for the whole underlying byte buffer.

A more sophisticated version of this idea might be to make a heirarchy of types, the base one being the simplest (with only an int in it to indicate the type) and the more complicated ones embedding the base one. But it’s hard to know what type you’d give to gob.Decode to write into; it seems like you’d have to separate out the two halves of the protocol message and send them with two calls to go.Encode. The first would be the int telling what kind it is, then the second gob would be precisely the right “shape” for the data. In any case, I didn’t do it that way for gocons. Simpler is better!

In the server, there are two pieces of code that are kind of interesting. One is the use of JSON for the config file format. The other is how incoming data it sent out to all the listeners.

First the simple one. It’s just an example of how to grab data out of a JSON file without using json.Unmarshall. I didn’t understand it, so while playing with json.Decode, I managed to get this working, and just left it at that. I don’t claim this is nice or pretty, but it works, and it might be useful to other people looking for examples on how to read JSON in Go.

The expected input is something like this:

{
    "consoles": {
        "firewall": "ts.company.com:2070",
        "web01": "ts.company.com:2071",
        "web02": "ts.company.com:2072",
        "web03": "ts.company.com:2073"
    }
}

The goal is to call addConsole once for each of the key/value pairs in the consoles hash.

Here’s how, if you don’t want to (or know how to) use json.Unmarshal:

  r, err := os.Open(*config, os.O_RDONLY, 0)
  if err != nil {
    log.Exitf("Cannot read config file %v: %v", *config, err)
  }
  dec := json.NewDecoder(r)
  var conf interface{}
  err = dec.Decode(&conf)
  if err != nil {
    log.Exit("JSON decode: ", err)
  }
  hash, ok := conf.(map[string]interface{})
  if !ok {
    log.Exit("JSON format error: got %T", conf)
  }
  consoles, ok := hash["consoles"]
  if !ok {
    log.Exit("JSON format error: key consoles not found")
  }
  c2, ok := consoles.(map[string]interface{})
  if !ok {
    log.Exitf("JSON format error: consoles key wrong type, %T", consoles)
  }
  for k, v := range c2 {
    s, ok := v.(string)
    if ok {
      addConsole(k, s)
    } else {
      log.Exit("Dial string for console %v is not a string.", k)
    }
  }

The general pattern here is that json.Decode gives you interface{}, and you need to use type selectors to work your way down the structure, and actually get the stuff you expect to be there out of it.

An infinitely prettier way to do this is to use json.Unmarshal. It’s a bit hard to understand from the docs, but this blog post makes it crystal clear.

The server is made up of a series of goroutines each running in blocking loops to handle their bit of i/o. Each console that is monitored has a read goroutine and a write one. The reader brings bytes in from it, and dispatches them to all the gocons clients who are listening. It maintains the list of clients in a linked list, but another data structure would work as well. The key is that the clients are not stored in the list as net.TCPConn’s, but as channels. What’s watching the other end of those channels for new data is the client’s proxy goroutine. Each time a client connects, a pair of goroutines are created, one for read, one for write. This allows us to do blocking reads on the input (search for dec.Decode for an example) without worrying about blocking other jobs in the server.

By keeping one single goroutine in charge of writing over the TCP connection, you don’t have to do any locking. You could, in principle, have multiple console managers all saying, “I’ve got something that should be multiplexed onto that TCP connection!” but not worry about them stomping on each other while writing to the connection. (The current implementation only listens to one console at a time.)

Here’s a snippet showing how we package up and send the notification that something new arrived on to all the console watchers:

    select {
      // a bunch of other channels to monitor here...
      case data := <-m.dataCh:
        if closed(m.dataCh) {
          break L
        }
        // multicast the data to the listeners
        for l := listeners; l != nil; l = l.next {
          if closed(l.ch) {
            // TODO: need to remove this node from the list, not just mark nil
            l.ch = nil
            log.Print("Marking listener ", l, " no longer active.")
          }
          if l.ch != nil {
            ok := l.ch <- consoleEvent{data}
            if !ok {
              log.Print("Listener ", l, " lost an event.")
            }
          }
        }

So, we make a new consoleEvent and send it to each listener. This is a bit wasteful: it makes a lot of garbage, which means the garbage collector will need to work harder. It might have been possible to make one consoleEvent, then send the same one to all the listeners. But if you are going to share memory like that, it is up to the programmer to insure that the receivers of the shared memory either treat it as read only, or use a mutex to control access to it. In our case, it is used in a read-only way, but far away in the code:

    // new event arrived from the console manager, so send it down the TCP connection
    case ev := <-evCh:
      reply.code = Data
      reply.data = ev.data
      err := enc.Encode(reply)
      if err != nil {
        log.Print("connection ", c.rw, ", failed to send data: ", err)
        break L
      }

This model, of two goroutines per thing that needs reading and writing, is magic. It dramatically reduced the code needed to implement gocons. The original console server has hundreds of lines of complicated code worrying about setting up the select mask, waiting on select, deciding if the signaled fd needs an accept(), or a read() or whatever (and finding the right data structures that correspond to the fd that just became usable). In gocons, and other idiomatic Go programs like the http server implemented in package http, you use blocking reads, and let the Go runtime arrange for the system has a whole to not block.

It is interesting to think about what would happen if writing to a client TCP connection were to block, though. As the blockingness is transitive in this system as written, it would eventually back up and block the thing that’s supposed to be reading from the console, blocking all other clients. To protect against that, you need to put some firewalls in place; shared resources should not let individuals block them. You’d need to put short queues on the channels between the client proxy goroutines and the console manager reading goroutine, then have it do non-blocking writes on the channel, and if one would block, do something about it. For example, it could close the channel, in order to say, “hey, you’re not draining things, and you need to clean up your act and come back to me later”.

Writing and debugging this server in Go made me learn a bunch of things. I’ve still got lots to learn; the code still has some mysteries in it, like why I need a runtime.Gosched() to keep things from blocking, and how to deal with the fact that a closed channel causes a select to always fire on it. There’s another workaround for a mystery hidden in setOwner: my first approach uncovered a bug (in the Go runtime, or in my understanding of it) that I had to workaround with a “pump goroutine” that forwards data from one place to another.