A rate-limiting HTTP proxy in Go

Hello Go-fans. Missed a week due to a nice little ski vacation, but I promise I was dreaming of Go while riding the ski lifts, so I’ve got something interesting to share with you this week.

I’ve worked in Africa and Indonesia in the past. There, I saw first-hand the possibilities of the Internet, but also the difficulties of using it in remote areas, over slow links, etc. I came up with ideas years ago for what I’d like in a proxy in the field that would make the limited bandwidth on a satellite connection go further. I’ve never had a chance to implement it until now, because hacking on Squid or other proxies just involved too much C hacking and core dumping to suit my patience, especially for a prototype to test ideas on. But Go, on the other hand; it’s just made for playing with this kind of thing!

So today what I have to show you is an HTTP proxy that simulates what happens when all the requests are squeezed through a tiny little channel. It does not simulate things down to the TCP level (i.e. congestion detection and backoff); nor does it simulate the long round-trip times that make satellite connections difficult to optimize. But it simulates low bandwidth, and when I later add features related to semi-offline use, it will help me show the benefits of them better, I think.

The HTTP package in Go seems to be a just a web client and a web server. The tutorials show how to use it for both. Making a server is easy: you call http.ListenAndServe(":1234", nil) and you end up with a web server on port 1234. But did you ever wonder what that extra argument is for? The nil at the end? It’s the secret to how to make a proxy, and it’s the first stop on our journey.

http.Handler

The second argument is an http.Handler. When you set it to nil, the http package provides a default, called http.ServeMux. It is designed to let you register callbacks for various sub-URLs inside one server. There’s also the http.FileServer handler, which finds files on disk matching the URL, and serves them up.

The trick to writing an HTTP proxy with the Go http package is to make your own http.Handler. Here’s something like what my first shot at it was:

type Proxy struct {
}

func NewProxy() *Proxy { return &Proxy{} }

func (p *Proxy) ServeHTTP(wr http.ResponseWriter, r *http.Request) {
  var resp *http.Response
  var err os.Error

  switch r.Method {
  default: {
    log.Print("Cannot handle method ", r.Method)
    http.Error(wr, "501 I only handle GET and POST", http.StatusNotImplemented)
    return
  }
  case "GET": {
    log.Printf("getting %v", r.RawURL)
    resp, _, err = http.Get(r.RawURL)
  }
  case "POST": {
    resp, err = http.Post(r.RawURL, r.Header["Content-Type"], r.Body)
    r.Body.Close()
  }
  }

  // combined for GET/POST
  if err != nil {
    http.Error(wr, err.String(), http.StatusInternalServerError)
    loghit(r, http.StatusInternalServerError, false)
    return
  }
  wr.SetHeader("Content-Type", resp.Header["Content-Type"])
  wr.WriteHeader(resp.StatusCode)

  io.Copy(wr, resp.Body)

  resp.Body.Close()
  loghit(r, resp.StatusCode, false)
}

And I use it like this:

func main() {
  proxy := NewProxy()
  err := http.ListenAndServe(":12345", proxy)
  if err != nil {
    log.Exit("ListenAndServe: ", err.String())
  }
}

Note that my new type, Proxy, doesn’t have anything in it. Though it might pick up more stuff later, for now, it’s job is to have the ServeHTTP method attached to it, so that *Proxy can satisfy the http.Handler interface. The real proxy behavior happens in ServeHTTP, where we turn incoming GETs and POSTs into outgoing GETs and POSTs, then arrange that the answers from the origin server go back to the requester. The key line that does that is io.Copy, where we copy the bytes arriving on the response from the origin server onto the request that came from the HTTP client (i.e. the web browser configured with localhost:12345 as it’s proxy server).

Now, it doesn’t take a rocket scientist to see that we can implement a caching proxy by looking at the request arriving in the http.Request, and either take something off of disk or get something from the network to fill in the http.ResponseWriter. That’s one direction we can expand this proxy, and I hope to do that in coming weeks, to implement a proxy that can use packages of offline content to dramatically reduce the amount of network traffic.

It does, however, take a rocket scientist to get HTTP proxying and caching right. For caching, there are several headers that have to be parsed and consulted, including Expires, E-tag, Cache-control and If-modified-since. Even getting proxying right, without caching is not so easy. To get anything working at all, the code above had to copy across the Content-Type header. But by not copying headers from the client inbound to the proxy, and by not copying all the headers back from the origin server, we are breaking cookies and HTTP redirect.

In fact, this issue of broken redirects lead me on a big side-trip where I learned more about the http package, and improved the proxy. I’ll write about that in a second post, so that we can keep going here to the next interesting thing.

You can see the source to the proxy here, but beware, ServeHTTP has morphed quite a bit from the super-simple GET/POST thing above. But you’ll have to read the next article to find out why.

linkio

The next thing I have to show you, the reader, is something called linkio. This is a package I wrote which wraps an io.Reader so that when you read from it, you get the bytes back at the same speed they would arrive if they were coming over a fixed capacity link to you, sharing bandwidth with all the other traffic on the link.

linkio shows two interesting Go design patterns. First, it is a wrapper for the io.Reader interface. The constructor takes an io.Reader as input, then stores it away, along with some other information. The object returned by the constructor satisfies io.Reader, so it can be plugged in exactly where the previous io.Reader was. Take a look at the code that needs to change to make use of linkio:

import (
  ...
  "jra-go.googlecode.com/hg/linkio"
)

var gLink *linkio.Link

func init() {
  gLink = linkio.NewLink(56 /* kbps */)
}

...

// then in ServeHttp, change this:
  io.Copy(wr, resp.Body)
// to this:
  io.Copy(wr, gLink.NewLinkReader(resp.Body))

Note how the real change in the program is one line! That’s because all the timing stuff, and the way to multiplex all the transactions onto a single link is put inside of linkio, and the goroutine launched inside the NewLink constructor.

The second interesting thing is that linkio uses a buffered channel to simulate the link itself. Each LinkReader wants to release it’s data in dribs and drabs to the reader, according to the current shared state of the link. The defining characteristic of a link is that it can send data for only one stream at a time (when we zoom in the the level of individual packets, anyway). So we simulate it with a goroutine that loops on the input channel (representing the interleaved flow of packets) and sleeps according to how long each packet would occupy the link in real life. If the link is unoccupied, then the loop gets blocked on the channel read. Once a simulated packet arrives (called a linkRequest in the code), the loop then becomes blocked on the sleep. When the sleep is done, the link goroutine needs to tell the stream that requested that this packet be simulated that the sleep is done. It does that by sending a bool on the channel that arrived with the simulated packet. (This pattern of sending a channel with a request to another goroutine is the Go idiom for asynchronous notification. A very useful trick: learn it!)

This scheme only works if the LinkReaders let loose their data a little at a time. If one sent in a simulated packet that was 10 megabytes, it would ruin the effect. That one “packet” would hog the link to the exclusion of all others, making that one stream perceive a bit rate of 100% of the link, instead of it’s equal share. The place where the LinkReaders enforce this fairness is in func (* linkio.LinkReader)Read. The prototype is Read([]byte) n int, os.Error err. When len(buf) is 1000, there’s no requirement to actually return 1000 bytes. You can return any number of bytes less than or equal to cap(buf), using n to tell the caller how many of the bytes in buf are valid, new bytes that they should read.

So, there you have it. A nice little example of a multi-threaded, rate-limiting HTTP proxy in Go. Stay tuned to the next article, where I explain why http.Get and http.Post are not really what you want in an HTTP proxy, and how to get what you DO want from the http package.