How to control your HTTP transactions in Go

The Go http pacakge has http.Get and http.Post, which make it easy to do GET and POST operations. They are meant for client use. They implement things from the point of view of a naïve client, one that just wants to give a URL and get back the results. They don’t want to chase redirects, they don’t want to set their headers specially, they just want to, in one line, get the results.

When you are writing an HTTP proxy server, you can’t be so naïve. For example, the HTTP client in a proxy server must not do redirect chasing. Instead, it needs to get the redirect status code and Location header, and then send it back down to the proxy’s client. If it wants to chase the redirect, fine.

If you read the guts of http.Get and http.Post, they prepare http.Request structures with the right stuff inside, then pass them to an internal function called send. So you might think you should do something like this as well, for instance writing something like what’s inside the http.send function yourself. However, this is the wrong way to go, there’s a better way.

The trick is to make an http.ClientConn, then feed the http.Request into it, and read the http.Response out of it. If you want to keep the connection open, you can (set req.Close to false), then you can do more HTTP requests on the same connection later. The ServeHTTP function in my proxy demonstrates how to do this, in the case where you close the connection. Creating a connection pool is on the to-do list, though my first guess on how to do it is to let a single goroutine act as the warehouse of open connections, checking them out and back in to the central cache in response to messages on a channel.

An HTTP proxy needs to copy through the headers from the proxy client to the origin server. However, some headers, like those related to the transfer format and connection closure should not be copied through. This is sort of a weakness in the HTTP protocol, that it confuses transport and content like this, but it is what it is and we’ve got to deal with it.

One thing that gave me a bit of trouble was oreq.RawURL. I originally set it to creq.RawURL, then used http.ParseURL(oreq.RawURL) to fill in oreq.URL (with an error check, of course). The problem is that http.Request.Write() will prefer the contents of RawURL to URL. That means that you send the full URL to the origin server, instead of only the path part. With some servers, they don’t care. With others, it results in wildly incorrect redirects, for instance fetching “http://blog.nella.org” (no trailing slash: should result in a redirect to /) gives a Location of “blog.nella.org/”, which in turn results in your browser “http://blog.nella.org/blog.nella.org/”. Redundantly, repetitively false and incorrect.

I also wondered if it even makes sense to re-parse the URL

With this addition to the proxy code mentioned yesterday, you’ve got a basic HTTP proxy that I’ve used to surf many different static and dynamic sites. It seems to work, at least well enough as a base to experiment with. Next stop, the cachability check, and adding the right code to ServeHTTP to redirect cachable requests to the object on disk instead of getting it from the origin server.

Leave a Reply