Protecting private keys in Go

Today I was looking at Upspin and thinking about private keys. I asked myself, “what would it take to make sure that there was one single copy of the private key in RAM, and that Go and the OS worked together to make sure it never went onto disk?”

(Some other people have talked about this too.)

It turns out that answering that question is much harder than I expected, but it is doable, and I’ll try to do it.

So, the last part first. How do you ask the kernel to help you make sure that a given piece of RAM never ends up on disk? There’s a hint here. I have not looked to see if this technique has been picked up by OpenSSL or BoringSSL. The way Akamai hooked it into ASN.1 parsing is ugly and gross and we won’t be doing that, but the kernel system calls they are using are interesting.

mlock and madvise work at page granularity, but data structures exposed in Go programs are at byte granularity, so we’ll need some kind of technique to convince Go to put our data on a page all to itself. I’m thinking that make([]byte, 0, pagesize) will probably do what we want. Once the underlying page is marked correctly, we return the []byte to the caller. It is their responsibility to (1) never reallocate the underlying storage (i.e. only write into it via copy, never append) and (2) make sure that all of the places in the code where the private key is used do not make copies off of this secure page.
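
Here is a minimal sketch of that allocation, assuming Linux and the golang.org/x/sys/unix package. It sidesteps make entirely: asking mmap for the memory directly is one way to guarantee the buffer sits on pages of its own, outside the Go heap.

// allocSecret returns a buffer on its own page(s) that the kernel has
// been asked never to swap out or include in core dumps. A sketch, not
// a vetted implementation.
func allocSecret(n int) ([]byte, error) {
	// An anonymous private mapping lives on its own page(s), outside
	// the Go heap, so no Go object will ever share a page with it.
	b, err := unix.Mmap(-1, 0, n, unix.PROT_READ|unix.PROT_WRITE,
		unix.MAP_PRIVATE|unix.MAP_ANON)
	if err != nil {
		return nil, err
	}
	// Never swap this page to disk.
	if err := unix.Mlock(b); err != nil {
		unix.Munmap(b)
		return nil, err
	}
	// Keep it out of core dumps too.
	if err := unix.Madvise(b, unix.MADV_DONTDUMP); err != nil {
		unix.Munmap(b)
		return nil, err
	}
	return b, nil
}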

Then we need to go hunting, looking for places where the Go runtime is going to make copies of our private key. This is error-prone work, even if spotting a copy in the making is easy for a Go veteran, and it makes me wonder if undertaking a task like this in Rust would be easier; maybe you’d be able to mark the private key immutable and implement the Copy trait as a panic, then see what broke? Although I think you would still need to audit the code, because it would still be possible to copy out the bytes in a typesafe way.

Here’s a list of places I’ve already noticed I’d need to work on:

  • in Upspin’s factotum.go, the private key comes off disk and goes into memory. That’s in a []byte, so we’d be able to put it into our safe []byte, no problem. However, it is converted to a string and then back into a []byte, which would have to be avoided. No biggie, and also an opportunity for some nice cleanup, because it currently stores the private key both as a string (unused) and as an ecdsaKeypair.
  • The ecdsa.PrivateKey ends up with a pointer in it to a big.Int, which has the private key in it. It gets there via UnmarshalText. But UnmarshalText converts to a string too, so it has to get fixed. It seems likely that UnmarshalText should be using Fscanf to read directly from the []byte it receives, since big.Int implements fmt.Scanner.
  • While constructing the big.Int to hold the private key, there is a risk of leaving parts, or maybe all, of the private key lying around in memory. This is because the final length of the underlying “words” array where the private key is stored is not known ahead of time to the math/big library. However, we do know how long the private key is going to be, so we can store a decoy private key into the big.Int to get it to stretch out its words slice, and then scan the real private key in, so that the array underlying the word slice is never reallocated while reading the private key (see the sketch after this list). However, there does not appear to be a way to tell big.Int where it should put the []Word; due to type safety, that would require unsafe. But it could be done in package factotum before the first call to Unmarshal.
  • The implementation of big.Int.Bytes() is used from time to time, and it causes a copy of the private key to be made. It does not allow us to pass in “please write the bytes here” argument in order to make sure that it is copying the private bytes exactly where we want them to go (i.e. to the protected page). At first glance, this would appear to require a new API, which is too bad.
  • The factotum package does not expose the private key, so it cannot be accessed outside of the package. That makes it easy to audit usage of the key and confirm that it stays private. Thank you, Upspin hackers!
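
Here is a sketch of the decoy trick mentioned above. It assumes the key arrives as decimal digits in a protected buffer; whether math/big really reuses the backing array when the capacity suffices is an implementation detail, so this illustrates the idea rather than guaranteeing it.

func scanKey(keyText []byte) (*big.Int, error) {
	k := new(big.Int)
	// Scan a decoy with the same digit count first, so the words
	// slice is stretched to its final size before the real key arrives.
	decoy := bytes.Repeat([]byte{'9'}, len(keyText))
	if _, err := fmt.Fscan(bytes.NewReader(decoy), k); err != nil {
		return nil, err
	}
	// Now scan the real key; with the capacity already in place, the
	// backing array should not need to be reallocated.
	if _, err := fmt.Fscan(bytes.NewReader(keyText), k); err != nil {
		return nil, err
	}
	return k, nil
}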

So, in summary, this is probably doable, except that taking control of the backing store of the big.Int requires unsafe.

httptrace, a new Go debugging tool

Today I was investigating why HTTP redirects resulted in more persistent connections staying open than I was expecting. I found myself digging around deep inside net/http/transport.go and I noticed the new net/http/httptrace package. It is new in Go 1.7, which is currently in beta.

net/http/httptrace is lightly documented, and because it is new, there are no examples to look at. So I decided to share what I came up with here.

package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"net/http/httptrace"
	"os"
)

func main() {
	ctx := httptrace.WithClientTrace(context.Background(), &httptrace.ClientTrace{
		// PutIdleConn is called when the connection is returned to
		// the idle pool; a non-nil err explains why it was not.
		PutIdleConn: func(err error) {
			if err == nil {
				fmt.Println("put idle: no err")
			} else {
				fmt.Println("put idle:", err.Error())
			}
		},
	})
	req, err := http.NewRequest("GET", "http://www.golang.org/", nil)
	if err != nil {
		log.Fatal("new req", err)
	}
	req = req.WithContext(ctx)
	_, err = http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
		os.Exit(1)
	}
}

The key things I had to figure out (and hopefully you won’t have to since you found this post) were:

  • How to get a base context to add the tracer to (context.Background())
  • Which callback to use in the ClientTrace structure (for my purposes, PutIdleConn)
  • How to do an HTTP request with the context attached (NewRequest -> req.WithContext -> http.(*Client).Do)

If you are used to using the simple http.(*Client).Get method to do HTTP, you’ll need to learn to use http.NewRequest to make a request object, and then pass it to a client. In this simple case, I used the DefaultClient. In a library, you should use the http.Client passed in to you by the caller, so that your library will work in contexts like Google App Engine.

Interview Questions I Hope I Get

I have an interview coming up, and so my “keep in shape hacking time” has been recently devoted to interview preparation. I thought I would make a post about what’s in my head, both as a way to solidify it (no better way to learn something than by teaching it) and in case this interview goes bad, so that my next prospective employer can see what I’m thinking about.

If you, my current prospective employer, are reading this, would you please not take advantage of it by removing these questions from your list? Come on guys, give me a break. If I’m going to be transparent about my thought processes, the least you can do is throw me a bone and ask at least one of these in person!

Question 1: Make something faster.

I think this would be an interesting question. Especially if it was literally asked in these words. The interviewee asks, “make what faster?” The interviewer says, “Whatever. Tell me about making something faster. Whatever you want to tell me about.”

This would be a fun question because it would test the interviewee’s ability to organize his/her thoughts, select the relevant ones, and present them. In fact, it is checking the interviewee’s ability to teach, which is a required skill for someone like me (with a grey beard). You deliver the most value on a team not by pulling 80 hour weeks yourself, but by mentoring people, so that the 4 people on your team manage (together) a 160 hour week, instead of the 60-80 hours of useful progress they might have managed without your (theoretically) sage advice.

My answer would be the following:

There are three possibilities for how to make something faster. First, you can not do it at all. Not doing it is an infinite speedup over doing it. How? Caching, usually. And hey, caching is easy. As easy as naming.

Second, you can do it better, by choosing a better algorithm, a better representation, or by doing less work. I might retell a story Átila told me about a challenge in which the right answer could be found with less than half the data sorted (he wrote his own quicksort that let him find the answer before finishing the sort, thereby blowing other contestants out of the water; they all ran quicksort to completion, then looked for the answer in the sorted list). I also happen to have spotted a place in my prospective employer’s product where I could propose to change the representation, in order to reduce the compute power necessary to parse something. It takes an arrogant little prick to use the company’s own software as an example of what not to do, but depending on the rapport I have with the interviewer, I might do it.
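
Átila’s early-exit idea is in the same family as quickselect, which partitions the way quicksort does but only recurses into the side that can contain the answer. A sketch:

// quickselect returns the k-th smallest element of a (0-based),
// rearranging a as it goes but never fully sorting it.
func quickselect(a []int, k int) int {
	lo, hi := 0, len(a)-1
	for lo < hi {
		p := partition(a, lo, hi)
		switch {
		case k == p:
			return a[k]
		case k < p:
			hi = p - 1
		default:
			lo = p + 1
		}
	}
	return a[k]
}

// partition is the standard Lomuto partition around a[hi].
func partition(a []int, lo, hi int) int {
	pivot, i := a[hi], lo
	for j := lo; j < hi; j++ {
		if a[j] < pivot {
			a[i], a[j] = a[j], a[i]
			i++
		}
	}
	a[i], a[hi] = a[hi], a[i]
	return i
}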

Third, you can take the solution you have now, and optimize it. To do that, you measure, profile, tweak, and then measure again. I checked some code into the Go standard library that does that, and one of those would make a nice story.

Question 2: How would you add load shedding to the Go standard library’s HTTP server?

I thought of this question while reading chapter 22 of the new SRE book from Google and O’Reilly.

There are two layers where you could implement it. The “right” way to do it is to arrange that the server sends back a properly formatted 503 error via HTTP. Another way would be to accept and close the connection immediately. Which you choose depends on who you think your clients are, and how you think they will respond to the two kinds of errors.

The “accept and close” technique should be capable of more “!QPS” (non-answered queries per second) because it can theoretically handle each reject without spawning a new goroutine. Goroutines are cheap, but they imply at least one mutex (to add the goroutine into the scheduler), some memory pressure (for the stack), and some context switches. Implementing this would be a simple matter of wrapping the net.Listener with our own, which intercepts Accept to do “accept and close” when load shedding is needed.

I just went to read the source, and given the way Accept is called, it must return a valid connection and no error. So our load-shedding Accept needs to be a loop, in which we accept and close until the logic decides it is time to finally let a connection through (i.e. “if load shedding, then allow 1 in 100 connections” or whatever), and then we return with that connection.
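
A sketch of what that wrapper might look like; shedding() is a hypothetical policy hook standing in for the “1 in 100” logic:

type shedListener struct {
	net.Listener
	shedding func() bool // hypothetical policy hook
}

func (l shedListener) Accept() (net.Conn, error) {
	for {
		c, err := l.Listener.Accept()
		if err != nil {
			return nil, err
		}
		if !l.shedding() {
			return c, nil
		}
		// Reject cheaply: the connection is closed before the
		// server ever spawns a goroutine for it.
		c.Close()
	}
}

You would then start the server with something like srv.Serve(shedListener{ln, policy}).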

So how to implement the other one? What you want is an http.Handler wrapper. Except: what if libraries you don’t control are registering handlers? How can you be sure your load-shedding wrapper is always the first one called, so that it can send the 503 error? I thought I had a solution to this, but it turns out that http.ServeMux is not an interface, but a concrete type. So it is not possible to trap http.ServeMux.Handle and ensure that the load shedder is always on the front. This one is going to take some more thinking.
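
For the handlers you do control, though, the wrapper itself is simple. A sketch, assuming a hypothetical overloaded() policy function:

func shed(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if overloaded() {
			// A properly formatted 503, with no work done on the request.
			http.Error(w, "overloaded, try again later",
				http.StatusServiceUnavailable)
			return
		}
		h.ServeHTTP(w, r)
	})
}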

Of course there’s always monkey patching in Go, which could be used to arrange for HandleFunc and friends to always put a load shedder on the front of the call chain. But not in production. Please, God, not in production.

Question 3: A coding challenge I found on Glassdoor

This one is a legitimate coding challenge, and I don’t want to post either the question or the answer here, because that’s just not cool. But it did lead me to an interesting observation…

After a little workout in the coding dojo (I slayed the challenge!) I hit the showers. And in the shower, I had an idea: what if you could interpret the []byte that you get from go-fuzz as input to this code challenge? Then I could fuzz my routine, looking for ways to crash it. I quickly got it working, but ran up against the fundamental problem of fuzzing. When your input is random, for lots of problems, it is hard to check the output to know if it is the right output for the given input. You have one routine right there that could check it, but it is the routine-under-test, and thus using it would be cheating (and pointless). The bottom line, which I already knew, is that fuzzers can only find crashes, not incorrect behavior.
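
For what it’s worth, the glue is tiny. A sketch of the go-fuzz entry point, where solve is a hypothetical stand-in for the challenge routine:

// Fuzz is the entry point go-fuzz calls for each mutated input.
// Returning 1 tells go-fuzz the input was interesting (it parsed),
// 0 that it was not. solve is a stand-in for the routine-under-test.
func Fuzz(data []byte) int {
	if _, err := solve(data); err != nil {
		return 0
	}
	return 1
}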

git log --grep "Résumé"

For a while now, it’s become clear that a useful and important piece of data about how a future colleague might work out is their open source contributions. While the conditions of open source work are often somewhat different than paid work, a person’s manner of expressing themselves (both interpersonally, for example on issue trackers, and in code) is likely to tell you more about their personality than you can learn in the fake environment of an interview.

I am currently starting a job search for a full time job in the Lausanne, Switzerland area (including as far as Geneva and Neuchâtel) with a start date of September 1, 2016. Touching up my résumé is of course on my todo list. But it is more fun to look at and talk about code, so as a productive procrastination exercise, here’s a guided tour of my open source contributions.

I currently work with Vanadium, a new way of making secure RPC calls with interesting naming, delegation and network traversal properties. I believe wide-scale adoption of Vanadium would make IoT devices less broken, which seems like an important investment we need to be making in infrastructure right now.

I have been an occasional contributor to the Go standard library since 2011. Some examples I am particularly proud of include:

  • I profiled the bzip2 decoder and realized that the implementation was cache thrashing. The fixed code is smaller and faster.
  • I noticed that the HTTP implementation (both client and server) was generating more trash than was necessary, and prototyped several solutions. The chosen solution is in this checkin. This was, for a while, the most intensely exercised code I’ve written in my career; it saved one allocation per header for most headers processed by the billions of HTTP transactions that Go programs do all around the world. It was later replaced with a simpler implementation that was made possible by an optimization in the compiler. The same allocation-reduction technique was carried over into Go’s HTTP/2 implementation too, it appears.
  • I improved the TLS Client Authentication to actually work. I needed this for a project at work.

I wrote an MQTT server in Go in order to learn the protocol for work. It is by no means the fastest, but several people have found the simplicity and idiomaticity of its implementation attractive. It ships as part of the device software for the Ninja Sphere.

  • I found an interesting change to reduce network overhead in the MQTT serializer I used. Copies = good, more syscalls and more packets = bad.

I worked on control software for a drone in Go for a while. It still crashes instead of flying, but it was an interesting way to learn about control systems, soft realtime systems, websockets, and Go on ARM.

I have written articles about Go on this blog since I started using it in 2010. Some interesting ones in particular are:

I am also active on the Golang sub-reddit, where I like answering newbie requests for code reviews. I also posed a challenge for the Golang Challenge.

My earliest open source project that’s still interesting to look at today was a more scalable version of MRTG called Cricket. Today tools like OpenTSDB and Graphite are used for the same job, it seems. Cricket still works to the best of my knowledge, but development is dead because of lack of interest from users. Cricket is written in Perl.

My earliest open source project was Digger, from Bunyip. If you can unearth archeological evidence of it from the 1996 strata of the Internet geology, you’re a better digger than I am. It was written in C.

Seeking around in an HTTP object

Imagine there’s a giant tar file on an HTTP server, and you want to know what’s inside it. You don’t know if it’s got what you are looking for, and you don’t want to download the whole thing. Is it possible to do something like “tar tvf http://example.com/giant.tar”?

This is not a theoretical problem just to demonstrate something in Go. In fact, I wasn’t looking to write an article at all, except that I needed to know the structure of the bulk patent downloads from the USPTO, hosted at Google.

So, how to go about this task? We’ve got a couple things to check and then we can plan a way forward. First, do the Google servers support the Range header? That’s easy enough to check using curl and an HTTP HEAD request:

$ curl -I https://storage.googleapis.com/patents/redbook/grants/2015/I20150106.tar
HTTP/1.1 200 OK
X-GUploader-UploadID: AEnB2UrryncnFxyuHvzmUvVADjSRQJXRogilQ-lE9-pFZwJwkU5jRYC27PBdIgFe4f18Xgol-YOBxi5QqGM5yhGgso2Rd_NsZQ
Expires: Sun, 17 Jan 2016 07:52:14 GMT
Date: Sun, 17 Jan 2016 06:52:14 GMT
Cache-Control: public, max-age=3600
Last-Modified: Sat, 07 Feb 2015 11:40:47 GMT
ETag: "558992439ef9046c834f9dab709e6b1a"
x-goog-generation: 1423309247028000
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 2600867840
Content-Type: application/octet-stream
x-goog-hash: crc32c=GwtHNA==
x-goog-hash: md5=VYmSQ575BGyDT52rcJ5rGg==
x-goog-storage-class: STANDARD
Accept-Ranges: bytes
Content-Length: 2600867840
Server: UploadServer
Alternate-Protocol: 443:quic,p=1
Alt-Svc: quic=":443"; ma=604800; v="30,29,28,27,26,25"

Note the “Accept-Ranges” header in there, which says that we can send byte ranges to it. Range headers let you implement the HTTP equivalent of the operating system’s random-access reads (i.e. the io.ReaderAt interface).
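
To show the idea (this is a sketch, not the implementation from the package discussed below): an io.ReaderAt that turns each read into a ranged GET might look like this:

type httpReaderAt struct {
	url string
	c   *http.Client
}

func (r *httpReaderAt) ReadAt(p []byte, off int64) (int, error) {
	req, err := http.NewRequest("GET", r.url, nil)
	if err != nil {
		return 0, err
	}
	// Ask for exactly the bytes the caller wants (the range is inclusive).
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1))
	resp, err := r.c.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return 0, fmt.Errorf("server ignored Range header: %v", resp.Status)
	}
	return io.ReadFull(resp.Body, p)
}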

So it would theoretically be possible to pick and choose which bytes we download from the web server, in order to download only the parts of a tarfile that have the metadata (table of contents) in it.

As an aside: the fact that Google is serving raw tarfiles here, and not tar.gz files, is what makes this possible. A tar.gz file does not have metadata available at predictable places, because the location of file n’s metadata depends on how well or badly file n-1’s data was compressed. Why is Google serving tar files? Because the majority of the content of the files is pre-compressed, so compressing the tar file wouldn’t gain much. And maybe they were thinking about my exact situation… though that’s less likely.

Now we need an implementation of the TAR file format that will let us replace the “read next table of contents header” part with an implementation of read that reads only the metadata, using an HTTP GET with a Range header. And that is where Go’s archive/tar package comes in!

However, we immediately find a problem. The tar.NewReader function takes an io.Reader. The problem with the io.Reader is that it does not give us random access to the resource, the way io.ReaderAt does. It is implemented that way because it makes the tar package more adaptable. In particular, you can hook the Go tar package directly up to the compress/gzip package and read tar.gz files — as long as you are content to read them sequentially and not jump around in them, as we wish to.

So what to do? Use the source, Luke! Go dig into the Next method, and look around. That’s where we’d expect it to go find the next piece of metadata. Within a few lines, we find an intriguing function call, to skipUnread. And there, we find something very interesting:

// skipUnread skips any unread bytes in the existing file entry, as well as any alignment padding.
func (tr *Reader) skipUnread() {
    nr := tr.numBytes() + tr.pad // number of bytes to skip
    tr.curr, tr.pad = nil, 0
    if sr, ok := tr.r.(io.Seeker); ok {
        if _, err := sr.Seek(nr, os.SEEK_CUR); err == nil {
            return
        }
    }
    _, tr.err = io.CopyN(ioutil.Discard, tr.r, nr)
}

The type assertion in there says, “if the io.Reader is actually capable of seeking as well, then instead of reading and discarding, we seek directly to the right place”. Eureka! We just need to send an io.Reader into tar.NewReader that also satisfies io.Seeker (thus, it is an io.ReadSeeker).

So, now go check out package github.com/jeffallen/seekinghttp and see how we accomplish that.

This package implements not only io.ReadSeeker, but also io.ReaderAt. Why? Because after investigating a remote tarfile, I realized I also wanted to look at a remote zip file, and to do that I needed to call zip.NewReader, which takes both an io.ReaderAt and the size of the file. Why? Because the zip file format has its table of contents at the end of the file, the first thing it does is seek to the end of the file to find the TOC.
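
Tying it together with the earlier sketch: io.NewSectionReader adapts an io.ReaderAt into an io.ReadSeeker, which is exactly what tar.NewReader’s seeking fast path wants. Here, tarURL and size are assumed to come from the URL and Content-Length we saw above:

ra := &httpReaderAt{url: tarURL, c: http.DefaultClient}
tr := tar.NewReader(io.NewSectionReader(ra, 0, size))
for {
	hdr, err := tr.Next()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	// Print the table of contents without downloading file bodies.
	fmt.Println(hdr.Name, hdr.Size)
}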

A command-line tool to get the table of contents of tar and zip files remotely is in remote-archive-ls.

Dynamic DNS circa 2016

In the old days, if you had an ISP that changed your IP address all the time but you wanted to run a server, you used dynamic DNS, i.e. a hacky script talking to a hacky API on a hacky DNS provider.

These days, if you bring up a cloud server from time to time to work, it is likely to get a different IP address. But you might want a DNS record pointing at it so that it is convenient to talk to.

Same problem, different century.

Here’s my solution for a GCE server and a domain fronted by CloudFlare.

It has nella.org hard coded in it, so YMWDV (your mileage will definitely vary).

The most important thing when go-fuzzing

The most important thing to know, when you are using go-fuzz, is that the cover metric should be increasing.

I didn’t know that and I wasted one 12 hour run of fuzzing because my fuzzing function was misbehaving in a way that made it return the same useless error for every input no matter what. That meant that no matter what go-fuzz mutated in the input, it could not find a way to explore more code, and could not find any interesting bugs. It was trying to tell me this by not incrementing the cover metric it was reporting.

Don’t do what I did: make sure that cover is going up before you walk away and leave go-fuzz to spend hours and hours wasting time.

Doing it the hard way

In my last post I offered to point out some things in Golang Challenge #2 submissions that struck me as “worthy of receiving a (polite) rebuke in code review”, otherwise known as WTFs.

This is opt-in abuse. I don’t mind abusing my colleagues, when I know I can take them out for lunch later and buy them a beer. Hassling random Golang Challenge entrants is not my style. But some have decided they are up for it, even if I’m remote and can’t buy them a beer afterwards.

io.Toys

First up is Jason Clark. Jason’s entry is quite good, from an idiomatic Go point of view. He fell prey to the “under-testing” problem I mentioned, in that his entry does not handle message framing, or what happens if the buffer passed to Read is smaller than the incoming message. But what I’m really looking for, for the purposes of this post, are the WTFs. And seeing io.TeeReader tipped me off that something interesting was about to happen:

// Write encrypts the contents of p and writes it to the underlying io.Writer.
func (w *SecureWriter) Write(p []byte) (int, error) {
	// Each message starts with a generated nonce. Only write the nonce once.
	if w.nonce == nil {
		var nonce [24]byte
		if _, err := io.ReadFull(io.TeeReader(rand.Reader, w.w), nonce[:]); err != nil {
			return 0, ErrNonceWrite
		}
		w.nonce = &nonce
	}

	// encrypt and send the message
	_, err := w.w.Write(box.SealAfterPrecomputation(nil, p, w.nonce, w.key))
	if err != nil {
		return 0, err
	}
	return len(p), nil
}

So, the first problem here is that this is just wrong. The docs for NaCl are clear that security is compromised if you use the same nonce for two different messages. But second, the use of io.TeeReader here is overly clever. I didn’t even know what io.TeeReader did, but the docs are clear enough: “TeeReader returns a Reader that writes to w what it reads from r”. So what that line does is get the nonce from crypto/rand, and send the nonce down the unencrypted channel (correct!) while also saving it into nonce, to be kept for later as w.nonce (not correct!). To reason about that line, a reader needs to keep two unrelated processes in mind at once (saving the nonce for later, and sending it down now), and then ask themselves whether the error handling is correct in both of those cases.

So, if ErrNonceWrite is returned, was a nonce sent down the connection anyway? What was it?

The simpler way of doing this is in two steps. First, read into a local called nonce, and check the error from the read. Once you are sure you have a good nonce, send it down with w.Write(). Checking errors correctly here is critical, because if you don’t manage to get a new nonce, you must not send the message, since you might be sending it with a nonce of all zeros. The second time you’ve done that, you’ve compromised your security.
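
Based on Jason’s snippet, the two-step version might look like this:

// Each message starts with a generated nonce. Only write the nonce once.
if w.nonce == nil {
	var nonce [24]byte
	if _, err := io.ReadFull(rand.Reader, nonce[:]); err != nil {
		// No nonce means no message: send nothing at all.
		return 0, ErrNonceWrite
	}
	if _, err := w.w.Write(nonce[:]); err != nil {
		return 0, err
	}
	w.nonce = &nonce
}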

There’s nothing wrong with using the nifty toys in the io package. And in fact, later in his submission, Jason uses io.MultiWriter in an interesting way. I can’t say that I’m personally 100% confident in the technique of using io.Copy to implement an echo server. I understand the theory, but it seems to me you become dependent on the right magic happening inside of io.Copy, instead of making sure that the logic of your program is front and center, where your reader can see it.

Always remember, code is for reading. When you work on a big system, with lots of coworkers, they will read and re-read your code for months and years after you’ve written it. Code is written once and read a thousand times. Make sure what’s there to read explains what you are doing and what you are depending on.

And now for a random global problem…

Trevor Arjeski got in touch and asked if he had any WTFs. By chance, the first file I clicked on was secure_io.go, and within the first few lines I knew, “we have a winner!”

In a crypto system, the quality of your random numbers matters. This has been known by professional cryptographers for decades, and to the rest of us since 1996. So when I saw that Trevor was importing “math/rand” instead of “crypto/rand”, I knew that he had a big problem coming.

Worse, I read down a few lines and saw the global “seeded”. The problem with this global is that access to it is not protected from a data race. There are three options for how to fix this:

  • Nuke it: This problem calls for using the “crypto/rand” package, which does not need seeding by the Go programmer.
  • Move it to init: Lazy initialization, controlled by a global flag can sometimes be more simply moved to explicit initialization, before main starts running. Go guarantees that init functions are run in a single-threaded context (the spec says: “Package initialization—variable initialization and the invocation of init functions—happens in a single goroutine, sequentially, one package at a time.”)
  • Use sync.Once to arrange for something to be lazily initialized once and only once (a sketch follows).
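
For completeness, a sketch of the third option, although for this particular code the first option is the real fix:

var seedOnce sync.Once

// ensureSeeded lazily seeds math/rand exactly once, even when called
// from many goroutines at the same time.
func ensureSeeded() {
	seedOnce.Do(func() {
		rand.Seed(time.Now().UnixNano())
	})
}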

I’ve said before that good testing matters. But to catch this data race, Trevor would have needed a test that concurrently creates multiple SecureWriters and writes into them, and he would have had to run it with “go test -race”. It is wonderful that Go has the race detector, but it is better to never write data races to begin with. And a good way to not write them is to not allocate any shared data.

Get in the habit of asking hard questions about every global you see: when is it initialized? Are there writes to it? How are they sequenced with respect to reads?

Want some practice? Move on to lines 14 and 15… 🙂

Addicted to encoding/binary?

Kristoffer Berdal got in touch and asked me to take a look. Kristoffer made a choice for readability that would eventually be a performance problem, if he wanted to benchmark and tune his implementation.

In his Writer we see a nice symmetrical list of calls to binary.Write. Reading through it, it makes the exact format of the protocol completely clear. So score points for self-documenting code!

But there’s a cost: check out the signature of binary.Write: func Write(w io.Writer, order ByteOrder, data interface{}) error

That means that each time we call into binary.Write, the third argument is converted from its fundamental type into an interface{}. This may involve an allocation, and once control gets inside of binary.Write, it will end up in the “fall back onto reflection” code path, because the nonce and (usually) the message are not less than 8 bytes long. Reflection is powerful and magical, and a strong point of Go. But it is not fast, when the alternative is not using reflection.

A better choice for writing the nonce and the message, which are simple []byte types, would have been to use the Write method of the underlying io.Writer. (Also, this saves the reader from puzzling over what the meaning of the little endian representation of a [24]byte is…)
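
Concretely, adapted from the call in question:

// Instead of:
//	binary.Write(w.w, binary.LittleEndian, nonce[:])
// write the bytes directly, with no interface{} or reflection involved:
if _, err := w.w.Write(nonce[:]); err != nil {
	return 0, err
}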

One thing that’s interesting about Go is that it can be highly friendly and easy like Python, or highly precise and speedy like C. Choosing when to do one or the other (or finding a way to do both) is the trick that requires experience. It is always important to first get it right, then get it clear, then measure, and only if measurement says to do so, make it fast. In a case like this, Kristoffer has written it right, and clearly. But by turning to a function that uses interface{} where there was another option, he left some performance on the table. Worse, he might have left behind a trap that a colleague will need to deal with later, when the prototype goes to production and starts handling gigabits of data per second.

Thanks

Thanks to those who gave me permission to pick on them. I hope that they’ve enjoyed the 15 minutes of fame, and that we’ve all learned something useful.