Fat Constants, Thin Constants

I play from time to time with a patch for Go that makes the Tiny runtime more capable. My current goal is to get a new backend for exp/draw working which writes to the SVGA screen. It would be cool to be able to decode the Go mascot and have him flying around the screen or something.

In the Tiny runtime, there’s no OS, so there’s no disk drivers, so there’s no filesystem, so there’s no files. Which makes decoding a PNG and showing it kind of hard. But if the program carries along the data with it, in the form of a []byte, then you could use bytes.NewReader to turn it into an io.Reader and then pass it to image/png.Decode. So that’s what I set out to do.

I got surprised by something, and that’s what I’m writing about today. Here’s the code that surprised me:

package main
func main() {
  println([...]byte{'f', 'o', 'o'}[:])
}

The resulting assembly is:

...
8048c26 8b442404             | (4)  MOVL  4(SP),AX
8048c2a 89c3                 | (4)  MOVL  AX,BX
8048c2c 83c300               | (4)  ADDL  $0,BX
// 102 = 'f'
8048c2f c60366               | (4)  MOVB  $102,(BX)
// put it in the array
8048c32 89c3                 | (4)  MOVL  AX,BX
8048c34 43                   | (4)  INCL  ,BX
// increment and do it again...
8048c35 c6036f               | (4)  MOVB  $111,(BX)
8048c38 89c3                 | (4)  MOVL  AX,BX
8048c3a 83c302               | (4)  ADDL  $2,BX
8048c3d c6036f               | (4)  MOVB  $111,(BX)
8048c40 890424               | (4)  MOVL  AX,(SP)
8048c43 c744240403000000     | (4)  MOVL  $3,4(SP)
8048c4b c744240800000000     | (4)  MOVL  $0,8(SP)
8048c53 c744240c00000000     | (4)  MOVL  $0,12(SP)
8048c5b c744241000000000     | (4)  MOVL  $0,16(SP)
8048c63 c744241403000000     | (4)  MOVL  $3,20(SP)
8048c6b c744241800000000     | (4)  MOVL  $0,24(SP)
8048c73 c744241c01000000     | (4)  MOVL  $1,28(SP)
8048c7b c744242000000000     | (4)  MOVL  $0,32(SP)
8048c83 e89d840000           | (4)  CALL  ,8051125+runtime.slicearray

This is a pretty inefficient way to build a constant! I wouldn’t even have noticed, except that when I changed the simple “foo” above into the actual PNG I wanted to include in my program, 8g took so much memory the out of memory killer woke up and killed my Firefox! And we were talking about a little 80 kb PNG! The resulting ELF binary was 1.3 megs!

The syntax I used, namely “make an array then take a slice of it” turns out to be excessively complicated. The routine I was using expected a slice, so that’s how I ended up tacking the [:] on the end. Word to the wise: when the compiler is complaining about wanting a slice instead of an array, don’t just add the [:] on the end, give the compiler a chance to make the slice itself:

package main
func main() {
  // there are 57 0's here:
  println([]byte{ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, })
}

Here’s the resulting assembly:

8048c17 83ec2c               | (3)  SUBL  $44,SP
// allocate a new array that's 57 bytes long
8048c1a c7042439000000       | (4)  MOVL  $57,(SP)
8048c21 e85d1d0000           | (4)  CALL  ,804a983+runtime.new
8048c26 8b542404             | (4)  MOVL  4(SP),DX
// load the data from this place in the program's image (my zeros are there; the file would be later)
8048c2a 8d35dc590508         | (4)  LEAL  main.statictmp_0001+0(SB),SI
8048c30 89d7                 | (4)  MOVL  DX,DI
8048c32 fc                   | (4)  CLD ,
// with a loop count of 14
8048c33 b90e000000           | (4)  MOVL  $14,CX
// repeat
8048c38 f3                   | (4)  REP ,
// a move long
8048c39 a5                   | (4)  MOVSL ,
// and then one more byte (4 * 14 + 1 = 57)
8048c3a a4                   | (4)  MOVSB ,
// slice cap = 57
8048c3b c744242439000000     | (4)  MOVL  $57,main.autotmp_0000+36(SP)
// slice len = 57
8048c43 c744242839000000     | (4)  MOVL  $57,main.autotmp_0000+40(SP)
// underlying array = the new array
8048c4b 89542420             | (4)  MOVL  DX,main.autotmp_0000+32(SP)

OK, so this is closer to Not Totally Insane, which is what we want. But why? I suppose this is either a bug or a very deep part of the compiler. In the first program, the compiler thinks it needs to compute and put into the array every item. In the second program, the compiler understands that every item is unchanging and can live in the program text.

Though it is no longer assembling the array at runtime byte by byte, it is still calling runtime.new and then making a copy from the program text into the new array. Can we do better? If we move the array out of the function call and give it a local variable, it doesn’t change anything, it is still copied (not shown, check for yourself if you want). If we move the initialization of the array out to the global scope, then the Go compiler moves it into main.init() for us, doing the copy just the same.

So, can we do it without the copy at all? Yes! But it requires unsafe, slice brain surgery, and Cgo. Hang on tight…

A slice is a data structure on top of an underlying zone of memory. The definition of a slice from reflect.SliceHeader is:

type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}

So what we want to do is get a pointer to a place in memory where our contiguous bytes are stored, then set the Data pointer of the slice to that pointer (as well as making sure Len and Cap are correct). Mucking around with pointers like this can get you in trouble: if the garbage collector decides that the only reference it has to the thing at slice.Data, and it is ready to garbage collect your slice, then it’s going to try to garbage collect your underlying array as well. If that array came from someplace other than the Go heap, you’re going to corrupt something. But it is relatively easy to prevent a slice from ever getting GC’d; all you have to do is hold it in a map at global scope. For my application (a set of byte arrays representing contents of files), I already want to hold them in a map with keys equal to the filename and values equal to the byte slice.

I said we’d need Cgo too. But why? The answer is that if we let the Go compiler put the bytes away into the program text, it will know where they are, but we won’t. We can’t reference a symbol like “main.statictmp_0001″ to find the bytes themselves. And no matter what we do when we are trying to find out where the bytes are stored, Go copies them into memory alloced with new. Why? Probably because it needs to guarantee that the array will be writeable later, since it is not marked as immutable as strings are (the fact that strings are immutable means there might be another way to achieve our goal; we’ll have to come back to that in a bit) it can’t leave the bytes pointing to the read-only memory where the program text is stored.

So we need Cgo not because we want to call into a C subroutine, but because we want to have C put the bytes into the read-only segment of the program text with a symbol on them that we know, and can pass to unsafe.Pointer when we are doing our slice brain surgery.

Here is the resulting test program:

package main

/*
unsigned char const data__tmp_test_zip[] = { 0, 1, 2 };
*/
import "C"

import (
	"unsafe"
	"reflect"
)

var FileMap map[string][]byte

func main() {
	FileMap = make(map[string][]byte)

	s1 := C.data__tmp_test_zip[:]
	s1x := (*reflect.SliceHeader)(unsafe.Pointer(&s1))

	var buf [1]byte
	s2 := buf[:]
	s2x := (*reflect.SliceHeader)(unsafe.Pointer(&s2))
	s2x.Data = s1x.Data
	s2x.Len = s1x.Len
	s2x.Cap = s1x.Cap

	FileMap["/tmp/test.zip"] = s2
        println(FileMap["/tmp/test.zip"])
}

In this case, we are trying to store the three bytes 0, 1, 2. We do that inside the /* */ block right before the import "C". The slice brain surgery happens in main(). C.data__tmp_test_zip is type [3]C.uchar. To get a slice on top of it, we just ask for C.data__tmp_test_zip[:] and put it in s1. We use reflect to dig inside the slice and find the pointer to the underlying bytes.

(Thanks to Gustavo, who showed me how to do this with his mmap module.)

The next part, with s2 and s2x, is a second piece of brain surgery. The underlying array is a [3]C.uchar. But we want an API based on []byte, because that way we can pass them to things like bytes.NewReader to get an io.Reader that we can pass onwards to things like image/png.Decode. To do that, we make a new slice with an underlying [1]byte array, then we swap out the pointer to point to the array of bytes the C made for us. Voila, a []byte that points to the contents of our file, without a copy.

Leave a Reply