A simpler way to embed data

In my post about how to efficiently put data into a Go binary, I mentioned that strings are immutable, and can be accessed without causing the Go runtime to copy them. This turns out to be the key to a simpler way to achieve what I wanted to do.

By simpler I mean, “no cgo”. That’s a nice simplification, because up until recently, your final static binary image linked to the cgo code dynamically, and that made using my technique impossible in the context of the Tiny runtime, where there is no dynamic linker. Recently cgo has changed, but at the same time, I’ve discovered how to use native strings to do what I want, so let’s see how it works.

I shied away from strings at first because I understood them to be “Unicode strings”, and thus not able to hold arbitrary bytes (i.e. bytes that do not form a valid Unicode rune). That’s not true at all. In Go, the string type is in some ways an alias for “an immutable array of 8-bit octets”, i.e. a [...]byte. True, many of the built-in functions that operate on strings expect the contents to be valid UTF-8, and might malfunction if you give them random bytes. But there’s nothing to keep you from putting bad UTF-8 in and then never using the functions that expect good UTF-8.
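As a quick check of that claim, here is a minimal sketch, written against a current Go toolchain rather than the one this post was originally developed with. The constant raw and the helper describe are names made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// raw holds four arbitrary bytes. 0xff and 0xfe never occur in valid
// UTF-8, so this string is deliberately "bad" text.
const raw = "\xff\xfe\x00\x01"

// describe (a hypothetical helper) reports the byte length of s, its
// first byte, and whether its contents are valid UTF-8.
// It assumes s is non-empty.
func describe(s string) (n int, first byte, valid bool) {
	return len(s), s[0], utf8.ValidString(s)
}

func main() {
	n, first, valid := describe(raw)
	fmt.Println(n)             // 4: len counts bytes, not runes
	fmt.Println(first == 0xff) // true: indexing yields the raw byte
	fmt.Println(valid)         // false: the contents are not UTF-8
}
```

The string stores and returns the bytes unchanged; only UTF-8-aware functions care about what they mean.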

So, that’s the plan. We put our individual bytes into the string, then we do brain surgery to turn them into a []byte:

package main

import (
        "unsafe"
        "reflect"
)

var empty [0]byte
var str1 string = "the string"
var str2 string = "another string"

func fix(s string) (b []byte) {
        sx := (*reflect.StringHeader)(unsafe.Pointer(&s))
        b = empty[:]
        bx := (*reflect.SliceHeader)(unsafe.Pointer(&b))
        bx.Data = sx.Data
        bx.Len = len(s)
        bx.Cap = len(s)
        return
}

func main() {
        b := fix(str1)
        println(b[0])
        b = fix(str2)
        println(b[0])
        b[0] = 'x'              // crash: write to ro segment
}

When you read the assembly of that program, there’s not a memcpy to be seen. The []byte you get points directly at the original bytes. You can also confirm this by comparing the Data field of the string’s header (sx.Data in fix) with the address of b[0]: both refer to the same byte in memory. (Go won’t let you take the address of str1[0] directly.)
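The address comparison can be sketched as follows. This is an illustration for a current Go toolchain, where reflect.StringHeader and reflect.SliceHeader still compile but are marked deprecated. The helper sharesStorage is a hypothetical name, and fix is restated from the program above so that the example is self-contained.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

var str1 = "the string"

// fix is the conversion from the program above: it points the slice
// header of b at the string's bytes, without copying them.
func fix(s string) (b []byte) {
	sx := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bx := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	bx.Data = sx.Data
	bx.Len = len(s)
	bx.Cap = len(s)
	return
}

// sharesStorage (a hypothetical helper) reports whether b starts at
// the same address as the bytes of s.
func sharesStorage(s string, b []byte) bool {
	if len(s) == 0 || len(b) == 0 {
		return false
	}
	sx := (*reflect.StringHeader)(unsafe.Pointer(&s))
	return sx.Data == uintptr(unsafe.Pointer(&b[0]))
}

func main() {
	fmt.Println(sharesStorage(str1, fix(str1))) // true: same bytes in memory
}
```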

The last line of main, the write to b[0], shows why Go goes to so much trouble to prevent me from doing this: the memory now underlying my []byte is read-only. At link time, the linker put the string data into a read-only segment, so when I write to it, I get this (the equivalent of a segfault in Go):

unexpected fault address 0x80640f8
throw: fault

panic PC=0xf765b048
runtime.throw+0x3e /home/jeffall/go/src/pkg/runtime/runtime.c:73
	runtime.throw(0x80a3916, 0x80640f8)
runtime.sigpanic+0xc7 /home/jeffall/go/src/pkg/runtime/linux/thread.c:288
	runtime.sigpanic()
main.main+0xd8 /home/jeffall/go-stuff/str.go:27
	main.main()
runtime.mainstart+0xf /home/jeffall/go/src/pkg/runtime/386/asm.s:84
	runtime.mainstart()
runtime.goexit /home/jeffall/go/src/pkg/runtime/proc.c:148
	runtime.goexit()
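For contrast, here is a sketch of the safe route for a caller who really does need to modify the bytes: copy them out of the string first. That copy is exactly the cost the technique above is trying to avoid, so this is not a replacement for it. The helper name writableCopy is made up for this example.

```go
package main

import "fmt"

var str1 = "the string"

// writableCopy (a hypothetical helper) returns a freshly allocated
// copy of the bytes of s. The copy is ordinary writable memory.
func writableCopy(s string) []byte {
	b := make([]byte, len(s))
	copy(b, s)
	return b
}

func main() {
	b := writableCopy(str1)
	b[0] = 'x'             // fine: b has its own storage
	fmt.Println(string(b)) // xhe string
	fmt.Println(str1)      // the string
}
```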

Working on this has made me ask myself a few times, “why am I so intent on turning read-only memory into a []byte, thereby corrupting Go’s type safety?” I’m still grappling with that; stay tuned. (One reason is that this whole idea came from working in the Tiny Go environment, where there’s currently almost no memory protection anyway. But that’s a dumb reason: if the non-existent OS can’t save you from yourself, you certainly should NOT stop the compiler from saving you!) Maybe there’s a third version coming that manages to stay type safe and still do what I want. I suspect it will involve changing the interface of my filesystem object to keep the string itself internal and expose only a method that returns an io.Reader.
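One possible shape for that type-safe version, as a rough sketch rather than a finished design: the type roFS, its files map, and the helper readAll are all assumptions made for illustration, and the code targets a current Go toolchain. strings.NewReader reads straight from the string, so nothing needs to be converted to a []byte up front; the only copying happens into the buffer the caller supplies to Read.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// roFS is a hypothetical read-only file system. The embedded data
// stays inside as immutable strings and is never exposed as []byte.
type roFS struct {
	files map[string]string
}

// Open returns a reader over the named file's contents.
func (fs *roFS) Open(name string) (io.Reader, error) {
	data, ok := fs.files[name]
	if !ok {
		return nil, fmt.Errorf("%s: file not found", name)
	}
	return strings.NewReader(data), nil
}

// readAll (a hypothetical helper) drains a file into a string.
func readAll(fs *roFS, name string) (string, error) {
	r, err := fs.Open(name)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if _, err := io.Copy(&sb, r); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	fs := &roFS{files: map[string]string{"/hello.txt": "hello, world\n"}}
	s, err := readAll(fs, "/hello.txt")
	fmt.Printf("%q %v\n", s, err) // "hello, world\n" <nil>
}
```

Callers can read the data but have no way to write through to the read-only segment, so the compiler's guarantees stay intact.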

3 thoughts on “A simpler way to embed data”

  1. i liked the idea of this, and i wondered how simple a simple read-only filesystem could be made, given that the gob package can automate quite a bit.

    here’s some code that does this. about 200 lines. is this the kind of thing you’re thinking of?

    package main
    
    import (
            "bytes"
            "gob"
            "encoding/binary"
            "os"
            "strings"
            "fmt"
            "log"
            "io"
            "sync"
    )
    
    func main() {
            s, err := Encode(os.Args[1])
            if err != nil {
                    fmt.Printf("error: %v\n", err)
                    return
            }
            fmt.Printf("encoded: %d bytes\n", len(s))
    
            fs, err := Decode(s)
            if err != nil {
                    log.Exitf("decode: %v\n", err)
                    return
            }
            fmt.Printf("all:\n")
            show(fs, "/")
    }
    
    func show(fs *FS, path string) {
            f, err := fs.Open(path)
            if err != nil {
                    log.Printf("cannot open %s: %v\n", path, err)
                    return
            }
            if f.IsDirectory() {
                    fmt.Printf("d %s\n", path)
                    names, err := f.Readdirnames()
                    if err != nil {
                            log.Printf("cannot get contents of %s: %v\n", path, err)
                            return
                    }
                    for _, name := range names {
                            show(fs, path+"/"+name)
                    }
            } else {
                    fmt.Printf("- %s\n", path)
                    n, err := io.Copy(nullWriter{}, f)
                    if err != nil {
                            log.Printf("cannot read %s: %v\n", path, err)
                            return
                    }
                    fmt.Printf("        %d bytes\n", n)
            }
    }
    
    type nullWriter struct {}
    func (nullWriter) Write(data []byte) (int, os.Error) {
            return len(data), nil
    }
    
    // fsWriter represents a file system while it's being encoded.
    // The gob Encoder writes to the bytes.Buffer.
    type fsWriter struct {
            buf bytes.Buffer
            enc *gob.Encoder
    }
    
    // entry is the basic file system structure - it holds
    // information on one directory entry.
    type entry struct {
            name   string // name of entry.
            offset int    // start of information for this entry.
            dir    bool   // is it a directory?
            len    int    // length of file (only if it's a file)
    }
    
    // FS represents the file system and all its data.
    type FS struct {
            mu sync.Mutex
            s string
            root uint32
            dec *gob.Decoder
            rd strings.Reader
    }
    
    // A File represents an entry in the file system.
    type File struct {
            fs *FS
            rd strings.Reader
            entry *entry
    }
    
    // Encode recursively reads the directory at path
    // and encodes it into a read only file system
    // that can later be read with Decode.
    func Encode(path string) (string, os.Error) {
            fs := &fsWriter{}
            fs.enc = gob.NewEncoder(&fs.buf)
            // make sure entry type is encoded first.
            fs.enc.Encode([]entry{})
    
            e, err := fs.write(path)
            if err != nil {
                    return "", err
            }
            if !e.dir {
                    return "", os.ErrorString("root must be a directory")
            }
            binary.Write(&fs.buf, binary.LittleEndian, uint32(e.offset))
            return string(fs.buf.Bytes()), nil
    }
    
    // write writes path and all its contents to the file system.
    func (fs *fsWriter) write(path string) (*entry, os.Error) {
            f, err := os.Open(path, os.O_RDONLY, 0)
            if err != nil {
                    return nil, err
            }
            defer f.Close()
            info, err := f.Stat()
            if info == nil {
                    return nil, err
            }
            if info.IsDirectory() {
                    names, err := f.Readdirnames(-1)
                    if err != nil {
                            return nil, err
                    }
                    entries := make([]entry, len(names))
                    for i, name := range names {
                            ent, err := fs.write(path+"/"+name)
                            if err != nil {
                                    return nil, err
                            }
                            ent.name = name
                            entries[i] = *ent
                    }
                    off := len(fs.buf.Bytes())
                    fs.enc.Encode(entries)
                    return &entry{offset: off, dir: true}, nil
            }
            off := len(fs.buf.Bytes())
            buf := make([]byte, 8192)
            tot := 0
            for {
                    n, _ := f.Read(buf)
                    if n == 0 {
                            break
                    }
                    fs.buf.Write(buf[0:n])
                    tot += n
            }
            return &entry{offset: off, dir: false, len: tot}, nil
    }
    
    // Decode converts a file system as encoded by Encode
    // into an FS.
    func Decode(s string) (*FS, os.Error) {
            fs := new(FS)
            r := strings.NewReader(s[len(s)-4:])
            if err := binary.Read(r, binary.LittleEndian, &fs.root); err != nil {
                    return nil, err
            }
            fs.s = s[0:len(s)-4]
            fs.dec = gob.NewDecoder(&fs.rd)
    
            // read dummy entry at start to prime the gob types.
            fs.rd = strings.Reader(fs.s)
            if err := fs.dec.Decode(new([]entry)); err != nil {
                    return nil, err
            }
    
            return fs, nil
    }
    
    func isSlash(c int) bool {
            return c == '/'
    }
    
    // Open opens the named path within fs.
    // Paths are slash-separated, with an optional
    // slash prefix.
    func (fs *FS) Open(path string) (*File, os.Error) {
            p := strings.FieldsFunc(path, isSlash)
            e := &entry{dir: true, offset: int(fs.root)}
    
            fs.mu.Lock()
            defer fs.mu.Unlock()
            for _, name := range p {
                    var err os.Error
                    e, err = fs.walk(e, name)
                    if err != nil {
                            return nil, err
                    }
            }
            if e.dir {
                    return &File{fs, "", e}, nil
            }
            return &File{
                    fs,
                    strings.Reader(fs.s[e.offset: e.offset+e.len]),
                    e,
            }, nil
    }
    
    func (fs *FS) walk(e *entry, name string) (*entry, os.Error) {
            if !e.dir {
                    return nil, os.ErrorString("not a directory")
            }
            contents, err := fs.contents(e)
            if err != nil {
                    return nil, err
            }
            for i := range contents {
                    if contents[i].name == name {
                            return &contents[i], nil
                    }
            }
            return nil, os.ErrorString("file not found")
    }
    
    // IsDirectory returns true if the file represents a directory.
    func (f *File) IsDirectory() bool {
            return f.entry.dir
    }
    
    // Read reads from a file. It is invalid to call it on a directory.
    func (f *File) Read(buf []byte) (int, os.Error) {
            if f.entry.dir {
                    return 0, os.ErrorString("cannot read a directory")
            }
            return f.rd.Read(buf)
    }
    
    // contents returns all the entries inside a directory.
    func (fs *FS) contents(e *entry) (entries []entry, err os.Error) {
            if !e.dir {
                    return nil, os.ErrorString("not a directory")
            }
            fs.rd = strings.Reader(fs.s[e.offset:])
            err = fs.dec.Decode(&entries)
            return
    }
    
    // Readdirnames returns the names of all the files in
    // the File, which must be a directory.
    func (f *File) Readdirnames() ([]string, os.Error) {
            f.fs.mu.Lock()
            defer f.fs.mu.Unlock()
            entries, err := f.fs.contents(f.entry)
            if err != nil {
                    return nil, err
            }
            names := make([]string, len(entries))
            for i, e := range entries {
                    names[i] = e.name
            }
            return names, nil
    }
    
