Where's all the magic? In the linker...

I have been trying to make a post per week about Go, but that requires learning something interesting during the week. I’m currently cycling between several little Go toys as I get the time. One is to make Go on raw hardware more useful/interesting. Another is a clone of the console server from conserver.com, written in Go. Neither of those little projects is at a point where I can really explain much about it, but not for want of trying… This week my Go console server project taught me that netchan cannot send channels (OK, I wasn’t really shocked at this, but I was hoping that it might work), so I’ll need to make my own protocol and put a proxy channel reader/writer at each end of the TCP connection to move the data into the channel the way I want. On the raw-hardware side, I got stuck on code that comes out incorrect once it lands in the ELF file. Go figure.

So though I don’t have the final answer yet, even what I know so far is at least a bit interesting, and that will be the topic of this week’s post. (Hopefully next week’s will be “I GOT IT! Here’s the solution…”, but don’t hold your breath.)

When you set GOOS to “tiny”, and then rebuild the runtime, you get a runtime system that knows how to run Go code on raw hardware with no underlying OS (see $GOROOT/src/pkg/runtime/tiny/README for some more info). Of course, if you try to compile something that reads a file, it will fail: there’s no kernel, let alone a filesystem. But you can write to the screen, which is nifty. I thought it would be cool to add some more stuff to Go’s tiny runtime, for example a way to receive hardware interrupts over a channel. What’s really interesting about Go on raw hardware is that Go already provides a few of the things you need to write an OS (threads, scheduling, bounded arrays – a form of memory protection). I’m not proposing to write “the Go OS”. But I think it would be interesting to be able to run some kinds of servers on raw hardware; if you don’t need/want an OS under you, why should you have to have one?

So far, I’ve got a whole heap of code to compile when GOOS=tiny, including exp/draw and all its dependencies. My current goal is to put a white rectangle onto the VGA screen using exp/draw and a new backend that I’m calling exp/draw/svga.go. But something is going wrong: I’m getting a page fault from running off into bad memory. To debug that, it’s time to set up proper trap handlers. A nifty way to do that, borrowed from Plan 9, looks like this:

TEXT  intr0(SB),7,$0
  PUSHL $0
  PUSHL $0
  JMP intrcommon
TEXT  intr1(SB),7,$0
  PUSHL $0
  PUSHL $1
  JMP intrcommon
...
intrcommon:
  PUSHL DS
  PUSHL ES
  PUSHL FS
  PUSHL GS
  PUSHAL
  ...
  IRETL

What, you don’t know assembly? Please, I think you can figure out what’s happening here, but I’ll help anyway. For each type of interrupt, we emit a bit of code that sets up the stack right, then uses a common piece of code. This is something you can do in assembly that you can’t do in Go: jumping from inside of one routine to a label inside of another one. Also, you can’t futz with specific registers and the hardware stack, which is what you have to do in an interrupt service routine, and which explains why this code is assembler and not Go code.
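Since every stub has the same shape, nobody has to type them out by hand. Here is a minimal sketch in Go of a generator that prints the stubs, assuming each vector follows exactly the pattern in the listing above (push a zero, push the vector number, jump to intrcommon). The name genStubs is made up for illustration, and the count of 3 is arbitrary.

```go
package main

import (
	"fmt"
	"strings"
)

// genStubs returns Plan 9-style assembly text for interrupt vectors 0..n-1.
// Each stub follows the pattern in the listing above: push a zero, push the
// vector number, then jump to the shared intrcommon code.
// genStubs is a hypothetical helper, not part of any Go tool.
func genStubs(n int) string {
	var b strings.Builder
	for v := 0; v < n; v++ {
		fmt.Fprintf(&b, "TEXT  intr%d(SB),7,$0\n", v)
		b.WriteString("  PUSHL $0\n")
		fmt.Fprintf(&b, "  PUSHL $%d\n", v)
		b.WriteString("  JMP intrcommon\n")
	}
	return b.String()
}

func main() {
	// Print stubs for the first three vectors; paste the result into the .s file.
	fmt.Print(genStubs(3))
}
```

The generator only saves typing. Whatever 8a and 8l do with a hand-written stub, they will do with a generated one too.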

When you run the interrupt stubs in Bochs and single-step them, you find that what’s actually running on the processor is something more like this:

TEXT  intr0(SB),7,$0
  PUSHL $0
  PUSHL $0
  PUSHL DS
  PUSHL ES
  PUSHL FS
  PUSHL GS
  PUSHAL
  ... (rest of intrcommon here) ...
  IRETL
TEXT  intr1(SB),7,$0
  PUSHL $0
  PUSHL $1
  ...
 loop:
  JMP loop

Thus if you happen to hit interrupt 0, things work ok. But if you happen to hit interrupt 1, you are stuck in endless loop hell.

It seemed to me like 8a was incorrectly compiling this. So I added a bunch of debugging code to it, and I convinced myself that it was working right. One problem is that there is no debugging flag to help you understand the output .8 file you get after doing “8a test.s”. If 8a is not compiling it wrong, who else could be? Well, my interrupt service routines had to make it through one other piece of code before getting on the raw hardware: 8l. The problem must be in there, but I haven’t found it yet.

There’s way more than my bug in 8l, however. This is really where a lot of the magic of the Go system comes from. It implements the split stacks Go uses to support thousands of threads. It also implements dependency tracking, so that the right object files are pulled in to satisfy all the imports. In contrast, the C linker is traditionally implemented as a rather stupid thing that just gathers .o files together. One thing you find in libraries meant for wide reuse is that all the functions are in their own file, because cc and ld are not smart enough to exclude unneeded code from the final binary except on the level of the .o file. But not all linkers are stupid: the GNU tool chain is smarter than this, at least on some platforms. Take a look at ld’s --gc-sections option if you are interested. And this article in MSDN talks about how .NET uses link time code generation for some optimizations.

I’ve confirmed that the instruction stream is coming into 8l correctly. The undocumented “-W” flag tells 8l to show you the contents of every .8 file it is reading. Take this little sample .s file:

TEXT foo1(SB), 7, $0
	NOP
	JMP bar
TEXT foo2(SB), 7, $0
	NOP
bar:
	NOP
	RET

Compile it with “8a test.s”, then look at it using “8l -W test.8”:

        ANAME   /
        ANAME   home
        ANAME   jra
        ANAME   go
        ANAME   src
        ANAME   cmd
        ANAME   8a
        ANAME   foo.s
(1)     HISTORY ,
(9)     HISTORY ,
        ANAME   foo1
(1)     TEXT    foo1+0(SB),7,$0-0
(2)     NOP     ,
        ANAME   bar
(3)     JMP     ,5(PC)
        ANAME   foo2
(4)     TEXT    foo2+0(SB),7,$0-0
(5)     NOP     ,
(7)     NOP     ,
(8)     RET     ,
(8)     END     ,
... and then 1000's of lines of dumps from other files here...

The output from -W shows that test.8 knows it is supposed to be jumping up into the next routine. My current guess at what’s happening is that 8l processes things one function at a time, never expecting dependencies between functions like this (since it is most definitely not something it should expect to be coming from 8g).

Bonus trivia nugget: The Go system defines a set of flags on functions that lets 8g and 8a communicate with 8l and tell it what to do. If a routine cannot, or should not, check the stack size, then it needs to use a TEXT line with 7 in it. If you leave out the 7, 8l will add in the stack checking code (which would be an extremely crash-worthy idea in an ISR, don’t you think?). We’ll talk more about stack checking one week when I dig into how split stacks work. For now, you can read Ian’s article on them.
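For the curious, the 7 reads naturally as three flag bits OR-ed together. The sketch below decodes it in Go using the names and values that textflag.h gives in later Go releases (NOPROF = 1, DUPOK = 2, NOSPLIT = 4). Assuming the 7 in this post means the same three bits is exactly that, an assumption, and describe is a made-up helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Flag bits as named in textflag.h in later Go releases. Treating the 7 used
// in this post as these same three bits is an assumption.
const (
	NOPROF  = 1 // do not profile this function
	DUPOK   = 2 // the linker may see this symbol more than once
	NOSPLIT = 4 // do not insert the stack-check preamble
)

// describe spells out which of the three bits are set in flags.
// It is a hypothetical helper for illustration only.
func describe(flags int) string {
	var names []string
	if flags&NOPROF != 0 {
		names = append(names, "NOPROF")
	}
	if flags&DUPOK != 0 {
		names = append(names, "DUPOK")
	}
	if flags&NOSPLIT != 0 {
		names = append(names, "NOSPLIT")
	}
	return strings.Join(names, "|")
}

func main() {
	fmt.Println(describe(7))
}
```

Under that reading, the NOSPLIT bit is the one that matters for an ISR, since it is the one that keeps the stack check out of the routine.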