Go Fun – the cost of threads

Here’s a little program I wrote in Go. In it, I wanted to explore how it might work to create a giant “onion” of threads, passing messages in towards the center, where they would be reflected and sent back out. Why? Just for fun, to solidify my grasp on the syntax, to see what would happen.

Here’s my program:

package main

import (
  "fmt"
  "time"
)

// make2 returns a fresh pair of unbuffered channels.
func make2() (chan int, chan int) {
  return make(chan int), make(chan int)
}

// makeAndEcho builds ct nested layers of goroutines around the in/out pair.
// The innermost layer echoes each message straight back; every outer layer
// passes messages one level further in and relays the replies back out.
func makeAndEcho(ct int, in, out chan int) {
  var in2, out2 chan int

  // echo is the center of the onion: reflect each message back out.
  echo := func() {
    for !closed(in) {
      x := <-in
      out <- x
    }
  }

  // pass shuttles messages inward to the next layer and replies outward.
  pass := func() {
    for !closed(in) {
      select {
      case x := <-in:
        in2 <- x
      case x := <-out2:
        out <- x
      }
    }
  }

  if ct == 0 {
    go echo()
  } else {
    in2, out2 = make2()
    makeAndEcho(ct-1, in2, out2)
    go pass()
  }
}

// try builds an onion of the given depth and times n round trips through it.
func try(depth, n int) {
  in, out := make2()
  makeAndEcho(depth, in, out)

  before := time.Nanoseconds()
  for i := 0; i < n; i++ {
    in <- 1
    _ = <-out
  }
  close(in)
  after := time.Nanoseconds()

  // Each round trip crosses depth channels on the way in and depth more
  // on the way out, so it counts as depth*2 transits.
  transits := int64(n * depth * 2)
  fmt.Printf(
    "%v round-trips, %v nsec/trip, %v nsec/transit for %v transits\n",
    n, (after-before)/int64(n), (after-before)/transits, transits)
}

func main() {
  try(10, 10000)
  try(100, 10000)

  // Baseline: ping-pong through a buffered channel within a single
  // goroutine, with no other goroutines involved.
  ch := make(chan int, 1)
  before := time.Nanoseconds()
  for i := 0; i < 200000; i++ {
    ch <- i
    _ = <-ch
  }
  after := time.Nanoseconds()
  fmt.Printf(
    "no threads: %v nsec/transit for %v transits\n",
    (after-before)/200000, 200000)
}

When you run it, you get this:

10000 round-trips, 26794 nsec/trip, 1339 nsec/transit for 200000 transits
10000 round-trips, 256800 nsec/trip, 1284 nsec/transit for 2000000 transits
no threads: 170 nsec/transit for 200000 transits

So, what does that mean? Two things jump out: (1) the time per channel transit is slightly lower for the deeply nested onion, and (2) the time per transit that crosses threads is roughly eight times the time per transit on a channel within a single thread. A quick guess at the first is that the more deeply nested onion triggers about the same number of GCs, so their overhead is spread over ten times as many transits, making each transit appear cheaper. I would have to investigate the GC more, and find out how to turn it off, to check its impact.
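For what it’s worth, in current Go (this post predates the Go 1 libraries) the collector can be switched off for an experiment like this, either by running with GOGC=off or from inside the program. Here is a minimal sketch of the latter, reusing the no-threads loop from main with the modern time and runtime/debug APIs; it is an illustration of the idea, not part of the program above:

package main

import (
  "fmt"
  "runtime"
  "runtime/debug"
  "time"
)

func main() {
  // Disable the garbage collector for the whole run (same effect as GOGC=off).
  debug.SetGCPercent(-1)

  ch := make(chan int, 1)
  before := time.Now()
  for i := 0; i < 200000; i++ {
    ch <- i
    _ = <-ch
  }
  elapsed := time.Since(before)
  fmt.Printf("no GC: %v per transit for %v transits\n", elapsed/200000, 200000)

  // Report how many collections actually ran during the measurement.
  var ms runtime.MemStats
  runtime.ReadMemStats(&ms)
  fmt.Println("GC cycles:", ms.NumGC)
}

Comparing the per-transit number with and without the collector would show how much of the cost in the table above is really GC overhead being amortized.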

Another thing that would be interesting is to time the setup and teardown and see whether they grow linearly with the number of channels. And how do you time the teardown? Timing the close(in) wouldn’t do it; you’d need to time from the close until the final “I’m closed” message arrives on the out channel. (This part of Go is a bit messy. I was confused to find random nil messages arriving on my channels when I was playing with another program.)
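That messiness changed before Go 1: the closed() builtin is gone, and a receive from a closed channel now yields the element type’s zero value together with a second ok result that is false once the channel is closed, which is what those “random nil messages” were. Here is a minimal sketch of how the innermost echo layer might look under today’s rules; the standalone function name and signature are mine, not part of the program above:

// echo reflects each message back on out until in is closed.
// "for x := range in" keeps receiving until in is closed and drained,
// which replaces the old "for !closed(in)" idiom.
func echo(in, out chan int) {
  for x := range in {
    out <- x
  }
  // The two-value receive form reports the close explicitly if needed:
  //   x, ok := <-in   // ok == false once in is closed and empty
  close(out) // let the next layer out observe the shutdown
}

Propagating the close outward layer by layer like this would also give a natural way to time the teardown: measure from close(in) until the outermost out channel reports that it, too, has been closed.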

This little thing does not argue any particular point, neither that threads are expensive nor that channels are. It’s just interesting to see that this works, and that Go is able to handle a hundred threads all communicating with each other with no sweat.


Comments

One response to “Go Fun – the cost of threads”

  1. I just noticed there’s another fun example of this in Rob’s Go Talk from Oct 30, 2009. Here it is for comparison’s sake:

    package main
    import ("flag"; "fmt")
    
    var ngoroutine = flag.Int("n", 100000, "how many")
    
    func f(left, right chan int) { left <- 1 + <-right }
    
    func main() {
      flag.Parse();
      leftmost := make(chan int);
      var left, right chan int = nil, leftmost;
      for i := 0; i < *ngoroutine; i++ {
        left, right = right, make(chan int);
        go f(left, right);
      }
      right <- 0; // bang!
      x := <-leftmost; // wait for completion
      fmt.Println(x);
    }
    
