(Updated: 9/8/2013 Added “Stateful Iterators” thanks to ryszard.)
I’ve been using Go to implement some performance-sensitive services
recently and, so far, it’s been a very nice. When writing Go code, it
often feels like a (slightly verbose) scripting language, but has nice
type-safety and good performance.
But I’ve realized that this feeling is really only true on a
line-to-line basis. What I’ve noticed recently is that a lot of my
code ends up being “collection munging”: creating, modifying, and
transforming collections. Python excels at this – the built-in core
functional working set (map
, reduce
, and friends) combined with
iterators/generators and list
/dict
comprehensions allows for some
very concise, readable code. And often it even has decent performance.
But it isn’t always good enough performance, and that was a reason for
moving to Go (or any higher performance, more systems-y
language). But I found myself even missing C++98 because collections
are actually pretty nice there. Iterators and templates allow code
nearly as concise as the Python version, even with all those type
declarations.
Unfortunately generics aren’t coming to Go soon . But
iterators are just interfaces. They abstract away the underlying
container to make it look like a sequence of values of some type with
a standard interface (e.g. Next()
, Prev()
, etc). This allows you
to write at least partially generic code, e.g. a sum function that
assumes an iterator producing a specific type. If each container can
generate an IntIterator
, we can write one Sum()
for all of
them. Without generics we still need SumInt()
vs. SumFloat()
,
but at least we don’t need SumIntSlice()
, SumIntDictValues()
,
SumFloatSlice()
, and SumFloatDictValues()
.
Once I realized that this is what I was missing, I wanted to know what
the canonical iterator pattern in Go is: what’s the interface and
the implementation? I ended up in this
thread
on the golang-nuts
mailing list which covers the basic
options for a forward-only iterator (which admittedly simplifies the
problem, mainly focusing on implementation rather than the appropriate
interface, but this is also the most common type of iterator we need).
But this discussion only covered the basics of the different options
and some pitfalls, but didn’t address performance issues and focused
primarily on the iterator’s implementation without much consideration
for the caller’s code, which I think is the more important part. The
whole idea is to provide a good abstraction to the caller and
hopefully not compromise performance. So which of the iterator
patterns gives the best balance? Or, a more realistic question – when
I’m writing code, which version should I use in each case: a) I just want
to get the code working b) I want to extract maximum performance or c)
I want a balance between readability and performance?