Saturday 21 February 2009

Lazy Evaluation Gotcha!

Note: I'm assuming a vague knowledge of LINQ, and I'd recommend reading up on Lazy Evaluation on other blogs as I only skim over the concept here.

Code-writing for the Conquest project is currently paused while I do some research around the subject area (RTS games), particularly around the unit AI, which is my immediate bugbear; in the meantime, I've been focusing my efforts on Setun, a project that I started early last year. I'll go into more detail on exactly what I'm aiming to achieve with the project in a later post, but for now I can summarise it as a Balanced Ternary virtual machine simulated at the logical electronics level; that is, I'm simulating And and Or gates, Decoders and other similar-sized components, but not going all the way down to transistors and shifting voltages. This is lower-level than a typical VM, but not a strict physical simulation.

I've been increasingly using LINQ in this project; each large-scale component (like registers and the ALU) is typically joined to the next by a bus, that is, a collection of electronic lines (i.e. wires), and I'm simulating this with an IEnumerable of ILine. Now, when you're handling lots of IEnumerable objects, concatenating them, picking out individual items, taking subsets and so on, then LINQ is perfect; manipulation of sets is exactly what it's designed for.

However, one of the features of LINQ that often catches people out is Lazy Evaluation (usually called deferred execution in the LINQ documentation), which, if you're not familiar with it, simply means that any LINQ query that returns a sequence (e.g. take the first five objects from a list of strings) is not executed (or evaluated) until the results of the query are actually needed. In most cases, this is really useful; queries that are declared but never used are never evaluated, and a query provider that optimises can do so once the full query is declared, rather than piecemeal as the query is constructed.

But...

If the timing of the query is important, Lazy Evaluation can catch you out. If you declare a query to pick the first three items of a List, clear the list, then try to display the items that you just queried for, you'll find that your query returns nothing. This is because the query is only evaluated when you display the items, which is after the list was cleared, so it runs against an empty list.
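The scenario above can be sketched in a few lines. This is only an illustration: the class name DeferredExecutionDemo and the list contents are made up for the example, and it counts the items both before and after clearing the list to show that the same query gives different answers depending on when it is evaluated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names here are illustrative only.
public static class DeferredExecutionDemo
{
    // Returns { items seen before Clear(), items seen after Clear() }.
    public static int[] Run()
    {
        var names = new List<string> { "alpha", "beta", "gamma", "delta" };

        // Declaring the query does not read the list yet.
        IEnumerable<string> firstThree = names.Take(3);

        int before = firstThree.Count(); // evaluated now, against four items
        names.Clear();
        int after = firstThree.Count();  // evaluated again, against the now-empty list

        return new[] { before, after };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0] + " then " + counts[1]); // prints "3 then 0"
    }
}
```

The variable firstThree never holds three strings; it holds a description of how to fetch them, which is why clearing the list changes what it yields.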

Now, I've pretty heavily summarised Lazy Evaluation, because there are plenty of blogs that explain it in enough detail, so instead I'll focus on the behaviour that caught me out; that is, multiple evaluation.

I've been using the Select method to create sets of objects from other sets of objects - for example, I could take an IEnumerable of ILines, and wrap them all in Not gates, like this:


var lines = new List<ILine>();

//Insert some ILines here!

var invertedLines = lines.Select(line => new NotGate(line));


This gives me a set of NotGates, one for each input line, which is all well and good. But then I tried the same trick with registers; unlike Not gates, which are pretty simple (inverting whatever their input is), registers store a value, only changing on clock ticks. When I tried to display the contents of the set of registers, they always came out as 0 - the initial value. No matter what input I gave, they wouldn't change.

A little investigation later, and I realised the problem: because Lazy Evaluation occurs when you use the result of a query, it happens every time you use it. Each time I iterated over the IEnumerable of registers to output the contents, the LINQ query was creating a new set of register objects, which of course contained the initial value.
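Here is a minimal sketch of that behaviour. The Register class below is a hypothetical stand-in (a bare value holder with an instance counter), not the real Setun register, and the integer list simply takes the place of the input lines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stateful component; not the Setun project's Register.
public class Register
{
    public static int InstancesCreated;
    public int Value; // starts at 0, the "initial value"

    public Register() { InstancesCreated++; }
}

public static class MultipleEvaluationDemo
{
    // Returns { sum of values on the second pass, total registers constructed }.
    public static int[] Run()
    {
        Register.InstancesCreated = 0;
        var lineIds = new List<int> { 1, 2, 3 };

        // Lazy query: no registers exist yet.
        IEnumerable<Register> registers = lineIds.Select(id => new Register());

        // First pass: constructs three registers and stores 1 in each.
        foreach (var register in registers)
        {
            register.Value = 1;
        }

        // Second pass: the query runs again and constructs three NEW registers.
        int sum = registers.Sum(register => register.Value);

        return new[] { sum, Register.InstancesCreated };
    }

    public static void Main()
    {
        int[] result = Run();
        Console.WriteLine("sum = " + result[0] + ", created = " + result[1]); // sum = 0, created = 6
    }
}
```

The registers that were written to in the first loop are simply thrown away; the second pass only ever sees fresh ones.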

So remember, if you're using Select to create sets of objects (for which it is very useful), always stick a ToArray() call (or ToList()) at the end; this forces the query to evaluate immediately, and only once.
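As a rough sketch of the fix, using a hypothetical Register class (a bare value holder with an instance counter, invented for this example) and a list of integers in place of the input lines:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stateful component; not the Setun project's Register.
public class Register
{
    public static int InstancesCreated;
    public int Value; // starts at 0

    public Register() { InstancesCreated++; }
}

public static class MaterialisedQueryDemo
{
    // Returns { sum of values on the second pass, total registers constructed }.
    public static int[] Run()
    {
        Register.InstancesCreated = 0;
        var lineIds = new List<int> { 1, 2, 3 };

        // ToArray() runs the query once, here, and keeps the resulting objects.
        Register[] registers = lineIds.Select(id => new Register()).ToArray();

        foreach (var register in registers)
        {
            register.Value = 1;
        }

        // Iterating the array again visits the same three objects.
        int sum = registers.Sum(register => register.Value);

        return new[] { sum, Register.InstancesCreated };
    }

    public static void Main()
    {
        int[] result = Run();
        Console.WriteLine("sum = " + result[0] + ", created = " + result[1]); // sum = 3, created = 3
    }
}
```

The trade-off is that the array is a snapshot: later changes to the source list are no longer reflected, which is exactly what you want for stateful components.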