The Skulduggery of IEnumerable - Part 1 : Of Count & Element Indexes

Okay, those of us who use ReSharper or have taken to lets use interfaces where we can so we can have flexibility would probably have used the IEnumerable construct at some point (ReSharper actually recommends it from time to time). However with my OCD nature about code and some landmines I steped on while coding has made me realize that this thing has its bits of skulduggery that may senak up on you. There are a few scenarios here which I will outline in this multi-part blog series. Hopefully save you the trouble of going throug the pains.

Count() & ElementAt() Skulduggery

Okay for this little bad boy, I will outline the two bits of skulduggery that can cause some pains. The example here is trivual in nature but when your app scales up, this will be an issue. Consider this console app that lists guitar heroes:

using System;
using System.Collections.Generic;
using System.Linq;

namespace IEnumerablePitfalls
{
    class Program
    {
        static readonly string[] GuitarHeroes =
        {
            "Jimi Hendrix",
            "Eric Clapton",
            "Micheal Schenker",
            "Rudolph Schenker",
            "Matthias Jabs",
            "John Petrucci",
            "Joe Satriani",
            "Steve Vai",
            "Paul Gilbert",
            "Adrian Smith",
            "Van Halen",
            "Akira Takasaki"
        };

        static void Main(string[] args)
        {
            TheCountAndElementAtSkulduggery();
        }

        static void TheCountAndElementAtSkulduggery()
        {
            Console.WriteLine("====================================================");
            Console.WriteLine("=================== Guitar Gods ====================");
            Console.WriteLine("====================================================");
            Console.WriteLine(string.Empty);

            IEnumerable<string> guitarHeroesWithTheLetterAInLowerCase = GuitarHeroes.Where(x => x.Contains('a'));

            for (var i = 0; i < guitarHeroesWithTheLetterAInLowerCase.Count(); i++)
            {
                Console.WriteLine("Guitar God {0} : {1}", i, guitarHeroesWithTheLetterAInLowerCase.ElementAt(i));
            }

            Console.WriteLine("Just press something and be done with this.");
            Console.ReadKey();
        }
    }
}

There are two issues there, namely:

  • Count()
  • ElementAt()

Count()

The first thing you will notice is that this is a method and not a property. This implies that the framework will do some jiggery pokery to get this thing to work. Now consider this, IEnumerable is something that does delayed execution so it makes sense that it needs to be a method and not a property. If you don't get it, delayed execution means that this bad boy does not have the values up-front so it needs to do some computation to give it to you. Now we could use JetBrain's awesome decompiler to see what Count() does but since M$FT has not released the source code so lets use that instead.

Source Code Permalink : http://referencesource.microsoft.com/#q=Enumerable.cs

The part of the code that is interest is as seen below:

// Did some formatting this is is easier to read.
public static int Count<TSource>(this IEnumerable<TSource> source)
{
  if (source == null)
  {
      throw Error.ArgumentNull("source");
  }

  ICollection<TSource> collection1 = source as ICollection<TSource>;
  if (collection1 != null)
      return collection1.Count;

  ICollection collection2 = source as ICollection;
  if (collection2 != null)
    return collection2.Count;

  int num = 0;
  using (IEnumerator<TSource> enumerator = source.GetEnumerator())
  {
    while (enumerator.MoveNext())
      checked { ++num; }
  }
  return num;
}

Now if you are using an ICollection based thing, this stuff works fine because the count is already in place BUT in our case, we ain't doing that. That this code shouws is that it is using an enumerator to basically count a row each time you call it. Now in the case of my console app for guitar heroes, it is not an issue but on a real application this can be an issue. Consider this, if you have an application with a 200,000 rows, the Count() method will enumerate 200,000 items, get the count of items from a variable that it increments, then only return you the count so you can use it to see how much items you have in your collection. Imagine doing this every time your loop increments.

  1. At Index 0 : Count 200,000 times.
  2. At Index 1 : Count 200,000 times.
  3. ...
  4. At Index 199999 : Count 200,000 times.

Skulduggery incarnate. Hands down the dumbest way to get the number of items in your collection.

Facepalm

ElementAt()

Now we have seen the skulduggery of Count(), we get to see the second one. Again no need for any decompilers here, M$FT has released the source code here:

Source Code Permalink : http://referencesource.microsoft.com/#q=Enumerable.cs

The bit of interest is here:

// Formatting done to make this easier to read
public static TSource ElementAt<TSource>(this IEnumerable<TSource> source, int index) 
{
        if (source == null) throw Error.ArgumentNull("source");

        IList<TSource> list = source as IList<TSource>;
        if (list != null) return list[index];

        if (index < 0) throw Error.ArgumentOutOfRange("index");

        using (IEnumerator<TSource> e = source.GetEnumerator()) 
        {
            while (true) 
            {
                if (!e.MoveNext()) throw Error.ArgumentOutOfRange("index");
                if (index == 0) return e.Current;
                index--;
            }
        }
   }

If the enumerator does not implementr IList which we aren't, we will need to use an interator for this. This equates to the code to do the following bits:

  1. Get the enumerator.
  2. Calls MoveNext().
    1. Repeat until you get to the item that is of interest.
  3. Return the item.

This can't be good. It does not matter if you are doing something like my faux application that just returns a small set of guitar heroes but it would matter in a real application. Imagine if you have 200,000 items and you are using this in a for loop. In addition to the Count() skulduggery that does internal counting, this does the same thing within the body, exponetially increasing the time it takes to process the data.

Consider this in conjunction with the Count() for a loop that must iterate 200,000 items in a for loop.

  1. At Index 0
    1.1 Count to 200,000 for to get the number of items in your collection.
    1.2 Return the first item you see (not bad.....yet.)
  2. At Index 1
    2.1 Count to 200,000 for to get the number of items in your collection.
    2.2 Interate until the second item and then return it.
  3. At Index 1
    3.1 Count to 200,000 for to get the number of items in your collection.
    3.2 Interate until the third item and then return it.
  4. ...
  5. At Index 199999
    5.1 Count to 200,000 for to get the number of items in your collection.
    5.2 Interate until the item number 200,000th and then return it.

Holy moses, that is a horrible faux way to get items by index in the IEnumerable collection.

Yet another example of skulduggery incarnate.

Double Facepalm

How would we fix this?

Well, the first order of business would be to use a collection THAT IS not one that has deffered/lazy loaded/(insert term for collection that does not load its stuff up-front).

Array

Lets just see how it would look like if we used a standard array:

string[] GuitarHeroes =
{
    "Jimi Hendrix",
    "Eric Clapton",
    "Micheal Schenker",
    "Rudolph Schenker",
    "Matthias Jabs",
    "John Petrucci",
    "Joe Satriani",
    "Steve Vai",
    "Paul Gilbert",
    "Adrian Smith",
    "Van Halen",
    "Akira Takasaki"
};

var filteredGuitarHeroes = GuitarHeroes.Where(x => x.Contains('a')).ToArray();

for(var i = 0; i < filteredGuitarHeroes.Length; i++)
{
    string currentHero = filteredGuitarHeroes[i];
    Console.WriteLine("Guitar God {0} : {1}", i, currentHero);
}

Now an array has the stuff up front already so the Length is already in place without any need for any jiggery pokery computations to get the length (which is why Length is a property and not a method). This equates the code actually meaning this up-front:

string[] GuitarHeroes =
{
    "Jimi Hendrix",
    "Eric Clapton",
    "Micheal Schenker",
    "Rudolph Schenker",
    "Matthias Jabs",
    "John Petrucci",
    "Joe Satriani",
    "Steve Vai",
    "Paul Gilbert",
    "Adrian Smith",
    "Van Halen",
    "Akira Takasaki"
};

for(var i = 0; i < 12; i++)
{
    string currentHero = GuitarHeroes[i];
    Console.WriteLine("Guitar God {0} : {1}", i, currentHero);
}

The 12 is there already because as I said earlier, an array has the stuff upfront so the array will know that there are 12 items there already.

Furthermore, since the data is already there up-front, there is no need for a fancy method to actually get your items via fancy iterators, just specify the index you want as seen in the example above.

Now what if you want to use a collection because you have a complex object or just plain love generics in .NET. Well fear not, because if you actually saw the source code from M$FT, if you were to use something that implements ICollection or IList, you are in the green. Just change the code to look like something along the lines of the following:

IList<> or ICollection

IList<string> guitarHeroesWithTheLetterAInLowerCase1 = GuitarHeroes.Where(x => x.Contains('a')).ToList();

ICollection<string> guitarHeroesWithTheLetterAInLowerCase2 = GuitarHeroes.Where(x => x.Contains('a')).ToList();

Because IList implements ICollection and ICollection is well...ICollection incarnate, it will return the actual count becase the ToList thing/method called at the end of that code snippet actually loads everything up front removing the need for any form of fancy computations to get the count.

If you recall the source code from M$FT that I showed earlier or if you don't here you go again:

  ICollection<TSource> collection1 = source as ICollection<TSource>;
  if (collection1 != null)
      return collection1.Count;

  ICollection collection2 = source as ICollection;
  if (collection2 != null)
    return collection2.Count;

It will return the count right away regardless if you used the Count as a field/property or as a method. So this means, these 2 ways to get the count are actually doing the same thing:

IList<string> guitarHeroesWithTheLetterAInLowerCase = GuitarHeroes.Where(x => x.Contains('a')).ToList();

var countFromProperty = guitarHeroesWithTheLetterAInLowerCase.Count;
var countFromMethod = guitarHeroesWithTheLetterAInLowerCase.Count();

ICollection<string> guitarHeroesWithTheLetterAInLowerCase2 = GuitarHeroes.Where(x => x.Contains('a')).ToList();

countFromProperty = guitarHeroesWithTheLetterAInLowerCase2.Count;
countFromMethod = guitarHeroesWithTheLetterAInLowerCase2.Count();

Both DO NOT require any form of computations to get the number of elements. Like the array example, you can get elements by simple specifyting the index like so:

IList<string> guitarHeroesWithTheLetterAInLowerCase = GuitarHeroes.Where(x => x.Contains('a')).ToList();

var firstItem = guitarHeroesWithTheLetterAInLowerCase[0];
var secondItem = guitarHeroesWithTheLetterAInLowerCase[1];

ICollection<string> guitarHeroesWithTheLetterAInLowerCase2 = GuitarHeroes.Where(x => x.Contains('a')).ToList();

firstItem = guitarHeroesWithTheLetterAInLowerCase2[0];
secondItem = guitarHeroesWithTheLetterAInLowerCase2[1];

Remember, both IList and ICollection are the same as the former implements ICollection anyway.

Since the truth is in the pudding, here is the pudding that shows IList implements ICollection for you :
http://referencesource.microsoft.com/#mscorlib/system/collections/ilist.cs,5d74f6adfeaf6c7d

IEnumerable

But what if I must use enumerate IEnumerable because it is the only way?

Well I personally can't think of a reason why you will need to enumerate an IEnumerable and not do a .ToList() to ensure everything is there but if you have to, the rule is painfully simple, don't use a for loop. Yes, you can enumerate this bad boy safely. The for..each loop comes to the rescue for your strange needs.

string[] GuitarHeroes =
{
    "Jimi Hendrix",
    "Eric Clapton",
    "Micheal Schenker",
    "Rudolph Schenker",
    "Matthias Jabs",
    "John Petrucci",
    "Joe Satriani",
    "Steve Vai",
    "Paul Gilbert",
    "Adrian Smith",
    "Van Halen",
    "Akira Takasaki"
};

IEnumerable<string> guitarHeroesWithTheLetterAInLowerCase = GuitarHeroes.Where(x => x.Contains('a'));

var i = 0;
foreach(var guitarHeroeWithTheLetterAInLowerCase in guitarHeroesWithTheLetterAInLowerCase)
{
    Console.WriteLine("Guitar God {0} : {1}", i, guitarHeroeWithTheLetterAInLowerCase);
    i += 1;
}

Since there are no Count operations nor the need to get item by the item's index, this would greatly speed up your code. This basically means the enumerator is only called once rather than twice for every item in the original code (one for Count() and the other for ElementAt()).

Conclusion

While it is still okay to use IEnumerable if you use a for..each loop, I would recommend using either Array, IList or ICollection as it affords you more flexibility in what you can do and not worry about forgetting about the skulduggery I mentioned which may cause you performance issues.

There are other reasons to avoid IEnumerable and also reasons to use IEnumerable but that would be covered in subsequent posts.

Part 1 : The Skulduggery of IEnumerable - Part 1 : Of Count & Element Indexes

Part 2 : The Skulduggery of IEnumerable - Part 2 : Of Enumerators & Generators