Tuesday, December 14, 2010

Performance Comparison Between For, While and Foreach Loops

Today I published a post about converting a list to a DataTable, but one of my senior colleagues told me to try to avoid the foreach loop.

So I googled/binged the topic.

Here are some of the figures and opinions I found.

Using the System.Diagnostics.Stopwatch class I ran some tests. 100,000 iterations in a for loop that did nothing inside took me 0.0003745 seconds. This was the code for the loop:
 
for (int i = 0; i < 100000; i++) ;

The while loop took 0.0003641 seconds, which is pretty much the same as the for loop. Here is the code I used:

int i=0;
while (i < 100000)
 i++;

 
The foreach loop has a slightly different purpose. It is meant for iterating through a collection that implements IEnumerable. Its performance was noticeably slower: my test took 0.0009076 seconds, about 2.4 times as long as the for loop, with this code:

int[] test = new int[100000];
foreach (int i in test) ;

foreach creates an instance of an enumerator (returned from GetEnumerator), and that enumerator keeps state throughout the course of the loop. It then repeatedly calls MoveNext() on the enumerator and runs your code for each object it exposes through Current. (For plain arrays such as the one above, the C# compiler typically emits index-based code instead of using an enumerator.)
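
As a rough sketch of that mechanism, the same loop is written twice below: once with foreach and once by hand against the enumerator. The names ForeachExpansion, SumWithForeach and SumByHand are made up for illustration, and the hand-written version only approximates what the compiler generates.

```csharp
using System;
using System.Collections.Generic;

static class ForeachExpansion
{
    // The ordinary foreach form.
    public static int SumWithForeach(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int item in items)
            sum += item;
        return sum;
    }

    // Approximately what the loop above expands to.
    public static int SumByHand(IEnumerable<int> items)
    {
        int sum = 0;
        using (IEnumerator<int> e = items.GetEnumerator()) // the enumerator holds the loop state
        {
            while (e.MoveNext())      // advance; returns false once the sequence is exhausted
            {
                int item = e.Current; // read the element at the current position
                sum += item;
            }
        }
        return sum;
    }

    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        Console.WriteLine(SumWithForeach(numbers)); // 10
        Console.WriteLine(SumByHand(numbers));      // 10
    }
}
```

The extra MoveNext and Current calls are where the overhead compared with a bare counter loop comes from.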


So it seems that 'while' is the fastest of the three looping techniques available in C# for a given amount of work inside the loop. Right?

It varies. "while" and "for" have pretty much the same results. "while" had a slight advantage, but not a significant one.
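
If you want to repeat the comparison yourself, here is a minimal sketch using System.Diagnostics.Stopwatch. It is not the code that produced the figures above: the class name LoopTimings, the method names and the run count are my own assumptions, and the loops sum an array so that the JIT is less likely to discard an empty body. Expect the numbers to differ by machine, runtime and build configuration.

```csharp
using System;
using System.Diagnostics;

static class LoopTimings
{
    public static long SumWithFor(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum;
    }

    public static long SumWithWhile(int[] data)
    {
        long sum = 0;
        int i = 0;
        while (i < data.Length)
        {
            sum += data[i];
            i++;
        }
        return sum;
    }

    public static long SumWithForeach(int[] data)
    {
        long sum = 0;
        foreach (int value in data)
            sum += value;
        return sum;
    }

    // Runs one loop several times and returns the average time per run in milliseconds.
    public static double AverageMilliseconds(Func<int[], long> loop, int[] data, int runs)
    {
        loop(data); // warm-up call so JIT compilation is not part of the timing
        Stopwatch watch = Stopwatch.StartNew();
        for (int r = 0; r < runs; r++)
            loop(data);
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / runs;
    }

    static void Main()
    {
        int[] data = new int[100000];
        for (int i = 0; i < data.Length; i++)
            data[i] = i;

        Console.WriteLine("for:     {0:F6} ms", AverageMilliseconds(SumWithFor, data, 1000));
        Console.WriteLine("while:   {0:F6} ms", AverageMilliseconds(SumWithWhile, data, 1000));
        Console.WriteLine("foreach: {0:F6} ms", AverageMilliseconds(SumWithForeach, data, 1000));
    }
}
```

Averaging over many runs after a warm-up call smooths out some of the noise that a single Stopwatch measurement picks up.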


I'm not an expert, but I have the feeling that a while loop and a for loop, once compiled to MSIL, are probably the exact same thing.

And I wouldn't be surprised if foreach was faster when iterating through objects, since the optimizer can "expect" what's going to happen... So I'd go with foreach being faster if you can use it, and the two others being the same thing if foreach isn't applicable.

My logic here is that if you're looping through a collection, and you use a for loop for example, you're going to have to use the index of the collection, which depending on implementation, could be a minor performance hit to "seek" the object, as opposed to going through a well written iterator.
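
To make the "seek" cost concrete, here is a small sketch using LinkedList<int>, which has no indexer. The class and method names are invented for the example, and ElementAt (from System.Linq) stands in for indexed access; on a linked list it has to walk from the head on every call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SeekCost
{
    // Index-style access: ElementAt walks the list from the start each time,
    // so the whole loop performs on the order of n^2 node visits.
    public static long SumByIndex(LinkedList<int> list)
    {
        long sum = 0;
        for (int i = 0; i < list.Count; i++)
            sum += list.ElementAt(i);
        return sum;
    }

    // Iterator access: the enumerator remembers its position,
    // so each step is O(1) and the whole loop is O(n).
    public static long SumByForeach(LinkedList<int> list)
    {
        long sum = 0;
        foreach (int value in list)
            sum += value;
        return sum;
    }

    static void Main()
    {
        var list = new LinkedList<int>(Enumerable.Range(1, 1000));
        Console.WriteLine(SumByIndex(list));   // 500500
        Console.WriteLine(SumByForeach(list)); // 500500
    }
}
```

For arrays and List<T>, where indexing is O(1), this difference disappears, which is why the answer depends on the collection's implementation.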

Just an example. So really: I wouldn't care too much about the performance of these loops. This isn't C/C++, and when you compile, it's not native code (at first). So it is safe to assume that the solution that seems the most efficient "logically" will be so in practice.

foreach is slower for a number of reasons. One is that it uses the IEnumerable interface, which requires some casting (assuming you aren't using a generic collection). My test above seemed to go along with that as well. A simple for/while/do loop is pretty much as simple as it gets.
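
The casting point can be illustrated with a non-generic ArrayList next to a generic List<int>. This is only a sketch with made-up names (CastingCost, SumNonGeneric, SumGeneric); it shows where the cast happens, not how much time it costs.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class CastingCost
{
    // Non-generic collection: elements are stored as object, so the compiler
    // inserts a cast (here an unboxing conversion) to int on every iteration.
    public static int SumNonGeneric(ArrayList items)
    {
        int sum = 0;
        foreach (int item in items)
            sum += item;
        return sum;
    }

    // Generic collection: the enumerator already yields int, so no cast is needed.
    public static int SumGeneric(List<int> items)
    {
        int sum = 0;
        foreach (int item in items)
            sum += item;
        return sum;
    }

    static void Main()
    {
        var untyped = new ArrayList { 1, 2, 3 }; // each int is boxed when added
        var typed = new List<int> { 1, 2, 3 };
        Console.WriteLine(SumNonGeneric(untyped)); // 6
        Console.WriteLine(SumGeneric(typed));      // 6
    }
}
```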

As for the MSIL code, let's take a look. This is the MSIL for the for loop:

  IL_0000:  nop
  IL_0001:  ldc.i4.0
  IL_0002:  stloc.0
  IL_0003:  br.s       IL_0009
  IL_0005:  ldloc.0
  IL_0006:  ldc.i4.1
  IL_0007:  add
  IL_0008:  stloc.0
  IL_0009:  ldloc.0
  IL_000a:  ldc.i4     0x186a0
  IL_000f:  clt
  IL_0011:  stloc.1
  IL_0012:  ldloc.1
  IL_0013:  brtrue.s   IL_0005

And here it is for the while loop:

  IL_0000:  nop
  IL_0001:  ldc.i4.0
  IL_0002:  stloc.0
  IL_0003:  br.s       IL_0009
  IL_0005:  ldloc.0
  IL_0006:  ldc.i4.1
  IL_0007:  add
  IL_0008:  stloc.0
  IL_0009:  ldloc.0
  IL_000a:  ldc.i4     0x186a0
  IL_000f:  clt
  IL_0011:  stloc.1
  IL_0012:  ldloc.1
  IL_0013:  brtrue.s   IL_0005

So yes, they are exactly identical.

Although very handy, C#'s foreach statement is actually quite dangerous. In fact, I may swear off its use entirely. Why? Two reasons: (1) performance, and (2) predictability.

Performance

Iterating through a collection using foreach is slower than with for. I can't remember where I first learned that, perhaps in Patterns & Practices: Improving .NET Application Performance. Maybe it was from personal experience. How much slower? Well, I suppose that depends on your particular circumstances. Here are a few interesting references:

Predictability

I was looking at the C# Reference entry for foreach today and noticed this for the first time (italics added by me):

"The foreach statement is used to iterate through the collection to get the desired information, but should not be used to change the contents of the collection to avoid unpredictable side effects."

What's that all about? Let's take this as an example:

foreach(MyClass myObj in List)

Looking deeper into the C# Language Specification... the iteration variable is supposed to be read-only, though apparently that doesn't stop you from updating a property of an object. Thus, for instance, it would be illegal to assign a new value to myObj, but not to assign a new value to myObj.MyProperty.

And that's all I can find. Why are there unpredictable side effects? I don't know. But it seems best to heed Microsoft's warning.
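
Here is a small sketch of the two cases described above, plus one behavior that may be part of what the warning is about (that link is my assumption, not something the reference states): List<T> throws InvalidOperationException if the list itself is changed while foreach is enumerating it. MyClass and MyProperty follow the example above; the other names are invented.

```csharp
using System;
using System.Collections.Generic;

class MyClass
{
    public int MyProperty { get; set; }
}

static class ForeachSideEffects
{
    // Allowed: the iteration variable is read-only, but the object it refers to is not.
    public static void DoubleAll(List<MyClass> list)
    {
        foreach (MyClass myObj in list)
        {
            // myObj = new MyClass(); // would not compile: myObj is a foreach iteration variable
            myObj.MyProperty *= 2;
        }
    }

    // Fails at run time: removing items from the list while foreach is enumerating it.
    public static bool RemoveDuringForeachThrows(List<MyClass> list)
    {
        try
        {
            foreach (MyClass myObj in list)
                list.Remove(myObj);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true; // the List<T> enumerator detected that the list was modified
        }
    }

    static void Main()
    {
        var list = new List<MyClass>
        {
            new MyClass { MyProperty = 1 },
            new MyClass { MyProperty = 2 },
            new MyClass { MyProperty = 3 }
        };

        DoubleAll(list);
        Console.WriteLine(list[2].MyProperty);              // 6
        Console.WriteLine(RemoveDuringForeachThrows(list)); // True
    }
}
```

A for loop with an index does not have this check, so it will let you add or remove items mid-loop, at the price of having to adjust the index yourself.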

Conclusion

Some argue that you shouldn't code for performance from the beginning, and therefore go ahead and use foreach whenever you want so long as you don't update the values. In my experience that's hogwash — most of the code I work on goes into environments where performance is extremely important. Besides, writing a for statement requires very little extra coding compared to a foreach statement. Furthermore, if you have a lot going on inside your iteration block, it can be easy to forget and accidentally update the iteration variable inside a foreach loop. Thus do I conclude: just avoid foreach altogether.

Honestly, the academically correct answer is "It's irrelevant". You can't optimize performance by somehow picking "the best loop". If you're doing performance optimizations you should start with a Big O analysis of your algorithms and with profiling. You'd be amazed at how much faster keeping around a dictionary for lookups or using a sorted list + binary search is than looping over a list every time you need to find an object.
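
As a minimal sketch of the dictionary idea, the example below looks up the same object two ways. The Customer type and the LookupDemo methods are hypothetical, chosen only to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;

class Customer
{
    public int Id;
    public string Name;
}

static class LookupDemo
{
    // Linear search: O(n) work on every lookup, whichever loop you write it with.
    public static Customer FindByLoop(List<Customer> customers, int id)
    {
        for (int i = 0; i < customers.Count; i++)
        {
            if (customers[i].Id == id)
                return customers[i];
        }
        return null;
    }

    // Build the index once in O(n); each later lookup is O(1) on average.
    public static Dictionary<int, Customer> BuildIndex(List<Customer> customers)
    {
        var index = new Dictionary<int, Customer>();
        foreach (Customer c in customers)
            index[c.Id] = c;
        return index;
    }

    static void Main()
    {
        var customers = new List<Customer>();
        for (int i = 0; i < 1000; i++)
            customers.Add(new Customer { Id = i, Name = "Customer " + i });

        Dictionary<int, Customer> index = BuildIndex(customers);

        Customer viaLoop = FindByLoop(customers, 750);
        Customer viaIndex;
        index.TryGetValue(750, out viaIndex);

        Console.WriteLine(viaLoop.Name);  // Customer 750
        Console.WriteLine(viaIndex.Name); // Customer 750
    }
}
```

The dictionary only pays off when the same collection is searched repeatedly; for a single lookup, building the index costs as much as the scan it replaces.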

Micro-optimizing your choice of loop gives you unnoticeable performance gains at the cost of maintainability and programmer time; that effort is better spent on proper design and implementation. Let's take a real-world example, finding duplicates in a list, to illustrate my point.
First, let's assume that comparing two items is O(1). We can implement this in any number of ways, two of which are:

A) Use nested loops with i on the outer loop and j on the inner loop; when list[i] == list[j], push the pair onto a list of duplicates.
B) Copy the list to tmpList. Quicksort tmpList. Iterate over tmpList with i; when tmpList[i] == tmpList[i+1], push the pair onto the list of duplicates.

No amount of loop tuning can change the fact that A runs in O(n^2) while B runs in O(2*n + n*log(n)), which simplifies to O(n*log(n)) (taking quicksort's average case).
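
The two approaches can be sketched as follows. A few liberties, all assumptions on my part: the list holds ints, each duplicate is recorded as a single value rather than a pair, the inner loop of A starts at i + 1 so an item is never compared with itself, and B uses the built-in List<T>.Sort instead of a hand-written quicksort.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateFinder
{
    // Approach A: compare every pair of positions, O(n^2) comparisons.
    public static List<int> FindWithNestedLoops(List<int> list)
    {
        var duplicates = new List<int>();
        for (int i = 0; i < list.Count; i++)
        {
            for (int j = i + 1; j < list.Count; j++)
            {
                if (list[i] == list[j])
                    duplicates.Add(list[i]);
            }
        }
        return duplicates;
    }

    // Approach B: copy, sort, then compare each item with its neighbor.
    public static List<int> FindWithSort(List<int> list)
    {
        var tmpList = new List<int>(list); // O(n) copy, leaves the input untouched
        tmpList.Sort();                    // O(n log n) on average
        var duplicates = new List<int>();
        for (int i = 0; i + 1 < tmpList.Count; i++) // O(n) scan of adjacent items
        {
            if (tmpList[i] == tmpList[i + 1])
                duplicates.Add(tmpList[i]);
        }
        return duplicates;
    }

    static void Main()
    {
        var list = new List<int> { 3, 1, 4, 1, 5, 3 };
        Console.WriteLine(string.Join(", ", FindWithNestedLoops(list))); // 3, 1
        Console.WriteLine(string.Join(", ", FindWithSort(list)));        // 1, 3
    }
}
```

Note that the two versions report duplicates in a different order, and for a value that occurs more than twice A records every matching pair while B records only adjacent ones.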
