Interesting Linq Problem


Check out the code below:

using System.Linq;
class Program
{
    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        foreach (var m in from m in data orderby m select m)
            System.Console.Write(m);
    }
}

Now, the question is:

Is this code valid or not?

  If valid, how?

  If not valid, why?

(Source: Eric Lippert’s Blog)


  1. #1 by Kinnar Shah on November 5, 2009 - 12:12 am

    @Sanil: dude… are you human???!!
    sheer awesomeness!!!
    bow down to thee…

  2. #2 by Sanil on November 4, 2009 - 3:27 pm

    Really interesting…
    Well! IMO this code is correct:
    the query returns a sorted enumerable collection.

    Though, I would write it like this:

    using System.Linq;
    namespace solution
    {
        class Program
        {
            static void Main()
            {
                int[] data = { 1, 2, 3, 1, 2, 1 };
                foreach (var m in from m in data orderby m select m)
                    System.Console.Write(m);
            }
        }
    }

    Here the foreach statement should work like this:
    foreach (var m in [from m in data orderby m select m])

    or

    foreach (var m in [from m' in data orderby m' select m'])

    which, abstracted further, eventually looks like
    var m in m' (where m' is an enumerable)

    Treat the text between [ ] as a separate block,
    where a block is a code construct between braces:
    { /// A block }
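The bracket idea can be made concrete with method syntax. Under the C# specification's query translation rules, `from m in data orderby m select m` becomes roughly `data.OrderBy(m => m)` (the trailing `select m` is degenerate and produces no `Select` call), so the query's `m` ends up as a lambda parameter rather than a local of `Main`. A minimal sketch comparing both forms; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

static class QueryTranslation
{
    // The puzzle's form: the range variable and the loop variable are both named m.
    public static string QuerySyntax(int[] data)
    {
        var sb = new StringBuilder();
        foreach (var m in from m in data orderby m select m)
            sb.Append(m);
        return sb.ToString();
    }

    // Roughly the translated form: the range variable becomes OrderBy's lambda parameter.
    public static string MethodSyntax(int[] data)
    {
        var sb = new StringBuilder();
        foreach (var m in data.OrderBy(m => m))
            sb.Append(m);
        return sb.ToString();
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        Console.WriteLine(QuerySyntax(data));  // 111223
        Console.WriteLine(MethodSyntax(data)); // 111223
    }
}
```

Both loops compile because the foreach iteration variable's scope is the loop body only, not the collection expression where the lambda (or range variable) lives.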

    When the compiler works on the code above, it takes the two Ms (read as ams) and internally creates two different variables for them.
    The compiler actually does this often, wherever it needs to handle things like
    variable = variable * something; (here * is an operation)

    Let me test this by running a reflection-based (System.Reflection) analysis of the method body of Main:

    Let’s assume the Main() body looks like this:
    class Program
    {
        static void Main()
        {
            int[] data = { 1, 2, 3, 1, 2, 1 };
            foreach (var num in data)
            {
                System.Console.Write(num);
            }
        }
    }

    We will use the Main() body above, as shown below, to run our analysis:

    using System.Reflection;

    class Program
    {
        static void Main()
        {
            int[] data = { 1, 2, 3, 1, 2, 1 };
            foreach (var num in data)
            {
                System.Console.Write(num);
            }

            new Analyse().Run();
        }
    }

    class Analyse
    {
        public void Run()
        {
            // Inspect the IL locals of the program's entry point (Main).
            Assembly asm = Assembly.GetAssembly(typeof(Program));
            MethodBody mb = asm.EntryPoint.GetMethodBody();
            System.Console.WriteLine("\nMethod Name: " + asm.EntryPoint.Name);

            // Print the type of every local variable slot in Main.
            foreach (var locals in mb.LocalVariables)
            {
                System.Console.WriteLine("\n {0}", locals.LocalType.FullName);
            }
            System.Console.ReadKey();
        }
    }

    The Run method lists every local variable that exists in Main after compilation: the ones we declared explicitly or implicitly, as well as the helper variables the compiler introduces to store temporary results and perform calculations.
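A slightly more general sketch of the same reflection idea: a hypothetical helper, `LocalTypeNames`, that takes any `MethodInfo` rather than only the entry point. `MethodBody.LocalVariables` exposes each local's index and type but not its source name (names live in the debug symbols), and the list varies with compiler version and Debug versus Release builds, so treat its output as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class LocalInspector
{
    // Hypothetical helper: full type name of every IL local, in slot order.
    public static List<string> LocalTypeNames(MethodInfo method)
    {
        MethodBody body = method.GetMethodBody();
        if (body == null)
            return new List<string>(); // abstract or extern methods have no IL body

        return body.LocalVariables
                   .OrderBy(local => local.LocalIndex)
                   .Select(local => local.LocalType.FullName)
                   .ToList();
    }

    // Sample method to inspect: the same foreach-over-array shape as above.
    public static void Sample()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        foreach (var num in data)
            Console.Write(num);
    }

    static void Main()
    {
        MethodInfo sample = typeof(LocalInspector).GetMethod("Sample");
        foreach (string typeName in LocalTypeNames(sample))
            Console.WriteLine(typeName);
    }
}
```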

    The output of listing the local variables in the compiled assembly is:
    (the actual variables present after compilation)

    Method Name: Main

    System.Int32[]

    System.Int32

    System.Int32[]

    System.Int32

    System.Boolean

    A little explanation:

    The first System.Int32[] is the array int[] data = { 1, 2, 3, 1, 2, 1 };

    The next System.Int32 receives the value at each iteration and is passed to Write().

    The second System.Int32[] is foreach’s own temporary for the array. It holds a copy of the reference to the same array, not a copy of its contents.

    The second Int32 is an index that tracks the current position in the array.

    The last is a Boolean that stores the result of the loop’s condition check.

    p.s.: this is all guesswork :P
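The guesses about the array case can be checked against a hand-written approximation. For single-dimensional arrays the compiler does not call GetEnumerator(); in practice it emits an indexed loop over a temporary that holds the array reference. In the sketch below, `temp` and `i` are illustrative stand-ins for unnamed compiler-generated locals, and `WriteAhead` (a hypothetical name) shows that writes to the array during the loop are seen by later iterations, which would not happen if foreach iterated over a copy of the contents.

```csharp
using System;
using System.Text;

static class ArrayForeachLowering
{
    // Hand-written approximation of how foreach over an int[] is compiled.
    public static string Lowered(int[] data)
    {
        var sb = new StringBuilder();
        int[] temp = data;                 // copies the reference, not the elements
        for (int i = 0; i < temp.Length; i++)
        {
            int num = temp[i];             // the iteration variable
            sb.Append(num);
        }
        return sb.ToString();
    }

    // The ordinary foreach, for comparison.
    public static string Foreach(int[] data)
    {
        var sb = new StringBuilder();
        foreach (var num in data)
            sb.Append(num);
        return sb.ToString();
    }

    // Overwrites the next element on each pass; later iterations see the new value.
    public static string WriteAhead()
    {
        int[] data = { 1, 2, 3 };
        var sb = new StringBuilder();
        int index = 0;
        foreach (var num in data)
        {
            sb.Append(num);
            if (index + 1 < data.Length)
                data[index + 1] = 9;
            index++;
        }
        return sb.ToString();
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        Console.WriteLine(Lowered(data));  // 123121
        Console.WriteLine(Foreach(data));  // 123121
        Console.WriteLine(WriteAhead());   // 199
    }
}
```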

    How did I decide which System.Int32 is which, and what role the second one plays? Let’s edit our code a bit to play with a char array:

    using System.Reflection;

    class Program
    {
        static void Main()
        {
            char[] ch = { 'a', 'b', 'c' };
            foreach (var num in ch)
            {
                System.Console.Write(num);
            }

            new Analyse().Run();
        }
    }

    class Analyse
    {
        public void Run()
        {
            // Inspect the IL locals of the program's entry point (Main).
            Assembly asm = Assembly.GetAssembly(typeof(Program));
            MethodBody mb = asm.EntryPoint.GetMethodBody();
            System.Console.WriteLine("\nMethod Name: " + asm.EntryPoint.Name);

            // Print the type of every local variable slot in Main.
            foreach (var locals in mb.LocalVariables)
            {
                System.Console.WriteLine("\n {0}", locals.LocalType.FullName);
            }
            System.Console.ReadKey();
        }
    }

    This time the output looks like this:

    Method Name: Main

    System.Char[]

    System.Char

    System.Char[]

    System.Int32

    System.Boolean

    So you can see that System.Char receives the value that gets written out.
    Comparing the positions of the entries in the two outputs, it is reasonable to conclude that the first System.Int32 holds the current element of data (the one the loop is pointing at).

    Back to our actual code:

    using System.Linq;
    using System.Reflection;

    class Program
    {
        static void Main()
        {
            int[] data = { 1, 2, 3, 1, 2, 1 };

            foreach (var m in from m in data orderby m select m)
                System.Console.Write(m);

            new Analyse().Run();
        }
    }

    class Analyse
    {
        public void Run()
        {
            // Inspect the IL locals of the program's entry point (Main).
            Assembly asm = Assembly.GetAssembly(typeof(Program));
            MethodBody mb = asm.EntryPoint.GetMethodBody();
            System.Console.WriteLine("\nMethod Name: " + asm.EntryPoint.Name);

            // Print the type of every local variable slot in Main.
            foreach (var locals in mb.LocalVariables)
            {
                System.Console.WriteLine("\n {0}", locals.LocalType.FullName);
            }
            System.Console.ReadKey();
        }
    }

    Running the same analysis on this code gives the following output (an explanation of each line is in the brackets below it):

    Method Name: Main

    System.Int32[]
    (the actual data[] array)

    System.Int32
    (receives the value at each iteration, which is then written out)

    System.Collections.Generic.IEnumerator`1[[System.Int32, mscorlib blah blah blah... ]]
    (the generic IEnumerator obtained from the sequence the LINQ query returns. Every IEnumerator has a MoveNext() method that advances it, so no additional System.Int32 index is needed, which is why none appears in the output)

    System.Boolean
    (the condition check result, as stated previously)
    From all this we can see that the compiler handles the duplicate name correctly, as long as we play by its rules (as defined in the C# language specification).
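The IEnumerator`1 local comes from the general foreach expansion the C# specification describes: call GetEnumerator(), loop on MoveNext(), read Current, and dispose the enumerator in a finally block. A hand-written sketch of that expansion for the query, where `query` and `e` are illustrative names for compiler-generated temporaries. It mirrors current C#, which declares the iteration variable inside the while body; older compilers declared it outside.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class EnumerableForeachLowering
{
    // Approximates: foreach (var m in from m in data orderby m select m) sb.Append(m);
    public static string Lowered(int[] data)
    {
        var sb = new StringBuilder();
        // The collection expression is evaluated once, before the loop;
        // its m is only a lambda parameter.
        IEnumerable<int> query = data.OrderBy(m => m);
        IEnumerator<int> e = query.GetEnumerator();
        try
        {
            while (e.MoveNext())      // advances the enumerator, so no index local
            {
                int m = e.Current;    // the foreach iteration variable
                sb.Append(m);
            }
        }
        finally
        {
            e.Dispose();              // foreach disposes IEnumerator<T> when done
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Lowered(new[] { 1, 2, 3, 1, 2, 1 })); // 111223
    }
}
```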

    Making that looong explanation short: here the loop’s var m (implicitly of type Int32) is completely isolated from the range variable m in the LINQ query, thanks to a sort of invisible scope boundary that the compiler can see and work with when it encounters complex expressions like LINQ queries in such constructs.
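One way to see that the two m's really are separate variables is to make the query project something other than its range variable. In this minimal sketch (the class name is hypothetical), the inner `m` ranges over the original integers while the loop's `m` receives the projected values; the query translates roughly to `data.OrderBy(m => m).Select(m => m * 10)`.

```csharp
using System;
using System.Linq;
using System.Text;

static class TwoVariablesNamedM
{
    public static string Run(int[] data)
    {
        var sb = new StringBuilder();
        // Inner m: range variable (a lambda parameter), ranges over the original ints.
        // Outer m: foreach iteration variable, receives the projected values.
        foreach (var m in from m in data orderby m select m * 10)
            sb.Append(m).Append(' ');
        return sb.ToString().TrimEnd();
    }

    static void Main()
    {
        Console.WriteLine(Run(new[] { 1, 2, 3, 1, 2, 1 })); // 10 10 10 20 20 30
    }
}
```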

    I hope I’m right in what I’m thinking; if you think I’m wrong somewhere, do comment with your opinion and theory/proofs here.
