Proper LINQ where clauses
I write a fair amount of LINQ in my day-to-day life, but mostly simple statements. I have noticed that there are many ways to write where clauses, and as far as I can tell each gives the same results. For example:
from x in Collection
where x.Age == 10
where x.Name == "Fido"
where x.Fat == true
select x;
This appears to be equivalent to the following, at least as far as the results are concerned:
from x in Collection
where x.Age == 10 && x.Name == "Fido" && x.Fat == true
select x;
So is there really a difference other than syntax? If so, what is the preferred style and why?
The second one is likely to be slightly more efficient, as it has just one predicate to evaluate against each item in the collection. In the first one, the where clauses are chained: because LINQ to Objects evaluates lazily, the collection is not traversed several times, but every item that passes the first predicate is handed on to the second, and so on. The results get narrowed down at each stage, but each stage still adds an extra delegate call and an extra layer of indirection.
Also, chaining (the first method) works only if you are ANDing your predicates. Something like
x.Age == 10 || x.Fat == true
cannot be expressed with your first method.
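To make the AND-only point concrete, here is a minimal sketch. The Dog record and the sample data are made up for illustration (only the property names Age, Name, and Fat come from the question), and it assumes C# 9 or later for the record syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration; only the property names come from the question.
public record Dog(string Name, int Age, bool Fat);

public static class OrVersusChaining
{
    // Made-up sample data.
    public static readonly List<Dog> Dogs = new()
    {
        new Dog("Fido", 10, true),
        new Dog("Rex", 10, false),
        new Dog("Bella", 3, true),
        new Dog("Max", 5, false),
    };

    // Chained where clauses: an item must pass every clause (logical AND).
    public static List<string> Chained() =>
        (from x in Dogs
         where x.Age == 10
         where x.Fat
         select x.Name).ToList();

    // A single where clause can express OR.
    public static List<string> Either() =>
        (from x in Dogs
         where x.Age == 10 || x.Fat
         select x.Name).ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Chained())); // Fido
        Console.WriteLine(string.Join(", ", Either()));  // Fido, Rex, Bella
    }
}
```

Chaining keeps only the dogs that satisfy both conditions, so there is no way to get the OR result by adding more where clauses.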
EDIT: LINQ to Objects doesn't behave how I'd expected it to. You may well be interested in the blog post I've just written about this...
They're different in terms of what will be called - the first is equivalent to:
Collection.Where(x => x.Age == 10)
          .Where(x => x.Name == "Fido")
          .Where(x => x.Fat == true)
whereas the latter is equivalent to:
Collection.Where(x => x.Age == 10 && x.Name == "Fido" && x.Fat == true)
Now what difference that actually makes depends on the implementation of Where being called. If it's a SQL-based provider, I'd expect the two to end up creating the same SQL. If it's in LINQ to Objects, the second will have fewer levels of indirection (there'll be just two iterators involved instead of four). Whether those levels of indirection are significant in terms of speed is a different matter.
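One way to check this on LINQ to Objects is to run both forms over the same data and count how often the source is read. The sketch below is only an illustration: the Dog record, the sample data, and the Run helper are hypothetical, and it assumes C# 9 or later. It measures nothing about speed; it only shows that both forms return the same items and read each source element once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration; only the property names come from the question.
public record Dog(string Name, int Age, bool Fat);

public static class WhereComparison
{
    // Made-up sample data.
    public static readonly List<Dog> Dogs = new()
    {
        new Dog("Fido", 10, true),
        new Dog("Fido", 10, false),
        new Dog("Rex", 10, true),
        new Dog("Bella", 3, true),
    };

    // What three separate where clauses compile to.
    public static IEnumerable<Dog> Chained(IEnumerable<Dog> source) =>
        source.Where(x => x.Age == 10)
              .Where(x => x.Name == "Fido")
              .Where(x => x.Fat == true);

    // What a single where clause with && compiles to.
    public static IEnumerable<Dog> Combined(IEnumerable<Dog> source) =>
        source.Where(x => x.Age == 10 && x.Name == "Fido" && x.Fat == true);

    // Runs a query over Dogs and reports the matching names together with
    // the number of elements pulled from the source.
    public static (List<string> Names, int Reads) Run(
        Func<IEnumerable<Dog>, IEnumerable<Dog>> query)
    {
        int reads = 0;

        IEnumerable<Dog> Counted()
        {
            foreach (var dog in Dogs)
            {
                reads++;
                yield return dog;
            }
        }

        var names = query(Counted()).Select(d => d.Name).ToList();
        return (names, reads);
    }

    public static void Main()
    {
        var chained = Run(Chained);
        var combined = Run(Combined);

        Console.WriteLine($"{string.Join(", ", chained.Names)} / reads: {chained.Reads}");   // Fido / reads: 4
        Console.WriteLine($"{string.Join(", ", combined.Names)} / reads: {combined.Reads}"); // Fido / reads: 4
    }
}
```

Any difference between the two forms is therefore in per-item overhead, not in the number of times the collection is walked.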
Typically I would use several where clauses if they feel like they're representing significantly different conditions (e.g. one is to do with one part of an object, and one is completely separate) and one where clause when various conditions are closely related (e.g. a particular value is greater than a minimum and less than a maximum). Basically it's worth considering readability before any slight performance difference.
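As a rough illustration of that guideline, a range check can sit in one where clause while an unrelated condition gets its own. The Reading record, its properties, and the method name are invented for this sketch and do not come from the question; it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only.
public record Reading(string Sensor, double Value, bool Calibrated);

public static class ReadableWhere
{
    public static List<Reading> InRangeAndCalibrated(
        IEnumerable<Reading> readings, double min, double max) =>
        (from r in readings
         // Closely related conditions: both bounds describe one range check.
         where r.Value > min && r.Value < max
         // A separate concern gets its own clause.
         where r.Calibrated
         select r).ToList();

    public static void Main()
    {
        var readings = new List<Reading>
        {
            new Reading("A", 5.0, true),
            new Reading("B", 5.0, false),
            new Reading("C", 50.0, true),
        };

        foreach (var r in InRangeAndCalibrated(readings, 0.0, 10.0))
        {
            Console.WriteLine(r.Sensor); // A
        }
    }
}
```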
The first one will be implemented as:
Collection.Where(x => x.Age == 10)
          .Where(x => x.Name == "Fido") // applied to the result of the previous
          .Where(x => x.Fat == true)    // applied to the result of the previous
As opposed to the much simpler (and presumably faster):
// all in one fell swoop
Collection.Where(x => x.Age == 10 && x.Name == "Fido" && x.Fat == true)
When I run
from c in Customers
where c.CustomerID == 1
where c.CustomerID == 2
where c.CustomerID == 3
select c
and
from c in Customers
where c.CustomerID == 1 && c.CustomerID == 2 && c.CustomerID == 3
select c
in LINQPad against my Customer table, they output the same SQL query:
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 Int = 2
DECLARE @p2 Int = 3
-- EndRegion
SELECT [t0].[CustomerID], [t0].[CustomerName]
FROM [Customers] AS [t0]
WHERE ([t0].[CustomerID] = @p0) AND ([t0].[CustomerID] = @p1) AND ([t0].[CustomerID] = @p2)
So in the translation to SQL there is no difference, and you have already seen in the other answers how the two forms are converted to lambda expressions.
Looking under the hood, the two statements will be transformed into different query representations. Depending on the Collection, this might be optimized away or not.
When this is a LINQ to Objects call, multiple where clauses will lead to a chain of IEnumerables that read from each other. Using the single-clause form will help performance here.
When the underlying provider translates it into a SQL statement, the chances are good that both variants will create the same statement.
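The different query representations can be inspected directly. The sketch below uses AsQueryable() on an in-memory list as a stand-in for a real provider, so it says nothing about what any particular SQL provider generates; the Dog record, the sample data, and the CountWhereCalls helper are hypothetical, and it assumes C# 9 or later. It only counts how many nested Where calls appear in each expression tree before a provider gets to translate it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical type for illustration; only the property names come from the question.
public record Dog(string Name, int Age, bool Fat);

public static class QueryShapes
{
    // An in-memory list exposed as IQueryable, standing in for a real provider.
    public static readonly IQueryable<Dog> Dogs = new List<Dog>
    {
        new Dog("Fido", 10, true),
        new Dog("Rex", 10, false),
    }.AsQueryable();

    public static IQueryable<Dog> Chained() =>
        from x in Dogs
        where x.Age == 10
        where x.Name == "Fido"
        where x.Fat == true
        select x;

    public static IQueryable<Dog> Combined() =>
        from x in Dogs
        where x.Age == 10 && x.Name == "Fido" && x.Fat == true
        select x;

    // Counts the nested Where calls at the top of a query's expression tree.
    public static int CountWhereCalls(IQueryable query)
    {
        int count = 0;
        Expression node = query.Expression;
        while (node is MethodCallExpression call && call.Method.Name == "Where")
        {
            count++;
            node = call.Arguments[0]; // the source this Where was applied to
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWhereCalls(Chained()));  // 3
        Console.WriteLine(CountWhereCalls(Combined())); // 1
    }
}
```

A provider that translates to SQL is free to fold the three nested calls into one WHERE with AND, which matches the LINQPad output shown in the earlier answer.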