LINQ Mysteries: The Distinct Function
Recently I had to use the Distinct function introduced in LINQ. To my surprise, you get different results depending on where you put the Distinct clause.
Take the following example. Let dtAnswers be a DataTable with two columns, named answer_value and answer_comment. What I wanted was the count of the different answer_value values across all rows. Knowing how this is done in SQL, I wrote the following LINQ query:
Dim nDifferentValues As Integer = _
(From answer in dtAnswers.Rows _
Distinct Select answer("answer_value").ToString()).Count()
To my surprise, this always returned the same result, no matter which rows the table contained or what values were in the answer_value column. After struggling for several hours, I decided to try the method syntax of LINQ:
Dim nDifferentValues As Integer = _
(From answer in dtAnswers.Rows _
Select answer("answer_value").ToString()).Distinct.Count()
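This returns the correct result.
My guess is that the first query computes Distinct on the rows first (comparing them by reference, so every row counts as different) and only then selects answer("answer_value").ToString() from the result, while the second query first selects answer("answer_value").ToString() and then computes Distinct on the selected strings. This fits how Visual Basic applies query clauses: in the order in which they are written.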
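The difference can be reproduced with a small console program. This is only a sketch under a few assumptions: the three sample rows are made up for illustration, the variable names nDistinctRows, nDistinctValues and nDistinctMethod are hypothetical, and Cast(Of DataRow)() is added so that the queries also compile with Option Strict On. It also shows a third variant that stays in query syntax but places Distinct after Select.

```vb
Imports System
Imports System.Data
Imports System.Linq

Module DistinctOrderDemo
    Sub Main()
        ' Hypothetical sample data: three rows, two different answer_value values.
        Dim dtAnswers As New DataTable()
        dtAnswers.Columns.Add("answer_value", GetType(String))
        dtAnswers.Columns.Add("answer_comment", GetType(String))
        dtAnswers.Rows.Add("yes", "first comment")
        dtAnswers.Rows.Add("no", "second comment")
        dtAnswers.Rows.Add("yes", "third comment")

        ' Distinct before Select: Distinct compares the DataRow objects themselves.
        ' DataRow does not override Equals, so every row is considered different.
        Dim nDistinctRows As Integer =
            (From answer In dtAnswers.Rows.Cast(Of DataRow)()
             Distinct
             Select answer("answer_value").ToString()).Count()

        ' Select before Distinct, still in query syntax: Distinct compares the strings.
        Dim nDistinctValues As Integer =
            (From answer In dtAnswers.Rows.Cast(Of DataRow)()
             Select answer("answer_value").ToString()
             Distinct).Count()

        ' Method syntax, as in the second query of the post.
        Dim nDistinctMethod As Integer =
            (From answer In dtAnswers.Rows.Cast(Of DataRow)()
             Select answer("answer_value").ToString()).Distinct().Count()

        Console.WriteLine(nDistinctRows)    ' 3 (one per row)
        Console.WriteLine(nDistinctValues)  ' 2 ("yes" and "no")
        Console.WriteLine(nDistinctMethod)  ' 2 ("yes" and "no")
    End Sub
End Module
```

With this sample data the first query counts every row, while the other two count the different answer_value strings, so moving Distinct after Select should be enough to fix the query without leaving query syntax.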
So be careful where you put your LINQ functions 😉