Getting the Most out of LINQ

By on November 2nd 2010

Language Integrated Query, or LINQ, is a feature Microsoft introduced in .NET 3.5 that allows a more functional approach to programming in .NET: it lets you easily work on collections of data by filtering them with custom operations, transforming them (mapping), or combining them (reducing) in interesting ways. I use LINQ all the time in my C# code these days. However, I've found that more and more, I'm using the so-called 'Lambda' syntax over the 'Query' syntax.

The Query syntax is meant to be simpler, but that simplicity makes it far less powerful. To start, here's a pair of identical queries, first in Query syntax, then in Lambda syntax. Note that I'm talking here about LINQ-to-Objects, but this translates to LINQ-to-SQL almost as easily.

        // Item and GetItems() are placeholders: anything that implements IEnumerable<T> will do
        IEnumerable<Item> collection = GetItems();
        
        var queryResult = from c in collection
                          where c.value == "Awesome"
                          select c.otherField;
        
        var lambdaResult = collection.Where(c => c.value == "Awesome")
                                     .Select(c => c.otherField);

Two identical queries, with a collection being filtered and then mapped to a new output. Now, I do find that Query syntax is far easier to map with than Lambda syntax, so I typically start my queries in Query syntax to get the basics out of the way. Behind the scenes, the compiler translates Query syntax into the same method calls the Lambda syntax uses, so these examples compile to nearly identical IL. The point of the Query syntax, I believe, was to be more familiar to people who know SQL but aren't comfortable with functional programming. However, using Lambdas in C# allows you to do some very powerful and useful things.
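
To make the equivalence concrete, here is a minimal, self-contained sketch of the same filter-and-map pair over LINQ-to-Objects. The `Item` type, its property names, and the `SyntaxComparison` class are invented for illustration; they stand in for whatever your collection actually holds.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for whatever the collection holds.
public class Item
{
    public string Value { get; set; }
    public string OtherField { get; set; }
}

public static class SyntaxComparison
{
    // Query syntax: the compiler rewrites this into Where/Select method calls.
    public static List<string> WithQuerySyntax(IEnumerable<Item> items)
    {
        return (from c in items
                where c.Value == "Awesome"
                select c.OtherField).ToList();
    }

    // Lambda syntax: the same filter and projection, written out directly.
    public static List<string> WithLambdaSyntax(IEnumerable<Item> items)
    {
        return items.Where(c => c.Value == "Awesome")
                    .Select(c => c.OtherField)
                    .ToList();
    }
}
```

Given the same input, both methods should return the same list in the same order, since they describe the same pair of operations.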

To take a brief sidetrack into LINQ-to-SQL (or Entities; this would work either way), I use code similar to the following for a lot of the queries I run in my data models.

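        public class Model {
            // 'var' is not allowed on fields, so the context type is spelled out
            private readonly DatabaseContext database = new DatabaseContext();
        
            // The generic IQueryable<T> lets callers chain typed Where/Select calls
            private IQueryable<LocalDataType> GetBaseQuery()
            {
                return from d in database.Table
                       select new LocalDataType()
                       {
                            Id = d.Id,
                            Value = d.field1,
                            Data = d.field2
                       };
            }
        }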

LocalDataType is simply a Data Transfer Object (DTO) that I've defined as what needs to be returned from my Model. By keeping GetBaseQuery private, I can publish an interface of functions that know how to get the data out of database.Table and translate it into LocalDataType, all while being able to apply their own filters, with virtually no code duplication. For instance, to get the data based on its numeric Id, I can add the following function:

        public IEnumerable<LocalDataType> GetById(int id)
        {
            return GetBaseQuery().Where(c => c.Id == id);
        }

Let's say I have more complex filtering, where I may not always need to test something:

        public IEnumerable<LocalDataType> GetByCode(string[] validCodes)
        {
            // Only add the filter when there is something to filter by
            if (validCodes != null && validCodes.Length > 0) {
                return GetBaseQuery().Where(c => validCodes.Contains(c.Data));
            } else {
                return GetBaseQuery();
            }
        }

This allows me to quickly build variable queries that, in the case of LINQ-to-SQL, will execute on the SQL Server. SQL Server is good at filtering, particularly over keyed values, so you're generally going to get better performance by doing as much filtering as possible on the server, even before you count the reduced data coming over the wire.

My point is simple: the Query syntax of LINQ is a bit misleading. It suggests that queries are things that are built once and used as they are. In fact, they are malleable things that can be built on, extended, and improved as you work with them, and that is a lot clearer in the Lambda syntax. Using LINQ in this fashion will make your code far more maintainable and easier to extend, since you'll be working with common building blocks, and you'll be avoiding the copy-paste mentality that can often pervade hard-to-maintain codebases.
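
As a rough illustration of that building-block idea, the sketch below layers optional filters onto one base sequence using LINQ-to-Objects. The `Record` type, the `ComposableQueries` class, and the `minId` parameter are all hypothetical; against LINQ-to-SQL the same shape would apply to an `IQueryable<T>`, though that is not shown here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type, loosely modelled on LocalDataType.
public class Record
{
    public int Id { get; set; }
    public string Data { get; set; }
}

public static class ComposableQueries
{
    // Each optional argument adds another Where call to the same query.
    // Nothing is evaluated here; the filters run when the result is enumerated.
    public static IEnumerable<Record> Find(
        IEnumerable<Record> source, int? minId, string[] validCodes)
    {
        IEnumerable<Record> query = source;

        if (minId.HasValue)
        {
            query = query.Where(r => r.Id >= minId.Value);
        }

        if (validCodes != null && validCodes.Length > 0)
        {
            query = query.Where(r => validCodes.Contains(r.Data));
        }

        return query;
    }
}
```

Because execution is deferred, a caller can keep adding `Where`, `OrderBy`, or `Select` calls to the returned sequence before anything is actually evaluated.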

Plus, there are some functions, like Aggregate, that don't exist in Query syntax. In my current project, I occasionally need to filter by course level, i.e. 100-level, 200-level, etc. The easiest way to do this, it turns out, is to test the first character of each course number against a list of approved levels. In addition, there is a requirement that one choice actually maps to all 600-, 700-, and 800-level courses.

        public IEnumerable<LocalDataType> GetByCourseLevel(IEnumerable<int> courseLevel)
        {
            // courseLevel is a list of ints (100, 200, 300, 400, etc.) fed in from the selections on an HTML form.
            string courseLevelWorking = courseLevel == null ? string.Empty : courseLevel.Aggregate<int, string>(string.Empty, (workingString, next) => {
                if (next == 600) {
                    return workingString + "678";
                } else {
                    return workingString + (next / 100).ToString();
                }
            });
        
            var query = GetBaseQuery();
            if (courseLevelWorking != string.Empty) {
                query = query.Where(c => courseLevelWorking.Contains(c.CourseNumber.ToString()[0]));
            }
            return query;
        }
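
To see the Aggregate step on its own, here is a small LINQ-to-Objects sketch of the same idea that works on plain course numbers rather than a database query. The `CourseLevels` class and its method names are made up for this example, and it assumes course numbers are positive integers such as 101 or 610.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CourseLevels
{
    // Folds the selected levels (100, 200, ...) into a string of leading digits.
    // A selection of 600 stands for the 600-, 700-, and 800-level courses.
    public static string ToLeadingDigits(IEnumerable<int> levels)
    {
        if (levels == null)
        {
            return string.Empty;
        }

        return levels.Aggregate(string.Empty, (digits, level) =>
            level == 600 ? digits + "678" : digits + (level / 100));
    }

    // Keeps only the course numbers whose first digit is among the selected levels.
    // With no levels selected, the input is returned unfiltered.
    public static IEnumerable<int> FilterByLevel(
        IEnumerable<int> courseNumbers, IEnumerable<int> levels)
    {
        string digits = ToLeadingDigits(levels);
        if (digits.Length == 0)
        {
            return courseNumbers;
        }

        return courseNumbers.Where(n => digits.IndexOf(n.ToString()[0]) >= 0);
    }
}
```

Splitting the fold into its own method keeps the special case for 600 in one place, and the filter itself stays a single `Where` call that can be appended to any existing query.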

LINQ allows you to build amazingly complex queries in an additive and clean fashion, and taking your LINQ usage from simple to complex yet maintainable requires very little work, just a bit of forethought.