Monday, January 21, 2008

LINQ for Beginners

Someone asked me about my MSN Messenger status, so I explained to him what I was doing. Apparently, I was working with LINQ and had it as my status. LINQ stands for Language INtegrated Query. It started as the codename for a project to build a set of extensions to the .NET Framework that encompass language-integrated query, set and transform operations. It extends C# and VB with native language syntax for queries and provides class libraries to take advantage of these capabilities. It was introduced with .NET Framework 3.5, which simply means that if you want to write LINQ queries, your project has to target that version of the framework or later.

Now, what does that mean to developers? Queries are usually expressed in a specialized query language for each kind of data source, so developers have to learn a different query language for every data source or data format they need to access. This is the problem LINQ addresses. It simplifies data access by providing a consistent model for working with data across various kinds of sources and formats. In LINQ, you always work with objects, something developers are more comfortable with. Understanding LINQ will give us an idea of its capabilities and its benefits (I'll save the cons from an enterprise DBA's perspective for a future post).

To understand LINQ, we need to know the three basic parts of a query operation: obtaining the data source, creating the query and executing the query. These steps are generic; any access to a data source has to go through them.

using System;
using System.Linq;

class LINQBasics
{
    static void Main()
    {
        // Obtain the data source
        string[] names = { "Charlie", "Joe", "Yia Wei", "Bob", "Mike" };

        // Create the query
        // query can be treated as an IEnumerable<string>
        var query = from name in names
                    where name.Contains("i")
                    orderby name
                    select name;

        // Execute the query, printing one name per line
        foreach (string name in query)
        {
            Console.WriteLine(name);
        }
    }
}


Looking at the code above, the first thing we need is a data source. In this case, it's an array of strings, which supports the generic IEnumerable<T> interface. This makes it available for LINQ to query. A queryable type does not require any special modification to serve as a LINQ data source as long as it is already loaded in memory; otherwise, you have to load it into memory first so LINQ can query the objects. This applies to data sources like XML files.

Next is the query. A query specifies what information to retrieve from the data source. If you are familiar with SQL, you know what this looks like: the kind that includes select, from, where and the like. Looking at the code above, you'll notice that it's not like your typical SQL statement, as the from clause appears before the select clause. There are a couple of reasons for this. One, it adheres to the programming concept of declaring a variable before using it. Also, from the point of view of Visual Studio, it makes it easy to provide IntelliSense with the dot (.) notation: the variable has already been declared and the compiler has already inferred its type, so the editor can offer the appropriate properties and methods, making it easier for developers to write their code.

Let's look at how the query was constructed. The from clause specifies the data source, in this case the names collection. The where clause applies the filter, in this case keeping only the elements that contain the letter "i". The orderby clause sorts the results. The select clause specifies the shape of the returned elements. This means you are not limited to returning the elements as they are; you can create new objects from them, for example an object with fewer attributes than the original. The query variable, query, only stores the information required to produce the results when the query is executed, possibly at a later point. Simply defining the query variable does not return any data or take any action.
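
To make the select clause a little more concrete, here is a minimal sketch of projecting into an object with fewer attributes. The Person class, its properties and the sample data are made up for illustration; they are not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type, invented for this sketch
public class Person
{
    public string Name { get; set; }
    public string City { get; set; }
    public int Age { get; set; }
}

public static class ProjectionSketch
{
    // Returns "Name (City)" strings for people older than minAge, ordered by name
    public static List<string> Describe(IEnumerable<Person> people, int minAge)
    {
        // select builds an anonymous type carrying only two of the three properties
        var query = from p in people
                    where p.Age > minAge
                    orderby p.Name
                    select new { p.Name, p.City };

        var lines = new List<string>();
        foreach (var item in query)
        {
            lines.Add(item.Name + " (" + item.City + ")");
        }
        return lines;
    }

    public static void Main()
    {
        // Sample data, assumed for illustration only
        var people = new List<Person>
        {
            new Person { Name = "Joe", City = "Manila", Age = 31 },
            new Person { Name = "Bob", City = "Cebu", Age = 25 },
            new Person { Name = "Charlie", City = "Davao", Age = 40 }
        };

        foreach (string line in Describe(people, 30))
        {
            Console.WriteLine(line);
        }
    }
}
```

The objects coming out of the query no longer have an Age property at all, which is what "fewer attributes" means here.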

The third component of the code above is query execution. As I said, the query variable does not contain any data; it only contains the query commands. The query actually executes when we iterate over the query variable. There are a couple of ways to do this, one of which is shown above: the foreach statement iterates through the query variable and executes it at the same time. This concept is called deferred query execution. It matters a lot when dealing with data sources such as highly transactional database systems, since you avoid connecting to the database until necessary (database connections consume resources on the database server as well). You can opt to execute the query immediately by calling operators that return a single value, such as Count, Max, Average and First, or by calling the ToList() or ToArray() methods. Another way is to bind the results to a data-bound control, similar to how we did it in previous versions: in an ASP.NET web form, set the DataSource property of the control to the query variable and call the DataBind() method; in a Windows form, the DataSource property expects a list, so call ToList() on the query first.
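
Deferred execution is easier to see when the data source changes after the query has been defined. The sketch below reuses the names from the example above, but keeps them in a List<string> instead of an array so that an element can be added; the extra name "Alice" and the class and method names are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Returns { count in the snapshot, count when the query runs again }
    public static int[] Counts()
    {
        var names = new List<string> { "Charlie", "Joe", "Yia Wei", "Bob", "Mike" };

        // Defining the query does not run it
        var query = from name in names
                    where name.Contains("i")
                    orderby name
                    select name;

        // ToList() forces immediate execution and stores the results
        List<string> snapshot = query.ToList();

        // Change the data source after the query was defined
        names.Add("Alice");

        // The snapshot is unchanged; Count() runs the query again
        // against the current contents of names
        return new[] { snapshot.Count, query.Count() };
    }

    public static void Main()
    {
        int[] counts = Counts();
        Console.WriteLine(counts[0]); // 3
        Console.WriteLine(counts[1]); // 4
    }
}
```

The snapshot holds the three matching names that existed when ToList() was called, while the query variable picks up the new element the next time it is executed.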

One other thing to highlight is the use of the keyword var, which was introduced in C# 3.0. What it does is look at the value assigned to the variable and give the variable the matching type. This concept is called type inference. In the code above, the query expression produces a sequence of strings, so the compiler infers a type for query that can be used as an IEnumerable<string>. This is helpful when the type name is long or, as with anonymous types, cannot be written out at all. The type is still fixed at compile time, though. It does not mean that any type can be assigned to the variable after the initial assignment, something like a dynamic type, since C# is a strongly typed language. Assigning a value of a different type to an existing variable violates static typing. Let's say you assign the value 12 to the query variable, query. The code will not compile, because an int cannot be converted to the type inferred for query.
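
Here is a small sketch of what type inference with var looks like in practice, reusing the names array from the first example. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TypeInferenceSketch
{
    public static int CountMatches(string[] names)
    {
        // The compiler infers the type of query from the right-hand side
        var query = from name in names
                    where name.Contains("i")
                    orderby name
                    select name;

        // The inferred type is assignable to IEnumerable<string>, so this compiles
        IEnumerable<string> sameQuery = query;

        // Uncommenting the next line stops the program from compiling:
        // an int cannot be converted to the type inferred for query
        // query = 12;

        // count is inferred as int because Count() returns int
        var count = sameQuery.Count();
        return count;
    }

    public static void Main()
    {
        string[] names = { "Charlie", "Joe", "Yia Wei", "Bob", "Mike" };
        Console.WriteLine(CountMatches(names)); // 3
    }
}
```

Writing var does not make the variable loosely typed; it only saves you from spelling out a type the compiler already knows.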

This is just the tip of the iceberg for LINQ. There are a lot of resources out there for LINQ to SQL, LINQ to XML, LINQ to Objects, LINQ to Entities and LINQ to DataSet. I'll try to post more examples to make programming in LINQ a bit more appealing to developers.

This article is also posted on the MSSQLTips.com site

2 comments:

Anonymous said...

What a brilliantly written article, I always wondered what the fuss was about LINQ, and after reading and trying out the sample code, it totally makes sense.

Thanks
Mohamed

bassplayer said...

Thank you so very much for visiting my blog. If you have questions or feel like you want a specific topic posted on my blog, feel free to post your comments anytime and I'd be glad to write one for you.
