
Improving LINQ for working with IReadOnly collections
As you know, when using the IEnumerable <> interface where the collection is implied, problems can occur (see, for example, Problems using IEnumerable and LINQ against LSP ). Fortunately, in .NET v4.5 in 2012 (a bit late, but better late than never), the IReadOnlyCollection <> , IReadOnlyList <> , IReadOnlyDictionary <> interfaces appeared (hereinafter I will generically call them IReadOnly interfaces). Unlike IEnumerable <> , IReadOnly-interfaces make it possible to sufficiently and without unnecessary requirements indicate the functionality of the collection, which allows them to be recommended for use instead of IEnumerable <> wherever the collection is meant to be read. But here there is one difficulty. One of the important components that consumes and creates collections is LINQ and, especially, its part “LINQ to objects”. Unfortunately, IReadOnly interfaces appeared 5 years after LINQ, and are not used in it. All input and output collections of LINQ operations are of the base type IEnumerable <>, based on the limited capabilities of which, many operations involve extra costs: a complete sequential search or even the creation of intermediate copies of input collections. Moreover, returning the same IEnumerable <> from operations , LINQ requires, when using the result further, to again use exhaustive search and create intermediate copies. In this regard, I have long had the idea to “make friends” LINQ with IReadOnly interfaces.
Found on the vast Internet development on this topic did not satisfy me. For example, the Linq to Collections library offers an effective replacement of only some LINQ operations for the IReadOnlyList <> interface only., poorly optimized, and for some reason forcibly adds many ambiguous extension methods for the basic types. Another found project on this topic, CountableSharp , offers a small number of optimized LINQ operations only for the IReadOnlyCollection <> interface , in which a decorator is returned that delegates all calls to the base System.Linq.Enumerable and only the Count property is calculated in advance, without requiring a full iterating over the collection.
Further, I suggest using my Collection.LINQ library , which optimizes LINQ to objects by providing an efficient implementation of most operations for collections that implement IReadOnly-interfaces.
First of all, I will allow myself a small digression, which allows me to better understand and use the work I have done. As many of you have probably noticed, the IReadOnly series of interfaces clearly lack an interface for sets. Therefore, we will create it ourselves in such an obvious form:
The only thing that is ambiguous here is the finiteness of the set associated with the inheritance of IReadOnlyCollection <> . Alternatively, you could create an intermediate interface of an infinite set of IReadOnlySet <> inherited from IEnumerable <> . However, there is no practical need for it, since infinite sets are only of academic interest and it is difficult to imagine where they can be used. All sets existing in the base class library already implement the methods necessary for IReadOnlyFiniteSet <> , so there will be no problems with its implementation. Next, I will talk about optimizing LINQ operations, including for the IReadOnlyFiniteSet interface <>hoping that someday it will be made part of the base library.
So, we will consider what operations are implemented in Collection.LINQ and how exactly they increase the efficiency of “LINQ to objects”.
We start by specifying those operations where optimization is not required. The aggregation methods Aggregate (), Average (), Min (), Max (), Sum () do not need any optimization because IEnumerable <> is a necessary and sufficient interface for them ., but they return the value, not the collection. For the same reason, the methods All (), Any (), Count (), LongCount (), First (), FirstOrDefault (), Last (), LastOrDefault (), Single (), SingleOrDefault (), do not need to be optimized. taking a predicate filter parameter. The presence of a predicate dictates the need for exhaustive search, for which IEnumerable <> is sufficient . Obviously, the ToArray (), ToList (), ToDictionary (), ToLookup () methods also do not require optimization because they semantically imply a complete enumeration and creation of a copy. Only creating an array in the ToArray () method can be slightly optimized by knowing in advance the number of elements in the collection.
The methods for creating the Empty (), Range (), and Repeat () collections require one obvious optimization: they should return a non-basic IEnumerable <>, and the specific IReadOnly is the interface.
Now about the main optimization. In operations that return a collection, the result is created as a decorator, which delegates calls to members directly to the input collections, without preliminary iterations and creating copies of elements. For this optimization to work when applying several operations in sequence, it is also important that the returned collections are also an IReadOnly type. A similar internal optimization has already been implemented in LINQ to Objects: many methods first of all check the implementation of some interfaces by the input collection, and use them to perform actions more efficiently. But of course, iReadOnly-interfaces (which did not exist at the time LINQ appeared) are not used, and the returned collection always has only the base type IEnumerable <> . In my library, direct decoration optimization is used in the following LINQ operations.
Unfortunately, for some LINQ operations, the functionality of IReadOnly collections cannot add efficiency. These are operations that result in a collection obtained by filtering the values of elements in input collections. There is no way to do without a complete search and copying. The filtering operations SkipWhile (), TakeWhile (), Where (), as well as the methods of conditional grouping and joining Join (), GroupJoin (), GroupBy (), SelectMany (), remain not optimized.
Additionally, I will mention several optimizations in my Collection.Linq that are not related to IReadOnly interfaces:
To make sure that I did not miss anything and did more efficiently than the library System.Linq.Enumerable , I constantly looked at its source .
Using the Collection.LINQ library is easy: just enter your code line by line
add line
It doesn’t matter how you use LINQ to objects, in the form of query syntax or in fluent form. After connecting my library, your LINQ code will automatically use the most optimal methods, provided that collections that implement IReadOnly interfaces are input . The only exception is the methods for creating collections of the System.Linq.Enumerable class , which are not extension methods: Enumerable.Empty (), Enumerable.Range (), and Enumerable.Repeat (). These methods will need to be manually replaced with ReadOnlyFiniteSet.Empty () / ReadOnlyList.Empty (), ReadOnlyFiniteSet.Range () / ReadOnlyList.Range () and ReadOnlyList.Repeat (), respectively.
The entire library is presented in a public project on github . At the request of readers also creatednuget package with library .
Found on the vast Internet development on this topic did not satisfy me. For example, the Linq to Collections library offers an effective replacement of only some LINQ operations for the IReadOnlyList <> interface only., poorly optimized, and for some reason forcibly adds many ambiguous extension methods for the basic types. Another found project on this topic, CountableSharp , offers a small number of optimized LINQ operations only for the IReadOnlyCollection <> interface , in which a decorator is returned that delegates all calls to the base System.Linq.Enumerable and only the Count property is calculated in advance, without requiring a full iterating over the collection.
Further, I suggest using my Collection.LINQ library , which optimizes LINQ to objects by providing an efficient implementation of most operations for collections that implement IReadOnly-interfaces.
First of all, I will allow myself a small digression, which allows me to better understand and use the work I have done. As many of you have probably noticed, the IReadOnly series of interfaces clearly lack an interface for sets. Therefore, we will create it ourselves in such an obvious form:
public interface IReadOnlyFiniteSet : IReadOnlyCollection
{
bool Contains (T item);
}
The only thing that is ambiguous here is the finiteness of the set associated with the inheritance of IReadOnlyCollection <> . Alternatively, you could create an intermediate interface of an infinite set of IReadOnlySet <> inherited from IEnumerable <> . However, there is no practical need for it, since infinite sets are only of academic interest and it is difficult to imagine where they can be used. All sets existing in the base class library already implement the methods necessary for IReadOnlyFiniteSet <> , so there will be no problems with its implementation. Next, I will talk about optimizing LINQ operations, including for the IReadOnlyFiniteSet interface <>hoping that someday it will be made part of the base library.
So, we will consider what operations are implemented in Collection.LINQ and how exactly they increase the efficiency of “LINQ to objects”.
We start by specifying those operations where optimization is not required. The aggregation methods Aggregate (), Average (), Min (), Max (), Sum () do not need any optimization because IEnumerable <> is a necessary and sufficient interface for them ., but they return the value, not the collection. For the same reason, the methods All (), Any (), Count (), LongCount (), First (), FirstOrDefault (), Last (), LastOrDefault (), Single (), SingleOrDefault (), do not need to be optimized. taking a predicate filter parameter. The presence of a predicate dictates the need for exhaustive search, for which IEnumerable <> is sufficient . Obviously, the ToArray (), ToList (), ToDictionary (), ToLookup () methods also do not require optimization because they semantically imply a complete enumeration and creation of a copy. Only creating an array in the ToArray () method can be slightly optimized by knowing in advance the number of elements in the collection.
The methods for creating the Empty (), Range (), and Repeat () collections require one obvious optimization: they should return a non-basic IEnumerable <>, and the specific IReadOnly is the interface.
Now about the main optimization. In operations that return a collection, the result is created as a decorator, which delegates calls to members directly to the input collections, without preliminary iterations and creating copies of elements. For this optimization to work when applying several operations in sequence, it is also important that the returned collections are also an IReadOnly type. A similar internal optimization has already been implemented in LINQ to Objects: many methods first of all check the implementation of some interfaces by the input collection, and use them to perform actions more efficiently. But of course, iReadOnly-interfaces (which did not exist at the time LINQ appeared) are not used, and the returned collection always has only the base type IEnumerable <> . In my library, direct decoration optimization is used in the following LINQ operations.
for collections of type IReadOnlyCollection <> | |
Any (), Count (), LongCount () | immediately return the result using the Count property |
DefaultIfEmpty () | returns IReadOnlyCollection <>, depending on the Count property, the source or single-element |
Select (), Zip () | return IReadOnlyCollection <> - decorator |
Skip (), Take () | return IReadOnlyCollection <>, depending on the Count property, the original, empty or decorator |
Concat () | returns IReadOnlyCollection <>, depending on the Count properties one of the source or a decorator |
Reverse () | returns IReadOnlyCollection <>, depending on the Count property, the source or a decorator that creates a full copy of the collection upon request of the enumeration |
OrderBy (), OrderByDescending (), ThenBy (), ThenByDescending () | return an IReadOnlyCollection <>, depending on the Count property, the source or a decorator that creates a full copy of the collection and a sorted index when requesting an enumeration |
for sets of type IReadOnlyFiniteSet <> (in addition to what is available for IReadOnlyCollection <> collections ) | |
Contains () | immediately returns the result using the Contains () method |
Distinct () | returns the original IReadOnlyFiniteSet <> - set |
Except (), Intersect (), Union () | return IReadOnlyFiniteSet <>, depending on the Count property one of the source ones, empty or a decorator, when creating iterates over the smaller of the input sets |
DefaultIfEmpty () | returns IReadOnlyFiniteSet <>, depending on the Count property, the source or single-element |
Reverse () | returns IReadOnlyFiniteSet <>, depending on the Count property, the source or a decorator that creates a full copy of the set |
for lists of type IReadOnlyList <> (in addition to what is available for collections IReadOnlyCollection <> ) | |
ElementAt (), ElementAtOrDefault (), First (), FirstOrDefault (), Last (), LastOrDefault () | immediately return the result using IReadOnlyList <> methods |
DefaultIfEmpty () | returns IReadOnlyList <>, depending on the Count property, the source or single-element |
Skip (), Take (), Select (), Concat (), Zip (), Reverse () | return IReadOnlyList <> - decorator |
OrderBy (), OrderByDescending (), ThenBy (), ThenByDescending () | return IReadOnlyList <>, depending on the Count property, the source or decorator that creates a sorted index to delegate the receipt of elements by position number and enumeration |
Additionally, I will mention several optimizations in my Collection.Linq that are not related to IReadOnly interfaces:
- The LINQ SequenceEqual () operation is implemented as a direct delegation to the Equals () method of the IStructuralEquatable interface for collections that implement this interface (which appeared 3 years later LINQ).
- Significantly reduced the number of method overloads due to the widespread use of optional parameters (which appeared a year later LINQ).
- An obviously missing set operation is added - a symmetric difference in the form of the SymmetricExcept () method .
To make sure that I did not miss anything and did more efficiently than the library System.Linq.Enumerable , I constantly looked at its source .
Using the Collection.LINQ library is easy: just enter your code line by line
using System.Linq;
add line
using BusinessClassLibrary.Collections.Linq;
It doesn’t matter how you use LINQ to objects, in the form of query syntax or in fluent form. After connecting my library, your LINQ code will automatically use the most optimal methods, provided that collections that implement IReadOnly interfaces are input . The only exception is the methods for creating collections of the System.Linq.Enumerable class , which are not extension methods: Enumerable.Empty (), Enumerable.Range (), and Enumerable.Repeat (). These methods will need to be manually replaced with ReadOnlyFiniteSet.Empty () / ReadOnlyList.Empty (), ReadOnlyFiniteSet.Range () / ReadOnlyList.Range () and ReadOnlyList.Repeat (), respectively.
The entire library is presented in a public project on github . At the request of readers also creatednuget package with library .