Working on an architecture framework for a client of Liberty Fox, I discovered a little known caveat of the Entity Framework. If you implement the repository pattern (or any data access methodology), then I am sure you’ve created a method to retrieve data from the data source based on some business rules.
For Example: public IEnumerable<User> GetActiveUsers()
This method, as you may have guessed, returns an enumerable of the active users in the system. Now, because of the simplicity of LINQ, you would think you could do something like this in your presentation layer:
var users = repository.GetActiveUsers().Skip(10).Take(10)
This snippit of code would work just fine. You would end up retrieving users 11 through 20 in the result-set of active users. Now, being that LINQ is intelligently designed, you would think that it would tack-on the Skip() and Take() to the underlying SQL query and limit the result-set returned from the database, but you run the SQL profiler and you see:
SELECT UserId, UserName, LastActiveDate FROM Users
Oh No! Where is the “TOP(10)
“? Where is the “WHERE rowcount() > 10
“? I know LINQ is capable of this! I have seen it work before in my unit tests and in others’ example code. What’s going on?
Well, I spent an unnecessary amount of time figuring this out so that you don’t have to… As efficient developers, we’re taught to program against interfaces, and specifically, the least responsible interface to get the job done. To further clarify, if I’m working with a List
of users, I may pass it to a method as an IEnumerable
if I only need to foreach
over it, or if I need to run a for
loop with it, I may pass it as an ICollection
. So therefore, its not uncommon to want to return the result of my repository method as an IEnumerable
. The problem is that when you downcast a Linq-to-SQL query to an IEnumerable
, it immediately becomes a Linq-to-Objects query and all further filtering is done in memory. Yikes! So if you had a method that returns an IEnumerable
of all the users in the system and you use Linq to tack-on a filter, you’d expect that to be tacked-on as a WHERE clause in the underlying SQL query, but it won’t! It will load the entire table into memory and then do the filtering. This is extremely inefficient!
So, after my long rant of how I spent hours figuring this out, how can you avoid the same fate? Pass the result of your repository method as an IQueryable<User>
.
For Example: public IQueryable<User> GetActiveUsers()
In your presentation layer, you will still be able to work with it similar to how you would have if it was an IEnumerable
, but now any further limiters WILL be tacked-on to the SQL query, maintaining efficiency. I hope this saves somebody the time I lost. Cheers! 🙂
I would say that this is not a problem with returning IEnumerable(Of T) vs. IQueryable(Of T) so mach as it’s a problem that the method used to return data from the repository did not provide a way for you to specify what to return.
Your method returns all active users without the ability to restrict any further. Maybe returning all active users is appropriate in some scenarios. But in others, it’s detrimental to performance as you mentioned. You should have, therefore, implemented another method on your repository: IEnumerable(Of T) GetActiveUsers(int pageIndex, int numItemsPerPage). This method on the repository would then construct the appropriate LINQ statement to get only ‘numItemsPerPage’ of items for the specified ‘pageIndex’, only returning the needed/requested data. In this way, your repository becomes the single point of access to your data–as it should be. By returning IQueryable(Of T), you’re allowing data access concerns to be leaked into a higher lever layer–probably the business logic/services layer–which shouldn’t be concerned with the “How” of data retrieval at all, just the “what” aspect of the data to be retrieved.
Hello Craig and thank you for responding. You pose a very valid argument, but in rebuttal one could argue the note of scalability and maintainability. Once you add paging and sorting into the mixture, you can end up with a significant amount of overloaded methods in your logic (services) layer and end up with a maintainability nightmare in the sense of refactor or major changes. Additionally, one could argue that the logic itself should not be concerned with how many items to return at a time, or the fact that there are “pages” to be served at all. For example, on a desktop device you may want 50 items per page, but on a mobile device, you may want to implement “endless scrolling” by returning 10 elements at a time in which sense there is no “page” per say. Additionally, the same sorting and paging functionality is available to both the IQueryable and IEnumerable interfaces, so we are not exposing any additional sorting or paging functionality and an IEnumerable can be brought “back” to IQueryable (or turned into) by using the method AsQueryable(). (Although an argument could be made that using reflection you can access the underlying data provider.) Either way, each architect has their own preference of security (separation of concerns) vs. maintainability and in our scenario, this was an interesting and under-documented caveat that we stumbled upon.
Thank you for your feedback and opinion. Cheers!