Friday, March 6, 2009

DevExpress XPO - incomplete Linq support

In general, XPO is great. I like it because of two features:

  • the ability to automatically update the database schema when connecting from the new version of the object model
  • the default "mark-as-deleted" way of removing objects 

There's however one scar on its beautiful face - incomplete Linq support.

No matter how fancy the Linq-To-Whatever implementation is - without Skip and Take it can never be fully adopted in enterprise systems. And XPO's implementation of Linq lacks the support of the Skip operation.

I am afraid that this is because XPO's implementation of paging is flawed by design: if you forget about Linq for a second, the only way to paginate is to use the XPPageSelector over your collection.

   1: /* initial collection*/
   2: XPCollection<TheType> ds = new XPCollection<TheType>(TheSession);
   3:  
   4: /* paging */
   5: XPPageSelector ps = new XPPageSelector(ds);
   6:  
   7: ps.PageSize = PageSize;
   8: ps.CurrentPage = PageNumber;
   9:  
  10: return ds;

If you run the above code and in the same time you trace queries which are executed, you'll learn that:



  1. first, the whole collection of objects' identifiers is retrieved from the data source

  2. the page selector selects the identifier subset which refers to the page you select

  3. then, another query retrieves all the columns but only from the selected page of items


   1: /* first query sent by xpo to retrieve all the data */
   2: select N0."OID" from 
   3: "dbo"."X_RejestrDostepu" N0 
   4: where N0."GCRecord" is null order by N0."Data" asc
   5:  
   6: /* xpo engine selects identifiers of objects in selected page */
   7:  
   8: /* another query - 
   9:    selected page is retrieved by manually providing identifiers 
  10: */
  11: exec sp_executesql N'select 
  12: N0."GCRecord",N0."OID", .... ,N0."OptimisticLockField" 
  13: from "dbo"."X_RejestrDostepu" N0
  14: where (N0."GCRecord" is null and N0."OID" 
  15: in (@p0,@p1,@p2,@p3,@p4,@p5,@p6,@p7,@p8,@p9))',
  16: N'@p0 int,@p1 int,@p2 int,@p3 int,@p4 int,@p5 int,@p6 
  17: int,@p7 int,@p8 int,@p9 int',@p0=31,@p1=32,@p2=33,@p3=34,@p4=35,@p5=36,@p6=37,@p7=38,@p8=39,@p9=40

And if you realize that there are in fact two queries required to retrieve selected page of data, then you imagine how ineffective (difficult?) this could be when implementing Linq's Skip this way!


It's then no surprise that the only answer to the "when the Skip operator will be supported?" is "sorry, Skip is not supported":


http://community.devexpress.com/forums/p/72977/248858.aspx


Edit: if you find this post in early 2010 or later please consider it outdated. Both skip and take are now correcly handled by XPO's Linq provider. I've blogged about it.

2 comments:

George said...

Wiktor,

I happen to run across your comments regarding the DevExpress XPO. I always enjoy reading the views of others to either learn from them or to reinforce my thoughts and practices.

I was curious as to why the approach that DevExpress uses regarding Pagination is "flawed" by design and also considered inefficient (difficult)?

I've seen people implement many different pagination approaches to get the best performance when dealing with really large amounts of data. The approach that DevExpress uses is one that I've used for quite sometime.

I would agree with your remark about (difficult) because this approach typically doesn't allow you to take advantage of many of the neat features offered by various web controls/grids that offer you the ability to perform aggregate functions on the entire resultset. Since all of the data is not being returned on the first select, the aggregate methods aren't able to act on the entire dataset.

A nicely architected data access layer can solve this problem while still implementing the same pagination approach.

As for the efficiency, in my mind when dealing with large datasets, I would think that only returning the columns for only the rows required by the application would be extremely efficient, reduce memory consumption and reduce the overall bandwidth used.

Like I said at the beginning, I enjoy reading the opinions and thoughts of other developers so that I can evaluate my thoughts and maybe learn something new from a different angle.

I hope to hear your comments soon.

Sincerely,
George

Wiktor Zychla said...

George,

thanks for your comments.

As for the pagination, most ORMs I know use typical "select top X from Table where ID not in (select top Y)" or "select ... where ROW_NUMBER between X and Y" approach. thus, the database server returns EXACTLY these rows that are requested by the application.

With XPO approach, consider a table with 1 billion of rows and you ask for rows between 5 and 10. First, 1 billion of IDs is transfered to your application and then, another query asks for 5 specific results. This is extremely inefficient.

The two-selects approach devex uses with their pagination also means that Linq's Skip cannot be implemented because the result of the Linq expression should always be a single SQL clause. And this holds up to now, we have XPO 2010 released few days ago and Skip is still missing.

I have no idea why pagination should not be implemented with the typical approach I present at the beginning, the route taken by NHibernate, just to select a close competitor for XPO. NHibernate also targets plenty of different database systems but pagination is implemented using top-top / row_number approach.

Thanks in advance for your comments on this.