Tuesday, March 19, 2013

SQL Row_Number() OVER and how to cure indeterminate sort order

There's been plenty of wibbling on Stack Overflow about how paging and sorting at SQL Server level using the Row_Number() function can produce records ordered unpredictably between repeated executions of the same query; perhaps you wouldn't normally do this, perhaps you would and as required by my scenario, I did.

And saw it myself - the variations weren't great but they were there and that's annoying.

I checked and double-checked my stored procedure - it was doing what it was supposed to.

Came across this via Stack Overflow - http://msdn.microsoft.com/en-us/library/ms186734.aspx - and I'll repeat the most pertinent part here: -

There is no guarantee that the rows returned by a query using ROW_NUMBER() will be ordered exactly the same with each execution unless the following conditions are true: -

  • Values of the partitioned column are unique.
  • Values of the ORDER BY columns are unique.
  • Combinations of values of the partition column and ORDER BY columns are unique.

How to fix this ?

So - Microsoft themselves admit this is a problem. There is of course a solution, and mine was a table variable.

  InvoiceID INT,
  RowNumber BIGINT,
  TotalRowCount BIGINT

Because the actual results my query returns contained a number of identical field values in many cases (same Customer Name, same Status.. etc) the three rules outlined in the MSDN article weren't being met; hence the problem. You can guarantee the order by stripping the columns returned from the actual query back to a bare minimum - a key field (InvoiceID in my example), the RowNumber itself and (optionally) the result of the partition column expression.

Having defined our table, I then run the complete query but returning only those 3 fields needed: -

INSERT INTO @IDTable (InvoiceID, RowNumber, TotalRowCount)
  Row_Number() OVER
    WHEN @SortExpression = 'DateCreated'  
     THEN  i.DateCreated

    -- snip - long list of fields to order by

  ) AS RowNum,
  COUNT(*) OVER(PARTITION BY 1) as TotalRowCount 

  FROM ..
  WHERE ..

 ) results

 WHERE RowNum 
                BETWEEN (@PageIndex * @PageSize + 1) 
                AND     ((@PageIndex * @PageSize) + @PageSize)

That deals with correct ordering of results; now run another query to join back to @IDTable, this time also retrieving all the coulmns needed: -

 SELECT i.InvoiceID, c.CustomerName, .. etc
 FROM @IDTable idt
  INNER JOIN Invoice i WITH (NOLOCK) ON idt.InvoiceID = i.InvoiceID
  INNER JOIN Customer c ..

    -- and so on 

Ok so there's some overhead involved with running two queries as opposed to one; but the execution time is still fast and it gives me exactly what I want. You should have seen what was happening before..

No comments:

Post a Comment