Tuesday, March 19, 2013

SQL Row_Number() OVER and how to cure indeterminate sort order

There's been plenty of wibbling on Stack Overflow about how paging and sorting at SQL Server level using the Row_Number() function can produce records ordered unpredictably between repeated executions of the same query; perhaps you wouldn't normally do this, perhaps you would and as required by my scenario, I did.

And saw it myself - the variations weren't great but they were there and that's annoying.

I checked and double-checked my stored procedure - it was doing what it was supposed to.

Came across this via Stack Overflow - http://msdn.microsoft.com/en-us/library/ms186734.aspx - and I'll repeat the most pertinent part here: -

There is no guarantee that the rows returned by a query using ROW_NUMBER() will be ordered exactly the same with each execution unless the following conditions are true: -

  • Values of the partitioned column are unique.
  • Values of the ORDER BY columns are unique.
  • Combinations of values of the partition column and ORDER BY columns are unique.

How to fix this ?

So - Microsoft themselves admit this is a problem. There is of course a solution, and mine was a table variable.

DECLARE @IDTable TABLE
(
  InvoiceID INT,
  RowNumber BIGINT,
  TotalRowCount BIGINT
)


Because the actual results my query returns contained a number of identical field values in many cases (same Customer Name, same Status.. etc) the three rules outlined in the MSDN article weren't being met; hence the problem. You can guarantee the order by stripping the columns returned from the actual query back to a bare minimum - a key field (InvoiceID in my example), the RowNumber itself and (optionally) the result of the partition column expression.

Having defined our table, I then run the complete query but returning only those 3 fields needed: -

INSERT INTO @IDTable (InvoiceID, RowNumber, TotalRowCount)
 SELECT * FROM
 (
  SELECT DISTINCT i.InvoiceID,
  Row_Number() OVER
  (
   ORDER BY
   CASE
    WHEN @SortExpression = 'DateCreated'  
     THEN  i.DateCreated
   END ASC,

    -- snip - long list of fields to order by

  ) AS RowNum,
  COUNT(*) OVER(PARTITION BY 1) as TotalRowCount 

  FROM ..
   
  WHERE ..

 ) results

 WHERE RowNum 
                BETWEEN (@PageIndex * @PageSize + 1) 
                AND     ((@PageIndex * @PageSize) + @PageSize)
 ORDER BY RowNum

That deals with correct ordering of results; now run another query to join back to @IDTable, this time also retrieving all the coulmns needed: -

 SELECT i.InvoiceID, c.CustomerName, .. etc
 FROM @IDTable idt
  INNER JOIN Invoice i WITH (NOLOCK) ON idt.InvoiceID = i.InvoiceID
  INNER JOIN Customer c ..

    -- and so on 



Ok so there's some overhead involved with running two queries as opposed to one; but the execution time is still fast and it gives me exactly what I want. You should have seen what was happening before..

Wednesday, March 6, 2013

Sitecore - Illegal characters in path

Meh.


        /// checks for the existence of illegal characters in the URL, these being " < > | and SPACE
        /// for some reason Sitecore's Sitecore.Pipelines.PreprocessRequest.FilterUrlExtensions pipeline
        /// component doesn't handle the exception
        ///
        /// FilterUrlExtensions.Process > Sitecore.Web.RequestFilter.Process > RequestFilter.IsBlocked > RequestFilter.GetExtension
        ///     > System.IO.Path.GetExtension > System.IO.Path.CheckInvalidPathChars = where it goes bang, and isn't handled
        ///  
        /// This component fixes this behaviour by stripping illegal chars and then re-writing the URL using HttpContext.RewritePath
        ///
        /// Note that for some reason, if IIS Static File Compression is enabled for the site, this can cause resources to
        /// be returned to the browser with the correct encoding type (GZIP) but the browser fails to parse them
        ///
        /// Don't know why this is
        ///
        /// For this reason, the URL is only rewritten if the request URL contains these illegal chars


Let's just say this has caused problems with custom pipeline components I have worked on.

Friday, May 11, 2012

Diagnosis vs Design

This is a subject close to my heart, and my abilities too.

I am ok at diagnosis, but by no means a ninja - and sometimes as a solution architect you're going to need to diagnose a problem. Some are easy. Some are not.

I am from the design school of systems architecture which means doing it right in the first place. I can hold court on OO design til the cows come home, and I can design you a domain model that will model your system's data and - most importantly - its behaviour. Cleanly, concisely and to the point of your requirement.

I can do this across a huge range of technologies - web front end, Sitecore, Kentico, WCF, ServiceStack, Biztalk, MSMQ, SQL, Oracle. The building blocks of a lot of .NET solutions.

What I'm not so good at - compared to the above - is diagnosing where and how things are going wrong in an existing solution. I tip my hat to a select few who can do this better than me.

I can diagnose things that are wrong to a certain degree, and you can and should too. Today I had to sort out a non-working ASP.NET app. Here's what I did, in 10 minutes: -

  • Installed the IIS6 metabase compatibility components
  • Ran aspnet_regiis /i to install ASP.NET
  • Set the App Pool's identity to Local System
  • Set permissions on folders
This got the guy going, and I revelled slightly in my ability to spot what was wrong and correct it.

But problems I've seen in the past (and recently) have been a whole lot hairier and have required more tools, and more knowledge.
  • .NET Reflector - this thing is a life saver in so many situations. What does this DLL do ? Let's find out..
  • SQL Profiler - now this one I do know well. What's being sent to the server ? Answer in confidence. SQL Profiler is your friend
  • ANTS Memory Profiler - nice, but a bit bloated. Handy, but the daddy is...
  • WinDebug - see the Matrix as it truly is. Here is where your memory is being allocated. And why
I'll tip my hat to some colleagues of mine past and present who can do this diagnosis phase of systems engineering way better and quicker than I can: -

A guy I knew at Datacom who showed me The Way regarding debugging a live app using WinDebug

Another guy at Datacom who worked for me on an app of unfathomable architecture, and is the only person I know to date who got WS-AT working over MS DTC. We spent a pleasant evening, actually entire day and  night, debugging a WCF sync service that if I'd had more say would never have existed.. another story ..

A person I work for now, who can cut razor-like to the cause of and solution to many problems using the weapons in his arsenal, which are mostly Reflector but he also brings his networking knowledge to bear on things I don't get. Like Nick, an awesome troubleshooter

Working with all of these dudes has humbled me when finding problems. These guys can pick the bones of a solution to the n-th degree and tell me what is and isn't right.

I can do this to a degree, but I think I have a different mindset. I design. I usually do it well.

Both skillsets are needed when doing big stuff. 

Which are you ? Can you do both at the outer edges of ability ? I'm not sure anyone can.




Be careful what you delete for...

Be very careful deleting stuff from a Sitecore Analytics database before you hand it over to your hosting company.

The usual approach is to clear the Page, PageEvent and Session tables because these will be full - many months worth of team dev, in my case - of locally registered page views. BUT - Don't get overzealous and delete the contents of the IPOwner table.

 Sitecore Analytics refused to do anything when I did this. Required inspection of the logs, which revealed a problem with some record not actually existing in IPOwner, even though a cursory inspection showed that this logged unique IP's for a session.

Delete these with extreme caution... because analytics stops working, period !

Tuesday, April 17, 2012

Javascript Frameworks

Richard sent this list of Javascript / Web frameworks around the office today.

The stupendously named batman.js caught my eye, as I am prone to following daft names, although my knowledge of Coffeescript is not great (I've had a play).

However, the even more daftly named Mustache and it's big brother Handlebar.js (where do people get these names from ?) are more my thing and I am playing with them now. More to follow.

Initial Thoughts on RavenDB

It's good.

More to follow :)



But seriously, it's good and I'm scratching around the surface at the moment. I prefer its simpler-use approach to document (i.e. object) storage compared to Mongo. I like the fact it's natively JSON and I can serialize an entire object graph in one go.

I like the fact that I can embed it it an application. As in - embed the server.

Rumour is that Sitecore 7.0 is going to support it. Interesting. We at 5 Limes may find a use for it sooner rather than later.

And I'm currently exploring the way it implements Map/Reduce indexes

More to follow. Much more probably - this is my R & D subject at work, and I have to present it next week. Wonder if I can use it to present CQRS to the team in my next project ?

Friday, February 10, 2012

nHibernate 3.0 DateTimes

Ok, so - this sucks.

I want to be able to filter date ranges in an NH 3.0 query, and I don't want to have to first scan the whole table and then feed the results through LINQ to get my range. This table could contain millions of rows and my query will be hit on each request of a certain page.

This works: -

public IList InvoiceSearch(InvoiceSearchCriteria criteria)
        {
            IList list = null;

            using (var session = SessionFactory.OpenSession())
            {
                var query = session.QueryOver < Invoice > ();

                if (criteria.DateCreatedFrom != null)
                {
                    query = query.AndRestrictionOn(x => x.DateCreated).IsBetween(criteria.DateCreatedFrom.Value).And(criteria.DateCreatedTo.Value);
                }

  .. snip



What I'm doing here is filtering my results by a date range using nHibernate.Criterion.Lambda.IQueryOverRestrictionBuilder's innate ability to write good SQL and therefore limit the set I am querying. So far so good.

I have a requirement to show all thingos (Invoices in this case) in a given month and year. According to those that know, this is simply (?) solved by mapping a YEAR and MONTH function to my Fluent entity, and then querying by those properties in my QueryOver expression, thusly: -

Entity: -

        public virtual int InvoiceYear     { get; set; }
        public virtual int InvoiceMonth    { get; set; }

        Map: -

        Map(x => x.InvoiceYear).Formula("SELECT YEAR(DateCreated)");
        Map(x => x.InvoiceMonth).Formula("SELECT MONTH(DateCreated)");     


and then doing suchlike in my querying codeage: -

    list = session.QueryOver()
                  .Where(x => x.InvoiceYear == year)
                  .And(x => x.InvoiceMonth == month)
                  .List();




Well - it doesn't work. nHibernate throws a fit when trying to compile the query expression.

In the end, the only workable solution - one that doesn't involve a complete table scan into NH entities and from there a filter, cos that'll be scalable !!! - is as follows: -

using (var session = SessionFactory.OpenSession())
            {
                DateTime from = new DateTime(year, month, 1, 0, 0, 0),
                         to   = new DateTime(year, month, DateTime.DaysInMonth(year, month), 23, 59, 59);

                list = session.QueryOver < Invoice > ()
                              .Where(x => x.Active == true)
                              .AndRestrictionOn(x => x.DateCreated).IsBetween(from).And(to)
                              .List();

            }


Got to say that annoys me somewhat, but then I guess I need to spend more time studying expression trees. On the other hand, I have business requirements that a) need solving and b) must scale to meet demand.

Wish me luck :)