Calculating Quartiles in Analysis Services

Histogram with quartiles

Histogram with quartiles

In SQL Server Analysis Services 2008, there is a native Median function in MDX, but no native Quartile function. If you need to calculate quartiles, there are a few different ways to do this. You can either use native Excel functions, code the quartile algorithm in MDX or write the algorithm as an SSAS stored procedure in .NET. In this post, I cover each of these options and provide sample code for the MDX and stored procedure options.

Excel Function

The first option is to use the Excel QUARTILE() function. Many native Excel functions can be used directly from MDX via  the EXCELMDX assembly that ships with Analysis Server. One immediate problem with this is that you must have a copy of the desktop Excel client installed on the SSAS server. In many environments, this may not be an option. Even if it were possible to install Excel on the server, I’d be concerned that there could be a performance issue with SSAS having to launch EXCEL.EXE processes and then make out-of-process COM calls for every calculation. For these reasons, I have not tried to use Excel functions. If this is something that you need to use, it’s my understanding that you need to use SetToArray() if you want to pass the values of a set to an Excel function that expects an array parameter.

MDX

The second option for calculating quartiles is to implement the algorithm in MDX. Before you do this, you’ll ovbiously need to pick a quartile algorithm. In the absence of specific requirements, it was a safe bet to assume that most of my users were going to expect that my results match what they could produce in Excel, so I chose to use the Excel algorithm in the query below.

WITH SET OrderedData AS ORDER(
	NONEMPTY([Product].[Product].[Product], [Measures].[Internet Sales Amount]),
		[Measures].[Internet Sales Amount], BASC)

MEMBER [Measures].[RowCount] AS COUNT (OrderedData)

MEMBER [Measures].[i25] AS ( .25 *  ( [RowCount] - 1 ) ) + 1
MEMBER [Measures].[i25Lo] AS FIX([i25]) - 1
MEMBER [Measures].[i25Rem] AS ([i25] - FIX([i25]))
MEMBER [Measures].[n25Lo] AS (OrderedData.Item([i25Lo]), [Internet Sales Amount])
MEMBER [Measures].[n25Hi] AS (OrderedData.Item([i25Lo] + 1), [Internet Sales Amount])
MEMBER [Measures].[Quartile1] AS [n25Lo] + ( [i25Rem] * ( [n25Hi] - [n25Lo] ))
	,FORMAT_STRING='Currency'

MEMBER [Measures].[Quartile2] AS MEDIAN(OrderedData, [Internet Sales Amount])
	,FORMAT_STRING='Currency'

MEMBER [Measures].[i75] AS ( .75 *  ( [RowCount] - 1 ) ) + 1
MEMBER [Measures].[i75Lo] AS FIX([i75]) - 1
MEMBER [Measures].[i75Rem] AS ([i75] - FIX([i75]))
MEMBER [Measures].[n75Lo] AS (OrderedData.Item([i75Lo] ), [Internet Sales Amount])
MEMBER [Measures].[n75Hi] AS (OrderedData.Item([i75Lo] + 1), [Internet Sales Amount])
MEMBER [Measures].[Quartile3] AS [n75Lo] + ( [i75Rem] * ( [n75Hi] - [n75Lo] ))
	,FORMAT_STRING='Currency'

SELECT {[Measures].[Internet Sales Amount], [Measures].[Quartile1],
    [Measures].[Quartile2], [Measures].[Quartile3]}  ON 0 ,
    {[Product].[Product].&[486]} ON 1
FROM [Adventure Works]

This query compares the Internet Sales Amount of a product (&[486] , “All-Purpose Bike Stand”) from the Adventure Works sample database to the quartile values of its peers. It works, but it is a lot of code to have to repeat if you need to calculate quartiles over many different measures, members or sets. It’s entirely possible that there’s a simpler or more generalizable way to do this, but this was the best I was able to come up with.

Stored Procedure

The last alternative for calculating quartiles is to write an SSAS stored procedure in a .NET language and call it from within MDX.

Below is a version of a quartile function for SSAS written in C#. It’s been slightly simplified from my original source.  It assumes that the set has already been sorted and stripped of empty values. In preliminary testing, I found that in situations where you needed to run a calculation over the same set, this appeared to perform better than a version that implemented the sorting and nonempty filtering internally.

public static MDXValue QuartileFromOrderedNonEmpty(Set set, Expression exp, int QTile)
{
    if (QTile < 0) throw new ArgumentOutOfRangeException("QTile");
    if (QTile > 4) throw new ArgumentOutOfRangeException("QTile");

    List<double> values = new List<double>();           

    //convert the set to an array of doubles
    foreach (Tuple t in set.Tuples)
    {
        MDXValue mdxVal = exp.Calculate(t);
        values.Add(mdxVal.ToDouble());
    }  

    //calculate the quartile for the specified QTile (e.g. 1,2,3)
    if (values.Count >= 0)
    {
        int count = values.Count;
        if (QTile == 4 || count == 1)
        {
            return values[count - 1];
        }
        double percent = ((double)QTile) / 4.0;
        double qIdx = (percent * (double)(count - 1)) ;
        double qFloor = Math.Floor(qIdx) ;
        double qFrac = qIdx - qFloor;
        return values[(int)qFloor] + (qFrac * (values[(int)(qFloor + 1)]
            - values[(int)qFloor]));
    }
    else
    {
        return null;
    }   
}

Update 2011-Aug-12: The code above was simplified from the original that used an array instead of a List<> and I left in a call to values.Length. I changed this to values.Count.

If you do prefer to perform the sort within the function, it’s easy enough to add a call to Array.Sort<double>() after you create the array of values. If you need to filter out nonempty values within the function, you will find that there is no IsEmpty property on the MDXValue class. Assuming you do not want to treat empty values as zero, you’ll need to exclude these from your array of values. I have found that if you cast an MDXValue to char, empty values are converted to the null character so you can use the following line to test for empty values.

bool isEmpty = (mdxVal.ToChar() == 0x00);

When the method is compiled into a DLL and loaded into a server or database’s assembly cache, you can use it in an MDX query in the same way you would use the Median function, except that it takes an additional parameter specifying the desired quartile number. The example below shows how to use the Quartile stored procedure.

WITH SET OrderedData AS ORDER(
    NONEMPTY([Product].[Product].[Product], [Measures].[Internet Sales Amount]),
        [Measures].[Internet Sales Amount], BASC)

MEMBER [Measures].Q1 AS
    Quartile.QuartileFromOrderedNonEmpty(
        OrderedData, [Measures].[Internet Sales Amount], 1)
    ,FORMAT_STRING='Currency'

MEMBER [Measures].Q2 AS MEDIAN(OrderedData, [Measures].[Internet Sales Amount])
    ,FORMAT_STRING='Currency'

MEMBER [Measures].Q3 AS
    Quartile.QuartileFromOrderedNonEmpty(
        OrderedData, [Measures].[Internet Sales Amount], 3)
    ,FORMAT_STRING='Currency'

SELECT {[Measures].[Internet Sales Amount],
    [Measures].Q1 , [Measures].Q2 , [Measures].Q3 }  ON 0 ,
    {[Product].[Product].&[486]} ON 1
FROM [Adventure Works]

For now, using the stored procedure option appears to be the best alternative for my current project. I have used this function in a PerformancePoint Services solution where users may browse a multilevel user hierarchy and see where the current member in the hierarchy stands within it’s peer group by displaying a chart comparing the member’s measure value to the quartile values generated over its siblings.

It also lets a user select any measure from a scorecard and updates the chart to show the member values and quartiles for the new measure. To allow this, I used the calculation dimension technique described in Chapter 6 of Expert Cube Development with Microsoft SQL Server 2008 Analysis Services to create a Benchmarks dimension with Actual, 25th Percentile, 50th Percentile, and 75th Percentile members.

For more information on SSAS stored procedure development, see Mosha’s Best practices for server ADOMD.NET stored procedures, and take a look at some of the code in the Analysis Services Stored Procedure Project. You might also want to bookmark the AdomdServer namespace on MSDN.

11 Responses to Calculating Quartiles in Analysis Services

  1. Mike,

    Thanks for this post! I created several of statistical functions such as Kendall’s Tau and Sen’s Slope for a project a few years ago and was working on adding them to the Analysis Services Stored Procedures Project on codeplex. Would you mind if I wrapped this into that effort? I’ve already created a version that sorts and strips out empty values…

    Regards,

    Clayton

  2. Richard says:

    Hi Mike,

    Love the ideas you have put forward here, i’m currently trying to implement just this sort of calculation to a dataset of mine.

    Had no problems writing the SQL to obtain the required results for a few differing groupings of my dataset but would love to utilise this function to calculate it for the 1000’s of grouings across 5-10 variables.

    I attempted to compile the .dll utilising your code however I was unsucessfull in my many many attempts (due to inexperience in using c#, creating .dll’s ect).

    Is there any chance of obtaining a copy of the compiled .dll?

    Cheers,
    Richard

    • Mike H says:

      Let me see what I can pull together (assuming Clayton hasn’t already added the function to the ASSPP).

      By the way, as an alternative to executing the quartile function at query-time in SSAS, I’ve been experimenting with a SQL statement for pre-calculating quartiles in the SQL Server relational database before loading into a cube. I can also post that if you’re interested. It will process multiple sets at once. I’ve also got several thousand sets of records, though most have less than a hundred records and it churns through those fairly quickly.

  3. Richard says:

    Thanks Mike.

    Had a look on ASSPP and couldn’t see it, would be very helpful if you’re able to provide.

    Would also be interesting to see your SQL also.

    FYI, the dataset I’m working with has 100,000 records (which updates monthly) with a 6 tier hierachy for which I need to calculate rollup medians accross multiple variables (with 3-8 possible values each).
    Ultimatly I’d like to run calculations accross the set with every possible combination of these varables also (huge numbers of combinations)

    • Mike H says:

      I provided the SQL version in my last post. I don’t think that the DLL is ready to publicly post, but I zipped up a copy of the project and compiled DLLs and sent them to the yahoo.com.au email account you has associated with your profile if you want to give it a try. Let me know how it goes.

  4. Larry says:

    Hi Mike,

    Thanks for the great post. I’ve been looking everywhere for a Percentile calculation in MDX. The Excel percentile seems to have a limit of 64k rows. This seems to be an API limitation that restricts the Set size that can be passed to an Excel function. I modified your MDX to specify a value other than 0.25 and 0.75, since I was looking for the 90th percentile. 0.9 seems to work just fine. The results match up when I compare the results to a Percentile calculation in Excel using the same set of data coming out of the cube.

  5. elainexina says:

    Hi Mark,
    This is really a great post!
    I just got the requirement to make such an implementation and have successfully developed the stored proc.
    But, as I have multiple measures and I can’t make it work using your idea of “create a Benchmarks dimension with Actual, 25th Percentile, 50th Percentile, and 75th Percentile members”, I wonder if you would be so kind to send me your example solution with the mdx code in the cube calculations defining them?
    Would be very grateful!

  6. Pingback: Calculating Quartiles in Analysis Services - HTML CODE

  7. Alexandre D. says:

    Thank you mike for that post really interesting. I’m new to MDX and I’m trying to understand what is the FIX command. I googled it but I cannot find out. Could you explain that command? Cheers!

  8. Mark Landry says:

    Far simpler 75th Percentile is:

    MEDIAN( TOPCOUNT( «Set», Cint(COUNT(«Set»)/2)[, «Numeric Expression»] )[, «Numeric Expression»] )

    The secret is realizing it’s the same as the median of the upper half of the set. All sets above are the same.

Leave a comment