Pivoting data in SQL Server

Very often there is a need to pivot (cross-tab) normalized data for some reporting purposes. While this is best done with reporting tools (Excel is one example with powerful pivoting capabilities), sometimes it needs to be done on the database side.

The discussion here is limited to static pivoting (that is when the values to pivot are known). Dynamic pivoting is accomplished by using dynamic SQL and is based on the same techniques, just dynamically building the pivoting query.

Here are a few methods and techniques to implement that in SQL Server.

Given the following data:

 OrderId    OrderDate  Amount
----------- ---------- ------
1           2007-01-01 10.50
2           2007-01-26 12.50
3           2007-01-30 12.00
4           2007-02-14 13.75
5           2007-02-20 10.00
6           2007-03-06 15.00
7           2007-03-10 17.50
8           2007-03-29 20.00

We would like to return this result set:

 OrderYear   Jan   Feb   Mar
----------- ----- ----- -----
2007        35.00 23.75 52.50

Creating the sample Orders table:

CREATE TABLE Orders (
 order_id INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
 order_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
 amount DECIMAL(8, 2) NOT NULL DEFAULT 0
        CHECK (amount >= 0));
 
INSERT INTO Orders
  (order_date, amount)
SELECT '20070101', 10.50
UNION ALL
SELECT '20070126', 12.50
UNION ALL
SELECT '20070130', 12.00
UNION ALL
SELECT '20070214', 13.75
UNION ALL
SELECT '20070220', 10.00
UNION ALL
SELECT '20070306', 15.00
UNION ALL
SELECT '20070310', 17.50
UNION ALL
SELECT '20070329', 20.00;

Using CASE

CASE can be very handy for pivoting. It allows writing flexible logic and to transform the data as needed. Also, it is a very efficient method as it requires a one pass scan of the data.

SELECT DATEPART(yyyy, order_date) AS OrderYear,
       SUM(CASE WHEN DATEPART(m, order_date) = 1
                THEN amount ELSE 0 END) AS 'Jan',
       SUM(CASE WHEN DATEPART(m, order_date) = 2
                THEN amount ELSE 0 END) AS 'Feb',
       SUM(CASE WHEN DATEPART(m, order_date) = 3
                THEN amount ELSE 0 END) AS 'Mar'
FROM Orders
GROUP BY DATEPART(yyyy, order_date);

Using Matrix table

This method requires creating a matrix table with 1 for the significant columns and 0 for columns to be ignored. Then joining the matrix table to the data table based on the pivoting column will produce the results:

CREATE TABLE MonthMatrix (
  month_nbr INT NOT NULL PRIMARY KEY
            CHECK (month_nbr BETWEEN 1 AND 12),
  jan INT NOT NULL DEFAULT 0
      CHECK (jan IN (0, 1)),
  feb INT NOT NULL DEFAULT 0
      CHECK (feb IN (0, 1)),
  mar INT NOT NULL DEFAULT 0
      CHECK (mar IN (0, 1)),
  apr INT NOT NULL DEFAULT 0
      CHECK (apr IN (0, 1)),
  may INT NOT NULL DEFAULT 0
      CHECK (may IN (0, 1)),
  jun INT NOT NULL DEFAULT 0
      CHECK (jun IN (0, 1)),
  jul INT NOT NULL DEFAULT 0
      CHECK (jul IN (0, 1)),
  aug INT NOT NULL DEFAULT 0
      CHECK (aug IN (0, 1)),
  sep INT NOT NULL DEFAULT 0
      CHECK (sep IN (0, 1)),
  oct INT NOT NULL DEFAULT 0
      CHECK (oct IN (0, 1)),
  nov INT NOT NULL DEFAULT 0
      CHECK (nov IN (0, 1)),
  dec INT NOT NULL DEFAULT 0
      CHECK (dec IN (0, 1)));
 
-- Populate the matrix table
INSERT INTO MonthMatrix (month_nbr, jan) VALUES (1, 1);
INSERT INTO MonthMatrix (month_nbr, feb) VALUES (2, 1);
INSERT INTO MonthMatrix (month_nbr, mar) VALUES (3, 1);
INSERT INTO MonthMatrix (month_nbr, apr) VALUES (4, 1);
INSERT INTO MonthMatrix (month_nbr, may) VALUES (5, 1);
INSERT INTO MonthMatrix (month_nbr, jun) VALUES (6, 1);
INSERT INTO MonthMatrix (month_nbr, jul) VALUES (7, 1);
INSERT INTO MonthMatrix (month_nbr, aug) VALUES (8, 1);
INSERT INTO MonthMatrix (month_nbr, sep) VALUES (9, 1);
INSERT INTO MonthMatrix (month_nbr, oct) VALUES (10, 1);
INSERT INTO MonthMatrix (month_nbr, nov) VALUES (11, 1);
INSERT INTO MonthMatrix (month_nbr, dec) VALUES (12, 1);
 
-- Use the matrix table to pivot
SELECT DATEPART(yyyy, order_date) AS OrderYear,
       SUM(amount * jan) AS 'Jan',
       SUM(amount * feb) AS 'Feb',
       SUM(amount * mar) AS 'Mar'
FROM Orders AS O
JOIN MonthMatrix AS M
  ON DATEPART(m, O.order_date) = M.month_nbr
GROUP BY DATEPART(yyyy, order_date);

The David Rozenshtein method

This method is based on the formula: 1 – ABS(SIGN(x – y)). This formula returns 1 when x = y, and 0 otherwise. In a way it mimics the matrix table approach.

SELECT DATEPART(yyyy, order_date) AS OrderYear,
       SUM(amount * (1 - ABS(SIGN(DATEPART(m, order_date) - 1)))) AS 'Jan',
       SUM(amount * (1 - ABS(SIGN(DATEPART(m, order_date) - 2)))) AS 'Feb',
       SUM(amount * (1 - ABS(SIGN(DATEPART(m, order_date) - 3)))) AS 'Mar'
FROM Orders
GROUP BY DATEPART(yyyy, order_date);

The PIVOT operator in SQL Server 2005

SQL Server 2005 provides built-in pivoting mechanism via the PIVOT operator.

SELECT OrderYear,
       [1] AS 'Jan',
       [2] AS 'Feb',
       [3] AS 'Mar'
FROM (SELECT DATEPART(yyyy, order_date),
             DATEPART(m, order_date),
             amount
      FROM Orders) AS O (OrderYear, month_nbr, amount) 
PIVOT
(SUM(amount) FOR month_nbr IN ([1], [2], [3])) AS P;

There are some other methods to implement pivoting (like using subqueries, multiple joins, or the APPLY operator in SQL Server 2005) but I am not showing examples of those as I do not find them practical, especially with large number of values to pivot.

Additional Resources:

Using PIVOT and UNPIVOT
http://msdn2.microsoft.com/en-us/library/ms177410.aspx

Mohammad says:

April 27, 2010 at 10:59 am

Nice article.
You said that be some other methods like subquery and APPLY.

I will be interested if you show me the solution with APPLY table operator.

Here is my approach

SELECT P.product_name, D1.[1990], D2.[1991], D3.[1992], D4.[1993], D5.[1994], T.Total
FROM Products P
OUTER APPLY (SELECT SUM(qty * product_price) AS [1990]
FROM Sales S
WHERE S.product_name = P.product_name
AND S.sales_year = 1990) D1
OUTER APPLY (SELECT SUM(qty * product_price) AS [1991]
FROM Sales S
WHERE S.product_name = P.product_name
AND S.sales_year = 1991) D2
OUTER APPLY (SELECT SUM(qty * product_price) AS [1992]
FROM Sales S
WHERE S.product_name = P.product_name
AND S.sales_year = 1992) D3
OUTER APPLY (SELECT SUM(qty * product_price) AS [1993]
FROM Sales S
WHERE S.product_name = P.product_name
AND S.sales_year = 1993) D4
OUTER APPLY (SELECT SUM(qty * product_price) AS [1994]
FROM Sales S
WHERE S.product_name = P.product_name
AND S.sales_year = 1994) D5
OUTER APPLY (SELECT SUM(qty * product_price) AS Total
FROM Sales S
WHERE S.product_name = P.product_name) T

Thanks.

Plamen Ratchev says:

April 27, 2010 at 11:07 am

HI Mohammad, the method with APPLY is exactly what you are using. But I find it unnecessary, much simpler and more efficient to use the PIVOT operator.

April 27, 2010 at 12:03 pm

Thanks for the reply!
You right. But just I want to know and familiar with this method.

I have created a table-valued UDF for the query.

Now my question is: Using table-valued UDF for improving performance can be helpful?

CREATE FUNCTION dbo.Cross_tabbing (@product_name VARCHAR(15), @year INT)
RETURNS TABLE AS
RETURN
(SELECT SUM(qty * product_price) AS total
FROM Sales S
WHERE S.product_name = @product_name
AND S.sales_year = @year);
GO

SELECT P.product_name
, D1.total AS [1990]
, D1.total AS [1990]
, D1.total AS [1990]
, D1.total AS [1990]
, D1.total AS [1990]
, T.Total
FROM Products P
OUTER APPLY dbo.Cross_tabbing(P.product_name, '1990') D1
OUTER APPLY dbo.Cross_tabbing(P.product_name, '1991') D2
OUTER APPLY dbo.Cross_tabbing(P.product_name, '1992') D3
OUTER APPLY dbo.Cross_tabbing(P.product_name, '1993') D4
OUTER APPLY dbo.Cross_tabbing(P.product_name, '1994') D5
OUTER APPLY (SELECT SUM(qty * product_price) AS Total
FROM Sales S
WHERE S.product_name = P.product_name) T

April 27, 2010 at 12:29 pm

Check the execution plan for the query. I do not think conversion the subqueries to TVF will help, you still have the same number of instances invoked. Again, best is to avoid this method.

May 13, 2010 at 3:52 pm

Hi Plamen,
I have some questions about the article. First if we have a nonclustered index on order_date will be the index used by optimizer in CASE query method?
I mean is not better we have two computed column with PERSISTED like this:
order_month AS DATEPART(m, order_date) PERSISTED,
order_year AS DATEPART(yyyy, order_date) PERSISTED

then create an index like this:
CREATE NONCLUSTERED INDEX ix_0 ON newOrders (order_month) INCLUDE (order_year, amount);

so we can use a better approch with multiple outer joins:

;WITH Jan(order_year, amt) AS
(SELECT order_year, SUM(amount)
FROM newOrders
WHERE order_month = 1
GROUP BY order_year),

Feb(order_year, amt) AS
(SELECT order_year, SUM(amount)
FROM newOrders
WHERE order_month = 2
GROUP BY order_year),

Mar(order_year, amt) AS
(SELECT order_year, SUM(amount)
FROM newOrders
WHERE order_month = 3
GROUP BY order_year)

SELECT D.order_year, J.amt AS Jan, F.amt AS Feb, M.amt AS Mar
FROM (SELECT DISTINCT order_year FROM newOrders)D
LEFT JOIN Jan J ON D.order_year = J.order_year
LEFT JOIN Feb F ON D.order_year = F.order_year
LEFT JOIN Mar M ON D.order_year = M.order_year

the multiple outer join at least uses the index and runs very fast.
I am right or wrong?

May 13, 2010 at 4:24 pm

Mohammad,

A non-clustered index will not be used unless it is a covering index for the columns.

To see performance difference is best to test with your scenario data and look at the execution plan. There is no generic conclusion on what is more efficient. In pivoting normally you have to scan the whole table and breaking on multiple CTEs may not make sense.

May 13, 2010 at 8:11 pm

>
A non-clustered index will not be used unless it is a covering index for the columns.
<<
I am familiar with this. The nonclustered index (that I have posted) covers other needed columns with using INCLUDE.

Kettic says:

May 18, 2014 at 11:00 pm

Binding PivotGrid to data via DataProvider

Pivoting data in SQL Server

Leave a Reply

Leave a Reply Cancel reply

SITE MAP

SEARCH

You might also like

Leave a Reply

Leave a Reply Cancel reply

SITE MAP

SEARCH