Start Your SQL Engines
/0 Comments/in SQL Server, SQLRally/by Plamen Ratchev
SQLRally is only two days away! I am deadlocked here looking at the schedule and trying to make a list of sessions to attend. Such a great line-up of quality content and talented speakers; too bad my concurrency model is not as good as SQL Server's. 🙂 I may have to do some ad-hoc querying at the last moment to make my picks…
I am so excited to be part of the inaugural SQLRally event! Speaking with DBAs and developers, I can see the big need for this small yet substantial training event. The hunger for knowledge never ends in our field, and to me SQLRally is just right. For the price (and location) it is hard to find an excuse not to attend. So, what are you waiting for? Time to start the SQL engines for SQLRally!
Adding some last-minute best-practice updates to the content of my precon session Maximize Your SQL Server 2008 Coding Skills and getting ready to roll the wheels myself!
See you there!
T-SQL Enhancements in SQL Server 2011 (CTP1)
/0 Comments/in SQL Server/by Plamen Ratchev
SQL Server 2011 (code-named Denali) CTP1 was announced in November 2010 during the SQL PASS Summit in Seattle. While it is a bit disappointing not to see the much anticipated full implementation of the window functions (hopefully we will still see that in a future CTP), it offers some interesting new programmability features. These enhancements address specific problems that we see very often in business applications. Here is a quick look at the key new features in T-SQL.
Feature: OFFSET
Application use: paging
Comments: provides simplified syntax and an efficient method for data paging solutions
Listing 1: OFFSET example
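The original Listing 1 built a Customers table; here is a minimal sketch of OFFSET/FETCH paging along those lines (the table and column names are assumed, not taken from the original listing):
-- Paging with the new OFFSET ... FETCH clause (assumed schema)
CREATE TABLE Customers (
  customer_nbr  INT NOT NULL PRIMARY KEY,
  customer_name VARCHAR(35) NOT NULL);

-- Return the third page of 10 customers, ordered by name
SELECT customer_nbr, customer_name
FROM Customers
ORDER BY customer_name
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY;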
Feature: THROW
Application use: error handling
Comments: allows re-throwing the original error
Listing 2: THROW example
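The original Listing 2 started with a TRY block; a minimal sketch of re-throwing the original error (the statement that raises the error is assumed):
BEGIN TRY
  SELECT 1 / 0;  -- any statement that can raise an error
END TRY
BEGIN CATCH
  -- log or clean up here, then re-throw the original error to the caller
  THROW;
END CATCH;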
Feature: SEQUENCE
Application use: replacement for IDENTITY
Comments: ANSI standard method for sequences; improves on the shortcomings of IDENTITY
Listing 3: SEQUENCE example
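The original Listing 3 again built a Customers table; a sketch of how a sequence can replace IDENTITY (the sequence and table names here are assumed):
CREATE SEQUENCE CustomerSeq AS INT
  START WITH 1
  INCREMENT BY 1;

CREATE TABLE Customers (
  customer_nbr  INT NOT NULL PRIMARY KEY,
  customer_name VARCHAR(35) NOT NULL);

-- The sequence supplies the key value; unlike IDENTITY it is not tied to one table
INSERT INTO Customers (customer_nbr, customer_name)
VALUES (NEXT VALUE FOR CustomerSeq, 'Peter Brown');

SELECT NEXT VALUE FOR CustomerSeq AS next_customer_nbr;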
Feature: EXECUTE WITH RESULT SETS
Application use: manipulate a stored procedure's output result set
Comments: provides the ability to rename output result set columns without changing the original stored procedure; there are no options to remove/add columns or to remove a result set when multiple result sets are returned
Listing 4: EXECUTE WITH RESULT SETS example
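The original Listing 4 defined a CalculateSales procedure; a sketch of the idea (the procedure body, the Orders table, and the column types are assumed):
CREATE PROCEDURE CalculateSales
AS
BEGIN
  SELECT customer_nbr, SUM(order_amt) AS total
  FROM Orders
  GROUP BY customer_nbr;
END;
GO

-- Rename the output columns at execution time, without touching the procedure
EXEC CalculateSales
WITH RESULT SETS
(
  (customer_number INT,
   total_sales     DECIMAL(15, 2))
);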
Feature: describe result sets
Application use: determining the format of a result set without actually running the query
Comments: replaces SET FMTONLY
Listing 5: Describe result sets example
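Listing 5 reused the CalculateSales procedure; a sketch using the new sp_describe_first_result_set procedure, which returns the output metadata without executing the query (the CalculateSales procedure and Orders table are the same assumptions as above):
-- Describe the shape of the procedure's first result set without running it
EXEC sp_describe_first_result_set @tsql = N'EXEC CalculateSales';

-- Works for ad-hoc batches as well
EXEC sp_describe_first_result_set
     @tsql = N'SELECT customer_nbr, SUM(order_amt) AS total
               FROM Orders
               GROUP BY customer_nbr';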
Bonus feature (maybe): FORMATMESSAGE
Application use: format messages (C/C++ sprintf style)
Comments: an undocumented capability allows formatting a message string that is not in sys.messages
Listing 6: FORMATMESSAGE example
SELECT FORMATMESSAGE('There are %d products in department %s.', 10, 'remodeling');
/* Returns: There are 10 products in department remodeling. */
SQLRally
/0 Comments/in PASS, SQL Server, SQLRally, SQLSaturday/by Plamen Ratchev
SQLRally is the new regional PASS conference hosted in Orlando. It will take place May 11-13, 2011 and brings two full days of technical training and an optional day of pre-conference seminars. If you have been to one of the PASS SQLSaturday events, then you can expect the same atmosphere of great networking with SQL Server peers in a casual environment. SQLRally extends the SQLSaturday format to bring it closer to the PASS Summit experience. You can expect more in-depth coverage of SQL Server and professional development topics, as well as a great variety of expert areas.
Currently the finalist abstracts for the pre-conference seminars are up for voting by the SQL Server community. My abstract “Maximize Your SQL Server 2008 Coding Skills” has been selected as one of the final three in the Developer track.
Here are the links to view all finalists in the different tracks:
BI Sessions:
http://www.sqlpass.org/sqlrally/2011/PreConsforVotingBI.aspx
DBA Sessions:
http://www.sqlpass.org/sqlrally/2011/PreConsforVotingDba.aspx
Developer Sessions:
http://www.sqlpass.org/sqlrally/2011/PreConsforVotingDev.aspx
Misc Sessions:
http://www.sqlpass.org/sqlrally/2011/PreConsforVotingMisc.aspx
Voting page:
http://www.zoomerang.com/Survey/WEB22BD59JCQBT
All finalist abstracts have excellent content. Regardless of which one you vote for, it will be a great conference to attend. For $299 it is a steal!
The Power of Simplicity
/4 Comments/in SQL Server/by Plamen Ratchev
Solving a problem very often results in an unnecessarily complex solution. One of the first lessons I learned from my math teacher was to scrap any solution that exceeds a page. She would urge me to start all over and look for a simpler way to solve it. In her world there was always a short and simple solution; it was only a matter of seeing it.
I find this rule applicable to any type of problem; only the dimension of the page changes with the subject matter. Many believe that finding the simpler and better solution requires “thinking outside the box”. In my opinion it is exactly the opposite: think inside the box. Know the fundamentals of your area of expertise, apply them systematically, and you will find a simple and elegant solution! Isaac Newton did not just discover gravity when an apple fell on his head (if it did at all). It took him 20 years of hard work to explain it!
In the world of SQL it comes down to a deep understanding of the set-based nature of the language and building solutions on that foundation. Thinking like a procedural programmer will not help.
What are the rules to simplicity? There are no rules! I like to use the following quotes as guidelines:
“The simplest way to achieve simplicity is through thoughtful reduction.”
John Maeda, The Laws of Simplicity
“Make everything as simple as possible, but not simpler.”
Albert Einstein
Let’s illustrate this with one example in SQL. In our sample scenario the request is to retrieve a list of customers who always order the exact same product (regardless of what the product is). This is a very real business problem: you may want to send targeted coupons to customers who always buy the same product.
There are different ways to solve this problem, and Listing 1 shows one method. It is very close to describing the solution in plain English: select all customers where the customer has no other orders with a different product SKU.
Listing 1
SELECT DISTINCT customer_nbr
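A sketch of the complete query along those lines, assuming an Orders table with customer_nbr and sku columns (the article's exact schema is not shown above):
-- Customers with no order for a SKU different from their other orders
SELECT DISTINCT customer_nbr
FROM Orders AS O
WHERE NOT EXISTS
      (SELECT *
       FROM Orders AS O2
       WHERE O2.customer_nbr = O.customer_nbr
         AND O2.sku <> O.sku);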
Is this the simplest way to solve the problem? The query is set based, but in a way it still mimics procedural thinking: examine all other orders for the customer and check that there is no order with a different product SKU.
If you think about the set of all orders for a customer, you will notice that the ones we need have a repeating attribute value, that is, the same product SKU on every order. Applying the MIN and MAX aggregate functions to that attribute will therefore return the same value. Here is our simplified solution: retrieve all customers whose MIN and MAX product SKU over all their orders are equal. Listing 2 shows the query.
Listing 2
SELECT customer_nbr
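Under the same assumed Orders(customer_nbr, sku) schema, the MIN/MAX version might look like this:
SELECT customer_nbr
FROM Orders
GROUP BY customer_nbr
HAVING MIN(sku) = MAX(sku);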
This is a more elegant and simpler solution! Thinking further about the set of all orders for a customer, you will notice that the distinct count of product SKUs is 1 for the customers in the desired result set. That brings us to another elegant solution:
Listing 3
SELECT customer_nbr
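And the distinct-count version, again under the assumed schema:
SELECT customer_nbr
FROM Orders
GROUP BY customer_nbr
HAVING COUNT(DISTINCT sku) = 1;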
Pretty and simple, right? Try it the next time you see that two-page query!
It’s a Matter of Style
/7 Comments/in SQL Server/by Plamen Ratchev
Writing SQL can be a very enjoyable activity. Reading SQL can also be enjoyable (maybe like reading poetry to some), or very unpleasant… How do you write SQL with style that results in eye-pleasing, easy to read and understand code? And does it matter?
Sometimes writing code comes down to concentrating on the task at hand and producing a brilliant piece of code that looks like this:
Listing 1
select c.customer_name, o.order_amt,
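A sketch of the kind of query the listing shows, with assumed table and column names and deliberately careless formatting:
select c.customer_name, o.order_amt, d.qty from customers c join orders o
on c.customer_nbr=o.customer_nbr join order_details d on o.order_nbr=d.order_nbr
where o.order_date>='20110101' and d.qty>10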
Or maybe like this:
Listing 2
SELECT C.CUSTOMER_NAME, O.ORDER_AMT,
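The same hypothetical query, this time with every identifier and keyword in upper case:
SELECT C.CUSTOMER_NAME, O.ORDER_AMT, D.QTY FROM CUSTOMERS C JOIN ORDERS O
ON C.CUSTOMER_NBR=O.CUSTOMER_NBR JOIN ORDER_DETAILS D ON O.ORDER_NBR=D.ORDER_NBR
WHERE O.ORDER_DATE>='20110101' AND D.QTY>10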
While this code performs exceptionally and solves the problem in a very clever way, is it really that good? What happens when the code review/test team gets their turn? Or when you or someone else has to modify it two years from now? In my opinion this code is a very long way from what real production code should be. And yes, this is very real and it happens every day, even as I type this. Just pay attention during the next code review, or take a look at any online SQL forum (and no, it is not only the people asking questions; unfortunately many SQL gurus who know it all will post an answer in a similar “example” style).
How do you make this code look better? The answer is in the four basic principles of design: contrast, repetition, alignment, and proximity. Let’s look at how applying these principles of design (which many think are applicable only to graphic design) can lead to stylish and enjoyable code.
Contrast
The idea is to use contrast for elements that are very different. One example is columns and reserved keywords. They are not the same, and the code should make that distinction very clear. Let’s apply that:
Listing 3
SELECT C.customer_name...
Here the SELECT keyword is capitalized to differentiate it from the lowercase column name. Also, the table alias is capitalized to clearly indicate the table source.
Repetition
Repeating the same element styles for all similar items adds consistency and organization throughout the code. For example, repeat and maintain capitalization for all keywords; do not mix styles in different contexts of the code. Note the consistent style of the SELECT and FROM reserved keywords in Listing 4.
Listing 4
SELECT C.customer_name... FROM Customers AS C...
This allows you to “visualize” the shape of the query code. Now the eye can easily flow from one section of code to the next and concentrate on each element.
Alignment
Code elements should not be placed arbitrarily on the lines of code. Every code item should have some visual connection with another item in the code. One example is aligning the start of each clause of the query (SELECT, FROM, WHERE, etc.) on a new line:
Listing 5
SELECT C.customer_name...
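For instance, each clause of the hypothetical query from Listing 1 can start on its own line (table and column names assumed as before):
SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
JOIN Order_Details AS D
ON O.order_nbr = D.order_nbr
WHERE O.order_date >= '20110101'
AND D.qty > 10;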
Alignment creates a clean and pleasing look of the code structure.
Proximity
Code items that relate to each other should be grouped close together. When several items are in close proximity they become one visual unit: for example, placing SELECT and the column names together on a line, and similarly FROM with the table names or WHERE with the predicates. Listing 6 demonstrates this.
Listing 6
SELECT C.customer_name, O.order_amt...
This makes the code structure very clear and eliminates clutter.
Let’s apply all four principles to the initial query. Here is one way it may look:
Listing 7
SELECT C.customer_name, O.order_amt, D.qty |
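One way the rest of the query might look with all four principles applied (a reconstruction with assumed table names, not the exact original listing):
SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
JOIN Order_Details AS D
  ON O.order_nbr = D.order_nbr
WHERE O.order_date >= '20110101'
  AND D.qty > 10;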
I added a couple of extra styling elements (compared to the original query); can you catch them?
Another form of alignment is this:
Listing 8
SELECT C.customer_name, O.order_amt, D.qty |
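Another common form aligns the keywords to the right so the query forms a visual “river” down the middle; a sketch of the same hypothetical query styled that way:
SELECT C.customer_name, O.order_amt, D.qty
  FROM Customers AS C
  JOIN Orders AS O
    ON C.customer_nbr = O.customer_nbr
  JOIN Order_Details AS D
    ON O.order_nbr = D.order_nbr
 WHERE O.order_date >= '20110101'
   AND D.qty > 10;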
There are many different ways to style your SQL. You may agree or disagree with some elements, but the bottom line is this: style matters!
Column Aggregates
/10 Comments/in SQL Server, XML/by Plamen Ratchev
We all know how to use the aggregate functions MIN, MAX, COUNT, etc. to calculate aggregates across rows. For example, using MAX we can determine the maximum value for a group of rows. But what if we want to see the maximum value for each row across columns? There is no aggregate function that accepts a list of columns…
Let’s look at an example to illustrate this. Here is a sample table with four data columns:
CREATE TABLE Foo (
keycol INT NOT NULL PRIMARY KEY,
col1 INT NOT NULL,
col2 INT NOT NULL,
col3 INT NOT NULL,
col4 INT NOT NULL);
INSERT INTO Foo VALUES(1, 5, 0, 1, 10);
INSERT INTO Foo VALUES(2, 0, 0, 3, 1);
INSERT INTO Foo VALUES(3, 0, 0, 0, 0);
INSERT INTO Foo VALUES(4, 9, 1, 22, 8);
INSERT INTO Foo VALUES(5, 8, 8, 8, 8);
Our goal is to calculate the maximum value across the four data columns for each row. With two or three columns it may be an easy task using a CASE expression, but adding more columns makes for a very long and complex CASE expression. Is there an easier way?
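To see why, here is a sketch of the brute-force CASE approach for just these four columns; it is correct, but every additional column makes it longer and easier to get wrong:
-- Max across columns with a CASE expression (gets unwieldy as columns are added)
SELECT keycol,
       CASE WHEN col1 >= col2 AND col1 >= col3 AND col1 >= col4 THEN col1
            WHEN col2 >= col3 AND col2 >= col4 THEN col2
            WHEN col3 >= col4 THEN col3
            ELSE col4
       END AS max_val
FROM Foo;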
One solution that provides a shortcut is to utilize SQL Server's ability to generate an XML result and then use XQuery to find the max value. Here is the query:
-- max across columns with XQuery
SELECT c.query('keycol').value('.', 'INT') AS keycol,
c.value('local-name(./*[data(.)=
max(../*[not(local-name(.)="keycol") ])][1])',
'VARCHAR(20)') AS max_col,
c.value('max(*[not(local-name(.)="keycol")])', 'FLOAT') AS max_val
FROM (SELECT keycol, col1, col2, col3, col4
FROM Foo
FOR XML PATH, TYPE) AS T(x)
CROSS APPLY x.nodes('row') AS R(c);
The tricky part here is filtering the key column out of the max evaluation. Here are the results:
keycol max_col max_val
------- -------- --------
1 col4 10
2 col3 3
3 col1 0
4 col3 22
5 col1 8
How about counting the number of non-zero values across columns? Here is a solution for that using a very similar query:
-- count non-zero columns with XQuery
SELECT c.query('keycol').value('.', 'INT') AS keycol,
c.value('count(*[not(local-name(.)="keycol")
and not(.=0)])',
'INT') AS cnt_non_zero
FROM (SELECT keycol, col1, col2, col3, col4
FROM Foo
FOR XML PATH, TYPE) AS T(x)
CROSS APPLY x.nodes('row') AS R(c);
And the results:
keycol cnt_non_zero
------- ------------
1 3
2 2
3 0
4 4
5 4
This shows the power of XQuery to solve this problem in an untraditional way. The catch? Look at the execution plan of the queries…
A more efficient method to solve the first problem is to use unpivoting. Here is a solution using the UNPIVOT operator (the same can be done with an ANSI query using a cross join with a numbers table; a sketch of that variant follows the UNPIVOT query below):
-- max across columns with UNPIVOT
SELECT keycol, col AS max_col, val AS max_val
FROM (
SELECT keycol, val, col,
ROW_NUMBER() OVER(PARTITION BY keycol ORDER BY val DESC, col) AS rk
FROM Foo
UNPIVOT
(val FOR col IN (col1, col2, col3, col4)) AS U) AS T
WHERE rk = 1;
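A sketch of the ANSI-style alternative mentioned above, replacing UNPIVOT with a cross join to a small derived numbers table:
-- max across columns with a cross join to a numbers table
SELECT keycol, col AS max_col, val AS max_val
FROM (SELECT keycol, col, val,
             ROW_NUMBER() OVER(PARTITION BY keycol ORDER BY val DESC, col) AS rk
      FROM (SELECT keycol,
                   CASE n WHEN 1 THEN 'col1' WHEN 2 THEN 'col2'
                          WHEN 3 THEN 'col3' ELSE 'col4' END AS col,
                   CASE n WHEN 1 THEN col1 WHEN 2 THEN col2
                          WHEN 3 THEN col3 ELSE col4 END AS val
            FROM Foo
            CROSS JOIN (SELECT 1 UNION
                        SELECT 2 UNION
                        SELECT 3 UNION
                        SELECT 4) AS Nums(n)) AS U) AS T
WHERE rk = 1;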
And a very similar solution using unpivoting solves the second problem:
-- count non-zero columns with UNPIVOT
SELECT keycol, COUNT(NULLIF(val, 0)) AS cnt_non_zero
FROM Foo
UNPIVOT
(val FOR col IN (col1, col2, col3, col4)) AS U
GROUP BY keycol;
Outer Joins
/12 Comments/in SQL Server/by Plamen Ratchev
Outer joins are one of the most widely misunderstood table operators. As useful as they are, it is very easy to incorrectly apply predicates in outer join queries and introduce logical bugs that are very difficult to troubleshoot (or to spend hours trying to figure out why a particular query does not produce the desired results). This article demonstrates common misunderstandings about outer joins and how to properly use them.
First, let’s start with an explanation of the logical processing of a query (listing here only the steps related to outer join operators; for a full explanation of logical query processing read Anatomy of a Query).
1). FROM/CROSS JOIN: The FROM clause is processed to identify the source tables; a cross join (Cartesian product) is formed between the two tables involved in the table operator.
2). ON: The ON clause predicates are applied and only rows that satisfy the predicate(s) (for which the predicates evaluate to TRUE) are included in the temporary result set.
3). ADD OUTER ROWS: The left side table in a LEFT OUTER JOIN and the right side table in a RIGHT OUTER JOIN (or both tables in a FULL OUTER JOIN) are the preserved tables. That means all rows (and selected attribute values) from the preserved table(s) are present in the result set after the outer join operator is applied. At this phase the non-matching rows from the preserved table(s) are added back (non-matching based on the predicates in the ON clause). The attributes of those rows that come from the non-preserved table are returned as NULLs.
4). WHERE: The predicates in the WHERE clause are applied and only rows for which the predicates evaluate to TRUE are included in the temporary result set.
The process repeats for any other table operators in the FROM clause, taking the temporary result set from the prior table operator as the left input.
The key concept here is to understand phases 2, 3, and 4 well, and how filtering affects the result set. To illustrate this with an example, let’s create two tables with loans and customers. Each customer can have personal or business loans (or both), or no loans at all.
CREATE TABLE Loans (
loan_nbr INT NOT NULL,
customer_nbr INT NOT NULL,
loan_date DATETIME NOT NULL,
loan_amount DECIMAL(15, 2) NOT NULL,
loan_type CHAR(1) NOT NULL,
CONSTRAINT ck_loan_type
CHECK (loan_type IN ('P', 'B')), -- P=Personal; B=Business
CONSTRAINT pk_loans
PRIMARY KEY (loan_nbr));
INSERT INTO Loans
(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
VALUES(1, 1, '20080101', 1500.00, 'P');
INSERT INTO Loans
(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
VALUES(2, 2, '20080215', 1000.00, 'P');
INSERT INTO Loans
(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
VALUES(3, 1, '20080311', 5000.00, 'B');
INSERT INTO Loans
(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
VALUES(4, 3, '20080312', 2000.00, 'P');
INSERT INTO Loans
(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
VALUES(5, 4, '20080325', 1200.00, 'P');
INSERT INTO Loans
(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
VALUES(6, 3, '20080327', 4000.00, 'B');
INSERT INTO Loans
(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
VALUES(7, 5, '20080410', 3500.00, 'B');
INSERT INTO Loans
(loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
VALUES(8, 2, '20080412', 2000.00, 'P');
CREATE TABLE Customers (
customer_nbr INT NOT NULL,
customer_name VARCHAR(35),
PRIMARY KEY (customer_nbr));
INSERT INTO Customers (customer_nbr, customer_name)
VALUES(1, 'Peter Brown');
INSERT INTO Customers (customer_nbr, customer_name)
VALUES(2, 'Jeff Davis');
INSERT INTO Customers (customer_nbr, customer_name)
VALUES(3, 'Kevin Fox');
INSERT INTO Customers (customer_nbr, customer_name)
VALUES(4, 'Donna Louis');
INSERT INTO Customers (customer_nbr, customer_name)
VALUES(5, 'Greg Barrow');
INSERT INTO Customers (customer_nbr, customer_name)
VALUES(6, 'Richard Douglas');
-- Add foreign key for Loans
ALTER TABLE Loans
ADD CONSTRAINT fk_loans_customers
FOREIGN KEY (customer_nbr)
REFERENCES Customers(customer_nbr);
CASE #1: Predicates on the non-preserved table attributes
The first request is to retrieve a list of all customers and only the total personal loan amount (if any) they have. The following query seems to describe the request properly:
SELECT C.customer_name,
SUM(COALESCE(L.loan_amount, 0)) AS total
FROM Customers AS C
LEFT OUTER JOIN Loans AS L
ON C.customer_nbr = L.customer_nbr
WHERE L.loan_type = 'P'
GROUP BY C.customer_name
ORDER BY customer_name;
However, the results do not seem correct:
customer_name total
---------------- --------
Donna Louis 1200.00
Jeff Davis 3000.00
Kevin Fox 2000.00
Peter Brown 1500.00
Customers Greg and Richard are missing from the output. To understand the problem, let’s walk through this step by step:
1). Cross join:
SELECT C.customer_name, L.loan_amount, L.loan_type
FROM Customers AS C
CROSS JOIN Loans AS L;
This returns the Cartesian product (every possible combination of a row from the Customers table and a row from the Loans table).
2). ON predicates:
SELECT C.customer_name, L.loan_amount, L.loan_type
FROM Customers AS C
INNER JOIN Loans AS L
ON C.customer_nbr = L.customer_nbr;
This results in the following output:
customer_name loan_amount loan_type
--------------- ------------ ---------
Peter Brown 1500.00 P
Jeff Davis 1000.00 P
Peter Brown 5000.00 B
Kevin Fox 2000.00 P
Donna Louis 1200.00 P
Kevin Fox 4000.00 B
Greg Barrow 3500.00 B
Jeff Davis 2000.00 P
At this stage only rows that match based on the customer number predicate are included.
3). Add outer rows:
SELECT C.customer_name, L.loan_amount, L.loan_type
FROM Customers AS C
LEFT OUTER JOIN Loans AS L
ON C.customer_nbr = L.customer_nbr;
The query returns this result set:
customer_name loan_amount loan_type
--------------- ------------- ---------
Peter Brown 1500.00 P
Peter Brown 5000.00 B
Jeff Davis 1000.00 P
Jeff Davis 2000.00 P
Kevin Fox 2000.00 P
Kevin Fox 4000.00 B
Donna Louis 1200.00 P
Greg Barrow 3500.00 B
Richard Douglas NULL NULL
Here the outer rows are added, resulting in adding back a row for customer Richard, who has no loans at all and was excluded in the prior phase when the ON clause predicates were applied.
4). WHERE predicates:
SELECT C.customer_name, L.loan_amount, L.loan_type
FROM Customers AS C
LEFT OUTER JOIN Loans AS L
ON C.customer_nbr = L.customer_nbr
WHERE L.loan_type = 'P';
This result set is the base for the results in the first query attempt:
customer_name loan_amount loan_type
--------------- ------------ ---------
Peter Brown 1500.00 P
Jeff Davis 1000.00 P
Kevin Fox 2000.00 P
Donna Louis 1200.00 P
Jeff Davis 2000.00 P
Now it is very clear that the predicate in the WHERE clause filters out the row with a NULL loan type for customer Richard, and customer Greg is excluded because he has only business loans. Incorrectly placing the predicate in the WHERE clause turns the outer join into an inner join.
Correcting this query only requires moving the predicate on loan type from the WHERE clause to the ON clause:
SELECT C.customer_name,
SUM(COALESCE(L.loan_amount, 0)) AS total
FROM Customers AS C
LEFT OUTER JOIN Loans AS L
ON C.customer_nbr = L.customer_nbr
AND L.loan_type = 'P'
GROUP BY C.customer_name
ORDER BY customer_name;
This results in the correct result set, listing all customers and their total personal loan amount.
customer_name total
---------------- --------
Donna Louis 1200.00
Greg Barrow 0.00
Jeff Davis 3000.00
Kevin Fox 2000.00
Peter Brown 1500.00
Richard Douglas 0.00
CASE #2: Predicates on the preserved table attributes
Let’s look at another example demonstrating how filtering affects the preserved table attributes. The request is to retrieve the total loan amount for customer Richard Douglas, even if Richard does not have any loans at all. Since Richard’s account info must always be returned, the following query seems to satisfy the request:
SELECT C.customer_name,
SUM(COALESCE(L.loan_amount, 0)) AS total
FROM Customers AS C
LEFT OUTER JOIN Loans AS L
ON C.customer_nbr = L.customer_nbr
AND C.customer_name = 'Richard Douglas'
GROUP BY C.customer_name
ORDER BY customer_name;
However, the result set returns information for all customers:
customer_name total
--------------- -----
Donna Louis 0.00
Greg Barrow 0.00
Jeff Davis 0.00
Kevin Fox 0.00
Peter Brown 0.00
Richard Douglas 0.00
As explained earlier, the reason is that outer rows are added back after the outer join predicates are applied, which adds back all the other customers. Getting only Richard’s loan information requires moving the predicate filtering on customer name to the WHERE clause:
SELECT C.customer_name,
SUM(COALESCE(L.loan_amount, 0)) AS total
FROM Customers AS C
LEFT OUTER JOIN Loans AS L
ON C.customer_nbr = L.customer_nbr
WHERE C.customer_name = 'Richard Douglas'
GROUP BY C.customer_name
ORDER BY customer_name;
This results in correctly returning only Richard’s loan information:
customer_name total
---------------- -----
Richard Douglas 0.00
Plan Freezing
/0 Comments/in SQL Server/by Plamen Ratchev
Plan freezing refers to forcing a plan for a specific query. There are different reasons to freeze a plan; one is to provide plan stability when solving parameter sniffing issues.
This functionality was first introduced in SQL Server 2005 via the USE PLAN query hint or a plan guide that forces a plan. However, using this feature was rather difficult because of the requirement to capture the exact SQL statement, which was normally done using a SQL Profiler trace or by querying the dynamic management objects. Another issue was that changing the metadata (like dropping an index that affects the plan) resulted in an error when trying to use the plan guide.
SQL Server 2008 added an array of enhancements. First, it simplified the mechanism to create a plan guide from a plan handle via the sp_create_plan_guide_from_handle stored procedure. This eliminates the need to extract and match the SQL statement. Another enhancement is that if the metadata changes, the plan guide is silently skipped and query execution continues with whatever execution plan is best.
Below is an example of utilizing the new capabilities in SQL Server 2008 to freeze a plan.
-- Clear procedure cache
DBCC FREEPROCCACHE;
GO
-- First query run to get plan for freezing
SET STATISTICS XML ON;
EXEC sp_executesql
N'SELECT orderid, customerid, orderdate, shippeddate
FROM Orders
WHERE customerid = N''CACYK''';
SET STATISTICS XML OFF;
GO
-- Create plan guide based on plan
DECLARE @plan_handle VARBINARY(64);
SELECT @plan_handle = plan_handle
FROM sys.dm_exec_query_stats AS S
CROSS APPLY sys.dm_exec_sql_text(S.sql_handle) AS T
WHERE text LIKE '%Order%';
EXEC sp_create_plan_guide_from_handle 'PlanGuide2008', @plan_handle = @plan_handle;
GO
-- Check that plan guide was created
SELECT * FROM sys.plan_guides;
GO
-- Execute and verify UsePlan="1" and PlanGuideName="PlanGuide2008"
-- in the XML plan output
SET STATISTICS XML ON;
EXEC sp_executesql
N'SELECT orderid, customerid, orderdate, shippeddate
FROM Orders
WHERE customerid = N''CACYK''';
SET STATISTICS XML OFF;
GO
EXEC sp_control_plan_guide N'DROP', N'PlanGuide2008';
Unpivoting Multiple Columns
/20 Comments/in SQL Server/by Plamen Ratchev
Unpivoting is the process of normalizing data. In an earlier post I discussed unpivoting a single column. This demonstration shows how to unpivot multiple columns.
The task is to normalize the following denormalized data, which represents product sales volumes by quarter:
product_nbr qtr1 sales_qtr1 qtr2 sales_qtr2 qtr3 sales_qtr3 qtr4 sales_qtr4
----------- ------ ----------- ------ ----------- ------ ----------- ------ -----------
1 2008Q1 100 2008Q2 20 2008Q3 15 2008Q4 10
2 2008Q1 80 2008Q2 15 2008Q3 20 2008Q4 10
3 2008Q1 70 2008Q2 5 2008Q3 10 2008Q4 15
The normalized data set should look like this:
product_nbr qtr sales
----------- ------ -----------
1 2008Q1 100
1 2008Q2 20
1 2008Q3 15
1 2008Q4 10
2 2008Q1 80
2 2008Q2 15
2 2008Q3 20
2 2008Q4 10
3 2008Q1 70
3 2008Q2 5
3 2008Q3 10
3 2008Q4 15
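To experiment with the queries below, here is one possible setup script consistent with the sample data shown above (the original post's DDL is not reproduced here; the column types are assumed):
-- Assumed DDL and sample data matching the denormalized layout above
CREATE TABLE QuarterlySales (
  product_nbr INT NOT NULL PRIMARY KEY,
  qtr1 CHAR(6) NOT NULL, sales_qtr1 INT NOT NULL,
  qtr2 CHAR(6) NOT NULL, sales_qtr2 INT NOT NULL,
  qtr3 CHAR(6) NOT NULL, sales_qtr3 INT NOT NULL,
  qtr4 CHAR(6) NOT NULL, sales_qtr4 INT NOT NULL);

INSERT INTO QuarterlySales
VALUES(1, '2008Q1', 100, '2008Q2', 20, '2008Q3', 15, '2008Q4', 10);
INSERT INTO QuarterlySales
VALUES(2, '2008Q1', 80, '2008Q2', 15, '2008Q3', 20, '2008Q4', 10);
INSERT INTO QuarterlySales
VALUES(3, '2008Q1', 70, '2008Q2', 5, '2008Q3', 10, '2008Q4', 15);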
The first method uses a CROSS JOIN with a numbers table (one number for each quarter) and CASE expressions to select the appropriate value (quarter or sales volume) for each quarter.
SELECT product_nbr,
CASE n
WHEN 1 THEN qtr1
WHEN 2 THEN qtr2
WHEN 3 THEN qtr3
WHEN 4 THEN qtr4
END AS qtr,
CASE n
WHEN 1 THEN sales_qtr1
WHEN 2 THEN sales_qtr2
WHEN 3 THEN sales_qtr3
WHEN 4 THEN sales_qtr4
END AS sales
FROM QuarterlySales AS S
CROSS JOIN
(SELECT 1 UNION
SELECT 2 UNION
SELECT 3 UNION
SELECT 4) AS Nums(n);
It almost seems natural that we should be able to do the same much more easily using the UNPIVOT operator (introduced in SQL Server 2005). However, one of the limitations of the UNPIVOT operator is that it works only with a single column. But because SQL Server allows multiple table operators in the FROM clause, we can use two UNPIVOT operators. The catch is that the second UNPIVOT operator applies to the virtual table that results from the first UNPIVOT operator. That requires a little trick to extract and match the quarter from the results of each UNPIVOT operator in order to produce the final result. Here is the query to unpivot using the UNPIVOT operator.
SELECT product_nbr, qtr, sales
FROM
(SELECT product_nbr,
qtr1, sales_qtr1,
qtr2, sales_qtr2,
qtr3, sales_qtr3,
qtr4, sales_qtr4
FROM QuarterlySales) AS S
UNPIVOT
(qtr FOR qtrx IN (qtr1, qtr2, qtr3, qtr4)) AS U1
UNPIVOT
(sales FOR sales_qtrx IN (sales_qtr1, sales_qtr2,
sales_qtr3, sales_qtr4)) AS U2
WHERE RIGHT(sales_qtrx, 1) = RIGHT(qtrx, 1);