Tag Archive for: database
Start Your SQL Engines
/0 Comments/in SQL Server, SQLRally/by Plamen Ratchev
SQLRally is only two days away! I am deadlocked here looking at the schedule and trying to make a list of sessions to attend. Such a great line-up of quality content and talented speakers; too bad my concurrency model is not as good as SQL Server's. 🙂 I may have to do some ad-hoc querying at the last moment to make my picks…
I am so excited to be part of the inaugural SQLRally event! Speaking with DBAs and developers, I can see the big need for this small yet substantial training event. The hunger for knowledge never ends in our field, and to me SQLRally is just right. For the price (and location) it is hard to find an excuse not to attend. So, what are you waiting for? Time to start the SQL engines for SQLRally!
I am adding some last-minute best-practices updates to the content of my precon session, Maximize Your SQL Server 2008 Coding Skills, and getting ready to roll the wheels myself!
See you there!
T-SQL Enhancements in SQL Server 2011 (CTP1)
/0 Comments/in SQL Server/by Plamen Ratchev
SQL Server 2011 (code-named Denali) CTP1 was announced in November 2010 during the SQL PASS Summit in Seattle. While it is a bit disappointing not to see the much-anticipated full implementation of the window functions (I hope we will still see that in a future CTP), it offers some interesting new programmability features. These enhancements address specific problems that we see very often in business applications. Here is a quick look at the key new features in T-SQL.
Feature: OFFSET
Application use: paging
Comments: provides a simplified syntax and an efficient method for data paging solutions
Listing 1: OFFSET example
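A minimal sketch of the paging syntax (the Customers schema here is illustrative, not from the original listing):

CREATE TABLE Customers (
  customer_nbr INT NOT NULL PRIMARY KEY,
  customer_name VARCHAR(35) NOT NULL);

-- Return the third page of 10 customers, ordered by name
SELECT customer_nbr, customer_name
FROM Customers
ORDER BY customer_name
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY;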
Feature: THROW
Application use: error handling
Comments: allows re-throwing the original error
Listing 2: THROW example
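A sketch of re-throwing inside a CATCH block (the INSERT is just an illustrative way to force an error):

BEGIN TRY
  -- a statement that may fail, e.g. a duplicate key insert
  INSERT INTO Customers (customer_nbr, customer_name)
  VALUES (1, 'Joe Doe');
END TRY
BEGIN CATCH
  -- log or compensate here, then re-throw the original error to the caller
  THROW;
END CATCH;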
Feature: SEQUENCE
Application use: replacement for IDENTITY
Comments: ANSI standard method for sequences, improves on shortcomings of IDENTITY
Listing 3: SEQUENCE example
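A sketch of a sequence used as the default for a key column (object names are illustrative):

CREATE SEQUENCE CustomerSeq AS INT
  START WITH 1
  INCREMENT BY 1;

CREATE TABLE Customers (
  customer_nbr INT NOT NULL
    DEFAULT (NEXT VALUE FOR CustomerSeq) PRIMARY KEY,
  customer_name VARCHAR(35) NOT NULL);

INSERT INTO Customers (customer_name) VALUES ('Joe Doe');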
Feature: EXECUTE WITH RESULT SETS
Application use: manipulate stored procedure output result set
Comments: provides the capability to rename output result set columns without changing the original stored procedure; there is no option to add/remove columns or to remove a result set when multiple result sets are returned
Listing 4: EXECUTE WITH RESULT SETS example
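A sketch, assuming a hypothetical CalculateSales procedure over an Orders table:

CREATE PROCEDURE CalculateSales
AS
SELECT customer_nbr, SUM(order_amt) AS total_sales
FROM Orders
GROUP BY customer_nbr;
GO

-- Rename the output columns without touching the procedure
EXEC CalculateSales
WITH RESULT SETS (
  (customer_number INT,
   total_sales_amount DECIMAL(18, 2)));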
Feature: describe result sets
Application use: determining the format of a response without actually running the query
Comments: replaces SET FMTONLY
Listing 5: Describe result sets example
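A sketch using the new metadata function against the assumed CalculateSales procedure from Listing 4:

-- Inspect the shape of the first result set without executing the procedure
SELECT name, system_type_name, is_nullable
FROM sys.dm_exec_describe_first_result_set(N'EXEC CalculateSales', NULL, 0);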
Bonus feature (maybe): FORMATMESSAGE
Application use: format messages (C/C++ sprintf style)
Comments: an undocumented capability allows formatting a message that is not in sys.messages
Listing 6: FORMATMESSAGE example
SELECT FORMATMESSAGE('There are %d products in department %s.', 10, 'remodeling');
/* There are 10 products in department remodeling. */
Defensive Database Programming
/1 Comment/in best practices, books/by Plamen Ratchev
Defensive programming is a topic much avoided by developers. Rushing to deliver features is almost always prioritized higher than spending time to defensively program code for robustness, especially in the database world, where the concept is much misunderstood. This is why it was a real pleasure to read Alex Kuznetsov's book "Defensive Database Programming with SQL Server".
Alex deals with this "inconvenient" topic using a very practical approach. Instead of filling pages with theory and reasons why defensive programming is good, he dives right into simple examples from the daily work of every database professional. We have all seen (and ignored) many of these issues, but demonstrating how they affect our code and how simple they are to avoid is what makes this book shine. It is not a complete catalog of defensive techniques, but rather a good collection of examples that illustrate the need for defensive coding and the applicable methods. It builds the mindset to think proactively and create robust solutions.
The book covers the following topics: basic defensive techniques, code vulnerabilities, changes to database objects, upgrades, reusing code, data integrity and constraints, error handling, and concurrency.
In many ways this book reminds me of Donald Norman's classic The Design of Everyday Things, which in a similar manner demonstrates how defensive design can prevent human errors.
In conclusion, "Defensive Database Programming with SQL Server" is a wonderful addition to the library of every database professional. It should be required reading for all SQL practitioners.
SQLRally
/0 Comments/in PASS, SQL Server, SQLRally, SQLSaturday/by Plamen Ratchev
SQLRally is the new regional PASS conference, hosted in Orlando. It will take place May 11-13, 2011, and brings two full days of technical training plus an optional day of pre-conference seminars. If you have been to one of the PASS SQLSaturday events, you can expect the same atmosphere of great networking with SQL Server peers in a casual environment. SQLRally extends the SQLSaturday format to bring it closer to the PASS Summit experience. You can expect more in-depth coverage of SQL Server and professional development topics, as well as a great variety of expert areas.
Currently the finalist abstracts for the pre-conference seminars are up for voting by the SQL Server community. My abstract “Maximize Your SQL Server 2008 Coding Skills” has been selected as one of the final three for the Developer track.
Here are the links to view all finalists in the different tracks:
BI Sessions:
http://www.sqlpass.org/sqlrally/2011/PreConsforVotingBI.aspx
DBA Sessions:
http://www.sqlpass.org/sqlrally/2011/PreConsforVotingDba.aspx
Developer Sessions:
http://www.sqlpass.org/sqlrally/2011/PreConsforVotingDev.aspx
Misc Sessions:
http://www.sqlpass.org/sqlrally/2011/PreConsforVotingMisc.aspx
Voting page:
http://www.zoomerang.com/Survey/WEB22BD59JCQBT
All the finalist abstracts have excellent content. Regardless of which one you vote for, it will be a great conference to attend. At $299 it is a steal!
The Power of Simplicity
/4 Comments/in SQL Server/by Plamen Ratchev
Solving a problem very often results in an unnecessarily complex solution. One of the first lessons I learned from my math teacher was to scrap any solution that exceeded a page. She would urge me to start all over and look for a simpler way to solve it. In her world there was always a short and simple solution; it was only a matter of seeing it.
I find this rule applicable to any type of problem; only the dimension of the page changes with the subject matter. Many believe that finding the simpler and better solution requires “thinking outside the box”. In my opinion it is exactly the opposite: think inside the box. Know the fundamentals of your area of expertise, apply them systematically, and you will find a simple and elegant solution! Isaac Newton did not just discover gravity when an apple fell on his head (if it did at all). It took him 20 years of hard work to explain gravity!
In the world of SQL, it comes down to a deep understanding of the set-based nature of the language and building a solution on that basis. Thinking like a procedural programmer will not help.
What are the rules to simplicity? There are no rules! I like to use the following quotes as guidelines:
“The simplest way to achieve simplicity is through thoughtful reduction.”
John Maeda, The Laws of Simplicity

“Make everything as simple as possible, but not simpler.”
Albert Einstein
Let’s illustrate this with an example in SQL. In our sample scenario, the request is to retrieve a list of customers who always order the exact same product (regardless of what the product is). This is a very valid business problem: you may want to send targeted coupons to customers who always buy the same product.
There are different ways to solve this problem, and Listing 1 shows one method. It is very close to describing the solution in plain English: select all customers for whom no other order of theirs has a different product SKU.
Listing 1
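A sketch of the query, assuming an illustrative Orders table with customer_nbr and sku columns:

SELECT DISTINCT customer_nbr
FROM Orders AS O
WHERE NOT EXISTS
     (SELECT *
      FROM Orders AS O2
      WHERE O2.customer_nbr = O.customer_nbr
        AND O2.sku <> O.sku);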
Is this the simplest way to solve the problem? The query is set based, but it still mimics procedural thinking in a way: examine all of the customer's other orders and check that none of them has a different product SKU.
If you think about the set of a customer's orders, you will notice that the ones we need have repeating attribute values, that is, the same product SKU. Applying the MIN and MAX aggregate functions to that attribute will therefore return the same value. Here is our simplified solution: retrieve all customers whose MIN and MAX product SKU are equal across all their orders. Listing 2 shows the query.
Listing 2
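A sketch over the same assumed Orders table, this time with a grouped query:

SELECT customer_nbr
FROM Orders
GROUP BY customer_nbr
HAVING MIN(sku) = MAX(sku);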
This is a more elegant and simpler solution! Thinking further about the set of a customer's orders, you will notice that the distinct count of product SKUs is 1 for the customers in the needed result set. That brings us to another elegant solution:
Listing 3
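Again a sketch over the same assumed table:

SELECT customer_nbr
FROM Orders
GROUP BY customer_nbr
HAVING COUNT(DISTINCT sku) = 1;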
Pretty and simple, right? Try it the next time you see that two-page query!
It’s a Matter of Style
/7 Comments/in SQL Server/by Plamen Ratchev
Writing SQL can be a very enjoyable activity. Reading SQL can also be enjoyable (maybe like reading poetry, to some), or very unpleasant… How do you write SQL with a style that results in eye-pleasing and easy-to-read code? And does it matter?
Sometimes code writing drills down to concentrating on the task at hand and producing a brilliant piece of code that looks like this:
Listing 1
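Something like this cramped one-liner (a sketch; the Customers, Orders, and Order_Details tables and the filter values are illustrative):

select c.customer_name,o.order_amt,d.qty from customers c inner join orders o on c.customer_nbr=o.customer_nbr inner join order_details d on o.order_nbr=d.order_nbr where o.order_amt>100 and d.qty>5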
Or maybe like this:
Listing 2
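Or the all-uppercase variant of the same sketch:

SELECT C.CUSTOMER_NAME, O.ORDER_AMT, D.QTY FROM CUSTOMERS C INNER JOIN ORDERS O ON C.CUSTOMER_NBR=O.CUSTOMER_NBR INNER JOIN ORDER_DETAILS D ON O.ORDER_NBR=D.ORDER_NBR WHERE O.ORDER_AMT>100 AND D.QTY>5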
While this code performs exceptionally and solves the problem in a very clever way, is it really that good? What happens when the code review/test team gets their turn? Or when you or someone else has to modify it two years from now? In my opinion this code is a very long way from what real production code should be. And yes, this is very real, and it happens every day, even as I type this. Just pay attention at the next code review, or take a look at any online SQL forum (and no, it is not only the people asking questions; unfortunately, many SQL gurus who know it all will post an answer in a similar “example” style).
How do you make this code look better? The answer is in the four basic principles of design: contrast, repetition, alignment, and proximity. Let’s look at how applying these principles (which many think apply only to graphic design) can lead to stylish and enjoyable code.
Contrast
The idea is to use contrast for elements that are very different. One example is columns and reserved keywords. They are not the same, and the code should make that distinction very clear. Let’s apply that:
Listing 3
SELECT C.customer_name...
Here the SELECT keyword is capitalized to differentiate it from the lowercase column name. Also, the table alias is capitalized to clearly indicate the table source.
Repetition
Repeating the same element styles for all similar items adds consistency and organization throughout the code. For example, repeat and maintain capitalization for all keywords; do not mix styles in different contexts of the code. Note the consistent style of the SELECT and FROM reserved keywords in Listing 4.
Listing 4
SELECT C.customer_name... FROM Customers AS C...
This allows you to “visualize” the shape of the query code. Now the eye can easily flow from one section of code to the next and concentrate on each element.
Alignment
Code elements should not be placed arbitrarily on the lines of code. Every code item should have some visual connection with another item in the code. One example is aligning the start of each clause of the query (SELECT, FROM, WHERE, etc.) on a new line:
Listing 5
SELECT C.customer_name...
FROM Customers AS C...
Alignment creates a clean and pleasing look of the code structure.
Proximity
Code items that relate to each other should be grouped close together. When several items are in close proximity, they become one visual unit. For example, place SELECT and its column names together on a line, and do the same for FROM with table names or WHERE with predicates. Listing 6 demonstrates this.
Listing 6
SELECT C.customer_name, O.order_amt...
FROM Customers AS C...
This makes the code structure very clear and eliminates clutter.
Let’s apply all four principles to the initial query. Here is one way it may look:
Listing 7
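One possible full version of the styled query (a sketch over the same assumed tables):

SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
JOIN Orders AS O
  ON C.customer_nbr = O.customer_nbr
JOIN Order_Details AS D
  ON O.order_nbr = D.order_nbr
WHERE O.order_amt > 100
  AND D.qty > 5;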
I added a couple of extra styling elements (compared to the original query); can you catch them?
Another form of alignment is this:
Listing 8
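A sketch of the same query with the clause keywords sharing one vertical edge:

SELECT C.customer_name, O.order_amt, D.qty
  FROM Customers AS C
  JOIN Orders AS O
    ON C.customer_nbr = O.customer_nbr
  JOIN Order_Details AS D
    ON O.order_nbr = D.order_nbr
 WHERE O.order_amt > 100
   AND D.qty > 5;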
There are many different ways to style your SQL. You may agree or disagree with some elements, but the bottom line is this: style matters!
Refactoring Entity-Attribute-Value Design
/5 Comments/in best practices/by Plamen Ratchev
Entity-Attribute-Value (often referenced as EAV) table design is one of the “wonderful” things that newcomers to the SQL land discover very quickly! Traditional procedural programmers have a very hard time understanding why tables have a fixed number of columns, and why adding a new column requires a table change and possibly a code change. Their great problem-solving skills from the procedural world help them find a “natural” solution, the EAV design, where a pair of a key (describing the attribute) and a value allows storing data with flexibility.
An example is storing all employee information in three columns: a key column to identify the employee (entity), an attribute key column (attribute) to identify the attribute stored (name, birth date, SSN, date of hire, etc.), and a value column (value) to store the actual value.
Normally the value column has to be of some large character data type, like VARCHAR(200), and data has to be cast to text when saved and then cast back to the original data type when retrieved.
Once all the coding is done comes the realization of how bad this approach is… And sometimes you just walk into a project where EAV has already been deployed, and now it is your job to fix it. The typical signs of an EAV implementation are the existence of very complex (and slow) queries to satisfy relatively trivial user requests for data, and the lack of data integrity.
How do you fix EAV design?
Here is one example of an EAV design and the refactoring process to correct it. In our case the information for bank loans is stored in a table representing the EAV design. Listing 1 below shows the scripts to create the table and insert sample data.
Listing 1
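A sketch of the EAV table and sample data (the schema and values are illustrative; the name EAV_Loans matches the view created later):

-- EAV table
CREATE TABLE EAV_Loans (
  loan_nbr INT NOT NULL,
  customer_nbr INT NOT NULL,
  code VARCHAR(10) NOT NULL,   -- attribute: 'amount', 'date', or 'type'
  value VARCHAR(200) NOT NULL, -- every value stored as text
  PRIMARY KEY (loan_nbr, customer_nbr, code));

INSERT INTO EAV_Loans VALUES (1, 101, 'amount', '1500.00');
INSERT INTO EAV_Loans VALUES (1, 101, 'date', '20080115');
INSERT INTO EAV_Loans VALUES (1, 101, 'type', 'personal');
INSERT INTO EAV_Loans VALUES (2, 102, 'amount', '800.00');
INSERT INTO EAV_Loans VALUES (2, 102, 'date', '20080210');
INSERT INTO EAV_Loans VALUES (2, 102, 'type', 'business');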
Here the pair of loan number and customer number identifies the entity. The code column stores the attribute (amount, date, and type of loan) and the value column holds the actual value.
A quick demonstration of the problems caused by this design: let’s write a query to retrieve customers with personal loans over 1,000 for the period Jan 1, 2008 through Jan 31, 2008. Here is a first attempt to satisfy the request:
Listing 2
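A sketch of the attempt, with one self-join per attribute:

SELECT A.loan_nbr,
       A.customer_nbr,
       CAST(D.value AS DATETIME) AS loan_date,
       CAST(M.value AS DECIMAL(15, 2)) AS loan_amount
FROM EAV_Loans AS A
JOIN EAV_Loans AS D
  ON D.loan_nbr = A.loan_nbr
 AND D.customer_nbr = A.customer_nbr
 AND D.code = 'date'
JOIN EAV_Loans AS M
  ON M.loan_nbr = A.loan_nbr
 AND M.customer_nbr = A.customer_nbr
 AND M.code = 'amount'
WHERE A.code = 'type'
  AND A.value = 'personal'
  AND CAST(D.value AS DATETIME) >= '20080101'
  AND CAST(D.value AS DATETIME) < '20080201'
  AND CAST(M.value AS DECIMAL(15, 2)) > 1000.00;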
One of the first issues to notice is the multiple joins needed to retrieve each attribute. This is because each attribute is stored in a separate row, and we need to put the customer loan info back together. Besides that, it seems the code should be OK… except for one little problem! The nature of SQL is such that nothing guarantees that the predicates in the WHERE clause will be evaluated in the order listed. They will be expanded in the query plan, and the cast to DATETIME or DECIMAL can be applied to the loan type value, which will result in a conversion error. I described this in more detail in my post Predicates in SQL.
To fix this you may be tempted to try a different approach: using table expressions to retrieve each attribute. Listing 3 shows a version of such a query.
Listing 3
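A sketch with one derived table per attribute:

SELECT A.loan_nbr, A.customer_nbr, D.loan_date, M.loan_amount
FROM (SELECT loan_nbr, customer_nbr
      FROM EAV_Loans
      WHERE code = 'type'
        AND value = 'personal') AS A
JOIN (SELECT loan_nbr, customer_nbr,
             CAST(value AS DATETIME) AS loan_date
      FROM EAV_Loans
      WHERE code = 'date') AS D
  ON D.loan_nbr = A.loan_nbr
 AND D.customer_nbr = A.customer_nbr
JOIN (SELECT loan_nbr, customer_nbr,
             CAST(value AS DECIMAL(15, 2)) AS loan_amount
      FROM EAV_Loans
      WHERE code = 'amount') AS M
  ON M.loan_nbr = A.loan_nbr
 AND M.customer_nbr = A.customer_nbr
WHERE D.loan_date >= '20080101'
  AND D.loan_date < '20080201'
  AND M.loan_amount > 1000.00;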
However, this query has the same problem, because the table expressions are expanded in the query plan and nothing guarantees the order of evaluating the predicates. One approach that retrieves the requested data successfully is to use a pivoting technique. The query below illustrates that.
Listing 4
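A sketch of the pivoting approach; the CASE expressions ensure each cast sees only matching values:

SELECT loan_nbr, customer_nbr, loan_date, loan_amount
FROM (SELECT loan_nbr,
             customer_nbr,
             MAX(CASE WHEN code = 'date'
                      THEN CAST(value AS DATETIME) END) AS loan_date,
             MAX(CASE WHEN code = 'amount'
                      THEN CAST(value AS DECIMAL(15, 2)) END) AS loan_amount,
             MAX(CASE WHEN code = 'type'
                      THEN value END) AS loan_type
      FROM EAV_Loans
      GROUP BY loan_nbr, customer_nbr) AS P
WHERE loan_type = 'personal'
  AND loan_date >= '20080101'
  AND loan_date < '20080201'
  AND loan_amount > 1000.00;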
Here the CASE expressions guarantee that only the correct values will be cast to the specific data type. But then, how efficient is this query for such a trivial request?
The refactoring process
Usually the first step in correcting an EAV design is to create a normalized table for storing the data. In our case the table structure can look like Listing 5 (note that here, for clarity, the loan type is spelled out as an entire word; in a real business scenario it may be sufficient to store only P or B, or alternatively to create a table of loan types and reference it via a foreign key constraint).
Listing 5
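A sketch of the normalized table (the column types are assumptions):

CREATE TABLE Loans (
  loan_nbr INT NOT NULL,
  customer_nbr INT NOT NULL,
  loan_date DATETIME NOT NULL,
  loan_amount DECIMAL(15, 2) NOT NULL,
  loan_type VARCHAR(10) NOT NULL
    CHECK (loan_type IN ('personal', 'business')),
  PRIMARY KEY (loan_nbr, customer_nbr));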
Now each attribute is in a separate column with the appropriate data type. This guarantees the integrity of the data, and we can also define constraints to restrict data values (like the CHECK constraint for the loan type).
The next step is to convert and transfer the data from the old format to the new table. We already know that a pivoting technique works, and it can be used here. Here is the query to transfer from EAV to the normalized format.
Listing 6
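A sketch reusing the pivoting query from Listing 4 to load the new table:

INSERT INTO Loans (loan_nbr, customer_nbr, loan_date, loan_amount, loan_type)
SELECT loan_nbr,
       customer_nbr,
       MAX(CASE WHEN code = 'date'
                THEN CAST(value AS DATETIME) END),
       MAX(CASE WHEN code = 'amount'
                THEN CAST(value AS DECIMAL(15, 2)) END),
       MAX(CASE WHEN code = 'type' THEN value END)
FROM EAV_Loans
GROUP BY loan_nbr, customer_nbr;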
The last step in the refactoring process is to replace the old EAV table with a view for backward compatibility (if there is code referencing the original table that cannot be changed at this time). There are different methods to accomplish this. For simplicity, here we can unpivot using UNION with a query for each attribute (note that you have to drop the original EAV table first, because the view has the same name).
Listing 7
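A sketch of the backward-compatibility view, unpivoting with one query per attribute (CONVERT style 112 renders the date as YYYYMMDD text):

-- drop the original EAV table first; the view takes over its name
DROP TABLE EAV_Loans;
GO

CREATE VIEW EAV_Loans
AS
SELECT loan_nbr, customer_nbr,
       'date' AS code,
       CONVERT(VARCHAR(200), loan_date, 112) AS value
FROM Loans
UNION ALL
SELECT loan_nbr, customer_nbr,
       'amount',
       CAST(loan_amount AS VARCHAR(200))
FROM Loans
UNION ALL
SELECT loan_nbr, customer_nbr,
       'type',
       loan_type
FROM Loans;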
If any old code references the original EAV table for data modifications, you will have to add INSTEAD OF triggers to the view to handle them.
Now, let’s answer the same data request to retrieve customers with personal loans over 1,000 for the period Jan 1, 2008 through Jan 31, 2008, this time using the normalized table.
Listing 8
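The same request against the normalized table (a sketch):

SELECT loan_nbr, customer_nbr, loan_date, loan_amount
FROM Loans
WHERE loan_type = 'personal'
  AND loan_date >= '20080101'
  AND loan_date < '20080201'
  AND loan_amount > 1000.00;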
Simple and elegant, just as it should be!
Refactoring Ranges
/4 Comments/in best practices/by Plamen Ratchev
Refactoring is misunderstood many times, mostly by developers living by the motto “If it’s not broken, don’t fix it”. In most cases this is an invalid argument, and it leads to keeping inefficient and difficult-to-maintain solutions around. With tools and languages evolving, as well as the knowledge about a specific technology, there are many opportunities to improve existing code in ways that make a huge difference in many aspects. As Sir Winston Churchill said, “To improve is to change; to be perfect is to change often”.
One example of refactoring that can lead to better code is the problem of finding ranges of existing values in SQL. A range (also known as an “island”) refers to finding and grouping consecutive values (dates, numbers, etc.) into a row representing the start point and the end point of the range. There are many variations of this problem, but in essence the logic always drills down to finding a grouping factor to collapse the values into a range.
Let’s look at one example to illustrate the refactoring process. Given a Sales table containing sale transactions, our goal is to find:
1). Consecutive ranges of transaction numbers (transaction numbers that do not have gaps)
2). Consecutive ranges of sale dates (sale dates with no missing dates in the range)
These results can be used for a variety of reporting and data analysis purposes. Here is how the Sales table looks:
CREATE TABLE Sales (
transaction_nbr INT NOT NULL PRIMARY KEY,
sale_date DATE NOT NULL,
amount DECIMAL(18, 2) NOT NULL);
INSERT INTO Sales VALUES
(1, '20091201', 100.00),
(3, '20091202', 15.00),
(4, '20091203', 102.50),
(5, '20091204', 110.00),
(6, '20091207', 98.25),
(9, '20091208', 20.00),
(11, '20091209', 160.00),
(12, '20091211', 250.00);
SELECT transaction_nbr, sale_date, amount
FROM Sales;
/*
transaction_nbr sale_date amount
--------------- ---------- ----------
1 2009-12-01 100.00
3 2009-12-02 15.00
4 2009-12-03 102.50
5 2009-12-04 110.00
6 2009-12-07 98.25
9 2009-12-08 20.00
11 2009-12-09 160.00
12 2009-12-11 250.00
*/
The requirement is to return the following two result sets:
/*
Ranges by transaction_nbr:
range_start range_end
----------- -----------
1 1
3 6
9 9
11 12
Ranges by sale_date:
range_start range_end
----------- ----------
2009-12-01 2009-12-04
2009-12-07 2009-12-09
2009-12-11 2009-12-11
*/
On versions prior to SQL Server 2005, a very common solution is to find the grouping factor for the ranges with a subquery that locates the minimum value in each range, and then to group all consecutive values based on that. For transaction numbers, the query to find the grouping factor looks like this:
SELECT transaction_nbr,
(SELECT MIN(B.transaction_nbr)
FROM Sales AS B
WHERE B.transaction_nbr >= A.transaction_nbr
AND NOT EXISTS
(SELECT *
FROM Sales AS C
WHERE C.transaction_nbr = B.transaction_nbr + 1)) AS grp
FROM Sales AS A;
Here is the result set:
/*
transaction_nbr grp
--------------- -----------
1 1
3 6
4 6
5 6
6 6
9 9
11 12
12 12
*/
It is easy to see the groups created by the subquery. Now finalizing the ranges is trivial: simply group by the grouping factor and retrieve the MIN and MAX values in each range:
SELECT MIN(transaction_nbr) AS range_start,
MAX(transaction_nbr) AS range_end
FROM (
SELECT transaction_nbr,
(SELECT MIN(B.transaction_nbr)
FROM Sales AS B
WHERE B.transaction_nbr >= A.transaction_nbr
AND NOT EXISTS
(SELECT *
FROM Sales AS C
WHERE C.transaction_nbr = B.transaction_nbr + 1)) AS grp
FROM Sales AS A) AS T
GROUP BY grp;
This query satisfies the first task to find ranges by transaction number:
/*
range_start range_end
----------- -----------
1 1
3 6
9 9
11 12
*/
The logic for finding ranges by sale date is very similar. The only difference is that the subquery finding the grouping factor uses the date/time functions in SQL Server to check whether the dates are consecutive. Here is the query solving the second task, with the corresponding result set:
SELECT MIN(sale_date) AS range_start,
MAX(sale_date) AS range_end
FROM (
SELECT sale_date,
(SELECT MIN(B.sale_date)
FROM Sales AS B
WHERE B.sale_date >= A.sale_date
AND NOT EXISTS
(SELECT *
FROM Sales AS C
WHERE C.sale_date = DATEADD(DAY, 1, B.sale_date))) AS grp
FROM Sales AS A) AS T
GROUP BY grp;
/*
range_start range_end
----------- ----------
2009-12-01 2009-12-04
2009-12-07 2009-12-09
2009-12-11 2009-12-11
*/
While in this example the query does not look so difficult, in more complex scenarios it can become very hard to understand, and performance will not be great.
SQL Server 2005 introduced the ranking functions (ROW_NUMBER, RANK, DENSE_RANK, and NTILE). They provide a new tool to solve our problem in a more simplified and efficient manner. We can use very simple math to find the grouping factor. Take a look at the following query and the results:
SELECT transaction_nbr,
ROW_NUMBER() OVER(ORDER BY transaction_nbr) AS rk,
transaction_nbr - ROW_NUMBER() OVER(ORDER BY transaction_nbr) AS grp
FROM Sales;
/*
transaction_nbr rk grp
--------------- -------------------- --------------------
1 1 0
3 2 1
4 3 1
5 4 1
6 5 1
9 6 3
11 7 4
12 8 4
*/
The query simply generates a rank by transaction number and defines an expression subtracting the rank from the transaction number. Observing the transaction number and rank columns, it is easy to see that transaction numbers increase by 1 only when there are no gaps, while ranks always increase by 1. Subtracting a sequentially increasing number from a set of numbers without gaps results in a constant (as both sequences increase by 1). When the set of numbers has gaps, the subtraction results in a different number. This is the basis for defining the grouping factor for our ranges.
Here is the final query to solve the first task to find ranges by transaction number:
SELECT MIN(transaction_nbr) AS range_start,
MAX(transaction_nbr) AS range_end
FROM (
SELECT transaction_nbr,
transaction_nbr - ROW_NUMBER() OVER(ORDER BY transaction_nbr) AS grp
FROM Sales AS A) AS T
GROUP BY grp;
/*
range_start range_end
----------- -----------
1 1
3 6
9 9
11 12
*/
Utilizing the ranking functions allows using a new algorithm, which results in a simplified and better-performing solution.
The solution for finding date ranges is very similar. Here are two versions with minor differences. The first version uses the difference in days between a fixed date (January 1, 2000) and the sale date. This difference generates sequential numeric values when there are no gaps between dates, and skips numbers when gaps exist.
-- Using number as grouping factor
SELECT MIN(sale_date) AS range_start,
MAX(sale_date) AS range_end
FROM (
SELECT sale_date,
DATEDIFF(DAY, '20000101', sale_date) -
ROW_NUMBER() OVER(ORDER BY sale_date) AS grp
FROM Sales AS A) AS T
GROUP BY grp;
/*
range_start range_end
----------- ----------
2009-12-01 2009-12-04
2009-12-07 2009-12-09
2009-12-11 2009-12-11
*/
The second version subtracts days (represented by the rank based on sale date) from the sale date. This in effect generates a constant date when the sale dates are in sequence.
-- Using date as grouping factor
SELECT MIN(sale_date) AS range_start,
MAX(sale_date) AS range_end
FROM (
SELECT sale_date,
DATEADD(DAY, -1 * ROW_NUMBER() OVER(ORDER BY sale_date), sale_date) AS grp
FROM Sales AS A) AS T
GROUP BY grp;
The same technique can be used in many different scenarios. In more complex cases (like when partitioning by a column is required) there may be a need for multiple ranking functions; here is one example, with a sketch of a partitioned variant after the link:
Grouping with ROW_NUMBER
http://www.tangrainc.com/blog/2008/03/grouping-with-row_number/
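A rough illustration of the partitioned variant (a sketch, assuming a hypothetical SalesByCustomer table with customer_nbr and sale_date columns):

-- date ranges per customer: the islands restart for each customer_nbr
SELECT customer_nbr,
       MIN(sale_date) AS range_start,
       MAX(sale_date) AS range_end
FROM (SELECT customer_nbr, sale_date,
             DATEADD(DAY,
                     -1 * ROW_NUMBER() OVER(PARTITION BY customer_nbr
                                            ORDER BY sale_date),
                     sale_date) AS grp
      FROM SalesByCustomer) AS T
GROUP BY customer_nbr, grp;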
Column Aggregates
/10 Comments/in SQL Server, XML/by Plamen Ratchev
We all know how to use the aggregate functions MIN, MAX, COUNT, etc. to calculate aggregates across rows. For example, using MAX we can determine the maximum value for a group of rows. But what if we want to see the maximum value for each row across columns? There is no aggregate function that accepts a list of columns…
Let’s look at an example to illustrate this. Here is a sample table with four data columns:
CREATE TABLE Foo (
keycol INT NOT NULL PRIMARY KEY,
col1 INT NOT NULL,
col2 INT NOT NULL,
col3 INT NOT NULL,
col4 INT NOT NULL);
INSERT INTO Foo VALUES(1, 5, 0, 1, 10);
INSERT INTO Foo VALUES(2, 0, 0, 3, 1);
INSERT INTO Foo VALUES(3, 0, 0, 0, 0);
INSERT INTO Foo VALUES(4, 9, 1, 22, 8);
INSERT INTO Foo VALUES(5, 8, 8, 8, 8);
Our goal is to calculate the maximum value across the four data columns for each row. With two or three columns it may be an easy task using a CASE expression (see the sketch below), but adding more columns makes for a very long and complex CASE expression. Is there an easier way?
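For comparison, here is what the CASE approach might look like for just two of the columns (a sketch):

-- maximum of col1 and col2 only; grows quickly as columns are added
SELECT keycol,
       CASE WHEN col1 >= col2 THEN col1 ELSE col2 END AS max_val
FROM Foo;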
One solution that provides a shortcut is to utilize SQL Server's capability to generate an XML result and then use XQuery to find the max value. Here is the query:
-- max across columns with XQuery
SELECT c.query('keycol').value('.', 'INT') AS keycol,
c.value('local-name(./*[data(.)=
max(../*[not(local-name(.)="keycol") ])][1])',
'VARCHAR(20)') AS max_col,
c.value('max(*[not(local-name(.)="keycol")])', 'FLOAT') AS max_val
FROM (SELECT keycol, col1, col2, col3, col4
FROM Foo
FOR XML PATH, TYPE) AS T(x)
CROSS APPLY x.nodes('row') AS R(c);
The tricky part here is filtering the key column out of the evaluation for the max. Here are the results:
keycol max_col max_val
------- -------- --------
1 col4 10
2 col3 3
3 col1 0
4 col3 22
5 col1 8
How about counting the number of non-zero values across columns? Here is a solution for that, using a very similar query:
-- count non-zero columns with XQuery
SELECT c.query('keycol').value('.', 'INT') AS keycol,
c.value('count(*[not(local-name(.)="keycol")
and not(.=0)])',
'INT') AS cnt_non_zero
FROM (SELECT keycol, col1, col2, col3, col4
FROM Foo
FOR XML PATH, TYPE) AS T(x)
CROSS APPLY x.nodes('row') AS R(c);
And the results:
keycol cnt_non_zero
------- ------------
1 3
2 2
3 0
4 4
5 4
This shows the power of XQuery to solve the problem in an untraditional way. The catch? Look at the execution plans of these queries…
A more efficient method of solving the first problem is to use unpivoting. Here is a solution using the UNPIVOT operator (the same can be done with an ANSI query using a cross join with a numbers table):
-- max across columns with UNPIVOT
SELECT keycol, col AS max_col, val AS max_val
FROM (
SELECT keycol, val, col,
ROW_NUMBER() OVER(PARTITION BY keycol ORDER BY val DESC, col) AS rk
FROM Foo
UNPIVOT
(val FOR col IN (col1, col2, col3, col4)) AS U) AS T
WHERE rk = 1;
And a very similar solution using unpivoting solves the second problem:
-- count non-zero columns with UNPIVOT
SELECT keycol, COUNT(NULLIF(val, 0)) AS cnt_non_zero
FROM Foo
UNPIVOT
(val FOR col IN (col1, col2, col3, col4)) AS U
GROUP BY keycol;