set based Archives - Tangra Inc.

Solving a problem very often results in unnecessary complex solutions. One of the first lessons I learned from my math teacher was to scrap any solution that exceeds a page. She would urge me to start all over and look for simpler way to resolve it. In her world there was always a short and simple solution, it was only a matter of seeing it.
I find this rule applicable to any type of problem. Only the dimension of the page size changes according to the subject matter. Many believe that finding the simpler and better solution is to “think outside the box”. But in my opinion it is exactly the opposite – to think inside the box. Know the fundamentals of your area of expertise, systematically apply them, and you will find a simple and elegant solution! Isaac Newton did not just discover the gravity when an apple fell on his head (if at all it did). It took him 20 years of hard work to explain gravity!
In the world of SQL it drills down to deep understanding of the set based nature of SQL and coming up with solution based on that. Thinking like a procedural programmer will not help.

What are the rules to simplicity? There are no rules! I like to use the following quotes as guidelines:

“The simplest way to achieve simplicity is through thoughtful reduction.”
John Maeda, The Laws of Simplicity
“Make everything as simple as possible, but not simpler.”
Albert Einstein

Let’s illustrate this with one example in SQL. In out sample scenario the request is to retrieve a list of customers who always order the exact same product (regardless of what the product is). This is a very valid business problem because you may want to send targeted coupons to customers who always buy the same products.
There are different ways to solve this problem and Listing 1 shows one method. It is very close to describing the solution in plain English: select all customers where the customer has no other orders with different product SKU.

Listing 1

SELECT DISTINCT customer_nbr 
FROM Orders AS O 
WHERE NOT EXISTS(SELECT * 
                 FROM Orders AS O1 
                 WHERE O1.customer_nbr = O.customer_nbr 
                   AND O1.sku <> O.sku);

Is this the simplest way to solve the problem? This query is set based but still in a way mimics procedural thinking – examine all other customer orders and check that there is no other order with different product SKU.
If you think about the set of all customer orders, you will notice that these that we need have repeating attribute values, that is the same product SKU. Applying the MIN and MAX aggregate functions on that attribute will return the same value. Then here is our simplified solution: retrieve all customers that have equal MIN and MAX product SKU on all orders. Listing 2 shows the query.

Listing 2

SELECT customer_nbr 
FROM Orders 
GROUP BY customer_nbr 
HAVING MIN(sku) = MAX(sku);

This is more elegant and simpler solution! Thinking more about the set of all customer orders you will notice that the distinct count of product SKUs is 1 for the customers in the needed result set. That brings us to another elegant solution:

Listing 3

SELECT customer_nbr 
FROM Orders 
GROUP BY customer_nbr 
HAVING COUNT(DISTINCT sku) = 1;

Pretty and simple, right? Try it next time when you see that two page query!

Tag Archive for: set based

The Power of Simplicity

SITE MAP

SEARCH