Predicates in SQL
One of the most confusing topics for developers coming from traditional languages like C, C++, Java, C#, and VB to SQL, is the evaluation of predicates. In most languages predicate logic is evaluated left to right. SQL is a different story. It follows the concept of all-at-once operations. This means all logical predicates will be evaluated at the same time. In practice, the SQL engine is free to process the logical expressions in any order it finds appropriate and cost effective to retrieve the data.
Below is example to illustrate this behavior. Given a table with account policies, where a policy may be represented only as numeric values for certain accounts, and a mix of alpha-numeric characters for other accounts. Knowing that account 2 has policies that can be represented only as numeric values, the goal is to retrieve policies that are greater than 50000.
An attempt to add a first predicate to filter on account 2 and then a second one to select policies greater than 50000 will fail. Even if the query is formed with a derived table to extract first all policies for account 2 and then the outer query filters on the policy, it will not work. The derived table is expanded in the main query plan and a single query plan is produced. Then the query engine is free to push up and down the predicates in the plan as it finds efficient.
CREATE TABLE AccountPolicies (
account_nbr INT,
policy VARCHAR(20),
PRIMARY KEY (account_nbr, policy));
INSERT INTO AccountPolicies VALUES(1, 'P1000234');
INSERT INTO AccountPolicies VALUES(1, 'P1020256');
INSERT INTO AccountPolicies VALUES(2, '1001');
INSERT INTO AccountPolicies VALUES(2, '5002');
INSERT INTO AccountPolicies VALUES(2, '50001');
INSERT INTO AccountPolicies VALUES(2, '50005');
INSERT INTO AccountPolicies VALUES(2, 'P50005');
-- Second predicate may be evaluated first
-- resulting in conversion error
SELECT account_nbr, policy
FROM AccountPolicies
WHERE account_nbr = 2
AND CAST(policy AS INT) > 50000;
-- The derived table is expended in the
-- query plan and predicates can be pushed
-- up or down in the plan
SELECT account_nbr, policy
FROM (SELECT account_nbr, policy
FROM AccountPolicies
WHERE account_nbr = 2) AS P
WHERE CAST(policy AS INT) > 50000;
One way to solve this is to use a CASE expression to force the desired evaluation because WHEN clauses in CASE are evaluated in order. This solution not only demonstrates how to handle the evaluation process, but adds verification that only policies with numeric values are casted (as usual business rules change and later account 2 may be allowed to create policies with alpha-numeric characters).
-- Enforce sequence of evaluating conditions
-- and cast only valid values
SELECT account_nbr, policy
FROM AccountPolicies
WHERE CASE
WHEN account_nbr = 2
THEN CASE
WHEN policy NOT LIKE '%[^0-9]%'
THEN CASE
WHEN CAST(policy AS INT) > 50000
THEN 'True'
ELSE 'False'
END
ELSE 'False'
END
ELSE 'False'
END = 'True';
Is the following code incorrect
;WITH CTETest AS
(
SELECT account_nbr, policy FROM AccountPolicies WHERE account_nbr = 2
)
SELECT account_nbr, policy
FROM CTETest
WHERE CAST(policy AS INT) > 50000;
Yes, this code the same problem as the derived table example. Here in a similar manner the CTE is expanded in the main query plan and a single query plan is produced. Then the optimizer may chose to evaluate first the predicate CAST(policy AS INT) > 50000 which will result in error.
Sir,
The only reason why you are getting error is because of this –> INSERT INTO AccountPolicies VALUES(2, 'P50005');
The account_nbr=2 but policy is alphnumeric. Explicit cast of alphanumeric to numeric is less forgiving than the implicit cast that sql engine performs.
Also, try using both the queries without the explicit Cast(policy as int) function as sql has an implicit cast function.
On broader perspective, there is a logic to the way sql engine processes predicates. This logic is determinate logic. Without this data retrieval would be a hell-of-a-task.
Cheers!
Vijay Chand
Hello Vijay,
The INSERT statement is correct. This is the whole point, to illustrate how the SQL Server engine is treating predicates.
Hi,
>>
WHEN clauses in CASE are evaluated in order
<<
Is IF statement same with CASE for evaluating? Or optimizer make a decision for trying which Boolean statement first evaluate (in terms of cost-based)
IF is a control of flow statement and not an expression like CASE. As such it cannot be used in queries and it is not part of the execution plan, so evaluation is different.
Wouldn't it be more readable to make it "WHEN policy LIKE '%[0-9]%'" rather than "WHEN policy NOT LIKE '%[^0-9]%'"?
Kevin,
The double negative guarantees that only numeric characters are allowed (in a way saying the value is not like not a number). If you use the LIKE predicate as you posted then it allows non numeric characters.
SELECT account_nbr, policy FROM AccountPolicies
WHERE account_nbr = 2
and ISNUMERIC(policy) = 1
AND policy > 50000;