UPSERT and More with MERGE
One very common requirement is to merge source data to a target table. For example, merge the daily changes to accounts data from branch office table to central accounts table. On SQL Server version 2005 and prior this was accomplished using separate INSERT and UPDATE statements. This involved checking if key column exists to perform update and insert if not, or attempt an update first and then if not rows were affected perform insert. Not to mention if we have to check if account is missing from the branch office data and needs to be deleted from the central accounts table. That way we end up with multiple (sometimes complex) statements to implement one transaction, accessing both source and target tables multiple times.
SQL Server 2008 offers a lot more elegant way using the MERGE statement (MERGE is supported by ANSI SQL). It makes data merging very simple and elegant, as well as efficient.
Here is the base syntax for MERGE:
[ WITH <common_table_expression> [,...n] ]
MERGE
[ TOP ( expression ) [ PERCENT ] ]
[ INTO ] target_table [ [ AS ] table_alias ]
[ WITH ( <merge_hint> ) ]
USING <table_source>
ON <search_condition>
[ WHEN MATCHED [ AND <search_condition> ]
THEN <merge_matched> ]
[ WHEN [TARGET] NOT MATCHED [ AND <search_condition> ]
THEN <merge_not_matched> ]
[ WHEN SOURCE NOT MATCHED [ AND <search_condition> ]
THEN <merge_matched> ]
<output_clause>
[ OPTION ( <query_hint> [ ,...n ] ) ]
;
Below is the initial setup for the source and target tables.
-- Static table as target for merging data
CREATE TABLE CentralOfficeAccounts (
account_nbr INT PRIMARY KEY,
company_name VARCHAR(35),
primary_contact VARCHAR(35),
contact_phone VARCHAR(12));
-- Dynamic table with daily updates to be merged
CREATE TABLE BranchOfficeAccounts (
account_nbr INT PRIMARY KEY,
company_name VARCHAR(35),
primary_contact VARCHAR(35),
contact_phone VARCHAR(12));
-- Sample centrall office static data
INSERT INTO CentralOfficeAccounts
VALUES (1, 'Bayside Motors', 'Jim Smith', '902-203-1234'),
(2, 'Dallas Industries', 'Joe Doe', '301-663-9134'),
(3, 'Sky Computer Systems', 'Jane Brown', '201-943-6053');
-- Daily updated branch office data
INSERT INTO BranchOfficeAccounts
VALUES (2, 'Dallas Industries, Inc.', 'Rick Gross', '301-663-9134'), -- changed
(3, 'Sky Computer Systems', 'Jane Brown', '201-943-6053'), -- same
(4, 'Auto Insurance Co.', 'Chris Jefferson', '313-601-6201'); -- new
Here is how MERGE can be used to perform an update on existing accounts and insert of new accounts.
-- Update existing and add missing
MERGE INTO CentralOfficeAccounts AS C -- Target
USING BranchOfficeAccounts AS B -- Source
ON C.account_nbr = B.account_nbr
WHEN MATCHED THEN -- On match update
UPDATE SET C.company_name = B.company_name,
C.primary_contact = B.primary_contact,
C.contact_phone = B.contact_phone
WHEN NOT MATCHED THEN -- Add missing
INSERT (account_nbr, company_name, primary_contact, contact_phone)
VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone);
SELECT account_nbr, company_name, primary_contact, contact_phone
FROM CentralOfficeAccounts;
/*
-- Results:
account_nbr company_name primary_contact contact_phone
----------- ------------------------ ---------------- -------------
1 Bayside Motors Jim Smith 902-203-1234
2 Dallas Industries, Inc. Rick Gross 301-663-9134
3 Sky Computer Systems Jane Brown 201-943-6053
4 Auto Insurance Co. Chris Jefferson 313-601-6201
*/
Next step is to enhance the statement including predicates to check and update only accounts that have changed. The only difference here is the additional conditions in the MATCHED clause, results will be the same as before.
-- Update existing that changed and add missing
-- Use of predicates
MERGE INTO CentralOfficeAccounts AS C -- Target
USING BranchOfficeAccounts AS B -- Source
ON C.account_nbr = B.account_nbr
WHEN MATCHED -- On match update
AND (C.company_name <> B.company_name -- Additional search conditions
OR C.primary_contact <> B.primary_contact
OR C.contact_phone <> B.contact_phone) THEN
UPDATE SET C.company_name = B.company_name,
C.primary_contact = B.primary_contact,
C.contact_phone = B.contact_phone
WHEN NOT MATCHED THEN -- Add missing
INSERT (account_nbr, company_name, primary_contact, contact_phone)
VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone);
Going one step further, let’s add the option to delete accounts that do not exist in the source table from the target table.
--Update existing that changed and add missing, delete missing in source
MERGE INTO CentralOfficeAccounts AS C -- Target
USING BranchOfficeAccounts AS B -- Source
ON C.account_nbr = B.account_nbr
WHEN MATCHED -- On match update
AND (C.company_name <> B.company_name -- Additional search conditions
OR C.primary_contact <> B.primary_contact
OR C.contact_phone <> B.contact_phone) THEN
UPDATE SET C.company_name = B.company_name,
C.primary_contact = B.primary_contact,
C.contact_phone = B.contact_phone
WHEN NOT MATCHED THEN -- Add missing
INSERT (account_nbr, company_name, primary_contact, contact_phone)
VALUES (B.account_nbr, B.company_name, B.primary_contact, B.contact_phone)
WHEN SOURCE NOT MATCHED THEN -- Delete missing from source
DELETE;
SELECT account_nbr, company_name, primary_contact, contact_phone
FROM CentralOfficeAccounts;
/*
-- Results:
account_nbr company_name primary_contact contact_phone
----------- ------------------------ ---------------- -------------
2 Dallas Industries, Inc. Rick Gross 301-663-9134
3 Sky Computer Systems Jane Brown 201-943-6053
4 Auto Insurance Co. Chris Jefferson 313-601-6201
*/
There is a lot more to MERGE, but this just shows the tremendous power it provides for merging data.
Note:
This code has been tested with SQL Server 2008 CTP 6 (February 2008). As of SQL Server 2008 Release Candidate 0 (June 2008) the clause “WHEN SOURCE NOT MATCHED” has been replaced with “WHEN NOT MATCHED BY SOURCE”. This makes the wording clearer. It is good to note that this clause of MERGE is not standard.
References:
MERGE (Transact-SQL)
http://technet.microsoft.com/en-us/library/bb510625(SQL.100).aspx