Keeping History Data in SQL Server
Very often there is the need to archive data on daily basis. Just had that question today and here is trimmed down solution based on a recent project.
The task is to keep history data on daily basis of changed data only (that is only rows that have changed for the last day). The scenario is a Loan table with loan number and loan amount. If there is change in any of the columns in the table, that change needs to be inserted into a history table. Further, the history table should be able to provide a convenient way to query the latest data (most recent updated row) for reporting purposes.
Here is the code to that solution:
-- Create sample tables.
CREATE TABLE Loans (
loan_nbr INTEGER NOT NULL PRIMARY KEY,
loan_amount DECIMAL(12, 2) DEFAULT 0.0 NOT NULL);
CREATE TABLE LoansHistory (
loan_nbr INTEGER NOT NULL,
loan_amount DECIMAL(12, 2) NOT NULL,
change_date DATETIME
DEFAULT CAST(DATEDIFF(day, 0, CURRENT_TIMESTAMP) AS DATETIME)
NOT NULL,
PRIMARY KEY (loan_nbr, change_date),
archive_date DATETIME NULL,
CHECK (change_date < archive_date));
GO
-- Sample data.
INSERT INTO Loans VALUES (1, 100.00);
INSERT INTO Loans VALUES (2, 150.00);
INSERT INTO Loans VALUES (3, 120.00);
INSERT INTO Loans VALUES (4, 160.00);
INSERT INTO LoansHistory VALUES (1, 100.00, '20070501', NULL);
INSERT INTO LoansHistory VALUES (2, 95.00, '20070503', NULL);
INSERT INTO LoansHistory VALUES (3, 80.00, '20070506', '20070508');
INSERT INTO LoansHistory VALUES (3, 85.00, '20070508', NULL);
GO
-- View to use for latest loans.
CREATE VIEW CurrentLoans (loan_nbr, loan_amount)
AS
SELECT loan_nbr, loan_amount
FROM LoansHistory
WHERE archive_date IS NULL;
GO
-- Insert changed rows.
-- SQL Server 2005.
INSERT INTO LoansHistory (loan_nbr, loan_amount)
SELECT loan_nbr, loan_amount
FROM Loans
EXCEPT
SELECT loan_nbr, loan_amount
FROM CurrentLoans;
-- SQL Server 2000.
INSERT INTO LoansHistory (loan_nbr, loan_amount)
SELECT loan_nbr, loan_amount
FROM ( SELECT 'Live' AS Source,
loan_nbr,
loan_amount
FROM Loans
UNION ALL
SELECT 'History' AS Source,
loan_nbr,
loan_amount
FROM CurrentLoans ) AS L
GROUP BY loan_nbr, loan_amount
HAVING COUNT(*) = 1
AND MIN(Source) = 'Live';
-- Update archive date for old history.
-- Runs in the same transaction as the insert of new history.
UPDATE LoansHistory
SET archive_date = CAST(DATEDIFF(day, 0, CURRENT_TIMESTAMP) AS DATETIME)
WHERE archive_date IS NULL
AND EXISTS (SELECT *
FROM LoansHistory AS H
WHERE H.loan_nbr = LoansHistory.loan_nbr
AND H.change_date > LoansHistory.change_date);
GO
-- Use the view for all reporting purposes.
SELECT loan_nbr, loan_amount
FROM CurrentLoans
ORDER BY loan_nbr;
-- Loans as of 2007-05-08.
SELECT loan_nbr, loan_amount
FROM LoansHistory
WHERE change_date <= '20070508'
AND COALESCE(archive_date, CURRENT_TIMESTAMP) > '20070508';
The process is fairly simple with keeping two dates: change date for when the last change occurred (and that is the date when the row was inserted in history), and archive date which indicates that this row is archived and there is another more recent change. If the archive data is NULL it indicates that this is the most recent change. To simplify the querying process both dates have time portion set to midnight.