Grouping with ROW_NUMBER
The new ranking functions in SQL Server 2005 have great impact on how problems can be solved. Here is just another example of that.
Given a table with events at different venues (all event dates are unique), find the duration intervals of start/end date of uninterrupted performances at a given venue. Solving this without the use of ROW_NUMBER would require a lot more complex SQL, not to mention that this solution has excellent performance. Running a test with 10 years of random sample data completes on average 15 milliseconds.
-- Create the sample events table
CREATE TABLE Events
(event_date DATETIME NOT NULL PRIMARY KEY,
event_venue VARCHAR(20) NOT NULL);
-- Insert venue event dates
INSERT INTO Events VALUES ('20080101', 'The Palace');
INSERT INTO Events VALUES ('20080201', 'The Palace');
INSERT INTO Events VALUES ('20080301', 'The Palace');
INSERT INTO Events VALUES ('20080401', 'The Palace');
INSERT INTO Events VALUES ('20080501', 'The Palace');
INSERT INTO Events VALUES ('20080601', 'Fox Theater');
INSERT INTO Events VALUES ('20080701', 'Fox Theater');
INSERT INTO Events VALUES ('20080801', 'Grand Hall');
INSERT INTO Events VALUES ('20080901', 'Grand Hall');
INSERT INTO Events VALUES ('20081001', 'Grand Hall');
INSERT INTO Events VALUES ('20081101', 'The Palace');
INSERT INTO Events VALUES ('20071201', 'The River Place');
INSERT INTO Events VALUES ('20081202', 'The River Place');
-- Group the event period dates at each venue
SELECT MIN(event_venue) AS venue,
MIN(event_date) AS venue_start_date,
MAX(event_date) AS venue_end_date
FROM (
SELECT event_venue , event_date ,
ROW_NUMBER() OVER (ORDER BY event_venue, event_date) -
ROW_NUMBER() OVER (PARTITION BY event_venue
ORDER BY event_date),
ROW_NUMBER() OVER (ORDER BY event_date) -
ROW_NUMBER() OVER (PARTITION BY event_venue
ORDER BY event_date)
FROM Events) AS X(event_venue, event_date, grp1, grp2)
GROUP BY grp1, grp2;
/*
-- Results
venue venue_start_date venue_end_date
-------------------- ----------------- --------------
The River Place 2007-12-01 2007-12-01
The Palace 2008-01-01 2008-05-01
Fox Theater 2008-06-01 2008-07-01
The Palace 2008-11-01 2008-11-01
Grand Hall 2008-08-01 2008-10-01
The River Place 2008-12-02 2008-12-02
*/
This solution does not work for the first record. In other words, change your input data to be
— Create the sample events table
CREATE TABLE Events
(event_date DATETIME NOT NULL PRIMARY KEY,
event_venue VARCHAR(20) NOT NULL);
— Insert venue event dates
INSERT INTO Events VALUES ('20080101', 'The Palace');
INSERT INTO Events VALUES ('20080401', 'The Palace');
INSERT INTO Events VALUES ('20080501', 'The Palace');
INSERT INTO Events VALUES ('20080601', 'Fox Theater');
INSERT INTO Events VALUES ('20080701', 'Fox Theater');
INSERT INTO Events VALUES ('20080801', 'Grand Hall');
INSERT INTO Events VALUES ('20080901', 'Grand Hall');
INSERT INTO Events VALUES ('20081001', 'Grand Hall');
INSERT INTO Events VALUES ('20081101', 'The Palace');
INSERT INTO Events VALUES ('20071201', 'The River Place');
INSERT INTO Events VALUES ('20081202', 'The River Place');
— Group the event period dates at each venue
SELECT MIN(event_venue) AS venue,
MIN(event_date) AS venue_start_date,
MAX(event_date) AS venue_end_date
FROM (
SELECT event_venue , event_date ,
ROW_NUMBER() OVER (ORDER BY event_venue, event_date) –
ROW_NUMBER() OVER (PARTITION BY event_venue
ORDER BY event_date),
ROW_NUMBER() OVER (ORDER BY event_date) –
ROW_NUMBER() OVER (PARTITION BY event_venue
ORDER BY event_date)
FROM Events) AS X(event_venue, event_date, grp1, grp2)
GROUP BY grp1, grp2;
/
— Results
venue venue_start_date venue_end_date
——————– —————– ————–
The River Place 2007-12-01 2007-12-01
The Palace 2008-01-01 2008-05-01
Fox Theater 2008-06-01 2008-07-01
The Palace 2008-11-01 2008-11-01
Grand Hall 2008-08-01 2008-10-01
The River Place 2008-12-02 2008-12-02
/
You can see that now the result is wrong for the Palace
Hi Naomi, the solution works exactly as it is supposed to work, even with your input. The goal here is to find uninterrupted performance dates, which means the performances at one venue without having performance at another venue in between. It is not to find performances without gaps of dates. In your input all is correct for The Palace because there is no performance at another venue between the dates of 20080101 and 20080401. To test this insert another row:
INSERT INTO Events VALUES ('20080201', 'Another Venue');
Then the result looks like this:
venue venue_start_date venue_end_date
——————– ———————– ———————–
The River Place 2007-12-01 00:00:00.000 2007-12-01 00:00:00.000
The Palace 2008-01-01 00:00:00.000 2008-01-01 00:00:00.000
Another Venue 2008-02-01 00:00:00.000 2008-02-01 00:00:00.000
The Palace 2008-04-01 00:00:00.000 2008-05-01 00:00:00.000
Fox Theater 2008-06-01 00:00:00.000 2008-07-01 00:00:00.000
Grand Hall 2008-08-01 00:00:00.000 2008-10-01 00:00:00.000
The Palace 2008-11-01 00:00:00.000 2008-11-01 00:00:00.000
The River Place 2008-12-02 00:00:00.000 2008-12-02 00:00:00.000