SQL Server – Gaps and Islands

Something that comes up with working with data is wanting to find gaps in values, as well as groupings of consecutive values. This is often referred to as the Gaps and Islands problem. Redgate has this article to fully explain.
This post will look at gaps and islands with consecutive integer ID values in a table, but these techniques can apply to other values, like consecutive dates.

Setup:
We can create a temp table and populate it with some test data. We’ll be dealing with the ID values. We’ve entered values from 1 through 10, but we left out 3, 6, and 7.

DROP TABLE IF EXISTS #Results;

CREATE TABLE #Results(
	ID INT NOT NULL PRIMARY KEY,
	[Name] VARCHAR(20) NOT NULL,
	CreatedDate DATETIME NOT NULL DEFAULT(GETDATE())
);

INSERT INTO #Results (ID, [Name])
VALUES
(1, 'First Record'),
(2, 'Second Record'),
(4, 'Third Record'),
(5, 'Fourth Record'),
(8, 'Fifth Record'),
(9, 'Sixth Record'),
(10, 'Seventh Record');

GO

SELECT * FROM #Results;

Gaps:
Now we’ll look for for gaps in the ID values. A gap would be a value in between two consecutive values that are greater than 1 apart.
Using the LEAD function, we can compare the next value in sequence with any one ID value. A gap would exist if we can’t add 1 to an ID value and find that in the table.

WITH cte AS (
	SELECT Id, LEAD (ID, 1) OVER (ORDER BY ID) AS NextValue
	FROM #Results
)
SELECT ID AS GapStart, NextValue AS GapEnd
FROM cte
WHERE (ID + 1) <> NextValue
ORDER BY ID;

We’ll get two rows back. One for values 2 and 4, where the record ID = 3 is missing, and another row for 5 and 8, since the ID values of 6 and 7 are missing.

— Find missing values
If we want a list of only the missing values, we can take a different approach. Here, the CTE will build a list of all of the integer values from the smallest ID value to the largest value. Then it’s a left join to #Results, to see what values are missing.

DECLARE @MaxId INT = (SELECT MAX(ID) FROM #Results);

WITH cte AS (
	SELECT MIN(ID) AS ID
	FROM #Results
	UNION ALL
	SELECT ID + 1 AS ID
	FROM cte
	WHERE ID < @MaxID
)
SELECT c.ID AS MissingId
FROM cte AS c
LEFT JOIN #Results AS r
	ON r.ID = c.ID
WHERE r.ID IS NULL
ORDER BY c.ID;

Alternatively, you may already have a numbers table, that has a list of numbers. If you are on SQL Server 2022 or later, you can also use the GENERATE_SERIES function to create the list of numbers.

DECLARE @MinId INT = (SELECT MIN(ID) FROM #Results);
DECLARE @MaxId INT = (SELECT MAX(ID) FROM #Results);

SELECT value
FROM GENERATE_SERIES(@MinId, @MaxId);

Islands:
We may also be interested in Islands, which would be a string of consecutive values without a break.
I ran across this mssqltips article which had a clever solution.
We can use the RANK function to assign consecutive numbers to each row. If we subtract the rank from the ID value, then the islands will all have the same value.

SELECT Id, 
    RANK () OVER (ORDER BY ID) AS RankValue,
    (ID - (RANK () OVER (ORDER BY ID))) AS IdRankDiff
FROM #Results;

Which returns these values:

Id	RankValue	IdRankDiff
1	1	0
2	2	0
4	3	1
5	4	1
8	5	3
9	6	3
10	7	3

We can group by the rank difference, and then find the start and end value for each island:

WITH cte AS (
	SELECT Id, 
		RANK () OVER (ORDER BY ID) AS RankValue,
		(ID - (RANK () OVER (ORDER BY ID))) AS IdRankDiff
	FROM #Results
)
SELECT MIN(ID) AS StartId, MAX(ID) AS EndId
FROM cte
GROUP BY IdRankDiff
ORDER BY MIN(ID);

StartId	EndId
1	2
4	5
8	10

SQL Rob

SQL Server and other database technologies

SQL Server – Gaps and Islands

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply