A composite index is when we index two or more columns in a table. This will help to retrieve data quicker when we’re filtering on multiple attributes.
I always had it in my mind that the order of the columns in your query matters, but I’ve had to revisit that assumption.
Here’s a quick example. I’m using SQL Server 2022.
Setup:
CREATE TABLE dbo.IndexTest (
Id int NOT NULL IDENTITY(1,1),
FirstName varchar(30) NOT NULL,
MiddleName varchar(30) NULL,
LastName varchar(30) NOT NULL
);
INSERT INTO dbo.IndexTest(FirstName, MiddleName, LastName)
VALUES ('John', 'A', 'Doe'),
('Jane', 'Z', 'Doe'),
('Mike', NULL, 'Smith'),
('Susan', 'C', 'Jones'),
('Charles', 'G', 'Washington'),
('Mary', NULL, 'Johnson')
GO 10000
We’ll create a table to store names. We’ll insert some records for us to query. If you haven’t seen the ‘GO 10000’ syntax before, this will repeat the batch 10,000 times.
We’ll set SSMS to return the actual execution plan, then select all of the records from the table for John Doe:
SELECT * FROM dbo.IndexTest WHERE FirstName = 'John' AND LastName = 'Doe';
We see the query had to use a table scan to return our records. We would like to speed that up, so we’ll create a composite index. We’ll use first and last name, since we think those are the parameters we’ll be using most often.
We’ll add middle name as an included column, for reasons that we’ll see later.
CREATE INDEX IX_IndexTest ON dbo.IndexTest(FirstName, LastName) INCLUDE (MiddleName);
We’ll re-run the original SELECT:
SELECT * FROM dbo.IndexTest WHERE FirstName = 'John' AND LastName = 'Doe';
But we still get a table scan. The index helps to find the records that we want, but it still has to go back to the table to get the ID value. Because we added middle name as an included value, the index won’t be used to filter on middle name, but it will hold the middle name value for each record, so that we don’t have to go back to the table to get the value.
We’ll run the same query, but only return first, middle, and last name, without returning the ID.
SELECT FirstName, MiddleName, LastName FROM dbo.IndexTest WHERE FirstName = 'John' AND LastName = 'Doe';
Now we see an Index Seek being used. If we run this query in the same batch as the SELECT *, we’ll see that the index scan is much more effective (85% to 15% when I ran both).
Let’s run a query where we only filter on the last name:
SELECT FirstName, MiddleName, LastName FROM dbo.IndexTest WHERE LastName = 'Smith';
I was surprised that our index was still used, but it’s an Index Scan instead of the Seek we got before.
The order of the columns that we specify in the index matter. Since I used FirstName then LastName, the data in the index is ordered by first then last names. But the optimizer has determined that it is more efficient to scan the index rather than scan the table.
So we know that order matters. We’ll try specifying last name then first name in our WHERE clause, in that order:
SELECT FirstName, MiddleName, LastName FROM dbo.IndexTest WHERE LastName = 'Doe' AND FirstName = 'John';
We still get an Index Seek, so the optimizer is smart enough to use our index, even if the order of the WHERE attributes is different from the index order.