In an earlier post, I wrote about the Vector data type added to SQL Server 2025. This was just a brief introduction, so I wanted to go back and dig a little deeper. We’ll use a LLM to convert phrases into vectors, store them in SQL Server, and use new functions to search these phrases to return the one most similar.

Vectors:
SQL Server 2025 added a Vector data type. A Vector is an array of floating point values, with the entire array representing a given phrase or body of text, or some other object, like an image. When we declare the data type, we’ll specify the number of dimensions, which is the number of values in the array. At this point, Vector can handle up to 1998 dimensions.

Vector Search:
SQL Server 2025 added the VECTOR_DISTANCE function that allows us to compare vectors to see how similar they are. The function will return a decimal number between 0 and 1. The closer the distance is to 0, the more similar the phrases are.
In the VECTOR_DISTANCE function, the first parameter is for the distance metric that we want to use. There are three choices: Cosine, Euclidean, Dot. From what I’ve seen, cosine is the preferred metric for text comparisons, in most cases. Here’s a good article that explains the distance metrics in more detail.

Embedding:
Embedding is the process of converting text or an object into a vector. For this example, we’ll use OpenAI for embedding. You’ll want to use the same model for all of the vectors, since different models may not return the same values.
We’ll call to the OpenAI API to get our vectors. To use the OpenAI API, you’ll need an API key from them. We’ll use the text-embedding-3-small model, which uses 1536 dimensions.

Example:

To start, we’ll need a master key for the database, if there isn’t already one. You can replace xyz with the password you want to use. This key will protect the credential created in the next step.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'xyz';

Next we’ll create a credential for the OpenAI API. Different APIs will require different formats for the secret, but this will work for the model that we’ve chosen. Replace xyz with the API key.

CREATE DATABASE SCOPED CREDENTIAL [https://api.openai.com]
    WITH IDENTITY = 'HTTPEndpointHeaders', 
       secret = '{ "Authorization": "Bearer xyz"}';

Initially, I used a different name for the credential, but for an external API, the credential name needs to match the URL of the API endpoint (Although it doesn’t have to match the entire URL).

The next step is to create a reference to the API endpoint, using CREATE EXTERNAL MODEL.
Location is the URL for the OpenAI embeddings endpoint.
For Model Type, EMBEDDINGS is currently the only allowable value, but certainly others will be added later.

CREATE EXTERNAL MODEL OpenAiApiEmbeddings
AUTHORIZATION dbo
WITH (
    LOCATION = 'https://api.openai.com/v1/embeddings',
    API_FORMAT = 'OpenAI',
    MODEL_TYPE = EMBEDDINGS,
    MODEL = 'text-embedding-3-small',
    CREDENTIAL = [https://api.openai.com]
);

We’ll create a table where we’ll store some phrases, along with the embedding for that phrase.

DROP TABLE IF EXISTS dbo.Embedding;

CREATE TABLE dbo.Embedding(
    ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    Phrase NVARCHAR(MAX) NOT NULL,
    Embedding VECTOR(1536) NOT NULL
);

SELECT * FROM dbo.Embedding;

GO

Next, we’ll generate embeddings. We’ll create a short description for all four teams that play in the NFL’s NFC South division.
Calling the AI_GENERATE_EMBEDDINGS function will return the vectors, which we’ll store in the database. In the function call, we’ll specify to use the model that we created in the previous step.

INSERT INTO dbo.Embedding(Phrase, Embedding)
SELECT t.Phrase, AI_GENERATE_EMBEDDINGS(Phrase USE MODEL OpenAiApiEmbeddings)
FROM (VALUES
('The Atlanta Falcons play at the Mercedes-Benz stadium in Atlanta, Georgia'),
('The Carolina Panthers play at the Bank Of America stadium in Charlotte, North Carolina'),
('The New Orleans Saints play at the Caesars Superdome on New Orleans, Louisiana'),
('The Tampa Bay Buccaneers play at Raymond James stadium in Tampa, Florida')
) AS t(Phrase);

SELECT * FROM dbo.Embedding;

GO

Now we can perform a search. We’ll use AI_GENERATE_EMBEDDINGS to convert our search phrase, and then use the VECTOR_DISTANCE function to compare that search phrase to everything that we’ve inserted into the dbo.Embedding table.

DECLARE @SearchPhrase VARCHAR(MAX) = 'Plays in the state of Georgia';

DECLARE @SearchPhraseEmbedding VECTOR(1536);

SELECT @SearchPhraseEmbedding = AI_GENERATE_EMBEDDINGS(@SearchPhrase USE MODEL OpenAiApiEmbeddings)

SELECT Phrase,
    VECTOR_DISTANCE('cosine', @SearchPhraseEmbedding, Embedding) AS Distance
FROM dbo.Embedding
ORDER BY VECTOR_DISTANCE('cosine', @SearchPhraseEmbedding, Embedding);

GO

Results:

Phrase Distance
The Atlanta Falcons… 0.517087280750275
The Carolina Panthers… 0.672298550605774
The Tampa Bay Buccaneers… 0.700413107872009
The New Orleans Saints… 0.757963299751282

The closest phrase was for the Atlanta Falcons, where the phrase indicates that they play in the US state of Georgia.
Let’s try another search, for ‘Team with a pirate nickname’. Since ‘pirate’ doesn’t appear in any of the phrases, we’ll see how well the comparison works.

DECLARE @SearchPhrase VARCHAR(MAX) = 'Team with a pirate nickname';

DECLARE @SearchPhraseEmbedding VECTOR(1536);

SELECT @SearchPhraseEmbedding = AI_GENERATE_EMBEDDINGS(@SearchPhrase USE MODEL OpenAiApiEmbeddings)

SELECT Phrase,
    VECTOR_DISTANCE('cosine', @SearchPhraseEmbedding, Embedding) AS Distance
FROM dbo.Embedding
ORDER BY VECTOR_DISTANCE('cosine', @SearchPhraseEmbedding, Embedding);

GO

Results:

Phrase Distance
The Tampa Bay Buccaneers… 0.823994636535645
The Carolina Panthers… 0.885807394981384
The New Orleans Saints… 0.919179081916809
The Atlanta Falcons… 0.952559232711792

So the #1 result is the Tampa Bay Buccaneers, which is what we would expect.

Wrap-up:
These searches only had to look through 4 records, so it was fairly quick. We can create a vector index to help with searching larger number of records. However, VECTOR_DISTANCE won’t make use of the vector index. In this case, we can use the VECTOR_SEARCH function.

I’ve posted a script with all of the SQL statements from this post up to GitHub.

Links:

Redgate: AI in SQL Server 2025: Embeddings

Microsoft: Additional options with CREATE EXTERNAL MODEL