One of the new feature in SQL Server 2017 (I’m using RC2) is the addition of graph database capabilities.
What is a Graph Database?
The Neo4j site has a good basic explanation on what a graph database is. Neo4j is the leading Graph database system available. The main concepts are nodes and edges. A node represents an entity and can have properties stored along with it. Edges represent relationships between nodes, and edges can also have properties as well.
SQL Server implements Nodes and Edges as relational tables. As an example, I’ll create a Player table to store data on players for a football team. The only thing new here is the ‘As Node’ statement at the end of the table creation.
drop table if exists dbo.Player; create table dbo.Player( Team char(3) not null, Number tinyint not null, FirstName varchar(50), LastName varchar(50), Position char(2), primary key(Team, Number) ) as Node; insert into dbo.Player values ('ATL', 2, 'Matt', 'Ryan', 'QB'); insert into dbo.Player values ('ATL', 11, 'Julio', 'Jones', 'WR'); insert into dbo.Player values ('ATL', 21, 'Desmond', 'Trufant', 'CB'); insert into dbo.Player values ('ATL', 44, 'Vic', 'Beasley', 'LB'); select * from dbo.Player;
When we selection from this Node table, notice that a ‘Node Id’ column has been added. This stores JSON data with some information on each record, including a numeric Node Id for each record.
Edges are created the same way as a Node, with a ‘As Edge’ statement.
drop table if exists dbo.PlatoonsWith; create table dbo.PlatoonsWith( PlatoonName varchar(10) not null ) as Edge; select * from dbo.PlatoonsWith;
As an Edge table, there are attributes added to the table when created. There is the Edge ID, From Id and To Id. The From and To hold the node IDs that are being related.
Once the tables have been created, they can be viewed in SSMS under ‘Tables’, under a ‘Graph Tables’ directory that is new for SQL Server 2017.
We can create relationships between nodes by inserting records into the Edge table. We’ll insert the IDs for the two Nodes that we want to relate, plus any Edge properties. In this example, we’ll relate the two offensive players (Ryan and Jones).
declare @Node1 nvarchar(1000); declare @Node2 nvarchar(1000); select @Node1 = [$node_id_4FE94CB5420E440DA5CE8FE7781FA404] from dbo.Player where Team = 'ATL' and Number = 2; select @Node2 = [$node_id_4FE94CB5420E440DA5CE8FE7781FA404] from dbo.Player where Team = 'ATL' and Number = 11; insert into dbo.PlatoonsWith([$from_id_64FB3EBE4E3449D5A59D79FB0F8FA857], [$to_id_6A42816479484605BCD86836DFE8122B], PlatoonName) values (@Node1, @Node2, 'Offense'); select * from dbo.PlatoonsWith;
Querying Graph Data:
The MATCH keyword was added to T-SQL to aid in querying graph data. The Node table can be joined to the Edge table without the usual JOIN..ON clause, instead using the old style Table1, Table2 syntax.
For an example, we’ll find the offensive player joined to Matt Ryan.
select p1.FirstName + ' ' + p1.LastName as Player1, p1.Position as Player1Position, pw.PlatoonName, p2.FirstName + ' ' + p2.LastName as Player2, p2.Position as Player2Position from dbo.Player as p1, dbo.PlatoonsWith as pw, dbo.Player as p2 where match(p1-(pw)->p2) and p1.Number = 2 and p2.Team = 'ATL';
There isn’t any sort of protection against deleting nodes. Even if an Edge record refers to a Node, you’re still allowed to delete that Node. The Edge record with the orphaned Node will be left in place.
A script with all of the SQL from this post is posted on GitHub.
The Graph functions are definitely useful, but hopefully some improvements will be made. I’m not sure why the auto-generated columns include the extra characters appended to the column names, that does make it a little harder to query, since you have to go retrieve the column name after the table is created. It would also be useful if there were some sort of visualization tool to view the Nodes and their relations.