I’ve been reading lately on relational database theory, trying to get a better grasp on the basics. To start, I was going to summarize the relational algebra operators, which were outlined in E.F. Codd’s 1970 paper ‘A Relational Model of Data for Large Shared Data Banks’, which presented a lot of the concepts that serve as the cornerstone for the creation of relational databases.
These eight operators were laid out as the ways to work with data in a relation(a table). I also give an example in T-SQL. I have a script available on GitHub that creates some tables with sample data for these examples. It also includes the SQL from this post, so that anyone can run through the examples and see the results.
Operators:
1) Selection(Restrict) – Subset of rows from a table based on supplied criteria.
select * from dbo.Team where TeamCode = 'ATL';
2) Projection – All rows with a subset of attributes (columns).
select PlaceName, NickName from dbo.Team;
3) Join – Combine two tables that share a common attribute (column) or set of attributes.
select t.NickName, t.City, t.StateCode, s.StateName from dbo.Team as t inner join dbo.[State] as s on s.StateCode = t.StateCode;
4) Intersect – All rows common to two tables where the tables have the same attributes.
select StadiumName, City, StateCode from dbo.Team intersect select StadiumName, City, StateCode from dbo.SuperBowlHost;
5) Union – All rows from two tables combined with one another, where the tables have the same attributes. Union in T-SQL will return rows with distinct values, no duplicates.
select StadiumName, City, StateCode from dbo.Team union select StadiumName, City, StateCode from dbo.SuperBowlHost;
6) Difference(Except) – Elements from one table not in the other table.
select StadiumName, City, StateCode from dbo.Team except select StadiumName, City, StateCode from dbo.SuperBowlHost;
7) Product – Cross (or Cartesian) join – All rows from one table matched once with each row in a second table.
select t.NickName, t.City, t.StateCode, s.StateCode, s.StateName from dbo.Team as t cross join dbo.[State] as s;
8) Divide:
This was the most challenging operator to understand. I saw one post explain it as the opposite of Product. Division works with two sets with a common attribute (column), with one set (Call it set A) containing only the one attribute and the other (Set B) with two attributes. Division will return the one attribute from the B set that isn’t common to the two sets.
The example I used is for a table Team of football teams, and a table Player with attributes Player Name and Team. Division will return distinct Player Names for all people who played for every team in the Team set (In this case, we limit the set to Atlanta and New Orleans).
There isn’t a Division operator defined in T-SQL, so we have to write the operation using some of the other operations. Of course, there are several methods that can be used to get the same result.
So this query performs a Product (Cross Join) with all of the Players with the specified teams. When then use Difference to find all of the current Player records that don’t match this Cross Join (The players that haven’t played for both teams) and then another difference to find the players not in that set, which are the players that have played for both teams.
select distinct PlayerName from dbo.Player except select PlayerName from ( select PlayerName, TeamCode from (select PlayerName from dbo.Player) as t1, (select TeamCode from dbo.Team where TeamCode in ('ATL', 'NOR')) as t2 except select PlayerName, TeamCode from dbo.Player ) as t3;
On Simple Talk, Joe Celko has a more in-depth explanation of Divide.
Also, a slide deck from Lester McCann that gives a good explanation as well.