In a previous post I covered setting up SQL Server for R language integration. Now that we are set up, we can put this functionality to use.
Simple R Function Call:
First we’ll get the average from a list of numbers retrieved from SQL Server using one of R’s built-in functions. First we’ll make a list of numbers.
drop table if exists dbo.RTest; go create table dbo.RTest ( Measure tinyint not null ); go insert into dbo.RTest(Measure) values (1); insert into dbo.RTest(Measure) values (3); insert into dbo.RTest(Measure) values (4); insert into dbo.RTest(Measure) values (7); insert into dbo.RTest(Measure) values (8); insert into dbo.RTest(Measure) values (10); go 10000 select count(*) as RecordCount from dbo.RTest; go
This will give us a list of 60,000 numbers. Now it is easy enough to find the average of all of these numbers in T-SQL:
select avg(Measure) from dbo.RTest;
So a few things about R before we run some R code. R is case sensitive. We use When results are returned from R to SQL Server, they will need to be as a data frame.
Let’s use use R to find the average of the ‘Measure’ values in our test table. Here is the stored procedure call:
EXEC sp_execute_external_script @language=N'R', @script=N'OutputDataSet <-data.frame(mean(InputDataSet$Measure))', @input_data_1=N'SELECT Measure FROM dbo.RTest' WITH RESULT SETS (([Average] tinyint NULL));
The ‘sp_execute_external_script’ stored procedure is used to execute R code. In the @language parameter we specify ‘R’ since we want to run R code. In the @input_data_1 parameter, we specify the data set that we want to pass in, in this case we select all of the records from our 1 column table. The @script parameter is where we place the R code. Any data returned by the code in the @input_data_1 parameter is visible in our R code as ‘InputDataSet’ (We can use a different name by specifying in a different parameter, but we’ll go with the defaults to keep it simple).
Even though our input data set has only one attribute (Measure), it should be specified, InputDataSet$Measure. We use the R function mean to find the mean value of all of the Measure values. So even though it is a scalar value being returned, it needs to be converted to a data frame, which is assigned to the default output variable OutputDataSet.
For the output, we use RESULT SETS to specify the name, data type and nullability of each attribute in the data set. In this case, we get the value 5 as ‘Average’.
Installing R Packages:
One of the powerful things about R is all of the external libraries of functionality that are freely available to use. If we want to explore machine learning or advanced statistical functions, we’ll need to find and install these packages.
RStudio Support has a list of the most popular R packages.
Once we find a package that we’d like to use, we’ll need to download and install it. We’ll install stringr, which has regular expression and other string manipulation functionality.
Running install.packages in the sp_execute_external_script proc didn’t work for me, I got a message that the package wasn’t available.
SQL Server Central – Installing R packages in SQL Server R Services gives some other options to install packages.
I ended up using the R command line option to install stringr. Since the package was dependent on two other packages, they were retrieved and installed as well.
Other Links:
MSDN – Using R Code in Transact-SQL
Simple Talk