Here’s a quickstart tutorial for the R programming language. R is useful for statistical analysis and for producing charts and graphs of the data being analyzed. R is a case-sensitive, interpreted language.
Setup:
First, download and install R. Plenty of documentation is also available at this site.
Then download and install the R Studio IDE.
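If you use Chocolatey, both R and R Studio are available as packages (recent Chocolatey versions use choco install in place of the older cinst shortcut):
choco install R.Project
choco install R.Studio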
R Studio:
Once we open R Studio we’ll see several windows:
Console – To enter commands and run code
Environment – To view variable values – Includes a Data Import button
History – List of all previous commands
Packages tab – To view installed and available packages (libraries) – Clicking on a library hyperlink will take you to a help file for that package.
Tutorial – Data Import:
We’ll run through an example where we import data and perform some calculations. In this example, I’m importing a text file with the results of the Atlanta Falcons 2013 season. The file is available at:
2013 Falcons Results
In the Console window, run the command:
getwd()
To get the working directory. Copy the file there, or use:
setwd("path/to/folder")
To set a new working directory, replacing the placeholder path with your own. Use forward slash / as the path separator, even on Windows.
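As a minimal sketch of the working-directory commands above: the example uses tempdir() as a stand-in for your own data folder (an assumption for illustration) and restores the original directory at the end.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# tempdir() stands in for your own data folder (an assumption for this sketch).
setwd(tempdir())
print(getwd())

# file.path() joins path parts with forward slashes on every operating system.
data_path <- file.path(getwd(), "2013FalconsResults.txt")
print(file.exists(data_path))  # TRUE only if the file has been copied there

# Restore the original working directory.
setwd(old_dir)
```

Checking file.exists() before reading is a cheap way to confirm that the file really is where R will look for it.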
To read the data into a Data Frame (Data Table) variable, run:
games = read.delim("2013FalconsResults.txt", header = TRUE, sep = "\t", quote = "", dec = ".", fill = FALSE)
For the ‘Read’ parameters:
header = TRUE – First row contains column names
sep – Field delimiter (in this case, tab)
quote – Characters used to quote strings ("" disables quoting)
dec – Character used as the decimal point
fill – If TRUE, rows with fewer fields than the rest are padded with blank fields
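To try these parameters without the real file, here is a small self-contained sketch. The rows and column names are hypothetical (they are not the actual 2013 results), and the file is written to a temporary location.

```r
# Hypothetical sample rows; these are NOT the real 2013 results,
# and the column names are assumptions for illustration.
sample_lines <- c(
  "Week\tOpponent\tLocation\tResult\tScore",
  "1\tTeam A\tAway\tL\t17",
  "2\tTeam B\tHome\tW\t31",
  "3\tTeam C\tAway\tL\t23"
)
sample_file <- tempfile(fileext = ".txt")
writeLines(sample_lines, sample_file)

# Same arguments as the call above, pointed at the sample file.
sample_games <- read.delim(sample_file, header = TRUE, sep = "\t",
                           quote = "", dec = ".", fill = FALSE)
str(sample_games)   # column names and types inferred from the file

# A fourth row with the last two fields missing.
short_file <- tempfile(fileext = ".txt")
writeLines(c(sample_lines, "4\tTeam D\tHome"), short_file)

# fill = TRUE pads the short row with blank fields instead of stopping with an error.
padded <- read.delim(short_file, header = TRUE, sep = "\t",
                     quote = "", dec = ".", fill = TRUE)
print(padded)
```

In the padded result, the missing numeric field shows up as NA, which is worth checking for before running calculations.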
In the Environment tab, you’re able to expand the ‘games’ variable to see the values. The values are arranged by column, as indicated by the import file headers.
We can also run the command:
print(games)
To write the values to the console window. Or to display only the values from a specified column, we run:
print(games[5])
Which will display the values in the 5th column, the Falcons’ score from each game.
We’ll pick a column that we want to run some calculations on. First, we will select a column and write its values to a Vector variable (a one-dimensional array whose elements all share the same data type).
scores = as.vector(as.matrix(games[5]))
Or, using some shortcuts, the same line is:
scores = c(t(games[5]))
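games[5] is a one-column Data Frame, so both versions first convert it into a Matrix and then into a Vector. The double-bracket form games[[5]] extracts the column as a Vector directly.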
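The sketch below compares the ways of pulling a column out as a Vector. It builds a small stand-in Data Frame, so the values and column names are hypothetical rather than taken from the real file.

```r
# Hypothetical stand-in for the imported data; values and column names are made up.
sample_games <- data.frame(
  Week = 1:3,
  Opponent = c("Team A", "Team B", "Team C"),
  Location = c("Away", "Home", "Away"),
  Result = c("L", "W", "L"),
  Score = c(17, 31, 23)
)

print(class(sample_games[5]))    # single brackets keep the Data Frame wrapper
print(class(sample_games[[5]]))  # double brackets return the column itself

via_matrix <- as.vector(as.matrix(sample_games[5]))
via_transpose <- c(t(sample_games[5]))
direct <- sample_games[[5]]

# All three routes produce the same numeric vector.
print(identical(via_matrix, direct))
print(identical(via_transpose, direct))
```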
We could also create a vector with the c command:
a <- c(1,2,3,4,5)
To create a vector of 5 numbers. <- is the assignment operator; the = used earlier also works for assignment at the top level.
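A short sketch of how c() behaves, including what happens when the elements are not all the same type. The variable names b and mixed are introduced here only for illustration.

```r
a <- c(1, 2, 3, 4, 5)
b <- 1:5                 # colon operator builds the same sequence (stored as integers)
print(all(a == b))

# A vector holds one data type, so mixing types coerces every element.
mixed <- c(1, "two", 3)
print(class(mixed))      # the numbers have become character strings

# Arithmetic applies to each element.
print(a * 2)
```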
With our scores, we can calculate the highest score:
print(max(scores))
The lowest score:
print(min(scores))
Or the mean score:
print(mean(scores))
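These summary functions can be tried on any numeric vector. The sketch below uses hypothetical scores (not the real 2013 results) and also shows how a missing value affects the result.

```r
# Hypothetical scores for illustration; not the real 2013 results.
scores <- c(17, 31, 23, 10)

print(max(scores))
print(min(scores))
print(mean(scores))
print(which.max(scores))   # index (week) of the highest score
print(summary(scores))     # minimum, quartiles, median, mean and maximum in one call

# A missing value makes mean() return NA unless na.rm = TRUE is set.
with_gap <- c(scores, NA)
print(mean(with_gap))
print(mean(with_gap, na.rm = TRUE))
```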
Display:
R has built-in functions to visualize data, and R Studio displays the results in its Plots tab.
plot(scores): Shows all values from a vector plotted against the index (Week #).
hist(scores): Generates a Histogram.
R Studio allows plots to be saved as an image or PDF.
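Plots can also be saved from code rather than through the R Studio menus. The following is a sketch using the base pdf() graphics device, with hypothetical scores and a temporary output file standing in for your own data and path.

```r
# Hypothetical scores for illustration; not the real 2013 results.
scores <- c(17, 31, 23, 10)

plot_file <- tempfile(fileext = ".pdf")   # temporary output path for this sketch

pdf(plot_file, width = 8, height = 4)     # open a PDF graphics device
par(mfrow = c(1, 2))                      # two plots side by side
plot(scores, type = "b", xlab = "Week", ylab = "Points", main = "Score by week")
hist(scores, xlab = "Points", main = "Distribution of scores")
dev.off()                                 # close the device to finish writing the file

print(file.exists(plot_file))
```

Saving from code makes a chart reproducible, since the same script regenerates it whenever the data changes.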
Other Commands:
length(a) – to return # of elements in a
help(a) – to get help on command a
Functions: f <- function(z, y) { return(z - y) } – defines a function f; f(10, 7) returns 3
data() – to get list of built-in datasets
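The commands above can be combined into one short session. In this sketch, g is a hypothetical variant of f added only to show a default argument, and mtcars is one of the datasets that data() lists.

```r
a <- c(1, 2, 3, 4, 5)
print(length(a))

f <- function(z, y) {
  return(z - y)
}
print(f(10, 7))            # positional arguments
print(f(y = 7, z = 10))    # named arguments can be given in any order

# Hypothetical variant with a default value for the second argument.
g <- function(z, y = 1) {
  z - y                    # the last evaluated expression is returned
}
print(g(10))

# help(mean) or ?mean opens the documentation for mean().
# data() lists the built-in datasets; mtcars is one of them.
print(head(mtcars, 3))
```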
Other links:
Intro To R – John Cook
R Tutorial
R Project Manuals