A brief introduction to R

Overview

Teaching: 40 min
Exercises: 0 min
Questions
  • What is R and when should I use it?

  • How do I use the RStudio graphical user interface?

  • How do I assign variables?

  • How can I create a function?

Objectives
  • Get introduced to R and RStudio.

  • Learn basic R syntax.

  • Assign values to variables.

  • Be able to install a package from CRAN.

  • Be able to define a simple function.

What is R?

R is a statistical computing environment that has become a widely adopted standard for data analysis in various fields, notably bioinformatics.

Consider using R to get access to packages that implement solutions to a given problem.

In this one-day course we will learn

RStudio - a graphical interface to R

While R can be used directly in the shell, it is much nicer with a graphical interface. RStudio is by far the best one - let’s learn how to use it.

Interacting with R

There are two main ways of interacting with R: using the console or by using script files (plain text files that contain your code).

The console window (in RStudio, the bottom left panel) is the place where R is waiting for you to tell it what to do, and where it will show the results of a command. You can type commands directly into the console, but they will be forgotten when you close the session. It is better to enter the commands in the script editor and save the script. This way, you have a complete record of what you did, you can easily show others how you did it, and you can do it again later if needed. You can copy-paste into the R console, but the RStudio script editor also allows you to ‘send’ the current line or the currently selected text to the R console using the Ctrl-Enter shortcut.

At some point in your analysis you may want to check the content of a variable or the structure of an object without necessarily keeping a record of it in your script. You can type these commands directly in the console. RStudio provides the Ctrl-1 and Ctrl-2 shortcuts, which allow you to jump between the script and the console windows.

If R is ready to accept commands, the R console shows a > prompt. When it receives a command (typed, copy-pasted or sent from the script editor using Ctrl-Enter), R will try to execute it and, when finished, show the results and come back with a new > prompt to wait for new commands.

If R is still waiting for more input because the command isn’t complete yet, the console will show a + prompt. This usually happens because you have not ‘closed’ a parenthesis or quotation. If you’re in RStudio and this happens, click inside the console window and press Esc; this should help you out of trouble.

Basic R syntax and loading a package

Just like with Python, we can perform simple operations using the R console and assign the output to variables:

1 + 1
[1] 2

<- is the assignment operator. It assigns values on the right to objects on the left. So, after executing x <- 3, the value of x is 3. The arrow can be read as "3 goes into x". You can also use = for assignments, but not in all contexts, so it is good practice to use <- for assignments. = should only be used to specify the values of arguments in functions, see below.

Save some keystrokes..

In RStudio, typing Alt + - (hold Alt, the key next to your space bar, while pressing the - key) will write ` <- ` in a single keystroke.

foo <- 1 + 1
foo + 1
[1] 3
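The remark that = does not behave like <- in all contexts can be illustrated with a small sketch. The variable names x, y, m and m2 are chosen for illustration only, and assigning inside a function call is shown purely to make the difference visible, not as a recommended style.

```r
x <- 3                        # assign 3 to the variable x

# Inside a function call, = only names an argument.
# Here it sets the argument called x of mean(); our variable x is untouched.
m <- mean(x = c(1, 2, 3))
x                             # still 3
m                             # 2

# Inside a function call, <- performs a real assignment as a side effect.
m2 <- mean(y <- c(4, 5, 6))
y                             # the vector 4 5 6 now exists as y
m2                            # 5
```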

Commenting

We can add comments to our code using the # character. It is useful to document our code in this way so that others (and us the next time we read it) have an easier time following what the code is doing.
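As a minimal sketch of what commented code can look like (the temperature conversion and the variable names are made up for this example):

```r
# Convert a temperature from degrees Celsius to degrees Fahrenheit.
celsius <- 20
fahrenheit <- celsius * 9 / 5 + 32   # a comment can also follow code on the same line
fahrenheit                           # 68
```

Everything from the # to the end of the line is ignored by R.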

We can create some random numbers from the normal distribution and calculate their mean using two built-in functions:

randomNumbers <- rnorm(10)
mean(randomNumbers)
[1] 0.4869248
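Because rnorm() draws random numbers, the mean you get will differ from the one shown above. If you need repeatable results, one option is to fix the random seed first. The sketch below uses the built-in set.seed() function; the seed value 42 and the variable names are arbitrary choices for illustration.

```r
set.seed(42)                 # fix the seed so the draws are repeatable
first <- rnorm(10)

set.seed(42)                 # same seed again ...
second <- rnorm(10)
identical(first, second)     # ... gives exactly the same numbers: TRUE

# rnorm() also accepts the mean and standard deviation as named arguments.
shifted <- rnorm(10, mean = 5, sd = 2)
mean(shifted)                # somewhere around 5
```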

Variable Naming Conventions

Historically, R programmers have used a variety of conventions for naming variables. The . character in R can be a valid part of a variable name; thus the above assignment could have easily been random.numbers <- rnorm(10). This is often confusing to R newcomers who have programmed in languages where . has a more significant meaning (like in Python). There are many ‘standards’ in use e.g. random.numbers, random_numbers or randomNumbers. Choose what you prefer, but, just as with British or American spelling, the rule is to be consistent!

All those built-ins..

In contrast to Python, R has many objects available without loading any extra packages (2383 when this lesson was written; the exact number varies between R versions). These are mostly functions, and becoming proficient with R is largely about learning how they work.
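One rough way to count these objects yourself is sketched below. It assumes that "available without loading any extra packages" means everything in the packages R attaches at startup; the result depends on your R version and configuration, so it will probably not match the number quoted above. The variable names are illustrative.

```r
# Names of all attached packages, e.g. "package:stats", "package:base"
attached <- grep("^package:", search(), value = TRUE)

# Number of objects each attached package makes available
counts <- sapply(attached, function(pkg) length(ls(pkg)))
counts

# Total number of objects available without calling library()
total <- sum(counts)
total
```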

While plain R already comes with many functions, one almost always wants to make use of code contributed as packages. The concept is very similar to Python but has some important differences. Say we want to use the readxl package, which allows us to read Excel sheets directly into R. If you do not have the readxl package on your system, you have to install it first. This can be done via the menus in RStudio, or by typing:

install.packages("readxl")

After that, we can load the package by:

library(readxl) # equivalent in Python would have been 'from readxl import *'

To see what became available, we can list the contents of the package or read its documentation:

ls('package:readxl') # or read the documentation
help(package='readxl')

Sometimes you just want to use one function and do not want to load all functions. In that case you can call the function without loading the package by using the :: syntax.

readxl::readxl_example()
 [1] "clippy.xls"    "clippy.xlsx"   "datasets.xls"  "datasets.xlsx"
 [5] "deaths.xls"    "deaths.xlsx"   "geometry.xls"  "geometry.xlsx"
 [9] "type-me.xls"   "type-me.xlsx" 
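Putting these pieces together, the sketch below reads one of the example files listed above using only the :: syntax. It is guarded with requireNamespace() so that it does nothing if readxl is not installed; the variable names are illustrative.

```r
# TRUE if readxl is installed; the package is not attached by this call
has_readxl <- requireNamespace("readxl", quietly = TRUE)

if (has_readxl) {
    # Full path to one of the example files shipped with readxl
    path <- readxl::readxl_example("datasets.xlsx")

    # Read the first sheet of the workbook
    sheet <- readxl::read_excel(path)
    print(head(sheet))
}
```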

Reading function documentation

All R functions are documented, and you can read about them using the RStudio documentation pane or by typing ?object, e.g.

?mean

Or press F1 while the cursor is on the object whose documentation you want to read.
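If you only need a quick reminder of a function's arguments, you can also inspect them from the console. This is a small sketch using the built-in args(), formals() and help() functions; the help call is wrapped in interactive() so that it only opens a help page when you run it in an interactive session.

```r
args(rnorm)                          # prints the signature: function (n, mean = 0, sd = 1)
arg_names <- names(formals(rnorm))   # the argument names as a character vector
arg_names                            # "n" "mean" "sd"

if (interactive()) {
    help("mean")                     # same as typing ?mean
}
```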

R data types

R has three main data types that we need to know about: numeric, which covers both integers and floats; character; and factor, which is like an integer but with a character label attached to each number (or level).

is.numeric(1)
[1] TRUE
is.character("foo")
[1] TRUE
is.factor(factor(c("a", "b", "c", "c")))
[1] TRUE
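To look a little closer at these types, the sketch below uses class() to ask R what type a value has and shows the integer codes hidden behind a factor. The variable name letters_factor is made up for this example.

```r
class(1)                 # "numeric" (a float)
class(1L)                # "integer" (the L suffix requests an integer)
is.numeric(1L)           # TRUE, integers count as numeric too
class("foo")             # "character"

letters_factor <- factor(c("a", "b", "c", "c"))
levels(letters_factor)         # the labels: "a" "b" "c"
as.integer(letters_factor)     # the underlying codes: 1 2 3 3
as.character(letters_factor)   # back to plain text: "a" "b" "c" "c"
```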

There are numerous data structures and classes, but only a few that we need to care about.

Vectors are ordered collections of values of the same type. Scalars act like vectors of length 1.

foo <- c(1, 2, 3)
foo[2]
[1] 2
bar <- 1
bar[1]
[1] 1

Indexing starts at 1!

Unlike Python (and indeed most other programming languages), indexing in R starts at 1. bar[0] is nevertheless syntactically correct; it simply selects nothing from the vector bar, so the result is always a vector of length 0.
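A few indexing cases side by side, as a small sketch (the values are arbitrary). Note that a negative index in R drops an element, which differs from Python, where it counts from the end.

```r
foo <- c(10, 20, 30)

foo[1]           # 10, the first element
foo[c(1, 3)]     # 10 30, several elements at once
foo[-1]          # 20 30, everything except the first element

empty <- foo[0]  # index 0 selects nothing
length(empty)    # 0
```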

Matrices work as one would expect: values arranged in two dimensions. Note that matrix() fills in the values column by column by default.

(foo <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2))
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
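Matrix elements are selected with a row index and a column index separated by a comma. The sketch below uses the same values as above; the names m and byRow are illustrative.

```r
m <- matrix(1:6, nrow = 2)   # filled column by column

dim(m)       # 2 3, i.e. 2 rows and 3 columns
m[2, 3]      # 6, the element in row 2, column 3
m[, 2]       # 3 4, leaving the row index empty selects the whole column

# byrow = TRUE fills the matrix row by row instead
byRow <- matrix(1:6, nrow = 2, byrow = TRUE)
byRow[1, ]   # 1 2 3, the whole first row
```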

Lists are widely used and work like vectors except that each element can be any data structure - so you can have lists of matrices, lists of lists etc.

(foo <- list(bar=c(1, 2, 3), baz=matrix(c(1, 2, 3, 4, 5, 6), nrow=2)))
$bar
[1] 1 2 3

$baz
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
foo[[2]]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
foo[2]
$baz
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
foo$baz
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
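The output above shows that foo[2] and foo[[2]] are not the same thing: single brackets return a shorter list, while double brackets (and $) return the element itself. A short sketch of the difference, where the added element qux is made up for illustration:

```r
foo <- list(bar = c(1, 2, 3), baz = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2))

is.list(foo[2])        # TRUE, still a list (of length 1)
is.matrix(foo[[2]])    # TRUE, the matrix itself

names(foo)             # "bar" "baz"
foo$bar[2]             # 2, indexing can be chained
foo[["baz"]][1, 3]     # 5, row 1, column 3 of the matrix

foo$qux <- "new"       # assigning to a new name adds an element
length(foo)            # 3
```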

Finally, there are data frames, but they are so important that they deserve a small section of their own.

For a rough comparison:

Functions

Although this course is focused on applied data analysis and not so much on programming, it is often useful to be able to write simple functions to avoid having to repeat your code.

Functions in R behave very similarly to functions in Python:

pintsToLitre <- function(pints) {
    litre <- pints * 0.56
    return(litre)
}
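Once defined, the function is called like any built-in function. Because arithmetic in R works element by element, the same function also accepts a whole vector. The function is repeated here so that the sketch runs on its own.

```r
pintsToLitre <- function(pints) {
    litre <- pints * 0.56
    return(litre)
}

pintsToLitre(2)             # 1.12
pintsToLitre(c(1, 2, 4))    # 0.56 1.12 2.24, one result per input value
```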

Fill in the blanks

What should the values of the blanks be to get the corresponding output?

plusValue <- ____ {
    return(x + ____)
}
plusValue(1)
plusValue(1, 3)
2
4

Key Points

  • R is a powerful statistical computing environment.

  • Thousands of packages are available for R.

  • Use variable <- value to assign a value to a variable in order to record it in memory.

  • Objects are created on demand whenever a value is assigned to them.