20 July 2016

Who Am I?

Steve Pederson

R User for >10 years

Co-ordinator, Bioinformatics Hub
Level 4, Santos Petroleum Engineering Building

Who Am I?

The Bioinformatics Hub

Also Today:

  • Jimmy Breen & Hien To (Bioinformatics Hub)
  • Reuben Buckley (Centre For Computational Genetics)
  • Mohammad Kamruzzaman (Centre for Water Management and Reuse)

Who Am I?

The Bioinformatics Hub

Later This Week:

  • Joey Gerlach (eRSA)
  • Steve Delean (Biological Sciences)

Other Information

Slack

Please join the #radelaide-2016 channel

R Drop In

  • Previously Wed 12:30pm - 2:30pm
  • Now Tues 11:00am - 1:00pm

Why use R?

  • The main software/language used for analysis of biological data (along with Python)
  • Can handle extremely large datasets
  • We can easily perform complex analytic procedures
  • Many processes come as inbuilt functions
  • Huge user base of biological researchers

Why use R?

Other Key Reasons

  • Avoids common Excel pitfalls
  • Reproducible Research

Automatic Conversion

A common Excel problem

Excel is notorious for converting values from one thing to another inappropriately.

  • Gene names are often converted to dates (e.g. SEPT9)

  • Genotypes can be converted into numeric values (e.g. the homozygote "1/1")

In R we generally work with plain text files.
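
As a rough illustration of the point above, the sketch below reads a small table from plain text using base R's read.csv(). The gene names and genotypes are invented for this example and are not course data. Values such as SEPT9 and 1/1 should come through as text, with no conversion to dates or numbers.

```r
# Hypothetical example data, written inline as plain text
txt <- "gene,genotype
SEPT9,1/1
MARCH1,0/1"

# stringsAsFactors = FALSE keeps text columns as character in all R versions
genes <- read.csv(text = txt, stringsAsFactors = FALSE)

genes$gene             # gene names remain text, not dates
genes$genotype         # genotypes remain text, not numbers
class(genes$genotype)  # check how R has stored the column
```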

Reproducible Research

  • Research is littered with mistakes from Excel
  • Studies containing such mistakes have made it as far as Phase III trials
  • We have code to record and exactly repeat our analysis
  • Errors in code are easier to find and correct than copy/paste errors in a spreadsheet

Using R

With great power comes great responsibility - Uncle Ben

With this extra capability, we need to understand a little about:

  1. Data Types
  2. Data Structures

That's tomorrow's content

Using R

Today we will start with:

  1. An Introduction to RStudio
  2. Reading data into R
  3. Manipulating and cleaning data
  4. Visualising data
  5. Writing Reports

Introduction to RStudio

RStudio

  1. Open RStudio then
  2. File > New File > R Script
  3. Save As Introduction.R

RStudio

The Script Window

  • This is just a text editor.
  • We enter our commands here but they are not executed until we send them to the Console
    • We can keep a record of everything we've done
    • We can also add comments to our code
    • Comments start with the # symbol
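
A minimal sketch of how comments might look in a script. The arithmetic is only a placeholder; the point is that R ignores everything after the # symbol on a line.

```r
# This whole line is a comment, so R ignores it
1 + 1   # R runs the code on the left and ignores this note
```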

The Script Window

The Console

  • Where we execute commands
  • Is essentially the "engine"
  • We can execute commands directly in the Console or send from the Script Window

Executing Code from the Script Window

Enter the following in the Script Window

# Create our first R object
x <- 5

  • Lines of code are sent to the Console by either:
    • Ctrl + Enter
    • Copy & Paste into the Console
    • Clicking the Run button at the top right

Executing Commands from the Script Window

We can view the contents of the object x by:

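  • entering its name directly in the Console, or
  • entering its name in the Script Window & sending it to the Console

# Typing the name of an object prints its contents
x
# Calling print() explicitly gives the same output
print(x)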

The R Environment

Where have we created the object x?

  • Is it on your hard drive somewhere?
  • Is it in a file somewhere?

The R Environment

  • We have placed x in our R Workspace
  • More formally known as your Global Environment

The R Environment

  • The Environment is like your desktop
  • We keep all our relevant objects here and can save every object in the workspace to an .RData file
save.image()
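
One possible way to see what save.image() does is sketched below. The temporary file path and the environment named restored are assumptions made for this example; called with no arguments, save.image() writes to a file named .RData in the working directory instead.

```r
x <- 5

# List the objects currently in the Global Environment
ls()

# Save the whole workspace to a temporary file (hypothetical location)
tmp <- tempfile(fileext = ".RData")
save.image(file = tmp)

# Load the file into a separate environment to inspect what was saved,
# leaving the current workspace untouched
restored <- new.env()
load(tmp, envir = restored)
ls(restored)
```

Loading into a separate environment is only a convenience for inspection; load(tmp) on its own would place the objects straight back into the workspace.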

The R Environment

  • In the R Environment, we can create objects of multiple types.
  • We first give an object a name (e.g. x) and then assign a value to it using the <- symbol.
  • This is like an arrow putting the value into the object.
  • Assignment can also work in the other direction using ->, but this is rarely done
# Using the reverse assignment operator
5 -> x
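
A short sketch comparing the two directions of assignment. The second object, y, is a name made up for this example.

```r
x <- 5   # usual form: name on the left, value on the right
5 -> y   # reverse form: value on the left, name on the right

# Both objects hold the same value
identical(x, y)
```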

RStudio

Other Tabs and Features

  • Next to the Environment Tab is the History Tab
    • Contains everything executed in the Console
    • Useful for when we've been lazy
  • Best coding practice is still to enter code in the Script Window and execute it from there

RStudio

Other Tabs and Features

In the bottom right are a series of tabs

  1. Files: This shows your current working directory
  2. Plots: Every time you make a graph it appears here
  3. Packages: NEVER CLICK OR UN-CLICK ANYTHING HERE
  4. Help: We'll explore this later
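
Some of what these tabs show can also be done with code, which keeps a record in the script. The sketch below is only a suggestion: list.files() gives the contents of the working directory, and library() loads a package without touching the Packages tab. The stats package is used here purely because it ships with R.

```r
# Files in the current working directory
list.files()

# Load a package with code rather than ticking a box in the Packages tab
library(stats)

# Show which packages are currently attached
search()
```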

RStudio

Other Tabs and Features

  • Every tab can be resized using the buttons in the top right
  • Window separators can also be moved

RStudio

Cheatsheet and Shortcuts

Help > Cheatsheets > RStudio IDE Cheat Sheet

Page 2 has lots of hints:

  • Ctrl + 1 places focus on the Script Window
  • Ctrl + 2 places focus on the Console
  • Ctrl + 3 places focus on the Help Tab

Introducing Projects

  • Projects help keep an analysis organised
  • Very useful for managing multiple analyses
  • Integrated with Git and SVN version control
  • Always goes back to "where-you-were-last-time"

Introducing Projects

Let's set one up for this course: File > New Project

Introducing Projects

  • Choose either a New or Existing Directory
  • Navigate to a location suitable for keeping the course notes
  • The project name will automatically be assigned as the directory name
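
Once a project is open, RStudio sets the working directory to the project directory. A quick check from the Console might look like the sketch below, where project_dir and project_name are names chosen for this example.

```r
# The working directory should be the project directory
project_dir <- getwd()
project_dir

# The last part of the path should match the project name
project_name <- basename(project_dir)
project_name
```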