15 April 2019
If you've started a new session since last time:
library(tidyverse)
Make sure transport.csv is in your data folder
Open it from the Files pane using View File
What problems do we face here?
Let's try writing code for this instead of using the GUI
This will fail
transport <- read_csv("data/transport.csv")
R uses the first row to guess how many columns there are
We can tell R to ignore any lines beginning with # using: comment = "#"
transport <- read_csv("data/transport.csv", comment = "#")
transport
Now R is guessing the correct number of columns \(\implies\) the file will load
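A minimal sketch of the same idea on a made-up file, so the effect of comment = "#" can be seen in isolation. The file contents and the names tmp and demo are hypothetical and are not part of transport.csv.

```r
library(readr)

# Hypothetical miniature file: two comment lines, then comma-separated data
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "# exported from a survey",
  "# another comment line",
  "a,b,c",
  "1,2,3",
  "4,5,6"
), tmp)

# Lines beginning with # are ignored, so the header row a,b,c
# is the first line R sees when working out the columns
demo <- read_csv(tmp, comment = "#")
dim(demo)
```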
What does all that red (or blue) stuff mean?
R has assumed the first row contains column names
We can tell R to ignore these using: col_names = FALSE
transport <- read_csv("data/transport.csv",
comment = "#",
col_names = FALSE)
transport
What has R used for column names?
What impact has the missing data in X5 had?
We can tell R to treat - as missing data, i.e. NA (na = "-")
transport <- read_csv("data/transport.csv",
comment = "#",
col_names = FALSE,
na = "-")
transport
We can use - to skip a column
We can tell R to guess any remaining columns using ?
Both are set in the col_types argument: col_types = "-?????"
transport <- read_csv("data/transport.csv",
comment = "#",
col_names = FALSE,
na = "-",
col_types = "-?????")
transport
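The na and col_types arguments can be tried together on a small made-up file. This is only a sketch: the two data rows and the names tmp and demo are invented for illustration, and the file has five columns rather than the six in transport.csv.

```r
library(readr)

# Hypothetical file with no header row; "-" marks a missing value
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "1,F,Anna,60,-",
  "2,M,Ben,-,180"
), tmp)

# "-" skips the first column, "c" forces character,
# and "?" leaves the guessing to R
demo <- read_csv(tmp,
                 col_names = FALSE,
                 na = "-",
                 col_types = "-cc??")
demo
```

One character is needed in col_types for every column in the file, including the skipped one.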
Numeric columns can be specified as n
Character columns can be specified as c
transport <- read_csv("data/transport.csv",
comment = "#",
col_names = FALSE,
na = "-",
col_types = "-ccnnc")
transport
What happens if we specify the wrong column type? (n instead of c)
transport <- read_csv("data/transport.csv",
comment = "#",
col_names = FALSE,
na = "-",
col_types = "-ccnnn")
transport
NB: No warning will be given if a numeric column contains non-numeric characters
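A likely reason, sketched here on made-up values: the n column type parses numbers in the same way as parse_number(), which keeps the first number it finds in each value and drops the surrounding characters. The input vector is hypothetical.

```r
library(readr)

# Hypothetical values: a clean number, a number with a unit,
# and a number with a currency symbol and grouping mark
messy <- c("65", "65kg", "$1,200")

# The non-numeric characters are dropped rather than reported
parse_number(messy)
```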
Let's change that back to the correct code:
transport <- read_csv("data/transport.csv",
comment = "#",
col_names = FALSE,
na = "-",
col_types = "-ccnnc")
transport
We can also supply the column names ourselves as a vector of names
myNames <- c("gender", "name", "weight", "height", "method")
transport <- read_csv("data/transport.csv",
comment = "#",
col_names = myNames,
na = "-",
col_types = "-ccnnc")
transport
The c() function
The most common function in R is c()
This stands for combine
It combines values into a single R object, or vector
If left empty, it returns NULL
c()
We used <- to assign this vector
What would happen if I gave too many or too few names?
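A few lines to try in the Console, as a small illustration of c(). Only myNames comes from the code above; moreNames is a hypothetical name.

```r
# c() combines its arguments into a single vector
myNames <- c("gender", "name", "weight", "height", "method")
length(myNames)

# With no arguments there is nothing to combine
is.null(c())

# Vectors can themselves be combined with further values
moreNames <- c(myNames, "extra")
length(moreNames)
```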
After we've edited a file, we might also wish to export it
?write_csv
Delimited files can be written using write_delim(): .csv, .txt, .tsv etc.
R objects can be exported using write_rds()
The best way to export this is:
write_csv(transport, "data/transport_clean.csv")
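To compare the two export routes without touching the data folder, here is a sketch that writes a made-up table to temporary files and reads it back. The object demo and the file names are hypothetical.

```r
library(readr)

# Hypothetical two-row table
demo <- tibble::tibble(name = c("Anna", "Ben"),
                       height = c(160, 180))

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# Plain text: readable by any program, column types are guessed on import
write_csv(demo, csv_file)
from_csv <- read_csv(csv_file, col_types = "cn")

# R-specific format: the object is stored as-is
write_rds(demo, rds_file)
from_rds <- read_rds(rds_file)
```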
Download the file geneCounts.out (output from featureCounts)
Import it using read_delim()
Remove the Chr, Start, End and Strand columns
Use basename() on the column names
Remove _Fem_hisat2_sorted.bam from the end of the column names