15 April 2019
If you've started a new session since last time:
library(tidyverse)
Make sure transport.csv is in your data folder
Find it in the Files pane and choose View File
What problems do we face here?
Let's try writing code for this instead of using the GUI
This will fail
transport <- read_csv("data/transport.csv")
R uses the first row to guess how many columns there are
We can tell R to ignore any lines beginning with # by setting comment = "#"
transport <- read_csv("data/transport.csv", comment = "#")
transport
Now R is guessing the correct number of columns, so the file will load
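As a rough illustration of what comment = "#" does, the sketch below writes a small made-up file (not the real transport.csv; its contents are assumptions) and counts the columns R finds once the comment lines are ignored.

```r
library(readr)

# Hypothetical stand-in for a file that starts with comment lines
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "# Survey data",
  "# collected in class",
  "1,male,Bob,80,180,car",
  "2,female,Ann,60,165,bike"
), tmp)

# Lines beginning with # are dropped before the columns are counted
toy <- read_csv(tmp, comment = "#")
ncol(toy)
```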
What does all that red (or blue) stuff mean?
R has assumed the first row contains column names
We can tell R not to treat the first row as names using: col_names = FALSE
transport <- read_csv("data/transport.csv", comment = "#", col_names = FALSE)
transport
What has R used for column names?
What impact has the missing data in X5 had?
We can tell R to read - as a missing value, NA (na = "-")
transport <- read_csv("data/transport.csv", comment = "#", col_names = FALSE, na = "-")
transport
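To see why the na argument matters, here is a small sketch on made-up data (the values and the tmp file are assumptions, not the real transport.csv). A single - is enough to turn a numeric column into character unless R is told it means missing.

```r
library(readr)

# Hypothetical three-row file with one dash in the last column
tmp <- tempfile(fileext = ".csv")
writeLines(c("Bob,80,180", "Ann,60,-", "Cat,55,170"), tmp)

# Without na, the dash forces the whole column to character
with_dash <- read_csv(tmp, col_names = FALSE)
class(with_dash$X3)

# With na = "-", the dash becomes NA and the column is numeric
with_na <- read_csv(tmp, col_names = FALSE, na = "-")
class(with_na$X3)
is.na(with_na$X3)
```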
We can use - to skip a column
We can tell R to guess any remaining columns using ?
Both are given in the col_types argument: col_types = "-?????"
transport <- read_csv("data/transport.csv", comment = "#", col_names = FALSE, na = "-", col_types = "-?????")
transport
Numeric columns can be specified as n
character columns can be specified as c
transport <- read_csv("data/transport.csv", comment = "#", col_names = FALSE, na = "-", col_types = "-ccnnc")
transport
What happens if we get a column type wrong? (n instead of c)
transport <- read_csv("data/transport.csv", comment = "#", col_names = FALSE, na = "-", col_types = "-ccnnn")
transport
NB: No warning will be given if a numeric column contains non-numeric characters
Let's change that back to the correct code:
transport <- read_csv("data/transport.csv", comment = "#", col_names = FALSE, na = "-", col_types = "-ccnnc")
transport
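The compact col_types string has one character per column in the file. The sketch below tries "-ccnnc" on a made-up two-row file (contents are assumptions) to check that the first column is skipped and the rest get the requested types.

```r
library(readr)

# Hypothetical file with six columns, no header, and one dash for missing data
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,male,Bob,80,180,car", "2,female,Ann,60,-,bike"), tmp)

# "-" skips column 1, "c" is character, "n" is number
toy <- read_csv(tmp, col_names = FALSE, na = "-", col_types = "-ccnnc")
ncol(toy)
sapply(toy, class)
```

One character too many or too few in the string means the types no longer line up with the columns, so it is worth counting them.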
We can supply column names as a vector of names
myNames <- c("gender", "name", "weight", "height", "method")
transport <- read_csv("data/transport.csv", comment = "#", col_names = myNames, na = "-", col_types = "-ccnnc")
transport
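The c() function
The most common function in R is c()
This stands for combine
It combines values into a single R object, or vector
With nothing inside, it returns NULL
c()
Above, we used the c() function to build the vector of names, and <- to assign this vector
What would happen if I gave too many or too few names?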
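A few quick things to try with c() in the Console. The object names nums and both are made up for this sketch.

```r
# Combine three values into one vector and assign it
nums <- c(1, 2, 3)
length(nums)

# c() can also combine an existing vector with more values
both <- c(nums, 10)
both

# With nothing inside, c() gives back NULL
is.null(c())
```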
After we've edited a file, we might also wish to export it
?write_csv
The same help page covers write_delim()
These write delimited text files: .csv, .txt, .tsv etc.
R objects can be exported using write_rds()
The best way to export this is:
write_csv(transport, "data/transport_clean.csv")
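To check what actually ends up on disk, the sketch below exports a small made-up data frame (toy, an assumption standing in for transport) to temporary files and reads it back.

```r
library(readr)

# Hypothetical data frame standing in for transport
toy <- data.frame(name = c("Bob", "Ann"), weight = c(80, 60))

# Temporary files, so nothing in the data folder is overwritten
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# Plain text export: one header line, then one line per row
write_csv(toy, csv_file)
readLines(csv_file)

# R object export: reading it back returns the same object
write_rds(toy, rds_file)
identical(read_rds(rds_file), toy)
```

The .csv can be opened by any program, while the .rds keeps the column types exactly as R had them.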
Download the file geneCounts.out (output from featureCounts)
Import it using read_delim()
Remove the Chr, Start, End and Strand columns
Hint for the column names: basename()
Remove _Fem_hisat2_sorted.bam from the end of the column names
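As a hint for the last two steps, here is a sketch on made-up column names (the path and sample name are assumptions, not taken from geneCounts.out). It only shows how basename() and sub() behave on a character vector; applying them to the imported object is left as part of the exercise.

```r
# Hypothetical column names, similar in shape to featureCounts output
old_names <- c("Geneid", "Length", "aligned/bams/S1_Fem_hisat2_sorted.bam")

# basename() drops everything up to the last /
# sub() then removes the suffix from the end of each name
new_names <- sub("_Fem_hisat2_sorted\\.bam$", "", basename(old_names))
new_names
```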