16 April 2019
stringr contains functions for text manipulation
It is part of the tidyverse
Key functions include str_detect(), str_extract() and str_replace()
These are comparable to grepl(), grep(), gsub() etc. from base R
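As a rough comparison of the two families, the sketch below runs a stringr call next to its closest base R counterpart. The vector `greetings` is a hypothetical example introduced only for this illustration.

```r
library(stringr)

# Hypothetical example vector, used only for this comparison
greetings <- c("Hi Mum", "Hi Mother")

# Detect a pattern: stringr takes the string first, base R takes the pattern first
detect_stringr <- str_detect(greetings, "Mum")
detect_base <- grepl("Mum", greetings)

# Replace every match: str_replace_all() is the counterpart of gsub()
replace_stringr <- str_replace_all(greetings, "[Mm]", "b")
replace_base <- gsub("[Mm]", "b", greetings)

# Both comparisons should report TRUE
identical(detect_stringr, detect_base)
identical(replace_stringr, replace_base)
```

Because stringr functions take the string as their first argument, they tend to fit more naturally into pipes.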
stringr::str_detect()
library(tidyverse)
x <- c("Hi Mum", "Hi Mother")
str_detect() returns a logical vector
str_detect(string = x, pattern = "Mum")
str_detect(string = x, pattern = "Hi")
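Since the result is a logical vector, it can be used directly for subsetting or counting. A minimal sketch, redefining `x` so the block runs on its own (the name `has_mum` is introduced here for illustration):

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# One TRUE/FALSE per element of x
has_mum <- str_detect(x, "Mum")

# Keep only the elements that matched
x[has_mum]

# Count how many elements contain "Hi"
sum(str_detect(x, "Hi"))
```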
stringr::str_detect()
We can use common regex syntax:
[] specifies a set of alternative characters
str_detect(x, "h")
str_detect(x, "[Hh]")
. is a wild-card that matches any single character
str_detect(x, "Mo")
str_detect(x, "M.")
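The sketch below shows what these patterns return for the example vector, with the results noted as comments. The final call uses `regex()` with `ignore_case = TRUE`, which is not part of the original slides and is shown only as one possible alternative to `[Hh]`.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# Matching is case sensitive: only "Hi Mother" contains a lower-case h
lower_h <- str_detect(x, "h")        # FALSE TRUE

# [Hh] matches either H or h, so both strings match
either_h <- str_detect(x, "[Hh]")    # TRUE TRUE

# "Mo" only occurs in "Hi Mother"
literal_mo <- str_detect(x, "Mo")    # FALSE TRUE

# "M." matches M followed by any single character
wild_m <- str_detect(x, "M.")        # TRUE TRUE

# Alternative (not from the slides): ignore case for the whole pattern
any_case_h <- str_detect(x, regex("h", ignore_case = TRUE))  # TRUE TRUE
```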
stringr::str_extract()
We can use str_extract() to extract patterns
str_extract(string = x, pattern = "Hi M.")
This can be helpful if no matches are found, as str_extract() returns NA for those strings
str_extract(x, "Mum")
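A short sketch of how the extracted values look, and how the `NA` for a non-match can be used to find the strings that did not contain the pattern. The variable names are introduced here for illustration.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# The wild-card picks up whichever character follows "Hi M"
greeting_start <- str_extract(x, "Hi M.")   # "Hi Mu" "Hi Mo"

# "Hi Mother" does not contain "Mum", so its result is NA
mum_only <- str_extract(x, "Mum")           # "Mum" NA

# The NA makes it easy to pull out the non-matching strings
x[is.na(mum_only)]                          # "Hi Mother"
```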
stringr::str_replace()
Common syntax for extracting/modifying text strings
str_replace(x, pattern = "Mum", replacement = "Dad")
This searches the string "Hi Mum" for the pattern "Mum", and replaces it with "Dad"
stringr::str_replace()
We can specify wild-cards as .
str_replace(x, "M.", "Da")
We can also match one or more wild-cards by using +
str_replace(x, "M.+", "Dad")
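To see the difference the `+` makes, the sketch below compares the two replacements side by side. A single `.` consumes exactly one character after the `M`, whereas `.+` is greedy and consumes everything up to the end of the string.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# "M." replaces only M plus the next character, leaving the rest behind
one_wildcard <- str_replace(x, "M.", "Da")     # "Hi Dam" "Hi Dather"

# "M.+" replaces M plus all remaining characters
many_wildcards <- str_replace(x, "M.+", "Dad") # "Hi Dad" "Hi Dad"
```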
stringr::str_replace()
We can also capture words/phrases/patterns using (pattern)
str_replace(x, "(Hi) (M.+)", "\\2! \\1!")
Patterns are numbered in the order they are "captured"
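A minimal sketch of how the captured groups are referred to in the replacement: `\\1` is the first set of parentheses and `\\2` is the second. The results are noted as comments.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# Group 1 captures "Hi", group 2 captures "M" plus everything after it
swapped <- str_replace(x, "(Hi) (M.+)", "\\2! \\1!")
swapped   # "Mum! Hi!" "Mother! Hi!"

# A captured group can also be kept while the rest is replaced
kept_greeting <- str_replace(x, "(Hi) M.+", "\\1 Dad")
kept_greeting   # "Hi Dad" "Hi Dad"
```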
stringr::str_replace()
We can also specify alternatives instead of wild-cards ([])
str_replace(x, "[Mm]", "b")
str_replace() only replaces the first match in a string
str_replace_all() replaces all matches
str_replace_all(x, "[Mm]", "b")
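The difference only shows up for strings with more than one match, as the sketch below illustrates. "Hi Mum" contains both `M` and `m`, while "Hi Mother" contains only one `M`.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# Only the first M or m in each string is replaced
first_only <- str_replace(x, "[Mm]", "b")       # "Hi bum" "Hi bother"

# Every M or m in each string is replaced
all_matches <- str_replace_all(x, "[Mm]", "b")  # "Hi bub" "Hi bother"
```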
stringr::str_replace()
Alternative patterns can be specified using the conventional OR symbol |
str_replace(x, "(Mum|Mother)", "Maternal Parent")
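The parentheses matter once the alternatives sit inside a longer pattern, because `|` otherwise splits the whole pattern in two. The sketch below uses a placeholder replacement `"X"` chosen only for illustration.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# With parentheses: "Hi " followed by either "Mum" or "Mother"
grouped <- str_replace(x, "Hi (Mum|Mother)", "X")   # "X" "X"

# Without parentheses: either "Hi Mum" or just "Mother"
ungrouped <- str_replace(x, "Hi Mum|Mother", "X")   # "X" "Hi X"
```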
str_count(x, "[Mm]")
str_length(x)
str_split_fixed(x, pattern = " ", n = 2)
str_to_lower(x)
str_to_title("a bad example")
str_pad(c("1", "10", "100"), width = 3, pad = "0")
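For reference, a sketch of what each of these helpers returns for the example vector, with results noted as comments. The variable names are introduced here only so the results can be inspected.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# Number of matches per string
m_counts <- str_count(x, "[Mm]")    # 2 1

# Number of characters per string, including the space
n_chars <- str_length(x)            # 6 9

# Split into a character matrix with a fixed number of columns
parts <- str_split_fixed(x, pattern = " ", n = 2)
parts[, 2]                          # "Mum" "Mother"

# Change case
lower <- str_to_lower(x)                    # "hi mum" "hi mother"
titled <- str_to_title("a bad example")     # "A Bad Example"

# Pad on the left (the default side) to a common width
padded <- str_pad(c("1", "10", "100"), width = 3, pad = "0")  # "001" "010" "100"
```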
A common data type in statistics is a categorical variable (i.e. a factor)
pet_vec <- c("Dog", "Dog", "Cat", "Dog", "Cat")
This is a character vector, which we can convert to a factor
pet_factors <- as.factor(pet_vec)
pet_factors
We can manually set these categories as levels
pet_factors <- factor(pet_vec, levels = c("Dog", "Cat"))
Each value is stored as an integer which refers to its level
str(pet_factors)
as.integer(pet_factors)
as.character(pet_factors)
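The sketch below shows how the integer codes depend on the order of the levels. With `as.factor()` the levels are sorted alphabetically, whereas setting `levels` manually fixes the order. The names `default_levels` and `manual_levels` are introduced for illustration.

```r
pet_vec <- c("Dog", "Dog", "Cat", "Dog", "Cat")

# Default: levels are sorted, so Cat = 1 and Dog = 2
default_levels <- as.factor(pet_vec)
levels(default_levels)        # "Cat" "Dog"
as.integer(default_levels)    # 2 2 1 2 1

# Manual: Dog = 1 and Cat = 2
manual_levels <- factor(pet_vec, levels = c("Dog", "Cat"))
levels(manual_levels)         # "Dog" "Cat"
as.integer(manual_levels)     # 1 1 2 1 2

# The character values are the same either way
identical(as.character(default_levels), as.character(manual_levels))
```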
What would happen if we think a factor is a character, and we use it to select values from a vector/matrix/data.frame?
names(pet_vec) <- pet_vec
pet_vec
pet_vec[pet_factors]
pet_vec[as.character(pet_factors)]
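The same pitfall can be sketched with a small lookup vector. Here `sounds` and `pets` are hypothetical objects introduced for illustration. Subsetting with a factor uses its integer codes as positions, not its labels as names.

```r
# Hypothetical lookup vector: names are the animals, values are their sounds
sounds <- c(Cat = "Meow", Dog = "Woof")

# Hypothetical factor whose level order differs from the order of sounds
pets <- factor(c("Dog", "Cat"), levels = c("Dog", "Cat"))

# Indexing by the factor uses the codes 1 and 2 as positions
by_factor <- unname(sounds[pets])                   # "Meow" "Woof" (wrong)

# Indexing by the character values looks up the names
by_character <- unname(sounds[as.character(pets)])  # "Woof" "Meow" (intended)
```

No error or warning is raised in the first case, which is what makes this mistake easy to miss.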
This is why I'm very cautious about read.csv() and the standard data.frame etc.
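One way to guard against this is to set `stringsAsFactors` explicitly rather than rely on the default, which was `TRUE` before R 4.0.0 and is `FALSE` from that version onwards. A minimal sketch, using a small inline CSV made up for this example:

```r
# Hypothetical CSV content, supplied inline via the text argument
csv_text <- "pet,age\nDog,3\nCat,5"

# Strings converted to factors (the old default behaviour)
df_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(df_factor$pet)      # "factor"

# Strings kept as character
df_character <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(df_character$pet)   # "character"
```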