16 April 2019
stringr contains functions for text manipulation
stringr is a core tidyverse package
Key functions: str_detect(), str_extract(), str_replace()
Counterparts of grepl(), grep(), gsub() etc from base
stringr::str_detect()
library(tidyverse)
x <- c("Hi Mum", "Hi Mother")
str_detect() returns a logical vector
str_detect(string = x, pattern = "Mum")
str_detect(string = x, pattern = "Hi")
stringr::str_detect()
We can use common regex syntax:
Alternative characters are specified inside []
str_detect(x, "h")
str_detect(x, "[Hh]")
Wild-cards are specified as .
str_detect(x, "Mo")
str_detect(x, "M.")
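As a minimal sketch of the two regex features above, assuming stringr is installed, the results can be compared with base grepl(). The variable names (either_case, wild_card, base_result) are illustrative only.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# [Hh] matches either an upper-case or a lower-case "h"
either_case <- str_detect(x, "[Hh]")

# "M." matches "M" followed by any single character
wild_card <- str_detect(x, "M.")

# grepl() takes the pattern first and the strings second
base_result <- grepl("M.", x)

print(either_case)
print(wild_card)
print(identical(wild_card, base_result))
```

Note that str_detect(x, "h") is not FALSE for both strings: matching is case sensitive, but "Mother" contains a lower-case "h".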
stringr::str_extract()
We can use str_extract() to extract patterns
str_extract(string = x, pattern = "Hi M.")
This can be helpful for checking where no matches are found, as these are returned as NA
str_extract(x, "Mum")
stringr::str_replace()
Common syntax for extracting/modifying text strings
str_replace(x, pattern = "Mum", replacement = "Dad")
This searches the string "Hi Mum" for the pattern "Mum", and replaces it with "Dad"
stringr::str_replace()
We can specify wild-cards as .
str_replace(x, "M.", "Da")
We can also match one or more wild-cards by using +
str_replace(x, "M.+", "Dad")
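A short sketch of the difference between the two patterns above, assuming stringr is installed. The names one_char and greedy are illustrative.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# "M." matches "M" plus exactly one further character
one_char <- str_replace(x, "M.", "Da")

# "M.+" matches "M" plus one or more characters, taking as many as it can
greedy <- str_replace(x, "M.+", "Dad")

print(one_char)  # "Hi Dam" "Hi Dather"
print(greedy)    # "Hi Dad" "Hi Dad"
```

Because + is greedy, the whole of "Mum" or "Mother" is consumed by the match, so both strings end up identical.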
stringr::str_replace()
We can also capture words/phrases/patterns using (pattern)
str_replace(x, "(Hi) (M.+)", "\\2! \\1!")
Patterns are numbered in the order they are "captured"
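To see what each numbered capture holds, one option is str_match(), which returns the full match followed by one column per capture. This is a sketch assuming stringr is installed; groups and swapped are illustrative names.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# Column 1 is the full match, columns 2 and 3 are captures \\1 and \\2
groups <- str_match(x, "(Hi) (M.+)")
print(groups)

# The replacement swaps the two captures around
swapped <- str_replace(x, "(Hi) (M.+)", "\\2! \\1!")
print(swapped)  # "Mum! Hi!" "Mother! Hi!"
```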
stringr::str_replace()
We can also specify alternatives instead of wild-cards ([])
str_replace(x, "[Mm]", "b")
str_replace() only replaces the first match in a string
str_replace_all() replaces all matches
str_replace_all(x, "[Mm]", "b")
stringr::str_replace()
Alternative patterns can be specified using the conventional OR symbol |
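The difference between the two functions is easiest to see side by side. This is a minimal sketch assuming stringr is installed, with illustrative variable names.

```r
library(stringr)

x <- c("Hi Mum", "Hi Mother")

# Only the first "M" or "m" in each string is replaced
first_only <- str_replace(x, "[Mm]", "b")

# Every "M" or "m" in each string is replaced
all_matches <- str_replace_all(x, "[Mm]", "b")

# Number of matches available in each string
n_matches <- str_count(x, "[Mm]")

print(first_only)   # "Hi bum" "Hi bother"
print(all_matches)  # "Hi bub" "Hi bother"
print(n_matches)    # 2 1
```

"Hi Mother" gives the same result either way because it contains only one match.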
str_replace(x, "(Mum|Mother)", "Maternal Parent")
str_count(x, "[Mm]")
str_length(x)
str_split_fixed(x, pattern = " ", n = 2)
str_to_lower(x)
str_to_title("a bad example")
str_pad(c("1", "10", "100"), width = 3, pad = "0")
A common data type in statistics is a categorical variable (i.e. a factor)
pet_vec <- c("Dog", "Dog", "Cat", "Dog", "Cat")
This is a character vector, which we can convert to a factor
pet_factors <- as.factor(pet_vec)
pet_factors
We can manually set these categories as levels
pet_factors <- factor(pet_vec, levels = c("Dog", "Cat"))
Factors are stored as integers, with the levels as labels
str(pet_factors)
as.integer(pet_factors)
as.character(pet_factors)
What would happen if we think a factor is a character, and we use it to select values from a vector/matrix/data.frame?
names(pet_vec) <- pet_vec
pet_vec
pet_vec[pet_factors]
pet_vec[as.character(pet_factors)]
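The subsetting above can be traced step by step using only base R. In this sketch the names by_code and by_label are illustrative. Subsetting with a factor uses its underlying integer codes as positions, whereas subsetting with a character vector looks values up by name.

```r
pet_vec <- c("Dog", "Dog", "Cat", "Dog", "Cat")
pet_factors <- factor(pet_vec, levels = c("Dog", "Cat"))
names(pet_vec) <- pet_vec

# The integer codes behind the factor: Dog = 1, Cat = 2
codes <- as.integer(pet_factors)
print(codes)  # 1 1 2 1 2

# Subsetting by factor uses the codes as positions,
# and positions 1 and 2 both hold "Dog"
by_code <- pet_vec[pet_factors]
print(unname(by_code))  # "Dog" "Dog" "Dog" "Dog" "Dog"

# Subsetting by character looks up the names instead
by_label <- pet_vec[as.character(pet_factors)]
print(unname(by_label))  # "Dog" "Dog" "Cat" "Dog" "Cat"
```

No error or warning is raised in the factor case, so the wrong values are returned silently.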
This is why I'm very cautious about read.csv() and the standard data.frame etc