Before we start
- R is a programming language and RStudio is the IDE that assists in using R.
- There are many benefits to learning R, including writing reproducible code, the ability to use a variety of datasets, and a broad, open-source community of practitioners.
- Files related to analysis should be organized within a single working directory.
- R uses commands containing functions to tell the computer what to do.
- Documentation for each function is available within RStudio, or users can ask for help from one of many online forums, cheatsheets, or email lists.
Introduction to R
- <- is used to assign values on the right to objects on the left.
- Code should be saved within the Source pane in RStudio to help you return to your code later.
- ‘#’ can be used to add comments to your code.
- Functions can automate more complicated sets of commands, and require arguments as inputs.
- Vectors are composed of a series of values and can take many forms.
- Data structures in R include ‘vector’, ‘list’, ‘matrix’, ‘data.frame’, ‘factor’, and ‘array’.
- Vectors can be subset by indexing or through logical vectors.
- Many functions exist to remove missing data from data structures.
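The points above can be illustrated with a short base R sketch. The object names (`weights_g`, `g_to_kg`) and the values are hypothetical, chosen only for illustration.

```r
# Assign the values on the right to the object named on the left
weights_g <- c(50, 60, NA, 82)   # hypothetical numeric vector with one missing value

# Subset by position (R indexes from 1)
first_two <- weights_g[1:2]

# Subset with a logical vector: keep non-missing values greater than 55
heavy <- weights_g[!is.na(weights_g) & weights_g > 55]

# Two common ways of handling missing data
mean_weight <- mean(weights_g, na.rm = TRUE)  # ignore NA inside the function
complete <- weights_g[!is.na(weights_g)]      # drop NA from the vector itself

# A small function that takes arguments as inputs
g_to_kg <- function(grams, digits = 2) {
  round(grams / 1000, digits)
}

g_to_kg(complete)
```

`na.omit()` and `complete.cases()` are other base R functions that serve the same purpose as the `is.na()` filter used here.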
Starting with data
- Use read.csv to read tabular data in R.
- A data frame is the representation of data in the format of a table where the columns are vectors that all have the same length.
- dplyr provides many methods for inspecting and summarizing data in data frames.
- Use factors to represent categorical data in R.
- The lubridate package has many useful functions for working with dates.
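A minimal sketch of reading and inspecting tabular data follows. It uses only base R, and the data are hypothetical and supplied inline through the `text` argument of `read.csv()` so that the example does not depend on a file on disk.

```r
# Hypothetical survey data, supplied inline so the example is self-contained
csv_text <- "record_id,species,sex,weight
1,DM,M,40
2,DO,F,52
3,DM,F,37
4,PF,M,8"

surveys <- read.csv(text = csv_text)

# Inspect the data frame
dim(surveys)      # number of rows and columns
str(surveys)      # type of each column
head(surveys, 2)  # first two rows

# Every column is a vector of the same length
lengths(surveys)

# Represent a categorical column as a factor
surveys$sex <- factor(surveys$sex)
levels(surveys$sex)
table(surveys$sex)
```

With a real file, the call would typically look like `read.csv("path/to/file.csv")`, where the path is whatever location the data were saved to.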
Manipulating, analyzing and exporting data with tidyverse
Data manipulation using dplyr and tidyr
Exporting data
- Use the dplyr package to manipulate data frames.
- Use select() to choose variables from a data frame.
- Use filter() to choose data based on values.
- Use mutate() to create new variables.
- Use group_by() and summarize() to work with subsets of data.
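The verbs listed above are often chained together with the pipe. The following is a sketch that assumes the dplyr package is installed; the data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data frame used only for illustration
surveys <- data.frame(
  species = c("DM", "DM", "DO", "DO", "PF"),
  sex     = c("M", "F", "F", "M", "F"),
  weight  = c(40, 44, 52, 48, 8)
)

result <- surveys %>%
  select(species, weight) %>%            # choose variables
  filter(weight > 10) %>%                # choose rows based on values
  mutate(weight_kg = weight / 1000) %>%  # create a new variable
  group_by(species) %>%                  # define subsets of the data
  summarize(
    mean_weight_kg = mean(weight_kg),    # one summary row per species
    n = n()
  )

result
```

Each step receives the output of the previous one, so the pipeline can be built and checked one verb at a time.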
Data visualization with ggplot2
- start simple and build your plots iteratively
- the ggplot() function initiates a plot, and geom_ functions add representations of your data
- use aes() when mapping a variable from the data to a part of the plot
- use facet_ to partition a plot into multiple plots based on a factor included in the dataset
- use premade theme_ functions to broadly change appearance, and the theme() function to fine-tune
- the patchwork library can combine separate plots into a single figure
- use ggsave() to save plots in your favorite format and dimensions
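As a rough sketch of building a plot iteratively, the example below assumes the ggplot2 package is installed. The data frame, its column names, and the output file name in the final comment are hypothetical.

```r
library(ggplot2)

# Hypothetical data frame used only for illustration
surveys <- data.frame(
  weight          = c(40, 44, 52, 48, 8, 9),
  hindfoot_length = c(35, 36, 37, 36, 16, 15),
  species         = c("DM", "DM", "DO", "DO", "PF", "PF"),
  sex             = c("M", "F", "F", "M", "F", "M")
)

# Start simple: initiate the plot and add one geom
p <- ggplot(data = surveys, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(aes(color = species))

# Build iteratively: split into one panel per level of sex
p <- p + facet_wrap(vars(sex))

# Apply a premade theme, then fine-tune a single element
p <- p + theme_bw() + theme(legend.position = "bottom")

# Saving writes a file to the working directory, so it is left commented out
# ggsave("weight_vs_hindfoot.png", plot = p, width = 6, height = 4)
```

Storing the plot in an object such as `p` makes it easy to add one layer at a time and to pass the finished plot to `ggsave()`.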
Control Flow
- Use if and else to make choices.
- Use for to repeat operations.
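A minimal sketch combining both constructs is shown below. The vector of weights and the size thresholds are hypothetical.

```r
# Hypothetical weights to classify
weights <- c(8, 40, 52)

# Pre-allocate the result vector before looping
size_labels <- character(length(weights))

# Repeat the same choice for every element
for (i in seq_along(weights)) {
  if (weights[i] > 45) {
    size_labels[i] <- "large"
  } else if (weights[i] > 10) {
    size_labels[i] <- "medium"
  } else {
    size_labels[i] <- "small"
  }
}

size_labels
```

Using `seq_along()` rather than `1:length(weights)` keeps the loop from running when the vector is empty.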