R & RStudio overview
RNA Bioscience Initiative | CU Anschutz
2024-10-21
Instructors (me, Srinivas Ramachandran, Jay Hesselberth)
TAs (Christina Akirtava and Charlie Moffatt)
Read the syllabus.
Your grades are based on attendance / participation, problem sets, and a final project. Your lowest problem set grade will be dropped.
If you are sick, let me and Srinivas know, and stay home. We will record all classes and make them available on Panopto.
All course details are on the website.
We use Canvas for problem set submission & grading.
If you get stuck during class: use the #class channel in Slack. TAs will come over.
If you need help outside of class (in order):
Prior to each block (and sometimes prior to a class), check and complete material in the “Prepare” column on the class schedule.
On the day of class and before class starts, start the day’s “assignment” in Posit Cloud. This will contain blank exercises that you’ll fill in during class.
You’ll also have access to the slides, but it’s probably better for the first few classes to just have the exercises open.
You’ll have a problem set assigned at the end of each class. Our expectation is that you spend 30-90 minutes on each problem set.
Problem sets will get progressively more difficult.
You can work in groups for problem sets (see the Syllabus), but during the Bootcamp you should avoid it.
If you feel like you’re stuck on something silly, reach out through Slack or office hours.
We’ll talk about the problem set at the end of each class. You are welcome to use the remaining class time to start and possibly finish the problem set.
Learn the fundamentals of R programming (class 1)
Become familiar with the “tidyverse” suite of packages
Practice reproducible analysis using Quarto/Rmarkdown (Rigor & Reproducibility)
Review R basics
Review Quarto/Rmarkdown (Exercise #9)
See menu:
Help > Cheat Sheets > RStudio IDE Cheat Sheet
Try simple math.
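The expression used on the slide is not preserved here. As an assumed stand-in, the following is one expression that evaluates to 17 when typed at the console:

```r
# Multiplication binds tighter than addition, so this is 2 + 15
2 + 3 * 5
```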
[1] 17
Assign a numeric value to an object.
<- and = are both assignment operators. By convention, R code uses <-.
x <- 1 reads “set the value of x to 1”.
= and == are two different operators.
A single = is used for assignment (e.g., x = 1).
A double == tests for equality (e.g., x == 1 asks “does x equal 1?”).
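A minimal sketch of the difference between assigning and comparing. The object names x and y are only illustrative:

```r
x <- 1      # assign 1 to x with the conventional arrow
y = 2       # = also assigns when used at the top level

x == 1      # comparison: TRUE
x == y      # comparison: FALSE
```

Note that x == 1 never changes x; it only returns a logical value.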
Vectors are a core R data structure.
A vector is an ordered collection of elements of the same type (e.g. numeric, character, or logical).
Later you will see that every column of a data.frame / tibble is a vector.
Operations on vectors propagate to all the elements of the vectors.
Let’s create some vectors.
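As a rough sketch (the object names are made up for illustration), here are a few vectors built with c(), along with operations that apply to every element:

```r
nums  <- c(1, 2, 3)          # numeric vector
chars <- c("a", "b", "c")    # character vector

doubled <- nums * 2              # 2 4 6
is_big  <- nums > 1              # FALSE TRUE TRUE
total   <- nums + c(10, 20, 30)  # 11 22 33

# Mixing types coerces everything to one type (here: character)
mixed <- c(1, "a", TRUE)
class(mixed)                     # "character"
```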
The c function combines values together (e.g., c(1,2,3)).
A data.frame is a rectangle, where each column is a vector, and each row is a slice across vectors.
data.frame columns are vectors, and can have different types (numeric, character, factor, etc.).
A data.frame is constructed with data.frame().
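The call that produced the printed output is not shown. One possibility, assuming two columns named x and y, would be:

```r
# Each argument becomes a column; both columns have three elements
df <- data.frame(x = 1:3, y = c(2, 4, 6))

class(df)   # the object's class
df          # echo the contents to the console
```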
[1] "data.frame"
  x y
1 1 2
2 2 4
3 3 6
A tibble is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not.
Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist).
This forces you to confront problems earlier, typically leading to cleaner, more expressive code. Tibbles also have an enhanced print() method, which makes them easier to use with large datasets containing complex objects.
tibble() does much less than data.frame(): it never changes the type of its inputs, never changes the names of variables, and never creates row.names().
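A small sketch of two of these differences. The column name "my value" is invented for illustration, and the tibble part only runs if the tibble package happens to be installed:

```r
# data.frame() rewrites non-syntactic names by default (check.names = TRUE)
df <- data.frame(`my value` = 1:3)
names(df)   # "my.value"

# $ on a data.frame partially matches column names
df$my       # 1 2 3

if (requireNamespace("tibble", quietly = TRUE)) {
  tbl <- tibble::tibble(`my value` = 1:3)
  print(names(tbl))                # "my value": the name is kept as given
  print(suppressWarnings(tbl$my))  # NULL: no partial matching (normally warns)
}
```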
Create a data.frame and tibble.
Now echo the contents of df and tbl to the console and inspect them.
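One way this exercise might look, with column contents chosen arbitrarily. The tibble step is guarded so the sketch still runs where the tibble package is not installed:

```r
# A base R data.frame with one integer and one character column
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df

# The same data as a tibble; printing also shows dimensions and column types
if (requireNamespace("tibble", quietly = TRUE)) {
  tbl <- tibble::tibble(x = 1:3, y = c("a", "b", "c"))
  print(tbl)
}
```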
An R package is a collection of code, data, documentation, and tests that is easily shareable.
A package often has a collection of custom functions that enable you to carry out a workflow, e.g., DESeq for RNA-seq analysis.
The most popular places to get R packages from are CRAN, Bioconductor, and GitHub.
Once a package is installed, you still have to “load” it into the environment with a library(<package>) call.
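A brief sketch of installing versus loading. The install line is commented out because it needs network access, and the tools package is used only because it ships with R:

```r
# install.packages("tibble")    # one-time install from CRAN (not run here)

library(tools)                  # attach an installed package
"package:tools" %in% search()   # TRUE once the package is attached

ext <- file_ext("problem-sets/class-01.qmd")
ext                             # "qmd"

# pkg::fun calls a function without attaching the package first
tools::file_ext("notes.Rmd")    # "Rmd"
```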
Let’s do the following to explore R packages:
Quarto is a fully reproducible authoring framework to create, collaborate, and communicate your work.
Quarto lets you render Rmarkdown documents (in addition to Jupyter notebooks, etc.)
Quarto supports a number of output formats, including PDFs, Word documents, slide shows, HTML, etc.
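A Quarto document is a plain text file with the extension .qmd and contains the following basic components:
An (optional) YAML header surrounded by ---.
Chunks of code surrounded by triple backticks.
Text mixed with simple formatting like # heading and *italics*.
Let’s do the following to explore Quarto documents: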
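To make the basic components of a .qmd file concrete, here is a hypothetical sketch that writes a tiny Quarto document to a temporary file from R. The title, file name, and contents are invented for illustration, and nothing is rendered:

```r
fence <- strrep("`", 3)   # three backticks, built here to avoid nested fences

qmd <- c(
  "---",                   # YAML header opens
  "title: \"Example\"",
  "format: html",
  "---",                   # YAML header closes
  "",
  "# heading",             # markdown text with simple formatting
  "",
  "Some *italics* text.",
  "",
  paste0(fence, "{r}"),    # code chunk opens
  "1 + 1",
  fence                    # code chunk closes
)

path <- file.path(tempdir(), "example.qmd")
writeLines(qmd, path)
readLines(path)            # inspect the plain text file
```

In RStudio you would normally create such a file from the File menu and render it with the Render button.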
Your first problem set is in problem-sets/class-01.qmd
Course website: https://rnabioco.github.io/molb-7950