Class 10: Jumping into more complex R, tips, resources
Rui Fu
2022-07-06
Source:vignettes/class-10.Rmd
class-10.Rmd
1. Speeding up repeated Rmd knitting
Read the Guide to RMarkdown for an exhaustive description of the various formats and options for using RMarkdown documents. Note that HTMLs and slides for this class were all made from Rmd.
Caching
You can speed up knitting of your Rmds by using caching to store the results from each chunk, instead of rerunning them each time. Note that if you modify the code chunk, previous caching is ignored.
For each chunk, set {r, cache = TRUE}
Or the option can be set globally at top of the document. Like this:
knitr::opts_chunk$set(cache = TRUE)
2. Finding useful packages
In most cases, what you need is already made into well-documented packages, and you don’t have to reinvent the wheel (but sometimes you should?). Depending on where the package is curated, installation is different. Some examples below:
-
Gviz
- visualize gene model -
VennDiagram
- making custom venn diagrams -
emo
- inserting emojis into Rmd
3. Git - version control
Advantages of using version control include:
rolling back code if needed
branched development, tackling individual issues/tasks
collaboration, etc
Git was first created by Linus Torvalds for coordinating development of Linux. Read this guide for Getting started and try out the Interactive practice.
# for bioinformatics, get comfortable with command line too
# and avoid using windows for compatibility issues
ls
git status # list changes
git blame assignments.Rmd # see who contributed
This can be handled by Rstudio as well (new tab next to Connections
and Build
)
Put your code on GitHub
As you write more code, especially as functions and script pipelines, hosting and documenting them on GitHub is great way to make them portable and searchable. Even the free tier of GitHub accounts now has private repositories (repo).
If you have any interest in a career in data science/informatics, GitHub is also a common showcase of what (and how well/often) you can code. After some accumulation of code, definitely put your GitHub link on your CV/resume.
Example repo (RBI) - valr
Refer to package examples and guide to building packages to build your own package and website documentation.
Asking for help with other packages on GitHub
Every package should include README, installation instructions, and a maintained issues
page where questions and bugs can be reported and addressed. Example: readr GitHub page Don’t be afraid to file new issues, but very often your problems are already answered in the closed
section.
4. Other tips/best practices
a. Code and data organization
Read this: A Quick Guide to Organizing Computational Biology Projects
NEWS.md # markdown document for tracking progress
data # data directory for storing raw and processed data
dbases # database files downloaded from other places
docs # publication documents and random project files
your-project.Rproj # Make an Rproject in Rstudio for every project
results # store all RMarkdown here
src # store useful scripts here
b. styler
, clean up code readability
Refer to this style guide often, so you don’t have to go back to make the code readable/publishable later.
styler::style_text("
my_fun <- function(x,
y,
z) {
x+
z
}
")
#> my_fun <- function(x,
#> y,
#> z) {
#> x +
#> z
#> }
# styler::style_file # for an entire file
# styler::style_dir # or folder
c. Benchmarking, with microbenchmark
and profvis
# example, compare base and readr csv reading functions
path_to_file <- system.file("extdata", "gene_tibble.csv", package = "pbda")
res <- microbenchmark::microbenchmark(
base = read.csv(path_to_file),
readr = readr::read_csv(path_to_file),
times = 5
)
print(res, signif = 2)
microbenchmark:::autoplot.microbenchmark(res)
# example, looking at each step of a script
library(dplyr)
p <- profvis::profvis({
path_to_file <- system.file("extdata", "gene_tibble.csv", package = "pbda")
c1 <- readr::read_csv(path_to_file)
c1 <- c1 %>%
filter(padj < 0.00001) %>%
mutate(gene = stringr::str_to_lower(gene))
})
p
d. shiny
, interactive web app for data exploration
Making an interactive interface to data and plotting is easy in R. Examples and corresponding code can be found at https://shiny.rstudio.com/gallery/.
e. Running R script from terminal
Save functions in .R files to easily source()
them later. If you have code that does not need input from beginning to end, .R files can be executed from the command line/terminal as well.
# example, mat.R that makes matrix and does calculations
Rscript /Users/rf/teach/practical-data-analysis/R/mat.R
5. Finding help online
The R studio community forums are a great resource for asking questions about tidyverse related packages.
StackOverflow provides user-contributed questions and answers on a variety of topics.
For help with bioconductor packages, visit the Bioc support page
Find out if others are having similar issues by searching the issue on the package GitHub page.
Bioinformatic resources
For general bioinformatics advice, we found the following text very useful.
Bioinformatics Data Skills by Vince Buffalo
For statistics related to bioinformatics, this free course is excellent:
PH525x series - Biomedical Data Science
For more detailed descriptions of single cell RNA-Seq analysis:
Cheat sheets
Rstudio links to common ones here: Help
-> Cheatsheets
. More are hosted online, such as for regular expressions.
Useful to keep your own stash too.
6. Offline help
The RBI fellows hold standing office hours on Thursdays, walk-ins 1-2pm, by appointment 2-4pm, in RC1-South Room 9101. We are happy to help out with coding and RNA/DNA-related informatics questions. Send us an email to let us know you will be stopping by (rbi.fellows@cuanschutz.edu
).
7. Sometimes code is just broken
No one writes perfect code. A certain highly-regarded and cited scRNA-seq package drew cell type progression arrows in the wrong direction, for months before the issue was fixed!
If you suspect bugs or mishandled edge cases, go to the package GitHub and search the issues section to see if the problem has been reported or fixed. If not, submit an issue that describes the problem. The reprex package makes it easy to produce well-formatted reproducible examples that demonstrate the problem. Often developers will be thankful for your help with making their software better.