R Bootcamp Problem Set 2
Setup
Start by loading libraries you need analysis in the code chunk below. When in doubt, start by loading the tidyverse package.
Problem Set
Each problem below is worth 4 points.
Use the data files in the data/
directory to answer the questions.
For this problem set, you are allowed to help each other, but you are not allowed to post correct answers in slack.
The problem set is due 12pm on Aug 30.
Grading rubric
- Everything is good: 5 points
- Partially correct answers: 3-4 points
- Reasonable attempt: 2 points
Question 1
Import the data set data_transcript_exp_subset
using the readr package, and assign it to a variable called exp_tbl
.
The file is located at the following path data/data_transcript_exp_subset.csv.gz
Question 2
This data frame is a subset (100 lines) of transcript-level gene expression data where transcript abundance was measured at two different time points of a certain treatment conducted in triplicates.
The column names have the format of molecule_time_replicate
First, explore the structure of the data set using some of the functions we learned in class (e.g., summary()
)
Comment on whether this data set is tidy, and if not, list the reasons why. Remember: in tidy data, every column represents a single variable and every row represents a single observation
Answer below
Question 3
How will you reshape the data frame so that each row has only one experimental observation?
Hint: Use pivot_longer()
Question 4
How will you modify the dataframe so that multiple variables are not present in a single column?
Hint: Use separate()
Question 5
How will you save your output as a TSV file?
Hint: Use the readr cheatsheet to figure this out.
https://rstudio.cloud/learn/cheat-sheets