R Bootcamp Problem Set 2

Author

Your Name Here

Published

October 21, 2024

Setup

Start by loading libraries you need analysis in the code chunk below. When in doubt, start by loading the tidyverse package.

Problem Set

Each problem below is worth 4 points.

Use the data files in the data/ directory to answer the questions.

For this problem set, you are allowed to help each other, but you are not allowed to post correct answers in slack.

The problem set is due 12pm on Aug 30.

Grading rubric

Everything is good: 5 points
Partially correct answers: 3-4 points
Reasonable attempt: 2 points

Question 1

Import the data set data_transcript_exp_subset using the readr package, and assign it to a variable called exp_tbl.

The file is located at the following path data/data_transcript_exp_subset.csv.gz

Question 2

This data frame is a subset (100 lines) of transcript-level gene expression data where transcript abundance was measured at two different time points of a certain treatment conducted in triplicates.

The column names have the format of molecule_time_replicate

First, explore the structure of the data set using some of the functions we learned in class (e.g., summary())

Comment on whether this data set is tidy, and if not, list the reasons why. Remember: in tidy data, every column represents a single variable and every row represents a single observation

Answer below

Question 3

How will you reshape the data frame so that each row has only one experimental observation?

Hint: Use pivot_longer()

Question 4

How will you modify the dataframe so that multiple variables are not present in a single column?

Hint: Use separate()

Question 5

How will you save your output as a TSV file?

Hint: Use the readr cheatsheet to figure this out.

https://rstudio.cloud/learn/cheat-sheets