R Bootcamp Problem Set 3

Author

Insert your name here

Published

October 21, 2024

Setup

Start by loading libraries you need analysis in the code chunk below. When in doubt, start by loading the tidyverse package.

Problem Set

Each problem below is worth 5 points.

Use the data files in the data/ directory to answer the questions.

For this problem set, you are allowed to help each other, but you are not allowed to post correct answers in slack.

The problem set is due 12pm on Aug 31.

Grading rubric

  • Everything is good: 5 points
  • Partially correct answers: 3-4 points
  • Reasonable attempt: 2 points

Question 1

Load the palmerpenguins package. Inspect the penguins tibble with summary.

Use drop_na() to remove rows with NA values in the penguins tibble. How many rows were removed from the tibble?

Then, use replace_na() to replace NA values in bill_length_mm and bill_depth_mm with a value of 0.

Question 2

Use arrange, filter, and select on a dataframe. Do the following, in order:

  1. Import the data set data/data_transcript_exp_tidy.csv.
  2. Sort the tibble by expression data (count) from highest to lowest level.
  3. Filter the tibble by count > 100
  4. Select all columns except for type

Question 3

How will you:

  1. create a new column log10count that contains log10 transformed count values and
  2. rearrange the columns in the following order: ensembl_transcript_id, type, time, replicate, count, log10count.

(Note that we have dropped extra)

Hint: Use mutate and select

Question 4

Calculate a per-transcript sum, while keeping the time information?

Hint: Use group_by with multiple variables, and summarise the “count” values using sum()