R Bootcamp Problem Set 3
Setup
Start by loading the libraries you need for your analysis in the code chunk below. When in doubt, start by loading the tidyverse package.
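A minimal setup chunk might look like the following. It is only a sketch and assumes the tidyverse is already installed; add any other packages a question asks for to the same chunk.

```r
# Attach the core tidyverse packages (dplyr, tidyr, readr, ggplot2, and others).
# If this fails, run install.packages("tidyverse") once first.
library(tidyverse)
```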
Problem Set
Each problem below is worth 5 points.
Use the data files in the data/ directory to answer the questions.
For this problem set, you are allowed to help each other, but you are not allowed to post correct answers in Slack.
The problem set is due at 12 pm on Aug 31.
Grading rubric
- Everything is good: 5 points
- Partially correct answers: 3-4 points
- Reasonable attempt: 2 points
Question 1
Load the palmerpenguins package. Inspect the penguins tibble with summary().
Use drop_na() to remove rows with NA values in the penguins tibble. How many rows were removed from the tibble?
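To see how drop_na() behaves before using it on penguins, here is a sketch on a small made-up tibble. The name toy_penguins and all of its values are hypothetical and not taken from palmerpenguins. Comparing nrow() before and after dropping gives the number of removed rows.

```r
library(tidyverse)

# Hypothetical toy data: two of the four rows contain at least one NA
toy_penguins <- tibble(
  species        = c("Adelie", "Gentoo", "Chinstrap", "Adelie"),
  bill_length_mm = c(39.1, NA, 46.5, 38.6),
  bill_depth_mm  = c(18.7, 15.2, NA, 21.2)
)

# With no column arguments, drop_na() removes rows that have an NA in ANY column
toy_complete <- toy_penguins %>% drop_na()

# Rows removed = rows before minus rows after
rows_removed <- nrow(toy_penguins) - nrow(toy_complete)
rows_removed
```

Passing column names, for example drop_na(bill_length_mm), restricts the check to those columns, so the number of removed rows can differ.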
Then, starting again from the original penguins tibble, use replace_na() to replace NA values in bill_length_mm and bill_depth_mm with a value of 0.
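On a data frame, replace_na() takes a named list that maps each column to its replacement value. The sketch below uses a hypothetical toy tibble rather than penguins, so the values are made up.

```r
library(tidyverse)

# Hypothetical toy data with missing bill measurements
toy_penguins <- tibble(
  species        = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, NA, 46.5),
  bill_depth_mm  = c(18.7, 15.2, NA)
)

# Named list: column name -> value to use wherever that column is NA.
# Columns not named in the list (here, species) are left unchanged.
toy_filled <- toy_penguins %>%
  replace_na(list(bill_length_mm = 0, bill_depth_mm = 0))

toy_filled
```

An equivalent style applies replace_na() to a single vector inside mutate(), for example mutate(bill_length_mm = replace_na(bill_length_mm, 0)).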
Question 2
Use arrange(), filter(), and select() on a data frame. Do the following, in order:
- Import the data set data/data_transcript_exp_tidy.csv.
- Sort the tibble by expression level (count) from highest to lowest.
- Filter the tibble to keep rows where count > 100.
- Select all columns except type.
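arrange(), filter(), and select() chain naturally with the pipe. The sketch below uses a small hypothetical tibble whose column names are assumed to match data/data_transcript_exp_tidy.csv (they are taken from the column list in Question 3); all values are made up. With the real file you would start from read_csv() instead of tibble().

```r
library(tidyverse)

# Hypothetical stand-in for the transcript expression data
toy_expr <- tibble(
  ensembl_transcript_id = c("tx1", "tx2", "tx3", "tx4"),
  type                  = c("a", "a", "b", "b"),
  time                  = c(0, 0, 8, 8),
  replicate             = c(1, 2, 1, 2),
  count                 = c(50, 300, 120, 90)
)

toy_result <- toy_expr %>%
  arrange(desc(count)) %>%   # desc() sorts from highest to lowest
  filter(count > 100) %>%    # keep only rows with count strictly above 100
  select(-type)              # a leading minus drops a column, keeping all others

toy_result
```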
Question 3
How would you:
- create a new column log10count that contains log10-transformed count values, and
- rearrange the columns in the following order: ensembl_transcript_id, type, time, replicate, count, log10count?
(Note that we have dropped extra )
Hint: Use mutate() and select().
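A sketch of the mutate() plus select() pattern on a hypothetical toy tibble whose columns start out in a scrambled order (values made up). Listing every column in select() both keeps them and fixes their order.

```r
library(tidyverse)

# Hypothetical toy data with columns in an arbitrary order
toy_expr <- tibble(
  count                 = c(10, 1000),
  replicate             = c(1, 1),
  time                  = c(0, 8),
  type                  = c("a", "a"),
  ensembl_transcript_id = c("tx1", "tx2")
)

toy_logged <- toy_expr %>%
  # New column; note that log10(0) is -Inf if a count is zero
  mutate(log10count = log10(count)) %>%
  # Columns come out in exactly the order they are listed here
  select(ensembl_transcript_id, type, time, replicate, count, log10count)

toy_logged
```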
Question 4
How would you calculate a per-transcript sum while keeping the time information?
Hint: Use group_by() with multiple variables, and summarise() the count values using sum().
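Grouping by both the transcript ID and time gives one sum per transcript per time point, so the time information survives the summary while replicates are added together. The toy tibble and the output column name total_count below are hypothetical.

```r
library(tidyverse)

# Hypothetical toy data: tx1 has two replicates at time 0 and one at time 8
toy_expr <- tibble(
  ensembl_transcript_id = c("tx1", "tx1", "tx1", "tx2"),
  time                  = c(0, 0, 8, 0),
  replicate             = c(1, 2, 1, 1),
  count                 = c(5, 7, 20, 3)
)

toy_sums <- toy_expr %>%
  group_by(ensembl_transcript_id, time) %>%   # one group per transcript AND time point
  summarise(total_count = sum(count), .groups = "drop")

toy_sums
```

The .groups = "drop" argument returns an ungrouped tibble and silences the message summarise() otherwise prints about regrouping. Grouping by ensembl_transcript_id alone would collapse the time points into a single total.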