R Bootcamp Problem Set 9
Problem Set
Use the data files in the data/
directory to answer the questions.
For this problem set, you are allowed to help each other, but you are not allowed to post correct answers in slack.
The problem set is due 5pm on Sept 11.
A quantitative PCR experiment
This problem set is a more complex version of the qPCR experiment we discussed in class.
Here is the experimental setup:
Three cell lines (
wt
,TF-mutant
,RL-mutant
) were treated with a drug that induces interferon expression.After specific time points (0, 4, 8, 12, 24 hours), cells were harvested and actin and interferon mRNA were analyzed by quantitative PCR (with 3 biological replicates, and 3 technical replicates).
Data for the Biological replicates come from three independent cultures and RNA samples.
Data for the technical replicates come the same RNA sample, measured in 3 independent qPCR reactions.
The data are in data/
:
data/qpcr_names_ps.tsv.gz
data/qpcr_data.tsv.gz
Libraries
Load the libraries you need for analysis below.
Load the data
Load the data sets and inspect.
Tidy the data (4 points)
Given the experimental setup and the shape of the tibbles, you should be able to answer: Are these data tidy?
- What are the variables in the data?
Answer
- Are the variables the column names?
Answer
The names are encoded in the following order:
gt
, time
, gene
, rep_tech
, rep_bio
.
Question 1 (4 points)
Calculate summary statistics for the experiment.
Calculate the mean of the technical replicates within each group of genotype, time, gene, and biological replicate.
Calculate the mean and standard deviation of the biolgical replicates (which is the mean of technical replicates, above).
You should have a tibble that looks like this:
# A tibble: 36 × 5
gt time gene bio_mean bio_sd
<chr> <int> <chr> <dbl> <dbl>
1 RL-mutant 0 GAPDH 0.444 0.0759
2 RL-mutant 0 IFN-beta 3.32 0.188
3 RL-mutant 4 GAPDH 1.61 0.487
4 RL-mutant 4 IFN-beta 18.4 1.15
5 RL-mutant 8 GAPDH 3.25 1.06
6 RL-mutant 8 IFN-beta 32.2 1.82
7 RL-mutant 12 GAPDH 3.90 0.911
8 RL-mutant 12 IFN-beta 47.5 3.78
9 RL-mutant 24 GAPDH 7.93 3.41
10 RL-mutant 24 IFN-beta 76.7 4.75
# ℹ 26 more rows
# ℹ Use `print(n = ...)` to see more rows
Question 2 (4 points)
Create a plot of expression by time from the data, using the mean of the biological replicates as the
y
value.Color the plot by genes.
Use
ggplot2::geom_pointrange()
do represent the standard deviation of the data. Alternatively, useggplot2::geom_errobar()
withgeom_point()
.Draw a line through the points with
geom_line()
.Facet the plot by genotype.
Change the colors of the of the plot with a
scale
function.Update the labels on the plot (“time (hours)”, etc.).
Question 3 (4 points)
- What can you say about the expression of GAPDH and IFN in the different cell types?
Answer.
- Can you come up with a simple molecular mechanism to explain the results?
Answer.
Question 4 (4 points)
Reformat the data from Question 2 such that you calculate a ratio of IFN to GAPDH. Start with the data Question 1.2, above.
Re-plot the data as in Question 2, but leave out the color as you have collapsed the two genes into one value.
Question 5 (4 points)
Is there more spread across the technical replicates, or across the biological replicates (across the whole experiment)?
To get at this question, calculate the standard deviations across the two sets of replicates separately. Which one has a greater spread (max - min)? And what might this mean?
Answer.
Grading rubric
- Everything is good: full points
- Partially correct answer: depends on how many steps are correct
- Reasonable attempt: half points
Submit
Be sure to click the “Render” button to render the HTML output.
Then paste the URL of this Posit Cloud project into the problem set on Canvas.