R Bootcamp Problem Set 4
Problem Set
Use the data files in the data/ directory to answer the questions.
For this problem set, you are allowed to help each other, but you are not allowed to post correct answers in Slack.
The problem set is due 12pm on Sept 1.
Grading rubric
- Everything is good: full points
- Partially correct answer: depends on how many steps are correct
- Reasonable attempt: half points
Question 1 (2 points)
- Load the tidyverse and here packages.
- Import the dataset data/data_rna_protein.csv.gz.
data_rna_protein.csv.gz: This is a combined dataset from an RNAseq and SILAC proteomics experiment, in which a transcription factor (TF) was differentially expressed and the fold change in RNA and protein was calculated between TF-expressing and non-expressing cells.
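As a hedged illustration of how compressed CSV files behave, the sketch below writes a small made-up table to a temporary .csv.gz file and reads it back with readr, which handles .gz compression based on the file extension. The toy columns and the temporary path are assumptions for illustration only and are not the course data.

```r
library(readr)  # part of the tidyverse; provides read_csv() and write_csv()

# Hypothetical toy table (not the real dataset).
toy <- data.frame(geneid = c("geneA", "geneB"), value = c(1.5, NA))

# Write to a temporary file; readr compresses because the name ends in .gz.
tmp <- tempfile(fileext = ".csv.gz")
write_csv(toy, tmp)

# read_csv() decompresses .gz files transparently.
toy_in <- read_csv(tmp, show_col_types = FALSE)
print(toy_in)

# Inside a project, here::here("data", "<file name>") builds a path
# relative to the project root instead of using a temporary file.
```

The same read_csv() call works whether or not the file is compressed, so no separate unzip step is needed.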
Question 2 (9 points)
Using the imported data set, carry out the following:
1. Inspect the data so you know what you are dealing with (summary() etc).
2. Select only the following columns: geneid, iDUX4_logFC, iDUX4_fdr, hl.ratio, and pval.
3. Rename them as follows: rna_FC = iDUX4_logFC, rna_pval = iDUX4_fdr, protein_FC = hl.ratio, protein_pval = pval (hint: use dplyr::rename()).
4. Drop all rows with NA values in them (hint: use a function from tidyr).
5. Remove duplicate rows (hint: use dplyr::distinct()).
6. Arrange the table by descending rna_FC and ascending protein_FC.
7. Conduct steps 2-7 by piping the output of one step to another (i.e., a single workflow, and remember to comment).
8. Save the output of this workflow into a new object.
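The general shape of such a piped workflow can be sketched on made-up data. This is a minimal sketch, not a solution: the tibble, its column names (id, old_x, old_y, extra), and its values are hypothetical and chosen only to show what each verb does.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; names and values are made up for illustration.
toy <- tibble(
  id    = c("a", "b", "b", "c", "d"),
  old_x = c(2.0, 1.0, 1.0, NA, 2.0),
  old_y = c(0.5, 0.3, 0.3, 0.9, 0.1),
  extra = c(1, 2, 2, 3, 4)
)

toy_clean <- toy %>%
  select(id, old_x, old_y) %>%      # keep only the columns of interest
  rename(x = old_x, y = old_y) %>%  # rename() takes new_name = old_name
  drop_na() %>%                     # drop rows that contain any NA
  distinct() %>%                    # drop exact duplicate rows
  arrange(desc(x), y)               # descending x, ties broken by ascending y

print(toy_clean)
```

Note that rename() expects the new name on the left of the equals sign, and that arrange() sorts by its arguments in order, so the second column only matters where the first one is tied.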
Question 3 (9 points)
How well do the overall rna_FC and protein_FC values correlate in this experiment?
Using the output from Question 2, do the following:
1. Create a scatter plot of rna_FC vs protein_FC. Observe how the points scatter.
2. Add a line to the plot that would indicate perfect 1:1 correlation. Hint: use geom_abline() with its slope argument.
3. Add a linear model fit using geom_smooth() with its method = 'lm' argument. Observe how the x=y line deviates from your geom_smooth line.
4. Calculate the Spearman correlation coefficient. (Hint: This uses a base R math function called cor. Use ?cor or Google to learn more and how to specify method as spearman.)
5. Using all of the information from above, comment on the correlation between rna_FC and protein_FC below.
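The plotting layers and the correlation call named in the hints can be sketched on made-up data. This is a minimal sketch under stated assumptions: the data frame toy and its columns x and y are hypothetical, and the dashed line style is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical toy data, made up for illustration only.
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(1.2, 1.9, 2.1, 4.5, 3.8)
)

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +                                                 # scatter plot
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +   # y = x reference line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)        # least-squares fit

# Spearman's coefficient is the Pearson correlation computed on ranks,
# so it measures monotonic rather than strictly linear association.
rho <- cor(toy$x, toy$y, method = "spearman")
print(rho)
```

Comparing the dashed reference line with the fitted line shows whether the two variables change at the same rate, while the Spearman coefficient summarizes how consistently they move in the same direction.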
Answer
Submit
Be sure to click the “Render” button to render the HTML output.
Then paste the URL of this Posit Cloud project into the problem set on Canvas.