── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: 'rstatix'
The following object is masked from 'package:stats':
filter
Attaching package: 'janitor'
The following object is masked from 'package:rstatix':
make_clean_names
The following objects are masked from 'package:stats':
chisq.test, fisher.test
here() starts at /Users/mtaliaferro/Documents/GitHub/molb-7950
Problem Set Stats Bootcamp - class 12
Hypothesis Testing
biochem <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/Biochemistry.txt", show_col_types = FALSE) |>
  janitor::clean_names()
# simplify names a bit more
colnames(biochem) <- gsub(pattern = "biochem_", replacement = "", colnames(biochem))
# we are going to simplify this a bit and only keep some columns
keep <- colnames(biochem)[c(1, 6, 9, 14, 15, 24:28)]
biochem <- biochem[, keep]
# get weights for each individual mouse
# careful: did not come with column names
weight <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/weight", col_names = FALSE, show_col_types = FALSE)
# add column names
colnames(weight) <- c("subject_name", "weight")
# add weight to biochem table and get rid of NAs
# rename gender to sex
b <- inner_join(biochem, weight, by = "subject_name") |>
  na.omit() |>
  rename(sex = gender)
Problem # 1
Does mouse sex explain mouse total cholesterol levels? Make sure to run the chunks above.
1. Examine and specify the variable(s) (1 pt)
The response variable y is \(??\)
The explanatory variable x is \(??\)
Make a violin plot: (2 pt)
response variable on the y-axis
explanatory variable on the x-axis
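As a reminder of the plotting mechanics only, here is a minimal sketch of a violin plot built on simulated stand-in data. The data frame `toy` and its columns `group` and `value` are hypothetical and not part of the mouse dataset; swap in your own table and the variables you specified above.

```r
library(ggplot2)

# Simulated stand-in data; `toy`, `group`, and `value` are hypothetical names
set.seed(1)
toy <- data.frame(
  group = rep(c("a", "b"), each = 50),
  value = c(rnorm(50, mean = 3, sd = 0.5), rnorm(50, mean = 3.3, sd = 0.5))
)

# Explanatory variable on the x-axis, response variable on the y-axis
p <- ggplot(toy, aes(x = group, y = value, fill = group)) +
  geom_violin(alpha = 0.5) +
  # a narrow boxplot inside each violin marks the median and quartiles
  geom_boxplot(width = 0.15, outlier.shape = NA) +
  theme_minimal()
p
```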
Get n, mean, median, sd (1 pt)
Is it normally distributed? (1 pt)
Answer here
Is the variance similar between groups? (1 pt)
Answer here
What kind of test are you picking and why? (1 pt)
Answer here
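The checks in this step can be sketched on simulated stand-in data as follows. This is one possible approach rather than the required one: `toy`, `group`, and `value` are hypothetical names, and the Shapiro-Wilk and Bartlett tests are only two of several reasonable choices. The rstatix package attached in the setup also offers pipe-friendly wrappers such as `get_summary_stats()`, `shapiro_test()`, and `levene_test()`.

```r
library(dplyr)

# Simulated stand-in data; `toy`, `group`, and `value` are hypothetical names
set.seed(1)
toy <- data.frame(
  group = rep(c("a", "b"), each = 50),
  value = c(rnorm(50, mean = 3, sd = 0.5), rnorm(50, mean = 3.3, sd = 0.5))
)

# n, mean, median, and sd for each group
toy_summary <- toy |>
  group_by(group) |>
  summarise(
    n = n(),
    mean = mean(value),
    median = median(value),
    sd = sd(value)
  )
toy_summary

# Shapiro-Wilk test per group; the null hypothesis is that values are normal
shapiro_p <- tapply(toy$value, toy$group, function(v) shapiro.test(v)$p.value)
shapiro_p

# Bartlett test; the null hypothesis is equal variance across groups
bartlett_p <- bartlett.test(value ~ group, data = toy)$p.value
bartlett_p
```

A small p-value in either check is evidence against the corresponding assumption, which in turn informs whether a parametric or a rank-based test is the better fit.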
2. Declare null hypothesis \(\mathcal{H}_0\) (1 pt)
\(\mathcal{H}_0\) is that \(??\) does not explain \(??\)
3. Calculate test-statistic, exact p-value and plot (2 pt)
My interpretation of the result
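For reference, a test statistic and exact p-value can be pulled out of a base R test object as in the sketch below. It runs on simulated two-group data with hypothetical names (`toy`, `group`, `value`) and shows both a Welch t-test and a Wilcoxon rank-sum test; which one is appropriate for the mouse data depends on your answers in step 1.

```r
# Simulated stand-in data; `toy`, `group`, and `value` are hypothetical names
set.seed(1)
toy <- data.frame(
  group = rep(c("a", "b"), each = 50),
  value = c(rnorm(50, mean = 3, sd = 0.5), rnorm(50, mean = 3.3, sd = 0.5))
)

# Welch two-sample t-test (the default; does not assume equal variances)
tt <- t.test(value ~ group, data = toy)
tt$statistic  # t statistic
tt$p.value    # exact p-value

# Wilcoxon rank-sum test, a rank-based alternative when normality is doubtful
wt <- wilcox.test(value ~ group, data = toy)
wt$statistic  # W statistic
wt$p.value    # exact p-value
```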
# I have pre-selected some families to compare
myfams <- c(
  "B1.5:E1.4(4) B1.5:A1.4(5)",
  "F1.3:A1.2(3) F1.3:E2.2(3)",
  "A1.3:D1.2(3) A1.3:H1.2(3)",
  "D5.4:G2.3(4) D5.4:C4.3(4)"
)
# only keep the families in myfams
bfam <- b |>
  filter(family %in% myfams) |>
  droplevels()
# simplify family names and make factor
bfam$family <- gsub(pattern = "\\..*", replacement = "", x = bfam$family) |>
  as.factor()
# make B1 the reference (most similar to overall mean)
bfam$family <- relevel(x = bfam$family, ref = "B1")
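To see what the name simplification and releveling do, here is a small standalone sketch using three of the family strings from `myfams`. The variable names `fam_raw`, `fam`, and `f` are introduced only for this illustration.

```r
# Three of the family labels listed in myfams
fam_raw <- c(
  "B1.5:E1.4(4) B1.5:A1.4(5)",
  "F1.3:A1.2(3) F1.3:E2.2(3)",
  "A1.3:D1.2(3) A1.3:H1.2(3)"
)

# "\\..*" matches the first literal dot and everything after it,
# so only the leading family code survives
fam <- gsub(pattern = "\\..*", replacement = "", x = fam_raw)
fam        # "B1" "F1" "A1"

# factor levels are sorted alphabetically by default
f <- as.factor(fam)
levels(f)  # "A1" "B1" "F1"

# relevel() moves the chosen reference level to the front
f <- relevel(f, ref = "B1")
levels(f)  # "B1" "A1" "F1"
```

The reference level matters because model-based functions such as `lm()` report each group's effect relative to the first factor level.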
Problem # 2
Does mouse family explain mouse total cholesterol levels? Make sure to run the chunk above.
1. Examine and specify the variable(s) (1 pt)
The response variable y is \(??\)
The explanatory variable x is \(??\)
Make a plot: (2 pt)
response variable on the y-axis
explanatory variable on the x-axis
Get n, mean, median, sd (1 pt)
Is it normally distributed? (1 pt)
Answer here
Is the variance similar between groups? (1 pt)
Answer here
What kind of test are you picking and why? (1 pt)
Answer here
2. Declare null hypothesis \(\mathcal{H}_0\) (1 pt)
\(\mathcal{H}_0\) is that \(??\) does not explain \(??\)
3. Calculate test-statistic, exact p-value and plot (2 pt)
My interpretation of the result
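When the explanatory variable has more than two groups, the statistic and p-value can be extracted as in the sketch below. It uses simulated four-group data with hypothetical names (`toy`, `group`, `value`) and shows both a one-way ANOVA and a Kruskal-Wallis test; the choice between them for the family data depends on the assumption checks in step 1.

```r
# Simulated stand-in data with four groups; all names are hypothetical
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("g1", "g2", "g3", "g4"), each = 20)),
  value = rnorm(80, mean = rep(c(3, 3.2, 3.5, 3.1), each = 20), sd = 0.4)
)

# One-way ANOVA: the first row of the table holds the group term
fit <- aov(value ~ group, data = toy)
aov_tab <- summary(fit)[[1]]
aov_tab[["F value"]][1]  # F statistic
aov_tab[["Pr(>F)"]][1]   # exact p-value

# Kruskal-Wallis test, a rank-based alternative to one-way ANOVA
kw <- kruskal.test(value ~ group, data = toy)
kw$statistic  # Kruskal-Wallis chi-squared
kw$p.value    # exact p-value
```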