-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
v dplyr 1.1.3 v readr 2.1.4
v forcats 1.0.0 v stringr 1.5.0
v ggplot2 3.4.3 v tibble 3.2.1
v lubridate 1.9.2 v tidyr 1.3.0
v purrr 1.0.2
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: 'rstatix'
The following object is masked from 'package:stats':
filter
Attaching package: 'janitor'
The following object is masked from 'package:rstatix':
make_clean_names
The following objects are masked from 'package:stats':
chisq.test, fisher.test
here() starts at /Users/jayhesselberth/devel/rnabioco/molb-7950
Problem Set Stats Bootcamp - class 12
Hypothesis Testing
biochem <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/Biochemistry.txt", show_col_types = FALSE) |>
janitor::clean_names()
Warning in FUN(X[[i]], ...): unable to translate '<U+00C4>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00D6>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00DC>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00E4>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00F6>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00FC>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00DF>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00C6>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00E6>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00D8>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00F8>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00C5>' to native encoding
Warning in FUN(X[[i]], ...): unable to translate '<U+00E5>' to native encoding
# simplify names a bit more
colnames(biochem) <- gsub(pattern = "biochem_", replacement = "", colnames(biochem))
# we are going to simplify this a bit and only keep some columns
keep <- colnames(biochem)[c(1, 6, 9, 14, 15, 24:28)]
biochem <- biochem[, keep]
# get weights for each individual mouse
# careful: did not come with column names
weight <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/weight", col_names = F, show_col_types = FALSE)
# add column names
colnames(weight) <- c("subject_name", "weight")
# add weight to biochem table and get rid of NAs
# rename gender to sex
b <- inner_join(biochem, weight, by = "subject_name") |>
na.omit() |>
rename(sex = gender)
Problem # 1
Can mouse sex explain mouse cholesterol? {.smaller}
STEP 1: Null hypothesis and variable specification
\(\mathcal{H}_0:\)
?? is the response variable
?? is the explanatory variable
STEP 2: Fit linear model and examine results
Fit summary:
Coefficient summary:
Collecting residuals and other information
add residuals and other information
STEP 4: Visualize the error around fit
# plot of data with mean and colored by residuals
STEP 3: Visualize the error around the null (mean weight)
Plot the fit error and the null error as 2 panels
Calculate \(R^2\)
\(R^2 = 1 - \displaystyle \frac {SS_{fit}}{SS_{null}}\)
check agains Rsq in your fit