Problem Set Stats Bootcamp - class 12

Hypothesis Testing

Author

Neelanjan Mukherjee

Published

October 20, 2025

# packages attached for this problem set
# (janitor is loaded after rstatix, so janitor::make_clean_names() masks the rstatix version)
library(tidyverse)
library(rstatix)
library(janitor)
library(here)

biochem <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/Biochemistry.txt", show_col_types = FALSE) |>
  janitor::clean_names()

# simplify names a bit more
colnames(biochem) <- gsub(pattern = "biochem_", replacement = "", colnames(biochem))

# we are going to simplify this a bit and only keep some columns
keep <- colnames(biochem)[c(1, 6, 9, 14, 15, 24:28)]
biochem <- biochem[, keep]

# get weights for each individual mouse
# careful: did not come with column names
weight <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/weight", col_names = FALSE, show_col_types = FALSE)

# add column names
colnames(weight) <- c("subject_name", "weight")

# add weight to biochem table and get rid of NAs
# rename gender to sex
b <- inner_join(biochem, weight, by = "subject_name") |>
  na.omit() |>
  rename(sex = gender)

Problem #1

Can mouse sex explain mouse cholesterol?

STEP 1: Null hypothesis and variable specification

\(\mathcal{H}_0:\)

?? is the response variable

?? is the explanatory variable

STEP 2: Fit linear model and examine results

Fit summary:

Coefficient summary:

Collecting residuals and other information

add residuals and other information
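
As a rough illustration of the three parts of Step 2 (fit summary, coefficient summary, residuals), here is a minimal base-R sketch. It runs on simulated data, not on `b`: the data frame `toy`, its values, and the column name `tot_cholesterol` are assumptions made for illustration, so substitute `b` and whatever the cholesterol column is actually called. Base R is used only to keep the sketch free of package dependencies; `broom::glance()`, `broom::tidy()`, and `broom::augment()` return the same pieces as tibbles.

```r
# Simulated stand-in for `b` (hypothetical values, not the real mouse data).
set.seed(1)
toy <- data.frame(
  sex = rep(c("F", "M"), each = 50),
  tot_cholesterol = c(rnorm(50, mean = 3.0, sd = 0.5),
                      rnorm(50, mean = 3.4, sd = 0.5))
)

# Fit the linear model: response ~ explanatory variable.
fit <- lm(tot_cholesterol ~ sex, data = toy)

# Fit summary: overall R-squared and F statistic.
fit_summary <- summary(fit)
fit_summary$r.squared
fit_summary$fstatistic

# Coefficient summary: estimate, standard error, t value, p-value.
coef(fit_summary)

# Collect fitted values and residuals alongside the original data.
toy$fitted <- fitted(fit)
toy$resid <- residuals(fit)
head(toy)
```

With a two-level explanatory variable, the intercept is the mean of the reference group (here `F`) and the `sexM` coefficient is the difference between the two group means.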

STEP 3: Visualize the error around the fit

# plot of data with mean and colored by residuals

STEP 4: Visualize the error around the null (mean cholesterol)

Plot the fit error and the null error as 2 panels
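
One possible way to draw the two panels is to stack the data twice, once with the fitted group means as predictions and once with the overall mean, and then facet by model. The sketch below assumes `ggplot2` is installed and again uses the simulated `toy` data with a hypothetical `tot_cholesterol` column. The names `err_long`, `model`, and `predicted` are illustrative only.

```r
library(ggplot2)

# Simulated stand-in for `b` (hypothetical values, not the real mouse data).
set.seed(1)
toy <- data.frame(
  sex = rep(c("F", "M"), each = 50),
  tot_cholesterol = c(rnorm(50, mean = 3.0, sd = 0.5),
                      rnorm(50, mean = 3.4, sd = 0.5))
)
fit <- lm(tot_cholesterol ~ sex, data = toy)

# One row per observation per model: the prediction is the group mean
# under the fit and the overall mean under the null.
err_long <- rbind(
  data.frame(toy, model = "fit", predicted = unname(fitted(fit))),
  data.frame(toy, model = "null", predicted = mean(toy$tot_cholesterol))
)
err_long$resid <- err_long$tot_cholesterol - err_long$predicted

# Points colored by residual, with a dash marking each prediction.
p <- ggplot(err_long, aes(x = sex, y = tot_cholesterol)) +
  geom_jitter(aes(color = resid), width = 0.2, height = 0) +
  geom_point(aes(y = predicted), shape = 95, size = 12) +
  facet_wrap(~model) +
  scale_color_gradient2(low = "blue", mid = "grey80", high = "red") +
  labs(y = "total cholesterol", color = "residual")
p
```

Because both panels share the same y-axis, the comparison comes down to how far the points sit from the dash in each panel.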

Calculate \(R^2\)

\(R^2 = 1 - \displaystyle \frac {SS_{fit}}{SS_{null}}\)

Check against the \(R^2\) reported in your fit
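
The formula can be computed by hand and compared with the value `lm()` reports. In this sketch, \(SS_{fit}\) is taken to be the sum of squared residuals around the fitted group means and \(SS_{null}\) the sum of squared deviations from the overall mean. The data are simulated, and `toy` and `tot_cholesterol` are assumed names, not the real mouse data.

```r
# Simulated stand-in for `b` (hypothetical values, not the real mouse data).
set.seed(1)
toy <- data.frame(
  sex = rep(c("F", "M"), each = 50),
  tot_cholesterol = c(rnorm(50, mean = 3.0, sd = 0.5),
                      rnorm(50, mean = 3.4, sd = 0.5))
)
fit <- lm(tot_cholesterol ~ sex, data = toy)

# Error around the fit: squared distance from each group's fitted mean.
ss_fit <- sum(residuals(fit)^2)

# Error around the null: squared distance from the overall mean.
ss_null <- sum((toy$tot_cholesterol - mean(toy$tot_cholesterol))^2)

r_squared <- 1 - ss_fit / ss_null

# Compare the hand calculation with the value stored in the model summary.
all.equal(r_squared, summary(fit)$r.squared)
```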

Compare to traditional t-test
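
A linear model with a single two-level explanatory variable matches a two-sample Student's t-test that assumes equal variances. Base R's `t.test()` defaults to the Welch test, so `var.equal = TRUE` is needed for an exact match. The sketch below checks this on simulated data. `toy` and `tot_cholesterol` are assumed names, and base R is used here in place of the `rstatix` wrappers to avoid extra dependencies.

```r
# Simulated stand-in for `b` (hypothetical values, not the real mouse data).
set.seed(1)
toy <- data.frame(
  sex = rep(c("F", "M"), each = 50),
  tot_cholesterol = c(rnorm(50, mean = 3.0, sd = 0.5),
                      rnorm(50, mean = 3.4, sd = 0.5))
)
fit <- lm(tot_cholesterol ~ sex, data = toy)

# Two-sample t-test with pooled variance (Student's, not Welch's).
tt <- t.test(tot_cholesterol ~ sex, data = toy, var.equal = TRUE)

# p-value and t statistic for the sex coefficient in the linear model.
lm_p <- coef(summary(fit))["sexM", "Pr(>|t|)"]
lm_t <- coef(summary(fit))["sexM", "t value"]

# The p-values agree; the t statistics agree up to sign, because
# t.test() reports F minus M while the lm coefficient is M minus F.
all.equal(tt$p.value, lm_p)
all.equal(abs(unname(tt$statistic)), abs(lm_t))
```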

Provide your interpretation of the result