Problem Set Stats Bootcamp - class 12

Hypothesis Testing

Author

Neelanjan Mukherjee

Published

October 21, 2024

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: 'rstatix'


The following object is masked from 'package:stats':

    filter



Attaching package: 'janitor'


The following object is masked from 'package:rstatix':

    make_clean_names


The following objects are masked from 'package:stats':

    chisq.test, fisher.test



biochem <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/Biochemistry.txt", show_col_types = FALSE) |>
  janitor::clean_names()

# simplify names a bit more
colnames(biochem) <- gsub(pattern = "biochem_", replacement = "", colnames(biochem))

# we are going to simplify this a bit and only keep some columns
keep <- colnames(biochem)[c(1, 6, 9, 14, 15, 24:28)]
biochem <- biochem[, keep]

# get weights for each individual mouse
# careful: the file has no header row, so tell read_tsv not to use the first line as names
weight <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/weight", col_names = FALSE, show_col_types = FALSE)

# add column names
colnames(weight) <- c("subject_name", "weight")

# add weight to biochem table and get rid of NAs
# rename gender to sex
b <- inner_join(biochem, weight, by = "subject_name") |>
  na.omit() |>
  rename(sex = gender)

Problem # 1

Does mouse sex explain mouse total cholesterol levels? Make sure to run the chunks above first.

1. Examine and specify the variable(s) (1 pt)

The response variable y is \(??\)
The explanatory variable x is \(??\)

Make a violin plot: (2 pt)

response variable on the y-axis

explanatory variable on the x-axis

Get n, mean, median, sd (1 pt)

Is it normally distributed? (1 pt)

Answer here

Is the variance similar between groups? (1 pt)

Answer here

What kind of test are you picking and why? (1 pt)

Answer here
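The checks in step 1 (group summaries, normality, equal variance) can be sketched on toy data. This is a minimal illustration, not the answer: it uses the built-in `ToothGrowth` data rather than the mouse table, sticks to base R so it runs without extra packages, and swaps in Bartlett's test as a stand-in for a Levene-type test. The names `by_group`, `toy_summary`, `normality_p`, and `variance_check` are made up for this example.

```r
# Toy data only: ToothGrowth ships with base R and is NOT the mouse data.
# response = len (numeric), explanatory = supp (two groups: OJ, VC)
data(ToothGrowth)

# n, mean, median, and sd of the response within each group
by_group <- split(ToothGrowth$len, ToothGrowth$supp)
toy_summary <- do.call(rbind, lapply(by_group, function(v) {
  data.frame(n = length(v), mean = mean(v), median = median(v), sd = sd(v))
}))
print(toy_summary)

# Shapiro-Wilk normality test per group; a small p-value suggests non-normality
normality_p <- sapply(by_group, function(v) shapiro.test(v)$p.value)
print(normality_p)

# Bartlett test of equal variances across groups
# (assumption: used here as a base-R stand-in for a Levene-type test)
variance_check <- bartlett.test(len ~ supp, data = ToothGrowth)
print(variance_check$p.value)
```

With the packages loaded at the top of this page, `get_summary_stats()`, `shapiro_test()`, and `levene_test()` from rstatix should give pipe-friendly equivalents. Bartlett's test is itself sensitive to non-normality, so read it together with the normality check.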

2. Declare null hypothesis \(\mathcal{H}_0\) (1 pt)

\(\mathcal{H}_0\) is that \(??\) does not explain \(??\)

3. Calculate test-statistic, exact p-value and plot (2 pt)

My interpretation of the result
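For a two-group comparison like this one, the test statistic and exact p-value can be pulled out of the test object. The sketch below again uses the toy `ToothGrowth` data, not the mouse data, and shows both a parametric and a nonparametric option side by side. Which one fits the cholesterol data depends on your answers in step 1. The object names `welch` and `ranksum` are illustrative.

```r
# Toy data only: response = len, explanatory = supp (OJ vs VC)
data(ToothGrowth)

# Welch's two-sample t-test: parametric, does not assume equal variances
welch <- t.test(len ~ supp, data = ToothGrowth)

# Wilcoxon rank-sum test: nonparametric alternative
# exact = FALSE uses the normal approximation, which avoids a warning about ties
ranksum <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)

# Test statistic and exact p-value from each test
print(c(t = unname(welch$statistic), p = welch$p.value))
print(c(W = unname(ranksum$statistic), p = ranksum$p.value))
```

rstatix offers `t_test()` and `wilcox_test()` as pipe-friendly versions that return a tibble, which may be more convenient when adding the p-value to a plot.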

# I have pre-selected some families to compare
myfams <- c(
  "B1.5:E1.4(4) B1.5:A1.4(5)",
  "F1.3:A1.2(3) F1.3:E2.2(3)",
  "A1.3:D1.2(3) A1.3:H1.2(3)",
  "D5.4:G2.3(4) D5.4:C4.3(4)"
)

# only keep the families in myfams
bfam <- b |>
  filter(family %in% myfams) |>
  droplevels()

# simplify family names and make factor
bfam$family <- gsub(pattern = "\\..*", replacement = "", x = bfam$family) |>
  as.factor()


# make B1 the reference (most similar to overall mean)
bfam$family <- relevel(x = bfam$family, ref = "B1")
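To see what the name simplification and releveling in the chunk above do, here they are applied to the four family strings on their own, without downloading any data. The variable names `fam_raw`, `fam_short`, and `fam_factor` are made up for this illustration.

```r
# The four family labels from myfams, as plain strings
fam_raw <- c(
  "B1.5:E1.4(4) B1.5:A1.4(5)",
  "F1.3:A1.2(3) F1.3:E2.2(3)",
  "A1.3:D1.2(3) A1.3:H1.2(3)",
  "D5.4:G2.3(4) D5.4:C4.3(4)"
)

# "\\..*" matches the first literal dot and everything after it,
# so only the prefix before the first dot survives
fam_short <- gsub(pattern = "\\..*", replacement = "", x = fam_raw)
print(fam_short)

# as.factor() orders levels alphabetically; relevel() moves B1 to the front
fam_factor <- relevel(as.factor(fam_short), ref = "B1")
print(levels(fam_factor))
```

The first level of a factor is the reference group in R's default treatment contrasts, so model coefficients for the other families are reported relative to B1.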

Problem # 2

Does mouse family explain mouse total cholesterol levels? Make sure to run the chunk above first.

1. Examine and specify the variable(s) (1 pt)

The response variable y is \(??\)
The explanatory variable x is \(??\)

Make a plot: (2 pt)

response variable on the y-axis

explanatory variable on the x-axis

Get n, mean, median, sd (1 pt)

Is it normally distributed? (1 pt)

Answer here

Is the variance similar between groups? (1 pt)

Answer here

What kind of test are you picking and why? (1 pt)

Answer here

2. Declare null hypothesis \(\mathcal{H}_0\) (1 pt)

\(\mathcal{H}_0\) is that \(??\) does not explain \(??\)

3. Calculate test-statistic, exact p-value and plot (2 pt)

My interpretation of the result
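With more than two groups, the comparison becomes a one-way test across all groups at once. The sketch below uses the built-in `PlantGrowth` data (three groups) as a stand-in for the four mouse families and shows a parametric and a nonparametric option. It is an illustration under those assumptions, not the expected answer. The names `fit`, `aov_table`, `f_stat`, `p_anova`, and `kw` are hypothetical.

```r
# Toy data only: response = weight, explanatory = group (ctrl, trt1, trt2)
data(PlantGrowth)

# One-way ANOVA: parametric, assumes normal residuals and similar variances
fit <- aov(weight ~ group, data = PlantGrowth)
aov_table <- summary(fit)[[1]]

# The first row of the ANOVA table holds the group effect
f_stat <- aov_table[["F value"]][1]
p_anova <- aov_table[["Pr(>F)"]][1]
print(c(F = f_stat, p = p_anova))

# Kruskal-Wallis rank-sum test: nonparametric alternative
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
print(c(chi_squared = unname(kw$statistic), p = kw$p.value))
```

rstatix provides `anova_test()` and `kruskal_test()` as pipe-friendly counterparts. Either way, a small p-value only says that at least one group differs, not which one.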