Problem Set Stats Bootcamp - class 12

Hypothesis Testing

Author

Neelanjan Mukherjee

Published

October 21, 2024

library(tidyverse)
library(rstatix)
library(janitor)
library(here)
biochem <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/Biochemistry.txt", show_col_types = FALSE) |>
  janitor::clean_names()
# simplify names a bit more
colnames(biochem) <- gsub(pattern = "biochem_", replacement = "", colnames(biochem))

# we are going to simplify this a bit and only keep some columns
keep <- colnames(biochem)[c(1, 6, 9, 14, 15, 24:28)]
biochem <- biochem[, keep]

# get weights for each individual mouse
# careful: did not come with column names
weight <- read_tsv("http://mtweb.cs.ucl.ac.uk/HSMICE/PHENOTYPES/weight", col_names = FALSE, show_col_types = FALSE)

# add column names
colnames(weight) <- c("subject_name", "weight")

# add weight to biochem table and get rid of NAs
# rename gender to sex
b <- inner_join(biochem, weight, by = "subject_name") |>
  na.omit() |>
  rename(sex = gender)

Problem # 1

Is there an association between mouse calcium and sodium levels?

1. Make a scatterplot to inspect the variables

2. Are they normal (enough)?

Which test will you use and why?
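As a rough illustration of how normality can be inspected, the sketch below runs a Shapiro-Wilk test and draws Q-Q plots on simulated data. The vectors `x_norm` and `x_skew` are hypothetical stand-ins, not columns of the mouse data, and base R is used so the example runs on its own.

```r
# Simulated stand-in data; x_norm and x_skew are hypothetical names
set.seed(42)
x_norm <- rnorm(200, mean = 2.4, sd = 0.1)  # drawn from a normal distribution
x_skew <- rexp(200, rate = 1)               # strongly right-skewed

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
sw_norm <- shapiro.test(x_norm)
sw_skew <- shapiro.test(x_skew)

sw_norm$p.value
sw_skew$p.value  # a small p-value is evidence against normality

# Q-Q plots: points that follow the reference line suggest approximate normality
qqnorm(x_norm, main = "x_norm")
qqline(x_norm)
qqnorm(x_skew, main = "x_skew")
qqline(x_skew)
```

If both variables look normal enough, a Pearson correlation is a reasonable choice; otherwise a rank-based method such as Spearman is the usual fallback.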

3. Declare null hypothesis \(\mathcal{H}_0\)

\(\mathcal{H}_0\) is that there is no dependency/association between \(calcium\) and \(sodium\)

4. Calculate and plot the correlation on a scatterplot
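A minimal sketch of calculating a correlation and showing it on a scatterplot, using simulated values rather than the mouse data. The data frame `sim`, its values, and the built-in linear dependence are all made up for illustration.

```r
# Simulated data; the values and the dependence between columns are made up
set.seed(1)
n <- 100
calcium <- rnorm(n, mean = 2.4, sd = 0.15)
sodium <- 120 + 10 * calcium + rnorm(n, sd = 2)
sim <- data.frame(calcium, sodium)

# Pearson assumes roughly normal variables;
# method = "spearman" is the rank-based alternative
ct <- cor.test(sim$calcium, sim$sodium, method = "pearson")
r <- unname(ct$estimate)

# Scatterplot with the correlation and its p-value in the title
plot(
  sim$calcium, sim$sodium,
  xlab = "calcium", ylab = "sodium",
  main = sprintf("Pearson r = %.2f, p = %.2g", r, ct$p.value)
)
```

The same call pattern should carry over to the real table by swapping `sim` for `b`, assuming the `calcium` and `sodium` columns are among those kept.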

Problem # 2

Do mouse calcium levels explain mouse sodium levels? If so, to what extent?

Use a linear model to do the following:

1. Specify the Response and Explanatory variables — (2 pts)

The response variable y is …

The explanatory variable x is …

2. Declare the null hypothesis — (1 pt)

The null hypothesis is …

3. Use the lm function to create a fit (linear model)

Also save the slope and intercept for later.

4. Add residuals to the data and create a plot visualizing the residuals

5. Calculate the \(R^2\) and compare to \(R^2\) from fit

\(R^2 = 1 - \displaystyle \frac {SS_{fit}}{SS_{null}}\)

\(SS_{fit} = \sum_{i=1}^{n} (data - line)^2 = \sum_{i=1}^{n} (y_{i} - (\beta_0 \cdot 1 + \beta_1 \cdot x_{i}))^2\)

\(SS_{null}\): sum of squared errors around the mean of \(y\)

\(SS_{null} = \sum_{i=1}^{n} (data - mean)^2 = \sum_{i=1}^{n} (y_{i} - \overline{y})^2\)
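The formulas above translate almost line for line into R. The sketch below uses simulated `x` and `y` (hypothetical data, not the mouse measurements) to compute \(SS_{fit}\), \(SS_{null}\), and \(R^2\) by hand, then compares the result with the value reported by `summary()`.

```r
# Simulated data; x and y are hypothetical stand-ins for the real variables
set.seed(7)
n <- 50
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 4)

# Fit the linear model and save the intercept (beta_0) and slope (beta_1)
fit <- lm(y ~ x)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["x"]]

# SS_fit: squared distances between the data and the fitted line
ss_fit <- sum((y - (b0 + b1 * x))^2)

# SS_null: squared distances between the data and the mean of y
ss_null <- sum((y - mean(y))^2)

# R^2 by hand versus R^2 from the fit
r2_manual <- 1 - ss_fit / ss_null
r2_lm <- summary(fit)$r.squared
all.equal(r2_manual, r2_lm)
```

Note that `ss_fit` is the same quantity as `sum(residuals(fit)^2)`, which links this calculation to the residuals added in step 4.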

6. Using \(R^2\), describe the extent to which calcium explains sodium levels

7. Report (do not calculate) the \(p\)-value and your decision on the null

The null hypothesis is …

Calcium levels can / cannot be used to predict sodium levels (choose one based on your decision).
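One way to report the \(p\)-value without calculating it is to read it from the coefficient table of the fitted model. The sketch below does this on simulated data; `x`, `y`, and the significance threshold `alpha = 0.05` are assumptions made for illustration.

```r
# Simulated data; x and y are hypothetical stand-ins for the real variables
set.seed(7)
x <- runif(50, 0, 10)
y <- 3 + 2 * x + rnorm(50, sd = 4)
fit <- lm(y ~ x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# p-value for the slope, testing the null hypothesis that the slope is zero
p_slope <- coefs["x", "Pr(>|t|)"]

# Decision at an assumed significance level of 0.05
alpha <- 0.05
decision <- if (p_slope < alpha) "reject the null" else "fail to reject the null"
decision
```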

Problem # 3

What is the association between mouse weight and age for each sex?

1. Calculate the Pearson correlation coefficient between weight and age for females and males

2. Describe your observations

The relationship between weight and age…
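A per-group correlation can be sketched by splitting the data on the grouping column and applying `cor()` to each piece. The example below uses a simulated data frame `sim` with made-up `sex`, `age`, and `weight` values, so the numbers say nothing about the real mice.

```r
# Simulated data; the sex-specific slopes are invented for illustration
set.seed(3)
n <- 60
sim <- data.frame(
  sex = rep(c("F", "M"), each = n),
  age = runif(2 * n, 50, 100)
)
sim$weight <- ifelse(
  sim$sex == "F",
  15 + 0.05 * sim$age,
  18 + 0.10 * sim$age
) + rnorm(2 * n, sd = 1.5)

# Split by sex, then compute the Pearson correlation within each group
r_by_sex <- sapply(
  split(sim, sim$sex),
  function(d) cor(d$weight, d$age, method = "pearson")
)
r_by_sex
```

With the tidyverse loaded, the same result can be obtained with `group_by(sex)` followed by `summarize(r = cor(weight, age))`.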