2025-10-20
Uploading Raw Data to Repositories
Git & GitHub: Why Version Control Matters
High-Performance Computing with Slurm
Starting an Informatics Analysis
Building Interactive Shiny Dashboards
Reproducibility & Transparency
Long-term Preservation
Best for: Genomics data (RNA-seq, ChIP-seq, microarrays, etc.)
Key Features:
What to Upload:
Resources:
Best for: General research data, code, manuscripts, protocols
Key Features:
What to Upload:
Pro tip: Connect your GitHub repository to Zenodo so each tagged release is automatically archived and assigned a DOI (see the sketch below)
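A rough sketch of that release flow, assuming the repository has already been enabled on Zenodo's GitHub settings page and that the GitHub CLI (gh) is installed; the tag name and release notes are placeholders:

# Tag the state of the code you want archived, then push the tag
git tag -a v1.0.0 -m "Code version used in the manuscript"
git push origin v1.0.0
# Publish a GitHub release for that tag (this is what Zenodo reacts to)
gh release create v1.0.0 --title "v1.0.0" --notes "Archived for publication"

Once Zenodo sees the release, it archives a snapshot of the repository and mints a citable, versioned DOI.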
Resources:
| Feature | NCBI GEO | Zenodo |
|---|---|---|
| Best for | Genomics data | General data/code |
| File types | FASTQ, BAM, etc. | Any |
| Metadata | Structured, genomics-focused | Flexible |
| DOI | No (uses accession) | Yes |
| Size limit | Large files OK | 50 GB per dataset |
| Journal preference | Required for genomics | Accepted for supplements |
Strategy: Use GEO for raw genomics data, Zenodo for everything else
The Problem Without Git:
analysis_final.R
analysis_final_v2.R
analysis_final_v2_actually_final.R
analysis_final_v2_actually_final_USE_THIS.R
Collaboration nightmares:
What Git Does:
GitHub: Social Network for Git
Reproducibility:
Collaboration:
Experimentation:
Backup:
Repository (repo): Project folder tracked by Git
Commit: Snapshot of your project at a point in time
Branch: Parallel version of your code
Remote: Online copy (GitHub, GitLab)
Common Workflow:
git add (stage your changes)
git commit (record a snapshot)
git push (upload commits to the remote)
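A minimal command-line sketch of that workflow; the file name, commit message, and branch name are placeholders, and it assumes a remote named origin is already configured:

git add scripts/03_analysis.R            # stage the files you changed
git commit -m "Add differential expression analysis script"   # record a snapshot
git push origin main                      # upload commits to the remote (GitHub)
git switch -c test-new-normalization      # optional: experiment on a separate branch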
Essential Resources:
Practice:
Commit Messages:
What to Track:
Use .gitignore to exclude unwanted files
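As a sketch, a starter .gitignore for a project laid out like the structure later in these notes; the patterns are illustrative, written as a heredoc so it can be pasted into a terminal:

cat > .gitignore <<'EOF'
# Large or regenerable data
data/raw/
results/figures/*.png
# R session artifacts
.Rhistory
.RData
.Rproj.user/
EOF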
Your laptop is great, but…
HPC Clusters provide:
Access: Available to CU Boulder researchers
Resources:
Getting Started:
Documentation: https://curc.readthedocs.io/
What is Slurm?
Why not just run directly?
#!/bin/bash
#SBATCH --job-name=rnaseq_align
#SBATCH --nodes=1
#SBATCH --ntasks=8 # 8 CPU cores
#SBATCH --mem=32G # 32 GB RAM
#SBATCH --time=04:00:00 # 4 hours max
#SBATCH --output=align_%j.out
#SBATCH --error=align_%j.err
# Load required modules
module load star/2.7.10
# Run analysis
STAR --genomeDir /path/to/genome \
     --readFilesIn sample.fastq.gz \
     --readFilesCommand zcat \
     --runThreadN 8 \
     --outFileNamePrefix output_
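Assuming the script above is saved as align.sbatch (a placeholder name), submitting and monitoring it uses standard Slurm commands:

sbatch align.sbatch        # submit the job script to the scheduler
squeue -u $USER            # check your queued and running jobs
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State   # resource usage after it finishes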
Documentation:
Getting Help:
Best Practices:
Use /scratch for temporary files
Make Your Data Interactive:
Use Cases:
What is Shiny?
Two Main Components:
Reactivity: Outputs automatically update when inputs change
Palmer Penguins: Perfect dataset for learning Shiny
We’ll Build:
library(shiny)
library(tidyverse)
library(palmerpenguins)
ui <- fluidPage(
titlePanel("Palmer Penguins Explorer"),
sidebarLayout(
sidebarPanel(
# Inputs go here
),
mainPanel(
# Outputs go here
)
)
)
server <- function(input, output, session) {
# Reactive logic goes here
}
shinyApp(ui = ui, server = server)
ui <- fluidPage(
titlePanel("Palmer Penguins Explorer"),
sidebarLayout(
sidebarPanel(
selectInput("x_var", "X-axis variable:",
choices = c("bill_length_mm", "bill_depth_mm",
"flipper_length_mm", "body_mass_g")),
selectInput("y_var", "Y-axis variable:",
choices = c("bill_length_mm", "bill_depth_mm",
"flipper_length_mm", "body_mass_g"),
selected = "bill_depth_mm"),
checkboxGroupInput("species", "Select species:",
choices = c("Adelie", "Chinstrap", "Gentoo"),
selected = c("Adelie", "Chinstrap", "Gentoo"))
),
mainPanel(
plotOutput("scatter_plot"),
tableOutput("summary_table")
)
)
)
server <- function(input, output, session) {
# Reactive filtered data
filtered_data <- reactive({
penguins |>
filter(species %in% input$species) |>
drop_na()
})
# Scatter plot
output$scatter_plot <- renderPlot({
ggplot(filtered_data(),
aes(x = .data[[input$x_var]],
y = .data[[input$y_var]],
color = species)) +
geom_point(size = 3, alpha = 0.7) +
labs(x = input$x_var, y = input$y_var) +
theme_minimal()
})
# Summary table
output$summary_table <- renderTable({
filtered_data() |>
group_by(species) |>
summarize(n = n(), .groups = "drop")
})
}
Add More Interactivity:
# In UI sidebarPanel:
sliderInput("point_size", "Point size:",
min = 1, max = 5, value = 3),
checkboxInput("show_smooth", "Show trend line", FALSE),
downloadButton("download_plot", "Download Plot")
# In server (assumes the plot is also stored in a reactive named current_plot()):
output$download_plot <- downloadHandler(
filename = function() {
paste0("penguins_plot_", Sys.Date(), ".png")
},
content = function(file) {
ggsave(file, plot = current_plot(),
width = 8, height = 6)
}
)
Development Workflow:
Debugging Tips:
Use print() or browser() inside the server function
Enable the reactive log with options(shiny.reactlog = TRUE)
Options:
Quick Deploy to shinyapps.io:
Getting Started:
For Bioinformatics:
Practice: Start simple, add features incrementally
I can help you:
Example prompts:
“Create a Shiny app to visualize my RNA-seq results with a volcano plot”
“Add a download button for the filtered data table”
“Why isn’t my plot updating when I change the input?”
Note: For running apps, use the Shiny Assistant (@shiny)
Recommended Project Organization:
my_project/
├── README.md # Project overview
├── data/
│ ├── raw/ # Original, untouched data
│ └── processed/ # Cleaned, filtered data
├── scripts/
│ ├── 01_download_data.sh
│ ├── 02_quality_control.R
│ └── 03_analysis.R
├── results/
│ ├── figures/
│ └── tables/
├── docs/ # Documentation, notes
└── environment/ # Conda/renv files
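One way to create that skeleton from the shell (a sketch; directory names follow the tree above):

mkdir -p my_project/{data/{raw,processed},scripts,results/{figures,tables},docs,environment}
touch my_project/README.md            # start the project overview
chmod -R a-w my_project/data/raw      # guard raw data against accidental edits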
Before You Begin:
First Analysis Steps:
What is Positron?
How I Can Help You Start:
Ask Positron Assistant:
“I have raw RNA-seq FASTQ files. Help me set up a project and write a quality control script using FastQC”
I can help with:
Remember: I can see your files, variables, and session info - share context!
Tips for Better Assistance:
I can help with:
You: “I need to set up a new RNA-seq project with data from GEO accession GSE123456”
I provide:
You can then ask: “Now write a FastQC script for these files”
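As a hedged sketch of what the download step might look like for that example accession: the GEO FTP path pattern below is standard, but the actual supplementary file names should be checked on the GEO record, and raw FASTQ files usually live in the linked SRA accessions (retrieved with the SRA Toolkit) rather than on GEO itself.

#!/bin/bash
# 01_download_data.sh -- illustrative sketch only
ACC=GSE123456
mkdir -p data/raw
# GEO series supplementary files sit under .../series/GSE123nnn/GSE123456/suppl/
wget -r -np -nd -P data/raw \
  "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/${ACC}/suppl/"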
Complete Workflow:
Every step is reproducible and documented!
Data Repositories:
Git & GitHub:
CU Alpine:
Positron:
Key Takeaways:
Next Steps: