MOLB 7950 Syllabus

Course Overview

MOLB 7950 is a hands-on tutorial of skills and theory needed to process, analyze, and visualize output from large biological data sets. We emphasize the R statistical computing environment.

🗓️ Class will run from Aug 26 - Oct 30

📍 Classes will be held in-person in a location to be determined

🕘 Class time is 9:00-10:30am

MOLB 7950 is a three credit hour course.

The course is divided into blocks:

Bootcamp

The Bootcamp block covers R programming and introduces important statistical concepts and approaches. We will also cover data types you will encounter during biological data analysis and approaches for their analysis.

During the Bootcamp block, we will meet every day for 90 minutes to cover fundamental concepts you will need throughout the course.

Experimental blocks

After Bootcamp, we will cover experimental approaches used to analyze DNA and RNA. Each block spans ~4 weeks, with each week focused on a particular type of experiment (see below). Each block covers the statistical concepts needed for rigorous analysis, as well as approaches for processing raw data into results (tables and figures) using reproducible coding techniques.

In most weeks we will discuss and analyze data from a publication. You are responsible for reading the week’s material before class begins on Monday.

Block experiments

  1. The DNA block covers genome sequencing for identifying mutations, and two approaches for analyzing chromatin state (ChIP-seq and MNase-seq).

  2. The RNA block covers RNA-seq, alternative splicing, differential gene expression, and RNA:protein interactions.

Schedule

Classes begin on August 26 and end on October 30. Dates are from the Fall 2024 Academic Calendar.

During the Bootcamp block, classes will be held every day, Mon-Fri from 9:00-10:30am.

During the DNA & RNA blocks, we will have in-class exercises and discussion on Mon-Wed-Fri 9:00-10:30am.

Location

Classes will be held in-person in a room TBD. All classes will be recorded and made available through Canvas.

Policies

Attendance

Class attendance is a firm expectation; frequent absences or tardiness are considered cause for a grade reduction.

If you are sick, please let us know (e-mail Srinivas and Matt) and stay home.

Anticipated absences outside of sickness should be reported to the instructors of a given block as soon as possible to make plans for possible accommodation.

We will record all lectures on Panopto and they will be available online through Canvas.

Late and missed work

We have a late work policy for homework assignments:

  • If a problem set is late but submitted within 24 hours of the due date/time, the grade will be reduced by 50%.

  • If a problem set is submitted any later, no credit will be given.

  • All regrade requests must be discussed with the professor within one week of receiving your grade. There will be no grade changes after the final project.
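As an illustration only, the late work policy above can be sketched as a small function. The function name, the example dates, and the treatment of a submission exactly 24 hours late (counted here as still within the 24-hour window) are assumptions, not part of the official policy.

```python
from datetime import datetime, timedelta


def late_adjusted_score(raw_score: float, due: datetime, submitted: datetime) -> float:
    """Apply the late work policy to a raw problem set score (hypothetical helper)."""
    if submitted <= due:
        return raw_score  # on time: full credit
    if submitted - due <= timedelta(hours=24):
        return raw_score * 0.5  # late, but within 24 hours: grade reduced by 50%
    return 0.0  # more than 24 hours late: no credit


# Hypothetical example: a problem set due on a Tuesday at 5pm
due = datetime(2024, 8, 27, 17, 0)
on_time = late_adjusted_score(20, due, due - timedelta(hours=1))
one_hour_late = late_adjusted_score(20, due, due + timedelta(hours=1))
two_days_late = late_adjusted_score(20, due, due + timedelta(days=2))
```

Under these assumptions, a 20-point problem set keeps all 20 points when on time, drops to 10 points when one hour late, and earns 0 points when two days late.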

Diversity & Inclusiveness

Our view is that students from diverse backgrounds and perspectives will be well served by this course, that students’ learning needs should be addressed both in and out of class, and that the diversity students bring to this class is a resource, strength, and benefit.

Disability Policy

Students with disabilities who need accommodations are encouraged to contact the Office of Disability, Access & Inclusion as soon as possible to ensure that accommodations are implemented in a timely fashion.

Honor code

Academic dishonesty will not be tolerated and is grounds for dismissal from the class with a failing grade (“F”). For other information, please consult the Graduate Student Handbook.

ChatGPT will probably be able to answer most coding questions you ask of it. While it is useful for fleshing out an initial approach from pseudocode, we do not recommend using it, because developing these conceptual approaches yourself is an essential foundation for building expertise in bioinformatic analysis.

Problem Sets

  • Problem sets will be assigned at the end of each class.

  • You can use external resources but must explicitly cite where you have obtained code (both code you used directly and “paraphrased” code / code used as inspiration). Any reused code that is not explicitly cited will be treated as plagiarism.

  • You can discuss the content of assignments with others in this class. If you do so, you must acknowledge your collaborator(s) at the top of your assignment, for example: “Collaborators: Hillary and Bernie”. Failure to acknowledge collaborators will result in a grade of 0. You may not copy code and/or answers directly from another student. If you copy other work, both parties will receive a grade of 0.

  • The problem set with the lowest score for each student will be dropped.

  • Rather than copying someone’s work, ask for help. You are not alone in this course!

Professionalism

  • Please refrain from texting or using your computer for anything other than coursework during class.

Assignments and Grading

The course measures learning through daily problem sets, a final project, and your participation.

Type | % of grade
Problem Sets | 60
Final Project | 20
Participation | 20

Grades will be assigned as follows:

Percent total points | Grade
>= 95 | A
>= 90 | A-
>= 85 | B+
>= 80 | B
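As a rough sketch of the arithmetic, the weights and cutoffs above can be combined as follows. This assumes each component is expressed as a percentage (0-100), that problem sets are weighted equally after the lowest score is dropped, and that no rounding is applied. None of these details are specified in this syllabus, and all function names are hypothetical.

```python
# Weights and cutoffs copied from the tables above
WEIGHTS = {"problem_sets": 0.60, "final_project": 0.20, "participation": 0.20}
CUTOFFS = [(95, "A"), (90, "A-"), (85, "B+"), (80, "B")]


def problem_set_percent(scores):
    """Mean problem set percentage after dropping the single lowest score."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop the lowest")
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


def course_percent(problem_sets, final_project, participation):
    """Weighted course percentage from the three graded components."""
    parts = {
        "problem_sets": problem_sets,
        "final_project": final_project,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


def letter_grade(percent):
    """Map a course percentage to a letter grade using the listed cutoffs."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return None  # the table lists no grade below 80


# Hypothetical student: four problem sets, a final project, and participation
ps = problem_set_percent([100, 90, 80, 50])  # the 50 is dropped
total = course_percent(ps, final_project=85, participation=100)
grade = letter_grade(total)
```

In this hypothetical case the problem set average is 90 after the drop, the weighted total is about 91, and the resulting grade is A-.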

Problem sets

We reinforce concepts with problem sets assigned at the end of class that should take ~60 minutes to complete.

Problem sets assigned on Friday will be more substantial, requiring ~1-2 hours to complete.

Together the problem sets constitute 60% of your grade.

Assigned | Due | Grades By | Who grades | Time to complete (approx)
Mon @ 12pm | Tues @ 5pm | Wed @ 5pm | Instructors / TAs | 60 min
Tue @ 12pm | Wed @ 5pm | Thurs @ 5pm | Instructors / TAs | 60 min
Wed @ 12pm | Thurs @ 5pm | Fri @ 5pm | Instructors / TAs | 60 min
Thurs @ 12pm | Fri @ 5pm | Tues @ 5pm | Instructors / TAs | 60 min
Fri @ 12pm | Mon @ 5pm | Wed @ 5pm | Instructors / TAs | 1-2 hr

Final projects

Final projects can be completed in groups of 1-3 people. Projects will involve analysis of existing public data sets and end with a short presentation during the last week of class. The final project constitutes 20% of your grade.

Grading Rubrics

Problem Set Rubric

Problem sets are worth 60% of your grade. Values in parentheses represent point values for each level from 20 points total. This rubric will be assessed at the end of the semester.

Criteria | Expert | Competent | Needs Improvement
Coding style | Student has gone beyond what was expected and required, coding manual is followed, code is well commented | Coding style lacks refinement and has some errors, but code is readable and has some comments | Many errors in coding style, little attention paid to making the code human readable
Coding strategy | Complicated problem broken down into sub-problems that are individually much simpler. Code is efficient, correct, and minimal. Code uses appropriate data structure (list, data frame, vector/matrix/array). Code checks for common errors | Code is correct, but could be edited down to leaner code. Some “hacking” instead of using suitable data structure. Some checks for errors. | Code tackles complicated problem in one big chunk. Code is repetitive and could easily be functionalized. No anticipation of errors.
Presentation: graphs | Graph(s) carefully tuned for desired purpose. One graph illustrates one point | Graph(s) well chosen, but with a few minor problems: inappropriate aspect ratios, poor labels. | Graph(s) poorly chosen to support questions.
Presentation: tables | Table(s) carefully constructed to make it easy to perform important comparisons. Careful styling highlights important features. | Table(s) generally appropriate but possibly some minor formatting deficiencies. | Table(s) with too many, or inconsistent, decimal places. Table(s) not appropriate for questions and findings. Major display problems.
Achievement, mastery, cleverness, creativity | Student has gone beyond what was expected and required, e.g., extraordinary effort, additional tools not addressed by this course, unusually sophisticated application of tools from course. | Tools and techniques from the course are applied very competently and, perhaps, somewhat creatively. Chosen task was acceptable, but fairly conservative in ambition. | Student does not display the expected level of mastery of the tools and techniques in this course. Chosen task was too limited in scope.
Ease of access for instructor, compliance with course conventions for submitted work | Access as easy as possible, code runs! | Satisfactory | Not an earnest effort to reduce friction and comply with conventions and/or code does not run

Participation rubric

Attendance & participation is worth 20% of your grade. Values in parentheses represent point values for each level from 20 points total. This rubric will be assessed at the end of the semester.

Criteria | Expert | Competent | Needs improvement
Attendance (physically present for class, or coordinating with instructor when absent) | Attends class regularly (5) | Attends most classes (4) | Attends some classes (0-3)
Preparation (activities required for in-class participation, like surveys and software installation) | Completes requested activities prior to class (5) | Completes most requested activities prior to class, sometimes needs to finish during class (4) | Rarely completes requested activities prior to class, often takes class time to complete (0-3)
Engagement (in-class activities like coding exercises and discussion) | Actively engages in class activities (10) | Sometimes engages in class activities (8) | Doesn’t engage in class activities (0-7)

Acknowledgements & Attribution

Instructor contributions

Several people have contributed to course development over the past several years.

  • Sujatha Jagannathan contributed the original R bootcamp material.
  • Srinivas Ramachandran contributed material for the DNA block, including lecture material and examples for yeast chromatin accessibility and factor mapping.
  • Matt Taliaferro contributed material for the RNA block, including lecture material and examples for RNA expression and splicing analysis.
  • Kent Riemondy and Kristen Wells contributed material for single-cell RNA sequencing.
  • Jay Hesselberth and Neel Mukherjee revamped much of this material in Fall 2023.

External resources

We have borrowed from several (open licensed) resources for course content, including: