Visualization & (un)supervised analysis

Author

Harm Nijveen

III: Visualisation and unsupervised analysis

The goals for week 3

In the third week, the goals are:

  1. Use classed objects
  2. Use visualizations to explore large data sets
  3. Apply clustering methods
  4. Analyse a data set using PCA
  5. Understand differential gene expression
Tip 1.1: What are classed objects?

In week 2 you already encountered object-oriented programming (OOP). As Hadley Wickham puts it in Advanced R: “A class defines the behavior of objects by describing their attributes and their relationship to other classes. The class is also used when selecting methods, functions that behave differently depending on the class of their input.”
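To make the quote concrete, here is a minimal sketch using R's S3 system. The class name `gene_counts`, the helper `make_gene_counts()` and the generic `describe()` are hypothetical names made up for this illustration, and the counts are toy values. `class()` shows which class an object has, `str()` shows its structure, and `UseMethod()` selects a method based on the class of its input. Many Bioconductor classes use the more formal S4 system instead, but the idea of class-based method selection is the same.

```r
# Hypothetical constructor: a plain list tagged with the class "gene_counts"
make_gene_counts <- function(gene, counts) {
  obj <- list(gene = gene, counts = counts)
  class(obj) <- "gene_counts"
  obj
}

# Generic function: UseMethod() dispatches on the class of x
describe <- function(x, ...) UseMethod("describe")

# Method selected for objects of class "gene_counts"
describe.gene_counts <- function(x, ...) {
  paste0(x$gene, ": mean count ", mean(x$counts))
}

# Fallback method for any other class
describe.default <- function(x, ...) {
  paste0("object of class ", paste(class(x), collapse = "/"))
}

g <- make_gene_counts("gene_A", c(10, 20, 30))  # toy data
class(g)       # "gene_counts"
str(g)         # shows the list structure and the class attribute
describe(g)    # "gene_A: mean count 20"
describe(1:3)  # "object of class integer"
```

The same call, `describe()`, behaves differently for `g` and for a plain integer vector; this is the class-dependent method selection described in the quote.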

How week 3 is organized

Each day, you work through one chapter of this book. During the scheduled tutorials, background and context will be given, and help is available if you get stuck with the coding.

You will work through the chapters on three days (Monday to Wednesday):

  • Monday: Chapter 8
  • Tuesday: Chapter 9
  • Wednesday: Chapter 10

We will mostly work on data associated with the paper “ABA Is Required for Plant Acclimation to a Combination of Salt and Heat Stress” by Suzuki et al. (2016).

The assignment

As before, a coding peer-feedback assignment is due at the end of Wednesday (23:59), which you submit via FeedbackFruits on Brightspace. Your assignment will be reviewed by two other students, and in turn you will review the assignments of two other students. This week, the assignment consists of creating a set of R functions in a separate R script that can be used to perform RNA-seq analyses. As before, completing the assignment and participating in the code review are mandatory.

You need to hand in a script file called rnaseq_functions.R containing the definitions of the four functions you will write during the coming three days:

  • load_atlas_se(experiment_id, from_disk)
  • get_normalized_counts(se, log, min_count, min_samples)
  • two_gene_scatter_plot(genes, logcounts, coldata, title)
  • pca_plot(logcounts, coldata, center, scale)

The script should also contain a function called run_tests() that runs each of the functions with appropriate input to test them.

The code review (peer feedback via Brightspace) is due Thursday at 13:00. Instructions on how to give feedback, as well as the items we expect you to give feedback on, can be found in Chapter 3.

The exam

The week 3 exam counts towards your final grade for the course.

At the end of week 3 we expect you to be able to:

  • Determine the class and structure of a classed object.

  • Construct various types of plots to visually analyse your data.

  • Perform hierarchical clustering and explain its key characteristics.

  • Perform k-means clustering and explain its key characteristics.

  • Explain the difference between supervised and unsupervised analysis.

  • Perform a principal component analysis (PCA) and explain its key characteristics.

  • Explain differential gene expression analysis.