IV: Going furthR

The goals for week 4

In this final week, we will consolidate much of what was learnt in the first three weeks, with an emphasis on increasing your confidence in writing your own R code to solve data-related problems.

The learning objectives of week 4 are:

  1. Apply data science competences to write your own R code to solve data analytic tasks
  2. Understand common sources of error in data, and apply approaches to deal with these errors
  3. Understand the importance of colour choice for visualisations of different types of data
  4. Apply data science competences to reproduce publication-quality data visualisations
  5. Apply string operations for character string manipulation

How week 4 is organised

As in previous weeks, the first three days are for working through these objectives. Thursday is reserved for code peer review and exam preparation, and the exam takes place on Friday morning.

The content of the first three days is as follows:

  • Monday: Dealing with errors and outliers

  • Tuesday: Exploring genetic and phenotypic diversity in Arabidopsis thaliana

  • Wednesday: Completion of the data analysis from Tuesday (if needed) and string operations

The assignment

As in previous weeks, there is an assignment due on Wednesday evening (23:59), which forms the basis of the peer-review exercise on Thursday.

The assignment instructions are given at the end of day 3 (Assignment for week 4 code peer-review), with an accompanying dataset that can be downloaded from Brightspace in the Week 4 Content Assignment section.

The exam

The exam of week 4 counts towards your final grade for the course.

At the end of week 4 we expect you to be able to:

  • give a practical (“rule of thumb”) definition of what a data outlier is, and what to do with such data points if they arise

  • apply different methods for identifying errors in data, and be able to correct / curate these before performing an analysis

  • evaluate the advantages and disadvantages of different approaches to dealing with “big data” in R

  • understand the importance of different colour palettes for different data types and visualisation purposes

  • apply basic string operations such as concatenation and splitting, and be familiar with regular expressions for pattern matching and string manipulation.
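
To give a feel for the string operations named in the last point, here is a minimal sketch in base R. The accession labels are made up for illustration and are not taken from the course dataset, and the course material may well use other functions (for example from the stringr package) for the same tasks.

```r
# Hypothetical labels, invented purely for illustration
first <- "Col"
second <- "0"

# Concatenation: paste() joins strings using the given separator
accession <- paste(first, second, sep = "-")

# Splitting: strsplit() returns a list with one element per input string,
# so [[1]] extracts the character vector for our single input
parts <- strsplit("Col-0;Ler-1;Cvi-0", split = ";", fixed = TRUE)[[1]]

# Pattern matching: grepl() tests each string against a regular expression
# ("-0$" means: a hyphen followed by 0 at the end of the string)
ends_in_zero <- grepl("-0$", parts)

# Substitution: sub() replaces the first match in each string
# (here the trailing hyphen and digits are removed)
stems <- sub("-[0-9]+$", "", parts)
```

Printing `accession`, `parts`, `ends_in_zero` and `stems` in the console shows what each step produces.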