I: A first look at R

Author

Rens Holmer & Mark Sterken

Published

February 11, 2026

The goals for week 1

In this course you will learn how to conduct data analysis in R. The assignments in this book take you through the steps of a data analysis and, along the way, introduce concepts and ways of working with R.

In the first week, the goals are:

Acquire basic knowledge on using R and RStudio
Recognize and load common data formats
Apply common statistical tools for inspecting and analysing data
Document your code in a clear and concise way
Apply the data-science cycle (load, inspect, clean, analyse, present)

Tip 1: What is the ‘data-science cycle’?

You are already familiar with collecting data, and with some analysis and interpretation. For instance, in Plant Science in Practice you gathered a dataset on wild plants via fieldwork, and in Reproduction of Plants you carried out a small experiment in a laboratory setting. During lab work you follow a particular protocol, and data analysis is no different. We use an iterative cycle of loading, inspecting, cleaning, analysing, and presenting data. It is normal to go back and forth in the cycle. The main assumption is that any data set is imperfect and has issues. In the first week you will not have to deal with imperfections in the data you use, but we will discuss them.
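
As a rough illustration, the five steps of the cycle could look like this in base R. This is a minimal sketch, not the course's own workflow: the file, the column names, and the values are invented toy data, and the cleaning step is included only to show where it fits in the cycle.

```r
# Toy data, invented for illustration only: plant heights (cm) for two
# treatments, with one missing value. Written to a temporary file so that
# there is something to load.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "plant_id,treatment,height_cm",
  "1,control,12.1",
  "2,control,11.8",
  "3,control,NA",
  "4,fertilized,14.3",
  "5,fertilized,15.0",
  "6,fertilized,13.9"
), csv_path)

# 1. Load
plants <- read.csv(csv_path)

# 2. Inspect: size of the table and a summary per column
dim(plants)
summary(plants)

# 3. Clean: drop rows with a missing height
plants_clean <- plants[!is.na(plants$height_cm), ]

# 4. Analyse: mean height per treatment
mean_height <- aggregate(height_cm ~ treatment, data = plants_clean, FUN = mean)
mean_height

# 5. Present: one boxplot per treatment
boxplot(height_cm ~ treatment, data = plants_clean, ylab = "Height (cm)")
```

If the plot in step 5 shows something unexpected, you would go back to step 2 or 3, which is what makes the cycle iterative.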

How week 1 is organized

Each day you work through one chapter of this book. The scheduled tutorials provide background and context, and help is available if you get stuck with the coding.

You work through the chapters on three days (Monday to Wednesday). The exercises in this book require you to use data. In the first week, these data are available via a link on Brightspace. For each day of the course, there is a data set available.

For Chapter 1 we all work on the same dataset¹.
For Chapter 2 you have your own personal dataset, based on an existing dataset². You will find this data set in a folder under your own name.
For Chapter 3 we start with a shared dataset³. For the assignment, you will also have your own personal data set³. You will find this data set in a folder under your own name.

The answers to the assignments will be posted on Brightspace the morning after (so the answers for Chapter 1 will appear on Tuesday).

In addition to the chapters, there is some (light) reading to support your view on data analysis. After Chapter 2 you are expected to read a paper by Itai Yanai and Martin Lercher⁴. After Chapter 3 you are expected to read a critique of this paper by Teppo Felin et al.⁵. These two papers should give you a firm grip on hypothesis testing and data analysis.

The assignment

At the end of Wednesday (23:59), a coding peer-feedback assignment is due, which you submit via FeedbackFruits on Brightspace. Two students will review your assignment, and you will review the assignments of two other students. You will find the instructions for this assignment in Chapter 3. Completing the assignment and participating in the code review are mandatory.

You need to hand in a .html file (generated from a .qmd file). Practise generating this type of file before the assignment is due, ideally during the tutorials.
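
One possible way to generate the .html file from R is sketched below. It assumes that Quarto and the {quarto} R package are installed, and the file name `my_assignment.qmd` is hypothetical. The course may instead expect you to use the Render button in RStudio, which produces the same kind of output.

```r
# Hypothetical file name; replace it with the .qmd file you were given.
qmd_file <- "my_assignment.qmd"

# The rendered file normally ends up next to the .qmd file,
# with the same name and an .html extension.
html_file <- sub("\\.qmd$", ".html", qmd_file)

# Only try to render if the {quarto} package is installed and the file exists.
if (requireNamespace("quarto", quietly = TRUE) && file.exists(qmd_file)) {
  quarto::quarto_render(qmd_file, output_format = "html")
  message("Rendered: ", html_file)
} else {
  message("Nothing rendered: install {quarto} or check the file name.")
}
```

Open the resulting .html file in a browser before submitting, to check that all code and output appear as you intended.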

The code review (peer feedback via Brightspace) is due Thursday at 13:00. Instructions for how to give feedback, and the items we expect you to give feedback on, can be found in Chapter 3.

The exam

The week 1 exam does not count towards your final grade for the course. It does, however, show you what the exams in this course look like, and you will receive feedback and a grade (which, again, does not count) on what you hand in. The assignment on Wednesday will prepare you for the exam on Friday.

At the end of week 1 we expect you to be able to:

Complete a given .qmd file with answers and completed code-blocks;
Start a data-analysis project in R (set a work directory, install and activate packages);
Load the data formats covered in week 1 (.tsv, .csv, .xlsx, .Rdata);
Perform and interpret the outcome of basic checks on loaded data using functions (e.g. dim(), ncol(), summary(), …);
Combine objects that in essence fit together (cbind(), rbind());
Test data for normality (e.g. by making a qqplot or by shapiro.test());
Conduct a t-test (t.test());
Conduct a Wilcoxon rank sum test (wilcox.test());
Be familiar with correlation and clustering;
Translate a p-value to a biological interpretation;
Complete {ggplot2} code to make a histogram, a qqplot, a boxplot, and a scatterplot;
Read a boxplot, a qqplot, a scatterplot, and a histogram and translate these figures to a biological interpretation.
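
To give an impression of what several of these skills look like in practice, here is a minimal sketch. It uses `PlantGrowth`, an example data set that ships with R, rather than the course data, and the file names are hypothetical. The `readxl::read_excel()` call mentioned for .xlsx files is an assumption about which package you might use, and the {ggplot2} part only runs if that package is installed.

```r
# Packages are installed once per computer and activated in every session:
# install.packages("ggplot2")
# setwd("path/to/your/project")   # hypothetical path; sets the work directory

# Example data from the datasets package: plant weights for three groups.
data(PlantGrowth)

# Loading: write the data out in three formats, then read each one back.
tmp <- tempdir()
csv_file <- file.path(tmp, "plants.csv")
tsv_file <- file.path(tmp, "plants.tsv")
rdata_file <- file.path(tmp, "plants.Rdata")
write.csv(PlantGrowth, csv_file, row.names = FALSE)
write.table(PlantGrowth, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)
save(PlantGrowth, file = rdata_file)

from_csv <- read.csv(csv_file)      # comma-separated
from_tsv <- read.delim(tsv_file)    # tab-separated
loaded_names <- load(rdata_file)    # restores objects, returns their names
# .xlsx files need an add-on package, e.g. readxl::read_excel("file.xlsx")

# Basic checks on loaded data
dim(from_csv)
ncol(from_csv)
summary(from_csv)

# Combining objects that fit together
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
trt2 <- PlantGrowth$weight[PlantGrowth$group == "trt2"]
wide <- cbind(ctrl = ctrl, trt2 = trt2)            # columns side by side
two_groups <- droplevels(rbind(                    # rows stacked
  PlantGrowth[PlantGrowth$group == "ctrl", ],
  PlantGrowth[PlantGrowth$group == "trt2", ]
))

# Testing for normality
shapiro_ctrl <- shapiro.test(ctrl)
qqnorm(ctrl)
qqline(ctrl)

# Comparing two groups
tt <- t.test(weight ~ group, data = two_groups)       # t-test
wt <- wilcox.test(weight ~ group, data = two_groups)  # Wilcoxon rank sum test
tt$p.value
wt$p.value

# Plotting with {ggplot2}, only if the package is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(two_groups, aes(x = group, y = weight)) +
    geom_boxplot()
  print(p)
}
```

The p-values on their own are not the end point: the last step is to state what they mean for the plants, for example whether the treatment group differs in weight from the control group.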
