litteR User Manual

Dennis Walvoort Wageningen Environmental Research, The Netherlands
Willem van Loon Rijkswaterstaat, The Netherlands

September 2021








1 Introduction

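litteR is a user-friendly tool for analyzing litter data (e.g., beach litter data). The current version (0.9.1) contains routines for data quality control, outlier analysis, descriptive statistics, regional aggregation, and trend analysis.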

The focus of this version of litteR is to provide a user-friendly, flexible, robust, transparent, and relatively simple tool for litter analysis. Although litteR is distributed as an R-package, experience with R is not required. If you need more information on how to install R, RStudio, and litteR, please consult our installation guide.

Litter data are count data. As illustrated in the histogram below (copied with permission from Hanke et al., 2019), litter data generally have skewed distributions. All procedures in litteR are therefore based on robust statistical methods. They do not require distributional assumptions and are relatively robust to outliers.



This user guide consists of two parts. The first part describes the user interface; the second part provides the technical details.

For applications with (a previous version of) litteR see Schulz et al. (2019). litteR is the successor of the Litter Analyst software (Schulz et al., 2017).




2 Loading the litteR-package

Before litteR can be used, it should be installed, or updated if you installed litteR before. See our installation guide for details.

You need to install litteR only once, but you need to load this package each time you start RStudio.

The litteR-package should be loaded in RStudio before you can use it. This can be done by running the following code in the R-console or the RStudio-console:

library(litteR)

A startup message appears that gives some essential instructions to start using litteR.




3 User interface

3.1 Create a new project

The easiest way to start working with litteR is to create an empty project directory. This directory can be filled with example and reference files by running:

create_litter_project("d:/work/litter-projects/beach-litter")

in the RStudio-console. For more information on how to obtain and use RStudio, consult its website or read our installation guide.

The argument of function create_litter_project (i.e., the quoted part in parentheses) is an existing work directory on your computer. This can be any valid directory name with sufficient user privileges. Note for MS-Windows users: write directory names with forward slashes (/), as in the example above. A single backslash (\) is not accepted in an R string.

It is also possible to run create_litter_project() without an argument. In that case, a simple graphical user interface pops up for interactive directory selection.
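The note on forward slashes can be illustrated with a minimal sketch. The helper to_forward_slashes is hypothetical and not part of litteR; the directory is the one used in the example above. The calls that touch the file system and litteR itself are left as comments, so the snippet also runs without litteR installed.

```r
# Hypothetical helper (not part of litteR): convert Windows-style
# backslashes to the forward slashes used in R path strings.
to_forward_slashes <- function(path) {
  gsub("\\", "/", path, fixed = TRUE)
}

# In R source code a single backslash has to be typed as "\\".
project_dir <- to_forward_slashes("d:\\work\\litter-projects\\beach-litter")
print(project_dir)

# The directory should exist before the project is created, for example:
# dir.create(project_dir, recursive = TRUE)
# litteR::create_litter_project(project_dir)
```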

3.2 Perform litter analysis

litteR can be started by typing litter() in the RStudio console (see the figure below).



After entering litter(), a simple graphical user interface pops up for file selection. An example of a file selection dialogue is given below.





4 Input

litteR needs three input files:

  1. a type file, which contains all litter types and litter groups that may be used;
  2. a data file, which contains litter counts for each monitoring event;
  3. a settings file, which contains all settings needed to perform a litteR run.

These input files are described below.

4.1 Type file

The type file contains a list of all litter types that may be used in the data file. It also indicates to which litter group each litter type belongs. Two example files, named ‘types-ospar.csv’ and ‘types-ospar-tc-sup-fish-plastic.csv’, are automatically generated by the create_litter_project function, as described earlier in this manual. A type file assigns each litter type (type_name) to one or more litter groups. The first 10 rows of ‘types-ospar-tc-sup-fish-plastic.csv’ are given in the table below.


type_name included SUP FISH PLASTIC
Plastic: Yokes [1] x x x
Plastic: Bags [2] x x x
Plastic: Small_bags [3] x x x
Plastic: Bag_ends [112] x x x
Plastic: Drinks [4] x x x
Plastic: Cleaner [5] x x x
Plastic: Food [6] x x x
Plastic: Toiletries [7] x x x
Plastic: Oil_small [8] x x
Plastic: Oil_large [9] x x


The following columns are in this table:

Users may take one of the provided type files as a template for their own type file. litteR will use the type file that has been specified in the settings file.

litteR performs regional aggregation at the group level. In order to perform regional aggregation at the type level (the columns in the data file), a group with only one or a few litter types of interest can be constructed in the type file, and then regionally aggregated by running litteR.
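As a minimal sketch, a group containing only a few litter types of interest could be added to a type table as follows. The rows shown, the group name BAGS, and the use of an empty string for non-members are assumptions made for illustration; the marker "x" follows the example table above. Columns of the real type files that are not needed for the illustration (such as included) are left out.

```r
# Illustrative type table; rows and the group name BAGS are assumptions
# for this sketch and are not taken from a real type file.
types <- data.frame(
  type_name = c("Plastic: Yokes [1]", "Plastic: Bags [2]",
                "Plastic: Small_bags [3]", "Plastic: Drinks [4]"),
  PLASTIC = c("x", "x", "x", "x"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Define a new group that contains only the litter types of interest.
bag_types <- c("Plastic: Bags [2]", "Plastic: Small_bags [3]")
types$BAGS <- ifelse(types$type_name %in% bag_types, "x", "")

print(types)

# The table could then be saved as a new type file, for example:
# write.csv(types, "types-custom.csv", row.names = FALSE)
```

The name of the new type file would then have to be entered in the settings file.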

4.2 Data file

litteR supports a simple and flexible data format. It is similar to the OSPAR-format. The data are stored in so-called wide format: each row refers to a single survey, each column to a single litter type or metadata item. The table below gives an example of a small part (i.e., the upper left corner) of a data file.


location_code  date        Plastic: Yokes [1]  Plastic: Bags [2]  Plastic…
NL001          2012-01-27  0                   3
NL001          2012-04-20  0                   8
NL001          2012-07-22  0                   1
NL001          2012-10-19  0                   2
NL001          2013-02-19  0                   24
:              :           :                   :                  :


The columns location_code and date are always required and define unique records (rows) with litter survey data for a specific date and location (e.g., a specific beach, or a location along a river). litteR will use these data to estimate statistics (such as the median and trend) for each location_code.

Column location_code may contain location codes (as in the example above), but also full names like ‘Bergen’, ‘Noordwijk’, and ‘La Grève des Courses’. Full names may be clearer when interpreting the results.

The date column gives the monitoring date in ISO format, i.e., YYYY-mm-dd (for example 2021-09-21, to indicate 21 September 2021). For convenience, the OSPAR-format (dd/mm/YYYY) is currently also supported (for example 21/09/2021, to indicate 21 September 2021).
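The two date formats can be told apart mechanically. The following minimal sketch parses both; the helper parse_survey_date is hypothetical and only illustrates the formats, it is not necessarily how litteR reads dates.

```r
# Hypothetical helper: parse dates given in ISO format (YYYY-mm-dd)
# or in OSPAR format (dd/mm/YYYY); anything else becomes NA.
parse_survey_date <- function(x) {
  iso <- as.Date(x, format = "%Y-%m-%d")
  ospar <- as.Date(x, format = "%d/%m/%Y")
  # Fill the entries that are not valid ISO dates with the OSPAR result.
  iso[is.na(iso)] <- ospar[is.na(iso)]
  iso
}

dates <- parse_survey_date(c("2021-09-21", "21/09/2021", "September 21"))
print(dates)
```

The first two entries denote the same day; the third entry follows neither format and is returned as NA.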

Columns Plastic: Yokes [1], Plastic: Bags [2], … contain the counts for specific litter types. Each litter type (column name) should be listed in the litter type file. Only litter types in the litter type file are valid column names. All column names that are not valid litter types are considered as optional metadata. These columns are ignored by litteR and do not affect the results.

There is one exception: the optional column region_code. This column is required only when the locations (in column location_code) also need to be spatially aggregated. Each region_code is related to one or more location_codes that are part of that region.

In the data file below, one region_code (NL) is provided for all locations in location_code. Therefore, litteR will spatially aggregate the results for all locations (NL001 … NL004) within the specified region (NL).


region_code  location_code  date        Plastic: Yokes [1]  Plastic: Bags [2]  Plastic…
NL           NL001          2012-01-27  0                   3
NL           NL001          2012-04-20  0                   8
NL           NL001          2012-07-22  0                   1
:            :              :           :                   :                  :
NL           NL004          2017-04-14  0                   0
NL           NL004          2017-07-11  1                   0
NL           NL004          2017-10-18  0                   1


A data file can be constructed easily from existing litter files. As an example consider the OSPAR-format below:


Beach ID  Beach name  Country      Survey date  Plastic: Yokes [1]  Plastic: Bags [2]  Plastic…
NL001     Bergen      Netherlands  2012-01-27   0                   3
NL001     Bergen      Netherlands  2012-04-20   0                   8
NL001     Bergen      Netherlands  2012-07-22   0                   1
:         :           :            :            :                   :                  :


One can simply rename existing columns to the names required by litteR. This can be done with a spreadsheet program or a text editor. For instance, renaming Beach ID, Country and Survey date to respectively location_code, region_code, and date gives the following valid litteR format:


location_code  Beach name  region_code  date        Plastic: Yokes [1]  Plastic: Bags [2]  Plastic…
NL001          Bergen      Netherlands  2012-01-27  0                   3
NL001          Bergen      Netherlands  2012-04-20  0                   8
NL001          Bergen      Netherlands  2012-07-22  0                   1
:              :           :            :           :                   :                  :


Column Beach name is not recognized by litteR, and is therefore ignored.
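For users who prefer R over a spreadsheet program, the renaming can be sketched as follows. The three-line table is a shortened copy of the OSPAR example above; the object names are chosen for illustration only.

```r
# A small OSPAR-like table; check.names = FALSE keeps column names
# such as "Beach ID" and "Plastic: Bags [2]" exactly as written.
lines <- c(
  '"Beach ID","Beach name","Country","Survey date","Plastic: Yokes [1]","Plastic: Bags [2]"',
  '"NL001","Bergen","Netherlands","2012-01-27",0,3',
  '"NL001","Bergen","Netherlands","2012-04-20",0,8'
)
ospar <- read.csv(text = lines, check.names = FALSE, stringsAsFactors = FALSE)

# Map the existing column names to the names required by litteR.
new_names <- c("Beach ID" = "location_code",
               "Country" = "region_code",
               "Survey date" = "date")
hits <- names(ospar) %in% names(new_names)
names(ospar)[hits] <- unname(new_names[names(ospar)[hits]])

print(names(ospar))

# The result could be written to a data file, for example:
# write.csv(ospar, "beach-litter.csv", row.names = FALSE)
```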

As an alternative, one may also add new columns with valid litteR names to the data file and fill them with the contents of existing columns. See the example below:


region_code  location_code  date        Beach ID  Beach name  Country      Survey date  Plastic…
Netherlands  Bergen         27/01/2012  NL001     Bergen      Netherlands  27/01/2012
Netherlands  Bergen         20/04/2012  NL001     Bergen      Netherlands  20/04/2012
Netherlands  Bergen         22/07/2012  NL001     Bergen      Netherlands  22/07/2012
:            :              :           :         :           :            :            :


This can be done quite easily with a spreadsheet program. The original columns of the OSPAR-format (Beach ID, Beach name, Country, and Survey date) are ignored by litteR.

It is advised to use region_codes and location_codes that are easily recognized by the user. For instance, in the example above, location_code ‘Bergen’ is easier to interpret than location_code ‘NL001’. Obviously, this choice does not affect the litteR-results.

4.3 Settings file

The settings file contains all settings needed to run litteR. An example of the contents of a settings file is given in the figure below:


# litteR settings file

# Period to analyse (YYYY-mm-dd)
date_min: 2012-01-01
date_max: 2017-12-31

# Percentage of total count to analyse (0 < percentage_total_count <= 100)
percentage_total_count: 80

# Data file.
# Note: the datafile must be in the same path as the settings file
# Note: the file extension should be .csv
file_data: beach-litter-nl-2012-2017.csv

# Type file. Defines the types and their groups
file_types: types-ospar.csv

# Select trend figures to plot in the report
# Note: this can be zero, one, or more than one location_code, region_code,
# group_code, and/or type_name
location_code: ["NL001", "NL004"]
region_code: ["NL"]
group_code: ["TC", "SUP", "FISH"]
type_name: ["Plastic: Bags [2]"]

# figure quality (high or low)
figure_quality: high

# cutoff value vertical axis with litter counts (percentage)
cutoff_count_axis: 100
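The constraints stated in the comments of the example settings can be checked with a few lines of R. This is a minimal sketch only: the settings are typed in as an R list instead of being read from the file, and the helper check_settings is hypothetical and not part of litteR.

```r
# Settings from the example above, written as an R list for this sketch.
settings <- list(
  date_min = as.Date("2012-01-01"),
  date_max = as.Date("2017-12-31"),
  percentage_total_count = 80,
  file_data = "beach-litter-nl-2012-2017.csv",
  figure_quality = "high"
)

# Hypothetical helper: returns a character vector with the problems found.
check_settings <- function(s) {
  problems <- character(0)
  if (s$date_min > s$date_max) {
    problems <- c(problems, "date_min is later than date_max")
  }
  if (s$percentage_total_count <= 0 || s$percentage_total_count > 100) {
    problems <- c(problems, "percentage_total_count must be in (0, 100]")
  }
  if (!grepl("\\.csv$", s$file_data)) {
    problems <- c(problems, "file_data should have extension .csv")
  }
  if (!s$figure_quality %in% c("high", "low")) {
    problems <- c(problems, "figure_quality should be high or low")
  }
  problems
}

problems <- check_settings(settings)
print(problems)
```

An empty result means that none of the checked constraints is violated.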


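The settings file contains the following entries:

  • date_min, date_max: the first and last date of the period to analyse (YYYY-mm-dd);
  • percentage_total_count: the percentage of the total count to analyse (greater than 0 and at most 100);
  • file_data: the name of the data file, a .csv file in the same directory as the settings file;
  • file_types: the name of the type file, which defines the litter types and their groups;
  • location_code, region_code, group_code, type_name: the locations, regions, groups, and types for which trend figures are plotted in the report (each may contain zero, one, or more entries);
  • figure_quality: the quality of the figures (high or low);
  • cutoff_count_axis: the cutoff value of the vertical axis with litter counts (percentage).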

4.4 Data Quality Control

All input files are validated by litteR. The following validation rules apply:

  1. all required columns (see above) should be available;
  2. the date format should be valid, i.e. preferably YYYY-mm-dd (ISO). For convenience, the OSPAR-date format (dd/mm/YYYY) is currently also supported;
  3. litter type names should be specified in the type file;
  4. litter counts in the data file should be natural numbers (ISO 80000-2). In some cases, however, data files contain real numbers due to preprocessing, for example after normalizing survey lengths to a common length. Although litteR prefers natural numbers, real numbers are also allowed for convenience. If non-natural numbers are found, litteR will give a warning but will continue the analysis;
  5. all records should be unique; duplicated records will be removed with a warning;
  6. all cells should be filled with the appropriate data type (numbers, text or dates);
  7. the data file should be a comma-separated values file (CSV), i.e., a text file where the columns are separated by commas (,) and not by spaces, semicolons (;) or tabs.
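Some of these rules can be illustrated with a minimal sketch in base R. The data are made up, and the checks only mimic rules 1, 4 and 5; the way litteR itself implements the validation may differ. In particular, treating a record as duplicated only when the complete row is repeated is an assumption.

```r
# Illustrative data: rows 2 and 3 are identical and one count is
# not a natural number.
d <- data.frame(
  location_code = c("NL001", "NL001", "NL001", "NL004"),
  date = as.Date(c("2012-01-27", "2012-04-20", "2012-04-20", "2017-04-14")),
  `Plastic: Bags [2]` = c(3, 8, 8, 1.5),
  check.names = FALSE
)

# Rule 1: all required columns should be available.
required <- c("location_code", "date")
missing_columns <- setdiff(required, names(d))

# Rule 4: counts should be natural numbers (non-negative integers).
counts <- d[["Plastic: Bags [2]"]]
is_natural <- counts >= 0 & counts == round(counts)
if (any(!is_natural)) {
  message("non-natural counts found in row(s): ",
          paste(which(!is_natural), collapse = ", "))
}

# Rule 5: records should be unique (assumed here: complete rows).
is_duplicate <- duplicated(d)
d_clean <- d[!is_duplicate, ]
print(d_clean)
```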




5 Output

litteR produces three output files:

  1. a report, containing all analysis results;
  2. a CSV-file, containing all beach litter statistics;
  3. a log-file, containing all log data.

For convenience, all input and output files are stored as a snapshot in a directory with a name like litteR-results-20210904T221809, where the final part of the name is a timestamp.

5.1 Report

litteR produces an HTML-report that can best be viewed with a modern web browser like Mozilla Firefox, Google Chrome, or Safari. These browsers are freely available from the internet.

The filename of each report starts with ‘litteR-results’, followed by a timestamp (YYYYmmddTHHMMSS) and the extension html. For example: litteR-results-20210904T221809.html.
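The naming pattern can be reproduced with the date-time formatting of base R. In this minimal sketch the helper result_name is hypothetical; it only illustrates the pattern and is not a litteR function.

```r
# Hypothetical helper: build a name such as
# litteR-results-20210904T221809.html from a point in time.
result_name <- function(time, extension = "html") {
  stamp <- format(time, "%Y%m%dT%H%M%S")
  paste0("litteR-results-", stamp, ".", extension)
}

example_time <- as.POSIXct("2021-09-04 22:18:09", tz = "UTC")
print(result_name(example_time))
```

Because the timestamp runs from year down to second, sorting the names alphabetically also sorts the results chronologically.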

This section briefly describes each section in the HTML-report.

5.1.1 Settings

This section gives a summary of the settings in the settings file.

5.1.2 Data Quality Control

In this section (potential) problems in the input files are reported. These problems are also stored in the log file.

5.1.3 Outlier analysis

For each location_code in the data file, adjusted boxplots of the total count are given for the detection of outliers. Outliers (if any) are shown as dots in these adjusted box-and-whisker plots. Adjusted boxplots are more suitable for outlier detection in case of skewed distributions than traditional box plots. An example of these box-and-whisker plots is given below.



5.1.4 Descriptive statistics

For each location_code and group/type name, the following statistics are estimated:

  • mean count, i.e., the arithmetic mean of the counts for each litter type;
  • median count, i.e., the median of the counts for each litter type;
  • relative count: the contribution of each litter type to the total count of litter types (%);
  • coefficient of variation (CV): the ratio of the standard deviation to the mean of the counts for each litter type (%);
  • ratio of the median absolute deviation (MAD) and the median of the counts for each litter type (RMAD, %);
  • number of surveys;
  • Theil-Sen slope: a robust non-parametric estimator of slope (litter counts / year);
  • p-value: the p-value associated with the one-tailed Mann-Kendall test to test the null hypothesis of
    • no monotonically increasing trend in case the Theil-Sen slope is greater than zero;
    • no monotonically decreasing trend in case the Theil-Sen slope is smaller than zero.
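The first statistics in the list above can be sketched in a few lines of base R. The counts are made up and the function describe is hypothetical. Note one assumption in particular: R's mad() multiplies the median absolute deviation by 1.4826 by default, and this manual does not state whether litteR applies that scaling.

```r
# Illustrative counts per survey for two litter types at one location.
counts <- data.frame(
  bags = c(3, 8, 1, 2, 24),
  drinks = c(2, 4, 2, 1, 3)
)

# Hypothetical helper: statistics for the counts of one litter type.
describe <- function(x) {
  c(
    mean = mean(x),
    median = median(x),
    cv = 100 * sd(x) / mean(x),       # coefficient of variation (%)
    rmad = 100 * mad(x) / median(x),  # ratio of MAD and median (%),
                                      # with R's default scaling of MAD
    n = length(x)                     # number of surveys
  )
}

stats <- sapply(counts, describe)
print(round(stats, 1))

# Relative count: contribution of each type to the total count (%).
relative_count <- 100 * colSums(counts) / sum(counts)
print(relative_count)
```

The single large count of 24 pulls the mean of bags well above its median, which illustrates why robust statistics are preferred for skewed litter data.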

These statistics will be estimated for all litter groups and for the litter types with the greatest counts that together make up a given percentage of the total count. This percentage is given as percentage_total_count in the settings file.
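The selection of litter types by percentage_total_count can be sketched as follows. The total counts are made up, and the selection rule used here (the smallest set of top types whose joint share reaches the percentage) is an assumption; litteR may treat ties and boundary cases differently.

```r
# Illustrative total counts per litter type (assumed values).
totals <- c(bags = 50, drinks = 30, food = 10, yokes = 6, cleaner = 4)
percentage_total_count <- 80

# Sort the types by decreasing count and accumulate their share (%).
sorted <- sort(totals, decreasing = TRUE)
cumulative <- 100 * cumsum(sorted) / sum(sorted)

# Keep the smallest set of top types whose joint share reaches the
# requested percentage (assumed rule).
n_keep <- which(cumulative >= percentage_total_count)[1]
selected <- names(sorted)[seq_len(n_keep)]
print(selected)
```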

The descriptive statistics for the litter types and groups are stored in a CSV-file with a name starting with litteR-results and ending with a timestamp. The statistics for litter groups are also printed as a table and shown as bar plots in the report: one plot for each location_code in the data file. An example is given in the figure below. If you want other groups, or only a subset of groups, you should modify the type file.



5.1.5 Regional descriptive statistics

When the data file contains column region_code, the data for the location_codes in that region are spatially aggregated in a stepwise fashion:

  1. First a (summary) statistic is estimated for each location (location_code) within that region (region_code).
  2. Next, the same statistic is computed for the results in step 1 to obtain the regional statistic for that region.

Note that these statistics are so-called intra-block statistics, i.e., data from individual location_codes are not merged.

The summary statistics are:

  • regional mean: the mean of the means of the individual locations (location_code) within a region (region_code) for each litter group;

  • regional median: the median of the medians of the individual locations (location_code) within a region (region_code) for each litter group;

  • regional slope: the median of the Theil-Sen slopes of the individual locations (location_code) within a region (region_code) for each litter group. Data from different locations have not been mixed in the computation of the Theil-Sen slopes. This method is similar to the one in Gilbert (1987) except that in our procedure all locations within a region contribute equally to the regional trend.

  • p_value: the p-values for each regional trend (slope) are computed by means of the expressions given in Van Belle & Hughes, 1984 (Eqs. 2 and 7) and Gilbert, 1987 (Eqs. 17.1 - 17.5).
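The stepwise aggregation can be sketched for the regional mean and median as follows. The counts are made up for illustration. The regional slope follows the same pattern, with the Theil-Sen slope as the statistic in step 1 and the median in step 2; the regional p-value is not covered by this sketch.

```r
# Illustrative counts for one litter group at three locations in one
# region; location NL002 has one survey more than the others.
d <- data.frame(
  region_code = "NL",
  location_code = rep(c("NL001", "NL002", "NL003"), times = c(3, 4, 3)),
  count = c(1, 2, 3, 10, 20, 60, 30, 5, 5, 8)
)

# Step 1: a statistic for each location within the region.
location_mean <- tapply(d$count, d$location_code, mean)
location_median <- tapply(d$count, d$location_code, median)

# Step 2: the same statistic applied to the results of step 1.
regional_mean <- mean(location_mean)
regional_median <- median(location_median)

print(c(regional_mean = regional_mean, regional_median = regional_median))

# For comparison: the mean of the merged data, which is not what the
# stepwise procedure computes.
pooled_mean <- mean(d$count)
print(pooled_mean)
```

In the stepwise result each location carries the same weight, whereas in the pooled mean the location with the most surveys weighs more heavily.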

5.1.6 Trend analysis

For each location_code, and the type names and group codes specified in the settings file, trends are estimated by means of the Theil-Sen slope estimator: a robust non-parametric estimator of slope (counts / year). The significance of the estimated slopes is tested by means of the Mann-Kendall test. The Mann-Kendall test is a non-parametric test and as such does not make distributional assumptions on the data.
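Both steps can be sketched in base R. The data are made up and the function theil_sen_slope is hypothetical. The Mann-Kendall test is expressed here as a test on Kendall's rank correlation between time and count, with a normal approximation (exact = FALSE); these are assumptions of the sketch, and the implementation in litteR may differ in details such as the treatment of ties.

```r
# Theil-Sen slope: the median of the slopes between all pairs of
# surveys. Assumes that all survey times are distinct.
theil_sen_slope <- function(time, count) {
  pairs <- combn(length(count), 2)
  slopes <- (count[pairs[2, ]] - count[pairs[1, ]]) /
    (time[pairs[2, ]] - time[pairs[1, ]])
  median(slopes)
}

# Illustrative data: survey time (year) and litter count.
year <- c(2012, 2013, 2014, 2015, 2016, 2017)
count <- c(30, 26, 27, 20, 18, 12)

slope <- theil_sen_slope(year, count)
print(slope)  # litter counts / year

# One-tailed test; the alternative follows the sign of the slope.
alternative <- if (slope > 0) "greater" else "less"
test <- cor.test(year, count, method = "kendall",
                 alternative = alternative, exact = FALSE)
print(test$p.value)
```

For these data the estimated slope is -3.5 counts per year, and the small p-value leads to rejecting the null hypothesis of no monotonically decreasing trend.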

The figure below gives examples of trend plots for total count (TC), single use plastics (SUP), and plastic bags at the beach of Terschelling (The Netherlands). In each plot, the black dots are the observations, the thin gray line segments connect the dots and guide the eye, and the red line is the Theil-Sen slope.