Plotting maps using R
This article is the first in a series focused on building our own datasets to plot maps in R. Links to all the articles will be added here soon.
- Plotting maps using R: Introduction
- Exploring the administrative divisions: Obtaining maps from Natural Earth or GADM and preparing our workspace
- Generating our coordinate datasets: Working with the attribute table and the vertices tools in QGIS, using Natural Earth’s maps
- Using a custom dataset to plot maps: R code to plot, using maps from GADM
In this article, we will…
- Review the structure of the geographical data included in ggplot2.
- Plot a map using this data.
R is one of the most popular programming languages in data science and statistics, thanks in part to the huge number of packages that make our work more convenient.
An effective way to represent information about a country or region is a map. In this article, we will explore how we can use maps in R, and, in the following articles, how to generate our own for a region of interest.
Using ggplot2
ggplot2 is one of the most popular packages in R, used to plot our data easily. This package accepts the different data structures we have in R, such as vectors, data frames, and tibbles as our source of information, and provides us with several primitives that help us create graphs.
# install the package in case we don't have it yet
install.packages('ggplot2')
# load the package
library('ggplot2')
# create a data set
fruits <- c("Apple", "Orange", "Pear")
quantity <- c(10, 20, 15)
data <- data.frame(fruits, quantity)
# print the dataset in the console
data
# a quite simple graph, bars
# 'aes' is used to select which information goes in each axis
# geom_bar is the primitive to show the information as bars
gg <- ggplot(data, aes(fruits, quantity)) + geom_bar(stat="identity", fill="lightgreen")
# show the plot
gg
After running this code, we can see the plot we just made and, in the console, the information we stored in data.
> data
  fruits quantity
1  Apple       10
2 Orange       20
3   Pear       15
For more detailed information on using ggplot2, you can check the following links.
- Documentation: https://ggplot2.tidyverse.org/
- Documentation: https://www.rdocumentation.org/packages/ggplot2/versions/3.3.3
Now let’s work with some maps.
Plotting maps
The ggplot2 package includes the function map_data, which creates data frames with the points required to draw a map (it relies on the maps package, so that package must be installed too). Using this function we can get a map of the whole world, or maps of a few selected countries such as the United States, New Zealand, Italy, and France. If we want maps of other countries, we will generate them using the QGIS software, as we will see in the next articles. For now, let’s work with the Italy map.
library(ggplot2)
italy_map <- map_data("italy")
str(italy_map)
head(italy_map)
When we run this piece of code, head will print the first six entries in the data frame.
> head(italy_map)
      long      lat group order        region subregion
1 11.83295 46.50011     1     1 Bolzano-Bozen      <NA>
2 11.81089 46.52784     1     2 Bolzano-Bozen      <NA>
3 11.73068 46.51890     1     3 Bolzano-Bozen      <NA>
4 11.69115 46.52257     1     4 Bolzano-Bozen      <NA>
5 11.65041 46.50721     1     5 Bolzano-Bozen      <NA>
6 11.63282 46.48045     1     6 Bolzano-Bozen      <NA>
The structure of the data frame is the following.
- long: Number representing the longitude.
- lat: Number representing the latitude.
- group: Number used to differentiate each polygon. Every separate polygon in our map (for example, each province or state, each island, or each enclave) requires a unique id in this column.
- order: Each point that belongs to a polygon has an order number associated with it. This number tells R in which order the points must be drawn.
- region and subregion: These columns hold the name of each territory. Each country has different administrative regions (states, provinces, departments, and so on), so we need to know a little bit about the political division of the country we are working with.
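To see how group and order work together, here is a minimal sketch that uses a small hand-made data frame instead of real map data. The coordinates and the names toy_map, gg_toy, "square", and "triangle" are invented for illustration; only the column layout mirrors what map_data returns.

```r
library(ggplot2)

# Hypothetical toy "map": two polygons, a square and a triangle.
# The column names mirror the map_data() layout; the coordinates are made up.
toy_map <- data.frame(
  long   = c(0, 1, 1, 0,   2, 3, 2.5),
  lat    = c(0, 0, 1, 1,   0, 0, 1),
  group  = c(1, 1, 1, 1,   2, 2, 2),
  order  = 1:7,
  region = c(rep("square", 4), rep("triangle", 3))
)

# Points are connected in row order, so we sort by 'order' first
toy_map <- toy_map[order(toy_map$order), ]

# 'group' keeps the two shapes from being joined into a single polygon
gg_toy <- ggplot(toy_map, aes(long, lat, group = group, fill = region)) +
  geom_polygon(color = "#b2b2b2") +
  coord_fixed(ratio = 1)
gg_toy
```

Removing group = group from aes in this sketch is a quick way to see why the column matters when a map has more than one polygon.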
Now that we have our data in italy_map and have explored its structure, let’s plot our first map.
# we will need an additional dataset for this example, tibble will help us with this
install.packages("tibble")
library(tibble)
# generating some random data to color the regions of our map; we can omit this step if we already have a dataset with information we want to plot
# if we use our own dataset, we need to double check that we have columns named 'region' and 'value'
reg_data <- tibble(region=unique(italy_map$region),
                   value=sample(100, length(region)))
# initializing a new plot
gg <- ggplot()
# base layer
# 'data=italy_map' and 'map=italy_map' are used to indicate the coordinates (i.e., points) that we want to plot
# 'map_id' allows us to identify each region later to add more information to the map
gg <- gg + geom_map(data=italy_map, map=italy_map,
                    aes(long, lat, map_id=region),
                    color="#b2b2b2", size=0.1, fill=NA)
# we will show the random data we prepared beforehand in the map we just created
# 'data=reg_data' is the information we want to add
# 'fill=value' tells ggplot2 which column from 'reg_data' should be used to determine the color
gg <- gg + geom_map(data=reg_data, map=italy_map,
                    aes(fill=value, map_id=region),
                    color="#b2b2b2", size=0.1)
# our map is almost ready, but right now it is a little bit stretched
# give the map a nice square appearance using 'coord_fixed'
gg <- gg + coord_fixed(ratio = 1)
# finally it's time to show our map
# optional: between each of the previous instructions, we can show the map to view our progress
gg
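If we bring our own dataset instead of random values, it can help to confirm that its region names match the ones in the map before plotting. The following is a minimal sketch of such a check. The data frame my_data, its values, and the name "Atlantis" are invented for illustration, and the sketch assumes the maps package is installed so that map_data works.

```r
library(ggplot2)  # map_data() also needs the 'maps' package installed

italy_map <- map_data("italy")

# Hypothetical dataset: the values are invented, and "Atlantis" is a
# deliberately wrong name to show what a mismatch looks like.
my_data <- data.frame(
  region = c("Bolzano-Bozen", "Atlantis"),
  value  = c(42, 7)
)

# The columns referenced in aes(fill = value, map_id = region) must exist
stopifnot(all(c("region", "value") %in% names(my_data)))

# Names in our data that have no counterpart in the map
unmatched <- setdiff(my_data$region, unique(italy_map$region))
unmatched
```

Any name listed in unmatched has no polygon to fill, so it is worth correcting the spelling in our dataset before plotting.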
In conclusion
As we saw in the previous example, ggplot2 is an excellent tool for creating graphs. Unfortunately, it only gives us coordinates for a few maps (the regions of the United States, France, Italy, and New Zealand, and the countries of the world). Because of this, we want to generate our own coordinate datasets so that we can graph any country or region.
In the following articles, we will learn a little bit about administrative regions and how to use the QGIS software to explore the maps that Natural Earth and GADM provide, with which we will generate some datasets to graph any map in R.