Generating our coordinate datasets to plot in R
This article is the third in a series focused on building our own dataset to plot maps in R. You can find all the articles in the following links.
- Plotting maps using R: Introduction
- Exploring the administrative divisions: Obtaining maps from Natural Earth or GADM and preparing our workspace
- Generating our coordinate datasets: Working with the attribute table and the vertices tools in QGIS, using Natural Earth’s maps
- Using a custom dataset to plot maps: R code to plot, using maps from GADM
In this article, we will…
- Use expressions to select data.
- Obtain vertices from the map.
- Generate a CSV file ready to be used in R.
Now that QGIS is installed and we have some shapefiles available, it is time to select the useful information from them and generate a dataset that is fully compatible with ggplot2.
Selecting data of interest
As in the previous article, load the shapefile using Layer > Add Layer > Add Vector Layer…, and then open the attribute table by pressing the F6 shortcut.
At the top of the attribute table, click on Select features using an expression. After that, the window Select by Expression will open, where we will be able to select any data we want.
The central panel shows a large number of options that can be used to build an expression. Click on Fields and Values to display all the columns (fields) that we previously saw in the attribute table, then click on iso_a2. In the right panel, click on All unique or 10 samples to show a small portion of the contents of this column.
Using iso_a2 is strongly recommended because it avoids spelling mistakes in the countries' names. However, if you prefer, and are sure of the spelling, you can use the admin field instead. Be careful not to use name_en (or any of the other name columns): these contain the names of the states, departments, or other divisions, not the names of the countries, which are the ones we need.
In this example, we will generate a map of Central America, so write this expression on the left panel. Remember that admin has the English spelling of the countries’ names.
admin = 'Guatemala' OR admin = 'Belize' OR admin = 'El Salvador' OR admin = 'Honduras' OR admin = 'Nicaragua' OR admin = 'Costa Rica' OR admin = 'Panama'
Once you have written the expression in the left panel, press the Select Features button to select the entries in the table that meet the criteria. Then return to the main QGIS window to see the result of the selection.
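When the list of countries grows, chaining OR clauses by hand becomes error-prone. The following Python sketch builds an equivalent expression using the IN operator, which QGIS expressions also accept. The helper name build_country_expression is hypothetical, and the quote escaping (doubling a single quote) is assumed to follow the usual SQL-style convention. The printed string is meant to be pasted into the Select by Expression window.

```python
def build_country_expression(field, countries):
    """Return a QGIS-style expression matching rows whose `field` is in `countries`.

    Hypothetical helper for illustration; it only builds a string.
    """
    if not countries:
        raise ValueError("at least one country is required")
    # String literals use single quotes; an embedded single quote is doubled
    # (assumed SQL-style escaping).
    quoted = ", ".join("'" + c.replace("'", "''") + "'" for c in countries)
    # Field names are wrapped in double quotes.
    return f'"{field}" IN ({quoted})'


central_america = [
    "Guatemala", "Belize", "El Salvador", "Honduras",
    "Nicaragua", "Costa Rica", "Panama",
]
print(build_country_expression("admin", central_america))
```

The same helper works with iso_a2 if you pass the two-letter codes instead of the names.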
Generating the vertices for the dataset
The regions represented in QGIS are called polygons, and each one has multiple vertices. The more detailed the map (those obtained from GADM, for example), the greater the number of vertices shown on it. Let's generate a file that contains all the vertices so we can use it in R to plot a map.
Once we have selected the regions to export, go to the main menu and select Vector > Geometry Tools > Extract Vertices.
A new window will open. Under Input layer, make sure that the option Selected features only is enabled; otherwise, we will export the vertices of the complete map instead of only our desired region. After checking this, click on Run and, after a few seconds, QGIS will generate the vertices. If we are working with many regions, for example, exporting the whole world in a single operation, this task may take several minutes.
Optionally, under Vertices you can click on the button on the right and select the option Save to File… to save the vertices for future use.
After running the task, check the Layers panel on the left side of the main window, where a new layer called Vertices has appeared, showing the vertices as points on the map.
To explore the information about the vertices, click on the name of the Vertices layer and press F6 to open the attribute table. One difference you will find is that, previously, each region (department, state, and so on) had only one entry, but now there is an entry for every single vertex that delimits that region. Additionally, new columns such as vertex_index, vertex_part, vertex_part_ring, and vertex_part_index have been created. These will be extremely important when plotting in R.
To continue, generate a Comma Separated Value file. To do this, go to the Layers panel, right-click on Vertices and select Export > Save Features As…
In the Save Vector Layer As… window, choose the Comma Separated Value [CSV] format, select the location and the file name to save, and check the fields you want to export. To make our file as lightweight as possible, it is recommended to first click on Deselect All and then enable only the following fields:
- name: Name of the region (departments, states, or any other)
- admin: Name of the country to which the region belongs
- vertex_part and vertex_part_ring: These fields will help us to identify islands and territories that are not directly connected to a region (that is, enclaves and exclaves)
- vertex_part_index: This tells us in which order the coordinates must be joined together to draw the region
If you want to keep some additional information you are free to do so, but the minimum requirement for R is the previous selection.
Further down in the window, under Layer Options, go to the GEOMETRY menu and pick the option AS_XY. Finally, click on OK and wait a moment for QGIS to generate the CSV file.
After the task is complete, we will have a CSV file with seven columns (the X and Y coordinates plus the five fields selected above) that will be used to plot the map. For this example, our map of Central America has 19722 entries in its CSV file. For countries with more detailed borders or coastlines, Canada for example, the number of entries may be much higher. If you explore the CSV file, you will notice that most entries have 0 in the vertex_part and vertex_part_ring columns, as these fields only change when a region has islands, enclaves, or exclaves.
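As a quick sanity check before moving on to R, the exported file can be inspected with a few lines of Python. This is only a sketch: it assumes the coordinate columns are named X and Y (check the header of your own file), the function name load_rings is hypothetical, and the sample rows are made up rather than real coordinates. It groups the vertices by country, region, part, and ring, and orders each group by vertex_part_index, which mirrors how the points have to be joined to draw each outline.

```python
import csv
import io
from collections import defaultdict

# Column names assumed for the export described above; X and Y are an assumption.
REQUIRED = ["X", "Y", "name", "admin",
            "vertex_part", "vertex_part_ring", "vertex_part_index"]


def load_rings(handle):
    """Group vertices by (admin, name, part, ring), ordered by vertex_part_index."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rings = defaultdict(list)
    for row in reader:
        key = (row["admin"], row["name"],
               int(row["vertex_part"]), int(row["vertex_part_ring"]))
        rings[key].append(
            (int(row["vertex_part_index"]), float(row["X"]), float(row["Y"]))
        )
    # Sort each ring by its index, then drop the index and keep (x, y) pairs.
    return {key: [(x, y) for _, x, y in sorted(points)]
            for key, points in rings.items()}


# Made-up sample (not real coordinates), only to show the shape of the data.
sample = io.StringIO(
    "X,Y,name,admin,vertex_part,vertex_part_ring,vertex_part_index\n"
    "1.0,0.0,Demo,Nowhere,0,0,1\n"
    "0.0,0.0,Demo,Nowhere,0,0,0\n"
    "5.0,5.0,Demo,Nowhere,1,0,0\n"
)
rings = load_rings(sample)
print(len(rings), "rings found")
```

To run it on a real export, open the file with open(path, newline="", encoding="utf-8") and pass the handle to load_rings. Note that two regions sharing the same name within one country would be merged by this grouping, so treat the result as a rough check only.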
In conclusion
There is a lot of information available when we load a map. However, to have an easier time later, we should pick only the data that will be relevant to our graphs.
In the next article, we will load this CSV file in R and, finally, plot our custom map. We will also work with a dataset generated from GADM and check how its fields differ from those of Natural Earth.