6.4 Working with sf data
As noted earlier, maps created using
coord_sf() rely heavily on tools
provided by the sf package, and indeed the sf package contains many more useful tools for
manipulating simple features data. In this section I provide an introduction to a few
such tools; more detailed coverage can be found on the sf package website
To get started, recall that one advantage to simple features over other representations of spatial data is that geographical units can have complicated structure. A good example of this in the Australian maps data is the electoral district of Eden-Monaro, plotted below:
ozmaps::abs_ced %>% filter(NAME == "Eden-Monaro") edenmonaro <- ggplot(edenmonaro) + geom_sf() p <-+ coord_sf(xlim = c(147.75, 150.25), ylim = c(-37.5, -34.5)) p + coord_sf(xlim = c(150, 150.25), ylim = c(-36.3, -36)) p
As this illustrates, Eden-Monaro is defined in terms of two distinct polygons, a large one on the Australian mainland and a small island. However, the large region has a hole in the middle (the hole exists because the Australian Capital Territory is a distinct political unit that is wholly contained within Eden-Monaro, and as illustrated above, electoral boundaries in Australia do not cross state lines). In sf terminology this is an example of a
MULTIPOLYGON geometry. In this section I’ll talk about the structure of these objects and how to work with them.
First, let’s use dplyr to grab only the geometry object:
edenmonaro %>% pull(geometry)edenmonaro <-
The metadata for the edenmonaro object can accessed using helper functions. For example,
st_geometry_type() extracts the geometry type (e.g.,
st_dimension() extracts the number of dimensions (2 for XY data, 3 for XYZ),
st_bbox() extracts the bounding box as a numeric vector, and
st_crs() extracts the CRS as a list with two components, one for the EPSG code and the other for the proj4string. For example:
st_bbox(edenmonaro) #> xmin ymin xmax ymax #> 147.7 -37.5 150.2 -34.5
Normally when we print the
edenmonaro object the output would display all the additional information (dimension, bounding box, geodetic datum etc) but for the remainder of this section I will show only the relevant lines of the output. In this case edenmonaro is defined by a MULTIPOLYGON geometry containing one feature:
edenmonaro #> Geometry set for 1 feature #> geometry type: MULTIPOLYGON #> MULTIPOLYGON (((150 -36.2, 150 -36.2, 150 -36.3...
However, we can “cast” the MULTIPOLYGON into the two distinct POLYGON geometries from which it is constructed using
st_cast(edenmonaro, "POLYGON") #> Geometry set for 2 features #> geometry type: POLYGON #> POLYGON ((150 -36.2, 150 -36.2, 150 -36.3, 150 ... #> POLYGON ((148 -36.7, 148 -36.7, 148 -36.7, 148 ...
To illustrate when this might be useful, consider the Dawson electorate, which consists of 69 islands in addition to a coastal region on the Australian mainland.
ozmaps::abs_ced %>% dawson <- filter(NAME == "Dawson") %>% pull(geometry) dawson#> Geometry set for 1 feature #> geometry type: MULTIPOLYGON #> MULTIPOLYGON (((148 -19.9, 148 -19.8, 148 -19.8... ggplot(dawson) + geom_sf() + coord_sf()
Suppose, however, our interest is only in mapping the islands. If so, we can first use the
st_cast() function to break the Dawson electorate into the constituent polygons. After doing so, we can use
st_area() to calculate the area of each polygon and
which.max() to find the polygon with maximum area:
st_cast(dawson, "POLYGON") dawson <-which.max(st_area(dawson)) #>  69
The large mainland region corresponds to the 69th polygon within Dawson. Armed with this knowledge, we can draw a map showing only the islands:
ggplot(dawson[-69]) + geom_sf() + coord_sf()