From 626922b5b4032f0c60b561a1607955b289347b73 Mon Sep 17 00:00:00 2001 From: Henrikki Tenkanen Date: Thu, 5 Oct 2023 12:07:47 +0300 Subject: [PATCH] Add material about activating geometry source with set_geometry --- .../chapter-06/md/02-geometric-operations.md | 127 +++- .../nb/02-geometric-operations.ipynb | 699 +++++++++++++++--- 2 files changed, 716 insertions(+), 110 deletions(-) diff --git a/source/part2/chapter-06/md/02-geometric-operations.md b/source/part2/chapter-06/md/02-geometric-operations.md index 39380188..d1cfaefa 100644 --- a/source/part2/chapter-06/md/02-geometric-operations.md +++ b/source/part2/chapter-06/md/02-geometric-operations.md @@ -12,24 +12,20 @@ jupyter: name: python3 --- + # Common geometric operations + + +Geometric operations refer to a set of methods that can be used to process and analyze geometric features, like points, lines and polygons. In the context of geographic data analysis, these operations allow us, for instance, to ask questions about how two or more geographic objects relate to each other: Do they intersect, touch, or overlap? Are they adjacent to one another? How far apart are they? With the tools bundled in geopandas, it is easy to perform these kind of operations. As we delve into geometric operations, you'll discover they form the foundation of many geospatial analyses, enabling insights that are often difficult to discern from non-spatial data alone. -Here we demonstrate some of the most common geometry manipulation functions available in `geopandas`. We will continue exploring the census tract data from Austin, Texas. It is often useful to do geometric manipulations on administrative borders for further analysis and visualization purposes. We will learn how to generate centroids, different outlines and buffer zones for the polygons. +In the following, we demonstrate some of the most common geometric manipulation functions available in geopandas. 
We will do this by continuing to explore the census tract data from Austin, Texas. Geometric manipulations are often useful, for example, when working with data related to administrative boundaries, as we often need to transform or manipulate the geographic data in one way or another for further analysis and visualization purposes. Next, we will learn how to generate centroids, different outlines and buffer zones for the polygons. Let's start by reading the census tract data into a `GeoDataFrame`. In this case, we use data that we already manipulated a bit in the previous section (by calculating the area and population density): + -```python tags=["remove_cell"] -import os - -os.environ["USE_PYGEOS"] = "0" -``` - -```python +```python editable=true slideshow={"slide_type": ""} import geopandas as gpd -import matplotlib.pyplot as plt from pathlib import Path -``` -```python # Define path do the data data_folder = Path("data/Austin") fp = data_folder / "austin_pop_density_2019.gpkg" @@ -39,62 +35,145 @@ data = gpd.read_file(fp) data.head() ``` -For the purposes of geometric manipulations, we are mainly interested in the geometry column which contains the polygon geometries. Remember, that the data type of the geometry-column is `GeoSeries`. Individual geometries are eventually `shapely` objects and we can use all of `shapely`'s tools for geometry manipulation directly via `geopandas`. + +For the purposes of geometric manipulations, we are mainly interested in the geometry column, which contains the polygon geometries. Remember that the data type of the geometry column is `GeoSeries`. As we have mentioned earlier, the individual geometries are ultimately shapely geometric objects (e.g. `Point`, `LineString`, `Polygon`), and we can use all of shapely's tools for geometric manipulations directly via geopandas. 
The following shows that the geometries in the `GeoSeries` are stored as `MultiPolygon` objects: + -```python -# Check contents of the geometry column +```python editable=true slideshow={"slide_type": ""} data["geometry"].head() ``` -```python +```python editable=true slideshow={"slide_type": ""} # Check data type of the geometry column type(data["geometry"]) ``` -```python +```python editable=true slideshow={"slide_type": ""} # Check data type of a value in the geometry column type(data["geometry"].values[0]) ``` -Let's first plot the original geometries. We can use the in-built plotting function in `geopandas` to plot the geometries, and `matplotlib.pyplot` to turn off axis lines and labels. + +Let's first plot the original geometries. We can use the built-in `.plot()` function in geopandas to plot the geometries, and `matplotlib.pyplot` to turn off axis lines and labels: + + +```python editable=true slideshow={"slide_type": ""} +import matplotlib.pyplot as plt -```python data.plot(facecolor="none", linewidth=0.2) plt.axis("off") plt.show() ``` + _**Figure 6.13**. Basic plot of the census tracts._ + - + ## Centroid -Extracting the centroid of geometric features is useful in many cases. Geometric centroid can, for example, be used for locating text labels in visualizations. We can extract the center point of each polygon via the `centroid`-attribute of the geometry-column. The data should be in a projected coordinate reference system when calculating the centroids. If trying to calculate centroids based on latitude and longitude information, `geopandas` will warn us that the results are likely incorrect. Our sample data are in WGS 84 / UTM zone 14N (EPSG:32614), which is a projected , and we can proceed to calculating the centroids. +The centroid of a geometry is the geometric center of a given geometry (line, polygon or a geometry collection). Extracting the centroid of geometric features is useful in many cases. 
The geometric centroid can, for example, be used for locating text labels in visualizations. We can extract the center point of each polygon via the `centroid` attribute of the `geometry` column. The data should be in a projected coordinate reference system when calculating the centroids. If we try to calculate centroids based on latitude and longitude information, geopandas will warn us that the results are likely (slightly) incorrect. Our `GeoDataFrame` is in the WGS 84 / UTM zone 14N (EPSG:32614) coordinate reference system (CRS), which is a projected one (we will learn more about these in the next section). Thus, we can directly proceed to calculating the centroids: + -```python +```python editable=true slideshow={"slide_type": ""} data.crs.name ``` -```python +```python editable=true slideshow={"slide_type": ""} data["geometry"].centroid.head() ``` -We can also apply the method directly to the `GeoDataFrame` to achieve the same result using the syntax `data.centroid`. At the same time, we can also plot the centroids for a visual check. + +We can also use the attribute directly on the `GeoDataFrame` to achieve the same result with the syntax `data.centroid`. At the same time, we can plot the centroids for a visual check: + -```python +```python editable=true slideshow={"slide_type": ""} data.centroid.plot(markersize=1) plt.axis("off") plt.show() ``` + _**Figure 6.14**. Basic plot of census tract centroids._ + + + +## Updating the source for geometries in a GeoDataFrame + +Before diving into other examples of geometric operations, let's briefly discuss different ways to update the source column that is used to represent the geometries in your `GeoDataFrame`. In some cases, such as when calculating the centroids as we did earlier, you might actually want to save the centroids into your `GeoDataFrame` and continue processing or analysing the data based on these centroids. 
This can be done easily with geopandas, and there are a couple of ways to do it: + +1. Overwrite the existing geometries in the `geometry` column by storing the centroids into it. +2. Create a new column (e.g. `centroid`) and store the centroids into this one. Then activate the column as the "source" for geometries in your `GeoDataFrame`. This means that you can have multiple simultaneous columns containing geometries in a `GeoDataFrame`, which can be very handy! + +Some important remarks about these approaches: Option 1 is very easy to do, but the downside is that you no longer have access to the original geometries (e.g. polygons). Option 2 requires a couple of steps, but the upside is that you can easily swap between the original geometries and the centroids in your data. However, when saving the geographic data to disk, you can typically only include one column with geometries. Hence, at this stage at the latest, you need to decide which column is used for representing the geometric features in your data. In the following, we demonstrate how to do both of these. Let's start by showing how you can overwrite the existing geometries with centroids: + + +```python editable=true slideshow={"slide_type": ""} +# Make a copy +option_1 = data.copy() + +option_1["geometry"].head(2) +``` + +```python editable=true slideshow={"slide_type": ""} +# Update the geometry column with centroids +option_1["geometry"] = option_1.centroid + +option_1.head(2) +``` + + +As we can see, the geometries in the `geometry` column were replaced and populated with `Point` objects that represent the centroids of the polygons. With this approach, you can no longer access the original polygon geometries. 
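The overwrite step can also be tried without the Austin data. The following is a minimal sketch under stated assumptions: the two square polygons, the `name` column and the variable names (`toy`, `overwritten`) are hypothetical stand-ins and are not part of the chapter's dataset. It only illustrates that assigning centroids to the `geometry` column replaces the polygons in the copy while the original `GeoDataFrame` stays untouched.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two hypothetical square polygons standing in for census tracts
squares = [
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    Polygon([(10, 10), (14, 10), (14, 14), (10, 14)]),
]

# EPSG:32614 is the projected CRS mentioned in this section;
# the coordinates themselves are made up for illustration
toy = gpd.GeoDataFrame({"name": ["a", "b"]}, geometry=squares, crs="EPSG:32614")

# Work on a copy so that the original polygons are preserved in `toy`
overwritten = toy.copy()
overwritten["geometry"] = overwritten.centroid

# Inspect the geometry types before and after the overwrite
geom_types = list(overwritten.geom_type)
original_types = list(toy.geom_type)

# Coordinates of the first centroid (center of the first square)
first_point = overwritten.geometry.iloc[0]
first_centroid = (first_point.x, first_point.y)

print(geom_types)
print(original_types)
print(first_centroid)
```

Working on a copy is a deliberate choice here: once the `geometry` column is overwritten, the polygons are only recoverable from the untouched original.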
+ +The second option is to create a new column for storing the centroids and then use this column as the source for representing geometries of the given `GeoDataFrame`: + + +```python editable=true slideshow={"slide_type": ""} +# Make a copy +option_2 = data.copy() + +# Step 1: Create a column with centroids +option_2["centroid"] = data.centroid +option_2.head(2) +``` + + +Now we have two columns in our `GeoDataFrame` that contain geometries. By default, geopandas always uses the `geometry` column as the source for representing the geometries. However, we can easily change this with the `.set_geometry()` method, which tells geopandas to use another column with geometries as the geometry source: + + +```python editable=true slideshow={"slide_type": ""} +# Use centroids as the GeoDataFrame geometries +option2 = option_2.set_geometry("centroid") +option2.head(2) +``` + + +Nothing seems to have changed in the data itself, which is good because we did not want to modify any data. However, when we take a look at the `.geometry.name` attribute of the `GeoDataFrame`, we can see that the name of the column used for representing geometries has actually changed: + + +```python editable=true slideshow={"slide_type": ""} +option2.geometry.name +``` + + +We can further confirm this by plotting our `GeoDataFrame`, which now returns a map with points: + + +```python editable=true slideshow={"slide_type": ""} +option2.plot() +``` + +By following this approach, you can easily change the active `geometry` for your `GeoDataFrame`. This can be highly useful when manipulating geometries, as you can store the geometries from different computational steps in the same `GeoDataFrame` without needing to make multiple copies of the data. 
However, we recommend being careful when storing multiple columns with geometries, as it is possible to accidentally use a different source for geometries than you intended, which can cause confusion and problems with your analyses. Always remember to name the columns intuitively, which helps to avoid issues and confusion in your analyses! + + ## Unary union We can generate a joint outline for the administrative areas through creating a geometric union among all geometries. This could be useful, for example, for visualizing the outlines of a study area. The `unary_union` returns a single geometry object, which is automatically visualized when running the code in a Jupyter Notebook. + ```python data.unary_union diff --git a/source/part2/chapter-06/nb/02-geometric-operations.ipynb b/source/part2/chapter-06/nb/02-geometric-operations.ipynb index 1a6562da..06f29f34 100644 --- a/source/part2/chapter-06/nb/02-geometric-operations.ipynb +++ b/source/part2/chapter-06/nb/02-geometric-operations.ipynb @@ -3,7 +3,13 @@ { "cell_type": "markdown", "id": "70db89f0", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "# Common geometric operations" ] @@ -11,44 +17,30 @@ { "cell_type": "markdown", "id": "5e8c8dab", - "metadata": {}, - "source": [ - "Here we demonstrate some of the most common geometry manipulation functions available in `geopandas`. We will continue exploring the census tract data from Austin, Texas. It is often useful to do geometric manipulations on administrative borders for further analysis and visualization purposes. We will learn how to generate centroids, different outlines and buffer zones for the polygons. 
" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "2573c0ff-abee-4733-956d-25ca18773021", "metadata": { - "tags": [ - "remove_cell" - ] + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] }, - "outputs": [], "source": [ - "import os\n", + "Geometric operations refer to a set of methods that can be used to process and analyze geometric features, like points, lines and polygons. In the context of geographic data analysis, these operations allow us, for instance, to ask questions about how two or more geographic objects relate to each other: Do they intersect, touch, or overlap? Are they adjacent to one another? How far apart are they? With the tools bundled in geopandas, it is easy to perform these kind of operations. As we delve into geometric operations, you'll discover they form the foundation of many geospatial analyses, enabling insights that are often difficult to discern from non-spatial data alone.\n", "\n", - "os.environ[\"USE_PYGEOS\"] = \"0\"" + "In the following, we demonstrate some of the most common geometric manipulation functions available in geopandas. We will do this by continuing to explore the census tract data from Austin, Texas. Geometric manipulations are often useful e.g. when working with data related to administrative boundaries, as we often might need to transform or manipulate the geographic data in one way or another for further analysis and visualization purposes. Next, we will learn how to generate centroids, different outlines and buffer zones for the polygons. Let's start by reading the census tract data into `GeoDataFrame`. 
In this case, we use data that we already manipulated a bit in the previous section (by calculating the area and population density):" ] }, { "cell_type": "code", "execution_count": 2, - "id": "3035ea5e", - "metadata": {}, - "outputs": [], - "source": [ - "import geopandas as gpd\n", - "import matplotlib.pyplot as plt\n", - "from pathlib import Path" - ] - }, - { - "cell_type": "code", - "execution_count": 3, "id": "7acca403-0eee-41cc-ba9c-9866ea5f1b8c", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [ { "data": { @@ -84,40 +76,40 @@ " 6070.0\n", " 002422\n", " 4.029772\n", - " 1506.288769\n", - " POLYGON ((615643.487 3338728.496, 615645.477 3...\n", + " 1506.288778\n", + " MULTIPOLYGON (((615643.488 3338728.496, 615645...\n", " \n", " \n", " 1\n", " 2203.0\n", " 001751\n", " 1.532030\n", - " 1437.961408\n", - " POLYGON ((618576.586 3359381.053, 618614.330 3...\n", + " 1437.961394\n", + " MULTIPOLYGON (((618576.586 3359381.053, 618614...\n", " \n", " \n", " 2\n", " 7419.0\n", " 002411\n", " 3.960344\n", - " 1873.322183\n", - " POLYGON ((619200.163 3341784.654, 619270.849 3...\n", + " 1873.322161\n", + " MULTIPOLYGON (((619200.163 3341784.654, 619270...\n", " \n", " \n", " 3\n", " 4229.0\n", " 000401\n", " 2.181762\n", - " 1938.341868\n", - " POLYGON ((621623.757 3350508.165, 621656.294 3...\n", + " 1938.341859\n", + " MULTIPOLYGON (((621623.757 3350508.165, 621656...\n", " \n", " \n", " 4\n", " 4589.0\n", " 002313\n", " 2.431208\n", - " 1887.538655\n", - " POLYGON ((621630.247 3345130.744, 621717.926 3...\n", + " 1887.538658\n", + " MULTIPOLYGON (((621630.247 3345130.744, 621717...\n", " \n", " \n", "\n", @@ -125,26 +117,29 @@ ], "text/plain": [ " pop2019 tract area_km2 pop_density_km2 \\\n", - "0 6070.0 002422 4.029772 1506.288769 \n", - "1 2203.0 001751 1.532030 1437.961408 \n", - "2 7419.0 002411 3.960344 1873.322183 \n", - "3 4229.0 000401 2.181762 1938.341868 \n", - "4 4589.0 002313 
2.431208 1887.538655 \n", + "0 6070.0 002422 4.029772 1506.288778 \n", + "1 2203.0 001751 1.532030 1437.961394 \n", + "2 7419.0 002411 3.960344 1873.322161 \n", + "3 4229.0 000401 2.181762 1938.341859 \n", + "4 4589.0 002313 2.431208 1887.538658 \n", "\n", " geometry \n", - "0 POLYGON ((615643.487 3338728.496, 615645.477 3... \n", - "1 POLYGON ((618576.586 3359381.053, 618614.330 3... \n", - "2 POLYGON ((619200.163 3341784.654, 619270.849 3... \n", - "3 POLYGON ((621623.757 3350508.165, 621656.294 3... \n", - "4 POLYGON ((621630.247 3345130.744, 621717.926 3... " + "0 MULTIPOLYGON (((615643.488 3338728.496, 615645... \n", + "1 MULTIPOLYGON (((618576.586 3359381.053, 618614... \n", + "2 MULTIPOLYGON (((619200.163 3341784.654, 619270... \n", + "3 MULTIPOLYGON (((621623.757 3350508.165, 621656... \n", + "4 MULTIPOLYGON (((621630.247 3345130.744, 621717... " ] }, - "execution_count": 3, + "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ + "import geopandas as gpd\n", + "from pathlib import Path\n", + "\n", "# Define path do the data\n", "data_folder = Path(\"data/Austin\")\n", "fp = data_folder / \"austin_pop_density_2019.gpkg\"\n", @@ -157,43 +152,60 @@ { "cell_type": "markdown", "id": "6f986ea3", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ - "For the purposes of geometric manipulations, we are mainly interested in the geometry column which contains the polygon geometries. Remember, that the data type of the geometry-column is `GeoSeries`. Individual geometries are eventually `shapely` objects and we can use all of `shapely`'s tools for geometry manipulation directly via `geopandas`." + "For the purposes of geometric manipulations, we are mainly interested in the geometry column which contains the polygon geometries. Remember, that the data type of the geometry-column is `GeoSeries`. 
As we have mentioned earlier, the individual geometries are ultimately shapely geometric objects (e.g. `Point`, `LineString`, `Polygon`), and we can use all of shapely's tools for geometric manipulations directly via geopandas. The following shows that the geometries in the `GeoSeries` are stored as `MultiPolygon` objects:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 11, "id": "3c5b8fd0", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [ { "data": { "text/plain": [ - "0 POLYGON ((615643.487 3338728.496, 615645.477 3...\n", - "1 POLYGON ((618576.586 3359381.053, 618614.330 3...\n", - "2 POLYGON ((619200.163 3341784.654, 619270.849 3...\n", - "3 POLYGON ((621623.757 3350508.165, 621656.294 3...\n", - "4 POLYGON ((621630.247 3345130.744, 621717.926 3...\n", + "0 MULTIPOLYGON (((615643.488 3338728.496, 615645...\n", + "1 MULTIPOLYGON (((618576.586 3359381.053, 618614...\n", + "2 MULTIPOLYGON (((619200.163 3341784.654, 619270...\n", + "3 MULTIPOLYGON (((621623.757 3350508.165, 621656...\n", + "4 MULTIPOLYGON (((621630.247 3345130.744, 621717...\n", "Name: geometry, dtype: geometry" ] }, - "execution_count": 4, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Check contents of the geometry column\n", "data[\"geometry\"].head()" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 12, "id": "bf17c2a8", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [ { "data": { @@ -201,7 +213,7 @@ "geopandas.geoseries.GeoSeries" ] }, - "execution_count": 5, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -213,17 +225,23 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 13, "id": "2640dc73", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + 
"tags": [] + }, "outputs": [ { "data": { "text/plain": [ - "shapely.geometry.polygon.Polygon" + "shapely.geometry.multipolygon.MultiPolygon" ] }, - "execution_count": 6, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -236,20 +254,32 @@ { "cell_type": "markdown", "id": "e524d8e9", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ - "Let's first plot the original geometries. We can use the in-built plotting function in `geopandas` to plot the geometries, and `matplotlib.pyplot` to turn off axis lines and labels." + "Let's first plot the original geometries. We can use the built-in `.plot()` function in geopandas to plot the geometries, and `matplotlib.pyplot` to turn off axis lines and labels:" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 14, "id": "c8e4e801", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "", "text/plain": [ "
" ] @@ -259,6 +289,8 @@ } ], "source": [ + "import matplotlib.pyplot as plt\n", + "\n", "data.plot(facecolor=\"none\", linewidth=0.2)\n", "\n", "plt.axis(\"off\")\n", @@ -268,7 +300,13 @@ { "cell_type": "markdown", "id": "bb86c9ad-0465-4516-830f-84d52688fefc", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "_**Figure 6.13**. Basic plot of the census tracts._" ] @@ -277,19 +315,29 @@ "cell_type": "markdown", "id": "11e9d77c", "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, "tags": [] }, "source": [ "## Centroid\n", "\n", - "Extracting the centroid of geometric features is useful in many cases. Geometric centroid can, for example, be used for locating text labels in visualizations. We can extract the center point of each polygon via the `centroid`-attribute of the geometry-column. The data should be in a projected coordinate reference system when calculating the centroids. If trying to calculate centroids based on latitude and longitude information, `geopandas` will warn us that the results are likely incorrect. Our sample data are in WGS 84 / UTM zone 14N (EPSG:32614), which is a projected , and we can proceed to calculating the centroids." + "The centroid of a geometry is the geometric center of a given geometry (line, polygon or a geometry collection). Extracting the centroid of geometric features is useful in many cases. Geometric centroid can, for example, be used for locating text labels in visualizations. We can extract the center point of each polygon via the `centroid` attribute of the `geometry` column. The data should be in a projected coordinate reference system when calculating the centroids. If trying to calculate centroids based on latitude and longitude information, geopandas will warn us that the results are likely (slightly) incorrect. 
Our `GeoDataFrame` is in WGS 84 / UTM zone 14N (EPSG:32614) coordinate reference system (CRS) which is a projected one (we will learn more about these in the next section). Thus, we can directly proceed to calculating the centroids:" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 17, "id": "2c71636e-13f3-4562-8e80-abfa3e6a6798", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [ { "data": { @@ -297,7 +345,7 @@ "'WGS 84 / UTM zone 14N'" ] }, - "execution_count": 8, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -308,9 +356,15 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 18, "id": "513407ce", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [ { "data": { @@ -323,7 +377,7 @@ "dtype: geometry" ] }, - "execution_count": 9, + "execution_count": 18, "metadata": {}, "output_type": "execute_result" } @@ -335,16 +389,28 @@ { "cell_type": "markdown", "id": "d0446d74", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ - "We can also apply the method directly to the `GeoDataFrame` to achieve the same result using the syntax `data.centroid`. At the same time, we can also plot the centroids for a visual check." + "We can also apply the method directly to the `GeoDataFrame` to achieve the same result using the syntax `data.centroid`. 
At the same time, we can also plot the centroids for a visual check:" ] }, { "cell_type": "code", "execution_count": 10, "id": "20270dc2", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "outputs": [ { "data": { @@ -367,15 +433,476 @@ { "cell_type": "markdown", "id": "d41ab0cb-7c22-4358-b098-85a167bec1c9", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "_**Figure 6.14**. Basic plot of census tract centroids._" ] }, + { + "cell_type": "markdown", + "id": "8eb48170-2bfd-48fe-82e1-7a3383fefcc4", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "## Updating the source for geometries in a GeoDataFrame\n", + "\n", + "Before diving into other examples of geometric operations, let's discuss briefly about different ways to update the source column which is used to represent the geometries in your `GeoDataFrame`. In some cases, such as when calculating the centroids as we did earlier, you might actually want to save the centroids into your `GeoDataFrame` and continue processing or analysing the data based on these centroids. This can be done easily with geopandas, and there are a couple of approaches how to do this:\n", + "\n", + "1. Overwrite the existing geometries in the `geometry` column by storing the centroids into it.\n", + "2. Create a new column (e.g. `centroid`) and store the centroid into this one. Then activate the column as the \"source\" for geometries in your `GeoDataFrame`. This means that you can have multiple simultaneous columns containing geometries in a `GeoDataFrame` which can be very handy!\n", + "\n", + "Some important remarks about these approaches: The option 1 is very easy to do, but the downside of it is the fact that you do not have access to the original geometries (e.g. polygons) anymore. 
Option 2 requires a couple of steps, but the upside is that you can easily swap between the original geometries and the centroids in your data. However, when saving the geographic data to disk, you can typically only include one column with geometries. Hence, at this stage at the latest, you need to decide which column is used for representing the geometric features in your data. In the following, we demonstrate how to do both of these. Let's start by showing how you can overwrite the existing geometries with centroids:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "71e2c614-1b13-49a3-8bec-58202c11af76", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 MULTIPOLYGON (((615643.488 3338728.496, 615645...\n", + "1 MULTIPOLYGON (((618576.586 3359381.053, 618614...\n", + "Name: geometry, dtype: geometry" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Make a copy\n", + "option_1 = data.copy()\n", + "\n", + "option_1[\"geometry\"].head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "ff952582-cc81-4d65-9251-0410c3b6921f", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
| | pop2019 | tract | area_km2 | pop_density_km2 | geometry |\n", + "| 0 | 6070.0 | 002422 | 4.029772 | 1506.288778 | POINT (616990.190 3339736.002) |\n", + "| 1 | 2203.0 | 001751 | 1.532030 | 1437.961394 | POINT (619378.303 3359650.002) |\n", + "
" + ], + "text/plain": [ + " pop2019 tract area_km2 pop_density_km2 geometry\n", + "0 6070.0 002422 4.029772 1506.288778 POINT (616990.190 3339736.002)\n", + "1 2203.0 001751 1.532030 1437.961394 POINT (619378.303 3359650.002)" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Update the geometry column with centroids\n", + "option_1[\"geometry\"] = option_1.centroid\n", + "\n", + "option_1.head(2)" + ] + }, + { + "cell_type": "markdown", + "id": "a247571e-f8a0-4853-a05c-6b57a33ad79f", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "As we can see, now the geometries in the `geometry` column were replaced and populated with `Point` objects that represent the centroids of the polygons. With this approach, you cannot anymore access the original polygon geometries.\n", + "\n", + "The second option is to create a new column for storing the centroids and then use this column as the source for representing geometries of the given `GeoDataFrame`:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "a04c28a5-1abc-4089-8000-9efdadded03e", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
| | pop2019 | tract | area_km2 | pop_density_km2 | geometry | centroid |\n", + "| 0 | 6070.0 | 002422 | 4.029772 | 1506.288778 | MULTIPOLYGON (((615643.488 3338728.496, 615645... | POINT (616990.190 3339736.002) |\n", + "| 1 | 2203.0 | 001751 | 1.532030 | 1437.961394 | MULTIPOLYGON (((618576.586 3359381.053, 618614... | POINT (619378.303 3359650.002) |\n", + "
" + ], + "text/plain": [ + " pop2019 tract area_km2 pop_density_km2 \\\n", + "0 6070.0 002422 4.029772 1506.288778 \n", + "1 2203.0 001751 1.532030 1437.961394 \n", + "\n", + " geometry \\\n", + "0 MULTIPOLYGON (((615643.488 3338728.496, 615645... \n", + "1 MULTIPOLYGON (((618576.586 3359381.053, 618614... \n", + "\n", + " centroid \n", + "0 POINT (616990.190 3339736.002) \n", + "1 POINT (619378.303 3359650.002) " + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Make a copy\n", + "option_2 = data.copy()\n", + "\n", + "# Step 1: Create a column with centroids\n", + "option_2[\"centroid\"] = data.centroid\n", + "option_2.head(2)" + ] + }, + { + "cell_type": "markdown", + "id": "12590b65-020d-4de1-8753-d89846e10afb", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "Now we have two columns in our `GeoDataFrame` that contain geometries. By default, geopandas always uses the `geometry` column as a source for representing the geometries. However, we can easily change this with `.set_geometry()` method which can be used to tell geopandas to use another column with geometries as the geometry-source:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "ee2bc828-a4e7-429e-8c66-f01b1d9a0dee", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
pop2019tractarea_km2pop_density_km2geometrycentroid
06070.00024224.0297721506.288778MULTIPOLYGON (((615643.488 3338728.496, 615645...POINT (616990.190 3339736.002)
12203.00017511.5320301437.961394MULTIPOLYGON (((618576.586 3359381.053, 618614...POINT (619378.303 3359650.002)
\n", + "
" + ], + "text/plain": [ + " pop2019 tract area_km2 pop_density_km2 \\\n", + "0 6070.0 002422 4.029772 1506.288778 \n", + "1 2203.0 001751 1.532030 1437.961394 \n", + "\n", + " geometry \\\n", + "0 MULTIPOLYGON (((615643.488 3338728.496, 615645... \n", + "1 MULTIPOLYGON (((618576.586 3359381.053, 618614... \n", + "\n", + " centroid \n", + "0 POINT (616990.190 3339736.002) \n", + "1 POINT (619378.303 3359650.002) " + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Use centroids as the GeoDataFrame geometries\n", + "option2 = option_2.set_geometry(\"centroid\")\n", + "option2.head(2)" + ] + }, + { + "cell_type": "markdown", + "id": "0c4673de-11a9-4452-a9bf-38ac751fccf1", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "Nothing seem to have changed in the data itself, which is good because we did not want to modify any data. However, when we take a look at the `.geometry.name` attribute of the `GeoDataFrame`, we can see that the name of the column used for representing geometries has actually changed:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "79eac7fb-e43c-469f-a65d-6050031eff72", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'centroid'" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "option2.geometry.name" + ] + }, + { + "cell_type": "markdown", + "id": "f0cb2e98-796e-479a-8a98-0f7b3c46141d", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "We can still confirm this by plotting our `GeoDataFrame` which now returns a map with points:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "8ef2fe82-5fae-4436-a080-b12d7bb15136", + "metadata": { + "editable": true, + "slideshow": 
{ + "slide_type": "" + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "option2.plot()" + ] + }, + { + "cell_type": "markdown", + "id": "ddd5a720-810c-4910-9b94-76b34eb448ea", + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, + "source": [ + "By following this approach, you can easily change the active `geometry` for your `GeoDataFrame`. This can be highly useful when manipulating geometries as you can store the geometries from different computational steps into a same `GeoDataFrame` without a need to make multiple copies of the data. However, we recommend to be a bit careful when storing multiple columns with geometries, as it is possible that you accidentally use a different source for geometries than what you have planned to do, which can cause confusion and problems with your analyses. Always remember the name the columns intuitively which can help avoiding issues and confusion in your analyses!" + ] + }, { "cell_type": "markdown", "id": "c055ea6a", - "metadata": {}, + "metadata": { + "editable": true, + "slideshow": { + "slide_type": "" + }, + "tags": [] + }, "source": [ "## Unary union\n", "\n",