Minor edits for rest of Chapter 4
davewhipp committed Oct 13, 2024
1 parent c8b3f3b commit f540c73
Showing 4 changed files with 62 additions and 77 deletions.
42 changes: 19 additions & 23 deletions source/part1/chapter-04/md/02-subplots.md
@@ -15,7 +15,7 @@ jupyter:
<!-- #region editable=true slideshow={"slide_type": ""} -->
# Creating subplots

At this point you should know the basics of making plots with Matplotlib. Now we will expand on our basic plotting skills to learn how to create more advanced plots. In this section, we will show how to visualize data using pandas/Matplotlib and create multi-panel plots such as the one below.
At this point you should know the basics of making plots with `matplotlib`. Now we will expand on our basic plotting skills to learn how to create more advanced plots. In this section, we will show how to visualize data using `pandas` and `matplotlib` and create multi-panel plots such as the one below.

![_**Figure 4.10**. An example of seasonal temperatures for 2012-2013 using pandas and Matplotlib._](../img/subplots.png)

@@ -34,7 +34,7 @@ fp = "data/029740.txt"

data = pd.read_csv(
fp,
delim_whitespace=True,
sep=r"\s+",
na_values=["*", "**", "***", "****", "*****", "******"],
usecols=["YR--MODAHRMN", "TEMP", "MAX", "MIN"],
parse_dates=["YR--MODAHRMN"],
@@ -62,7 +62,7 @@ print("Number of no-data values per column: ")
print(data.isna().sum())
```

So, there are 1644 missing values in the TEMP_F column and we should remove those. We need not worry about the no-data values in the `'MAX'` and `'MIN'` columns since we will not use them for the plots produced below. We can remove rows from our DataFrame where `'TEMP_F'` has missing values using the `dropna()` method.
So, there are 3579 missing values in the `TEMP_F` column and we should remove those. We need not worry about the no-data values in the `MAX` and `MIN` columns since we will not use them for the plots produced below. We can remove rows from our `DataFrame` where `TEMP_F` has missing values using the `.dropna()` method.

```python
data.dropna(subset=["TEMP_F"], inplace=True)
@@ -72,7 +72,7 @@ print("Number of rows after removing no data values:", len(data))
<!-- #region editable=true slideshow={"slide_type": ""} tags=["question"] -->
#### Question 4.2

How many rows of data would remain if we removed all rows with any no-data values from our data (including no-data values in the `MAX` and `MIN` columns)? If you test this, be sure to save the modified DataFrame to another variable name or do not use the `inplace` parameter.
How many rows of data would remain if we removed all rows with any no-data values from our data (including no-data values in the `MAX` and `MIN` columns)? If you test this, be sure to save the modified `DataFrame` with another variable name or do not use the `inplace` parameter.
<!-- #endregion -->

```python editable=true slideshow={"slide_type": ""} tags=["remove_cell"]
@@ -82,7 +82,6 @@ How many rows of data would remain if we removed all rows with any no-data value
```python editable=true slideshow={"slide_type": ""} tags=["hide-cell", "remove_book_cell"]
# Solution


len(data.dropna())
```
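
To recap the loading and cleaning steps on a small scale, the sketch below reads a whitespace-delimited table from an in-memory string and compares the two ways of dropping rows. The `sample_text` values and the variable names are made up for illustration and are not taken from `data/029740.txt`.

```python
import io

import pandas as pd

# Hypothetical sample that mimics a whitespace-delimited data file
sample_text = """YR--MODAHRMN TEMP_F MAX MIN
201212010000 28 *** ***
201212010020 27 30 25
201212010050 **** *** ***
201212010120 26 *** 24
"""

sample = pd.read_csv(
    io.StringIO(sample_text),
    sep=r"\s+",  # one or more whitespace characters separate the columns
    na_values=["*", "**", "***", "****", "*****", "******"],
)

# Count the no-data values in each column
missing_counts = sample.isna().sum()

# Drop rows only where TEMP_F is missing; the result is a new DataFrame
temp_only = sample.dropna(subset=["TEMP_F"])

# Drop rows with a missing value in any column
complete_rows = sample.dropna()

print(missing_counts)
print(len(sample), len(temp_only), len(complete_rows))
```

Because `inplace` is not used here, `sample` itself is left unchanged and the two results can be compared side by side.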

@@ -92,7 +91,7 @@ Now that we have loaded the data, we can convert the temperature values from Fah
data["TEMP_C"] = (data["TEMP_F"] - 32.0) / 1.8
```
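
As a quick check of the conversion formula, the following sketch applies it to a few known reference temperatures. The `temps` `DataFrame` and its values are hypothetical and only serve to show that the arithmetic is applied to the whole column at once.

```python
import pandas as pd

# Hypothetical Fahrenheit values: freezing point, a mild day, boiling point
temps = pd.DataFrame({"TEMP_F": [32.0, 50.0, 212.0]})

# The arithmetic is applied to every row of the column at once
temps["TEMP_C"] = (temps["TEMP_F"] - 32.0) / 1.8

print(temps)
```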

We can now check the contents of our DataFrame once again.
We can now check the contents of our `DataFrame` once again.

```python
data.head()
@@ -107,7 +106,7 @@ We can start with creating the subplots by dividing the data in the data file in
- Winter (December 2012 - February 2013)
- Spring (March 2013 - May 2013)
- Summer (June 2013 - August 2013)
- Autumn (Septempber 2013 - November 2013)
- Autumn (September 2013 - November 2013)

```python
winter = data.loc[(data.index >= "201212010000") & (data.index < "201303010000")]
@@ -142,7 +141,7 @@ Based on the plots above it looks that the correct seasons have been plotted and
### Finding the data bounds

In order to define y-axis limits that will include the data from all of the seasons and be consistent between subplots we first need to find the minimum and maximum temperatures from all of the seasons.
In addition, we should consider that it would be beneficial to have some extra space (padding) between the y-axis limits and those values, such that, for example, the maximum y-axis limit is five degrees higher than the maximum temperature and the minimum y-axis limit is five degrees lower than the minimum temperature. We can do that below by calculating the minumum of each season's minumum temperature and subtracting five degrees.
In addition, we should consider that it would be beneficial to have some extra space (padding) between the y-axis limits and those values, such that, for example, the maximum y-axis limit is five degrees higher than the maximum temperature and the minimum y-axis limit is five degrees lower than the minimum temperature. We can do that below by calculating the minimum of each season's minimum temperature and subtracting five degrees.

```python
# Find lower limit for y-axis
@@ -166,8 +165,8 @@ We can now use this temperature range to standardize the y-axis ranges of our pl
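
The bounds calculation described above can be sketched with made-up seasonal values. The four small `Series`, the `seasons` list and the `padding` variable below are illustrative assumptions and not the station data.

```python
import pandas as pd

# Hypothetical seasonal temperatures in degrees Celsius
winter_temps = pd.Series([-22.5, -10.0, -3.2])
spring_temps = pd.Series([-8.1, 2.4, 12.0])
summer_temps = pd.Series([9.5, 18.3, 26.1])
autumn_temps = pd.Series([-4.0, 5.5, 14.2])

seasons = [winter_temps, spring_temps, summer_temps, autumn_temps]
padding = 5.0

# Lowest seasonal minimum minus padding, highest seasonal maximum plus padding
min_temp = min(season.min() for season in seasons) - padding
max_temp = max(season.max() for season in seasons) + padding

print("Y-axis limits:", min_temp, max_temp)
```

Taking the minimum and maximum across all seasons first means that one pair of limits can be reused for every subplot.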

### Displaying multiple subplots in a single figure

With the data split into seasons and y-axis range defined we can now continue to plot data from all four seasons in the same figure. We will start by creating a figure containing four subplots in a 2x2 panel using Matplotlib’s `subplots()` function. In the `subplots()` function, the user can specify how many rows and columns of plots they want to have in their figure.
We can also specify the size of our figure with `figsize()` parameter that takes the `width` and `height` values (in inches) as input.
With the data split into seasons and y-axis range defined we can now continue to plot data from all four seasons in the same figure. We will start by creating a figure containing four subplots in a two by two panel using the `.subplots()` function from `matplotlib`. In the `.subplots()` function, the user can specify how many rows and columns of plots they want to have in their figure.
We can also specify the size of our figure with the `figsize` parameter that takes the `width` and `height` values (in inches) as input.

```python
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
@@ -176,7 +175,7 @@ axs

_**Figure 4.13**. Empty figure template with a 2x2 subplot panel._

We can see from the output of the code cell that we now have a list containing two nested lists, where the first nested list contains the axes for columns 1 and 2 of **row 1** and the second contains the axes for columns 1 and 2 of **row 2**.
We can see from the output of the code cell that we now have a list containing two nested lists, where the first nested list contains the axes for columns 1 and 2 of row 1 and the second contains the axes for columns 1 and 2 of row 2.
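
Strictly speaking, the object returned for a two by two layout is a two-dimensional NumPy array of axes rather than a plain Python list, although it can be indexed like nested lists. The short sketch below illustrates this; the variable names `top_right`, `same_panel` and `all_axes` are only for illustration, and the `Agg` backend is selected so that the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# axs is a NumPy array of Axes objects with one entry per panel
print(type(axs), axs.shape)

# Both indexing styles select the panel in row 1, column 2
top_right = axs[0][1]
same_panel = axs[0, 1]

# Flattening gives a 1D sequence that is convenient for loops
all_axes = axs.flatten()
```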

To make it easier to keep track of things, we can parse these axes into their own variables as follows.

@@ -189,10 +188,10 @@ ax22 = axs[1][1]

Now we have four different axis variables for the different panels in our figure.
Next we can use these axes to plot the seasonal temperature data.
We can start by plotting the data for the different seasons with different colors for each of the lines, and we can specify the *y*-axis limits to be the same for all of the subplots.
We can start by plotting the data for the different seasons with different colors for each of the lines, and we can specify the y-axis limits to be the same for all of the subplots.

- We can use the `c` parameter to change the color of the line. You can define colors using RGB color codes, but it is often easier to use one of the [Matplotlib named colors](https://matplotlib.org/stable/gallery/color/named_colors.html) [^matplotlib_colors].
- We can also change the line width or weight using the `lw`.
- We can use the `c` parameter to change the color of the line. You can define colors using RGB color codes, but it is often easier to use one of the [`matplotlib` named colors](https://matplotlib.org/stable/gallery/color/named_colors.html) [^matplotlib_colors].
- We can also change the line width or weight using the `lw` parameter.
- The `ylim` parameter can be used to define the y-axis limits.
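
As a minimal sketch of these three parameters on a single panel, consider the example below. The `temps` values are hypothetical, and the `Agg` backend is selected only so that the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical temperature values
temps = pd.Series([-4.0, -1.5, 2.0, 5.5, 3.0])

fig, ax = plt.subplots(figsize=(6, 4))

# c sets the line color, lw the line width and ylim the y-axis limits
temps.plot(ax=ax, c="red", lw=2.5, ylim=[-10, 10])
```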

Putting all of this together in a single code cell we have the following:
@@ -224,15 +223,15 @@ plt.show()

_**Figure 4.14**. Seasonal temperatures for 2012-2013 plotted in a 2x2 panel._

Great, now we have all the plots in the same figure! However, we can see that there are some problems with our *x*-axis labels and a few other missing plot items we should add.
Great, now we have all the plots in the same figure! However, we can see that there are some problems with our x-axis labels and a few other missing plot items we should add.

Let's re-create the plot and make some improvements. In this version of the plot we will:
Let's recreate the plot and make some improvements. In this version of the plot we will:

- Modify the x- and y-axis labels using the `xlabel` and `ylabel` parameters in the `plot()` function.
- Enable grid lines on the plot using the `grid=True` parameter for the `plot()` function.
- Modify the x- and y-axis labels using the `xlabel` and `ylabel` parameters in the `.plot()` function.
- Enable grid lines on the plot using the `grid=True` parameter for the `.plot()` function.
- Add a figure title using the `fig.suptitle()` function.
- Rotate the x-axis labels using the `plt.setp()` function.
- Add a text label for each plot panel using the `text()` function.
- Add a text label for each plot panel using the `.text()` function.

```python
# Create the figure and subplot axes
@@ -296,17 +295,14 @@ ax22.text(pd.to_datetime("20131115"), -25, "Autumn")
plt.show()
```

<!-- #region -->
_**Figure 4.15**. Seasonal temperatures for 2012-2013 plotted with season names and gridlines visible._


The new version of the figure essentially conveys the same information as the first version, but the additional plot items help to make it easier to see the plot values and immediately understand the data being presented. Not bad.
<!-- #endregion -->
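
The individual improvements listed earlier can also be tried out on a smaller scale. The sketch below uses two made-up series stacked in a two by one panel; the dates, values, labels and colors are all assumptions chosen for illustration, and the `Agg` backend is selected so that the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily temperatures for two short periods
dates_a = pd.date_range("2013-01-01", periods=30, freq="D")
dates_b = pd.date_range("2013-07-01", periods=30, freq="D")
period_a = pd.Series(np.linspace(-15.0, -5.0, 30), index=dates_a)
period_b = pd.Series(np.linspace(12.0, 22.0, 30), index=dates_b)

fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(8, 8))
ax1, ax2 = axs

# Shared y-axis limits, axis labels and grid lines are set in the plot call
period_a.plot(
    ax=ax1, color="blue", lw=1.5, ylim=[-20, 25], ylabel="Temperature (°C)", grid=True
)
period_b.plot(
    ax=ax2, color="green", lw=1.5, ylim=[-20, 25], xlabel="Date", grid=True
)

# Figure title, rotated x-axis tick labels and a text label in each panel
fig.suptitle("Hypothetical temperatures")
plt.setp(ax1.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax2.xaxis.get_majorticklabels(), rotation=20)
ax1.text(dates_a[20], -18, "Period A")
ax2.text(dates_b[20], -18, "Period B")
```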

<!-- #region editable=true slideshow={"slide_type": ""} tags=["question"] -->
#### Question 4.3

Visualize only the winter and summer temperatures in a 1x2 panel figure. Save the resulting figure as a .png file.
Visualize only the winter and summer temperatures in a one by two panel figure. Save the resulting figure as a .png file.
<!-- #endregion -->

```python tags=["remove_cell"]
10 changes: 5 additions & 5 deletions source/part1/chapter-04/md/03-plot-formatting.md
@@ -27,7 +27,7 @@ In addition, there are several factors that can help improve the communication o

Not all people viewing your plots will see them the same way. Some viewers may have color blindness, while others may have printed out a copy of your plot in grayscale from a printer. Thus, while choosing nice colors can help make your plots look visually pleasing to you, it is worthwhile to consider the other viewers or formats in which your plots may be viewed. In this way your visualizations can be as inclusive to different viewers as possible.

Let's consider an illustrative example. In this case we will use four lines to plot hypothetical monthly temperatures for various mythical lands in the year 1680 [^nerds]. We will use a pandas DataFrame called `data` for this purpose with four columns and one year of data. We can see temperatures for the first four months in the data table below by typing `data.head(4)`.
Let's consider an illustrative example. In this case we will use four lines to plot hypothetical monthly temperatures for various mythical lands in the year 1680 [^nerds]. We will use a `pandas` `DataFrame` called `data` for this purpose with four columns and one year of data. We can see temperatures for the first four months in the data table below by typing `data.head(4)`.
<!-- #endregion -->

```python tags=["hide-cell"]
@@ -50,7 +50,7 @@ data = pd.DataFrame(index=dates, data=temperatures)
data.head(4)
```

Using this data we can create a plot (Figure 4.16) to visualize the temperatures for the four mythical lands using the pandas `plot()` function.
Using this data we can create a plot (Figure 4.16) to visualize the temperatures for the four mythical lands using the `pandas` `.plot()` function.

```python
ax = data.plot(
@@ -63,13 +63,13 @@ ax = data.plot(

_**Figure 4.16**. Hypothetical temperatures for one year in different mythical lands._

In Figure 4.16, we can see a visualization of the contents of the `data` DataFrame and many people will be able to distinguish the lines using the four colors that have been selected. However, not all people may see the figure in the same way, and those who have printed a copy in grayscale will see things quite differently.
In Figure 4.16, we can see a visualization of the contents of the `data` `DataFrame` and many people will be able to distinguish the lines using the four colors that have been selected. However, not all people may see the figure in the same way, and those who have printed a copy in grayscale will see things quite differently.

![_**Figure 4.17**. Hypothetical mythical land temperatures in grayscale._](../img/lines-grayscale.png)

_**Figure 4.17**. Hypothetical mythical land temperatures in grayscale._

In Figure 4.17, we see that it is nearly impossible to tell which line is which in the plot, so color alone is not helping in distinguishing the lines on this plot. In this case a better option is to vary both the color and line pattern for each line so they can be distinguished easily irrespective of the line colors and how they may be seen. This can be done using the `style` parameter in the `plot()` function, as shown below.
In Figure 4.17, we see that it is nearly impossible to tell which line is which in the plot, so color alone is not helping in distinguishing the lines on this plot. In this case a better option is to vary both the color and line pattern for each line so they can be distinguished easily irrespective of the line colors and how they may be seen. This can be done using the `style` parameter in the `.plot()` function, as shown below.

```python
ax = data.plot(
@@ -83,7 +83,7 @@ ax = data.plot(

_**Figure 4.18**. Hypothetical mythical land temperatures with different line styles._

Here in Figure 4.18, viewers can easily tell which line is which whether they have colorblindness or have printed a figure from a printer in grayscale. The difference, of course, is that this figure uses four different line styles: `-` for a solid line, `:` for a dotted line, `--` for a dashed line, and `-.` for a line with dots and dashes. These are defined using shorthand plot formatting for Matplotlib [^shorthand], for which they are the only four available line styles. If your plots require more than four line styles, you will likely need to use Matplotlib rather than pandas for the plotting. In that case, you can find more about the line styles for Matplotlib plotting in the [Matplotlib documentation online](https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html) [^linestyles].
Here in Figure 4.18, viewers can easily tell which line is which whether they have colorblindness or have printed a figure from a printer in grayscale. The difference, of course, is that this figure uses four different line styles: `-` for a solid line, `:` for a dotted line, `--` for a dashed line, and `-.` for a line with dots and dashes. These are defined using shorthand plot formatting for `matplotlib` [^shorthand], for which they are the only four available line styles. If your plots require more than four line styles, you will likely need to use `matplotlib` rather than `pandas` for the plotting. In that case, you can find more about the line styles for `matplotlib` plotting in the [`matplotlib` documentation online](https://matplotlib.org/stable/gallery/lines_bars_and_markers/linestyles.html) [^linestyles].
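
For cases where more than four line styles are needed, a minimal sketch using `matplotlib` directly might look like the following. It relies on the tuple form of the `linestyle` parameter, `(offset, (on, off, ...))`; the data values, the labels and the specific dash patterns are assumptions chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(12)

# Four shorthand styles plus two custom dash patterns.
# A tuple style is (offset, (on, off, ...)) with lengths given in points.
line_styles = {
    "solid": "-",
    "dotted": ":",
    "dashed": "--",
    "dash-dot": "-.",
    "long dashes": (0, (10, 3)),
    "dash-dot-dot": (0, (3, 5, 1, 5, 1, 5)),
}

fig, ax = plt.subplots(figsize=(8, 5))
for offset, (label, style) in enumerate(line_styles.items()):
    # Black lines only, so the styles alone must distinguish the lines
    ax.plot(x, x + 2 * offset, linestyle=style, color="black", label=label)

ax.legend()
```

Drawing every line in black is a simple way to test whether the styles alone are enough to tell the lines apart in a grayscale printout.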

Although this plotting example may seem like a simple tip, it can make a great difference in ensuring all viewers see the same data effectively the same way. We will return to the topic of effective plot design to discuss selecting colors and other visualization tips in greater detail in Chapter 8.
