
Commit

bugfixing index
trevorcampbell committed Nov 16, 2023
1 parent f3e4dfc commit 4a67b26
Showing 8 changed files with 54 additions and 54 deletions.
12 changes: 6 additions & 6 deletions source/classification1.md
@@ -183,7 +183,7 @@ total set of variables per image in this data set is:

+++

-```{index} pandas.DataFrame; info
+```{index} DataFrame; info
```

Below we use the `info` method to preview the data frame. This method can
@@ -195,7 +195,7 @@ as well as their data types and the number of non-missing entries.
cancer.info()
```

-```{index} pandas.Series; unique
+```{index} Series; unique
```

From the summary of the data above, we can see that `Class` is of type `object`.
@@ -213,7 +213,7 @@ method. The `replace` method takes one argument: a dictionary that maps
previous values to desired new values.
We will verify the result using the `unique` method.

-```{index} pandas.Series; replace
+```{index} Series; replace
```
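For reference, the recoding pattern this entry indexes looks roughly as follows. The `{"M": "Malignant", "B": "Benign"}` mapping is assumed from the dataset's abbreviated labels; treat this as a sketch rather than the chapter's exact cell.

```python
# Recode the Class labels via a dictionary mapping old values to new ones,
# then verify the recoding with unique.
cancer["Class"] = cancer["Class"].replace({"M": "Malignant", "B": "Benign"})
cancer["Class"].unique()
```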

```{code-cell} ipython3
@@ -227,7 +227,7 @@ cancer["Class"].unique()

### Exploring the cancer data

-```{index} pandas.DataFrame; groupby, pandas.Series;size
+```{index} DataFrame; groupby, Series;size
```

```{code-cell} ipython3
@@ -256,7 +256,7 @@ tumor observations.
100 * cancer.groupby("Class").size() / cancer.shape[0]
```

-```{index} pandas.Series; value_counts
+```{index} Series; value_counts
```

The `pandas` package also has a more convenient specialized `value_counts` method for
@@ -1607,7 +1607,7 @@ Imbalanced data with background color indicating the decision of the classifier

+++

-```{index} oversampling, pandas.DataFrame; sample
+```{index} oversampling, DataFrame; sample
```
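The oversampling entry refers to duplicating rows of the rare class by sampling with replacement. A minimal sketch, assuming a `rare_cancer` frame and the label names used above; not the chapter's exact code:

```python
import pandas as pd

# Oversample the rare class until the class sizes match, then recombine.
malignant = rare_cancer[rare_cancer["Class"] == "Malignant"]
benign = rare_cancer[rare_cancer["Class"] == "Benign"]
upsampled = malignant.sample(n=len(benign), replace=True, random_state=42)
balanced = pd.concat([benign, upsampled])
```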

Despite the simplicity of the problem, solving it in a statistically sound manner is actually
2 changes: 1 addition & 1 deletion source/clustering.md
@@ -308,7 +308,7 @@ have.
clus = penguins_clustered[penguins_clustered["cluster"] == 0][["bill_length_standardized", "flipper_length_standardized"]]
```

-```{index} see: within-cluster sum-of-squared-distances; WSSD
+```{index} see: within-cluster sum of squared distances; WSSD
```

```{index} WSSD
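As a reminder of what WSSD measures, here is a sketch using the `clus` frame from the cell above (an illustration, not a cell from the chapter):

```python
import numpy as np

# WSSD for one cluster: the sum of squared distances from each point
# to the cluster centroid.
points = clus.to_numpy()
centroid = points.mean(axis=0)
wssd = np.sum((points - centroid) ** 2)
```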
20 changes: 10 additions & 10 deletions source/inference.md
@@ -168,7 +168,7 @@ We can find the proportion of listings for each room type
by using the `value_counts` function with the `normalize` parameter
as we did in previous chapters.

-```{index} pandas.DataFrame; [], pandas.DataFrame; value_counts
+```{index} DataFrame; [], DataFrame; value_counts
```
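A sketch of the pattern the text describes, with the column name taken from the surrounding prose (not necessarily the collapsed cell's exact code):

```python
# Proportion of listings of each room type.
airbnb["room_type"].value_counts(normalize=True)
```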

```{code-cell} ipython3
@@ -187,13 +187,13 @@ value, {glue:text}`population_proportion`, is the population parameter. Remember
parameter value is usually unknown in real data analysis problems, as it is
typically not possible to make measurements for an entire population.

-```{index} pandas.DataFrame; sample, seed;numpy.random.seed
+```{index} DataFrame; sample, seed;numpy.random.seed
```

Instead, perhaps we can approximate it with a small subset of data!
To investigate this idea, let's try randomly selecting 40 listings (*i.e.,* taking a random sample of
size 40 from our population), and computing the proportion for that sample.
-We will use the `sample` method of the `pandas.DataFrame`
+We will use the `sample` method of the `DataFrame`
object to take the sample. The argument `n` of `sample` is the size of the sample to take
and since we are starting to use randomness here,
we are also setting the random seed via numpy to make the results reproducible.
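That setup reads roughly as follows; the seed value here is arbitrary:

```python
import numpy as np

# Seed numpy's global random number generator for reproducibility,
# then draw a random sample of 40 rows.
np.random.seed(1)
airbnb.sample(n=40)
```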
@@ -213,7 +213,7 @@ airbnb.sample(n=40)["room_type"].value_counts(normalize=True)
glue("sample_1_proportion", "{:.3f}".format(airbnb.sample(n=40, random_state=155)["room_type"].value_counts(normalize=True)["Entire home/apt"]))
```

-```{index} pandas.DataFrame; value_counts
+```{index} DataFrame; value_counts
```

Here we see that the proportion of entire home/apartment listings in this
@@ -248,7 +248,7 @@ commonly refer to as $n$) from a population is called
a **sampling distribution**. The sampling distribution will help us see how much we would
expect our sample proportions from this population to vary for samples of size 40.

-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
```

We again use the `sample` method to take samples of size 40 from our
@@ -284,7 +284,7 @@ to compute the number of qualified observations in each sample; finally compute
Both the first and last few entries of the resulting data frame are printed
below to show that we end up with 20,000 point estimates, one for each of the 20,000 samples.

-```{index} pandas.DataFrame;groupby, pandas.DataFrame;reset_index
+```{index} DataFrame;groupby, DataFrame;reset_index
```
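One way to compute per-sample estimates with these methods, assuming the 20,000 samples are stacked in one frame with a `replicate` column (a sketch only):

```python
# Within each replicate, compute the proportion of each room type,
# then flatten the result back into a data frame.
(
    samples.groupby("replicate")["room_type"]
    .value_counts(normalize=True)
    .reset_index(name="sample_proportion")
)
```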

```{code-cell} ipython3
@@ -479,7 +479,7 @@ The price per night of all Airbnb rentals in Vancouver, BC
is \${glue:text}`population_mean`, on average. This value is our
population parameter since we are calculating it using the population data.

-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
```

Now suppose we did not have access to the population data (which is usually the
@@ -987,7 +987,7 @@ mean of the sample is \${glue:text}`estimate_mean`.
Remember, in practice, we usually only have this one sample from the population. So
this sample and estimate are the only data we can work with.

-```{index} bootstrap; in Python, pandas.DataFrame; sample (bootstrap)
+```{index} bootstrap; in Python, DataFrame; sample (bootstrap)
```
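The `sample (bootstrap)` entry refers to resampling the original sample with replacement; a minimal sketch, with frame and column names assumed:

```python
# One bootstrap sample: resample with replacement, keeping the same
# number of rows, then compute its mean price.
boot1 = one_sample.sample(frac=1, replace=True)
boot1["price"].mean()
```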

We now perform steps 1–5 listed above to generate a single bootstrap
@@ -1106,7 +1106,7 @@ generate a bootstrap distribution of these point estimates. The bootstrap
distribution ({numref}`fig:11-bootstrapping5`) suggests how we might expect
our point estimate to behave if we take multiple samples.

-```{index} pandas.DataFrame;reset_index, pandas.DataFrame;rename, pandas.DataFrame;groupby, pandas.Series;mean
+```{index} DataFrame;reset_index, DataFrame;rename, DataFrame;groupby, Series;mean
```
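Those four operations chain together along these lines, assuming the bootstrap samples are stacked with a `replicate` column (a sketch):

```python
# One mean per bootstrap replicate, renamed for plotting.
boot_means = (
    boot_samples.groupby("replicate")["price"]
    .mean()
    .reset_index()
    .rename(columns={"price": "mean_price"})
)
```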

```{code-cell} ipython3
@@ -1252,7 +1252,7 @@ Quantiles are expressed in proportions rather than percentages,
so the 2.5th and 97.5th percentiles
would be the 0.025 and 0.975 quantiles, respectively.

-```{index} pandas.DataFrame; [], pandas.DataFrame;quantile
+```{index} DataFrame; [], DataFrame;quantile
```

```{index} percentile
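Putting the quantile discussion into code, a sketch of a 95% percentile interval over bootstrap means (names carried over from the sketch above):

```python
# Bounds of a 95% percentile bootstrap confidence interval.
ci_bounds = boot_means["mean_price"].quantile([0.025, 0.975])
```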
14 changes: 7 additions & 7 deletions source/intro.md
@@ -437,13 +437,13 @@ can_lang

## Creating subsets of data frames with `[]` & `loc[]`

-```{index} see: []; pandas.DataFrame
+```{index} see: []; DataFrame
```

-```{index} see: loc[]; pandas.DataFrame
+```{index} see: loc[]; DataFrame
```

-```{index} pandas.DataFrame; [], pandas.DataFrame; loc[], selecting columns
+```{index} DataFrame; [], DataFrame; loc[], selecting columns
```

Now that we've loaded our data into Python, we can start wrangling the data to
@@ -475,7 +475,7 @@ high-level categories of languages, which include "Aboriginal languages",
our question we want to filter our data set so we restrict our attention
to only those languages in the "Aboriginal languages" category.

-```{index} pandas.DataFrame; [], filtering rows, logical statement, logical operator; equivalency (==), string
+```{index} DataFrame; [], filtering rows, logical statement, logical operator; equivalency (==), string
```

We can use the `[]` operation to obtain the subset of rows with desired values
@@ -521,7 +521,7 @@ can_lang[can_lang["category"] == "Aboriginal languages"]
### Using `[]` to select columns


-```{index} pandas.DataFrame; [], selecting columns
+```{index} DataFrame; [], selecting columns
```

We can also use the `[]` operation to select columns from a data frame.
@@ -551,7 +551,7 @@ can_lang[["language", "mother_tongue"]]

### Using `loc[]` to filter rows and select columns

-```{index} pandas.DataFrame; loc[], selecting columns
+```{index} DataFrame; loc[], selecting columns
```
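A sketch of the combined operation this section introduces, filtering rows and selecting columns in one step (values reused from the earlier examples):

```python
# Filter to Aboriginal languages and keep two columns in a single loc[] call.
can_lang.loc[
    can_lang["category"] == "Aboriginal languages",
    ["language", "mother_tongue"],
]
```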

The `[]` operation is only used when you want to filter rows *or* select columns;
@@ -612,7 +612,7 @@ So it looks like the `loc[]` operation gave us the result we wanted!

## Using `sort_values` and `head` to select rows by ordered values

-```{index} pandas.DataFrame; sort_values, pandas.DataFrame; head
+```{index} DataFrame; sort_values, DataFrame; head
```
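The pattern named in the heading looks like this (column name from the earlier examples; a sketch):

```python
# Ten rows with the largest mother_tongue counts: sort descending, take the head.
can_lang.sort_values(by="mother_tongue", ascending=False).head(10)
```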

We have used the `[]` and `loc[]` operations on a data frame to obtain a table
6 changes: 3 additions & 3 deletions source/reading.md
@@ -407,7 +407,7 @@ canlang_data = pd.read_csv(
canlang_data
```

-```{index} pandas.DataFrame; rename, pandas
+```{index} DataFrame; rename, pandas
```
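A sketch of manual renaming with `rename`, which takes a dictionary mapping old column names to new ones (the names here are hypothetical):

```python
# Rename columns via an old-to-new dictionary (hypothetical names).
canlang_data = canlang_data.rename(columns={"col1": "category", "col2": "language"})
```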

It is best to rename your columns manually in this scenario. The current column names
@@ -790,7 +790,7 @@ that we need for analysis; we do eventually need to call `execute`.
For example, `ibis` does not provide the `tail` function to look at the last
rows in a database, even though `pandas` does.

-```{index} pandas.DataFrame; tail
+```{index} DataFrame; tail
```
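For comparison, the `pandas` version mentioned in the text is simply (a sketch):

```python
# Last five rows of a pandas data frame; ibis tables lack this shortcut.
canlang_data.tail(5)
```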

```{code-cell} ipython3
@@ -951,7 +951,7 @@ Databases are beneficial in a large-scale setting:

## Writing data from Python to a `.csv` file

-```{index} write function; to_csv, pandas.DataFrame; to_csv
+```{index} write function; to_csv, DataFrame; to_csv
```
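A sketch of the write step this section covers; the file path is hypothetical:

```python
# Write the data frame to a .csv file, omitting the row index column.
canlang_data.to_csv("canlang_clean.csv", index=False)
```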

At the middle and end of a data analysis, we often want to write a data frame
4 changes: 2 additions & 2 deletions source/regression1.md
@@ -233,7 +233,7 @@ how well it predicts house sale price. This subsample is taken to allow us to
illustrate the mechanics of K-NN regression with a few data points; later in
this chapter we will use all the data.

-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
```
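The sampling step reads roughly as follows, with the frame name and seed assumed:

```python
# A reproducible random subsample of 30 rows from the housing data.
small_sacramento = sacramento.sample(n=30, random_state=10)
```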

To take a small random sample of size 30, we'll use the
@@ -287,7 +287,7 @@ Scatter plot of price (USD) versus house size (square feet) with vertical line i

+++

-```{index} pandas.DataFrame; abs, pandas.DataFrame; nsmallest
+```{index} DataFrame; abs, DataFrame; nsmallest
```
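A sketch of the neighbor-finding step these entries refer to: compute each house's absolute size difference from a query point, then keep the smallest few (names and the 2,000 sq ft query are assumed):

```python
# Absolute distance of each house's size from a 2,000 sq ft query point,
# then the five nearest neighbors by that distance.
small_sacramento["dist"] = (small_sacramento["sqft"] - 2000).abs()
small_sacramento.nsmallest(5, "dist")
```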

We will employ the same intuition from {numref}`Chapters %s <classification1>` and {numref}`%s <classification2>`, and use the
6 changes: 3 additions & 3 deletions source/viz.md
@@ -718,7 +718,7 @@ in the magnitude of these two numbers!
We can confirm that the two points in the upper right-hand corner correspond
to Canada's two official languages by filtering the data:

-```{index} pandas.DataFrame; loc[]
+```{index} DataFrame; loc[]
```
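A sketch of that filtering step, with the language names taken from the text (the collapsed cell may differ):

```python
# Keep only the rows for Canada's two official languages.
can_lang.loc[
    (can_lang["language"] == "English") | (can_lang["language"] == "French")
]
```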

```{code-cell} ipython3
@@ -848,7 +848,7 @@ using `_` so that it is easier to read;
this does not affect how Python interprets the number
and is just added for readability.

-```{index} pandas.DataFrame; column assignment, pandas.DataFrame; []
+```{index} DataFrame; column assignment, DataFrame; []
```
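A sketch of the assignment described above; the census population figure is assumed for illustration:

```python
# Underscores in numeric literals are ignored by Python; they only aid readability.
canadian_population = 35_151_728
can_lang["mother_tongue_percent"] = (
    can_lang["mother_tongue"] / canadian_population * 100
)
```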

```{code-cell} ipython3
@@ -1228,7 +1228,7 @@ as `sort_values` followed by `head`, but are slightly more efficient because the
In general, it is good to use more specialized functions when they are available!
```

-```{index} pandas.DataFrame; nlargest, pandas.DataFrame; nsmallest
+```{index} DataFrame; nlargest, DataFrame; nsmallest
```
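The indexed shortcut looks like this (column name from the earlier examples; a sketch):

```python
# Equivalent to sort_values(...).head(10), but in one specialized call.
can_lang.nlargest(10, "mother_tongue")
```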

```{code-cell} ipython3
