diff --git a/pull337/classification1.html b/pull337/classification1.html
index e001036e..6f49dd91 100644
--- a/pull337/classification1.html
+++ b/pull337/classification1.html
@@ -864,23 +864,23 @@
Fig. 5.2 Scatter plot of concavity versus perimeter with new observation represented as a red diamond.
Fig. 5.3 Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a malignant
@@ -1128,23 +1128,23 @@ 5.5. Classification with K-nearest neigh
-
+
Fig. 5.4 Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a benign
@@ -1207,23 +1207,23 @@ 5.5. Classification with K-nearest neigh
-
+
Fig. 5.5 Scatter plot of concavity versus perimeter with three nearest neighbors.
@@ -1303,23 +1303,23 @@ 5.5.1. Distance between points
-
+
Fig. 5.6 Scatter plot of concavity versus perimeter with new observation represented as a red diamond.
@@ -1499,23 +1499,23 @@ 5.5.1. Distance between points
-
+
Fig. 5.7 Scatter plot of concavity versus perimeter with 5 nearest neighbors circled.
@@ -1711,9 +1711,9 @@ 5.5.2. More than two explanatory variabl
}); }
-
Fig. 5.9 Comparison of K = 3 nearest neighbors with unstandardized and standardized data.
@@ -2504,23 +2504,23 @@ 5.7.1. Centering and scaling
-
+
Fig. 5.10 Close-up of three nearest neighbors for unstandardized data.
@@ -2617,23 +2617,23 @@ 5.7.2. Balancing
-
+
@@ -2714,23 +2714,23 @@ 5.7.2. Balancing
-
+
Fig. 5.12 Imbalanced data with 7 nearest neighbors to a new observation highlighted.
@@ -2788,23 +2788,23 @@ 5.7.2. Balancing
-
+
Fig. 5.13 Imbalanced data with background color indicating the decision of the classifier and the points represent the labeled data.
@@ -2898,23 +2898,23 @@ 5.7.2. Balancing
-
+
Fig. 5.14 Upsampled data with background color indicating the decision of the classifier.
@@ -3437,23 +3437,23 @@ 5.7.3. Missing data
-
+
diff --git a/pull337/classification2.html b/pull337/classification2.html
index 7bdd6fd2..648158b3 100644
--- a/pull337/classification2.html
+++ b/pull337/classification2.html
@@ -802,23 +802,23 @@ 6.5. Evaluating performance with
-
+
@@ -1539,32 +1539,32 @@ 6.6.1. Cross-validation6.6.1. Cross-validation6.6.1. Cross-validation6.6.1. Cross-validation6.6.2. Parameter value selection
-
+
Fig. 6.5 Plot of estimated accuracy versus the number of neighbors.
@@ -2276,23 +2276,23 @@ 6.6.3. Under/Overfitting
-
+
Fig. 6.6 Plot of accuracy estimate versus number of neighbors for many K values.
@@ -2367,23 +2367,23 @@ 6.6.3. Under/Overfitting
-
+
Fig. 6.7 Effect of K in overfitting and underfitting.
@@ -2802,23 +2802,23 @@ 6.8.1. The effect of irrelevant predicto
-
+
Fig. 6.9 Effect of inclusion of irrelevant predictors.
@@ -2881,23 +2881,23 @@ 6.8.1. The effect of irrelevant predicto
-
+
Fig. 6.10 Tuned number of neighbors for varying number of irrelevant predictors.
@@ -2951,23 +2951,23 @@ 6.8.1. The effect of irrelevant predicto
-
+
Fig. 6.11 Accuracy versus number of irrelevant predictors for tuned and untuned number of neighbors.
@@ -3430,23 +3430,23 @@ 6.8.3. Forward selection in Python
-
+
Fig. 6.12 Estimated accuracy versus the number of predictors for the sequence of models built using forward selection.
diff --git a/pull337/clustering.html b/pull337/clustering.html
index 68911515..a4f173ee 100644
--- a/pull337/clustering.html
+++ b/pull337/clustering.html
@@ -448,7 +448,7 @@ 9.4. An illustrative example
the palmerpenguins R package [Horst et al., 2020]. This data set was collected by Dr. Kristen Gorman and the Palmer Station, Antarctica Long Term Ecological Research Site, and includes
-measurements for adult penguins (Fig. 9.1) found near there [Gorman et al., 2014].
+measurements for adult penguins (Fig. 9.1) found near there [Gorman et al., 2014].
Our goal will be to use two variables—penguin bill and flipper length, both in millimeters—to determine whether there are distinct types of penguins in our data.
@@ -749,23 +749,23 @@ 9.4. An illustrative example
-
+
Fig. 9.2 Scatter plot of standardized bill length versus standardized flipper length.
@@ -843,23 +843,23 @@ 9.4. An illustrative example
-
+
Fig. 9.3 Scatter plot of standardized bill length versus standardized flipper length with colored groups.
@@ -952,23 +952,23 @@ 9.5.1. Measuring cluster quality
-
+
Fig. 9.4 Cluster 0 from the penguins_standardized data set example. Observations are small blue points, with the cluster center highlighted as a large blue point with a black outline.
@@ -1035,23 +1035,23 @@ 9.5.1. Measuring cluster quality
-
+
Fig. 9.5 Cluster 0 from the penguins_standardized data set example. Observations are small blue points, with the cluster center highlighted as a large blue point with a black outline. The distances from the observations to the cluster center are represented as black lines.
@@ -1114,23 +1114,23 @@ 9.5.1. Measuring cluster quality
-
+
Fig. 9.6 All clusters from the penguins_standardized data set example. Observations are small orange, blue, and yellow points with cluster centers denoted by larger points with a black outline. The distances from the observations to each of the respective cluster centers are represented as black lines.
@@ -1198,23 +1198,23 @@ 9.5.2. The clustering algorithm
-
+
Fig. 9.7 Random initialization of labels. Each cluster is depicted as a different color and shape.
@@ -1280,23 +1280,23 @@ 9.5.2. The clustering algorithm
-
+
Fig. 9.8 First three iterations of K-means clustering on the penguins_standardized example data set. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
@@ -1366,23 +1366,23 @@ 9.5.3. Random restarts
-
+
Fig. 9.9 Random initialization of labels.
@@ -1437,23 +1437,23 @@ 9.5.3. Random restarts
-
+
Fig. 9.10 First four iterations of K-means clustering on the penguins_standardized example data set with a poor random initialization. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
@@ -1523,23 +1523,23 @@ 9.5.4. Choosing K
-
+
Fig. 9.11 Clustering of the penguin data for K clusters ranging from 1 to 9. Cluster centers are indicated by larger points that are outlined in black.
@@ -1599,23 +1599,23 @@ 9.5.4. Choosing K
-
+
Fig. 9.12 Total WSSD for K clusters ranging from 1 to 9.
@@ -1927,23 +1927,23 @@ 9.6. K-means in Python
-
+
Fig. 9.13 The data colored by the cluster assignments returned by K-means.
@@ -2172,23 +2172,23 @@ 9.6. K-means in Python
-
+
Fig. 9.14 A plot showing the total WSSD versus the number of clusters.
@@ -2295,7 +2295,7 @@ 9.9. References
GWF14
-Kristen Gorman, Tony Williams, and William Fraser. Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus \emph Pygoscelis). PLoS ONE, 2014.
+Kristen Gorman, Tony Williams, and William Fraser. Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus pygoscelis). PLoS ONE, 2014.
HHG20
Allison Horst, Alison Hill, and Kristen Gorman. palmerpenguins: Palmer Archipelago penguin data. 2020. R package version 0.1.0. URL: https://allisonhorst.github.io/palmerpenguins/.
diff --git a/pull337/inference.html b/pull337/inference.html
index f9edcb7e..fdfb71f3 100644
--- a/pull337/inference.html
+++ b/pull337/inference.html
@@ -1220,23 +1220,23 @@ 10.4.1. Sampling distributions for propo
-
+
Fig. 10.2 Sampling distribution of the sample proportion for sample size 40.
@@ -1344,23 +1344,23 @@ 10.4.2. Sampling distributions for means
-
+
Fig. 10.3 Population distribution of price per night (dollars) for all Airbnb listings in Vancouver, Canada.
@@ -1466,23 +1466,23 @@ 10.4.2. Sampling distributions for means
-
+
Fig. 10.4 Distribution of price per night (dollars) for sample of 40 Airbnb listings.
@@ -1681,23 +1681,23 @@ 10.4.2. Sampling distributions for means
-
+
Fig. 10.5 Sampling distribution of the sample means for sample size of 40.
@@ -1777,23 +1777,23 @@ 10.4.2. Sampling distributions for means
-
+
@@ -1859,23 +1859,23 @@ 10.4.2. Sampling distributions for means
-
+
Fig. 5.5 Scatter plot of concavity versus perimeter with three nearest neighbors.
Fig. 5.6 Scatter plot of concavity versus perimeter with new observation represented as a red diamond.
Fig. 5.7 Scatter plot of concavity versus perimeter with 5 nearest neighbors circled.
Fig. 5.9 Comparison of K = 3 nearest neighbors with unstandardized and standardized data.
Fig. 5.10 Close-up of three nearest neighbors for unstandardized data.
Fig. 5.12 Imbalanced data with 7 nearest neighbors to a new observation highlighted.
Fig. 5.13 Imbalanced data with background color indicating the decision of the classifier and the points represent the labeled data.
Fig. 5.14 Upsampled data with background color indicating the decision of the classifier.
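The Chapter 5 captions above describe K-nearest neighbors classification on concavity and perimeter, with and without standardization. As a rough illustration only (this is not the book's code), the sketch below standardizes two predictors and takes a majority vote among the K closest training points. The function names `standardize` and `knn_predict` and the toy values are hypothetical, and using the sample standard deviation is an assumption of the sketch.

```python
import math
from collections import Counter

def standardize(rows):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Sample standard deviation (n - 1 denominator); an assumption of this sketch.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]
    return scaled, means, sds

def knn_predict(train_x, train_y, new_point, k):
    """Return the majority label among the k training points nearest to new_point."""
    dists = sorted((math.dist(x, new_point), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (perimeter, concavity) pairs on very different scales.
points = [[100.0, 0.10], [110.0, 0.12], [105.0, 0.11],
          [60.0, 0.02], [65.0, 0.03], [70.0, 0.025]]
labels = ["Malignant", "Malignant", "Malignant", "Benign", "Benign", "Benign"]

scaled, means, sds = standardize(points)
# The new observation must be scaled with the training means and deviations.
new_obs = [(v - m) / s for v, m, s in zip([98.0, 0.09], means, sds)]
prediction = knn_predict(scaled, labels, new_obs, k=3)
print(prediction)
```

Without the standardization step, the perimeter values would dominate the Euclidean distance simply because they are numerically larger, which is the effect the unstandardized versus standardized comparison illustrates.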
Fig. 6.5 Plot of estimated accuracy versus the number of neighbors.
Fig. 6.6 Plot of accuracy estimate versus number of neighbors for many K values.
Fig. 6.7 Effect of K in overfitting and underfitting.
Fig. 6.9 Effect of inclusion of irrelevant predictors.
Fig. 6.10 Tuned number of neighbors for varying number of irrelevant predictors.
Fig. 6.11 Accuracy versus number of irrelevant predictors for tuned and untuned number of neighbors.
Fig. 6.12 Estimated accuracy versus the number of predictors for the sequence of models built using forward selection.
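The Chapter 6 captions above plot estimated accuracy against the number of neighbors. The following is a minimal sketch of how such estimates could be produced with cross-validation, assuming a simple deterministic fold assignment and synthetic two-class data. The helper names `knn_predict` and `cv_accuracy` and the data are hypothetical; this is not the book's implementation.

```python
import math
import random
from collections import Counter

def knn_predict(train, new_point, k):
    """Majority label among the k training rows nearest to new_point."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], new_point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k, n_folds=5):
    """Average validation accuracy of K-nearest neighbors over n_folds folds."""
    scores = []
    for fold in range(n_folds):
        # Rows whose index falls in this fold are held out for validation.
        valid = [row for i, row in enumerate(data) if i % n_folds == fold]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in valid)
        scores.append(correct / len(valid))
    return sum(scores) / n_folds

random.seed(0)
# Hypothetical two-class data: overlapping Gaussian blobs in two dimensions.
data = [((random.gauss(0, 1), random.gauss(0, 1)), "A") for _ in range(50)]
data += [((random.gauss(1.5, 1), random.gauss(1.5, 1)), "B") for _ in range(50)]
random.shuffle(data)

# Odd values of K avoid tied votes with two classes.
accuracies = {k: cv_accuracy(data, k) for k in range(1, 40, 4)}
best_k = max(accuracies, key=accuracies.get)
for k, acc in accuracies.items():
    print(k, round(acc, 3))
```

Plotting `accuracies` against K would give a curve of the same kind as the accuracy-versus-neighbors figures, although the shape depends entirely on the synthetic data used here.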
Fig. 9.2 Scatter plot of standardized bill length versus standardized flipper length.
Fig. 9.3 Scatter plot of standardized bill length versus standardized flipper length with colored groups.
Fig. 9.4 Cluster 0 from the penguins_standardized data set example. Observations are small blue points, with the cluster center highlighted as a large blue point with a black outline.
Fig. 9.5 Cluster 0 from the penguins_standardized data set example. Observations are small blue points, with the cluster center highlighted as a large blue point with a black outline. The distances from the observations to the cluster center are represented as black lines.
Fig. 9.6 All clusters from the penguins_standardized data set example. Observations are small orange, blue, and yellow points with cluster centers denoted by larger points with a black outline. The distances from the observations to each of the respective cluster centers are represented as black lines.
Fig. 9.7 Random initialization of labels. Each cluster is depicted as a different color and shape.
Fig. 9.8 First three iterations of K-means clustering on the penguins_standardized example data set. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
Fig. 9.9 Random initialization of labels.
Fig. 9.10 First four iterations of K-means clustering on the penguins_standardized example data set with a poor random initialization. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
Fig. 9.11 Clustering of the penguin data for K clusters ranging from 1 to 9. Cluster centers are indicated by larger points that are outlined in black.
Fig. 9.12 Total WSSD for K clusters ranging from 1 to 9.
Fig. 9.13 The data colored by the cluster assignments returned by K-means.
Fig. 9.14 A plot showing the total WSSD versus the number of clusters.
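The Chapter 9 captions above describe K-means as a random initialization of labels followed by alternating center updates and reassignments, with total WSSD used to compare values of K and random restarts used to avoid poor initializations. A minimal pure-Python sketch of that loop follows, under stated assumptions: the function names, the toy points, and the handling of empty clusters (re-seeding at a random point) are all hypothetical choices for illustration, not the book's code.

```python
import math
import random

def total_wssd(points, labels, centers):
    """Sum of squared distances from each point to its assigned cluster center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

def update_centers(points, labels, k):
    """Center update: each center becomes the mean of the points assigned to it."""
    centers = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        # Assumption of this sketch: an empty cluster is re-seeded at a random point.
        if not members:
            members = [random.choice(points)]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers

def reassign(points, centers):
    """Label update: each point joins the cluster with the nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans(points, k, seed=0, max_iter=100):
    random.seed(seed)
    labels = [random.randrange(k) for _ in points]  # random initialization of labels
    for _ in range(max_iter):
        centers = update_centers(points, labels, k)
        new_labels = reassign(points, centers)
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
    return labels, update_centers(points, labels, k)

# Hypothetical standardized data: two small, well-separated groups.
points = [[-1.0, -1.1], [-0.9, -1.0], [-1.1, -0.9],
          [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]

wssd_k1 = total_wssd(points, *kmeans(points, 1))
# Random restarts: keep the run with the smallest total WSSD.
runs = [kmeans(points, 2, seed=s) for s in range(5)]
best_labels, best_centers = min(runs, key=lambda r: total_wssd(points, *r))
wssd_k2 = total_wssd(points, best_labels, best_centers)
print(wssd_k1, wssd_k2)
```

Repeating the last step for a range of K values and plotting the resulting totals would give a curve of the kind shown in the total WSSD figures.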
Kristen Gorman, Tony Williams, and William Fraser. Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus \emph Pygoscelis). PLoS ONE, 2014.
Kristen Gorman, Tony Williams, and William Fraser. Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus pygoscelis). PLoS ONE, 2014.
Allison Horst, Alison Hill, and Kristen Gorman. palmerpenguins: Palmer Archipelago penguin data. 2020. R package version 0.1.0. URL: https://allisonhorst.github.io/palmerpenguins/.
Fig. 10.2 Sampling distribution of the sample proportion for sample size 40.
Fig. 10.3 Population distribution of price per night (dollars) for all Airbnb listings in Vancouver, Canada.
Fig. 10.4 Distribution of price per night (dollars) for sample of 40 Airbnb listings.
Fig. 10.5 Sampling distribution of the sample means for sample size of 40.
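The Chapter 10 captions above contrast a population distribution, a single sample of 40, and the sampling distribution of the sample mean. The sketch below simulates that idea with a synthetic right-skewed population. The population values are hypothetical stand-ins (not the Airbnb data), and the number of repeated samples is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(1)
# Hypothetical right-skewed "population" of nightly prices (not the Airbnb data).
population = [random.expovariate(1 / 150) for _ in range(5000)]

def sample_means(population, sample_size, n_samples):
    """Draw n_samples random samples without replacement and return each sample's mean."""
    return [statistics.mean(random.sample(population, sample_size))
            for _ in range(n_samples)]

means = sample_means(population, sample_size=40, n_samples=2000)

# The sample means center near the population mean but vary much less
# than individual observations do.
print(round(statistics.mean(population), 2), round(statistics.mean(means), 2))
print(round(statistics.stdev(population), 2), round(statistics.stdev(means), 2))
```

A histogram of `means` would play the role of the sampling distribution figure, while a histogram of `population` would correspond to the population distribution.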