From 611ba7a4866923bdce0bba76db39f2acd51a57b8 Mon Sep 17 00:00:00 2001 From: Michael Rossetti Date: Thu, 26 Sep 2024 10:19:00 -0400 Subject: [PATCH 1/2] Update index.qmd --- docs/notes/predictive-modeling/regression/index.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/notes/predictive-modeling/regression/index.qmd b/docs/notes/predictive-modeling/regression/index.qmd index 4baa405..902e9f7 100644 --- a/docs/notes/predictive-modeling/regression/index.qmd +++ b/docs/notes/predictive-modeling/regression/index.qmd @@ -213,7 +213,7 @@ However, for convenience, we will generally prefer to use corresponding regressi from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error r2 = r2_score(y_true, y_pred) -print("R2:", r2, 3) +print("R2:", r2) mae = mean_absolute_error(y_true, y_pred) print("MAE:", mae) From 1cdf5c1396f654ab2f1b7b3622ddc0a9d9fec4ee Mon Sep 17 00:00:00 2001 From: Michael Rossetti Date: Thu, 26 Sep 2024 10:23:43 -0400 Subject: [PATCH 2/2] Update multiple-features.qmd --- .../predictive-modeling/regression/multiple-features.qmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/notes/predictive-modeling/regression/multiple-features.qmd b/docs/notes/predictive-modeling/regression/multiple-features.qmd index a34b560..e2e8c1a 100644 --- a/docs/notes/predictive-modeling/regression/multiple-features.qmd +++ b/docs/notes/predictive-modeling/regression/multiple-features.qmd @@ -16,7 +16,7 @@ However in practice, it is common to use multiple features, each of which may co ## Considerations -When working with multiple features, there is a trade-off between **model performance** and **model complexity**. A model with billions of features, and consequently billions of parameters, can be slower to train and may lead to increased storage and computational costs when deployed. 
In many cases, a simpler model with fewer features that performs nearly as well as a more complex model can be preferable, especially if it offers faster training, lower deployment costs, and improved interpretability. This trade-off between model complexity and performance should be evaluated based on the specific requirements of the use case, such as the need for speed, scalability, or accuracy. +When working with multiple features, there is a trade-off between **model performance** and **model complexity**. A model with billions of features, and consequently billions of parameters, can be slower to train and may lead to increased storage and computational costs when deployed. In many cases, a simpler model with fewer features that performs nearly as well can be preferable, especially if it offers faster training, lower deployment costs, and improved interpretability. This trade-off between model complexity and performance should be evaluated based on the specific requirements of the use case, such as the need for speed, scalability, or accuracy. As previously discussed, one consideration when using multiple features is the potential need to perform [data scaling](../../applied-stats/data-scaling.qmd), to standardize the scale of all the features, and ensure features with large values aren't dominating the model. Although, for linear regression specifically, data scaling is not as important. @@ -49,7 +49,7 @@ print(dataset.DESCR) - [source](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html) ::: -After reading the dataset description, we see features like `latitude`, `longitude`, `population`, and `income` are describe the census block. Whereas `age`, `rooms`, `bedrooms`, `occupants`, and `value` describe the homes in that census block. +After reading the dataset description, we see features like `latitude`, `longitude`, `population`, and `income` describe the census block. 
Whereas `age`, `rooms`, `bedrooms`, `occupants`, and `value` describe the homes in that census block. Our goal is to use the features to predict a target of home value.
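
Note: the metrics call corrected in PATCH 1/2 (dropping the stray `3` argument from `print("R2:", r2, 3)`) can be sanity-checked in isolation. A minimal sketch mirroring the patched snippet, using made-up `y_true` / `y_pred` values (illustrative only, not from the docs):

```python
# Sanity check of the corrected metrics snippet from index.qmd,
# with small synthetic arrays standing in for real predictions.
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

y_true = [3.0, 2.5, 4.0, 5.1]  # hypothetical observed values
y_pred = [2.8, 2.7, 4.2, 5.0]  # hypothetical model predictions

r2 = r2_score(y_true, y_pred)
print("R2:", r2)

mae = mean_absolute_error(y_true, y_pred)
print("MAE:", mae)

mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse)
```

Each function takes the true values first and the predictions second; passing an extra positional argument to `print`, as in the pre-patch line, would simply print a spurious `3` after the score rather than round it.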