Commit

Merge branch 'main' of github.com:prof-rossetti/python-for-finance into main
s2t2 committed Oct 4, 2024
2 parents 054293a + 1cdf5c1 commit eee7c35
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/notes/predictive-modeling/regression/index.qmd
@@ -213,7 +213,7 @@ However, for convenience, we will generally prefer to use corresponding regressi
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

r2 = r2_score(y_true, y_pred)
-print("R2:", r2, 3)
+print("R2:", r2)

mae = mean_absolute_error(y_true, y_pred)
print("MAE:", mae)
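
The hunk above uses `y_true` and `y_pred` without defining them. As a minimal sketch, the same metric calls can be run on a few made-up values (the arrays below are hypothetical, for illustration only). It also shows why the stray `3` was removed: `print("R2:", r2, 3)` prints the literal `3` as an extra argument rather than rounding, so rounding, if wanted, needs an explicit `round(r2, 3)`.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

r2 = r2_score(y_true, y_pred)
print("R2:", round(r2, 3))  # round explicitly; a bare trailing 3 would just be printed

mae = mean_absolute_error(y_true, y_pred)
print("MAE:", mae)

mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse)
```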
@@ -16,7 +16,7 @@ However in practice, it is common to use multiple features, each of which may co

## Considerations

-When working with multiple features, there is a trade-off between **model performance** and **model complexity**. A model with billions of features, and consequently billions of parameters, can be slower to train and may lead to increased storage and computational costs when deployed. In many cases, a simpler model with fewer features that performs nearly as well as a more complex model can be preferable, especially if it offers faster training, lower deployment costs, and improved interpretability. This trade-off between model complexity and performance should be evaluated based on the specific requirements of the use case, such as the need for speed, scalability, or accuracy.
+When working with multiple features, there is a trade-off between **model performance** and **model complexity**. A model with billions of features, and consequently billions of parameters, can be slower to train and may lead to increased storage and computational costs when deployed. In many cases, a simpler model with fewer features that performs nearly as well can be preferable, especially if it offers faster training, lower deployment costs, and improved interpretability. This trade-off between model complexity and performance should be evaluated based on the specific requirements of the use case, such as the need for speed, scalability, or accuracy.

As previously discussed, one consideration when using multiple features is the potential need to perform [data scaling](../../applied-stats/data-scaling.qmd), to standardize the scale of all the features, and ensure features with large values aren't dominating the model. Although, for linear regression specifically, data scaling is not as important.
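
One way to see why scaling matters less for ordinary least squares is to fit the same linear regression on raw and standardized features and compare the predictions. The sketch below assumes synthetic data (the feature names, ranges, and coefficients are made up for illustration) and is not taken from the course notes. Under these assumptions the two sets of predictions should agree up to floating-point error, while the coefficients differ because they are expressed in different units.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): two features on very different scales
rng = np.random.default_rng(0)
x_small = rng.uniform(0, 1, size=100)
x_large = rng.uniform(0, 100_000, size=100)
X = np.column_stack([x_small, x_large])
y = 3.0 * x_small + 0.0002 * x_large + rng.normal(0, 0.1, size=100)

# Fit once on the raw features and once on standardized features
raw_model = LinearRegression().fit(X, y)
X_scaled = StandardScaler().fit_transform(X)
scaled_model = LinearRegression().fit(X_scaled, y)

# Compare predictions from the two fits
preds_match = np.allclose(raw_model.predict(X), scaled_model.predict(X_scaled))
print("Predictions match:", preds_match)

# Coefficients are on different scales, so they are not directly comparable
print("Raw coefficients:", raw_model.coef_)
print("Scaled coefficients:", scaled_model.coef_)
```

Scaling can still be worthwhile when coefficients need to be compared across features, or when the model uses regularization or gradient-based training, where feature magnitudes affect the fit.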

@@ -49,7 +49,7 @@ print(dataset.DESCR)
- [source](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html)
:::

-After reading the dataset description, we see features like `latitude`, `longitude`, `population`, and `income` are describe the census block. Whereas `age`, `rooms`, `bedrooms`, `occupants`, and `value` describe the homes in that census block.
+After reading the dataset description, we see features like `latitude`, `longitude`, `population`, and `income` describe the census block. Whereas `age`, `rooms`, `bedrooms`, `occupants`, and `value` describe the homes in that census block.

Our goal is to use the features to predict a target of home value.
