Add files via upload #570

Open · wants to merge 2 commits into master
48 changes: 48 additions & 0 deletions 03-classification/10-training-log-reg.md
@@ -20,6 +20,54 @@ This video was about training a logistic regression model with Scikit-Learn, app

The entire code of this project is available in [this jupyter notebook](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-03-churn-prediction/03-churn.ipynb).

## Notes on the `solver` parameter of `LogisticRegression` in scikit-learn

A solver is the optimization algorithm used to find the coefficients (or weights) that minimize the loss function (log loss, in the case of logistic regression). The solver adjusts the model's parameters during training to fit the data as well as possible.
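For concreteness, here is a minimal sketch of where the parameter goes (the synthetic data and all hyperparameter values are illustrative, not taken from the course notebook):

```python
# Minimal illustration of the solver parameter (illustrative values only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# 'lbfgs' is the default solver in recent scikit-learn versions;
# raise max_iter if the solver does not converge.
model = LogisticRegression(solver='lbfgs', C=1.0, max_iter=1000)
model.fit(X, y)
print(model.coef_.shape)  # one weight per feature: (1, 20)
```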

Here are the main solvers:
1. **lbfgs** (Limited-memory Broyden–Fletcher–Goldfarb–Shanno)

   - Type: quasi-Newton method.
   - Use case: good for small to medium datasets; handles the multinomial loss (for multiclass classification).
   - Pros: efficient for problems with large numbers of classes.
   - Cons: may struggle with very large datasets.

2. **liblinear**

   - Type: coordinate descent algorithm.
   - Use case: small to medium datasets; works well for binary classification and supports L1 and L2 regularization.
   - Pros: suitable for smaller datasets and simpler models.
   - Cons: does not handle multinomial classification directly (one-vs-rest is used instead).

3. **saga** (an extension of SAG, the Stochastic Average Gradient method)

   - Type: variance-reduced variant of stochastic gradient descent (SGD).
   - Use case: best for large datasets, sparse data, and models with L1 (lasso) regularization.
   - Pros: scales well to large datasets; supports L1, L2, and elastic-net regularization.
   - Cons: typically slower than liblinear on smaller datasets.

4. **newton-cg** (Newton’s Conjugate Gradient)

   - Type: Newton’s method with conjugate-gradient optimization.
   - Use case: suitable for large datasets and the multinomial loss.
   - Pros: can handle large datasets and problems with many classes.
   - Cons: more computationally expensive than lbfgs and saga.

5. **sag** (Stochastic Average Gradient)

   - Type: variant of stochastic gradient descent (SGD).
   - Use case: suitable for large datasets and models with L2 regularization.
   - Pros: fast on large datasets.
   - Cons: only supports L2 regularization.

### How to choose a solver

- If you’re working with a large dataset, try saga or sag.
- If you need multiclass classification, lbfgs, saga, and newton-cg are good choices.
- For small datasets, liblinear is often sufficient.
- If you need L1 (or elastic-net) regularization, or your data is sparse, saga is recommended.
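As a hedged example of the last recommendation, this sketch (synthetic data, illustrative hyperparameters) fits saga with an L1 penalty on a sparse matrix:

```python
# Sketch: saga + L1 on sparse input (synthetic data, illustrative values).
import scipy.sparse as sp
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MaxAbsScaler

X, y = make_classification(n_samples=5000, n_features=100, random_state=1)
X_sparse = sp.csr_matrix(X)                        # saga accepts sparse input
X_scaled = MaxAbsScaler().fit_transform(X_sparse)  # scaling helps sag/saga converge; keeps sparsity

clf = LogisticRegression(solver='saga', penalty='l1', C=0.5, max_iter=5000)
clf.fit(X_scaled, y)
print((clf.coef_ != 0).sum(), "non-zero weights")  # L1 zeroes out some weights
```

Note how the L1 penalty produces a sparse coefficient vector; liblinear can do this too, but saga scales better as the dataset grows.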



<table>
<tr>
<td>⚠️</td>
47 changes: 47 additions & 0 deletions 04-evaluation/Solver for Logistic Regression.txt
@@ -0,0 +1,47 @@
In the context of Logistic Regression in scikit-learn, a solver refers to the optimization algorithm used to find the coefficients (or weights) that minimize the loss function (in this case, the log-loss function for logistic regression). Solvers are responsible for adjusting the model's parameters during training to fit the data as well as possible.

Different solvers have different approaches and trade-offs between speed, memory usage, and convergence behavior. In logistic regression, the solvers optimize the likelihood function to find the best-fit parameters.

Here are the main solvers available in scikit-learn for LogisticRegression (a short illustrative sketch follows the list):
1. lbfgs (Limited-memory Broyden–Fletcher–Goldfarb–Shanno)

Type: Quasi-Newton method.
Use case: Good for small to medium datasets, handles multinomial loss (for multiclass classification).
Pros: Efficient for problems with large numbers of classes.
Cons: May struggle with very large datasets.

2. liblinear

Type: Coordinate descent algorithm.
Use case: Useful for small to medium datasets; it works well with binary classification and supports L1 and L2 regularization.
Pros: Suitable for smaller datasets and simpler models.
Cons: Does not handle multinomial classification directly (one-vs-rest is used instead).

3. saga (an extension of SAG, the Stochastic Average Gradient method)

Type: Variance-reduced variant of stochastic gradient descent (SGD).
Use case: Best for large datasets, sparse data, and models with L1 (lasso) regularization.
Pros: Works well with large datasets, supports L1, L2, and elastic-net regularization.
Cons: Typically slower than liblinear on smaller datasets.

4. newton-cg (Newton’s Conjugate Gradient)

Type: Newton’s method with conjugate gradient optimization.
Use case: Suitable for large datasets and multinomial loss.
Pros: Can handle large datasets and problems with many classes.
Cons: More computationally expensive than lbfgs and saga.

5. sag (Stochastic Average Gradient)

Type: Variant of stochastic gradient descent (SGD).
Use case: Suitable for large datasets and models with L2 regularization.
Pros: Fast on large datasets.
Cons: Only supports L2 regularization.
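
As promised above, a short illustrative sketch (synthetic data; solver names as in recent scikit-learn versions) that fits the same model with each solver:

    # Illustrative only: compare the five solvers on the same synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    for solver in ['lbfgs', 'liblinear', 'saga', 'newton-cg', 'sag']:
        # the default L2 penalty is supported by all five solvers
        clf = LogisticRegression(solver=solver, max_iter=5000)
        clf.fit(X, y)
        print(solver, round(clf.score(X, y), 3))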

How to Choose a Solver:

If you’re working with a large dataset, try saga or sag.
If you need multiclass classification, lbfgs, saga, or newton-cg are good choices.
For small datasets, liblinear is often sufficient.
If you need L1 regularization, or your data is sparse, saga is recommended.
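
For the multiclass recommendation, a minimal sketch (3-class synthetic data; all values are illustrative):

    # lbfgs, saga and newton-cg optimize the multinomial loss directly.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1500, n_features=10, n_informative=6,
                               n_classes=3, random_state=0)
    clf = LogisticRegression(solver='lbfgs', max_iter=2000)
    clf.fit(X, y)
    print(clf.coef_.shape)  # (3, 10): one weight vector per class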