From 5025813c6d94e04b7c7c33575519ba83336d9034 Mon Sep 17 00:00:00 2001
From: jin <571979568@qq.com>
Date: Mon, 15 Jul 2024 07:52:02 +0800
Subject: [PATCH] docs: Update Regression.md of Model Example
---
.../Model Example/Regression/Regression.md | 490 +++++++++---------
1 file changed, 249 insertions(+), 241 deletions(-)
diff --git a/docs/source/For User/Model Example/Regression/Regression.md b/docs/source/For User/Model Example/Regression/Regression.md
index 731b7df7..e36745cc 100644
--- a/docs/source/For User/Model Example/Regression/Regression.md
+++ b/docs/source/For User/Model Example/Regression/Regression.md
@@ -24,19 +24,11 @@ By running this line of command, the following output should show up on your scr
2 - Data For Classification
3 - Data For Clustering
4 - Data For Dimensional Reduction
-(User) ➜ @Number:
+(User) ➜ @Number: 1
```
Enter the serial number of the sub-menu you want to choose and press `Enter`. In this doc, we focus on the usage of the Regression function, so enter `1` and press `Enter`.
-```python
--*-*- Built-in Data Option-*-*-
-1 - Data For Regression
-2 - Data For Classification
-3 - Data For Clustering
-4 - Data For Dimensional Reduction
-(User) ➜ @Number: 1
-```
### 2.2 Generate a map projection
@@ -103,7 +95,7 @@ After pressing `Enter`to move forward, you will see a question pops up enquiring
World Map Projection for A Specific Element Option:
1 - Yes
2 - No
-(Plot) ➜ @Number:
+(Plot) ➜ @Number: 1
```
By choosing “Yes”, you can then choose one element to be projected on the world map; by choosing “No”, you can skip to the next mode. For demonstration, we choose “Yes” in this case:
@@ -161,16 +153,17 @@ Index - Column Name
46 - TH(PPM)
47 - U(PPM)
--------------------
-(Plot) ➜ @Number:
+(Plot) ➜ @Number: 10
```
Here, we choose “10 - AL2O3(WT%)” as an example. After this, the path where the image is saved will be presented:
```python
-Save figure 'Map Projection - AL2O3(WT%)' in /home/yucheng/output/images/ma
+Save figure 'Map Projection - AL2O3(WT%)' in C:\Users\YSQ\geopi_output\Regression\test\artifacts\image\map.
+Successfully store 'Map Projection - AL2O3(WT%)' in 'Map Projection - AL2O3(WT%).xlsx' in C:\Users\YSQ\geopi_output\Regression\test\artifacts\image\map.
```
-![Map Projection - AL2O3(WT%)](https://github.com/ZJUEarthData/geochemistrypi/assets/66779478/0edf28b3-3006-49e6-a2b4-8ddcc7f94306)
+![Map Projection - AL2O3(WT%)](https://github.com/ZJUEarthData/geochemistrypi/assets/162782014/5b642790-99a6-4421-9422-7fd482a6d425)
Map Projection - AL2O3(WT%)
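+
+For intuition only, the projection step simply colors each sample location by the value of the chosen element. Below is a minimal matplotlib sketch with synthetic coordinates; the real tool reads the coordinate columns of the data set and draws a proper world basemap, so treat this as an illustration rather than the tool's actual code.
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Synthetic sample locations and AL2O3(WT%) values standing in for the built-in data set.
+rng = np.random.default_rng(0)
+longitude = rng.uniform(-180, 180, 100)
+latitude = rng.uniform(-60, 70, 100)
+al2o3 = rng.uniform(0.2, 8.1, 100)
+
+# Color each sample point by its AL2O3(WT%) value on a plain longitude/latitude grid.
+fig, ax = plt.subplots(figsize=(8, 4))
+points = ax.scatter(longitude, latitude, c=al2o3, cmap="viridis", s=12)
+fig.colorbar(points, ax=ax, label="AL2O3(WT%)")
+ax.set_xlabel("Longitude")
+ax.set_ylabel("Latitude")
+ax.set_title("Map Projection - AL2O3(WT%)")
+fig.savefig("Map Projection - AL2O3(WT%).png")
+```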
When you see the following instruction:
@@ -179,10 +172,10 @@ When you see the following instruction:
Do you want to continue to project a new element in the World Map?
1 - Yes
2 - No
-(Plot) ➜ @Number:
+(Plot) ➜ @Number: 2
```
-You can choose “Yes” to map another element or choose “No” to exit map mode.
+You can choose “Yes” to map another element or choose “No” to exit map mode. Here, we choose “2 - No” to skip this step.
### 2.3 Enter the range of data and check the output
@@ -193,7 +186,7 @@ Select the data range you want to process.
Input format:
Format 1: "[**, **]; **; [**, **]", such as "[1, 3]; 7; [10, 13]" --> you want to deal with the columns 1, 2, 3, 7, 10, 11, 12, 13
Format 2: "xx", such as "7" --> you want to deal with the columns 7
-@input:
+@input: [10, 13]
```
Here, we use “[10, 13]” as an example. The values of the selected columns will then be shown on the screen.
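+
+To make the input format concrete: square-bracket ranges are inclusive, and ranges and single indices are combined with semicolons. A small illustrative sketch of how such a string expands into column indices (this is not the tool's actual parser):
+
+```python
+def expand_selection(text):
+    """Expand a selection like '[1, 3]; 7; [10, 13]' into [1, 2, 3, 7, 10, 11, 12, 13]."""
+    indices = []
+    for part in text.split(";"):
+        part = part.strip()
+        if part.startswith("[") and part.endswith("]"):
+            start, end = (int(x) for x in part[1:-1].split(","))
+            indices.extend(range(start, end + 1))  # ranges are inclusive
+        else:
+            indices.append(int(part))
+    return indices
+
+print(expand_selection("[1, 3]; 7; [10, 13]"))  # [1, 2, 3, 7, 10, 11, 12, 13]
+print(expand_selection("[10, 13]"))             # [10, 11, 12, 13]
+```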
@@ -253,30 +246,25 @@ min 0.230000 0.000000 1.371100 13.170000
50% 4.720000 0.925000 2.690000 21.223500
75% 6.233341 1.243656 3.330000 22.185450
max 8.110000 3.869550 8.145000 25.362000
-Successfully calculate the pair-wise correlation coefficient among the selected columns.
-Save figure 'Correlation Plot' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\image\statistic.
-Successfully store 'Correlation Plot' in 'Correlation Plot.xlsx' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\image\statistic.
-Successfully draw the distribution plot of the selected columns.
-Save figure 'Distribution Histogram' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\image\statistic.
-Successfully store 'Distribution Histogram' in 'Distribution Histogram.xlsx' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\image\statistic.
-Successfully draw the distribution plot after log transformation of the selected columns.
-Save figure 'Distribution Histogram After Log Transformation' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\image\statistic.
-Successfully store 'Distribution Histogram After Log Transformation' in 'Distribution Histogram After Log Transformation.xlsx' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test
-1\artifacts\image\statistic.
-Successfully store 'Data Original' in 'Data Original.xlsx' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\data.
-Successfully store 'Data Selected' in 'Data Selected.xlsx' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\data.
-(Press Enter key to move forward.)
+Successfully calculate the pair-wise correlation coefficient among the selected columns.
+Save figure 'Correlation Plot' in C:\Users\YSQ\geopi_output\Regression\test\artifacts\image\statistic.
+Successfully store 'Correlation Plot' in 'Correlation Plot.xlsx' in C:\Users\YSQ\geopi_output\Regression\test\artifacts\image\statistic.
+...
+Successfully store 'Data Original' in 'DataOriginal.xlsx' in C:\Users\YSQ\geopi_output\Regression\test\artifacts\data.
+Successfully store 'Data Selected' in 'DataSelected.xlsx' in C:\Users\YSQ\geopi_output\Regression\test\artifacts\data.
```
The function calculates the pairwise correlation coefficients among these elements and creates a distribution plot for each element. Here are the plots generated by our example:
-![Correlation Plot](https://github.com/ZJUEarthData/geochemistrypi/assets/66779478/be72b8da-aaca-4420-9d78-e8575d6ed8b4)
+
+![Correlation Plot](https://github.com/ZJUEarthData/geochemistrypi/assets/162782014/5774e386-c1ab-4347-8be0-592e00ab004f)
Correlation Plot
-![Distribution Histogram](https://github.com/ZJUEarthData/geochemistrypi/assets/66779478/96079dfe-8194-4412-af13-fe44fa1a3dd0)
+
+![Distribution Histogram](https://github.com/ZJUEarthData/geochemistrypi/assets/162782014/cfdd5c8b-2428-493d-98be-712885a1cde8)
Distribution Histogram
-![Distribution Histogram After Log Transformation](https://github.com/ZJUEarthData/geochemistrypi/assets/66779478/2156dc68-2989-44af-aa0b-f47e0ed56012)
+
+![Distribution Histogram After Log Transformation](https://github.com/ZJUEarthData/geochemistrypi/assets/162782014/7ebd82fa-1fb9-4cfe-9b59-9a479a59ca2b)
Distribution Histogram After Log Transformation
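+
+The statistics and plots above can be reproduced on any DataFrame with a few pandas and matplotlib calls. The following is a rough sketch, not the exact code the tool runs, and it uses synthetic values in place of the selected columns:
+
+```python
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+
+# Synthetic stand-in for the four selected columns.
+rng = np.random.default_rng(0)
+df = pd.DataFrame({
+    "AL2O3(WT%)": rng.uniform(0.2, 8.1, 109),
+    "CR2O3(WT%)": rng.uniform(0.0, 3.9, 109),
+    "FEOT(WT%)": rng.uniform(1.4, 8.1, 109),
+    "CAO(WT%)": rng.uniform(13.2, 25.4, 109),
+})
+
+print(df.describe())               # count/mean/std/min/quartiles/max, as printed above
+print(df.corr())                   # pair-wise correlation coefficients
+df.hist(bins=20)                   # distribution histogram per column
+np.log10(df + 1e-6).hist(bins=20)  # distribution after a log transformation
+plt.show()
+```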
@@ -306,71 +294,27 @@ Note: you'd better use imputation techniques to deal with the missing values.
(Press Enter key to move forward.)
```
-Here, we choose ”1 - Mean Values” as our strategy:
-
```python
--*-*- Strategy for Missing Values -*-*-
-1 - Mean Value
-2 - Median Value
-3 - Most Frequent Value
-4 - Constant(Specified Value)
-Which strategy do you want to apply?
+-*-*- Missing Values Process -*-*-
+Do you want to deal with the missing values?
+1 - Yes
+2 - No
(Data) ➜ @Number: 1
-Successfully fill the missing values with the mean value of each feature column respectively.
-(Press Enter key to move forward.)
```
-
-Here, the pragram is performing a hypothesis testing on the imputation method used to fill missing values in a dataset. The null hypothesis is that the distribution of the data set before and after imputing remains the same. The Kruskal Test is used to test this hypothesis, with a significance level of 0.05. Monte Carlo simulation is used with 100 iterations, each with a sample size of half the dataset (54 in this case). The p-values are calculated for each column and the columns that reject the null hypothesis are identified.
+Here, we choose “1 - Yes” to deal with the missing values.
```python
--*-*- Hypothesis Testing on Imputation Method -*-*-
-Null Hypothesis: The distributions of the data set before and after imputing remain the same.
-Thoughts: Check which column rejects null hypothesis.
-Statistics Test Method: kruskal Test
-Significance Level: 0.05
-The number of iterations of Monte Carlo simulation: 100
-The size of the sample for each iteration (half of the whole data set): 54
-Average p-value:
-AL2O3(WT%) 1.0
-CR2O3(WT%) 0.9327453056346102
-FEOT(WT%) 1.0
-CAO(WT%) 1.0
-Note: 'p-value < 0.05' means imputation method doesn't apply to that column.
-The columns which rejects null hypothesis: None
-Successfully draw the respective probability plot (origin vs. impute) of the selected columns
-Save figure 'Probability Plot' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\image\statistic.
-Successfully store 'Probability Plot' in 'Probability Plot.xlsx' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\image\statistic.
-
-RangeIndex: 109 entries, 0 to 108
-Data columns (total 4 columns):
- # Column Non-Null Count Dtype
---- ------ -------------- -----
- 0 AL2O3(WT%) 109 non-null float64
- 1 CR2O3(WT%) 109 non-null float64
- 2 FEOT(WT%) 109 non-null float64
- 3 CAO(WT%) 109 non-null float64
-dtypes: float64(4)
-memory usage: 3.5 KB
-None
-Some basic statistic information of the designated data set:
- AL2O3(WT%) CR2O3(WT%) FEOT(WT%) CAO(WT%)
-count 109.000000 109.000000 109.000000 109.000000
-mean 4.554212 0.956426 2.962310 21.115756
-std 1.969756 0.524695 1.133967 1.964380
-min 0.230000 0.000000 1.371100 13.170000
-25% 3.110977 0.680000 2.350000 20.310000
-50% 4.720000 0.956426 2.690000 21.223500
-75% 6.233341 1.170000 3.330000 22.185450
-max 8.110000 3.869550 8.145000 25.362000
-Successfully store 'Data Selected Imputed' in 'Data Selected Imputed.xlsx' in C:\Users\86188\geopi_output\GeoPi - Rock Classification\Xgboost Algorithm - Test 1\artifacts\data.
-(Press Enter key to move forward.)
+-*-*- Strategy for Missing Values -*-*-
+1 - Drop Rows with Missing Values
+2 - Impute Missing Values
+Notice: Drop the rows with missing values may lead to a significant loss of data if too many
+features are chosen.
+Which strategy do you want to apply?
+(Data) ➜ @Number:1
```
+Here, we choose “1 - Drop Rows with Missing Values”: the rows that contain missing values are simply removed, which keeps things simple.
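+
+In pandas terms, this strategy amounts to a `dropna()` on the selected columns, which is also why the data set shrinks from 109 rows to 98 rows in the later output. A toy sketch of the idea (not the tool's actual code):
+
+```python
+import pandas as pd
+
+# Toy selected columns with a couple of missing entries.
+df = pd.DataFrame({
+    "AL2O3(WT%)": [3.936, None, 4.220],
+    "CR2O3(WT%)": [1.440, 0.578, None],
+})
+
+# "Drop Rows with Missing Values": keep only the complete rows.
+df_dropped = df.dropna().reset_index(drop=True)
+print(df_dropped)  # only the first (complete) row remains in this toy example
+```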
-A probability plot of the selected columns is also drawn and saved in a specified location.
-![Probability Plot](https://github.com/ZJUEarthData/geochemistrypi/assets/66779478/8cab64ac-593a-4f46-bf58-475a59e993e8)
-
-Probability Plot
### 2.5 Feature Engineering
@@ -389,7 +333,7 @@ Index - Column Name
Feature Engineering Option:
1 - Yes
2 - No
-(Data) ➜ @Number:
+(Data) ➜ @Number: 1
```
After entering “1”, we are now ready to name the constructed feature and build the formula. In this example, we use “new Feature” as the name and build the formula “b*c+d”:
@@ -402,6 +346,8 @@ c - FEOT(WT%)
d - CAO(WT%)
Name the constructed feature (column name), like 'NEW-COMPOUND':
@input: new Feature
+```
+Then, build the formula for the new feature:
+```python
Build up new feature with the combination of basic arithmatic operators, including '+', '-', '*', '/', '()'.
Input example 1: a * b - c
--> Step 1: Multiply a column with b column;
@@ -422,51 +368,66 @@ You can use mean(x) to calculate the average value.
@input: b*c+d
```
-The output is as below:
+In this step, we enter “b*c+d”. The output is as below:
```python
-Successfully construct a new feature "new Feature".
-0 23.005680
-1 22.084600
-2 23.126441
-3 24.392497
-4 23.394575
- ...
-104 23.801200
-105 22.017500
-106 27.033200
-107 23.825000
-108 22.656000
-Name: new Feature, Length: 109, dtype: float64
+Successfully construct a new feature new Feature.
+0 23.00568
+1 22.08460
+2 25.43000
+3 23.39590
+4 22.90900
+ ...
+93 23.80120
+94 22.01750
+95 27.03320
+96 23.82500
+97 22.65600
+Name: new Feature, Length: 98, dtype: float64
(Press Enter key to move forward.)
------------------
-RangeIndex: 109 entries, 0 to 108
+RangeIndex: 98 entries, 0 to 97
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
- 0 AL2O3(WT%) 109 non-null float64
- 1 CR2O3(WT%) 109 non-null float64
- 2 FEOT(WT%) 109 non-null float64
- 3 CAO(WT%) 109 non-null float64
- 4 new Feature 109 non-null float64
+ 0 AL2O3(WT%) 98 non-null float64
+ 1 CR2O3(WT%) 98 non-null float64
+ 2 FEOT(WT%) 98 non-null float64
+ 3 CAO(WT%) 98 non-null float64
+ 4 new Feature 98 non-null float64
dtypes: float64(5)
-memory usage: 4.4 KB
+memory usage: 4.0 KB
None
Some basic statistic information of the designated data set:
- AL2O3(WT%) CR2O3(WT%) FEOT(WT%) CAO(WT%) new Feature
-count 109.000000 109.000000 109.000000 109.000000 109.000000
-mean 4.554212 0.956426 2.962310 21.115756 23.853732
-std 1.969756 0.524695 1.133967 1.964380 1.596076
-min 0.230000 0.000000 1.371100 13.170000 18.474000
-25% 3.110977 0.680000 2.350000 20.310000 22.909000
-50% 4.720000 0.956426 2.690000 21.223500 23.904360
-75% 6.233341 1.170000 3.330000 22.185450 24.763500
-max 8.110000 3.869550 8.145000 25.362000 29.231800
+ AL2O3(WT%) CR2O3(WT%) FEOT(WT%) CAO(WT%) new Feature
+count 98.000000 98.000000 98.000000 98.000000 98.000000
+mean 4.444082 0.956426 2.929757 21.187116 23.883266
+std 1.996912 0.553647 1.072481 1.891933 1.644173
+min 0.230000 0.000000 1.371100 13.170000 18.474000
+25% 3.051456 0.662500 2.347046 20.310000 22.872800
+50% 4.621250 0.925000 2.650000 21.310000 23.907180
+75% 6.222500 1.243656 3.346500 22.284019 24.795747
+max 8.110000 3.869550 8.145000 25.362000 29.231800
(Press Enter key to move forward.)
```
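+
+With the letter-to-column mapping shown in the prompt, `b*c+d` means CR2O3(WT%) * FEOT(WT%) + CAO(WT%). In pandas terms the constructed column is simply (a toy sketch, not the tool's internal code):
+
+```python
+import pandas as pd
+
+# Toy values for illustration only.
+df = pd.DataFrame({
+    "CR2O3(WT%)": [1.0, 0.5],
+    "FEOT(WT%)":  [3.0, 2.0],
+    "CAO(WT%)":   [20.0, 21.0],
+})
+
+# b*c+d  ->  CR2O3(WT%) * FEOT(WT%) + CAO(WT%)
+df["new Feature"] = df["CR2O3(WT%)"] * df["FEOT(WT%)"] + df["CAO(WT%)"]
+print(df["new Feature"])  # 23.0 and 22.0 for these toy rows
+```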
-After building the new feature, we can choose the mode to process data, in this doc, we choose “1 - Regression”:
+If you do not need to build more features, select “2 - No” to proceed to the next step. Here, we choose 2:
+```python
+Do you want to continue to build a new feature?
+1 - Yes
+2 - No
+(Data) ➜ @Number:2
+```
+
+```python
+Successfully store 'Data Selected Dropped-Imputed Feature-Engineering' in 'Data Selected Dropped-Imputed Feature-Engineering.xlsx' in C:\Users\YSQ\geopi_output\Regression\test\artifacts\data.
+Exit Feature Engineering Mode.
+(Press Enter key to move forward.)
+```
+
+After building the feature, we can choose the mode to process the data. In this doc, we choose “1 - Regression”:
```python
-*-*- Mode Selection -*-*-
@@ -482,7 +443,7 @@ After building the new feature, we can choose the mode to process data, in this
After entering the Regression menu, we input the X Set and Y Set separately; note that the new feature we just created is also in the list:
```python
--*-*- Data Split - X Set and Y Set-*-*-
+-*-*- Data Segmentation - X Set and Y Set -*-*-
Divide the processing data set into X (feature value) and Y (target value) respectively.
Selected sub data set to create X data set:
--------------------
@@ -498,42 +459,101 @@ Select the data range you want to process.
Input format:
Format 1: "[**, **]; **; [**, **]", such as "[1, 3]; 7; [10, 13]" --> you want to deal with the columns 1, 2, 3, 7, 10, 11, 12, 13
Format 2: "xx", such as "7" --> you want to deal with the columns 7
-@input:
+@input: 1
```
After entering the X Set, a prompt confirming the operation and some basic statistical information will be shown:
```python
-uccessfully create X data set.
+Successfully create X data set.
The Selected Data Set:
- AL2O3(WT%)
-0 3.936000
-1 3.040000
-2 7.016561
-3 3.110977
-4 6.971044
-.. ...
-104 2.740000
-105 5.700000
-106 0.230000
-107 2.580000
-108 6.490000
-
-[109 rows x 1 columns]
+ AL2O3(WT%)
+0 3.936
+1 3.040
+2 4.220
+3 6.980
+4 6.250
+.. ...
+93 2.740
+94 5.700
+95 0.230
+96 2.580
+97 6.490
+
+[98 rows x 1 columns]
Basic Statistical Information:
Some basic statistic information of the designated data set:
AL2O3(WT%)
-count 109.000000
-mean 4.554212
-std 1.969756
+count 98.000000
+mean 4.444082
+std 1.996912
min 0.230000
-25% 3.110977
-50% 4.720000
-75% 6.233341
+25% 3.051456
+50% 4.621250
+75% 6.222500
max 8.110000
-Successfully store 'X Without Scaling' in 'X Without Scaling.xlsx' in /home/yucheng/output/data.
+Successfully store 'X Without Scaling' in 'X Without Scaling.xlsx' in
+C:\Users\YSQ\geopi_output\Regression\test\artifacts\data.
(Press Enter key to move forward.)
```
+Then, select the Y Set in the same way; here we choose “2 - CR2O3(WT%)”.
+```python
+-*-*- Data Segmentation - X Set and Y Set -*-*-
+Selected sub data set to create Y data set:
+--------------------
+Index - Column Name
+1 - AL2O3(WT%)
+2 - CR2O3(WT%)
+3 - FEOT(WT%)
+4 - CAO(WT%)
+5 - new Feature
+--------------------
+The selected Y data set:
+Notice: Normally, please choose only one column to be tag column Y, not multiple columns.
+Notice: For classification model training, please choose the label column which has
+distinctive integers.
+Select the data range you want to process.
+Input format:
+Format 1: "[**, **]; **; [**, **]", such as "[1, 3]; 7; [10, 13]" --> you want to deal with the columns 1, 2, 3, 7, 10, 11, 12, 13
+Format 2: "xx", such as "7" --> you want to deal with the columns 7
+@input:2
+```
+
+A prompt confirming the operation and basic statistical information will be shown:
+
+```python
+Successfully create Y data set.
+The Selected Data Set:
+ CR2O3(WT%)
+0 1.440
+1 0.578
+2 1.000
+3 0.830
+4 0.740
+.. ...
+93 0.060
+94 0.690
+95 2.910
+96 0.750
+97 0.800
+
+[98 rows x 1 columns]
+Basic Statistical Information:
+Some basic statistic information of the designated data set:
+ CR2O3(WT%)
+count 98.000000
+mean 0.956426
+std 0.553647
+min 0.000000
+25% 0.662500
+50% 0.925000
+75% 1.243656
+max 3.869550
+Successfully store 'Y' in 'Y.xlsx' in
+C:\Users\YSQ\geopi_output\Regression\test\artifacts\data.
+(Press Enter key to move forward.)
+```
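+
+Conceptually, this X/Y segmentation is just column selection on the processed data set. A minimal sketch, assuming the five columns listed in the prompt live in a DataFrame named `df`:
+
+```python
+import pandas as pd
+
+# Toy stand-in for the processed data set (values are illustrative only).
+df = pd.DataFrame({
+    "AL2O3(WT%)": [3.936, 3.040],
+    "CR2O3(WT%)": [1.440, 0.578],
+    "FEOT(WT%)":  [3.0, 2.0],
+    "CAO(WT%)":   [20.0, 21.0],
+    "new Feature": [23.0, 22.0],
+})
+
+X = df[["AL2O3(WT%)"]]  # feature value(s): column 1 in the prompt
+y = df[["CR2O3(WT%)"]]  # target value: column 2 in the prompt
+print(X.shape, y.shape)
+```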
+
After this, you may choose to process feature scaling on X Set or not:
@@ -541,10 +561,18 @@ After this, you may choose to process feature scaling on X Set or not:
-*-*- Feature Scaling on X Set -*-*-
1 - Yes
2 - No
-(Data) ➜ @Number:
+(Data) ➜ @Number: 2
+```
+
+In this doc, we choose “2 - No”. For the next step, feature selection on the X Set, we also choose option 2:
+
+```python
+-*-*- Feature Selection on X set -*-*-
+1 - Yes
+2 - No
+(Data) ➜ @Number:2
```
-In the similar manner, we then set Y Set and check the related information generated onto the screen.
The next step is to split the data into a training set and a test set. The test set will be used to evaluate the performance of the machine learning model trained on the training set. In this example, we set aside 20% of the data as the test set, which means 80% of the data will be used as the training set. The data split is important to prevent overfitting on the training data and to ensure that the model's performance generalizes to new, unseen data:
@@ -554,10 +582,17 @@ Note: Normally, set 20% of the dataset aside as test set, such as 0.2
(Data) ➜ @Test Ratio: 0.2
```
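+
+A test ratio of 0.2 is the usual 80/20 split. Roughly what happens under the hood, sketched with scikit-learn (the toy X and y below stand in for the sets created above; this is not the tool's actual code):
+
+```python
+import pandas as pd
+from sklearn.model_selection import train_test_split
+
+# Toy X and y standing in for the sets created above.
+X = pd.DataFrame({"AL2O3(WT%)": [3.936, 3.040, 4.220, 6.980, 6.250]})
+y = pd.DataFrame({"CR2O3(WT%)": [1.440, 0.578, 1.000, 0.830, 0.740]})
+
+# Hold out 20% of the samples as the test set.
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+print(len(X_train), len(X_test))  # 4 1
+```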
-After checking the output, you should be able to see a menu to choose a machine learning model for your data, in this example, we are going to use “2 - Polynomial Regression”:
+After checking the output, you should be able to see a menu to choose a machine learning model for your data. In this example, we are going to use “7 - Extra-Trees”:
```python
--*-*- Model Selection -*-*-:
+-*-*- Model Selection -*-*-
1 - Linear Regression
2 - Polynomial Regression
3 - K-Nearest Neighbors
@@ -566,106 +601,79 @@ After checking the output, you should be able to see a menu to choose a machine
6 - Random Forest
7 - Extra-Trees
8 - Gradient Boosting
-9 - Xgboost
+9 - XGBoost
10 - Multi-layer Perceptron
11 - Lasso Regression
-12 - All models above to be trained
+12 - Elastic Net
+13 - SGD Regression
+14 - BayesianRidge Regression
+15 - All models above to be trained
Which model do you want to apply?(Enter the Corresponding Number)
-(Model) ➜ @Number: 2
+(Model) ➜ @Number:7
```
-After choosing the model, the command line may prompt you to provide more specific options in terms of the model you choose, after offering the options, the program is good to go! And you may check the output like this after processing:
+An automated machine learning option is built in. Simply choose “1 - Yes” to use it:
+```python
+Do you want to employ automated machine learning with respect
+to this algorithm?(Enter the Corresponding Number):
+1 - Yes
+2 - No
+(Model) ➜ @Number:1
+```
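+
+The automated option trains and tunes the chosen algorithm for you. Its internal implementation is not shown here; conceptually it resembles a randomized hyperparameter search, as in this generic scikit-learn sketch (illustrative only):
+
+```python
+import numpy as np
+from sklearn.ensemble import ExtraTreesRegressor
+from sklearn.model_selection import RandomizedSearchCV
+
+# Toy data standing in for X_train / y_train.
+rng = np.random.default_rng(42)
+X_train = rng.uniform(0.2, 8.1, size=(78, 1))
+y_train = rng.uniform(0.0, 3.9, size=78)
+
+# Randomly sample hyperparameter combinations and keep the best one by cross validation.
+search = RandomizedSearchCV(
+    ExtraTreesRegressor(random_state=42),
+    param_distributions={
+        "n_estimators": [100, 200, 500],
+        "max_depth": [None, 5, 10, 20],
+        "min_samples_leaf": [1, 2, 4],
+    },
+    n_iter=10,
+    cv=5,
+    random_state=42,
+)
+search.fit(X_train, y_train)
+print(search.best_params_)
+```
+
+After this choice, the program first applies the same feature engineering operation to the application (inference) data, as shown below: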
```python
-*-**-* Polynomial Regression is running ... *-**-*
-Expected Functionality:
-+ Model Score
-+ Cross Validation
-+ Model Prediction
-+ Model Persistence
-+ Predicted vs. Actual Diagram
-+ Residuals Diagram
-+ Permutation Importance Diagram
-+ Polynomial Regression Formula
------* Model Score *-----
-Root Mean Square Error: 1.2981800081993564
-Mean Absolute Error: 0.8666537321359384
-R2 Score: -0.5692041761356125
-Explained Variance Score: -0.5635060495257759
------* Cross Validation *-----
-K-Folds: 10
-* Fit Time *
-Scores: [0.00217414 0.00214863 0.00225115 0.00212574 0.00201654 0.00203323
- 0.00196433 0.00200295 0.00195527 0.00195432]
-Mean: 0.0020626306533813475
-Standard deviation: 9.940905756158756e-05
--------------
-* Score Time *
-Scores: [0.00440168 0.00398946 0.00407624 0.0041182 0.00420284 0.00452423
- 0.00406241 0.00427079 0.00406742 0.00404215]
-Mean: 0.004175543785095215
-Standard deviation: 0.0001651057611709732
--------------
-* Root Mean Square Error *
-Scores: [1.15785222 1.29457522 2.71100276 3.38856833 0.94791697 1.0329962
- 1.54759602 1.8725529 1.82623562 0.84039699]
-Mean: 1.6619693228088945
-Standard deviation: 0.7833005136355865
--------------
-* Mean Absolute Error *
-Scores: [0.86020769 0.85255076 1.71707909 2.17595274 0.73042456 0.8864327
- 1.2754413 1.32740744 1.48587525 0.67660019]
-Mean: 1.1987971734378662
-Standard deviation: 0.4639441214337496
--------------
-* R2 Score *
-Scores: [ 0.3821429 -0.12200627 -0.58303497 -0.98544835 0.3240076 0.02309755
- -0.93382518 -9.20857756 -1.11023532 -0.50902637]
-Mean: -1.2722905973773913
-Standard deviation: 2.6935459556340082
--------------
-* Explained Variance Score *
-Scores: [ 0.42490745 -0.01768215 -0.54672932 -0.90106814 0.32644583 0.18391296
- -0.92481771 -7.4016756 -0.39601889 0.24420376]
-Mean: -0.9008521815781642
-Standard deviation: 2.2175052662305945
--------------
------* Predicted Value Evaluation *-----
-Save figure 'Predicted Value Evaluation - Polynomial Regression' in /home/yucheng/output/images/model_output.
------* True Value vs. Predicted Value *-----
-Save figure 'True Value vs. Predicted Value - Polynomial Regression' in /home/yucheng/output/images/model_output.
------* Polynomial Regression Formula *-----
-y = 1.168AL2O3(WT%)+4.677CR2O3(WT%)-0.085AL2O3(WT%)^2-2.572AL2O3(WT%) CR2O3(WT%)-2.229CR2O3(WT%)^2+0.002AL2O3(WT%)^3+0.14AL2O3(WT%)^2 CR2O3(WT%)+0.762AL2O3(WT%) CR2O3(WT%)^2+0.232CR2O3(WT%)^3+1.4708950432993957
------* Model Prediction *-----
- FEOT(WT%) CAO(WT%)
-0 6.234901 21.516655
-1 3.081208 20.471231
-2 3.082333 19.539309
-3 2.838430 20.666521
-4 2.434649 21.558533
-5 2.478282 21.784115
-6 2.689378 20.075947
-7 2.744644 21.954583
-8 3.336340 22.054664
-9 3.033059 20.288637
-10 3.268753 21.438835
-11 3.129242 22.290128
-12 2.451531 21.640214
-13 2.984390 19.752188
-14 2.513781 21.035197
-15 2.699384 20.676107
-16 2.641574 21.844654
-17 3.449548 20.632201
-18 3.134386 22.138135
-19 2.986511 21.673300
-20 2.899159 19.943711
-21 2.606604 22.146161
-Successfully store 'Y Test Predict' in 'Y Test Predict.xlsx' in /home/yucheng/output/data.
------* Model Persistence *-----
-Successfully store the trained model 'Polynomial Regression' in 'Polynomial_Regression_2023-02-24.pkl' in /home/yucheng/output/trained_models.
-Successfully store the trained model 'Polynomial Regression' in 'Polynomial_Regression_2023-02-24.joblib' in /home/yucheng/output/trained_models.
+-*-*- Feature Engineering on Application Data -*-*-
+The same feature engineering operation will be applied to the
+inference data.
+Successfully construct a new feature new Feature.
+0 NaN
+1 NaN
+2 NaN
+3 25.430000
+4 22.909000
+5 23.211800
+...
+49 25.158800
+50 23.342814
+51 21.512000
+52 25.668000
+53 23.801200
+54 23.825000
+Name: new Feature, dtype: float64
+Successfully store 'Application Data Original' in 'Application Data Original.xlsx' in C:\Users\YSQ\geopi_output\GeoPi - Rock Classification\Regression\artifacts\data.
+Successfully store 'Application Data Feature-Engineering' in 'Application Data Feature-Engineering.xlsx' in C:\Users\YSQ\geopi_output\GeoPi - Rock Classification\Regression\artifacts\data.
+Successfully store 'Application Data Feature-Engineering Selected' in 'Application Data Feature-Engineering Selected.xlsx' in C:\Users\YSQ\geopi_output\GeoPi - Rock Classification\Regression\artifacts\data.
+(Press Enter key to move forward.)
```
+
+After moving to the next step, the Extra-Trees algorithm training will run automatically.
+This includes functionalities such as Model Scoring, Cross Validation, Predicted vs. Actual Diagram, Residuals Diagram, Permutation Importance Diagram, Feature Importance Diagram, Single Tree Diagram, Model Prediction, and Model Persistence.
+You can find all of the output stored under the paths printed on the screen.
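+
+To relate those functionalities to familiar code, here is a condensed scikit-learn sketch covering scoring, cross validation, feature importance, prediction, and persistence. It uses toy data and is not the tool's actual implementation:
+
+```python
+import joblib
+import numpy as np
+from sklearn.ensemble import ExtraTreesRegressor
+from sklearn.metrics import mean_squared_error, r2_score
+from sklearn.model_selection import cross_val_score, train_test_split
+
+# Toy data standing in for the X/Y sets built earlier.
+rng = np.random.default_rng(42)
+X = rng.uniform(0.2, 8.1, size=(98, 1))
+y = rng.uniform(0.0, 3.9, size=98)
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+
+model = ExtraTreesRegressor(n_estimators=100, random_state=42)
+model.fit(X_train, y_train)
+
+# Model score on the held-out test set.
+y_pred = model.predict(X_test)
+print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
+print("R2  :", r2_score(y_test, y_pred))
+
+# Cross validation scores on the training set.
+print("CV R2 scores:", cross_val_score(model, X_train, y_train, cv=5))
+
+# Feature importance and model persistence.
+print("Feature importances:", model.feature_importances_)
+joblib.dump(model, "Extra-Trees.joblib")
+```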
+
+```python
+-*-*- Transform Pipeline Construction -*-*-
+Build the transform pipeline according to the previous operations.
+Successfully store 'Transform Pipeline Configuration' in 'Transform Pipeline Configuration.txt' in C:\Users\YSQ\geopi_output\GeoPi - Rock Classification\Regression\artifacts.
+(Press Enter key to move forward.)
+
+-*-*- Model Inference -*-*-
+Use the trained model to make predictions on the application data.
+Successfully store 'Application Data Predicted' in 'Application Data Predicted.xlsx' in C:\Users\YSQ\geopi_output\GeoPi - Rock Classification\Regression\artifacts\data.
+Successfully store 'Application Data Feature-Engineering Selected Dropped-Imputed' in 'Application Data Feature-Engineering Selected Dropped-Imputed.xlsx' in C:\Users\YSQ\geopi_output\GeoPi - Rock Classification\Regression\artifacts\data.
+(Press Enter key to move forward.)
```
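+
+The model inference step then re-loads the persisted model and applies it to the transformed application data. A minimal sketch, reusing the hypothetical "Extra-Trees.joblib" file from the sketch above (the tool manages these files for you):
+
+```python
+import numpy as np
+import joblib
+
+# Hypothetical file name; the tool stores the trained model under its own output directory.
+model = joblib.load("Extra-Trees.joblib")
+
+# Application (inference) data after the same feature engineering and selection steps.
+X_app = np.array([[3.94], [2.74], [6.49]])  # toy values
+print(model.predict(X_app))
+```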
+
+
+![Feature Importance - Extra-Trees](https://github.com/user-attachments/assets/1d3f6177-8495-445f-ae6b-7bee8ab002a7)
+Feature Importance - Extra-Trees
+![Permutation Importance - Extra-Trees](https://github.com/user-attachments/assets/22d825b4-c58d-4ead-ad6a-35cf7179a426)
+Permutation Importance - Extra-Trees
+![Predicted vs Actual Diagram - Extra-Trees](https://github.com/user-attachments/assets/a1adc20d-cfc0-459d-8a05-83c8b877af93)
+Predicted vs Actual Diagram - Extra-Trees
+![Residuals Diagram - Extra-Trees](https://github.com/user-attachments/assets/08cf261b-a60f-4bf5-94b1-c7f03e9c3358)
+Residuals Diagram - Extra-Trees
+![Tree Diagram - Extra-Trees](https://github.com/user-attachments/assets/619d9a3d-9b60-4560-bdfd-3623ba293bff)
+Tree Diagram - Extra-Trees