Fully automated machine learning using LLMs.
- An LLM analyses the column headers and predicts A) the ML task (classification/regression), B) the feature columns, and C) the target column.
- The data is preprocessed and a model is trained according to the identified task type.
- Model metrics are analysed and summarised in natural language.
```mermaid
flowchart TD
    A[0. Upload dataset]
    A --> B[1. Identify ML task, feature columns & target column]
    B --> C[2. Preprocess data]
    C --> D[3. Train model & analyse performance]
```
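A rough sketch of how these steps could fit together is shown below. The prompt, model name (`gpt-4o-mini`), and function names are illustrative assumptions, not the actual zero_step_ml internals:

```python
# Minimal sketch of the pipeline above. Prompt wording, model choice and
# function names are illustrative assumptions, not zero_step_ml's code.
import json

import pandas as pd
from openai import OpenAI
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def identify_task(columns: list[str]) -> dict:
    """Step 1: ask the LLM for the ML task, feature columns and target column."""
    prompt = (
        "Given these dataset columns, return JSON with keys "
        "'task' ('classification' or 'regression'), 'features' and 'target': "
        f"{columns}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns valid JSON for this prompt.
    return json.loads(response.choices[0].message.content)


def run(csv_path: str) -> None:
    df = pd.read_csv(csv_path)
    plan = identify_task(list(df.columns))

    # Step 2: preprocess (here, simply drop rows with missing values).
    df = df.dropna()
    X, y = df[plan["features"]], df[plan["target"]]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Step 3: train a model matching the identified task and report performance.
    model = LinearRegression() if plan["task"] == "regression" else DecisionTreeClassifier()
    model.fit(X_train, y_train)
    print(f"{plan['task']} score: {model.score(X_test, y_test):.3f}")
```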
- Clone this repository locally.
- Create `.envrc` and set `OPENAI_API_KEY` (see the example after the install commands below).
- Install with poetry:

```bash
make install  # Install dependencies
make test     # Run unit tests to check installation
```
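For example, the `.envrc` could contain just the key export (assuming a tool such as direnv, or your shell, sources it):

```bash
# .envrc — example only; replace the placeholder with your own key
export OPENAI_API_KEY="sk-..."
```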
- Obtain your dataset (supported file type: CSV; supported ML tasks: classification, regression).
- Run:

```bash
poetry run zero_step_ml --target="my_dataset.csv"
```

NOTE: Use the `--interactive` flag to enable interactive mode, which lets you approve the LLM's feature/target analysis or provide additional context so that the LLM selects the correct features, target, and task (see the example below).
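For example, an interactive run on the same dataset:

```bash
poetry run zero_step_ml --target="my_dataset.csv" --interactive
```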
Performance on popular datasets:
Dataset | Identified task | Identified features | Identified target | Model | Metrics | LLM metric summary |
---|---|---|---|---|---|---|
Auto insurance in Sweden | Regression | `['number of claims']` | `'total payment for all the claims in thousands of Swedish Kronor'` | Linear Regression | `{'MAE': 26.41490933996047, 'MSE': 875.0434234424166, 'R2 Score': 0.89508194939184}` | The model was tested on a regression task and achieved an MAE of 26.41490933996047, an MSE of 875.0434234424166, and an R2 Score of 0.89508194939184. This indicates that the model is performing well, as the MAE is relatively low and the R2 Score is close to 1, indicating a strong correlation between the predicted and actual values. |
Pima Indians Diabetes | Classification | `['Number of times pregnant', 'Plasma glucose concentration a 2 hours in an oral glucose tolerance test', 'Diastolic blood pressure (mm Hg)', 'Triceps skinfold thickness (mm)', '2-Hour serum insulin (mu U/ml)', 'Body mass index (weight in kg/(height in m)^2)', 'Diabetes pedigree function', 'Age (years)']` | `'Class variable (0 or 1)'` | Decision Tree Classifier | `{'Accuracy': 0.7597402597402597, 'Precision': 0.65, 'Recall': 0.7090909090909091, 'F1 Score': 0.6782608695652174}` | The model achieved an accuracy of 0.7597402597402597, which indicates that it was able to correctly classify 75.97% of the data. The precision of the model was 0.65, meaning that 65% of the predictions made by the model were correct. The recall of the model was 0.7090909090909091, meaning that it was able to correctly identify 70.91% of the positive cases. Finally, the F1 score of the model was 0.6782608695652174, which is a measure of the model's accuracy and precision combined. Overall, the model performed well, with a good balance of accuracy and precision. |
Banknote authentication | Classification | `['Variance of Wavelet Transformed image (continuous)', 'Skewness of Wavelet Transformed image (continuous)', 'Kurtosis of Wavelet Transformed image (continuous)', 'Entropy of image (continuous)']` | `'Class (0 for authentic and 1 for inauthentic)'` | Decision Tree Classifier | `{'Accuracy': 0.9781818181818182, 'Precision': 0.991869918699187, 'Recall': 0.9606299212598425, 'F1 Score': 0.976}` | The model achieved an accuracy of 0.978, precision of 0.991, recall of 0.961, and an F1 score of 0.976 on the classification task. This indicates that the model is performing well, as all of the metrics are above 0.9. The accuracy score indicates that the model is correctly classifying the data with a high degree of accuracy. The precision score indicates that the model is correctly identifying the positive class with a high degree of accuracy. The recall score indicates that the model is correctly identifying the positive class with a high degree of accuracy. Finally, the F1 score indicates that the model is performing well overall, as it is a combination of the precision and recall scores. |
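The metric dictionaries above correspond to standard scikit-learn metrics; a minimal sketch of how such dictionaries might be assembled (the helper names and the `y_test`/`y_pred` placeholders are illustrative):

```python
# Sketch of how the metric dictionaries in the table might be produced with
# scikit-learn. `y_test` and `y_pred` are illustrative placeholders.
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)


def regression_metrics(y_test, y_pred) -> dict:
    return {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mean_squared_error(y_test, y_pred),
        "R2 Score": r2_score(y_test, y_pred),
    }


def classification_metrics(y_test, y_pred) -> dict:
    return {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1 Score": f1_score(y_test, y_pred),
    }
```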
License: MIT