Welcome to my Data Science and Machine Learning portfolio! This repository showcases my participation in the ChatGPT Answer Classification Challenge. In this challenge, I developed a model to classify answers as either ChatGPT-generated or human-written.
This challenge is part of the ML Olympiad organized by IEEE ESSTHS Student Branch, IEEE ESSTHS CIS SBC, GDSC ISETSo, and PyData Tunisia. It focuses on detecting AI-generated answers, a critical task in today's AI-driven world.
- Goal: Classify answers into ChatGPT-generated or human-written categories.
- Dataset: The data consists of prompts (questions) and answers; some answers were generated by ChatGPT and others were written by humans.
- Evaluation: The performance metric for this competition was accuracy.
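Since the competition is scored on accuracy, it helps to see how little the metric involves: the fraction of predictions that match the ground truth. The snippet below is a minimal sketch, not the competition's scoring code. The label encoding (1 for ChatGPT-generated, 0 for human-written) and the example values are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Assumed encoding for illustration: 1 = ChatGPT-generated, 0 = human-written.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

score = accuracy(y_true, y_pred)  # 4 of 5 predictions match
print(score)
```

Accuracy treats both kinds of mistake equally, so it is most informative when the two classes are roughly balanced.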
Here are the key files related to this project:
- train.csv - The training dataset containing prompts and answers with ground truth labels.
- test.csv - The test dataset for making predictions.
- sample_submission.csv - A sample submission file with the required format.
- chat-gpt-vs-human.ipynb (also available as a Kaggle notebook) - My Jupyter Notebook with code, analysis, and model implementation.
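Because sample_submission.csv defines the required format, a quick sanity check before uploading is to compare a submission's header and row count against it. The sketch below uses only the standard library. The column names (`id`, `label`, `prediction`) and the in-memory CSV contents are hypothetical stand-ins, since the real column names are not listed here.

```python
import csv
import io


def header_and_row_count(handle):
    """Read a CSV from an open text handle; return (header, number of data rows)."""
    reader = csv.reader(handle)
    header = next(reader)
    return header, sum(1 for _ in reader)


def same_format(sample_handle, submission_handle):
    """True if both CSVs share the same header and the same number of rows."""
    sample_header, sample_rows = header_and_row_count(sample_handle)
    sub_header, sub_rows = header_and_row_count(submission_handle)
    return sample_header == sub_header and sample_rows == sub_rows


# Hypothetical file contents, used here instead of the real CSV files.
sample = io.StringIO("id,label\n0,0\n1,0\n2,0\n")
good = io.StringIO("id,label\n0,1\n1,0\n2,1\n")
bad = io.StringIO("id,prediction\n0,1\n1,0\n")

good_ok = same_format(sample, good)
sample.seek(0)  # rewind so the sample can be read a second time
bad_ok = same_format(sample, bad)
print(good_ok, bad_ok)
```

With real files, the same functions accept handles from `open("sample_submission.csv", newline="")` and from whichever file holds your predictions.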
- Data Exploration: I started by exploring the training dataset to understand the structure and distribution of the data.
- Feature Engineering: I engineered features and performed text preprocessing to prepare the data for modeling.
- Model Selection: I experimented with various machine learning and NLP models to find the best-performing one.
- Hyperparameter Tuning: To optimize model performance, I fine-tuned hyperparameters.
- Validation: I used cross-validation techniques to assess model accuracy and robustness.
- Submission: After obtaining satisfactory results, I created submission files for evaluation.
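The steps above can be sketched end to end with scikit-learn. This is an illustrative baseline under stated assumptions, not the model from the notebook: TF-IDF features with logistic regression are swapped in as a common starting point, the toy sentences stand in for train.csv, and the label encoding (1 for ChatGPT-generated, 0 for human-written) is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Toy stand-in for the training data (invented examples, not from train.csv).
answers = [
    "As an AI language model, I can provide a general overview of the topic.",
    "In summary, there are several key factors to consider in this context.",
    "Certainly! Here is a concise explanation of the concept you asked about.",
    "It is important to note that results may vary depending on the situation.",
    "Overall, the process involves multiple steps, each with its own purpose.",
    "There are many benefits, including efficiency, clarity, and consistency.",
    "honestly no idea, i just restart the router and hope for the best lol",
    "My grandma always did it this way and it never failed her once.",
    "Tried that last week, didn't work, ended up calling a plumber.",
    "ugh, depends who you ask. my prof says one thing, the book another",
    "We drove there in 2019 and the road was closed half the way.",
    "Nope. Been there, bad idea, you'll regret it by Monday.",
]
labels = [1] * 6 + [0] * 6  # assumed encoding: 1 = ChatGPT, 0 = human

# Feature engineering and model in one pipeline, so that the vectorizer
# is fitted on the training folds only during cross-validation.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning with stratified cross-validation, scored on accuracy.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    scoring="accuracy",
    cv=cv,
)
search.fit(answers, labels)

print("best params:", search.best_params_)
print("mean CV accuracy:", search.best_score_)

# Predict on unseen answers, as one would for test.csv.
predictions = search.predict([
    "It is important to consider several key factors in this context.",
    "lol no, my cousin tried it and it broke in a week",
])
print(predictions)
```

Keeping the vectorizer inside the pipeline matters for validation: fitting TF-IDF on the full dataset before splitting would leak vocabulary statistics from the held-out fold into training.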
For detailed implementation and analysis, please refer to my notebook.
I ranked among the top 20 participants in the ML Olympiad's ChatGPT Answer Classification Challenge. The model separates ChatGPT-generated answers from human-written ones, which supports transparency about where a piece of content came from.
As I continue to build my expertise in data science and machine learning, I plan to:
- Explore advanced NLP techniques for even better classification.
- Incorporate additional data sources to enhance model accuracy.
- Improve model interpretability for practical applications.
I'm always open to collaboration and learning from the data science community. You can connect with me on LinkedIn or find more of my projects on GitHub.
I want to express my gratitude to the organizers of the ML Olympiad for providing this valuable opportunity for skill development and competition.
Thank you for visiting my portfolio, and I look forward to sharing more data science projects in the future! 🚀✨