chenyze/chenyze.github.io

About Me

SQL enthusiast. Data-cleaning zealot. Language nerd. Food writer. Eight years of insight into consumer behaviour as a former journalist and PR practitioner in the F&B space. Aspires to use data to improve consumer experience. (more about me)

Connect with me on LinkedIn

Past Projects

MM Bus Ticket is the market leader in Burma in providing a digital platform for long-distance coach buses to sell and manage seat inventory.

I had the privilege of working with real data (over 4.4 million transactions in 2018 alone) and understanding how data systems cope with business constraints in the real world. I undertook major data cleaning and transformation to deliver new insights, and I also used machine learning models to predict ticket sales for different customer segments.

Using the Ames Housing Dataset that is available on Kaggle, I attempted to

  • identify the features that best predict housing sale prices
  • build a regression model with as high an R² score as possible.
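For reference, R² (the coefficient of determination) is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a minimal pure-Python version, not the code used in the project; the function name `r2_score` mirrors scikit-learn's, and the prices are made-up numbers rather than values from the Ames data. It illustrates that predictions close to the true values score near 1, while always predicting the mean scores 0.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical sale prices in thousands, for illustration only.
prices = [200.0, 150.0, 320.0, 275.0]
close_preds = [210.0, 140.0, 310.0, 280.0]

# Baseline: always predict the mean price.
mean_preds = [sum(prices) / len(prices)] * len(prices)

good = r2_score(prices, close_preds)     # close to 1
baseline = r2_score(prices, mean_preds)  # 0 for the mean predictor
```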

I built a supervised machine learning model to classify unseen Reddit posts as belonging to either r/wine or r/whisky. The data was scraped using Reddit's API together with BeautifulSoup, and regular expressions were used extensively to remove emoji, links and accented letters. For Natural Language Processing, spaCy was used for lemmatizing and tokenizing, and I ran multiple models comparing CountVectorizer vs TF-IDF, as well as Logistic Regression vs Multinomial Naive Bayes vs Random Forest Classifier.
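The cleaning step might look roughly like the sketch below, which uses only the Python standard library. It is an illustration rather than the project's actual code: the function name `clean_post`, the URL pattern, and the emoji code-point ranges are all assumptions, and the emoji ranges are approximate rather than exhaustive. It also assumes that "removing accented letters" means folding them to their unaccented base letters (é becomes e), which is one possible reading.

```python
import re
import unicodedata

# Assumed pattern for links: anything starting with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Approximate emoji ranges (an assumption, not an exhaustive list),
# plus the variation selector and zero-width joiner used in emoji sequences.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U0001F1E6-\U0001F1FF\u2600-\u27BF\uFE0F\u200D]"
)


def clean_post(text):
    """Strip links and emoji, fold accented letters, collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    # Decompose accented letters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_post("Great Rosé 🍷 see https://example.com/x now")
```

Cleaning before tokenizing keeps stray URL fragments and emoji out of the vocabulary that a vectorizer would otherwise build.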
