kanavanand/club_mahindra

Public Leaderboard: 3rd Place: 95.001
Private Leaderboard: 3rd Place: 95.8484

Training

The full solution is in Final_solution.ipynb. Run every cell in order; at the end you will get 6 submission files, one for each of the following models:

  1. Single LightGBM model
  2. Single CatBoost model
  3. 5-fold LightGBM model
  4. 5-fold CatBoost model
  5. Stacking of [xgb, lgb, catboost]
  6. Ensemble of [1, 2, 3, 4, 5] using (model_1 * 0.3 + model_2 * 0.2 + model_3 * 0.3 + model_4 * 0.2) * 0.8 + model_5 * 0.2
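
The weighting in item 6 can be written as a small function. This is a minimal sketch, assuming each model's predictions are available as equal-length arrays; the function name `blend_predictions` and the sample values are illustrative and do not come from the notebook.

```python
import numpy as np

def blend_predictions(m1, m2, m3, m4, m5):
    """Blend five prediction vectors using the weights listed in item 6."""
    m1, m2, m3, m4, m5 = (np.asarray(m, dtype=float) for m in (m1, m2, m3, m4, m5))
    # Inner blend of the two single models and the two 5-fold models
    boosted = 0.3 * m1 + 0.2 * m2 + 0.3 * m3 + 0.2 * m4
    # Outer blend: 80% boosted models, 20% stacked model
    return 0.8 * boosted + 0.2 * m5

# Hypothetical predictions for two rows (illustrative values only)
preds = blend_predictions([100, 90], [100, 92], [100, 88], [100, 94], [100, 80])
```

Both the inner weights (0.3, 0.2, 0.3, 0.2) and the outer weights (0.8, 0.2) sum to 1, so the blended prediction stays on the same scale as the individual models.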

Problem Statement

Food & Beverages Spend Prediction in Club Mahindra Resorts

Club Mahindra (Club M) makes significant revenue from Food and Beverages (F&B) sales in their resorts. The members of Club M are offered a wide variety of items, and our task was to predict the amount spent by a member per night, which could help the resorts plan inventory accordingly.

Brief approach

Thanks to Analytics Vidhya and Club Mahindra for organising such a wonderful hackathon. The competition was quite intense, and the dataset was very clean to work with.

Approach:

Step-1:

I started with a very basic approach: converting the ID-like and ordinal features (resort id, persontravellingID, main_product_code and other ordinal features) to the category type. I then derived the same set of features from each date column [booking_date, checkin_date, checkout_date]:

  1. Weekday
  2. Month
  3. Day
  4. Day of year
  5. Week of year
  6. Is month end
  7. Year
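
The seven date features above can be sketched with pandas as follows. This is an illustrative sketch, not the notebook's code: the helper name `add_date_features` is hypothetical, the dates in the example are made up, and the column suffixes are an assumption inferred from names such as `checkout_dateyear` that appear in Step-4.

```python
import pandas as pd

def add_date_features(df, col):
    """Add calendar features derived from the date column `col`."""
    dates = pd.to_datetime(df[col])
    df[col + "weekday"] = dates.dt.weekday            # Monday = 0
    df[col + "month"] = dates.dt.month
    df[col + "day"] = dates.dt.day
    df[col + "dayofyear"] = dates.dt.dayofyear
    df[col + "weekofyear"] = dates.dt.isocalendar().week.astype(int)  # ISO week number
    df[col + "is_month_end"] = dates.dt.is_month_end.astype(int)
    df[col + "year"] = dates.dt.year
    return df

# Hypothetical rows; the real data has three date columns to process
df = pd.DataFrame({"checkin_date": ["2018-04-05", "2018-04-30"]})
df = add_date_features(df, "checkin_date")
```

Calling the helper once per date column keeps the feature names consistent across booking, check-in and check-out dates.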

Step-2: Intuitive features

  1. in_out: Checkout_date - Checkin_date
  2. book_out: Checkout_date - booking_date
  3. Roomnights per stay: roomnights / in_out
  4. Roomnights per book span: roomnights / book_out
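
A minimal pandas sketch of these four features is shown here. The sample rows are invented, the output column names `roomnights_per_stay` and `roomnights_per_book_span` are hypothetical, and replacing a zero span with NaN to avoid division by zero is an assumption rather than something the write-up states.

```python
import numpy as np
import pandas as pd

# Hypothetical bookings (illustrative values only)
df = pd.DataFrame({
    "booking_date": pd.to_datetime(["2018-03-01", "2018-03-20"]),
    "checkin_date": pd.to_datetime(["2018-04-05", "2018-04-10"]),
    "checkout_date": pd.to_datetime(["2018-04-09", "2018-04-10"]),
    "roomnights": [4, 1],
})

# Length of stay and span from booking to checkout, both in days
df["in_out"] = (df["checkout_date"] - df["checkin_date"]).dt.days
df["book_out"] = (df["checkout_date"] - df["booking_date"]).dt.days

# Ratios; a zero span becomes NaN instead of producing inf (assumption)
df["roomnights_per_stay"] = df["roomnights"] / df["in_out"].replace(0, np.nan)
df["roomnights_per_book_span"] = df["roomnights"] / df["book_out"].replace(0, np.nan)
```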

Step-3: Time-based features

  1. Prev_resort_time = Time when the resort was previously booked.
  2. Prev_resort_member_time = Time when the resort was previously booked by a particular member.
  3. Next_resort_time = Time when the resort will next be booked.
  4. Next_resort_member_time = Time when the resort will next be booked by a particular member.
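
One way to build such previous/next features is a grouped shift over time-ordered rows. The sketch below is an assumption about the mechanics, not the notebook's code: it uses `booking_date` as the event time (the write-up does not say which date column was used), stores the neighbouring timestamp itself, and runs on invented rows.

```python
import pandas as pd

# Hypothetical bookings (illustrative values only)
df = pd.DataFrame({
    "resort_id": ["r1", "r1", "r1", "r2"],
    "memberid": ["m1", "m2", "m1", "m1"],
    "booking_date": pd.to_datetime(
        ["2018-01-01", "2018-01-05", "2018-01-09", "2018-01-03"]
    ),
})

# Order rows in time so that shift(1) means "previous" and shift(-1) means "next"
df = df.sort_values("booking_date").reset_index(drop=True)

by_resort = df.groupby("resort_id")["booking_date"]
by_resort_member = df.groupby(["resort_id", "memberid"])["booking_date"]

df["prev_resort_time"] = by_resort.shift(1)
df["next_resort_time"] = by_resort.shift(-1)
df["prev_resort_member_time"] = by_resort_member.shift(1)
df["next_resort_member_time"] = by_resort_member.shift(-1)
```

The first and last booking in each group have no neighbour and are left as NaT; the gap to the current booking can be derived by subtracting these columns from `booking_date`.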

Step-4: Groupby features

| S.No. | Type | Value column | On |
|---|---|---|---|
| 1 | COUNT | _ | RESORT_ID |
| 2 | COUNT | _ | RESORT_ID,MemberID |
| 3 | COUNT | _ | ['resort_id','checkout_dateyear','checkout_datemonth'] |
| 4 | COUNT | _ | ['memberid','checkout_dateyear'] |
| 5 | VAR | roomnights | RESORT_ID |
| 6 | Median | roomnights | RESORT_ID,MemberID |
| 7 | MAX | roomnights | ['resort_id','checkout_dateyear','checkout_datemonth'] |
| 8 | MIN | roomnights | ['memberid','checkout_dateyear'] |
| 9 | VAR | in_out | RESORT_ID |
| 10 | Median | in_out | RESORT_ID,MemberID |
| 11 | MAX | in_out | ['resort_id','checkout_dateyear','checkout_datemonth'] |
| 12 | MIN | in_out | ['memberid','checkout_dateyear'] |
| 13 | VAR | total_pax | RESORT_ID |
| 14 | Median | total_pax | RESORT_ID,MemberID |
| 15 | MAX | total_pax | ['resort_id','checkout_dateyear','checkout_datemonth'] |
| 16 | MIN | total_pax | ['memberid','checkout_dateyear'] |

In a similar fashion, approximately 72 combinations were tried, which improved the RMSE from 96 to 95.3 on the LB, with nearly the same change in local CV.
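
Each groupby feature computes an aggregate per group and broadcasts it back to every row of that group. The helper below is a hedged sketch of that pattern: `add_group_feature` and its naming scheme are hypothetical, and the sample rows are invented.

```python
import pandas as pd

def add_group_feature(df, keys, agg, value=None):
    """Compute `agg` of `value` within each `keys` group and attach it to every row."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    if agg == "count":
        # Number of rows per group; no value column is needed
        name = "count_" + "_".join(keys)
        df[name] = df.groupby(keys)[keys[0]].transform("count")
    else:
        name = f"{agg}_{value}_" + "_".join(keys)
        df[name] = df.groupby(keys)[value].transform(agg)
    return df

# Hypothetical rows (illustrative values only)
df = pd.DataFrame({
    "resort_id": ["r1", "r1", "r1", "r2"],
    "memberid": ["m1", "m1", "m2", "m1"],
    "roomnights": [2, 4, 6, 3],
})
df = add_group_feature(df, "resort_id", "count")
df = add_group_feature(df, "resort_id", "var", "roomnights")
df = add_group_feature(df, ["resort_id", "memberid"], "median", "roomnights")
```

Looping such a helper over lists of key sets, aggregations and value columns is one way to generate many combinations; note that the variance of a single-row group is NaN.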

Modeling:

My final model consists of an ensemble of:

[LightGBM, CatBoost, 5-fold LightGBM, 5-fold CatBoost, and a stacking of [xgb, catboost, LightGBM]]
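
The 5-fold models average the predictions of five models trained on different splits, and stacking is commonly built on out-of-fold predictions produced the same way. The sketch below shows that mechanism only. It is not the notebook's code: `kfold_predict` is a hypothetical helper, the data is synthetic, and ordinary least squares stands in for LightGBM/CatBoost purely to keep the example dependency-free.

```python
import numpy as np

def kfold_predict(fit, predict, X, y, X_test, n_splits=5, seed=0):
    """Return out-of-fold train predictions and fold-averaged test predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), val_idx)
        model = fit(X[train_idx], y[train_idx])
        oof[val_idx] = predict(model, X[val_idx])         # predictions on unseen rows
        test_pred += predict(model, X_test) / n_splits    # average over the folds
    return oof, test_pred

# Stand-in learner (assumption): least squares with an intercept term
def fit_ols(X, y):
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic, exactly linear data (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0
X_test = rng.normal(size=(10, 3))

oof, test_pred = kfold_predict(fit_ols, predict_ols, X, y, X_test)
rmse = float(np.sqrt(np.mean((oof - y) ** 2)))
```

In a stacking setup, the `oof` vectors of several base models would become the input columns of a second-level model, and the matching `test_pred` vectors would be its test-time inputs.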
