This repo contains the full code, divided into three parts: 'step_one.py', 'step_two.py' and 'step_three.py'. Each part can be used separately. Please read the 'requirements' file to see which packages, classes and files each step needs.
We recommend downloading the whole folder rather than individual files. Do not rename any folders or files.
- Windows only. On macOS or Linux, start from the 'step_two.py' file.
- Web Scraping (creates 'flights_data.csv' file for further steps).
- To use the web scraper, specify the airport code. Click here to find the appropriate airport code.
- Enter the appropriate parameters and run the script.
- Note that you cannot scrape data two years in advance. For example, in 2022 you cannot scrape data for 2024, because the website does not provide it.
- If you skip Step One, use the 'flights_data.csv' file that is included in the current folder.
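
The two-year limit above can be checked before starting a scrape. The sketch below is only an illustration: the function name `is_scrapable` is hypothetical and not part of 'step_one.py', and it assumes the cutoff is counted in calendar years, as in the 2022/2024 example.

```python
from datetime import date


def is_scrapable(target: date, today: date) -> bool:
    """Return True if `target` is fewer than two calendar years after `today`.

    Assumption: the limit is counted by calendar year, matching the
    README example (in 2022, data for 2024 is not available).
    """
    return target.year - today.year < 2


if __name__ == "__main__":
    today = date.today()
    wanted = date(today.year + 2, 1, 15)
    if not is_scrapable(wanted, today):
        print(f"{wanted} is too far ahead; the website is unlikely to list it.")
```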
- Data transformation:
- From strings to integers,
- Cleaning blank values,
- Creates additional 'clean_data.csv' and 'step_two_logfile.log' files.
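
As a rough illustration of the transformation listed above, the sketch below converts string columns to integers and drops rows with blank values. The column names `price` and `stops` are assumptions, since the layout of 'flights_data.csv' is not described here, and the actual logic in 'step_two.py' may differ. It also assumes prices are whole numbers.

```python
import pandas as pd


def clean_flights(df: pd.DataFrame) -> pd.DataFrame:
    """Convert 'price' and 'stops' from strings to integers, dropping blank rows.

    Both column names are hypothetical placeholders.
    """
    out = df.copy()
    # Keep only digits in the price, e.g. "$1,200" -> "1200" (whole-number prices assumed).
    price_digits = out["price"].str.replace(r"\D", "", regex=True)
    # Anything that cannot be parsed (including blanks) becomes NaN.
    out["price"] = pd.to_numeric(price_digits, errors="coerce")
    out["stops"] = pd.to_numeric(out["stops"].str.strip(), errors="coerce")
    # Remove rows where either value is missing, then cast to int.
    out = out.dropna(subset=["price", "stops"])
    out = out.astype({"price": int, "stops": int})
    return out.reset_index(drop=True)
```

With the file read as strings, for example `pd.read_csv("flights_data.csv", dtype=str)`, the result could then be saved with `to_csv("clean_data.csv", index=False)`.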
- Simple descriptive analysis (pandas):
- Top three cheapest airlines,
- Average price per number of stops,
- Average price by departure time (am/pm).
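
The three summaries above map naturally onto pandas `groupby`. This is a minimal sketch, not the code from 'step_two.py': the column names `airline`, `price`, `stops` and `departure_period` are assumptions, and "cheapest" is taken to mean lowest average price.

```python
import pandas as pd


def cheapest_airlines(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n airlines with the lowest mean price, cheapest first."""
    return df.groupby("airline")["price"].mean().nsmallest(n)


def mean_price_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the mean price for each distinct value of `column`."""
    return df.groupby(column)["price"].mean()


if __name__ == "__main__":
    # Tiny made-up sample, only to show the calls.
    sample = pd.DataFrame({
        "airline": ["A", "A", "B", "C", "D"],
        "price": [100, 200, 300, 50, 400],
        "stops": [0, 1, 1, 0, 2],
        "departure_period": ["am", "pm", "am", "pm", "pm"],
    })
    print(cheapest_airlines(sample))
    print(mean_price_by(sample, "stops"))
    print(mean_price_by(sample, "departure_period"))
```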
- Search filter to find flights by:
- Number of stops,
- Departure time (am/pm),
- Price.
- Creates additional 'searcher_one.csv' and 'step_three_logfile.log' files.
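
A search filter of this kind can be sketched with boolean masks. The function name, its parameters and the column names below are hypothetical. Matching stops exactly and treating the price as an upper bound are assumptions about how the search in 'step_three.py' behaves.

```python
from typing import Optional

import pandas as pd


def search_flights(
    df: pd.DataFrame,
    stops: Optional[int] = None,
    period: Optional[str] = None,
    max_price: Optional[int] = None,
) -> pd.DataFrame:
    """Return the rows matching every criterion that is not None."""
    mask = pd.Series(True, index=df.index)
    if stops is not None:
        mask &= df["stops"] == stops  # exact number of stops (assumed)
    if period is not None:
        mask &= df["departure_period"] == period  # "am" or "pm"
    if max_price is not None:
        mask &= df["price"] <= max_price  # price as an upper bound (assumed)
    return df[mask]
```

The filtered frame could then be written out with `to_csv("searcher_one.csv", index=False)`.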
- Visualisation with plots (pandas, matplotlib, seaborn):
- The average flight price per date,
- Number of flights per airline,
- The average flight price per airline.
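
The three plots listed above could be drawn roughly as follows. This sketch uses only pandas and matplotlib (seaborn is left out to keep it short), and the column names `date`, `airline` and `price` are assumptions. It is not the plotting code from 'step_three.py'.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use

import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df: pd.DataFrame) -> plt.Figure:
    """Draw the three summary plots side by side and return the figure."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    # Average flight price per date, as a line with markers.
    df.groupby("date")["price"].mean().plot(
        ax=axes[0], marker="o", title="Average price per date"
    )
    # Number of flights per airline, as a bar chart.
    df["airline"].value_counts().plot.bar(ax=axes[1], title="Flights per airline")
    # Average flight price per airline, as a bar chart.
    df.groupby("airline")["price"].mean().plot.bar(
        ax=axes[2], title="Average price per airline"
    )
    fig.tight_layout()
    return fig
```

Calling `plot_overview(df).savefig("overview.png")` would write the figure to disk; the file name here is only an example.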