Web Scraping Project for Exito, Falabella, Alkosto, and Sodimac

Summary

This project automates web scraping of refrigerator data from the major e-commerce platforms Exito, Falabella, Alkosto, and Sodimac using Selenium and stores the scraped data in a PostgreSQL database. The data is visualized with Power BI. The final dashboard is available here or in the .pbix file. The project is developed in Python and aims to streamline data collection for market analysis and research.

Table of Contents

  • Project Overview
  • Setup and Installation
  • Usage
  • Technical Details
  • Viewing the Dashboard

Project Overview

This project builds a PostgreSQL database from product data scraped from the following e-commerce websites:

  • Exito
  • Falabella
  • Alkosto
  • Sodimac

The data is collected using Selenium for automation and then processed and stored in a PostgreSQL database. This facilitates market analysis and data-driven decision-making.
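
The flow can be pictured roughly as below. This is only a sketch, and the helper names (scrape_listing, run) are hypothetical rather than the repository's actual functions:

# Illustrative sketch of the scrape-then-store flow; helper names are hypothetical.
from selenium import webdriver


def scrape_listing(driver, url):
    """Open a listing page and return product dictionaries (extraction details omitted)."""
    driver.get(url)
    # ... find product cards, follow product links, read name, price, capacity ...
    return []


def run(start_urls):
    driver = webdriver.Chrome()  # or webdriver.Firefox()
    try:
        products = []
        for url in start_urls:
            products.extend(scrape_listing(driver, url))
    finally:
        driver.quit()
    return products  # later stored in PostgreSQL (see Expected Output)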

Setup and Installation

Prerequisites

  • Python 3.11.2
  • PostgreSQL 16.3, compiled by Visual C++ build 1938, 64-bit
  • Google Chrome or Mozilla Firefox

Virtual Environment

It's recommended to use a virtual environment to manage dependencies. You can create and activate a virtual environment as follows:

# Create a virtual environment
python -m venv venv

# Activate the virtual environment (Windows)
.\venv\Scripts\activate

# Activate the virtual environment (macOS/Linux)
source venv/bin/activate

Installing Dependencies

Install the required packages using the requirements.txt file:

pip install -r requirements.txt

Environment Variables

Before running the scripts, set the following environment variables (a connection example follows the list):

  • PYTHONPATH (use src as the working directory)
  • POSTGRES_PASSWORD
  • POSTGRES_PORT
  • POSTGRES_DB
  • POSTGRES_SERVER
  • USER_AGENT
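
A minimal sketch of how these variables could be used to open the database connection, assuming psycopg2 as the driver (the list above includes no user variable, so the "postgres" fallback below is purely an assumption). USER_AGENT is used when configuring the browser; see Browser Compatibility below.

import os

import psycopg2  # assumption: psycopg2 as the PostgreSQL driver


def get_connection():
    """Open a connection from the environment variables listed above."""
    return psycopg2.connect(
        host=os.environ["POSTGRES_SERVER"],
        port=os.environ["POSTGRES_PORT"],
        dbname=os.environ["POSTGRES_DB"],
        password=os.environ["POSTGRES_PASSWORD"],
        user=os.environ.get("POSTGRES_USER", "postgres"),  # no user variable is listed; "postgres" is assumed
    )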

Usage

Running the Scripts

You can run the scraping scripts using either Chrome or Firefox. Ensure that the appropriate WebDriver (e.g., chromedriver or geckodriver) is installed and added to your PATH.

Example command to run a script:

python alkosto_scraper.py

Execution Time

Full run: A script may take between 15 minutes and 1 hour to complete.

Quick run: Scraping about 50 links per store/script takes approximately 5 to 10 minutes.

Technical Details

Browser Compatibility

The scraping scripts are compatible with both Google Chrome and Mozilla Firefox. Make sure you have the corresponding WebDriver:

  • Chrome: Download Chromedriver
  • Firefox: Download Geckodriver
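
As an illustration, a driver for either browser could be created along these lines. This is a sketch that assumes the USER_AGENT environment variable is applied as the browser's user agent; the repository's actual setup may differ:

import os

from selenium import webdriver


def make_driver(browser="chrome"):
    """Create a Chrome or Firefox driver with the configured user agent."""
    user_agent = os.environ.get("USER_AGENT", "")
    if browser == "chrome":
        options = webdriver.ChromeOptions()
        options.add_argument(f"--user-agent={user_agent}")
        return webdriver.Chrome(options=options)  # chromedriver must be on PATH
    options = webdriver.FirefoxOptions()
    options.set_preference("general.useragent.override", user_agent)
    return webdriver.Firefox(options=options)  # geckodriver must be on PATH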

Expected Output

The scripts will scrape product details such as:

  • Product name
  • Price
  • Capacity (liters)
  • Energy consumption
  • Product URL

The following entity diagram shows the expected output of information scraped from websites:

[Entity diagram: description of the databases to create]

The data is then saved into the PostgreSQL database configured in your setup.
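
As a sketch of what the stored data could look like, assuming psycopg2 and hypothetical column names derived from the fields above (the actual schema is the one shown in the entity diagram):

import psycopg2  # assumption: psycopg2 as the PostgreSQL driver

# Hypothetical table mirroring the scraped fields; the real schema follows the entity diagram.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS fridges (
    id                 SERIAL PRIMARY KEY,
    product_name       TEXT NOT NULL,
    price              NUMERIC,
    capacity_liters    NUMERIC,
    energy_consumption NUMERIC,
    product_url        TEXT,
    scraped_at         TIMESTAMP DEFAULT NOW()
);
"""

INSERT_SQL = """
INSERT INTO fridges (product_name, price, capacity_liters, energy_consumption, product_url)
VALUES (%(name)s, %(price)s, %(capacity)s, %(energy)s, %(url)s);
"""


def save(conn, rows):
    """Create the table if needed and insert the scraped rows (rows is a list of dicts)."""
    with conn.cursor() as cur:
        cur.execute(CREATE_SQL)
        cur.executemany(INSERT_SQL, rows)
    conn.commit()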

Viewing the Dashboard

You can view the interactive dashboard here or in the .pbix file.
