Web-Crawler

This program crawls a website and analyzes the text of the pages it visits using k-means clustering and sentiment analysis. The program is designed to run on Python 3.7 or later.

Installation

Clone the repository to your local machine.
Install the required packages by running the following command in your terminal:

pip install -r requirements.txt

To run the program, navigate to the root directory of the repository and run the following command:

python main.py

Usage

The program will crawl the website specified in the crawler.py file and download the text content of each page it visits. It will then analyze the text using k-means clustering and sentiment analysis and output the results to the console.
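The crawl step described above can be sketched as a breadth-first traversal that stays on the starting host and stops after a fixed number of pages. This is a minimal illustration using only the standard library, not the code in crawler.py; the names `LinkParser`, `fetch`, and `crawl` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text (assumed helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages, fetch_page=fetch):
    """Breadth-first crawl limited to max_pages and to the start URL's host."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}  # url -> raw HTML, in the order the pages were visited
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing the download function as a parameter keeps the traversal logic testable without network access; any callable that maps a URL to an HTML string can stand in for `fetch`.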

Configuration

You can configure the program by modifying the crawler.py file. Specifically, you can change the following variables:

Url: the URL of the website to crawl
numberOfPages: the number of pages to crawl
blacklist: a list of HTML tags to exclude from the extracted text
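To illustrate how a tag blacklist can shape the extracted text, here is a small sketch built on the standard library's `html.parser`. The variable values are placeholders, and `TextExtractor` and `extract_text` are hypothetical names; crawler.py may use a different HTML library and different defaults.

```python
from html.parser import HTMLParser

# Placeholder values for illustration; the real ones live in crawler.py.
Url = "https://example.com"
numberOfPages = 10
blacklist = ["script", "style", "head", "noscript"]


class TextExtractor(HTMLParser):
    """Collects text, skipping anything nested inside a blacklisted tag."""

    def __init__(self, blacklist):
        super().__init__()
        self.blacklist = set(blacklist)
        self.skip_depth = 0  # > 0 while inside a blacklisted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.blacklist:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.blacklist and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html, blacklist):
    """Return the visible text of html, excluding blacklisted elements."""
    parser = TextExtractor(blacklist)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

This sketch assumes every blacklisted tag has a closing tag. Void elements such as `meta` or `link` carry no text and have no closing tag, so they should not be listed here.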

You can also configure the k-means clustering by modifying the following variable in the main.py file:

n_clusters: the number of clusters to use in the clustering algorithm
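To show what the number of clusters controls, the following sketch runs a plain k-means (Lloyd's algorithm) over word-count vectors. It is written in pure Python for illustration only; main.py may rely on a library implementation and a different text representation, and `vectorize`, `squared_distance`, and `kmeans` are hypothetical names.

```python
from collections import Counter


def vectorize(texts):
    """Turn each text into a word-count vector over a shared vocabulary."""
    counts = [Counter(text.lower().split()) for text in texts]
    vocab = sorted(set().union(*counts))
    return [[c[word] for word in vocab] for c in counts]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, max_iter=100):
    """Plain Lloyd's algorithm; the first n_clusters points seed the centroids."""
    centroids = [list(p) for p in points[:n_clusters]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


texts = ["cats purr cats", "stocks fell stocks", "cats purr", "stocks fell"]
labels = kmeans(vectorize(texts), n_clusters=2)
```

Raising `n_clusters` splits the pages into finer groups, while lowering it merges them. Seeding from the first points keeps this example deterministic; library implementations typically use randomized initialization instead.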

Output

The program will output two tables to the console, one for each clustering run. Each table will contain the following columns:

cluster: the ID of the cluster to which the page was assigned
pages: the text content of the page
scores: the sentiment score of the page
sentiments: the sentiment label of the page (positive, negative, or neutral)

The program will also output a table showing the average sentiment score for each cluster.
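The relationship between the per-page rows and the per-cluster averages can be sketched as follows. The sample rows are made up for illustration, the 0.05 threshold is an assumed cutoff rather than the program's actual rule, and `label_for` and `average_by_cluster` are hypothetical names.

```python
from collections import defaultdict


def label_for(score, threshold=0.05):
    """Map a numeric score to a label; the threshold is an assumed cutoff."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def average_by_cluster(rows):
    """Return {cluster: mean score} for rows shaped like the output table."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["cluster"]].append(row["scores"])
    return {c: sum(s) / len(s) for c, s in sorted(scores.items())}


# Made-up sample rows using the column names listed above.
rows = [
    {"cluster": 0, "pages": "great product", "scores": 0.8},
    {"cluster": 0, "pages": "it works", "scores": 0.2},
    {"cluster": 1, "pages": "terrible service", "scores": -0.6},
]
for row in rows:
    row["sentiments"] = label_for(row["scores"])

averages = average_by_cluster(rows)
for cluster, mean in averages.items():
    print(f"cluster {cluster}: average score {mean:.2f}")
```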

vatsashah45/Web-Crawler
