# Issue Collector for the Chromium Project

Welcome to the Chromium issue collection project, which provides scripts that:

- given a query, obtain the list of matching issues
- given a list of issue IDs, extract issue metadata and associated comments for the Chromium project from its official issue tracker
To collect issues, their metadata, and associated comments, we use Selenium, a dynamic web scraping tool.
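As a rough illustration of what that involves, a Selenium session might look like the sketch below; the URL and CSS selector are placeholders rather than the values the scripts actually use (those live in `scraper.py`).

```python
# Minimal sketch of the Selenium-based approach (illustrative only).
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # requires ChromeDriver on your PATH (see setup below)
try:
    # Placeholder URL: substitute a real issue-tracker query URL.
    driver.get('<base url>?q=<query>')
    # Placeholder selector: the actual elements depend on the tracker's markup.
    for row in driver.find_elements(By.CSS_SELECTOR, '<issue row selector>'):
        print(row.text)
finally:
    driver.quit()
```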
Using Anaconda consists of the following steps:

- Install Anaconda (https://www.anaconda.com/) on your computer, selecting the latest Python version for your operating system. If you already have `conda` installed, you can skip this step and move on to the next one.
- Create and activate a new `conda` environment with the provided `chromium_issue_collection.yml` file.

These instructions also assume you have `git` installed for working with GitHub from a terminal window. If you do not, you can install it first with:

```
conda install git
```
Now, we're ready to create our local environment!
- Clone the repository, and navigate to the downloaded folder. This may take a minute or two to clone due to the included image data.

  ```
  git clone <github project webaddress>.git
  cd chromium-issue-collection
  ```
- Navigate to the `src/` folder and replace `<username>` with your current user name in the `chromium_issue_collection.yml` file:

  ```
  prefix: /Users/<username>/opt/anaconda3/envs/chromium_issue_collection
  ```
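  If you prefer not to edit the file by hand, a small one-off script like this hypothetical helper can substitute your user name automatically (it assumes the file contains the literal `<username>` placeholder shown above):

  ```python
  # Hypothetical helper: fill in <username> in the environment file.
  import getpass
  from pathlib import Path

  env_file = Path('chromium_issue_collection.yml')  # run this from the src/ folder
  text = env_file.read_text()
  env_file.write_text(text.replace('<username>', getpass.getuser()))
  ```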
- Create (and activate) a new environment, named `chromium_issue_collection`, with Python 3.8. If prompted to proceed with the install (`Proceed [y]/n`), type `y`.

  - Linux or Mac:

    ```
    conda env create -f chromium_issue_collection.yml
    conda activate chromium_issue_collection
    ```

  - Windows:

    ```
    conda env create --name chromiumIssueCollection --file=chromium_issue_collection.yml
    activate chromiumIssueCollection
    ```
For a conda cheat sheet, see: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
- Download ChromeDriver (from the official ChromeDriver site, https://chromedriver.chromium.org/) and add the path to ChromeDriver to your `PATH` environment variable.
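  To sanity-check the driver setup, you can try starting and stopping a browser session from Python; if ChromeDriver is on your `PATH`, this short snippet should open and close a Chrome window without errors:

  ```python
  # Quick check that Selenium can locate ChromeDriver on PATH.
  from selenium import webdriver

  driver = webdriver.Chrome()  # raises an exception if ChromeDriver cannot be found
  driver.quit()
  ```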
- Customize the `scraper.py` script if you want to add a new query or change an existing one. To add a new query, append a new key and values to the `queries` dictionary:

  ```
  '<key>': {
      'explanation': '<explanation of the query>',
      'project': '<project name>',
      'urlbase': '<base url>',
      'headers': {
          '<key>': <list of columns>,
      },
      'output_filename': '<output csv file name>'
  }
  ```
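  For example, a filled-in entry could look like the following; the key, explanation, and column names are hypothetical and only illustrate the expected shape:

  ```python
  # Hypothetical example entry (key, explanation, and columns are illustrative).
  'open_security_issues': {
      'explanation': 'All open issues labeled as security bugs',
      'project': 'chromium',
      'urlbase': '<base url>',
      'headers': {
          'open_security_issues': ['id', 'summary', 'status', 'reported'],
      },
      'output_filename': 'open_security_issues.csv'
  }
  ```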
- Customize the `run_scraper.py` script by changing the function calls in the `__main__` block (see the sketch below).
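  The function name here is a placeholder rather than the scripts' actual API; the sketch only shows the kind of edit meant above:

  ```python
  # Hypothetical shape of run_scraper.py's entry point (collect_issues is assumed).
  if __name__ == '__main__':
      # run one of the queries defined in scraper.py's `queries` dictionary
      collect_issues('open_security_issues')
  ```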
To run the scraper, navigate to the `src/` folder and run the script:

```
cd path/to/src/
python run_scraper.py
```
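If the run completes successfully, the collected data should appear in the CSV file named by each query's `output_filename`.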