Scrape proxies from more than 5 different sources and check which ones are still alive.

Proxy Scraper and Checker

Scrape more than 1,000 HTTP, HTTPS, SOCKS4, and SOCKS5 proxies in less than 2 seconds.

It scrapes fresh public proxies from several different sources.

Installation

You can install the package directly from PyPI using pip:

pip install proxyz

Alternatively, you can install dependencies manually if you're working from the source code:

pip3 install -r requirements.txt

Usage

Using the Command-Line Interface

Once installed via pip, you can use the command-line tools proxy_scraper and proxy_checker directly.

For Scraping Proxies:

proxy_scraper -p http
  • With -p or --proxy, choose the proxy type. Supported types are: HTTP, HTTPS, SOCKS (both 4 and 5), SOCKS4, and SOCKS5.
  • With -o or --output, specify the output file name where the proxies will be saved. (Default is output.txt).
  • With -v or --verbose, increase output verbosity.
  • With -h or --help, show the help message.
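The scraping step boils down to fetching each source page and pulling out every `ip:port` pair it contains. As a rough illustration (not the tool's actual internals), here is a minimal sketch in Python; the source URL is a hypothetical placeholder you would replace with a real proxy-list page:

```python
import re
import urllib.request

# Matches ip:port pairs such as 1.2.3.4:8080 anywhere in a page's text.
PROXY_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")

def extract_proxies(text):
    """Return the unique ip:port strings found in a page's text, sorted."""
    return sorted(set(PROXY_RE.findall(text)))

def scrape(url, timeout=10):
    """Fetch one source page and extract the proxies it lists."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_proxies(resp.read().decode(errors="ignore"))

if __name__ == "__main__":
    # Hypothetical source URL -- substitute a real proxy-list page.
    for proxy in scrape("https://example.com/proxy-list.txt"):
        print(proxy)
```

Running the same extraction over every source and de-duplicating the combined results gives the kind of list the tool writes to `output.txt`.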

For Checking Proxies:

proxy_checker -p http -t 20 -s https://google.com -l output.txt
  • With -t or --timeout, set the timeout in seconds after which the proxy is considered dead. (Default is 20).
  • With -p or --proxy, check HTTPS, HTTP, SOCKS4, or SOCKS5 proxies. (Default is HTTP).
  • With -l or --list, specify the path to your proxy list file. (Default is output.txt).
  • With -s or --site, check proxies against a specific website like google.com. (Default is https://google.com).
  • With -r or --random_agent, use a random user agent per proxy.
  • With -v or --verbose, increase output verbosity.
  • With -h or --help, show the help message.
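Conceptually, checking a proxy means routing a request to the target site through it and seeing whether a response arrives before the timeout. A minimal sketch of that idea (not proxyChecker's actual implementation), using only the standard library:

```python
import concurrent.futures
import urllib.request

def is_alive(proxy, site="https://google.com", timeout=20):
    """Return True if a request through `proxy` succeeds within `timeout` seconds."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open(site, timeout=timeout)
        return True
    except Exception:
        # Any failure (timeout, refused connection, bad response) counts as dead.
        return False

def check_all(proxies, **kwargs):
    """Check proxies concurrently and return only the ones that responded."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        results = pool.map(lambda p: (p, is_alive(p, **kwargs)), proxies)
    return [p for p, ok in results if ok]
```

Checking proxies one at a time would take `timeout` seconds per dead proxy, so running the checks in a thread pool is what keeps a large list practical.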

Running Directly from Source

If you prefer running the scripts directly from the source code, you can use the following commands:

For Scraping:

python3 proxyScraper.py -p http

For Checking:

python3 proxyChecker.py -p http -t 20 -s https://google.com -l output.txt

Good to Know

  • Dead proxies will be removed, and only alive proxies will remain in the output file.
  • The scraper can collect SOCKS proxies, but proxyChecker currently validates only HTTP(S) proxies.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Issues

Feel free to submit issues and enhancement requests or contact me via vida.page/nima.

License

MIT