scrapy-picture-spider

This project is a spider that uses scrapy and beautifulsoup4 for crawl picture.

Depend Library

This project the depend library have: BeautifulSoup4,Requests,Scarpy,BloomFilter.

So need execute following command:

pip install beautifulsoup4
pip install lxml
pip install Scrapy
pip install pybloom-live
pip install requests

Proxies Module

This module for dynamic crawl ip proxy which supply other module use.

Quick start

You can directly execute the script of name is proxies_spider.py also at function download() set your download path.

Deviant_art Module

This module for dynamic crawl image of the https://www.deviantart.com/ and download to your computer.

Quick start

In the directory of deviant_art_spider execute the following command start this crawler.

scrapy crawl deviant_art_image_spider

You can setting configuration at settings.py.

# The attribute is your path for download image
# It default path is current dir
IMAGES_STORE = '.'

# Scrapy close when downloaded image equal to this attribute
MAXIMUM_IMAGE_NUMBER = 10000

Pixiv

This module for dynamic crawl image of the https://www.pixiv.net/ and download to your computer.

Quick start

You need set your username and password for login pixiv at the settings.py.

USERNAME = 'your pixiv username'
PASSWORD = 'your pixiv password'

Then in the directory of pixiv_spider execute the following command start this crawler.

scrapy crawl pixiv_daily_spider

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.idea		.idea
deviant_art		deviant_art
doc		doc
pixiv		pixiv
proxies		proxies
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
proxies.txt		proxies.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scrapy-picture-spider

Depend Library

Proxies Module

Quick start

Deviant_art Module

Quick start

Pixiv

Quick start

About

Releases

Packages

Languages

License

SylvanasSun/scrapy-picture-spider

Folders and files

Latest commit

History

Repository files navigation

scrapy-picture-spider

Depend Library

Proxies Module

Quick start

Deviant_art Module

Quick start

Pixiv

Quick start

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages