ptt-crawler

crawl articles from the PTT website

usage:

scrape a specific PTT board:

lsc crawler.ls <board-name>

All posts are downloaded into the data//post/ folder. A data//post-list.json file keeps track of your download history, so you can interrupt a download at any time and resume it later.
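The resume behaviour can be pictured with a small sketch. This is not the code in crawler.ls (which is written in LiveScript); it is a Python illustration under stated assumptions: the history file is assumed to be a JSON object mapping post ids to `true`, posts are assumed to be saved as `<post-id>.html`, and `fetch` is a hypothetical caller-supplied function that returns the text of one post.

```python
import json
from pathlib import Path


def load_history(path):
    """Return the saved download history, or an empty dict on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def crawl(post_ids, fetch, data_dir):
    """Download every post that is not yet recorded in the history file.

    `fetch` is a hypothetical callable taking a post id and returning its text.
    The layout under `data_dir` (post/ folder, post-list.json) mirrors the
    README, but the JSON format itself is an assumption.
    """
    post_dir = data_dir / "post"
    post_dir.mkdir(parents=True, exist_ok=True)
    history_path = data_dir / "post-list.json"
    history = load_history(history_path)
    for post_id in post_ids:
        if history.get(post_id):
            continue  # already downloaded in an earlier run
        body = fetch(post_id)
        (post_dir / f"{post_id}.html").write_text(body, encoding="utf-8")
        history[post_id] = True
        # Save after every post so an interrupt loses at most one download.
        history_path.write_text(json.dumps(history), encoding="utf-8")
    return history
```

Writing the history file after each post, rather than once at the end, is what makes the run safe to interrupt: a second call with the same ids skips everything already recorded.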

categorize authors by title:

lsc cat.ls <board-name>
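The README does not say how cat.ls groups authors, so the following is only one plausible reading, sketched in Python rather than LiveScript. It assumes each post is a dict with hypothetical `title` and `author` keys, and that the category is the bracketed tag PTT titles usually start with (optionally preceded by `Re:` or `Fw:`).

```python
import re
from collections import defaultdict

# Leading "[tag]" of a title, allowing "Re:" / "Fw:" prefixes (assumed format).
TAG = re.compile(r"^(?:Re:\s*|Fw:\s*)*\[([^\]]+)\]")


def authors_by_tag(posts):
    """Group author ids by the bracketed tag at the start of each title."""
    groups = defaultdict(set)
    for post in posts:
        match = TAG.match(post["title"].strip())
        tag = match.group(1).strip() if match else "(untagged)"
        groups[tag].add(post["author"])
    # Sort authors so the result is stable across runs.
    return {tag: sorted(authors) for tag, authors in groups.items()}
```

Titles without a leading tag fall into an `(untagged)` bucket instead of being dropped, so every author still appears somewhere in the result.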

food.ls: example of fetching articles for article generation
home-sale.ls: example of categorizing articles by purpose
id-stat.ls: analyze users' standpoints; output to data//id-stat.json
id-stat-show.ls: show user statistics; generate suspect.json

LICENSE

All sources are licensed under the MIT License. (I used the CC-BY-4.0 license before, but the MIT License is better suited to code. Please refer to the corresponding license according to when you forked this project.)

zbryikt/ptt-crawler
