This repository covers web crawling, information extraction, and knowledge graph construction.
It contains implementations of utilities and algorithms for building your own knowledge graph in Python 3.
I will enrich these implementations and descriptions from time to time. If you include any of my work in your website or project, please add a link to this repository and send me an email to let me know.
Your comments are welcome. Thanks.
Programs | Description | Link |
---|---|---|
JSONLines | Once your crawler has downloaded many pages, how can you aggregate all of those files into a single one? JSON Lines is the answer. The program packages each file as a single JSON object and writes it to one output file that holds multiple JSON objects, one per line. | Source Code |
Conditional Random Field | This program demonstrates how to use a CRF to extract textbook information from syllabus web pages. | Source Code |
Wrapper and BeautifulSoup | This program demonstrates how to extract information from JSON Lines with BeautifulSoup. | Source Code |
Facebook Crawler | This crawler collects Facebook posts via the Facebook Graph API. | Source Code |
SPARQL | This exercise queries DBpedia through the Virtuoso SPARQL Query Editor to answer university-related questions. | Source Code |
Market Index Prediction | This is the final project on building a knowledge graph. My teammate YuCheng Kuo and I combine stock price information with social media listening data and feed both into an LSTM (Long Short-Term Memory) machine learning model to predict the next-day and next-30-day trend of the Dow Jones Industrial Average index. | Project Repository |
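
As a rough illustration of the JSON Lines packaging idea described in the table, here is a minimal sketch using only the Python 3 standard library. It is not the repository's actual program: the function names `pack_to_jsonl` and `read_jsonl`, the `*.html` pattern, and the `file`/`content` keys are assumptions made for this example.

```python
import json
from pathlib import Path


def pack_to_jsonl(pages_dir, out_path, pattern="*.html"):
    """Write one JSON object per crawled file into a single JSON Lines file.

    The record layout ({"file": ..., "content": ...}) is an assumption for
    this sketch, not a format defined by the repository.
    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Sort so the output order is stable across runs.
        for page in sorted(Path(pages_dir).glob(pattern)):
            record = {
                "file": page.name,
                "content": page.read_text(encoding="utf-8", errors="replace"),
            }
            # json.dumps escapes embedded newlines, so each record
            # occupies exactly one line of the output file.
            out.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_jsonl(path):
    """Yield the records of a JSON Lines file one at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Calling `pack_to_jsonl("pages", "pages.jsonl")` on a hypothetical `pages` directory would write one line per HTML file. `read_jsonl` then yields the records back one at a time, so a large crawl does not have to fit in memory.
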
- CRF suite Example: https://github.com/scrapinghub/python-crfsuite/blob/master/examples/CoNLL%202002.ipynb
- CRF Suite: https://python-crfsuite.readthedocs.io/en/latest/
- Facebook Crawler by Jacob: https://github.com/chenjr0719/Facebook-Page-Crawler/edit/master/Facebook_Page_Crawler.py
Last updated: January 16, 2018
The information contained on https://github.com/Cheng-Lin-Li/ website (the "Service") is for general information purposes only. Cheng-Lin-Li's github assumes no responsibility for errors or omissions in the contents on the Service and Programs.
In no event shall Cheng-Lin-Li's github be liable for any special, direct, indirect, consequential, or incidental damages or any damages whatsoever, whether in an action of contract, negligence or other tort, arising out of or in connection with the use of the Service or the contents of the Service. Cheng-Lin-Li's github reserves the right to make additions, deletions, or modifications to the contents on the Service at any time without prior notice.
https://github.com/Cheng-Lin-Li/ website may contain links to external websites that are not provided or maintained by or in any way affiliated with Cheng-Lin-Li's github.
Please note that Cheng-Lin-Li's github does not guarantee the accuracy, relevance, timeliness, or completeness of any information on these external websites.
Cheng-Lin Li@University of Southern California