Skip to content

AntiTyping/CS101

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Description

I decided to add a list of strongly connected pages to each search result. I used Kosaraju's algorithm to find strongly connected components of the directed graph of all crawled pages. I think that pages that contain similar information usually link to each other and listing related pages under each search result provides extra information that a user could use to assess the quality of search results.

Here is the graph of the sample web pages included in the code. There are two clusters of pages. One about python snakes and the other about the python language. The two strongly connected component groups are colored.

enter image description here

Here is an example result of running the search_with_scc.py

zeus:~/projects/CS101 (master)$ python search_with_scc.py 
Search results for ['python', 'language']
http://udacity.com/python/language7.html
   Strongly connected web pages:
    http://udacity.com/python/language8.html
    http://udacity.com/python/language6.html
http://udacity.com/python/language8.html
   Strongly connected web pages:
    http://udacity.com/python/language6.html
    http://udacity.com/python/language7.html
http://udacity.com/python/language6.html
   Strongly connected web pages:
    http://udacity.com/python/language8.html
    http://udacity.com/python/language7.html
http://udacity.com/python/language9.html
http://udacity.com/pythons.html

As you can see each search result for keywords "python language" is followed by a list of strongly connected (interlinked) web pages.

I think my code provides an interesting extension of what I have learned in the CS101 class without additional complexity.

I wish Google had such a feature :)

Code Repository

https://github.com/dracco/CS101/blob/master/search_with_scc.py

Members

apliszka

About

Repository for Udacity CS101

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages