SOSSE (Selenium Open Source Search Engine) is a Web archiving tool, crawler and search engine written in Python, distributed under the GNU-AGPLv3 license. It is hosted on both GitLab and GitHub. Please use either of them to open feature requests, bug reports or merge requests, or to start a discussion.
SOSSE's main features are:
- 🌍 Browser-based crawling: SOSSE uses Mozilla Firefox or Google Chromium, driven by Selenium, to index pages that use JavaScript. Requests can also be used for faster crawling
- 📚 Offline browsing: SOSSE can save an HTML copy or take screenshots of crawled pages to create archives suitable for offline browsing
- 📉 Low resource requirements: SOSSE is entirely written in Python and uses PostgreSQL for data storage
- 🔓 Authentication: the crawlers can submit authentication forms with provided credentials
- 🔗 Search engine shortcuts: shortcut search queries can be used to redirect to external search engines (sometimes called "bang" searches)
- 🔖 Search history: users can authenticate to log their search query history privately
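
To illustrate the general idea behind "bang" search shortcuts, here is a minimal Python sketch. It is not SOSSE's implementation: the `SHORTCUTS` table, the `resolve_shortcut` function, the `!` prefix and the URL templates are all assumptions made up for this example. In SOSSE itself, shortcuts are configured as described in the documentation.

```python
from urllib.parse import quote_plus

# Hypothetical shortcut table: the names and URL templates are
# illustrative only and are not taken from SOSSE's configuration.
SHORTCUTS = {
    "w": "https://en.wikipedia.org/w/index.php?search={q}",
    "gh": "https://github.com/search?q={q}",
}


def resolve_shortcut(query, prefix="!"):
    """Return a redirect URL if the query contains a known shortcut, else None."""
    words = query.split()
    for i, word in enumerate(words):
        name = word[len(prefix):]
        if word.startswith(prefix) and name in SHORTCUTS:
            # Remove the shortcut word and URL-encode the remaining terms.
            terms = " ".join(words[:i] + words[i + 1:])
            return SHORTCUTS[name].format(q=quote_plus(terms))
    # No shortcut found: the caller would run a normal local search.
    return None


print(resolve_shortcut("!w web archiving"))
# prints https://en.wikipedia.org/w/index.php?search=web+archiving
```

A query without a recognized shortcut returns `None`, so the search falls through to the local index.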
See the documentation and screenshots.
You can try the latest version with Docker:
docker run -p 8005:80 biolds/sosse:latest
Open http://127.0.0.1:8005/ and log in with user admin, password admin.
To persist Docker data, or find alternative installation methods, please check the documentation.
Join the Discord server to get help and share ideas!