biolds/sosse

SOSSE 🦦

SOSSE (Selenium Open Source Search Engine) is web archiving software, a crawler, and a search engine written in Python, distributed under the GNU AGPLv3 license. It is hosted on both GitLab and GitHub; please use either of them to open feature requests, bug reports, or merge requests, or to start a discussion.

SOSSE's main features are:

  • 🌍 Browser-based crawling: SOSSE uses Mozilla Firefox or Google Chromium, driven by Selenium, to index pages that use JavaScript. Requests can also be used for faster crawling
  • 📚 Offline browsing: SOSSE can save HTML copies or take screenshots of crawled pages to create archives suitable for offline browsing
  • 📉 Low resource requirements: SOSSE is entirely written in Python and uses PostgreSQL for data storage
  • 🔓 Authentication: the crawlers can submit authentication forms with provided credentials
  • 🔗 Search engine shortcuts: shortcut search queries can be used to redirect to external search engines (sometimes called "bang" searches)
  • 🔖 Search history: users can authenticate to log their search query history privately
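
To illustrate the idea behind "bang" shortcuts, here is a minimal sketch in Python. It is not SOSSE's actual implementation: the `SHORTCUTS` table, the `!` prefix, and the `resolve_shortcut` function are all hypothetical names chosen for this example.

```python
from urllib.parse import quote_plus

# Hypothetical shortcut table: the keys and URL templates are illustrative
# assumptions, not SOSSE's built-in configuration.
SHORTCUTS = {
    "w": "https://en.wikipedia.org/w/index.php?search={q}",
    "gh": "https://github.com/search?q={q}",
}


def resolve_shortcut(query, prefix="!"):
    """Return an external search URL if the query contains a known shortcut, else None."""
    words = query.split()
    for i, word in enumerate(words):
        name = word[len(prefix):]
        if word.startswith(prefix) and name in SHORTCUTS:
            # Everything except the shortcut word becomes the search terms.
            terms = " ".join(words[:i] + words[i + 1:])
            return SHORTCUTS[name].format(q=quote_plus(terms))
    return None


print(resolve_shortcut("!w web archiving"))
```

A query without a known shortcut returns `None`, in which case a search engine would fall back to searching its own index.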

See the documentation and screenshots.

Try it out

You can try the latest version with Docker:

docker run -p 8005:80 biolds/sosse:latest

Open http://127.0.0.1:8005/, and log in with user admin, password admin.

To persist Docker data, or find alternative installation methods, please check the documentation.

Keep in touch

Join the Discord server to get help and share ideas!