Source: Mastodon #174

Open
erlend-sh opened this issue Sep 14, 2024 · 5 comments

Comments

@erlend-sh
Contributor

erlend-sh commented Sep 14, 2024

Started here:

Good reference for broader backup coverage by @kensanata here: https://github.com/kensanata/mastodon-archive

Personally, my most wanted feature for my Mastodon content on Weird right now is an easy way to search through my post history.

@zicklag
Collaborator

zicklag commented Sep 16, 2024

Search is maybe a little tricky, depending on how often we need to update the search index.

I think the easiest way to get started will be to try out tinysearch. The only limitation that will probably bug us much initially is that it only matches whole words, but that's probably fine for starters?
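To make the whole-word limitation concrete, here is a minimal Python sketch. It is not tinysearch itself and does not use its filter-based index format; it only imitates the matching behavior with plain sets. The function names and sample posts are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of whole words."""
    return set(re.findall(r"\w+", text.lower()))


def build_index(posts):
    """Map each post ID to the set of whole words it contains."""
    return {post_id: tokenize(text) for post_id, text in posts.items()}


def search(index, query):
    """Return IDs of posts that contain every query word as a whole word."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(pid for pid, tokens in index.items() if words <= tokens)


# Hypothetical sample data, not taken from any real archive.
posts = {
    "1": "Backing up my Mastodon archive today",
    "2": "Searching through old posts is hard",
}
index = build_index(posts)

print(search(index, "mastodon"))  # finds post "1"
print(search(index, "mast"))      # finds nothing: prefixes are not whole words
```

A prefix such as "mast" finds nothing here, which is the kind of miss a whole-word index would produce.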

@erlend-sh
Contributor Author

> Search is maybe a little tricky, depending on how often we need to update the search index.

What about once a day?

Can the index be updated incrementally or would we have to do a complete re-indexing every time?

PageFind is another good candidate for this. I can’t quite tell whether it has the same whole-word limitation. In any case, that’s pretty much on par with the default in-client search of Mastodon.

@zicklag
Collaborator

zicklag commented Sep 17, 2024

> What about once a day?

Yeah, I think that's reasonable. We'll just have to keep an eye on what kind of processing power that's taking and whether the app keeps responding smoothly while building indexes, etc.

We have to do re-synchronization with Mastodon on a certain cadence anyway, so we should probably build the index whenever we sync.

> Can the index be updated incrementally or would we have to do a complete re-indexing every time?

Incremental updates aren't a built-in feature, but now that I think about it, because Leaf is content-addressed and tinysearch is super simple, I think it would actually be really easy to make it incremental. I'd have to test it to make sure, but I think that could work great.
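A rough Python sketch of why content addressing could make this incremental: if every post is keyed by a hash of its content, a sync only needs to tokenize the hashes it has not seen before. This is an untested idea, not Leaf's or tinysearch's actual API. SHA-256 stands in for whatever content ID Leaf uses, and `sync_index` is a hypothetical helper.

```python
import hashlib
import re


def content_id(text):
    """Stand-in content address: SHA-256 of the post text (an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def tokenize(text):
    """Lowercase the text and split it into a set of whole words."""
    return set(re.findall(r"\w+", text.lower()))


def sync_index(index, posts):
    """Bring `index` (content ID -> word set) in line with `posts`.

    Only posts whose content ID is not already indexed get tokenized.
    Returns the number of posts that had to be tokenized.
    """
    current = {content_id(text): text for text in posts}

    # Drop entries for content that no longer exists in the archive.
    for stale in set(index) - set(current):
        del index[stale]

    # Tokenize only content we have not seen before.
    added = 0
    for cid, text in current.items():
        if cid not in index:
            index[cid] = tokenize(text)
            added += 1
    return added


index = {}
first = sync_index(index, ["hello fediverse", "backup day"])
second = sync_index(index, ["hello fediverse", "backup day", "new post"])
print(first, second)  # the second sync only processes the one new post
```

An edited post gets a new content ID, so it shows up as one removal plus one addition, and unchanged posts cost nothing on later syncs.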

I already checked out PageFind, which looks solid for the typical static site use-case, but doesn't seem to have a library mode where we can use it to index our own, non-HTML content.

I opened a discussion to make sure: CloudCannon/pagefind#708

@zicklag
Collaborator

zicklag commented Sep 23, 2024

Ah, got feedback that PageFind does support custom indexes through their Node.js API. That's perfect for us!

CloudCannon/pagefind#708 (comment)
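As a sketch of what feeding custom records could involve, the Python below turns a Mastodon status into a plain record with a URL, text content, and language. The Mastodon API returns a status's `content` as HTML, so the markup is stripped first. The record field names (`url`, `content`, `language`, `meta`) are an assumption about what PageFind's custom-record call expects and should be checked against its Node.js API docs. `status_to_record` and the sample status are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, adding spaces at paragraph and line breaks."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML fragment with collapsed whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def status_to_record(status):
    """Build an index record from a Mastodon status dict.

    The output field names are assumed, not confirmed against PageFind.
    """
    return {
        "url": status["url"],
        "content": strip_html(status["content"]),
        "language": status.get("language") or "en",
        "meta": {"date": status["created_at"]},
    }


# Hypothetical status, shaped like a Mastodon API response.
status = {
    "url": "https://example.social/@alice/1",
    "content": '<p>Hello <a href="https://example.social/tags/fediverse">'
               "#<span>fediverse</span></a></p>",
    "language": "en",
    "created_at": "2024-09-14T00:00:00.000Z",
}
record = status_to_record(status)
print(record["content"])
```

Records like this would then be handed to PageFind from a Node.js script. That step is left out here because it depends on the API details discussed in the linked comment.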

@erlend-sh
Contributor Author

Nice. Could you make a new issue for ‘Mastodon archive search’ outlining the basic Job to be Done?

@kimlimjustin can probably continue it from there.
