Optional Use of Body_Safe #110

Open
HurricanePete opened this issue Sep 10, 2019 · 5 comments

Comments


HurricanePete commented Sep 10, 2019

Hello @Jerska, I wanted to ask about the possibility of customizing or deactivating the character limit imposed on `body_safe`. Something like passing a `characterLimit` option to `algoliasearchZendeskHC`, or setting it to `false`, in order to store the entire article body and override the default here:

```ruby
def truncate str, max = 5_000
```

It was changed here as a bug fix: https://github.com/algolia/algoliasearch-zendesk/blob/master/CHANGELOG.md#2173-2017-10-17
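
For illustration, here's a minimal sketch of what a configurable limit could look like on the crawler side; the `false`/`nil` opt-out is hypothetical, not an existing option:

```ruby
# Hypothetical variant of the crawler's truncation helper: passing
# false (or nil) as the limit would skip truncation entirely and
# keep the full article body.
def truncate(str, max = 5_000)
  return str unless max              # max = false/nil => no truncation
  str.length > max ? str[0...max] : str
end

truncate(article_body)               # default 5,000-character cap
truncate(article_body, false)        # hypothetical opt-out
```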

It does look like the crawler options come from Algolia (not the Zendesk frontend), but I'd love some extra context on why the change was made and what our options are.

We use search and InstantSearch for a small knowledge base in our application. Since we don't have many documents to index, we'd like to include the entire document body in searches; currently, a lot of each article is left out of them.

I'd be happy to open a PR for this if it makes sense and there's no other reason not to have it.

Jerska (Member) commented Sep 16, 2019

Hi @HurricanePete. Thanks for raising the issue.

The reason for this character limit is that Algolia enforces a size limit on records.
It used to be 100 KB, but it has changed over time and is now 10 KB, and we need to leave some room for the article's other attributes.
https://www.algolia.com/doc/faq/basics/is-there-a-size-limit-for-my-index-records/

Our suggestion, in case some articles don't show up for a search query because of the size limit, is to add relevant keywords to the article's tags.

The long-term solution would be #54.
The idea is to split each article into one record per paragraph instead of one record per article, and to use `distinct` at query time so only one result is returned per article. This consumes more records, but it scales to really long documents (a sketch of such splitting follows below).
As the creation date of that issue shows, this is a topic we haven't tackled in a really long time.
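
For concreteness, a minimal sketch of that per-paragraph splitting; the field names, and `article` as a hash of id, title, and body, are assumptions for illustration:

```ruby
# One record per paragraph. `article_id` is shared across the
# paragraphs of an article, so that distinct-on-article_id returns
# at most one hit per article at query time.
def paragraph_records(article)
  article[:body].split(/\n{2,}/).map.with_index do |paragraph, i|
    {
      objectID:   "#{article[:id]}_#{i}",
      article_id: article[:id],
      title:      article[:title],
      body_safe:  paragraph
    }
  end
end
```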

Our Zendesk integration is currently in maintenance-only mode, and we do not plan to add any new features (which this would be).
If you'd be interested in creating a PR for this, I'd be happy to review it, but it requires a number of non-trivial changes.

HurricanePete (Author) commented

Hello @Jerska, thank you for the reply. Our articles average about 3 KB each, so the limit shouldn't be a problem. Would you see any potential issues if we disabled the integration's indexing and uploaded the full article bodies from our end? We already (effectively) have a crawler in place; this would just handle the reindexing on the Algolia side.
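
For illustration, a rough sketch of that external indexing with the pre-v2 `algoliasearch` Ruby gem; the index name and the `articles` array are placeholders:

```ruby
require 'algoliasearch'

Algolia.init(application_id: 'YourApplicationID', api_key: 'YourAdminAPIKey')
index = Algolia::Index.new('zendesk_yoursubdomain_articles')

records = articles.map do |article|
  {
    objectID:  article['id'],
    title:     article['title'],
    body_safe: article['body']   # full body, no 5,000-character truncation
  }
end

index.save_objects(records)
```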

Jerska (Member) commented Sep 19, 2019

If you're able to do the indexing on your end, by all means feel free to. The requirement is to match the extracted JSON that our system indexes.

What I'm not sure I understand is how that would fix the issue. You'd be facing the same limit, and if some records are truncated today by our script, it means you already have articles above 5 KB. While there is some room between 5 KB and 10 KB, I think we can safely assume some of them will exceed the limit and fail to be indexed.
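
As a quick sanity check, a minimal sketch that flags records above the limit before pushing them (`records` being whatever you build on your end; 10 KB as per the FAQ above):

```ruby
require 'json'

LIMIT = 10_000  # bytes; the current per-record limit mentioned above

oversized = records.select { |r| r.to_json.bytesize > LIMIT }
puts "#{oversized.size} record(s) exceed #{LIMIT} bytes" if oversized.any?
```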

HurricanePete (Author) commented Sep 25, 2019

Ah, a bit of a mix-up there. I'm planning to split by paragraph and then use the `distinct` feature within Algolia. This seems like the best option at this point since, as I think you said, the ability to do that through the algoliasearch-zendesk integration hasn't been developed.
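
For reference, a minimal sketch of the index settings that pairing needs, again with the pre-v2 Ruby gem (`article_id` matching the hypothetical per-paragraph records sketched above):

```ruby
# Deduplicate hits so each article appears at most once per query.
index.set_settings(
  attributeForDistinct: 'article_id',
  distinct: true
)
```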

Jerska (Member) commented Sep 26, 2019

That makes sense.

You are correct that the integration doesn't support this at this point in time.
We're open to pull requests, so if you want to use our code as a base for the script and submit one, it could be integrated directly into the connector.
