-
This was intentional. We have other places to find stats about how many results we have. Why expose a different, non-specific number to a user? 10000 is even more obscure: it doesn't tell the user anything other than that we have a bunch of works that they can't access. For a scraper, maybe it's even an indication that they should crawl the tags of each work or something to try to uncover all those extra works behind the pagination barrier. It was even worse because we also showed

Both are artificial barriers. 240, with an accurate page count based on it, at least indicates how many real pages the user could request. It means something to API consumers. They can predict how many pages of results will exist for a query (e.g., for a frontend that wanted to show this information... maybe even ours?). 10000 doesn't do that. It's still just as abstract/artificial as 240, and essentially an arbitrary limit (each responding to a different problem being solved). Of the two, 240 (or a different value, if authenticated) is the only one with any real meaning.

I don't believe this is an issue and recommend closing it.
To clarify, these are separate issues. Scraping can hurt API performance, but the primary motivation to prevent scraping is to prevent scraping. It is against our ToS. Just want to clarify that, for example, we wouldn't undo this pagination limit just because we could handle the performance of it.
-
For sure 👍 It didn't occur to me that this would change how the frontend presents works. I think the frontend should just say "top 240" as you suggested. "Over 10k" was already vague. You couldn't even see half that number. If we want to make the actual number of works possibly available for a search, then we could start including
-
Description
Previously, search results showed an "Over 10,000 results for [query]" label at the top. However, users could view at most 240 results (12 pages x 20 results).
When implementing the additional search views, which have the same page depth and the same maximum number of results, we decided not to change the `result_count` value returned by the API, but to change the result count label in the frontend to say something like "Showing top 240 [image|audio] results for 'cat'".

#4372 changed the `result_count` to return at most 240 for unauthenticated users. So, without the "Showing top ..." label, it might make users think that Openverse has very few results for all search terms.

The API documentation should also be updated. Currently, it says: "Although there may be millions of relevant records, only the most relevant or the most recent several thousand records can be viewed. This is by design: the search endpoint should be used to find the top 10,000 most relevant results, not for exhaustive search or bulk download of every barely relevant result."
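The arithmetic behind the 240 figure can be sketched as below. This is only an illustration, not Openverse's actual code: the names `PAGE_SIZE`, `MAX_PAGES`, and `capped_counts` are hypothetical, and the values are the "12 pages x 20 results" quoted above for unauthenticated users.

```python
import math

# Assumed values, taken from the description above (unauthenticated users).
PAGE_SIZE = 20  # results per page
MAX_PAGES = 12  # deepest page that can be requested
MAX_RESULTS = PAGE_SIZE * MAX_PAGES  # 240


def capped_counts(total_hits: int) -> tuple[int, int]:
    """Return (result_count, page_count) limited to what can actually be viewed."""
    result_count = min(total_hits, MAX_RESULTS)
    page_count = math.ceil(result_count / PAGE_SIZE)
    return result_count, page_count
```

With these assumed values, a query matching a million works reports 240 results across 12 pages, while a query matching 35 works reports 35 results across 2 pages, so the reported page count always matches the pages a consumer can really request.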
Possible solutions
- Keep returning 240 as `result_count` for all searches that have 240 or more results, and update the labels and the API documentation. We can re-use the label that we've always had ("Over x results for query"), but reduce the x to 240. Or we can change the label to "Show top 240 results for ...".
- Revert the change to return 10,000 as `result_count` if there are 10000 or more results, and keep "Over 10,000 results".

My opinion is that we should keep the 240, as that is the de facto maximum number of results that we return, but should update the code to say "Over 240 results". Currently, we only add "Over ..." when the `result_count` is above 10000 (which never happens with the changes from #4372).

Initial note by @obulat from the original issue:

> I think this was unintentional because we never discussed reducing the shown `result_count` for the API results. It is tricky since both 240 and 10000 are confusing: an unauthenticated user will only get at max 240 results. However, I think we wanted to always show that we do have the results, but we are not showing all of them due to the restrictions related to the API performance (to prevent scraping).
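For illustration, the "Over 240 results" label logic discussed in this issue could look roughly like the sketch below. It is written in Python only to show the condition; the real label lives in the frontend, and `result_label` and its `cap` parameter are hypothetical names. It also assumes the "Over" prefix should appear once the count reaches the cap (rather than exceeds it), since a capped `result_count` can never be higher than the cap.

```python
def result_label(result_count: int, query: str, cap: int = 240) -> str:
    """Build the result label; prefix with "Over" once the count reaches the cap."""
    if result_count >= cap:
        # The API never reports more than `cap`, so reaching it means "cap or more".
        return f"Over {cap:,} results for {query}"
    noun = "result" if result_count == 1 else "results"
    return f"{result_count:,} {noun} for {query}"
```

Passing `cap=10000` reproduces the old "Over 10,000 results" wording, which makes it easy to compare the two options side by side.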