
Performance indexing and searching #559

Open
Ruitjes opened this issue Sep 29, 2021 · 6 comments

Comments

@Ruitjes

Ruitjes commented Sep 29, 2021

Hello,

I made two proofs of concept, one with search-index and one with itemsjs, in Next.js. I compared the indexing time and the search time of the two libraries, using a dataset of 28k movies containing 4 fields.

Results:

  • Itemsjs indexing time = ~700 msec
  • Itemsjs search time = ~300 msec
  • Search-Index indexing time = ~90000 msec (1.5 minutes)
  • Search-Index search time = ~2000 msec

Even with only 500 rows, search-index had a higher indexing time (~1200 msec) than itemsjs with 28k rows.

Indexing code search-index:

// `si` is imported from 'search-index'; `movies` and `setInitTime`
// come from the surrounding component.
async function initSearchIndex() {
  // initialize an index
  const searchIndex = await si();

  // Without this the index will keep filling after every refresh
  await searchIndex.FLUSH();

  const t0 = window.performance.now();
  // add documents to the index
  await searchIndex.PUT(movies);
  const t1 = window.performance.now();
  console.log('Indexing took', t1 - t0, 'msec');
  setInitTime(t1 - t0);

  // Approximate size of the raw documents (stringify first so the Blob
  // measures the JSON bytes rather than "[object Object]" strings)
  const numBytes = new Blob([JSON.stringify(movies)]).size;
  console.log(numBytes, 'bytes (on disk)');

  return searchIndex;
}

Is this common behaviour for search-index?

Thanks in advance.

@fergiemcdowall
Owner

Hi!

Probably the main difference is that search-index inserts into a persistent database, so you can index, stop the program, start the program again, and your index is still there. I think that itemsjs is an "in memory" index, so you have to reindex every time you start up.
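
Roughly like this (a simplified sketch, not code from your project; 'movies' is just an example index name and 'godfather' an example search term):

// first run: build the index once (persisted via the default backend)
const idx = await si({ name: 'movies' })
await idx.PUT(movies)

// a later run: open the same named store; the documents are still
// searchable without re-indexing
const idxAgain = await si({ name: 'movies' })
const results = await idxAgain.QUERY({ SEARCH: ['godfather'] })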

@fergiemcdowall
Owner

BTW- I don't understand the 2000 ms for a search. That seems very high- what are you searching for?

@Ruitjes
Author

Ruitjes commented Sep 30, 2021

Hi Fergie,

Thanks for your fast response!
That indeed seemed to be the cause of the longer indexing time. I have changed it to memdown and indexing is now much faster: ~5 seconds for 28k rows!
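
For reference, the change amounts to something like this (a rough sketch; I am assuming the version in use accepts a leveldown-compatible store such as memdown through the db option, so check the README for the exact form):

import si from 'search-index'
import memdown from 'memdown'

async function initSearchIndex() {
  // in-memory backend: nothing is persisted to IndexedDB/LevelDB,
  // so indexing is fast but the index is lost on every refresh
  const searchIndex = await si({ db: memdown })
  await searchIndex.PUT(movies)
  return searchIndex
}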

The searching still takes quite some time though. I might have it configured incorrectly?

Code that runs on each change:

useEffect(() => {
  if (!searchIndex) {
    return;
  }

  async function search() {
    const t2 = window.performance.now();

    // text search on query.search, restricted to documents whose
    // 'year' field is >= the slider value
    const q = {
      SEARCH: [query.search, {
        FIELD: ['year'],
        VALUE: {
          GTE: query.sliderGTE.toString(),
        },
      }],
    };

    const results = await searchIndex.QUERY(q, {
      DOCUMENTS: true,
      SORT: {
        TYPE: 'NUMERIC',              // can be 'NUMERIC' or 'ALPHABETIC'
        DIRECTION: query.sort,        // can be 'ASCENDING' or 'DESCENDING'
        FIELD: '_match.year'          // field to sort on
      },
      PAGE: {
        NUMBER: 0,
        SIZE: 25
      }
    });

    const t3 = window.performance.now();
    console.log('Searching took', t3 - t2, 'msec');
    setSearchTime(t3 - t2);

    console.log('searchIndex', searchIndex);
    setDocuments(results);
    setFirstSearch(true);
  }

  search();
}, [searchIndex, query.search, query.sort, query.sliderGTE, query.sliderLTE]);

Sandbox as demonstration: https://codesandbox.io/s/adoring-wozniak-ujip3

@leeoniya

leeoniya commented Oct 1, 2023

@fergiemcdowall

unfortunately, i can confirm that this library is extremely slow to index, to the point that i cannot include it in my uFuzzy benchmark :(

when trying to add 161k documents each containing a single string from testdata.json, i get a callstack explosion:

[screenshot of the call stack overflow error]

reducing the document count by 10x to 16k brings the index build time to 33266ms. by comparison, the next-slowest library in the benchmark takes 5620ms to index the full 161k dataset. fast fulltext libs take ~500ms to index the entire dataset.

@fergiemcdowall
Owner

fergiemcdowall commented Oct 3, 2023

Hehe- I can see that you are perhaps not completely impartial @leeoniya , but thanks for the test case 👍🙂

Indexing speed is a secondary consideration since search-index allows persistence and import. This means search-index can be switched off and on, and the data will still be there. It also means that implementers can pregenerate indices and then quickly import them.
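
For example, roughly along these lines (a sketch only; the file name is made up, and the exact EXPORT/IMPORT usage is described in the README):

// at build time (Node.js, with fs and si imported): index once, dump the index
const builder = await si({ name: 'movies' })
await builder.PUT(movies)
fs.writeFileSync('movies-index.json', JSON.stringify(await builder.EXPORT()))

// at runtime: import the pregenerated index instead of re-indexing
const idx = await si({ name: 'movies' })
await idx.IMPORT(JSON.parse(fs.readFileSync('movies-index.json', 'utf8')))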

That said- it seems that you have managed to break search-index in a way that shouldn't be possible- I will take a look at your test data when I have time and either introduce a fix, or suggest a set of recommended options that might alleviate the performance issues.

@leeoniya

leeoniya commented Oct 3, 2023

Hehe- I can see that you are perhaps not completely impartial @leeoniya , but thanks for the test case 👍🙂

not impartial, but trying hard to stay objective. it's basically impossible to do a broad apples-to-apples comparison of so many libs. search-index appears to be an outlier among outliers, though.

Indexing speed is a secondary consideration

strange take, i gotta say. all search libs i've encountered take at least some care to ensure their indexing performance isn't accidentally quadratic.

since search-index allows persistence and import.

i think there are several libs that offer serializable indices. Pagefind being one, MiniSearch another.

I will take a look at your test data when I have time and either introduce a fix, or suggest a set of recommended options that might alleviate the performance issues.

👍
