RDF Generation very slow, places heavy load on system, and exceeds server timeout #234

Open
paulmer opened this issue Aug 12, 2024 · 1 comment

paulmer commented Aug 12, 2024

We have a book with just over 1,300 pages that the authors are trying to export using the URL given under the Import/Export tab in the Dashboard. The text under Export states "Loading the link might take a while depending on the amount of content", but we're finding this operation impractically slow, with significant impact on system services. In our test environment the operation had not completed after 30 minutes, which is far longer than the timeout we have configured between our Apache server and the PHP-FPM service running Scalar, and it drove the underlying MySQL server to nearly 100% CPU utilization. After turning on query logging in MySQL, I found hundreds of thousands of queries being issued; the pattern below had been repeated over 20,000 times before I aborted the process.

Is this expected performance, or should I be suspicious of problems with the data or database?
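
For reference, the general query log can be enabled at runtime with MySQL's standard settings (a sketch; the log file path is only an example and should be adjusted for your server):

    SET GLOBAL log_output = 'FILE';
    SET GLOBAL general_log_file = '/var/log/mysql/general.log';  -- example path
    SET GLOBAL general_log = 'ON';
    -- ... run the export, then disable logging again:
    SET GLOBAL general_log = 'OFF';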

		   278 Connect	[email protected] on  using TCP/IP
		   278 Query	SET NAMES utf8
		   278 Query	USE `scalar`
		   278 Query	DROP TABLE IF EXISTS scalar_store_Q485781f3ff6610d5ada7f7b0546427ba
		   248 Query	SELECT *
FROM (`scalar_db_content`)
WHERE `book_id` =  '9'
AND `slug` =  'media/figure-1-the-lord-of-the-manor-surprises-a-peasant-girl-asleep-in-the-daytime'
LIMIT 1
		   248 Query	SELECT *
FROM (`scalar_db_versions`)
WHERE `version_id` =  '3745'
		   249 Prepare	SELECT id, val FROM scalar_store_s2val WHERE val_hash = '3523874090' ORDER BY id
		   249 Execute	SELECT id, val FROM scalar_store_s2val WHERE val_hash = '3523874090' ORDER BY id
		   249 Close stmt	
		   249 Prepare	SELECT id, val FROM scalar_store_s2val WHERE val_hash = '3523874090' ORDER BY id
		   249 Execute	SELECT id, val FROM scalar_store_s2val WHERE val_hash = '3523874090' ORDER BY id
		   249 Close stmt	
		   249 Query	CREATE TEMPORARY TABLE scalar_store_Q999970073e1268df3b1e8557577c1582 ( 
 `p` int UNSIGNED,
 `o` int UNSIGNED,
 `o type` int UNSIGNED,
 `o lang_dt` int UNSIGNED
) ENGINE=MyISAM
		   249 Query	INSERT INTO scalar_store_Q999970073e1268df3b1e8557577c1582 
SELECT DISTINCT
  T_0_0_0.p AS `p`,
  T_0_0_0.o AS `o`, 
    T_0_0_0.o_type AS `o type`, 
    T_0_0_0.o_lang_dt AS `o lang_dt`
FROM scalar_store_triple T_0_0_0
WHERE (T_0_0_0.s = 0) /* urn:scalar:version:3745 */
		   249 Prepare	SELECT
  V1.val AS `p`,
  V2.val AS `o`, 
    TMP.`o type` AS `o type`, 
    V3.val AS `o lang_dt`
FROM (scalar_store_Q999970073e1268df3b1e8557577c1582 TMP)
 JOIN scalar_store_id2val V1 ON (
            (V1.id = TMP.`p`)
        )
 JOIN scalar_store_o2val V2 ON (
            (V2.id = TMP.`o`)
        )
 JOIN scalar_store_id2val V3 ON (
            (V3.id = TMP.`o lang_dt`)
        )
		   249 Execute	SELECT
  V1.val AS `p`,
  V2.val AS `o`, 
    TMP.`o type` AS `o type`, 
    V3.val AS `o lang_dt`
FROM (scalar_store_Q999970073e1268df3b1e8557577c1582 TMP)
 JOIN scalar_store_id2val V1 ON (
            (V1.id = TMP.`p`)
        )
 JOIN scalar_store_o2val V2 ON (
            (V2.id = TMP.`o`)
        )
 JOIN scalar_store_id2val V3 ON (
            (V3.id = TMP.`o lang_dt`)
        )
		   249 Close stmt	
		   278 Quit	
		   279 Connect	[email protected] on  using TCP/IP
		   279 Query	SET NAMES utf8
		   279 Query	USE `scalar`
		   279 Query	DROP TABLE IF EXISTS scalar_store_Q999970073e1268df3b1e8557577c1582
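
One way to sanity-check behavior like this (a diagnostic sketch; table, column, and literal values are taken from the log above) is to confirm that the repeated scalar_store_s2val lookups are resolved through an index on val_hash rather than a full table scan:

    EXPLAIN SELECT id, val FROM scalar_store_s2val
    WHERE val_hash = '3523874090' ORDER BY id;

    SHOW INDEX FROM scalar_store_s2val;
    SHOW INDEX FROM scalar_store_triple;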
craigdietrich (Collaborator) commented

Hi @paulmer,

This is a really odd one. Unfortunately for debugging on our end, those queries come mostly from the ARC2 library that Scalar loads to handle storing metadata on a per-page basis, so if they're running out of control I'm not sure how much we can do or even diagnose. They definitely shouldn't be running hundreds of thousands of queries, though!

Maybe I can poke around and see if something pops up. Can you email me at craigdietrich at gmail?
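
In the meantime, a rough way to gauge how large the ARC2 store has grown (a sketch; the table names are assumed from the query log above) is to count the rows in its triple and value tables:

    SELECT COUNT(*) AS triples FROM scalar_store_triple;
    SELECT COUNT(*) AS stored_values FROM scalar_store_s2val;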
