RDF Generation very slow, places heavy load on system, and exceeds server timeout #234

Open
paulmer opened this issue Aug 12, 2024 · 1 comment

paulmer commented Aug 12, 2024

We have a book with just over 1,300 pages that the authors are trying to export using the URL given under the Import/Export tab in the Dashboard. The text under Export states "Loading the link might take a while depending on the amount of content", but we're finding this operation impractically slow, with significant impact on system services. In our test environment the operation had not completed after 30 minutes, which is far longer than the timeout we have configured between our Apache server and the PHP-FPM service running Scalar, and it drove the underlying MySQL server to nearly 100% CPU utilization. After turning on query logging in MySQL, I found hundreds of thousands of queries being issued; the pattern below had been repeated over 20,000 times before I aborted the process.

Is this expected performance, or should I be suspicious of problems with the data or database?
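
For reference, the general query log can be enabled at runtime with MySQL's standard settings (a sketch; the log file path is only an example and should be adjusted for your server):

    SET GLOBAL log_output = 'FILE';
    SET GLOBAL general_log_file = '/var/log/mysql/general.log';  -- example path
    SET GLOBAL general_log = 'ON';
    -- ... run the export, then disable logging again:
    SET GLOBAL general_log = 'OFF';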

		   278 Connect	[email protected] on  using TCP/IP
		   278 Query	SET NAMES utf8
		   278 Query	USE `scalar`
		   278 Query	DROP TABLE IF EXISTS scalar_store_Q485781f3ff6610d5ada7f7b0546427ba
		   248 Query	SELECT *
FROM (`scalar_db_content`)
WHERE `book_id` =  '9'
AND `slug` =  'media/figure-1-the-lord-of-the-manor-surprises-a-peasant-girl-asleep-in-the-daytime'
LIMIT 1
		   248 Query	SELECT *
FROM (`scalar_db_versions`)
WHERE `version_id` =  '3745'
		   249 Prepare	SELECT id, val FROM scalar_store_s2val WHERE val_hash = '3523874090' ORDER BY id
		   249 Execute	SELECT id, val FROM scalar_store_s2val WHERE val_hash = '3523874090' ORDER BY id
		   249 Close stmt	
		   249 Prepare	SELECT id, val FROM scalar_store_s2val WHERE val_hash = '3523874090' ORDER BY id
		   249 Execute	SELECT id, val FROM scalar_store_s2val WHERE val_hash = '3523874090' ORDER BY id
		   249 Close stmt	
		   249 Query	CREATE TEMPORARY TABLE scalar_store_Q999970073e1268df3b1e8557577c1582 ( 
 `p` int UNSIGNED,
 `o` int UNSIGNED,
 `o type` int UNSIGNED,
 `o lang_dt` int UNSIGNED
) ENGINE=MyISAM
		   249 Query	INSERT INTO scalar_store_Q999970073e1268df3b1e8557577c1582 
SELECT DISTINCT
  T_0_0_0.p AS `p`,
  T_0_0_0.o AS `o`, 
    T_0_0_0.o_type AS `o type`, 
    T_0_0_0.o_lang_dt AS `o lang_dt`
FROM scalar_store_triple T_0_0_0
WHERE (T_0_0_0.s = 0) /* urn:scalar:version:3745 */
		   249 Prepare	SELECT
  V1.val AS `p`,
  V2.val AS `o`, 
    TMP.`o type` AS `o type`, 
    V3.val AS `o lang_dt`
FROM (scalar_store_Q999970073e1268df3b1e8557577c1582 TMP)
 JOIN scalar_store_id2val V1 ON (
            (V1.id = TMP.`p`)
        )
 JOIN scalar_store_o2val V2 ON (
            (V2.id = TMP.`o`)
        )
 JOIN scalar_store_id2val V3 ON (
            (V3.id = TMP.`o lang_dt`)
        )
		   249 Execute	SELECT
  V1.val AS `p`,
  V2.val AS `o`, 
    TMP.`o type` AS `o type`, 
    V3.val AS `o lang_dt`
FROM (scalar_store_Q999970073e1268df3b1e8557577c1582 TMP)
 JOIN scalar_store_id2val V1 ON (
            (V1.id = TMP.`p`)
        )
 JOIN scalar_store_o2val V2 ON (
            (V2.id = TMP.`o`)
        )
 JOIN scalar_store_id2val V3 ON (
            (V3.id = TMP.`o lang_dt`)
        )
		   249 Close stmt	
		   278 Quit	
		   279 Connect	[email protected] on  using TCP/IP
		   279 Query	SET NAMES utf8
		   279 Query	USE `scalar`
		   279 Query	DROP TABLE IF EXISTS scalar_store_Q999970073e1268df3b1e8557577c1582
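
One way to sanity-check behavior like this (a diagnostic sketch; table, column, and literal values are taken from the log above) is to confirm that the repeated scalar_store_s2val lookups are resolved through an index on val_hash rather than a full table scan:

    EXPLAIN SELECT id, val FROM scalar_store_s2val
    WHERE val_hash = '3523874090' ORDER BY id;

    SHOW INDEX FROM scalar_store_s2val;
    SHOW INDEX FROM scalar_store_triple;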
craigdietrich (Collaborator) commented

Hi @paulmer,

This is a really odd one. Unfortunately for debugging on our end, those queries come mostly from the ARC2 library that Scalar loads to handle storing metadata on a per-page basis, so if they're running out of control I'm not sure how much we can do or even diagnose. They definitely shouldn't be running hundreds of thousands of queries, though!

Maybe I can poke around and see if something pops up. Can you email me at craigdietrich at gmail?
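
In the meantime, a rough way to gauge how large the ARC2 store has grown (a sketch; the table names are assumed from the query log above) is to count the rows in its triple and value tables:

    SELECT COUNT(*) AS triples FROM scalar_store_triple;
    SELECT COUNT(*) AS stored_values FROM scalar_store_s2val;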
