Unable to load 92MB file with 5 tables #331

Open
gillianh1 opened this issue Sep 16, 2022 · 12 comments

Comments

@gillianh1

Description: I generated a SIARD file using DBPTK Desktop. It contains 5 tables and is 92MB. When I try to open the file in DBPTK Desktop, a blue progress dot pulses on the open option but the file never loads.

Context:
DBPTK Desktop: Installed on Windows 10 PC
Using dbptk-desktop-2.6.0.exe

Steps required to reproduce the bug:

  1. Generate a file using DBPTK Desktop. The file contains 5 tables and is 92MB.
  2. Try to open the file in DBPTK Desktop. A blue progress dot pulses on the open option but the file never loads. (A smaller file of 2MB with 2 tables does load successfully.)
  3. I tried increasing memory in settings, but I am still unable to load the file.
  4. Have we reached the limits of DBPTK Desktop, or of running it on a Windows PC?

Is there any documentation on hardware/sizing requirements or limitations?


@hmiguim
Member

hmiguim commented Sep 16, 2022

Hi,

Please attach the log files so we can better understand the problem. Logs are available in the menu Help -> Logs.

@gillianh1
Author

gillianh1 commented Sep 16, 2022

The file was created successfully using 2.6.0, but we were unable to open it using 2.6.0.
We have since connected to the same database and user using version 2.6.1, created a new 92MB extract file, and opened it.
However, we are still unable to load the original file created with version 2.6.0 in the 2.6.1 desktop exe.
We are able to open the new file created in version 2.6.1 using the 2.6.0 desktop exe.
I will upload the log.

@gillianh1
Author

dbvtk.log
Latest failed attempt at 11:47

@luis100
Member

luis100 commented Sep 16, 2022

This seems to be the issue: a non-hex character in the input.

2022-09-16 11:47:58,262 [http-nio-auto-1-exec-7] ERROR o.a.solr.handler.RequestHandlerBase - org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: o
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: o
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:212)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:333)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
	at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1003)
	at com.databasepreservation.common.server.index.utils.SolrUtils.find(SolrUtils.java:155)
	at com.databasepreservation.common.server.index.DatabaseRowsSolrManager.find(DatabaseRowsSolrManager.java:178)
	at com.databasepreservation.common.api.v1.DatabaseResource.getViewerDatabaseIndexResult(DatabaseResource.java:97)
	at com.databasepreservation.common.api.v1.DatabaseResource.find(DatabaseResource.java:71)

@luis100
Member

luis100 commented Sep 16, 2022

Generally, the XML might be malformed: it started a Unicode escape sequence but then had an "o" where a hexadecimal digit was expected. So you must look into the SIARD content to see where this came from.
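
To make the error message concrete, here is a minimal Python sketch of the rule it describes: a backslash followed by `u` must be followed by four hexadecimal digits. The function name `find_bad_unicode_escapes` and the sample strings are invented for illustration. This is not DBPTK or Solr code, and the thread does not establish which string actually reached the query parser.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def find_bad_unicode_escapes(text):
    """Return (offset, snippet) pairs for each malformed backslash-u escape."""
    bad = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            if text[i + 1] == "u":
                digits = text[i + 2:i + 6]
                if len(digits) < 4 or any(c not in HEX_DIGITS for c in digits):
                    # Escape started, but the next four characters are not all hex.
                    bad.append((i, text[i:i + 6]))
                    i += 2
                else:
                    i += 6
            else:
                # Any other escaped character, including an escaped backslash.
                i += 2
        else:
            i += 1
    return bad

# Hypothetical inputs: the first is a valid escape, the second is not.
print(find_bad_unicode_escapes(r"caf\u00e9"))
print(find_bad_unicode_escapes(r"C:\exports\users.siard"))
```

Any string that contains a backslash directly before a `u` can trip this rule, whether it is a cell value, an identifier, or a Windows-style path, so those are reasonable places to look first.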

@gillianh1
Author

The SIARD file was produced using DBPTK Desktop (Using dbptk-desktop-2.6.0.exe)

No error was reported when the file was produced, so how would we know there was an issue with it? Do we always need to open and validate the file? Can we not assume a SIARD file is OK if it was created without error?

If we rename the SIARD file with a .zip extension, we can navigate the files.
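
Since the file opens as a ZIP archive, the same inspection can be scripted without renaming it. The sketch below, in Python, checks every entry's CRC and confirms that each `.xml` entry is well-formed. The function name `check_siard_container` is invented, and treating every `.xml` entry as a candidate is an assumption. This only tests the container and XML syntax, so it is not a substitute for DBPTK's own SIARD validation.

```python
import zipfile
import xml.etree.ElementTree as ET

def check_siard_container(path):
    """Return a list of problems found in the archive at path (empty if none)."""
    problems = []
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every entry and returns the first name with a bad CRC.
        corrupt = archive.testzip()
        if corrupt is not None:
            problems.append(f"{corrupt}: CRC mismatch")
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            try:
                with archive.open(name) as entry:
                    # iterparse streams the entry, so large table files
                    # are not held in memory all at once.
                    for _event, element in ET.iterparse(entry):
                        element.clear()
            except (ET.ParseError, zipfile.BadZipFile) as exc:
                problems.append(f"{name}: {exc}")
    return problems
```

Calling `check_siard_container("extract.siard")` with a real path returns an empty list when nothing is flagged. A clean result here does not guarantee that a viewer can load the file.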

We have subsequently created a new file using dbptk-desktop-2.6.1.exe, pointing to the same user and database, and this file is OK, so it is not an issue with the tables/data being extracted from the database.

I will try generating the file again from the 2.6.0 Desktop version to see if I can reproduce the issue.

@gillianh1
Author

I was able to extract, import and validate the file in version 2.6.
This time the file does open.
I have access to both files and both files are the same size.
I saved both files as .zip and was able to navigate all files/tables.
I will attach the log.

@gillianh1
Author

Latest log

dbvtk.log

Original file from 2.6 will not load (uoesiardschema_extract.siard)
New file from 2.6 will load (2.6_uoesiardschema_extract.siard)
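
With two same-sized extracts where only one loads, an entry-by-entry comparison can show whether the archive contents differ at all. This is a minimal Python sketch. The function name `diff_archives` is invented, and it compares only the stored size and CRC of each entry.

```python
import zipfile

def diff_archives(path_a, path_b):
    """Return entry names whose presence, size, or CRC differs between two ZIPs."""
    def index(path):
        with zipfile.ZipFile(path) as archive:
            # Map each entry name to its (uncompressed size, CRC-32) pair.
            return {info.filename: (info.file_size, info.CRC)
                    for info in archive.infolist()}

    a, b = index(path_a), index(path_b)
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

For example, `diff_archives("uoesiardschema_extract.siard", "2.6_uoesiardschema_extract.siard")` lists the entries that differ. Some differences may be expected, for instance in metadata that records when the archive was produced. An empty result would suggest the cause lies outside the archive contents.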

@hmiguim
Member

hmiguim commented Sep 21, 2022

Hi @gillianh1, thank you for using and testing DBPTK and for your feedback. Since version 2.6.1 is working fine, I suggest using that version instead of 2.6.0.

@gillianh1
Author

This is what I plan to do. My only concern is that a file was produced without error, yet it cannot be opened.
I would not like to be in this position when I try to open a SIARD file in the future.

Is your recommendation to create, open and validate each file that is produced before archiving it?

Thanks

@hmiguim
Member

hmiguim commented Sep 21, 2022

The validation step is essential as proof that the produced SIARD follows the specification.

To ensure that no record is lost, you can use a module called the Merkle Tree filter (documentation available here). However, this requires a stored procedure that calculates the hash for every exported column using the Merkle tree top hash algorithm.

DBPTK offers you a set of tools to validate and verify completeness and correctness. And as a rule of thumb you should create, open and validate to see if the extract process went well.
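
For readers unfamiliar with the term, a Merkle tree top hash is a single digest obtained by hashing values pairwise until one node remains. The Python sketch below shows the general technique only. The choice of SHA-256, the byte-string leaves, and the duplicate-last-node rule for odd levels are all assumptions, and the DBPTK filter may encode values and combine hashes differently.

```python
import hashlib

def merkle_root(leaves):
    """Return the hex top hash of a Merkle tree built over byte-string leaves."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    # Leaf level: hash each value on its own.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of nodes: duplicate the last one (one common convention).
            level.append(level[-1])
        # Parent level: hash each adjacent pair of child digests.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

# Hypothetical column values; changing, dropping, or reordering any one
# of them changes the top hash.
print(merkle_root([b"alice", b"bob", b"carol"]))
```

Computing the same top hash on the database side and over the exported data, then comparing the two, is what lets a mismatch reveal a lost or altered record.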

@gillianh1
Author

Thank you for your help and confirmation.
