
Fix XBVR scraper so that it actually works... #1259

Closed
wants to merge 9 commits into from


vt-idiot
Contributor

Actually making it work properly without requiring any GraphQL calls at all.

Note: I have no idea if the .zip/gallery part still works, I never used it, and this scraper never worked for me to begin with since Stash didn't pre-populate the title fields with the file extensions. This fixes that.

vt-idiot changed the title from "Update xbvrdb.py" to "Fix XBVR scraper so that it actually works..." on Jan 31, 2023
Collaborator

bnkai commented Jan 31, 2023

This needs some more troubleshooting/testing.
Stash prior to 0.17 had an option to populate the title from the filename, with or without the file extension.
From version 0.17 onward, no title is populated from the filename (unless you use a plugin?).
Ideally a GraphQL request (or at least another SQLite query) should be made to retrieve the actual filename from Stash, and that filename should be used to query XBVR's filenames.
I don't use the scraper myself, but I could write something up in a couple of days if you can test, @vt-idiot.
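
For reference, the filename lookup suggested here could look roughly like the sketch below. It assumes Stash serves GraphQL at `/graphql` on its default port and that `findScene` exposes `files { basename }`; the URL and the field names are assumptions to check against your own Stash version, and `extract_basename` and `scene_basename` are hypothetical helper names, not part of the scraper.

```python
import json
import urllib.request

# Assumed query shape; verify the field names against your Stash schema.
QUERY = "query($id: ID!) { findScene(id: $id) { files { basename } } }"


def extract_basename(response):
    """Pull the first file's basename out of a decoded GraphQL response."""
    scene = (response.get("data") or {}).get("findScene") or {}
    files = scene.get("files") or []
    return files[0]["basename"] if files else None


def scene_basename(scene_id, url="http://localhost:9999/graphql"):
    """POST the query to Stash and return the scene's filename, or None."""
    payload = json.dumps({"query": QUERY, "variables": {"id": scene_id}})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_basename(json.load(reply))
```

The returned filename would then be the value passed to the XBVR query. An instance with authentication enabled, or one running on another port, would need the URL and credentials adjusted.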

bnkai added the labels "script" (Scraper executes a script) and "fix" (Fixes a bug) on Jan 31, 2023
Contributor Author

vt-idiot commented Feb 1, 2023

TL;DR DW about it. As it stands, I don't think the current version of the script works at all. This was, I think, the smallest possible change to fix that without overhauling it entirely. I should update the .yml...

Another user had offered a GraphQL script that was supposed to scrape filenames, but even after I fixed my GraphQL issue (I'm an XBVR user, so 9999 was occupied, so I threw Stash on 1234, and I had no idea the scraper was actually querying Stash itself as part of the scrape...) it still returned an unmarshal...JSON...EOF error. I think that was because it was trying to do an SQL lookup by appending +'%' to title, which is normally fine and dandy for SQL in Python, but would not work because GraphQL wasn't returning the filename as a string?

I don't know nearly enough about GraphQL, sqlite3 (or its Python module), or Python itself to test that. But I do know that my little kludge fixes it entirely. XBVR doesn't index any files with 4-letter extensions anyway (.webm is the only relevant one I can think of, and that's from one studio, for videos from many years ago), so (title+'____',) is a perfect catch-all for both .mkv and .mp4: each _ is an SQL wildcard for exactly one character, so four of them cover the dot plus a 3-letter extension. It's working perfectly for me now, and the "original" functionality of throwing the XBVR scene-id into the title field as a backup also works, so I figured it was worth sharing.
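
A minimal sketch of that wildcard lookup, for anyone following along. The `files` table and its `scene_id` and `filename` columns are hypothetical stand-ins, not the real XBVR schema, and `find_by_title` is an illustrative name rather than a function from xbvrdb.py.

```python
import sqlite3


def find_by_title(conn, title):
    # In SQL LIKE, '_' matches exactly one character, so four of them cover
    # a dot plus a three-letter extension (".mp4", ".mkv") but not ".webm".
    rows = conn.execute(
        "SELECT scene_id, filename FROM files WHERE filename LIKE ?",
        (title + "____",),
    )
    return rows.fetchall()


# Throwaway in-memory table standing in for the XBVR database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (scene_id TEXT, filename TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("s-1", "sceneA.mp4"), ("s-2", "sceneB.mkv"), ("s-3", "sceneC.webm")],
)
```

With this data, `find_by_title(conn, "sceneA")` finds the .mp4 row and `find_by_title(conn, "sceneC")` finds nothing, because ".webm" is five characters long. One caveat: an underscore inside the title is itself treated as a one-character wildcard.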

If you are going to write something...adding searchbyQueryFragment would be nice. I tried to make it work and couldn't really figure it out.

I also tried to make it allow choosing between either site or studio from the XBVR DB for the studio field in Stash, but I think Stash only expects one result for studio, so it just wouldn't work. I switched it to reading from studio outright on my end. JAV usually has the DVD prefix in the site field, which isn't very helpful in Stash, whereas for most other sites studio will typically return something more akin to a network field (if that existed in Stash on scenes instead of on individual studios).

Contributor Author

vt-idiot commented Feb 1, 2023

Found it. I believe @Tweeticoats wrote it?
xbvrdb.py.txt

It's still searching by filename - Stash doesn't return the extension, only the base name. The SQL query was being made with =? with no (apparent) way to turn filename or title into title+'%'.

i.e. if you try to edit the query on line 40 of the attached file (the c.execute... call) to use an SQL wildcard in place of the extension, you'll run into exactly the same problem I did.

why waste time use lot word when few word do trick?

There's no need for the GraphQL lookup. Using sqlite3 only means the SQL wildcard still works, without having to deal with any of the (minimal) overhead or with converting one data type into the "right" one for appending an SQL wildcard, lol
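
On the data-type point, here is a rough sketch of how a looked-up value could be turned into a LIKE pattern. The wildcard has to be part of the bound value rather than the SQL text, so the value is coerced to a string first. `like_prefix` and `find_with_suffix` are hypothetical names, the `files` table is a stand-in for the real schema, and the `ESCAPE` handling is an addition that the attached script does not have.

```python
import sqlite3


def like_prefix(value):
    """Turn a title into a LIKE pattern that matches it plus any suffix."""
    text = str(value)  # coerce first so the '%' can always be appended
    # Escape LIKE metacharacters that belong to the title itself.
    for char in ("\\", "%", "_"):
        text = text.replace(char, "\\" + char)
    return text + "%"


def find_with_suffix(conn, title):
    # The pattern is bound as a parameter; ESCAPE declares '\' as the escape.
    rows = conn.execute(
        "SELECT filename FROM files WHERE filename LIKE ? ESCAPE '\\'",
        (like_prefix(title),),
    )
    return [row[0] for row in rows]
```

Escaping matters for VR filenames in particular, since they often contain underscores that would otherwise act as wildcards.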

Contributor Author

vt-idiot commented Feb 1, 2023

Hmm. Feet in mouth. Newly added scenes have the extension in the filename, lmao.
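
One possible way to cope with both cases (titles with and without the extension) is to try an exact match first and only then fall back to the wildcard. This is a sketch under assumptions, not what the PR does: `find_scene` and the `files` table with `scene_id` and `filename` columns are hypothetical.

```python
import sqlite3


def find_scene(conn, title):
    """Try the title as a full filename first, then as a bare base name."""
    exact = conn.execute(
        "SELECT scene_id FROM files WHERE filename = ?", (title,)
    ).fetchone()
    if exact:
        return exact[0]
    # Fall back to the four-underscore pattern for titles without extension.
    fuzzy = conn.execute(
        "SELECT scene_id FROM files WHERE filename LIKE ?", (title + "____",)
    ).fetchone()
    return fuzzy[0] if fuzzy else None
```

Trying the exact match first avoids guessing whether a dot in the title is an extension separator or simply part of the scene name.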

oops. I guess an ISO date is bad
* Adds studio code based on XBVR scene ID
* Uses presence of "JAVR" tag to add "JAV" and "Censored" tags
* Else adds "Virtual Reality" tag
* WIP: Assigns proper Naughty America studios automatically
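
Read literally, the tag rules in the list above could be sketched as follows. `build_tags` is a hypothetical name, and keeping the original XBVR tags alongside the added ones is an assumption; the commit itself may handle them differently.

```python
def build_tags(xbvr_tags):
    """Return the tag list for Stash, following the rules listed above."""
    tags = list(xbvr_tags)  # assumption: the original XBVR tags are kept
    if "JAVR" in tags:
        extra = ["JAV", "Censored"]
    else:
        extra = ["Virtual Reality"]
    # Append only tags that are not already present, preserving order.
    return tags + [tag for tag in extra if tag not in tags]
```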
Contributor Author

vt-idiot commented May 5, 2023

I'll piggyback onto #1333

@vt-idiot vt-idiot closed this May 5, 2023