
sceneByFragment cannot find scene for SexLikeReal? #2069

Open
NormanPriv opened this issue Oct 18, 2024 · 2 comments
Labels
bug Something isn't working


@NormanPriv

I am new to scrapers, but I have never gotten "Scrape with fragment" to work with the SexLikeReal scraper.

It always ends up at the site's root URL instead of the URL of the specific scene, so it never scrapes that scene's details.

Here is the part of SexLikeReal.yml that handles sceneByFragment:

```yaml
sceneByFragment:
  action: scrapeXPath
  # url format: https://www.sexlikereal.com/scenes/{title}-{code}
  # However, the url:
  #     https://www.sexlikereal.com/{code}
  # will redirect to the full url so that is what we will use for scraping
  queryURL: https://www.sexlikereal.com/{filename}
  queryURLReplace:
    # filename format:
    #   SLR_{studio:[^_]+}_{title:[^_]+}_{res:\d+p}_{code:\d+}_{vrtype}.{ext}
    #     vrtype: stuff we do not care about but could contain '_'
    filename:
      - regex: (?i)^SLR_.+_\d+p_(\d+)_.*$
        with: $1
      - regex: .*\.[^\.]+$ # if no id is found in the filename
        with: # clear the filename so that it doesn't leak
  scraper: sceneScraper
```

It seems that the regex that replaces the filename with the scene code is not working.
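The sketch below replays the `queryURLReplace` chain from the YAML above to show where the request ends up. It is written in Python rather than Go (Stash evaluates these patterns with Go's regexp, whose matching is the same for these particular patterns), with Go's `$1` written as `\g<1>`. It assumes the engine applies the rules in order, each to the output of the previous one, and that `{filename}` includes the file extension; the `.mp4` extension and the `2160p` name are hypothetical, since the filename posted in this thread shows no extension.

```python
import re

# Rules copied from SexLikeReal.yml; Go's "$1" is written as Python's r"\g<1>".
# Assumption: rules run in order, each on the previous rule's output.
ORIGINAL_RULES = [
    (r"(?i)^SLR_.+_\d+p_(\d+)_.*$", r"\g<1>"),  # needs a "_<digits>p_" resolution
    (r".*\.[^\.]+$", ""),  # fallback: clear the filename when no id was found
]


def apply_rules(filename, rules):
    """Apply each regex replacement in order, like a queryURLReplace chain."""
    value = filename
    for pattern, repl in rules:
        value = re.sub(pattern, repl, value)
    return value


def query_url(filename, rules):
    """Substitute the processed filename into the queryURL template."""
    return "https://www.sexlikereal.com/" + apply_rules(filename, rules)


# Hypothetical filenames: ".mp4" is assumed; old_style swaps "original" for "2160p".
new_style = "SLR_SLR Originals_Naughty Neighbor_original_47332_FISHEYE190.mp4"
old_style = "SLR_SLR Originals_Naughty Neighbor_2160p_47332_FISHEYE190.mp4"

print(query_url(old_style, ORIGINAL_RULES))
print(query_url(new_style, ORIGINAL_RULES))
```

With `old_style` the first rule extracts `47332` and the fallback leaves it alone, giving `https://www.sexlikereal.com/47332`. With `new_style` the first rule never matches because there is no `_<digits>p_` segment, so the fallback clears the whole name and the query becomes the bare root URL, which matches the symptom described above.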

NormanPriv added the bug (Something isn't working) label on Oct 18, 2024
@Maista6969
Collaborator

I'd love to make scrapers smarter so they can avoid the unnecessary request to the root URL when the pattern doesn't match, but that would require some changes to the scraper engine.

Does the studio have a new filename format? What's the name of the file you're attempting to scrape?
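The idea of skipping the request when the pattern doesn't match could look roughly like this. `build_query_url` is a hypothetical helper, not part of Stash's scraper engine; it assumes the same in-order rule chain and returns `None` instead of a URL when the chain leaves nothing to query. It is a Python sketch of the logic only.

```python
import re
from typing import Optional


def build_query_url(template: str, filename: str, rules) -> Optional[str]:
    """Hypothetical engine-side guard: build the URL only if a code was extracted."""
    value = filename
    for pattern, repl in rules:  # assumption: rules applied in order
        value = re.sub(pattern, repl, value)
    value = value.strip()
    if not value:
        return None  # the fallback cleared the filename: skip the HTTP request
    return template.replace("{filename}", value)


# Same rules as SexLikeReal.yml, with Go's "$1" written as r"\g<1>".
RULES = [
    (r"(?i)^SLR_.+_\d+p_(\d+)_.*$", r"\g<1>"),
    (r".*\.[^\.]+$", ""),
]
TEMPLATE = "https://www.sexlikereal.com/{filename}"

print(build_query_url(TEMPLATE, "SLR_Studio_Title_2160p_12345_vr.mp4", RULES))
print(build_query_url(TEMPLATE, "holiday.mp4", RULES))
```

The first (hypothetical) filename yields `https://www.sexlikereal.com/12345`; the second matches no rule that extracts a code, so the guard returns `None` and no request would be sent.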

@NormanPriv
Author

NormanPriv commented Oct 19, 2024

@Maista6969 Here is one example where the scraper cannot generate the scene URL:
`SLR_SLR Originals_Naughty Neighbor_original_47332_FISHEYE190`

This is the filename I got when downloading the scene in a browser through the site's download link.

In my local copy, I changed the pattern as follows, and now it works:

```yaml
  queryURL: https://www.sexlikereal.com/{filename}
  queryURLReplace:
    # Recognize code from file name
    filename:
      - regex: '.*_(\d+)_.*'
        with: $1
```

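Is the original pattern too restrictive? It expects a resolution segment such as `_2160p_` right before the code, but this filename has `_original_` in that position.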
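The sketch below compares the relaxed rule from this comment with a hypothetical variant that keeps the original `SLR_` prefix check and fallback but accepts any segment before the code. It is Python rather than Stash's Go regexp (the semantics match for these patterns), with `$1` written as `\g<1>`; it assumes rules are applied in order, and the `.mp4` extensions and the `holiday_2019_beach.mp4` name are made up for illustration. The `GUARDED` rule set is only a suggestion, not the maintainers' fix.

```python
import re

# The relaxed rule from the comment above (Go "$1" written as r"\g<1>").
RELAXED = [(r".*_(\d+)_.*", r"\g<1>")]

# Hypothetical variant: keep the SLR_ prefix guard and the fallback rule,
# but accept any segment (not only "<digits>p") before the code.
GUARDED = [
    (r"(?i)^SLR_.+_(\d+)_.*$", r"\g<1>"),
    (r".*\.[^\.]+$", ""),  # clear the filename so that it doesn't leak
]


def apply_rules(filename, rules):
    """Apply each regex replacement in order, like a queryURLReplace chain."""
    value = filename
    for pattern, repl in rules:
        value = re.sub(pattern, repl, value)
    return value


new_style = "SLR_SLR Originals_Naughty Neighbor_original_47332_FISHEYE190.mp4"
old_style = "SLR_SLR Originals_Naughty Neighbor_2160p_47332_FISHEYE190.mp4"
unrelated = "holiday_2019_beach.mp4"  # hypothetical non-SLR file

for name in (new_style, old_style, unrelated):
    print(repr(apply_rules(name, RELAXED)), repr(apply_rules(name, GUARDED)))
```

Both rule sets extract `47332` from the two SLR names (the greedy `.*` or `.+` backtracks to the last `_<digits>_` run). The difference is the unrelated file: the relaxed rule turns it into `2019`, so a request would go to `https://www.sexlikereal.com/2019`, while the guarded variant clears it.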
