twitter | help with skipping/--no-skip and --filter, and config #6198

Closed
Baniita opened this issue Sep 18, 2024 · 4 comments


@Baniita

Baniita commented Sep 18, 2024

Hi, I'm about to lose my mind bc I've had so much trouble trying to get gallery-dl to work for me, please help... I have so many issues...

  1. I ran a trial run earlier with --filter and it ran fine after eleventeen tries (didn't always work, sometimes it seemed to ignore my --filter)
    but I was trying to test some commands, and now subsequent runs are skipping those posts, which I DON'T want.
    a) Where is the history of seen posts stored? Maybe I can clear those out? Is it just cache.sqlite3, or is it also stored elsewhere? When I renamed cache.sqlite3, I think it still skipped posts marked seen...?
    b) There are 3 "skip: true/false" areas in my config. I'm not sure which one I want to adjust to make it not skip stuff that's already been downloaded.

  2. Do these commands look right?

gallery-dl "https://twitter.com/tls6491" --cookies "C:\Users\Bani\AppData\Roaming\gallery-dl\cookies.txt" -d "C:\Users\Bani\Pictures\zz DL" --filter "datetime(2024, 9, 10) <= date < datetime(2024, 9, 17)"

gallery-dl "https://twitter.com/tls6491" --cookies "C:\Users\Bani\AppData\Roaming\gallery-dl\cookies.txt" -d "C:\Users\Bani\Pictures\zz DL" --filter "date >= datetime(2024, 9, 15)" --verbose --no-skip

  3. Does --filter conflict with --no-skip? I wanted it to re-download everything within a certain date range. It seemed to ignore the filter entirely when metadata and postprocessing was in config

  4. Re: my config...
    a) Is there anywhere that explains the cards, conversations, strategy options, path-extended fields? I don't know what they mean or do.
    b) what is the difference between "users": "user" and "users": "timeline"? my old config used timeline.
    c) I copied most of these fields from other people... I have no idea what they do, particularly the post-processor lol... what does that post-processor do... er... anyway, keeping that one in causes me a lot of problems...?
    d) how does the rest of the config look?

{
    "extractor":
    {
        "base-directory": "./gallery-dl/",
        "parent-directory": false,
        "postprocessors": null,
        "archive": null,
        "cookies": "C:/Users/Bani/AppData/Roaming/gallery-dl/cookies.txt",
        "cookies-update": true,
        "proxy": null,
        "skip": false,

        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
        "retries": 4,
        "timeout": 30.0,
        "verify": true,
        "fallback": true,

        "sleep": 0,
        "sleep-request": 0,
        "sleep-extractor": 0,

        "path-restrict": "auto",
        "path-replace": "_",
        "path-remove": "\\u0000-\\u001f\\u007f",
        "path-strip": "auto",
        "path-extended": true,

        "extension-map": {
            "jpeg": "jpg",
            "jpe" : "jpg",
            "jfif": "jpg",
            "jif" : "jpg",
            "jfi" : "jpg"
        },
        "twitter":
        {
            "username": "tls6491",
            "password": "-redacted-",
            "filename": "{author['name']}-{tweet_id}-0{num}.{extension}",
            "base-directory": "C:/Users/Bani/Pictures/zz DL",
            "sleep": 2,
            "sleep-request": 2,
            "ratelimit": "wait:1800",
            "cards": false,
            "conversations": true,
            "pinned": true,
            "quoted": true,
            "replies": true,
            "retweets": true,
            "strategy": null,
            "locked": "wait",
            "twitpic": true,
            "unique": true,
            "users": "user",
            "videos": false,
            "expand": false,
            "relogin": true,
            "size": "orig",
            "skip": false,
            "path-extended": false,
            "retries": 1,
            "retry-codes": [429, 430],
            "metadata": true,
            "postprocessors": [
                {
                    "name": "mtime",
                    "key": "date"
                },
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{author['name']}-{tweet_id}-{num}_{date:?//%Y-%m-%d %H_%M_%S}.json"
                }
            ]
        }
    },

    "downloader":
    {
        "filesize-min": null,
        "filesize-max": null,
        "mtime": true,
        "part": true,
        "part-directory": null,
        "progress": 3.0,
        "rate": null,
        "retries": 8,
        "timeout": 30.0,
        "verify": true,

        "http":
        {
            "adjust-extensions": true,
            "chunk-size": 32768,
            "headers": null,
            "validate": true
        },

        "ytdl":
        {
            "format": null,
            "forward-cookies": false,
            "logging": true,
            "module": null,
            "outtmpl": null,
            "raw-options": null
        }
    },

    "output":
    {
        "mode": "auto",
        "progress": true,
        "shorten": true,
        "ansi": false,
        "colors": {
            "success": "1;32",
            "skip"   : "2"
        },
        "skip": false,
        "log": "[{name}][{levelname}] {message}",
        "logfile": null,
        "unsupportedfile": null
    },

    "netrc": false
}
@mikf
Owner

mikf commented Sep 18, 2024

a) Where is the history of seen posts stored?

In a download archive, which you haven't enabled, so nowhere. It won't overwrite already existing files though, at least not by default. That's what --no-skip / "skip": false is for.

b) There are 3 "skip: true/false" areas in my config. I'm not sure which one I want to adjust to make it not skip stuff that's already been downloaded.

The one in the twitter block.

You might want to re-enable the output.skip one, since it won't display skipped downloads otherwise.

Do these commands look right?

I think so.

Does --filter conflict with --no-skip?

It doesn't.

--filter makes gallery-dl completely ignore files for which the filter expression is false. --no-skip causes gallery-dl to overwrite already downloaded files which weren't --filtered.
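To make the interaction concrete, here is a minimal Python sketch of the order of those two checks. It is a simplified model and not gallery-dl's actual code: the helper names `passes_filter` and `decide`, the `already_on_disk` flag, and the sample dates are all made up for illustration. The only parts taken from this thread are the filter expression itself and the described behaviour (filtered files are ignored entirely; skipping only applies to files that passed the filter).

```python
from datetime import datetime

def passes_filter(expression, metadata):
    """Evaluate a --filter style Python expression against one file's metadata."""
    namespace = {"datetime": datetime}
    namespace.update(metadata)          # metadata fields become plain variable names
    return bool(eval(expression, namespace))

def decide(metadata, expression, already_on_disk, skip=True):
    """Hypothetical helper: what happens to one file in this simplified model."""
    if not passes_filter(expression, metadata):
        return "ignored"                # filtered out: never downloaded, never overwritten
    if already_on_disk and skip:
        return "skipped"                # default behaviour for existing files
    return "downloaded"                 # new file, or overwritten because skip is off

expr = "datetime(2024, 9, 10) <= date < datetime(2024, 9, 17)"
in_range = {"date": datetime(2024, 9, 12, 8, 30)}   # made-up sample dates
too_old = {"date": datetime(2024, 9, 1)}

print(decide(in_range, expr, already_on_disk=True, skip=True))    # skipped
print(decide(in_range, expr, already_on_disk=True, skip=False))   # downloaded
print(decide(too_old, expr, already_on_disk=True, skip=False))    # ignored
```

In this model, turning skipping off never brings a filtered file back; it only changes what happens to files inside the date range.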

Is there anywhere that explains the cards, conversations, strategy options, path-extended fields?

https://gdl-org.github.io/docs/configuration.html#extractor-twitter-ads
(scroll down a bit to see all twitter options)

what is the difference between "users": "user" and "users": "timeline"?

https://gdl-org.github.io/docs/configuration.html#extractor-twitter-users

I copied most of these fields from other people... I have no idea what they do,

Nice.

what does that post-processor do

https://gdl-org.github.io/docs/configuration.html#postprocessor-options

The mtime one sets the mtime of downloaded files to the time stored in the date metadata field.

The metadata one writes each Tweet's metadata to an external .json file.
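As a rough illustration of what those two post processors amount to, the sketch below does the same two things by hand with the Python standard library. It is not gallery-dl's implementation: the function names, the sample tweet values, the file names, and the choice to treat the date as UTC are assumptions made for this example.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def apply_mtime(path, metadata, key="date"):
    """Set the file's modification time from the datetime stored in metadata[key]."""
    # Assumption: the stored datetime is naive and meant as UTC.
    timestamp = metadata[key].replace(tzinfo=timezone.utc).timestamp()
    os.utime(path, (timestamp, timestamp))   # (access time, modification time)

def write_metadata(path, metadata):
    """Write the metadata to a JSON file; datetimes are stored as strings."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(metadata, fp, default=str, indent=4)

# Hypothetical metadata for one downloaded image
tweet = {"tweet_id": 123, "num": 1, "date": datetime(2024, 9, 15, 12, 0, 0)}

workdir = tempfile.mkdtemp()
image_path = os.path.join(workdir, "example-123-01.jpg")
with open(image_path, "wb") as fp:
    fp.write(b"placeholder")                 # stands in for the downloaded image

apply_mtime(image_path, tweet)               # file now appears last modified on 2024-09-15
write_metadata(image_path + ".json", tweet)  # sidecar .json next to the image
```

Neither step fetches anything; both only act on a file and its metadata once they exist.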

@ForxBase

This comment was marked as off-topic.

@Baniita
Copy link

Author

Baniita commented Sep 20, 2024

So in the SQlite, nowhere else? I swear it kept marking things as seen even when I moved the sqlite lol... did I make an error..

the extractor section skip doesn't matter? or is it like, extractor is for every extractor, but twitter is just for twitter--

so... no difference between "user" and "timeline"? 😂 I assume functionally there's no difference...

the doc on postprocessors may be beyond me rip. I've been having issues with it enabled. will it still record my history even if I have postprocessor section on my twitter config off? (maybe I will want it to later, but when I used the postprocessor section, it just downloaded jsons and I worried it'll never download the actual pics, so I had to shut it down before I got ratelimited for a day)

thank you very much!

@mikf
Owner

mikf commented Sep 21, 2024

So in the SQlite, nowhere else?

SQLite archives (not the cache file) and files already present in the filesystem.
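Put differently, there are two independent places where "already downloaded" can be recorded. The sketch below models both with a throwaway SQLite file and a temporary directory. The table layout, the entry string `twitter123_1`, and the helper names are assumptions chosen for illustration and may not match what gallery-dl writes.

```python
import os
import sqlite3
import tempfile

def open_archive(path):
    """Open (or create) a small SQLite archive of already-downloaded entries."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return con

def already_seen(con, entry, file_path):
    """True if the entry is in the archive OR the target file exists on disk."""
    row = con.execute("SELECT 1 FROM archive WHERE entry = ?", (entry,)).fetchone()
    return row is not None or os.path.exists(file_path)

workdir = tempfile.mkdtemp()
con = open_archive(os.path.join(workdir, "archive.sqlite3"))
target = os.path.join(workdir, "example-123-01.jpg")

print(already_seen(con, "twitter123_1", target))   # False: no archive entry, no file
con.execute("INSERT INTO archive VALUES (?)", ("twitter123_1",))
print(already_seen(con, "twitter123_1", target))   # True: archive entry exists
```

With no archive configured, only the second half of that check applies, so the files already sitting in the download directory are what cause the skipping.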

or is it like, extractor is for every extractor, but twitter is just for twitter--

Exactly. When skip is defined for twitter, it overrides the "global" extractor.skip setting, but only for twitter URLs.
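A small Python sketch of that precedence, assuming a config shaped like the one posted above. The `lookup` function is a hypothetical stand-in rather than gallery-dl's real config API, and `pixiv` is used only as an example of a second site without its own `skip` entry.

```python
def lookup(config, category, key, default=None):
    """Return extractor.<category>.<key> if set, else fall back to extractor.<key>."""
    extractor = config.get("extractor", {})
    specific = extractor.get(category, {})
    if key in specific:
        return specific[key]             # site-specific value wins
    return extractor.get(key, default)   # otherwise the "global" extractor value

config = {
    "extractor": {
        "skip": True,                    # applies to every site...
        "twitter": {"skip": False},      # ...unless the site block overrides it
    },
    "output": {"skip": False},           # separate option: controls display only
}

print(lookup(config, "twitter", "skip"))   # False
print(lookup(config, "pixiv", "skip"))     # True
```

The `output.skip` entry is never consulted here, which matches the earlier point that it only affects whether skipped downloads are displayed.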

so... no difference between "user" and "timeline"?

You can set different options per subcategory (like user and timeline), but yes, there is no functional difference.

will it still record my history even if I have postprocessor section on my twitter config off

Yes. Post processors and archives for actual downloaded files are completely independent.

Baniita closed this as completed Sep 22, 2024