Stage 1: Frame Extraction and Similar Image Removal

Extract thousands of frames per 24-minute episode and remove similar images

  • If you start from this stage, please set --src_dir to the folder containing either the videos (for the screenshots pipeline) or the raw images (for the booru pipeline).
  • Output folder: /path/to/dataset_dir/intermediate/{image_type}/raw
  • After this stage, you can go over the images to select those you want to keep.

Frame extraction

Requirements: ffmpeg with the mpdecimate filter; CUDA support is optional but highly recommended

This part is specific to the screenshots pipeline. For the time being, I rely directly on calling the ffmpeg command as follows:

file_pattern="${dst_ep_dir}/${prefix}EP$((ep_init+i))_%d.png"

ffmpeg -hwaccel cuda -i "$filename" -filter:v \
"mpdecimate=hi=64*200:lo=64*50:frac=0.33,setpts=N/FRAME_RATE/TB" \
-qscale:v 1 -qmin 1 -c:a copy "$file_pattern"
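
The snippet above assumes it runs inside a loop over the episodes of a season. Below is a minimal sketch of how its variables might be set; the directory layout, the .mkv glob, and the example values are assumptions for illustration, not part of the pipeline:

# Assumed setup: all videos of one season sit in a single folder as .mkv files.
src_dir="/path/to/videos"
dst_ep_dir="/path/to/dataset_dir/intermediate/screenshots/raw"
prefix="haifuri_"   # plays the role of --image_prefix
ep_init=1           # plays the role of --ep_init

i=0
for filename in "$src_dir"/*.mkv; do   # glob expansion is sorted, matching the sorted processing order
    file_pattern="${dst_ep_dir}/${prefix}EP$((ep_init+i))_%d.png"
    ffmpeg -hwaccel cuda -i "$filename" -filter:v \
    "mpdecimate=hi=64*200:lo=64*50:frac=0.33,setpts=N/FRAME_RATE/TB" \
    -qscale:v 1 -qmin 1 -c:a copy "$file_pattern"
    i=$((i+1))
done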

The mpdecimate filter drops consecutive frames that are too similar. Enabling it makes a big difference in both processing speed and the number of extracted frames (roughly 10 times fewer images with the filter). An even more aggressive approach is to keep only the key frames, as sketched below.
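
As a rough illustration of the key-frame approach, ffmpeg can keep only I-frames with the select filter. This is a generic sketch of the technique, not necessarily what --extract_key does internally:

# Keep only key frames (I-frames); -vsync vfr closes the gaps left by discarded frames.
ffmpeg -hwaccel cuda -i "$filename" -vf "select='eq(pict_type,I)'" \
-vsync vfr -qscale:v 1 -qmin 1 "$file_pattern"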

  • extract_key: Extract key frames instead of using the mpdecimate filter.
    Example usage: --extract_key
  • image_prefix: This allows you to give a prefix to the extracted images. If not provided, the prefix for each video is inferred from its file name.
    Example usage: --image_prefix haifuri
  • ep_init: The episode number to start from. Since the processing order is obtained by sorting the file names, the episode number given here could differ from the actual one. If not provided, the episode number of each video is inferred from its file name.
    Example usage: --ep_init 4

⚠️ It is important to ensure that every image has a different name at this stage. Otherwise, some images will be overwritten later.
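
For example, when a second season is extracted in a separate run, the two arguments above can be combined so that its images cannot collide with those of the first season. The script name below is only a placeholder; replace it with the actual entry script of the pipeline, and note that other required arguments are omitted:

# pipeline_script.py is a placeholder name; other required arguments are omitted
python pipeline_script.py \
    --src_dir /path/to/videos_season_2 \
    --image_prefix haifuri_s2 \
    --ep_init 13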

Similar image removal

This step uses 'mobilenetv3_large' to detect and remove similar images, which reduces the dataset size by a factor of 2 to 10 depending on the anime.

Since the extracted images can take up a lot of space, I decided to combine the two steps and perform removal for each episode independently, right after its frames are extracted. A final pass over all extracted images is then performed at the end to remove repeated images, notably from the OP and ED.

  • no_remove_similar: Pass this argument to skip this step.
    Example usage: --no_remove_similar
  • similar_thresh: The similarity threshold above which two images are judged too similar. Default is 0.95.
    Example usage: --similar_thresh 0.9
  • detect_duplicate_model: You can use any other model from timm for duplicate detection.
    Example usage: --detect_duplicate_model mobilenetv2_100
  • detect_duplicate_batch_size: Batch size used when computing the image features for duplicate detection.
    Example usage: --detect_duplicate_batch_size 32
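
Taken together, a run that prunes less aggressively and swaps in a different timm backbone might look like this; as before, the script name is just a placeholder and other required arguments are omitted:

# pipeline_script.py is a placeholder name; a higher threshold removes fewer images
python pipeline_script.py \
    --src_dir /path/to/videos \
    --similar_thresh 0.98 \
    --detect_duplicate_model mobilenetv2_100 \
    --detect_duplicate_batch_size 64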