# Pass Image color channels information to Transformers #2846

Open: wants to merge 4 commits into base: master

Commits on Jul 18, 2024

  1. # Pass Image color channels information to Transformers

    Background:
    In Hugging Face Transformers' image processors, e.g. CLIPImageProcessor, the preprocessing call accepts an input_data_format argument, which states whether an image's color channels are in the first or the last position of its shape.
    
    For example, an image of shape (512, 512, 3) has a resolution of 512 x 512 pixels and 3 color channels (RGB) in the last position. In this case, input_data_format is ChannelDimension.LAST in Transformers, which corresponds to ImageChannelDimension.LAST in this PR.
    
    Some users instead store images channels-first, with shape (3, 512, 512), for performance reasons. Transformers lets users state the layout explicitly; otherwise it infers the layout from the image's shape.
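To make the two layouts concrete, here is a small dependency-free sketch that reorders a tiny channels-last image into channels-first. The helper names `hwc_to_chw` and `shape_of` are illustrative only; with NumPy the same reordering is `np.transpose(image, (2, 0, 1))`.

```python
from typing import List

Nested3D = List[List[List[int]]]


def hwc_to_chw(image: Nested3D) -> Nested3D:
    """Reorder a channels-last image (height x width x channels) into channels-first."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    # One 2-D plane per channel, each plane laid out as rows x columns.
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


def shape_of(nested) -> tuple:
    """Shape of a regular nested list, analogous to ndarray.shape."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A 2x2 RGB image: red, green on the first row; blue, white on the second.
hwc = [[[255, 0, 0], [0, 255, 0]],
       [[0, 0, 255], [255, 255, 255]]]
chw = hwc_to_chw(hwc)
print(shape_of(hwc))  # (2, 2, 3)
print(shape_of(chw))  # (3, 2, 2)
print(chw[0])         # red plane: [[255, 0], [0, 255]]
```

The pixel values are unchanged; only the position of the channel axis moves, which is exactly the information input_data_format carries.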
    
    An image usually has 1 (grayscale) or 3 (RGB) color channels, so Transformers' inference looks for a size of 1 or 3 in the first or last dimension of the image's shape.
    
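    If an input image has shape (3, xxx, 1) or (1, xxx, 3), both the first and the last dimension look like a channel dimension, so the inference is ambiguous. Transformers then logs a warning and assumes the channels come first; for a channels-last (1, xxx, 3) image this leads to a failure during normalization:
    'The channel dimension is ambiguous. Got image shape (1, xxx, 3). Assuming channels are the first dimension.'
    'ValueError: mean must have 1 elements if it is an iterable, got 3'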
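The ambiguity and the resulting error can be reproduced without Transformers. The sketch below is a simplified imitation of the behaviour described above, not Transformers' code: `ChannelDimension` is a local stand-in, and `infer_channel_dimension` and `check_mean` are hypothetical helpers (Transformers keeps its own logic in `transformers.image_utils`, e.g. `infer_channel_dimension_format`). The mean values are placeholders.

```python
import logging
from enum import Enum
from typing import Optional, Sequence, Tuple

logger = logging.getLogger(__name__)


class ChannelDimension(Enum):
    """Local stand-in for the two layouts of Transformers' ChannelDimension."""
    FIRST = "channels_first"
    LAST = "channels_last"


def infer_channel_dimension(shape: Tuple[int, ...],
                            num_channels: Tuple[int, ...] = (1, 3)) -> ChannelDimension:
    """Guess where the channels sit in a 3-D image shape by looking for a size of 1 or 3."""
    if len(shape) != 3:
        raise ValueError(f"Expected a 3-D image shape, got {shape}")
    first, last = shape[0], shape[-1]
    if first in num_channels and last in num_channels:
        # Both ends look like a channel axis: warn and fall back to channels-first.
        logger.warning("The channel dimension is ambiguous. Got image shape %s. "
                       "Assuming channels are the first dimension.", shape)
        return ChannelDimension.FIRST
    if first in num_channels:
        return ChannelDimension.FIRST
    if last in num_channels:
        return ChannelDimension.LAST
    raise ValueError(f"Unable to infer channel dimension format for shape {shape}")


def check_mean(shape: Tuple[int, ...], mean: Sequence[float],
               data_format: Optional[ChannelDimension] = None) -> int:
    """Mimic normalization's check that `mean` has one value per channel; return the channel count."""
    if data_format is None:
        data_format = infer_channel_dimension(shape)
    n_channels = shape[0] if data_format is ChannelDimension.FIRST else shape[-1]
    if len(mean) != n_channels:
        raise ValueError(f"mean must have {n_channels} elements if it is an iterable, got {len(mean)}")
    return n_channels


rgb_mean = [0.5, 0.5, 0.5]  # placeholder per-channel means for an RGB model
print(check_mean((512, 512, 3), rgb_mean))  # 3: unambiguous, inferred as channels-last
try:
    check_mean((1, 224, 3), rgb_mean)  # ambiguous: treated as a 1-channel image
except ValueError as exc:
    print(exc)  # mean must have 1 elements if it is an iterable, got 3
print(check_mean((1, 224, 3), rgb_mean, ChannelDimension.LAST))  # 3: explicit layout avoids the guess
```

The last call shows why passing the layout explicitly is enough: the shape-based guess is skipped entirely.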
    
    Fix:
    1. Add an ImageChannelDimension class that defines the two possible positions of the color channels in an image's shape.
    2. Accept this information in the model.encode method and pass it on to the Transformers tokenizer and image processor.
    davychxn committed Jul 18, 2024
    7a7cb61
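A minimal sketch of the fix described in this commit, under stated assumptions: the enum values are chosen to match the strings Transformers accepts for input_data_format ("channels_first" / "channels_last"), the parameter name follows the later rename to image_channel_dimension, and `ToyImageModel` and `RecordingImageProcessor` are hypothetical stand-ins, not the PR's actual classes or the sentence-transformers API.

```python
from enum import Enum
from typing import Any, Dict, List, Optional, Union


class ImageChannelDimension(str, Enum):
    """Hypothetical version of the enum: where the color channels sit in an image's shape."""
    FIRST = "channels_first"  # e.g. (3, 512, 512)
    LAST = "channels_last"    # e.g. (512, 512, 3)


class RecordingImageProcessor:
    """Stand-in for a Transformers image processor that just reports the arguments it received."""

    def __call__(self, images: List[Any], input_data_format: Optional[str] = None) -> Dict[str, Any]:
        # A real processor would infer the layout from each image's shape when this is None.
        return {"num_images": len(images), "input_data_format": input_data_format}


class ToyImageModel:
    """Hypothetical model showing how encode() could forward the channel position."""

    def __init__(self) -> None:
        self.processor = RecordingImageProcessor()

    def tokenize(self, images: List[Any],
                 image_channel_dimension: Optional[Union[str, ImageChannelDimension]] = None) -> Dict[str, Any]:
        data_format = None
        if image_channel_dimension is not None:
            # Validate the value (invalid strings raise ValueError) and pass on its string form.
            data_format = ImageChannelDimension(image_channel_dimension).value
        return self.processor(images, input_data_format=data_format)

    def encode(self, images: List[Any],
               image_channel_dimension: Optional[Union[str, ImageChannelDimension]] = None) -> Dict[str, Any]:
        # A real encode() would run the model on these features; the sketch returns them directly.
        return self.tokenize(images, image_channel_dimension=image_channel_dimension)


model = ToyImageModel()
print(model.encode(["img"], image_channel_dimension=ImageChannelDimension.FIRST))
# {'num_images': 1, 'input_data_format': 'channels_first'}
print(model.encode(["img"]))
# {'num_images': 1, 'input_data_format': None}
```

Leaving the parameter as None keeps the old behaviour (shape-based inference), so existing callers are unaffected.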

Commits on Jul 25, 2024

  1. # Made 2 modifications.

    1. Add a docstring for the newly added 'image_channel_dimension' parameter of the 'encode' function.
    2. Change the parameter's name from 'input_data_format' to 'image_channel_dimension'.
    davychxn committed Jul 25, 2024
    33a4ebc

Commits on Jul 27, 2024

  1. # Modified 2 files

    1. Make the 'tokenize' interface consistent between texts and images.
    davychxn committed Jul 27, 2024
    Configuration menu
    Copy the full SHA
    e683051 View commit details
    Browse the repository at this point in the history

Commits on Sep 20, 2024

  1. Merge branch 'master' of https://github.com/UKPLab/sentence-transformers

    And fixed conflicts in:
    sentence_transformers/SentenceTransformer.py
    davychxn committed Sep 20, 2024
    531c59a