[DRAFT] Support multiple tokenizers and other layers with assets #1860

Open
wants to merge 1 commit into master

Commits on Sep 22, 2024

  1. Support multiple tokenizers and other layers with assets

    Preset saving and loading does not currently generalize to multiple
    tokenizers (or other preprocessors with static assets). This is a
    work-in-progress PR toward adding that support, specifically for
    Stable Diffusion.
    
    The high-level API would allow something like this:
    
    ```python
    # High-level loading.
    image_to_text = keras_hub.models.ImageToText.from_preset(
        "sd3_preset",
    )
    # Low-level tokenizer loading.
    clip_l_tokenizer = keras_hub.tokenizers.Tokenizer.from_preset(
        "sd3_preset", config_file="clip_l_tokenizer.json",
    )
    clip_g_tokenizer = keras_hub.tokenizers.Tokenizer.from_preset(
        "sd3_preset", config_file="clip_g_tokenizer.json",
    )
    ```
    
    During conversion, we would need to make sure each tokenizer was
    created with a separate `config_file` passed to the constructor. Then,
    when calling `task.save_to_preset("path")`, you would get the
    following structure:
    
    ```
    assets/clip_l_tokenizer/...
    assets/clip_g_tokenizer/...
    assets/t5_tokenizer/...
    clip_l_tokenizer.json
    clip_g_tokenizer.json
    t5_tokenizer.json
    ```
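    
    For concreteness, here is a rough sketch of what the conversion side
    could look like under this proposal. The specific tokenizer classes,
    the vocabulary/asset file names, and the `build_sd3_task` helper are
    illustrative assumptions; only the per-tokenizer `config_file`
    constructor argument and the `task.save_to_preset("path")` call come
    from the description above.
    
    ```python
    import keras_hub
    
    # Each tokenizer is constructed with its own `config_file`, so configs
    # and assets do not collide inside the preset directory.
    clip_l_tokenizer = keras_hub.tokenizers.CLIPTokenizer(
        vocabulary="clip_l_vocab.json",  # assumed asset paths
        merges="clip_l_merges.txt",
        config_file="clip_l_tokenizer.json",
    )
    clip_g_tokenizer = keras_hub.tokenizers.CLIPTokenizer(
        vocabulary="clip_g_vocab.json",
        merges="clip_g_merges.txt",
        config_file="clip_g_tokenizer.json",
    )
    t5_tokenizer = keras_hub.tokenizers.T5Tokenizer(
        proto="t5_spiece.model",
        config_file="t5_tokenizer.json",
    )
    
    # Assembling the full task is elided here; `build_sd3_task` is a
    # hypothetical helper standing in for the converter's assembly code.
    task = build_sd3_task(clip_l_tokenizer, clip_g_tokenizer, t5_tokenizer)
    
    # Saving then writes one config file and one assets subdirectory per
    # tokenizer, matching the layout listed above.
    task.save_to_preset("path")
    ```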
    mattdangerw committed Sep 22, 2024 (commit 5bdfdea)