Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ [Tasks] JSON Schema spec for Inference types + TS type generation #449

Merged
merged 51 commits into from
Jan 26, 2024

Conversation

SBrandeis
Copy link
Contributor

@SBrandeis SBrandeis commented Jan 19, 2024

TL;DR

TODO

  • Add a text2text-generation task to serve as a "canonical reference" for summarization & translation
  • Add a text-to-audio task to server as a "canonical reference" for text-to-speech
  • Code generation tweaks
    • Change any types to unknown
    • Generate output types for sentence-similarity
    • Generate output types for feature-extraction -> Let's do that later?
  • add placeholder specs

@SBrandeis
Copy link
Contributor Author

Ping @coyotte508 for visibility

Copy link
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks 😍😍😍

Copy link
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

/**
* Inputs for Audio Classification inference
*/
export interface AudioClassificationInput {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-flagging this comment in case it was lost

Copy link
Contributor

@Wauplin Wauplin Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for images. The jsonschema cannot specify this since sending as raw data and sending as json are 2 different things. So for now it's kind of a blind spot. If we provide an openapi schema for our APIs in the future, then it will be possible to document it. Openapi easily integrates with jsonschema so having them is already a first good step.

(difference between a jsonschema as in this PR and an openapi description is that this PR describes objects with their attributes while the openapi description with include stuff like server routes, accepted headers, etc.)

(^ only my understanding of the specs, anyone feel free to correct me 😄)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - sorry for the delay in answering

Leaving the image/audio data as unknown was intentional, to give more flexibility to the libraries.
Image & audio data can be passed in several different forms (raw binary data, path to a local or remote file, base64 encoded data...) and I did not want to constrain downstream users of those types into one single representation.

(difference between a jsonschema as in this PR and an openapi description is that this PR describes objects with their attributes while the openapi description with include stuff like server routes, accepted headers, etc.)

Yes that is correct, there will be some additional work necessary to generate an OpenAPI spec for an inference API (including actually specifying how we expect the binary data to be represented)

@@ -216,6 +216,7 @@ export interface TaskData {
datasets: ExampleRepo[];
demo: TaskDemo;
id: PipelineType;
canonicalId?: PipelineType;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this property to express one task being a "subtask" of another (eg, summarization being a subtask of text2text-generation)

@SBrandeis
Copy link
Contributor Author

I added a "post-process" script using the typescript API to generate the appropriate array type while glideapps/quicktype#2481 is being handled

/**
* The function to apply to the model outputs in order to retrieve the scores.
*/
functionToApply?: AudioClassificationOutputTransform;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not supported by any library afaik

Comment on lines +6 to +8
"items": {
"description": "The output depth labels"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iirc, the output is a dictionary with two entries, one being the depth which is a depth estimation image, the other is predicted_depth, which is the tensor. See https://huggingface.co/docs/transformers/main/tasks/monocular_depth_estimation

Comment on lines +6 to +8
"items": {
"description": "The output depth labels"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +86 to +98
/**
* The answer to the question.
*/
answer: string;
end: number;
/**
* The probability associated to the answer.
*/
score: number;
start: number;
/**
* The index of each word/box pair that is in the answer
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the alphabetical order is a bit weird with the docstrings. We have The answer to the question., then answer, then end, much later start.

/**
* The answer to the question.
*/
answer: string;
end: number;
/**
* The probability associated to the answer.
*/
score: number;
start: number;
/**

Comment on lines +18 to +19
parameters?: { [key: string]: unknown };
[property: string]: unknown;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this would work out of the box in sentence_transformers API, but I guess we can add later on if needed

"$id": "/inference/schemas/feature-extraction/output.json",
"$schema": "http://json-schema.org/draft-06/schema#",
"description": "The embedding for the input text, as a nested list (tensor) of floats",
"type": "array",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: it's an array in sentence transformers (one embedding per input), a list within a list in transformers (one embedding per token), and a list within a list within a list in Inference API (for batching) iirc

  1. https://github.com/huggingface/api-inference-community/blob/main/docker_images/sentence_transformers/app/pipelines/feature_extraction.py#L25
  2. https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/feature_extraction.py#L96

@@ -0,0 +1,12 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this one is not exported

/**
* Parametrization of the text generation process
*/
generate?: GenerationParameters;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also support forward params so we can pass things such as speaker_embeddings in SpeechT5 https://huggingface.co/microsoft/speecht5_tts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it as a follow-up

packages/tasks/src/tasks/text-to-audio/inference.ts Outdated Show resolved Hide resolved
/**
* I can be the papa you'd be the mama
*/
temperature?: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a bunch in 826181a - there are still a lot of other parameters to add

@@ -0,0 +1,53 @@
/**
* Inference code generated from the JSON schema spec in ./spec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use this opportunity to unify text-generation and text2text-generation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, probably

/**
* The strategy used to fuse tokens based on model predictions
*/
aggregationStrategy?: TokenClassificationAggregationStrategy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit strange as this is actually a load parameter, not an inference parameter - see https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.TextToAudioPipeline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're right - but shouldn't be supported by the call method too?

Copy link
Contributor

@osanseviero osanseviero Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect yes, but maybe it changes how the model is loaded?

packages/tasks/src/tasks/video-classification/inference.ts Outdated Show resolved Hide resolved
@SBrandeis SBrandeis mentioned this pull request Jan 26, 2024
11 tasks
@SBrandeis SBrandeis merged commit b70ac6c into main Jan 26, 2024
2 checks passed
@SBrandeis SBrandeis deleted the tasks-specifications branch January 26, 2024 16:59
@Wauplin
Copy link
Contributor

Wauplin commented Jan 26, 2024

whoop whoop 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants