Add wav2vec2 support - export and audio tasks modeling (#645)
* wav2vec2 base support

* fix outputs for audio-xvector

* add CTC modeling

* some tests and modeling

* add xvector

* fix doc

* fix doc

* try fix tests

* disable auto triggered CIs for inf1
JingyaHuang authored Jul 11, 2024
1 parent 17fe854 commit 56cb8a5
Showing 16 changed files with 827 additions and 40 deletions.
11 changes: 1 addition & 10 deletions .github/workflows/test_inf1_export.yml
@@ -1,16 +1,7 @@
 name: Optimum neuron / Test INF1 partial export
 
 on:
-  push:
-    branches: [ main ]
-    paths:
-      - "setup.py"
-      - "optimum/**.py"
-  pull_request:
-    branches: [ main ]
-    paths:
-      - "setup.py"
-      - "optimum/**.py"
+  workflow_dispatch
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
9 changes: 1 addition & 8 deletions .github/workflows/test_inf1_full_export.yml
@@ -1,14 +1,7 @@
 name: Optimum neuron / Test INF1 full export
 
 on:
-  push:
-    branches: [ main ]
-    paths:
-      - "optimum/exporters/neuron/*.py"
-  pull_request:
-    branches: [ main ]
-    paths:
-      - "optimum/exporters/neuron/*.py"
+  workflow_dispatch
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
11 changes: 1 addition & 10 deletions .github/workflows/test_inf1_inference.yml
@@ -1,16 +1,7 @@
 name: Optimum neuron / Test INF1 inference
 
 on:
-  push:
-    branches: [ main ]
-    paths:
-      - "setup.py"
-      - "optimum/**.py"
-  pull_request:
-    branches: [ main ]
-    paths:
-      - "setup.py"
-      - "optimum/**.py"
+  workflow_dispatch
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
9 changes: 1 addition & 8 deletions .github/workflows/test_inf1_pipelines.yml
@@ -1,14 +1,7 @@
 name: Optimum neuron / Test INF1 pipelines
 
 on:
-  push:
-    branches: [ main ]
-    paths:
-      - "optimum/neuron/pipelines/**.py"
-  pull_request:
-    branches: [ main ]
-    paths:
-      - "optimum/neuron/pipelines/**.py"
+  workflow_dispatch
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
16 changes: 16 additions & 0 deletions docs/source/package_reference/modeling.mdx
@@ -82,6 +82,22 @@ The following Neuron model classes are available for computer vision tasks.
 ### NeuronModelForObjectDetection
 [[autodoc]] modeling.NeuronModelForObjectDetection
 
+## Audio
+
+The following auto classes are available for the following audio tasks.
+
+### NeuronModelForAudioClassification
+[[autodoc]] modeling.NeuronModelForAudioClassification
+
+### NeuronModelForAudioFrameClassification
+[[autodoc]] modeling.NeuronModelForAudioFrameClassification
+
+### NeuronModelForCTC
+[[autodoc]] modeling.NeuronModelForCTC
+
+### NeuronModelForXVector
+[[autodoc]] modeling.NeuronModelForXVector
+
 ## Stable Diffusion
 
 The following Neuron model classes are available for stable diffusion tasks.
1 change: 1 addition & 0 deletions docs/source/package_reference/supported_models.mdx
@@ -53,6 +53,7 @@ limitations under the License.
 | RoFormer | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
 | Swin | feature-extraction, image-classification |
 | T5 | text2text-generation |
+| Wav2Vec2 | feature-extraction, automatic-speech-recognition, audio-classification, audio-frame-classification, audio-xvector |
 | XLM | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
 | ViT | feature-extraction, image-classification |
 | XLM-RoBERTa | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
5 changes: 5 additions & 0 deletions optimum/commands/export/neuronx.py
@@ -249,6 +249,11 @@ def parse_args_neuronx(parser: "ArgumentParser"):
         default=1,
         help=f"Stable diffusion only. Number of images per prompt {doc_input}",
     )
+    input_group.add_argument(
+        "--audio_sequence_length",
+        type=int,
+        help=f"Audio tasks only. Audio sequence length {doc_input}",
+    )
 
     level_group = parser.add_mutually_exclusive_group()
     level_group.add_argument(
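The new `--audio_sequence_length` flag is an optional integer, like the other static-shape options. Below is a minimal, self-contained sketch of how such a flag could be collected into a dictionary of input shapes. The parser, the group title, the `--batch_size` flag as written here, and the `collect_input_shapes` helper are illustrative assumptions, not the actual `optimum` CLI code.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real export parser.
    parser = argparse.ArgumentParser(prog="export-sketch")
    input_group = parser.add_argument_group("Input shapes")
    input_group.add_argument("--batch_size", type=int, help="Batch size.")
    input_group.add_argument(
        "--audio_sequence_length",
        type=int,
        help="Audio tasks only. Audio sequence length.",
    )
    return parser


def collect_input_shapes(argv: List[str]) -> Dict[str, int]:
    # Keep only the shape arguments that were actually supplied;
    # argparse leaves omitted optional flags as None.
    args = build_parser().parse_args(argv)
    return {name: value for name, value in vars(args).items() if value is not None}


shapes = collect_input_shapes(["--batch_size", "1", "--audio_sequence_length", "100000"])
print(shapes)
```

Because the flag has no default, omitting it leaves the value as `None`, so a non-audio export in this sketch never sees an `audio_sequence_length` key.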
10 changes: 10 additions & 0 deletions optimum/exporters/neuron/config.py
@@ -19,6 +19,7 @@
 from typing import List
 
 from ...utils import (
+    DummyAudioInputGenerator,
     DummyBboxInputGenerator,
     DummyInputGenerator,
     DummySeq2SeqDecoderTextInputGenerator,
@@ -59,6 +60,15 @@ class TextAndVisionNeuronConfig(NeuronDefaultConfig):
     DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, DummyVisionInputGenerator, DummyBboxInputGenerator)
 
 
+class AudioNeuronConfig(NeuronDefaultConfig):
+    """
+    Handles audio architectures.
+    """
+
+    DUMMY_INPUT_GENERATOR_CLASSES = (DummyAudioInputGenerator, DummyTextInputGenerator)
+    INPUT_ARGS = ("batch_size", "audio_sequence_length")
+
+
 class TextNeuronDecoderConfig(NeuronDecoderConfig):
     """
     Handles text decoder architectures.
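`AudioNeuronConfig` declares `batch_size` and `audio_sequence_length` as its static input arguments. The sketch below illustrates the general idea of turning such a declaration into a fixed-shape dummy waveform for tracing. `SketchAudioConfig` and `make_dummy_audio` are hypothetical names, the code uses only the standard library, and it does not reproduce what `DummyAudioInputGenerator` actually does.

```python
import random
from dataclasses import dataclass
from typing import ClassVar, List, Tuple


@dataclass
class SketchAudioConfig:
    # Hypothetical config; the field names mirror INPUT_ARGS from the diff.
    batch_size: int
    audio_sequence_length: int

    INPUT_ARGS: ClassVar[Tuple[str, ...]] = ("batch_size", "audio_sequence_length")

    def static_shape(self) -> Tuple[int, ...]:
        # Resolve each declared input argument to its concrete value.
        return tuple(getattr(self, name) for name in self.INPUT_ARGS)


def make_dummy_audio(config: SketchAudioConfig, seed: int = 0) -> List[List[float]]:
    # Build a (batch_size, audio_sequence_length) nested list of samples in [-1, 1].
    rng = random.Random(seed)
    batch, length = config.static_shape()
    return [[rng.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(batch)]


config = SketchAudioConfig(batch_size=2, audio_sequence_length=16)
dummy = make_dummy_audio(config)
print(len(dummy), len(dummy[0]))
```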
26 changes: 26 additions & 0 deletions optimum/exporters/neuron/model_configs.py
@@ -36,6 +36,7 @@
 )
 from ..tasks import TasksManager
 from .config import (
+    AudioNeuronConfig,
     TextAndVisionNeuronConfig,
     TextEncoderNeuronConfig,
     TextNeuronDecoderConfig,
@@ -402,6 +403,31 @@ def outputs(self) -> List[str]:
         return common_outputs
 
 
+@register_in_tasks_manager(
+    "wav2vec2",
+    *[
+        "feature-extraction",
+        "automatic-speech-recognition",
+        "audio-classification",
+        "audio-frame-classification",
+        "audio-xvector",
+    ],
+)
+class Wav2Vec2NeuronConfig(AudioNeuronConfig):
+    NORMALIZED_CONFIG_CLASS = NormalizedConfig
+
+    @property
+    def inputs(self) -> List[str]:
+        return ["input_values"]
+
+    @property
+    def outputs(self) -> List[str]:
+        common_outputs = super().outputs
+        if self.task == "audio-xvector":
+            common_outputs.append("embeddings")
+        return common_outputs
+
+
 @register_in_tasks_manager("unet", *["semantic-segmentation"], library_name="diffusers")
 class UNetNeuronConfig(VisionNeuronConfig):
     ATOL_FOR_VALIDATION = 1e-3
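The `model_configs.py` change registers one config class for several tasks and appends an extra `embeddings` output when the task is `audio-xvector`. A minimal sketch of that pattern follows. The `REGISTRY` dictionary, the `register` decorator, and `BaseSketchConfig` with its single `logits` default output are assumptions made for illustration; they are neither the `TasksManager` implementation nor the real default outputs. The sketch also copies the parent's list before appending, a defensive choice in case a parent property returns a shared list.

```python
from typing import Callable, Dict, List

# Hypothetical registry: model type -> task name -> config class.
REGISTRY: Dict[str, Dict[str, type]] = {}


def register(model_type: str, *tasks: str) -> Callable[[type], type]:
    # Class decorator that records the class under every listed task.
    def decorator(cls: type) -> type:
        for task in tasks:
            REGISTRY.setdefault(model_type, {})[task] = cls
        return cls

    return decorator


class BaseSketchConfig:
    def __init__(self, task: str) -> None:
        self.task = task

    @property
    def outputs(self) -> List[str]:
        # Assumed default output for the sketch only.
        return ["logits"]


@register("wav2vec2", *["audio-classification", "audio-xvector"])
class Wav2Vec2SketchConfig(BaseSketchConfig):
    @property
    def inputs(self) -> List[str]:
        return ["input_values"]

    @property
    def outputs(self) -> List[str]:
        # Copy first so the parent's list is never mutated.
        common_outputs = list(super().outputs)
        if self.task == "audio-xvector":
            common_outputs.append("embeddings")
        return common_outputs


config_cls = REGISTRY["wav2vec2"]["audio-xvector"]
print(config_cls("audio-xvector").outputs)
```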
8 changes: 8 additions & 0 deletions optimum/neuron/__init__.py
@@ -42,6 +42,10 @@
         "NeuronModelForImageClassification",
         "NeuronModelForSemanticSegmentation",
         "NeuronModelForObjectDetection",
+        "NeuronModelForCTC",
+        "NeuronModelForAudioClassification",
+        "NeuronModelForAudioFrameClassification",
+        "NeuronModelForXVector",
     ],
     "modeling_diffusion": [
         "NeuronStableDiffusionPipelineBase",
@@ -71,7 +75,10 @@
     from .accelerate import ModelParallelismPlugin, NeuronAccelerator, NeuronAcceleratorState, NeuronPartialState
     from .hf_argparser import NeuronHfArgumentParser
     from .modeling import (
+        NeuronModelForAudioClassification,
+        NeuronModelForAudioFrameClassification,
         NeuronModelForCausalLM,
+        NeuronModelForCTC,
         NeuronModelForFeatureExtraction,
         NeuronModelForImageClassification,
         NeuronModelForMaskedLM,
@@ -82,6 +89,7 @@
         NeuronModelForSentenceTransformers,
         NeuronModelForSequenceClassification,
         NeuronModelForTokenClassification,
+        NeuronModelForXVector,
     )
     from .modeling_decoder import NeuronDecoderModel
     from .modeling_diffusion import (