Full-integer quantization tflite models #802

Open

aidansmyth95 opened this issue Jun 3, 2024 · 0 comments

aidansmyth95 commented Jun 3, 2024
Has anyone converted the FastSpeech2 or Tacotron models to full-integer quantized tflite models?

My representative dataset generator for FastSpeech2 is triggering a floating-point exception during conversion. Any ideas about what I might be doing wrong? This seems close to enabling the int8x8 tflite model. I need to run on an ARM Ethos-U55 NPU, which has limited floating-point support. For now I care less about quantization error than about profiling the model on the U55 once I have a tflite file; we can use tricks like QAT later if we need to reduce the quantization error.

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Provide a set of input samples that are representative of the data
    # the model will be dealing with during inference
    for _ in range(1):  # Adjust the number of samples as needed
        # NB: np.random.randint's upper bound is exclusive, so (0, 1) yields
        # all zeros; widen the range to cover the model's real token-id vocabulary
        input_ids = tf.convert_to_tensor(np.random.randint(0, 1, size=(1, 50), dtype=np.int32), dtype=tf.int32)  # Example input shape
        speaker_ids = tf.convert_to_tensor(np.array([1], dtype=np.int32), dtype=tf.int32)
        speed_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        f0_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        energy_ratios = tf.convert_to_tensor(np.array([1.0], dtype=np.float32), dtype=tf.float32)
        yield [input_ids, speaker_ids, speed_ratios, f0_ratios, energy_ratios]

# Converter configuration (these lines run after the converter has been
# created, not inside representative_dataset)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.target_spec.supported_types = [tf.int8]
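
For reference, here is a minimal end-to-end sketch of the flow I'm attempting (the SavedModel path and output filename are placeholders; it assumes FastSpeech2 has already been exported as a TensorFlow SavedModel):

import tensorflow as tf

# Placeholder path to the exported FastSpeech2 SavedModel
converter = tf.lite.TFLiteConverter.from_saved_model("fastspeech2_savedmodel")

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset  # as defined above
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()  # this is where the FPE occurs for me

with open("fastspeech2_int8.tflite", "wb") as f:
    f.write(tflite_model)

# Sanity check: confirm the converted model's tensor types are integer
interpreter = tf.lite.Interpreter(model_path="fastspeech2_int8.tflite")
interpreter.allocate_tensors()
for detail in interpreter.get_input_details() + interpreter.get_output_details():
    print(detail["name"], detail["dtype"])

If I understand the converter docs correctly, inference_input_type / inference_output_type only apply to float tensors, so the int32 input_ids and speaker_ids inputs should keep their original dtype even after full-integer quantization.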