Our first goal is to enable real-time speech-to-speech (S2S) translation for virtual events over the web. This would allow a webinar presenter's voice to be translated into another language in real time for the audience, and likewise allow audience speech to be translated back for the presenter.
The assumptions below must be proven for this project to succeed. Validating them will let us design an architecture that allows online events to take place without language barriers.
S2S inference lag is not disruptive to a conversation.
If the lag is too long, it will stall the conversation and the system will not be valuable for participants.
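To test this assumption we first need a way to measure the lag. A minimal sketch of the measurement, where `translate_speech` is a hypothetical stand-in for the real model call (its simulated delay is an assumption, not a benchmark):

```python
import time

# Hypothetical stand-in for the real S2S call; swap in the actual model
# invocation when measuring. The 50 ms sleep only simulates inference.
def translate_speech(audio_chunk: bytes, target_lang: str) -> bytes:
    time.sleep(0.05)
    return audio_chunk

def measure_s2s_lag(audio_chunk: bytes, target_lang: str) -> float:
    """Wall-clock seconds from audio-in to translated audio-out."""
    start = time.perf_counter()
    translate_speech(audio_chunk, target_lang)
    return time.perf_counter() - start

# ~100 ms of silent 16-bit, 16 kHz PCM as a test chunk
lag = measure_s2s_lag(b"\x00" * 3200, "spa")
print(f"end-to-end lag: {lag:.3f}s")
```

Running this against the real pipeline over a range of chunk sizes would tell us whether the lag stays within a conversationally tolerable budget.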
An S2S model can be utilized via a web conferencing system.
We need a web conferencing system with plugin support that allows the outgoing audio stream to be overridden, so we can inject translated audio streams for each target language.
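The plugin capability we are looking for amounts to a hook that lets us replace each outgoing audio frame with its translated counterpart. A minimal sketch of that pattern, where the hook shape and the `translate` callable are assumptions rather than any specific platform's API:

```python
from typing import Callable

Frame = bytes  # one chunk of encoded or PCM audio

def make_override(translate: Callable[[Frame], Frame]) -> Callable[[Frame], Frame]:
    """Build the frame callback a conferencing plugin would register.

    `translate` is a hypothetical placeholder for the S2S pipeline.
    """
    def on_outgoing_frame(frame: Frame) -> Frame:
        # Forward translated audio instead of the presenter's raw audio.
        return translate(frame)
    return on_outgoing_frame

# Usage with a stub translator that just echoes the frame:
override = make_override(lambda frame: frame)
print(override(b"pcm-frame") == b"pcm-frame")
```

Whether a given platform exposes such a hook (and with what buffering constraints) is exactly what this assumption asks us to verify.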
Real-time S2S translation is not cost-prohibitive.
If the inference cost is too high, the system won't make sense to implement. We need an estimate of the inference cost per target language for a one-hour webinar.
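A back-of-the-envelope model for that estimate: GPU-hours scale with audio duration, number of target languages, and how much slower (or faster) than real time the model runs. All the example numbers below are hypothetical placeholders to be replaced with measured values:

```python
def webinar_cost(hours: float, n_languages: int,
                 gpu_usd_per_hour: float, realtime_factor: float) -> float:
    """Estimated inference cost in USD.

    realtime_factor: GPU-seconds needed per second of audio
    (0.5 means the model runs at twice real time).
    """
    gpu_hours = hours * n_languages * realtime_factor
    return gpu_hours * gpu_usd_per_hour

# e.g. a 1-hour webinar, 3 target languages, an assumed $2/GPU-hour rate,
# and a model assumed to run at twice real time:
cost = webinar_cost(hours=1, n_languages=3,
                    gpu_usd_per_hour=2.0, realtime_factor=0.5)
print(f"${cost:.2f}")  # → $3.00
```

Measuring the real-time factor per model and per language pair is the main unknown here.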
Seamless M4T is the best model to utilize for this use case.
It is possible that other S2S models would work as well as, or better than, Seamless M4T. We need to understand the tradeoffs between the candidate models.
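One way to keep that comparison honest is a small scaffold that records the same metrics for every candidate; fields start empty and are filled in from experiments. The candidate list and metric fields below are assumptions, not a final selection:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class S2SCandidate:
    name: str
    lag_seconds: Optional[float] = None        # measured end-to-end latency
    usd_per_audio_hour: Optional[float] = None # measured inference cost
    quality_notes: str = ""                    # subjective translation quality

candidates = [
    S2SCandidate("SeamlessM4T"),
    # add other models under evaluation here
]
print([c.name for c in candidates])  # → ['SeamlessM4T']
```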