Our first goal is to enable real-time speech-to-speech (S2S) translation for virtual events over the web. This would allow a webinar presenter's voice to be translated into another language in real time for the audience, and likewise allow audience speech to be translated back for the presenter.
The assumptions below must be proven for this project to succeed. Validating them will let us design an architecture that allows online events to take place without language barriers.
S2S inference lag is not disruptive to a conversation.
If the lag is too long, it will stall the conversation and the system will not be valuable for participants.
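To test this assumption we first need a way to measure the lag. A minimal sketch of the measurement, where `translate_speech` is a hypothetical stand-in for the real model call (its simulated delay is an assumption, not a benchmark):

```python
import time

# Hypothetical stand-in for the real S2S call; swap in the actual model
# invocation when measuring. The 50 ms sleep only simulates inference.
def translate_speech(audio_chunk: bytes, target_lang: str) -> bytes:
    time.sleep(0.05)
    return audio_chunk

def measure_s2s_lag(audio_chunk: bytes, target_lang: str) -> float:
    """Wall-clock seconds from audio-in to translated audio-out."""
    start = time.perf_counter()
    translate_speech(audio_chunk, target_lang)
    return time.perf_counter() - start

# ~100 ms of silent 16-bit, 16 kHz PCM as a test chunk
lag = measure_s2s_lag(b"\x00" * 3200, "spa")
print(f"end-to-end lag: {lag:.3f}s")
```

Running this against the real pipeline over a range of chunk sizes would tell us whether the lag stays within a conversationally tolerable budget.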
An S2S model can be utilized via a web conferencing system.
We need a web conferencing system with plugin support that allows the outgoing audio stream to be overridden, so we can inject translated audio streams for each target language.
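The plugin capability we are looking for amounts to a hook that lets us replace each outgoing audio frame with its translated counterpart. A minimal sketch of that pattern, where the hook shape and the `translate` callable are assumptions rather than any specific platform's API:

```python
from typing import Callable

Frame = bytes  # one chunk of encoded or PCM audio

def make_override(translate: Callable[[Frame], Frame]) -> Callable[[Frame], Frame]:
    """Build the frame callback a conferencing plugin would register.

    `translate` is a hypothetical placeholder for the S2S pipeline.
    """
    def on_outgoing_frame(frame: Frame) -> Frame:
        # Forward translated audio instead of the presenter's raw audio.
        return translate(frame)
    return on_outgoing_frame

# Usage with a stub translator that just echoes the frame:
override = make_override(lambda frame: frame)
print(override(b"pcm-frame") == b"pcm-frame")
```

Whether a given platform exposes such a hook (and with what buffering constraints) is exactly what this assumption asks us to verify.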
Real-time S2S translation is not cost-prohibitive.
If the inference cost is too high, the system won't make sense to implement. We need an estimate of the inference cost per target language for a one-hour webinar.
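A back-of-the-envelope model for that estimate: GPU-hours scale with audio duration, number of target languages, and how much slower (or faster) than real time the model runs. All the example numbers below are hypothetical placeholders to be replaced with measured values:

```python
def webinar_cost(hours: float, n_languages: int,
                 gpu_usd_per_hour: float, realtime_factor: float) -> float:
    """Estimated inference cost in USD.

    realtime_factor: GPU-seconds needed per second of audio
    (0.5 means the model runs at twice real time).
    """
    gpu_hours = hours * n_languages * realtime_factor
    return gpu_hours * gpu_usd_per_hour

# e.g. a 1-hour webinar, 3 target languages, an assumed $2/GPU-hour rate,
# and a model assumed to run at twice real time:
cost = webinar_cost(hours=1, n_languages=3,
                    gpu_usd_per_hour=2.0, realtime_factor=0.5)
print(f"${cost:.2f}")  # → $3.00
```

Measuring the real-time factor per model and per language pair is the main unknown here.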
Seamless M4T is the best model to utilize for this use case.
It is possible that other S2S models would work as well as, or better than, Seamless M4T. We need to understand the tradeoffs between the candidate models.
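One way to keep that comparison honest is a small scaffold that records the same metrics for every candidate; fields start empty and are filled in from experiments. The candidate list and metric fields below are assumptions, not a final selection:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class S2SCandidate:
    name: str
    lag_seconds: Optional[float] = None        # measured end-to-end latency
    usd_per_audio_hour: Optional[float] = None # measured inference cost
    quality_notes: str = ""                    # subjective translation quality

candidates = [
    S2SCandidate("SeamlessM4T"),
    # add other models under evaluation here
]
print([c.name for c in candidates])  # → ['SeamlessM4T']
```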