Offline Speech Recognition #2089 #2242

VladislavAntonyuk · 2024-10-01T11:14:19Z

Description of Change

Added two new methods for offline speech recognition. Removed the ListenAsync method, because with the current implementation it is impossible to stop listening correctly. I also added a new state that shows when listening is active but only silence is detected.

Linked Issues

Fixes Offline SpeechToText #2089, [BUG] SpeechToText throws Objective-C exception on iOS #1779, [BUG] SpeechToText on iOS 17 cuts off or fails to recognize many words #1966

PR Checklist

Has a linked Issue, and the Issue has been approved(bug) or Championed (feature/proposal)
Has tests (if omitted, state reason in description)
Has samples (if omitted, state reason in description)
Rebased on top of main at time of PR
Changes adhere to coding standard
Documentation created or updated: Update SpeechToText docs, Add OfflineSpeechToText MicrosoftDocs/CommunityToolkit#489

Additional information

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs

bijington

@VladislavAntonyuk thanks for this. I think it looks great! I only had a few comments to get your thoughts on some things.

Finally, this might be outside of the scope of this PR but I wanted to raise it because I think it should be in a follow-up PR; I would love the ability to chain the 2 implementations together when registering with DI, something like:

builder.Services.AddSpeechToText(SpeechToText.Default).WithFallback(OfflineSpeechToText.Default);

And in theory developers could chain the other way round:

builder.Services.AddSpeechToText(OfflineSpeechToText.Default).WithFallback(SpeechToText.Default);

What do you think of the above? We might have to wrap this in another class rather than complicate the flow of the 2 current implementations, so it is probably best left for a follow-up PR.

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/OfflineSpeechToText.shared.cs

...mmunityToolkit.Maui.Core/Essentials/SpeechToText/OfflineSpeechToTextImplementation.shared.cs

VladislavAntonyuk · 2024-10-14T14:01:09Z

Thank you @bijington.
If a user registers SpeechToText like this: builder.Services.AddSpeechToText(OfflineSpeechToText.Default).WithFallback(SpeechToText.Default);
they still need to somehow resolve the implementation.

Also, we need to provide the lifetime of the service. And what is under the hood of AddSpeechToText? We'll still have the same AddSingleton(OfflineSpeechToText.Default) call in that method.

bijington · 2024-10-15T18:14:41Z

Yes, I agree: the developer will need to define the lifetime of the service, which increases the complexity.

Perhaps we could move the WithFallback method onto the ISpeechToText interface instead, then the developer could write something like:

builder.Services.AddSingleton(OfflineSpeechToText.Default.WithFallback(SpeechToText.Default));

Then WithFallback could look something like:

public static class ISpeechToTextExtensions
{
    // Extension methods must be static and live in a static class.
    public static ISpeechToText WithFallback(this ISpeechToText primary, ISpeechToText secondary)
    {
        return new PriorityBasedSpeechToText(primary, secondary);
    }
}

// Sketch only: the remaining ISpeechToText members are omitted here.
internal class PriorityBasedSpeechToText : ISpeechToText
{
    readonly ISpeechToText primaryService;
    readonly ISpeechToText secondaryService;

    public PriorityBasedSpeechToText(ISpeechToText primary, ISpeechToText secondary)
    {
        primaryService = primary;
        secondaryService = secondary;
    }

    public async Task<SpeechToTextRecognitionResult> StartListenAsync()
    {
        // Attempt the primary service; if it throws, fall back to the secondary.
        try
        {
            return await primaryService.StartListenAsync();
        }
        catch (Exception)
        {
            return await secondaryService.StartListenAsync();
        }
    }
}

It could well become more complicated with things like permissions, so we may have to request all permissions for both primary and secondary.

What do you think?
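
To make the fallback idea above concrete, here is a minimal, self-contained sketch. It does not use the toolkit's real types: IRecognizer, FallbackRecognizer, FailingRecognizer, and FixedRecognizer are hypothetical stand-ins invented for illustration, and the sketch assumes a failure surfaces as an exception from the primary service. Permissions, events, and state handling are left out entirely.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, trimmed-down stand-in for ISpeechToText (not the toolkit's interface).
public interface IRecognizer
{
    Task<string> ListenAsync(CancellationToken token);
}

// Tries the primary recognizer first and uses the secondary only if the primary fails.
public sealed class FallbackRecognizer : IRecognizer
{
    readonly IRecognizer primary;
    readonly IRecognizer secondary;

    public FallbackRecognizer(IRecognizer primary, IRecognizer secondary)
    {
        this.primary = primary ?? throw new ArgumentNullException(nameof(primary));
        this.secondary = secondary ?? throw new ArgumentNullException(nameof(secondary));
    }

    public async Task<string> ListenAsync(CancellationToken token)
    {
        try
        {
            return await primary.ListenAsync(token);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Cancellation is the caller's decision, so only real failures trigger the fallback.
            return await secondary.ListenAsync(token);
        }
    }
}

public static class RecognizerExtensions
{
    // Mirrors the proposed WithFallback shape on the hypothetical interface.
    public static IRecognizer WithFallback(this IRecognizer primary, IRecognizer secondary)
        => new FallbackRecognizer(primary, secondary);
}

// Test double that always fails, simulating an unavailable online service.
public sealed class FailingRecognizer : IRecognizer
{
    public Task<string> ListenAsync(CancellationToken token)
        => Task.FromException<string>(new InvalidOperationException("no network"));
}

// Test double that always returns the same text.
public sealed class FixedRecognizer : IRecognizer
{
    readonly string text;

    public FixedRecognizer(string text) => this.text = text;

    public Task<string> ListenAsync(CancellationToken token) => Task.FromResult(text);
}
```

With these stand-ins, awaiting new FailingRecognizer().WithFallback(new FixedRecognizer("hello")).ListenAsync(CancellationToken.None) yields "hello", because the primary's exception is caught and the secondary is used instead.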

VladislavAntonyuk · 2024-10-15T19:17:30Z

I see pros and cons to such an approach.
For developers, registering the service becomes much easier, but the strategy for choosing the right implementation may be complicated (the user's preferences may forbid online recognition, the internet connection may be unstable, etc.).
Also, since WithFallback still receives the interface as a parameter, developers may write code like this:
services.AddSpeechToText(SpeechToText.Default).WithFallback(SpeechToText.Default);
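
One way a wrapper could reject the registration above is a reference check at construction time. The sketch below is only an illustration: IRecognizer, DummyRecognizer, and FallbackGuard are hypothetical names, and it assumes that a Default-style property returns the same instance on every access. It catches the same instance being passed twice, but not two separate instances of the same implementation.

```csharp
using System;

// Hypothetical stand-in for ISpeechToText (not the toolkit's interface).
public interface IRecognizer { }

// Hypothetical implementation used only to exercise the guard.
public sealed class DummyRecognizer : IRecognizer { }

public static class FallbackGuard
{
    // Throws when the same instance is supplied as both primary and fallback.
    public static void EnsureDistinct(IRecognizer primary, IRecognizer secondary)
    {
        if (primary is null) throw new ArgumentNullException(nameof(primary));
        if (secondary is null) throw new ArgumentNullException(nameof(secondary));

        if (ReferenceEquals(primary, secondary))
        {
            throw new ArgumentException(
                "The fallback must be a different instance from the primary.",
                nameof(secondary));
        }
    }
}
```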

Technically, we could hide the online/offline recognition inside the implementation and keep a single service. We had such an implementation for Windows in our initial release.

The main idea of Offline Speech To Text is to allow developers to explicitly specify the required implementation.
Also, I don't want them to inject 2 separate interfaces into a service (MyService(ISpeechToText s1, IOfflineSpeechToText s2)), because their simultaneous usage may be unpredictable.

We can open a discussion for next month.

bijington

@VladislavAntonyuk thanks for this! I have a few comments and I think they are mostly pretty minor

samples/CommunityToolkit.Maui.Sample/ViewModels/Essentials/OfflineSpeechToTextViewModel.cs

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs

bijington

Thanks @VladislavAntonyuk, I have added some XML docs improvements, but the rest looks good to me.

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs

samples/CommunityToolkit.Maui.Sample/ViewModels/Essentials/OfflineSpeechToTextViewModel.cs

Co-authored-by: Shaun Lawrence <[email protected]>

bijington

LGTM! Do we just need to write the docs? Is this something you are happy to do? If not I'm happy to write something

VladislavAntonyuk · 2024-10-31T21:11:23Z

Thank you Shaun! I will create docs PR this weekend.

pictos

The sample app is crashing for me. Here's the log that I could get.

I ran it on an iPhone 14 Pro and allowed all permissions.

VladislavAntonyuk · 2024-11-01T05:01:38Z

The crash comes from text to speech. I will remove it from the offline sample.

pictos · 2024-11-04T02:55:20Z

I will try again this week, as soon as possible.

Offline Speech Recognition #2089

2d8bb93

VladislavAntonyuk self-assigned this Oct 1, 2024

Merge branch 'main' into 2089-offline-speech-recognition

67f44a3

VladislavAntonyuk added the breaking change and area/essentials labels Oct 1, 2024

VladislavAntonyuk requested a review from brminnick October 1, 2024 11:16

bijington reviewed Oct 1, 2024

src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/ISpeechToText.shared.cs

VladislavAntonyuk added the needs discussion label Oct 1, 2024

brminnick added the hacktoberfest-accepted label Oct 3, 2024

Offline Speech Recognition #2089 (#2258)

9b7e48d

VladislavAntonyuk requested a review from bijington October 5, 2024 14:35

VladislavAntonyuk removed the needs discussion label Oct 5, 2024

VladislavAntonyuk and others added 2 commits October 7, 2024 11:42

Fix build

07c4ac8

Merge branch 'main' into 2089-offline-speech-recognition

f3c1497

bijington reviewed Oct 14, 2024

VladislavAntonyuk and others added 2 commits October 14, 2024 16:53

Update according to comments

6c52500

Merge branch 'main' into 2089-offline-speech-recognition

27bf6ae

Fix tizen

e8a28b8

VladislavAntonyuk added 3 commits October 19, 2024 14:17

Merge branch 'main' into 2089-offline-speech-recognition

02d322a

Discard changes to samples/CommunityToolkit.Maui.Sample/CommunityTool…

14facc0

…kit.Maui.Sample.csproj

Discard changes to global.json

4e8b436

VladislavAntonyuk requested a review from bijington October 19, 2024 11:19

Merge branch 'main' into 2089-offline-speech-recognition

285e477

bijington requested changes Oct 24, 2024

VladislavAntonyuk added 3 commits October 25, 2024 11:56

Remove Task

eddfc71

Merge remote-tracking branch 'origin/main' into 2089-offline-speech-r…

79345ae

…ecognition

Merge branch '2089-offline-speech-recognition' of https://github.com/…

3f8b96f

…CommunityToolkit/Maui into 2089-offline-speech-recognition # Conflicts: # src/CommunityToolkit.Maui.Core/Essentials/SpeechToText/OfflineSpeechToTextImplementation.tizen.cs

Fix tizen

67894fc

VladislavAntonyuk force-pushed the 2089-offline-speech-recognition branch from 3892a6e to 67894fc Compare October 25, 2024 09:51

VladislavAntonyuk requested a review from bijington October 25, 2024 12:37

bijington reviewed Oct 26, 2024

VladislavAntonyuk and others added 3 commits October 27, 2024 11:42

Update ISpeechToText.shared.cs

b69b054

Co-authored-by: Shaun Lawrence <[email protected]>

Update ISpeechToText.shared.cs

e995e8a

Co-authored-by: Shaun Lawrence <[email protected]>

Update samples/CommunityToolkit.Maui.Sample/ViewModels/Essentials/Off…

833c9c7

…lineSpeechToTextViewModel.cs

VladislavAntonyuk requested a review from bijington October 27, 2024 23:28

Fix xml comment

fd3e1fb

bijington previously approved these changes Oct 31, 2024

pictos requested changes Oct 31, 2024

VladislavAntonyuk dismissed bijington’s stale review via 07ba0b4 November 2, 2024 11:23

Update sample

7ddd7fe

VladislavAntonyuk force-pushed the 2089-offline-speech-recognition branch from 07ba0b4 to 7ddd7fe Compare November 2, 2024 11:25

VladislavAntonyuk requested a review from pictos November 3, 2024 17:48
