Publish as Home Assistant addon #76

Open
the-mentor opened this issue May 31, 2023 · 14 comments

Comments

@the-mentor

The title says it all: having WIS as a Home Assistant add-on will lower the barrier to entry for many people.

This project is amazing, keep up the amazing work!

@kristiankielhofner
Contributor

Thanks!

We're actually evaluating our general Home Assistant add-on/component approach. We would definitely like to get Willow/WIS more well integrated with HA, we just want to solidify some things on our side first.

@lundyfpv

lundyfpv commented Jun 6, 2023

I'm 100% on board with this. I have my Tesla P4 just waiting to be dropped into my HA server.

@kristiankielhofner
Contributor

The Home Assistant component has nothing to do with where and how WIS runs. Your Tesla P4 won't be utilized by Home Assistant with or without a Willow Home Assistant component.

The Willow Home Assistant component will (essentially) be a thin network proxy layer that hooks into Home Assistant to provide tighter integration with Home Assistant. It has nothing to do with inference or anything WIS does.

If you have a Tesla P4 it can be utilized by WIS today.

@lordratner

The Home Assistant component has nothing to do with where and how WIS runs. Your Tesla P4 won't be utilized by Home Assistant with or without a Willow Home Assistant component.

The Willow Home Assistant component will (essentially) be a thin network proxy layer that hooks into Home Assistant to provide tighter integration with Home Assistant. It has nothing to do with inference or anything WIS does.

If you have a Tesla P4 it can be utilized by WIS today.

I think what he meant was "add-on" rather than "integration" or component, which in HAOS is a Home Assistant Supervisor-operated container with a service in it. As an example, you can use the Mosquitto MQTT broker addon which starts a container with Mosquitto, but monitored, updated, and accessed through Home Assistant.

This is probably impractical for WIS given its resource requirements (and the GPU option, which I don't think Home Assistant covers) and how many people are running HAOS on a Raspberry Pi.

@lundyfpv

lundyfpv commented Jun 6, 2023

Yep, add-on/container is what we mean. Whether HA allows GPU passthrough for add-ons is something I don't know the answer to, though.

@kristiankielhofner
Contributor

We're learning that the Home Assistant community deploys Home Assistant via a staggering variety of means - HAOS, Docker containers, layers of VMs/LXC, directly on metal, etc. The goal of a Willow Home Assistant component would be to allow HA to utilize a separate WIS instance (hosted wherever) for HA STT, TTS, LLM, etc. support, as well as making all of the HA interaction smoother and more tightly integrated - avoiding things like a WebSocket connection to HA plus separate HTTP/HTTPS connections to WIS. A Willow HA component would allow you to configure many aspects of the Willow experience and provide them via a single WebSocket connection from Willow to HA, which would provide a much better and faster experience while consuming fewer hardware resources.

I'd have to think more about an HAOS container, but in the end the issue is going to be the fact that HAOS is primarily targeted at the Raspberry Pi, which is fundamentally incapable of delivering the kind of experience we want with Willow. People look for all kinds of things from Willow but I still (personally) believe a voice assistant that takes at least several seconds to do anything (with poor quality) is fundamentally unacceptable and not something we want to target. Last I looked HAOS also has generic Docker management components that allow you to run any Docker container on HAOS, but then the next issue (as noted) would be supporting GPU passthrough...

If you look at our comparison benchmarks you will see that our default model (medium) takes 51 seconds to do speech recognition on 3.8 seconds of speech on a Raspberry Pi. A Tesla P4 does it in 586 ms - 87x (almost two orders of magnitude) faster.
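The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in this comment (3.8 s of speech, 51 s on a Raspberry Pi, 586 ms on a Tesla P4). The function and constant names are illustrative, and the "real-time factor" framing is an added convention, not something taken from the benchmarks themselves.

```python
# Numbers quoted in the comment above; all names here are illustrative only.
AUDIO_SECONDS = 3.8   # length of the speech sample
PI_SECONDS = 51.0     # Raspberry Pi, medium model
P4_SECONDS = 0.586    # Tesla P4, medium model


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast device is."""
    return slow_seconds / fast_seconds


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    print(f"P4 speedup over Pi: {speedup(PI_SECONDS, P4_SECONDS):.0f}x")
    print(f"Pi real-time factor: {real_time_factor(PI_SECONDS, AUDIO_SECONDS):.1f}")
    print(f"P4 real-time factor: {real_time_factor(P4_SECONDS, AUDIO_SECONDS):.2f}")
```

On these numbers the Pi needs roughly 13 seconds of compute per second of speech, while the P4 needs about 0.15.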

We also continue to have the goal of Willow compatibility with other platforms. It is already in use with openHAB and the REST API endpoints. I love Home Assistant, and I use it personally, but taking any steps to inextricably link Willow and/or any Willow functionality to Home Assistant is not something we will directly implement. We also already have functionality (dynamic language detection, speaker authentication/verification, etc) that Home Assistant currently has no concept of. To achieve our goal of providing the best voice interface in the world we cannot be limited by Home Assistant.

That said, we also won't take any steps to prevent the development of anything you're describing with Home Assistant and we'd even extend Willow, WIS, etc in any ways it would need to be extended to support whatever would be required. This is an area where we would love to see community collaboration - either a HACS component or direct in-tree support with Home Assistant because we don't currently have the development resources for such an undertaking.

@lordratner

I think direct integration with HA following the Add-on route is a complete non-starter, for the reasons discussed above.

However HA has been making huge changes with voice this year, so I don't think an add-on is the right way to look at this anyways. They are now using "pipelines" I believe, which is just a fancy way of saying you set up a voice assistant using three components: a conversation agent, STT, and TTS. You can pick from any available agent for each job.

[Screenshot: pipeline]

In this case, Piper and Whisper are add-ons running on the HAOS hardware, but that is not necessary. It's just the easy-button answer for most of their users, again, running on Raspberry Pis. Whisper has addon configuration settings for what models to use and beam size, with only one model working on a Raspberry Pi.
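The "three components" idea described above can be sketched as a small data structure. This is not Home Assistant's actual API: every class and method name below is hypothetical, and the stand-in stages exist only so the sketch runs without a real speech engine. It only illustrates that each stage is swappable as long as it honours the same interface.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class ConversationAgent(Protocol):
    def respond(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Pipeline:
    """Hypothetical three-stage assistant: STT -> conversation agent -> TTS."""
    stt: SpeechToText
    agent: ConversationAgent
    tts: TextToSpeech

    def run(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        reply = self.agent.respond(text)
        return self.tts.synthesize(reply)


# Stand-in stages (assumptions): they treat the audio bytes as UTF-8 text.
class FakeStt:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoAgent:
    def respond(self, text: str) -> str:
        return f"You said: {text}"


class FakeTts:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = Pipeline(stt=FakeStt(), agent=EchoAgent(), tts=FakeTts())
    print(pipeline.run(b"turn on the lights"))
```

Swapping one stage for another, for example a local engine for a remote one, would then mean passing a different object for `stt` or `tts` without touching the rest.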

These add-ons then use the "Wyoming Protocol" integration to tie the two together. This is where WIS would integrate into HAOS, I think. You just enter the IP and port of the service, and HA makes it available to add to an assistant pipeline.

The question then becomes, are WIS and the Wyoming Protocol compatible?

@kristiankielhofner
Contributor

kristiankielhofner commented Jun 7, 2023

I think we're talking past each other - everything in your screenshot is fundamentally an HA integration component that talks to some external resource:

  • Nabu Casa cloud for their Azure text to speech.
  • Faster-whisper is a connection to their Whisper container, which again lives separately.
  • Same with Piper.
  • Same with the OpenAI components for supported functionality.

A WIS HA integration component would expose the capabilities of WIS to HA as these components do today. WIS for speech to text, text to speech, etc would just become another option in the drop downs of the screenshot you provided.

These various add-ons do not solely use the Wyoming Protocol, in fact they overwhelmingly do not.

WIS and Willow will not be directly implementing the Wyoming Protocol for various reasons:

  1. While it is "open" it's also essentially proprietary to the Home Assistant ecosystem. The state of a protocol being "open" comes into significant question when it's not implemented by anything else outside of the ecosystem of the creators.

  2. It's fundamentally broken. The entire concept of MQTT for voice transport is problematic for a variety of inherent reasons. The creators of Wyoming would be well served to look at other mature technology areas that do media transport - media streaming, voice/video over IP, etc. Entire industries across use cases have learned many hard lessons from prior failed approaches like Wyoming but the Home Assistant ecosystem seems dedicated to repeating all of these mistakes (and more) from decades ago, while somehow thinking their results are going to be different. There isn't a single implementation outside of the smart home ecosystem using MQTT or anything like it for media transport - and for many good reasons.

  3. It's new and unproven. Combined with my position above, I think it will be very difficult (if not impossible) for Wyoming based approaches (as well as others taken within the HA ecosystem) to deliver anything resembling the level of experience Willow (again, at two months old) does today. Let alone the extremely rapid progress we're making on things so far beyond what HA is capable of today they're not even on the radar. There are many significantly better existing protocols and approaches that have matured over decades (like the ones Willow uses).

I've noted before we don't see there being any reason Willow has to be in any kind of conflict with HA, the community, the team, or the ecosystem but I find it curious to repeatedly have these conversations when Willow at two months old (based on my decades of experience in these areas and more) already provides a quality of experience that (when it really comes down to it) is leagues beyond what the rest of the open source ecosystem is doing in this area. There are very good reasons for that, and these fundamental issues and more contribute to (or are the direct cause of) this significant gap in capability and user experience.

@lordratner

lordratner commented Jun 7, 2023

I think we're talking past each other - everything in your screenshot is fundamentally an HA integration component that talks to some external resource:

Yeah, definitely, that was my point. In the case of the screenshot, Whisper and Piper are running as container add-ons in HAOS, but WIS would certainly have to run externally. It would just be selected in the pipeline interface I posted the screenshot of. It looks like instead of creating integrations for Piper and Whisper, they are just using the Wyoming protocol integration.

No clue about Wyoming, but the info you shared is fascinating. I just noticed that HA was using it. It sounds like WIS will probably just use a custom integration. Put in the IP, port, and maybe something for authentication (no clue or input, I'm sure you have that figured). Boom. Now it's available for use.
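As a rough illustration of "put in the IP, port, and maybe something for authentication", the settings such an integration might collect could look like the sketch below. Everything here is an assumption: the class name, the fields, the TLS default, and the bearer-token header are placeholders, not anything WIS or Home Assistant defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InferenceServerConfig:
    """Hypothetical settings a custom integration might ask the user for."""
    host: str
    port: int
    use_tls: bool = True
    api_token: Optional[str] = None  # assumption: token-style authentication

    def __post_init__(self) -> None:
        # Validate early so a bad entry fails at setup time, not on first use.
        if not self.host:
            raise ValueError("host must not be empty")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")

    @property
    def base_url(self) -> str:
        scheme = "https" if self.use_tls else "http"
        return f"{scheme}://{self.host}:{self.port}"

    def auth_headers(self) -> dict:
        # Bearer-token header is a placeholder; the real scheme is undecided.
        if self.api_token is None:
            return {}
        return {"Authorization": f"Bearer {self.api_token}"}


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address; the port is arbitrary.
    config = InferenceServerConfig(host="192.0.2.10", port=8443, api_token="example")
    print(config.base_url)
    print(config.auth_headers())
```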

I think part of the reason we're talking past each other is because the original poster was also fairly unclear about add-ons and what capabilities they are limited to.

I think you're going to be dealing with the mild frustration of home assistant users like myself for some time, but I also think that they represent a large and growing pool of potential users, since what you are doing with WIS is going to naturally pull in all of the home automators with self-hosting aspirations. Again, like myself.

While I think it would be a tragedy for WIS to be in any way limited to or hampered by Home Assistant, I do think getting it to work easily might be a path to a larger support base for what you're putting together.

@the-mentor
Author

Home Assistant add-ons are just a simple way to run pre-defined container images for things like WIS. They also allow the use of hardware like SkyConnect.
An add-on would make it very simple for users to get started and self-host WIS.

Also, if WIS could use a Coral AI USB accelerator rather than a GPU, it would be even easier.

Just some ideas. Whatever the project goes with works for me since I know how to run Docker; I just think it would be great for the ecosystem.

Thanks

@kristiankielhofner
Contributor

@lordratner - Exactly. If you look at the HA components for all of the other voice integrations they essentially say "Point at a server, provide some details, we abstract the rest". The Wyoming situation is unfortunate and I hope it improves (they made changes with V3 but I'd argue it's just as problematic). One of the issues we're having is Willow being extremely new. Rhasspy (as an example) is four years old... When I first posted the EARLY EARLY preview announcement of Willow to Hacker News it spread very far and wide to audiences I didn't expect and we weren't prepared to deal with. As anyone can tell just from looking at the "install" instructions, Willow and WIS are not intended for casual HA users and we're reminded of that on a daily basis. That said, we're happy to work with the unintended/unexpected broader user base and we've re-prioritized development to better serve these users - much earlier than expected. My thinking and expectation with the initial release was for a couple of dozen developers to work through the early pains of Willow. That clearly hasn't been the case (for good, bad, and ugly).

@the-mentor Again, on point. In the end all of this stuff (WIS, Piper, etc.) is just Docker containers. The issue of running WIS on HAOS (as bad of an idea as that may be) really just comes down to documentation. Support for Coral is another recurring question. In short, it's fundamentally impossible (tiny memory, TFLite only) to do broad speech recognition or speech generation on a Coral. The on-device ESP BOX pre-defined command detection already supports at least 2x as many pre-defined commands as Coral (400 vs ~150).

@tensiondriven

+1 for any Home Assistant integration being a lightweight wrapper/API layer available via HACS. Eventually someone may roll Willow into an HA add-on, but it doesn't make sense to me to try to support it as an add-on.

To enable someone from the community to do this, perhaps providing a good REST API with webhooks and decent documentation is all it would take. (I'd use such a REST API + Webhook for my own project, actually.)

@ccgauvin94

ccgauvin94 commented Jul 25, 2023

Seeing what HA is doing now with an on-device Assistant option (to replace Google Assistant on Android, for instance), I can only dream of what it'd be like to have that routing to a Willow Inference Server running Vicuna (or perhaps Llama 2).

There doesn't need to be a great HA addon, just a way to route HA's TTS and STT through WIS would probably be enough, at least for most use cases. Essentially, I think, that means just making them available as providers. Not sure what that entails, but I am sure it's probably more complicated than "just". But the advantage is once it's in HA, then we can let HA worry about the integrations with the massive library of tech integrations they have. Imagine "Hi Willow, can you read me the IEEE standard for whatever thing that's in my Documents/Standards folder on my Nextcloud?" and it just reads it back?

This is an incredibly impressive project. My mind is blown. I haven't felt like this about computers and technology in such a long time; I feel like a kid again, imagining all the possibilities of what this could do. Congratulations on the unbelievable release. I'm seeing what I'm saying pop up on my own server with ~200 ms latency (3060 12 GB). It's literally indiscernible: I have both logs open and I can't move my eyes between windows fast enough to see the text roll in on the screen.

Tech is such a slog for me lately, but this is so refreshing I honestly am blown away.

@tensiondriven

HA just added the ability for "services" to return values in the latest release. I can only imagine this was done to enable language models to be called with a given set of args and to return something - a text string, JSON, a function call, etc.

Exciting times.

6 participants