Have a look at Home Assistant! It’s a great open-source smart home platform that recently released a fully local voice assistant (so requests aren’t processed in the cloud). It’s pretty neat!
Home Assistant is amazing, but it’s not yet an alternative to Alexa. The voice assistant is still in development and far from usable: it’s impossible for me to remember the specific wording Assist demands, and the voice-to-text is wrong something like nine times out of ten. That’s after giving up on the terrible locally hosted models and trying their cloud option, which is obviously a huge privacy hole, and even then it was slow and inaccurate. It’s a mystery to me how the FOSS community is so far behind on voice; Siri and Google Assistant started working offline years ago, and they run directly on a mobile device.
I have one big frustration with that: your voice input has to be understood PERFECTLY by the speech-to-text system.
If you have a “To Do” list and say “Add cooking to my To Do list”, it will do it! But if the STT system understood:
Todo
To-do
to do
ToDo
To-Do
…
The system will say it couldn’t find that list. The same goes for the names of your lights, asking for the time, and so on, and you have very little control over this.
HA Voice Assistant either needs to find a PERFECT match, or you need to be running a full-blown LLM as the backend, which honestly works even worse in many ways.
They recently added the option to use an LLM as a fallback only, but on most people’s hardware that means a big chunk of requests take a suuuuuuuper long time to get a response.
I do not understand why there’s no option to just fall back to the most similar command when the match is imperfect, using something like the Levenshtein distance.
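For illustration, here’s a rough sketch of what that fuzzy matching could look like. This is plain Python, not actual Home Assistant code; the list names, the distance threshold, and the function names are all made up for the example:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def closest_match(heard: str, known_names: list[str], max_distance: int = 3):
    """Return the known name most similar to what speech-to-text produced,
    or None if nothing is close enough to be a safe guess."""
    normalized = heard.strip().lower().replace("-", " ")
    best = min(known_names, key=lambda name: levenshtein(normalized, name.lower()))
    return best if levenshtein(normalized, best.lower()) <= max_distance else None


# Hypothetical example: the STT output "Todo" should still resolve to "To Do".
lists = ["To Do", "Shopping", "Movies to watch"]
print(closest_match("Todo", lists))       # -> "To Do"
print(closest_match("Groceries", lists))  # -> None (nothing similar enough)
```

The threshold is the tricky part: too loose and “turn off the kitchen light” might match the wrong entity, too strict and you’re back to exact matching.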
Because it takes time to implement. It will come.
I’ve seen something about this pop up occasionally on my feed, but it’s usually a conversation I’m nowhere close to understanding lol
Could you recommend any resources for a complete noob?