Simple Voice Assist for Linux in Python Using PVRhino

This voice assist class should just work and is really simple to understand and set up. It runs locally and is mostly free. I have seen no impact on my system resources while running it, but your mileage may vary. It uses PVRhino from the PicoVoice folks. You can set up an account, get your key, then quickly train and download your model. Once you have all that plugged into the code, your computer will understand the voice commands you trained.

Note: It is very likely that this code will also run on Windows and macOS.

Speech

The other piece is the computer talking back to you. I am using Piper, so you will need to download the voice models, which are there if you dig around, and plug that path into the code as well.

Putting it together

Once you have your model trained and the code pointing to the Rhino and Piper models, all that is left is to configure your speech and execute configs, which are simple Python dictionaries. In the speech config, the intent is the key and the spoken response text is the value. In the execute config, the intent is the key and the command to execute is the value. The execute side is not fully built out in this code; I leave it to you to decide what you want to do.
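As a minimal sketch of that idea (using hypothetical intents; the intent names must match whatever you trained into your Rhino context), the two configs might look like this:

```python
# Hypothetical example configs. Keys are intents from your trained
# Rhino context; values are the spoken reply and the command to run.
response_list = {
    "terminal": "Opening terminal.",
    "mail": "Opening mail.",
}
exec_list = {
    "terminal": "kitty",
    "mail": "thunderbird",
}

def response_for(intent):
    # Look up the spoken reply, with a fallback for untrained intents.
    return response_list.get(intent.lower(), "Sorry, I do not understand.")

print(response_for("Terminal"))  # Opening terminal.
```

The full class below wires the same two dictionaries into `speak()` and `exec_intent()`.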

Virtual Environment

Before you run this you will need a Python virtual environment with the libraries installed. Here are the steps after installing uv from Astral. This is also documented in the code. Replace Lilith with whatever you want, or ignore the wake word entirely.

* In the terminal, create a project: uv init Lilith
* cd into Lilith
* Create the virtual environment: uv venv
* Activate it (Fish shell): source .venv/bin/activate.fish
   ** Leave .fish off for Bash.
* Install the libraries:
uv add pyaudio
uv add pvrhino
uv add piper-tts
uv add halo

Copy this code to main.py in the project source. While still in the active virtual environment, run:
python main.py

Disclosure

I am not including the models for two reasons: I don’t know what you want to do, and I am not sure PicoVoice wants people handing out their models. They may not care, but I did not research it, so better safe than sorry. The Piper models are easily available at the link above. You may not want the voice I chose anyway.

I am not licensing this, open or otherwise. I give it as an example, with no warranty expressed or implied and without any guarantees, to freely do with what you will. It is a starting point.

Future

In the future I may add speech to text, noise cancelling, or a nice fancy interface to sit in the dock. At this point I am sharing it to help people get started who may be interested in voice assist but are unsure of how to begin. This solution worked best for me, and I wrote it for fun. I honestly doubt I will use it in any serious way, if at all. I suppose it is cool, though.

Alternative

I did try OpenWakeWord, so there is at least that alternative, and there is a project called EasySpeak-Linux if you wish to explore that author's solution.

Code

#!/usr/bin/env python
import pvrhino
import pyaudio
from halo import Halo
from piper import PiperVoice, SynthesisConfig

import struct
import os

"""
Lilith Voice Assist for Linux.
Mohawke <mohawke@darkartistry.com> March 5, 2026

The Rhino model can be trained at PicoVoice website.
This code requires an account plus a key from them.
The key can be used for pvporcupine, rhino, etc...

I do not provide the models.

I created an intent as the wake word. In this case I
used Lilith.

This code is based on code provided by the following
sources. This code is not licensed and can be freely
used as a base for a bigger program or to learn. I
do not care...

ref: EasySpeak-Linux (openwakeword project.)
ref: PVPorcupine/PVRhino (The examples there.)

Change model paths below in this code.

I build my virtual environments using uv Astral.
https://docs.astral.sh/uv/

** Create a project: uv init Lilith
cd into Lilith
** Create virtual: uv venv
** Activate (Fish shell): source .venv/bin/activate.fish
 leave .fish off for Bash.
** Install libs:
uv add pyaudio
uv add pvrhino
uv add piper-tts
uv add halo

ToDo ::
Add Synthesis Conf.
   Might change the voice from Amy as well.
Add Text to Speech & possibly speech to text.
Add background noise damping.
Move command and voice configs to external files.
"""

# Be sure to replace these paths and filenames.
PIPER_MODEL_PATH = os.path.expanduser("~/Documents/Python/Lilith/voices/en_US-amy-medium.onnx")
RHINO_MODEL_PATH = os.path.expanduser("~/Documents/Python/Lilith/models/intent/YOUR TRAINED MODEL.rhn")
RHINO_KEY = "YOUR PICOVOICE KEY" 

class Lilith:
    def __init__(self):
        self.PIPER_MODEL  = PIPER_MODEL_PATH
        self.RHINO_MODEL  = RHINO_MODEL_PATH
        self.RHINO_KEY    = RHINO_KEY
        self.Piper        = None
        self.Rhino        = None
        self.SAMPLE_BIT_DEPTH = pyaudio.paInt16  # 16 bits per sample
        self.NUM_CHANNELS = 1 # mono
        #self.CHUNK_SIZE   = 4096
        self.Input_Audio  = None
        self.Output_Audio = None
        self.pyaudio_output_instance = None
        self.pyaudio_input_instance = None
        # The following two vars could point to an external config or an imported dict.
        #  Example of how this is intended to work: create your intent model
        #  and align it with these speech and execute configs.
        self.response_list = {
            "lilth": "Yes, Mohawke?",
            "terminal": "Opening terminal.",
            "open": "Open what, dumb ass?" # Can use slots here to include app names.
        }
        self.exec_list = {
            "terminal": "kitty", # Might want to set $TERMINAL # os.system($TERMINAL)
            "mail": "thunderbird",
            "music": "kew",
            "casts": "castero",
            "news": "bulletty"
        }

    def initPiper(self):
        '''
        Initialize the Piper voice class with local model.
        Download the voice you like from Piper.
        '''
        self.Piper = PiperVoice.load(self.PIPER_MODEL)

    def initRhino(self):
        '''
        Initialize rhino class with local model.
        You can use pvporcupine generic models or
        train your own. If you change wake word or 
        commands you will need to train the new model,
        download it, and change this code to use and 
        understand your intents.
        '''
        self.Rhino = pvrhino.create(
            access_key=self.RHINO_KEY,
            context_path=self.RHINO_MODEL
        )

    def initOutput_Audio(self):
        '''
        Init and open audio for output. Only open during responses.
        '''
        self.pyaudio_output_instance = pyaudio.PyAudio()
        self.Output_Audio = self.pyaudio_output_instance.open(
                                format=self.SAMPLE_BIT_DEPTH,
                                channels=self.NUM_CHANNELS,
                                rate=self.Piper.config.sample_rate,
                                output=True
                            )

    def initInput_Audio(self):
        '''
        Init and open audio to receive commands. Kept alive while running.
        '''

        # Set up microphone stream
        self.pyaudio_input_instance = pyaudio.PyAudio()
        self.Input_Audio = self.pyaudio_input_instance.open(
            rate=self.Rhino.sample_rate,
            channels=1,
            format=pyaudio.paInt16,
            input=True,
            frames_per_buffer=self.Rhino.frame_length
        )

    def command_text(self, command=""):
        '''
        Get text to speak from config.
        Extend as needed.
        '''
        if command.lower() in self.response_list:
            speak_text = self.response_list[command.lower()]
        else:
            speak_text = "Sorry, I do not understand. Did you teach me that command?"
        return speak_text

    def exec_intent(self, command):
        '''
        Interact with the desktop. Execute commands listed in config.
        This is an example, and for me only opens the terminal.
        '''
        if command.lower() in self.exec_list:
            os.system(self.exec_list[command.lower().strip()])

    def speak(self, command):
        '''
        Speak responses to your intents if you wish, or simply
        execute commands.
        '''
        # Get relevant response text from Rhino context.
        response_text = self.command_text(command)

        # Init and open audio output.
        self.initOutput_Audio()

        # Output verbal response.
        # Note: synthesize() does not accept keyword args directly;
        # pass a SynthesisConfig instead (see ToDo above).
        for chunk in self.Piper.synthesize(response_text):
            self.Output_Audio.write(chunk.audio_int16_bytes)

        # Close output resources so each response gets a fresh stream.
        self.Output_Audio.close()
        self.pyaudio_output_instance.terminate()
        self.Output_Audio = None
        self.pyaudio_output_instance = None
    
    def listen(self):
        '''
        Open mic and listen for commands. 
        '''

        # Init MODEL wrappers.
        self.initRhino()
        self.initPiper()

        # Init microphone for voice commands.
        self.initInput_Audio()

        # Keep listening until program exit.
        try: 
            intent = ''
            while True:
                pcm = self.Input_Audio.read(self.Rhino.frame_length, exception_on_overflow=False)
                pcm_unpacked = struct.unpack_from("h" * self.Rhino.frame_length, pcm)

                is_finalized = self.Rhino.process(pcm_unpacked)

                if is_finalized:
                    # get inference if is_finalized is true
                    with Halo(text=intent, spinner='dots'):
                        inference = self.Rhino.get_inference()
                        if inference.is_understood:

                            # Use intent and slots if inference was understood.
                            # When training your model, be sure you include your wake word
                            # as an intent. In this case it is Lilith.
                            intent = inference.intent
                            
                            # Slots are complements to intent, like vars:
                            #   Intent = ${slot_name} coffee
                            # Slots: make, brew
                            # command can then be make coffee or brew coffee...
                            slots = inference.slots

                            # Macros assist in more complicated commands.

                            #print(intent)
                            self.speak(intent) # speak relevant responses based on intent.

                            # Execute command if found.
                            self.exec_intent(intent)

        except KeyboardInterrupt:
            print("\nExiting Lilith... Goodbye.")
        finally:
            # Cleanup. The attribute checks replace the original locals()
            # tests, which never matched instance attributes.
            try:
                if self.Output_Audio is not None:
                    self.Output_Audio.close()
            except OSError:
                pass  # stream already closed after the last response
            if self.pyaudio_output_instance is not None:
                self.pyaudio_output_instance.terminate()

            if self.Input_Audio is not None:
                self.Input_Audio.close()
            if self.pyaudio_input_instance is not None:
                self.pyaudio_input_instance.terminate()

            self.Rhino.delete()

if __name__ == "__main__":
    dear_lilith = Lilith()
    dear_lilith.listen()
