kokoro-onnx text to speech with CUDA support using Podman

I have created a Podman container for Kokoro-onnx with CUDA support

Kokoro-onnx is a lightweight text-to-speech (TTS) engine

I have also knocked up a simple Python script that lets you specify a text file,
a voice to use and, optionally, an output file name

Features:

Supports multiple languages
Offers multiple voices
Lightweight: ~300MB

Kokoro-onnx tts


freebsd-cuda project


Kokoro-onnx section


I have created a demo of the speech, attached as demo.zip

It uses some of the text from the review of World of the Unknown: Ghosts: 1,
an Usborne book released in 1977, which you may have had as a kid if you are as old as I am


Here's the text I processed

First published in 1977, this cult classic has been reissued for a new generation of ghost-hunters.

This book is for anyone who has shivered at shadowy figures in the dark, heard strange sounds in the night, or felt the presence of a mysterious 'something' from the unknown.

Ghost stories are as old as recorded history and exist all over the world. Many of the different kinds of ghosts that are thought to haunt the Earth and their behaviour are described here.
You will meet haunting spirits, screaming skulls, phantom ships, demon dogs, white ladies, gallows ghosts and many more.

This book also explains the techniques and equipment of ghost hunting and tells how lots of 'ghosts' have been exposed as fakes or explained away as natural events. Also included are some theories that attempt to explain the possible existence of ghosts.

Because the Podman container uses CUDA, it took 6 seconds to turn the text into a 50 second audio file
on my 2019 Dell XPS 15 with an NVIDIA GeForce GTX 1650 GPU
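
To put those two numbers in context, here is a small worked calculation of the real-time factor (processing time divided by audio length). The function name is just for illustration, and the figures are the ones quoted above; nothing here was measured separately.

```python
def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the post: 6 seconds of processing for 50 seconds of audio.
rtf = realtime_factor(6, 50)
speedup = 1 / rtf
print(f"Real-time factor: {rtf:.2f} (about {speedup:.1f}x faster than real time)")
```

So on this hardware the text is synthesised roughly 8 times faster than it takes to listen to it.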

Run the kokoro script with the -h option to show the help

Code:
kokoro -h

Help output

Code:
Usage: kokoro -f <input_text_file> [OPTIONS]

Options:
  -f <file>    Path to the input text file to read from (Required)
  -o <file>    Path to the output wav file destination
               Default: input-filename.wav
  -v <voice>   Specify the speaker voice variant
               Default: bm_lewis
  -h           Display this help usage screen

Available Voices Reference:
  https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md

Run the kokoro script

Code:
kokoro -f input.txt

Use the -f option with a text file to process

Use the -v option to specify a voice, the available voices are listed here

https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md

If you omit the -v option the default voice bm_lewis will be used

Use the -o option to specify an output filename

If you omit the -o option the default output filename will be the name of the input file with a .wav extension
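
For anyone curious how those options and defaults fit together, here is a minimal sketch in Python using argparse. It is not the actual kokoro script: the function name parse_args is made up for the example, and I am assuming the default output name replaces the input file's extension with .wav (input.txt becomes input.wav).

```python
import argparse
from pathlib import Path

DEFAULT_VOICE = "bm_lewis"  # default voice named in the help output


def parse_args(argv):
    """Parse -f, -o and -v the way the help text describes them."""
    parser = argparse.ArgumentParser(prog="kokoro")
    parser.add_argument("-f", dest="input", type=Path, required=True,
                        help="path to the input text file (required)")
    parser.add_argument("-o", dest="output", type=Path, default=None,
                        help="path to the output wav file")
    parser.add_argument("-v", dest="voice", default=DEFAULT_VOICE,
                        help="speaker voice variant")
    args = parser.parse_args(argv)
    if args.output is None:
        # Assumption: swap the input extension for .wav when -o is omitted.
        args.output = args.input.with_suffix(".wav")
    return args


# Same as running: kokoro -f input.txt
args = parse_args(["-f", "input.txt"])
print(args.input, args.voice, args.output)
```

With only -f given, the voice falls back to bm_lewis and the output path is derived from the input name.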

Update:

I have added a new script called kokoro-stream, which plays the audio with ffplay inside the container
and streams it over the PulseAudio socket to FreeBSD instead of saving it as a wav

Open PulseAudio Volume Control on FreeBSD before starting the container

This will create /tmp/pulseaudio.socket and keep PulseAudio running
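
If you want to confirm the socket is there before starting the container, something like the following would do it. This is only a sketch: the helper name pulse_socket_ready is made up, and the only thing taken from the setup above is the /tmp/pulseaudio.socket path.

```python
import os
import stat

PULSE_SOCKET = "/tmp/pulseaudio.socket"  # socket path mentioned in the post


def pulse_socket_ready(path: str = PULSE_SOCKET) -> bool:
    """Return True if path exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file, permission problem, and so on: treat as not ready.
        return False
    return stat.S_ISSOCK(mode)


if pulse_socket_ready():
    print("PulseAudio socket found, OK to start the container")
else:
    print("No socket yet, open PulseAudio Volume Control first")
```

It only checks that a socket exists at that path, not that PulseAudio is actually answering on it.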

Run the kokoro-stream script with the -h option to show the help

Code:
kokoro-stream -h

Help output

Code:
Usage: kokoro -f <input_text_file> [OPTIONS]

Options:
  -f <file>    Path to the input text file to read from (Required)
  -v <voice>   Specify the speaker voice variant
               Default: bm_lewis
  -h           Display this help usage screen

Available Voices Reference:
  https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md

Run the kokoro-stream script

Code:
kokoro-stream -f input.txt

Use the -f option with a text file to process

Use the -v option to specify a voice, the available voices are listed here

https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md

If you omit the -v option the default voice bm_lewis will be used

The audio will be played with ffplay inside the container and streamed over the PulseAudio socket to FreeBSD
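
As a rough idea of what that playback step can look like, here is a sketch that pipes WAV data to ffplay on stdin and points the PulseAudio client at the socket with the PULSE_SERVER environment variable. This is not the real kokoro-stream script: the function names are made up, and I am assuming ffplay reads from stdin and that the socket sits at /tmp/pulseaudio.socket inside the container.

```python
import os
import subprocess

PULSE_SOCKET = "/tmp/pulseaudio.socket"  # assumed socket path in the container


def ffplay_invocation(socket_path: str = PULSE_SOCKET):
    """Build the ffplay command and environment without running anything."""
    # -nodisp: no video window, -autoexit: quit at end of input, "-": read stdin
    command = ["ffplay", "-nodisp", "-autoexit", "-loglevel", "error", "-i", "-"]
    # PULSE_SERVER tells PulseAudio clients which server socket to connect to.
    env = dict(os.environ, PULSE_SERVER=f"unix:{socket_path}")
    return command, env


def stream_wav(wav_bytes: bytes, socket_path: str = PULSE_SOCKET) -> None:
    """Play in-memory WAV data through ffplay (needs ffplay installed)."""
    command, env = ffplay_invocation(socket_path)
    subprocess.run(command, input=wav_bytes, env=env, check=True)


command, env = ffplay_invocation()
print(" ".join(command), "with PULSE_SERVER =", env["PULSE_SERVER"])
```

Building the command separately from running it makes it easy to print and check what would be executed before any audio is played.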
 
