I have created a Podman container for Kokoro-onnx with CUDA support
Kokoro-onnx is a lightweight text-to-speech (TTS) engine
I have also knocked up a simple Python script that lets you specify a text file,
the voice to use and optionally an output file name
Features:
Supports multiple languages
Offers multiple voices
Lightweight: ~300MB
Kokoro-onnx tts
GitHub - thewh1teagle/kokoro-onnx: TTS with kokoro and onnx runtime
freebsd-cuda project
GitHub - NapoleonWils0n/freebsd-cuda: Run Linux command line and gui applications on FreeBSD with full Nvidia Cuda hardware acceleration.
Kokoro-onnx section
GitHub - NapoleonWils0n/freebsd-cuda: Run Linux command line and gui applications on FreeBSD with full Nvidia Cuda hardware acceleration.
I have attached a demo of the speech in a demo.zip file
It uses some of the text from the review for World of the Unknown: Ghosts: 1
This is an Usborne book released in 1977, which if you are as old as I am you may have had as a kid
Here's the text I processed
First published in 1977, this cult classic has been reissued for a new generation of ghost-hunters.
This book is for anyone who has shivered at shadowy figures in the dark, heard strange sounds in the night, or felt the presence of a mysterious 'something' from the unknown.
Ghost stories are as old as recorded history and exist all over the world. Many of the different kinds of ghosts that are thought to haunt the Earth and their behaviour are described here.
You will meet haunting spirits, screaming skulls, phantom ships, demon dogs, white ladies, gallows ghosts and many more.
This book also explains the techniques and equipment of ghost hunting and tells how lots of 'ghosts' have been exposed as fakes or explained away as natural events. Also included are some theories that attempt to explain the possible existence of ghosts.
Because the Podman container uses CUDA, it processed the text in 6 seconds on my 2019 Dell XPS 15
with an NVIDIA GeForce GTX 1650 GPU, creating a 50 second audio file
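To put those two numbers in context, here is a quick sketch of the arithmetic in Python. The function name is just something I made up for illustration, the only inputs are the 6 second and 50 second figures quoted above.

```python
# Figures quoted in the post: 6 seconds of processing for 50 seconds of audio.
processing_seconds = 6.0
audio_seconds = 50.0


def speedup(audio_s: float, processing_s: float) -> float:
    """Seconds of audio generated per second of processing time."""
    return audio_s / processing_s


factor = speedup(audio_seconds, processing_seconds)
print(f"{factor:.1f}x faster than real time")
```

So on this hardware the speech is generated roughly eight times faster than it takes to play back.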
Run the kokoro script with the -h option to show the help
Code:
kokoro -h
Help output
Code:
Usage: kokoro -f <input_text_file> [OPTIONS]
Options:
  -f <file>   Path to the input text file to read from (Required)
  -o <file>   Path to the output wav file destination
              Default: input-filename.wav
  -v <voice>  Specify the speaker voice variant
              Default: bm_lewis
  -h          Display this help usage screen
Available Voices Reference:
  https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
Run the kokoro script
Code:
kokoro -f input.txt
Use the -f option to specify the text file to process
Use the -v option to specify a voice, the available voices are listed here
https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
If you omit the -v option the default voice bm_lewis will be used
Use the -o option to specify an output filename
If you omit the -o option the default output filename will be the name of the input file with a .wav extension
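For anyone curious how options like these can be handled, here is a minimal sketch using Python's argparse. This is not the actual kokoro script from the repo, just an illustration of the behaviour described in the help output; the function name parse_args and the dest names are my own assumptions.

```python
import argparse
from pathlib import Path

DEFAULT_VOICE = "bm_lewis"  # default voice named in the help output


def parse_args(argv=None):
    """Parse -f, -v and -o the way the help output describes them."""
    parser = argparse.ArgumentParser(prog="kokoro")
    parser.add_argument("-f", dest="input_file", required=True, type=Path,
                        help="path to the input text file")
    parser.add_argument("-v", dest="voice", default=DEFAULT_VOICE,
                        help="speaker voice")
    parser.add_argument("-o", dest="output_file", type=Path, default=None,
                        help="path to the output wav file")
    args = parser.parse_args(argv)
    if args.output_file is None:
        # No -o given: reuse the input file name with a .wav extension.
        args.output_file = args.input_file.with_suffix(".wav")
    return args


if __name__ == "__main__":
    args = parse_args(["-f", "input.txt"])
    print(args.input_file, args.voice, args.output_file)
```

argparse adds the -h option on its own, so only the three options from the help output need to be declared.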
Update:
I have added a new script called kokoro-stream which streams the audio from a text file using ffplay in the container
over the PulseAudio socket to FreeBSD instead of saving it as a wav
Open the PulseAudio volume control on FreeBSD before starting the container
This will create /tmp/pulseaudio.socket and keep PulseAudio open
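If you want to check that the socket is there before starting the container, something like this small Python sketch would do it. The socket path is the one mentioned above, the function name is hypothetical and not part of the freebsd-cuda project.

```python
import os
import stat

SOCKET_PATH = "/tmp/pulseaudio.socket"  # path named in the post


def pulse_socket_ready(path: str = SOCKET_PATH) -> bool:
    """Return True if path exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Path is missing or not accessible.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    if pulse_socket_ready():
        print("PulseAudio socket found, start the container")
    else:
        print("No socket yet, open the PulseAudio volume control first")
```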
Run the kokoro-stream script with the -h option to show the help
Code:
kokoro-stream -h
Help output
Code:
Usage: kokoro -f <input_text_file> [OPTIONS]
Options:
  -f <file>   Path to the input text file to read from (Required)
  -v <voice>  Specify the speaker voice variant
              Default: bm_lewis
  -h          Display this help usage screen
Available Voices Reference:
  https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
Run the kokoro-stream script
Code:
kokoro-stream -f input.txt
Use the -f option to specify the text file to process
Use the -v option to specify a voice, the available voices are listed here
https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
If you omit the -v option the default voice bm_lewis will be used
The audio will be streamed using ffplay inside the container over the PulseAudio socket to FreeBSD
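As a rough idea of how audio can be handed to ffplay without writing a wav file first, here is a sketch in Python. It is not the kokoro-stream script itself: the function names are made up, the input is assumed to be mono 16-bit PCM with a sample rate you pass in, and ffplay is assumed to already be set up to reach the PulseAudio socket from inside the container.

```python
import io
import shutil
import subprocess
import wave


def wav_bytes(pcm16: bytes, sample_rate: int) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16)
    return buf.getvalue()


def ffplay_command() -> list:
    """ffplay invocation that reads from stdin and opens no window."""
    return ["ffplay", "-nodisp", "-autoexit", "-loglevel", "error",
            "-i", "pipe:0"]


def stream(pcm16: bytes, sample_rate: int) -> None:
    """Play the audio through ffplay without saving it to disk."""
    if shutil.which("ffplay") is None:
        raise RuntimeError("ffplay not found on PATH")
    subprocess.run(ffplay_command(), input=wav_bytes(pcm16, sample_rate),
                   check=True)
```

The -nodisp option stops ffplay opening a video window and -autoexit makes it quit when the audio finishes, which is what you want for a command line script.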