Voice RSS is a text-to-speech API that can be used on a Raspberry Pi. Here we show how to use Voice RSS to make a Raspberry Pi talk.
Introduction to Voice RSS
The basic idea behind TTS is to give a Raspberry Pi output in the form of vocal sentences instead of displaying text on the screen.
The instructions in this post will enable you to use the tts command in the form of tts "Hello, world.", which will then be spoken.
Background
A while back I was experimenting with Steven Hickson’s PiAUISuite. It uses Google Speech to take basic vocal commands through a microphone, and runs basic logic which can give output in various forms — speech being one of them. Google changed to a paid speech-to-text service around November 2015 and the project was placed on hold.
Since I had PiAUISuite installed on a few Raspberry Pis, I thought to take its ‘tts’ command and show how to tweak it a little to at least have a good TTS ability using Voice RSS.
Assumptions and requirements
For this post, a fully installed Raspberry Pi Model B with the latest version of Raspbian was used. Default sound output from either the 3.5mm audio jack or HDMI cable needs to be audible.
During the installation process, a connection to the internet will be required. Without a screen, keyboard and mouse, PuTTY and/or WinSCP can be used to do the testing and coding.
A basic, free Voice RSS account will allow up to 350 requests per day.
Sound output needs at least ALSA. The latest Raspberry Pi B models have HDMI and a 3.5mm audio jack, either of which can be used for sound. By default, Raspbian should have most of what ALSA needs already installed.
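Before going further, it can help to confirm that ALSA can actually produce sound. The sketch below is only a suggestion: check_audio is a made-up helper name, and it assumes the aplay and speaker-test tools from the alsa-utils package are what your image uses.

```shell
# Hypothetical helper: list ALSA playback devices and play a short test tone.
check_audio() {
    if ! command -v aplay >/dev/null 2>&1; then
        echo "aplay not found: install alsa-utils"
        return 1
    fi
    aplay -l                            # list the playback hardware ALSA can see
    speaker-test -t sine -f 440 -l 1    # one pass of a 440 Hz test tone
}

# Usage (run manually on the Pi):
#   check_audio
```

If nothing is audible over the 3.5mm jack, older Raspbian releases let you force the analog output with amixer cset numid=3 1, though that control may be different on newer images.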
Limitations
Although not always a limitation per se, this system is mainly controlled by running terminal commands. It is great for Python and Bash scripts. As mentioned above, a free Voice RSS account will only give you a maximum of 350 requests per day.
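If a script calls tts in a loop, the 350 requests can run out quickly. One way to guard against that is to count the calls per day, as in the rough sketch below. The names tts_quota_ok, say and TTS_COUNT_DIR are invented for this example, and the counter resets by local calendar date, which may not match the moment Voice RSS resets its own quota.

```shell
# Hypothetical quota guard: count today's calls and stop before the limit.
TTS_DAILY_LIMIT=350
TTS_COUNT_DIR="${TTS_COUNT_DIR:-/tmp}"

tts_quota_ok() {
    # One counter file per calendar day, e.g. /tmp/tts-count-2024-01-31
    _tts_file="$TTS_COUNT_DIR/tts-count-$(date +%Y-%m-%d)"
    _tts_count=0
    [ -f "$_tts_file" ] && _tts_count=$(cat "$_tts_file")
    if [ "$_tts_count" -ge "$TTS_DAILY_LIMIT" ]; then
        return 1                        # limit reached, do not call tts
    fi
    echo $((_tts_count + 1)) > "$_tts_file"
    return 0
}

say() {
    # Wrapper around the tts command installed later in this post
    if tts_quota_ok; then
        tts "$@"
    else
        echo "Daily Voice RSS request limit reached" >&2
        return 1
    fi
}

# Usage:
#   say "The washing machine is done"
```

Keeping the counter in a plain file means it survives between separate script runs, at the cost of not being safe against two scripts updating it at exactly the same time.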
A modified version of PiAUISuite
PiAUISuite will install a nifty command called ‘tts’. This is what we’ll be using after some modification. Start by installing some packages and then PiAUISuite from GitHub by typing the following in your terminal on a freshly booted Raspbian:
sudo apt-get install git
sudo apt-get install mpg123
git clone https://github.com/StevenHickson/PiAUISuite.git
cd PiAUISuite/Install
./InstallAUISuite.sh
cd /home/pi
Say yes to install the dependencies (so initially you’ll be saying yes twice). Afterward, PiAUISuite will one by one try to install and set up playvideo, downloader, gvapi, gtextcommand, youtube, youtube-safe and voicecommand. We will only be needing the last one on the list, voicecommand, so say no to installing the rest.
After all the dependencies and voicecommand are installed (which can take a while), the installer will automatically prompt to set up voicecommand. On a fresh install, there will be no commands found and it will ask you to try to set itself up. We won’t be using this, so say no. You can do this later by using voicecommand -s.
Next, we will change some code on the newly created tts file to use Voice RSS’s TTS service instead of Google’s.
To continue we will need a Voice RSS API key, so go and get one.
To edit the original tts file, use the following command from the Raspbian terminal:
sudo nano /usr/bin/tts
This opens the original code from Steven Hickson:
#!/bin/bash
#for the Raspberry Pi, we need to insert some sort of FILLER here since it cuts off the first bit of audio
string=$@
lang="en"
if [ "$1" == "-l" ] ; then
    lang="$2"
    string=`echo "$string" | sed -r 's/^.{6}//'`
fi
#empty the original file
echo "" > "/dev/shm/speak.mp3"
len=${#string}
while [ $len -ge 100 ] ; do
    #lets split this up so that its a maximum of 99 characters
    tmp=${string:0:100}
    string=${string:100}
    #now we need to make sure there aren't split words, let's find the last space and the string after it
    lastspace=${tmp##* }
    tmplen=${#lastspace}
    #here we are shortening the tmp string
    tmplen=`expr 100 - $tmplen`
    tmp=${tmp:0:tmplen}
    #now we concatenate and the string is reconstructed
    string="$lastspace$string"
    len=${#string}
    #get the first 100 characters
    wget -q -U Mozilla -O "/dev/shm/tmp.mp3" "https://translate.google.com/translate_tts?tl=${lang}&q=$tmp&ie=UTF-8&total=1&idx=0&client=t"
    cat "/dev/shm/tmp.mp3" >> "/dev/shm/speak.mp3"
done
#this will get the last remnants
wget -q -U Mozilla -O "/dev/shm/tmp.mp3" "https://translate.google.com/translate_tts?tl=${lang}&q=$string&ie=UTF-8&total=1&idx=0&client=t"
cat "/dev/shm/tmp.mp3" >> "/dev/shm/speak.mp3"
#now we finally say the whole thing
cat "/dev/shm/speak.mp3" | mpg123 - 1>>/dev/shm/voice.log 2>>/dev/shm/voice.log
After getting your Voice RSS API key, replace the entire script with the following shorter version, as recommended by Keven:
#!/bin/bash
#for the Raspberry Pi, we need to insert some sort of FILLER here since it cuts off the first bit of audio
string=$@
lang="en-gb"
if [ "$1" == "-l" ] ; then
    lang="$2"
    string=`echo "$string" | sed -r 's/^.{6}//'`
fi
#empty the original file
echo "" > "/dev/shm/speak.mp3"
len=${#string}
wget -q -U Mozilla -O "/dev/shm/tmp.mp3" "http://api.voicerss.org/?key=MYAPIKEYGOESHERE&src=$string&f=22khz_16bit_mono&hl=$lang"
cat "/dev/shm/tmp.mp3" >> "/dev/shm/speak.mp3"
#now we finally say the whole thing
cat "/dev/shm/speak.mp3" | mpg123 - 1>>/dev/shm/voice.log 2>>/dev/shm/voice.log
Simply replace MYAPIKEYGOESHERE with your own key, then exit and save (Ctrl + X, then Y, then Enter to confirm the file name).
The command can now be used from any directory without sudo like so:
tts "Hello, world"
This converts your text to speech, which is played through your default sound card and audio output. Voice RSS allows for up to 10 000 characters per call.
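The script places the text directly into the URL, so a character such as & or # in the text would change the query instead of being spoken. As a rough sketch, assuming curl is installed, the request could be built with curl's URL encoding and a length check against the 10 000 character limit. The names fetch_speech, tts_length_ok and the VOICERSS_KEY environment variable are made up for this example; the request parameters are the same ones used in the script.

```shell
# Hypothetical sketch: check the length, then let curl URL-encode the request.
VOICERSS_MAX_CHARS=10000

tts_length_ok() {
    # ${#1} is the length of the first argument in characters
    [ "${#1}" -le "$VOICERSS_MAX_CHARS" ]
}

fetch_speech() {
    # $1 = text to speak, $2 = language code, $3 = output file
    if ! tts_length_ok "$1"; then
        echo "Text is longer than $VOICERSS_MAX_CHARS characters" >&2
        return 1
    fi
    # -G sends the encoded data as a query string on a GET request
    curl -s -G "http://api.voicerss.org/" \
        --data-urlencode "key=$VOICERSS_KEY" \
        --data-urlencode "src=$1" \
        --data-urlencode "f=22khz_16bit_mono" \
        --data-urlencode "hl=$2" \
        -o "$3"
}

# Usage (needs a real key and network access):
#   VOICERSS_KEY=MYAPIKEYGOESHERE fetch_speech "Rock & roll" en-gb /dev/shm/tmp.mp3
#   mpg123 -q /dev/shm/tmp.mp3
```

Reading the key from an environment variable also keeps it out of /usr/bin/tts, which any user on the Pi can read.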
You can go through the Voice RSS documentation yourself and see what languages they have available, but I was happy with the quality of the default English voice.
Conclusion
Voice RSS is a text-to-speech API that can be used on a Raspberry Pi. Here we showed how to use Voice RSS to make a Raspberry Pi talk by using PiAUISuite.
I think I figured out what the unexpected sounds are. The input -l es-mx is 8 characters, not 6. Voice RSS language identifiers such as es-mx are 5 characters long (including the hyphen), not 2. So this line of script: sed -r 's/^.{6}//'
is wrong. That ‘6’ should be an 8: sed -r 's/^.{8}//'
I’ve been using your code to speak in different languages. That means the if statement: [ "$1" == "-l" ] evaluates to true.
To me it sounds like the first bit of audio is cut off in that case.
There is a comment in the code that says some sort of filler needs to be inserted because of that cut off.
The cut off doesn’t seem to be a problem unless -l is specified.
I don’t understand what filler is getting inserted in the case where -l is not specified so I’m unable to reinsert that filler when language is specified.
Can you clarify/explain why the cut off occurs when -l is specified and how I can fix that?
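The character counts discussed in these comments can be checked directly in a shell. The sketch below is not part of the original script: strip_prefix reproduces the sed step for a chosen number of characters, and parse_args is one possible alternative that drops the two option words with shift instead of counting characters. Both function names are made up for this example.

```shell
# Reproduce the prefix stripping done by the tts script.
strip_prefix() {
    # $1 = number of characters to drop, the rest = the tts arguments
    n=$1
    shift
    string="$*"
    printf '%s\n' "$string" | sed -E "s/^.{$n}//"
}

# Alternative: remove "-l" and the language code as whole words.
parse_args() {
    lang="en-gb"
    if [ "$1" = "-l" ] && [ "$#" -ge 2 ]; then
        lang="$2"
        shift 2
    fi
    string="$*"
    printf '%s|%s\n' "$lang" "$string"
}

# strip_prefix 6 -l en Hello           -> "Hello"
# strip_prefix 6 -l es-mx Hola mundo   -> "mx Hola mundo"
# strip_prefix 8 -l es-mx Hola mundo   -> " Hola mundo"
# parse_args -l es-mx Hola mundo       -> "es-mx|Hola mundo"
```

With a two-letter code such as en, dropping 6 characters removes exactly "-l en ". With es-mx, the same 6 characters leave "mx " in front of the text, which is then sent to Voice RSS along with the rest and may be the extra sound at the start. Dropping 8 leaves only a leading space, and the shift variant works for a code of any length.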