Using Voice RSS to make a Raspberry Pi talk

Voice RSS is a text-to-speech API that can be used on a Raspberry Pi. Here we show how to use Voice RSS to make a Raspberry Pi talk.

Introduction to Voice RSS

The basic idea behind text-to-speech (TTS) is to have the Raspberry Pi speak its output as sentences instead of displaying text on the screen.

By the end of this post you will be able to run the tts command in the form of tts "Hello, world." and hear the text spoken.

Background

A while back I was experimenting with Steven Hickson’s PiAUISuite. It uses Google Speech to take basic vocal commands through a microphone, and runs basic logic which can give output in various forms — speech being one of them. Google changed to a paid speech-to-text service around November 2015 and the project was placed on hold.

Since I had PiAUISuite installed on a few Raspberry Pis, I decided to take its ‘tts’ command and show how to tweak it a little so that it at least provides good TTS using Voice RSS.

Assumptions and requirements

For this post, a fully installed Raspberry Pi Model B with the latest version of Raspbian was used. Default sound output from either the 3.5mm audio jack or HDMI cable needs to be audible.

During the installation process, a connection to the internet will be required. Without a screen, keyboard and mouse, PuTTY and/or WinSCP can be used to do the testing and coding.

A basic, free Voice RSS account will allow up to 350 requests per day.

Working sound output is required (ALSA at least). The latest Raspberry Pi B models have HDMI and a 3.5mm audio jack, either of which can be used for sound. By default, Raspbian should have most of what ALSA needs already installed.

Limitations

Although not always a limitation per se, this system is mainly controlled by running terminal commands. It is great for Python and Bash scripts. As mentioned above, a free Voice RSS account will only give you a maximum of 350 requests per day.
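Because the daily quota is easy to burn through when tts is called from a loop, a script can keep its own rough count. The following is only a sketch: quota_ok and the ~/.tts_quota state file are hypothetical names of my own, and the real limit is enforced by Voice RSS on its side, not by this counter.

```bash
#!/bin/bash
# Hypothetical helper: keep a per-day request count in a small state file.
# Returns 0 (and increments the count) while under the limit, 1 otherwise.
# Arguments: [state file] [daily limit]; both have assumed defaults.
quota_ok() {
    local file="${1:-$HOME/.tts_quota}" limit="${2:-350}"
    local today count=0 saved_day="" saved_count=""
    today="$(date +%F)"
    if [ -r "$file" ]; then
        # The file holds one line: "<date> <count>"
        read -r saved_day saved_count < "$file" || true
        if [ "$saved_day" = "$today" ]; then
            count="${saved_count:-0}"
        fi
    fi
    if [ "$count" -ge "$limit" ]; then
        return 1
    fi
    printf '%s %s\n' "$today" "$((count + 1))" > "$file"
}

# Example use in a script (commented out so nothing runs on its own):
# quota_ok && tts "Hello, world"
```

The count resets whenever the stored date differs from today's date, so no cron job is needed to clear it.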

A modified version of PiAUISuite

PiAUISuite will install a nifty command called ‘tts’. This is what we’ll be using after some modification. Start by installing some packages and then PiAUISuite from GitHub by typing the following in a terminal on a freshly booted Raspbian:

sudo apt-get install git
sudo apt-get install mpg123
git clone https://github.com/StevenHickson/PiAUISuite.git
cd PiAUISuite/Install
./InstallAUISuite.sh
cd /home/pi

Say yes to install the dependencies (so initially you’ll be saying yes twice). Afterward, PiAUISuite will one by one try to install and set up playvideo, downloader, gvapi, gtextcommand, youtube, youtube-safe and voicecommand. We will only be needing the last one on the list, voicecommand — so say no to installing the rest.

After all the dependencies and voicecommand are installed (which can take a while), the installer will automatically prompt to set up voicecommand. On a fresh install, there will be no commands found and it will ask you to try to set itself up. We won’t be using this, so say no. You can do this later by using voicecommand -s.
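Before moving on, it can be worth confirming that the commands used in the rest of this post are actually on the PATH. A minimal sketch, where missing_tools is a hypothetical helper name and not part of PiAUISuite:

```bash
#!/bin/bash
# Print the name of each listed command that cannot be found on the PATH
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# tts is installed by PiAUISuite; wget and mpg123 are used by the tts script
missing_tools tts wget mpg123
```

If this prints nothing, everything was found. Any name it does print still needs to be installed.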

Next, we will change some code on the newly created tts file to use Voice RSS’s TTS service instead of Google’s.

To continue we will need a Voice RSS API key, so go and get one.

To edit the original tts file, use the following command from the Raspbian terminal:

sudo nano /usr/bin/tts

This opens the original code from Steven Hickson:

#!/bin/bash

#for the Raspberry Pi, we need to insert some sort of FILLER here since it cuts off the first bit of audio

string=$@
lang="en"
if [ "$1" == "-l" ] ; then
    lang="$2"
    string=`echo "$string" | sed -r 's/^.{6}//'`
fi

#empty the original file
echo "" > "/dev/shm/speak.mp3"

len=${#string}
while [ $len -ge 100 ] ;
do
    #lets split this up so that its a maximum of 99 characters
    tmp=${string:0:100}
    string=${string:100}

    #now we need to make sure there aren't split words, let's find the last space and the string after it
    lastspace=${tmp##* }
    tmplen=${#lastspace}

    #here we are shortening the tmp string
    tmplen=`expr 100 - $tmplen`
    tmp=${tmp:0:tmplen}

    #now we concatenate and the string is reconstructed
    string="$lastspace$string"
    len=${#string}

    #get the first 100 characters
    wget -q -U Mozilla -O "/dev/shm/tmp.mp3" "https://translate.google.com/translate_tts?tl=${lang}&q=$tmp&ie=UTF-8&total=1&idx=0&client=t"
    cat "/dev/shm/tmp.mp3" >> "/dev/shm/speak.mp3"
done
#this will get the last remnants
wget -q -U Mozilla -O "/dev/shm/tmp.mp3" "https://translate.google.com/translate_tts?tl=${lang}&q=$string&ie=UTF-8&total=1&idx=0&client=t"
cat "/dev/shm/tmp.mp3" >> "/dev/shm/speak.mp3"
#now we finally say the whole thing
cat "/dev/shm/speak.mp3" | mpg123 - 1>>/dev/shm/voice.log 2>>/dev/shm/voice.log
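The word-boundary splitting done in the loop above can be tried in isolation. What follows is a sketch of the same idea, not a line-for-line copy of the original loop: split_text is a hypothetical helper name, and the hard cut for a single word longer than the limit is my own assumption about how to handle that case.

```bash
#!/bin/bash
# Print text in chunks of at most $2 characters (default 100),
# breaking at the last space inside each chunk where possible.
split_text() {
    local text="$1" max="${2:-100}" chunk
    while [ "${#text}" -gt "$max" ]; do
        chunk="${text:0:max}"
        if [[ "$chunk" == *" "* ]]; then
            chunk="${chunk% *}"            # drop the trailing partial word
            text="${text:${#chunk}+1}"     # continue after the space
        else
            text="${text:max}"             # no space at all: hard cut
        fi
        if [ -n "$chunk" ]; then
            printf '%s\n' "$chunk"
        fi
    done
    if [ -n "$text" ]; then
        printf '%s\n' "$text"
    fi
}

split_text "one two three four" 9
```

Each printed line could then be sent as its own request, which is what the original script does with its 100-character pieces.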

After getting your Voice RSS API key, replace the entire script with the following shorter version, as recommended by Keven:

#!/bin/bash
#for the Raspberry Pi, we need to insert some sort of FILLER here since it cuts off the first bit of audio
lang="en-gb"
#if the first argument is -l, the second is the language code; drop both so only the text remains
if [ "$1" == "-l" ] ; then
    lang="$2"
    shift 2
fi
string="$*"

#empty the original file
echo "" > "/dev/shm/speak.mp3"

len=${#string}
wget -q -U Mozilla -O "/dev/shm/tmp.mp3" "http://api.voicerss.org/?key=MYAPIKEYGOESHERE&src=$string&f=22khz_16bit_mono&hl=$lang"
cat "/dev/shm/tmp.mp3" >> "/dev/shm/speak.mp3"
#now we finally say the whole thing
cat "/dev/shm/speak.mp3" | mpg123 - 1>>/dev/shm/voice.log 2>>/dev/shm/voice.log

Simply replace MYAPIKEYGOESHERE with your own API key, then save and exit (Ctrl + X, then Y, then Enter).
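One thing the script above does not do is encode the text before placing it in the URL, so characters such as & or # would be read as part of the URL instead of the sentence. A possible way around this, as a sketch only: urlencode and voicerss_url are hypothetical helper names, while key, hl, f and src are the parameters already used in the script.

```bash
#!/bin/bash
# Percent-encode a string for use in a URL query (ASCII-oriented sketch)
urlencode() {
    local LC_ALL=C s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;                 # safe characters pass through
            *) out+=$(printf '%%%02X' "'$c") ;;           # everything else becomes %XX
        esac
    done
    printf '%s\n' "$out"
}

# Build the request URL from an API key, a language code and the text
voicerss_url() {
    local key="$1" lang="$2" text="$3"
    printf 'http://api.voicerss.org/?key=%s&hl=%s&f=22khz_16bit_mono&src=%s\n' \
        "$key" "$lang" "$(urlencode "$text")"
}

voicerss_url MYAPIKEYGOESHERE en-gb "Fish & chips"
```

In the script, the wget line could then take "$(voicerss_url MYAPIKEYGOESHERE "$lang" "$string")" as its URL.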

The command can now be used from any directory without sudo like so:

tts "Hello, world"

This converts your text to speech and plays it through your default sound card and audio output. Voice RSS allows for up to 10 000 characters per call.
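When something is wrong with the request (a mistyped API key, for example), the file that wget saves will not contain audio. Assuming the service reports failures as plain text beginning with ERROR, which is how I understand the Voice RSS documentation, a script could check for that before playing anything. is_error_reply is a hypothetical helper name:

```bash
#!/bin/bash
# Return 0 if the downloaded file starts with the text "ERROR"
# (assumed error format), 1 if it looks like anything else.
is_error_reply() {
    head -c 5 "$1" 2>/dev/null | LC_ALL=C grep -q '^ERROR'
}

# Example use in the tts script (commented out so nothing runs on its own):
# if is_error_reply /dev/shm/tmp.mp3; then
#     cat /dev/shm/tmp.mp3 >&2    # show the message instead of playing it
#     exit 1
# fi
```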

You can go through the Voice RSS documentation yourself and see what languages they have available, but I was happy with the quality of the default English voice.

Conclusion

Voice RSS is a text-to-speech API that can be used on a Raspberry Pi. Here we showed how to use Voice RSS to make a Raspberry Pi talk by using PiAUISuite.

2 comments

Andy Citron 9 February 2022

I think I figured out what the unexpected sounds are. The input -l es-mx is 8 characters, not 6. Voicerss expects all language identifiers to be 4 characters long. So this line of script: sed -r 's/^.{6}//' is wrong. That '6' should be an 8: sed -r 's/^.{8}//'

Andy Citron 8 February 2022

I’ve been using your code to speak in different languages. That means the if statement: [ "$1" == "-l" ] evaluates to true.

To me it sounds like the first bit of audio is cut off in that case.
There is a comment in the code that says some sort of filler needs to be inserted because of that cut off.

The cut off doesn’t seem to be a problem unless -l is specified.

I don’t understand what filler is getting inserted in the case where -l is not specified so I’m unable to reinsert that filler when language is specified.

Can you clarify/explain why the cut off occurs when -l is specified and how I can fix that?