Using eSpeak to make a Raspberry Pi talk

eSpeak is an open-source text-to-speech synthesiser that can be installed on a Raspberry Pi. Here, we show how to use eSpeak to make a Raspberry Pi talk.

Introduction to eSpeak on a Raspberry Pi

eSpeak is a compact, open-source text-to-speech synthesiser for Windows and Linux and can synthesise speech from text in English and other languages.

eSpeak is a great piece of software for creating a talking Raspberry Pi. It is driven from the terminal using Bash commands.

At the time of writing, eSpeak is newer than some other software text-to-speech synthesisers and is free to use. Apart from its installation, it does not need the internet to operate and is relatively small in size. It is also straightforward to get up and running with the default Raspbian settings, easy to use and very customisable.

On the downside, eSpeak wails a little and sounds very much like an alien or a robot (which can also be exactly why you would want to use it in a project). The text-to-speech (TTS) conversion is also not that accurate, and poor pronunciation can make some words difficult to understand.

Requirements for using eSpeak on a Raspberry Pi

For this post, a fully installed Raspberry Pi with the latest version of Raspbian was used. The default sound output from the 3.5mm audio jack or HDMI cable must be audible.

During the installation process, a connection to the internet will be required. Without a screen, keyboard and mouse, PuTTY and/or WinSCP can be used to do the testing and coding.

Sound output

eSpeak should work out of the box with Raspbian’s default sound settings. The only requirement is to choose the desired sound output (i.e. HDMI or audio jack).

To test the sound output on Raspbian, the following terminal command can be used:

aplay /usr/share/scratch/Media/Sounds/Vocals/Singer2.wav

You should hear a clip playing a short, “haaa” singing voice. If this clip can be heard, then eSpeak should also be heard.
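If the Scratch sample is not present on a given image, the command above simply fails. A slightly more defensive version of the same test is sketched below. It assumes only the aplay tool and the sample path used above; the sound_test variable is a hypothetical name introduced for illustration.

```shell
# Path taken from the test command above.
sample="/usr/share/scratch/Media/Sounds/Vocals/Singer2.wav"

if [ ! -f "$sample" ]; then
    # The sample clip is not installed on this system.
    sound_test="sample missing"
elif ! command -v aplay >/dev/null 2>&1; then
    # The ALSA player is not on the PATH.
    sound_test="aplay missing"
elif aplay "$sample"; then
    # aplay exited without an error.
    sound_test="played"
else
    sound_test="playback failed"
fi

echo "Sound test: $sound_test"
```

A result of "played" only means aplay ran without an error; whether the clip was audible still has to be checked by ear.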

Alternatively, eSpeak can be used with the -w option to write the speech to a WAV file instead of playing it on the sound device. More on this below.

Installing eSpeak on a Raspberry Pi

While connected to the internet, the following terminal command is used to install eSpeak:

sudo apt-get install espeak

To see if eSpeak has been installed correctly, use either of the following:

espeak -h

espeak --help
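In a script, it can be handy to check for the espeak binary before trying to use it. The following is a minimal sketch of such a check; the espeak_status variable is a hypothetical name used only for this example.

```shell
# Look for the espeak binary on the PATH.
if command -v espeak >/dev/null 2>&1; then
    espeak_status="installed"
    # Print the installed version for reference.
    espeak --version
else
    espeak_status="missing"
fi

echo "espeak is $espeak_status"
```

If the result is "missing", rerun the install command above while connected to the internet.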

Using the eSpeak command

On the Raspberry Pi, eSpeak is run from the terminal. The espeak command can be used in a couple of ways.

The simplest way to use eSpeak is by typing the desired speech in the form of text input (text within double quotes) after the espeak command:

espeak "Hello, world"

To read text from a text file, use:

espeak -f <text file>

If no text is entered after the espeak command, the program reads its text from stdin and treats each line as a separate sentence. That is, by just typing:

espeak

followed by text on subsequent lines, each line is spoken when Enter/RETURN is pressed. Pressing Ctrl + D (end of input) or Ctrl + C exits eSpeak and returns to the command prompt. Ctrl + Z only suspends the process rather than closing it.
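The file and stdin input methods described above can be tried together with a short script. This is only a sketch: the file contents and the speech_file variable are made up for illustration, and the espeak calls are skipped if the program is not installed.

```shell
# Create a temporary text file with two lines of (hypothetical) speech.
speech_file="$(mktemp)"
printf '%s\n' "Hello, world." "This is a Raspberry Pi speaking." > "$speech_file"

if command -v espeak >/dev/null 2>&1; then
    # Speak the contents of the text file.
    espeak -f "$speech_file"

    # Feed the same text in through stdin; each line is a separate sentence.
    espeak --stdin < "$speech_file"

    # Pipe the output of another command into espeak.
    echo "The pipe also works." | espeak
fi
```

Piping is useful when the text is produced by another program, for example a script that reads a sensor.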

eSpeak command options

eSpeak has plenty of handy command-line options which will alter its default use. These include changing the accent/language, gendered tone, pitch, speed, etc. of the spoken voice.

Command-line options can be ‘stacked’ onto each other. To see the version of eSpeak and all the command-line options, use either:

espeak -h

espeak --help

The voice used by eSpeak is determined by the voice accent/language file and a variant determining its tone (e.g. male or female). To change the voice accent/language, the correct voice file needs to be used.

To see a list of the available voice files, the following command is used:

espeak --voices

To use the Afrikaans accent/language, for example, the following command option is used:

espeak -v af

where af is the code shown in the Language column of the list of available voices.

By default, all languages/accents are generated in a male tone. With eSpeak, the tone of a voice can also be changed using an additional variant property:

-v <voice filename>[+<variant>]

According to the official documentation, the variants are “+m1 +m2 +m3 +m4 +m5 +m6 for male voices and +f1 +f2 +f3 +f4 which simulate female voices by using higher pitches. Other variants are +croak and +whisper.”

To use the Afrikaans accent/language with a mid-tone female voice, for example, the following command option is used:

-v af+f2
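Put together as a full command, the voice and variant might be used as in the sketch below. The variable names (voice, variant, voice_option) and the spoken phrase are assumptions made for this example; the voice code and variant are the ones discussed above.

```shell
voice="af"      # language code from the `espeak --voices` list
variant="f2"    # one of the female variants quoted above

# Join voice and variant in the <voice>+<variant> form expected by -v.
voice_option="${voice}+${variant}"
echo "Using voice option: -v $voice_option"

if command -v espeak >/dev/null 2>&1; then
    # Speak a short Afrikaans greeting with the selected voice.
    espeak -v "$voice_option" "Goeie dag"
fi
```

Swapping the variant for +whisper or +croak, or dropping it altogether, is a quick way to compare how the voices sound.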

Some of the other more useful command-line options include:

-f <text file> speaks a text file.
--stdin takes the text input from stdin.
-a <integer> sets amplitude (volume) in a range of 0 to 200. The default is 100.
-p <integer> adjusts the pitch in a range of 0 to 99. The default is 50.
-s <integer> sets the speed in words per minute (approximate values for the default English voice; others may differ slightly). The default value is 170 and the range is 80 to 390.
-g <integer> inserts a pause between words. The value is the length of the pause, in units of 10 ms (at the default speed of 170 wpm).
-l <integer> sets the line-break length. The default value is 0. If set, lines which are shorter than this are treated as separate clauses and spoken separately with a break between them. This can be useful for some text files, but bad for others.
-w <wave file> writes the speech output to a file in WAV format, rather than speaking it.
-z removes the end-of-sentence pause which normally occurs at the end of the text.
--stdout writes the speech output to stdout as it is produced, rather than speaking it. The data starts with a WAV file header which indicates the sample rate and format of the data. The length field is set to zero because the length of the data is unknown when the header is produced.
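As a rough illustration of stacking, the sketch below combines several of the options listed above, writes the result to a WAV file and then streams a second phrase to aplay through --stdout. The values, the output path and the variable names are assumptions chosen for this example, and the espeak and aplay calls are skipped if either tool is missing.

```shell
speed=140       # words per minute (range 80 to 390)
pitch=60        # range 0 to 99
amplitude=150   # range 0 to 200
out_wav="${TMPDIR:-/tmp}/espeak_demo.wav"   # hypothetical output path

if command -v espeak >/dev/null 2>&1; then
    # Stack voice, speed, pitch and volume, and write the speech to a file.
    espeak -v en+f2 -s "$speed" -p "$pitch" -a "$amplitude" \
        -w "$out_wav" "Options can be combined."

    if command -v aplay >/dev/null 2>&1; then
        # Play back the saved file.
        aplay "$out_wav"

        # Stream speech straight to aplay without creating a file.
        espeak --stdout "This is streamed." | aplay
    fi
fi
```

Writing to a file first is useful when the same phrase is played repeatedly, since the speech only has to be synthesised once.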

See the eSpeak documentation for more information.

Conclusion

eSpeak is a great piece of software for creating a talking Raspberry Pi. It is free, easy to install and use, customisable and can synthesise speech from text in many languages.