An Audio-Book Project

Created by Sanchit Ghule
Roll No: MM16B004
Guide: Dr. G. Phanikumar
Project URL: https://project-audiobook.github.io/

Introduction

As part of a contemporary effort to provide equal education rights to all learners, assistive tools and accessible documents give new opportunities to people who cannot get adequate educational services because of their limitations or disabilities. The traditional way of converting available books into audiobooks requires human intervention, and producing a high-quality audiobook this way takes strenuous effort. The production and publication processes are also very difficult in these scenarios. This study was conducted to develop a tool that produces high-quality audiobooks for blind and visually impaired people.

What is an audiobook?

According to Wikipedia:

An audiobook (or a talking book) is a recording of a book or other work being read out loud. ---wikipedia.org

What is a speech synthesiser?

Speech synthesis is the artificial production of human speech. A computer system used for speech synthesis is called a speech synthesiser.

The ability of software to convert text into speech and speak it aloud is called text-to-speech (TTS).

Overview Of Text To Speech

Text Document → Spoken Version

Limited domain synthesis.
Full synthesis.

1 Limited Domain Speech Synthesis

This speech synthesis technique is used in situations where the text to be spoken belongs to a limited domain, e.g. IVR systems, talking clocks, ATMs (automated teller machines), automated announcements, etc.
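As a rough illustration of why a limited domain is easier to handle, the sketch below models a talking clock whose entire vocabulary is a few dozen words. The function names and the word list are hypothetical and not taken from the project; a real system would play one prerecorded clip per word.

```python
# Fixed vocabulary of a hypothetical talking clock: every word it can say.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def number_words(n):
    """Return the vocabulary words for an integer from 0 to 59."""
    if not 0 <= n <= 59:
        raise ValueError("outside the clock's domain")
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    return [TENS[tens]] + ([ONES[ones]] if ones else [])


def time_to_prompts(hour, minute):
    """Map a 24-hour time to the sequence of prerecorded words to play."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("not a valid time")
    period = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12  # 0 and 12 are both spoken as "twelve"
    words = ["it", "is"] + number_words(hour12)
    if minute == 0:
        words.append("o'clock")
    elif minute < 10:
        words += ["oh"] + number_words(minute)
    else:
        words += number_words(minute)
    return words + [period]


print(" ".join(time_to_prompts(14, 5)))
```

Because every possible utterance is a combination of these few words, each one can be recorded once by a human speaker, which is not possible when the input is arbitrary text.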

2 Full synthesis

This is a more generic technique for synthesising speech. Voices built for this purpose use a significantly larger amount of recorded text. These voices are capable of synthesising any text in a particular language, e.g. in screen readers and general-purpose TTS systems.

Problem Statement

Despite advancements in screen reading and text-to-speech technology, a significant amount of text remains inaccessible to users. Reading printed text is still a major challenge for visually impaired users.
In this first part of the work, our goal is to build a tool that can make use of all current synthesiser technology, process all documents, and produce high-quality multilingual "mp3" audiobooks.
The tool should be able to process PDF documents with any number of pages.
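One way to handle documents of any length is to stream page text and synthesise it in bounded chunks rather than loading a whole book at once. The sketch below is only an assumption about how that could look; `iter_chunks` and `max_chars` are hypothetical names, and the project's actual approach is not described here.

```python
def iter_chunks(page_texts, max_chars=2000):
    """Group successive page texts into chunks of roughly max_chars.

    page_texts may be any iterable of strings, including a lazy generator,
    so only one chunk is held in memory at a time. A page is never split,
    so a single page longer than max_chars becomes its own chunk.
    """
    buffer = []
    size = 0
    for text in page_texts:
        text = (text or "").strip()
        if not text:
            continue  # skip pages with no extractable text
        if buffer and size + len(text) > max_chars:
            yield "\n".join(buffer)
            buffer, size = [], 0
        buffer.append(text)
        size += len(text)
    if buffer:
        yield "\n".join(buffer)


# Usage with plain strings standing in for extracted pages.
for chunk in iter_chunks(["page one", "", "page two"], max_chars=10):
    print(repr(chunk))
```

Each chunk can then be handed to a synthesiser in turn, which keeps memory use independent of the number of pages.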

Selection of programming language

Python
C#

Python Code


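    import pyttsx3
    import PyPDF2

    # Open the PDF in binary mode; the context manager closes it afterwards.
    with open('sanchit-ghule-personal-resume.pdf', 'rb') as book:
        # PdfReader replaces the PdfFileReader class removed in PyPDF2 3.0.
        pdf_reader = PyPDF2.PdfReader(book)

        speaker = pyttsx3.init()
        for page in pdf_reader.pages:
            # extract_text() can return an empty string for image-only pages.
            text = page.extract_text()
            if text:
                # Queue the page text, then block until it has been spoken.
                speaker.say(text)
                speaker.runAndWait()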

Shortcomings with Python

The code is not efficient with a large number of files.
Text extraction and audio conversion take a long time.
Only a limited number of synthesisers are available in Python.
It is difficult to use with multithreading; the module does not quite behave as expected.

We need to look for alternative options.

C Sharp

C Sharp Source Code


using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Globalization;
using System.IO;
using System.Runtime.Serialization;
using System.Security.Permissions;
using System.Text;
using System.Text.RegularExpressions;
#if LINQ
using System.Linq;
#endif
#if TEST
using NDesk.Options;
#endif

Advantages of using C# Tool

It is very efficient.
The executable can run in PowerShell as well as WSL2.
It can process several thousand books.
It can process any of the world's languages, provided a synthesiser supporting that language is available.
It offers options to vary the speech rate and audio volume, and to change the synthesiser and the synthesiser language.
It can produce an "mp3" audio file as output.
The script can run continuously (24/7, 365 days a year) without overloading.
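The C# tool's option handling is not shown in this document. Purely as an illustration of what configurable rate, volume and voice can look like, here is a minimal Python sketch using pyttsx3. The `SpeechOptions` class and `synthesise_to_file` function are hypothetical names, and the format of the saved file depends on the platform driver, so conversion to mp3 is assumed to be a separate step that is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechOptions:
    """Hypothetical settings bundle; the names are illustrative only."""
    rate: int = 180                   # speaking rate in words per minute
    volume: float = 1.0               # 0.0 (silent) to 1.0 (full)
    voice_name: Optional[str] = None  # substring of an installed voice's name

    def validate(self):
        """Raise ValueError for out-of-range settings, else return self."""
        if self.rate <= 0:
            raise ValueError("rate must be positive")
        if not 0.0 <= self.volume <= 1.0:
            raise ValueError("volume must be between 0.0 and 1.0")
        return self


def synthesise_to_file(text, out_path, options):
    """Render text to an audio file with pyttsx3 using the given options."""
    import pyttsx3  # imported lazily so SpeechOptions works without it

    options.validate()
    engine = pyttsx3.init()
    engine.setProperty("rate", options.rate)
    engine.setProperty("volume", options.volume)
    if options.voice_name:
        # Pick the first installed voice whose name contains the substring.
        for voice in engine.getProperty("voices"):
            if options.voice_name.lower() in voice.name.lower():
                engine.setProperty("voice", voice.id)
                break
    engine.save_to_file(text, out_path)
    engine.runAndWait()
```

A call such as `synthesise_to_file("Hello", "hello.wav", SpeechOptions(rate=150, voice_name="Zira"))` would only select that voice if a matching one is installed on the machine.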

Synthesiser used

Sr No	Synthesizer Name	audio file
1.	Google female voice	audiofile
2.	Microsoft Cortana	audiofile
3.	Eloquence	audiofile
4.	IVONA 2 voices:	…
	Brian	audiofile
	Kimberly	audiofile
	Kendra	audiofile
	Ivy	audiofile
	Eric	audiofile
	Emma	audiofile
	Jennifer	audiofile
	Salli	audiofile
5.	VW voices:	…
	Julie	audiofile
	Kate	audiofile
	Paul	audiofile
6.	ScanSoft voices:	…
	Nathan (American)	audiofile
	Samantha (American)	audiofile
	Tom (American)	audiofile
	Daniel (British)	audiofile
	Serena (British)	audiofile
	Stephanie (British)	audiofile
	Karen (Australian)	audiofile
	Lee (Australian)	audiofile
	Sangita (Indian)	audiofile
	Hindi-Lekha	audiofile
	Kannada-Alpana	audiofile
	Marathi-Ananya	audiofile
	Tamil-Vani	audiofile
	Telugu-Geeta	audiofile
7.	Microsoft Windows voices:	…
	Microsoft David	audiofile
	Microsoft Zira	audiofile
8.	Hear2Read voices	…
	Hear2Read-english	audiofile
	Hear2Read-hindi	audiofile
	Hear2Read-marathi	audiofile

Conclusions and Future work

Present-day TTS systems and screen reader implementations are not capable of speaking mathematical content efficiently.
Mathematical equations comprise different types of visual cues that convey their semantic meaning. Some of these visual cues are special symbols, superscripts, subscripts, parentheses, etc.
Despite advances in screen reading and text-to-speech technologies, the problem of speaking complex mathematics remains largely unresolved.
Having developed the tool for producing audiobooks and defined the problem, we will attempt in future to improve the accessibility of mathematical content.
We will also develop an automated server backend for our tool.
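To make the difficulty with visual cues concrete, the sketch below verbalises a very small LaTeX-like subset (superscripts, subscripts, parentheses and a few operators). It is a toy under stated assumptions, not the planned solution; `speak_math` and the wording of the spoken forms are hypothetical.

```python
import re

# Spoken forms for a few symbols; a real system would need a far larger table.
SYMBOL_WORDS = {"+": "plus", "-": "minus", "=": "equals",
                "*": "times", "/": "over"}


def speak_math(expr):
    """Turn a tiny LaTeX-like expression into a speakable English string."""
    # Superscripts: braced form x^{n+1} first, then single-character x^2.
    expr = re.sub(r"\^\{([^{}]*)\}", r" to the power of \1 ", expr)
    expr = re.sub(r"\^(\w)", r" to the power of \1 ", expr)
    # Subscripts: braced form a_{ij} first, then single-character a_i.
    expr = re.sub(r"_\{([^{}]*)\}", r" sub \1 ", expr)
    expr = re.sub(r"_(\w)", r" sub \1 ", expr)
    # Parentheses are announced so that grouping is not silently lost.
    expr = expr.replace("(", " open parenthesis ")
    expr = expr.replace(")", " close parenthesis ")
    for symbol, word in SYMBOL_WORDS.items():
        expr = expr.replace(symbol, f" {word} ")
    # Collapse the extra spaces introduced by the substitutions.
    return " ".join(expr.split())


print(speak_math("x^2 + y_1 = z"))
print(speak_math("(a+b)^2"))
```

Even this toy shows the ambiguity: `x^{n+1}` is spoken as "x to the power of n plus 1", which a listener cannot distinguish from `x^n + 1` without an extra cue marking where the exponent ends.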

AudioBook-Project Results

Audiobooks are practical to create, easily accessible, and inexpensive to distribute, e.g. on cassette, CD-ROM and the Internet.
They can be used as instructional materials.
They offer benefits in distance education.
Audiobooks are inexpensive but not yet widespread, so their potential in open education contexts is easily overlooked.
The use of audiobooks to support open education can be extended to most disciplines.