An Audio-Book Project

Created by Sanchit Ghule
Roll No: MM16B004
Guide: Dr. G. Phanikumar
Project URL: https://project-audiobook.github.io/

Introduction

As part of a contemporary effort to provide equal education rights to all learners, assistive tools and accessible documents give opportunities to people who cannot get adequate educational services because of their limitations or disabilities. The traditional way of converting available books into audio-books requires human intervention, and producing a high-quality audio-book this way takes strenuous effort. The production and publication processes are also very difficult in these scenarios. This study was conducted to develop a tool that produces high-quality audio-books for blind and visually impaired people.

What is an audiobook?

According to Wikipedia:

An audiobook (or a talking book) is a recording of a book or other work being read out loud. ---wikipedia.org

What is a speech synthesiser?

Speech synthesis is the artificial production of human speech. A computer used for speech synthesis is called a speech synthesizer.

The capability of computer software to convert text into speech, that is, to synthesise it and speak it, is called text-to-speech (TTS).

Overview Of Text To Speech

Text Document

↓

Spoken Version

  • Limited domain synthesis.
  • Full synthesis.

1 Limited Domain Speech Synthesis

This speech synthesis technique is used in situations where the text to be spoken comes from a limited domain, e.g. IVR systems, talking clocks, ATMs (automated teller machines) and automated announcements.

2 Full synthesis

This is a more general technique for synthesising speech. Voices built for this purpose use a significantly larger amount of recorded text. These voices are capable of synthesising any text in a particular language, e.g. in screen readers and TTS systems.
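To make the contrast concrete, limited-domain synthesis can be sketched as stitching together a small set of prerecorded clips. The Python sketch below models a talking clock; the clip names (`the_time_is`, `num_30` and so on) and the function `clock_clips` are hypothetical placeholders, and nothing is actually played.

```python
# Limited-domain synthesis sketch: a talking clock only ever needs a small,
# fixed vocabulary. Clip names are hypothetical placeholders for prerecorded
# audio files; this code only decides which clips to play, in which order.

HOUR_WORDS = ["twelve", "one", "two", "three", "four", "five", "six",
              "seven", "eight", "nine", "ten", "eleven"]


def clock_clips(hour, minute):
    """Return the ordered clip names needed to announce a 24-hour time."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("hour must be 0-23 and minute 0-59")
    clips = ["the_time_is", HOUR_WORDS[hour % 12]]
    if minute == 0:
        clips.append("o_clock")
    elif minute < 10:
        clips += ["oh", f"num_{minute}"]
    else:
        clips.append(f"num_{minute}")
    clips.append("am" if hour < 12 else "pm")
    return clips


print(clock_clips(9, 5))
print(clock_clips(15, 30))
```

A full-synthesis voice has no such fixed vocabulary, which is why it needs far more recorded material to build.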

Problem Statement

Despite advances in screen reading and text-to-speech technology, a significant amount of text remains inaccessible to users. Reading printed text is still a major challenge for visually impaired users.
In this first part of the work, our goal is to build a tool that can use all current synthesiser technology, process all documents, and produce high-quality multilingual "mp3" audiobooks.
The tool should be able to process PDF documents with an unlimited number of pages.
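One way to handle documents of any length is to feed the synthesiser bounded pieces of text rather than a whole book at once. The sketch below is an assumption about how that could be done, not a description of the project's tool: the function `split_into_chunks` and the `max_chars` limit are hypothetical.

```python
import re


def split_into_chunks(text, max_chars=1000):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the limit in that one case.
    """
    # Collapse whitespace left over from PDF extraction, then split after
    # sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", " ".join(text.split()))
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "First sentence. Second one is here! Third?  Fourth   sentence."
for chunk in split_into_chunks(sample, max_chars=30):
    print(repr(chunk))
```

Each chunk can then be passed to a synthesiser in turn, so memory use does not grow with the number of pages.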

Selection of programming language

  • Python
  • C#

Python Code


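import pyttsx3
import PyPDF2

# Read the PDF page by page and speak each page's text aloud.
# PdfReader, .pages and extract_text() are the current PyPDF2 names for the
# older PdfFileReader, numPages/getPage and extractText calls.
speaker = pyttsx3.init()
with open('sanchit-ghule-personal-resume.pdf', 'rb') as book:
    pdf_reader = PyPDF2.PdfReader(book)
    for page in pdf_reader.pages:
        text = page.extract_text()
        if text:  # skip pages with no extractable text
            speaker.say(text)
            speaker.runAndWait()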

Shortcomings with Python

  1. The code is not efficient with a large number of files.
  2. Text extraction and audio conversion take a long time.
  3. Only a limited number of synthesisers are available in Python.
  4. It is difficult to use with multithreading; the module does not behave as expected.

We therefore need to look for alternative options.
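For context on shortcoming 4, a common workaround when a speech engine misbehaves under multithreading is to give it one dedicated worker thread and hand it work through a queue. The sketch below illustrates that pattern only; it is not something the project is stated to have tried. The helper `start_speech_worker` is hypothetical, and `speak` stands in for a real engine call.

```python
import queue
import threading


def start_speech_worker(speak):
    """Start one background thread that calls speak(text) for queued jobs.

    Returns the job queue and the thread. Put None on the queue to stop.
    """
    jobs = queue.Queue()

    def worker():
        while True:
            text = jobs.get()
            if text is None:  # sentinel: no more work
                break
            speak(text)  # only this thread ever calls the engine

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread


# Demonstration with a list's append method standing in for a real engine.
spoken = []
jobs, thread = start_speech_worker(spoken.append)
for page_text in ["page one", "page two", "page three"]:
    jobs.put(page_text)
jobs.put(None)
thread.join()
print(spoken)
```

This keeps the engine single-threaded while the rest of the program stays free to extract text in parallel. It does not address the speed and synthesiser-availability shortcomings listed above.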

C Sharp

C Sharp Source Code


using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Globalization;
using System.IO;
using System.Runtime.Serialization;
using System.Security.Permissions;
using System.Text;
using System.Text.RegularExpressions;
#if LINQ
using System.Linq;
#endif
#if TEST
using NDesk.Options;
#endif
					

Advantages of using C# Tool

  1. It is very efficient.
  2. The executable can run in PowerShell as well as in WSL2.
  3. It can process several thousand books.
  4. It can process all the world's languages; it only requires a synthesiser that supports the language.
  5. It offers options to vary the speech rate and the audio volume, and to change the synthesiser and the synthesiser language.
  6. It can produce an "mp3" audio file as output.
  7. The script can run 24 hours a day, 365 days a year, without overloading.
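The options in item 5 map naturally onto a small command-line interface. The C# tool's actual flags are not shown here, so the Python sketch below uses hypothetical flag names, defaults and value ranges purely to illustrate the idea.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical flags for rate, volume and voice."""
    parser = argparse.ArgumentParser(
        description="Convert documents to audio-books.")
    parser.add_argument("inputs", nargs="+", help="documents to convert")
    parser.add_argument("--rate", type=int, default=0,
                        help="speech rate offset (assumed scale, 0 = default)")
    parser.add_argument("--volume", type=int, default=100,
                        help="output volume from 0 to 100")
    parser.add_argument("--voice", default=None,
                        help="name of the synthesiser voice to use")
    parser.add_argument("--language", default="en",
                        help="language code of the text")
    return parser


def parse_options(argv):
    """Parse argv and reject a volume outside the assumed 0-100 range."""
    args = build_parser().parse_args(argv)
    if not 0 <= args.volume <= 100:
        raise ValueError("volume must be between 0 and 100")
    return args


options = parse_options(["book.pdf", "--rate", "2", "--voice", "Microsoft Zira"])
print(options.inputs, options.rate, options.volume, options.voice, options.language)
```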

Synthesiser used

Sr No Synthesizer Name audio file
1. Google female voice audiofile
2. Microsoft Cortana audiofile
3. Eloquence audiofile
4. IVONA 2 voices:
  Brian audiofile
  Kimberly audiofile
  Kendra audiofile
  Ivy audiofile
  Eric audiofile
  Emma audiofile
  Jennifer audiofile
  Salli audiofile
5. VW voices:
  Julie audiofile
  Kate audiofile
  Paul audiofile
6. ScanSoft voices:
  Nathan (American) audiofile
  Samantha (American) audiofile
  Tom (American) audiofile
  Daniel (British) audiofile
  Serena (British) audiofile
  Stephanie (British) audiofile
  Karen (Australian) audiofile
  Lee (Australian) audiofile
  Sangita (Indian) audiofile
  Hindi-Lekha audiofile
  Kannada-Alpana audiofile
  Marathi-Ananya audiofile
  Tamil-Vani audiofile
  Telugu-Geeta audiofile
7. Microsoft Windows voices:
  Microsoft David audiofile
  Microsoft Zira audiofile
8. Hear2Read voices:
  Hear2Read-english audiofile
  Hear2Read-hindi audiofile
  Hear2Read-marathi audiofile
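With several voices per language, a tool needs some rule for choosing one. The sketch below copies voice names from the list above into a lookup table. The language codes, the grouping, the function `pick_voice` and the "first listed voice wins" default are all assumptions made for illustration.

```python
# Voice names copied from the synthesiser list, grouped by language.
# The language codes and the grouping are assumptions for this sketch.
VOICES_BY_LANGUAGE = {
    "hi": ["Hindi-Lekha", "Hear2Read-hindi"],
    "mr": ["Marathi-Ananya", "Hear2Read-marathi"],
    "kn": ["Kannada-Alpana"],
    "ta": ["Tamil-Vani"],
    "te": ["Telugu-Geeta"],
    "en": ["Microsoft David", "Microsoft Zira", "Hear2Read-english"],
}


def pick_voice(language, preferred=None):
    """Return the preferred voice if listed for the language, else the first."""
    voices = VOICES_BY_LANGUAGE.get(language)
    if not voices:
        raise LookupError(f"no voice listed for language {language!r}")
    if preferred in voices:
        return preferred
    return voices[0]


print(pick_voice("mr"))
print(pick_voice("hi", preferred="Hear2Read-hindi"))
```

A request for a language with no listed voice raises `LookupError`, which reflects the point that a language can only be processed when a supporting synthesiser is installed.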

Conclusions and Future work

  • Present-day TTS systems and screen reader implementations cannot speak mathematical content efficiently.
  • Mathematical equations use different types of visual cues to convey their semantic meaning, such as special symbols, superscripts, subscripts and parentheses.
  • Despite advances in screen reading and text-to-speech technologies, the problem of speaking complex math remains largely unresolved.
  • Having developed the tool for producing audiobooks and defined the problem, we will attempt in future to improve the accessibility of mathematical content.
  • We will develop an automated server backend for our tool.
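The visual cues mentioned in the points on mathematical content have to be turned into explicit words before a synthesiser can speak them. The Python sketch below shows this for a few simple cases written in plain text with `^` and `_`. It is a toy illustration of the problem, not the planned solution, and the spoken wording and the function `verbalise` are assumptions.

```python
import re

# Spoken forms for a few symbols; the wording is an assumption for this sketch.
SYMBOL_WORDS = {
    "+": "plus",
    "-": "minus",
    "=": "equals",
    "(": "open parenthesis",
    ")": "close parenthesis",
}


def verbalise(expression):
    """Turn a simple linear expression such as 'x_1^2 + y' into words."""
    # Superscripts and subscripts: ^ or _ followed by letters or digits.
    text = re.sub(r"\^([A-Za-z0-9]+)", r" to the power \1 ", expression)
    text = re.sub(r"_([A-Za-z0-9]+)", r" sub \1 ", text)
    # Replace each remaining symbol with its spoken form.
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, f" {word} ")
    # Collapse the extra spaces introduced by the replacements.
    return " ".join(text.split())


print(verbalise("x_1^2 + (y - 3) = 0"))
```

Even this small case shows the difficulty: fractions, nested exponents and two-dimensional layouts have no such one-to-one mapping from symbols to words.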

AudioBook-Project Results

Audio-books are practical to create, easily accessible and inexpensive to distribute, e.g. on cassette, CD-ROM and the Internet.
They can be used as instructional materials.
They offer benefits in distance education.
Audio-books are inexpensive but not yet widespread, so their potential in open education contexts is easily overlooked.
The use of audio-books to support open education can be extended to most disciplines.