We use speech as the chief communication medium in our day-to-day lives. However, when interacting with computers, apart from watching and executing actions, the bulk of communication today is achieved through reading the computer screen. This involves surfing the internet, reading emails, eBooks, research papers and much more, and it is very time consuming.
Nevertheless, the visually impaired community in Sri Lanka faces great difficulty in communicating with computers, since a suitable tool is not available for convenient use. As an appropriate solution to this problem, this project proposes an effective tool for text-to-speech conversion accommodating speech in the native language.

What is text-to-speech?

Not everybody can read text when it is displayed on the screen or printed. This may be because the person is partially sighted, or because they are not literate. These people can be helped by producing speech rather than printing or displaying text, using a Text-to-Speech (TTS) system to generate speech for the given text. A Text-to-Speech (TTS) system takes written text (from a web page, text editor, clipboard, etc.) as input and converts it into an audible format so you can hear what is in the text. It identifies and reads aloud what is displayed on the screen.
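As a rough illustration, the stages such a system performs on the input text can be sketched as a tiny pipeline. This is a minimal sketch only: the function names and the toy English grapheme-to-phoneme table are hypothetical stand-ins, not part of any real Sinhala TTS engine.

```python
# Minimal sketch of the classic TTS pipeline: normalize the text,
# map graphemes to phonemes, then hand the phonemes to a synthesizer.
# The G2P table below is a hypothetical toy example.

def normalize(text: str) -> list[str]:
    """Strip punctuation and split the input into word tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.lower().split()

G2P = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}

def to_phonemes(words: list[str]) -> list[str]:
    """Grapheme-to-phoneme step: dictionary lookup with a fallback marker."""
    phones = []
    for w in words:
        phones.extend(G2P.get(w, ["?"]))  # "?" marks an out-of-vocabulary word
    return phones

def synthesize(text: str) -> list[str]:
    """Full pipeline; in a real engine the phonemes would drive waveform generation."""
    return to_phonemes(normalize(text))

print(synthesize("Hello, world!"))  # ['HH', 'AH', 'L', 'OW', 'W', 'ER', 'L', 'D']
```

A real engine replaces the dictionary with language-specific letter-to-sound rules and the final step with waveform generation, but the stage order is the same.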
With a TTS application, one can listen to computer text instead of reading it. That means you can listen to your emails or eBooks while doing something else, which saves valuable time. Apart from saving time and empowering the visually impaired population, TTS can also be used to overcome the literacy barrier of the common masses, increase the possibilities of improved man-machine interaction through online newspaper reading from the internet, and enhance other information systems such as learning guides for students, IVR (Interactive Voice Response) systems, automated weather forecasting systems and so on [1][2].
What is “Sinhala Text To Speech”?

“Sinhala Text To Speech” is the system I selected as my final research project. As a postgraduate student, I selected a research project that converts Sinhala input text into verbal form. The term “text-to-speech” (TTS) refers to the conversion of input text into a spoken utterance. The input is Sinhala text, which may consist of a number of words, sentences, paragraphs, numbers and abbreviations. The TTS engine should identify it without any ambiguity and produce the corresponding speech sound wave with acceptable quality. The output should be intelligible to an average listener without much effort; in other words, the output should be made as close as possible to natural speech quality.

Speech is produced when air is forced from the lungs through the vocal cords (glottis) and along the vocal tract. Speech can be split into a rapidly changing excitation signal and a slowly varying filter; the envelope of the power spectrum contains the vocal tract information [40]. The verbal form of the input should be understandable to the receiver, meaning the output should be as close as possible to a natural human voice. The system will provide a few main features.
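This source-filter view can be illustrated numerically: a periodic impulse train stands in for the rapidly changing excitation, and a recursive (all-pole) filter stands in for the slowly varying vocal tract. The filter coefficients below are arbitrary illustrative values, not measured vocal-tract parameters.

```python
import math

fs = 8000        # sample rate in Hz
f0 = 100         # fundamental frequency of the excitation in Hz
n = fs // 2      # half a second of samples

# Source: one impulse per pitch period (the rapidly changing excitation)
excitation = [0.0] * n
for i in range(0, n, fs // f0):
    excitation[i] = 1.0

# Filter: a single resonance near 500 Hz standing in for the vocal tract
r, freq = 0.97, 500.0
theta = 2 * math.pi * freq / fs
a1, a2 = -2 * r * math.cos(theta), r * r   # all-pole (IIR) coefficients

# Run the source through the filter: y[i] = x[i] - a1*y[i-1] - a2*y[i-2]
speech = [0.0] * n
for i in range(n):
    y = excitation[i]
    if i >= 1:
        y -= a1 * speech[i - 1]
    if i >= 2:
        y -= a2 * speech[i - 2]
    speech[i] = y

print(len(speech))   # 4000
```

Varying `f0` changes the perceived pitch while the filter (the "vocal tract") stays put, which is exactly the separation the source-filter model describes.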
Among them: after entering the text, the user will be able to select one of several voice qualities, namely a female voice, a male voice or a child's voice. The user can also vary the speed of the voice.

Why is “Sinhala Text To Speech” needed?

Since most commercial computer systems and applications are developed in English, the use and benefits of those systems are limited to people with English literacy.
Due to that fact, the majority of the world cannot take advantage of such applications. This scenario applies to Sri Lanka as well. Though Sri Lankans have high language literacy, computer and English literacy in suburban areas is rather low. Therefore the benefits and advantages that can be gained through computers and information systems are kept away from people in rural areas.
One way to overcome that would be through localization. Here, “Sinhala Text To Speech” will act as a strong platform to boost software localization and to reduce the gap between computers and people. The main objective of the project is to develop a fully featured, complete Sinhala text-to-speech system that gives speech output similar to a human voice while preserving the native prosodic characteristics of the Sinhala language.
The system will have a female voice, which is in huge demand in the current localization software industry. It will act as the main platform for Sinhala text-to-speech, and developers will have the benefit of building end-user applications on top of it. This will benefit the visually impaired population and people with low IT literacy in Sri Lanka by enabling convenient access to information such as emails, eBooks, website contents, documents and learning tutors. An end-user Windows application will be developed, acting as a document reader as well as a screen reader.

The goal is to develop a system that can read text in Sinhala format and convert it into verbal (Sinhala) form. The system will also be able to vary the sound output: the user will be able to select a voice quality according to his or her preference, from three choices, a female voice, a male voice and a child's voice. The user can also change the speed of the voice: if someone needs to hear a low-speed or a high-speed voice, he or she can change it according to their needs.

Produce a verbal format for the input Sinhala text. The input Sinhala text, which may be user input or a given text document, will be transformed into sound waves, which are then output through speakers.
Disabled people will thus be among the stakeholders who benefit most from the Sinhala text-to-speech system. Undergraduates and researchers who need to consult many references can also send the text to the system and simply listen to what they need.

The output should be close to natural speech. The human voice is a complex acoustic signal generated by an air stream expelled through the mouth, the nose, or both. Important characteristics of the speech sound are speed, silence, accentuation and the level of energy output. The tongue, together with the lips and the other articulators in the vocal system, appropriately controls the air stream. Many variations of the speech signal are caused by the person's vocal system in order to convey meaning and emotion to the receiver, who then understands the message. Many further features reside in the receiver's hearing system to identify what is being said.

Identify an efficient way of translating Sinhala text into verbal form. By developing this system we aim to identify and propose the most suitable algorithm for translating Sinhala text into verbal form in a fast and efficient manner.

Control the voice speed and the type of voice (e.g. male voice, female voice, child voice, etc.). Users will be able to select the quality of the sound wave they want, and to adjust the speed of the output as they need. People who would like to learn Sinhala as a second language can learn elocution properly by changing (reducing or increasing) the speed, which will improve their listening capabilities.
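A naive way to realize this speed control is time-scale modification by resampling, sketched below. This is only an illustrative sketch: plain resampling also shifts the pitch, so production systems use pitch-preserving techniques (such as PSOLA) instead.

```python
def change_speed(samples: list[float], factor: float) -> list[float]:
    """Resample by linear interpolation; factor > 1 plays faster (shorter output)."""
    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor          # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append((1 - frac) * samples[j] + frac * nxt)
    return out

wave = [float(i % 10) for i in range(100)]   # dummy waveform
fast = change_speed(wave, 2.0)               # half the duration
slow = change_speed(wave, 0.5)               # double the duration
print(len(fast), len(slow))                  # 50 200
```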
Small children can likewise be encouraged to learn the language by varying the speed and voice types.

Propose ways in which the current system can be extended for future needs. This system provides only the basic functions; it is feasible to enhance it further in order to satisfy the changing needs of users.
For example, the system could be embedded into toys to improve children's listening and elocution abilities and so broaden their speaking capacity.

The idea of developing a Sinhala Text To Speech (STTS) engine arose when I considered the opportunities available for Sinhala-speaking users to grasp the benefits of Information and Communication Technology (ICT). In Sri Lanka more than 75% of the population speaks Sinhala, but it is very rare to find Sinhala software or Sinhala ICT materials in the market. This directly affects the development of ICT in Sri Lanka. At present a few Sinhala text-to-speech software packages are available, but they have problems with sound quality, font schemes, pronunciation and so on. Because of these problems, developers are reluctant to use those STTS systems in their applications. My focus is on developing an engine that converts Sinhala words in digitized form to Sinhala pronunciation in an error-free manner.
This engine will help in developing a number of applications.

Some applications where STTS can be used:

Document reader. An already digitized document (i.e. emails, e-books, newspapers, etc.), or a conventional document scanned and passed through an optical character recognizer (OCR), can be read aloud.

Aid to disabled persons. The vision- or voice-impaired community can use computer-aided devices to communicate directly with the world. A vision-impaired person can be informed by an STTS system, while a voice-impaired person can communicate with others through a keyboard and an STTS system.

Talking books and toys. Producing talking books and toys will boost the toy market and education.

Help assistant. A help assistant that speaks in Sinhala, like the MS Office help assistant, can be developed.

Automated newscasting. A whole new breed of television networks whose programs are hosted by computer-generated characters becomes possible.

Sinhala SMS reader. SMS messages contain several abbreviations; a system that reads those messages aloud will help their recipients.

Language education. A high-quality TTS system incorporated into a computer-aided device can be used as a tool for learning a new language. These tools help the learner improve very quickly, since he or she has access to the correct pronunciation whenever needed.

Travelers' guide. A system located inside a vehicle or on a mobile device can give the current location and other relevant information, incorporated with GPRS.

Alert systems. Alert systems can incorporate a TTS system to attract the attention of their operators, since humans are used to having their attention drawn by voice.

Especially in countries like Sri Lanka, which are still struggling to reap the benefits of ICT, a Sinhala TTS engine can be used as a solution for conveying information effectively. Letting users get the required information in their native language (i.e. by converting the text to native-language text) would naturally turn their thoughts to the achievable benefits and encourage them to use information technology more frequently. Therefore the development of a TTS engine for Sinhala will bring personal benefits (e.g. aid for the disabled, language learning) from a social perspective, and definitely financial benefits in economic terms (e.g. virtual television networks, the toy industry) for its users.

We studied the Sinhala TTS implemented by the Language Technology Research Laboratory (LTRL) of the University of Colombo School of Computing (UCSC) as the initial step. This gave us the opportunity to understand how the system works. The LTRL agreed to release the source code of the implemented system to us on request. The mismatches between the currently implemented TTS system and the native Sinhala language with respect to prosody were studied. By analysing the rule base and using the system as an end user, we identified the improvements required to make the speech output of the system closer to the native language.
This included pattern matching of the words and checking the prosody of a word with respect to its position in the sentence, such as a word preceding a question mark, exclamation mark, comma or full stop, and whether it is at the beginning or the end of the sentence. We also planned to introduce a female voice. The identified mismatches and improvements were then applied to bring the system to a better version.
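A punctuation-dependent prosody check of this kind can be sketched as a simple rule function. The contour labels below are hypothetical; the real rule base of the LTRL system is considerably more elaborate.

```python
def prosody_label(sentence: str) -> str:
    """Pick an intonation contour for the final word based on the end punctuation."""
    s = sentence.strip()
    if s.endswith("?"):
        return "rising"        # questions end with rising pitch
    if s.endswith("!"):
        return "emphatic"      # exclamations get extra stress
    if s.endswith(","):
        return "continuation"  # a comma signals the sentence continues
    return "falling"           # plain statements end with falling pitch

print(prosody_label("Is it raining?"))   # rising
print(prosody_label("Stop!"))            # emphatic
print(prosody_label("It is raining."))   # falling
```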
We aimed to develop the solution with short-term goals, which allowed us a sense of accomplishment; having short-term goals makes life easier. Project review was a very useful and powerful way of adding a continuous improvement mechanism. The project supervisors were consulted on a regular basis for reviews and feedback in order to make correct decisions, clear up misunderstandings and carry out future developments effectively and efficiently. Good planning and meeting follow-up were crucial in making these reviews a success.

Database Technology

OO methodologies and a relational database management system (Microsoft® SQL Server™ 2008) were used to develop the centralized database on the main server. A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data [42]. SQL Server 2008 is designed to work as a data storage engine for thousands of concurrent users who connect over a network; it is also capable of working as a stand-alone database directly on the same computer as an application [41].
A DBMS provides some important functionality. Applications are independent of data representation, storage and location (data and location independence). A DBMS is able to scan through millions of records and retrieve them efficiently (efficient data access). A DBMS enforces integrity constraints and security permissions on the data (data integrity and security). A DBMS provides facilities for administering data and keeping it efficiently accessible (data administration). A DBMS schedules concurrent access to the data in such a manner that each user can think of the data as being accessed by one user at a time; further, a DBMS protects users from the effects of system failures (concurrent access and crash recovery).

The speech database is the major component of the system. To build the speech database, we first make a diphone list and then select sentences that contain those diphones. After that, those sentences are recorded and properly labeled for use in the speech database. The diphones of the voices are stored in the speech database.
When adding a new voice, we introduce the relevant files, including the diphones, to the database. Even though we call it a database, it is a set of files residing in the system.

Security

Integrity is a major concept in the area of security.
With respect to integrity, the system should work as the user expects. In this case the user expects the system to read the input strings and words correctly; therefore the correctness of the output is very important to this system in the context of security.

Performance

A major requirement is that access to the stored data is fast, and hence the synthesis system must be both fast and efficient. Since this system provides speech output, the audible speech should not have any delays in the middle of an utterance, and the overall performance should be satisfactory. When considering the performance of this type of system, it should maintain a uniform flow of output, so a consistent characteristic must be maintained in the output stream.
It is acceptable, however, to have a reasonable delay before the reading starts.

Text-to-speech is a very popular area in the field of computer science. Several research efforts have been conducted in this area, most of them focused on how to produce more natural speech for a given text.
There are freely available text-to-speech packages in the world, but most of the software is developed for the most common languages, such as English, Japanese and Chinese. Some software companies also distribute text-to-speech development tools for English; the Microsoft Speech SDK toolkit is one example of a freely distributed toolkit developed by Microsoft for English. Nowadays, several universities and research labs are carrying out research projects on text-to-speech.
Carnegie Mellon University has a research focus on text-to-speech (TTS). They provide open-source speech software, toolkits, related publications and important techniques to undergraduate students and software developers alike. TCTS Lab is also conducting research in this area; they introduced a simple but general functional diagram of a TTS system [39].

Image: Thierry Dutoit. Figure: A simple, but general, functional diagram of a TTS system.

Before project initiation, basic research was done to become familiar with TTS systems and to gather information about existing systems. Later, a comprehensive literature survey was carried out in the fields of the Sinhala language and its characteristics, Festival and Festvox, generic TTS architecture, building new synthetic voices, Festival and Windows integration, and improving existing voices.

History of Speech Synthesis

A historical analysis is useful for understanding how current systems work and how they have developed into their present form.
The history of synthesized speech, from mechanical synthesis to today's high-quality synthesizers, together with some milestones in synthesis-related techniques, is discussed here. Attempts to produce synthetic speech were made over two hundred years ago. In 1779, the Russian professor Christian Kratzenstein explained the differences between five vowels (/a/, /e/, /i/, /o/ and /u/) and constructed equipment to produce them: acoustic resonators similar to the human vocal tract were built and activated with vibrating reeds [16]. In 1791, the “Acoustic-Mechanical Speech Machine”, which generated single sounds and combinations of sounds, was introduced by Wolfgang von Kempelen, who described his studies on speech production and his experiments with the speech machine in his publications.
The essential components of his machine were a pressure chamber for the lungs, a vibrating reed to act as the vocal cords, and a leather tube for the vocal tract action; he was able to produce different vowel sounds by controlling the shape of the leather tube. Consonants were created by four separate constricted passages controlled by the fingers, and a model of the vocal tract including a hinged tongue and movable lips was used for plosive sounds. In the mid-1800s, Charles Wheatstone implemented a version of Kempelen's speaking machine which was capable of producing vowels, consonants, combinations of some sounds and even full words. Vowels were generated using the vibrating reed with all passages closed, and consonants, including nasals, were generated with turbulent flow through an appropriate passage with the reed off. In the 1800s, Alexander Graham Bell and his father constructed the same kind of machine without any significant success. Bell also produced sounds by manipulating the vocal tract of his dog by hand, holding it between his legs and making it growl. No significant improvements in research and experiments with mechanical and semi-electrical analogs of vocal systems were made until the 1960s [38].
The first fully electrical synthesis device was introduced by Stewart in 1922 [17]. It had a buzzer for the excitation and two resonant circuits to model the acoustic resonances of the vocal tract. This machine was able to generate single static vowel sounds with the two lowest formants, but it could not produce any consonants or connected utterances. A similar kind of synthesizer was made by Wagner [27]. This device consisted of four electrical resonators connected in parallel and was also excited by a buzz-like source; the four resonator outputs were combined in the proper amplitudes to produce vowel spectra. In 1932, two researchers, Obata and Teshima, discovered the third formant in vowels [28].
The first three formants are generally considered sufficient for intelligible synthetic speech. The formant synthesizer was introduced by Walter Lawrence in 1953 [17] and was named PAT (Parametric Artificial Talker). It consisted of three electronic formant resonators connected in parallel. As the input signal, either a buzz or a noise was used.
It could control the three formant frequencies, the voicing amplitude, the fundamental frequency and the noise amplitude. At approximately the same time, Gunnar Fant introduced the first cascade formant synthesizer, named OVE I (Orator Verbis Electris). In 1962, Fant and Martony introduced an improved synthesizer named OVE II, which contained separate parts to model the transfer function of the vocal tract for vowels, nasals and obstruent consonants. The OVE projects were further improved, and as a result OVE III and GLOVE were introduced at the Kungliga Tekniska Högskolan (KTH), Sweden; the present commercial Infovox system is originally descended from these [30][31][32]. There was a debate between the PAT and OVE camps over how the transfer function of the acoustic tube should be modeled: in parallel or in cascade. John Holmes introduced his parallel formant synthesizer in 1972 after studying these synthesizers for a few years.
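The cascade-versus-parallel question can be made concrete: each formant is a second-order resonator, and in a cascade synthesizer the excitation passes through the resonators in series. The sketch below uses arbitrary illustrative formant frequencies and bandwidths, not the actual PAT or OVE parameters.

```python
import math

def resonator(signal: list[float], freq: float, bw: float, fs: float) -> list[float]:
    """Second-order digital resonator (one formant) applied to the signal."""
    r = math.exp(-math.pi * bw / fs)                       # pole radius from bandwidth
    a1 = -2 * r * math.cos(2 * math.pi * freq / fs)
    a2 = r * r
    out = [0.0, 0.0]                                       # two zero-valued past samples
    for x in signal:
        out.append(x - a1 * out[-1] - a2 * out[-2])
    return out[2:]

fs = 8000.0
excitation = [1.0] + [0.0] * 799            # a single impulse as the source

# Cascade: the source flows through the three formant resonators in series
speech = excitation
for f, bw in [(500, 60), (1500, 90), (2500, 120)]:   # illustrative /a/-like formants
    speech = resonator(speech, f, bw, fs)

print(len(speech))   # 800
```

In a parallel synthesizer, by contrast, each resonator would filter the excitation independently and the three outputs would be summed with per-formant amplitudes.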
The voice synthesis was so good that the average listener could not tell the difference between the synthesized voice and a natural one [17]. About a year later he introduced a parallel formant synthesizer developed with the JSRU (Joint Speech Research Unit) [33]. The first articulatory synthesizer was introduced in 1958 by George Rosen at the Massachusetts Institute of Technology (M.I.T.) [17]. The DAVO (Dynamic Analog of the VOcal tract) was controlled by tape recordings of control signals created by hand. The first experiments with Linear Predictive Coding (LPC) were made in the mid-1960s [28].
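Linear Predictive Coding models each speech sample as a weighted sum of the previous p samples. A minimal order-2 sketch on a toy signal is shown below; it fits the predictor weights by least squares via the 2x2 normal equations (real LPC analysis uses higher orders and windowed autocorrelation, e.g. Levinson-Durbin recursion).

```python
def lpc2(x: list[float]) -> tuple[float, float]:
    """Order-2 LPC: fit x[n] ~ a1*x[n-1] + a2*x[n-2] by least squares."""
    # Normal-equation sums over n >= 2
    r11 = sum(x[n - 1] * x[n - 1] for n in range(2, len(x)))
    r12 = sum(x[n - 1] * x[n - 2] for n in range(2, len(x)))
    r22 = sum(x[n - 2] * x[n - 2] for n in range(2, len(x)))
    p1 = sum(x[n] * x[n - 1] for n in range(2, len(x)))
    p2 = sum(x[n] * x[n - 2] for n in range(2, len(x)))
    det = r11 * r22 - r12 * r12
    a1 = (p1 * r22 - p2 * r12) / det
    a2 = (p2 * r11 - p1 * r12) / det
    return a1, a2

# Toy signal generated by a known order-2 recursion: x[n] = 1.2*x[n-1] - 0.7*x[n-2]
x = [1.0, 1.2]
for _ in range(200):
    x.append(1.2 * x[-1] - 0.7 * x[-2])

a1, a2 = lpc2(x)
print(round(a1, 3), round(a2, 3))   # 1.2 -0.7
```

Because the toy signal exactly follows an order-2 recursion, the analysis recovers the generating coefficients; on real speech the recovered coefficients describe the vocal-tract filter of the source-filter model discussed earlier.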
The first full text-to-speech system for English was developed at the Electrotechnical Laboratory, Japan, in 1968 by Noriko Umeda and his colleagues [17]. The synthesis was based on an articulatory model and included a syntactic analysis module with some sophisticated heuristics. Though the system was intelligible, it was still monotone.