Open Access Article

Twee: A Novel Text-To-Speech Engine

D. Das¹, H. Hassan², S. Gupta³

Section: Survey Paper, Product Type: Journal Paper
Volume-07, Issue-01, Page no. 67-70, Jan-2019

Online published on Jan 20, 2019

Copyright © D. Das, H. Hassan, S. Gupta. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: D. Das, H. Hassan, S. Gupta, “Twee: A Novel Text-To-Speech Engine,” International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.67-70, 2019.

MLA Style Citation: D. Das, H. Hassan, S. Gupta, “Twee: A Novel Text-To-Speech Engine.” International Journal of Computer Sciences and Engineering 07.01 (2019): 67-70.

APA Style Citation: D. Das, H. Hassan, S. Gupta (2019). Twee: A Novel Text-To-Speech Engine. International Journal of Computer Sciences and Engineering, 07(01), 67-70.

BibTex Style Citation:
@article{Das_2019,
author = {D. Das and H. Hassan and S. Gupta},
title = {Twee: A Novel Text-To-Speech Engine},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {1 2019},
volume = {07},
number = {01},
month = {1},
year = {2019},
issn = {2347-2693},
pages = {67-70},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=595},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=595
TI - Twee: A Novel Text-To-Speech Engine
T2 - International Journal of Computer Sciences and Engineering
AU - Das, D.
AU - Hassan, H.
AU - Gupta, S.
PY - 2019
DA - 2019/01/20
PB - IJCSE, Indore, INDIA
SP - 67
EP - 70
IS - 01
VL - 07
SN - 2347-2693
ER -


Abstract

With the advancement of technology and the widespread use of smart devices, the horizon of networking and connectivity has broadened dramatically. One prominent area of research in this digital era is the development of Text-to-Speech (TTS) engines, which offer greater interactivity with prevalent smart devices. Various TTS engines are currently available in the market, but they lack the ability to convey the qualities of a human voice; for example, they fail to provide credible indications of the speaker’s sentiment, mood or emotional state. Moreover, no existing TTS engine can replicate human behaviour and mannerisms with high precision and accuracy. This paper proposes a novel Text-to-Speech engine named ‘Twee’ whose pronunciation works in sync with real-world human intelligence. The proposed system is an application of interdisciplinary research in which Natural Language Processing, Artificial Intelligence and Digital Signal Processing are combined to perform sentiment analysis on text through the processing of phonemes. The system works well in both mono-channel and stereo modes and is capable of generating varied effects on a voice depending on the type of communication.
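
The pipeline described in the abstract (sentiment analysis on text feeding phoneme-level voice effects, with mono or stereo output) can be pictured with a small hypothetical sketch. The Python code below is not the authors’ implementation: the toy lexicon, the ProsodyPlan fields and the plan_prosody mapping are illustrative assumptions about how a valence score derived from text might be turned into coarse prosody settings (pitch, rate, volume) and a channel choice before synthesis.

# Hypothetical sketch of a sentiment-aware TTS front end.
# All names (SENTIMENT_LEXICON, ProsodyPlan, plan_prosody) are illustrative
# assumptions, not taken from the Twee paper.

from dataclasses import dataclass

# Toy valence lexicon; a real system would use a trained sentiment model.
SENTIMENT_LEXICON = {
    "happy": 1.0, "great": 0.8, "love": 0.9,
    "sad": -0.9, "terrible": -0.8, "angry": -0.7,
}

@dataclass
class ProsodyPlan:
    pitch_shift: float   # semitones relative to the neutral voice
    rate: float          # speaking-rate multiplier
    volume: float        # linear gain
    channels: int        # 1 = mono, 2 = stereo rendering

def sentence_valence(text: str) -> float:
    """Average word valence in [-1, 1]; 0.0 when no lexicon word is present."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    hits = [SENTIMENT_LEXICON[w] for w in words if w in SENTIMENT_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def plan_prosody(text: str, stereo: bool = False) -> ProsodyPlan:
    """Map valence to coarse prosody: positive -> higher pitch, faster, louder."""
    v = sentence_valence(text)
    return ProsodyPlan(
        pitch_shift=2.0 * v,           # +/- 2 semitones at the extremes
        rate=1.0 + 0.15 * v,           # up to +/- 15% speaking rate
        volume=1.0 + 0.2 * max(v, 0),  # slight emphasis for positive text
        channels=2 if stereo else 1,
    )

if __name__ == "__main__":
    for line in ["I love this great sunny day", "This is a sad, terrible result"]:
        print(line, "->", plan_prosody(line, stereo=True))

In a complete engine, a plan like this would parameterise the Digital Signal Processing stage that shapes the synthesized phoneme stream; here it simply prints the chosen settings.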

Key-Words / Index Term

Artificial Intelligence, Natural Language Processing, Digital Signal Processing, Phoneme, Emotion

References

[1] A. Drahota, A. Costall, V. Reddy, “The Vocal Communication of Different Kinds of Smile”, Speech Communication, Vol.50, Issue.4, pp.278-287, 2007. doi: 10.1016/j.specom.2007.10.001
[2] W.Y. Wang, K. Georgila, “Automatic Detection of Unnatural Word-Level Segments in Unit-Selection Speech Synthesis”, In the Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, Waikoloa, HI, USA, pp.289-294, 2011.
[3] R.E. Remez, P.E. Rubin, D.B. Pisoni, T.D. Carrell, “Speech Perception without Traditional Speech Cues”, Science, New Series, Vol.212, Issue.4497, pp. 947-950, 1981. doi:10.1126/science.7233191
[4] J. Zhang, “Language Generation and Speech Synthesis in Dialogues for Language Learning”, Massachusetts Institute of Technology, pp.1-68, 2004.
[5] S. Lemmetty, “Review of Speech Synthesis Technology”, Helsinki University of Technology, pp.1-113, 1999.
[6] I.G. Mattingly, “Speech Synthesis for Phonetic and Phonological Models”, Current Trends in Linguistics, Mouton, The Hague, Vol.12, pp.2451-2487, 1974.
[7] FFmpeg Git, “FFmpeg 4.0 ‘Wu’”, last accessed 2018-07-18.
[8] Takanishi Lab Webpage, “Anthropomorphic Talking Robot Waseda Talker Series”, Retrieved from http://www.takanishi.mech.waseda.ac.jp/top/research/voice/index.htm, last accessed 2018-10-10.
[9] DeepMind Webpage, “WaveNet: A Generative Model for Raw Audio”, Retrieved from https://deepmind.com/blog/wavenet-generative-model-raw-audio/, last accessed 2018-09-08.