Open Access Article

Transcripter-Generation of the transcript from audio to text using Deep Learning

Fatima Ansari, Ramsakal Gupta, Uday Singh, Fahimur Shaikh

Section: Review Paper, Product Type: Journal Paper
Volume-7, Issue-1, Page no. 770-773, Jan-2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i1.770773

Online published on Jan 31, 2019

Copyright © Fatima Ansari, Ramsakal Gupta, Uday Singh, Fahimur Shaikh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: Fatima Ansari, Ramsakal Gupta, Uday Singh, Fahimur Shaikh, “Transcripter-Generation of the transcript from audio to text using Deep Learning,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.1, pp.770-773, 2019.

MLA Style Citation: Fatima Ansari, Ramsakal Gupta, Uday Singh, Fahimur Shaikh. "Transcripter-Generation of the transcript from audio to text using Deep Learning." International Journal of Computer Sciences and Engineering 7.1 (2019): 770-773.

APA Style Citation: Fatima Ansari, Ramsakal Gupta, Uday Singh, Fahimur Shaikh (2019). Transcripter-Generation of the transcript from audio to text using Deep Learning. International Journal of Computer Sciences and Engineering, 7(1), 770-773.

BibTex Style Citation:
@article{Ansari_2019,
  author = {Fatima Ansari and Ramsakal Gupta and Uday Singh and Fahimur Shaikh},
  title = {Transcripter-Generation of the transcript from audio to text using Deep Learning},
  journal = {International Journal of Computer Sciences and Engineering},
  issue_date = {January 2019},
  volume = {7},
  issue = {1},
  month = {1},
  year = {2019},
  issn = {2347-2693},
  pages = {770-773},
  url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3581},
  doi = {10.26438/ijcse/v7i1.770773},
  publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i1.770773
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=3581
TI  - Transcripter-Generation of the transcript from audio to text using Deep Learning
T2  - International Journal of Computer Sciences and Engineering
AU  - Ansari, Fatima
AU  - Gupta, Ramsakal
AU  - Singh, Uday
AU  - Shaikh, Fahimur
PY  - 2019
DA  - 2019/01/31
PB  - IJCSE, Indore, INDIA
SP  - 770
EP  - 773
IS  - 1
VL  - 7
SN  - 2347-2693
ER  -


Abstract

Video is one of the most powerful media for propagating information, and its audio track is the component that carries most of the message, across fields such as teaching, entertainment, conference meetings, and news broadcasting. Converting audio into a documented text transcript makes the content easy to refer back to, since searching for a spoken word in a video is far harder than searching a transcript. The main objective of this system is to provide an automated way to generate transcripts for audio and video. Because it is not practical to produce the same informative video in every language, the system extracts the audio from a given video and generates a transcript, which can then be translated into any desired language. This is especially useful for speakers of languages not used by the majority of the population, and it has applications in every field where information is exchanged through video.
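The paper does not specify an output format, but the time-synchronization step described above is commonly realized by pairing each recognized text segment with its start and end times, e.g. as SubRip (SRT) text. A minimal illustrative sketch, with all function names and the sample segments invented here:

```python
# Sketch: render recognized (start, end, text) segments as a
# time-synchronized SRT transcript. Segments are hypothetical.

def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """segments: iterable of (start_sec, end_sec, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Welcome to the lecture."),
              (2.5, 5.0, "Today we cover speech recognition.")]))
```

A transcript in this form supports both keyword search and later translation of each segment while preserving the original timing.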

Key-Words / Index Term

Neural Network, Audio extraction, Speech recognition, Time synchronization, Automatic Transcript generation, Natural language processing, Connectionist Temporal Classification (CTC), Hidden Markov Model (HMM).
