A Multimedia Information System to Support the Discourse Analysis of Video Recordings of Television Programs

Moisés H. R. Pereira, Flávio L. C. Pádua, Giani D. Silva
Lab. de Pesquisas Interdisciplinares em Informação Multimídia
Centro Federal de Educação Tecnológica de Minas Gerais
Belo Horizonte, Minas Gerais, Brazil
{moiseshrp,cardeal}@decom.cefetmg.br, [email protected]

Guilherme T. Assis, Taiana M. Zenha
Instituto de Ciências Exatas e Biológicas - ICEB
Universidade Federal de Ouro Preto - UFOP
Ouro Preto, Minas Gerais, Brazil
[email protected], [email protected]
Abstract — This paper describes the development of a multimedia information system to support the discourse analysis of video recordings of television programs. Although television is one of the most fascinating media phenomena ever created, there is still a lack of information systems that allow the effective retrieval of TV information relevant to the discursive analysis and evaluation of such content. Given this context, in an attempt to provide Brazilian TV researchers with computational tools to assist their research, as well as to contribute to the discussion on making the television collections produced in this country accessible, this work proposes the development of a multimedia information system, in partnership with the Brazilian open TV channel Rede Minas. The system is based on the Matterhorn framework, on audiovisual information indexing and retrieval techniques, and on a collection of tools that allow the automatic evaluation of parameters essential to the discourse analysis of TV video. The architecture developed for the system supports information retrieval based on content-independent metadata as well as on content-dependent metadata, which are determined through discourse analysis techniques and image and sound signal processing.

Keywords - Multimedia information system, discourse analysis, textual indexing, metadata, video retrieval, speech recognition.
I. INTRODUCTION
The growth of audiovisual information in recent years, particularly that produced by TV broadcasters, has deepened the demand for multimedia information systems capable of efficiently storing and retrieving this sort of data in large databases. The creation of new metadata and content description standards has been a matter of research, seeking to improve the quality of search and analysis of audiovisual collections in systems that allow intelligent access to recorded video data [1]. In Brazil, according to the federal Copyright Law (Law Number 9,610/98), television broadcasts belong to the public domain only at the moment of their transmission [2]. Therefore, after the transmission stage, broadcasters become the owners of such content, holding the right to authorize or deny the recording of a program for any purpose. In order to manage the storage of their programming, each broadcaster
has a Documentation Center (CEDOC), which mainly serves internal demand, including that of newscast production. Usually, the CEDOCs do not store the programming in full, excluding the breaks between shows, commercials and advertising. Moreover, discursive information such as themes, participant identities, filming plans, among others, is not included in the description of this audiovisual material. This complicates the work of researchers who need such information in order to characterize the programming grid. Given this context, in an attempt to support researchers of the Brazilian television system, this paper presents a multimedia information system to support the discursive analysis of television videos. The architecture of the presented system was conceived to work with the following content-dependent metadata (CDM): (i) discursive metadata and (ii) audio content-based metadata. The textual discursive metadata are determined by documentalists, based on Discourse Analysis (DA) techniques which, although manually applied, allow data that are more semantic and closer to the video's informational content to be modeled, under the perspective of fields such as linguistics, journalism and television, thus revealing communicational intentionality patterns, programming categorization and strategies, among others. The audio content-based metadata are automatically estimated by signal processing techniques frequently used in speech recognition and automatic transcription systems [3]. In this paper, the automatic indexing of audio content-based metadata is performed by the speech recognition system Julius [4] in the corresponding module, together with the phonetic dictionary and the acoustic and language models developed at the Signal Processing Laboratory (LaPS) of Universidade Federal do Pará (UFPA).
Considering all of the above, this paper approaches the development and application of new techniques for processing, indexing, retrieving and analyzing multimedia information (particularly videos), which are of great importance in assuring the success of several services, mainly applications related to content production.
II. RELATED WORK
In this section, some of the main works in the literature that significantly contributed to research advances in this field are presented. They involve the development of automatic discursive analysis tools, the proposition of formal models for discursive analysis and the implementation of multimedia information systems based on robust audiovisual information retrieval techniques.

A. Automatic Discursive Analysis

Among the efforts towards automating the discursive analysis of text, one can highlight the development of the first rhetorical parser for journalistic English texts [5]. The developed methodology and the proposed formalization are the basis of several works on automatic discursive analysis, including the rhetorical text structuring theory known as RST (Rhetorical Structure Theory). The work presented in [6] applies Information Retrieval (IR) techniques to assist discursive analysis, using the vector model to compare each textual segment with the title metadata of the analyzed document. In 2003, classifiers for the discursive analysis of textual documents were developed using the SVM (Support Vector Machine) machine learning technique, in which documents are grouped by rhetorical similarity [7]. In Brazil, the automatic discursive analyzer DiZer (DIscourse analyZER) stands out in the discursive processing of Brazilian Portuguese texts [8]. That work uses Artificial Intelligence (AI) techniques to identify rhetorical relations and combine them in order to present to the expert the general rhetorical structure of the content. In the context of audiovisual files, the problem of video discursive analysis is addressed in [9]. Similarly, in the present paper, media discourse analysis techniques are used to generate textual metadata for the description and indexing of television videos.
For a collection containing videos of genres such as Debate and Interview, the role of the modes of discourse organization in providing functional communication between a program's participants is discussed. The information system developed here stands out for supporting the discursive analysis of audiovisual documents by using discursive metadata mapped for each relation between the media concept and the corresponding informational content. Moreover, the system performs video retrieval and the composition of infographics by means of these metadata, or markers, in order to ease the discursive analysis of television videos by experts.

B. Multimedia Information Systems

Among the main works focused on the development of systems similar to the one proposed here, Informedia [10] is one of the most relevant regarding automatic multimedia information retrieval. This system uses speech, image and natural language processing and analysis techniques in an attempt to automate the description, segmentation and indexing of TV and radio programs.
The Físchlár system [11], created by the Centre for Digital Video Processing at Dublin City University, Ireland, analyzes and develops video processing techniques such as camera shot boundary detection, keyframe extraction, closed caption analysis and an XML architecture for video retrieval. Opencast Matterhorn [12] is an open source software project to produce, manage and distribute audiovisual files of academic content, created by the Opencast Community at the University of California, Berkeley, in 2008. Besides manual text description, the system processes the video's intrinsic data, such as size, duration and file name, and performs a simple segmentation of videos into key frames, selected by an edge detection algorithm. Matterhorn uses the Dublin Core and MPEG-7 standards as metadata schemes and allows the implementation of customized services. The present work fits into this context of multimedia information system development, relying on concepts and techniques recently used in works related to digital video representation, description, classification and processing [3], combining general content-independent metadata with content-dependent metadata determined from textual data acquired through automatic speech recognition (audio) and media discourse analysis techniques [4, 13].

III. MULTIMEDIA INFORMATION SYSTEM
To develop a multimedia information system to support the discursive analysis of videos from programs of the Brazilian television system, the methodology used in this work consisted of modeling and implementing each module of the system based on the discursive textual metadata mapped by means of Discourse Analysis techniques. The Matterhorn framework [12] was used to implement the proposed system.

A. Indexing and Retrieval Modules

The indexing module is responsible for assigning indexes, also called descriptors or metadata, to each video file, ensuring an effective retrieval process for the required information. The retrieval module, in turn, closely tied to the structure created in the indexing process, uses the query submitted by the user to retrieve the videos whose indexes match it. The indexing module handles the text obtained from the videos, transforming it into a specifically structured document for index creation by Lucene [14]. The video's textual data are acquired through a form about the video's general features, filled out by the documentalist, and through the speech recognition process applied to the audio signal. The descriptive form was extended with the fields Genre and Filmic Plans, for all genres; the fields Theme, Participant Identity, Management, Participant Disposal, Vision Axis, Sequentiality and Enunciate Mode, for Debate and Interview videos; and the field Structuring, for Newscast videos, which also stores values from Themes. Moreover, all the fields above are controlled, meaning they do not allow free typing by the documentalist, providing instead options obtained through DA methods. A controlled vocabulary was created for the Debate and Interview genre fields by the Language Study team [9], including values in Themes for Newscast, based on [2, 13]. For Newscast videos, the Structuring metadata was implemented, composed of the enunciate elements Vignette, Notice of Matter, Simple Note, Covered Note, Footnote, Interview and Reportage. The Structuring field is multivalued, meaning it stores one or more of these elements: for each element, its name, emission period and associated theme are indexed. The emission period refers to the period of time in which an element was shown in the video, and is the foundation for the analysis of image distribution in the enunciate area, according to the categorization of such elements in the infographics generation module. In the automatic indexing stage, Matterhorn extracts the video's general features, such as the processing or indexing date and the total duration, storing them in XML metadata files and in the index base. Next, the indexing module activates the speech recognition process through the Julius ASR [4, 15]. Julius receives the video's audio file and a configuration file that indicates the resources for the current language. The language resources for Brazilian Portuguese consist of a phonetic dictionary, a language model and an acoustic model assembled by the FalaBrasil group from the Signal Processing Laboratory (LaPS) at Universidade Federal do Pará [15]. With these resources, the audio signal is processed and the recognized words are written to an output file. By the end of the process, this file contains the transcription of the given audio track, and it is stored in the multimedia database. Next, the indexing module activates Lucene to read the transcription file's content and index it. The retrieval module implements the vector model by means of Lucene and Solr, applying different weights to the search terms for each of the indexed fields.
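The transcription step described above can be sketched as a small wrapper around the Julius command line. This is only an illustration: the configuration file name is hypothetical, standing in for a jconf that points to the FalaBrasil phonetic dictionary, language model and acoustic model.

```python
# Sketch of the automatic transcription step of the indexing module.
# Assumes Julius is installed; "brazilian.jconf" is an illustrative name
# for a configuration file referencing the FalaBrasil language resources.
import os
import subprocess
import tempfile

def parse_julius_output(stdout: str) -> str:
    """Collect recognized utterances, which Julius prints on 'sentence1:' lines."""
    words = [line.split(":", 1)[1].strip()
             for line in stdout.splitlines()
             if line.startswith("sentence1:")]
    return " ".join(words)

def transcribe(wav_path: str, jconf: str = "brazilian.jconf") -> str:
    """Run Julius on a 16 kHz mono WAV file and return its transcription."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(wav_path + "\n")  # Julius reads its inputs from a file list
        filelist = f.name
    try:
        result = subprocess.run(
            ["julius", "-C", jconf, "-input", "rawfile", "-filelist", filelist],
            capture_output=True, text=True, check=True)
    finally:
        os.unlink(filelist)
    return parse_julius_output(result.stdout)
```

The resulting string would then be written to the transcription file that Lucene reads and indexes.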
To define these weights for each modeled metadata field, an internal survey was conducted with 8 DA experts, all participants in the project, through a Likert-scale form containing all the corresponding indexing fields and their importance scores for a video search. From this activity, the arithmetic mean was computed for each field and this value was assigned as the metadata weight in the retrieval module. Regarding query processing in the retrieval module, the basic metadata were implemented with the following weights: Title and Program, with weight 9.5 each; Host, weight 8.0; Exhibition Date, weight 7.6; Description, weight 8.7; Television Channel, weight 8.1; and Duration, weight 4.6. For the discursive metadata, the following weights were implemented: Genre, Theme and Structuring, with weight 8.8 each; Participant Identity, weight 6.7; Management, weight 5.3; Participant Disposal and Filmic Plans, weight 6.0 each; Vision Axis, weight 5.8; Sequentiality, weight 6.1; and Enunciate Mode, weight 7.6. For the general search service, the search terms informed by the user are applied to all the metadata implemented in the retrieval module, according to the defined weights. In this work, the combination of similarities between metadata groups for each video matched by the input query was implemented in the retrieval module, meaning that the retrieval module applies all the query terms to the basic metadata, then to the discursive metadata, and finally to the content metadata, which, in this case, correspond only to the textual metadata acquired through speech recognition. For each video, these similarity measurements are combined through a weighted average, and this value becomes the video's final similarity for the query. The weight of each metadata group was defined through the experimental tests reported in subsection IV-B, combining the precision and recall measurements in the effectiveness evaluation of several queries, as well as the values measured in the response quality efficiency assessment.

B. Discursive Infographics Generation Module

The discursive infographics generation module is responsible for showing, in the implemented Web interface, the graphics assembled according to the user's options, containing enough information to support the researcher in the discursive analysis of the TV videos retrieved through these options. The options hold the identifiers of the indexes implemented in the indexing module, and their values are applied to the retrieval module in the form of a textual query, providing the metadata of the matched videos, joining them and generating the infographics according to DA concepts. In order to generate discursive infographics, it is necessary to choose the option that corresponds to the television genre of the group of videos one wishes to analyze. If the user does not provide this criterion, the system performs the search within all genres, providing metadata from all the videos in the base. In this case, the only possible graphics are those related to the theme capital over the emission period, as well as the simple counting of themes across videos, with two clustering options: genres and programs.
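The per-field weights and the weighted-average combination of metadata groups can be sketched as follows. Only the numeric field boosts come from the text: the field identifiers are hypothetical names for the schema fields, the boost string uses Solr's conventional `field^boost` syntax, and the group weights are placeholders, since the actual values were tuned experimentally (subsection IV-B).

```python
# Per-field boosts reported in the text, keyed by illustrative field names
# (the actual schema identifiers are not given in the paper).
FIELD_WEIGHTS = {
    "title": 9.5, "program": 9.5, "host": 8.0, "exhibition_date": 7.6,
    "description": 8.7, "television_channel": 8.1, "duration": 4.6,
    "genre": 8.8, "theme": 8.8, "structuring": 8.8,
    "participant_identity": 6.7, "management": 5.3,
    "participant_disposal": 6.0, "filmic_plans": 6.0,
    "vision_axis": 5.8, "sequentiality": 6.1, "enunciate_mode": 7.6,
}

def solr_boosts(weights):
    """Render boosts in Solr's 'field^boost' syntax, e.g. 'title^9.5 ...'."""
    return " ".join(f"{field}^{boost}" for field, boost in weights.items())

# Hypothetical group weights: the real values were defined experimentally.
GROUP_WEIGHTS = {"basic": 0.4, "discursive": 0.4, "content": 0.2}

def final_similarity(group_sims, weights=GROUP_WEIGHTS):
    """Weighted average of the per-group similarities of one video."""
    total = sum(weights.values())
    return sum(weights[g] * group_sims.get(g, 0.0) for g in weights) / total
```

A video's final ranking score for a query would then be `final_similarity` applied to the three group similarities computed by Lucene/Solr.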
For the option referring to the Debate and Interview genres, besides the aforementioned graphics, it is also possible to generate graphics on participants' identities, considering each program's television emission period or the overall set of videos in which each identity figures. If the Newscast genre is selected, the system allows the generation of infographics regarding Themes, Types of Matters and Image Emission, using for the last two the emission period of the video. Since it is a single genre, the grouping options for graphics become restricted to Themes, Types of Matters and Programs. The graphics generated from the number of videos included in the base are simple data counts which assist the characterization of the digital collection. On the other hand, graphics generated over the videos' emission period enable the graphical analysis of the distribution of discursive elements over the emission period, assist the study of the programming grid and subsidize studies on the communicative rhetoric of theme sequences among programs, between sections of the same program or within the same section, as happens in newscasts. Fig. 1 shows a schematic sketch of the discursive infographics generation module as the options referring to the retrieval metadata are selected by the user and manipulated by the system, emphasizing the fact that each combination represents the generation of one type of graphic.

Figure 1. Discursive infographics generation schema.

In the implemented Web interface, in order to generate any graphic, the user must choose at least one main data option, which is mandatory and enables the options for type of flow, grouping data and time restriction. The type of flow corresponds to the data accounting one wishes to use in the graphic assembly: over number of videos or over exhibition period. The grouping data are those which join the main data into sets, gathering them into columns in the graphic; that is, they are the columns' names, and the main data are the columns' content. The time restriction is a way of selecting the videos with exhibition dates within the period of time established by the user. Finally, when the user chooses a grouping data option, the system enables the options referring to the summarization of data to be represented by means of curves over the infographic. When selecting the genres Debate and/or Interview, the information system provides the Participants' Identities option in the main data field, besides the Themes option. If the user chooses either of those options, leaving Number of Videos as the flow option, the generated infographics simply consist of graphic quantifiers over the number of videos distributed by themes or identities. Using the flow over exhibition period, it is possible to generate graphics referring to the theme capital and to the emission of the social identities assigned to participants. For the Newscast genre, the information system enables the options Types of Matters and Image Emission in the main data field, besides the Theme option, which features in all genres. As stated at the beginning of this section, this genre's graphics only bear data flow over emission period. The graphics generated over Types of Matters support the discursive analysis of the newscast's enunciate capital, containing the merely informative elements Footnote, Simple Note, Covered Note, Interview and Reportage. The infographics generated on Image Emission analyze enunciate or acting areas. They consist in the presentation of the Vignette period and of the internal and external space periods. With this type of infographic and its combinations, it is possible to analyze the emission period distribution of the studio images, represented by the host's presence in Notice of Matter, Footnote, Simple Note and Interview, and of the images external to the studio scenery, as observed in Reportage and Covered Note. In the graphic generation over Themes, because more than one theme may exist in the same video exhibition for this genre, it is possible to plot the newscast theme sequence and assist the discursive analysis under a more specific context, such as the news' tension level [2]. The summarizing curves enable the analysis of the users' actions over the collection, involving the average level of interest in video retrieval, the average number of accesses each video holds, and the average amount of time spent watching in relation to the total emission period for each genre, program or theme. Thus, it is possible to analyze, besides the communicative intentionality of the programming grid, the effects of a given television strategy on the public.
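The mapping from interface options to a retrieval request can be illustrated with a small data structure. All names here are hypothetical, since the paper does not expose this interface; the sketch only shows how main data, flow, grouping and time restriction might compose a query for the retrieval module.

```python
# Illustrative composition of an infographic request from the Web-interface
# options described above; all identifiers are hypothetical.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class InfographicRequest:
    main_data: str                            # e.g. "theme", "participant_identity"
    flow: str = "number_of_videos"            # or "exhibition_period"
    grouping: Optional[str] = None            # e.g. "genre" or "program"
    period: Optional[Tuple[str, str]] = None  # exhibition-date restriction

    def to_query(self) -> dict:
        """Translate the chosen options into textual query parameters."""
        query = {"facet": self.main_data, "metric": self.flow}
        if self.grouping:
            query["group_by"] = self.grouping
        if self.period:
            query["date_range"] = self.period
        return query
```

For instance, choosing Themes as main data, exhibition period as flow and genre as grouping would yield one specific infographic, mirroring the combinations of Fig. 1.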
IV. EXPERIMENTAL RESULTS

This section presents the experiments conducted with the implemented multimedia information system and discusses the main results, seeking to demonstrate and evaluate the system's effectiveness, efficiency and scalability. To that end, experiment scripts were run over some of the groups of videos from the multimedia database, specified for each experiment, in a computational environment composed of a server machine with a Xeon W3565 processor at 3.2 GHz over 8 cores, 6 GB of RAM and a 2 TB hard disk; three virtual servers are managed by this machine, each with a 30 GB hard disk, and with 1 GB of RAM for the administration server, 1 GB for the video exhibition server and 4 GB for the Composer processing server. The experimental results are analyzed in two distinct parts: (i) analysis of the discursive infographics generated in the corresponding module and (ii) trials regarding video retrieval and the analysis of the system's effectiveness and efficiency.

A. Discursive Infographics Analysis

In order to analyze the infographics generated by the proposed information system, three main group perspectives may be considered: (i) infographics of theme counting and theme capital; (ii) infographics of participants' identities; and (iii) infographics of enunciate area analysis. Each of these groups contains dozens of possible graphics, so only a few main samples were selected with the purpose of analyzing their content and informational goals. The discursive infographics generation module provides the user, after choosing which main data to base the graphic on, with the data flow options over number of videos or exhibition period.
Regarding the exhibition period, which reveals the television capital, in this case the theme capital, there are several linguistic studies grounded in DA that analyze the use of a certain communicative strategy by a program or broadcaster, in order to find the reason for a certain theme being broadcast more or less often. Fig. 2 shows that the most addressed themes among the television programs analyzed here were Political Currency (22.75%) and Artists' Life (18.72%), followed by Laws (10.97%), Discrimination (9.47%) and Urban Violence (8.17%). Despite this 70.08% concentration of the emission period, one may observe a great variety of themes.
Figure 2. Theme capital from the television collection.
In the data composition process, the themes may be grouped by program, genre or type of journal piece (in the case of Newscast videos). Regarding the themes grouped by genre, it is possible to analyze the type of informative focus under which a certain theme is employed, that is, which features are involved in a theme so that it is more exploited in the debate environment, where polemic aspects are praised within the struggle for the floor performed by the participants; or whether the quest for knowledge about the theme is more important, in which case the proper environment would be an interview, where the word is internalized by an expert who "must" know about the subject, or the newscast environment, to report on the theme. Fig. 3 shows the grouping of themes by genre, where it is clear that a great amount of time is spent on Political Currency themes in both genres, considering their proportion within the database, and on Artists' Life within debate programs, which may be attributed to the cultural and educational character of the Rede Minas broadcaster. This television broadcaster is part of the policy of the Secretaria da Cultura de Minas Gerais (the culture department of Minas Gerais state), and has proven active in spreading artistic deeds and works, as well as in presenting facts and events connected to the state's political situation.
Figure 3. Theme capital from programs of Debate and Interview.
B. Retrieval Trials and Results Analysis

For the retrieval trials, a database containing 71 videos was used: 25 videos of the Jornal Minas program (Newscast genre) and 46 videos of the Conexão Roberto D'Ávilla, Roda Viva, Brasil das Gerais and Rede Mídia programs (Debate and Interview genres). Operating on those videos, the retrieval module of the proposed information system was evaluated regarding its effectiveness and efficiency. The intention was to obtain the weights for each metadata group that contribute to the best retrieval indexes, since before analyzing the television videos under the DA perspective, the researchers must primarily have access to the objects of interest through an efficient means of search. The effectiveness assessment verifies whether the retrieval system provides, in acceptable proportion, the relevant videos requested by the user, comparing the list of retrieved videos with the ones that should effectively be recovered. The efficiency assessment verifies whether the query's relevant videos are provided as early as possible in the list, considering the percentage of relevant videos provided at successive cutoffs of the list in comparison with the overall number of relevant videos. These retrieval trials were performed by modeling 8 textual queries and a list of the relevant videos for each query. Each query was first run separately against each metadata group: the basic metadata, the discursive metadata and the speech recognition content metadata. The precision and recall of each group/query pair were estimated. Next, the queries were run against all groups in the general search service, combining the obtained similarities. Lists limited to 10 and 20 videos were analyzed, since the interface shows 10 videos per page.
Moreover, it is unusual for the user to be willing to browse through many interface pages in an attempt to find something relevant, as such videos should appear among the first positions of the result list. Since the system's precision over the top 10 videos takes values in steps of 0.1 on a 0 to 1 scale, coincident values for distinct queries are common. Table I presents the precision, recall and F1-measure values for the processed queries.

TABLE I. PRECISION, RECALL AND F1 VALUES FOR GENERAL SEARCH.

Query                   Precision       Recall          F1-measure
                        @10    @20      @10    @20      @10    @20
political corruption    0.70   0.40     0.77   0.88     0.73   0.55
health problems         0.70   0.45     0.58   0.75     0.63   0.56
urban violence          0.70   0.46     1.00   1.00     0.79   0.63
citizen's rights        0.70   0.45     0.58   0.75     0.63   0.56
social discrimination   0.30   0.31     0.37   0.75     0.33   0.43
Brazilian literature    0.40   0.31     0.57   0.85     0.47   0.45
philosophy humanity     0.30   0.27     0.50   1.00     0.37   0.42
artists life            0.50   0.35     0.55   0.77     0.52   0.48
Total average           0.53   0.37     0.61   0.84     0.55   0.51
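The effectiveness figures of Table I follow from a straightforward computation, sketched below under the assumption that the complete set of relevant videos for each query is known:

```python
# Precision, recall and F1-measure at a list cutoff, as reported in Table I.
def metrics_at_k(ranked, relevant, k):
    """Return (precision, recall, F1) over the top-k retrieved videos."""
    hits = sum(1 for video in ranked[:k] if video in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Here `ranked` is the ordered retrieval list for a query and `relevant` the set of videos judged relevant to it; the table averages these values over the 8 queries at cutoffs of 10 and 20.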
In order to assess the system's efficiency, the percentage of relevant videos provided was estimated at intervals of 5 videos in the retrieval lists of 10 and 20 videos for each query. Fig. 4 illustrates the average efficiency of the retrieval module for all the queries modeled in the trials.
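The efficiency measurement just described can be sketched as a curve of the percentage of relevant videos found at successive 5-video cutoffs; this is a minimal sketch, again assuming the relevant set of each query is known:

```python
# Percentage of a query's relevant videos found within the top 5, 10, ...
# results; averaging these curves over the queries yields the efficiency
# graphic of Fig. 4.
def relevant_found(ranked, relevant, step=5, limit=20):
    """Map each cutoff (5, 10, ...) to the percentage of relevant videos found."""
    curve = {}
    for k in range(step, limit + 1, step):
        hits = sum(1 for video in ranked[:k] if video in relevant)
        curve[k] = 100.0 * hits / len(relevant)
    return curve
```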
Figure 4. Graphic for the average efficiency of the system.

By analyzing the graphic, on average 40% of the relevant videos appear among the first 5 provided videos and 60% among the first 10 videos of the retrieval list. In some cases, a larger retrieval list was required so that more relevant videos, or all of them, could be provided, as in the Urban Violence case. Nevertheless, the percentages obtained for the first 10 retrieved positions illustrate the good efficiency of the retrieval module of the proposed information system.

V. CONCLUSION

Among the main implemented components, the discursive infographics generation module allows the quantitative analysis of elements extracted from the discursive metadata provided by a documentalist during the manual indexing process. Building on the structure managed by the other modules, the infographic generation service allows users to build their graphics by choosing options regarding the discursive metadata of all the videos from the stored television collection. The search services are based on the retrieval and indexing modules, and were tested and evaluated regarding their efficiency and effectiveness, particularly the general search service. The results were favorable to this work's main goal regarding the discursive metadata, mostly because the corresponding infographic generation module uses the retrieval module, performing a full query throughout the entire database. Thus, the discursive options selected by the user on the Web interface filter the textual metadata of the videos.

Among the metadata modeled here are the content-dependent metadata related to the audio signal of each processed video. Within the indexing module, a method was implemented to extract the textual content from audio signals by means of speech recognition, using the Julius software; however, the audio transcription process is still a field of study which must be explored more widely in order to solve the recognition problems. That said, for future work, the creation of theme-specific language models with reduced vocabulary is recommended: for a given input video, the metadata providing its genre would be verified and, based on such metadata, the proper language model chosen.

Further perspectives provide valuable studies relating the fields of Computer Science and Language Studies. The proper processing of content-dependent metadata associated with the audio signals and the video's visual components may subsidize the DA regarding the raising of verbal and visual capitals. The verbal capital relates to the accounting of the period each participant spoke during the program emission, plotting the corresponding discursive infographic by means of speaker recognition algorithms applied to the audio signals. The procedure is similar for the visual capital, by means of the detection of each participant's face.

ACKNOWLEDGMENT

The authors thank the support of FAPEMIG-Brazil under Procs. APQ-01180-10 and APQ-02269-11; CEFET-MG under Proc. PROPESQ - 076/09; CAPES-Brazil; and CNPq-Brazil.

REFERENCES

[1] Dimitrova, N.; Zhang, H.; Shahraray, B.; Sezan, I.; Huang, T. and Zakhor, A., "Applications of Video-Content Analysis and Retrieval", IEEE Multimedia, 2002, 42–55.
[2] David-Silva, G., "A Informação Televisiva: Uma Encenação da Realidade (Comparação entre Telejornais Brasileiros e Franceses)", Universidade Federal de Minas Gerais - Faculdade de Letras, 2005.
[3] Muthukumar, K.; Seetha, S. and Pádua, F. L. C., "Generating MPEG 7 Audio Descriptor for Content Based Retrieval", Proceedings of IEEE RAICS - Recent Advances in Intelligent Computational Systems, 2011.
[4] Ablimit, M.; Neubig, G.; Mimura, M.; Mori, S.; Kawahara, T. and Hamdulla, A., "Uyghur morpheme-based language models and ASR", IEEE 10th International Conference on Signal Processing (ICSP), Beijing, China, 2010, 581–584.
[5] Marcu, D., "Extending a Formal and Computational Model of Rhetorical Structure Theory with Intentional Structures à la Grosz and Sidner", Procs. of the 18th Conference on Computational Linguistics, 2000, 1.
[6] Schilder, F., "Robust discourse parsing via discourse markers, topicality and position", Natural Language Engineering, 2002, 8, 235–255.
[7] Reitter, D., "Simple Signals for Complex Rhetorics: On Rhetorical Analysis with Rich-Feature Support Vector Models", GLDV - Journal for Computational Linguistics and Language Technology, 2003, 38–52.
[8] Kawamoto, D. and Pardo, T. A. S., "Learning Sentence Reduction Rules for Brazilian Portuguese", Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science NLPCS, June 8–12, Funchal/Madeira, Portugal, 2010, 90–99.
[9] Sabino, J. L. M. F., "A Análise Discursiva de Entrevistas e Debates Televisivos como Parâmetro para Indexação e Recuperação de Informações em um Banco de Dados Audiovisuais", CEFET-MG, Departamento de Estudos de Linguagens, 2011.
[10] Christel, M. G.; Richardson, J. and Wactlar, H. D., "Facilitating access to large digital oral history archives through informedia technologies", JCDL '06 Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, 2006.
[11] Lee, H. and Smeaton, A., "Designing the User Interface for the Físchlár Digital Video Library", Journal of Digital Information, 2006, 2.
[12] Ketterl, M.; Schulte, O. A. and Hochman, A., "Opencast Matterhorn: A Community-driven Open Source Solution for Creation, Management and Distribution of Audio and Video in Academia", 11th IEEE International Symposium on Multimedia, 2009, 687–692.
[13] Charaudeau, P., "Visées discursives, genres situationnels et construction textuelle", Analyse des discours. Types et genres, Université de Paris - Centre d'Analyse du Discours, 2001.
[14] Hatcher, E.; Gospodnetić, O. and McCandless, M., "Lucene in Action: A Guide to the Java Search Engine", Manning Publications, 2nd ed., 2010.
[15] Couto, I.; Neto, N.; Tadaiesky, V.; Klautau, A. and Maia, R., "An Open Source HMM-based Text-to-Speech System for Brazilian Portuguese", 7th International Telecommunications Symposium (ITS), Manaus, 2010.