Technology and languages of the new economy
R. Bucciarelli¹ [0000-0002-2790-4726], R. Marcone² [0000-0002-0450-0550], F. Santoro³ [0000-0001-7026-8881], C. Dolci⁴ [0000-0001-5182-2408]
1 Università degli Studi di Siena, Italia
[email protected]
2 Università degli Studi di Salerno
[email protected]
3 SIDELMED S.p.A.
[email protected]
4 Università per Stranieri di Siena
[email protected]
Abstract. The DSN pole of the University of Salerno, together with other research centres and in particular with the Laboratoire d'Automatique et Linguistique (CNRS - Paris 7), has developed new methods of linguistic investigation for modern language teaching. This research is part of the project "Lessico Grammatica della Lingua Italiana" (LGLI) for the theory and practice of formal language, by Elia, A., Martinelli, M., D'Agostino, E. (1981), built on the framework developed by Gross, M. (1968; 1975; 1977; 1991), itself based on the theories of Harris, Z.S. (1970, 1988). For years, research centres have been designing and testing new linguistic tools to support teaching and facilitate real-time communication. Silberztein, M., one of M. Gross's students, started the production of the French DELAC - DELACF dictionaries¹ for compound words, which led to NLP language applications in text builders. Taking the models of leading French linguists as a reference, Bucciarelli, R., Villari, P., Terrone, M., Terrone, F., Santoro, F., Marcone, R. (1999; 2003; 2004) created Acro-Word, a latest-generation blended system. It is multifunctional software: it can be both a lexical and morphosyntactic dictionary and a text builder for paraphrase and mixed construction. Acro-Word is an electronic consultation dictionary, or a "Man manufacturer and acronyms generator", that works in three phases. First Phase: it works using linguistic engines and modular software based on the INTEX, UNITEX and NOOJ models. Second Phase: also called "Man builder text", in which the builder creates a digital text in NLG and focuses on the "minimum sentence"; it elaborates a text made up of both free and fixed sentences. As Silberztein said, "It can analyse and produce all types of sentences that correspond to a determined syntactic grammar and not only". Third Phase: on the basis of codes transmitted by a digital operator, Acro-Word produces a reformulated text in real time on a sliding band.

Keywords: electronic dictionary; text constructor; paraphrase and reformulation

1 I constructed the first package of Finite State tools for Natural Language Processing, as well as the French DELAC-DELACF dictionaries for compound words, for my PhD research from 1986 to 1989 at the LADL (University of Paris 7-CNRS), under the supervision of Prof. Maurice Gross.
1 INTRODUCTION
In the "galaxy" of languages, synthetic communication, made up of fixed phrases or pre-constituted texts, is becoming more and more common. This type of communication needs linguistic resources that can guarantee precision through the technical terminologies of particular professional fields. These terminologies convey a condensed meaning while remaining faithful to grammar and sentence order. Over more than ten years of experimental work, the Department of Communication Sciences of the University of Salerno, in collaboration with other research centres and in particular with the Laboratoire d'Automatique et Linguistique (CNRS - Paris 7), has developed new methods of linguistic investigation. These methods are centred on the construction of syntactic lexicons which, taking advantage of data processing, aim to obtain a more exhaustive and formalized description of a given language. The research is part of the national research project ALTI (PRIN 2005-2008). The research centre of the University of Salerno works on different lexical and syntactic aspects of the so-called polyrhematic sequences of Italian and English. The methodology used is the lexico-grammatical one, originally introduced into the European scientific community after 1960 by Maurice Gross at the University of Paris 7. The main purpose of the lexicon-grammar is to describe in detail all the combinatorial mechanisms inextricably linked to the lexical entries. One of the most important results of the lexico-grammatical description is the development of software and lingware for automatic textual analysis. The research focuses on the retrieval and analysis of linguistic forms, both simple and compound (polyrhematic), in large textual corpora; these forms are then inserted in the database in accordance with the European RELEX standards.
The electronic databases were integrated into automated text analysis software and used for the morphosyntactic tagging of texts in electronic format (De Bueriis, G., Elia, A., p. 11). In this phase, local grammars in the form of graphs and finite-state transducers were constructed. A local grammar is an electronic grammar built to describe a specific characteristic of a given language. In our case, local grammars are applied to texts to retrieve lexical, morphological and syntactic information. The purpose of this research project is to demonstrate how, starting from word analysis, it is possible to carry out the automatic analysis of word combinations, that is, of complete phrases and sentences. Our research unit has created a polyrhematic electronic terminology of about 30,000 entries in inflected form, equivalent to about 15,000 canonical forms. From this came the experience of automatic processing and analysis of fixed phrases with NooJ and the study of new software². Hence the idea of producing not only electronic lexicons, that is, lexical descriptors of syntactic, morphological and orthographic forms, but also new applications for the automatic treatment of natural language.
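To make the notions of electronic dictionary and local grammar concrete, the following minimal sketch may help. It assumes a toy dictionary that maps inflected forms to canonical forms, and a toy local grammar written as a finite-state automaton over grammatical categories. The entries, the category labels and the names DICTIONARY, LOCAL_GRAMMAR, tag and find_matches are illustrative assumptions, not the resources or the software described in this paper.

```python
# Toy electronic dictionary: inflected form -> (canonical form, category).
# The entries are invented for illustration only.
DICTIONARY = {
    "carta": ("carta", "N"),
    "carte": ("carta", "N"),
    "di": ("di", "PREP"),
    "credito": ("credito", "N"),
    "crediti": ("credito", "N"),
}

# Toy local grammar as a finite-state automaton over categories:
# state -> {category: next state}. It accepts the pattern N PREP N.
LOCAL_GRAMMAR = {0: {"N": 1}, 1: {"PREP": 2}, 2: {"N": 3}}
FINAL_STATE = 3


def tag(tokens):
    """Return (token, canonical form, category) for each token.

    Words missing from the dictionary keep their own form and get "UNK".
    """
    tagged = []
    for token in tokens:
        lemma, category = DICTIONARY.get(token.lower(), (token.lower(), "UNK"))
        tagged.append((token, lemma, category))
    return tagged


def find_matches(tokens):
    """Apply the local grammar at every position and return the
    canonical form of each sequence that reaches the final state."""
    tagged = tag(tokens)
    matches = []
    for start in range(len(tagged)):
        state, pos = 0, start
        while pos < len(tagged) and tagged[pos][2] in LOCAL_GRAMMAR.get(state, {}):
            state = LOCAL_GRAMMAR[state][tagged[pos][2]]
            pos += 1
            if state == FINAL_STATE:
                matches.append(" ".join(lemma for _, lemma, _ in tagged[start:pos]))
                break
    return matches


print(find_matches("le carte di credito sono utili".split()))
```

In this sketch the inflected sequence "carte di credito" is recognized and reported under its canonical form, which is the kind of relation between inflected entries and canonical forms mentioned above.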
1.1 OPERATING HYPOTHESIS
The first approach is Type Race (Bucciarelli, R., Elia, A., Renda, M., 2000)³, the first linguistic resource for the automatic processing of natural language designed by the team of the Department of Communication Sciences at the DSN pole of the University of Salerno, directed by Elia, A. The software can automatically generate morphemes or parts of words. In its first prototype version, Type Race (T.R.) consists of several interacting stages, because improving the linguistic information enhances the quality of the output. In the experimental phase, the exploratory hypothesis of computational linguistics focuses on the analysis of syntagms and, in particular, on the synthesis of morphological and lexical elements capable of producing cognitive learning techniques for regulatory texts such as decrees and laws. It provides for the realisation of a morphological and syntactic-semantic parser that can make structural choices based on customized codes, such as the position of a given element and the fixed structures and formulas used in various parts of the analysed text, in order to produce taxonomies of polyrhematic units and of lexemic elements reduced to acronyms. These acronyms populate a database, which must automatically reproduce the fixed phrase at the time of recall. This first phase is named "manufacturer and generator acronyms". The second phase begins with the call and processing of information by an operator, who will produce a text. This represents the first attempt to compose a text formed by paraphrases of sentences, in which the digital operator reformulates the language and thus adds specific sentences or parts of free sentences to the unit of meaning.

2 Type-Race is an IT product in which the software serves as a container; the use of the product is the result of human wisdom.
3 © belongs to Renda, M., Bucciarelli, R., Elia, A., N50061363717.
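The idea of a database of acronyms that reproduces the fixed phrase at the time of recall can be sketched as follows. The text does not state how the acronyms are formed, so the rule used here (the initial letter of each word) is an assumption, as are the names make_acronym and AcronymDatabase and the sample phrases.

```python
def make_acronym(phrase):
    """Assumed reduction rule: the initial letter of each word, upper-cased."""
    return "".join(word[0].upper() for word in phrase.split())


class AcronymDatabase:
    """In-memory store mapping an acronym to the fixed phrases it stands for."""

    def __init__(self):
        self._entries = {}

    def add(self, phrase):
        """Reduce a fixed phrase to its acronym, store it and return the acronym."""
        code = make_acronym(phrase)
        phrases = self._entries.setdefault(code, [])
        if phrase not in phrases:
            phrases.append(phrase)
        return code

    def recall(self, code):
        """Return every fixed phrase stored under the acronym (empty list if none)."""
        return list(self._entries.get(code.upper(), []))


db = AcronymDatabase()
code = db.add("carta di credito")
print(code, db.recall(code))
```

Keeping a list of phrases per acronym is a deliberate choice in this sketch: two different fixed phrases can reduce to the same acronym, and in that case the operator, not the machine, has to choose.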
2 TEXT BUILDER ACRO WORD
In 2014, Acro Word (Bucciarelli, R., Marcone, R., Terrone, M., Terrone, F., Santoro, F., Villari, P.)⁵, a new linguistic resource for the automatic reproduction and management of NLG languages, particularly for the new economy, was born. The software aims to support the production of texts or parts of texts. The text constructor can generate NLG languages not only as paraphrases (morphological and synonymy invariance) but also as texts composed of free sentences and fixed phrases. For the first time a new figure is needed: the digital linguist, someone who, thanks to their knowledge, formulates, reformulates, translates and transforms a digital text, using software for automatic text analysis and a text constructor for communication in NLG languages. Before a digital text can be constructed, the corpus must be fully examined, because each part has an autonomous function that interacts with the lexico-grammatical model. Computational linguistics validates the empirical method of its research and creates the software for automatic text analysis, which uses applications such as electronic dictionaries. It consists of two phases.

5 Acro Word 2014, BIVUTES system, Bucciarelli, R., Marcone, R., Terrone, M., Terrone, F., Santoro, F., Villari, P. Bivutex is a mixed system; the program is written in Microsoft Office Access. The authors reserve the copyright, 2014.

2.1 MANUFACTURER AND GENERATOR ACRONYMS
In phase A the focus is on the polyrhematic unit: to obtain a linguistic resource of fixed text parts, it is necessary to start from the production of taxonomies of fixed sentences, defined by De Mauro (1999) as groups of words with a unitary meaning that cannot be deduced from the words composing them. The participation of the DSN centre in the DELACF team opened the phase of research and study on the compound polyrhematic words of Italian and on the method for formalizing natural language, the lexicon-grammar elaborated by Maurice Gross, which is mainly based on Harris's structuralist, transformational and distributional conclusions. At this point a collaboration begins between a team of researchers, linguists and IT specialists, each of whom contributes. We pay specific attention to the automatic description of specialized languages, and we set ourselves the goal of formulating a text constructor that does not produce paraphrased texts but authentic ones, manipulated by man. The units of interest are the languages for special purposes (LSP), which are interconnections between the general language and the specialized one: LSPs are not used exclusively for specialized purposes and are therefore not autonomous languages separated from ordinary language, but are included in it and osmotically shared through "specific meanings" (De Mauro, 1994, pp. 410-411). Longobardi (2005, p. 129) argued that researchers and linguists associate semantic and syntactic properties with the automatic processing of texts but do not use them for the retrieval and selection of information. Terminology cards acquire data from documentary sources, but rarely data on syntactic-semantic functioning. This assertion led us to the idea that, by associating syntactic properties with the automatic processing of texts for data retrieval and with the study of syntactic-semantic functioning, a total recovery of information could be obtained.
In this first phase the research focuses on the retrieval and analysis of compound (polyrhematic) linguistic forms in a large textual corpus. We then move on to the design of the Acro-Word (A.W) product: the semantic units are collected, reduced to acronyms and inserted into the database. After indexing and tokenization, the semantic units contained in the taxonomies are matched against the cataloged entries. This matching yields the electronic dictionary, whose entries are listed in alphabetical order and carry morpho-grammatical information, subdivided according to their autonomous units of meaning. The analysis of these forms, together with the lexicon-grammar theoretical and methodological model identified by Maurice Gross and applied to Italian by Annibale Elia, leads to the A.W database. It is a lemmatization of the polyrhematic units, an automatic retrieval of information for structuring an electronic dictionary that, within the text analysis software, plays the role of a linguistic engine performing all matching and parsing operations.

Acro-Word is an electronic dictionary and a developer of texts made of fixed phrases or paraphrases. The automatic (morphosyntactic) text analyzer uses linguistic engines with a lexical database embedded in a packaged shell, composed of modular software on the INTEX, UNITEX, NOOJ and CATALOGA model⁶. BUVITES' methods of analysis are divided into three phases:
• Phase A: automatic reading of a text; indexing, including tokenization; matching between the words contained in the text and the entries cataloged and classified in the electronic dictionaries. The result is the creation of electronic dictionaries of the analyzed text, in which the entries are listed in alphabetical order with morpho-grammatical information and subdivided according to their autonomous units of meaning.
• Phase B: reading within the text through specific searches that can be viewed as concordances, and locating syntactic patterns, with disambiguation and parsing of the text.
• A third and final phase: importing specific files created as tables with Microsoft® Office Excel (especially in INTEX and UNITEX)⁷.
The electronic dictionary allows the insertion of images, videos and documents that can be used and viewed when searched, as well as translated texts, which will characterize the following phase.
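The first two phases of the analysis (tokenization and matching, then concordances) can be illustrated with a short sketch. It is only an approximation under stated assumptions: the lexicon is a plain Python mapping with invented entries and codes, tokenization is a simple regular expression, and the function names tokenize, text_dictionary and concordance are hypothetical; none of this reproduces INTEX, UNITEX, NOOJ or CATALOGA.

```python
import re


def tokenize(text):
    """Split a text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def text_dictionary(text, lexicon):
    """Phase A: match the tokens against the lexicon and return the
    dictionary of the analyzed text, in alphabetical order."""
    found = {token: lexicon[token] for token in tokenize(text) if token in lexicon}
    return sorted(found.items())


def concordance(text, keyword, width=2):
    """Phase B: return (left context, keyword, right context) for every
    occurrence of the keyword, with `width` tokens on each side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Invented lexicon: entry -> morpho-grammatical code (illustrative only).
lexicon = {"decreto": "N:ms", "legge": "N:fs"}
text = "Il decreto modifica la legge. La legge entra in vigore."

print(text_dictionary(text, lexicon))
for left, key, right in concordance(text, "legge"):
    print(f"{left:>15} [{key}] {right}")
```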
2.2 PHASE 2 MAN BUILDER TEXT
The constructor of the text, as Elia, A. (1999) says, is a writing expert who writes a digital text in L1, works on the technicalities of a language with the verbal operator L, and carries out an analysis by transformational and substitutive manipulation. If the digital operator has to transmit in L2, he has to think in L2 and reformulate in L2. Here the Kantian philosophical view is useful: man's vision of a cognitive image is translated into thoughts in relation to his way of interacting with the contents and to his communication skills. In this phase B the focus moves from the polyrhematic unit to the simple sentence. The reference model is the method for formalizing natural language, the lexicon-grammar elaborated by Maurice Gross, which is based mainly on the structuralist, transformational and distributional conclusions of Harris. Gross refines the concepts of linguistic transformation and nuclear phrase (which becomes the simple sentence in lexicon-grammar), placing them within the framework of a formal grammar of natural languages according to which the lexicon is the set of values associated with ordered sequences governed by autonomous combinatorial rules and principles. The theoretical model of reference is the "operators and arguments" grammar of Z. S. Harris (1957, 1963, 1970).

The text constructor dominates and guides the software: he describes the fonts, filters the tones (construction of the syntagmatic axes), produces free phrases, repeats the fixed phrases according to the textual typology of the text in production, and applies the description of the morphology and syntax of the language in use. In proceeding with a taxonomic classification of the possible sentences of spoken Italian, it is necessary to clarify the importance of the verb in the sentence through the method of research and experimentation of the L.G.L.I. Given these premises, in order to describe a language from a lexico-grammatical point of view, the following operations must be carried out:
• identify the verb in the sentence;
• identify sentences (ordinary verb, support verb (Vsup), idiomatic phrases);
• enumerate transformation and substitution manipulations;
• describe possible combinations of verbs with nominal forms (testing the concrete speaker's possibilities);
• classify verbs;
• observe the verbal dislocation in the paradigmatic structure;
• analyze the paraphrases (D'Agostino, E., Elia, A., 1998)⁸.

Cf. Bucciarelli: The aridity and static nature of signs will be replaced by human intelligence, which, powered by the spirit, enlivens the learning process, making it exhaustive and always understandable: where the machine does not succeed, man steps in with his sensitivity. It is believed that signs give universal and unexceptionable truths, but only the human mind knows how to find the right way to understand, elaborate and internalize knowledge. With the right mix of computer science and human knowledge, each with its own abilities, one can stimulate the reactivity of the other and crystallize knowledge, and above all we can achieve results that are safer for everybody. The proponent, a pioneer of a new lexical strategy that combines science and human feeling, proposes to pursue objectives not yet achieved and to increase the knowledge of the unskilled. He hopes that his project will proceed without interference and will meet with unanimous approval. Thanks to large-scale diffusion and his commitment, his work should receive a positive response, raise the interest of many and, in particular, enrich those who still lack basic knowledge of the signs⁹.

6 Cf. Silberztein, M., on NooJ, cf. http://www.nooj4nlp.net/
7 INTEX = electronic dictionaries, local grammars and finite-state automata, developed in the 1990s by Max Silberztein, cf. http://intex.univ-fcomte.fr/; UNITEX = mirror version of INTEX, cf. http//igm.univ-mlv.fr%7Eunitex; NOOJ = produced by Alberto Postiglione (construction and structuring of the shell) and Mario Monteleone (management of the lingware), Department of Communication Sciences, University of Salerno.
8 Cf. D'Agostino, E., Elia, A., Il significato delle frasi: un continuum dalle frasi semplici alle forme polirematiche, in Ai limiti del linguaggio, Laterza, Bari (1998).
9 Bucciarelli, R., Man builder text, in Text builders for real time communication, I.R.I.S, Salerno (in press).
At this stage the operator describes the language he is going to transmit and, using the data entered in the database and his specialized knowledge of the theoretical lexicon-grammar model, translates the text, applying the manipulations of the language he is translating; more precisely, using the filmed images, he reconverts the verbal codes into written codes. The machine becomes a support for specialists, who use their competences to improve communication. The text developer applies his skills and improves the quality: local grammars are used to retrieve lexical, morphological and syntactic information and to accomplish the automatic analysis of word combinations, that is, of complete syntagms and sentences. The result is not just electronic lexicons but, with the help of human intelligence, electronic text constructors. The linguist does not settle easily and is constantly trying, in generation mode, to produce paraphrases of sentences automatically. The research I have conducted starts from the use of fixed phrases and, more precisely, from the validity of polyrhematic units for constructing a text made of paraphrases of sentences.

2.3 PHASE 3 TEXT TRANSMISSION IN REAL TIME
Phase C is characterized by data transmission: on the basis of the codes transmitted by the digital operator, the system reproduces the reformulated text in real time. The software is just a means that man exploits thanks to his know-how.
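As a rough illustration of this phase, the sketch below expands the codes sent by an operator into a reformulated text and then produces the successive frames of a sliding band. The transmission channel itself is not described in the text, so everything here stays in memory; the code table, the band width and the names expand and sliding_band are assumptions made for the example.

```python
def expand(codes, database):
    """Replace each known code with its fixed phrase; free words pass through."""
    return " ".join(database.get(code, code) for code in codes)


def sliding_band(text, width):
    """Yield the successive frames of a band `width` characters wide,
    with the text entering from the right and leaving on the left."""
    padded = " " * width + text + " " * width
    for start in range(len(padded) - width + 1):
        yield padded[start:start + width]


# Invented code table: acronym -> fixed phrase (illustrative only).
database = {"CDC": "carta di credito"}
message = expand(["paga", "con", "CDC"], database)
print(message)

for frame in sliding_band(message, 12):
    print(f"|{frame}|")
```

A generator is used for the band so that frames are produced one at a time, which matches the idea of text appearing progressively rather than being computed in full before display.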
3 WAYS OF USE¹⁰
• Text constructor that complies with the phonology, phonetics, morphology and syntax of the text being created.
• Supporting teaching tool.
3.1 Synthesis of functions
The synthesis of functions relies on a recursive technique divided into three phases (see Fig. 1).
10 Erminio Acierno is the producer of the experimentation conducted in the scientific area with the heuristics of the lexicon-grammar.
Fig. 1. Reduction in acronyms and text synthesis

A recursive algorithm is a procedure that solves a problem by calling itself on simpler instances of the same problem. The recursive technique makes it possible to write concise algorithms for many problems, and these can also be efficient in implementing the function itself. The goal is to rewrite functions, sets or codes in terms of themselves, but at smaller sizes. The recursive technique is divided into three phases: in the first phase, a new function is created from the main function, so that the new one is a variation of the principal; in the second phase, the primary function is separated from the secondary function, so that any operation on it becomes easy; the third phase (the phase of understanding and calculation) shows how the function behaves with numerical values and makes it possible to calculate all the values of the function starting from the initial condition (see Fig. 2).
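A small worked example may clarify what it means to rewrite a function in terms of itself at a smaller size and to compute its values from an initial condition. The factorial is chosen here only because it is a familiar case; it is an assumption of this sketch, not the function used in the system described above.

```python
def f(n):
    """Recursive definition: f(0) = 1 is the initial condition,
    and f(n) is written in terms of the smaller case f(n - 1)."""
    if n == 0:
        return 1
    return n * f(n - 1)


def table(limit):
    """Calculate all the values of f from the initial condition up to `limit`."""
    return [f(n) for n in range(limit + 1)]


print(table(5))
```

Each call reduces the size of the problem by one until the initial condition is reached, which is what guarantees that the computation terminates.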
Fig. 2.

To sum up, in this phase the operator carried out the following operations:
• Description of the sequences;
• Focusing of fixed phrases;
• Manipulations of L1 in L2.
3.2 Data transmission
In the third phase the resource is ready to be transmitted in real time (see Fig. 3).
Fig. 3.
4 Concluding intuition: Work-Tools¹¹

The last prototype, Work-Tools, rested on two major insights. First, by inserting into the text objects from different languages, including non-verbal ones such as LIS, one could obtain a redefinition of the syntagmatic axes through the transformation of the non-verbal LIS language into natural Italian. Second, with alphabetically scanned codes, one could construct the text by transferring each individual alphabetic sign into the language to be reconverted, i.e. from non-verbal code to verbal code¹² (see Fig.).

Fig. 4. In this image the ASL program has been inserted into Work-Tools: it renders American Sign Language (ASL) as text and translates from ASL into Italian in real time.

11 Work-Tools text constructor © Bucciarelli, R., Villari, P., Terrone, F., Terrone, M., Santoro, F., Marcone, R., 2014
12 http://depaul.academia.edu/RosaleeWolfe
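The second insight, transferring each alphabetic sign into the corresponding written letter, can be sketched as a simple table lookup. The sign identifiers, the pause marker and the function name transliterate are hypothetical: the text does not specify how signs are encoded in Work-Tools, so this only shows the principle of converting a non-verbal code into a verbal one.

```python
# Hypothetical identifiers for fingerspelled signs -> written letters.
SIGN_TO_LETTER = {
    "sign_a": "a",
    "sign_c": "c",
    "sign_i": "i",
    "sign_o": "o",
}
PAUSE = "pause"  # assumed marker separating two fingerspelled words


def transliterate(signs):
    """Convert a sequence of sign identifiers into written text.

    Unknown signs are rendered as "?" so that gaps remain visible
    to the operator instead of being silently dropped.
    """
    words, current = [], []
    for sign in signs:
        if sign == PAUSE:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(SIGN_TO_LETTER.get(sign, "?"))
    if current:
        words.append("".join(current))
    return " ".join(words)


stream = ["sign_c", "sign_i", "sign_a", "sign_o", PAUSE,
          "sign_c", "sign_i", "sign_a", "sign_o"]
print(transliterate(stream))
```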
References
1. Bucciarelli, R.: Lexicography for the description and analysis of a linguistic corpus. Formazione & Insegnamento XII(4) (2014).
2. Bucciarelli, R., Santoro, F., edited by Galdi, A.: Technology and languages of the new economy. I.R.I.S, Salerno (2017).
3. Bucciarelli, R., edited by Galdi, A. (linguistics and glottology series): Heart screening. Salerno (2013).
4. Bucciarelli, R., Villari, P., edited by Galdi, A. (linguistics and glottology series): Project work: The professional communication contrasts isobaric acronyms in transduction. I.R.I.S, Salerno (2014).
5. Bucciarelli, R., Galdi, A. (linguistics and glottology series): Immersion in the textual typologies of Italian writing. I.R.I.S, San Severino (SA) (2013).
6. Bucciarelli, R.: in Progetto Luce, n. 2, 13-15 (1995).
7. D'Agostino, E., Elia, A.: The meaning of sentences: a continuum from simple sentences to polyrhematic forms. In: Albano Leoni, E., Gambarara, D., Gensini, S., Lo Piparo, Simone, R. (eds.) Ai limiti del linguaggio. Laterza, Bari (1998).
8. De Mauro, T.: The dictionary of the Italian language. Paravia, Turin (2000).
9. De Bueriis, G., Di Maio, F., Elia, A., Monteleone, M., Monti, J., Vietri, S.: Electronic lexicons and lexical, syntactic, morphological and orthographic descriptions. Plectia, Salerno (2005).
10. Elia, A., Landi, A., Bucciarelli, R.: From the grammar to the poetic text. Linguistic lessons. PRIN project, Loffredo, Naples (2000).
11. Gross, M.: Transformational grammar of French. 1 - Syntax of the verb. Cantilène, Paris (1968).
12. Gross, M.: Methods in Syntax, Complementary Construction Regime. Hermann, Paris (1975).
13. Gross, M.: Transformational grammar of French. 2 - Syntax of the noun. Cantilène, Paris (1977).
14. Gross, M.: Transformational grammar of French. 3 - Syntax of the adverb. ASSTRIL, Paris (1991).
15. Harris, Z.S.: Papers in Structural and Transformational Linguistics. Dordrecht (1970).
16. Harris, Z.S.: Language and Information. Columbia University Press, New York (Italian translation edited by Martinelli, M.: Linguaggio e informazione, Adelphi, Milano, 1995).
17. Silberztein, M.: Dictionnaires électroniques et analyse automatique de textes. Le système INTEX. Masson, Paris (1993).
18. Intex, Université de Franche-Comté, Besançon, available at http://mshe.univfeomte.fr/ intexidownioads/Manuel.pdf (2004).