Information Theoretic Approaches in Computational Dialectometry
4 March 2009, Groningen, Netherlands
Thomas Zastrow - University Tübingen
www.thomas-zastrow.de
Outline
• Information
• Entropy
• Applying Information and Entropy to Dialectometry
• Advantages and Disadvantages
• The Bulgarian Data Set
• Map(s): Information
• Map(s): Entropy
• Conclusion
Information
• The information I of an element z in a data set:
  I(z) = −log₂ p(z)
• p(z) is the probability of z
• Example: using a fair six-sided die, every throw carries the information
  I(T) = −log₂(1/6) ≈ 2.585 bits
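A minimal sketch (not from the original slides) of this definition in Python; it assumes nothing beyond the formula above and reproduces the die example.

```python
# Minimal sketch of I(z) = -log2 p(z); the die example is the one from the slide.
import math

def information(p: float) -> float:
    """Information content in bits of an event with probability p."""
    return -math.log2(p)

print(information(1 / 6))  # one throw of a fair six-sided die: ~2.585 bits
```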
Entropy
• Entropy measures the amount of surprise in a data set: it relates the information to the noise of a data set:
  H(X) = −Σᵢ₌₁ⁿ p(zᵢ) log₂ p(zᵢ)
• n is the number of elements in the alphabet
• p(zᵢ) is the probability of the element zᵢ
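A small illustrative sketch of this formula, estimating the unigram (single-symbol) distribution from counts; the sample string is made up for the example.

```python
# Shannon entropy H(X) = -sum_i p(z_i) * log2 p(z_i), estimated from
# empirical unigram counts of an arbitrary symbol sequence.
import math
from collections import Counter

def entropy(symbols) -> float:
    """Entropy in bits of the empirical unigram distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy("oder gäht andere"))  # illustrative character unigrams
```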
Applying Information and Entropy to Dialectometry
• Information theoretic approaches work only on phonetic data: differences between dialects are expressed as pronunciation differences
• Pronunciation differences result in qualitatively and quantitatively different uses of elements (see the sketch below):

  Dialect 1    Dialect 2
  Oder         Oder
  Geht         Gäht
  Andere       Andere
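The following sketch uses only the three example words above to make this explicit: the character unigram distributions of the two dialects differ qualitatively ('ä' occurs only in Dialect 2) and quantitatively (different counts of 'e'). The word lists are the illustrative ones from the table, not real data.

```python
# Character unigram counts for the example words above (illustrative only).
from collections import Counter

dialect1 = ["oder", "geht", "andere"]
dialect2 = ["oder", "gäht", "andere"]

unigrams1 = Counter("".join(dialect1))
unigrams2 = Counter("".join(dialect2))

print(unigrams1 - unigrams2)  # Counter({'e': 1}): Dialect 1 has one more 'e'
print(unigrams2 - unigrams1)  # Counter({'ä': 1}): only Dialect 2 uses 'ä'
```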
Applying Information and Entropy to Dialectometry
• The information is calculated on the basis of the whole data set; then each site's percentage of this total amount of information is visualized
• The entropy is calculated as an individual value for every site separately and then visualized
• Unigrams are used (a sketch of this procedure follows below)
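The Python sketch below is one reading of this procedure, not the original implementation: the site names and word forms are toy stand-ins, and a site's "share of information" is interpreted as the sum of −log₂ p(z) over its unigrams, with p estimated on the whole data set.

```python
import math
from collections import Counter

def entropy(counts: Counter) -> float:
    """Entropy in bits of a unigram count distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy stand-in for the real data: site -> transcribed word forms.
sites = {
    "site_1": ["oder", "geht", "andere"],
    "site_2": ["oder", "gäht", "andere"],
}
site_counts = {s: Counter("".join(words)) for s, words in sites.items()}
whole = sum(site_counts.values(), Counter())
p_whole = {z: c / sum(whole.values()) for z, c in whole.items()}

# Entropy: one value per site, visualized separately.
site_entropy = {s: entropy(c) for s, c in site_counts.items()}

# Information: each site's share of the information of the whole data set.
site_info = {s: sum(c * -math.log2(p_whole[z]) for z, c in counts.items())
             for s, counts in site_counts.items()}
total_info = sum(site_info.values())
site_share = {s: info / total_info for s, info in site_info.items()}

print(site_entropy, site_share)
```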
Advantages and Disadvantages
• + Information theoretic approaches take the whole data set into account at once: an aggregate method
• + Analysis is possible on the basis of arbitrary n-grams
• − Entropy is a purely quantitative measure; the order of elements plays no role
• − Word boundaries are ignored
• − Elements with the same number of occurrences are treated identically
The Bulgarian Data Set
• In cooperation with the Bulgarian Academy of Sciences and the University of Sofia, two data sets of Bulgarian dialects have been compiled:
• A set of phonetic data, containing 156 distinct words in 197 geographical locations (sites)
• A set of 112 lexical lemmas, collected in the same 197 sites
• This presentation relies on the phonetic data set, using 118 of the 156 words in all 197 sites
The Bulgarian Data Set - Phonetic
• The data is encoded in X-SAMPA, a machine-readable (ASCII) encoding of the IPA
• XML is used as a container; for example, the word агне ("lamb") is stored together with its site-specific X-SAMPA pronunciations such as `agne, `jagne, "agne and "jAgne (a hypothetical layout is sketched below)
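A hedged sketch of reading such a container in Python. The element and attribute names (word, pronunciation, site, cyrillic, gloss) are hypothetical: the real schema of the Bulgarian data set is not recoverable from this slide, only the word агне ("lamb") and its X-SAMPA variants are.

```python
# Hypothetical XML layout for one word of the phonetic data set; the tag and
# attribute names are assumptions, not the actual schema.
import xml.etree.ElementTree as ET

sample = """
<word cyrillic="агне" gloss="lamb">
  <pronunciation site="1">`agne</pronunciation>
  <pronunciation site="2">`jagne</pronunciation>
  <pronunciation site="3">"agne</pronunciation>
  <pronunciation site="4">"jAgne</pronunciation>
</word>
"""

root = ET.fromstring(sample)
for pron in root.findall("pronunciation"):
    print(pron.get("site"), pron.text)  # site id and its X-SAMPA transcription
```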
Bulgarian Dialects
Analysis and Visualization with VDM
• "Visual Dialectometry", developed by Edgar Haimerl at the University Salzburg • Initially, developed to analyse and visualize dialectometrical investigations on the basis of "Relativer Identitätswert" by Prof. Dr. Hans Goebl • VDM takes a similarity matrix as input, so also other data on a geographical basis can be analysed and visualized
Analysis and Visualization with VDM
• Possible steps of analysis:
  • Classification on the basis of interval algorithms (synopsis map)
  • Hierarchical clustering with several methods of distance measurement (dendrograms and maps)
  • Isogloss map
  • Ray map
Synopsis Map
6 classes, algorithm MinMwMax, site 1 as reference point
Cluster Analysis
Ward method, 2 clusters
Cluster Analysis
Ward method, 12 clusters
Isogloss Map
Ray Map
Overview - Information
Overview - Entropy
Comparison to other methods – vector analysis
Conclusions
• Despite their disadvantages, information theory based methods are able to identify the main linguistic structures between dialects
• Because of their aggregating nature, entropy and similar measures can be used in addition to well-known methods in computational dialectometry, for example edit distance based approaches
Further Work
• Extend the analysis to n-gram models (already done for bigrams)
• Use more advanced information theoretic methods, for example conditional entropy (calculating entropy values against a gold standard; see the sketch below)
• Extensive statistical analysis of the results
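A hedged sketch of the conditional-entropy idea: H(X | Y) estimated from aligned pairs of a dialect segment x and a gold-standard segment y. The alignment and the toy data are assumptions for illustration, not part of the original work.

```python
# Conditional entropy H(X|Y) = -sum_{x,y} p(x,y) * log2( p(x,y) / p(y) ),
# estimated from a list of aligned (dialect segment, gold segment) pairs.
import math
from collections import Counter

def conditional_entropy(pairs) -> float:
    """H(X|Y) in bits, from a list of (x, y) pairs."""
    joint = Counter(pairs)
    y_marg = Counter(y for _, y in pairs)
    n = len(pairs)
    return -sum((c / n) * math.log2((c / n) / (y_marg[y] / n))
                for (x, y), c in joint.items())

# Toy aligned segments: dialect realization vs. gold-standard form.
aligned = [("g", "g"), ("ä", "e"), ("h", "h"), ("t", "t"), ("e", "e")]
print(conditional_entropy(aligned))  # 0.4 bits for this toy alignment
```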