Detection of word usage over time
Ondřej Herman (FI MUNI)
7. 12. 2013

- simple linear model: $y = a + bx + \varepsilon$
- regression line calculated using the least-squares method, that is, by minimizing the value of $e = \sum_{i=1}^{n} \varepsilon_i^2$
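The least-squares fit described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation: the function names `fit_line` and `sum_squared_residuals` and the sample frequencies are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """The quantity e that the least-squares method minimizes."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical yearly frequencies, invented for illustration only.
years = [1980, 1985, 1990, 1995, 2000]
freqs = [10.0, 12.0, 14.0, 16.0, 18.0]

a, b = fit_line(years, freqs)
e = sum_squared_residuals(years, freqs, a, b)
print(a, b, e)
```

Because the sample points lie exactly on a line, the residual sum `e` is zero up to rounding; real frequency series would leave a positive residual.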
Linear regression
[Figure: 'slight', Google ngrams; x-axis 1980 to 2000, y-axis 0 to 35]
- polynomial model
- coefficient of determination ($R^2$)
- adjusted $R^2$
- linear model: directly using the total counts as the weights skews the results
Weighted linear regression
[Figure: 'Chernobyl', Google ngrams. (a) adjusted $R^2$; (b) model with maximal $R^2_{adj}$]
- $R^2$, the coefficient of determination, is the fraction of variance explained by the regression model
- $R^2$ increases with the degree of the regression model
- kitchen sink regression
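Since $R^2$ never decreases when the degree grows, the adjusted variant penalizes extra terms. The sketch below uses the standard definition $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, where $n$ is the number of observations and $k$ the number of predictors excluding the intercept. The function names and the two example $R^2$ values are assumptions chosen for illustration; they are not results from the slides.

```python
def r_squared(ys, predictions):
    """Fraction of variance explained: 1 - SS_res / SS_tot."""
    y_mean = sum(ys) / len(ys)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical comparison: a degree-5 polynomial explains slightly more
# variance than a straight line, but uses four more parameters.
n = 20
low_degree = adjusted_r_squared(0.80, n, k=1)
high_degree = adjusted_r_squared(0.81, n, k=5)
print(low_degree, high_degree)
```

In this made-up case the higher-degree model has the larger $R^2$ but the smaller adjusted $R^2$, which is the effect that selecting the model with maximal $R^2_{adj}$ relies on.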
[Figure: example F-test p-values. (a) 'steep', Oxford English Corpus; (b) 'carrot', Google ngrams. Reported p-values: $p = 0.414$ and $p = 4.3 \times 10^{-10}$]
- $H_0$: the mean predicts the behavior of the series well
- $H_1$: the given regression model predicts the behavior well
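The comparison between the mean-only model and a regression model is commonly made with the overall F statistic, $F = \frac{(SS_{tot} - SS_{res}) / k}{SS_{res} / (n - k - 1)}$. The sketch below computes only the statistic, assuming the predictions come from a least-squares fit with an intercept and $k$ predictors. The function name and the data are hypothetical. The p-value would then be the upper tail of the F distribution with $(k, n - k - 1)$ degrees of freedom, for example via `scipy.stats.f.sf`, which is not called here so the block stays dependency-free.

```python
def f_statistic(ys, predictions, k):
    """Overall F statistic of a least-squares model with k predictors."""
    n = len(ys)
    y_mean = sum(ys) / n
    ss_tot = sum((y - y_mean) ** 2 for y in ys)                   # mean-only model
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))   # regression model
    return ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))


# Hypothetical observations at x = 0..4 and the least-squares line
# y = 1.06 + 0.97 * x fitted to them.
ys = [1.0, 2.1, 2.9, 4.2, 4.8]
predictions = [1.06 + 0.97 * x for x in range(5)]

f_value = f_statistic(ys, predictions, k=1)
print(f_value)
```

A large F value speaks against $H_0$: the regression model explains far more variance than the mean alone.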
Robust regression
- Moore-Wallis test
- Mann-Kendall test
- Spearman's ρ
- Theil-Sen method
Moore-Wallis test
- also known as the sign-difference test
[Figure: two example series plotted over x = 0 to 16]
- no trend is detected in the first series, a downward trend is detected in the second series
- asymptotically optimal
- on short series the power of the test is low
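A minimal sketch of the sign-difference test follows. It counts the positive successive differences and compares the count with its expectation under randomness, using the usual normal approximation with mean $(n - 1)/2$ and variance $(n + 1)/12$. The function name and both series are assumptions made for illustration, and ties between neighbors are assumed to be absent.

```python
import math


def sign_difference_test(ys):
    """Return (positives, z, two-sided p) for the sign-difference test."""
    n = len(ys)
    diffs = [later - earlier for earlier, later in zip(ys, ys[1:])]
    positives = sum(1 for d in diffs if d > 0)
    mean = (n - 1) / 2          # expected count under H0 (no trend)
    variance = (n + 1) / 12
    z = (positives - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return positives, z, p_value


# Hypothetical zigzag series that rises overall: up 2, down 1, repeated.
zigzag = [0, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6, 8, 7, 9, 8]
# Hypothetical strictly decreasing series.
falling = list(range(16, -1, -1))

print(sign_difference_test(zigzag))
print(sign_difference_test(falling))
```

The zigzag series has as many rises as falls, so the test reports no trend even though the series clearly climbs, which illustrates the low power noted above.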
Theil-Sen estimator
- defined as the median of the pairwise slopes of the samples:
  $\hat{b} = \operatorname{med}_{i<j} \frac{y_j - y_i}{x_j - x_i}$
[Figure: behavior of the Theil-Sen estimator for words encountered in the British National Corpus]
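The median-of-pairwise-slopes definition translates directly into code. This is a small sketch with an assumed function name and invented data; it is quadratic in the number of samples, which is acceptable for short yearly series.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(xs, ys):
    """Median of the slopes between all pairs of samples with distinct x."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    return median(slopes)


# Hypothetical series: a clean slope of 1 with one extreme outlier at the end.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 1, 2, 3, 4, 50]

slope = theil_sen_slope(xs, ys)
print(slope)
```

Ten of the fifteen pairwise slopes equal 1, so the median ignores the outlier; a least-squares fit on the same data would be pulled strongly upward.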
Mann-Kendall test
- used to test the significance of a regression model fitted using the Theil-Sen estimator

  $S = \sum_{i=1}^{n} \sum_{j=1}^{i} \operatorname{sgn}(x_i - x_j) \operatorname{sgn}(y_i - y_j)$

[Figure: words from the British National Corpus tested using the Mann-Kendall test with the trend line fitted using the Theil-Sen estimator; x-axis 1976 to 1992. (a) 'oil', $p = 0.021$; (b) 'disk', $p = 0.009$; (c) 'slow', $p = 0.821$]
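The statistic $S$ can be computed as written and turned into a p-value with the usual normal approximation. The sketch assumes there are no ties, in which case the variance of $S$ under the null hypothesis is $n(n - 1)(2n + 5)/18$, and applies a continuity correction of one unit. The function names and the sample series are hypothetical.

```python
import math


def sgn(value):
    """Sign of a number: -1, 0 or 1."""
    return (value > 0) - (value < 0)


def mann_kendall(xs, ys):
    """Return (S, z, two-sided p) using the normal approximation, no ties."""
    n = len(ys)
    # The j == i term of the sum is always zero, so j runs below i only.
    s = sum(
        sgn(xs[i] - xs[j]) * sgn(ys[i] - ys[j])
        for i in range(n)
        for j in range(i)
    )
    variance = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


# Hypothetical steadily increasing frequencies over ten years.
years = list(range(1976, 1986))
freqs = [1.0, 1.2, 1.5, 1.9, 2.0, 2.4, 2.9, 3.1, 3.6, 4.0]

s, z, p = mann_kendall(years, freqs)
print(s, z, p)
```

For very short series the normal approximation is rough, and exact tables for the distribution of $S$ would be the safer choice.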
Spearman's ρ
- calculated as the correlation coefficient of a linear model obtained by using the rank of the observations instead of the actual value
- yields almost the same results as the Mann-Kendall test
- the distribution of the test scores is more difficult to calculate
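A sketch of the rank-based correlation described above: both variables are replaced by their ranks and the ordinary Pearson correlation is computed on the ranks. Tied values receive the average of the ranks they span, which is a common convention and an assumption here. All function names and data are illustrative.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical monotone but non-linear series.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(rho)
```

Because only the ordering matters, any strictly increasing series gives the maximal value of ρ regardless of its shape.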
Slope normalization
- the slope estimates are not directly comparable; they need to be normalized:

  $d = \frac{\hat{b}}{\bar{y}}$

  where $\hat{b}$ is the estimated slope and $\bar{y}$ is the mean of $y$, the observed frequencies.

On the next slide: the slopes obtained from Google ngrams of the 50 most common words from the Oxford English Corpus, ordered by the slope relative to the mean, $d$.
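A short worked example of why the normalization matters, with invented frequencies for a common and a rare word. The function name `normalized_slope` and all numbers are assumptions for illustration.

```python
def normalized_slope(slope, ys):
    """Slope relative to the mean of the observed frequencies: d = b / mean(y)."""
    return slope / (sum(ys) / len(ys))


# Hypothetical frequencies: a common word gaining 10 occurrences per year
# and a rare word gaining 1 occurrence per year.
common = [1000.0, 1010.0, 1020.0]
rare = [10.0, 11.0, 12.0]

d_common = normalized_slope(10.0, common)
d_rare = normalized_slope(1.0, rare)
print(d_common, d_rare)
```

The common word has the steeper raw slope, yet relative to its mean frequency the rare word changes roughly nine times faster, so the ordering by $d$ reverses the ordering by raw slope.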
[Figure: the slopes obtained from Google ngrams of the 50 most common words. Axis label: word. Words shown, in extracted order: which, been, his, he, It, were, be, by, there, was, has, of, had, would, all, but, one, not, the, it, will, is, at, The, this]
Future work
- anomaly detection
- piecewise linear model
Conclusion
- the Mann-Kendall test together with the Theil-Sen estimator gives the best results
- the standard linear regression model gives satisfactory results most of the time