Examples with importance weights

Sometimes some examples are more important. Importance weights pop up in: boosting, differing train/test distributions, active learning, etc. John can reduce everything to importance weighted binary classification.

Principle

Having an example with importance weight h should be equivalent to having the example h times in the dataset.
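One literal reading of that principle: an example with integer weight h should be processed like h copies seen in sequence, i.e. h successive gradient steps that each re-evaluate the prediction, not a single step with the gradient multiplied by h. A tiny sketch of that reading, with made-up names and squared loss as the running example:

    #include <vector>

    // Naive reading of the principle: treat an example with integer weight h
    // as h copies, i.e. h successive gradient steps, each re-evaluating the
    // prediction. (Illustrative only; the slides derive a closed form s(h)
    // that takes this idea to the continuous limit.)
    void update_h_times(std::vector<double>& w, const std::vector<double>& x,
                        double y, int h, double eta) {
        for (int k = 0; k < h; ++k) {
            double p = 0.0;
            for (size_t i = 0; i < w.size(); ++i) p += w[i] * x[i];
            double dl_dp = 2.0 * (p - y);            // squared loss (y - p)^2
            for (size_t i = 0; i < w.size(); ++i) w[i] -= eta * dl_dp * x[i];
        }
    }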
Learning with importance weights

[Figure: the loss as a function of the prediction $w^\top x$ for a single example with label $y$. One gradient step moves the prediction from $w_t^\top x$ by $-\eta(\nabla\ell)^\top x$ to $w_{t+1}^\top x$. Simply scaling the step by the importance weight, e.g. $-6\eta(\nabla\ell)^\top x$, can overshoot the minimum of the loss ($w_{t+1}^\top x$ ??). The importance-aware update instead moves the prediction by $s(h)\,\|x\|^2$.]
What is s(·)?

Losses for linear models: $\ell(w^\top x, y)$, so $\nabla_w \ell = \frac{\partial \ell(p,y)}{\partial p}\,x$.

The update must therefore be of the form
$$w_{t+1} = w_t - s(h)\,x.$$

$s(h)$ must satisfy
$$s(h+\epsilon) = s(h) + \epsilon\,\eta\,\frac{\partial \ell(p,y)}{\partial p}\bigg|_{p=(w_t - s(h)x)^\top x},$$
i.e.
$$s'(h) = \eta\,\frac{\partial \ell(p,y)}{\partial p}\bigg|_{p=(w_t - s(h)x)^\top x}.$$
Finally, $s(0) = 0$.
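To make the recipe concrete, here is a minimal sketch of an importance-aware update, not VW's implementation: the helper names (dloss_dp, s_of_h, update) are invented for illustration, and s(h) is approximated by Euler integration of the ODE above, a stand-in for the closed forms listed in the table below.

    #include <cmath>
    #include <vector>

    // dl/dp for squared loss l(p, y) = (y - p)^2.
    double dloss_dp(double p, double y) { return 2.0 * (p - y); }

    // Approximate s(h) by integrating s'(h) = eta * dl/dp |_{p = (w_t - s(h) x)^T x}
    // with n Euler steps, starting from s(0) = 0.
    double s_of_h(double p0, double xx, double y, double h, double eta, int n = 1000) {
        double s = 0.0, dh = h / n;
        for (int i = 0; i < n; ++i) {
            double p = p0 - s * xx;          // prediction after moving by s along x
            s += dh * eta * dloss_dp(p, y);  // follow the ODE
        }
        return s;
    }

    // Importance-aware update w <- w - s(h) x
    // (versus the naive w <- w - h * eta * dl/dp * x, which can overshoot).
    void update(std::vector<double>& w, const std::vector<double>& x,
                double y, double h, double eta) {
        double p = 0.0, xx = 0.0;
        for (size_t i = 0; i < w.size(); ++i) { p += w[i] * x[i]; xx += x[i] * x[i]; }
        double s = s_of_h(p, xx, y, h, eta);
        for (size_t i = 0; i < w.size(); ++i) w[i] -= s * x[i];
    }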
Many loss functions

For each loss $\ell(p,y)$, the importance-aware update $s(h)$ is:

Squared: $\ell(p,y) = (y-p)^2$
  $s(h) = \frac{p-y}{x^\top x}\left(1 - e^{-h\eta\, x^\top x}\right)$

Logistic ($y \in \{-1,1\}$): $\ell(p,y) = \log(1 + e^{-yp})$
  $s(h) = \frac{W\!\left(e^{h\eta\, x^\top x + yp + e^{yp}}\right) - h\eta\, x^\top x - e^{yp}}{y\, x^\top x}$   ($W$ is the Lambert W function)

Exponential ($y \in \{-1,1\}$): $\ell(p,y) = e^{-yp}$
  $s(h) = \frac{py - \log\!\left(h\eta\, x^\top x + e^{py}\right)}{y\, x^\top x}$

Logarithmic: $\ell(p,y) = y\log\frac{y}{p} + (1-y)\log\frac{1-y}{1-p}$
  $s(h) = \frac{p - 1 + \sqrt{(p-1)^2 + 2h\eta\, x^\top x}}{x^\top x}$ if $y = 0$
  $s(h) = \frac{p - \sqrt{p^2 + 2h\eta\, x^\top x}}{x^\top x}$ if $y = 1$

Hellinger: $\ell(p,y) = (\sqrt{p}-\sqrt{y})^2 + (\sqrt{1-p}-\sqrt{1-y})^2$
  $s(h) = \frac{p - 1 + \frac{1}{4}\left(12h\eta\, x^\top x + 8(1-p)^{3/2}\right)^{2/3}}{x^\top x}$ if $y = 0$
  $s(h) = \frac{p - \frac{1}{4}\left(12h\eta\, x^\top x + 8p^{3/2}\right)^{2/3}}{x^\top x}$ if $y = 1$

Hinge ($y \in \{-1,1\}$): $\ell(p,y) = \max(0, 1 - yp)$
  $s(h) = -y \min\!\left(h\eta,\ \frac{1-yp}{x^\top x}\right)$

$\tau$-Quantile: $\ell(p,y) = \tau(y-p)$ if $y > p$; $(1-\tau)(p-y)$ if $y \le p$
  $s(h) = -\tau \min\!\left(h\eta,\ \frac{y-p}{\tau\, x^\top x}\right)$ if $y > p$
  $s(h) = (1-\tau)\min\!\left(h\eta,\ \frac{p-y}{(1-\tau)\, x^\top x}\right)$ if $y \le p$
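As a sanity check on how these closed forms are used, here is a short sketch of two of the entries above; the function names are illustrative, not VW's, with p = $w_t^\top x$, xx = $x^\top x$, h the importance weight, and eta the learning rate.

    #include <algorithm>
    #include <cmath>

    // Squared loss (y - p)^2: s(h) from the table above.
    double s_squared(double p, double y, double h, double eta, double xx) {
        return (p - y) / xx * (1.0 - std::exp(-h * eta * xx));
    }

    // Hinge loss max(0, 1 - yp), y in {-1, +1}: s(h) from the table above.
    // When the margin is already satisfied (1 - yp <= 0) the gradient is zero,
    // so the update is zero.
    double s_hinge(double p, double y, double h, double eta, double xx) {
        if (1.0 - y * p <= 0.0) return 0.0;
        return -y * std::min(h * eta, (1.0 - y * p) / xx);
    }

Either value is then used exactly as before: $w_{t+1} = w_t - s(h)\,x$.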
Robust results for unweighted problems

[Four scatter plots comparing the standard update (y-axis, "standard") against the importance-aware update (x-axis, "importance aware"), both axes spanning roughly 0.9 to 1.0: astro - logistic loss; spam - quantile loss; rcv1 - squared loss; webspam - hinge loss.]
And now something completely different

Adaptive, individual learning rates in VW. It's really GD separately on each coordinate $i$ with
$$\eta_{t,i} = \frac{1}{\sqrt{\sum_{s=1}^{t} \left(\frac{\partial \ell(w_s^\top x_s,\, y_s)}{\partial w_{s,i}}\right)^2}}.$$
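A minimal sketch of this per-coordinate rule, in the AdaGrad spirit of the references below; the variable names are mine, squared loss is assumed for the gradient, and the small eps guard against division by zero is an addition not on the slide.

    #include <cmath>
    #include <vector>

    // Per-coordinate adaptive learning rates: G[i] accumulates squared gradients
    // for coordinate i, and each coordinate takes a step of size 1/sqrt(G[i]).
    void adaptive_step(std::vector<double>& w, std::vector<double>& G,
                       const std::vector<double>& x, double y, double eps = 1e-8) {
        double p = 0.0;
        for (size_t i = 0; i < w.size(); ++i) p += w[i] * x[i];
        double dl_dp = 2.0 * (p - y);              // dl/dp for squared loss (y - p)^2
        for (size_t i = 0; i < w.size(); ++i) {
            double g = dl_dp * x[i];               // gradient w.r.t. w_i
            G[i] += g * g;                         // running sum of squared gradients
            w[i] -= g / std::sqrt(G[i] + eps);     // eta_{t,i} = 1 / sqrt(sum of g^2)
        }
    }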
Coordinate-wise scaling of the data becomes less of an issue. This can be stated formally (Duchi, Hazan, and Singer / McMahan and Streeter, COLT 2010).
Some tricks involved

Store the sum of squared gradients with respect to $w_i$ near $w_i$.

    // Fast approximate 1/sqrt(x): bit-level initial guess plus one Newton step.
    float InvSqrt(float x) {
        float xhalf = 0.5f * x;
        int i = *(int*)&x;               // reinterpret the float's bits as an int
        i = 0x5f3759d5 - (i >> 1);       // magic constant gives the initial guess
        x = *(float*)&i;
        x = x * (1.5f - xhalf * x * x);  // one Newton-Raphson refinement
        return x;
    }

The special SSE rsqrt instruction is a little better.
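For reference, that SSE alternative is reachable through the _mm_rsqrt_ss intrinsic; a small sketch (not VW's code):

    #include <xmmintrin.h>  // SSE intrinsics

    // Approximate 1/sqrt(x) with the hardware rsqrtss instruction.
    float rsqrt_sse(float x) {
        return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    }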
Experiments

Raw data:

    ./vw --adaptive -b 24 --compressed -d tmp/spam_train.gz
      average loss = 0.02878
    ./vw -b 24 --compressed -d tmp/spam_train.gz -l 100
      average loss = 0.03267

TFIDF scaled data:

    ./vw --adaptive -b 24 --compressed -d tmp/rcv1_train.gz -l 1
      average loss = 0.04079
    ./vw -b 24 --compressed -d tmp/rcv1_train.gz -l 256
      average loss = 0.04465