
Subject-Object-Specific Covariates in Paired Comparison Models – An Application to Data from the German Bundesliga

Gunther Schauberger¹, Andreas Groll¹, Gerhard Tutz¹

¹ LMU Munich, Germany

E-mail for correspondence: [email protected]

Abstract: A model for results of football matches is proposed that is able to take into account match-specific covariates as, for example, the total distance a team runs in the specific match. The model extends the Bradley-Terry model in many different ways. In addition to the inclusion of covariates, it considers ordered response values and (possibly team-specific) home effects. Penalty terms are used to reduce the complexity of the model and to find clusters of teams with equal covariate effects.

Keywords: Bradley-Terry; BTLLasso; Paired Comparison; Football data.

1 Introduction

Paired comparisons occur when two objects are compared with respect to an underlying latent trait. In this work, we consider football matches and treat them as paired comparisons between two teams, where the underlying latent traits are the playing abilities of the teams. The data we consider are from the 2014/15 season of the German Bundesliga. In particular, match-specific covariates are used to model the results of single matches. In general, if covariates are to be considered in paired comparisons, one has to distinguish between the subjects and the objects of the paired comparisons and, accordingly, between subject-specific, object-specific and subject-object-specific covariates. In football matches, the teams are the objects, while a single match can be considered the subject that makes the comparison between the two objects/teams. In our application, subject-object-specific covariates are considered.

This paper was published as a part of the proceedings of the 31st International Workshop on Statistical Modelling, INSA Rennes, 4–8 July 2016. The copyright remains with the author(s). Permission to reproduce or extract any parts of this abstract should be requested from the author(s).

The Bradley-Terry model (Bradley and Terry, 1952) is the standard model for paired comparison data. Assuming a set of objects $\{a_1, \ldots, a_m\}$, in its simplest form the Bradley-Terry model is given by

$$P(a_r \succ a_s) = P(Y_{(r,s)} = 1) = \frac{\exp(\gamma_r - \gamma_s)}{1 + \exp(\gamma_r - \gamma_s)}.$$

One models the probability that a certain object $a_r$ dominates or is preferred over another object $a_s$, written $a_r \succ a_s$. The random variable $Y_{(r,s)}$ is defined as $Y_{(r,s)} = 1$ if $a_r$ dominates $a_s$ and $Y_{(r,s)} = 0$ otherwise. The parameters $\gamma_r$ represent the attractiveness or strength of the respective objects.
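As a minimal sketch, the Bradley-Terry probability can be evaluated directly from two strength parameters. The function name `bradley_terry_prob` and the numeric strengths are illustrative assumptions and do not come from the Bundesliga data.

```python
import math


def bradley_terry_prob(gamma_r: float, gamma_s: float) -> float:
    """Probability that object a_r dominates object a_s.

    Uses exp(d) / (1 + exp(d)) = 1 / (1 + exp(-d)) with d = gamma_r - gamma_s.
    """
    diff = gamma_r - gamma_s
    return 1.0 / (1.0 + math.exp(-diff))


# Hypothetical strengths, chosen only for illustration.
p_rs = bradley_terry_prob(0.8, 0.3)  # a_r against a_s
p_sr = bradley_terry_prob(0.3, 0.8)  # a_s against a_r
```

Only the difference of the strengths enters the model, so the two probabilities sum to one and two equally strong objects each win with probability 0.5.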

2 Bundesliga Data

The main goal of this work is to analyze whether (and which) match-specific covariates influence the results of football matches. Match-specific covariates are measurements recorded for each team in each match, for example the number of kilometers a team runs (Distance). In total, the following covariates are known per team and per match:

Distance          Total distance run (km)
BallPossession    Percentage of ball possession
TacklingRate      Rate of tackles won
ShotsonGoal       Total number of shots on goal
Passes            Total number of passes
CompletionRate    Percentage of passes reaching teammates
FoulsSuffered     Number of fouls suffered
Offside           Number of offsides (in attack)

In particular, it is of interest which covariates have an influence at all and for which covariates the effects differ between teams. As the covariates are collected per team and per match, they can be termed subject-object-specific covariates.

3 A Paired Comparison Model for Football Matches Including Subject-Object-Specific Covariates

When using a paired comparison model for football matches, several extensions of the standard Bradley-Terry model are needed. The model has to be able to handle an ordinal response (in particular draws), home effects and subject-object-specific covariates.


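For that purpose, we propose to use the general model for ordinal response data $Y_{i(r,s)} \in \{1, \ldots, K\}$ denoted by

$$
\begin{aligned}
P(Y_{i(r,s)} \le k) &= \frac{\exp(\delta_r + \theta_k + \gamma_{ir} - \gamma_{is})}{1 + \exp(\delta_r + \theta_k + \gamma_{ir} - \gamma_{is})} \\
&= \frac{\exp(\delta_r + \theta_k + \beta_{r0} - \beta_{s0} + \boldsymbol{z}_{ir}^T \boldsymbol{\alpha}_r - \boldsymbol{z}_{is}^T \boldsymbol{\alpha}_s)}{1 + \exp(\delta_r + \theta_k + \beta_{r0} - \beta_{s0} + \boldsymbol{z}_{ir}^T \boldsymbol{\alpha}_r - \boldsymbol{z}_{is}^T \boldsymbol{\alpha}_s)}.
\end{aligned}
$$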
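The cumulative model above can be sketched in a few lines of Python: category probabilities follow as differences of consecutive cumulative probabilities, with P(Y ≤ K) = 1. This is an illustration under stated assumptions, not the authors' implementation. The function names `ability` and `category_probs` and all parameter values are hypothetical, and the thresholds are assumed to be supplied in increasing order.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def ability(beta_0: float, z: list, alpha: list) -> float:
    """Matchday-specific ability: intercept plus covariate effects z^T alpha."""
    return beta_0 + sum(z_j * a_j for z_j, a_j in zip(z, alpha))


def category_probs(delta_r: float, thetas: list,
                   gamma_ir: float, gamma_is: float) -> list:
    """Return [P(Y = 1), ..., P(Y = K)] for home team a_r against a_s.

    `thetas` holds the K - 1 thresholds theta_1 < ... < theta_{K-1}.
    """
    cum = [sigmoid(delta_r + th + gamma_ir - gamma_is) for th in thetas]
    cum = [0.0] + cum + [1.0]  # P(Y <= 0) = 0 and P(Y <= K) = 1
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


# Hypothetical values for K = 3 categories and p = 2 covariates.
gamma_home = ability(0.2, [1.1, 0.5], [0.3, -0.1])
gamma_away = ability(-0.1, [0.9, 0.6], [0.2, 0.1])
probs = category_probs(0.3, [-0.5, 0.5], gamma_home, gamma_away)
```

Because the model is written for P(Y ≤ k), a larger home effect or a larger ability difference in favor of the home team shifts probability mass toward the low response categories.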

Basically, the model is a special case of a cumulative logit model and allows for the inclusion of so-called subject-object-specific covariates $\boldsymbol{z}_{ir}$. See also Tutz and Schauberger (2015) for a model including object-specific covariates $\boldsymbol{z}_r$ and Schauberger and Tutz (2015) for a model including subject-specific covariates $\boldsymbol{z}_i$. $Y_{i(r,s)}$ encodes an ordered response with $K$ categories (including a category for draws) for a match between team $a_r$ and team $a_s$ on matchday $i$, where $a_r$ played at its home ground. The linear predictor of the model contains the following terms:

$\delta_r$                 team-specific home effect of team $a_r$
$\theta_k$                 category-specific threshold parameters
$\beta_{r0}$               team-specific intercepts
$\boldsymbol{z}_{ir}$      $p$-dimensional covariate vector that varies over teams and matches
$\boldsymbol{\alpha}_r$    $p$-dimensional parameter vector that varies over teams

In general, for ordinal paired comparisons it can be assumed that the response categories have a symmetric interpretation, so that $P(Y_{(r,s)} = k) = P(Y_{(s,r)} = K - k + 1)$ holds. Therefore, the threshold parameters should be restricted by $\theta_k = -\theta_{K-k}$ and, if $K$ is even, $\theta_{K/2} = 0$ to guarantee symmetric probabilities. Possible order effects (the advantage of the home team $a_r$ over the away team $a_s$) are instead captured by the home effects. Instead of fixed abilities $\gamma_r$, the teams have matchday-specific abilities $\gamma_{ir} = \beta_{r0} + \boldsymbol{z}_{ir}^T \boldsymbol{\alpha}_r$, depending on the covariates of team $a_r$ on matchday $i$.

Both the home effect and the covariate effects could also be included as global parameters instead of team-specific parameters. To decide whether the home effect or single covariate effects should be modeled with team-specific or global parameters, penalty terms are used. In particular, the absolute values of all pairwise differences between the team-specific home advantages are penalized using the penalty term

$$P(\delta_1, \ldots, \delta_m) = \sum_{r<s} |\delta_r - \delta_s|.$$
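The pairwise-difference penalty on the home effects can be sketched as follows. The function name `fusion_penalty` and the example values are hypothetical. The sketch only evaluates the penalty and does not perform the penalized estimation.

```python
from itertools import combinations


def fusion_penalty(deltas: list) -> float:
    """Sum of |delta_r - delta_s| over all unordered pairs of teams."""
    return sum(abs(d_r - d_s) for d_r, d_s in combinations(deltas, 2))


# Hypothetical home effects for m = 3 teams.
equal_effects = fusion_penalty([0.2, 0.2, 0.2])
distinct_effects = fusion_penalty([0.0, 1.0, 3.0])
```

The penalty vanishes exactly when all team-specific home effects coincide, which corresponds to a single global home effect. Partially fused values correspond to clusters of teams sharing the same effect.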
