Robust Player Selection Strategy: A Quantitative Perspective
Nandish Chattopadhyay and Prajamitra Bhuyan
Applied Statistics Unit, Indian Statistical Institute, Kolkata

MOTIVATION AND BACKGROUND
- Professional sports attract a massive following and massive investment, and are a diverse arena of social, economic and academic research interest.
- Prominent professional sports leagues include the EPL, NBA and IPL.
- In most of these leagues, franchises pick players through auctions.
- Subjective decisions based on an understanding of the game are often unreliable.
- To select players objectively, one needs a ranking system based on a sound statistical mechanism.
- The methodologies available in the literature have different shortcomings.
- Our contribution:
  - Objective: an efficient scoring and ranking scheme for players, based on experience and performance-related data.
  - Principle:
    - Maximally discriminate individual players from one another.
    - Incorporate the consistency of performances, along with the averages.
    - Eliminate redundant information.

SIMULATION STUDY
- Validate that our method works in simple scenarios where players have the same averages but varying consistency, so that a subjective decision is possible.
- Observe that the discriminating capability of our method is higher than that of other methods available in the literature.
- Account for correlation among variables (for 4-variate data, with high and low correlation among the variables).
DATA PROCESSING
- The data is usually available as comma-separated-values (CSV) files in different formats, for example ball-wise in cricket.
- Using Structured Query Language (SQL), we convert the data to the following format, with one row per player per match:

Player ID | Match ID | Attribute 1        | ... | Attribute m
1         | 1        | $X_{11}^{(1)}$     | ... | $X_{11}^{(m)}$
...       | ...      | ...                | ... | ...
1         | $k_1$    | $X_{1k_1}^{(1)}$   | ... | $X_{1k_1}^{(m)}$
2         | 1        | $X_{21}^{(1)}$     | ... | $X_{21}^{(m)}$
...       | ...      | ...                | ... | ...
2         | $k_2$    | $X_{2k_2}^{(1)}$   | ... | $X_{2k_2}^{(m)}$
...       | ...      | ...                | ... | ...
n         | 1        | $X_{n1}^{(1)}$     | ... | $X_{n1}^{(m)}$
...       | ...      | ...                | ... | ...
n         | $k_n$    | $X_{nk_n}^{(1)}$   | ... | $X_{nk_n}^{(m)}$

DATA ANALYSIS AND RESULTS
- The data for the IPL was collected from www.cricbuzz.com and www.cricsheets.org.
- Variable Selection: the scree plot suggests that two variables can be dropped.
- For the batsmen, we considered 'No. of Innings' (0.746) and 'Strike Rate' (0.254).
- For the bowlers, we considered 'No. of Innings' (0.791) and 'Wickets-Economy Ratio' (0.209).

METHODOLOGY
- Denote by $Y_{ij}$ the score of the $i$-th player in the $j$-th match:
  $$Y_{ij} = \sum_{p=1}^{m} l_p Z_{ij}^{(p)}, \quad i = 1, \ldots, n; \; j = 1, \ldots, k_i.$$
  - $l_p$ is the weight corresponding to the $p$-th performance variable.
  - $Z_{ij}^{(p)} = \dfrac{X_{ij}^{(p)} - \min_{ij} X_{ij}^{(p)}}{\max_{ij} X_{ij}^{(p)} - \min_{ij} X_{ij}^{(p)}}$ is the normalized score.
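The normalization and scoring step can be illustrated with a minimal sketch in plain Python. The player IDs, attribute values and weights below are hypothetical, chosen only for illustration, and the minimum and maximum are assumed to be taken over all players and matches, as the notation $\min_{ij}$ suggests.

```python
# Hypothetical records in the player/match format: (player_id, match_id, [X^(1), ..., X^(m)]).
records = [
    ("A", 1, [30.0, 120.0]),
    ("A", 2, [50.0, 150.0]),
    ("B", 1, [10.0, 90.0]),
    ("B", 2, [20.0, 100.0]),
]


def min_max_normalize(rows):
    """Scale each attribute to [0, 1] using its min and max over all players and matches."""
    m = len(rows[0][2])
    lows = [min(r[2][p] for r in rows) for p in range(m)]
    highs = [max(r[2][p] for r in rows) for p in range(m)]
    out = {}
    for player, match, x in rows:
        # A constant attribute carries no information; map it to 0 to avoid dividing by zero.
        out[(player, match)] = [
            (x[p] - lows[p]) / (highs[p] - lows[p]) if highs[p] > lows[p] else 0.0
            for p in range(m)
        ]
    return out


def match_score(z, weights):
    """Y_ij = sum over p of l_p * Z_ij^(p) for every (player, match) pair."""
    return {key: sum(l * zp for l, zp in zip(weights, row)) for key, row in z.items()}


normalized = min_max_normalize(records)
weights = [2.0, 1.0]  # hypothetical non-negative weights l_1, l_2
y = match_score(normalized, weights)
```

With these illustrative numbers, player A's second match attains the maximum of both attributes, so its normalized values are both 1 and its score equals the sum of the weights.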
- To obtain an efficient scoring and ranking system, we have to ensure:
  - Maximal discrimination among individual players from one another.
  - The best player gets the highest score, i.e., positive weights for variables that positively impact the players' abilities.
- Solve the following constrained optimization problem:
  $$(\hat{l}_1, \ldots, \hat{l}_m) = \underset{(l_1, \ldots, l_m) \in [0, \infty)^m}{\arg\max} \; \frac{\sum_{i=1}^{n} k_i (\bar{y}_i - \bar{y})^2}{\sum_{i=1}^{n} \sum_{j=1}^{k_i} (y_{ij} - \bar{y}_i)^2}.$$
  Here $\bar{y}_i$ is the mean of the scores $y_{ij}$ of the $i$-th player and $\bar{y}$ is the overall mean score.
- There is no closed-form solution, so numerical methods are needed.
- Normalized weights: $w_p = \dfrac{l_p}{\sum_{p=1}^{m} l_p}$, so that the score $S_{ij} = \sum_{p=1}^{m} w_p Z_{ij}^{(p)}$ lies in the range $[0, 1]$.
- The ranking is therefore obtained by sorting the players in descending order of the average score $\bar{S}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} S_{ij}$, for $i = 1, \ldots, n$.

VARIABLE SELECTION
- There could be redundancy in the data, due to high correlation among variables.
- Choosing a good subset of variables is important.
- We propose a backward subset selection technique.
- To determine the appropriate number of variables, we look for a knee (bend) in the scree plot, which shows the value attained by the objective function against the number of variables removed.
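The weight optimization from the METHODOLOGY section and the backward selection described above can be sketched together in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the poster does not name its numerical method, so a coarse grid search stands in for it; the greedy rule "drop the variable whose removal leaves the highest objective value" is one common form of backward selection; $\bar{y}$ is taken as the mean over all observations; and the data are hypothetical. Because the objective is unchanged when all weights are multiplied by the same positive constant, the search can run directly over normalized weights that sum to 1.

```python
from itertools import product

# Hypothetical normalized data: player -> one list of attribute values per match.
z_by_player = {
    "A": [[0.9, 0.2], [1.0, 0.8], [0.8, 0.5]],
    "B": [[0.5, 0.9], [0.6, 0.1], [0.4, 0.6]],
    "C": [[0.1, 0.4], [0.0, 1.0], [0.2, 0.0]],
}


def scores(z, weights, keep):
    """Match scores S_ij using only the attribute indices listed in keep."""
    return {p: [sum(w * row[v] for w, v in zip(weights, keep)) for row in rows]
            for p, rows in z.items()}


def objective(s):
    """Between-player sum of squares divided by within-player sum of squares."""
    all_scores = [y for ys in s.values() for y in ys]
    grand_mean = sum(all_scores) / len(all_scores)
    between = within = 0.0
    for ys in s.values():
        mean_i = sum(ys) / len(ys)
        between += len(ys) * (mean_i - grand_mean) ** 2
        within += sum((y - mean_i) ** 2 for y in ys)
    return between / within if within > 0 else float("inf")


def best_weights(z, keep, steps=20):
    """Grid search over weights w >= 0 with sum(w) == 1; returns (value, weights)."""
    best_value, best_w = float("-inf"), None
    for combo in product(range(steps + 1), repeat=len(keep)):
        if sum(combo) != steps:
            continue
        w = [c / steps for c in combo]
        value = objective(scores(z, w, keep))
        if value > best_value:
            best_value, best_w = value, w
    return best_value, best_w


def backward_selection(z, steps=20):
    """Greedily drop one variable at a time; returns [(kept indices, value, weights), ...]."""
    keep = list(range(len(next(iter(z.values()))[0])))
    value, w = best_weights(z, keep, steps)
    path = [(list(keep), value, w)]
    while len(keep) > 1:
        trials = []
        for v in keep:
            subset = [u for u in keep if u != v]
            trials.append((subset,) + best_weights(z, subset, steps))
        keep, value, w = max(trials, key=lambda t: t[1])
        path.append((list(keep), value, w))
    return path


def rank_players(z, weights, keep):
    """Sort players by average score, highest first."""
    s = scores(z, weights, keep)
    avg = {p: sum(ys) / len(ys) for p, ys in s.items()}
    return sorted(avg, key=avg.get, reverse=True)


path = backward_selection(z_by_player)
full_keep, full_value, full_weights = path[0]
ranking = rank_players(z_by_player, full_weights, full_keep)
```

The objective values stored in `path`, read against the number of variables removed, are the points one would plot in the scree plot. A grid search scales poorly with the number of variables, so for larger problems a general-purpose constrained optimizer would be the more practical choice.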
CONCLUSION
- The proposed methodology ensures maximal discrimination among individuals.
- Selection of players based on the proposed ranking scheme is therefore better justified.
- The methodology can be generalized to any field where data is available as repeated class-wise observations and an efficient ranking is needed.
- For auctions, the prices attributed to players can be determined using this ranking scheme, but given the sequential nature of auctions, one might be inclined to use a dynamic pricing approach.

IISA CONFERENCE 2017
HYDERABAD, INDIA