Random sets at the interface of statistics and AI
Fifth Bayesian, Fiducial, and Frequentist (BFF5) Conference
Prof Fabio Cuzzolin
School of Engineering, Computing and Mathematics, Oxford Brookes University, Oxford, UK
Ann Arbor, MI, May 7 2018
Uncertainty
Second-order uncertainty
Orders of uncertainty
the difference between predictable and unpredictable variation is one of the fundamental issues in the philosophy of probability
second-order uncertainty: being uncertain about our very model of uncertainty
this has consequences for human behaviour: people are averse to unpredictable variation (as in Ellsberg’s paradox)
how good are Bayesian and frequentist probability at modelling second-order uncertainty?
Fisher has not got it all right
the setting of frequentist hypothesis testing is (arguably) arguable:
- the scope is quite narrow: rejecting or not rejecting a hypothesis (although it can provide confidence intervals)
- the criterion is arbitrary: who decides what an ‘extreme’ realisation is (the choice of α)? what is so special about 0.05 and 0.01?
- the whole ‘tail’ idea stems from the fact that, under measure theory, the conditional probability (p-value) of a point outcome x is zero – it looks like a patch for an underlying problem with the way probability is mathematically defined
- it cannot cope with pure data, without assumptions on the process (experiment) which generated them
The problem(s) with Bayes
pretty bad at representing ignorance:
- Jeffreys’ uninformative priors are just not good enough
- they give different results on different parameter spaces
Bayes’ rule assumes the new evidence comes in the form of certainty: “A is true”
- in the real world this is often not the case (‘uncertain’ or ‘vague’ evidence)
beware the prior! → model selection in Bayesian statistics
- it results from a confusion between the original subjective interpretation and the objectivist view of a rigorous, objective procedure
- why should we ‘pick’ a prior? either there is prior knowledge (beliefs) or there is not
- all will be fine, in the end! (Bernstein-von Mises theorem): asymptotically, the choice of the prior does not matter (really!)
Set-valued observations
The die as random variable
[Figure: a die; each face facei of the sample space is mapped to the corresponding number i on the real line X]
a die is a simple example of a (discrete) random variable
there is a probability space Ω = {face1, face2, ..., face6}, which maps to a real number: 1, 2, ..., 6 (no need for measurability here)
now, imagine that face1 and face2 are cloaked, and we roll the die
The cloaked die: set-valued observations
[Figure: the same die with face1 and face2 cloaked; both faces are now mapped to the set {1, 2} on X]
the same probability space Ω = {face1, face2, ..., face6} is still there (nothing has changed in the way the die works)
however, the mapping is now different: both face1 and face2 are mapped to the set of possible values {1, 2} (since we cannot observe the outcome)
this is a random set [Matheron, Kendall, Nguyen, Molchanov]: a set-valued random variable
whenever data are missing, observations are inherently set-valued
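As an illustration (not part of the original slides), here is a minimal Python sketch of the cloaked die as a random set: the multivalued mapping Γ sends the two cloaked faces to the set {1, 2}, and the uniform distribution on Ω induces a mass assignment on subsets of the domain.

```python
# Illustrative sketch (not from the slides): the cloaked die as a random set.
# Each element of Omega maps to a *subset* of X = {1,...,6}; the uniform
# probability on Omega induces a mass assignment on subsets of X.
from collections import defaultdict
from fractions import Fraction

omega = [f"face{i}" for i in range(1, 7)]
P = {w: Fraction(1, 6) for w in omega}                      # fair die

# multivalued mapping Gamma: the cloaked faces 1 and 2 both map to {1, 2}
Gamma = {"face1": frozenset({1, 2}), "face2": frozenset({1, 2}),
         "face3": frozenset({3}), "face4": frozenset({4}),
         "face5": frozenset({5}), "face6": frozenset({6})}

m = defaultdict(Fraction)                                   # induced mass assignment on 2^X
for w, A in Gamma.items():
    m[A] += P[w]

for A, mass in sorted(m.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(A), mass)                                  # {1,2}: 1/3, {3}...{6}: 1/6 each
```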
Belief functions
Random set definition
Dempster’s multivalued mappings
Dempster’s work formalises random sets via multivalued (one-to-many) mappings Γ from a probability space (Ω, F, P) to the domain of interest Θ
[Figure: Shafer’s trial example – Ω = {drunk (0.2), not drunk (0.8)} is mapped to subsets of Θ = {Mary, Peter, John}]
the example is taken from a famous ‘trial’ example [Shafer]
elements of Ω are mapped to subsets of Θ: once again, this is a random set
- in the example, Γ maps {not drunk} ∈ Ω to {Peter, John} ⊂ Θ
the probability distribution P on Ω induces a mass assignment m : 2^Θ → [0, 1] on the power set 2^Θ = {A ⊆ Θ} via the multivalued mapping Γ : Ω → 2^Θ
Belief and plausibility
Belief and plausibility measures
the belief in A is the probability that the evidence implies A:
Bel(A) = P({ω ∈ Ω | Γ(ω) ⊆ A}) = Σ_{B ⊆ A} m(B)
the plausibility of A is the probability that the evidence does not contradict A:
Pl(A) = P({ω ∈ Ω | Γ(ω) ∩ A ≠ ∅}) = 1 − Bel(Ā)
these were originally termed lower and upper probabilities by Dempster
belief and plausibility values can (although this is disputed) be interpreted as lower and upper bounds on the values of an unknown, underlying probability measure: Bel(A) ≤ P(A) ≤ Pl(A) for all A ⊆ Θ
belief measures include probability measures as a special case: what then replaces Bayes’ rule? a shift from conditioning (on certain events) to combination (of pieces of evidence)
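A small sketch (illustrative, not from the slides) of how belief and plausibility follow from a mass assignment; the mass values are the ones induced by the cloaked-die example above.

```python
# Illustrative sketch (not from the slides): belief and plausibility of the cloaked die.
from fractions import Fraction

X = frozenset({1, 2, 3, 4, 5, 6})
m = {frozenset({1, 2}): Fraction(1, 3),                     # mass induced by the cloaked die
     frozenset({3}): Fraction(1, 6), frozenset({4}): Fraction(1, 6),
     frozenset({5}): Fraction(1, 6), frozenset({6}): Fraction(1, 6)}

def bel(A):
    """Bel(A): total mass of focal elements contained in A (evidence implying A)."""
    return sum(v for B, v in m.items() if B <= A)

def pl(A):
    """Pl(A): total mass of focal elements intersecting A (evidence not contradicting A)."""
    return sum(v for B, v in m.items() if B & A)

A = frozenset({1, 3})
print(bel(A), pl(A))                                        # 1/6 and 1/2
assert pl(A) == 1 - bel(X - A)                              # duality Pl(A) = 1 - Bel(complement of A)
```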
Dempster’s combination
Dempster’s combination
[Figure: the trial example with a second source of evidence, Ω2 = {cleaned (0.6), not cleaned (0.4)}, mapped to subsets of Θ = {Mary, Peter, John}]
new piece of evidence: a blond hair has been found; also, there is a probability 0.6 that the room has been cleaned before the crime
the assumption is that pairs of outcomes ω1 ∈ Ω1 and ω2 ∈ Ω2 in the source spaces support the intersection of their images in 2^Θ: θ ∈ Γ1(ω1) ∩ Γ2(ω2)
if this is done independently, the probability that the pair (ω1, ω2) is selected is P1({ω1}) P2({ω2}), yielding Dempster’s rule of combination:
(m1 ⊕ m2)(A) = (1 / (1 − κ)) Σ_{B ∩ C = A} m1(B) m2(C),  for all ∅ ≠ A ⊆ Θ,
where κ = Σ_{B ∩ C = ∅} m1(B) m2(C) is the mass of conflict
Bayes’ rule is a special case of Dempster’s rule
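A minimal sketch of Dempster's rule on a finite frame; the first mass assignment follows the trial example above, while the second is a made-up second source used purely for illustration.

```python
# Illustrative sketch (not from the slides): Dempster's rule on a finite frame.
from fractions import Fraction

def dempster_combine(m1, m2):
    """Combine two mass assignments (dicts frozenset -> mass) by Dempster's rule."""
    joint, conflict = {}, Fraction(0)
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C
            if A:
                joint[A] = joint.get(A, Fraction(0)) + v1 * v2
            else:
                conflict += v1 * v2                         # mass kappa falling on the empty set
    return {A: v / (1 - conflict) for A, v in joint.items()}

Theta = frozenset({"Mary", "Peter", "John"})
m1 = {frozenset({"Peter", "John"}): Fraction(8, 10), Theta: Fraction(2, 10)}  # witness testimony
m2 = {frozenset({"Mary"}): Fraction(6, 10), Theta: Fraction(4, 10)}           # made-up second source
for A, v in dempster_combine(m1, m2).items():
    print(sorted(A), v)
```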
Semantics
Semantics of belief functions: modelling second-order uncertainty
[Figure: the probability simplex over a three-element frame {x, y, z}, with the credal sets induced by two belief functions drawn as convex regions inside it]
belief functions have multiple interpretations:
- as set-valued random variables (random sets)
- as (completely monotone) capacities (functions from the power set to [0, 1])
- as a special class of credal sets (convex sets of probability distributions) [Levi, Kyburg]
as such, they are a very expressive means of modelling uncertainty about the model itself, due to lack of data quantity or quality, or both
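The credal-set reading can be checked numerically: the sketch below (illustrative, with a made-up mass assignment on a three-element frame) redistributes each focal mass among its elements and verifies that every resulting probability distribution dominates Bel.

```python
# Illustrative sketch (not from the slides): the credal-set reading of a belief function.
# Sending each focal mass m(B) to any one of the elements of B yields a probability
# distribution p with p(A) >= Bel(A) for every event A (an extreme point of the credal set).
from itertools import product
from fractions import Fraction

Theta = ("x", "y", "z")
m = {frozenset({"x"}): Fraction(1, 2),
     frozenset({"y", "z"}): Fraction(3, 10),
     frozenset(Theta): Fraction(1, 5)}                      # made-up mass assignment

def bel(A):
    return sum(v for B, v in m.items() if B <= A)

events = [frozenset(s) for s in (("x",), ("y",), ("z",), ("x", "y"), ("x", "z"), ("y", "z"))]

for choice in product(*[sorted(B) for B in m]):             # one recipient element per focal set
    p = {t: Fraction(0) for t in Theta}
    for (B, v), elem in zip(m.items(), choice):
        p[elem] += v
    assert all(sum(p[t] for t in A) >= bel(A) for A in events)
print("every extreme allocation dominates Bel, as expected")
```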
Rare events
Rare events and second-order uncertainty
What’s a rare event?
what is a ‘rare’ event? clearly we are interested in them because they are not so rare, after all!
examples of rare events, also called ‘tail risks’ or ‘black swans’, are: volcanic eruptions, meteor impacts, financial crashes ...
mathematically, an event is ‘rare’ when it covers a region of the hypothesis space which is seldom sampled – it is an issue with the quality of the sample
Rare events and second-order uncertainty
probability distributions for the system’s behaviour are built in ‘normal’ times (e.g. while a nuclear plant is working just fine), then used to extrapolate results at the ‘tail’ of the distribution
[Figure: a logistic curve P(Y=1|x), with training samples concentrated far from the ‘rare’ event region]
popular statistical procedures (e.g. logistic regression) can sharply underestimate the probability of rare events
Harvard’s G. King [2001] has proposed corrections based on oversampling the ‘rare’ events w.r.t. the ‘normal’ ones
the issue is really one with the reliability of the model! we need to explicitly model second-order uncertainty
belief functions can be employed to model this uncertainty: rare events are a form of lack of information in certain regions of the sample space
how do we infer belief functions from sample data?
Statistical inference with belief functions
Likelihood-based inference
Inference from classical likelihood [Shafer76, Denoeux]
consider a statistical model L(θ; x) = f(x|θ), x ∈ X, θ ∈ Θ, where X is the sample space and Θ the parameter space
Bel_Θ(·|x) is the consonant belief function (with nested focal elements) whose plausibility of the singletons equals the normalised likelihood:
pl(θ|x) = L(θ; x) / sup_{θ′ ∈ Θ} L(θ′; x)
this is compatible with the likelihood principle
it takes the empirical normalised likelihood to be the upper bound on the probability density of the sought parameter (rather than the actual PDF)
the corresponding plausibility function is Pl_Θ(A|x) = sup_{θ ∈ A} pl(θ|x)
the plausibility of a composite hypothesis A ⊂ Θ,
Pl_Θ(A|x) = sup_{θ ∈ A} L(θ; x) / sup_{θ ∈ Θ} L(θ; x),
is the usual likelihood ratio statistic
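A short sketch (illustrative, with a hypothetical Bernoulli sample) of this construction: the contour function pl(θ|x) is the likelihood normalised by its supremum, and the plausibility of a composite hypothesis is the supremum of pl over it.

```python
# Illustrative sketch (not from the slides): likelihood-based belief inference for a
# Bernoulli parameter theta, with a hypothetical sample of k = 5 successes in n = 8 trials.
import numpy as np

k, n = 5, 8
theta = np.linspace(1e-6, 1 - 1e-6, 1001)
loglik = k * np.log(theta) + (n - k) * np.log(1 - theta)
pl_theta = np.exp(loglik - loglik.max())                    # pl(theta|x) = L(theta;x) / sup L

# plausibility of the composite hypothesis A = {theta <= 0.5}: sup of pl over A
A = theta <= 0.5
print("Pl(A|x) =", pl_theta[A].max())                       # the usual likelihood ratio statistic
```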
Belief likelihood function
Belief likelihood function: generalising the sample likelihood [Cuzzolin UAI’18, u/r]
a different take: instead of using the conventional likelihood to build a belief function, can we define a belief likelihood function of a sample x ∈ X?
the traditional likelihood function is a conditional probability of the data given a parameter θ ∈ Θ, i.e. a family of PDFs over X parameterised by θ
it is natural to define a belief (set-)likelihood function as a family of belief functions on X, Bel_X(·|θ), parameterised by θ ∈ Θ
note that a belief likelihood takes values on sets of outcomes – individual outcomes are a special case
this is a natural setting for computing likelihoods of set-valued observations, such as those which naturally arise when data are missing
coherent with the random-set philosophy
Belief likelihood function: multivariate analysis
what can we say about the belief likelihood function of a series of trials?
observations are tuples x = (x1, ..., xn) ∈ X1 × ··· × Xn, where Xi = X denotes the space of quantities observed at time i
by definition, the belief likelihood function is Bel_{X1×···×Xn}(A|θ), where A is any subset of X1 × ··· × Xn

Belief likelihood function of repeated trials
Bel_{X1×···×Xn}(A|θ) ≐ (Bel_{X1}↑ ⊛ ··· ⊛ Bel_{Xn}↑)(A|θ)
here ⊛ is an arbitrary combination rule (Dempster’s, conjunctive, disjunctive, ...)
Bel_{Xj}↑ is the vacuous extension of Bel_{Xj} to the Cartesian product X1 × ··· × Xn where the observed tuples live
- (the vacuous extension assigns the mass of B ⊆ Xj to its cylinder B × ∏_{i ≠ j} Xi)
Belief likelihood function for ‘sharp’ samples
can we reduce this to the belief values of the individual trials?
yes, if we wish to compute likelihood values of tuples of individual outcomes x = (x1, ..., xn), rather than arbitrary subsets of X1 × ··· × Xn
it then makes sense to call the following lower and upper likelihoods

Lower and upper likelihoods of a sample x = (x1, ..., xn)
When using either the conjunctive rule ∩ or Dempster’s ⊕ as the combination rule in the definition of the belief likelihood function, the following factorisations hold:
L̲(x) ≐ Bel_{X1×···×Xn}({(x1, ..., xn)}|θ) = ∏_{i=1}^n Bel_{Xi}({xi})
L̄(x) ≐ Pl_{X1×···×Xn}({(x1, ..., xn)}|θ) = ∏_{i=1}^n Pl_{Xi}({xi})

the second result holds under conditional conjunctive independence [Smets]
similar regularities hold when using the more cautious disjunctive combination ∪
the top decomposition also holds for Cartesian products of subsets of the Xi
Lower and upper likelihoods
Lower and upper likelihoods (Bernoulli trials)
Bernoulli trials example: Xi = X = {1, 0}, iid random variables
under conditional independence and equidistribution, the traditional likelihood for a series of Bernoulli trials reads p^k (1 − p)^{n−k}, where k is the number of successes (1s) and n the number of trials
let us compute the belief likelihood function for Bernoulli trials!
we seek the belief function on X = {1, 0}, parameterised by p = m({1}), q = m({0}) (with p + q ≤ 1 this time), which best describes the observed sample
applying the previous result, since all the Bel_i are equally distributed, the lower and upper likelihoods of the sample x = (x1, ..., xn) are:
L̲(x) = Bel_X({x1}) · ··· · Bel_X({xn}) = p^k q^{n−k}
L̄(x) = Pl_X({x1}) · ··· · Pl_X({xn}) = (1 − q)^k (1 − p)^{n−k}
after normalisation, these are PDFs over the space B of all belief functions definable on X!
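A numerical sketch (with made-up counts k and n) of these two surfaces over the space of belief functions on {1, 0}, i.e. the pairs (p, q) with p + q ≤ 1; it also locates their maxima, anticipating the next slide.

```python
# Illustrative sketch (not from the slides): lower and upper likelihood surfaces for
# Bernoulli trials over the space of belief functions on {1, 0}, i.e. pairs (p, q)
# with p = m({1}), q = m({0}), p + q <= 1. Counts k, n are made up.
import numpy as np

k, n = 6, 10
p, q = np.meshgrid(np.linspace(0, 1, 201), np.linspace(0, 1, 201), indexing="ij")
valid = p + q <= 1

L_low = np.where(valid, p**k * q**(n - k), np.nan)          # lower likelihood
L_up = np.where(valid, (1 - q)**k * (1 - p)**(n - k), np.nan)  # upper likelihood

i, j = np.unravel_index(np.nanargmax(L_low), L_low.shape)
print("argmax of lower likelihood: p=%.2f, q=%.2f" % (p[i, j], q[i, j]))  # p=k/n, q=1-k/n (ML estimate)
i, j = np.unravel_index(np.nanargmax(L_up), L_up.shape)
print("argmax of upper likelihood: p=%.2f, q=%.2f" % (p[i, j], q[i, j]))  # p=q=0 (vacuous BF)
```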
Lower and upper likelihoods (Bernoulli trials): numerical example
both the lower likelihood (left) and the upper likelihood (right) subsume the traditional likelihood p^k (1 − p)^{n−k} for p + q = 1
the maximum of the lower likelihood is the traditional ML estimate
- this makes sense: the lower likelihood is highest for the most ‘committed’ belief functions (i.e. the probability measures, which attach all their mass to singleton elements)
the upper likelihood (right) has its maximum in p = q = 0 (the vacuous belief function on {1, 0})
the interval of belief functions joining max L̲ with max L̄ is the set of belief functions such that p/q = k/(n − k), i.e. those which preserve the ratio between the empirical counts
Generalised logistic regression
Logistic regression
Logistic regression
logistic regression models data in which one or more independent observed variables determine an outcome, represented by a binary variable
conditional probabilities are assumed to have a logistic form:
pi = P(Yi = 1|xi) = 1 / (1 + e^{−(β0 + β1 xi)}),   1 − pi = P(Yi = 0|xi) = e^{−(β0 + β1 xi)} / (1 + e^{−(β0 + β1 xi)})   (1)
given a series of observations D = {(xi, Yi), i = 1, ..., n}, the parameters β0, β1 are estimated by maximum likelihood of the sample, where
L(β0, β1|Y) = ∏_{i=1}^n pi^{Yi} (1 − pi)^{1−Yi}
with Yi ∈ {0, 1} and pi a function of β0, β1
logistic regression yields a single conditional PDF
to express second-order uncertainty on the model, we replace the conditional probability (pi, 1 − pi) on X = {0, 1} with a conditional belief function there, and look for the belief functions whose parameters (masses) optimise either the lower or the upper likelihood (or a combination of both)
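For reference, a minimal sketch of standard logistic regression fitted by maximising this likelihood (equivalently, minimising the negative log-likelihood) on synthetic data; scipy.optimize is used for the optimisation.

```python
# Illustrative sketch (not from the slides): standard logistic regression fitted by
# maximum likelihood on synthetic data, via scipy.optimize.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x))))   # generating model: beta = (-1, 2)

def neg_log_lik(beta):
    b0, b1 = beta
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    eps = 1e-12                                             # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

fit = minimize(neg_log_lik, x0=np.zeros(2))
print("estimated (beta0, beta1):", fit.x)                   # close to the generating values
```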
Framework
Generalised logistic regression
the upper and lower likelihoods can then be computed as
L̲(β|Y) = ∏_{i=1}^n pi^{Yi} qi^{1−Yi},   L̄(β|Y) = ∏_{i=1}^n (1 − qi)^{Yi} (1 − pi)^{1−Yi}
as in logistic regression, the Bel_i are not equally distributed
how do we generalise the logit link between observations x and outputs y? just assuming (1) does not yield any analytical dependency for qi
a first simple proposal: add a parameter β2 such that
qi = m(Yi = 0|xi) = β2 e^{−(β0 + β1 xi)} / (1 + e^{−(β0 + β1 xi)})   (2)
we can then find lower and upper optimal estimates of the parameters β:
arg max_β L̲(β|Y) ↦ (β̲0, β̲1, β̲2),   arg max_β L̄(β|Y) ↦ (β̄0, β̄1, β̄2)
plugging these optimal parameters into (1), (2) yields an upper and a lower family of conditional belief functions given x: Bel_X(·|β̲, x) and Bel_X(·|β̄, x)
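A sketch of how such a fit could be carried out numerically, under the assumptions that q_i = β2 (1 − p_i), which is just (2) rewritten, and that β2 is constrained to [0, 1] so that p_i + q_i ≤ 1; the data are synthetic and the optimiser choice (scipy's bounded minimiser) is ours, not prescribed by the slides.

```python
# Illustrative sketch (not from the slides): fitting the generalised model by maximising
# the lower and the upper likelihood, with q_i = beta2 * (1 - p_i) as in (2). The bound
# beta2 in [0, 1] (our assumption) keeps p_i + q_i <= 1; data are synthetic.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x))))
eps = 1e-12

def masses(beta):
    b0, b1, b2 = beta
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))                # p_i = m(Y_i = 1 | x_i), eq. (1)
    q = b2 * (1.0 - p)                                      # q_i = m(Y_i = 0 | x_i), eq. (2)
    return p, q

def neg_log_lower(beta):
    p, q = masses(beta)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(q + eps))

def neg_log_upper(beta):
    p, q = masses(beta)
    return -np.sum(y * np.log(1 - q + eps) + (1 - y) * np.log(1 - p + eps))

bounds = [(-10, 10), (-10, 10), (0, 1)]
low = minimize(neg_log_lower, x0=[0.0, 0.0, 0.5], bounds=bounds)
up = minimize(neg_log_upper, x0=[0.0, 0.0, 0.5], bounds=bounds)
print("lower-likelihood fit:", low.x)
print("upper-likelihood fit:", up.x)

# each fitted family yields, for a new x0, the interval
# [Bel({1}|x0), Pl({1}|x0)] = [p(x0), 1 - q(x0)] for the probability of the outcome 1
```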
Dealing with rare events
Rare events with belief functions
how do we use belief functions to be cautious about rare-event prediction?
having learned a lower and an upper family of conditional belief functions given x from a training set D ...
... when observing a new x, we plug it into Bel_X(·|β̲, x) and Bel_X(·|β̄, x), and get a pair of lower and upper belief functions on Y
note that each such belief function is really an envelope of logistic functions
this will produce two intervals of probability values for the same x
open issues: how does this relate to the results of classical logit regression? how are these two intervals related? what about optimising a combination of the lower and upper likelihoods instead?
Generalised Laws of Probability
Central limit theorem
Central limit theorems for random sets
an ongoing effort concerns generalising the laws of classical probability to belief functions (and random sets)
the Gaussian (‘normal’) distribution is central in probability theory and its applications:
- it is the PDF with maximum entropy among those with given mean and σ
- the central limit theorem shows that sums of iid random variables are asymptotically Gaussian
- whenever test statistics or estimators are functions of sums of random variables, they have asymptotically normal distributions
an old proposal by Dempster and Liu merely transfers normal distributions on the real line by Cartesian product with R^m
a more sensible/interesting option: investigating how Gaussian distributions are transformed under (appropriate) multivalued mappings
another avenue of research: central limit theorems for random sets
- Larry G. Epstein and Kyoungwon Seo (Boston University) [2011]: ‘A Central Limit Theorem for Belief Functions’
- Xiaomin Shi (Shandong University) [2015]: ‘Central limit theorems for belief measures’
Total belief theorem
The total belief theorem: generalising the law of total probability
conditional belief functions are crucial for our approach to inference
the complementary link of the chain: a generalisation of the law of total probability
refining: a mapping from elements of one set Ω to the elements of a disjoint partition of a second set Θ
[Diagram: an a priori belief function Bel0 : 2^Ω → [0, 1] on Ω, and conditional belief functions Beli : 2^{Πi} → [0, 1] on the elements Πi of the partition of Θ induced by Ω]
The total belief theorem [Zhou & Cuzzolin, UAI’17]

Total belief theorem. Suppose Θ and Ω are two finite sets, and ρ : 2^Ω → 2^Θ the unique refining between them. Let Bel0 be a belief function defined over Ω = {ω1, ..., ω_|Ω|}. Suppose there exists a collection of belief functions Beli : 2^{Πi} → [0, 1], where Π = {Π1, ..., Π_|Ω|}, Πi = ρ({ωi}), is the partition of Θ induced by Ω. Then there exists a belief function Bel : 2^Θ → [0, 1] such that:
1. Bel0 is the marginal of Bel to Ω (Bel0(A) = Bel(ρ(A)));
2. Bel ⊕ Bel_{Πi} = Beli for all i = 1, ..., |Ω|, where Bel_{Πi} is the logical belief function with m_{Πi}(A) = 1 if A = Πi, and 0 otherwise.

the belief function
Bel ≐ Bel0↑Θ ⊕ (Bel1→ ⊕ ··· ⊕ Bel|Ω|→)
is a solution, where Beli→ denotes the conditional embedding of the conditional belief function Beli, the Beli→ are combined by Dempster’s sum, and Bel0↑Θ is the vacuous extension of Bel0 from Ω to Θ
other distinct solutions exist, and they likely form a graph with symmetries
Machine learning in the wild
Model adaptation
The problem with machine learning: generalising from scarce data
machine learning: designing algorithms that can learn from data
BUT we train them on a ridiculously small amount of data: how can we make sure they are robust to new situations never encountered before (model adaptation)?
we need to look at the foundations: statistical learning theory [Vapnik]
Statistical learning theory
Vapnik’s statistical learning theory
makes predictions on the reliability of a training set based on simple quantities, such as the number of samples N
generalisation issue: the training error is different from the expected error:
E_{x∼p}[δ(h(x) ≠ y(x))] ≠ (1/N) Σ_{n=1}^N δ(h(xn) ≠ y(xn))
the training data x = (x1, ..., xN) are assumed drawn from a distribution p; h(x) is the predicted label for input x and y(x) the actual label

Probably Approximately Correct (PAC) learning
the learning algorithm finds, with probability at least 1 − δ, a model h ∈ H which is approximately correct, i.e. which makes a training error of no more than ε
PAC learning aims at providing generalisation bounds of the kind
P[L(ĥ) − L(h*) > ε] ≤ δ,
on the difference between the loss L(ĥ) of the model ĥ learned from the training set and the minimal theoretical loss L(h*) for that class of models h ∈ H
Towards a robust statistical learning theory
Generalising statistical learning theory
[Figure: training and test distributions as distinct points in the probability simplex, each generating its own samples; a random set can cover the region containing both]
the issue is: training and test data are assumed to be sampled from the same (unknown) probability distribution p
machine learning deployment ‘in the wild’ has shown that this is hardly the case, leading to sometimes catastrophic failures (see Tesla, or recently Uber)
we recently [BELIEF’18, u/r] took a first step towards robustifying PAC learning, by analysing the case of finite, realisable model spaces
we adopt the relaxed assumption that the training and test distributions come from a known convex set of distributions (or possibly a random set)
Generalisation bounds for finite, realisable model spaces
we wish to generalise the proof of Theorem 4 in https://web.stanford.edu/class/cs229t/notes.pdf
let H be a hypothesis class, where each hypothesis h ∈ H maps some X to Y; let l be the zero-one loss, l((x, y), h) = I[y ≠ h(x)]; let p be any distribution over X × Y; and let ĥ be the empirical risk minimiser

Theorem. Assume that: (1) the model space H is finite, and (2) there exists a hypothesis h* ∈ H that obtains zero expected risk, that is, L(h*) = E_{(x,y)∼p}[l((x, y), h*)] = 0. Then, with probability at least 1 − δ:
L(ĥ) ≤ (log |H| + log(1/δ)) / n
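A trivial numeric companion to the bound (not from the slides): evaluating the right-hand side for given |H|, δ and n, and inverting it to get the sample size needed for a target error ε.

```python
# Illustrative sketch (not from the slides): evaluating the finite-realisable PAC bound
# L(h_hat) <= (log|H| + log(1/delta)) / n, and inverting it for a target risk epsilon.
import math

def risk_bound(H_size, delta, n):
    """Upper bound on L(h_hat), valid with probability at least 1 - delta."""
    return (math.log(H_size) + math.log(1.0 / delta)) / n

def samples_needed(H_size, delta, epsilon):
    """Smallest n for which the bound drops below epsilon."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / epsilon)

print(risk_bound(H_size=10**6, delta=0.05, n=5000))            # about 0.0034
print(samples_needed(H_size=10**6, delta=0.05, epsilon=0.01))  # 1682
```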
Credal realisability: generalising statistical learning theory [Cuzzolin, BELIEF’18, u/r]
a central notion is that of realisability: there exists a hypothesis h* ∈ H that obtains zero expected risk, L(h*) ≐ E_{(x,y)∼p}[l((x, y), h*)] = 0
in the credal case we can replace this with credal realisability: ∃ h* ∈ H, p* ∈ P : L_{p*}(h*) = 0
unfortunately, the traditional proof does not apply under credal realisability
uniform credal realisability can also be proposed: ∀ p ∈ P, ∃ h*_p ∈ H : E_p[l(h*_p)] = L_p(h*_p) = 0
in general, SLT proofs rely on classical concentration inequalities: is there a random-set version?
does assuming that the credal set is actually a random set simplify the derivations?
Conclusions
Some conclusions, and a research programme
we have appreciated the role of belief and random-set theory at the boundary of statistics and AI:
- the belief likelihood function as a generalisation of the traditional likelihood
- generalised logistic regression for rare-event analysis
- generalised laws of probability and the total belief theorem
- robustification of statistical learning theory
further development of machine learning tools:
- generalisation of maximum entropy classification and log-linear models
- random-set random forests
a fully developed theory of statistical inference with random sets:
- random-set random variables, generalisation of the Radon-Nikodym derivative
- frequentist inference with random sets
intriguing solutions to high-impact problems:
- robust climate change predictions
- robust statistical learning theory for machine learning ‘in the wild’
Appendix
For Further Reading
G. Shafer. A mathematical theory of evidence. Princeton University Press, 1976.
I. Molchanov. Theory of random sets. Springer, 2017.
F. Cuzzolin. Visions of a generalized probability theory. Lambert Academic Publishing, 2014.
F. Cuzzolin. The geometry of uncertainty - the geometry of imprecise probabilities. Springer-Verlag (in press).