Functional Registration and Local Variations
Anirvan Chakraborty and Victor M. Panaretos

arXiv:1702.03556v1 [stat.ME] 12 Feb 2017

Institut de Mathématiques, École Polytechnique Fédérale de Lausanne
e-mail: [email protected]; [email protected]

Abstract: We study the problem of nonparametric registration of functional data that have been subjected to random deformation (warping) of their time scale. The separation of this phase variation (“horizontal” variation) from the amplitude variation (“vertical” variation) is crucial in order to properly conduct further analyses, which otherwise can be severely distorted. We determine precise conditions under which the two forms of variation are identifiable, under minimal assumptions on the form of the warp maps. We show that these conditions are sharp, by means of counterexamples. We then propose a nonparametric registration method based on a “local variation measure”, which bridges functional registration and optimal transportation. Our method is proven to consistently estimate the warp maps from discretely observed data, without requiring any penalisation or tuning on the warp maps themselves. This circumvents the problem of over/under-registration often encountered in practice. Similar results hold in the presence of measurement error, with the addition of a pre-processing smoothing step. We carry out a detailed theoretical investigation of the strong consistency and the weak convergence properties of the resulting functional estimators, establishing rates of convergence. We also give a theoretical study of the impact of deviating from the identifiability conditions, quantifying it in terms of the spectral gap of the amplitude variation. Numerical experiments demonstrate the good finite sample performance of our method, and the methodology is further illustrated by means of a data analysis.

Keywords: Identifiability, Optimal Transportation, Registration, Total variation, Warping

Contents

1 Introduction
  1.1 Our Contributions
2 Model Identifiability
3 Methodology
  3.1 Fully Observed Functions
  3.2 Discretely Observed Functions
  3.3 Measurement Error
4 Asymptotic Theory
  4.1 Consistency and Weak Convergence
  4.2 Effect of Model Misspecification
5 Numerical Experiments
  5.1 Well-specified Model without error
  5.2 Well-specified Model with error
  5.3 Misspecified Model
6 Data Analysis
Appendix – Proofs of Formal Statements
References

A. Chakraborty and V. M. Panaretos/Functional Registration and Local Variations


1. Introduction

Functional observations can fluctuate around their mean structure in broadly two ways: (a) amplitude variation, and (b) phase variation. The first type of variation is analysed using functional principal component analysis, which stratifies the variation in amplitude (or variation in the “vertical axis”) across the different eigenfunctions of the covariance operator of the underlying distribution. The second kind of variation, if present, is more subtle and can drastically distort the analysis of a functional dataset. It typically manifests itself in functional data representing physiological processes or physical motion, and consists of deformations of the time scale of the functional data (or variation in the “horizontal axis”), associating to each observation its own unobservable time scale resulting from a transformation of the original time scale by a time warp. Specifically, instead of observing curves $\{X_i(t) : [0,1] \to \mathbb{R}\}_{i=1}^n$, one actually observes warped versions $\tilde X_i = X_i \circ T_i^{-1}$, where the $T_i$'s are unobservable (random) homeomorphisms termed warp maps. In the presence of phase variation, the mean of the warped data conditional on the warping, $E(\tilde X_i \mid T_i) = \mu \circ T_i^{-1}$, is a distortion of the true mean $\mu$ by the warp map. Failing to account for the time transformation will yield deformed mean estimates, converging to $E[\mu \circ T_i^{-1}]$ rather than $\mu$. More dramatic still will be the effect on the estimation of the covariance of the latent process, inflating its essential rank and yielding uninterpretable principal components. We refer to Section 2 in Panaretos and Zemel (2016) for a detailed discussion of these effects. Consequently, in the presence of phase variation in the data, the natural first step in the analysis should be to register the data, i.e., to simultaneously transform/synchronise the curves back to the objective time scale.
Owing to the rather complex nature of the registration problem, a variety of different assumptions on the nature of the warp maps $T_i$ have been considered, and correspondingly a multitude of methods have been investigated: landmark based registration (Kneip and Gasser, 1992); template/target based registration (Ramsay and Li, 1998); registration using dynamic time warping (Wang and Gasser, 1997, 1999); registration based on local regression (Kneip et al., 2000); a “self-modelling” approach by Gervini and Gasser (2004) for warp maps expressible as linear combinations of B-splines; related registration procedures under assumptions on functional forms of the warp maps that result in a finite dimensional family of deformations (Rønn, 2001; Gervini and Gasser, 2005); a functional convex synchronization approach to registration (Liu and Müller, 2004); registration using “moments” of the data curves (James, 2007); registration based on a parsimonious representation of the registered observations by the principal components (Kneip and Ramsay, 2008); pairwise registration of the warped functional data under monotone piecewise-linear warp maps (Tang and Müller, 2008); a joint amplitude-phase analysis with this pairwise registration procedure but considering step-function (thus finite dimensional) approximations of the warp maps using finite differences of their log-derivatives (centered log-ratio transform) (Hadjipantelis et al., 2015); registration when the warp maps are generated as compositions of elementary “warplets” (Claeskens, Silverman and Slaets, 2010); and registration using a warp-invariant metric between curves when the warp functions are diffeomorphisms on an interval (Srivastava et al., 2011). The above list is not exhaustive, and we refer to Marron et al. (2015) for an overview and comparison of some of the registration procedures mentioned above.
Several of the above contributions consider the case when the warp maps are themselves random, and in such cases a canonical set of assumptions is usually required: (a) $T$ is an increasing homeomorphism with probability one, and (b) $E(T) = \mathrm{Id}$, where $\mathrm{Id}$ is the identity map, $\mathrm{Id}(x) = x$.


The first assumption rules out “time-reversal” or “time-jumps”, while the second disallows an overall speed-up or slow-down of time. Further to these natural assumptions, additional smoothness and structural assumptions are typically also imposed on $T$ (e.g. Rønn (2001), Gervini and Gasser (2004, 2005), Tang and Müller (2008) and Claeskens, Silverman and Slaets (2010)), which require tuning parameters to be selected. However, it is unclear whether these additional assumptions are truly necessary for identifiability to hold. It is natural to ask what assumptions one must minimally impose on the latent functional data generating process so that the registration problem is identifiable when only conditions (a) and (b) are imposed on the warp maps. There seems to be a consensus that identifiability (and hence consistency) rests crucially on an implicit assumption that either phase variation or amplitude variation is of low rank; in other words, that one or the other type of variation is dominant. This is reflected directly or indirectly in many of the earlier contributions in the literature, whether by explicit model assumptions or by the asymptotic regime considered. Often, either the warp maps are assumed to be essentially parametric, or the data generating process is low rank, or both (e.g., Rønn (2001), Gervini and Gasser (2005), James (2007) and Claeskens, Silverman and Slaets (2010)). Variants of the model

$$X_i(t) = \xi_i \phi(t) + \delta\,\varepsilon_i(t), \qquad i = 1, 2, \ldots, n, \tag{1}$$

for the latent process are often encountered in the literature, with $\phi$ a unit-norm deterministic function, $\xi_i$ a random variable, $\varepsilon_i(t)$ a zero-mean error term of unit variance (i.e. $E\|\varepsilon_i\|_2^2 = 1$), and $\delta^2$ small relative to $\mathrm{var}\{\xi_i\}$. In other words, it is postulated that, were it not for phase variation, important landmark features such as peaks and valleys of the latent process would not drastically change from realisation to realisation. Gervini and Gasser (2004) assumed this representation, but imposed rather restrictive conditions on the $\xi_i$'s and the $T_i$'s. Tang and Müller (2008) assumed that the $T_i$ admit a $p$-dimensional representation in a known basis, that $\xi_i = 1$ for all $i$, and additionally that $\delta \to 0$ as $n \to \infty$ to ensure asymptotically identifiable (consistent) registration. Similar models were also considered by Wang and Gasser (1997, 1999) and Srivastava et al. (2011). Note that Model (1) is a rank one model for the latent process when one has exactly $\delta = 0$ rather than just $\delta^2 \ll \mathrm{var}\{\xi_i\}$. This model also fits with the landmark principle of registration (Kneip and Gasser, 1992), which essentially stipulates that the true curves have similar shape (thus having the same landmarks) but possibly differ in their amplitude component.

1.1. Our Contributions

Our contributions in this paper are twofold. Firstly, we elucidate the link between the identifiability issue for a general registration problem, the rank of the latent model, and the warp map structure as reflected in assumptions (a) and (b) (Section 2). We prove that the registration problem is identifiable when the amplitude variation is exactly of rank 1 (Theorem 1). Conversely, unless more restrictive structural conditions are imposed on the warp maps, we show by means of counterexamples that the rank 1 assumption is also necessary – even a low rank assumption on the warp maps will not do.
Secondly, we develop methodology to address the problem of nonparametric and consistent recovery of the warp maps from discretely observed warped curves, without structural assumptions on the warp maps further to (a) and (b), and without any penalisation or tuning parameters related to the warp maps themselves. Minimal structural assumptions are particularly desirable since, in practice, one rarely has more detailed insights regarding the underlying warping phenomenon. And circumventing penalisation/tuning has the


important practical advantage that there is no danger of “over-registering” (overfitting) the data on account of the tuning of a penalty on the registration maps. The key to our identifiability and methodological results is the novel use of a criterion that measures the local amount of deformation of the time scale (Section 3). Specifically, we introduce the local variation measure of $X$, with associated cumulative distribution $J_X(t) = \int_0^t |X'(u)|\,du$, which reflects how the total variation of the curve is distributed over its domain. We show that the effect of the time deformation on the local variation distribution has a transparent interpretation in terms of optimal transportation (see, e.g., Villani (2003)). Our approach exploits this connection in order to deduce identifiability, to estimate the unobservable warp maps, and to register the functional data. Indeed, it is precisely the structure of optimal transportation that exempts us from the need for additional smoothness/structural conditions on the warp maps $T$, and consequently from the need to introduce registration tuning parameters, even when the curves are observed over a discrete grid.¹ Although our procedure involves derivatives, we do not actually need to estimate any derivatives from discretely observed data if there is no measurement error, as we can exploit an equivalent definition of total variation using finite differences over partitions of the domain. If there is measurement error, then a pre-processing smoothing step is required, but still no additional penalisation of the registration maps is necessary. We carry out a complete asymptotic analysis in all possible observation regimes: complete observation, discrete observation without error, and measurement error contamination.
In all cases, we prove that the nonparametric estimators obtained are consistent as the number of observations grows and the measurement grid becomes dense, and we additionally derive rates of convergence and weak convergence for all the quantities involved (Section 4, Theorems 2, 3, 4, 5). We also investigate the potential impact of model misspecification (departures from the identifiable regime) in Section 4.2, where we derive theoretical results quantifying the asymptotic amount of bias incurred in terms of the spectral gap of the amplitude variation (Theorem 6). The finite sample performance of our methodology is studied in Section 5, for all possible observation regimes. We further numerically probe the impact of misspecification, and quantify the stability of our method depending on the degree of misspecification. The method is further illustrated by an analysis of a functional dataset of Tribolium beetle larvae growth curves, in Section 6. The proofs of all the theorems are collected in the Appendix.

¹ Of course, once the warp maps are estimated, one would have to smooth the warped discrete data in order to register them, since the warped data are not observed at all points of their domain. And, if there is measurement error in the observations, then some pre-smoothing will be needed. But in either case, this smoothing will be on the data itself (either as a pre-processing or post-processing step), and no smoothing penalties or structural assumptions will be required on the registration maps themselves.

2. Model Identifiability

Recall that a “standard model” for the latent process prior to warping (Equation 1) takes the general form $X(t) = \xi\phi(t) + \delta\varepsilon(t)$. Depending on the constraints imposed on the random variable $\xi$, the scalar $\delta$ and the error process $\varepsilon$, this model can be of arbitrarily large rank. Usually $\mathrm{var}\{\xi\}$ is expected to be the dominant effect relative to $\delta$ (i.e. $\delta^2 \ll \mathrm{var}\{\xi\}$), corresponding to an “approximately rank 1” model. We now give sufficient conditions on the “standard model” so that identifiability will hold, without any assumptions on $T$ further to the minimal assumptions (a) and (b). In simple terms, the process must be exactly of rank 1: either $\delta$ must be zero, or $\varepsilon(t)$ must be a deterministic function in the span of $\phi$.

Theorem 1 (Identifiability). Let $\{X_1, X_2\}$ be random elements in $C^1[0,1]$ of rank one, i.e., $X_i(t) = \xi_i\phi_i(t)$ for deterministic functions $\phi_i$ with $\|\phi_i\|_2 = 1$, and with $\phi_i'$ vanishing on at most a countable set. Assume that $\{T_1, T_2\}$ are strictly increasing homeomorphisms in $C^1[0,1]$ such that $E(T_i) = \mathrm{Id}$. Write $\tilde X_i(t) = X_i(T_i^{-1}(t))$. Then,

$$\tilde X_1 \overset{d}{=} \tilde X_2 \iff \left\{ T_1 \overset{d}{=} T_2,\ \phi_1 = \pm\phi_2,\ \xi_1 \overset{d}{=} \pm\xi_2 \right\}.$$

The next counterexample shows that the rank 1 condition is necessary, unless more than (a) and (b) is assumed on $T$.

Counterexample 1 (Necessity of rank 1 condition). We will show that the same process can arise in two ways: (i) as a rank one process subjected to warping by a non-trivial warp map $T$ satisfying (a) and (b), and (ii) as a rank two process with no warping (i.e., with the identity warp map, which obviously satisfies (a) and (b)). Define

$$f(t) = t^2 \quad \text{and} \quad g(t) = (4t - t^2)/3, \qquad t \in [0,1].$$

Take $\xi$ to be a standard Gaussian random variable and $\phi(t) = t/\sqrt{3}$ for $t \in [0,1]$. Now define a random warp map $T$ such that $P[T = f] = 1 - P[T = g] = 1/4$. Then $T$ satisfies (a), and $E(T) = f/4 + 3g/4 = \mathrm{Id}$, so (b) is also satisfied. Now,

$$\tilde X = \xi\phi\circ T^{-1} = \xi_1 T^{-1} = \xi_1\left(f^{-1}U + g^{-1}(1 - U)\right),$$

where $U$ is a Bernoulli random variable with success probability $1/4$ and $\xi_1 = \xi/\sqrt{3}$. Let $V = \xi_1 U$ and $W = \xi_1(1 - U)$, so that

$$\tilde X = V f_1 + W g_1,$$

where $f_1(t) = f^{-1}(t) = \sqrt{t}$ and $g_1(t) = g^{-1}(t) = 2 - \sqrt{4 - 3t}$, $t \in [0,1]$. It is easy to check that $\mathrm{Var}(V) = 1/12$, $\mathrm{Var}(W) = 1/4$ and $\mathrm{Cov}(V, W) = 0$. Further, it is easy to show that $f_1$ and $g_1$ are linearly independent. Consequently, we may define a new process $Y = V f_1 + W g_1$, which is a rank two process. Define $\tilde Y = Y \circ \mathrm{Id}^{-1} = Y$. Then $\tilde X \overset{d}{=} \tilde Y$ (in fact $\tilde X = \tilde Y$), but they have been generated using two different warp maps, namely $T$ and $\mathrm{Id}$, which of course do not have the same distribution.

Note that the model for $X$ considered in Theorem 1 implies that $\mu(t) := E[X(t)] = \phi(t)E(\xi)$, i.e., $\mu$ is constrained to lie in the span of $\phi$. In particular, the model $X = \mu + \xi\phi$ is not identifiable subject to warping unless $\mu \in \mathrm{span}\{\phi\}$. A counterexample can be constructed along the same lines as above. Finally, one might also wonder whether at least the assumption that $E[T] = \mathrm{Id}$ can be relaxed in the context of the rank 1 model. The next counterexample shows that this is not the case either.

Counterexample 2 (Necessity of $E(T) = \mathrm{Id}$). Suppose that $E(T) = f_0$, with $f_0 \neq \mathrm{Id}$ a strictly increasing homeomorphism on $[0,1]$. Define $S = T \circ f_0^{-1}$. It follows that $E(S) = \mathrm{Id}$. Now

$$\tilde X = \xi\phi\circ T^{-1} = \xi\phi\circ f_0^{-1}\circ S^{-1} = \xi\phi_0\circ S^{-1},$$

where $\phi_0 = \phi\circ f_0^{-1}$. Let $c_0 = \|\phi_0\|_2$, and define $\xi_0 = c_0\xi$ and $\phi_1 = \phi_0/c_0$. Then $\|\phi_1\|_2 = 1$ and $\tilde X = \xi_0\phi_1\circ S^{-1}$, so $\tilde X$ also arises from a rank one model warped by $S$. So the resulting processes have the same distribution (actually the processes are equal) but have been generated using different warp maps $S$ and $T$, which do not have the same distribution because they have different means.
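The arithmetic behind Counterexample 1 can be checked numerically. The following NumPy sketch verifies that $E(T) = \mathrm{Id}$, that $f_1$ and $g_1$ invert $f$ and $g$, the stated moments of $V$ and $W$ (by Monte Carlo), and the linear independence of $f_1$ and $g_1$ through a nonsingular Gram matrix. The grid size, Monte Carlo sample size and seed are arbitrary choices made for illustration only.

```python
import numpy as np

# Warp maps of Counterexample 1 and their stated inverses f1 = f^{-1}, g1 = g^{-1}.
def f(t):
    return t ** 2

def g(t):
    return (4 * t - t ** 2) / 3

def f1(t):
    return np.sqrt(t)

def g1(t):
    return 2 - np.sqrt(4 - 3 * t)

t = np.linspace(0.0, 1.0, 1001)

# (b): with P[T = f] = 1/4 and P[T = g] = 3/4, E(T) = f/4 + 3g/4 should be the identity.
mean_warp = 0.25 * f(t) + 0.75 * g(t)

# The stated inverses should undo the warps on [0, 1].
inv_err_f = np.max(np.abs(f1(f(t)) - t))
inv_err_g = np.max(np.abs(g1(g(t)) - t))

# Monte Carlo moments of V = xi1 U and W = xi1 (1 - U), where xi1 = xi / sqrt(3),
# xi ~ N(0, 1) and U ~ Bernoulli(1/4) independent of xi.
rng = np.random.default_rng(0)
m = 200_000
xi1 = rng.standard_normal(m) / np.sqrt(3)
U = (rng.random(m) < 0.25).astype(float)
V, W = xi1 * U, xi1 * (1 - U)
var_V, var_W = V.var(), W.var()
cov_VW = np.mean(V * W) - V.mean() * W.mean()   # V W = 0 identically, since U (1 - U) = 0

# Linear independence of f1 and g1: their L2[0, 1] Gram matrix (Riemann sum) is nonsingular.
B = np.vstack([f1(t), g1(t)])
gram = B @ B.T * (t[1] - t[0])
```

A nonzero Gram determinant is what makes $Y = Vf_1 + Wg_1$ genuinely rank two: since $V$ and $W$ are uncorrelated with positive variances, the covariance of $Y$ is $\mathrm{Var}(V)\, f_1\otimes f_1 + \mathrm{Var}(W)\, g_1\otimes g_1$, which has two nonzero eigenvalues.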


In light of Theorem 1, we will postulate the following model:

Assumptions 1 (Identifiable Phase Variation Model). The warped random function $\tilde X : [0,1] \to \mathbb{R}$ arises as

$$\tilde X(t) = X(T^{-1}(t)) = \xi\phi(T^{-1}(t)), \tag{2}$$

where:
(M1) $\xi$ is a real-valued random variable of finite variance.
(M2) $\phi \in C^1([0,1])$ is a deterministic function of unit $L^2$-norm, whose derivative vanishes at most on a countable subset of $[0,1]$.
(M3) $T$ is a strictly increasing random $C^1$-homeomorphism on $[0,1]$ such that $E[T] = \mathrm{Id}$.

With model identifiability established, we now turn to nonparametric methods of estimation. For these, we will require the notion of local variation measure, introduced in the next section.

3. Methodology

Recall that the total variation of a continuous function $h : [0,1] \to \mathbb{R}$ measures the total distance swept by the ordinate $y = h(x)$ of its graph as the abscissa $x$ moves from 0 to 1. Since phase variation distorts functions “in the $x$-domain” through an increasing homeomorphism, it does not affect the total amount of variation accrued over the interval $[0,1]$. However, it does redistribute this total variation over the subintervals of $[0,1]$. This redistribution can be measured by focussing on local variation:

Definition 1 (Local Variation Distribution). Given any real function $h \in C([0,1])$, we define

$$J_h(t) = \sup_{K \in \mathcal{K}_t} \sum_{k=0}^{|K|-1} |h(\tau_{k+1}) - h(\tau_k)|, \tag{3}$$

where $K = \{\tau_0, \tau_1, \ldots, \tau_{|K|}\}$ is a partition of $[0,t]$ and $\mathcal{K}_t$ is the collection of all finite partitions of $[0,t]$. When $h \in C^1([0,1])$, it holds that

$$J_h(t) = \int_0^t |h'(u)|\,du. \tag{4}$$
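Equations (3) and (4) can be compared numerically. In this minimal NumPy sketch (the test function $h(t) = \sin(2\pi t)$, the partition sizes and the helper names are illustrative assumptions), any single partition yields a lower bound on the supremum in (3), refining a partition can only increase the variation sum (triangle inequality), and for $h \in C^1$ the sum over a fine partition matches the integral in (4).

```python
import numpy as np

def variation_sum(h, t, n_points):
    """Variation sum in Equation (3) for the uniform partition of [0, t] with n_points points.

    A single partition only gives a lower bound on the supremum J_h(t); for a
    continuous h the bound approaches J_h(t) as the partition is refined.
    """
    tau = np.linspace(0.0, t, n_points)
    return np.sum(np.abs(np.diff(h(tau))))

def integral_formula(dh, t, n_cells=100_000):
    """Right-hand side of Equation (4), int_0^t |h'(u)| du, by the midpoint rule."""
    u = (np.arange(n_cells) + 0.5) * (t / n_cells)
    return np.sum(np.abs(dh(u))) * (t / n_cells)

def h(x):
    return np.sin(2 * np.pi * x)

def dh(x):
    return 2 * np.pi * np.cos(2 * np.pi * x)

# Nested uniform partitions of [0, 1] (3 -> 5 -> 10001 points): sums are non-decreasing.
coarse, medium, fine = (variation_sum(h, 1.0, m) for m in (3, 5, 10_001))
# For h in C^1, a fine partition sum agrees with the integral form (4).
gap = abs(variation_sum(h, 0.3, 10_001) - integral_formula(dh, 0.3))
# Local variation distribution F_h(t) = J_h(t) / J_h(1), here at t = 0.3.
F_h_03 = integral_formula(dh, 0.3) / integral_formula(dh, 1.0)
```

Here $J_h(1) = 4$, and the 3-point partition misses all of it because it only samples the zeros of $h$; this is why (3) takes a supremum over partitions.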

Noting that $J_h(1)$ is the total variation of $h$, we define the local variation distribution as $F_h(t) = J_h(t)/J_h(1)$. The main motivation behind our approach is that warping affects the local variation of the underlying process in a rather predictable manner, one that can be used as a basis for estimation:

Lemma 1 (Local Variations and Warp Maps). In the context of Model 2, let $F$ and $\tilde F$ denote the local variation distributions associated with the functions $X$ and $\tilde X$. Then, under assumptions (M1)-(M3), $F$ and $\tilde F$ are strictly monotone almost surely, and satisfy

$$F = F_\phi, \qquad T = \tilde F^{-1}\circ F_\phi, \qquad \text{and} \qquad E(\tilde F^{-1}) = F_\phi^{-1}.$$
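Lemma 1 can also be checked numerically. The sketch below assumes an illustrative unit-norm $\phi(t) = \sqrt{2}\sin(2\pi t)$ and a hypothetical family of warp maps $T_c(t) = t + c\,t(1-t)$, strictly increasing for $|c| < 1$; neither choice comes from the paper. Local variation distributions are approximated by normalised cumulative absolute increments on a fine grid, and inverses by linear interpolation with `np.interp`.

```python
import numpy as np

def lv_distribution(values):
    """Normalised cumulative variation of a curve sampled on a fine grid (a discretised F_h)."""
    inc = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(values)))])
    return inc / inc[-1]

def warp(t, c):
    """Hypothetical warp map T_c(t) = t + c t (1 - t); strictly increasing on [0, 1] for |c| < 1."""
    return t + c * t * (1 - t)

def warp_inv(s, c):
    """Inverse of T_c: the root in [0, 1] of c t^2 - (1 + c) t + s = 0, in a form stable near c = 0."""
    return 2 * s / ((1 + c) + np.sqrt((1 + c) ** 2 - 4 * c * s))

def phi(t):
    return np.sqrt(2) * np.sin(2 * np.pi * t)   # unit L2-norm; phi' vanishes at finitely many points

grid = np.linspace(0.0, 1.0, 20_001)
F_phi = lv_distribution(phi(grid))

# First two identities for one warped curve: the scale xi (here negative) cancels in F_tilde,
# and T = F_tilde^{-1} o F_phi.
c, xi = 0.3, -1.7
F_tilde = lv_distribution(xi * phi(warp_inv(grid, c)))
T_hat = np.interp(F_phi, F_tilde, grid)   # F_tilde^{-1} evaluated at F_phi(t)
err_warp = np.max(np.abs(T_hat - warp(grid, c)))

# Third identity: with c = +0.3 or -0.3 equally likely, E(T) = Id, so averaging the two
# quantile functions F_tilde^{-1} should reproduce F_phi^{-1}.
u = np.linspace(0.0, 1.0, 1001)
Q = [np.interp(u, lv_distribution(phi(warp_inv(grid, s))), grid) for s in (0.3, -0.3)]
err_mean = np.max(np.abs(np.mean(Q, axis=0) - np.interp(u, F_phi, grid)))
```

Both errors are of the order of the grid spacing; the largest discrepancies occur near the extrema of $\phi$, where $F_\phi$ is locally flat and its inverse is steep.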

Remark 1. In the language of optimal transportation, Lemma 1 simply says that the warp map pushes forward the original local variation distribution $F$ ($= F_\phi$) to the warped local variation distribution $\tilde F$, and does so optimally in terms of quadratic transportation cost. Furthermore, the equality $E(\tilde F^{-1}) = F_\phi^{-1}$ at the level of quantile functions is equivalent to stating that $F_\phi$ ($= F$) is the Fréchet mean of the random distribution $\tilde F$ with respect to the 2-Wasserstein distance (Villani (2003)).
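For distributions on the real line, the 2-Wasserstein distance has the standard quantile representation, which is what makes the Fréchet-mean reading of Lemma 1 transparent. A short derivation, assuming only that $E\int_0^1 (\tilde F^{-1}(u))^2\,du < \infty$ (automatic here, since all quantile functions take values in $[0,1]$):

$$d_W^2(G, \tilde F) = \int_0^1 \big(G^{-1}(u) - \tilde F^{-1}(u)\big)^2\,du,$$

$$E\,d_W^2(G, \tilde F) = \int_0^1 \big(G^{-1}(u) - E\tilde F^{-1}(u)\big)^2\,du + \int_0^1 \mathrm{var}\{\tilde F^{-1}(u)\}\,du,$$

where Fubini's theorem justifies exchanging $E$ and $\int_0^1$. Only the first term depends on $G$, and $E\tilde F^{-1}$ is itself non-decreasing (hence a valid quantile function), so the Fréchet functional $G \mapsto E\,d_W^2(G, \tilde F)$ is minimised at $G^{-1} = E(\tilde F^{-1})$, which equals $F_\phi^{-1}$ by Lemma 1.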


Now consider an i.i.d. sample $\{\tilde X_i : i = 1, 2, \ldots, n\}$ of randomly warped functional data issued from Model 2. We wish to construct nonparametric estimators of $\phi$ and the $\{\xi_i\}_{i=1}^n$, and predictors of the $\{T_i\}_{i=1}^n$, on the basis of $\{\tilde X_i\}_{i=1}^n$. The key to this will be to employ the equation $E(\tilde F^{-1}) = F_\phi^{-1}$ in Lemma 1. We consider three different observation regimes on $\{\tilde X_i\}_{i=1}^n$ in the next three sections: complete observation, discrete noiseless observation, and discrete observation with measurement error.

3.1. Fully Observed Functions

If the functions $\{\tilde X_i\}$ are fully observed, we may proceed as follows:

Step 1: Estimate $F_\phi$ by its natural (as per Lemma 1) estimator
$$\hat F_\phi = \Big(n^{-1}\sum_{i=1}^n \tilde F_i^{-1}\Big)^{-1},$$
noting that the $\{\tilde F_i\}$ are immediately available by complete observation of the $\{\tilde X_i\}$.
Step 2: Predict the warp map $T_i$ by $\hat T_i = \tilde F_i^{-1}\circ\hat F_\phi$, and the registration map $T_i^{-1}$ by $\hat T_i^{-1}$.
Step 3: Register the observed warped functional data by means of $\hat X_i = \tilde X_i\circ\hat T_i$.
Step 4: Compute the empirical covariance operator, say $\hat K_r$, of the registered data $\{\hat X_i\}$ and estimate $\phi$ by the leading eigenfunction $\hat\phi$ of $\hat K_r$ (as a convention, assume that this estimator is aligned with the true $\phi$, i.e., $\langle\hat\phi, \phi\rangle \ge 0$).
Step 5: Estimate $\xi_i$ by $\hat\xi_i = \langle\hat X_i, \hat\phi\rangle_2$.
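Steps 1–5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: "full observation" is approximated by a fine grid of 1001 points, inverses and compositions are computed by linear interpolation (`np.interp`), the covariance operator is discretised with quadrature weight `dt`, and the simulated model (the choice of $\phi$, $\xi_i \sim N(2, 1)$, and hypothetical warps $T_i(t) = t + c_i t(1-t)$ with $E(c_i) = 0$, so that $E(T_i) = \mathrm{Id}$) is an assumption made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 1001)      # fine grid standing in for "full observation"
dt = grid[1] - grid[0]
n = 50

def lv_distribution(values):
    """Discretised local variation distribution F_h on the grid."""
    inc = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(values)))])
    return inc / inc[-1]

# Simulated sample from Model 2 under illustrative (hypothetical) choices.
phi = np.sqrt(2) * np.sin(2 * np.pi * grid)                    # unit L2-norm
xi = rng.normal(2.0, 1.0, size=n)
c = rng.uniform(-0.3, 0.3, size=n)                              # E(c) = 0 gives E(T) = Id
T = grid + c[:, None] * grid * (1 - grid)                       # row i: T_i on the grid
X_tilde = np.array([x * np.interp(np.interp(grid, Ti, grid), grid, phi)   # xi_i phi(T_i^{-1}(t))
                    for x, Ti in zip(xi, T)])

# Step 1: F_phi_hat = (n^{-1} sum_i F_tilde_i^{-1})^{-1}, quantiles evaluated at u = grid.
F_tilde = np.array([lv_distribution(x) for x in X_tilde])
Q_bar = np.mean([np.interp(grid, F, grid) for F in F_tilde], axis=0)
F_phi_hat = np.interp(grid, Q_bar, grid)

# Step 2: T_hat_i = F_tilde_i^{-1} o F_phi_hat.
T_hat = np.array([np.interp(F_phi_hat, F, grid) for F in F_tilde])

# Step 3: registered curves X_hat_i = X_tilde_i o T_hat_i.
X_hat = np.array([np.interp(Th, grid, x) for Th, x in zip(T_hat, X_tilde)])

# Step 4: leading eigenfunction of the empirical covariance operator, rescaled to
# unit L2 norm and sign-aligned with the true phi (the convention in the text).
Xc = X_hat - X_hat.mean(axis=0)
_, vecs = np.linalg.eigh(Xc.T @ Xc / n * dt)
phi_hat = vecs[:, -1] / np.sqrt(dt)
phi_hat *= np.sign(phi_hat @ phi)

# Step 5: xi_hat_i = <X_hat_i, phi_hat>_2.
xi_hat = X_hat @ phi_hat * dt
```

`numpy.linalg.eigh` returns eigenvalues in ascending order, hence the last eigenvector is taken in Step 4. With finite $n$, the sample average of the warps is not exactly the identity, so the recovered $\hat T_i$ and $\hat\phi$ carry a small common time distortion that vanishes as $n$ grows.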

3.2. Discretely Observed Functions

In the discretely observed setting, the $\tilde X_i$'s are not fully observed. Instead, we observe point evaluations

$$\tilde X_{i,d} = (\tilde X_i(t_1), \tilde X_i(t_2), \ldots, \tilde X_i(t_r))', \qquad i = 1, \ldots, n.$$

Here, $0 \le t_1 < t_2 < \cdots < t_r \le 1$ is a grid over $[0,1]$, assumed asymptotically homogeneous in that $\max_{1\le j\le r-1}(t_{j+1} - t_j) = O(r^{-1})$ as $r \to \infty$. Note that since the observations have been generated from Model 2, we have $\tilde X_i(t_j) = \xi_i\phi(T_i^{-1}(t_j))$. The latent discrete process is denoted by $X_{i,d} = (X_i(t_1), X_i(t_2), \ldots, X_i(t_r))'$ and satisfies $X_i(t_j) = \xi_i\phi(t_j)$. Our strategy will be to mimic Steps 1–5 from the fully observed setup. Since the $X_i$'s are no longer fully observed, though, in order to have versions of the $F_i$ and $\tilde F_i$, we will draw inspiration from the general definition of the local variation distribution (Equation 3 in Definition 1). First, define

$$F_{i,d}(t) = \frac{\sum_{j\in I_t}|X_i(t_{j+1}) - X_i(t_j)|}{\sum_{j=1}^{r-1}|X_i(t_{j+1}) - X_i(t_j)|}$$

for $t \in [0,1]$ and each $i = 1, 2, \ldots, n$, where $I_t$ is the set of all $j$'s satisfying $t_{j+1} \le t$. Note that because we only observe each curve over the grid $0 \le t_1 < t_2 < \cdots < t_r \le 1$, we have replaced the supremum


over all grids in Equation 3 of Definition 1 by just this one (the finest grid we get to observe). Notice that $F_{i,d}(t) = F_d(t)$ for all $i = 1, 2, \ldots, n$, where

$$F_d(t) = \frac{\sum_{j\in I_t}|\phi(t_{j+1}) - \phi(t_j)|}{\sum_{j=1}^{r-1}|\phi(t_{j+1}) - \phi(t_j)|}.$$

Clearly, $F_d$ has jump discontinuities at the grid points $t_j$, is càdlàg, and satisfies $F_d(t) = 0$ for all $t \in [0, t_2)$ and $F_d(t) = 1$ for all $t \in [t_r, 1]$. Also, its jumps are at most of size $a_r = \max_{1\le j\le r-1}|\phi(t_{j+1}) - \phi(t_j)| \big/ \sum_{j=1}^{r-1}|\phi(t_{j+1}) - \phi(t_j)|$. For the (discretely) observable warped process, we define

$$\tilde F_{i,d}(t) = \frac{\sum_{j\in I_t}|\tilde X_i(t_{j+1}) - \tilde X_i(t_j)|}{\sum_{j=1}^{r-1}|\tilde X_i(t_{j+1}) - \tilde X_i(t_j)|} = \frac{\sum_{j\in I_t}|\phi(s_{i,j+1}) - \phi(s_{i,j})|}{\sum_{j=1}^{r-1}|\phi(s_{i,j+1}) - \phi(s_{i,j})|}, \tag{5}$$

where $s_{i,j} = T_i^{-1}(t_j)$ for each $i$ and $j$ are unobserved random variables. The $\tilde F_{i,d}$'s have jump discontinuities at the grid points and are càdlàg. The maximum jump size of $\tilde F_{i,d}$ is $A_{i,r} = \max_{1\le j\le r-1}|\phi(s_{i,j+1}) - \phi(s_{i,j})| \big/ \sum_{j=1}^{r-1}|\phi(s_{i,j+1}) - \phi(s_{i,j})|$. With these definitions in place, we can now adapt Steps 1–5 to the discrete case. In what follows, the generalized inverse of a function $G$ is denoted by $G^-$, i.e., $G^-(t) = \inf\{u : G(u) \ge t\}$. The first two steps remain unchanged, except that they now employ the discrete local variation measures. This means that we will not require any tuning parameters or smoothness assumptions to estimate the warp and registration maps. The registration itself (the last three steps) will, of course, require some smoothing if it is to make sense:

Step 1*: Estimate $F_d$ by $\hat F_d = \{n^{-1}\sum_{i=1}^n\tilde F_{i,d}^-\}^-$, and $F_d^-$ by $\hat F_d^* = n^{-1}\sum_{i=1}^n\tilde F_{i,d}^-$.
Step 2*: Predict the random warp map $T_i$ by $\hat T_{i,d} = \tilde F_{i,d}^-\circ\hat F_d$, and the registration map $T_i^{-1}$ by $\hat T_{i,d}^* = \hat F_d^*\circ\tilde F_{i,d} = \{n^{-1}\sum_{j=1}^n\tilde F_{j,d}^-\}\circ\tilde F_{i,d}$.
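Steps 1* and 2* operate on step functions only, so they can be computed exactly with sorted searches. The sketch below is one way to do so under stated assumptions: the grid includes both end points ($t_1 = 0$, $t_r = 1$, so the boundary adjustments of Section 3.2.1 are not needed), and the simulated data use the same hypothetical $\phi$ and warp family as in the earlier sketches. The helper names `discrete_lv`, `gen_inverse` and `F_hat_d` are ours. `gen_inverse` implements $G^-(s) = \inf\{u : G(u) \ge s\}$ for a càdlàg step function, and `F_hat_d` uses the fact that $\hat F_d^* = n^{-1}\sum_i\tilde F_{i,d}^-$ is constant between consecutive jump heights of the $\tilde F_{i,d}$.

```python
import numpy as np

def discrete_lv(values):
    """Values of the step function F_{i,d} of Equation (5): entry k is its value on [knots[k], knots[k+1])."""
    inc = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(values)))])
    return inc / inc[-1]

def gen_inverse(cum, knots, s):
    """G^-(s) = inf{u in [0, 1] : G(u) >= s} for the cadlag step function G = cum[k] on [knots[k], knots[k+1])."""
    s = np.asarray(s, dtype=float)
    k = np.searchsorted(cum, s, side="left")              # first index k with cum[k] >= s
    return np.where(s <= 0, 0.0, knots[np.minimum(k, len(knots) - 1)])

def F_hat_d(cums, knots, t):
    """Step 1*: F_hat_d = {n^{-1} sum_i F_{i,d}^-}^-, evaluated at the points t."""
    b = np.unique(np.concatenate(cums))                    # all jump heights, sorted; b[0] = 0
    Q_b = np.mean([gen_inverse(cm, knots, b) for cm in cums], axis=0)   # F_hat_d^*(b_m)
    t = np.asarray(t, dtype=float)
    m = np.searchsorted(Q_b, t, side="left")               # first m with F_hat_d^*(b_m) >= t
    # F_hat_d^* is constant on (b_{m-1}, b_m], so the infimum is the left end point b_{m-1}.
    return np.where(t <= 0, 0.0, b[np.clip(m - 1, 0, len(b) - 1)])

# Hypothetical data: common grid with t_1 = 0 and t_r = 1, phi(t) = sqrt(2) sin(2 pi t),
# xi_i ~ N(2, 1) and warps T_i(t) = t + c_i t (1 - t) with c_i ~ U(-0.3, 0.3).
rng = np.random.default_rng(2)
knots = np.linspace(0.0, 1.0, 201)
n = 30
xi = rng.normal(2.0, 1.0, size=n)
c = rng.uniform(-0.3, 0.3, size=n)[:, None]
T_true = knots + c * knots * (1 - knots)
T_inv = 2 * knots / ((1 + c) + np.sqrt((1 + c) ** 2 - 4 * c * knots))
X_tilde_d = xi[:, None] * np.sqrt(2) * np.sin(2 * np.pi * T_inv)   # row i: X_tilde_{i,d}

cums = [discrete_lv(x) for x in X_tilde_d]
F_d_hat = F_hat_d(cums, knots, knots)
# Step 2*: T_hat_{i,d} = F_{i,d}^- o F_hat_d on the grid.
T_hat_d = np.array([gen_inverse(cm, knots, F_d_hat) for cm in cums])
# Registration map T_hat*_{i,d} = F_hat_d^* o F_{i,d}; at grid point t_k, F_{i,d}(t_k) = cums[i][k].
T_hat_star = np.array([np.mean([gen_inverse(cj, knots, ci) for cj in cums], axis=0) for ci in cums])
```

As the text notes, no bandwidth or penalty appears anywhere: every $\hat T_{i,d}$ is a non-decreasing step function taking values in the observation grid, obtained purely by sorting and averaging.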

Step 3*: Since the $\tilde X_i$'s are observed discretely, we do not have information about their values between grid points. Thus, we first smooth each of the $\tilde X_{i,d}$ using the Nadaraya–Watson kernel regression estimator for an appropriately chosen kernel $k$ and bandwidth $h$, denoting the resulting smoothed functions by $\tilde X_i^\dagger$:

$$\tilde X_i^\dagger(t) = \sum_{j=1}^r k\Big(\frac{t - t_j}{h}\Big)\tilde X_i(t_j) \Big/ \sum_{j=1}^r k\Big(\frac{t - t_j}{h}\Big).$$

Define

$$\hat X_i^*(t) = \tilde X_i^\dagger(\hat T_{i,d}(t)), \qquad i = 1, 2, \ldots, n,$$

to be the registered functional observations, and write $\bar X_r^* = n^{-1}\sum_{i=1}^n\hat X_i^*$ for their mean.
Step 4*: Compute the empirical covariance operator $\hat K_r^*$ of the registered curves $\hat X_i^*$, and use its leading eigenfunction $\hat\phi^*$ as the estimator of $\phi$ (again, assume the convention that the sign is correctly identified, i.e., $\langle\hat\phi^*, \phi\rangle \ge 0$).
Step 5*: Finally, estimate $\xi_i$ by $\hat\xi_i^* = \langle\hat X_i^*, \hat\phi^*\rangle_2$ for each $i \ge 1$.
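For Step 3*, the sketch below uses a Gaussian kernel for $k$ and a bandwidth $h = 0.01$ (assumptions; the text leaves $k$ and $h$ open) and, to isolate the smoothing step, plugs in the true warp map in place of $\hat T_{i,d}$. The curve is the same hypothetical example as before with $\xi = 1$. Away from the boundary the error is of the usual $O(h^2)$ size, while near $t = 0$ and $t = 1$ the Nadaraya–Watson estimator has an $O(h)$ boundary bias, which is why the code separates out the interior error.

```python
import numpy as np

def nadaraya_watson(t_eval, t_obs, y_obs, h):
    """Nadaraya-Watson estimate sum_j k((t - t_j)/h) y_j / sum_j k((t - t_j)/h), Gaussian kernel k."""
    w = np.exp(-0.5 * ((np.asarray(t_eval)[:, None] - t_obs[None, :]) / h) ** 2)
    return (w @ y_obs) / w.sum(axis=1)

# Hypothetical single curve: xi = 1, phi(t) = sqrt(2) sin(2 pi t), T(t) = t + c t (1 - t).
c = 0.2
t_obs = np.linspace(0.0, 1.0, 201)
T_inv = 2 * t_obs / ((1 + c) + np.sqrt((1 + c) ** 2 - 4 * c * t_obs))
y_obs = np.sqrt(2) * np.sin(2 * np.pi * T_inv)            # X_tilde(t_j) = phi(T^{-1}(t_j))

# Step 3*: X_hat*(t) = X_tilde_dagger(T_hat(t)), with the true T standing in for T_hat_{i,d}.
t = np.linspace(0.0, 1.0, 501)
X_star = nadaraya_watson(t + c * t * (1 - t), t_obs, y_obs, h=0.01)
err = np.abs(X_star - np.sqrt(2) * np.sin(2 * np.pi * t))   # registered curve vs. xi * phi
interior = (t >= 0.1) & (t <= 0.9)
```

A local linear smoother, as used for Step 3** in Section 3.3, would remove the first-order boundary bias.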


We should point out here that our method is also straightforwardly applicable in the situation where the grid over which the $\tilde X_i$'s are observed, say $0 \le t_{i,1} < t_{i,2} < \cdots < t_{i,r_i} \le 1$, differs with $i$. The reason for this compatibility is that our approach considers only one curve at a time. We formulate it in the notationally simpler case of a common grid, in order to alleviate the notation in the statement of our asymptotic results in Section 4.

3.2.1. Some Practical Issues

As mentioned earlier, $\tilde F_{i,d}$ is a step function with jump discontinuities at the grid points. In particular, $\tilde F_{i,d}(t) = 0$ for $t \in [0, t_2)$ and $\tilde F_{i,d}(t) = 1$ for $t \in [t_r, 1]$. Thus, $\tilde F_{i,d}^-(0) = 0$ and $\tilde F_{i,d}^-(1) = t_r$, which is less than 1 if $t_r < 1$, i.e., if the grid does not include the right end-point. In this case, $\hat F_d(t)$, and thus $\hat T_{i,d}(t)$, is properly defined only for $t \in [0, t_r]$. Also, $\tilde F_{i,d}^-(u) \le t_r$, and equality holds iff $u \in (\tilde F_{i,d}(t_{r-1}), 1]$. Thus,

$$\hat F_d(t_r) = \inf\Big\{u : n^{-1}\sum_{i=1}^n\tilde F_{i,d}^-(u) \ge t_r\Big\} = \inf\{u : \tilde F_{i,d}^-(u) = t_r\ \forall\, i = 1, 2, \ldots, n\} = \inf\Big\{u : u \in \bigcap_{i=1}^n(\tilde F_{i,d}(t_{r-1}), 1]\Big\} = \max_{1\le i\le n}\tilde F_{i,d}(t_{r-1}).$$

Then, $\hat T_{i,d}(t_r) = \tilde F_{i,d}^-(\hat F_d(t_r)) = \tilde F_{i,d}^-\big(\max_{1\le j\le n}\tilde F_{j,d}(t_{r-1})\big) = t_r$. One can then extend $\hat T_{i,d}(t)$ to the whole of $[0,1]$ by, e.g., linearly interpolating between $(t_r, \hat T_{i,d}(t_r)) = (t_r, t_r)$ and $(1, 1)$. This practical modification, in case $t_r < 1$, enjoys the same asymptotic properties as the originally defined estimator (Section 4), since the effect of the modification is asymptotically negligible due to the homogeneity assumptions on the grid. Similarly, $\hat F_d^*(u) = n^{-1}\sum_{i=1}^n\tilde F_{i,d}^-(u) = t_r$ iff $u \in \bigcap_{i=1}^n(\tilde F_{i,d}(t_{r-1}), 1] = (\max_{1\le i\le n}\tilde F_{i,d}(t_{r-1}), 1]$. So, in case $t_r < 1$, we have $\hat T_{i,d}^*(1) = \hat F_d^*(\tilde F_{i,d}(1)) = \hat F_d^*(1) = t_r < 1$. This is not a problem, since this estimator is not used in the registration procedure and the problem disappears asymptotically anyway, just as described above.

We conclude this section by noting that, since the estimates $\hat T_{i,d}$ of the warp maps do not involve any smoothing and are obtained from compositions of step functions, the resulting registered curves will not be very smooth. This will be particularly noticeable if the number of grid points is small. Note that even in that case, the estimated mean function will be smoother if the sample size is moderately large. If one is interested in obtaining a smooth registration of the sample curves, the following procedure may be adopted. First, we produce smooth versions of the $\hat T_{i,d}$ by some nonparametric smoothing procedure, e.g., polynomial splines of a fixed degree $m$, and call these new estimates $\hat T_{i,s}$, say. Then, we plug in these smoothed estimates of the warp functions and define the new registered observations as $\hat X_i^*(t) := \tilde X_i^\dagger(\hat T_{i,s}(t))$. It is well known that a spline smoothed estimate of a smooth function converges to that function in the $L^2[0,1]$ sense, provided the oscillations of the function go to zero as the number of knots grows to infinity (see Theorem 6.27 in Schumaker (2007)).
The latter holds for the $\hat T_{i,d}$'s since they lie in $L^2[0,1]$ (see equation (2.121) in Theorem 2.59 in Schumaker (2007)). Thus, this modified estimator will also provide consistent registration.

3.3. Measurement Error

It can often happen that the discretely observed functional data are additionally contaminated by measurement error. In this case, one has to suitably adapt the registration procedure. In the presence of measurement error, we observe $Y_{i,d} = \tilde X_{i,d} + e_i$, where $\tilde X_{i,d}$ was defined in Section 3.2, and $e_i = (\epsilon_{i,1}, \epsilon_{i,2}, \ldots, \epsilon_{i,r})'$,


with the $\{\epsilon_{i,j} : j = 1, 2, \ldots, r,\ i = 1, 2, \ldots, n\}$ being a collection of i.i.d. error variables with zero mean and variance $\sigma^2$, independent of the processes and warp maps. The latent process $X(\cdot)$ is assumed to be rank one as earlier. We will modify the registration procedure as follows. First, construct a nonparametric function estimator of $\tilde X_i'$, the derivative of the warped process $\tilde X_i$, using the observation $Y_{i,d}$ for each $i$, and call this estimator $\hat X_{i,w}^{(1)}(\cdot)$. Define analogues of the $\tilde F_i$'s as

$$\tilde F_{i,w}(t) = \int_0^t |\hat X_{i,w}^{(1)}(u)|\,du \Big/ \int_0^1 |\hat X_{i,w}^{(1)}(u)|\,du, \qquad t \in [0,1].$$
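The derivative estimate and $\tilde F_{i,w}$ can be sketched with a kernel-weighted local quadratic fit, in line with the smoother adopted below for $\hat X_{i,w}^{(1)}$. Assumptions made for this illustration only: a Gaussian kernel for $k_1$, a fixed bandwidth `h = 0.04`, noise level $\sigma = 0.02$, a grid of 501 points, and the same hypothetical $\phi$ and warp as in the earlier sketches; the integrals defining $\tilde F_{i,w}$ are approximated by the trapezoidal rule.

```python
import numpy as np

def local_quadratic_derivative(t_eval, t_obs, y_obs, h):
    """Derivative estimates from kernel-weighted local quadratic fits (Gaussian kernel, bandwidth h)."""
    out = np.empty(len(t_eval))
    for m, t0 in enumerate(t_eval):
        d = t_obs - t0
        sw = np.exp(-0.25 * (d / h) ** 2)                   # square roots of the weights exp(-(d/h)^2 / 2)
        D = np.column_stack([np.ones_like(d), d, d ** 2])  # local model b0 + b1 (t - t0) + b2 (t - t0)^2
        coef, *_ = np.linalg.lstsq(sw[:, None] * D, sw * y_obs, rcond=None)
        out[m] = coef[1]                                    # b1 estimates the derivative at t0
    return out

def lv_from_derivative(t, deriv):
    """F_{i,w}(t) = int_0^t |deriv| / int_0^1 |deriv|, using the trapezoidal rule on the points t."""
    a = np.abs(deriv)
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (a[1:] + a[:-1]) * np.diff(t))])
    return cum / cum[-1]

def x_tilde(s, c=0.2):
    """Hypothetical warped curve phi(T^{-1}(s)), phi(t) = sqrt(2) sin(2 pi t), T(t) = t + c t (1 - t)."""
    t_inv = 2 * s / ((1 + c) + np.sqrt((1 + c) ** 2 - 4 * c * s))
    return np.sqrt(2) * np.sin(2 * np.pi * t_inv)

rng = np.random.default_rng(3)
t_obs = np.linspace(0.0, 1.0, 501)
y = x_tilde(t_obs) + 0.02 * rng.standard_normal(t_obs.size)   # Y_{i,d} = X_tilde_{i,d} + e_i
F_w = lv_from_derivative(t_obs, local_quadratic_derivative(t_obs, t_obs, y, h=0.04))

# Noise-free reference: local variation distribution of X_tilde computed on a much finer grid.
fine = np.linspace(0.0, 1.0, 20_001)
inc = np.concatenate([[0.0], np.cumsum(np.abs(np.diff(x_tilde(fine))))])
F_ref = np.interp(t_obs, fine, inc / inc[-1])
```

Differencing the noisy $Y_{i,d}$ directly, as in the noiseless discrete case, would let the noise dominate the sum of absolute increments as the grid becomes dense; smoothing before taking derivatives avoids this.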

Note that, unlike the discrete observation case described in the previous section, we now have fully functional versions of $\tilde X_i'$ for each $i$, which allows us to mimic the algorithm in the fully observed scenario in Section 3.1.

Step 1**: Estimate $F_\phi$ by $\hat F_{\phi,e} = \big(n^{-1}\sum_{i=1}^n\tilde F_{i,w}^{-1}\big)^{-1}$.
Step 2**: Predict the warp map $T_i$ by $\hat T_{i,e} = \tilde F_{i,w}^{-1}\circ\hat F_{\phi,e}$, and the registration map $T_i^{-1}$ by $\hat T_{i,e}^{-1}$.

Step 3**: Construct non-parametric function estimators of the $\tilde X_i$'s using the $Y_{i,d}$'s, and call them $\hat X_{i,w}(\cdot)$. Define $\hat X^*_{i,e}(t) = \hat X_{i,w}(\hat T_{i,e}(t))$, $i = 1, 2, \dots, n$, to be the registered functional observations.
Step 4**: Write $\bar X^*_e = n^{-1}\sum_{i=1}^n \hat X^*_{i,e}$ for the mean of the registered observations and let $\widehat{\mathscr K}^*_e$ denote their empirical covariance operator. Take its leading eigenfunction, denoted by $\hat\phi^*_e$, as the estimator of $\phi$ (assuming the same sign convention as earlier).
Step 5**: Finally, estimate $\xi_i$ by $\hat\xi^*_{i,e} = \langle \hat X^*_{i,e}, \hat\phi^*_e \rangle$ for each $i \ge 1$.

There are two smoothing steps involved in the above algorithm. Given the large literature on nonparametric smoothing techniques, one can choose any smoother; however, the asymptotic results will depend on the efficiency of the chosen smoothing techniques. From now on in this paper, we use a local quadratic regression approach with kernel $k_1(\cdot)$ and bandwidth $h_1$ for finding $\hat X^{(1)}_{i,w}$, and a local linear estimator with kernel $k_2(\cdot)$ and bandwidth $h_2$ for computing $\hat X_{i,w}$. These choices are motivated by the advantages of local polynomial estimators in dealing with boundary effects (see, e.g., Fan and Gijbels (1996) and Wand and Jones (1995) for further details on various smoothing techniques). More details on the choices of smoothing parameters are given in Remark 4 after Theorem 5.

4. Asymptotic Theory

We next study the asymptotic properties of the estimators obtained above. First, in Section 4.1, we develop asymptotic theory assuming that Model 2 has been correctly specified (in other words, that we are in an identifiable regime). We develop separate results for each of the three observation regimes considered (full observation, discrete observation, discrete observation with measurement error). Then, in Section 4.2, we consider the effect of model misspecification on the asymptotic performance of our estimators.
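Before turning to the theory, the following minimal NumPy sketch may help fix ideas. It implements a noise-free analogue of Steps 1** to 5** (that is, the fully observed procedure of Section 3.1 whose asymptotics are studied next) for curves given on a common uniform grid. Finite differences (`numpy.gradient`) stand in for the local quadratic smoother, inverses and compositions are computed by linear interpolation, and the leading eigenfunction is taken from an SVD of the centred data matrix. These numerical choices, the function names and the sign convention are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def local_variation_cdf(t, x):
    """F(t) = int_0^t |x'(u)| du / int_0^1 |x'(u)| du for a curve x sampled on the grid t."""
    a = np.abs(np.gradient(x, t))                      # finite-difference |x'|
    inc = 0.5 * (a[1:] + a[:-1]) * np.diff(t)          # trapezoidal increments
    cum = np.concatenate(([0.0], np.cumsum(inc)))
    return cum / cum[-1]


def register(t, X_warped):
    """Quantile-averaging registration of curves on a common grid t.

    Returns warp estimates T_hat and registered curves, both of shape (n, m).
    Assumes every local variation CDF is strictly increasing on the grid.
    """
    F = np.array([local_variation_cdf(t, x) for x in X_warped])
    F_inv = np.array([np.interp(t, Fi, t) for Fi in F])       # F_i^{-1} on probability grid t
    F_phi = np.interp(t, F_inv.mean(axis=0), t)               # invert the averaged quantile function
    T_hat = np.array([np.interp(F_phi, Fi, t) for Fi in F])   # T_hat_i = F_i^{-1} o F_phi_hat
    X_reg = np.array([np.interp(Th, t, x) for Th, x in zip(T_hat, X_warped)])  # X_i o T_hat_i
    return T_hat, X_reg


def leading_fpc(t, X):
    """Mean, leading eigenfunction (unit L2 norm) and scores <X_i, phi> on a uniform grid."""
    dt = t[1] - t[0]
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    phi = vt[0] / np.sqrt(dt)             # Riemann normalisation: sum(phi**2) * dt == 1
    if phi[np.argmax(np.abs(phi))] < 0:   # illustrative sign convention (an assumption)
        phi = -phi
    scores = X @ phi * dt                 # Riemann approximation of <X_i, phi>
    return mean, phi, scores
```

Under the rank one model with strictly increasing warps, the curves returned by `register` should approximate $\xi_i\,\phi\circ\bar T^{-1}$ with $\bar T = n^{-1}\sum_i T_i$ (the Fisher-consistency identity in point 2 of Remark 2), which makes a convenient sanity check for any implementation of this kind.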
In what follows, the space $C^1[0,1]$ is equipped with the norm $|||f|||_1 = \|f\|_\infty + \|f'\|_\infty$, where $\|\cdot\|_\infty$ is the usual sup-norm. The 2-Wasserstein distance between distributions $G_1$ and $G_2$ will be denoted by $d_W(G_1, G_2)$.
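On the real line, the 2-Wasserstein distance has the closed form $d_W^2(G_1, G_2) = \int_0^1 (G_1^{-1}(p) - G_2^{-1}(p))^2\,dp$, which is what makes averaging quantile functions natural in this setting. A minimal NumPy sketch for distributions on $[0,1]$ given by CDF values on a grid follows; the function name is hypothetical, and the CDFs are assumed strictly increasing so that linear interpolation yields their quantile functions.

```python
import numpy as np


def wasserstein2_sq(t, G1, G2, num_p=20001):
    """Squared 2-Wasserstein distance between two distributions on [0, 1].

    G1, G2 are CDF values on the increasing grid t (assumed strictly increasing).
    Uses d_W^2(G1, G2) = int_0^1 (G1^{-1}(p) - G2^{-1}(p))^2 dp.
    """
    p = np.linspace(0.0, 1.0, num_p)
    q1 = np.interp(p, G1, t)  # quantile function of G1 by linear interpolation
    q2 = np.interp(p, G2, t)
    f = (q1 - q2) ** 2
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(p)))  # trapezoidal rule
```

For example, the uniform distribution ($G(t) = t$) and $G(t) = \sqrt t$ (quantile function $p^2$) are at squared distance $\int_0^1 (p - p^2)^2\,dp = 1/30$.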

4.1. Consistency and Weak Convergence

Our first two results concern the fully observed case, as described in Section 3.1. Write $\mu = E(X_1) = E(\xi_1)\phi$ and $\mathscr K = \mathrm{COV}(X_1) = E(X_1 \otimes X_1) - \mu \otimes \mu$, where $(f \otimes g)h = \langle g, h \rangle_2 f$ for any triple $f, g, h \in L^2[0,1]$. Let $|||\cdot|||$ denote the trace norm for operators on $L^2[0,1]$. The covariance kernel of $X$ is denoted by $K(\cdot,\cdot)$, and the empirical covariance kernel of the $\hat X_i$'s is denoted by $\widehat K(\cdot,\cdot)$; the corresponding empirical covariance operator is denoted by $\widehat{\mathscr K}$.

Theorem 2 (Strong Consistency – Fully Observed Case). Further to Assumptions 2, assume also that $\phi'$ is Hölder continuous with exponent $\alpha \in (0,1]$. Then, the estimators in Section 3.1 satisfy the following asymptotic results, where convergence is always with probability one:
(a) $d_W^2(\hat F_\phi, F_\phi) \to 0$ as $n \to \infty$.
(b) $\|\hat T_i^{-1} - T_i^{-1}\|_\infty \to 0$ and $\|\hat T_i - T_i\|_\infty \to 0$ as $n \to \infty$ for each $i \ge 1$.
(c) $\|\hat X_i - X_i\|_\infty \to 0$ as $n \to \infty$ for each $i \ge 1$.
(d) $d_W^2(\hat F_i, F_\phi) \to 0$ as $n \to \infty$ for each $i \ge 1$.
(e) $\|\bar X - \mu\|_\infty \to 0$ as $n \to \infty$, where $\bar X = n^{-1}\sum_{i=1}^n \hat X_i$.
(f) $|||\widehat{\mathscr K} - \mathscr K||| \to 0$ and $\|\widehat K - K\|_\infty = \sup_{s,t \in [0,1]} |\widehat K(s,t) - K(s,t)| \to 0$ as $n \to \infty$. Moreover, $\|\hat\phi - \phi\|_\infty \to 0$ and $|\hat\xi_i - \xi_i| \to 0$ as $n \to \infty$ for each $i \ge 1$.

Furthermore, if we additionally assume that $E(\|T_1'\|_\infty) < \infty$ and $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$ (call this “Condition 1”), then the following stronger results hold with probability one, in lieu of (b), (c), and (e):
(b') $|||\hat T_i^{-1} - T_i^{-1}|||_1 \to 0$ and $|||\hat T_i - T_i|||_1 \to 0$ as $n \to \infty$ for each $i \ge 1$.
(c') $|||\hat X_i - X_i|||_1 \to 0$ as $n \to \infty$ for each $i \ge 1$.
(e') $|||\bar X - \mu|||_1 \to 0$ as $n \to \infty$, where $\bar X = n^{-1}\sum_{i=1}^n \hat X_i$.

Some remarks are in order.

Remark 2. 1. Uniformity: It is observed from the proof of the uniform convergence of $\hat T_i^{-1}$ in part (b) of the above theorem that $\max_{1 \le i \le n} \|\hat T_i^{-1} - T_i^{-1}\|_\infty \to 0$ as $n \to \infty$ almost surely. Under Condition 1, the same conclusion holds with the finer norm $|||\cdot|||_1$. The convergence in part (d) also holds uniformly for all $i = 1, 2, \dots, n$.
2. Fisher Consistency: Writing $\bar T = n^{-1}\sum_{i=1}^n T_i$, it can be directly verified that $\hat F_\phi^{-1} = \bar T \circ F_\phi^{-1}$, so that $\hat F_\phi = F_\phi \circ \bar T^{-1}$. Also, $\hat T_i = T_i \circ \bar T^{-1}$, $\hat T_i^{-1} = \bar T \circ T_i^{-1}$, and $\hat X_i = \xi_i\, \phi \circ \bar T^{-1}$ for each $i$. Further, $\widehat{\mathscr K} = n^{-1}\sum_{i=1}^n (\hat X_i - \bar X) \otimes (\hat X_i - \bar X) = \{n^{-1}\sum_{i=1}^n \xi_i^2 - \bar\xi^2\}(\phi \circ \bar T^{-1}) \otimes (\phi \circ \bar T^{-1})$, where $\bar\xi = n^{-1}\sum_{i=1}^n \xi_i$. Thus, $\hat\phi = (\phi \circ \bar T^{-1})/\|\phi \circ \bar T^{-1}\|_2$ and $\hat\xi_i = \langle \hat X_i, \hat\phi \rangle = \xi_i \|\phi \circ \bar T^{-1}\|_2$. Since all of the above estimators are measurable functions of the sample averages of the $T_i$'s, the $\xi_i$'s and the $\xi_i^2$'s, it follows that they are all Fisher consistent for their population counterparts.
3. An Example: The condition $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$ can be relaxed to $\inf_{t \in [0,1]} T'(t) \ge \delta_i$ almost surely for i.i.d. positive random variables $\delta_i$, provided we assume that $E(\delta_1^{-1}) < \infty$. An example of random warp functions that satisfy $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ can be found in Section 8 of Panaretos and Zemel (2016). Define $\zeta_0(t) = t$ and, for $k \ne 0$, $\zeta_k(t) = t - \sin(\pi k t)/(|k|\pi\beta)$ for some $\beta > 0$. If $K$ is an integer-valued, symmetric random variable, then $E(\zeta_K) = \mathrm{Id}$. For a fixed $J \ge 2$, let $\{K_j\}_{j=1}^J$ be i.i.d. integer-valued, symmetric random variables, and $\{U_j\}_{j=1}^{J-1}$ be i.i.d. $\mathrm{Unif}[0,1]$ random variables independent of the $K_j$'s, with order statistics $U_{(1)} \le \dots \le U_{(J-1)}$. Define $T(t) = U_{(1)}\zeta_{K_1}(t) + \sum_{j=2}^{J-1}(U_{(j)} - U_{(j-1)})\zeta_{K_j}(t) + (1 - U_{(J-1)})\zeta_{K_J}(t)$. Then, provided $\beta > 1$, $T$ is a strictly
increasing homeomorphism on $[0,1]$, $T \in C^1[0,1]$ surely, and $E(T) = \mathrm{Id}$. Further, it can be easily shown that $\inf_{t \in [0,1]} T'(t) \ge 1 - \beta^{-1}$. Thus, the condition $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ holds if we choose $\beta = (1 - \delta)^{-1}$.

Further to strong consistency, we also derive weak convergence of the estimators.

Theorem 3 (Weak Convergence – Fully Observed Case). Further to Assumptions 2, assume also that $\phi'$ is Hölder continuous with exponent $\alpha \in (0,1]$, and that $E(\|T_1'\|_\infty^2) < \infty$. Then, the estimators in Section 3.1 satisfy the following asymptotic results:
(a) $n\, d_W^2(\hat F_\phi, F_\phi)$ converges weakly as $n \to \infty$.
(b) $\sqrt n(\hat T_i^{-1} - T_i^{-1})$ and $\sqrt n(\hat T_i - T_i)$ converge weakly in the $C[0,1]$ topology as $n \to \infty$ for each $i \ge 1$.
(c) $\sqrt n(\hat X_i - X_i)$ converges weakly in the $C[0,1]$ topology as $n \to \infty$ for each $i \ge 1$.
(d) $n\, d_W^2(\hat F_i, F_\phi)$ converges weakly as $n \to \infty$ for each $i \ge 1$.
(e) $\sqrt n(\bar X - \mu)$ converges weakly to a zero mean Gaussian distribution in the $C[0,1]$ topology as $n \to \infty$.
(f) $\sqrt n(\widehat{\mathscr K} - \mathscr K)$ converges weakly in the topology of Hilbert-Schmidt operators, and $\sqrt n(\widehat K - K)$ converges weakly in the $C([0,1]^2)$ topology as $n \to \infty$. In both cases, the limits are zero mean Gaussian distributions. Moreover, $\sqrt n(\hat\phi - \phi)$ converges weakly to a zero mean Gaussian distribution in the $C[0,1]$ topology, and $\sqrt n(\hat\xi_i - \xi_i)$ converges weakly as $n \to \infty$ for each $i \ge 1$.

Since the $C([0,1]^k)$ topology is stronger than the $L^2([0,1]^k)$ topology for any finite $k = 1, 2, \dots$, the weak convergence results in the above theorem that hold in the $C([0,1]^k)$ topology also hold in the $L^2([0,1]^k)$ topology, by virtue of the continuous mapping theorem.

We shall now study some of the asymptotic properties of the estimators in the discrete observation setup (without measurement error).

Theorem 4 (Limit Theory – Discretely Observed Case Without Measurement Error). Further to the conditions of Theorem 3, assume that $\phi \in C^2[0,1]$, $\int_0^1 |\phi'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$, and $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$. Define $\alpha = \epsilon/(1+\epsilon)$. The kernel $k(\cdot)$ is assumed to be supported on $[-1,1]$. If $h = h(n) = o(n^{-1/2})$ and $r = r(n)$ satisfies $r \gg n^{1/(2\alpha)}$ as $n \to \infty$, then the estimators introduced in Section 3.2 satisfy:
(a) $d_W^2(\hat F^*_d, F_\phi) \to 0$ as $n \to \infty$ almost surely, and $d_W^2(\hat F^*_d, F_\phi) = O_P(n^{-1})$ as $n \to \infty$.
(b) $\|\hat T^*_{i,d} - T_i^{-1}\|_\infty \to 0$ and $\|\hat T_{i,d} - T_i\|_\infty \to 0$ as $n \to \infty$ almost surely. Further, $\sqrt n(\hat T^*_{i,d} - T_i^{-1})$ and $\sqrt n(\hat T_{i,d} - T_i)$ converge weakly in the $L^2[0,1]$ topology as $n \to \infty$ for each $i \ge 1$.
(c) $\|\hat X^*_i - X_i\|_\infty \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\hat X^*_i - X_i)$ converges weakly in the $L^2[0,1]$ topology as $n \to \infty$ for each $i \ge 1$.
(d) $d_W^2(\hat F^*_i, F_\phi) \to 0$ as $n \to \infty$ almost surely, and $d_W^2(\hat F^*_i, F_\phi) = O_P(n^{-1})$ as $n \to \infty$ for each $i \ge 1$.
(e) $\|\bar X^* - \mu\|_\infty \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\bar X^* - \mu)$ converges weakly in the $L^2[0,1]$ topology as $n \to \infty$.
(f) $|||\widehat{\mathscr K}^* - \mathscr K||| \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\widehat{\mathscr K}^* - \mathscr K)$ converges weakly in the topology of Hilbert-Schmidt operators. Further, $\|\widehat K^* - K\|_\infty \to 0$ as $n \to \infty$, and $\sqrt n(\widehat K^* - K)$ converges weakly in the $L^2([0,1]^2)$ topology as $n \to \infty$. Moreover, $\|\hat\phi^* - \phi\|_2 \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\hat\phi^* - \phi)$ converges weakly in the $L^2[0,1]$ topology. Also, $|\hat\xi^*_i - \xi_i| \to 0$ as $n \to \infty$ almost surely, and $\sqrt n(\hat\xi^*_i - \xi_i)$ converges weakly as $n \to \infty$ for each $i \ge 1$.
In all the weak convergence results stated above, the limits are identical to the corresponding limits obtained in the fully observed scenario in Theorem 3.
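The integrability condition $\int_0^1 |\phi'(u)|^{-\epsilon}\,du < \infty$ in Theorem 4 only matters near zeros of $\phi'$. The following worked bound writes out the argument sketched in point 4 of Remark 3 for a zero $t_0$ of $\phi'$ with $|\phi''| \ge \beta > 0$ on $A_\delta = (t_0 - \delta, t_0 + \delta)$, using the mean value theorem for $\phi'$:

$$
\begin{aligned}
\phi'(t) &= \phi'(t_0) + \phi''(s_t)(t - t_0) = \phi''(s_t)(t - t_0), \qquad s_t \text{ between } t_0 \text{ and } t,\\
\int_{A_\delta} |\phi'(t)|^{-\epsilon}\,dt &\le \beta^{-\epsilon}\int_{t_0-\delta}^{t_0+\delta} |t - t_0|^{-\epsilon}\,dt = \frac{2\,\beta^{-\epsilon}\,\delta^{1-\epsilon}}{1-\epsilon} < \infty \qquad (0 < \epsilon < 1).
\end{aligned}
$$

Consequently, if every zero of $\phi'$ in $[0,1]$ is simple (such zeros are isolated, hence finitely many), any $\epsilon \in (0,1)$ is admissible, so $\alpha = \epsilon/(1+\epsilon)$ can be taken anywhere in $(0, 1/2)$, and the grid requirement $r \gg n^{1/(2\alpha)}$ of Theorem 4 reads $r \gg n^{(1+\epsilon)/(2\epsilon)}$, an exponent that exceeds 1 and approaches 1 as $\epsilon \uparrow 1$.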

Remark 3. 1. The asymptotic results remain valid in the case where the grid over which the $\tilde X_i$'s are observed, say, $0 \le t_{i,1} < t_{i,2} < \dots < t_{i,r_i} \le 1$, differs with $i$, although the proof becomes notationally quite cumbersome. In this case, the requirement on the grid is as follows: $\max_{1 \le j \le r_i - 1}(t_{i,j+1} - t_{i,j}) = O(r_i^{-1})$ as $r_i \to \infty$ for each $i$, and $\tilde r_n := \min_{1 \le i \le n} r_i$ satisfies $\tilde r_n \gg n^{1/(2\alpha)}$ as $n \to \infty$.
2. The choice of $h$ in Theorem 4 is an under-smoothing choice. It is made on account of the absence of measurement errors in the observations, which enables us to under-smooth the data without damaging $\sqrt n$-consistency. This is unlike classical non-parametric regression, where errors are present. Also, the boundary points inflate the bias of the Nadaraya-Watson estimator to an order of $h$ (the same order as that obtained in Theorem 4 for all points). However, these issues are of no consequence in this scenario. It is also natural to under-smooth in this situation, since appropriate under-smoothing better retains the features of the curves and allows estimation at a parametric rate even under non-parametric smoothing. If, instead of the Nadaraya-Watson estimator, one uses a local linear estimator with bandwidth $h$, then the bias is of order $h^2$ (even at the boundaries). In this case, $h$ has to be $o(n^{-1/4})$ to achieve parametric rates of convergence, which is again an under-smoothing choice. Thus, the choice of smoothing method does not play a crucial role in this setup.
3. Unlike Theorem 3, the weak convergence results are all in the $L^2$ topology. This is because, unlike the fully observed case, the estimators involved are not continuous functions on $[0,1]$. We could not consider the weaker $D[0,1]$ topology since not all estimators will be càdlàg functions. However, we still retain the strong consistency results in parts (b), (c) and (e) in the sup-norm, similar to Theorem 2. This is due to the fact that those estimators are uniformly bounded almost surely, and thus have finite sup-norm. Further, in all cases, there is no issue with the measurability of the supremum.
4. The condition $\phi \in C^2[0,1]$ can be relaxed to requiring that $\phi'$ is Lipschitz continuous. Moreover, the requirement $\int_0^1 |\phi'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$ is not restrictive. Of course, it holds if $\phi'$ is bounded away from zero on $[0,1]$, in which case one can choose $\alpha = 1$. Consider the case when $\phi \in C^2[0,1]$ and let $t_0 \in (0,1)$ be such that $\phi'(t_0) = 0$. If $\phi''(t_0) > 0$, then we can choose an interval $A_\delta = (t_0 - \delta, t_0 + \delta) \subset (0,1)$ such that $\inf_{u \in A_\delta} |\phi''(u)| \ge \beta > 0$. Then, a first order Taylor expansion yields $\int_{A_\delta} |\phi'(t)|^{-\epsilon}\,dt \le \beta^{-\epsilon}\int_{A_\delta} |t - t_0|^{-\epsilon}\,dt < \infty$ for any $\epsilon < 1$. Here, we have used the fact that $\int_0^\delta t^{-\epsilon}\,dt < \infty$ for any $\delta > 0$ iff $\epsilon < 1$. Thus, if none of the zeros of $\phi'$ and $\phi''$ coincide, then the condition $\int_0^1 |\phi'(u)|^{-\epsilon}\,du < \infty$ holds for any $\epsilon < 1$. In general, if $\phi \in C^m[0,1]$ for some $m \ge 2$, and $m'$ is the least integer between 2 and $m$ such that none of the zeros of $\phi'$ and $\phi^{(m')}$ coincide, then $\int_0^1 |\phi'(u)|^{-\epsilon}\,du < \infty$ holds for any $\epsilon < 1/(m' - 1)$.

We finally study the asymptotic properties of the estimators in the modified registration procedure employed when one has contamination by measurement error (described in Section 3.3).

Theorem 5 (Limit Theory – Measurement Error Case). In addition to the assumptions of Theorem 3, assume that $\phi \in C^4[0,1]$ and $\int_0^1 |\phi'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$. Define $\alpha = \epsilon/(1+\epsilon)$. Assume that $T \in C^4[0,1]$ a.s. and $\inf_{t \in [0,1]} T'(t) \ge \delta > 0$ almost surely for a deterministic constant $\delta$. The kernels $k_1(\cdot)$ and $k_2(\cdot)$ are assumed to be supported on $[-1,1]$, symmetric and continuously differentiable. The errors $\{\epsilon_{ij}\}$ are assumed to be a.s. bounded. Also assume that $E\{|\xi_1|^{-2\alpha/(2-\alpha)}\} < \infty$ as well as $E(\|T_1^{(l)}\|_\infty^2) < \infty$ for $l = 2, 3, 4$. The bandwidths satisfy $h_1, h_2 \to 0$ and $rh_1^3, rh_2 \to \infty$. Then, the estimators in Section 3.3
satisfy the following properties:
(a) $d_W^2(\hat F_{\phi,e}, F_\phi) = O_P(h_1^{4\alpha} + (rh_1^3)^{-\alpha} + n^{-1})$ as $n \to \infty$.
(b) Both $\|\hat T^{-1}_{i,e} - T_i^{-1}\|_\infty$ and $\|\hat T_{i,e} - T_i\|_\infty$ are $O_P(h_1^{2\alpha} + (rh_1^3)^{-\alpha/2} + n^{-1/2})$ as $n \to \infty$.
(c) $\|\hat X^*_{i,e} - X_i\|_\infty = O_P(h_1^{2\alpha} + (rh_1^3)^{-\alpha/2} + h_2^2 + (rh_2)^{-1/2} + n^{-1/2})$ as $n \to \infty$.
(d) $\|\bar X^*_e - \mu\|_\infty = O_P(h_1^{2\alpha} + (rh_1^3)^{-\alpha/2} + h_2^2 + (rh_2)^{-1/2} + n^{-1/2})$ as $n \to \infty$.
(e) $|||\widehat{\mathscr K}^*_e - \mathscr K||| = O_P(h_1^{2\alpha} + (rh_1^3)^{-\alpha/2} + h_2^2 + (rh_2)^{-1/2} + n^{-1/2})$ as $n \to \infty$. Consequently, $\|\hat\phi^*_e - \phi\|_2$ and $|\hat\xi^*_{i,e} - \xi_i|$ have the same rates of convergence for each fixed $i$.
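Theorem 5 presumes local quadratic smoothing (kernel $k_1$, bandwidth $h_1$) to estimate $\tilde X_i'$ and local linear smoothing (kernel $k_2$, bandwidth $h_2$) to estimate $\tilde X_i$, while Remark 3 contrasts these with the Nadaraya-Watson estimator. The minimal weighted-least-squares sketch below covers all three cases (degree 0, 1, 2) with an Epanechnikov kernel on $[-1,1]$, the kernel used in Section 5.1. It is an illustrative stand-in, not the locpol routines used in Section 5.2; the function names are hypothetical, and every smoothing window is assumed to contain at least degree + 1 design points.

```python
import math

import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def local_poly(t_obs, y_obs, t_eval, h, degree=1, deriv=0):
    """Local polynomial estimate of f^(deriv) at each point of t_eval.

    degree=0: Nadaraya-Watson; degree=1: local linear; degree=2: local quadratic.
    """
    if deriv > degree:
        raise ValueError("deriv must not exceed degree")
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    out = np.empty(len(t_eval))
    for j, t0 in enumerate(t_eval):
        d = t_obs - t0
        w = epanechnikov(d / h)
        keep = w > 0
        # columns 1, (t - t0), (t - t0)^2, ...; coefficient k estimates f^(k)(t0) / k!
        D = np.vander(d[keep], degree + 1, increasing=True)
        sw = np.sqrt(w[keep])
        coef, *_ = np.linalg.lstsq(D * sw[:, None], y_obs[keep] * sw, rcond=None)
        out[j] = math.factorial(deriv) * coef[deriv]
    return out
```

On noiseless data $f(t) = t$, the local linear fit at the boundary point $t = 0$ is exact, whereas the Nadaraya-Watson fit is biased upwards by an amount of order $h$, which is the boundary effect discussed in Remark 3.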

Remark 4. 1. Analogous rates of convergence can also be obtained if one uses non-parametric smoothing techniques other than the ones in the theorem. One may, e.g., use a Nadaraya-Watson estimator in Step 3** with boundary kernels to alleviate the boundary bias problem that is well known for this estimator (see, e.g., Wand and Jones (1995)). Also, to estimate $\tilde X_i'$, one may use local polynomials of higher even order. However, these are computationally more intensive and also need additional smoothness assumptions on the latent process and the warp maps.
2. The rates of convergence in the above theorem are slower than the parametric rates achieved in the earlier settings, due to the non-parametric smoothing steps involved, especially the estimation of derivatives, which is known to have quite slow rates of convergence. Further, the contributions of the two smoothing steps to the convergence rates are clear. It is well known in local polynomial regression that the optimal rate for $h_1$ is $r^{-1/7}$ and that for $h_2$ is $r^{-1/5}$. With these rates, we have $d_W^2(\hat F_{\phi,e}, F_\phi) = O_P(r^{-4\alpha/7} + n^{-1})$, and the remaining quantities are $O_P(r^{-2\alpha/7} + n^{-1/2})$.
3. Let $\beta = 2\alpha/(2-\alpha)$ and observe that $\beta < 2$ since $\alpha < 1$. The condition $E\{|\xi_1|^{-\beta}\} < \infty$ in Theorem 5 is obviously satisfied if $|\xi_1|$ is bounded away from zero. Suppose that $\xi_1$ has a continuous density $f_\xi$, say, either on $[0, \infty)$ or on $(-\infty, \infty)$, in which case it is assumed to be symmetric about zero. If $\sup_{y \in [0,a)} f_\xi(y) < \infty$ for some $a > 0$, then it is easy to show that $E\{|\xi_1|^{-\beta}\} < \infty$ if $\beta < 1 \Leftrightarrow \epsilon < 2$, which is quite general in view of point 4 in Remark 3. If $\beta \in [1, 2)$, then this expectation is finite if $\sup_{y \in [0,a)} y^{-1} f_\xi(y) < \infty$.

4.2. Effect of Model Misspecification

In practice, it may happen that the underlying distribution is not exactly of rank one, as stipulated by Model 2 and required for identifiability.
In this section, we carry out a theoretical analysis of the stability of our procedure to such departures. Since identifiability is lost, consistency is clearly no longer achievable. However, we can quantify how much the estimators deviate from their population counterparts, at least asymptotically. Since the model is unidentifiable, strictly speaking there is no unique setting corresponding to the law of the data. For this reason, as a convention, we will assume that a “true” underlying distribution is known and fixed. For simplicity of exposition, we focus on the rank two case. This will be seen to carry the essence of the underlying effects, as we discuss in the last point of Remark 5. To obtain more transparent results, we focus on the case where the underlying functions are completely observable as continuous objects. Let $X_i = \xi_{i1}\phi_1 + \xi_{i2}\phi_2$ for $i = 1, 2, \dots, n$, where $\xi_{i1}$ and $\xi_{i2}$ are uncorrelated. Let $\mu = E(X_1) = E(\xi_{11})\phi_1 + E(\xi_{12})\phi_2$. Denote $\gamma_l^2 = \mathrm{Var}(\xi_{1l})$ and $Y_{il} = [\xi_{il} - E(\xi_{il})]/\gamma_l$ for $l = 1, 2$. Then,

$$X_i = \mu + \gamma_1 Y_{i1}\phi_1 + \gamma_2 Y_{i2}\phi_2 \qquad (6)$$

gives the Karhunen-Loève expansion of $X_i$. The (random) local variation distribution induced by $X_i$ is $F_i(t) = \int_0^t |X_i'(u)|\,du \big/ \int_0^1 |X_i'(u)|\,du$ for $t \in [0,1]$. Note that, contrary to the rank one case, where $\mu$ did not play a role in $F_i$ (due to cancellation of the term $\xi_i$ from the numerator and the denominator), here it cannot be neglected. We will later see that it plays a role in the performance of the estimators. Defining $\eta = \gamma_2/\gamma_1$, which is the square root of the inverse of the condition number, it follows that

$$F_i(t) = \frac{\int_0^t |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}.$$

The local variation distribution induced by the observed warped data $\tilde X_i = X_i \circ T_i^{-1}$ is given by

$$\tilde F_i(t) = \frac{\int_0^t |\tilde X_i'(u)|\,du}{\int_0^1 |\tilde X_i'(u)|\,du} = \frac{\int_0^{T_i^{-1}(t)} |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} = F_i(T_i^{-1}(t)).$$

The idea is that if, under suitable conditions, the $F_i$'s manifest small variability, then the registration procedure will work quite well. We will illustrate two different situations where this is the case. The estimators of the population parameters will be the same as those considered earlier. The next theorem gives bounds on the estimation errors.

Theorem 6. In the setting of Model (6), define

$$Z_i = \begin{cases} 2\int_0^1 |X_i'(u) - \mu'(u)|\,du \Big/ \int_0^1 |X_i'(u)|\,du & \text{if } \mu' \ne 0,\\[4pt] 2\eta\int_0^1 |Y_{i2}\phi_2'(u)|\,du \Big/ \int_0^1 |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du & \text{if } \mu' = 0, \end{cases}$$

for $i = 1, 2, \dots, n$.

If $\mu' \ne 0$, assume that $\int_0^1 |\mu'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$, and if $\mu' = 0$, assume that $\int_0^1 |\phi_1'(u)|^{-\epsilon}\,du < \infty$ for some $\epsilon > 0$. Set $\alpha = \epsilon/(1+\epsilon)$. Suppose that assumption (M3) holds and that, for each $i = 1, 2$, $\phi_i$ lies in $C^1[0,1]$ with an $\alpha_i$-Hölder continuous derivative for some $\alpha_i \in [0,1]$. Also assume that $E(Z_1^\alpha) < \infty$. Then:
(a) $\limsup_{n\to\infty} \|\hat T_i^{-1} - T_i^{-1}\|_\infty \le \mathrm{const.}\{E(Z_1^\alpha) + Z_i\}$, and $\limsup_{n\to\infty} \|\hat T_i - T_i\|_\infty \le \mathrm{const.}\,\|T_i'\|_\infty \{Z_i^\alpha + E^\alpha(Z_1^\alpha)\}$ almost surely, where the constant term is uniform in $i$.
(b) $\limsup_{n\to\infty} \|\hat X_i - X_i\|_\infty \le O_P(1)\{E(Z_1^\alpha) + Z_i\}$ almost surely.

Remark 5. 1. Theorem 6 reveals that if the $Z_i$ are small, the effect of misspecification is also small. Here are two such cases:
(a) When $\mu' \ne 0$, $Z_i = 2\int_0^1 |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du \big/ \int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du$. So, in this case, if $|\gamma_1^{-1}\mu'|$ has a large enough contribution compared to $|Y_{i1}\phi_1' + \eta Y_{i2}\phi_2'|$ for all $i$, then the $Z_i$'s are small.
(b) On the other hand, if $\mu' = 0$ and $\eta$ is small, i.e., the condition number of the process is large (which essentially implies that the process is “close” to a rank one process provided $E(\xi_{12}) = 0$), then the $Z_i$'s are small. This can be compared to the minimum eigenvalue registration principle of Ramsay and Silverman (2005), where one tries to find the warp function that minimises the second eigenvalue of the cross-product matrix between the target function and the registered function. Assume that $E(\xi_{i1}) = E(\xi_{i2}) = 0$ and, without loss of generality, that $\gamma_1 = 1$. If in reality the true unobserved curves are rank one, i.e., the $\xi_{i1}\phi_1$ component, and we observe warped versions of the rank two curves $X_i$, then (in the population case) correct registration is achieved by $T_i$ if the minimum eigenvalue, namely $\gamma_2^2 = \eta^2$, of the expected
cross-product matrix equals zero. Thus, in the empirical case, if $\eta$ is close to zero, we may expect $\hat T_i$ to be close to $T_i$ and consequently expect the registration procedure to perform well.
2. Bounds similar to those in (a) and (b) of Theorem 6 can also be obtained for the mean, the covariance, the $\gamma_l$'s and the $\phi_l$'s, as well as the principal components $Y_{il}$. We do not include them in the statement of the theorem because they need more complicated conditions involving the parameters.
3. General (possibly infinite) rank situation: Let $X_i = \mu + \sum_{j=1}^M \gamma_j Y_{ij}\phi_j$ for some $1 \le M \le \infty$, where the $\{Y_{ij} : j = 1, 2, \dots, M\}$ are uncorrelated with zero mean and unit variance. Without loss of generality, we assume that $\gamma_1 > \gamma_2 > \dots \ge 0$. The errors in estimation when $\mu' \ne 0$ remain the same as in Theorem 6. When $\mu' = 0$, we define $Z_i = 2\eta\int_0^1 |Y_{i2}\phi_2'(u) + \sum_{k\ge3}\delta_k Y_{ik}\phi_k'(u)|\,du \big/ \int_0^1 |Y_{i1}\phi_1'(u) + \eta[Y_{i2}\phi_2'(u) + \sum_{k\ge3}\delta_k Y_{ik}\phi_k'(u)]|\,du$ for $i = 1, 2, \dots, n$, where $\delta_k = \gamma_k/\gamma_2$ for $k \ge 3$. In this case, under the conditions of Theorem 6, the bounds in that theorem still hold. Note that $\delta_k \le 1$ for all $k \ge 3$. So, in the general case, the performance of the registration procedure studied in the paper depends only on how small $\eta$ is, and does not in general depend on the values of the $\delta_k$'s (or the $\gamma_j$'s for $j \ge 3$). In other words, only the behaviour of the second frequency component relative to the first one matters (which elucidates the role of $\delta$ in the standard model, i.e. Equation 1, whose role is precisely to tune this behaviour). Of course, the magnitude of the error in estimation for the same value of $\eta$ will now differ from the rank 2 case because of the presence of the additional terms. We have investigated these issues in a simulation study in Section 5.3 (see, in particular, Figure 7).

5. Numerical Experiments

We now carry out some simulation experiments to assess the finite-sample performance of our registration procedure.
First we treat the case of a well-specified identifiable model, then the case when the warped observations under an identifiable model are observed with measurement error, and finally the case of a misspecified unidentifiable model.

5.1. Well-specified Model without error

Let $X(t) = \xi\phi(t)$, $t \in [0,1]$, and consider two models:
Model 1: $\xi \sim N(1.5, 1)$, $\phi(t) = 70 - 20\exp(1.7(t - 0.49)^2) - 50\exp(-(t - 0.51)^2)$;
Model 2: $\xi \sim 1 + \mathrm{Beta}(2, 2)$, $\phi(t) = \{1 - (t - 0.25)^2\}\cos(3\pi t)$.
In either case, the sample size is $n = 50$ and the curves are observed at $r = 101$ equally spaced points in $[0,1]$. The warp maps are chosen according to point 3 of Remark 2 with the parameters $J = 2$, $K = V_1V_2$, where $V_1 \sim \mathrm{Poisson}(3)$, $P(V_2 = \pm 1) = 1/2$ with $V_2$ independent of $V_1$, and $\beta = 1.01$. The kernel for the Nadaraya-Watson estimator, as well as the one used to smooth the $\hat T_{i,d}$'s, is the Epanechnikov kernel on $[-1,1]$. For both models, the bandwidths used in the registration procedure were chosen to under-smooth the data so that the features (maxima, minima, etc.) are not smeared out. In order to provide smooth registered curves, we have smoothed the $\hat T_{i,d}$'s using cubic splines with 11 equispaced knots on $[0,1]$, prior to synchronising the data. We present both raw and smoothed registered data curves, means and eigenfunctions.
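The warp-map generator of point 3 of Remark 2 with the settings above ($J = 2$, $K = V_1V_2$ with $V_1 \sim \mathrm{Poisson}(3)$ and $P(V_2 = \pm1) = 1/2$, $\beta = 1.01$) can be sketched as follows. The function names are hypothetical and NumPy's random `Generator` is an implementation choice; the sketch only mirrors the construction stated in the text.

```python
import numpy as np


def zeta(k, t, beta):
    """Elementary warp: zeta_0(t) = t, zeta_k(t) = t - sin(pi k t) / (|k| pi beta) for k != 0."""
    if k == 0:
        return t.copy()
    return t - np.sin(np.pi * k * t) / (abs(k) * np.pi * beta)


def random_warp(t, rng, J=2, beta=1.01, poisson_mean=3.0):
    """One draw of T = U_(1) zeta_{K_1} + sum_{j=2}^{J-1} (U_(j) - U_(j-1)) zeta_{K_j}
    + (1 - U_(J-1)) zeta_{K_J}, with K_j = V1 * V2 (V1 ~ Poisson, V2 = +-1 w.p. 1/2)."""
    K = rng.poisson(poisson_mean, size=J) * rng.choice([-1, 1], size=J)
    U = np.concatenate(([0.0], np.sort(rng.random(J - 1)), [1.0]))
    weights = np.diff(U)  # U_(1), U_(2) - U_(1), ..., 1 - U_(J-1): nonnegative, sum to 1
    return sum(w * zeta(k, t, beta) for w, k in zip(weights, K))
```

Since $\zeta_k'(t) = 1 - \mathrm{sign}(k)\cos(\pi k t)/\beta \ge 1 - \beta^{-1}$, the choice $\beta = 1.01$ only guarantees $\inf_t T'(t) \ge 1 - 1/1.01 \approx 0.0099$: the simulated warps are strictly increasing but can be locally very flat or very steep.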


[Figure 1 panels: Original curves; Warped curves; Registered curves (unsmoothed); Registered curves (smoothed); Mean; Eigenfunction. Horizontal axes: t; vertical axes of the curve panels: X(t).]

Fig 1. Plots of the true, warped and registered data curves, means and leading eigenfunctions under Model 1. The first plot in the last row shows the true mean (dashed curve), the mean of the un-smoothed registered data (dotted curve), the mean of the smoothed registered data (solid curve), and the mean of the warped data (dot-dashed curve). The second plot in the last row is the analogous plot for the leading eigenfunction.


[Figure 2 panels: Original curves; Warped curves; Registered curves (unsmoothed); Registered curves (smoothed); Mean; Eigenfunction. Horizontal axes: t; vertical axes of the curve panels: X(t).]

Fig 2. Plots of the true, warped and registered data curves, means and leading eigenfunctions under Model 2. The first plot in the last row shows the true mean (dashed curve), the mean of the un-smoothed registered data (dotted curve), the mean of the smoothed registered data (solid curve), and the mean of the warped data (dot-dashed curve). The second plot in the last row is the analogous plot for the leading eigenfunction.

Figures 1 and 2 show the plots of the true, warped and registered data curves; the true, warped and registered means; and the true, warped and registered leading eigenfunctions under Model 1 and Model 2, respectively. They suggest that the procedure studied in this paper has been able to adequately register the discretely observed and warped sample curves. Moreover, the cross-sectional mean and the leading eigenfunction of the warped curves clearly differ from the true mean and leading eigenfunction in either amplitude or phase (under either model), while the registration procedure corrects the problem, and the resulting estimates (whether smoothed or raw) are very close to the true functions.

5.2. Well-specified Model with error

We now consider the situation when the warped observations under an identifiable rank one model have been observed with measurement errors. As shown by our theoretical study in Section 4.1, the rate of convergence is much slower than in the case without measurement error. For our simulations, we thus keep the same two models as in Section 5.1 but increase the sample size to $n = 250$. The measurement errors in both situations are i.i.d. $\mathrm{Unif}(-0.4, 0.4)$. The bandwidths for the smoothing steps involved in the registration procedure are chosen using built-in cross-validation methods in the locpol package in the R software. Figures 3 and 4 show the plots of the unobserved true rank one curves, the warped curves that are observed with error and the registered curves. They also contain the plots of the mean function and the leading eigenfunction of the true, warped and registered data under the two models. Even under measurement error contamination, the registration procedure is able to adequately register the curves. In particular, under Model 2, the means as well as the leading eigenfunctions of the true and the registered curves are quite close.
We also performed the registration procedure with a Nadaraya-Watson estimator (without boundary kernels) for obtaining an estimate of the $\tilde X_i$'s (see Step 3**). Interestingly, the performance was not very different from that obtained using a local linear estimator.

5.3. Misspecified Model

We next carry out experiments to probe the performance of the registration procedure in a rank 2 and a rank 3 setting; these correspond to an unidentifiable (and hence misspecified) model. The model considered in the rank 2 case is $X = \xi_1\phi_1 + \xi_2\phi_2$ with $\xi_1 \sim N(1.5, 1)$, $\xi_2 \sim N(-0.5, 0.15)$, $\phi_1(t) = \sqrt2\sin(\pi t)$ and $\phi_2(t) = \sqrt2\cos(2\pi t)$, $t \in [0,1]$. In the rank 3 case, we consider $X = \xi_1\phi_1 + \xi_2\phi_2 + \xi_3\phi_3$ with the same choices of $\xi_j$ and $\phi_j$ as above for $j = 1, 2$, along with $\xi_3 \sim N(0.5, (0.15)^2)$ and $\phi_3(t) = \sqrt2\cos(4\pi t)$. The warp maps are the same as those considered in the simulation study in Section 5.1. The plots of the true curves, the warped curves and the registered curves are provided in Figures 5 and 6 for the rank 2 and the rank 3 models, respectively. The registration procedure performs quite well and adequately aligns the peak (present in the true curves) under both models (see Figure 5). Further, the two smaller troughs near the end-points present in the rank 3 model are also reasonably aligned (see Figure 6). There are some amplitude differences between the first two estimated eigenfunctions and the corresponding true eigenfunctions under both models. Surprisingly, the third eigenfunction in the rank 3 model is very well estimated. Overall, even though the rank 1 assumption is violated, the registration procedure performs quite satisfactorily. In order to probe the breakdown point of the registration procedure in the misspecified setting, we


[Figure 3 panels: Original curves; Warped curves observed with error; Registered curves; Mean; Eigenfunction. Horizontal axes: t; vertical axes of the curve panels: X(t).]

Fig 3. Plots of the true, warped and registered data curves, means and leading eigenfunctions under Model 1 with error. The first plot in the last row shows the true mean (dashed curve), the mean of the registered data (solid curve), and the mean of the warped data (dot-dashed curve). The second plot in the last row is the analogous plot for the leading eigenfunction.

also considered classes of rank 2 and rank 3 models, recorded the relative $L^2$-error in estimation of the data curves, i.e., the median of $\|\hat X_i - X_i\|/\|X_i\|$, $i = 1, 2, \dots, n$, and used a threshold of 10% error as a criterion for good performance. The models are generated similarly to the earlier simulation. For the rank 2 case, let $X = \xi_{1,c}\phi_1 + \xi_{2,c,r}\phi_2$, where $\xi_{1,c} \sim N(3c, 1)$ and $\xi_{2,c,r} \sim N(-c, r)$, with $c \in [0.1, 2]$ and $r \in [0.01, 0.3]$. The choices of $c$ and $r$ ensure that we include both approximately rank 1 models ($c$ and $r$ close to zero) and proper rank 2 models (large values of $r$). Similarly, for the rank 3 case, let $X = \xi_{1,c}\phi_1 + \xi_{2,c,r}\phi_2 + \xi_{3,c,r}\phi_3$, where $\xi_{3,c,r} \sim N(c, r^2)$. Figure 7 shows a plot of the relative $L^2$-errors under these classes of models, for various combinations of the parameters $c$ and $r$. When $c$ is large, the performance of the registration procedure is good, which conforms with our theoretical arguments in Theorem 6. In fact, for this class of rank 2 models, the maximum $L^2$-error does not exceed 12.9%. On the other hand, when $c$ is small, the allowable range of $r$ values for good performance is much greater in the rank 2 setup than in the rank 3 setup (cf. point 3 of Remark 5). In fact, in the rank 3 setup, the error is more than 10% for all $r$ in the range considered when $c \le 0.2$. Further, the maximum $L^2$-error is now 29.8%.

6. Data Analysis

In this section, we illustrate the performance of our registration procedure on a data set of growth curves of Tribolium beetle larvae, collected and analysed by Irwin and Carter (2013). Each curve represents the mass measurement (in milligrams) as a function of the age of the larvae since hatching (in days).

A. Chakraborty and V. M. Panaretos/Functional Registration and Local Variations

[Figure 4 panels: "Original curves", "Warped curves observed with error" and "Registered curves" (X(t) against t on [0, 1]); "Mean" and "Eigenfunction" against t.]

Fig 4. Plots of the true, warped and registered data curves, means and leading eigenfunctions under Model 1 with error. The first plot in the last row shows the true mean (dashed curve), the mean of the registered data (solid curve), and the mean of the warped data (dot-dashed curve). The second plot in the last row is the analogous plot for the leading eigenfunction.

[Figure 5 panels: "Original curves", "Warped curves" and "Registered curves" (X(t) against t); "Mean", "1st Eigenfunction" and "2nd Eigenfunction", each comparing the original, estimated and warped versions.]

Fig 5. Plots of the true, warped and registered data curves along with the means and eigenfunctions of the true, warped and the registered data under the rank 2 model.

Their analysis of Tribolium growth suggests that these beetles' growth patterns differ from those of other animals with determinate growth (that is, growth that is confined to certain life stages). Usually, the longer the growth period, the larger the maximal mass attained (see Irwin and Carter (2014), and references therein). In Tribolium, however, it seems that beetles that tend to grow faster, and thus have a shorter growth period, also tend to attain a larger size (e.g. Figure 8, top left). See Irwin and Carter (2013) for more details and background. This observation suggests that the Tribolium data could be well-suited for a phase-amplitude analysis under a latent rank 1 model that has been warped: one expects that correcting for different "growth clocks" (phase variation) should yield curves with essentially a single mode of amplitude variation, due to final mass. Conversely, it suggests a potential latent model that produces rank 1 vertical variation related only to final mass, and horizontal variation due to growth timing (i.e. how this total final mass is accumulated in time). For our analysis, we only considered the part of the dataset with at least 10 discrete measurements per individual curve, which results in a sample size of 159. Also, not all larvae were recorded on the same days, so the number of observations differed across individuals. Since there are relatively few measurements (at most 12) per larva, we smoothed each observation vector as a pre-processing step. This was done using the built-in R function splinefun with method monoH.FC, which uses the monotone Hermite spline interpolation proposed by Fritsch and Carlson (1980) (since the curves are expected to be approximately increasing). As is typically the case with growth curves, one expects that, if unaccounted for, the lurking phase

variation would give the impression of several modes of amplitude variation. The aim of our analysis is thus to register the curves, estimate the warp maps, estimate the mean of the registered curves, and carry out an eigenanalysis of the registered data. Indeed, prior to any registration, the data present at least two substantial modes of amplitude variation, with the first three principal components explaining 78.4%, 12% and 3.85% of the total variation, respectively. However, after registration using our method, the empirical covariance operator is almost exactly of rank 1, with the leading principal component explaining 99.72% of the total variation. Interestingly, the mean of the registered data has the same shape as the leading eigenfunction and is in fact roughly equal to 776 times the leading eigenfunction. This can be seen as a model diagnostic corroborating the model: if the rank 1 model were correct, then after registration one would expect a single mode of amplitude variation and a mean in the span of the corresponding eigenfunction (see the discussion after Counterexample 1). Figure 8 shows the actual data, the monotone-spline-smoothed data and the registered data, as well as the estimated warp maps and their average, which is very close to the identity. It also shows the mean and the leading eigenfunction of the warped and the registered data. Although the means of the warped and the registered data are very close, there are substantial qualitative differences between the corresponding eigenfunctions. The eigenfunction of the registered data shows that the variation in growth pattern essentially starts at about 8 days after hatching. Between 10 and 16 days post hatching, there is a notable increase in the growth variation, which somewhat recedes after that age.
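The diagnostic just described (the share of total variation carried by the leading principal component, and whether the mean lies in the span of the leading eigenfunction) can be sketched on discretised curves. This is a minimal illustration rather than the authors' code: it assumes curves stored as rows of a NumPy array on a common grid, and uses a hypothetical template $\phi(t) = \sin(\pi t)$, hypothetical amplitudes $\xi_i$ and hypothetical warps $T_i(t) = t^{p_i}$. Exactly rank 1 curves $\xi_i\phi$ give a leading share of 1 and a mean collinear with the leading eigenvector, while their warped versions $\xi_i\phi\circ T_i^{-1}$ spread the variation over several components.

```python
import numpy as np


def rank_one_diagnostic(curves):
    """Return (share of variance of the leading PC, |cosine| between mean and leading PC).

    `curves` is an (n, m) array: n curves evaluated on a common grid of m points.
    """
    mean = curves.mean(axis=0)
    centred = curves - mean
    cov = centred.T @ centred / curves.shape[0]   # discretised covariance operator
    evals, evecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    share = evals[-1] / evals.sum()
    leading = evecs[:, -1]                         # unit-norm leading eigenvector
    cosine = abs(mean @ leading) / np.linalg.norm(mean)
    return share, cosine


t = np.linspace(0.0, 1.0, 201)
xi = np.array([3.0, 4.0, 5.0, 3.5, 4.5, 4.0])     # hypothetical amplitudes
p = np.array([0.5, 0.7, 1.0, 1.4, 2.0, 1.0])      # hypothetical warps T_i(t) = t ** p_i

registered = xi[:, None] * np.sin(np.pi * t)[None, :]                    # xi_i * phi
warped = xi[:, None] * np.sin(np.pi * t[None, :] ** (1.0 / p[:, None]))  # xi_i * phi(T_i^{-1})

share_reg, cos_reg = rank_one_diagnostic(registered)
share_warp, cos_warp = rank_one_diagnostic(warped)
print(f"registered: share={share_reg:.4f}, cosine={cos_reg:.4f}")
print(f"warped:     share={share_warp:.4f}, cosine={cos_warp:.4f}")
```

On real data the registered share is close to, but not exactly, 1 (99.72% for the Tribolium curves), so in practice the diagnostic is read as "nearly rank 1" rather than checked for equality.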
The periods identified by the eigenfunction of the registered data are in fact compatible with biologically interpretable phases of growth: the larvae enter an "instar" (a distinct growth period between exoskeleton moults) characterised by exponential growth at around days 7–8; then, around day 17, they enter the "wandering phase" and begin losing weight in preparation for pupation.

[Figure 6 panels: "Original curves", "Warped curves" and "Registered curves" (X(t) against t); "Mean" and the "1st", "2nd" and "3rd Eigenfunction", each comparing the original, estimated and warped versions.]

Fig 6. Plots of the true, warped and registered data curves along with the means and eigenfunctions of the true, warped and the registered data under the rank 3 model.

Appendix – Proofs of Formal Statements

Proof of Lemma 1. Since $X(t) = \xi\phi(t)$, $t \in [0,1]$, we have
$$F(t) = \frac{\int_0^t |X'(u)|\,du}{\int_0^1 |X'(u)|\,du} = \frac{\int_0^t |\phi'(u)|\,du}{\int_0^1 |\phi'(u)|\,du} = F_\phi(t)$$
by Definition 1. Next, $\tilde X(t) = \xi\phi(T^{-1}(t))$, so that $\tilde X'(t) = \xi\phi'(T^{-1}(t))/T'(T^{-1}(t))$. Thus, using the strict monotonicity of $T$, we have
$$\tilde F(t) = \frac{\int_0^t |\tilde X'(u)|\,du}{\int_0^1 |\tilde X'(u)|\,du} = \frac{\int_0^t |\phi'(T^{-1}(u))|/T'(T^{-1}(u))\,du}{\int_0^1 |\phi'(T^{-1}(u))|/T'(T^{-1}(u))\,du},$$
the factor $|\xi|$ cancelling in the ratio. A standard change-of-variable argument and the fact that $T$ is a bijection with $T(0) = 0$ and $T(1) = 1$ now yield $\tilde F(t) = \int_0^{T^{-1}(t)} |\phi'(u)|\,du \big/ \int_0^1 |\phi'(u)|\,du = F_\phi(T^{-1}(t))$. So, $\tilde F = F_\phi \circ T^{-1}$; equivalently, $T = \tilde F^{-1} \circ F_\phi \Leftrightarrow T \circ F_\phi^{-1} = \tilde F^{-1}$. Using the assumption that $E(T) = \mathrm{Id}$, we now have $E(\tilde F^{-1}) = F_\phi^{-1}$.
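Lemma 1 can be checked numerically. The sketch below is illustrative only and uses hypothetical choices: the template $\phi(t) = \sin(3\pi t/2)$ (not monotone, so the absolute values matter), the amplitude $\xi = -2.3$ and the warp $T(t) = (e^{at}-1)/(e^a-1)$ with $a = 1.5$. Integrals of $|X'|$ are replaced by cumulative sums of absolute increments on a fine grid and inverses by linear interpolation, so the identities $\tilde F = F_\phi\circ T^{-1}$ and $T = \tilde F^{-1}\circ F_\phi$ hold only up to discretisation error.

```python
import numpy as np

A = 1.5                                    # hypothetical warp parameter


def warp(t):
    """Hypothetical warp T(t) = (e^{At} - 1)/(e^A - 1): increasing, T(0)=0, T(1)=1."""
    return np.expm1(A * t) / np.expm1(A)


def warp_inv(u):
    """Closed-form inverse T^{-1}(u) = log(1 + u (e^A - 1)) / A."""
    return np.log1p(u * np.expm1(A)) / A


def phi(s):
    """Hypothetical (non-monotone) template with total variation 3 on [0, 1]."""
    return np.sin(1.5 * np.pi * s)


def variation_cdf(values):
    """Discrete analogue of F: cumulative sum of |increments|, normalised to end at 1."""
    cumulative = np.concatenate(([0.0], np.cumsum(np.abs(np.diff(values)))))
    return cumulative / cumulative[-1]


grid = np.linspace(0.0, 1.0, 2001)
xi = -2.3                                  # hypothetical amplitude; |xi| cancels in F
F_phi = variation_cdf(phi(grid))
F_tilde = variation_cdf(xi * phi(warp_inv(grid)))   # warped curve observed on the grid

# Lemma 1: F_tilde = F_phi o T^{-1}.
lemma_gap = np.max(np.abs(F_tilde - np.interp(warp_inv(grid), grid, F_phi)))
# Equivalently T = F_tilde^{-1} o F_phi (inverse taken by interpolation).
T_hat = np.interp(F_phi, F_tilde, grid)
warp_gap = np.max(np.abs(T_hat - warp(grid)))
print(lemma_gap, warp_gap)
```

The recovered warp is least accurate near the extrema of $\phi$, where $F_\phi$ is flat and its inverse is steep; refining the grid shrinks both gaps.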

[Figure 7 panels: "rank 2 model" and "rank 3 model", level plots of the relative L2 error with c on the horizontal axis and r on the vertical axis.]

Fig 7. Level-plots of the relative L2 errors under the rank 2 and the rank 3 classes of models.

[Figure 8 panels: "Warped discrete data", "Warped smoothed curves" and "Registered curves" (mass against Age in days); "Warp maps", "Mean" and "Eigenfunction" against Age.]

Fig 8. Plots in the first row are those of the Tribolium data, the smoothed curves and the registered curves. The first plot in the second row shows the estimated warp maps, where the dotted line is the identity map. The other two plots in the second row show the means and the leading eigenfunctions of the warped and the registered data.

Proof of Theorem 1. Note that $f \mapsto f'$ from $C^1[0,1]$ to $(C[0,1], \|\cdot\|_\infty)$ is a Lipschitz map. Thus, $\tilde X_1 \overset{d}{=} \tilde X_2$ implies that $\tilde X_1' \overset{d}{=} \tilde X_2'$. Consider the random probability measure given by
$$\Psi_1(A) = \int_A |\tilde X_1'(u)|\,du \Big/ \int_{[0,1]} |\tilde X_1'(u)|\,du$$
for $A$ in the Borel σ-field of $[0,1]$. Similarly, $\Psi_2(A) = \int_A |\tilde X_2'(u)|\,du \big/ \int_{[0,1]} |\tilde X_2'(u)|\,du$. We equip the space $\mathcal P$ of diffuse probability measures on $[0,1]$ with the $L^2$-Wasserstein metric (see, e.g., Villani (2003)) given by $d_W(\mu,\nu) = \|F_\nu^{-1} - F_\mu^{-1}\|$, where $F_\mu$ and $F_\nu$ are the distribution functions associated with the probability measures $\mu$ and $\nu$. Now, for any $f_1, f_2 \in C^1[0,1]$ satisfying $\int_0^1 |f_i'(u)|\,du > 0$ for $i = 1, 2$, consider the measure $\mu_i$ with density $|f_i'(s)| \big/ \int_0^1 |f_i'(u)|\,du$, $i = 1, 2$. The condition $\int_0^1 |f'(u)|\,du > 0$ is equivalent to $f \neq$ const. Using the inequality $d_W(\mu_1,\mu_2) \le d_{TV}(\mu_1,\mu_2)$, with the latter being the total variation distance (which holds because $\mu_1$ and $\mu_2$ are supported on $[0,1]$; see, e.g., p. 888 in Madras and Sezer (2010)), it follows that
$$\begin{aligned}
d_W(\mu_1,\mu_2) &\le \frac12 \int_0^1 \left| \frac{|f_1'(s)|}{\int_0^1 |f_1'(u)|\,du} - \frac{|f_2'(s)|}{\int_0^1 |f_2'(u)|\,du} \right| ds \\
&\le \frac12 \int_0^1 \left| \frac{|f_1'(s)|}{\int_0^1 |f_1'(u)|\,du} - \frac{|f_1'(s)|}{\int_0^1 |f_2'(u)|\,du} \right| ds + \frac12 \int_0^1 \left| \frac{|f_1'(s)|}{\int_0^1 |f_2'(u)|\,du} - \frac{|f_2'(s)|}{\int_0^1 |f_2'(u)|\,du} \right| ds \\
&\le \frac{\int_0^1 |f_1'(s) - f_2'(s)|\,ds}{\int_0^1 |f_2'(s)|\,ds} \le \frac{\|f_1' - f_2'\|_\infty}{\int_0^1 |f_2'(s)|\,ds} \le \frac{|||f_1 - f_2|||_1}{\int_0^1 |f_2'(s)|\,ds}.
\end{aligned}$$
Thus, the embedding $H: f \mapsto \mu_f$ is continuous when its domain, say $A$, is restricted to the set of all non-constant functions in $C^1[0,1]$. But the set $A^c$ is the one-dimensional linear subspace spanned by the constant function $f \equiv 1$, which implies that $A^c$ is a Borel measurable subset of $C^1[0,1]$. So, $A$ is a Borel measurable subset of $C^1[0,1]$. Equip $A$ with the Borel σ-field induced from $C^1[0,1]$. Since $P(\tilde X_1 \in A^c) = 0$, we have that $H(\tilde X_1)$ is a valid random probability measure on $[0,1]$. Note that for any Borel subset $A$ of $[0,1]$, we have $H(\tilde X_1)(A) = \Psi_1(A)$. Thus, for any Borel subset $B$ of $\mathcal P$, we have
$$P(H(\tilde X_1) \in B) = P(\tilde X_1 \in H^{-1}(B)) = P(\tilde X_2 \in H^{-1}(B)) = P(H(\tilde X_2) \in B).$$
The first equality follows from the continuity of $H$ on $A$ and the fact that $P(\tilde X_1 \in A^c) = 0$ discussed above. The second equality follows from the fact that $\tilde X_1$ and $\tilde X_2$ have the same distribution by assumption. So, $H(\tilde X_1) \overset{d}{=} H(\tilde X_2)$ as random probability measures. Next, note that the random measures $H(\tilde X_i)$, $i = 1, 2$, have strictly increasing cdfs almost surely. Proposition 2 in Panaretos and Zemel (2016) states that for each $i = 1, 2$, the map $\gamma \mapsto E\{d_W^2(H(\tilde X_i), \gamma)\}$ admits a unique minimizer, namely the measure with quantile function $E\{\tilde F_{\Psi_i}^{-1}\}$, where $\tilde F_{\Psi_i}$ is the random distribution function of the random measure $H(\tilde X_i)$. Since $\tilde X_i = \xi_i \phi_i(T_i^{-1})$ with $T_i$ being a strictly increasing homeomorphism on $[0,1]$, it follows from the change-of-variable formula that $H(\tilde X_i)(A) = \Psi_i(A) = \int_{T_i^{-1}(A)} |\phi_i'(u)|\,du \big/ \int_{[0,1]} |\phi_i'(u)|\,du$. Thus, $\tilde F_{\Psi_i} = F_{\phi_i} \circ T_i^{-1}$, equivalently, $\tilde F_{\Psi_i}^{-1} = T_i \circ F_{\phi_i}^{-1}$, where $F_{\phi_i}$ is the cdf associated with the (deterministic) probability measure $\Phi_i(A) = \int_A |\phi_i'(u)|\,du \big/ \int_{[0,1]} |\phi_i'(u)|\,du$. Note that $F_{\phi_i}$ is continuous and strictly increasing since $\phi_i'$ is zero only on a countable set for $i = 1, 2$. Since $E(T_i) = \mathrm{Id}$, it follows that $E\{\tilde F_{\Psi_i}^{-1}\} = F_{\phi_i}^{-1}$ for $i = 1, 2$. But since $H(\tilde X_1) \overset{d}{=} H(\tilde X_2)$, it now follows that $F_{\phi_1} = F_{\phi_2}$. Also, $T_i = \tilde F_{\Psi_i}^{-1} \circ F_{\phi_i}$, equivalently, $T_i^{-1} = F_{\phi_i}^{-1} \circ \tilde F_{\Psi_i}$. Using the above facts and the result obtained in the previous paragraph, it now follows that $T_1 \overset{d}{=} T_2$.
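The argument above uses the fact that, for measures on the line, $d_W$ is the $L^2$ distance between quantile functions, so that the Fréchet mean of the $H(\tilde X_i)$ is obtained by averaging quantile functions. A minimal numerical sketch follows; the template cdf $F_\phi(t) = t^2$ and the warps $T_i(t) = t^{p_i}$ are hypothetical, and distances are approximated on a uniform grid of probability levels. It checks that the average of the quantile functions $T_i\circ F_\phi^{-1}$ equals $\bar T\circ F_\phi^{-1}$, with $\bar T$ the average warp, and that this average has a smaller Fréchet objective than any of the individual measures.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 1001)              # support points on [0, 1]
levels = np.linspace(0.0, 1.0, 1001)            # probability levels u in [0, 1]
powers = np.array([0.6, 0.9, 1.2, 1.5, 1.8])    # hypothetical warps T_i(t) = t ** p_i


def quantile_from_cdf(cdf_values):
    """Invert a continuous, strictly increasing cdf tabulated on `grid`."""
    return np.interp(levels, cdf_values, grid)


def w2_squared(q1, q2):
    """Squared L2-Wasserstein distance via quantiles (Riemann mean over levels)."""
    return np.mean((q1 - q2) ** 2)


# With F_phi(t) = t^2 and T_i(t) = t^p, the measure F_phi o T_i^{-1} has cdf t^(2/p).
quantiles = np.array([quantile_from_cdf(grid ** (2.0 / p)) for p in powers])

# Frechet (Wasserstein) mean on the line: pointwise average of the quantile functions.
barycenter = quantiles.mean(axis=0)

# Identity used in the proof: the average of T_i o F_phi^{-1} is Tbar o F_phi^{-1}.
expected = np.mean([np.sqrt(levels) ** p for p in powers], axis=0)


def frechet_objective(q):
    return np.mean([w2_squared(q, qi) for qi in quantiles])


print(np.max(np.abs(barycenter - expected)))
print(frechet_objective(barycenter), [frechet_objective(q) for q in quantiles])
```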

We next claim that the joint distributions of $(\tilde X_i, T_i^{-1})$, $i = 1, 2$, are the same. To this end, consider the map $H_1: f \mapsto (f, H(f))$ defined from $A$ to $A \times \mathcal P$, with the latter being equipped with the induced product topology and the induced product σ-field. It follows from the same arguments used to prove the continuity of $H$ that $H_1$ is continuous. Thus, for Borel subsets $G_1$ and $G_2$ of $C^1[0,1]$, we have
$$\begin{aligned}
P(\tilde X_1 \in G_1, T_1^{-1} \in G_2) &= P(\tilde X_1 \in G_1, F_{\phi_1}^{-1} \circ \tilde F_{\Psi_1} \in G_2) = P(\tilde X_1 \in G_1, \tilde F_{\Psi_1} \in F_{\phi_1}(G_2)) \\
&= P(H_1(\tilde X_1) \in G_1 \times F_{\phi_1}(G_2)) = P(\tilde X_1 \in H_1^{-1}(G_1 \times F_{\phi_1}(G_2))) \\
&= P(\tilde X_2 \in H_1^{-1}(G_1 \times F_{\phi_2}(G_2))) \quad [\text{since } F_{\phi_1} = F_{\phi_2}] \\
&= P(H_1(\tilde X_2) \in G_1 \times F_{\phi_2}(G_2)) = P(\tilde X_2 \in G_1, \tilde F_{\Psi_2} \in F_{\phi_2}(G_2)) \\
&= P(\tilde X_2 \in G_1, F_{\phi_2}^{-1} \circ \tilde F_{\Psi_2} \in G_2) = P(\tilde X_2 \in G_1, T_2^{-1} \in G_2).
\end{aligned}$$
Next, note that $X_i = \tilde X_i \circ T_i$ is the true unobserved process. It is easy to show that the map $(f, g) \mapsto f \circ g$ from $C^1[0,1] \times C^1[0,1]$ into $C^1[0,1]$ is continuous. Thus, using the observation in the previous paragraph, we have $X_1 \overset{d}{=} X_2$ as random elements in $C^1[0,1]$. It follows from the equality of distributions that their covariance operators are equal, and thus the corresponding eigenfunctions are equal. Now, the covariance operator of $X_i$ is given by $\mathrm{Var}(\xi_i)\,\phi_i \otimes \phi_i$. Since $X_i = \xi_i \phi_i$ is a rank one process, the equality of the covariance operators implies that $\phi_1 = \pm\phi_2$ (since $\|\phi_1\|_2 = \|\phi_2\|_2 = 1$). This equality along with the fact that $X_1 \overset{d}{=} X_2$ implies that $\xi_1 = \langle X_1, \phi_1 \rangle_2 \overset{d}{=} \langle X_2, \phi_1 \rangle_2 = \langle X_2, \pm\phi_2 \rangle_2 = \pm\xi_2$.

Proof of Theorem 2. First observe that the $T_i$'s are also i.i.d. random elements in $C[0,1]$. Moreover, since $T_1$ is strictly increasing and nonnegative, we have $E(\|T_1\|_\infty) = E(T_1(1)) = 1 < \infty$. Thus, by the strong law for Banach space valued random elements (see, e.g., Theorem 2.4 in Bosq (2000)), it follows that $\bar T = n^{-1}\sum_{i=1}^n T_i \to E(T_1) = \mathrm{Id}$ as $n \to \infty$ almost surely. In addition, if $E(\|T_1'\|_\infty) < \infty$, implying that $E(|||T_1|||_1) < \infty$, then the almost sure convergence $\bar T \to E(T_1) = \mathrm{Id}$ holds in $C^1[0,1]$.

(a) Since $\hat F_\phi^{-1} = \bar T \circ F_\phi^{-1}$, using Theorem 2.18 in Villani (2003), we get that
$$d_W^2(\hat F_\phi, F_\phi) = \|\hat F_\phi^{-1} - F_\phi^{-1}\|_2^2 = \int_0^1 \big|\hat F_\phi^{-1}(F_\phi(t)) - t\big|^2 F_\phi(dt) = \int_0^1 |\bar T(t) - t|^2 F_\phi(dt) \le \|\bar T - \mathrm{Id}\|_\infty^2 \to 0 \quad \text{as } n \to \infty.$$

(b) Since each $T_i$ is a strictly increasing bijection on $[0,1]$, we have
$$\|\hat T_i^{-1} - T_i^{-1}\|_\infty = \sup_{t\in[0,1]} |\bar T(T_i^{-1}(t)) - T_i^{-1}(t)| = \|\bar T - \mathrm{Id}\|_\infty \to 0 \quad \text{as } n\to\infty.$$
Since both $\hat T_i^{-1}$ and $T_i^{-1}$ are strictly increasing homeomorphisms, the uniform convergence of $\hat T_i$ to $T_i$ follows from the above uniform convergence. Suppose now that Condition 1 holds. We noted at the beginning of the proof that in this case $|||\bar T - \mathrm{Id}|||_1 \to 0$ as $n\to\infty$ almost surely. In view of the first half of part (b) of the theorem along with the definition of the $|||\cdot|||_1$ norm, it is enough to show the uniform convergence of the derivatives. Since each $T_i$ is a strictly increasing bijection on $[0,1]$, so is $\bar T$ for every $n \ge 1$. First note that
$$\|(\hat T_i^{-1})' - (T_i^{-1})'\|_\infty = \sup_{t\in[0,1]} |(\bar T \circ T_i^{-1})'(t) - (T_i^{-1})'(t)| = \sup_{t\in[0,1]} \left| \frac{\bar T'(T_i^{-1}(t))}{T_i'(T_i^{-1}(t))} - \frac{1}{T_i'(T_i^{-1}(t))} \right| = \sup_{t\in[0,1]} \left| \frac{\bar T'(t) - 1}{T_i'(t)} \right| \le \delta^{-1}\|\bar T' - \mathbf 1\|_\infty,$$
where $\mathbf 1$ is the constant function taking value 1. It thus follows from an earlier bound that
$$|||\hat T_i^{-1} - T_i^{-1}|||_1 \le \|\bar T - \mathrm{Id}\|_\infty + \delta^{-1}\|\bar T' - \mathbf 1\|_\infty \le \max(1, \delta^{-1})\, |||\bar T - \mathrm{Id}|||_1 \to 0 \quad \text{as } n\to\infty.$$
Next note that $\bar T'(t) = n^{-1}\sum_{i=1}^n T_i'(t) \ge n^{-1}\sum_{i=1}^n \inf_{s\in[0,1]} T_i'(s) \ge \delta$, so that $\inf_{t\in[0,1]} \bar T'(t) \ge \delta > 0$. Now,
$$\begin{aligned}
\|\hat T_i' - T_i'\|_\infty &= \sup_{t} |(T_i \circ \bar T^{-1})'(t) - T_i'(t)| = \sup_{t} \left| \frac{T_i'(\bar T^{-1}(t))}{\bar T'(\bar T^{-1}(t))} - T_i'(t) \right| = \sup_{t} \left| \frac{T_i'(t)}{\bar T'(t)} - T_i'(\bar T(t)) \right| \\
&\le \sup_{t} \left| \frac{T_i'(t)}{\bar T'(t)} - \frac{T_i'(\bar T(t))}{\bar T'(t)} \right| + \sup_{t} \left| \frac{T_i'(\bar T(t))}{\bar T'(t)} - T_i'(\bar T(t)) \right| \\
&\le \delta^{-1} \sup_{t} |T_i'(t) - T_i'(\bar T(t))| + \delta^{-1}\|T_i'\|_\infty \|\bar T' - \mathbf 1\|_\infty.
\end{aligned}$$
Since $T_i'$ is continuous on $[0,1]$, it is uniformly continuous. This and the fact that $\|\bar T - \mathrm{Id}\|_\infty \to 0$ as $n\to\infty$ almost surely imply that $\sup_{t\in[0,1]} |T_i'(t) - T_i'(\bar T(t))| \to 0$ as $n\to\infty$ almost surely. Combining this fact with the uniform convergence of $\bar T'$ to $\mathbf 1$, we get that $|||\hat T_i - T_i|||_1 \to 0$ as $n\to\infty$ almost surely.

(c) Note that
$$\|\hat X_i - X_i\|_\infty = |\xi_i| \sup_{t\in[0,1]} |\phi(\bar T^{-1}(t)) - \phi(t)| = |\xi_i| \sup_{t\in[0,1]} |\phi(\bar T(t)) - \phi(t)| \to 0 \quad \text{as } n\to\infty,$$
since $\|\bar T - \mathrm{Id}\|_\infty \to 0$ as $n\to\infty$ almost surely, and $\phi$ is continuous on $[0,1]$ and hence uniformly continuous. Suppose now that Condition 1 holds. Then, as before,
$$\begin{aligned}
\|\hat X_i' - X_i'\|_\infty &= |\xi_i| \sup_{t} \left| \frac{\phi'(\bar T^{-1}(t))}{\bar T'(\bar T^{-1}(t))} - \phi'(t) \right| = |\xi_i| \sup_{t} \left| \frac{\phi'(t)}{\bar T'(t)} - \phi'(\bar T(t)) \right| \\
&\le |\xi_i| \sup_{t} \left| \frac{\phi'(t)}{\bar T'(t)} - \frac{\phi'(\bar T(t))}{\bar T'(t)} \right| + |\xi_i| \sup_{t} \left| \frac{\phi'(\bar T(t))}{\bar T'(t)} - \phi'(\bar T(t)) \right| \\
&\le |\xi_i|\,\delta^{-1} \sup_{t} |\phi'(t) - \phi'(\bar T(t))| + |\xi_i|\, \|\phi'\|_\infty\, \delta^{-1} \|\bar T' - \mathbf 1\|_\infty.
\end{aligned}$$
Using similar arguments as earlier, we conclude that $\|\hat X_i' - X_i'\|_\infty \to 0$ and hence $|||\hat X_i - X_i|||_1 \to 0$ as $n\to\infty$ almost surely.

(d) Observe that since $\hat X_i = \xi_i\, \phi \circ \bar T^{-1} = X_i \circ \bar T^{-1}$, it follows from the change-of-variable formula that $\hat F_i = F_\phi \circ \bar T^{-1}$. Thus,
$$d_W^2(\hat F_i, F_\phi) = \|\hat F_i^{-1} - F_\phi^{-1}\|_2^2 = \|\bar T \circ F_\phi^{-1} - F_\phi^{-1}\|_2^2 = \int_0^1 |\bar T(t) - t|^2 F_\phi(dt) \le \|\bar T - \mathrm{Id}\|_\infty^2 \to 0 \quad \text{as } n\to\infty.$$
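Parts (a) to (d) rest on the identities $\hat F_\phi^{-1} = \bar T\circ F_\phi^{-1}$ and $\hat T_i = T_i\circ\bar T^{-1}$. The sketch below assumes, as these identities indicate (our reading, not a transcription of the paper's algorithm), that $\hat F_\phi^{-1}$ is the average of the quantile functions $\tilde F_i^{-1}$ and that $\hat T_i = \tilde F_i^{-1}\circ\hat F_\phi$. It registers a few synthetic rank 1 curves and compares $\hat T_i$ with $T_i\circ\bar T^{-1}$. The warps $T_i(t) = t + c_i t(1-t)$ with $|c_i| < 1$, the template and the amplitudes are hypothetical, and integrals and inverses are discretised on a fine grid.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 10001)
c = np.array([-0.6, -0.2, 0.3, 0.5, 0.4])     # hypothetical warp coefficients, |c_i| < 1
xi = np.array([1.0, -2.0, 1.5, 0.7, 2.5])     # hypothetical non-zero amplitudes
warps = np.array([grid + ci * grid * (1.0 - grid) for ci in c])   # T_i tabulated on grid


def phi(s):
    """Hypothetical template."""
    return np.sin(1.5 * np.pi * s)


def variation_cdf(values):
    """Cumulative |increments| normalised to end at 1 (discrete version of F)."""
    cumulative = np.concatenate(([0.0], np.cumsum(np.abs(np.diff(values)))))
    return cumulative / cumulative[-1]


def invert(tabulated):
    """Inverse, evaluated at the grid, of an increasing map of [0, 1] tabulated on the grid."""
    return np.interp(grid, tabulated, grid)


# Observed warped curves xi_i * phi(T_i^{-1}(t)) and their cdfs F~_i.
F_tilde = [variation_cdf(x * phi(invert(T))) for x, T in zip(xi, warps)]

# Assumed estimator: average the quantile functions, then T^_i = F~_i^{-1} o F^_phi.
Fphi_hat_inv = np.mean([invert(F) for F in F_tilde], axis=0)
Fphi_hat = invert(Fphi_hat_inv)
T_hat = [np.interp(Fphi_hat, F, grid) for F in F_tilde]

# Identity from the proof: T^_i = T_i o Tbar^{-1}.
T_bar_inv = invert(warps.mean(axis=0))
targets = [np.interp(T_bar_inv, grid, T) for T in warps]

gap = max(np.max(np.abs(th - tg)) for th, tg in zip(T_hat, targets))
naive_gap = max(np.max(np.abs(th - T)) for th, T in zip(T_hat, warps))
print(f"max |T^_i - T_i o Tbar^-1| = {gap:.2e}; max |T^_i - T_i| = {naive_gap:.2e}")
```

The first gap reflects discretisation only; the second should be of the same order as $\|\bar T - \mathrm{Id}\|_\infty = 0.02$ for these coefficients, illustrating that for finite $n$ the warps are recovered up to composition with $\bar T^{-1}$.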

(e) Observe that
$$\|\hat\mu - \mu\|_\infty = \Big\|n^{-1}\sum_{i=1}^n(\hat X_i - X_i) + n^{-1}\sum_{i=1}^n X_i - \mu\Big\|_\infty \le n^{-1}\sum_{i=1}^n\|\hat X_i - X_i\|_\infty + \Big\|n^{-1}\sum_{i=1}^n X_i - \mu\Big\|_\infty,$$
where $\hat\mu = n^{-1}\sum_{i=1}^n \hat X_i$ is the mean of the registered curves. Since the $X_i$'s are i.i.d. random elements in $C[0,1]$ with $E(\|X_1\|_\infty) = E(|\xi_1|)\|\phi\|_\infty < \infty$, we conclude from the strong law for Banach space valued random elements that $\|n^{-1}\sum_{i=1}^n X_i - \mu\|_\infty \to 0$ as $n\to\infty$ almost surely. Also, from the proof of part (c), we have that
$$n^{-1}\sum_{i=1}^n\|\hat X_i - X_i\|_\infty = \sup_{t\in[0,1]}|\phi(\bar T(t)) - \phi(t)| \times n^{-1}\sum_{i=1}^n|\xi_i| = \sup_{t\in[0,1]}|\phi(\bar T(t)) - \phi(t)| \times \{E(|\xi_1|) + o(1)\}$$
as $n\to\infty$ almost surely. Thus, using similar arguments as in part (c) of the theorem, we obtain $n^{-1}\sum_{i=1}^n\|\hat X_i - X_i\|_\infty \to 0$ as $n\to\infty$ almost surely. Combining the above facts, we conclude that $\|\hat\mu - \mu\|_\infty \to 0$ as $n\to\infty$ almost surely.

Note that since $X_i = \xi_i\phi$, it follows that $\|n^{-1}\sum_{i=1}^n X_i' - \mu'\|_\infty \to 0$ as $n\to\infty$ almost surely. Now, suppose that Condition 1 holds. A similar decomposition as above yields
$$\|\hat\mu' - \mu'\|_\infty \le n^{-1}\sum_{i=1}^n\|\hat X_i' - X_i'\|_\infty + \Big\|n^{-1}\sum_{i=1}^n X_i' - \mu'\Big\|_\infty.$$
The proof of part (c) implies that
$$n^{-1}\sum_{i=1}^n\|\hat X_i' - X_i'\|_\infty \le \delta^{-1}\Big(n^{-1}\sum_{i=1}^n|\xi_i|\Big)\Big\{\sup_{t\in[0,1]}|\phi'(t) - \phi'(\bar T(t))| + \|\phi'\|_\infty\|\bar T' - \mathbf 1\|_\infty\Big\}.$$
The right-hand side above converges to zero as $n\to\infty$ almost surely. The result is now established upon combining the above facts.

(f) Straightforward algebraic manipulations yield
$$\begin{aligned}
\hat K_r &= n^{-1}\sum_{i=1}^n(\hat X_i - \hat\mu)\otimes(\hat X_i - \hat\mu) \\
&= n^{-1}\sum_{i=1}^n(X_i - \bar X)\otimes(X_i - \bar X) + n^{-1}\sum_{i=1}^n(\hat X_i - X_i)\otimes(\hat X_i - X_i) - (\bar X - \hat\mu)\otimes(\bar X - \hat\mu) \\
&\quad + n^{-1}\sum_{i=1}^n\{(\hat X_i - X_i)\otimes(X_i - \bar X) + (X_i - \bar X)\otimes(\hat X_i - X_i)\},
\end{aligned}$$
where $\hat K_r$ is the empirical covariance operator of the registered curves and $\bar X = n^{-1}\sum_{i=1}^n X_i$.

Denote $\hat K = n^{-1}\sum_{i=1}^n(X_i - \bar X)\otimes(X_i - \bar X)$. Then,
$$|||\hat K_r - \hat K||| \le \frac2n\sum_{i=1}^n\|\hat X_i - X_i\|_2\|X_i - \bar X\|_2 + \frac1n\sum_{i=1}^n\|\hat X_i - X_i\|_2^2 + \|\bar X - \hat\mu\|_2^2.$$
Using the Cauchy-Schwarz inequality, we have $n^{-1}\sum_{i=1}^n\|\hat X_i - X_i\|_2\|X_i - \bar X\|_2 \le \{n^{-1}\sum_{i=1}^n\|\hat X_i - X_i\|_2^2\}^{1/2}\{n^{-1}\sum_{i=1}^n\|X_i - \bar X\|_2^2\}^{1/2}$, and $n^{-1}\sum_{i=1}^n\|X_i - \bar X\|_2^2 = O(1)$ as $n\to\infty$ almost surely. It follows from the arguments in the proof of part (c) of the theorem that
$$n^{-1}\sum_{i=1}^n\|\hat X_i - X_i\|_2^2 \le n^{-1}\sum_{i=1}^n\|\hat X_i - X_i\|_\infty^2 \le \sup_{t\in[0,1]}|\phi(\bar T(t)) - \phi(t)|^2\Big(n^{-1}\sum_{i=1}^n|\xi_i|^2\Big),$$
and the right-hand side is $o(1)$ as $n\to\infty$ almost surely since $E(|\xi_1|^2) < \infty$. Further, $\|\bar X - \hat\mu\|_2^2 = o(1)$ as $n\to\infty$ almost surely. Thus, $|||\hat K_r - \hat K||| = o(1)$ as $n\to\infty$ almost surely.

The uniform convergence of $\hat K_r(s,t)$ to $K(s,t)$ is proved using a decomposition of $\hat K_r(s,t)$ similar to the one above, noting that $\hat K(s,t)$ converges uniformly to $K(s,t)$ (by the strong law of large numbers in $C([0,1]^2)$), and that all the other bounds hold in the supremum norm.

Next, note that $\hat\phi(t) = \hat\lambda^{-1}\int_0^1\hat K_r(s,t)\hat\phi(s)\,ds$ and $\phi(t) = \lambda^{-1}\int_0^1 K(s,t)\phi(s)\,ds$ for all $t\in[0,1]$, where $|\hat\lambda - \lambda| \le |||\hat K_r - K||| \to 0$ as $n\to\infty$ almost surely. Also, $\|\hat\phi - \phi\|_2 \le 2\sqrt2\,\lambda^{-1}|||\hat K_r - K||| \to 0$ as $n\to\infty$ almost surely. So,
$$\begin{aligned}
|\hat\phi(t) - \phi(t)| &\le \Big|\hat\lambda^{-1}\int_0^1\hat K_r(s,t)\hat\phi(s)\,ds - \hat\lambda^{-1}\int_0^1 K(s,t)\hat\phi(s)\,ds\Big| + \Big|\hat\lambda^{-1}\int_0^1 K(s,t)\hat\phi(s)\,ds - \hat\lambda^{-1}\int_0^1 K(s,t)\phi(s)\,ds\Big| \\
&\quad + \Big|\hat\lambda^{-1}\int_0^1 K(s,t)\phi(s)\,ds - \lambda^{-1}\int_0^1 K(s,t)\phi(s)\,ds\Big| \\
&\le \hat\lambda^{-1}\|\hat K_r - K\|_\infty + \hat\lambda^{-1}\|K\|_\infty\|\hat\phi - \phi\|_2 + \big|(\hat\lambda^{-1} - \lambda^{-1})\lambda\phi(t)\big| \\
&\le (\lambda^{-1} + o(1))\{\|\hat K_r - K\|_\infty + \|K\|_\infty\|\hat\phi - \phi\|_2\} + |\hat\lambda - \lambda|(\lambda + o(1))^{-1}\|\phi\|_\infty
\end{aligned}$$
as $n\to\infty$ almost surely. Thus, $\|\hat\phi - \phi\|_\infty \to 0$ as $n\to\infty$ almost surely. Finally, $|\hat\xi_i - \xi_i| = |\langle\hat X_i, \hat\phi\rangle - \langle X_i, \phi\rangle| \le |\langle\hat X_i - X_i, \hat\phi\rangle| + |\langle X_i, \hat\phi - \phi\rangle| \le \|\hat X_i - X_i\|_\infty + \|X_i\|_2\|\hat\phi - \phi\|_2 \to 0$ as $n\to\infty$ almost surely.

Proof of Theorem 3. We have $|T_1(t) - T_1(s)| \le \|T_1'\|_\infty|s - t|$ and, by assumption, $E(\|T_1'\|_\infty^2) < \infty$. So, by the CLT for i.i.d. $C[0,1]$-valued random elements (see, e.g., Theorem 2.4 in Bosq (2000)), we have $\sqrt n(\bar T - \mathrm{Id}) \overset{d}{\to} Y$ for a zero-mean Gaussian random element $Y$ in $C[0,1]$.

(a) From the proof of part (a) of Theorem 2, $d_W^2(\hat F_\phi, F_\phi) = \int_0^1|\bar T(t) - t|^2 F_\phi(dt)$. Now, it is easy to check that the map $C[0,1] \ni f \mapsto \int_0^1|f(t)|^2 F_\phi(dt)$ is continuous. The result follows from the continuous mapping theorem.

(b) Note that for each fixed $i \ge 1$, we have $\sqrt n(\hat T_i^{-1} - T_i^{-1}) = U_n \circ V_n$, where $U_n = \sqrt n(\bar T - \mathrm{Id})$ and $V_n = T_i^{-1}$. We first derive the weak limit conditional on $T_i = t_i$. From the previous paragraph, it follows that, conditional on $T_i = t_i$, $U_n = \sqrt n(n^{-1}t_i + n^{-1}\sum_{j\neq i}T_j - \mathrm{Id}) \overset{d}{\to} Y$, and $V_n$, being a constant sequence, converges conditionally in probability to $t_i^{-1}$ as $n\to\infty$. So, by Theorem 4.4 in Billingsley (1968), conditional on $T_i = t_i$, we have $(U_n, V_n) \overset{d}{\to} (Y, t_i^{-1})$ in $C[0,1] \times C[0,1]$. Using the fact that the map $(f, g) \mapsto f \circ g$ is continuous on $C[0,1] \times C[0,1]$ (see, e.g., p. 155 in Billingsley (1968)), it follows from the continuous mapping theorem that, conditional on $T_i = t_i$, $\sqrt n(\hat T_i^{-1} - T_i^{-1}) \overset{d}{\to} Y \circ t_i^{-1}$ as $n\to\infty$ for each fixed $i \ge 1$. Thus, by the Dominated Convergence Theorem, the unconditional distribution of $\sqrt n(\hat T_i^{-1} - T_i^{-1})$ converges weakly as $n\to\infty$ for each fixed $i \ge 1$.

To prove the weak convergence of $\sqrt n(\hat T_i - T_i) = \sqrt n(T_i \circ \bar T^{-1} - T_i)$, we again first derive its weak limit conditional on $T_i = t_i$. Now, using the fact that $T_i' \in C[0,1]$ almost surely, we have
$$\hat T_i(s) - t_i(s) = t_i(\bar T^{-1}(s)) - t_i(s) = t_i\big(s + \bar T^{-1}(s) - s\big) - t_i(s) = (\bar T^{-1}(s) - s) \times t_i'\big(s + \beta_1(\bar T^{-1}(s) - s)\big)$$
for some $\beta_1 \in [0,1]$ (possibly depending on $s$ and $i$). Thus,
$$\sqrt n(\hat T_i - t_i) = \{\sqrt n(\bar T^{-1} - \mathrm{Id})\} \times t_i'(\cdot + o_P(1)) = \{\sqrt n(\mathrm{Id} - \bar T) \circ \bar T^{-1}\} \times t_i'(\cdot + o_P(1)),$$
where the $o_P(1)$ term is uniform in $s$ since $\|\bar T^{-1} - \mathrm{Id}\|_\infty \to 0$ as $n\to\infty$ almost surely. Using similar arguments as in the above proof, and since $-Y$ has the same distribution as $Y$, we deduce that $\sqrt n(\hat T_i - t_i) \overset{d}{\to} Y \times t_i'$ as $n\to\infty$. Thus, by the Dominated Convergence Theorem, the unconditional

distribution of $\sqrt n(\hat T_i - T_i)$ converges weakly as $n\to\infty$ for each fixed $i \ge 1$.

(c) Note that for each fixed $i \ge 1$,
$$\hat X_i(s) - X_i(s) = \xi_i\{\phi(\bar T^{-1}(s)) - \phi(s)\} = \xi_i(\bar T^{-1}(s) - s)\,\phi'\big(s + \beta_2(\bar T^{-1}(s) - s)\big)$$
$$\Rightarrow \quad \sqrt n(\hat X_i - X_i) = \xi_i\{\sqrt n(\mathrm{Id} - \bar T) \circ \bar T^{-1}\} \times \phi'(\cdot + o_P(1)),$$
where $\beta_2 \in [0,1]$, and the $o_P(1)$ term is uniform in $s$ as earlier. Similar arguments as in part (b) above yield $\sqrt n(\hat X_i - X_i) \overset{d}{\to} \xi_i Y \times \phi'$ as $n\to\infty$ for each fixed $i \ge 1$.

(d) The proof is similar to that of part (a) and is omitted.

(e) Note that
$$\begin{aligned}
\sqrt n(\hat\mu - \mu) &= \sqrt n\Big\{n^{-1}\sum_{i=1}^n \xi_i\,\phi \circ \bar T^{-1} - E(\xi_1)\phi\Big\} \\
&= \sqrt n\Big\{n^{-1}\sum_{i=1}^n(\xi_i - E(\xi_1))\Big\}\,\phi \circ \bar T^{-1} + E(\xi_1)\sqrt n\{\phi \circ \bar T^{-1} - \phi\} \\
&\overset{d}{\to} N(0, \mathrm{Var}(\xi_1))\,\phi + E(\xi_1)\,Y \times \phi',
\end{aligned}$$
which follows from similar arguments as in part (c) and the independence of the $\xi_i$'s and the $T_i$'s.

(f) For the first part, note that
$$\hat K_r = n^{-1}\sum_{i=1}^n(\hat X_i - \hat\mu)\otimes(\hat X_i - \hat\mu) = n^{-1}\sum_{i=1}^n(\hat X_i - \mu)\otimes(\hat X_i - \mu) - (\hat\mu - \mu)\otimes(\hat\mu - \mu) = S_1 + S_2,$$
say. Now, some straightforward manipulations yield
$$\begin{aligned}
S_1 &= n^{-1}\sum_{i=1}^n\{\xi_i\,\phi \circ \bar T^{-1} - E(\xi_1)\phi\}\otimes\{\xi_i\,\phi \circ \bar T^{-1} - E(\xi_1)\phi\} \\
&= n^{-1}\sum_{i=1}^n\{\xi_i - E(\xi_1)\}^2(\phi \circ \bar T^{-1})\otimes(\phi \circ \bar T^{-1}) + E^2(\xi_1)(\phi \circ \bar T^{-1} - \phi)\otimes(\phi \circ \bar T^{-1} - \phi) \\
&\quad + n^{-1}E(\xi_1)\sum_{i=1}^n\{\xi_i - E(\xi_1)\}\big[(\phi \circ \bar T^{-1})\otimes(\phi \circ \bar T^{-1} - \phi) + (\phi \circ \bar T^{-1} - \phi)\otimes(\phi \circ \bar T^{-1})\big].
\end{aligned}$$
So, using $K = \mathrm{Var}(\xi_1)\,\phi\otimes\phi$,
$$\begin{aligned}
\sqrt n(S_1 - K) &= \sqrt n\Big[n^{-1}\sum_{i=1}^n\big(\{\xi_i - E(\xi_1)\}^2 - \mathrm{Var}(\xi_1)\big)(\phi \circ \bar T^{-1})\otimes(\phi \circ \bar T^{-1})\Big] \\
&\quad + \sqrt n\,\mathrm{Var}(\xi_1)\big[(\phi \circ \bar T^{-1})\otimes(\phi \circ \bar T^{-1}) - \phi\otimes\phi\big] \\
&\quad + \sqrt n\,E^2(\xi_1)(\phi \circ \bar T^{-1} - \phi)\otimes(\phi \circ \bar T^{-1} - \phi) \\
&\quad + \sqrt n\,n^{-1}E(\xi_1)\sum_{i=1}^n\{\xi_i - E(\xi_1)\}\big[(\phi \circ \bar T^{-1})\otimes(\phi \circ \bar T^{-1} - \phi) + (\phi \circ \bar T^{-1} - \phi)\otimes(\phi \circ \bar T^{-1})\big].
\end{aligned}$$

The first term on the right-hand side of the above equality converges in distribution to $N(0, E\{\xi_1 - E(\xi_1)\}^4)\,\phi\otimes\phi$ since $\bar T \to \mathrm{Id}$ as $n\to\infty$ almost surely. For the latter reason, the third and the fourth terms converge to zero in probability as $n\to\infty$. For the second term, note that
$$(\phi \circ \bar T^{-1})\otimes(\phi \circ \bar T^{-1}) - \phi\otimes\phi = (\phi \circ \bar T^{-1} - \phi)\otimes\phi + (\phi \circ \bar T^{-1})\otimes(\phi \circ \bar T^{-1} - \phi).$$
Thus, by similar arguments as in part (c) earlier, and the continuity of the mapping $(f, g) \mapsto f\otimes g$ from $L^2[0,1] \times L^2[0,1]$ to the space of Hilbert-Schmidt operators, the second term converges in distribution to $\mathrm{Var}(\xi_1)\{(Y\times\phi')\otimes\phi + \phi\otimes(Y\times\phi')\}$. Combining the above observations and the fact that $\sqrt n S_2 \to 0$ in probability (which follows from part (e)), we deduce that
$$\sqrt n(\hat K_r - K) \overset{d}{\to} N(0, E\{\xi_1 - E(\xi_1)\}^4)\,\phi\otimes\phi + \mathrm{Var}(\xi_1)\{(Y\times\phi')\otimes\phi + \phi\otimes(Y\times\phi')\}$$
as $n\to\infty$.

In order to prove the weak convergence of the empirical process $\{\sqrt n(\hat K_r(s,t) - K(s,t)): s, t \in [0,1]\}$ in $C([0,1]^2)$, we follow the same decomposition as in the proof of the weak convergence of the operators in the Hilbert-Schmidt topology. Now, note that the proof of part (c) of the theorem implies that the empirical process $\{\sqrt n(\phi(\bar T^{-1}(t)) - \phi(t)): t \in [0,1]\}$ in $C[0,1]$ converges in distribution to the process $\{Y(t)\phi'(t): t \in [0,1]\}$ in $C[0,1]$. This fact and the same arguments as in part (f) yield
$$\{\sqrt n(\hat K_r(s,t) - K(s,t)): s, t \in [0,1]\} \overset{d}{\to} \{Z\phi(s)\phi(t) + \mathrm{Var}(\xi_1)[Y(s)\phi'(s)\phi(t) + Y(t)\phi'(t)\phi(s)]: s, t \in [0,1]\}$$
as $n\to\infty$, where $Z \sim N(0, E\{\xi_1 - E(\xi_1)\}^4)$ does not depend on $s, t$.

For the weak convergence of $\hat\phi$, first note that $\hat K_r = n^{-1}\sum_{i=1}^n(\xi_i - \bar\xi)^2(\phi \circ \bar T^{-1})\otimes(\phi \circ \bar T^{-1})$. Thus, $\hat\phi = (\phi \circ \bar T^{-1})/\|\phi \circ \bar T^{-1}\|_2$. Now,
$$\begin{aligned}
\hat\phi - \phi &= \frac{\phi \circ \bar T^{-1}}{\|\phi \circ \bar T^{-1}\|_2} - \phi = \frac{\phi \circ \bar T^{-1} - \phi}{\|\phi \circ \bar T^{-1}\|_2} - \frac{\phi\,(\|\phi \circ \bar T^{-1}\|_2 - 1)}{\|\phi \circ \bar T^{-1}\|_2} \\
&= \frac{\phi \circ \bar T^{-1} - \phi}{\|\phi \circ \bar T^{-1}\|_2} - \frac{\phi\,(\|\phi \circ \bar T^{-1}\|_2^2 - 1)}{\|\phi \circ \bar T^{-1}\|_2(\|\phi \circ \bar T^{-1}\|_2 + 1)} \\
&= \frac{\phi \circ \bar T^{-1} - \phi}{\|\phi \circ \bar T^{-1}\|_2} - \frac{\phi\,(\|\phi \circ \bar T^{-1} - \phi\|_2^2 + 2\langle\phi \circ \bar T^{-1} - \phi, \phi\rangle)}{\|\phi \circ \bar T^{-1}\|_2(\|\phi \circ \bar T^{-1}\|_2 + 1)}.
\end{aligned}$$
Using the weak convergence of $\sqrt n(\phi \circ \bar T^{-1} - \phi)$ to $Y\times\phi'$ in the $C[0,1]$ topology, we have that
$$\sqrt n(\hat\phi - \phi) \overset{d}{\to} Y\times\phi' - \frac12 \times 2\langle Y\times\phi', \phi\rangle\phi = Y\times\phi' - \langle Y\times\phi', \phi\rangle\phi$$
as $n\to\infty$ in the $C[0,1]$ topology. Finally, for the weak convergence of the $\hat\xi_i$'s, observe that
$$\begin{aligned}
\sqrt n(\hat\xi_i - \xi_i) &= \sqrt n\{\langle\hat X_i - X_i, \hat\phi - \phi\rangle + \langle\hat X_i - X_i, \phi\rangle + \langle X_i, \hat\phi - \phi\rangle\} \\
&= \sqrt n\{\xi_i\langle\phi \circ \bar T^{-1} - \phi, \hat\phi - \phi\rangle + \xi_i\langle\phi \circ \bar T^{-1} - \phi, \phi\rangle + \xi_i\langle\phi, \hat\phi - \phi\rangle\}.
\end{aligned}$$

Using the independence of $\xi_i$ and the $T_j$'s, and the asymptotic distributions obtained above and in part (c), it follows that $\sqrt n(\hat\xi_i - \xi_i) \overset{d}{\to} \xi_i\{\langle Y\times\phi', \phi\rangle + \langle\phi, Y\times\phi' - \langle Y\times\phi', \phi\rangle\phi\rangle\}$ as $n\to\infty$.

In order to prove Theorem 4, we first prove a few crucial results.

Proposition 1. Assume that $\phi \in C^2[0,1]$ and $\inf_{u\in[0,1]} T'(u) \ge \delta > 0$ almost surely for a deterministic constant $\delta$. Then, for each $i \ge 1$, we have $\sum_{j=1}^{r-1}|\phi(s_{i,j+1}) - \phi(s_{i,j})| = \int_0^1|\phi'(u)|\,du + B_{1,r}$ almost surely, where $B_{1,r} = O(r^{-1})$ almost surely, uniformly in $i$. Further, $\sum_{j\in I_t}|\phi(s_{i,j+1}) - \phi(s_{i,j})| = \int_0^{T_i^{-1}(t)}|\phi'(u)|\,du + B_{2,r}(t)$ for all $t \in [0,1]$ almost surely, where $\|B_{2,r}\|_\infty = O(r^{-1})$ almost surely, uniformly in $i$. Consequently, we have $\sum_{j=1}^{r-1}|\phi(t_{j+1}) - \phi(t_j)| = \int_0^1|\phi'(u)|\,du + B_{3,r}$ and $\sum_{j\in I_t}|\phi(t_{j+1}) - \phi(t_j)| = \int_0^t|\phi'(u)|\,du + B_{4,r}(t)$ for all $t \in [0,1]$ almost surely, where $B_{3,r} = O(r^{-1})$ and $\|B_{4,r}\|_\infty = O(r^{-1})$ almost surely.

Proof of Proposition 1. First, define $t_0 = 0$ and $t_{r+1} = 1$ in case $t_1 > 0$ and $t_r < 1$. Then, $\{t_j: 0 \le j \le r+1\}$ is a partition of $[0,1]$. Consider the sum $S_i = \sum_{j=0}^r|\phi(s_{i,j+1}) - \phi(s_{i,j})|$ and note that, by a Taylor expansion, $S_i = \sum_{j=0}^r(s_{i,j+1} - s_{i,j})|\phi'(\tilde s_{i,j})|$, where $\tilde s_{i,j} \in [s_{i,j}, s_{i,j+1}]$. The right-hand side is a Riemann sum approximation of $\int_0^1|\phi'(u)|\,du$ with $\{s_{i,j} = T_i^{-1}(t_j): 0 \le j \le r+1\}$ as the partition of $[0,1]$, since $T_i$ is a strictly increasing bijection. Thus, writing $\Delta = \max_{0\le j\le r}(s_{i,j+1} - s_{i,j})$, we have
$$\Big|S_i - \int_0^1|\phi'(u)|\,du\Big| \le \sup\{\,||\phi'(t)| - |\phi'(s)||: s, t \in [0,1],\ |t - s| \le \Delta\} \le \sup\{|\phi'(t) - \phi'(s)|: s, t \in [0,1],\ |t - s| \le \Delta\} \le \|\phi''\|_\infty\Delta.$$
Now for any $0 \le j \le r$, we have $s_{i,j+1} - s_{i,j} = T_i^{-1}(t_{j+1}) - T_i^{-1}(t_j) = (t_{j+1} - t_j)/T_i'(T_i^{-1}(\tilde t_j))$ for some $\tilde t_j \in [t_j, t_{j+1}]$. Using the assumption of the proposition and the assumption on the observation grid, it now follows that $\Delta = \max_{0\le j\le r}(s_{i,j+1} - s_{i,j}) \le \delta^{-1}O(r^{-1})$ uniformly in $i$. Thus, $|S_i - \int_0^1|\phi'(u)|\,du| \le \|\phi''\|_\infty\delta^{-1}O(r^{-1})$. To complete the first part of the proof, note that $\sum_{j=1}^{r-1}|\phi(s_{i,j+1}) - \phi(s_{i,j})|$ differs from $S_i$ by at most two terms, and both of these terms are $O(r^{-1})$ uniformly over $i$ by the same arguments as those for $S_i$.

For the second part, fix any $t \in [0,1]$. Defining $B_{2,r}(0) = 0$, there is nothing to prove when $t = 0$. For $t > 0$, define $t_0 = 0$. Let $j^*$ be the largest $j$ for which $t_{j+1} \le t$, and set $t_{j^*+1} = t$ if $t_{j^*+1} < t$. Note that $j^*$ depends on $t$. Then, $\{t_j: 0 \le j \le j^*+1\}$ is a partition of $[0, t]$, and hence $\{s_{i,j} = T_i^{-1}(t_j): 0 \le j \le j^*+1\}$ is a partition of $[0, T_i^{-1}(t)]$. Define $R_i(t) = \sum_{j=0}^{j^*}|\phi(s_{i,j+1}) - \phi(s_{i,j})|$. Then, by similar arguments as earlier, we have
$$\Big|R_i(t) - \int_0^{T_i^{-1}(t)}|\phi'(u)|\,du\Big| \le \|\phi''\|_\infty\delta^{-1}\max_{0\le j\le j^*}(s_{i,j+1} - s_{i,j}) = B_{2,r}(t), \text{ say}.$$
Thus, $\|B_{2,r}\|_\infty \le O(r^{-1})$ uniformly over $i$. The proof is completed upon noting that $R_i(t)$ differs from $\sum_{j\in I_t}|\phi(s_{i,j+1}) - \phi(s_{i,j})|$ by at most two terms, and both of them are $O(r^{-1})$ uniformly over $i$ by the same argument as before. The last statement of the proposition is an immediate corollary for the case $T = \mathrm{Id}$ almost surely.
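The $O(r^{-1})$ rate in Proposition 1 can be observed numerically. In the sketch below (hypothetical $\phi(t) = \sin(3\pi t/2)$, whose total variation on $[0,1]$ is 3; hypothetical warp $T(t) = (e^{at}-1)/(e^a-1)$ with $a = 1.5$; and an assumed equispaced grid $t_j = j/(r+1)$, $j = 1, \ldots, r$, which need not match the grid used elsewhere in the paper), the discrete variation $\sum_{j=1}^{r-1}|\phi(s_{j+1}) - \phi(s_j)|$ with $s_j = T^{-1}(t_j)$ is compared with $\int_0^1|\phi'(u)|\,du$. The product of $r$ and the error should stay bounded as $r$ grows.

```python
import numpy as np

A = 1.5                                   # hypothetical warp parameter
TOTAL_VARIATION = 3.0                     # int_0^1 |phi'(u)| du for the phi below


def warp_inv(u):
    """Inverse of the hypothetical warp T(t) = (e^{At} - 1)/(e^A - 1)."""
    return np.log1p(u * np.expm1(A)) / A


def phi(s):
    """Hypothetical template: rises from 0 to 1 on [0, 1/3], falls to -1 at s = 1."""
    return np.sin(1.5 * np.pi * s)


def discrete_variation(r):
    """sum_{j=1}^{r-1} |phi(s_{j+1}) - phi(s_j)| with s_j = T^{-1}(t_j), t_j = j/(r+1)."""
    t = np.arange(1, r + 1) / (r + 1)     # assumed equispaced grid inside (0, 1)
    return np.sum(np.abs(np.diff(phi(warp_inv(t)))))


rs = [50, 100, 200, 400, 800]
errors = [abs(discrete_variation(r) - TOTAL_VARIATION) for r in rs]
for r, err in zip(rs, errors):
    print(f"r={r:4d}  error={err:.5f}  r*error={r * err:.3f}")
```

Here most of the error comes from the two end intervals $[0, s_1]$ and $[s_r, 1]$ that the sum omits, which is the "at most two terms" part of the proof.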

A. Chakraborty and V. M. Panaretos/Functional Registration and Local Variations


Note that the $B_{l,r}$'s are not continuous functions. We can still define their $\|\cdot\|_\infty$ norms, however, since all of them are uniformly bounded functions on $[0,1]$. The following corollary is a consequence of Proposition 1 and the fact that $\int_0^1|\phi'(u)|\,du\in(0,\infty)$.

Corollary 1. Under the assumptions of Proposition 1, we have $\tilde F_{i,d}(t) = \tilde F_i(t) + C_{1,r}(t)$ for all $t\in[0,1]$ almost surely for each $i\ge1$, where $\|C_{1,r}\|_\infty = O(r^{-1})$ almost surely, uniformly over $i$. Further, $F_d(t) = F_\phi(t) + C_{2,r}(t)$ for all $t\in[0,1]$, where $\|C_{2,r}\|_\infty = O(r^{-1})$.

Lemma 2. Assume that $\int_0^1|\phi'(u)|^{-\epsilon}\,du<\infty$ for some $\epsilon>0$. Then $|F_\phi^{-1}(s) - F_\phi^{-1}(t)|\le C_\phi|t-s|^{\epsilon/(1+\epsilon)}$, where $C_\phi^{1+\epsilon} = \int_0^1|\phi'(u)|^{-\epsilon}\,du$. In other words, $F_\phi^{-1}$ is $\alpha$-Hölder continuous with $\alpha = \epsilon/(1+\epsilon)$.

Proof of Lemma 2. The assumption of the lemma implies that $|\phi'|>0$ almost everywhere with respect to Lebesgue measure on $[0,1]$. This fact, along with Zarecki's theorem on the inverse of an absolutely continuous function (see, e.g., p. 271 in Natanson (1955)) applied to $F_\phi$, yields that $F_\phi^{-1}$ is absolutely continuous on $[0,1]$. Thus $F_\phi^{-1}(t) = \int_0^t[F_\phi'(F_\phi^{-1}(u))]^{-1}\,du$. Now, using Hölder's inequality with conjugate exponents $p$ and $q$ and some algebraic manipulations, we obtain
$$|F_\phi^{-1}(s) - F_\phi^{-1}(t)| \le \|\phi'\|_\infty|t-s|^{1/p}\left(\int_0^1|\phi'(u)|^{-q+1}\,du\right)^{1/q}.$$
To complete the proof, choose $q = 1+\epsilon$, which implies that $p = (1+\epsilon)/\epsilon$.

Proposition 2. Assume that the conditions of Proposition 1 and Lemma 2 hold, and let $\alpha = \epsilon/(1+\epsilon)$ as in Lemma 2. Then, for each $i\ge1$:

(a) $\tilde F_i^{-1}$ is $\alpha$-Hölder continuous almost surely;

(b) $\tilde F_{i,d}^-(t) = \tilde F_i^{-1}(t) + \|T_i'\|_\infty D_{1,r}(t)$ for all $t\in[0,1]$ almost surely, where $\|D_{1,r}\|_\infty = O(r^{-\alpha})$ almost surely, uniformly over $i$.

Proof of Proposition 2. (a) Using the definition of $\tilde F_i$, it follows that
$$|\tilde F_i^{-1}(s) - \tilde F_i^{-1}(t)| = |T_i(F_\phi^{-1}(s)) - T_i(F_\phi^{-1}(t))| \le \|T_i'\|_\infty|F_\phi^{-1}(s) - F_\phi^{-1}(t)| \le \|T_i'\|_\infty C_\phi|s-t|^\alpha,$$
where the last inequality follows from Lemma 2. This completes the proof of part (a).

(b) As mentioned earlier, $\tilde F_{i,d}$ is a cadlag step function whose jumps are at most $A_{i,r}$. Thus, if $t\in(\tilde F_{i,d}(t_j), \tilde F_{i,d}(t_{j+1})]$ for some $1\le j\le r-1$, it follows that $\tilde F_{i,d}(\tilde F_{i,d}^-(t)) = \tilde F_{i,d}(t_{j+1}) = t + q_{i,j,r}(t)$, where $q_{i,j,r}(t) = \tilde F_{i,d}(t_{j+1}) - t$. So $|q_{i,j,r}(t)|\le\tilde F_{i,d}(t_{j+1}) - \tilde F_{i,d}(t_j)\le A_{i,r}$, where $A_{i,r}$ is the maximum step size of $\tilde F_{i,d}$ defined earlier. From arguments similar to those used in Proposition 1, it follows that $A_{i,r} = O(r^{-1})$ uniformly in $i$. Thus $\tilde F_{i,d}(\tilde F_{i,d}^-(t)) = t + Q_r(t)$ for all $t\in[0,1]$ almost surely, where $\|Q_r\|_\infty = O(r^{-1})$ almost surely, uniformly over $i$.

From Proposition 1 (through Corollary 1), we know that $\tilde F_{i,d}(s) = \tilde F_i(s) + C_{1,r}(s)$ for all $s\in[0,1]$ almost surely, where $\|C_{1,r}\|_\infty = O(r^{-1})$ almost surely, uniformly over $i$. Letting $s = \tilde F_{i,d}^-(t)$, we now have $t + Q_r(t) = \tilde F_i(\tilde F_{i,d}^-(t)) + C_{1,r}(\tilde F_{i,d}^-(t))$ for all $t$ almost surely. Rearranging terms, we obtain $\tilde F_{i,d}^-(t) = \tilde F_i^{-1}(t + Q_{1,r}(t))$ for all $t\in[0,1]$ almost surely, where $Q_{1,r}(t) = Q_r(t) - C_{1,r}(\tilde F_{i,d}^-(t))$. Thus $\|Q_{1,r}\|_\infty = O(r^{-1})$ almost surely, uniformly over $i$.
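The preceding paragraph uses a step-function fact: $\tilde F_{i,d}(\tilde F_{i,d}^-(t))$ exceeds $t$ by at most the largest jump of $\tilde F_{i,d}$. This can be checked in a few lines of Python. The sketch works under stated assumptions and is not code from the paper. The helper names `step_cdf` and `generalized_inverse` are hypothetical, the grid is equispaced, and the curve $\cos(3\pi t)$ merely stands in for a warped observation.

```python
import bisect
import math


def step_cdf(increments):
    # Normalised cumulative sums: a step distribution function whose k-th
    # jump equals increments[k] / sum(increments).
    total = float(sum(increments))
    cdf, running = [], 0.0
    for inc in increments:
        running += inc
        cdf.append(running / total)
    return cdf


def generalized_inverse(cdf, t):
    # F^-(t) = inf{k : F(k) >= t} as a grid index; min() only guards
    # against rounding in the last cumulative sum when t = 1.
    return min(bisect.bisect_left(cdf, t), len(cdf) - 1)


r = 200
grid = [j / r for j in range(r + 1)]
values = [math.cos(3.0 * math.pi * t) for t in grid]  # hypothetical warped curve
jumps = [abs(b - a) for a, b in zip(values, values[1:])]
cdf = step_cdf(jumps)
max_jump = max(jumps) / sum(jumps)

# Overshoot F(F^-(t)) - t over a fine set of levels t in (0, 1].
worst = max(cdf[generalized_inverse(cdf, m / 1000)] - m / 1000 for m in range(1, 1001))
print(f"largest overshoot {worst:.5f} <= largest jump {max_jump:.5f}")
```

The largest jump is of order $r^{-1}$ here, matching the claim $A_{i,r} = O(r^{-1})$.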
Now, using part (a), we conclude that $\tilde F_{i,d}^-(t) = \tilde F_i^{-1}(t) + \|T_i'\|_\infty D_{1,r}(t)$ for all $t\in[0,1]$ almost surely, where $|D_{1,r}(t)|\le C_\phi|Q_{1,r}(t)|^\alpha$, so that $\|D_{1,r}\|_\infty = O(r^{-\alpha})$ almost surely, uniformly over $i$.

Proof of Theorem 4. (a) Note that
$$\hat F_d^*(t) = n^{-1}\sum_{i=1}^n\tilde F_{i,d}^-(t) = n^{-1}\sum_{i=1}^n\left\{\tilde F_i^{-1}(t) + \|T_i'\|_\infty D_{1,r}(t)\right\} = \hat F_\phi^{-1}(t) + \left(n^{-1}\sum_{i=1}^n\|T_i'\|_\infty\right)D_{1,r}(t) = \hat F_\phi^{-1}(t) + D_{2,r}(t)$$
for all $t\in[0,1]$ almost surely. Here $\|D_{2,r}\|_\infty = O(r^{-\alpha})$ almost surely, since $\|D_{1,r}\|_\infty = O(r^{-\alpha})$ almost surely and $n^{-1}\sum_{i=1}^n\|T_i'\|_\infty = E(\|T_1'\|_\infty) + o(1)$ almost surely. Thus, it follows from Theorem 2.18 in Villani (2003) that
$$d_W^2(\hat F_d, F_\phi) = \|\hat F_d^* - F_\phi^{-1}\|_2^2 \le 2\|\hat F_\phi^{-1} - F_\phi^{-1}\|_2^2 + 2\|D_{2,r}\|_2^2 \le 2d_W^2(\hat F_\phi, F_\phi) + O(r^{-2\alpha})$$
almost surely. Combining this statement with part (a) of Theorems 2 and 3 completes the proof of part (a) of Theorem 4.

(b) Next, note that
$$\begin{aligned}
\hat T_{i,d}^*(t) &= n^{-1}\sum_{l=1}^n\tilde F_{l,d}^-(\tilde F_{i,d}(t)) = n^{-1}\sum_{l=1}^n\left\{\tilde F_l^{-1}(\tilde F_{i,d}(t)) + \|T_l'\|_\infty D_{1,r}(\tilde F_{i,d}(t))\right\}\\
&= n^{-1}\sum_{l=1}^n\tilde F_l^{-1}(\tilde F_i(t) + C_{1,r}(t)) + n^{-1}\sum_{l=1}^n\|T_l'\|_\infty D_{1,r}(\tilde F_{i,d}(t))\\
&= n^{-1}\sum_{l=1}^n\left[\tilde F_l^{-1}(\tilde F_i(t)) + \left\{\tilde F_l^{-1}(\tilde F_i(t) + C_{1,r}(t)) - \tilde F_l^{-1}(\tilde F_i(t))\right\}\right] + n^{-1}\sum_{l=1}^n\|T_l'\|_\infty D_{1,r}(\tilde F_{i,d}(t))\\
&= \hat T_i^{-1}(t) + n^{-1}\sum_{l=1}^n\left\{\tilde F_l^{-1}(\tilde F_i(t) + C_{1,r}(t)) - \tilde F_l^{-1}(\tilde F_i(t))\right\} + n^{-1}\sum_{l=1}^n\|T_l'\|_\infty D_{1,r}(\tilde F_{i,d}(t))
\end{aligned}$$
for all $t\in[0,1]$ almost surely. By part (a) of Proposition 2, we have $|\tilde F_l^{-1}(\tilde F_i(t) + C_{1,r}(t)) - \tilde F_l^{-1}(\tilde F_i(t))|\le\|T_l'\|_\infty D_{3,r}(t)$ for all $t\in[0,1]$ almost surely, where $\|D_{3,r}\|_\infty = O(r^{-\alpha})$ almost surely, uniformly over $i$. Thus $\sup_{t\in[0,1]}n^{-1}\sum_{l=1}^n|\tilde F_l^{-1}(\tilde F_i(t) + C_{1,r}(t)) - \tilde F_l^{-1}(\tilde F_i(t))|\le\{E(\|T_1'\|_\infty) + o(1)\}O(r^{-\alpha})$ almost surely. Similar arguments yield $\sup_{t\in[0,1]}n^{-1}\sum_{l=1}^n\|T_l'\|_\infty|D_{1,r}(\tilde F_{i,d}(t))|\le\{E(\|T_1'\|_\infty) + o(1)\}O(r^{-\alpha})$ almost surely. Thus,
$$\hat T_{i,d}^*(t) = \hat T_i^{-1}(t) + D_{4,r}(t) \tag{7}$$
for all $t\in[0,1]$ almost surely, where $\|D_{4,r}\|_\infty = O(r^{-\alpha})$ almost surely, uniformly over $i$. Consequently,
$$\|\hat T_{i,d}^* - T_i^{-1}\|_\infty \le \|\hat T_i^{-1} - T_i^{-1}\|_\infty + O(r^{-\alpha})$$
almost surely, where the $O(1)$ term is uniform over $i$. This, along with part (b) of Theorem 2, shows that $\|\hat T_{i,d}^* - T_i^{-1}\|_\infty\to0$ as $n\to\infty$ almost surely for all $i\ge1$. Equation (7) implies that $\sqrt n(\hat T_{i,d}^* - T_i^{-1}) = \sqrt n(\hat T_i^{-1} - T_i^{-1}) + O(\sqrt n\,r^{-\alpha})$ in $L^2[0,1]$. This, in conjunction with part (b) of Theorem 3, proves that $\sqrt n(\hat T_{i,d}^* - T_i^{-1})$ has the same asymptotic distribution as $\sqrt n(\hat T_i^{-1} - T_i^{-1})$ in the $L^2[0,1]$ topology.

Next, we consider $\hat T_{i,d}(t) = \tilde F_{i,d}^-(\hat F_d(t)) = \tilde F_i^{-1}(\hat F_d(t)) + \|T_i'\|_\infty D_{1,r}(\hat F_d(t))$ for all $t\in[0,1]$ almost surely (from part (b) of Proposition 2). Note that $\hat F_d(t) = \{n^{-1}\sum_{l=1}^n\tilde F_{l,d}^-\}^-(t) = \{G_n + D_{5,r}\}^-(t)$, where $G_n(s) = n^{-1}\sum_{l=1}^n\tilde F_l^{-1}(s)$ and $D_{5,r}(s) = n^{-1}\sum_{l=1}^n\|T_l'\|_\infty D_{1,r}(s)$. Thus $\|D_{5,r}\|_\infty = O(r^{-\alpha})$. Also note that $G_n$ is a strictly increasing homeomorphism of $[0,1]$. Define $\tilde G_{n,r} = G_n + D_{5,r} = n^{-1}\sum_{l=1}^n\tilde F_{l,d}^-$. Then $\tilde G_{n,r}$ is an increasing (not necessarily strictly increasing) function from $[0,1]$ onto $[0,1]$. In fact, each $\tilde F_{l,d}^-$ is left-continuous with right limits (being the generalized inverse of the cadlag function $\tilde F_{l,d}$), so $\tilde G_{n,r}$ is also left-continuous with right limits.

If $t\in(\tilde G_{n,r}(v), \tilde G_{n,r}(v+)]$ for some $v\in[0,1]$ with $\tilde G_{n,r}(v+)>\tilde G_{n,r}(v)$, then $\tilde G_{n,r}(\hat F_d(t)) = \tilde G_{n,r}(\tilde G_{n,r}^-(t)) = \tilde G_{n,r}(v) = t + (\tilde G_{n,r}(v) - t)$. Now,
$$|\tilde G_{n,r}(v) - t| \le |\tilde G_{n,r}(v+) - \tilde G_{n,r}(v)| = |G_n(v+) - G_n(v) + D_{5,r}(v+) - D_{5,r}(v)| = |D_{5,r}(v+) - D_{5,r}(v)| = O(r^{-\alpha})$$
uniformly in $t$ almost surely, where the penultimate equality follows from the continuity of $G_n$. So, in these cases, $G_n(\hat F_d(t)) = \tilde G_{n,r}(\hat F_d(t)) - D_{5,r}(\hat F_d(t)) = t + O(r^{-\alpha})$ uniformly in $t$ almost surely. That is, $t = G_n(\hat F_d(t)) + O(r^{-\alpha})$ uniformly in $t$ almost surely.

Next, suppose that for some $v_1<v_2$ we have $\tilde G_{n,r}(v_1) = \tilde G_{n,r}(v_2)$, with $\tilde G_{n,r}(v)<\tilde G_{n,r}(v_1)$ for $v<v_1$ and $\tilde G_{n,r}(v)>\tilde G_{n,r}(v_2)$ for $v>v_2$. Let $t = \tilde G_{n,r}(v_1) = \tilde G_{n,r}(v_2)$. If $v_1$ is a continuity point of $\tilde G_{n,r}$, then $\tilde G_{n,r}(\hat F_d(t)) = t$. If not, this case is already covered by the previous paragraph. In the former case, we have $t = G_n(\hat F_d(t)) + O(r^{-\alpha})$ uniformly over $t$ almost surely.

Finally, if $t$ is a point of both continuity and strict increase of $\tilde G_{n,r}$, then $\tilde G_{n,r}(\hat F_d(t)) = t$ as well, which implies that $t = G_n(\hat F_d(t)) + O(r^{-\alpha})$ uniformly over $t$ almost surely. Thus, all possibilities are exhausted. Let us denote the $O(r^{-\alpha})$ term by $D_{6,r}(\cdot)$.

Now note that $G_n^{-1} = (n^{-1}\sum_{l=1}^n\tilde F_l^{-1})^{-1} = (n^{-1}\sum_{l=1}^nT_l\circ F_\phi^{-1})^{-1} = F_\phi\circ\bar T^{-1}$, where $\bar T = n^{-1}\sum_{l=1}^nT_l$. It thus follows from the work above that $\hat F_d(t) = F_\phi\{\bar T^{-1}(t - D_{6,r}(t))\}$. Recall that $\tilde F_i^{-1} = T_i\circ F_\phi^{-1}$. Recall also that $\hat T_{i,d}(t) = \tilde F_i^{-1}(\hat F_d(t)) + \|T_i'\|_\infty D_{1,r}(\hat F_d(t))$ for all $t\in[0,1]$ almost surely, as obtained earlier. Since $\hat F_d(t) = F_\phi\{\bar T^{-1}(t - D_{6,r}(t))\}$, it follows from this decomposition of $\hat T_{i,d}(t)$ that $\hat T_{i,d}(t) = T_i\{\bar T^{-1}(t - D_{6,r}(t))\} + \|T_i'\|_\infty D_{1,r}(\hat F_d(t))$ for all $t\in[0,1]$ almost surely. Since $\inf_{t\in[0,1]}T'(t)\ge\delta>0$, it follows that $\inf_{t\in[0,1]}\bar T'(t)\ge n^{-1}\sum_{l=1}^n\inf_{t\in[0,1]}T_l'(t)\ge\delta>0$. So, by a Taylor expansion, $T_i\{\bar T^{-1}(t - D_{6,r}(t))\} = T_i(\bar T^{-1}(t)) + \|T_i'\|_\infty D_{7,r}(t)$ for all $t\in[0,1]$ almost surely. Here $\|D_{7,r}\|_\infty = O(r^{-\alpha})$ almost surely, and the $O(1)$ term is uniform over $i$. Combining the above findings, we arrive at
$$\begin{aligned}
\hat T_{i,d}(t) &= \tilde F_i^{-1}(\hat F_d(t)) + \|T_i'\|_\infty D_{1,r}(\hat F_d(t)) = \tilde F_i^{-1}(G_n^{-1}(t) + D_{7,r}(t)) + \|T_i'\|_\infty D_{1,r}(\hat F_d(t))\\
&= T_i(\bar T^{-1}(t)) + \|T_i'\|_\infty D_{7,r}(t) + \|T_i'\|_\infty D_{1,r}(\hat F_d(t)),
\end{aligned}$$
where the last equality follows from the discussion in the previous paragraph. Recall that $\hat T_i = T_i\circ\bar T^{-1}$ in the fully observed case. Since $\|D_{1,r}\|_\infty = O(r^{-\alpha})$ almost surely, uniformly over $i$, we obtain $\hat T_{i,d}(t) = \hat T_i(t) + \|T_i'\|_\infty D_{8,r}(t)$ for all $t\in[0,1]$ almost surely, where $\|D_{8,r}\|_\infty = O(r^{-\alpha})$ almost surely, uniformly over $i$. Consequently, $\|\hat T_{i,d} - T_i\|_\infty\le\|\hat T_i - T_i\|_\infty + O(1)r^{-\alpha}$ almost surely. Combined with part (b) of Theorem 2, this shows that $\|\hat T_{i,d} - T_i\|_\infty\to0$ as $n\to\infty$ almost surely for all $i\ge1$. The representation $\hat T_{i,d} = \hat T_i + \|T_i'\|_\infty D_{8,r}$ implies that $\sqrt n(\hat T_{i,d} - T_i) = \sqrt n(\hat T_i - T_i) + O(\sqrt n\,r^{-\alpha})$ in $L^2[0,1]$. This, in conjunction with part (b) of Theorem 3, proves that $\sqrt n(\hat T_{i,d} - T_i)$ has the same asymptotic distribution as $\sqrt n(\hat T_i - T_i)$ in the $L^2[0,1]$ topology. This completes the proof of part (b) of Theorem 4.

(c) Next, we register the warped functional observations. As mentioned earlier, the warped observations are only recorded on a discrete grid, so the registration algorithm for the fully observed case cannot be applied directly. As a pre-processing step, we therefore first smooth the warped discrete observations. We do this with the Nadaraya-Watson kernel regression estimator, as follows. Let $k(\cdot)$ be any kernel supported on $[-1,1]$ and choose a bandwidth $h>0$. Then the smoothed version of $\tilde X_{i,d}$ is given by
$$X_i^\dagger(t) = \frac{\sum_{j=1}^rk\left(\frac{t-t_j}{h}\right)\tilde X_i(t_j)}{\sum_{j=1}^rk\left(\frac{t-t_j}{h}\right)} = \xi_i\,\frac{\sum_{j=1}^rk\left(\frac{t-t_j}{h}\right)\phi(T_i^{-1}(t_j))}{\sum_{j=1}^rk\left(\frac{t-t_j}{h}\right)},\qquad t\in[0,1].$$
Now note that
$$|X_i^\dagger(t) - \tilde X_i(t)| = \left|\xi_i\,\frac{\sum_{j=1}^r\{\phi(T_i^{-1}(t_j)) - \phi(T_i^{-1}(t))\}k\left(\frac{t-t_j}{h}\right)}{\sum_{j=1}^rk\left(\frac{t-t_j}{h}\right)}\right| \le \|\phi'\|_\infty\delta^{-1}|\xi_i|\,\frac{\sum_{j=1}^r|t_j - t|\,k\left(\frac{t-t_j}{h}\right)}{\sum_{j=1}^rk\left(\frac{t-t_j}{h}\right)} \le c|\xi_i|h$$
for all $t\in[0,1]$ almost surely, where $c$ is a constant that depends on neither $i$ nor $t$. The first inequality above follows from arguments similar to those used in the proof of Theorem 1. The second inequality follows from the fact that $k(\cdot)$ is supported on $[-1,1]$, so that only those $j$'s with $|t_j - t|\le h$ contribute to the sum in the numerator. Thus $\|X_i^\dagger - \tilde X_i\|_\infty\le c|\xi_i|h$ almost surely.

We register the warped discrete observation $\tilde X_{i,d}$ by defining $\hat X_i^* = X_i^\dagger\circ\hat T_{i,d}$ for each $1\le i\le n$. Observe that
$$\begin{aligned}
|\hat X_i^*(t) - \hat X_i(t)| &\le |\hat X_i^*(t) - \tilde X_i(\hat T_{i,d}(t))| + |\tilde X_i(\hat T_{i,d}(t)) - \hat X_i(t)|\\
&\le \|X_i^\dagger - \tilde X_i\|_\infty + |\xi_i|\,|\phi(T_i^{-1}(\hat T_{i,d}(t))) - \phi(T_i^{-1}(\hat T_i(t)))|\\
&\le c|\xi_i|h + |\xi_i|\,|\phi(T_i^{-1}(\hat T_i(t) + \|T_i'\|_\infty D_{8,r}(t))) - \phi(T_i^{-1}(\hat T_i(t)))|\\
&\le c|\xi_i|h + |\xi_i|\,\|T_i'\|_\infty|D_{8,r}(t)|\,\|\phi'\|_\infty\delta^{-1}\\
&\le O(1)|\xi_i|(h + \|T_i'\|_\infty r^{-\alpha})
\end{aligned}\tag{8}$$
for all $t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $i$ and $t$. The last two inequalities above follow from a first-order Taylor expansion and the fact that $\|D_{8,r}\|_\infty = O(r^{-\alpha})$ almost surely, uniformly over $i$. Hence,
$$\|\hat X_i^* - \hat X_i\|_\infty = O(1)|\xi_i|(h + \|T_i'\|_\infty r^{-\alpha})$$
almost surely. In conjunction with part (c) of Theorem 2, this shows that $\|\hat X_i^* - X_i\|_\infty\to0$ as $n\to\infty$ almost surely for all $i\ge1$. Equation (8) implies that $\sqrt n(\hat X_i^* - X_i) = \sqrt n(\hat X_i - X_i) + O(\sqrt n(h + r^{-\alpha}))$ in $L^2[0,1]$. Invoking part (c) of Theorem 3 thus establishes that $\sqrt n(\hat X_i^* - X_i)$ has the same asymptotic distribution as $\sqrt n(\hat X_i - X_i)$ in the $L^2[0,1]$ topology. This completes the proof of part (c) of Theorem 4.

(d) Next, define the random measure induced by $\hat X_i^*$ through
$$\begin{aligned}
\hat F_i^*(t) &= \frac{\sum_{j\in I_t}|\hat X_i^*(t_{j+1}) - \hat X_i^*(t_j)|}{\sum_{j=1}^{r-1}|\hat X_i^*(t_{j+1}) - \hat X_i^*(t_j)|} = \frac{\sum_{j\in I_t}|X_i^\dagger(\hat T_{i,d}(t_{j+1})) - X_i^\dagger(\hat T_{i,d}(t_j))|}{\sum_{j=1}^{r-1}|X_i^\dagger(\hat T_{i,d}(t_{j+1})) - X_i^\dagger(\hat T_{i,d}(t_j))|}\\
&= \frac{\sum_{j\in I_t}|\tilde X_i(\hat T_{i,d}(t_{j+1})) - \tilde X_i(\hat T_{i,d}(t_j))| + O(h)|\xi_i|}{\sum_{j=1}^{r-1}|\tilde X_i(\hat T_{i,d}(t_{j+1})) - \tilde X_i(\hat T_{i,d}(t_j))| + O(h)|\xi_i|}
\end{aligned}$$
for all $t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $i$ and $t$. The last equality follows from the fact that $\|X_i^\dagger - \tilde X_i\|_\infty\le c|\xi_i|h$ almost surely. Also note that, by the definition of $\tilde X_i$, the factor $|\xi_i|$ cancels from the numerator and the denominator. We use the fact that $\hat T_{i,d}(t) = \hat T_i(t) + \|T_i'\|_\infty D_{8,r}(t)$ with $\|D_{8,r}\|_\infty = O(r^{-\alpha})$ almost surely, together with arguments similar to those used in the proof of Proposition 1. This gives $\hat F_i^*(t) = \hat F_\phi(t) + O(1)(h + \|T_i'\|_\infty r^{-\alpha})$ for all $t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $i$ and $t$ almost surely. Now, using Lemma 2 and arguments similar to those used in the proof of part (b) of Proposition 2, we have $(\hat F_i^*)^-(t) = \hat F_\phi^{-1}(t) + O(1)r^{-\alpha}(h + \|T_i'\|_\infty r^{-\alpha})$ for all $t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $i$ and $t$ almost surely. Thus,
$$d_W^2(\hat F_i^*, F_\phi) = \|(\hat F_i^*)^- - F_\phi^{-1}\|_2^2 \le 2\|\hat F_\phi^{-1} - F_\phi^{-1}\|_2^2 + O(1)r^{-2\alpha}(h^2 + r^{-2\alpha}) = 2d_W^2(\hat F_\phi, F_\phi) + O(1)r^{-2\alpha}(h^2 + r^{-2\alpha})$$
almost surely. Combining this statement with part (d) of Theorems 2 and 3 completes the proof of part (d) of Theorem 4.

(e) Next, define $\bar X_r^* = n^{-1}\sum_{i=1}^n\hat X_i^*$. Since $\|\hat X_i^* - \hat X_i\|_\infty = O(1)|\xi_i|(h + \|T_i'\|_\infty r^{-\alpha})$ almost surely, it follows that
$$\|(\bar X_r^* - \mu) - (\bar X_r - \mu)\|_\infty \le n^{-1}\sum_{i=1}^n\|\hat X_i^* - \hat X_i\|_\infty \le O(1)\left\{h + r^{-\alpha}n^{-1}\sum_{i=1}^n\|T_i'\|_\infty\right\} \le O(1)(h + r^{-\alpha}) \tag{9}$$
almost surely, since $E(\|T_1'\|_\infty)<\infty$. Along with part (e) of Theorem 2, this shows that $\|\bar X_r^* - \mu\|_\infty\to0$ as $n\to\infty$ almost surely. Equation (9) implies that $\sqrt n(\bar X_r^* - \mu) = \sqrt n(\bar X_r - \mu) + O(\sqrt n(h + r^{-\alpha}))$ in $L^2[0,1]$. So, by part (e) of Theorem 3, $\sqrt n(\bar X_r^* - \mu)$ has the same asymptotic distribution as $\sqrt n(\bar X_r - \mu)$ in the $L^2[0,1]$ topology, and the proof of part (e) of Theorem 4 is complete.

(f) Next, we consider the empirical covariance operator of the $\hat X_i^*$'s, which we denote by $\widehat{\mathcal K}_r^* = n^{-1}\sum_{i=1}^n(\hat X_i^* - \bar X_r^*)\otimes(\hat X_i^* - \bar X_r^*)$. Recall $S_1 = n^{-1}\sum_{i=1}^n(\hat X_i - \mu)\otimes(\hat X_i - \mu)$ from the proof of part (f) of Theorem 3. Some straightforward manipulations yield
$$\begin{aligned}
\widehat{\mathcal K}_r^* &= S_1 + n^{-1}\sum_{i=1}^n(\hat X_i^* - \hat X_i)\otimes(\hat X_i^* - \hat X_i) - (\bar X_r^* - \mu)\otimes(\bar X_r^* - \mu)\\
&\quad + n^{-1}\sum_{i=1}^n\left\{(\hat X_i^* - \hat X_i)\otimes(\hat X_i - \mu) + (\hat X_i - \mu)\otimes(\hat X_i^* - \hat X_i)\right\}\\
&= S_1 + W_1 - W_2 + W_3,\ \text{say}.
\end{aligned}$$
Note that $|||W_1|||\le n^{-1}\sum_{i=1}^n\|\hat X_i^* - \hat X_i\|_2^2\le O(1)\{h^2n^{-1}\sum_{i=1}^n|\xi_i|^2 + r^{-2\alpha}n^{-1}\sum_{i=1}^n\|T_i'\|_\infty^2\} = O(1)(h^2 + r^{-2\alpha})$ almost surely. Next, from the previous paragraph, it follows that $|||W_2|||\le\|\bar X_r^* - \mu\|_2^2\le O(1)(h^2 + r^{-2\alpha}) + 2\|\bar X_r - \mu\|_\infty^2$. Moreover, $|||W_3|||\le2n^{-1}\sum_{i=1}^n\|\hat X_i^* - \hat X_i\|_2\|\hat X_i - \mu\|_2\le O(1)n^{-1}\sum_{i=1}^n\{h|\xi_i| + r^{-\alpha}\|T_i'\|_\infty\}\|\hat X_i - \mu\|_2$ almost surely. Observe that
$$n^{-1}\sum_{i=1}^n|\xi_i|\,\|\hat X_i - \mu\|_2 = n^{-1}\sum_{i=1}^n|\xi_i|\,\|\xi_i\phi\circ\bar T^{-1} - E(\xi_1)\phi\|_2 \le n^{-1}\sum_{i=1}^n|\xi_i|\,|\xi_i - E(\xi_1)|\,\|\phi\circ\bar T^{-1}\|_2 + n^{-1}\sum_{i=1}^n|\xi_i|\,|E(\xi_1)|\,\|\phi\circ\bar T^{-1} - \phi\|_2.$$
Since $\|\phi\circ\bar T^{-1} - \phi\|_\infty\to0$ almost surely, the first term above is $O(1)$ almost surely and the second term is $o(1)$ almost surely. Similar arguments show that $n^{-1}\sum_{i=1}^n\|T_i'\|_\infty\|\hat X_i - \mu\|_2 = O(1)$ almost surely. Thus $|||W_3|||\le O(1)(h + r^{-\alpha})$ almost surely. Also, $S_2$ in the proof of part (f) of Theorem 3 satisfies $|||S_2||| = O_P(n^{-1})$. Combining the above facts and using the decomposition of $\widehat{\mathcal K}_r$ in the proof of part (f) of Theorem 3, it follows that
$$\widehat{\mathcal K}_r^* = S_1 + O(1)(h + r^{-\alpha} + \|\bar X_r - \mu\|_\infty^2) = \widehat{\mathcal K}_r + O(1)(h + r^{-\alpha} + \|\bar X_r - \mu\|_\infty^2) \tag{10}$$
almost surely. This, along with part (f) of Theorem 2, shows that $|||\widehat{\mathcal K}_r^* - \mathcal K|||\to0$ as $n\to\infty$ almost surely. By part (e) of Theorem 3, $\sqrt n\|\bar X_r - \mu\|_\infty = O_P(1)$ as $n\to\infty$. So equation (10) implies that $\sqrt n(\widehat{\mathcal K}_r^* - \mathcal K) = \sqrt n(\widehat{\mathcal K}_r - \mathcal K) + O(\sqrt n(h + r^{-\alpha}))$ in $L^2[0,1]$. This, in conjunction with part (f) of Theorem 3, proves that $\sqrt n(\widehat{\mathcal K}_r^* - \mathcal K)$ has the same asymptotic distribution as $\sqrt n(\widehat{\mathcal K}_r - \mathcal K)$ in the Hilbert-Schmidt topology.

For the convergence of the empirical covariance kernel $\hat K_r^*(s,t) = n^{-1}\sum_{i=1}^n[\hat X_i^*(s) - \bar X_r^*(s)][\hat X_i^*(t) - \bar X_r^*(t)]$, we follow the same decomposition as above for the case of the operator. All the bounds used in that proof remain valid in the sup-norm, so the same arguments give
$$\hat K_r^*(s,t) = \hat K_r(s,t) + O(1)(h + r^{-\alpha} + \|\bar X_r - \mu\|_\infty^2) \tag{11}$$
for all $s,t\in[0,1]$ almost surely, where the $O(1)$ term is uniform in $s,t$ almost surely. This, along with part (f) of Theorem 2, shows that $\|\hat K_r^* - K\|_\infty\to0$ as $n\to\infty$ almost surely. Equation (11) implies that $\{\sqrt n(\hat K_r^*(s,t) - K(s,t)): s,t\in[0,1]\} = \{\sqrt n(\hat K_r(s,t) - K(s,t)): s,t\in[0,1]\} + O(\sqrt n(h + r^{-\alpha}))$ in $L^2([0,1]^2)$, with the $O(1)$ term uniform in $s,t$. This, in conjunction with part (f) of Theorem 3, proves that $\{\sqrt n(\hat K_r^*(s,t) - K(s,t)): s,t\in[0,1]\}$ has the same asymptotic distribution as $\{\sqrt n(\hat K_r(s,t) - K(s,t)): s,t\in[0,1]\}$ in the $L^2([0,1]^2)$ topology.

To prove the strong consistency and the weak convergence of the estimated eigenfunction, we use perturbation bounds for compact operators (see, e.g., Ch. 5 of Hsing and Eubank (2015)). The leading eigenfunction $\hat\phi^*$ of $\widehat{\mathcal K}_r^*$ satisfies the inequality $\|\hat\phi^* - \phi\|_2\le2\sqrt2\lambda^{-1}|||\widehat{\mathcal K}_r^* - \mathcal K|||\to0$ as $n\to\infty$ almost surely. Further, Theorem 5.1.8 of Hsing and Eubank (2015), specifically equation (5.27), implies that $\sqrt n(\hat\phi^* - \phi)$ has the same asymptotic distribution (in $L^2[0,1]$) as $S\sqrt n(\widehat{\mathcal K}_r^* - \mathcal K)\phi$. In our setup, $S = -\lambda^{-1}(I - \phi\otimes\phi)$, where $\lambda = \mathrm{Var}(\xi_1)$ is the leading eigenvalue of $\mathcal K$ and $I$ is the identity operator. Thus, from the results already established, the asymptotic distribution of $\sqrt n(\hat\phi^* - \phi)$ is that of $-\lambda^{-1}(I - \phi\otimes\phi)\sqrt n(\widehat{\mathcal K}_r - \mathcal K)\phi$. Using the expression for the asymptotic distribution of $\sqrt n(\widehat{\mathcal K}_r - \mathcal K)$ obtained in part (f) of Theorem 3 and some simple calculations, the asymptotic distribution of $\sqrt n(\hat\phi^* - \phi)$ is that of $Y\times\phi' - \langle Y\times\phi', \phi\rangle\phi$, which is the same as in Theorem 3. The strong consistency and weak convergence of $\hat\xi_i^*$ are proved in direct analogy to those of $\hat\xi_i$, using part (c) and the above facts. The proof of part (f) of Theorem 4 is now complete.

Proof of Theorem 5.
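First observe that
$$\begin{aligned}
|\tilde F_{i,w}(t) - \tilde F_i(t)| &\le \left|\frac{\int_0^t|\hat X_{i,w}^{(1)}(u)|\,du}{\int_0^1|\hat X_{i,w}^{(1)}(u)|\,du} - \frac{\int_0^t|\hat X_{i,w}^{(1)}(u)|\,du}{\int_0^1|\tilde X_i'(u)|\,du}\right| + \left|\frac{\int_0^t|\hat X_{i,w}^{(1)}(u)|\,du}{\int_0^1|\tilde X_i'(u)|\,du} - \frac{\int_0^t|\tilde X_i'(u)|\,du}{\int_0^1|\tilde X_i'(u)|\,du}\right|\\
&\le \frac{2\int_0^1|\hat X_{i,w}^{(1)}(u) - \tilde X_i'(u)|\,du}{\int_0^1|\tilde X_i'(u)|\,du} \le \frac{2\|\hat X_{i,w}^{(1)} - \tilde X_i'\|_2}{|\xi_i|\int_0^1|\phi'(u)|\,du} = d_\phi|\xi_i|^{-1}A_{i,r},\ \text{say},
\end{aligned}$$
$$\Longrightarrow\quad\|\tilde F_{i,w} - \tilde F_i\|_\infty\le d_\phi|\xi_i|^{-1}A_{i,r}. \tag{12}$$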
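Inequality (12) uses only the normalisation of the local variation distribution function, so it holds verbatim when the integrals are replaced by Riemann sums. The Python sketch below is illustrative and not taken from the paper. It compares the sup-distance between two such distribution functions with $2\int_0^1|\hat g - g|/\int_0^1|g|$, which is the quantity that (12) then bounds by $d_\phi|\xi_i|^{-1}A_{i,r}$ via the Cauchy-Schwarz inequality. The functions `true_deriv` and `estimated_deriv` are hypothetical stand-ins for $\tilde X_i'$ and $\hat X^{(1)}_{i,w}$.

```python
import math


def local_variation_cdf(deriv, n=2000):
    # Midpoint-rule version of F(t) = int_0^t |X'| / int_0^1 |X'| at t_k = k / n.
    h = 1.0 / n
    cells = [abs(deriv((k + 0.5) * h)) * h for k in range(n)]
    mass = sum(cells)
    cdf, running = [], 0.0
    for c in cells:
        running += c
        cdf.append(running / mass)
    return cdf, mass


def l1_distance(f, g, n=2000):
    # Midpoint-rule approximation of int_0^1 |f - g| on the same cells.
    h = 1.0 / n
    return sum(abs(f((k + 0.5) * h) - g((k + 0.5) * h)) * h for k in range(n))


def true_deriv(u):
    # Hypothetical derivative of an (unwarped) curve sin(2*pi*u).
    return 2.0 * math.pi * math.cos(2.0 * math.pi * u)


def estimated_deriv(u):
    # Hypothetical perturbed derivative estimate.
    return true_deriv(u) + 0.3 * math.sin(7.0 * u)


F_true, mass = local_variation_cdf(true_deriv)
F_est, _ = local_variation_cdf(estimated_deriv)
sup_gap = max(abs(a - b) for a, b in zip(F_true, F_est))
bound = 2.0 * l1_distance(estimated_deriv, true_deriv) / mass
print(f"sup gap = {sup_gap:.4f} <= bound = {bound:.4f}")
```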

The term $A_{i,r}$ is key to our proof, so we first bound $E\{A_{i,r}^2\}$. To achieve this, we first bound $E\{A_{i,r}^2\mid\xi_i,T_i\}$ using standard tools from nonparametric regression. That is, we estimate the MSE for the regression problem $Y_{ij} = \xi_i\phi(T_i^{-1}(t_j)) + \epsilon_{ij}$ and integrate this MSE over $u\in[0,1]$, with $\xi_i$ and $T_i$ held fixed. The expression for the MSE in the deterministic design case is the same as the conditional MSE (given the design points) in the random design case with the design distribution uniform on $[0,1]$. Next, observe that $\mathrm{Var}(\hat X_{i,w}^{(1)}(u)\mid\xi_i,T_i)$ does not depend on $\xi_i$ and $T_i$, and is thus uniform over $i$ (since the $\epsilon_{ij}$'s are i.i.d.). For $u\in[h_1,1-h_1]$, the expression for this variance is given on p. 137 of Wand and Jones (1995) and equals $O((rh_1)^{-3})$, where the $O(1)$ term depends on $k_1$, is bounded, and is uniform over $u\in[h_1,1-h_1]$. Next, we have to account for the boundary points. Let $u = \alpha h_1$ for some $\alpha\in[0,1)$. It follows from a similar analysis that, even in this case, $\mathrm{Var}(\hat X_{i,w}^{(1)}(u)\mid\xi_i,T_i) = O((rh_1)^{-3})$, where the $O(1)$ term is integrable over $\alpha\in[0,1)$ (see, e.g., pp. 244-247 in Schimek (2000)). Similar estimates also hold for $u\in[1-h_1,1]$, say $u = 1-\alpha h_1$. Hence $\mathrm{Var}(\hat X_{i,w}^{(1)}(u)\mid\xi_i,T_i) = O((rh_1)^{-3})$ for all $u\in[0,1]$, with the $O(1)$ term integrable over $u\in[0,1]$.

Next, we consider the bias. In our case, the degree of the fitted polynomial is one more than the order of the derivative being estimated. Thus, applying Taylor's formula and using the expressions in Thm. 9.1 and pp. 244-247 of Schimek (2000), we have $|\mathrm{Bias}(\hat X_{i,w}^{(1)}(u)\mid\xi_i,T_i)| = \|\tilde X_i^{(3)}\|_\infty O(h_1^2) + \|\tilde X_i^{(4)}\|_\infty o(h_1^2)$ for all $u\in[0,1]$. Here, the $O(1)$ and $o(1)$ terms are non-random and integrable in $u\in[0,1]$. We combine this with the moment assumptions on the sup-norms of the derivatives of $T$, the independence of the $\xi_i$'s and the $T_i$'s, and the assumption that $\inf_{t\in[0,1]}T'(t)\ge\delta>0$. It follows that
$$E\{A_{i,r}^2\} = O(h_1^4) + O((rh_1^3)^{-1}), \tag{13}$$
where the $O(1)$ terms are bounded and do not depend on $i$ (the $\tilde X_i$'s are i.i.d.). By Markov's inequality, this also implies that
$$n^{-1}\sum_{i=1}^nA_{i,r}^2 = O_P\left(h_1^4 + (rh_1^3)^{-1}\right). \tag{14}$$

We now proceed with the rest of the proof. First, let $u_{i,t} = \tilde F_{i,w}^{-1}(t)$. From (12), it follows that $\tilde F_i(u_{i,t}) = t - \tilde A_{i,r}(t)$, where $\|\tilde A_{i,r}\|_\infty\le d_\phi|\xi_i|^{-1}A_{i,r}$. Thus, using part (a) of Proposition 2,
$$|\tilde F_{i,w}^{-1}(t) - \tilde F_i^{-1}(t)| = |u_{i,t} - \tilde F_i^{-1}(t)| = |\tilde F_i^{-1}(t - \tilde A_{i,r}(t)) - \tilde F_i^{-1}(t)| \le \|T_i'\|_\infty c_\phi'|\xi_i|^{-\alpha}A_{i,r}^\alpha$$
for a constant $c_\phi'$. So $\|\tilde F_{i,w}^{-1} - \tilde F_i^{-1}\|_\infty\le\|T_i'\|_\infty c_\phi'|\xi_i|^{-\alpha}A_{i,r}^\alpha$. Thus $\hat F_{\phi,e}^{-1} = n^{-1}\sum_{i=1}^n\tilde F_{i,w}^{-1} = n^{-1}\sum_{i=1}^n\tilde F_i^{-1} + \tilde B_r = \hat F_\phi^{-1} + \tilde B_r$, where $\|\tilde B_r\|_\infty\le c_\phi'n^{-1}\sum_{i=1}^n\|T_i'\|_\infty|\xi_i|^{-\alpha}A_{i,r}^\alpha$. Define $R_r = n^{-1}\sum_{i=1}^n\|T_i'\|_\infty|\xi_i|^{-\alpha}A_{i,r}^\alpha$. We apply Hölder's inequality (with exponents $2/(2-\alpha)$ and $2/\alpha$), the law of large numbers, the independence of the $T_i$'s and the $\xi_i$'s, and (14). This gives
$$R_r \le \left[n^{-1}\sum_{i=1}^n\|T_i'\|_\infty^{2/(2-\alpha)}|\xi_i|^{-2\alpha/(2-\alpha)}\right]^{1-\alpha/2}\left[n^{-1}\sum_{i=1}^nA_{i,r}^2\right]^{\alpha/2} \quad\Longrightarrow\quad R_r = O_P\left(h_1^{2\alpha} + (rh_1^3)^{-\alpha/2}\right). \tag{15}$$
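The Hölder step leading to (15) can be sanity-checked numerically. The Python sketch below is only an illustration. The positive numbers playing the roles of $\|T_i'\|_\infty$, $|\xi_i|$ and $A_{i,r}$ are drawn from arbitrary uniform distributions (an assumption), and the check uses the exponents $p = 2/(2-\alpha)$ and $q = 2/\alpha$ on the averages.

```python
import random


def holder_bound_check(alpha, n=500, seed=1):
    # Hypothetical positive stand-ins for ||T_i'||_inf, |xi_i| and A_{i,r}.
    rng = random.Random(seed)
    t_sup = [rng.uniform(0.5, 3.0) for _ in range(n)]
    xi_abs = [rng.uniform(0.2, 2.0) for _ in range(n)]
    a = [rng.uniform(0.0, 0.1) for _ in range(n)]

    # R_r = n^{-1} sum_i ||T_i'|| |xi_i|^{-alpha} A_{i,r}^alpha
    r_r = sum(t * x ** (-alpha) * ai ** alpha for t, x, ai in zip(t_sup, xi_abs, a)) / n

    # Hoelder with p = 2 / (2 - alpha) and q = 2 / alpha, so 1/p + 1/q = 1.
    p = 2.0 / (2.0 - alpha)
    first = sum((t * x ** (-alpha)) ** p for t, x in zip(t_sup, xi_abs)) / n
    second = sum(ai ** 2 for ai in a) / n
    bound = first ** (1.0 - alpha / 2.0) * second ** (alpha / 2.0)
    return r_r, bound


for alpha in (0.1, 0.5, 0.9):
    r_r, bound = holder_bound_check(alpha)
    print(f"alpha={alpha}: R_r={r_r:.4f} <= bound={bound:.4f}")
```

The final implication in (15) additionally uses the subadditivity $(x+y)^{\alpha/2}\le x^{\alpha/2} + y^{\alpha/2}$ for $x,y\ge0$ and $\alpha/2\in(0,1)$.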

(a) Since
$$d_W^2(\hat F_{\phi,e}, F_\phi) = \|\hat F_{\phi,e}^{-1} - F_\phi^{-1}\|_2^2 \le 2\|\hat F_{\phi,e}^{-1} - \hat F_\phi^{-1}\|_2^2 + 2\|\hat F_\phi^{-1} - F_\phi^{-1}\|_2^2 \le 2(c_\phi')^2R_r^2 + 2d_W^2(\hat F_\phi, F_\phi),$$
the proof follows using part (a) of Theorem 3 and (15).

(b) Note that $\hat T_{i,e}^{-1}(t) = \hat F_{\phi,e}^{-1}(\tilde F_{i,w}(t)) = \hat F_\phi^{-1}(\tilde F_{i,w}(t)) + \tilde B_r(\tilde F_{i,w}(t))$, using statements proved earlier. Now, the arguments in the proof of part (b) of Theorem 3, along with (12), yield $\hat F_\phi^{-1}(\tilde F_{i,w}(t)) = \hat T_i^{-1}(t) + \tilde C_r(t)$, where $\|\tilde C_r\|_\infty\le\text{const.}\,R_r$. Thus $\hat T_{i,e}^{-1} = \hat T_i^{-1} + \tilde C_{1,r}$, where $\|\tilde C_{1,r}\|_\infty\le\text{const.}\,R_r$. The proof of the first statement in part (b) of this theorem now follows using part (b) of Theorem 3 and (15).

Next, consider $\hat T_{i,e}(t) = \tilde F_{i,w}^{-1}(\hat F_{\phi,e}(t)) = \tilde F_i^{-1}(\hat F_{\phi,e}(t)) + \tilde C_{2,r,i}(t)$, where $\|\tilde C_{2,r,i}\|_\infty\le\|T_i'\|_\infty c_\phi'|\xi_i|^{-\alpha}A_{i,r}^\alpha$ by statements proved earlier. Note that if $\hat F_{\phi,e}(t) = v$, then $t = \hat F_{\phi,e}^{-1}(v) = \hat F_\phi^{-1}(v) + \tilde C_{3,r}(v)$, where $\|\tilde C_{3,r}\|_\infty\le R_r$. So $\hat F_{\phi,e}(t) = v = \hat F_\phi(t - \tilde C_{3,r}(v)) = F_\phi(\bar T^{-1}(t - \tilde C_{3,r}(v)))$. Noting that $\tilde F_i^{-1} = T_i\circ F_\phi^{-1}$, we get
$$\tilde F_i^{-1}(\hat F_{\phi,e}(t)) = T_i(\bar T^{-1}(t - \tilde C_{3,r}(v))) = T_i(\bar T^{-1}(t)) + \|T_i'\|_\infty\tilde C_{4,r}(v) = \tilde F_i^{-1}(\hat F_\phi(t)) + \|T_i'\|_\infty\tilde C_{4,r}(v) = \hat T_i(t) + \|T_i'\|_\infty\tilde C_{4,r}(v),$$
where $\|\tilde C_{4,r}\|_\infty\le R_r$. This follows from arguments similar to those used earlier, using the smoothness of $T$ and the assumption that $\inf_{t\in[0,1]}T'(t)\ge\delta>0$. Thus, we finally have
$$\|\hat T_{i,e} - \hat T_i\|_\infty \le \text{const.}\left\{\|T_i'\|_\infty R_r + \|T_i'\|_\infty|\xi_i|^{-\alpha}A_{i,r}^\alpha\right\}. \tag{16}$$
The proof of the second statement of part (b) of this theorem is now completed via part (b) of Theorem 3, (13) and (15).
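The estimators in part (b) are compositions of distribution and quantile functions. $\hat F^{-1}_{\phi,e}$ is the average of the $\tilde F^{-1}_{i,w}$, with $\hat T^{-1}_{i,e} = \hat F^{-1}_{\phi,e}\circ\tilde F_{i,w}$ and $\hat T_{i,e} = \tilde F^{-1}_{i,w}\circ\hat F_{\phi,e}$, and $d_W^2$ in part (a) is computed from quantile functions. A minimal Python sketch of these steps follows. It assumes that the $\tilde F_{i,w}$ are strictly increasing and tabulated on a common grid, and it uses piecewise-linear interpolation for all inverses. The helper names and the test warps $T_a(u) = u + a\,u(1-u)$ are hypothetical.

```python
import bisect
import math

GRID = [k / 400 for k in range(401)]


def interp(xs, ys, x):
    # Piecewise-linear interpolation of the strictly increasing table (xs, ys) at x.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect.bisect_right(xs, x)
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def register(cdfs):
    # cdfs[i][k] approximates tilde F_{i,w}(GRID[k]).
    n = len(cdfs)
    # hat F^{-1}_{phi,e}: pointwise average of the individual quantile functions.
    mean_quantile = [sum(interp(c, GRID, t) for c in cdfs) / n for t in GRID]
    # hat F_{phi,e}: inverse of the averaged quantile function.
    mean_cdf = [interp(mean_quantile, GRID, t) for t in GRID]
    # hat T_{i,e} = tilde F_{i,w}^{-1} o hat F_{phi,e}
    warps = [[interp(c, GRID, u) for u in mean_cdf] for c in cdfs]
    # hat T_{i,e}^{-1} = hat F^{-1}_{phi,e} o tilde F_{i,w}
    inverse_warps = [[interp(GRID, mean_quantile, u) for u in c] for c in cdfs]
    return mean_quantile, warps, inverse_warps


def wasserstein2_squared(q1, q2):
    # d_W^2 = int_0^1 (q1 - q2)^2 from quantile functions on GRID (trapezoidal rule).
    sq = [(a - b) ** 2 for a, b in zip(q1, q2)]
    return sum((sq[k] + sq[k + 1]) / 2.0 * (GRID[k + 1] - GRID[k]) for k in range(len(GRID) - 1))


def warp(a, u):
    # Hypothetical warp T_a(u) = u + a*u*(1 - u), strictly increasing for |a| < 1.
    return u + a * u * (1.0 - u)


def warp_inverse(a, t):
    # Closed-form inverse of T_a (a != 0).
    return ((1.0 + a) - math.sqrt((1.0 + a) ** 2 - 4.0 * a * t)) / (2.0 * a)


# For a template with local-variation cdf F_phi(u) = u (e.g. phi(u) = u), the warped
# curve phi o T_a^{-1} has local-variation cdf T_a^{-1}.
slopes = (-0.5, 0.5)  # the two warps average to the identity
cdfs = [[warp_inverse(a, t) for t in GRID] for a in slopes]
mean_q, warps, inv_warps = register(cdfs)
err = max(abs(w - warp(a, t)) for a, ws in zip(slopes, warps) for t, w in zip(GRID, ws))
print(f"max |hat T - T| = {err:.2e}, d_W^2(hat F, Id) = {wasserstein2_squared(mean_q, GRID):.2e}")
```

Because the two test warps average to the identity, $\bar T = \mathrm{Id}$. The estimated warps should therefore reproduce $T_a$ up to interpolation error, as the fully observed identity $\hat T_i = T_i\circ\bar T^{-1}$ predicts.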


To prove part (c) of the theorem, we first have to control $E\{\|\hat X_{i,w} - \tilde X_i\|_\infty^2\mid\xi_i,T_i\}$ for each $i$. Recall that
$$\hat X_{i,w}(t) = \frac1r\sum_{j=1}^r\frac{\{\hat s_2(t;h_2) - \hat s_1(t;h_2)(t_j - t)\}k_{2,h_2}(t_j - t)Y_{ij}}{\hat s_2(t;h_2)\hat s_0(t;h_2) - \hat s_1^2(t;h_2)},$$
where $k_{2,h_2}(u) = h_2^{-1}k_2(u/h_2)$ and $\hat s_l(t;h_2) = r^{-1}\sum_{j=1}^r(t_j - t)^lk_{2,h_2}(t_j - t)$ for $l = 0,1,2$ (and analogously for $l = 3$). Call the denominator $\hat f(t)$, which is deterministic. We first analyse the term $\check Y_{i,w}(t)$, which is defined like $\hat X_{i,w}(t)$ but with $\tilde X_i(t_j)$ in place of $Y_{ij}$. Define $\check Z_{i,w}(t) = \hat X_{i,w}(t) - \check Y_{i,w}(t)$. Using Taylor's formula, we get that $\tilde X_i(t_j) = \tilde X_i(t) + (t_j - t)\tilde X_i'(t) + 2^{-1}(t_j - t)^2\tilde X_i''(t) + 6^{-1}(t_j - t)^3\tilde X_i^{(3)}(\tilde t_{i,j})$, where $\tilde t_{i,j}$ lies between $t$ and $t_j$. Plugging this expansion into the definition of $\check Y_{i,w}(t)$, we have
$$\begin{aligned}
\check Y_{i,w}(t) &= \tilde X_i(t) + \frac{\tilde X_i''(t)}{2}\,\frac{\hat s_2^2(t;h_2) - \hat s_1(t;h_2)\hat s_3(t;h_2)}{\hat s_2(t;h_2)\hat s_0(t;h_2) - \hat s_1^2(t;h_2)}\\
&\quad + \frac1{6r}\sum_{j=1}^r\frac{\{\hat s_2(t;h_2) - \hat s_1(t;h_2)(t_j - t)\}k_{2,h_2}(t_j - t)(t_j - t)^3\tilde X_i^{(3)}(\tilde t_{i,j})}{\hat s_2(t;h_2)\hat s_0(t;h_2) - \hat s_1^2(t;h_2)}\\
&= \tilde X_i(t) + Q_{i,1}(t;h_2) + Q_{i,2}(t;h_2),\ \text{say},
\end{aligned}$$
for all $t\in[0,1]$. Note that the term involving $\tilde X_i'(t)$ vanishes. This plays a crucial role in giving the local linear estimator its advantage over other standard nonparametric regression estimators near the boundary of the data set. Thus $|\hat X_{i,w}(t) - \tilde X_i(t)|\le|\check Y_{i,w}(t) - \tilde X_i(t)| + |\check Z_{i,w}(t)|\le|Q_{i,1}(t;h_2)| + |Q_{i,2}(t;h_2)| + |\check Z_{i,w}(t)|$.

By Riemann sum approximations, we have $\hat s_l(t;h_2) = h_2^l\int_{-1}^1u^lk_2(u)\,du + O((rh_2)^{-1})$ uniformly for $t\in[h_2,1-h_2]$. Also, for $t\in[0,h_2)$, say $t = \alpha h_2$ with $\alpha\in[0,1)$, we have $\hat s_l(t;h_2) = h_2^l\int_{-\alpha}^1u^lk_2(u)\,du + O((rh_2)^{-1})$ uniformly for $\alpha\in[0,1)$. The same estimate also holds for $t\in(1-h_2,1]$, say $t = 1-\alpha h_2$. Define $\mu_{l,\alpha} = \int_{-\alpha}^1u^lk_2(u)\,du$ for $l = 0,1,2,3$. These estimates imply that, for $t\in[h_2,1-h_2]$, we have $|Q_{i,1}(t;h_2)|\le2^{-1}\|\tilde X_i''\|_\infty\{h_2^2\int_{-1}^1u^2k_2(u)\,du + O((rh_2)^{-1})\}$. Further, for boundary points, we have $|Q_{i,1}(t;h_2)|\le2^{-1}\|\tilde X_i''\|_\infty\{h_2^2|B_\alpha| + O((rh_2)^{-1})\}$ for $\alpha\in[0,1)$, where $B_\alpha = [\mu_{2,\alpha}^2 - \mu_{1,\alpha}\mu_{3,\alpha}]/[\mu_{2,\alpha}\mu_{0,\alpha} - \mu_{1,\alpha}^2]$. In both cases, the $O(1)$ terms are non-random (hence do not depend on $i$) and uniform over the choices of $t$. Note that the leading term in the squared bias obtainable from the previous bias expression is an upper bound for the coefficient of the squared bias term in the general result obtained in Thm. 3.3 of Fan and Gijbels (1996). It can be shown using similar arguments that $|Q_{i,2}(t;h_2)|\le\|\tilde X_i^{(3)}\|_\infty o(h_2^2)$, where the $o(1)$ term is non-random and uniform over $t\in[0,1]$. Note that for $\alpha = 1$, which corresponds to $t\in[h_2,1-h_2]$, we have $B_\alpha = \int_{-1}^1u^2k_2(u)\,du$ by the symmetry of the kernel. Further, the denominator in the definition of $B_\alpha$ (positive by the Cauchy-Schwarz inequality) can be shown to be a strictly increasing function of $\alpha\in[0,1]$. Hence its infimum is achieved at $\alpha = 0$, where it takes the value $\int_0^1u^2k_2(u)\,du\int_0^1k_2(u)\,du - (\int_0^1uk_2(u)\,du)^2 =: a_0>0$ (again by the Cauchy-Schwarz inequality) for any nondegenerate $k_2$. Thus $\sup_{\alpha\in[0,1]}|B_\alpha|\le\sup_{\alpha\in[0,1]}|\mu_{2,\alpha}^2 - \mu_{1,\alpha}\mu_{3,\alpha}|/a_0<\infty$, as the numerator is uniformly bounded in $\alpha$. Hence
$$\|\check Y_{i,w} - \tilde X_i\|_\infty \le 2^{-1}\|\tilde X_i''\|_\infty\left\{h_2^2\sup_{\alpha\in[0,1]}|B_\alpha| + O((rh_2)^{-1})\right\} + \|\tilde X_i^{(3)}\|_\infty o(h_2^2) \le \|\tilde X_i''\|_\infty\{O(h_2^2) + O((rh_2)^{-1})\} + \|\tilde X_i^{(3)}\|_\infty o(h_2^2),$$
where the $O(1)$ and the $o(1)$ terms are non-random (and hence do not depend on $i$).

We next control $E\{\|\check Z_{i,w}\|_\infty^2\}$. Observe that this does not depend on $\tilde X_i$ and hence does not depend on $i$ (the errors are i.i.d.). Now, writing $\hat s_l$ and $\tilde s_l$ for $\hat s_l(t;h_2)$ and $\tilde s_l(t;h_2)$ (the latter defined below),
$$\begin{aligned}
E\left\{\sup_{t\in[0,1]}\left|\frac1r\sum_{j=1}^r\frac{\{\hat s_2 - \hat s_1(t_j - t)\}k_{2,h_2}(t_j - t)\epsilon_{ij}}{\hat s_2\hat s_0 - \hat s_1^2}\right|^2\right\}
&\le E\left\{\sup_{t\in[0,1]}\frac1{r^2}\sum_{j=1}^r\frac{\{\hat s_2 - \hat s_1(t_j - t)\}^2k_{2,h_2}^2(t_j - t)\epsilon_{ij}^2}{[\hat s_2\hat s_0 - \hat s_1^2]^2}\right\}\\
&\quad + \frac1{r^2}\sup_{t\in[0,1]}E\left\{\sum_{j\ne j'}\frac{\{\hat s_2 - \hat s_1(t_j - t)\}\{\hat s_2 - \hat s_1(t_{j'} - t)\}k_{2,h_2}(t_j - t)k_{2,h_2}(t_{j'} - t)\epsilon_{ij}\epsilon_{ij'}}{[\hat s_2\hat s_0 - \hat s_1^2]^2}\right\}\\
&\le M^2r^{-1}\sup_{t\in[0,1]}\frac1r\sum_{j=1}^r\frac{\{\hat s_2 - \hat s_1(t_j - t)\}^2k_{2,h_2}^2(t_j - t)}{[\hat s_2\hat s_0 - \hat s_1^2]^2}\\
&= M^2(rh_2)^{-1}\sup_{t\in[0,1]}\frac{\hat s_2^2\tilde s_0 + \hat s_1^2\tilde s_2 - 2\hat s_1\hat s_2\tilde s_1}{[\hat s_2\hat s_0 - \hat s_1^2]^2}.
\end{aligned}\tag{17}$$
The second term on the right-hand side of the first inequality vanishes because the errors are uncorrelated and the $t_j$'s are non-random. The bound for the first term follows from the almost sure boundedness of the errors, say by $M$. Here, $\tilde s_l(t;h_2) = r^{-1}\sum_{j=1}^r(t_j - t)^lh_2^{-1}k_2^2\{(t_j - t)/h_2\}$, which is defined like $\hat s_l(t;h_2)$ but with the new "kernel" $k_2^2$. As earlier, by Riemann sum approximations, we have $\tilde s_l(t;h_2) = h_2^l\int_{-\alpha}^1u^lk_2^2(u)\,du + O((rh_2)^{-1})$ for $\alpha\in[0,1]$, with the $O(1)$ term uniform over $t\in[0,1]$. Define $\nu_{l,\alpha} = \int_{-\alpha}^1u^lk_2^2(u)\,du$. Then
$$\frac{\hat s_2^2\tilde s_0 + \hat s_1^2\tilde s_2 - 2\hat s_1\hat s_2\tilde s_1}{[\hat s_2\hat s_0 - \hat s_1^2]^2} = \frac{\mu_{2,\alpha}^2\nu_{0,\alpha} + \mu_{1,\alpha}^2\nu_{2,\alpha} - 2\mu_{1,\alpha}\mu_{2,\alpha}\nu_{1,\alpha}}{[\mu_{2,\alpha}\mu_{0,\alpha} - \mu_{1,\alpha}^2]^2} + O((rh_2)^{-1}) = C_\alpha + O((rh_2)^{-1}),\ \text{say},$$
for all $\alpha\in[0,1]$, where the $O(1)$ term is uniform over $t\in[0,1]$. Note that the expression for $C_\alpha$ is the same as the coefficient of the variance term in the general result obtained in Thm. 3.3 of Fan and Gijbels (1996) (with the necessary adaptations). Using (17), it now follows that $E\{\|\check Z_{i,w}\|_\infty^2\}\le M^2\{\sup_{\alpha\in[0,1]}C_\alpha\}(rh_2)^{-1} + o((rh_2)^{-1}) = O((rh_2)^{-1})$. Hence, using the assumptions in the theorem, the bound on $\|\check Y_{i,w} - \tilde X_i\|_\infty$ obtained earlier, and the previous bound, it follows that
$$E\{\|\hat X_{i,w} - \tilde X_i\|_\infty^2\} = O(h_2^4) + O((rh_2)^{-1}), \tag{18}$$
where the $O(1)$ terms are bounded and do not depend on $i$. Thus, using Markov's inequality, we have
$$n^{-1}\sum_{i=1}^n\|\hat X_{i,w} - \tilde X_i\|_\infty = O_P\left\{h_2^2 + (rh_2)^{-1/2}\right\}. \tag{19}$$
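The local linear estimator $\hat X_{i,w}$ recalled at the start of part (c) can be transcribed directly from its definition. The Python sketch below is illustrative only. The Epanechnikov kernel for $k_2$, the midpoint grid and the bandwidth are assumptions, and the errors $\epsilon_{ij}$ are set to zero. The weights sum to one and annihilate the linear term (the vanishing $\tilde X_i'$ term noted above). A linear signal is therefore reproduced exactly, including at the boundary points $t = 0$ and $t = 1$.

```python
def epanechnikov(u):
    # Hypothetical choice for k_2: supported on [-1, 1], symmetric, integrates to 1.
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear(t, grid, y, h):
    # X_hat(t) = r^{-1} sum_j {s2 - s1 (t_j - t)} k_{2,h}(t_j - t) Y_j / (s2 s0 - s1^2),
    # with s_l = r^{-1} sum_j (t_j - t)^l k_{2,h}(t_j - t).
    r = len(grid)

    def k_h(u):
        return epanechnikov(u / h) / h

    s0, s1, s2 = (sum((tj - t) ** l * k_h(tj - t) for tj in grid) / r for l in range(3))
    denominator = s2 * s0 - s1 ** 2
    numerator = sum((s2 - s1 * (tj - t)) * k_h(tj - t) * yj for tj, yj in zip(grid, y)) / r
    return numerator / denominator


r = 200
grid = [(j - 0.5) / r for j in range(1, r + 1)]
y = [2.0 + 3.0 * tj for tj in grid]  # noiseless linear signal
for t in (0.0, 0.01, 0.5, 1.0):
    print(t, local_linear(t, grid, y, h=0.05), 2.0 + 3.0 * t)
```

For a quadratic signal, the interior error is the $Q_{i,1}$ term, of order $h_2^2$.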

(c) Recall that $\hat X_{i,e}^*(t) = \hat X_{i,w}(\hat T_{i,e}(t))$. Thus, using (16), we have
$$\begin{aligned}
|\hat X_{i,e}^*(t) - \hat X_i(t)| &\le |\hat X_{i,w}(\hat T_{i,e}(t)) - \tilde X_i(\hat T_{i,e}(t))| + |\tilde X_i(\hat T_{i,e}(t)) - \tilde X_i(\hat T_i(t))|\\
&\le \|\hat X_{i,w} - \tilde X_i\|_\infty + \|\tilde X_i'\|_\infty\|\hat T_{i,e} - \hat T_i\|_\infty
\end{aligned}$$
$$\Longrightarrow\quad\|\hat X_{i,e}^* - \hat X_i\|_\infty\le\|\hat X_{i,w} - \tilde X_i\|_\infty + \text{const.}\,|\xi_i|\,\|T_i'\|_\infty\left\{R_r + |\xi_i|^{-\alpha}A_{i,r}^\alpha\right\}. \tag{20}$$
The proof of part (c) of this theorem now follows from (13), (15), (18) and part (c) of Theorem 3.

(d) Observe that, by (20), we have
$$\left\|\bar X_e^* - n^{-1}\sum_{i=1}^n\hat X_i\right\|_\infty \le n^{-1}\sum_{i=1}^n\|\hat X_{i,w} - \tilde X_i\|_\infty + \text{const.}\left\{R_r\left(n^{-1}\sum_{i=1}^n|\xi_i|\,\|T_i'\|_\infty\right) + n^{-1}\sum_{i=1}^n|\xi_i|^{1-\alpha}\|T_i'\|_\infty A_{i,r}^\alpha\right\}.$$

The third term on the right hand side can be bounded using H¨older’s inequality and (14) as earlier. The bounds on the first two terms are given by (19) and (15), respectively. The proof of this part of the theorem is now completed upon using these bounds along with part (e) of Theorem 3. (e) For the proof of this part of theorem, we will use a decomposition of Kx e˚ similar to that of x Kr in the proof of part (f) of Theorem 3. In the same notation, we obtain the following bounds ř p˚ ´ X pi ||2 ď 2n´1 řn ||X pi,w ´ X ri ||2 ` on W1 , W2 and W3 . First, note that |||W1 ||| ď n´1 ni“1 ||X 8 2 i,e i“1 ř const.n´1 ni“1 ξi2 ||Ti1 ||28 tRr ` |ξi |´α Aαi,r u2 . Applying H¨older’s inequality and using (14), (15) and (18), 3 ´α u. Next, using part (d) of this theorem and part (e) we get that |||W1 ||| “ OP th42 `prh2 q´1 `h4α 1 `prh1 q ř 2 pi ||2 ` 2||n´1 řn X p of Theorem 3, it follows that |||W2 ||| ď ||X e˚ ´ µ||22 ď 2||X e˚ ´ n´1 ni“1 X 2 i“1 i ´ µ||2 “ ř n 3 ´α `h4 `prh q´1 `n´1 u. In a similar manner, |||W ||| ď 2n´1 p˚ p p OP th4α 2 3 1 `prh1 q 2 i“1 ||Xi,e ´ Xi ||2 ||Xi ´µ||2 “ 3 ´α{2 u by the Cauchy-Schwarz inequality and the bounds obtained earlier. OP th22 ` prh2 q´1{2 ` h2α 1 ` prh1 q 2 ´1{2 ` h2α ` prh3 q´α{2 ` n´1{2 u. So, using part (f) of Theorem 3, we have |||Kx e˚ ´ K ||| “ OP th2 ` prh2 q 1 1 The bounds for the leading eigenvalue and eigenfunction follow directly by standard bounds in the theory of perturbation of operators. ş1 şt Proof of Theorem 6. First assume that µ1 ‰ 0. Then, define Gptq “ 0 |γ1´1 µ1 puq|du{ 0 |γ1´1 µ1 puq|du “ ş1 1 şt 1 ´1 r 0 |µ puq|du{ 0 |µ puq|du and Gi ptq “ GpTi ptqq for t P r0, 1s and i “ 1, 2, . . . , n. 
Some algebraic manipulations yield

$$
\begin{aligned}
|F_i(t) - G(t)| &\le \frac{\int_0^t |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} + \left|\frac{\int_0^t |\gamma_1^{-1}\mu'(u)|\,du}{\int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} - \frac{\int_0^t |\gamma_1^{-1}\mu'(u)|\,du}{\int_0^1 |\gamma_1^{-1}\mu'(u)|\,du}\right| \\
&\le \frac{2\int_0^1 |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |\gamma_1^{-1}\mu'(u) + Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} = Z_i.
\end{aligned}
$$

Thus, $\|F_i - G\|_\infty \le Z_i$ almost surely for each $i$. So

$$
\|\tilde F_i - \tilde G_i\|_\infty = \sup_{t\in[0,1]} |F_i(T_i^{-1}(t)) - G(T_i^{-1}(t))| = \sup_{t\in[0,1]} |F_i(t) - G(t)| \le Z_i,
$$

where the second equality holds because $T_i$ is a bijection on $[0,1]$. Next, fix $t$ and let $c_i = F_i^{-1}(t)$ and $c = G^{-1}(t)$, so that $t = F_i(c_i) = G(c)$. Also,

$$
G(c) - G(c_i) = G(c) - F_i(c_i) + F_i(c_i) - G(c_i) = F_i(c_i) - G(c_i),
$$

so that $|G(c) - G(c_i)| \le \|F_i - G\|_\infty \le Z_i$. The conditions of the theorem and arguments as in Lemma 2 show that $G^{-1}$ is $\alpha$-Hölder continuous with $\alpha = \epsilon/(1+\epsilon)$. Thus, for a finite, positive constant $C_\mu$, we have

$$
|F_i^{-1}(t) - G^{-1}(t)| = |c_i - c| = |G^{-1}(G(c_i)) - G^{-1}(G(c))| \le C_\mu |G(c_i) - G(c)|^\alpha \le C_\mu Z_i^\alpha.
$$

Thus, $\|F_i^{-1} - G^{-1}\|_\infty \le C_\mu Z_i^\alpha$ almost surely. Consequently,

$$
\|\tilde F_i^{-1} - \tilde G_i^{-1}\|_\infty = \sup_{t\in[0,1]} |T_i(F_i^{-1}(t)) - T_i(G^{-1}(t))| \le \|T_i'\|_\infty \|F_i^{-1} - G^{-1}\|_\infty \le C_\mu \|T_i'\|_\infty Z_i^\alpha
$$

almost surely. Further,

$$
\|\hat F^{-1} - \hat G^{-1}\|_\infty \le \frac{1}{n}\sum_{i=1}^n \|\tilde F_i^{-1} - \tilde G_i^{-1}\|_\infty \le \frac{C_\mu}{n}\sum_{i=1}^n \|T_i'\|_\infty Z_i^\alpha \le 2C_\mu\, E(\|T_1'\|_\infty)\, E(Z_1^\alpha),
$$

as $n \to \infty$, almost surely. Here, the last inequality follows from the moment assumptions in the theorem, the Cauchy-Schwarz inequality, the strong law of large numbers and the fact that the $Y_{il}$'s (and hence the $X_i$'s) are independent of the $T_i$'s. Thus,

$$
\begin{aligned}
|\hat T_i^{-1}(t) - T_i^{-1}(t)| &= |\hat F^{-1}(F_i(T_i^{-1}(t))) - T_i^{-1}(t)| \\
&\le |\hat F^{-1}(F_i(T_i^{-1}(t))) - \hat G^{-1}(F_i(T_i^{-1}(t)))| + |\hat G^{-1}(F_i(T_i^{-1}(t))) - \hat G^{-1}(G(T_i^{-1}(t)))| \\
&\quad + |\hat G^{-1}(G(T_i^{-1}(t))) - T_i^{-1}(t)| \\
&\le \|\hat F^{-1} - \hat G^{-1}\|_\infty + |\bar T(G^{-1}(F_i(T_i^{-1}(t)))) - \bar T(G^{-1}(G(T_i^{-1}(t))))| \\
&\quad + |\bar T(G^{-1}(G(T_i^{-1}(t)))) - T_i^{-1}(t)| \\
&\le \|\hat F^{-1} - \hat G^{-1}\|_\infty + \|\bar T'\|_\infty\, C_\mu\, |F_i(T_i^{-1}(t)) - G(T_i^{-1}(t))|^\alpha + |\bar T(T_i^{-1}(t)) - T_i^{-1}(t)| \\
&\le \|\hat F^{-1} - \hat G^{-1}\|_\infty + C_\mu\Big\{n^{-1}\sum_{j=1}^n \|T_j'\|_\infty\Big\}\|F_i - G\|_\infty^\alpha + \|\bar T - \mathrm{Id}\|_\infty,
\end{aligned}
$$

where $\bar T = n^{-1}\sum_{i=1}^n T_i$, so that $\hat G^{-1} = \bar T \circ G^{-1}$. Hence

$$
\|\hat T_i^{-1} - T_i^{-1}\|_\infty \le \mathrm{const.}\{E(Z_1^\alpha) + Z_i^\alpha + \|\bar T - \mathrm{Id}\|_\infty\}
$$

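as $n \to \infty$, almost surely, where the constant term is uniform in $i$. Next, let $t = \hat F^{-1}(u)$. Then $n^{-1}\sum_{i=1}^n T_i(F_i^{-1}(u)) = t$. Let $t_* = n^{-1}\sum_{i=1}^n T_i(G^{-1}(u)) = \bar T(G^{-1}(u)) = \hat G^{-1}(u)$, so that $u = \hat G(t_*)$. Note that

$$
\hat F(t) - \hat G(t) = \hat F(t) - \hat G(t_*) + \hat G(t_*) - \hat G(t) = \hat G(t_*) - \hat G(t) = G(\bar T^{-1}(t_*)) - G(\bar T^{-1}(t)).
$$

Thus, using the assumptions in the theorem and arguments similar to those used in the proof of part (b) of Theorem 2, we have

$$
\begin{aligned}
|\hat F(t) - \hat G(t)| &= |G(\bar T^{-1}(t_*)) - G(\bar T^{-1}(t))| \le \|G'\|_\infty\, |\bar T^{-1}(t_*) - \bar T^{-1}(t)| \\
&\le \|G'\|_\infty\, \delta^{-1} |t_* - t| \le \|G'\|_\infty\, \delta^{-1} n^{-1}\sum_{i=1}^n |T_i(F_i^{-1}(u)) - T_i(G^{-1}(u))| \\
&\le \|G'\|_\infty\, \delta^{-1} C_\mu\, n^{-1}\sum_{i=1}^n \|T_i'\|_\infty Z_i^\alpha \le \mathrm{const.}\, E(\|T_1'\|_\infty)\, E(Z_1^\alpha) \\
\Rightarrow\ \|\hat F - \hat G\|_\infty &\le \mathrm{const.}\, E(Z_1^\alpha)
\end{aligned}
$$

as $n \to \infty$, almost surely. Therefore,

$$
\begin{aligned}
|\hat T_i(t) - T_i(t)| &= |\tilde F_i^{-1}(\hat F(t)) - T_i(t)| \\
&\le |\tilde F_i^{-1}(\hat F(t)) - \tilde G_i^{-1}(\hat F(t))| + |\tilde G_i^{-1}(\hat F(t)) - \tilde G_i^{-1}(\hat G(t))| + |\tilde G_i^{-1}(\hat G(t)) - T_i(t)| \\
&\le \|\tilde F_i^{-1} - \tilde G_i^{-1}\|_\infty + |T_i(G^{-1}(\hat F(t))) - T_i(G^{-1}(\hat G(t)))| + |T_i(G^{-1}(\hat G(t))) - T_i(t)| \\
&\le \|\tilde F_i^{-1} - \tilde G_i^{-1}\|_\infty + \|T_i'\|_\infty\, C_\mu\, |\hat F(t) - \hat G(t)|^\alpha + |T_i(\bar T^{-1}(t)) - T_i(t)| \\
&\le \|\tilde F_i^{-1} - \tilde G_i^{-1}\|_\infty + \|T_i'\|_\infty\, C_\mu\, \|\hat F - \hat G\|_\infty^\alpha + \|T_i'\|_\infty\, \|\bar T^{-1} - \mathrm{Id}\|_\infty \\
\Rightarrow\ \|\hat T_i - T_i\|_\infty &\le \|\tilde F_i^{-1} - \tilde G_i^{-1}\|_\infty + \|T_i'\|_\infty\, C_\mu\, \|\hat F - \hat G\|_\infty^\alpha + \|T_i'\|_\infty\, \|\bar T^{-1} - \mathrm{Id}\|_\infty \\
&= \|\tilde F_i^{-1} - \tilde G_i^{-1}\|_\infty + \|T_i'\|_\infty\, C_\mu\, \|\hat F - \hat G\|_\infty^\alpha + \|T_i'\|_\infty\, \|\bar T - \mathrm{Id}\|_\infty \\
&\le \mathrm{const.}\, \|T_i'\|_\infty \{Z_i^\alpha + E^\alpha(Z_1^\alpha) + \|\bar T - \mathrm{Id}\|_\infty\}
\end{aligned}
$$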

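as $n \to \infty$, almost surely, where the constant term is uniform in $i$.

Next, note that $\hat X_i = \tilde X_i \circ \hat T_i = X_i \circ T_i^{-1} \circ \hat T_i = \mu \circ T_i^{-1} \circ \hat T_i + \gamma_1 Y_{i1}\, \phi_1 \circ T_i^{-1} \circ \hat T_i + \gamma_2 Y_{i2}\, \phi_2 \circ T_i^{-1} \circ \hat T_i$. So,

$$
\begin{aligned}
|\hat X_i(t) - X_i(t)| &\le |\mu(T_i^{-1}(\hat T_i(t))) - \mu(t)| + \gamma_1 |Y_{i1}|\, |\phi_1(T_i^{-1}(\hat T_i(t))) - \phi_1(t)| + \gamma_2 |Y_{i2}|\, |\phi_2(T_i^{-1}(\hat T_i(t))) - \phi_2(t)| \\
&\le |T_i^{-1}(\hat T_i(t)) - t|\, \{\|\mu'\|_\infty + \gamma_1 |Y_{i1}|\, \|\phi_1'\|_\infty + \gamma_2 |Y_{i2}|\, \|\phi_2'\|_\infty\} \\
\Rightarrow\ \|\hat X_i - X_i\|_\infty &\le \|\hat T_i^{-1} - T_i^{-1}\|_\infty\, \{\|\mu'\|_\infty + \gamma_1 |Y_{i1}|\, \|\phi_1'\|_\infty + \gamma_2 |Y_{i2}|\, \|\phi_2'\|_\infty\} \\
&\le O_P(1)\{E(Z_1^\alpha) + Z_i^\alpha + \|\bar T - \mathrm{Id}\|_\infty\}
\end{aligned}
$$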

as $n \to \infty$, almost surely, where the $O_P(1)$ term is independent of $n$.

Next, consider the case when $\mu' = 0$. Then define $G(t) = \int_0^t |\phi_1'(u)|\,du \big/ \int_0^1 |\phi_1'(u)|\,du$. Some algebraic manipulations yield

$$
|F_i(t) - G(t)| = \left|\frac{\int_0^t |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} - \frac{\int_0^t |\phi_1'(u)|\,du}{\int_0^1 |\phi_1'(u)|\,du}\right| \le \frac{2\eta\int_0^1 |Y_{i2}\phi_2'(u)|\,du}{\int_0^1 |Y_{i1}\phi_1'(u) + \eta Y_{i2}\phi_2'(u)|\,du} = Z_i.
$$

Similar arguments as in the case $\mu' \neq 0$ now yield the error bounds on the estimators.

Acknowledgements

We are grateful to Dr. Kristen Irwin (EPFL) for kindly sharing and discussing her Tribolium data set.

References

Billingsley, P. (1968). Convergence of probability measures. John Wiley & Sons, Inc., New York-London-Sydney. MR0233396
Bosq, D. (2000). Linear processes in function spaces: Theory and applications. Lecture Notes in Statistics 149. Springer-Verlag, New York. MR1783138
Claeskens, G., Silverman, B. W. and Slaets, L. (2010). A multiresolution approach to time warping achieved by a Bayesian prior-posterior transfer fitting strategy. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 673–694. MR2758241
Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications. Monographs on Statistics and Applied Probability 66. Chapman & Hall, London. MR1383587
Fritsch, F. N. and Carlson, R. E. (1980). Monotone piecewise cubic interpolation. SIAM J. Numer. Anal. 17 238–246. MR567271
Gervini, D. and Gasser, T. (2004). Self-modelling warping functions. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 959–971. MR2102475
Gervini, D. and Gasser, T. (2005). Nonparametric maximum likelihood estimation of the structural mean of a sample of curves. Biometrika 92 801–820. MR2234187
Hadjipantelis, P. Z., Aston, J. A. D., Müller, H. G. and Evans, J. P. (2015). Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese. J. Amer. Statist. Assoc. 110 545–559.
Hsing, T. and Eubank, R. (2015). Theoretical foundations of functional data analysis, with an introduction to linear operators. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester. MR3379106
Irwin, K. and Carter, P. (2013). Constraints on the evolution of function-valued traits: a study of growth in Tribolium castaneum. Journal of Evolutionary Biology 26 2633–2643.
Irwin, K. and Carter, P. (2014). Artificial selection on larval growth curves in Tribolium: correlated responses and constraints. Journal of Evolutionary Biology 27 2069–2079.
James, G. M. (2007). Curve alignment by moments. Ann. Appl. Stat. 1 480–501. MR2415744
Kneip, A. and Gasser, T. (1992). Statistical tools to analyze data representing a sample of curves. Ann. Statist. 20 1266–1305. MR1186250
Kneip, A. and Ramsay, J. O. (2008). Combining registration and fitting for functional models. J. Amer. Statist. Assoc. 103 1155–1165. MR2528838
Kneip, A., Li, X., MacGibbon, K. B. and Ramsay, J. O. (2000). Curve registration by local regression. Canad. J. Statist. 28 19–29. MR1789833
Liu, X. and Müller, H.-G. (2004). Functional convex averaging and synchronization for time-warped random curves. J. Amer. Statist. Assoc. 99 687–699. MR2090903
Madras, N. and Sezer, D. (2010). Quantitative bounds for Markov chain convergence: Wasserstein and total variation distances. Bernoulli 16 882–908. MR2730652
Marron, J. S., Ramsay, J. O., Sangalli, L. M. and Srivastava, A. (2015). Functional data analysis of amplitude and phase variation. Statist. Sci. 30 468–484. MR3432837
Natanson, I. P. (1955). Theory of functions of a real variable. Translated by Leo F. Boron with the collaboration of Edwin Hewitt. Frederick Ungar Publishing Co., New York. MR0067952
Panaretos, V. M. and Zemel, Y. (2016). Amplitude and phase variation of point processes. Ann. Statist. 44 771–812. MR3476617
Ramsay, J. O. and Li, X. (1998). Curve registration. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 351–363. MR1616045
Ramsay, J. O. and Silverman, B. W. (2005). Functional data analysis, second ed. Springer Series in Statistics. Springer, New York. MR2168993
Rønn, B. B. (2001). Nonparametric maximum likelihood estimation for shifted curves. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 243–259. MR1841413
Schimek, M. G., ed. (2000). Smoothing and regression: Approaches, computation, and application. Wiley Series in Probability and Statistics: Applied Probability and Statistics. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York. MR1795148
Schumaker, L. L. (2007). Spline functions: basic theory, third ed. Cambridge Mathematical Library. Cambridge University Press, Cambridge. MR2348176
Srivastava, A., Wu, W., Kurtek, S., Klassen, E. and Marron, J. S. (2011). Registration of functional data using Fisher-Rao metric. Tech. Report arXiv:1103.3817v2.
Tang, R. and Müller, H.-G. (2008). Pairwise curve synchronization for functional data. Biometrika 95 875–889. MR2461217
Villani, C. (2003). Topics in optimal transportation. Graduate Studies in Mathematics 58. American Mathematical Society, Providence, RI. MR1964483
Wand, M. P. and Jones, M. C. (1995). Kernel smoothing. Monographs on Statistics and Applied Probability 60. Chapman and Hall, Ltd., London. MR1319818
Wang, K. and Gasser, T. (1997). Alignment of curves by dynamic time warping. Ann. Statist. 25 1251–1276. MR1447750
Wang, K. and Gasser, T. (1999). Synchronizing sample curves nonparametrically. Ann. Statist. 27 439–460. MR1714722
