PSYCHOMETRIKA--VOL. 47, NO. 4. DECEMBER, 1982

WHEN THE DATA ARE FUNCTIONS

J. O. RAMSAY
MCGILL UNIVERSITY

A datum is often a continuous function x(t) of a variable such as time observed over some interval. One or more such functions are observed for each subject or unit of observation. The extension of classical data analytic techniques designed for p-variate observations to such data is discussed. The essential step is the expression of the classical problem in the language of functional analysis, after which the extension to functions is a straightforward matter. A schematic device called the duality diagram is a very useful tool for describing an analysis and for suggesting new possibilities. Least squares approximation, descriptive statistics, principal components analysis, and canonical correlation analysis are discussed within this broader framework.

Key words: continuous data, functional analysis, duality diagram.

Introduction

Sophisticated data collection hardware often produces data which are a set of continuous functions. I am sure that all of us have seen such data: EEG and EMG records, learning curves, paths in space, subject responses continuous in time, speech production measurements during vocalization, bioassay data, and so on. Consider as a further example the curves displayed in Figure 1. These indicate the height of the tongue dorsum during ten different utterances of the sound "ah-kah" by a single subject [Keller & Ostry, Note 1]. It is natural to consider each curve as a single observation, to summarize the ten curves in terms of an average curve, and to measure in some way the variation of the curves about this average.

This paper considers the extension of classical statistical techniques to include functional data. It will be an elementary and simplified treatment, which may annoy those wanting more subtlety and rigor. I must warn you, however, that a fundamental change of point of view about what data are will be required, and if you leave my address aware that an altered state of statistical consciousness is possible, I shall be content.

In dealing with functional data I will refer frequently to two lines of development. The first is the expression of traditional data analytic technology in the language of functional analysis. Much of this work has taken place in France and is not available in English. I am particularly indebted to the monographs of Cailliez and Pages [1976] and Dauxois and Pousse [1976]. We are very fortunate to have with us for these meetings a number of those associated with this work, and in part my talk is only an introduction to tomorrow's symposium.* The second line of development that has fascinated my colleague Suzanne Winsberg and me in recent years has been statistical applications of spline functions. I feel that splines are destined to play a fundamental role in the analysis of functional data, but I will try to show how in only a vague way at this point. Finally, this

* "New glances at principal components and correspondence analysis" was a symposium at the 1982 Joint Meetings of the Classification Society and Psychometric Society, Montreal, Canada.

Presented as the Presidential Address to the Psychometric Society's Annual Meeting, May, 1982. I wish to express my gratitude to my colleagues in France, especially at the University of Grenoble, for their warm hospitality during my sabbatical leave. Preparation of this paper was supported by Grant APA 0320 from the Natural Sciences and Engineering Research Council of Canada.

Requests for reprints should be sent to: J. O. Ramsay, Dept. of Psychology, 1205 Dr. Penfield Ave., Montreal, Québec, Canada H3A 1B1.

0033-3123/82/1200-5004$00.75/0 © 1982 The Psychometric Society


[Figure 1. Plot titled "Tongue movement during 'ah-kah'": ten curves with their average shown as a dashed line. Vertical axis ticks run from 4.0 to 6.0; horizontal axis is TIME, from 0.0 to 1.0.]

FIGURE 1

The height of the tongue dorsum over a 400 millisecond interval of time during which the sound "ah-kah" was uttered. Each curve represents a single utterance. The same subject was involved in all ten replications. The average curve is represented by a dashed line and was computed by averaging the ten curves at each point in time. The time units have been arbitrarily scaled to the interval [0, 1].
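The averaging described in this caption, together with one simple way of measuring the variation of the curves about their average, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the curves are taken to be sampled on a common time grid, the data below are synthetic stand-ins rather than the Keller and Ostry records, the pointwise standard deviation is one choice of variation measure among many, and the function names pointwise_summary and rescale_time are hypothetical.

```python
import numpy as np


def rescale_time(t):
    """Linearly rescale observation times to the interval [0, 1]."""
    t = np.asarray(t, dtype=float)
    return (t - t.min()) / (t.max() - t.min())


def pointwise_summary(curves):
    """Return the average curve and the pointwise standard deviation.

    curves has shape (n_curves, n_times); every row is one replication
    sampled at the same time points.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)        # average over curves at each time
    sd_curve = curves.std(axis=0, ddof=1)   # spread about the average
    return mean_curve, sd_curve


# Synthetic stand-in for ten replications over a 400 ms interval
# (assumed shapes and values, not the published data).
rng = np.random.default_rng(0)
t = rescale_time(np.linspace(0.0, 400.0, 41))
curves = np.array([
    5.0
    + 0.5 * np.sin(2 * np.pi * t + rng.normal(scale=0.1))
    + rng.normal(scale=0.05, size=t.size)
    for _ in range(10)
])

mean_curve, sd_curve = pointwise_summary(curves)
```

Each row of curves plays the role of a single observation, so the summary statistics are themselves functions of time rather than single numbers.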

paper will be correctly perceived by many of the readers of Psychometrika as being a generalization of the pioneering work of Tucker [1958], and it is a privilege to again acknowledge the work of someone who has so often been there first.

Figure 2 offers an approach to the concept of a functional datum. In the upper left corner we have the domain of the classical data matrix: each of n subjects is paired with each of p variables and to each pair a number x_ij is assigned as the consequence of an experiment or data collection. As one moves down from this corner, we come to the situation where n is in effect infinity and we are discussing population characteristics. Let us now fix the number of subjects n and allow the number of variables p to increase without limit. This process can be extended even beyond countability to the


Domain of X Number of Variables p