Functional linear regression with functional response

David Benatia, Université de Montréal
Marine Carrasco, Université de Montréal
Jean-Pierre Florens, Toulouse School of Economics

May 2015
Abstract

In this paper, we develop new estimation results for functional regressions where both the regressor Z(t) and the response Y(t) are functions of an index such as the time or a spatial location. Both Z(t) and Y(t) are assumed to belong to Hilbert spaces. The model can be thought of as a generalization of the multivariate regression where the regression coefficient is now an unknown operator Π. An interesting feature of our model is that Y(t) depends not only on contemporaneous Z(t) but also on past and future values of Z. We propose to estimate the operator Π by Tikhonov regularization, which amounts to applying a penalty on the L2 norm of Π. We derive the rate of convergence of the mean-square error, the asymptotic distribution of the estimator, and develop tests on Π. Often, the full trajectories are not observed but only a discretized version is available. We address this issue in the scenario where the data become more and more frequent (in-fill asymptotics). We also consider the case where Z is endogenous and instrumental variables are used to estimate Π.
Key Words: Functional regression, instrumental variables, linear operator, Tikhonov regularization
The authors thank the participants of the 6th French Econometrics Conference and especially their discussant Jan Johannes for helpful comments. Carrasco thanks NSERC for financial support.
1 Introduction
With the increase of storage capability, continuous-time data are available in many fields including finance, medicine, meteorology, and microeconometrics. Researchers, companies, and governments look for ways to exploit this rich information. In this paper, we develop new estimation results for functional regressions where both the regressor Z(t) and the response Y(t) are functions of an index such as the time or a spatial location. Both Z(t) and Y(t) are assumed to belong to Hilbert spaces. The model can be thought of as a generalization of the multivariate regression where the regression coefficient is now an unknown operator Π. An interesting feature of our model is that Y(t) depends not only on contemporaneous Z(t) but also on past and future values of Z. We propose to estimate the operator Π by Tikhonov regularization, which amounts to applying a penalty on the L2 norm of Π. The choice of an L2 penalty, instead of the L1 penalty used in the Lasso, is motivated by the fact that - in the applications we have in mind - there is no reason to believe that the relationship between Y and Z is sparse. We derive the rate of convergence of the mean-square error (MSE) and the asymptotic distribution of the estimator for a fixed regularization parameter, and develop tests on Π. In some applications, it would be interesting to test whether Y(t) depends only on the past values of Z or only on contemporaneous values of Z. If the application is on networks and t refers to the spatial location, our model could describe how the behavior of a firm Y(t) depends on the decisions of neighboring firms Z(s). Testing properties of Π will help to characterize the strategic response of firms. Often, the full trajectories are not observed but only a discretized version is available. This case raises specific challenges which will be addressed in the scenario where the data become more and more frequent (in-fill asymptotics).
We also consider the case where Z is endogenous and instrumental variables are used to estimate Π. To the best of our knowledge, the model with functional response and endogenous functional regressor has never been studied before. We derive an estimator based on Tikhonov regularization and show its rate of convergence. There is a large body of work on linear functional regression where the response is a scalar variable Y and the regressor is a function. Some recent references include Cardot, Ferraty, and Sarda (2003), Hall and Horowitz (2007), Horowitz and Lee (2007),
Darolles, Fan, Florens, and Renault (2011), and Crambes, Kneib, and Sarda (2009). In contrast, only a few researchers have tackled the functional linear regression in which both the predictor Z and the response Y are random functions. The object of interest is the estimation of the conditional expectation of Y given Z. In this setting, the unknown parameter is an integral operator. This model is discussed in the monographs by Ramsay and Silverman (2005) and Ferraty and Vieu (2006). Cuevas, Febrero, and Fraiman (2002) consider a fixed design setting and propose an estimator of Π based on interpolation. Yao, Müller, and Wang (2005) consider the case where both predictor and response trajectories are observed at discrete and irregularly spaced times. Their estimator is based on a spectral cut-off regularized inverse using nonparametric estimators of the principal components. Crambes and Mas (2013) consider again a spectral cut-off regularized inverse and derive the asymptotic mean square prediction error, which is then used to derive the optimal choice of the regularization parameter. Our model is also related to the functional autoregressive (FAR) model studied by Bosq (2000), Kargin and Onatski (2008), and Aue, Norinho, and Hörmann (2014), among others. The estimation methods used in these papers are based on functional principal components and differ from ours. Antoch, Prchal, Rosa, and Sarda (2010) use a FAR to forecast electricity consumption. In their model, the weekday consumption curve is explained by the curve from the previous week. The authors use B-splines to estimate the operator. The paper is organized as follows. Section 2 introduces the model and the estimators. Section 3 derives the rate of convergence of the MSE. Section 4 presents the asymptotic normality of the estimator for a fixed regularization parameter. Issues relative to the choice of the regularization parameter are discussed in Section 5. Discrete observations are addressed in Section 6.
Section 7 considers an endogenous regressor. Section 8 presents simulation results. Section 9 presents an application to the electricity market. The proofs are collected in the Appendix.
2 The model and estimator

2.1 The model
We consider a regression model where both the predictor and response are random functions. We observe pairs of random trajectories (y_i, z_i), i = 1, 2, ..., n, with square integrable predictor trajectories z_i and response trajectories y_i. They are realizations of random processes (Y, Z) with zero mean functions and unknown covariance operators. The extension to the case where the mean is unknown but estimated is straightforward. The argument of Y and Z is denoted t; it may refer to the time, a location, or a characteristic such as the age or income of an agent. We assume that Y belongs to a Hilbert space E equipped with an inner product ⟨·,·⟩ and Z belongs to a Hilbert space F equipped with an inner product ⟨·,·⟩ (to simplify notations, we use the same notation for both inner products even though they usually differ). The model is

Y = ΠZ + U  (1)

where U is a zero mean random element of E and Π is a nonrandom Hilbert-Schmidt operator from F to E. Moreover, Z is exogenous so that cov(Z, U) = 0. This assumption will be relaxed in Section 7. For illustration, consider the following example:
E = { g : ∫_S g(t)² dt < ∞ },
F = { f : ∫_T f(t)² dt < ∞ },

where S and T are some intervals of ℝ. Then, Π can be represented as an integral operator such that

(Πφ)(s) = ∫_T π(s,t) φ(t) dt
for any φ ∈ F. π is referred to as the kernel of the operator Π. Model (1) means that Y(t) depends not only on Z(t) but also on all the Z(s), for s ≠ t. The object of interest is the estimation of the operator Π. Below, we describe two examples of applications of Model (1).

Example 1. Electricity market. Let Yi(t) be the electricity consumption of the province of Ontario in Canada for day i at hour t, and Zi(t) be the average temperature for the same province for day i at hour t. We believe that the consumption of electricity over one day may depend on the temperature of the current day but also on that of the previous and following days. We could model this relationship as a multivariate regression where the dependent variable is the 24 × 1 vector of electricity consumptions at different hours of the day and the regressor is a 72 × 1 vector of temperatures. Estimating this model by ordinary least squares (OLS) would require inverting a 72 × 72 matrix which would likely be near singular. The variance of the resulting OLS estimator would be very large. Moreover, if the frequency of the observations increases, it is expected that the successive observations will become more and more correlated. It is therefore natural to assume that the observations result from the discretization of a curve. Model (1) seems to be an attractive alternative to the multivariate regression in this setting. This application will be continued in Section 9.

Example 2. Technological spillovers in productivity. This example is inspired by Manresa (2015). Assume that Yi(t) is the log of the average output of firms with industry code t at time i and Zi(t) is the log of the average R&D expenditures of firms with industry code t at time i. Industry codes go from 1 to 6 digits. A higher number of digits corresponds to a finer grid on the continuum of industry codes. Model (1) would permit characterizing how firms in one sector benefit from R&D advancements made in adjacent sectors. This example will not be pursued here.
2.2 The estimator
We denote VZ the operator from F to F which associates to any function φ ∈ F:

VZ φ = E[Z ⟨Z, φ⟩].

Note that, as Z is centered, VZ is the covariance operator of Z. We denote CYZ the covariance operator of (Y, Z). It is the operator from F to E such that

CYZ φ = E[Y ⟨Z, φ⟩].

Using (1), we have

cov(Y, Z) = cov(ΠZ + U, Z) = Π cov(Z, Z) + cov(U, Z).

Hence, we have the following relationships:

CYZ = Π VZ,  (2)
CZY = VZ Π*,  (3)

where Π* is the adjoint of Π. CZY is defined as the operator from E to F such that

CZY ψ = E[Z ⟨Y, ψ⟩]

for any ψ in E. Note that CZY is the adjoint of CYZ: CZY = C*YZ. First we describe how to estimate Π* using (3). The unknown operators VZ and CZY are replaced by their sample counterparts. The sample estimate of VZ is

V̂Z φ = (1/n) Σ_{i=1}^n z_i ⟨z_i, φ⟩

for φ ∈ F. The sample estimate of CZY is

ĈZY ψ = (1/n) Σ_{i=1}^n z_i ⟨y_i, ψ⟩

for ψ ∈ E. An estimator of Π* cannot be obtained directly by solving ĈZY = V̂Z Π̂*, because the initial equation CZY = VZ Π* is an ill-posed problem in the sense that VZ is invertible only on a subset of F and its inverse is not continuous. Note that V̂Z has finite rank equal to n and hence is not invertible. A Moore-Penrose generalized inverse could be used but it would not be continuous. To stabilize the inverse, we need to use some regularization scheme. We adopt Tikhonov regularization (see Kress, 1999, and Carrasco, Florens, and Renault, 2007). The estimator of Π* is defined as

Π̂* = (αI + V̂Z)⁻¹ ĈZY  (4)

and that of Π is defined by

Π̂ = ĈYZ (αI + V̂Z)⁻¹  (5)
where α is some positive regularization parameter which will be allowed to converge to zero as n goes to infinity. The estimators (4) and (5) can be viewed as generalizations of ordinary least-squares estimators. They also have an interpretation as the solution to an inverse problem. At this stage, it is useful to make the link with the inverse problem literature. Let H be the Hilbert space of linear Hilbert-Schmidt operators from F to E. The inner product on H is ⟨Π₁, Π₂⟩_H = tr(Π₁ Π₂*). Dropping the error term in (1), we obtain, for the sample, the equation

r̂ = KΠ

where r̂ = (y₁, ..., y_n)′ and K is the operator from H to Eⁿ such that KΠ = (Πz₁, ..., Πz_n)′. The inner product on Eⁿ is

⟨f, g⟩_{Eⁿ} = (1/n) Σ_{i=1}^n ⟨f_i, g_i⟩_E

with f = (f₁, ..., f_n)′ and g = (g₁, ..., g_n)′. Let us check that Π̂ is a classical Tikhonov regularized inverse of the operator K:

Π̂ = (αI + K*K)⁻¹ K* r̂.

We need to find K*. We look for the operator B from F to E solution of

⟨KΠ, f⟩_{Eⁿ} = ⟨Π, B⟩_H.  (6)

Note that

⟨Π, B⟩_H = tr(Π B*) = Σ_j ⟨Π B* ψ_j, ψ_j⟩ = Σ_j ⟨B* ψ_j, Π* ψ_j⟩,

where {ψ_j} is a basis of E. On the other hand,

⟨KΠ, f⟩_{Eⁿ} = (1/n) Σ_i ⟨Πz_i, f_i⟩_E = (1/n) Σ_i ⟨z_i, Π* f_i⟩_F.

Using f_i = Σ_j ⟨f_i, ψ_j⟩ ψ_j, we obtain

⟨KΠ, f⟩_{Eⁿ} = (1/n) Σ_i Σ_j ⟨f_i, ψ_j⟩ ⟨z_i, Π* ψ_j⟩ = Σ_j ⟨(1/n) Σ_i ⟨f_i, ψ_j⟩ z_i, Π* ψ_j⟩.

It follows from (6) that B* ψ_j = (1/n) Σ_i ⟨f_i, ψ_j⟩ z_i for all j and hence

B* ψ = (1/n) Σ_i ⟨f_i, ψ⟩ z_i

for all ψ in E. Now, we look for B, the adjoint of B*. B is the solution of ⟨B* ψ, φ⟩_F = ⟨ψ, Bφ⟩_E. We have

⟨B* ψ, φ⟩_F = (1/n) Σ_i ⟨f_i, ψ⟩ ⟨z_i, φ⟩_F = ⟨ψ, (1/n) Σ_i ⟨z_i, φ⟩ f_i⟩_E.

Hence,

Bφ = (K* f) φ = (1/n) Σ_i ⟨z_i, φ⟩ f_i.

We have

(K*K Π) φ = (1/n) Σ_i ⟨z_i, φ⟩ Πz_i = Π V̂Z φ,

so that K*K Π = Π V̂Z, and

K* r̂ = (1/n) Σ_i ⟨z_i, ·⟩ y_i = ĈYZ.

It follows that

Π̂ = (αI + K*K)⁻¹ K* r̂ = ĈYZ (αI + V̂Z)⁻¹,

since (αI + K*K)Π = Π(αI + V̂Z).
The estimator Π̂ is also a penalized least-squares estimator:

Π̂ = arg min_Π ‖y − Πz‖²_{Eⁿ} + α ‖Π‖²_HS = arg min_Π (1/n) Σ_{i=1}^n ‖y_i − Πz_i‖² + α Σ_j μ̃_j²,

where the μ̃_j are the singular values of the operator Π.

2.3 Identification
It is easier to study identification from the viewpoint of Equation (3). Let H be the space of Hilbert-Schmidt operators from E to F. Let T be the operator from H to H defined as TH = VZ H for H in H. According to (3), Π is identified if and only if T is injective. VZ injective implies T injective. Indeed, we have

TH = 0 ⟺ VZ H = 0 ⟺ VZ Hψ = 0, ∀ψ ⟹ Hψ = 0, ∀ψ,

by the injectivity of VZ. Hence H = 0. It turns out that T is injective if and only if VZ is injective. This can be shown by deriving the spectrum of T.
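The spectral relationship between T and VZ can be previewed numerically in finite dimensions (an illustrative sketch, not part of the paper's formal argument; numpy is assumed). Discretizing VZ as a symmetric positive semi-definite matrix V, the map TH = VZ H acts on vectorized operators as the Kronecker product I ⊗ V, whose eigenvalues are those of V, each repeated:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 3                         # dimensions of (discretized) F and E
A = rng.standard_normal((p, p))
V = A @ A.T                         # symmetric psd stand-in for V_Z
# T acts on operators H : E -> F by T H = V H; in vectorized (column-major)
# form, vec(V H) = (I ⊗ V) vec(H)
T = np.kron(np.eye(q), V)
eig_T = np.sort(np.linalg.eigvalsh(T))
eig_V = np.sort(np.linalg.eigvalsh(V))
# every eigenvalue of V appears q times among the eigenvalues of T
assert np.allclose(eig_T, np.sort(np.tile(eig_V, q)))
```

In particular, T has a zero eigenvalue exactly when V does, which mirrors the claim that T is injective if and only if VZ is.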
First, we show that T is self-adjoint. The adjoint T* of T satisfies ⟨TH, K⟩ = ⟨H, T*K⟩ for arbitrary operators H and K of H. We have

⟨TH, K⟩ = tr(T H K*) = tr(VZ H K*) = tr(H K* VZ) = tr(H (VZ K)*)

because the trace is cyclic and VZ is self-adjoint. Hence, T*K = VZ K = TK. Therefore, T is self-adjoint. The spectrum of T is also closely related to that of VZ. Let {λ_j, H_j}_{j=1,2,...} denote the eigenvalues and eigenfunctions of T and {λ_j, φ_j}_{j=1,2,...} be the eigenvalues and eigenfunctions of VZ, so that VZ φ_j = λ_j φ_j. H_j is necessarily of the form H_j = φ_j ⟨ψ, ·⟩, where ψ is a function in E. Then,

T H_j = VZ φ_j ⟨ψ, ·⟩ = λ_j φ_j ⟨ψ, ·⟩ = λ_j H_j,

so that the eigenvalues of T are the same as those of VZ. In summary, a necessary and sufficient condition for the identification of Π is that VZ is injective.

2.4 Computation of the estimator
To show how to compute Π̂* explicitly, we multiply the left and right hand sides of (4) by (αI + V̂Z) to obtain

ĈZY ψ = (αI + V̂Z) Π̂* ψ  ⟺  (1/n) Σ_{i=1}^n z_i ⟨y_i, ψ⟩ = α Π̂* ψ + (1/n) Σ_{i=1}^n z_i ⟨z_i, Π̂* ψ⟩.  (7)

Then, we take the inner product with z_l, l = 1, 2, ..., n, on the left and right hand sides of (7), to obtain n equations:

(1/n) Σ_{i=1}^n ⟨z_l, z_i⟩ ⟨y_i, ψ⟩ = α ⟨z_l, Π̂* ψ⟩ + (1/n) Σ_{i=1}^n ⟨z_l, z_i⟩ ⟨z_i, Π̂* ψ⟩, l = 1, 2, ..., n,  (8)

with n unknowns ⟨z_i, Π̂* ψ⟩, i = 1, 2, ..., n. Let M be the n × n matrix with (l, i) element ⟨z_l, z_i⟩/n, v the n-vector of the ⟨z_i, Π̂* ψ⟩, and w the n-vector of the ⟨y_i, ψ⟩. (8) is equivalent to

Mw = (αI + M) v,

and v = (αI + M)⁻¹ M w = M (αI + M)⁻¹ w. For a given ψ, we can compute:

Π̂* ψ = (1/α) [(1/n) Σ_{i=1}^n z_i ⟨y_i, ψ⟩ − (1/n) Σ_{i=1}^n z_i ⟨z_i, Π̂* ψ⟩] = (1/(αn)) z′ [I − M (αI + M)⁻¹] w = (1/n) z′ (αI + M)⁻¹ w  (9)
where z is the n-vector of the z_i. Now, we explain how to estimate Π̂φ for any φ ∈ F. Taking the inner product with φ on the left and right hand sides of (9), we obtain

⟨φ, Π̂* ψ⟩ = ⟨Π̂φ, ψ⟩ = (1/(αn)) Σ_{i=1}^n ⟨φ, z_i⟩ (⟨y_i, ψ⟩ − ⟨z_i, Π̂* ψ⟩) = (1/(αn)) Σ_{i=1}^n ⟨φ, z_i⟩ ⟨y_i − Π̂z_i, ψ⟩

for all ψ ∈ E. This implies

Π̂φ = (1/(αn)) Σ_{i=1}^n ⟨φ, z_i⟩ (y_i − Π̂z_i).  (10)

Hence, to compute Π̂φ, we need to know the Π̂z_i. From (5), we have α Π̂ + Π̂ V̂Z = ĈYZ. Applying the l.h.s. and r.h.s. to z_i, i = 1, 2, ..., n, we obtain

α Π̂z_i + Π̂ V̂Z z_i = ĈYZ z_i  ⟺  α Π̂z_i(t) + (1/n) Σ_{j=1}^n Π̂z_j(t) ⟨z_j, z_i⟩ = (1/n) Σ_{j=1}^n y_j(t) ⟨z_j, z_i⟩, i = 1, 2, ..., n.  (11)

For each t, we can solve the n equations with n unknowns Π̂z_j(t) given by (11) and deduce Π̂φ from (10).

The prediction of Y_i is given by ŷ_i = Π̂z_i.
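For a discretized sample, the linear-algebra steps (9)-(11) can be sketched as follows (an illustrative implementation under assumed conventions: curves stored as rows on a common grid, inner products approximated by Riemann sums; `fit_functional_ridge` and `apply_pi_hat` are hypothetical helper names, not from the paper):

```python
import numpy as np

def fit_functional_ridge(Z, Y, alpha, dt):
    """Solve eq. (11) on a grid: Z, Y are (n, m) arrays of curves,
    alpha the regularization parameter, dt the grid spacing.
    Returns PZ, an (n, m) array whose rows are (Pi_hat z_i)(t)."""
    n = Z.shape[0]
    M = (Z @ Z.T) * dt / n                 # M[l, i] = <z_l, z_i>/n
    # Eq. (11): for every t, (alpha I + M) PZ(t) = M Y(t)
    PZ = np.linalg.solve(alpha * np.eye(n) + M, M @ Y)
    return PZ

def apply_pi_hat(phi, Z, Y, PZ, alpha, dt):
    """Eq. (10): Pi_hat(phi) = (1/(alpha n)) sum_i <phi, z_i>(y_i - Pi_hat z_i)."""
    n = Z.shape[0]
    coef = (Z @ phi) * dt                  # <phi, z_i>, i = 1..n
    return coef @ (Y - PZ) / (alpha * n)
```

As a consistency check, plugging φ = z_k into (10) must reproduce the k-th row of the solution of (11), which holds exactly in this matrix form.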
3 Rate of convergence of the MSE
In this section, we study the rate of convergence of the mean square error (MSE) of Π̂*. Several assumptions are needed.

Assumption 1. U_i is a random process of E such that E(U_i) = 0, cov(U_i, U_j | Z₁, Z₂, ..., Z_n) = 0 for all i ≠ j and = VU for i = j, where VU is a trace-class operator.
Assumption 2. Π belongs to H(F, E), the space of Hilbert-Schmidt operators from F to E.
Assumption 3. VZ is a trace-class operator and ‖V̂Z − VZ‖²_HS = Op(1/n).
Assumption 4. There is a Hilbert-Schmidt operator R from E to F and a constant β > 0 such that Π* = VZ^{β/2} R.

An operator K is trace-class if Σ_j ⟨(K*K)^{1/2} e_j, e_j⟩ < ∞ for any basis {e_j}. If K is self-adjoint positive definite, it is equivalent to say that the sum of the eigenvalues of K is finite. Given VU is a covariance operator, VU is trace-class if and only if E‖U_i‖² < ∞. The notation ‖·‖_HS refers to the Hilbert-Schmidt norm of operators. An operator K is Hilbert-Schmidt (noted HS) if ‖K‖²_HS = Σ_j ⟨K e_j, K e_j⟩ < ∞ for any basis {e_j}. If K is self-adjoint positive definite, it is equivalent to the condition that the eigenvalues of K are square summable. A sufficient condition for ‖V̂Z − VZ‖²_HS = Op(1/n) is that Z_i is an i.i.d. random process and E‖Z_i‖⁴ < ∞; see Proposition 5 of Dauxois, Pousse, and Romain (1982). Assumption 4 is a source condition needed to characterize the rate of convergence of the MSE. Moreover, it guarantees that Π* belongs to the orthogonal complement of the null space of VZ, denoted N(VZ). Given this condition, there is no need to impose N(VZ) = {0} to get identification. The MSE is defined by

E[ ‖Π̂* − Π*‖²_HS | Z₁, ..., Z_n ].
Proposition 1 Under Assumption 3, Π̂ belongs to H(F, E) for all α > 0.

Proof: See Appendix.

Replacing y_i by Πz_i + u_i in the expression of ĈZY, we obtain

ĈZY = (1/n) Σ_i z_i ⟨y_i, ·⟩ = (1/n) Σ_i z_i ⟨u_i, ·⟩ + (1/n) Σ_i z_i ⟨Πz_i, ·⟩ = ĈZU + V̂Z Π*.

We decompose Π̂* − Π* in the following manner:

Π̂* − Π* = (αI + V̂Z)⁻¹ ĈZY − Π*
= (αI + V̂Z)⁻¹ ĈZU  (12)
+ α (αI + V̂Z)⁻¹ (V̂Z − VZ) (αI + VZ)⁻¹ Π*  (13)
+ (αI + VZ)⁻¹ VZ Π* − Π*.  (14)

To study the rate of convergence of the MSE, we will study the rates of the three terms (12), (13), and (14).
Proposition 2 Assume Assumptions 1 to 4 hold. If β > 1, then MSE = Op(1/(αn) + α^{β∧2}). If β < 1, then MSE = Op(α^{β−2}/n + α^{β}).

Proposition 2 shows that the MSE exhibits the usual trade-off between the variance, decreasing in α, and the bias, increasing in α.
4 Asymptotic normality for fixed α and tests
Assumption 5. (U_i, Z_i) are i.i.d. and E(U_i | Z_i) = 0.

Under Assumption 5 and some extra moment conditions (see Dauxois, Pousse, and Romain (1982) and Mas (2006)), we have

√n (V̂Z − VZ) →d N(0, KZ),
√n ĈZU →d N(0, KZU),

where KZ and KZU are covariance operators and the convergence is either in the space of Hilbert-Schmidt operators (Dauxois et al., 1982) or in the space of trace-class operators (Mas, 2006). Moreover, √n (V̂Z − VZ) and √n ĈZU are asymptotically independent. In this section, we consider the case where α is fixed. In that case, Π̂* is not consistent and keeps an asymptotic bias. It is useful to define

Π*_α = (αI + VZ)⁻¹ VZ Π*,

the regularized version of Π*.
We have

Π̂* − Π*_α = (αI + V̂Z)⁻¹ ĈZU + α (αI + V̂Z)⁻¹ (V̂Z − VZ) (αI + VZ)⁻¹ Π*
= (αI + VZ)⁻¹ ĈZU  (15)
+ α (αI + VZ)⁻¹ (V̂Z − VZ) (αI + VZ)⁻¹ Π*  (16)
+ Op(1/n).

As n goes to infinity, Π̂* − Π*_α converges to zero and is √n-asymptotically normal. The first two terms of the r.h.s., (15) and (16), are Op(1/√n) and will affect the asymptotic distribution. This distribution is not simple. We are going to characterize it below. From Equations (15) and (16), neglecting the Op(1/n) term, we have

Π̂* − Π*_α = (αI + VZ)⁻¹ ĈZU + α (αI + VZ)⁻¹ (V̂Z − VZ) (αI + VZ)⁻¹ Π*
= (αI + VZ)⁻¹ (1/n) Σ_i (u_i ⊗ z_i) + α (αI + VZ)⁻¹ [(1/n) Σ_i (z_i ⊗ z_i) − VZ] (αI + VZ)⁻¹ Π*
= (1/n) Σ_i u_i ⊗ z̃_i + α (1/n) Σ_i (Πz̃_i) ⊗ z̃_i − α (αI + VZ)⁻¹ VZ (αI + VZ)⁻¹ Π*
= (1/n) Σ_i [u_i ⊗ z̃_i + α (Πz̃_i) ⊗ z̃_i] − α E[(ΠZ̃) ⊗ Z̃]
= (1/n) Σ_i (u_i + α Πz̃_i) ⊗ z̃_i − α C_{Z̃Z̃} Π*,
where the first equality makes use of the definition of the empirical covariance operators using tensor products. The second line uses the elementary properties K(Y ⊗ X) = Y ⊗ KX and (Y ⊗ X)K = K*Y ⊗ X for X ∈ F, Y ∈ E, and K ∈ H. The third line uses the definition of VZ and introduces the notation Z̃ ≡ (αI + VZ)⁻¹ Z, with z̃_i ≡ (αI + VZ)⁻¹ z_i. The interchange of the expectation operator and (αI + VZ)⁻¹ is allowed since the latter is a bounded linear operator; by the Banach inverse theorem, the inverse of a bounded linear bijection is itself linear and bounded (see, for instance, Rudin, 1991). The last equality holds since the functional tensor product, denoted ⊗, distributes over addition.

The covariance operator of Π̂* − Π*_α is an operator which maps the space of Hilbert-Schmidt operators from E to F, denoted H, into itself. Such an operator may be difficult to write explicitly. Fortunately, the properties of tensor products of infinite-dimensional Hilbert-Schmidt operators defined on separable Hilbert spaces are well-known,¹ and may be used, as in Dauxois, Pousse, and Romain (1982), to write explicitly the covariance operator of an infinite-dimensional Hilbert-Schmidt random operator. The tensor product Π₁ ⊗̃ Π₂ for (Π₁, Π₂) ∈ H² is a mapping from H into itself; hence Π₁ ⊗̃ Π₂ is an element of the Hilbert space of Hilbert-Schmidt operators from H to H equipped with the Hilbert-Schmidt inner product. For T = φ ⊗ ψ ∈ H, Π₁ = X ⊗ Z ∈ H and Π₂ = Y ⊗ W ∈ H, this tensor product is equivalently defined as:

(i) (Π₁ ⊗̃ Π₂) T = ⟨T, Π₁⟩_H Π₂ ∈ H;
(ii) [(X ⊗ Z) ⊗̃ (Y ⊗ W)](φ ⊗ ψ) = ⟨X, φ⟩ ⟨Z, ψ⟩ (Y ⊗ W), ∀ φ, X, Y ∈ F; ψ, Z, W ∈ E.
8'; X; Y 2 F; ; Z; W 2 E; Based upon de…nition (i), the covariance operator of E h;
1
E[
1 ]iH
2
E[
2]
=E
1
1
and
E[
1]
2
~
naturally writes as 2
E[
2]
:
(17)
Furthermore, to show asymptotic normality we shall use the classical central limit theorem for i.i.d. processes in separable Hilbert spaces. The following is stated as Theorem 2.7 in Bosq (2000) and is reproduced here for clarity.
1
See, for instance, Vilenkin (1968, p.59-65).
15
Theorem 3 (Bosq, 2000) Let (Z_i, i ≥ 1) be a sequence of i.i.d. F-valued random variables, where F is a separable Hilbert space, such that E‖Z_i‖² < ∞, E(Z_i) = μ_Z and VZ = V; then one has

(1/√n) Σ_i (Z_i − μ_Z) →d N(0, V).

We are now equipped to derive the asymptotic covariance operator of interest in its general form, under some standard assumptions.

Proposition 4 Assume (U_i, Z_i) i.i.d., E‖U_i‖² < ∞, E‖Z_i‖⁴ < ∞, and E‖U_i‖² ‖Z_i‖² < ∞; then, for fixed α,

√n (Π̂* − Π*_α) →d N(0, Ω_α),  (18)

where the asymptotic covariance operator² Ω_α is given by

Ω_α = E[((U + αΠZ̃) ⊗ Z̃) ⊗̃ ((U + αΠZ̃) ⊗ Z̃)] − α² (C_{Z̃Z̃} Π*) ⊗̃ (C_{Z̃Z̃} Π*),  (19)

which simplifies to

Ω₀ = E[(U ⊗ VZ⁻¹ Z) ⊗̃ (U ⊗ VZ⁻¹ Z)]  (20)

when α → 0.
Proof. Under the assumptions (U_i, Z_i) i.i.d., E‖Z_i‖⁴ < ∞, and E‖U_i‖² ‖Z_i‖² < ∞, Theorem 3 ensures the root-n asymptotic normality of

(1/n) Σ_i (U_i ⊗ Z_i, Z_i ⊗ Z_i − VZ).

By the continuous mapping theorem, one has (18) using the continuous transformation (A, B) ↦ (αI + VZ)⁻¹ A + α (αI + VZ)⁻¹ B (αI + VZ)⁻¹ Π*. The covariance operator of √n (Π̂* − Π*_α) may be written using (17) as

Ω_α = E[ ((1/√n) Σ_i [(u_i + αΠz̃_i) ⊗ z̃_i − α C_{Z̃Z̃} Π*]) ⊗̃ ((1/√n) Σ_i [(u_i + αΠz̃_i) ⊗ z̃_i − α C_{Z̃Z̃} Π*]) ]
= E[ ((U + αΠZ̃) ⊗ Z̃) ⊗̃ ((U + αΠZ̃) ⊗ Z̃) ] − α² (C_{Z̃Z̃} Π*) ⊗̃ (C_{Z̃Z̃} Π*),

where the second line is obtained from the i.i.d. assumption. Straightforward developments yield (19). Now letting α → 0 gives (20).

² The covariance operator Ω_α is four-dimensional; its kernel may be written as
ω_α(s, t, r, τ) = E[(U(s) + αΠZ̃(s)) Z̃(t) (U(r) + αΠZ̃(r)) Z̃(τ)] − α² E[ΠZ̃(s) Z̃(t)] E[ΠZ̃(r) Z̃(τ)].

Furthermore, under the strict exogeneity assumption E[U_i | Z_i] = 0, the asymptotic covariance operator in (19) simplifies to

Ω_α = E[(U ⊗ Z̃) ⊗̃ (U ⊗ Z̃)] + α² { E[((ΠZ̃) ⊗ Z̃) ⊗̃ ((ΠZ̃) ⊗ Z̃)] − (C_{Z̃Z̃} Π*) ⊗̃ (C_{Z̃Z̃} Π*) }.
In econometrics, we are often interested in testing the significance of estimates and producing confidence bands. However, there is no obvious meaningful way to perform standard significance tests using the derived asymptotic covariance. Indeed, for fixed α the estimated residuals will be biased and one must specify Π*_α. On the other hand, if we let α → 0, an estimator of (20) may be uninformative since VZ⁻¹ does not necessarily exist. A more practical approach would be to keep α fixed to obtain an estimate of (αI + VZ)⁻¹ and use it to derive an estimator of (20). Other statistical tests may involve applying a test operator to Π. We want to test the null hypothesis H₀: Π = Π₀, where Π₀ is known. A simple way to test this hypothesis is to look at ĈZY − V̂Z Π₀*. Under H₀, this operator equals ĈZU and should be close to zero. Moreover, under H₀,

√n (ĈZY − V̂Z Π₀*) →d N(0, KZU),

where KZU = E[(u ⊗ Z) ⊗̃ (u ⊗ Z)], with (x ⊗ y)(f) = ⟨x, f⟩ y and (Π₁ ⊗̃ Π₂) T = ⟨T, Π₁⟩_H Π₂ (see Dauxois, Pousse, and Romain, 1982).
Let {ξ_j, ψ_j : j = 1, 2, ..., q} be a set of test functions; then

√n [ ⟨(ĈZY − V̂Z Π₀*) ξ₁, ψ₁⟩, ..., ⟨(ĈZY − V̂Z Π₀*) ξ_q, ψ_q⟩ ]′

converges to a multivariate normal distribution with mean 0_q and covariance matrix Σ, the q × q matrix with (j, l) element:

Σ_{jl} = E[ ⟨√n ĈZU ξ_j, ψ_j⟩ ⟨√n ĈZU ξ_l, ψ_l⟩ ] = ⟨VZ ψ_j, ψ_l⟩ ⟨VU ξ_j, ξ_l⟩.

This covariance matrix can easily be estimated by replacing VZ and VU by their sample counterparts. The appropriately rescaled quadratic form converges to a chi-square distribution with q degrees of freedom, which can be used to test H₀. The test functions could be cumulative normals as in Conley, Hansen, Luttmer, and Scheinkman (1997) or normal densities with the same small variance but centered at different means.
5 Data-driven selection of α
The estimator involves a tuning parameter, α, which needs to be selected. It can be chosen as the solution to

min_α (1/α) ‖V̂Z Π̂*_α − ĈZY‖²_HS;

see Engl, Hanke, and Neubauer (2000, p. 102). Another possibility is to use leave-one-out cross-validation:

min_α (1/n) Σ_i ‖y_i − Π̂_α^{(−i)} z_i‖²,

where Π̂_α^{(−i)} has been computed using all observations except the i-th one. Centorrino (2014) studies the properties of leave-one-out cross-validation for nonparametric IV regression and shows that this criterion is rate optimal in mean squared error. This method is also used in a binary response model by Centorrino and Florens (2014). Various data-driven selection techniques are compared via simulations in Centorrino, Fève, and Florens (2013). An alternative approach would be to use a penalized minimum contrast criterion as in Goldenshluger and Lepski (2011). This could lead to a minimax-optimal estimator (Comte and Johannes, 2012).
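A minimal sketch of the leave-one-out criterion on discretized data (illustrative only; `loo_cv_score` is a hypothetical helper, numpy is assumed, and the held-out prediction reuses the matrix formulas (10)-(11) of Section 2.4 on the training pairs):

```python
import numpy as np

def loo_cv_score(Z, Y, alpha, dt):
    """(1/n) sum_i ||y_i - Pi_hat^{(-i)} z_i||^2 for curves stored as rows
    of Z and Y on a common grid with spacing dt."""
    n = Z.shape[0]
    err = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        Zt, Yt = Z[keep], Y[keep]
        M = (Zt @ Zt.T) * dt / (n - 1)            # Gram matrix / (n-1)
        # Pi_hat z_j on the training set, via eq. (11)
        PZ = np.linalg.solve(alpha * np.eye(n - 1) + M, M @ Yt)
        coef = (Zt @ Z[i]) * dt                   # <z_i, z_j> on training curves
        yhat = coef @ (Yt - PZ) / (alpha * (n - 1))   # eq. (10) at phi = z_i
        err += np.sum((Y[i] - yhat) ** 2) * dt
    return err / n

# alpha would then be chosen on a grid, e.g.:
# alphas = 10.0 ** np.arange(-5, 1)
# best = min(alphas, key=lambda a: loo_cv_score(Z, Y, a, dt))
```

The grid search shown in the comment is one simple way to minimize the criterion; any one-dimensional optimizer would do.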
6 Discrete observations
In this section, to simplify the exposition, we will refer to the argument t of (y_i, z_i) as time, even though it could refer to a location or another characteristic. Suppose that the data (y_i, z_i) are not observed in continuous time but at discrete (not necessarily equally spaced) times. We use some smoothing to construct pairs of curves (y_i^m, z_i^m), i = 1, 2, ..., n, such that y_i^m ∈ E and z_i^m ∈ F. This smoothing can be obtained by approximating the curves by step functions or by kernel smoothing, for instance. The subscript m corresponds to the smallest number of discrete observations across i = 1, 2, ..., n; m grows with the sample size n. Using the smoothed observations, we compute the corresponding estimators of VZ and CZY, denoted V̂Z^m and ĈZY^m, and the estimator of Π*, denoted Π̂*_m:

Π̂*_m = (αI + V̂Z^m)⁻¹ ĈZY^m.

To assess the rate of convergence of Π̂*_m, we add the following conditions, which guarantee that the discretization error is negligible with respect to the estimation error.

Assumption 6. ‖z_i^m − z_i‖ = Op(f(m)) and ‖y_i^m − y_i‖ = Op(f(m)).
Assumption 7. f(m)² = o(α/n).

Proposition 5 Under Assumptions 1 to 4, 6, and 7, the MSE of Π̂*_m has the same rate of convergence as that of the MSE of Π̂* in Proposition 2.
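As an illustration of the smoothing step (a sketch under assumed conventions; `to_curve` is a hypothetical helper name, and linear interpolation stands in for the step-function or kernel smoothers mentioned above):

```python
import numpy as np

def to_curve(t_obs, x_obs, grid):
    """Turn discrete, possibly irregularly spaced observations (t_obs, x_obs)
    of one trajectory into a curve evaluated on a common grid.
    Linear interpolation is used here as the smoother."""
    return np.interp(grid, t_obs, x_obs)

# each subject i may be observed at its own (irregular) times:
grid = np.linspace(0.0, 1.0, 200)
t_i = np.sort(np.random.default_rng(0).uniform(0.0, 1.0, 25))
x_i = np.sin(2 * np.pi * t_i)
z_i_m = to_curve(t_i, x_i, grid)     # z_i^m in the notation above
```

The estimators V̂Z^m and ĈZY^m are then computed from the rows z_i^m and y_i^m exactly as in Section 2.4.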
7 Case where Z is endogenous
Now, assume Z is endogenous but we observe instrumental variables W such that cov(U, W) = 0. Hence,

E[(Y − ΠZ) ⟨W, ·⟩] = 0.

It follows that

CYW = Π CZW  (21)

where CYW = E[Y ⟨W, ·⟩] and CZW = E[Z ⟨W, ·⟩]. Similarly, we have

CWY = CWZ Π*  (22)

where CWZ = E[W ⟨Z, ·⟩]. We need the following identification condition:

Assumption 8. CWZ is injective.

Under this assumption, Π is uniquely defined from (21). To see this, assume that there are two solutions Π₁ and Π₂ to (21). It follows that (Π₁ − Π₂) CZW = 0, or equivalently CWZ (Π₁ − Π₂)* = 0. Hence the range of (Π₁ − Π₂)* belongs to the null space of CWZ. However, under Assumption 8, the null space of CWZ is reduced to zero and thus the range of (Π₁ − Π₂)* is equal to zero. It follows that (Π₁ − Π₂) φ = 0 for all φ, hence Π₁ = Π₂.

To construct an estimator of Π*, we first apply the operator CZW to the l.h.s. and r.h.s. of Equation (22) to obtain

CZW CWY = CZW CWZ Π*.

Note that CZW = C*WZ and therefore the operator CZW CWZ is self-adjoint. The operators CZW, CWZ, and CWY can be estimated by their sample counterparts. The estimator of Π* is defined by

Π̂* = (αI + ĈZW ĈWZ)⁻¹ ĈZW ĈWY.  (23)

Similarly, the estimator of Π is given by

Π̂ = ĈYW ĈWZ (αI + ĈZW ĈWZ)⁻¹.
= C^ZW C^W Y :
Note that 1 X hyj ; i hwi ; wj i zi ; n2 i;j 1 XD ^ E = zj ; hwi ; wj i zi : n2 i;j
C^ZW C^W Y
=
C^ZW C^W Z ^
Taking the inner product with zl yields n equations 1 XD ^ E zj ; hwi ; wj i hzl ; zi i n2 i;j 1 XD ^ E = zj ; hwi ; wj i hzl ; zi i , l = 1; 2; :::; n n2 i;j D
D with n unknowns zj ; ^ from ^
E
zl ; ^
E
+
; j = 1; 2; :::; n: Then, for each
=
1 h^ CZW C^W Y
C^ZW C^W Z ^
, ^ i
can be computed
:
The computation of ^ ' can be done using the same approach as in Section 2. 2 Assumption 9. CZW CW Z is a trace-class operator and C^ZW C^W Z CZW CW Z = HS Op (1=n) : Assumption 10. There is a Hilbert-Schmidt operator R from E to F and a constant > 0 such that = (CZW CW Z ) =2 R: We decompose ^ in the following manner: ^ =
I + C^ZW C^W Z
=
I + C^ZW C^W Z +
1
1
C^ZW C^W Y
(24)
C^ZW C^W U
(25)
1
I + C^ZW C^W Z
+ ( I + CZW CW Z )
1
C^ZW C^W Z
CZW CW Z 21
( I + CZW CW Z ) :
1
CZW CW Z
(26) (27)
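On a discretized grid, the estimator (23) reduces to matrix algebra. The following sketch (illustrative only; `iv_pi_star` is a hypothetical name, numpy is assumed, and inner products are approximated by Riemann sums) forms the sample cross-covariance operators and solves the regularized normal equation:

```python
import numpy as np

def iv_pi_star(Z, W, Y, alpha, dt):
    """Sketch of the IV estimator (23): curves are rows of Z, W, Y on a
    common m-point grid; <f, g> is approximated by f @ g * dt.
    Returns an m x m matrix representing Pi_hat^* on the grid."""
    n, m = Z.shape
    C_zw = (Z.T @ W) * dt / n           # matrix of C_hat_{ZW}
    C_wz = (W.T @ Z) * dt / n           # matrix of C_hat_{WZ} = C_hat_{ZW}'
    C_wy = (W.T @ Y) * dt / n           # matrix of C_hat_{WY}
    # (alpha I + C_hat_{ZW} C_hat_{WZ}) Pi^* = C_hat_{ZW} C_hat_{WY}
    return np.linalg.solve(alpha * np.eye(m) + C_zw @ C_wz, C_zw @ C_wy)
```

With W = Z this collapses to a Tikhonov-regularized version of the exogenous estimator, which gives a quick sanity check of the implementation.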
Proposition 6 Under Assumptions 1, 2, 8, 9, and 10, the MSE of Π̂* has the same rate of convergence as in Proposition 2.

8 Simulations

This section consists of a simulation study of the estimator presented earlier. Let E = F = L²[0,1] and S = T = [0,1]. Π is an integral operator from L²[0,1] to L²[0,1] with kernel π(s,t) = 1 − |s − t|².³ We consider an Ornstein-Uhlenbeck process with zero mean and mean-reversion rate equal to one to represent the error function. It is described by the differential equation dU(s) = −U(s) ds + σ_u dG_u(s), for s ∈ [0,1], where G_u is a Wiener process and σ_u denotes the standard deviation of its increments dG_u. Note that this error function is stationary. We study the model Y_i = ΠZ_i + U_i, i = 1, ..., n, in two different settings. First, we consider design functions uncorrelated with the error functions (cov(U, Z) = 0), then investigate the case where Z is endogenous (cov(U, Z) ≠ 0).
8.1 Exogenous predictor functions

We consider the design function

Z_i(t) = [Γ(a_i + b_i) / (Γ(a_i) Γ(b_i))] t^{a_i − 1} (1 − t)^{b_i − 1} + ε_i

for t ∈ [0,1], with a_i, b_i iid U[2,5] and ε_i iid N(0,1), for all i = 1, ..., n. These predictor functions are probability density functions of random beta distributions over the interval [0,1], with an additive Gaussian term. The numerical simulation is performed as follows:

1. Construct both a pseudo-continuous interval of [0,1], denoted T, consisting of 1000 equally-spaced discrete steps, and a discretized interval of [0,1], denoted T̃, consisting of only 100 equally-spaced discrete steps.

³ Simulations have also been performed using different kernels. In particular, we have considered multiple kernels, allowing the inclusion of multiple functional predictors in a single functional model. Results suggest that the performance of the estimator is analogous in "multivariate" functional linear regression.
2. Generate n predictor functions z_i(t) and error functions u_i(s), where t, s ∈ T, so as to obtain pseudo-continuous functions.

3. Generate the n response functions y_i(s) using the specified model, where s ∈ T.

4. Generate the sample of n discretized pairs of functions (z̃_i, ỹ_i) by extracting the corresponding values of the pairs (z_i, y_i) for all t, s ∈ T̃.

5. Estimate Π using the regularization method on the sample of n pairs of functions (z̃_i, ỹ_i) and a fixed smoothing parameter α = .01.

6. Repeat steps 2-5 100 times and calculate the MSE by averaging the quantities ‖π̂ − π‖²_HS = ∫∫_{T̃ × T̃} (π̂(s,t) − π(s,t))² dt ds over all repetitions.
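Steps 1-3 of the design can be sketched as follows (illustrative code under the stated design; numpy is assumed and the helper names are hypothetical):

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(0)
m, n = 1000, 50                       # grid points on T, sample size
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]

def ou_path(sigma_u):
    """Euler-Maruyama for dU(s) = -U(s) ds + sigma_u dG(s)."""
    u = np.empty(m)
    u[0] = rng.normal(0.0, sigma_u / np.sqrt(2.0))   # stationary draw
    for k in range(1, m):
        u[k] = u[k-1] - u[k-1] * dt + sigma_u * np.sqrt(dt) * rng.normal()
    return u

def predictor():
    """z_i(t): a random Beta(a, b) density on [0,1] plus a N(0,1) shift."""
    a, b = rng.uniform(2.0, 5.0, size=2)
    dens = gamma(a + b) / (gamma(a) * gamma(b)) * t**(a - 1) * (1 - t)**(b - 1)
    return dens + rng.normal()

def apply_pi(z):
    """(Pi z)(s) = integral of (1 - |s - t|^2) z(t) dt, trapezoidal rule."""
    w = np.full(m, dt); w[0] = w[-1] = dt / 2.0      # trapezoidal weights
    K = 1.0 - np.abs(t[:, None] - t[None, :])**2
    return K @ (w * z)

Z = np.vstack([predictor() for _ in range(n)])
U = np.vstack([ou_path(1.0) for _ in range(n)])
Y = np.vstack([apply_pi(z) for z in Z]) + U
```

Step 4 then subsamples every tenth grid point to obtain the curves on T̃, and step 5 applies the estimator of Section 2.4.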
All numerical integrations are performed using the trapezoidal rule (i.e. piecewise linear interpolation), although it is possible to use other quadrature rules (such as another Newton–Cotes rule or adaptive quadrature).⁴ In addition, the stochastic processes for the error terms are simulated using the Euler-Maruyama method for approximating numerical solutions to stochastic differential equations. Figure 1 shows 10 discretized predictor functions (z_i), Ornstein-Uhlenbeck error functions for σ_u = 1 (u_i), response functions (y_i), and an example of a response function for various values of σ_u. Table 1 reports the MSE for 4 different sample sizes (n = 50, 100, 500, 1000) and 5 values of the standard deviation parameter (σ_u = 0.1, 0.25, 0.5, 1, 2). Naturally, the use of a fixed smoothing parameter α = 0.01 that is independent of the sample size prevents the MSE from converging to zero. In fact, the MSE converges to ‖Π_α − Π‖²_HS, which is a measure of the squared bias introduced by the regularization method.⁵ The last two columns of Table 1 report the true global (R²) and extended local (R̃²) functional
⁴ In practice, the nature of the functions of interest should guide the researcher in selecting the appropriate integration method. As we study square-integrable functions in this setup, the trapezoidal rule reduces the discretization bias relative to the rectangular rule.
⁵ The magnitude of this bias depends on both the design functions and the value of α, since Π_α = Π(αI + V_Z)⁻¹V_Z. We perform Monte Carlo simulations to approximate the regularized operator using 100 random samples of 1000 z_i's.
Figure 1: Examples of simulated functions (top left: discretized y_i; top right: discretized u_i for σ_u = 1; bottom left: discretized z_i; bottom right: a single y_i for various σ_u).
coefficients of determination, defined as

R² = ∫_S var(E[Y(s)|Z]) ds / ∫_S var(Y(s)) ds = ∫_S var((ΠZ)(s)) ds / ∫_S var(Y(s)) ds,

R̃² = ∫_S [var(E[Y(s)|Z]) / var(Y(s))] ds = ∫_S [var((ΠZ)(s)) / var(Y(s))] ds,
which are directly related to those proposed in Yao, Müller and Wang (2005).⁶
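Given simulated curves, both coefficients can be computed directly. In the sketch below (names are ours), `Y_fit` stands for the systematic parts (ΠZ_i)(s), and the quadrature weights are supplied by the caller:

```python
import numpy as np

def functional_r2(Y, Y_fit, w):
    """Global and extended-local functional coefficients of determination.

    Y, Y_fit : (n, m) arrays of observed and systematic curves on a common grid;
    w : quadrature weights for that grid (e.g. trapezoidal)."""
    var_y = Y.var(axis=0, ddof=1)          # var(Y(s)) at each grid point
    var_fit = Y_fit.var(axis=0, ddof=1)    # var((Pi Z)(s)) at each grid point
    r2_global = (var_fit @ w) / (var_y @ w)        # ratio of integrated variances
    r2_local = ((var_fit / var_y) @ w) / w.sum()   # average of the pointwise ratios
    return r2_global, r2_local
```

When the true Π is unknown, the same formulas apply with Π̂Z_i in place of ΠZ_i and sample variances, as noted in footnote 6.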
Table 1: Simulation results: Mean-Square Errors over 100 replications

              Empirical MSE                                                    Squared bias     Coef. of d.
              n = 50          n = 100         n = 500         n = 1000         ‖Π_α − Π‖²_HS    R²      R̃²
σ_u = 0.1     .0154 (.0027)   .0135 (.0017)   .0126 (.0008)   .0124 (.0005)    .0095            .995    .995
σ_u = 0.25    .0291 (.0098)   .0205 (.0063)   .0138 (.0022)   .0130 (.0013)    .0095            .976    .976
σ_u = 0.5     .0773 (.0363)   .0438 (.0193)   .0194 (.0057)   .0156 (.0028)    .0095            .910    .911
σ_u = 1       .2909 (.1789)   .1354 (.0659)   .0371 (.0161)   .0257 (.0089)    .0095            .712    .724
σ_u = 2       .9128 (.5495)   .4755 (.2607)   .1245 (.0660)   .0668 (.0378)    .0095            .383    .423

Note: Standard deviations are reported in parentheses.
Simulation results are in line with the theoretical results. We observe that, for a fixed α, the MSE decreases as the sample size grows. Further, the coefficients of determination decrease as the error function's standard deviation parameter increases, since the estimation is made more difficult. As a result, the MSE grows with σ_u. For illustration purposes, we provide two sets of surface plots. Figure 2 shows 3D plots of the actual kernel (top left), the regularized kernel (top right), their superposition
⁶ These true coefficients are approximated by their mean values using 1000 random functions over 100 simulations. In practice (when the true Π is unknown), it is possible to use consistent estimators of these coefficients based on Π̂ and the sample counterparts of the variance operators.
Figure 2: True kernel vs. regularized kernel (top left: true; top right: regularized; bottom left: true vs. regularized; bottom right: bias).
(bottom left) and the bias computed as their difference (bottom right). The Tikhonov regularization appears to introduce most of the bias on the edges of the kernel. Figure 3 shows the mean estimated kernel for n = 500 and Ornstein-Uhlenbeck errors with σ_u = 1 (top left), against the true kernel (bottom left), against the regularized kernel (top right), and its mean errors with respect to the true kernel (bottom right). One may observe that the mean estimate is relatively close to the regularized kernel. However, it does not perform well on the edges when compared to the true kernel. Let us now turn to the case where Z is endogenous.
Figure 3: True kernel vs. mean estimate (100 runs with n = 500, σ_u = 1) (top left: mean estimate; top right: regularized vs. mean estimate; bottom left: true vs. mean estimate; bottom right: mean errors).
8.2
Endogenous predictor functions
We consider the design function Z_i(t) = bW_i(t) + δ_i(t), where δ_i(t) = aU_i(t) + cε_i(t) and the instrument W_i is defined as

W_i(t) = [Γ(α_i + β_i) / (Γ(α_i)Γ(β_i))] t^(α_i − 1) (1 − t)^(β_i − 1) + η_i

for t ∈ [0,1], with α_i, β_i iid U[2,5] and η_i iid N(0,1), for all i = 1,...,n. Moreover, U_i and ε_i are Ornstein-Uhlenbeck processes with standard deviation parameters σ_u = σ_ε = 1. It is easily shown that δ_i is also an Ornstein-Uhlenbeck process with unit mean-reversion rate, described by the differential equation dδ(t) = −δ(t)dt + √(a²σ_u² + c²σ_ε²) dG_δ(t). We further assume a = 1, b ∈ [0,1], and c such that ∫_S var(Y(s)) ds is unchanged as b
varies.⁷ Hence, the choice of b amounts to the choice of the instrument's strength. The numerical simulation design is slightly modified so as to incorporate the generation of the instruments W and the dependence between Z and U:

1. Construct both a pseudo-continuous interval of [0,1], denoted T, consisting of 1000 equally-spaced discrete steps, and a discretized interval of [0,1], denoted T̃, consisting of only 100 equally-spaced discrete steps.

2. Generate n instrument functions w_i(t) and error functions u_i(s) and ε_i(s), where t, s ∈ T, so as to obtain pseudo-continuous functions.

3. Generate n predictor functions z_i(t) using the design specified above, where t, s ∈ T, so as to obtain pseudo-continuous functions.

4. Generate the n response functions y_i(s) using the specified model, where s ∈ T.

5. Generate the sample of n discretized triplets of functions (w̃_i, z̃_i, ỹ_i) by extracting the corresponding values of the triplets (w_i, z_i, y_i) for all t, s ∈ T̃.
⁷ This assumption keeps the variance of Y stable when varying the instrument's strength. It implies c = [1 + (1 − b²) ∫_S var((ΠW)(s)) ds / ∫_S var((Πε)(s)) ds]^(1/2).
6. Estimate Π using the regularization method on the sample of n triplets of functions (w̃_i, z̃_i, ỹ_i) and a fixed smoothing parameter α = 0.01.

7. Repeat steps 2-6 100 times and calculate the MSE by averaging the quantities ‖Π̂ − Π‖²_HS = ∫_T̃ ∫_T̃ (π̂(s,t) − π(s,t))² dt ds over all repetitions.
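The bias documented below is the functional analogue of the familiar scalar endogeneity bias. The following sketch (ours, purely illustrative) reproduces the phenomenon with scalars: the OLS slope in Y = πZ + U is inconsistent because cov(Z, U) = a·var(U) ≠ 0, while the IV slope based on the instrument W remains consistent.

```python
import numpy as np

def scalar_endogeneity_demo(n=100_000, pi=1.0, a=1.0, b=0.75, c=1.55, seed=0):
    """Scalar analogue of the endogenous design: Z = b*W + a*U + c*eps, Y = pi*Z + U.

    Returns (ols, iv): the OLS slope is biased upward because cov(Z, U) != 0,
    while the IV slope cov(Y, W)/cov(Z, W) is consistent for pi."""
    rng = np.random.default_rng(seed)
    W, U, eps = rng.normal(size=(3, n))
    Z = b * W + a * U + c * eps
    Y = pi * Z + U
    ols = np.cov(Y, Z)[0, 1] / np.var(Z, ddof=1)
    iv = np.cov(Y, W)[0, 1] / np.cov(Z, W)[0, 1]
    return ols, iv
```

The values b = 0.75 and c = 1.55 mirror one row of Tables 2-3; smaller b (a weaker instrument) makes the IV slope noisier, the scalar counterpart of the larger MSEs reported for weak instruments.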
Table 2 reports the MSE for 4 different sample sizes (n = 50, 100, 500, 1000) and 4 values of b when estimating the model without accounting for the endogeneity of Z. Unsurprisingly, the estimation errors are large. The squared bias is smaller than that of the previous design and decreases with b. The last two columns report R² and R̃² for the full model. They are relatively stable since ∫_S var(Y(s)) ds is fixed.
Table 2: Non-IV estimator: Mean-Square Errors over 100 replications

                     Empirical MSE                                                        Squared bias     Coef. of deter.
Instr. strength      n = 50           n = 100          n = 500          n = 1000          ‖Π_α − Π‖²_HS    R²       R̃²
b = 0.25 (c = 2.3)   2.4834 (.4678)   1.4642 (.2011)   .4690 (.0435)    .3214 (.0317)     .0060            .5144    .5461
b = 0.5 (c = 1.96)   2.3346 (.4014)   1.4504 (.2416)   .5826 (.0679)    .4541 (.0460)     .0027            .5140    .5450
b = 0.75 (c = 1.55)  2.1858 (.4825)   1.5363 (.2974)   .8535 (.1027)    .7529 (.0640)     .0011            .5294    .5591
b = 1 (c = 1)        2.4219 (.5305)   2.0547 (.3525)   1.6583 (.1581)   1.6310 (.1121)    .0006            .5633    .5919

Note: Standard deviations are reported in parentheses.
We now turn to the simulation results for the IV estimator. Table 3 reports the MSEs along with R² and the squared regularization biases. Squared biases are fairly small in this setup. This is related to the covariance operator of the predictor functions. R²_FS denotes the first-stage regression's coefficient of determination. It shows how b relates to the instrument's strength. Naturally, weaker instruments are associated with larger MSEs, although the spread seems to vanish rather quickly in this setup.
Table 3: IV estimator: Mean-Square Errors over 100 replications

                     Empirical MSE                                                    Squared bias     Coef. of d.
Instr. str.          n = 50          n = 100         n = 500         n = 1000         ‖Π_α − Π‖²_HS    R²       R²_FS
b = 0.25 (c = 2.3)   .2383 (.2019)   .1710 (.1422)   .0752 (.0779)   .0542 (.0209)    .0060            .0175    .0246
b = 0.5 (c = 1.96)   .1040 (.0859)   .0619 (.0349)   .0315 (.0099)   .0276 (.0053)    .0027            .0737    .1092
b = 0.75 (c = 1.55)  .0682 (.0364)   .0444 (.0203)   .0242 (.0044)   .0216 (.0028)    .0011            .1767    .2683
b = 1 (c = 1)        .0466 (.0244)   .0330 (.0138)   .0211 (.0029)   .0199 (.0021)    .0006            .3287    .5048

Note: Standard deviations are reported in parentheses.
For comparison with the exogenous case, we provide a final set of surface plots. Figure 4 shows 3D plots of the mean IV estimated kernel (top left), the mean non-IV estimate (top right), the superposition of the mean IV and true kernels (bottom left), and the mean estimation errors computed as the difference between the true kernel and the mean IV estimate (bottom right). Note that the mean IV estimate is relatively close to the actual kernel, whereas the estimate neglecting endogeneity exhibits a large bias.
9

Application

9.1

Introduction
This section presents an empirical application of the functional linear regression model with functional response to the study of the dynamics between daily electricity consumption and temperature patterns. There is a tradition of applications to electricity data in functional data analysis (Ferraty and Vieu, 2003; Antoch et al., 2008; Andersson and Lillestol, 2010; Liebl, 2013), although these are mostly focused on statistical prediction. We instead propose an application illustrating the usefulness of our estimator in structural econometrics. The growing deployment of smart-metering technologies in electricity systems creates a need for micro-econometric models able to capture the dynamic behaviors of end-users
Figure 4: True kernel vs. mean IV estimate (100 runs with n = 500, σ_u = 1 and b = 0.75) (top left: mean IV estimate; top right: mean non-IV estimate; bottom left: true vs. mean IV estimate; bottom right: mean IV errors).
with respect to market fundamentals. Such models make it possible to take advantage of the large amount of data so as to provide new insights to practitioners and policymakers about the behavior of consumers. In particular, information about end-users' responses to changes in weather or prices is valuable to local distribution companies, since it can help improve their demand-side bidding strategies in day-ahead markets, as proposed by Patrick and Wolak (1997). We develop a discrete-time model with partial adjustment dynamics based on the literature on dynamic linear rational expectations (Muth, 1961; Hansen & Sargent, 1980). The model generates a decision rule that coincides with the functional regression of interest. We abstract from uncertainty and estimate the corresponding Euler equation using actual future values, as in Kennan (1979). The remainder of the section provides details on data construction before performing a preliminary data analysis. Finally, estimation results are interpreted using contour plots, and goodness-of-fit statistics are reported.
9.2
The model
A representative forward-looking agent with preferences representable by a time-additive utility function is assumed to have perfect foresight over future weather realizations.⁸ At each period s, the agent chooses how much electrical heating y_s to consume so as to minimize her discomfort in the next period, as well as her consumption level for other purposes, denoted c_s, which generates instantaneous utility gains only. The level of discomfort is defined as the quadratic difference between the desired heat stock x*_s and the actual heat stock x_s. Considering the agent to live in a representative home, x*_s would be the ideal level of indoor temperature while x_s would be the actual indoor temperature. Following McLaughlin et al. (2011) and Yu et al. (2012), the evolution of x_s is assumed to depend on the previous period's outside air temperature z_{s−1}. We further assume that the heat stock is subject to time-dependent adjustment constraints. These may be related to the disutility of abrupt temperature variations, or to production capacity limits
⁸ It is also possible to estimate a model like Y(s) = ∫_0^s Π_0(s,t)Z(t)dt + ∫_s^1 Π_1(s,t)E[Z(t)|I_s]dt + U(s), which explicitly accounts for expectations, although it requires a significant departure from the original model presented in this paper.
on the heating system. The agent is considered to solve the dynamic recursive problem

V_s(x_{s−1}, x_s) = max_{y_s, c_s}  u_s(c_s) − (1/2)(x_s − x*_s)² − (γ_s/2)(x_s − x_{s−1})² − p_s(c_s + y_s) + V_{s+1}(x_s, x_{s+1})

s.t.  x_{s+1} − x_s = δ(z_s − x_s) + y_s,
where u_s(·) is a strictly increasing and concave utility function associated with the consumption of c_s. For the sake of simplicity, and since electricity consumption is relatively price-inelastic, we neglect budget considerations and assume a constant price p_s = p. Moreover, the entire trajectories of z_s and x*_s are known. The first-order condition with respect to c_s gives the usual optimality condition

u′_s(c_s) = p,   (28)
which states that the agent will consume electricity until its marginal utility equals the energy price. Making use of the first-order condition with respect to y_s and the envelope theorem, one obtains the Euler equation determining the optimal path of the heat stock:

x_{s+1} − [(1 + γ_s + γ_{s+1}) / γ_{s+1}] x_s + (γ_s / γ_{s+1}) x_{s−1} = (δp − x*_s) / γ_{s+1}.   (29)
This second-order difference equation has varying coefficients depending on the stringency of the adjustment constraint in both the current and future periods. Iterative substitution of the expressions for past and future values of the heat stock yields a general solution of the form

x(s) = λ_0(s) + Σ_{t=1}^∞ λ_1(s,t) x*(t),   (30)
where λ_0(s) and λ_1(s,t) are aggregates of the model parameters, which crucially depend on the trajectory of adjustment costs. Let us define the desired electrical heating consumption level as y*_s = δ⁻¹(x*_{s+1} − (1 − δ)x*_s) − z_s. Therefore, multiplying both sides of (30) by δ⁻¹(1 − (1 − δ)L)L⁻¹, where L is the lag operator, yields a general solution for y_s
given by

y(s) = λ′_0(s) + Σ_{t=1}^∞ λ′_1(s,t) (y*(t) + z(t)),   (31)

which depends on outdoor temperature through its effect on the motion of the heat stock. Finally, specifying x*_s = θ_{1,s} + θ_{2,s} z_s + ε_s, where θ_{1,s}, θ_{2,s} are known functions of time and ε_s is a random component unobservable to the econometrician, we obtain y*_s = δ⁻¹[(θ_{1,s+1} − (1 − δ)θ_{1,s}) + (θ_{2,s+1} z_{s+1} − (1 − δ)θ_{2,s} z_s) + (ε_{s+1} − (1 − δ)ε_s)] − z_s. Substituting this expression into (31) allows us to write the electrical heating consumption decision in terms of the path of temperature and a serially correlated error term ν(s) as

y(s) = λ″_0(s) + Σ_{t=1}^∞ λ″_1(s,t) z(t) + ν(s),   (32)
for any s. Since y_s is hardly observable in practice, the estimation of the above equation appears difficult. Instead, one can use (28) and specify the inverse marginal utility function, such that u′_s⁻¹(p) = v(s) for all p. Now, if one only observes aggregate electricity consumption Y(s) = y_s + c_s, each of the parameters λ″_1(s,t) can be estimated using

Y(s) = λ‴_0(s) + Σ_{t=1}^∞ λ″_1(s,t) z(t) + ν(s),   (33)

as long as v(s) is independent of outdoor temperature in all periods. The decision rule for the analogous continuous-time model is therefore of the form

Y(s) = Π_0(s) + ∫_0^1 Π_1(s,t) z(t) dt + u(s).   (34)
This model can be estimated using the procedure described above if one has an i.i.d. sample of observations {Y_i, z_i}_{i=1,...,n}, and if u_i can be presumed to be a mean-zero i.i.d. random functional error process. We propose to estimate the model using daily patterns extracted from annual aggregate electricity consumption and temperature trajectories at the level of the province of Ontario. Unfortunately, this approach introduces correlation across observations by construction. A better-suited data set would consist of measurements from individual households' smart meters.
9.3
Preliminary data analysis
Prior to presenting the construction of the data set, let us present some facts about the electricity market in Ontario. There are two types of consumers. First, small consumers (residential end-users and small businesses) are billed for electricity usage by their local distribution company. The vast majority pays fixed time-of-use rates, which are updated from season to season. Second, large consumers (large businesses and the public sector) are subject to the wholesale market price.⁹ The wholesale market price is determined on an hourly basis in a uniform-price multi-unit auction subject to operational constraints. It is considered quite volatile. Therefore, some large consumers choose to go with retail contractors to avoid market risk exposure, although the bulk of electricity trade goes through the wholesale market. Furthermore, all large consumers must also pay the monthly Global Adjustment, which represents other charges related to market, transport, and regulatory operations. Consequently, it is difficult to evaluate the extent to which aggregate electricity consumption depends on the wholesale price and the time-of-use price. In our theoretical model, we assumed the price to be constant, since short-term demand for electricity is typically perceived as very price-inelastic.¹⁰ Note that it would be possible to relax this assumption by using the time-of-use prices and the wholesale price. We could then estimate the price elasticity of electricity demand using the instrumental variable version of the Tikhonov estimator. A natural instrument for the wholesale electricity price in Ontario is wind power production, which depresses prices significantly and is as good as randomly assigned. We do not pursue this approach here. The original data set consists of hourly observations of real-time aggregate electricity consumption and weighted average temperature in Ontario from January 1, 2010 to September 30, 2014.
Hourly power data for Ontario are publicly available on the system operator's website,¹¹ whereas hourly province-wide temperature data have been constructed from hourly measurements at 77 weather stations in Ontario,¹² publicly
⁹ Roughly speaking, businesses are considered large when their electricity bills exceed $2,000 per month.
¹⁰ For example, the recent implementation of time-of-use pricing has been shown to have had limited effects on electricity consumption in Ontario (Faruqui et al., 2013).
¹¹ http://www.ieso.ca
¹² The complete data set contains 139 weather stations, although once matched to neighboring cities, only 77 are found relevant.
Table 4: Descriptive statistics

        Hourly temperature (C)                 Hourly consumption (GW)
Year    Mean    SD      Min       Max          Mean     SD      Min      Max      Obs
2010    8.27    9.17    -17.12    29.13        16.23    2.59    10.62    25.07    8760
2011    8.02    9.37    -17.96    31.15        16.15    2.45    10.76    25.43    8760
2012    9.32    8.67    -14.32    30.57        16.09    2.40    10.99    24.60    8784
2013    7.55    9.36    -17.40    28.88        16.07    2.39    10.77    24.49    8760
2014    7.58    11.02   -19.84    26.52        16.08    2.36    10.71    22.77    6552
All     8.18    9.49    -19.84    31.15        16.12    2.45    10.62    25.43    41616
available on Environment Canada's website.¹³ Let Z(t) denote our measure of temperature in hour t for the entire province. It is constructed in three steps. First, we match a set of 41 Ontarian cities (of above 10,000 inhabitants)¹⁴ to their three nearest weather stations, then compute a weighted average using a distance metric. Finally, we obtain Z(t) as a weighted average of the cities' temperatures, where weights are defined as each city's relative population. The constructed province-wide hourly temperature variable is formally defined by

Z(t) = Σ_c ω_c { Σ_{w(c)} ω_{w(c)} Z_{w(c)}(t) },

where ω_c = Pop_c / Σ_j Pop_j is city c's weight,

ω_{w(c)} = [(lat_c − lat_{w(c)})⁶ + (lon_c − lon_{w(c)})⁶]⁻¹ / Σ_{l(c)} [(lat_c − lat_{l(c)})⁶ + (lon_c − lon_{l(c)})⁶]⁻¹

is station w's weight in city c's temperature average, lat denotes latitude, lon longitude, and Z_w(t) is the temperature measurement at station w in hour t.¹⁵ Finally, we use robust locally weighted polynomial regression on the constructed temperature series in order to smooth implausible jumps, which are most likely due to measurement errors. Table 4 reports descriptive statistics for hourly electricity consumption and our constructed measure of temperature. Unsurprisingly, we observe some correlation between annual consumption peaks and maximum temperatures.
¹³ http://climat.meteo.gc.ca/
¹⁴ Those cities represent 85.3% of the province's population as of 2011.
¹⁵ Distance is calculated as the difference in geographic coordinates to the sixth power. The exponent is chosen so as to put arbitrarily more weight on nearby weather stations relative to those located further from the cities.
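The three-step construction of Z(t) can be sketched as follows (a simplified sketch with hypothetical array inputs; the actual construction also involves the city-station matching and the final smoothing step):

```python
import numpy as np

def province_temperature(pop, lat_c, lon_c, lat_w, lon_w, temps, k=3):
    """Province-wide temperature as a population-weighted average of city
    temperatures, each city averaging its k nearest stations with inverse
    sixth-power distance weights.

    pop : (C,) city populations; lat_c, lon_c : (C,) city coordinates;
    lat_w, lon_w : (S,) station coordinates; temps : (S, T) hourly station series.
    Assumes no city coincides exactly with a station (nonzero distances)."""
    city_w = pop / pop.sum()
    out = np.zeros(temps.shape[1])
    for c in range(len(pop)):
        d = (lat_c[c] - lat_w) ** 6 + (lon_c[c] - lon_w) ** 6
        near = np.argsort(d)[:k]                       # k nearest stations
        sw = (1.0 / d[near]) / (1.0 / d[near]).sum()   # normalized inverse-distance weights
        out += city_w[c] * (sw @ temps[near])
    return out
```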
The prices charged to individual consumers vary depending on the time of day and on the two main seasons (winter and the rest of the year), but follow a fixed pattern. The winter period spans from November 1 to April 30 and has two daily peak periods: one in the morning (7am-11am) and the other after worktime (5pm-7pm), a mid-peak period (11am-5pm), and an off-peak period (7pm-7am). The remaining part of the year is considered the summer period, and mid-peak and peak periods are reversed with respect to winter. Given this fact and the model developed above, dividing the sample into two corresponding periods appears natural. We define the periods accordingly, as shown in Figure 5 (blue for winter and red for summer), though we decide to discard the spring, from May 1 to May 31, and autumn, from September 15 to October 31, periods (black line) given the limited heating and air conditioning needs in these periods. Figure 6 displays the relationship between electricity consumption and temperature in winter, summer, and the discarded months.

Figure 5: Entire data series (panels: electricity consumption (GWh); temperature (C))
The plots show evidence of a relatively linear relationship between the load and temperature in the winter and summer months. On the other hand, the relation is much flatter in October and May. The figures also suggest that power usage is more sensitive to warm weather than to cold weather, probably because electricity represents a small
share of heating fuel for residential users in Ontario,¹⁶ or because cooling can be more energy-intensive than heating.

Figure 6: Electricity Consumption (GWh) vs. Temperature (C) (panels: Nov 1-Apr 30 (Winter); Jun 1-Sept 15 (Summer); May 1-May 31 & Sep 16-Oct 31)

The functional data samples are finally constructed with daily trajectories of 25 discrete observations for the dependent variable and a three-day window of 73 observations for the predictor variable. This window is chosen so that the dependent variable is always regressed on at least 24 lagged hours and 24 future hours. We discard weekends, Mondays, and Fridays, as well as holidays.¹⁷ For ease of interpretation, the temperature variable is transformed into heating and cooling degrees for each season using

Z_h(t) = max_{Winter}(Z(t)) − Z(t),
Z_c(t) = Z(t) − min_{Summer}(Z(t)),

where min_{Summer}(Z(t)) = 6°C and max_{Winter}(Z(t)) = 19°C. The samples are presented in

¹⁶ Only 14% as of 2011, according to Statistics Canada (2011).
¹⁷ In particular, Good Fridays and the holiday season for the winter period, and Canada Day and Labour Day for the summer period.
Figure 7 and Figure 8; the bold lines show the sample mean trajectories.
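The degree transformation above is immediate to apply to a sampled temperature trajectory; the reference values used as defaults below are the sample extremes reported in the text (19°C and 6°C):

```python
import numpy as np

def heating_cooling_degrees(z, winter_max=19.0, summer_min=6.0):
    """Heating degrees Z_h(t) = max_Winter Z - Z(t) and cooling degrees
    Z_c(t) = Z(t) - min_Summer Z, for a temperature trajectory z sampled on a grid."""
    z = np.asarray(z, dtype=float)
    return winter_max - z, z - summer_min
```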
Figure 7: Winter sample (333 observations)
Figure 8: Summer sample (213 observations)
9.4
The functional model and estimation results
Based on the model developed above and the previous data considerations, we consider E = L²[0,24] and F = L²[−24,48] with S = [0,24] and T = [−24,48]. The functional linear regression model of interest is given by

Y_i(s) = Π_0(s) + ∫_{[−24,48]} Π_1(s,t) Z_i(t) dt + U_i(s),   (35)
where Y_i(s) is total electricity consumption in hour s, Π_0(s) is a constant function, Z_i(t) is temperature in hour t ∈ [−24,48], and U_i(s) is a zero-mean error term. The object of interest is the kernel Π_1, which characterizes the dynamic relation between electricity consumption for heating/AC needs and temperature patterns. Since we are interested in daily electricity consumption patterns, the cross-sectional unit i denotes a daily observation. We use Tuesdays, Wednesdays, and Thursdays. Using successive days may induce autocorrelation in the error, which is not taken into account in our theory. However, we believe that our estimators are still consistent in this setting. Figures 9 and 10 display contour plots of the estimated kernels using the samples detailed previously. The regularization parameter was chosen using leave-one-out cross-validation. This data-driven selection criterion suggests α = 1.4 for the winter sample and α = 0.7 for the summer period.
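The leave-one-out criterion can be computed as sketched below (a self-contained sketch with our own names, assuming the Tikhonov estimator Π̂ = Ĉ_YZ(αI + V̂_Z)⁻¹ and trapezoidal quadrature): each curve is held out, the kernel is re-estimated on the rest, and the squared L² prediction error is averaged.

```python
import numpy as np

def loo_cv_alpha(Z, Y, alphas):
    """Leave-one-out cross-validation criterion for the Tikhonov parameter alpha.

    Z, Y : (n, m) arrays of predictor/response curves on m equally spaced points
    of [0, 1]. Returns the average squared L2 prediction error for each alpha."""
    n, m = Z.shape
    w = np.full(m, 1.0 / (m - 1))          # trapezoidal quadrature weights
    w[0] = w[-1] = 0.5 / (m - 1)
    scores = np.zeros(len(alphas))
    for j, alpha in enumerate(alphas):
        for i in range(n):
            keep = np.arange(n) != i
            K_Z = Z[keep].T @ Z[keep] / (n - 1)    # covariance kernel of V_Z
            K_YZ = Y[keep].T @ Z[keep] / (n - 1)   # cross-covariance kernel
            # predicted curve (C_YZ (alpha I + V_Z)^{-1} z_i)(s), by quadrature
            g = np.linalg.solve(alpha * np.eye(m) + K_Z * w, Z[i])
            y_pred = K_YZ @ (w * g)
            scores[j] += ((Y[i] - y_pred) ** 2) @ w
        scores[j] /= n
    return scores
```

The selected α is then the minimizer of the returned criterion over the candidate grid.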
In order to ease the interpretation of the results, we plot indicative dashed lines to separate the daily windows and add a diagonal so as to emphasize the contemporaneous relation between the functions of interest. One should remain cautious with regard to the interpretation of the estimated kernel. Estimated correlations may be read both horizontally and vertically. The effect of the entire temperature pattern on electricity consumption at a given hour of the day is read horizontally, whereas the effect of temperature at a specific time on the daily electricity consumption pattern is read vertically. The magnitudes of the correlations are indicated using colors from dark red to white, with corresponding values given in the legend. The seemingly S-shaped pattern of the kernel around the indicative diagonal is likely an artifact of the regularization scheme, as observed in our simulations; consequently, interpretation of the results should account for this feature. We observe from Figure 9 that the contemporaneous correlation between temperature and consumption in winter is the largest during the morning and evening peaks, whereas it is low between 10 am and 4 pm for the summer period. These high-demand periods also seem to exhibit
Figure 9: Contour plots of estimation results for winter months with α = 1.4
Figure 10: Contour plots of estimation results for summer months with α = 0.7
some serial correlation with previous and future peaks in winter but not in summer. Generally, the effect of temperature on demand is found to be of larger magnitude in summer than in winter. Furthermore, correlations between current consumption and past and future temperatures appear to spread more smoothly around the diagonal than in the winter period. Estimates could finally be used to recover the parameters of the structural model developed earlier and to perform counterfactual analyses.
Figure 11: Leave-one-out cross-validation criterion (winter and summer)
Figure 11 displays the leave-one-out cross-validation criterion as a function of α. Figure 12 and Figure 13 display comparisons of predictions against actual trajectories for the year 2011.
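The hour-by-hour coefficients of determination plotted in Figure 16 can be computed directly from the fitted curves (a sketch of ours; `Y_hat` denotes the fitted consumption curves):

```python
import numpy as np

def hourly_r2(Y, Y_hat):
    """Pointwise (hour-by-hour) coefficients of determination, one per grid point:
    R^2(t) = 1 - SSR(t) / SST(t), computed across observations i."""
    ssr = ((Y - Y_hat) ** 2).sum(axis=0)              # residual sum of squares at hour t
    sst = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)     # total sum of squares at hour t
    return 1.0 - ssr / sst
```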
Additionally, plots of the residuals and goodness-of-fit statistics are included in the appendix.¹⁸ Figures 14 and 15 display û_i(t)/Y_i(t), where i is on the x-axis. The 24 curves correspond to the 24 hours of the day, namely to different values of t. The estimated
¹⁸ We find significant serial correlation in the residuals but no concern with regard to the stationarity of the series. Various specifications of the Augmented Dickey-Fuller test were performed on the hourly residuals and all rejected the presence of a unit root.
Figure 12: Actual vs Prediction (Summer)
Figure 13: Actual vs Prediction (Winter)
kernel provides an overall good fit of the series, as suggested by the large coefficients of determination of about .8 for both periods. In conclusion, the estimation results suggest some correlation between electricity consumption and past, current, and future temperatures. The correlation with future values may be due to anticipatory behavior of agents who look at the temperature forecast before deciding, for instance, whether to leave the heater on that day. It may also be due to measurement error in the temperature variable. Indeed, we measure temperature as a weighted average over the province, so it is necessarily a noisy measure of the temperature experienced by each individual agent. In general, the estimated coefficients vary greatly across hours of the day and seasons. The model achieves a good fit to the consumption data with temperature as a single predictor function. It would also be possible to include several predictor functions, each defined over possibly varying time intervals. Furthermore, one could estimate a constrained version of the estimator by letting the time interval depend on s, for instance. A direct adaptation of our model to this setup can easily be done by transforming the predictor functions appropriately using binary variables. The study of the estimator's predictive power, as well as a more thorough treatment of the temporal dimensions, is left for future research.
A

Appendix

Figure 14: Residuals (Winter)
Figure 15: Residuals (Summer)
Figure 16: Coefficients of determination
B
Proofs
Proof of Proposition 1.

‖Π̂‖²_HS = ‖Ĉ_YZ (αI + V̂_Z)⁻¹‖²_HS ≤ ‖Ĉ_YZ‖²_HS ‖(αI + V̂_Z)⁻¹‖²_op,

using the fact that, if A is a Hilbert-Schmidt (HS) operator and B is a bounded operator, then ‖AB‖_HS ≤ ‖A‖_HS ‖B‖_op, where ‖B‖_op ≡ sup_{‖φ‖≤1} ‖Bφ‖ is the operator norm. Then, we have

‖Π̂‖²_HS ≤ (1/α²) ‖Ĉ_YZ‖²_HS.

It remains to show that Ĉ_YZ is a HS operator. Ĉ_YZ is an integral operator with degenerate kernel (1/n) Σ_{i=1}^n y_i(s) z_i(t). A sufficient condition for Ĉ_YZ to be HS is that its kernel be square integrable, which is true because the Y_i and Z_i are elements of Hilbert spaces. The result of Proposition 1 follows.

Proof of Proposition 2. To prove Proposition 2, we need two preliminary lemmas.

Lemma 7 Let A = B + C, where B is a zero-mean random operator and C is a nonrandom operator. Then

E‖A‖²_HS = E‖B‖²_HS + ‖C‖²_HS.
Proof of Lemma 7. For an orthonormal basis (φ_j),

E‖A‖²_HS = E( Σ_j ⟨Aφ_j, Aφ_j⟩ )
= E( Σ_j ⟨A*Aφ_j, φ_j⟩ )
= E( Σ_j ⟨(B + C)*(B + C)φ_j, φ_j⟩ )
= E( Σ_j ⟨B*Bφ_j, φ_j⟩ ) + E( Σ_j ⟨C*Bφ_j, φ_j⟩ ) + E( Σ_j ⟨B*Cφ_j, φ_j⟩ ) + E( Σ_j ⟨C*Cφ_j, φ_j⟩ ).

The second and third terms on the r.h.s. are equal to zero because E(B) = 0 and C is deterministic. We obtain E‖A‖²_HS = E‖B‖²_HS + ‖C‖²_HS.

Lemma 8 Let A be a random operator from E to F. Then E‖A‖²_HS = tr E(A*A).

Proof of Lemma 8. We have

E‖A‖²_HS = E( Σ_j ⟨A*Aφ_j, φ_j⟩ ) = Σ_j ⟨E(A*A)φ_j, φ_j⟩ = tr E(A*A).

We turn to the proof of Proposition 2. Applying Lemma 7 to the decomposition
(12), (13), and (14), we have
$$
\begin{aligned}
E\left(\left\|\hat{\Pi}_{\alpha}-\Pi\right\|_{HS}^{2}\,\Big|\,Z_{1},\dots,Z_{n}\right)
&=E\left(\left\|(12)\right\|_{HS}^{2}\,\Big|\,Z_{1},\dots,Z_{n}\right)+\left\|(13)+(14)\right\|_{HS}^{2}\\
&\le E\left(\left\|(12)\right\|_{HS}^{2}\,\Big|\,Z_{1},\dots,Z_{n}\right)+2\left\|(13)\right\|_{HS}^{2}+2\left\|(14)\right\|_{HS}^{2}.
\end{aligned}
$$
We study the first term of the r.h.s. By Lemma 8,
$$
\begin{aligned}
E\left(\left\|(12)\right\|_{HS}^{2}\,\Big|\,Z_{1},\dots,Z_{n}\right)
&=E\left(\left\|\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{C}_{ZU}\right\|_{HS}^{2}\,\Big|\,Z_{1},\dots,Z_{n}\right)\\
&=\operatorname{tr}E\left(\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{C}_{ZU}\hat{C}_{ZU}^{*}\left(\alpha I+\hat{V}_{Z}\right)^{-1}\,\Big|\,Z_{1},\dots,Z_{n}\right)\\
&=\operatorname{tr}\left[\left(\alpha I+\hat{V}_{Z}\right)^{-1}E\left(\hat{C}_{ZU}\hat{C}_{ZU}^{*}\,\Big|\,Z_{1},\dots,Z_{n}\right)\left(\alpha I+\hat{V}_{Z}\right)^{-1}\right].
\end{aligned}
$$
Note that
$$
\hat{C}_{ZU}\hat{C}_{ZU}^{*}\varphi=\frac{1}{n^{2}}\sum_{i,j}z_{i}\left\langle z_{j},\varphi\right\rangle\left\langle u_{i},u_{j}\right\rangle,
$$
so that
$$
E\left(\hat{C}_{ZU}\hat{C}_{ZU}^{*}\varphi\,\Big|\,Z_{1},\dots,Z_{n}\right)
=\frac{1}{n^{2}}\sum_{i}z_{i}\left\langle z_{i},\varphi\right\rangle E\left(\left\langle u_{i},u_{i}\right\rangle\,\Big|\,Z_{1},\dots,Z_{n}\right)
=\frac{1}{n}\operatorname{tr}\left(V_{U}\right)\hat{V}_{Z}\varphi
$$
because the $u_{i}$ are uncorrelated. To see that $E\left\langle u,u\right\rangle=\operatorname{tr}V_{U}$, decompose $u$ on the basis formed by the eigenfunctions $\psi_{j}$ of $V_{U}$, so that $u=\sum_{j}\left\langle u,\psi_{j}\right\rangle\psi_{j}$. It follows that $\left\langle u,u\right\rangle=\sum_{j}\left\langle u,\psi_{j}\right\rangle^{2}$ and $E\left\langle u,u\right\rangle=\sum_{j}\left\langle V_{U}\psi_{j},\psi_{j}\right\rangle=\operatorname{tr}\left(V_{U}\right)$. Hence,
$$
E\left(\left\|(12)\right\|_{HS}^{2}\,\Big|\,Z_{1},\dots,Z_{n}\right)
=\frac{\operatorname{tr}\left(V_{U}\right)}{n}\operatorname{tr}\left[\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{V}_{Z}\left(\alpha I+\hat{V}_{Z}\right)^{-1}\right]
\le\frac{C}{n\alpha},
$$
where $C$ is a generic constant. It follows that $E\left\|(12)\right\|_{HS}^{2}\le C/(n\alpha)$.

Now, we turn toward the term (13). We have
$$
\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{V}_{Z}-\left(\alpha I+V_{Z}\right)^{-1}V_{Z}
=\left[\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{V}_{Z}-I\right]+\left[I-\left(\alpha I+V_{Z}\right)^{-1}V_{Z}\right].
$$
Using $I-\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{V}_{Z}=\alpha\left(\alpha I+\hat{V}_{Z}\right)^{-1}$, we obtain
$$
\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{V}_{Z}-\left(\alpha I+V_{Z}\right)^{-1}V_{Z}
=\alpha\left[\left(\alpha I+V_{Z}\right)^{-1}-\left(\alpha I+\hat{V}_{Z}\right)^{-1}\right]
=\alpha\left(\alpha I+\hat{V}_{Z}\right)^{-1}\left(\hat{V}_{Z}-V_{Z}\right)\left(\alpha I+V_{Z}\right)^{-1},
$$
where the last equality follows from $A^{-1}-B^{-1}=A^{-1}\left(B-A\right)B^{-1}$. Hence,
$$
(13)=\alpha\left(\alpha I+\hat{V}_{Z}\right)^{-1}\left(\hat{V}_{Z}-V_{Z}\right)\left(\alpha I+V_{Z}\right)^{-1}\Pi .
$$
Now, we have
$$
\left\|(13)\right\|_{HS}^{2}
\le\alpha^{2}\left\|\left(\alpha I+\hat{V}_{Z}\right)^{-1}\right\|_{op}^{2}
\left\|V_{Z}-\hat{V}_{Z}\right\|_{op}^{2}
\left\|\left(\alpha I+V_{Z}\right)^{-1}V_{Z}^{\beta/2}R\right\|_{HS}^{2},
$$
where $\left\|\left(\alpha I+\hat{V}_{Z}\right)^{-1}\right\|_{op}^{2}\le1/\alpha^{2}$, $\left\|V_{Z}-\hat{V}_{Z}\right\|_{op}^{2}=O_{p}\left(1/n\right)$ by Assumption 2, and, writing $\Pi=V_{Z}^{\beta/2}R$ as in Assumption 4,
$$
\left\|\left(\alpha I+V_{Z}\right)^{-1}V_{Z}^{\beta/2}R\right\|_{HS}^{2}=O\left(\alpha^{(\beta\wedge2)-2}\right).
$$
If $\beta>1$, then the term corresponding to (13) is negligible with respect to (12). If $\beta<1$, then (12) is negligible with respect to (13).
Now, we turn our attention toward the term (14). We have
$$
(14)=\left(\alpha I+V_{Z}\right)^{-1}V_{Z}\Pi-\Pi
=-\alpha\left(\alpha I+V_{Z}\right)^{-1}\Pi
=-\alpha\left(\alpha I+V_{Z}\right)^{-1}V_{Z}^{\beta/2}R
$$
by Assumption 4. Let $\lambda_{j}$ and $\varphi_{j}$ be the eigenvalues and orthonormal eigenfunctions of $V_{Z}$. Then
$$
\begin{aligned}
\left\|\alpha\left(\alpha I+V_{Z}\right)^{-1}V_{Z}^{\beta/2}R\right\|_{HS}^{2}
&=\alpha^{2}\sum_{j}\left\langle\left(\alpha I+V_{Z}\right)^{-1}V_{Z}^{\beta/2}R\varphi_{j},\left(\alpha I+V_{Z}\right)^{-1}V_{Z}^{\beta/2}R\varphi_{j}\right\rangle\\
&\le\alpha^{2}\sup_{j}\frac{\lambda_{j}^{\beta}}{\left(\lambda_{j}+\alpha\right)^{2}}\sum_{j}\left\langle R\varphi_{j},R\varphi_{j}\right\rangle.
\end{aligned}
$$
The last inequality follows from the fact that the operator norm of $\left(\alpha I+V_{Z}\right)^{-1}V_{Z}^{\beta/2}$ is $\sup_{j}\lambda_{j}^{\beta/2}/\left(\lambda_{j}+\alpha\right)$ and that $\sum_{j}\left\langle R\varphi_{j},R\varphi_{j}\right\rangle=\|R\|_{HS}^{2}<\infty$. Moreover,
$$
\sup_{\lambda\ge0}\frac{\alpha^{2}\lambda^{\beta}}{\left(\lambda+\alpha\right)^{2}}=O\left(\alpha^{\beta\wedge2}\right)
$$
by Carrasco, Florens, and Renault (2007, Proposition 3.11). Consequently,
$$
\left\|\alpha\left(\alpha I+V_{Z}\right)^{-1}V_{Z}^{\beta/2}R\right\|_{HS}^{2}=O\left(\alpha^{\beta\wedge2}\right).
$$
This concludes the proof of Proposition 2.

Proof of Proposition 5. We have
$$
\hat{\Pi}_{\alpha}^{m}-\Pi=\left(\hat{\Pi}_{\alpha}^{m}-\hat{\Pi}_{\alpha}\right)+\left(\hat{\Pi}_{\alpha}-\Pi\right).
$$
We focus on the term $\hat{\Pi}_{\alpha}^{m}-\hat{\Pi}_{\alpha}$:
$$
\hat{\Pi}_{\alpha}^{m}-\hat{\Pi}_{\alpha}
=\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}\hat{C}_{ZY}^{m}-\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{C}_{ZY}
=\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}\left(\hat{C}_{ZY}^{m}-\hat{C}_{ZY}\right)
+\left[\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}-\left(\alpha I+\hat{V}_{Z}\right)^{-1}\right]\hat{C}_{ZY},
$$
with $\hat{C}_{ZY}^{m}=\frac{1}{n}\sum_{i=1}^{n}z_{i}^{m}\left\langle y_{i}^{m},\cdot\right\rangle$ and $\hat{C}_{ZY}=\frac{1}{n}\sum_{i=1}^{n}z_{i}\left\langle y_{i},\cdot\right\rangle$. For the first term, write
$$
\hat{C}_{ZY}^{m}-\hat{C}_{ZY}
=\frac{1}{n}\sum_{i=1}^{n}\left\{\left(z_{i}^{m}-z_{i}\right)\left\langle y_{i},\cdot\right\rangle+z_{i}^{m}\left\langle y_{i}^{m}-y_{i},\cdot\right\rangle\right\},
$$
so that
$$
\left\|\hat{C}_{ZY}^{m}-\hat{C}_{ZY}\right\|_{HS}^{2}
\le\frac{2}{n}\sum_{i=1}^{n}\left\{\left\|\left(z_{i}^{m}-z_{i}\right)\left\langle y_{i},\cdot\right\rangle\right\|_{HS}^{2}
+\left\|z_{i}^{m}\left\langle y_{i}^{m}-y_{i},\cdot\right\rangle\right\|_{HS}^{2}\right\}.
$$
Moreover, $\left\|\left(z_{i}^{m}-z_{i}\right)\left\langle y_{i},\cdot\right\rangle\right\|_{HS}^{2}=\left\|z_{i}^{m}-z_{i}\right\|^{2}\left\|y_{i}\right\|^{2}$ and $\left\|z_{i}^{m}\left\langle y_{i}^{m}-y_{i},\cdot\right\rangle\right\|_{HS}^{2}=\left\|z_{i}^{m}\right\|^{2}\left\|y_{i}^{m}-y_{i}\right\|^{2}$, so that
$$
\left\|\hat{C}_{ZY}^{m}-\hat{C}_{ZY}\right\|_{HS}^{2}=O_{p}\left(f(m)^{2}\right).
$$
Hence,
$$
\left\|\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}\left(\hat{C}_{ZY}^{m}-\hat{C}_{ZY}\right)\right\|_{HS}^{2}
=O_{p}\left(\frac{f(m)^{2}}{\alpha^{2}}\right).
$$
For the second term,
$$
\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}-\left(\alpha I+\hat{V}_{Z}\right)^{-1}
=\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}\left(\hat{V}_{Z}-\hat{V}_{Z}^{m}\right)\left(\alpha I+\hat{V}_{Z}\right)^{-1},
$$
so that, since $\hat{\Pi}_{\alpha}=\left(\alpha I+\hat{V}_{Z}\right)^{-1}\hat{C}_{ZY}$,
$$
\left[\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}-\left(\alpha I+\hat{V}_{Z}\right)^{-1}\right]\hat{C}_{ZY}
=\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}\left(\hat{V}_{Z}-\hat{V}_{Z}^{m}\right)\hat{\Pi}_{\alpha}
$$
and
$$
\left\|\left(\alpha I+\hat{V}_{Z}^{m}\right)^{-1}\left(\hat{V}_{Z}-\hat{V}_{Z}^{m}\right)\hat{\Pi}_{\alpha}\right\|_{HS}^{2}
=O_{p}\left(\frac{f(m)^{2}}{\alpha^{2}}\right).
$$
This concludes the proof of Proposition 5.

Proof of Proposition 6. Using the fact that
$$
\left\|a+b+c\right\|_{HS}^{2}\le3\left(\left\|a\right\|_{HS}^{2}+\left\|b\right\|_{HS}^{2}+\left\|c\right\|_{HS}^{2}\right),
$$
we can evaluate the terms (25), (26), and (27) separately. The proof follows closely that of Proposition 2. Let $Z$ and $W$ be the sets $\left(Z_{1},Z_{2},\dots,Z_{n}\right)$ and $\left(W_{1},W_{2},\dots,W_{n}\right)$. Then
$$
E\left(\left\|(25)\right\|_{HS}^{2}\,\Big|\,Z,W\right)
=\operatorname{tr}\left[\left(\alpha I+\hat{C}_{ZW}\hat{C}_{WZ}\right)^{-1}\hat{C}_{ZW}
E\left(\hat{C}_{WU}\hat{C}_{UW}\,\Big|\,Z,W\right)
\hat{C}_{WZ}\left(\alpha I+\hat{C}_{ZW}\hat{C}_{WZ}\right)^{-1}\right].
$$
Using
$$
E\left(\hat{C}_{WU}\hat{C}_{UW}\,\Big|\,Z,W\right)=\frac{1}{n}\operatorname{tr}\left(V_{U}\right)\hat{V}_{W},
$$
we obtain $E\left\|(25)\right\|_{HS}^{2}\le C/(n\alpha)$ for some constant $C$. The proof regarding the rates of convergence of (26) and (27) is similar to that of Proposition 2 and is not repeated here.
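For reference, the three bounds obtained in the proof of Proposition 2 (and mirrored in Proposition 6) can be collected in a single display; this summary and the balancing step below are ours, not a restatement of a proposition:

```latex
E\left\|\hat{\Pi}_{\alpha}-\Pi\right\|_{HS}^{2}
  \;\le\; \underbrace{\frac{C}{n\alpha}}_{\text{term }(12)}
  \;+\;   \underbrace{O_{p}\!\left(\frac{\alpha^{(\beta\wedge 2)-2}}{n}\right)}_{\text{term }(13)}
  \;+\;   \underbrace{O\!\left(\alpha^{\beta\wedge 2}\right)}_{\text{term }(14)} .
```

For $\beta\ge1$, term (13) is dominated by term (12), and equating $C/(n\alpha)$ with $\alpha^{\beta\wedge2}$ suggests the choice $\alpha\propto n^{-1/((\beta\wedge2)+1)}$, which balances the variance and the regularization bias.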
References

Andersson, J. and J. Lillestol, 2010, Modeling and forecasting electricity consumption by functional data analysis, The Journal of Energy Markets, 3:315.

Antoch, J., L. Prchal, M. De Rosa, and P. Sarda, 2010, Electricity consumption prediction with functional linear regression using spline estimators, Journal of Applied Statistics, 37, 2027-2041.

Aue, A., D. Norinho, and S. Hormann, 2014, On the prediction of stationary functional time series, Journal of the American Statistical Association, in press.

Bosq, D., 2000, Linear Processes in Function Spaces: Theory and Applications, Springer-Verlag, New York.

Breunig, C. and J. Johannes, 2015, Adaptive estimation of functionals in nonparametric instrumental regression, forthcoming in Econometric Theory (arXiv:1109.0961).

Cardot, H., F. Ferraty, and P. Sarda, 2003, Spline estimators for the functional linear model, Statistica Sinica, 13, 571-591.

Carrasco, M. and J. P. Florens, 2000, Generalization of GMM to a continuum of moment conditions, Econometric Theory, 16, 797-834.

Carrasco, M., J. P. Florens, and E. Renault, 2007, Linear inverse problems in structural econometrics: Estimation based on spectral decomposition and regularization, Handbook of Econometrics, Vol. 6B, edited by J. J. Heckman and E. E. Leamer.

Centorrino, S., 2014, Data-driven selection of the regularization parameter in additive nonparametric instrumental regressions, mimeo, Stony Brook University.

Centorrino, S. and J. P. Florens, 2014, Nonparametric instrumental variable estimation of binary response models, mimeo, Stony Brook University.

Centorrino, S., F. Fève, and J. P. Florens, 2013, Implementation, simulations and bootstrap in nonparametric instrumental variable estimation, mimeo, Stony Brook University.

Comte, F. and J. Johannes, 2012, Adaptive functional linear regression, Annals of Statistics, 40(6), 2765-2797.

Conley, T., L. Hansen, E. Luttmer, and J. Scheinkman, 1997, Short-term interest rates as subordinated diffusions, The Review of Financial Studies, 10(3).

Crambes, C., A. Kneip, and P. Sarda, 2009, Smoothing spline estimators for functional linear regression, The Annals of Statistics, 37, 35-72.

Crambes, C. and A. Mas, 2013, Asymptotics of prediction in functional linear regression with functional outputs, Bernoulli, 19, 2627-2651.

Cuevas, A., M. Febrero, and R. Fraiman, 2002, Linear functional regression: the case of fixed design and functional response, The Canadian Journal of Statistics, 30(2), 285-300.

Darolles, S., Y. Fan, J. P. Florens, and E. Renault, 2011, Nonparametric instrumental regression, Econometrica, 79, 1541-1565.

Dauxois, J., A. Pousse, and Y. Romain, 1982, Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference, Journal of Multivariate Analysis, 12, 136-154.

Engl, H., M. Hanke, and A. Neubauer, 2000, Regularization of Inverse Problems, Kluwer Academic Publishers, Dordrecht.

Eubank, R., 1988, Spline Smoothing and Nonparametric Regression, Marcel Dekker, New York.

Faruqui, A., S. Sergici, N. Lessem, F. Denton, B. Spencer, and C. King, 2013, Impact Evaluation of Ontario's Time-of-Use Rates: First Year Analysis, report prepared for Ontario Power Authority.

Ferraty, F. and P. Vieu, 2006, Nonparametric Functional Data Analysis: Methods, Theory, Applications and Implementations, Springer-Verlag, London.

Gabrys, R., L. Horvath, and P. Kokoszka, 2010, Tests for error correlation in the functional linear model, Journal of the American Statistical Association, 105(491), 1113-1125.

Goldenshluger, A. and O. Lepski, 2011, Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality, Annals of Statistics, 39, 1608-1632.

Hall, P. and J. Horowitz, 2007, Methodology and convergence rates for functional linear regression, The Annals of Statistics, 35, 70-91.

Hansen, L. P. and T. Sargent, 1980, Formulating and estimating dynamic linear rational expectations models, Journal of Economic Dynamics and Control, 2(1), 7-46.

He, G., H. G. Müller, and J. L. Wang, 2000, Extending correlation and regression from multivariate to functional data, Asymptotics in Statistics and Probability, 1-14.

Horowitz, J. and S. Lee, 2007, Nonparametric instrumental variables estimation of a quantile regression model, Econometrica, 75(4), 1191-1208.

Kargin, V. and A. Onatski, 2008, Curve forecasting by functional autoregression, Journal of Multivariate Analysis, 99, 2508-2526.

Kennan, J., 1979, The estimation of partial adjustment models with rational expectations, Econometrica, 47.

Kress, R., 1999, Linear Integral Equations, Springer.

Liebl, D., 2013, Modeling and forecasting electricity spot prices: A functional data perspective, Annals of Applied Statistics, 7(3), 1562-1592.

Manresa, E., 2015, Estimating the structure of social interactions using panel data, mimeo, MIT Sloan.

McLaughlin, L., Jia, L., Yu, Z. and Tong, L., 2011, Thermal Dynamic for Home Energy Management: A Case Study, Cornell University, Ithaca, NY, USA, Tech. Rep. ACSP-TR-10-11-01.

Muth, J., 1961, Rational expectations and the theory of price movements, Econometrica, 29, 315-335.

Patrick, R. and F. Wolak, 1997, Estimating customer-level demand for electricity under real time pricing, document available from http://www.stanford.edu/-wolak.

Ramsay, J. O. and B. W. Silverman, 2005, Functional Data Analysis, 2nd edition, Springer, New York.

Rudin, W., 1991, Functional Analysis, 2nd edition, McGraw Hill.

Statistics Canada, 2011, Households and the Environment: Energy Use, Catalogue no. 11-526-S.

Vilenkin, N. I. A., 1968, Special Functions and the Theory of Group Representations, American Mathematical Society.

Yao, F., H. G. Müller, and J. L. Wang, 2005, Functional linear regression analysis for longitudinal data, The Annals of Statistics, 33(6), 2873-2903.

Yu, Z., McLaughlin, L., Jia, L., Murphy-Hoye, M. C., Pratt, A. and Tong, L., 2012, Modeling and stochastic control for home energy management, Proc. IEEE PES General Meeting, July.