
European Geosciences Union General Assembly 2010

Vienna, Austria, 2-7 May 2010

Session HS5.5: Stochastics in hydrometeorological processes

Optimal infilling of missing values in hydrometeorological time series

Y. Dialynas, P. Kossieris, K. Kyriakidis, A. Lykou, Y. Markonis, C. Pappas, S.M. Papalexiou and D. Koutsoyiannis

Department of Water Resources and Environmental Engineering, National Technical University of Athens (www.itia.ntua.gr)

1. Abstract

Being a group of undergraduate students in the National Technical University of Athens, attending the course of Stochastic Methods in Water Resources, we study, in cooperation with our tutors, the infilling of missing values of hydrometeorological time series from measurements at neighbouring times. The literature provides a plethora of methods, most of which are reduced to a linear statistical interpolating relationship. Assuming that the underlying hydrometeorological process behaves like either a Markovian or a Hurst-Kolmogorov process, we estimate the missing values using two techniques, i.e., (a) a local average (with equal weights) based on the optimal number of measurements referring to a number of forward and backward time steps, and (b) a weighted average using all available data. In each of the cases we determine the unknown quantities (the required number of neighbouring values or the sequence of weights) so as to minimize the estimation mean square error. The results of this investigation are easily applicable for infilling time series in real-world applications.

4. Long-range vs. short-range dependence (2)

For the FGN process (for example, when the Hurst coefficient is H = 0.96), the autocorrelation for lag 40 (years) is as high as 0.23, whereas in the Markovian process the autocorrelation is practically zero even for much shorter lags.

7. Results for long-range dependence

Figure: MSE vs. Hurst coefficient curves for different numbers of time steps

• As the Hurst coefficient increases, the optimal number of neighbouring time steps for the interpolation (the one giving minimum MSE) decreases.
• A global average is preferable for low values of the Hurst coefficient, while a small number of neighbouring time steps is required for high values of the Hurst coefficient.
• In particular, for H ≥ 0.80 the optimal n is 1.

10. Validation by simulation (long-range dependence)

As clearly shown, the MSE estimated from the simulation (FGN model) is nearly identical to the theoretical MSE.

2. Motivation

• Missing values in hydrometeorological time series are a common problem.
• The literature provides a plethora of methods for infilling missing values in time and space (Thiessen, Normal Ratio, BLUE, Organic Correlation, etc.). Almost all methods are based on a weighted average of (non-missing) observations (e.g. Koutsoyiannis and Langousis, 2010).
• Here we consider the simplest method, where all weights are equal to 1, i.e. a plain unweighted average.
• Interpolation in space is typically made by taking the average of nearest points (e.g. within a certain distance), rather than the global average of all points with measurements.
• Paradoxically, when infilling in time, the global average of all available data is usually preferred.
• We seek to explore whether this practice is justified and under which conditions. In other words, we investigate when a global time average is preferable over a local average and when not.
• We investigate two types of time dependence of the underlying process: Markovian (short-range dependence) and Hurst-Kolmogorov (HK; long-range dependence).

5. Methodology (1)

• The interpolation problem refers to the estimation of an unknown quantity y from known values xi (i = 1, …, n) (measurements) of the same quantity and the same time period at different points (or at different time periods on the same point). Mathematically, the interpolation problem can be simply expressed as a linear equation:

8. Results for short-range dependence

11. Theory vs. simulation (short-range dependence)

3. Long-range vs. short-range dependence (1)

• Long-range dependence, also known as the Hurst phenomenon, is common in hydrological and other geophysical processes. In simple words, it expresses the tendency of wet years to cluster into wet periods, or of dry years to cluster into drought periods (e.g. Koutsoyiannis, 2002).
• On the other hand, short-range dependence appears in Markovian processes, where the future does not depend on the past when the present is known.
• The AR(1) (autoregressive of order 1) model is a very popular representative of Markovian processes, whereas the Fractional Gaussian Noise (FGN) process (e.g. Mandelbrot, 1965), or HK process, is widely used for reproducing long-range dependence.
• The major difference between the above models is that, while in AR(1) the autocorrelation declines exponentially, in HK it is a power function of lag, thus resulting in slow decay of autocorrelation.

Figure: Autocorrelation function for ρ1 = 0.8 for the AR(1) and FGN processes
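The contrast between exponential and power-type decay can be explored numerically. The sketch below is not taken from the poster: it assumes the standard closed-form autocorrelation of FGN, and the function names (`ar1_acf`, `fgn_acf`, `hurst_from_rho1`) are hypothetical helpers introduced only for illustration.

```python
import math


def ar1_acf(rho1, lag):
    """Lag-`lag` autocorrelation of an AR(1) process: exponential decay."""
    return rho1 ** abs(lag)


def fgn_acf(hurst, lag):
    """Lag-`lag` autocorrelation of FGN (assumed standard formula)."""
    k = abs(lag)
    if k == 0:
        return 1.0
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + (k - 1) ** h2)


def hurst_from_rho1(rho1):
    """Hurst coefficient whose FGN lag-one autocorrelation equals rho1."""
    return 0.5 * (1.0 + math.log2(1.0 + rho1))


if __name__ == "__main__":
    rho1 = 0.8  # same lag-one autocorrelation for both models
    hurst = hurst_from_rho1(rho1)
    for lag in (1, 2, 5, 10, 20, 40):
        print(lag, round(ar1_acf(rho1, lag), 4), round(fgn_acf(hurst, lag), 4))
```

Both models are matched at lag one, so any difference at longer lags comes only from the shape of the decay.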

Figure: Theoretical and simulated MSE vs. n curves for different values of the Hurst coefficient

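Figure: MSE vs. autocorrelation coefficient curves for different numbers of time steps

$$y = w_1 x_1 + \dots + w_n x_n + e$$

where x, y, e: random variables; w_i: weighting factor; e: estimation error

• In vector form:

$$y = \mathbf{w}^T \mathbf{x} + e \;\Leftrightarrow\; e = y - \mathbf{w}^T \mathbf{x}$$

where $\mathbf{w} := [w_1, \dots, w_n]^T$ and $\mathbf{x} := [x_1, \dots, x_n]^T$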
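A short worked step, not taken from the poster, may help connect the linear estimator to the mean square error used in the rest of the analysis. It assumes a stationary process with mean μ, variance σ² and a known autocorrelation function. The symbols r (the vector of correlations between y and each x_i) and R (the correlation matrix of x) are introduced here only for illustration.

```latex
\begin{aligned}
E[e] &= \mu \left(1 - \mathbf{1}^T \mathbf{w}\right), \\
\mathrm{Var}[e] &= \sigma^2 \left(1 - 2\,\mathbf{w}^T \mathbf{r} + \mathbf{w}^T \mathbf{R}\,\mathbf{w}\right), \\
E[e^2] &= \left(E[e]\right)^2 + \mathrm{Var}[e].
\end{aligned}
```

When the weights sum to one, as in any plain average, the first term vanishes and the mean square error equals the error variance.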

6. Methodology (2)

• In a Markovian process, the optimal number of neighbouring time steps (n) depends on a critical value of the lag-one autocorrelation (ρcr ≈ 0.24).
• Thus, for ρ ≥ ρcr only one step forward and one step backward are needed.

Again, the MSE estimated from the simulation (AR(1) model) is nearly identical to the theoretical MSE.
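A comparison of this kind can be sketched in a few lines. The code below is an illustrative assumption rather than the authors' simulation: it generates a unit-variance AR(1) series with the standard library, infills every interior value from its n neighbours on each side, and compares the empirical MSE with the closed-form expression of the Methodology (2) panel. All function names, the seed and the series length are hypothetical choices.

```python
import random


def simulate_ar1(rho, length, seed=0):
    """Unit-variance AR(1) series: x_t = rho * x_(t-1) + noise."""
    rng = random.Random(seed)
    scale = (1.0 - rho * rho) ** 0.5  # keeps the marginal variance at 1
    series = [rng.gauss(0.0, 1.0)]
    for _ in range(length - 1):
        series.append(rho * series[-1] + scale * rng.gauss(0.0, 1.0))
    return series


def empirical_mse(series, n):
    """Mean squared error when each value is infilled by the plain
    average of its n predecessors and n successors."""
    if len(series) <= 2 * n:
        raise ValueError("series too short for this n")
    total = 0.0
    for t in range(n, len(series) - n):
        neighbours = series[t - n:t] + series[t + 1:t + n + 1]
        error = series[t] - sum(neighbours) / (2 * n)
        total += error * error
    return total / (len(series) - 2 * n)


def theoretical_mse_ar1(rho, n):
    """Closed-form MSE of the equal-weight local average for AR(1), sigma = 1."""
    near = sum(rho ** i for i in range(1, n + 1))
    spread = sum((2 * n + 1 - i) * rho ** i for i in range(1, 2 * n + 1))
    return 0.5 / n ** 2 * ((2 * n + 1) * (n - 2.0 * near) + spread)


if __name__ == "__main__":
    x = simulate_ar1(rho=0.8, length=100_000, seed=1)
    for n in (1, 2, 3):
        print(n, round(empirical_mse(x, n), 4), round(theoretical_mse_ar1(0.8, n), 4))
```

The agreement improves with the length of the simulated series, since the empirical MSE is itself a sample estimate.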

9. Optimal number of time steps

12. Conclusions

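• Assumptions:
  o the setting is stationary
  o equal weighting factors wi = 1 (for simplicity and direct applicability), i.e. a plain average of the neighbouring values
• The optimal infilling produces minimum Mean Square Error (MSE):

$$\mathrm{MSE} := E[e^2] = \mu_e^2 + \sigma_e^2$$

where $\mu_e := E[e]$ and $\sigma_e := (\mathrm{Var}[e])^{1/2}$

• Under the above assumptions, the MSE can be calculated as:

$$E[e^2] = E\left[\left(x_t - \frac{\sum_{i=1}^{n} x_{t-i} + \sum_{i=1}^{n} x_{t+i}}{2n}\right)^{2}\right]$$

$$E[e^2] = \frac{1}{2}\left(\frac{\sigma}{n}\right)^{2}\left[(2n+1)\left(n - 2\sum_{i=1}^{n}\rho_i\right) + \sum_{i=1}^{2n}(2n+1-i)\,\rho_i\right]$$

where n: number of neighbouring time steps on each side of the missing value; σ: standard deviation of the process; ρ_i: autocorrelation for lag i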
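The closed-form expression can be evaluated directly for any autocorrelation function. The following is a minimal sketch under the stated assumptions (stationarity, equal weights); the function name `local_average_mse` and its arguments are hypothetical and not part of the poster.

```python
def local_average_mse(acf, n, sigma=1.0):
    """Theoretical MSE of infilling x_t by the plain average of the n
    values before and the n values after it.

    acf(i) must return the lag-i autocorrelation of the process.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Sum of correlations between x_t and its neighbours on one side.
    near = sum(acf(i) for i in range(1, n + 1))
    # Weighted sum over all lags spanned by the 2n + 1 time steps.
    spread = sum((2 * n + 1 - i) * acf(i) for i in range(1, 2 * n + 1))
    return 0.5 * (sigma / n) ** 2 * ((2 * n + 1) * (n - 2.0 * near) + spread)


if __name__ == "__main__":
    rho = 0.8
    ar1 = lambda lag: rho ** lag
    # For n = 1 the expression reduces to (3 - 4*rho_1 + rho_2) / 2.
    print(local_average_mse(ar1, 1), 0.5 * (3 - 4 * rho + rho ** 2))
```

Two limiting cases serve as a sanity check: for an uncorrelated process the result is σ²(2n+1)/(2n), which is always above σ², and for a perfectly correlated process it is zero.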

Figure: Theoretical and simulated MSE vs. n curves for different values of the autocorrelation coefficient

In long-range dependence (on the left), for high values of the Hurst coefficient the optimal number of time steps is limited to one, whereas low values require a global average.

Figure (left): MSE vs. n curves for different Hurst coefficients

In short-range dependence (on the right), the global average is required for the interpolation in processes with low autocorrelation, whereas the optimal number of time steps reduces rapidly to 1 for high values of autocorrelation.

Figure (right): MSE vs. n curves for different autocorrelation coefficients
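The search for the optimal number of time steps can be sketched as a scan over n. This is an illustration, not the authors' procedure. It assumes the standard FGN autocorrelation, and it uses a hypothetical cap `n_max` as a stand-in for the global average, so the exact values at which the optimum switches may depend on that cap.

```python
def ar1_acf(rho1):
    """Return the autocorrelation function of an AR(1) process."""
    return lambda lag: rho1 ** lag


def fgn_acf(hurst):
    """Return the autocorrelation function of FGN (assumed standard formula)."""
    h2 = 2.0 * hurst
    return lambda k: 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + (k - 1) ** h2)


def local_average_mse(acf, n):
    """MSE of the equal-weight average of n values on each side (sigma = 1)."""
    near = sum(acf(i) for i in range(1, n + 1))
    spread = sum((2 * n + 1 - i) * acf(i) for i in range(1, 2 * n + 1))
    return 0.5 / n ** 2 * ((2 * n + 1) * (n - 2.0 * near) + spread)


def optimal_n(acf, n_max=50):
    """Number of steps per side (1..n_max) that minimizes the MSE.

    A result equal to n_max means the widest available window wins,
    which is read here as 'global average preferable'.
    """
    return min(range(1, n_max + 1), key=lambda n: local_average_mse(acf, n))


if __name__ == "__main__":
    for hurst in (0.55, 0.70, 0.80, 0.90):
        print("FGN   H =", hurst, "optimal n =", optimal_n(fgn_acf(hurst)))
    for rho1 in (0.1, 0.3, 0.5, 0.8):
        print("AR(1) rho1 =", rho1, "optimal n =", optimal_n(ar1_acf(rho1)))
```

Capping the scan at `n_max` is a design shortcut: a real series has finite length, so the global average is the largest window the data allow.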

• In time series with Hurst-Kolmogorov behaviour, a local average using an optimal number of neighbouring measurements is preferable over a global average for the infilling of a missing value. Depending on the Hurst coefficient, only a few forward and backward time steps are needed for infilling the missing value, which makes the process simpler. In particular, for an HK process with:
  o H = 0.50-0.60: the global average is preferable
  o H = 0.70: 4 time steps before and 4 after the interpolation time
  o H = 0.72: 3 time steps before and 3 after the interpolation time
  o H = 0.74: 2 time steps before and 2 after the interpolation time
  o H ≥ 0.80: 1 time step before and 1 after the interpolation time
• In time series with Markovian behaviour, the optimal number of neighbouring time steps depends on a critical value of autocorrelation. Specifically:
  o ρ < 0.24: the global average is preferable
  o ρ ≥ 0.24: 1 time step before and 1 after the interpolation time
• The methodology is appropriate for quick infilling of very few missing values (not for long periods with missing values). The use of a local average has an additional advantage over the global average: it does not reduce the variance of the time series.

REFERENCES

Koutsoyiannis, D., and A. Langousis, Precipitation, ch. 27 in Treatise on Water Science, Elsevier, 2010 (in press).
Koutsoyiannis, D., The Hurst phenomenon and fractional Gaussian noise made easy, Hydrological Sciences Journal, 47(4), 573-595, 2002.
Koutsoyiannis, D., Lecture notes on Stochastic Methods in Water Resources, Edition 3, 100 pages, National Technical University of Athens, 2007.
Mandelbrot, B. B., Une classe de processus stochastiques homothétiques à soi: application à la loi climatologique de H. E. Hurst, C. R. Acad. Sci. Paris, 260, 1965.