May 7, 2010 - of (non missing) observations (e.g. Koutsoyiannis and Langousis, 2010). ... or of dry years to cluster into drought periods (e.g. Koutsoyiannis,.
EuropeanGeosciencesUnionGeneralAssembly2010
Vienna,Austria,27May2010
SessionHS5.5:Stochasticsinhydrometeorologicalprocesses
Optimalinfillingofmissingvaluesinhydrometeorologicaltimeseries Y.Dialynas,P.Kossieris,K.Kyriakidis,A.Lykou,Y.Markonis, C.Pappas,S.M.PapalexiouandD.Koutsoyiannis DepartmentofWaterResourcesandEnvironmentalEngineering,NationalTechnicalUniversityofAthens(www.itia.ntua.gr) 1.Abstract BeingagroupofundergraduatestudentsintheNationalTechnical UniversityofAthens,attendingthecourseofStochasticMethods inWater Resources,westudy,incooperationwithourtutors,theinfillingof missingvaluesofhydrometeorologicaltimeseriesfrommeasurementsat neighbouringtimes.Theliteratureprovidesaplethoraofmethods,mostof whicharereducedtoalinearstatisticalinterpolatingrelationship. Assumingthattheunderlyinghydrometeorologicalprocessbehaves like eitheraMarkovianoraHurstKolmogorovprocessweestimatethe missingvaluesusingtwotechniques,i.e.,(a)alocalaverage(withequal weights)basedontheoptimalnumberofmeasurementsreferringtoa numberofforwardandbackwardtimesteps,and(b)aweightedaverage usingallavailabledata.Ineachofthecaseswedeterminethe unknown quantities(therequirednumberofneighbouringvaluesorthesequenceof weights)soastominimizetheestimationmeansquareerror.The resultsof thisinvestigationareeasilyapplicableforinfillingtimeseriesinrealworld applications.
4.Longrangevs.shortrangedependence(2)
7.Resultsforlongrangedependence
10.Validationbysimulation(LongRangeDependence)
MSE– Hurstcoefficientcurvesfordifferentnumberoftimesteps
FortheFGNprocess(forexamplewhentheHurstcoefficientisH=0.96), theautocorrelationforlag40(years)isashighas0.23,whereasinthe Markovianprocesstheautocorrelationispracticallyzeroevenformuch shorterlags.
• As Hurst coefficient increases, the optimal number of neighbouring timestepsfortheinterpolationdecreases(minimumMSE) • AglobalaverageispreferableforlowvaluesofHurstcoefficient,while asmallnumberofneighbouringtimestepsisrequiredforhighvalues ofHurstcoefficient. • Inparticular,forH 0.80theoptimaln is1.
Asclearlyshown,theestimationoftheMSEfromthesimulation(FGN model)isnearlyidenticaltothetheoreticalMSE.
2.Motivation
5.Methodology(1)
8.Resultsforshortrangedependence
11.Theoryvs.Simulation(ShortRangeDependence)
• Missingvaluesinhydrometeorologicaltimeseriesisacommon problem. • Theliteratureprovidesaplethoraofmethodsforinfillingmissing valuesintimeandspace(Thiessen,NormalRatio,BLUE,Organic Correlation,etc.).Almostallmethodsarebasedonaweightedaverage of(nonmissing)observations(e.g.KoutsoyiannisandLangousis, 2010). • Hereweconsiderthesimplestmethodwhereallweightsareequal to1. • Interpolationinspaceistypicallymadebytakingtheaverageofnearest points(e.g.withinacertaindistance),ratherthantheglobalaverageof allpointswithmeasurements. • Paradoxically,infillingintimetheglobalaverageofallavailabledatais usuallypreferred. • Weseektoexploreifthispracticeisjustifiedandunderwhich conditions.Inotherwords,weinvestigatewhenaglobaltimeaverageis preferableoveralocalaverageandwhennot. • Weinvestigatetwotypesoftimedependenceoftheunderlyingprocess: Markovian(shortrangedependence)andHurst– Kolmogorov(HK; longrangedependence).
• Theinterpolationproblemreferstotheestimationofanunknown quantityy fromknownvaluesxi (i =1,…,n)(measurements)ofthesame quantityandthesametimeperiodatdifferentpoints(oratdifferenttime periodsonthesamepoint).Mathematicallytheinterpolationproblem canbesimplyexpressedasalinearequation:
3.Longrangevs.shortrangedependence(1) • Longrangedependence,alsoknownastheHurstphenomenon,is commoninhydrologicalandothergeophysicalprocesses.Insimple wordsitexpressesthetendencyofwetyearstoclusterintowet periods, orofdryyearstoclusterintodroughtperiods(e.g.Koutsoyiannis, 2002). • Ontheotherhand,shortrangedependenceappearsinMarkovian processeswherethefuturedoesnotdependonthepastwhenthe presentisknown. • TheAR(1)(autoregressiveoforder1)modelisaverypopular representativeofMarkovianprocesses,whereastheFractionalGaussian Noise(FGN)process(e.g.Mandelbrot,1965),orHKprocess,iswidely usedforreproducinglongrangedependence. • ThemajordifferenceintheabovemodelsisthatwhileinAR(1)the autocorrelationdeclinesexponentially,inHKitisapowerfunctionof lagthusresultinginslowdecayofautocorrelation.
Autocorrelationfunctionfor1 =0.8fortheAR(1)andFGNprocesses
TheoreticalandsimulatedMSE ncurvesfordifferentvaluesofHurstcoefficient
y =w1x1+… +wnxn+e •
wherex,y,e:randomvariables wi:weightingfactor e:estimationerror
• Invectorform:
MSE– Autocorrelationcoefficientcurvesfordifferenttimesteps
y =wTx +e
ļ e =y wTx
wherew :=[w1,…,wn]T x :=[x1,…,xn]T
6.Methodology(2)
• InaMarkovianprocessthenumberoftheoptimalneighbouring time steps(n)dependsonacriticalvalueofautocorrelationforlagone (cr 0.24). • Thus,for cr onlyone stepforwardandbackwardisneeded.
Again,theestimationoftheMSEfromthesimulation(AR(1)model)is nearlyidenticaltothetheoreticalMSE.
9.Optimalnumberoftimesteps
12.Conclusions
• Assumptions: o thesettingisstationary o equalweightingfactorswi =1(forsimplicityanddirectapplicability) • TheoptimalinfillingproducesminimumMeanSquareError(MSE): MSE:=E[e2]=e2+e2 wheree := [e] e :=(Var[e])½ • Undertheaboveassumptions,theMSEcanbecalculatedas: 2 E ¬ªe ¼º
n n ª§ x t i ¦ xt i «¨ ¦ i 1 E «¨ xt i 1 «¨ 2n «¨ ¬©
2
· ¸ ¸ ¸ ¸ ¹
2
º » » » » ¼
n 2n ª º 1 V § · § · 2 ª º E ¬e ¼ ¨ ¸ « 2n 1 ¨ n 2¦ Ui ¸ ¦ 2n 1 i Ui » 2© n ¹ ¬ i 1 © ¹ i1 ¼ wheren: numberofneighbouring timesteps
TheoreticalandsimulatedMSE ncurvesfordifferentvaluesofautocorrelationcoefficient
Inlongrangedependence (ontheleft)forhighvalues oftheHurstcoefficient,the optimalnumberoftime stepsislimitedtoone, whereaslowvaluesrequire aglobalaverage. MSE– ncurvesfordifferentHurstcoefficients
Inshortrangedependence (ontheright)theglobal averageisrequiredforthe interpolationinprocesses withlowautocorrelation, whereasthenumberof optimaltimestepsreduces rapidlyto1forhighvalues ofautocorrelation
MSE– ncurvesfordifferentautocorrelationcoefficients
• IntimeserieswithHurst– Kolmogorovbehaviour,alocalaverageusinganoptimal numberofneighbouring measurementsispreferableoveraglobalaverageonthe infillingofamissingvalue.DependingontheHurstcoefficient,onlyafewforward andbackwardtimestepsareneededforinfillingthemissingvalue,whichmakesthe processsimpler.ParticularlyforanHKprocesswith: o H =0.500.60, theglobalaverageispreferable o H =0.70, 4timestepsbeforeand4aftertheinterpolationtime o H =0.72, 3timestepsbeforeand3aftertheinterpolationtime o H =0.74, 2timestepsbeforeand2aftertheinterpolationtime o H 0.80, 1timestepbeforeand1aftertheinterpolationtime • IntimeserieswithMarkovianbehaviour theoptimalnumberofneighbouring time stepsdependsonacriticalvalueofautocorrelation.Specifically: o 0.24,theglobalaverageispreferable o 0.24,1timestepbeforeand1aftertheinterpolationtime • Themethodologyisappropriateforquickinfillingofveryfewmissingvalues(not forlongperiodswithmissingvalues).Theuseoflocalaveragehasanadditional advantage,overtheglobalaverage,thatitdoesnotreducethevarianceofthetime series. REFERENCES Koutsoyiannis,D.,andA.Langousis,Precipitation,ch.27inTreatiseonWaterScience,Elsevier,2010(inpress). Koutsoyiannis,D.,TheHurstphenomenonandfractionalGaussiannoisemadeeasy,HydrologicalSciencesJournal,47(4),573– 795,2002 Koutsoyiannis,D.,LecturenotesonStochasticMethodsinWaterResourses,Edition3,100pages,NationalTechnicalUniversityofAthens, 2007 Mandelbrot,B.B.,Uneclassedeprocessusstochastiqueshomothetiquesasoi:applicationalaloiclimatologiquedeH.E.Hurst C.RAcad.Sci.Paris 260,1965