TEACHING ECONOMETRICS USING DYNAMIC GRAPHICS

Mehmet Balcılar*

Published in “Proceeding of the Osh State University International Science Conference on University Education,” Osh, Kyrgyzstan, 170-180, (2002).

Abstract

Computers and software are useful not only for analyzing data but also for illustrating essential econometric and statistical topics. Dynamic graphic techniques help students understand concepts that many find hard to grasp. This paper illustrates the use of simulation and dynamic graphic techniques for teaching undergraduate econometric concepts. Several examples of teaching econometric concepts with dynamic graphical methods are presented, including confidence intervals, the central limit theorem, sampling distributions, and least squares regression. Particular emphasis is given to least squares regression. The paper also compares the Monte Carlo simulation method with dynamic graphical methods and shows the disadvantages of the former. We also show how graphical methods can help in advanced data analysis.

Key Words: Teaching econometrics; Statistical software; Dynamic graphics; Monte Carlo simulation.

1. Introduction

Instructors of undergraduate econometrics courses are often disappointed by how difficult it is to teach even the most basic statistical and econometric concepts. Econometrics courses are mostly offered to economics students, who usually have a weak background in quantitative methods. Unfortunately, the first experience of many students with econometrics is disappointing, and such experiences have created anxiety about econometrics among economics students. In some instances the problem is serious enough to be called econometrics anxiety. The author has also had discouraging experiences with econometrics students. It is not rare to find students who, after a fourteen-week basic econometrics course, still do not understand the intuition behind a concept as central as the central limit theorem. We feel that the responsibility for developing more effective methods of teaching basic econometrics rests with instructors rather than with students. With a little more effort, many of these students can develop a deeper understanding of, and intuition for, the topics that they find hard to grasp.

During the last two decades, instructors of undergraduate econometrics courses have widely recognized the value of active learning. This has led many of them to require students to use statistical or more specialized econometrics software to analyze data. During the course students are assigned tasks that enable them to master the mechanics of data analysis. Routine assignments involve learning how to use an econometric software package, entering and graphing data, obtaining basic descriptive statistics, running ordinary least squares regressions, detecting data and model deficiencies, and sometimes performing more specialized analyses. Specialized analyses usually involve estimation in the presence of autocorrelation and heteroskedasticity, the analysis of panel data, and so on. Students are usually required to perform these tasks on real-world data, because instructors think that this makes the experience more realistic.

Unfortunately, the amount of experience that a student can get by performing such computer assignments is limited. Although students learn the mechanics of data analysis, they do not gain intuition about the basic concepts of econometric theory. A sizeable proportion of students have difficulty grasping the intuition behind many of the basic concepts introduced in introductory econometrics courses. Many instructors, including the author of this paper, feel that these concepts are very important for effective modeling and data analysis. We should focus on these concepts and develop better ways of teaching them.

The ideas presented and the methods examined in this study have grown out of the author's experience over the past seven years with students at six universities on three continents. This experience is mostly related to undergraduate econometrics and statistics courses. However, we have also found that the methods examined in this paper are very useful for graduate courses.

The plan of the paper is as follows. The advantages of using computer-aided visualization and dynamic graphic methods to teach introductory econometrics courses are discussed in Section 2. In Section 3, we compare Monte Carlo simulation and dynamic graphic methods; the relationship between Monte Carlo and dynamic graphic methods is also examined in that section. Several examples of dynamic graphic modules useful for teaching introductory econometrics courses are given in Section 4. Section 5 concludes the paper.

* Assistant Professor of Econometrics, Çukurova University, Department of Econometrics, Adana, Turkey (on leave) and Kyrgyz-Turkish Manas University, Department of Economics, Bishkek, Kyrgyzstan.

2. Using Computer-Aided Visualization to Teach Econometrics

In the last fifteen years a young discipline called computer-aided visualization has arisen. Its goal is to create computerized visual tools for analyzing and communicating information with pictures or graphics. Computer-aided visualization has become a new partner to an old solution: presenting information to the human eye. Unlike the other channels through which humans obtain information, the eye can absorb immense amounts of information if it is provided with informative pictures. Although this is old hat in principle, the rapid development of computers since the 1980s has brought powerful facilities for displaying immense amounts of data with pictures or graphics. Furthermore, recent developments in software technology allow interaction between the display and the user. Dynamic graphic methods allow users to observe instantly, on the visual display, the result of modifying certain properties or aspects of the data. This opens a large territory to explore for the development of interactive teaching methods.

Over the years we have found that students are much more comfortable with graphical representations of econometric concepts such as confidence intervals, the central limit theorem, sampling distributions, least squares regression, etc. Introducing these concepts with graphics eases understanding to a great extent. We have also found that graphical methods lead to a deeper understanding of the topics. By using the power of current computing technology to display numerical data dynamically, it is possible to supplement standard data analysis assignments and algebraic derivations.
In this way, students become actively involved in learning important econometric concepts. Active involvement and visual display allow a deeper understanding of a concept, and most students easily develop good intuition with them. The learning experience can be enhanced further by giving students additional statistical experiences that combine carefully designed and implemented simulations with dynamic graphics.

3. Monte Carlo Simulation versus Dynamic Graphics

Many introductory econometrics textbooks contain Monte Carlo simulation examples and/or exercises, from which students are expected to gain more insight into the econometric concepts to which the simulations relate. Although dynamic graphic methods are related to the Monte Carlo simulation approach, they convey the information much more effectively. Furthermore, some methods available in dynamic graphics cannot easily be performed by Monte Carlo simulation. For instance, examining the effect of individual observations on the OLS regression estimate, illustrated in Figure 4 below, is not possible with Monte Carlo simulations. In many cases dynamic graphic methods are much more useful and effective.

However, it should not be overlooked that dynamic graphic methods are not independent of Monte Carlo simulations. They benefit from Monte Carlo simulations, and some dynamic graphic implementations are in fact based on information obtained from them. Dynamic graphical methods are nevertheless more intuitive. They make the active participation of students possible and, by providing interaction, become much more effective learning tools. The interactive environment they provide makes it possible to develop further insight into important econometric concepts.
In order to illustrate the difference between Monte Carlo simulation and dynamic graphic methods, consider a simple example in which we try to illustrate properties of the OLS estimates. The regression equation we consider is given by

$$Y_i = \alpha + \beta X_i + u_i \tag{1}$$

where $Y_i$ is the dependent variable, $X_i$ is the independent or control variable, $\alpha$ and $\beta$ are the parameters of interest, and $u_i$ is an independently and identically distributed normal random error term. In Monte Carlo experiments, students are required to replicate the OLS estimation of (1) $M$ times using given true values of $\alpha$, $\beta$, and $X$. To do that, for each replication they draw $n$ random values for $u_i$ from a given distribution, where $n$ is the sample size. The assumed distribution is usually normal with mean 0 and variance $\sigma^2$. Then $Y_i$ is calculated according to (1). These $Y_i$ values, along with the given values of $X_i$, are used to obtain one set of estimates of $\alpha$ and $\beta$. Repeating this $M$ times yields $M$ estimates of $\alpha$ and $\beta$. We denote these estimates by $\hat{\alpha}_j$ and $\hat{\beta}_j$, $j = 1, 2, \ldots, M$. One can then calculate the arithmetic means of $\hat{\alpha}_j$ and $\hat{\beta}_j$, $j = 1, 2, \ldots, M$, and these arithmetic means can be interpreted as approximations to the expected values of $\hat{\alpha}$ and $\hat{\beta}$, denoted by $E(\hat{\alpha})$ and $E(\hat{\beta})$. Instructors usually ask students to vary $M$ from a small value to a sufficiently large number. Examining the mean values of $\hat{\alpha}$ and $\hat{\beta}$ is expected to yield an intuitive understanding of the properties of these estimators.

Unfortunately, Monte Carlo simulation exercises have some disadvantages. Many students cannot easily perform Monte Carlo simulations themselves, because these simulations require some knowledge of statistical programming. Undergraduate students usually find learning command-driven econometrics software a discouraging experience. They find themselves lost in the jungle of command syntax and concentrate on the mechanical details of implementation rather than on the econometric concepts to which the task relates. Furthermore, Monte Carlo simulation is a black box to many students. They often misunderstand what Monte Carlo simulation means and how it is related to the concept they are attempting to understand. Therefore, rather than being helpful, Monte Carlo simulations may complicate matters further by bringing methodological and technical complications.

Dynamic graphical methods, on the other hand, hide the methodological difficulties of Monte Carlo simulation. Students do not need to know how the Monte Carlo simulation is performed. They click on buttons or drag objects while the required Monte Carlo simulation is performed behind the scenes. This removes the complication of implementing the Monte Carlo simulation and allows students to concentrate on the topic rather than on the details of the method. An example program that performs Monte Carlo simulation is given in Listing 1.
Although this program is written in Lisp-Stat (information about Lisp-Stat is given in the next section) and the syntax of the Lisp-Stat language looks somewhat complicated, the complication is more apparent than real, and the syntax of other statistical programming languages is no more appealing. Unfortunately, many undergraduate students do not have a good command of econometric or statistical software. Writing programs similar to the program given in Listing 1 discourages students.

In order to reveal the advantages and disadvantages of Monte Carlo and dynamic graphical methods, we performed a Monte Carlo experiment using Lisp-Stat. An example of the Lisp-Stat code that performs these Monte Carlo simulations is given in Listing 1 for $M=100$. Running this program after changing only the value of $M$ produces the results in Table 1. In the Monte Carlo simulations we set $X_i = \{1, 2, 2, 2, 2, 3, 3\}$, $\alpha = 5$, and $\beta = 2$. Results of the Monte Carlo simulations for $M$ = 5, 10, 20, 30, 50, 100, 500, 1000, 5000, 20000, and 50000 are given in Table 1. For each $M$ in column 1 we report the arithmetic mean of the estimates $\hat{\beta}$ in column 2, the standard deviation of the estimates $\hat{\beta}$ in column 3, the arithmetic mean of the estimates $\hat{\alpha}$ in column 4, and the standard deviation of the estimates $\hat{\alpha}$ in column 5.

We see that the mean estimates of $\hat{\alpha}$ and $\hat{\beta}$ converge to their true values as the simulation size $M$ gets larger. This helps students to understand both the unbiasedness and the consistency of the OLS estimators. The standard deviations also settle down after about $M=20$, which gives students a feel for the precision (efficiency) of the estimators. However, we see sampling fluctuations in the mean estimates. For instance, for $M=500$ the mean estimate of $\hat{\beta}$ is 1.9903, which is very close to the true value 2. However, when $M=1000$ the mean estimate of $\hat{\beta}$ becomes 1.9771 and thus moves farther away from the true value 2. Normally, students will expect the mean estimate of $\hat{\beta}$ to get closer to the true value 2 when $M$ is increased to 1000. In the simulation, however, the opposite of this expected result happens. This will confuse many students. We can overcome this inconvenience by providing an interactive environment in which students can run many simulations similar to this one and see that this case is just an exception and does not happen often. A dynamic graphical environment in which students can instantly vary the sample and simulation sizes, and simultaneously watch the change in the mean and distribution of the estimators, has proved to be much more appealing to students. Several examples of such implementations are given in Section 4, but before examining them we demonstrate how the information in Table 1 can be displayed more effectively.

In Figure 1, we display the histograms of the estimates $\hat{\beta}$ for simulation sizes $M=100$ and $M=50000$. The mean estimates are also displayed with red vertical lines. These graphics convey all the information given in Table 1, but they also contain information that cannot be obtained from Table 1. That the OLS estimates are unbiased, consistent, and efficient is easily revealed by comparing the left and right panels of Figure 1. Furthermore, the comparison of the left and right panels of Figure 1 also reveals the central limit theorem. How the distribution of the estimates converges to a normal distribution as the number of simulations (sample size) increases is very clearly demonstrated by these figures. Such information can only be demonstrated graphically.


Listing 1: An Example Lisp-Stat Program for Monte Carlo Simulations

    (def M 100)                        ; number of Monte Carlo replications
    (def alpha (repeat 0 M))           ; storage for the M intercept estimates
    (def beta (repeat 0 M))            ; storage for the M slope estimates
    (def x (list 1 2 2 2 2 3 3))       ; fixed values of the regressor
    (def var 1)                        ; variance of the error term
    (dotimes (i M)
      ; generate y = 5 + 2x + u, with u drawn from N(0, var)
      (def y (+ 5 (* 2 x) (* (normal-rand (length x)) (^ var 0.5))))
      (def reg (regression-model x y :print nil))
      (setf (select alpha i) (select (send reg :coef-estimates) 0))
      (setf (select beta i) (select (send reg :coef-estimates) 1)))
    (mean alpha)
    (mean beta)
    (histogram alpha)
    (histogram beta)
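For readers without a Lisp-Stat installation, the experiment in Listing 1 can be approximated with the following minimal sketch. It is written in Python rather than Lisp-Stat, uses only the standard library, and computes the OLS estimates with the closed-form formulas for a single regressor. The function names `ols` and `monte_carlo` and the seed are illustrative choices, not part of the original program, and the numbers it prints will not reproduce Table 1 exactly because the random draws differ.

```python
import random
import statistics

X = [1, 2, 2, 2, 2, 3, 3]            # regressor values used in the paper
ALPHA, BETA, SIGMA = 5.0, 2.0, 1.0   # true parameters; SIGMA is sqrt(var) from Listing 1


def ols(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def monte_carlo(m, rng):
    """Replicate the OLS fit of equation (1) m times; return the lists of estimates."""
    alphas, betas = [], []
    for _ in range(m):
        y = [ALPHA + BETA * xi + rng.gauss(0.0, SIGMA) for xi in X]
        a_hat, b_hat = ols(X, y)
        alphas.append(a_hat)
        betas.append(b_hat)
    return alphas, betas


rng = random.Random(2002)  # arbitrary fixed seed so that a rerun gives the same numbers
for m in (5, 100, 5000):
    alphas, betas = monte_carlo(m, rng)
    # columns: M, mean and s.d. of the slope estimates, mean and s.d. of the intercept estimates
    print(m,
          round(statistics.fmean(betas), 4), round(statistics.stdev(betas), 4),
          round(statistics.fmean(alphas), 4), round(statistics.stdev(alphas), 4))
```

Rerunning the loop with different seeds is a quick way to see the sampling fluctuation discussed in the text: the mean for a larger M is not guaranteed to be closer to the true value than the mean for a smaller M.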

Table 1: Monte Carlo Simulation of an OLS Regression

             Slope ($\hat{\beta}$)                         Intercept ($\hat{\alpha}$)
    M        Mean* [$E(\hat{\beta})$]   Standard Deviation    Mean [$E(\hat{\alpha})$]   Standard Deviation
    5        2.3050                     0.3115                4.1700                     0.4469
    10       1.9810                     0.5778                5.0270                     1.2531
    20       2.0500                     0.7251                4.8710                     1.6538
    30       1.9470                     0.7051                5.1020                     1.5037
    50       2.0709                     0.5543                4.8900                     1.1684
    100      2.1065                     0.6097                4.8190                     1.3437
    500      1.9903                     0.5984                5.0446                     1.3228
    1000     1.9771                     0.6003                5.0560                     1.3096
    5000     1.9960                     0.5969                5.0121                     1.3293
    20000    1.9981                     0.5923                5.0026                     1.3180
    50000    2.0096                     0.5939                4.9800                     1.3269

* True values of $\alpha$ and $\beta$ used in these Monte Carlo simulations are 5 and 2, respectively.
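The standard deviations in Table 1 can be compared with the textbook sampling standard deviations of the OLS estimators in a simple regression, $\sqrt{\sigma^2 / S_{xx}}$ for the slope and $\sqrt{\sigma^2 \sum x_i^2 / (n S_{xx})}$ for the intercept. The short Python sketch below evaluates them for the regressor values used in the paper, assuming an error variance of 1 as set in Listing 1. It is a check added here for illustration, not a computation reported by the author.

```python
import math

x = [1, 2, 2, 2, 2, 3, 3]   # regressor values used in the paper
sigma2 = 1.0                # error variance, as set by (def var 1) in Listing 1

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations of x

sd_beta = math.sqrt(sigma2 / sxx)                                      # s.d. of the slope estimator
sd_alpha = math.sqrt(sigma2 * sum(xi ** 2 for xi in x) / (n * sxx))    # s.d. of the intercept estimator

print(round(sd_beta, 4), round(sd_alpha, 4))  # prints 0.5916 1.3229
```

These values are close to the simulated standard deviations in the last rows of Table 1 (0.5939 and 1.3269 for M=50000), which suggests that the simulated standard deviations are settling at the sampling variability of the estimators for n=7 rather than shrinking toward zero as M grows.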

Figure 1: Graphical Display of Monte Carlo Simulations

[Two histogram panels: "Histogram of $\hat{b}$ for M=100" and "Histogram of $\hat{b}$ for M=50000". Horizontal axis: $\hat{b}$; vertical axis: Frequency.]

The idea of using dynamic graphics to illustrate econometric concepts is based on observing the information content of Table 1 and Figure 1. The idea is to provide an interactive environment that can display the graphs in Figure 1 dynamically. The environment also provides controls that allow instant alteration of the sample and simulation sizes. As the student changes any of these control variables, the change in the properties of the estimates is simultaneously displayed in the graph. This is what we call a dynamic graphical module. The next section examines some examples of these dynamic graphical modules.

4. Examples of Dynamic Graphical Teaching Modules

In this section, we examine a number of instructional modules designed to illustrate basic econometric concepts. Most of these modules were developed by Marasinghe, Meeker, Cook, and Shin (1996) and Marasinghe, Shin, and Duckworth (2000). They can be obtained from the WWW site http://www.public.iastate.edu/stat/. The author is working on several other modules that are specific to econometrics; these will be made available to the public when they are completed. The modules examined in the following pages provide important insights into the basic concepts that introductory econometrics courses usually cover as a prerequisite, and they lead to more meaningful experiences. The modules are built on software components that combine state-of-the-art computing hardware, statistical programming languages, high-resolution color graphics, simulation, and a highly interactive user interface. They are not tied to particular courses, and instructors can modify them to suit their needs.

We would like to comment on the statistical programming language used to develop these modules. Developing such modules requires a language that lends itself well to object-oriented programming, which is important for interactive dynamic graphics. Lisp-Stat is such a statistical programming language, based on the computer programming language Lisp. Lisp-Stat was developed at the University of Minnesota by Luke Tierney in the late 1980s. The primary reference on Lisp-Stat is Tierney (1990). Lisp-Stat is freely available for noncommercial use. Source code and binaries for several architectures, including several variants of the Unix and Windows operating systems, are available from the WWW site http://www.stat.umn.edu/~luke/xls/xlsinfo/xlsinfo.html.

The first module we examine illustrates the concept of confidence intervals. The discussion of this module and of the other modules closely follows Marasinghe et al. (1996, 2000). The module displayed in Figure 2 shows the uncertainty associated with confidence intervals. It illustrates this by dynamically resampling and displaying the computed confidence intervals. The module presents confidence intervals computed from different samples in a two-by-two grid. The top row shows confidence intervals for a sample size of 10 and the bottom row shows confidence intervals for a sample size of 40. The 95% confidence intervals (confidence level, C.L., of 0.95) are displayed in the left column, while the 99% confidence intervals are displayed in the right column. The true population mean in each case is represented by a long, fixed, vertical black line.

Figure 2: Confidence Interval Simulation and Display Module
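The resampling idea behind this module can be sketched without graphics. The Python code below, which is an illustration and not the module's actual Lisp-Stat implementation, draws repeated samples from a normal population with mean 100 and standard deviation 5 (the settings of the screen shot discussed in the text), computes Student-t intervals for the mean, and reports the share of intervals that cover the true mean. The critical values are hard-coded from standard t tables, and the function name `coverage` is hypothetical.

```python
import math
import random
import statistics

# Two-sided Student-t critical values taken from standard tables,
# keyed by (degrees of freedom, confidence level). Hard-coding them is
# an assumption of this sketch to avoid external libraries.
T_CRIT = {(9, 0.95): 2.262, (9, 0.99): 3.250, (39, 0.95): 2.023, (39, 0.99): 2.708}


def coverage(n, level, n_samples, rng, mu=100.0, sigma=5.0):
    """Share of n_samples t-intervals (sample size n) that contain the true mean mu."""
    t = T_CRIT[(n - 1, level)]
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        centre = statistics.fmean(sample)
        half_width = t * statistics.stdev(sample) / math.sqrt(n)
        hits += centre - half_width <= mu <= centre + half_width
    return hits / n_samples


rng = random.Random(7)  # arbitrary seed
for n in (10, 40):
    for level in (0.95, 0.99):
        # coverage with 20 intervals (as in the screen shot) and with 5000 intervals
        print(n, level, coverage(n, level, 20, rng), coverage(n, level, 5000, rng))
```

With only 20 intervals the observed coverage rate moves in steps of 0.05 and can differ noticeably from the nominal level, while with several thousand intervals it settles close to 0.95 or 0.99. This is the uncertainty that the module lets students see by resampling.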


The sample screen shot in Figure 2 shows a simulation with samples from a normal population with mean 100 and standard deviation 5, computing 100(1-α)% confidence intervals for the population mean for n=10 and n=40. In this simulation, the simulation size is 20; that is, 20 samples have been taken with sample sizes n=10 and n=40. We see that the coverage rate for n=10 and C.L.=0.95 is 0.950, for n=10 and C.L.=0.99 is 1.0, and for n=40 it is 0.950 for both C.L.=0.95 and C.L.=0.99. The confidence intervals for n=10 cover the true mean in both cases.

One of the most important concepts in econometrics, as well as in statistics, is the central limit theorem (CLT). With regard to practical applications, the CLT states that the average of a sample of observations drawn from some population is approximately normally distributed when certain conditions are satisfied. In econometrics, the CLT is used in several contexts. In introductory econometrics courses, the most important place where the CLT is called for is regression analysis: we resort to the CLT to justify the approximate normality of the OLS estimates. It is therefore very important that students grasp the idea of the CLT. There are several other places in theoretical econometrics where some version of the CLT is used, depending on how the conditions required for the CLT are specified.

The module shown in Figure 3 is intended to illustrate the CLT. The main window on the left side contains controls to choose a distribution, choose sample sizes, and run the sampling simulations. The window on the right displays the histogram of the simulated data according to the values specified by the controls in the left window. The window on the right also contains slide-bar controls for adjusting the number of bins in the histogram and for adjusting the smoothness of the kernel density estimate, which is shown by a blue line. In the display shown in Figure 3, a sample of size 32 is drawn from a t-distribution with 6 degrees of freedom. A static graph of the density (mass) function of the t-distribution with 6 degrees of freedom appears in the main window. When the user clicks on the new sample button, the simulated sample is generated and displayed as a dot plot under the density curve. The sample mean of the simulated data is shown with a blue x in the main window. This mean is added to the accumulated sample means, which are displayed as a dynamically updated histogram in the second window, shown in the right panel of Figure 3.

Figure 3: Exploration of the Central Limit Theorem
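The accumulation of sample means that this module animates can be imitated numerically. The following Python sketch is an illustration under stated assumptions, not the module's code: it draws samples of size 32 from a t-distribution with 6 degrees of freedom (constructing each t draw as a standard normal divided by the square root of a scaled chi-square), collects the sample means, and compares their spread with the value implied by the variance of the t-distribution, 6/(6-2), divided by the sample size. The names `t_draw` and `sample_means` are hypothetical.

```python
import math
import random
import statistics


def t_draw(df, rng):
    """One draw from Student's t with df degrees of freedom: Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(shape df/2, scale 2)
    return z / math.sqrt(chi2 / df)


def sample_means(n, n_samples, df, rng):
    """Means of n_samples independent samples of size n drawn from t(df)."""
    means = []
    for _ in range(n_samples):
        sample = [t_draw(df, rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


rng = random.Random(32)  # arbitrary seed
means = sample_means(n=32, n_samples=4000, df=6, rng=rng)

theory_sd = math.sqrt(6 / (6 - 2) / 32)  # sqrt(Var(t_6) / n)
# share of sample means within 1.96 theoretical standard deviations of zero
inside = sum(abs(m) <= 1.96 * theory_sd for m in means) / len(means)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("s.d. of sample means:", round(statistics.stdev(means), 3),
      "theoretical:", round(theory_sd, 3))
print("share within 1.96 s.d.:", round(inside, 3))
```

If the sample means are approximately normal, the last share should be near 0.95. Changing `n` to a small value such as 2 shows how the approximation deteriorates for a heavy-tailed parent distribution.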

In introductory econometrics courses, the most basic model used to analyze how variables are related is the regression model. The simplest textbook regression model that can be used to describe the relationship between two variables (y, x) is the simple linear least squares regression model. The linear least squares regression model is also known as ordinary least squares regression; sometimes it is also called the straight-line model. The module shown in Figure 4 helps students to develop a good understanding of what a least squares regression represents. The module allows any observation to be dragged with the mouse. As the observation is moved, the least squares fit is dynamically recalculated and the line representing this fit is updated. In the left panel of Figure 4, a simple least squares regression fit is displayed. The right panel shows the new fit after the observation at the far north-east has been moved down. This type of interactive experience helps students to understand least squares regression better. The module is also useful for demonstrating the sensitivity of least squares regression to outliers.


Figure 4: Examining the OLS Regression Line by Moving Points
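The effect of dragging a point can be reproduced numerically with a few lines of code. The Python sketch below is an illustration with made-up data (not the data shown in Figure 4): it fits a least squares line, moves the observation at the far north-east down, and refits. The helper names `ols` and `move_point` are hypothetical.

```python
def ols(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def move_point(y, index, new_value):
    """Return a copy of y in which one observation has been dragged to new_value."""
    moved = list(y)
    moved[index] = new_value
    return moved


# Made-up, roughly linear data; these values are not taken from the paper.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2]

before = ols(x, y)
after = ols(x, move_point(y, index=7, new_value=3.0))  # drag the north-east point down

print("before (intercept, slope):", before)
print("after  (intercept, slope):", after)
```

Because the moved observation lies far from the mean of x, a single change in its y value pulls the slope down and the intercept up, which is the sensitivity to outliers that the module demonstrates.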

The module in Figure 4 can be improved in order to allow more involvement by the students. One possible way of doing this is to fit a straight-line model by the eye-ball method. In this method, the regression is represented by the line that appears to fit best through the data on a scatter plot of $y_i$ against $x_i$. However, the conventional and better method is that of least squares, which finds the line by minimizing the sum of squared distances between the observed points and the fitted line. The module in Figure 5 allows students to compare both methods. The student can change the slope and intercept of a line (shown in red) using slide-bars and at the same time watch how this affects the way the line is drawn. The line is dynamically redrawn and the values of the coefficients are shown in the box at the top of the plot on the left side of Figure 5. The user attempts to fit the best regression line by eye and can compare this line with the regression line obtained by the least squares method (shown in purple). When the Residuals button is pushed, an additional window (shown in the right panel of Figure 5) containing a plot of the residuals against the explanatory variable x is displayed. The displayed residuals correspond to the current regression line in the left panel. The residuals plotted in this figure are also dynamically updated as the plotted line changes. This residual plot can be used to explore various patterns exhibited by the different sample datasets.

Figure 5: Comparing Eye-Ball and the Least Squares Regressions
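The comparison that this module supports can be made concrete by computing the sum of squared residuals for a hand-chosen line and for the least squares line. The following Python sketch uses made-up data and a hypothetical slider setting `guess`; it is meant only to illustrate the criterion, not to reproduce the module.

```python
def ols(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(intercept, slope, x, y):
    """Vertical distances between the observations and the line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def ssr(intercept, slope, x, y):
    """Sum of squared residuals for a given line."""
    return sum(e ** 2 for e in residuals(intercept, slope, x, y))


# Made-up data; these values are not taken from the paper.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 2.7, 4.4, 4.6, 6.3, 6.8, 8.1, 9.0]

guess = (1.5, 0.9)   # hypothetical slider settings: (intercept, slope)
fit = ols(x, y)

print("eye-ball line:", guess, "SSR =", round(ssr(*guess, x, y), 3))
print("OLS line:     ", fit, "SSR =", round(ssr(*fit, x, y), 3))
```

No choice of `guess` can give a smaller sum of squared residuals than the least squares line, so a student can use the difference between the two values as a score for the eye-ball fit.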

For diagnostic purposes, various statistics that correspond to each case may be calculated. Some of these statistics are useful for examining the adequacy of the model. Others can be used to test the validity of the assumptions behind the model. There are also statistics that are useful for detecting outliers. The module shown in Figure 6 illustrates some of the statistics that are popular in regression analysis. In the upper left panel, the original least squares regression fit (shown in green) is displayed together with the regression line after moving a point (shown in cyan). In the example in Figure 6, point 19 is moved upwards. Three additional plots are displayed for diagnostic purposes. The upper right panel displays the studentized residuals. This plot is useful for detecting departures of the residuals from the assumed spherical behavior. The lower left panel displays the leverages. The leverages depend only on the explanatory variables, and cases away from the center of the data have larger leverages than those near the middle. The lower right panel displays Cook's distance, which is useful for detecting influential observations. Points with a large Cook's distance are deemed to be influential outlying observations. These diagnostic plots are dynamically updated as the points in the upper left panel are moved.

Figure 6: Examining Assumptions of the OLS Regression
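The three diagnostics can be computed directly for a simple regression. The Python sketch below uses the standard formulas for a model with an intercept and one regressor: leverage $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$, internally studentized residual $r_i = e_i / (s\sqrt{1 - h_i})$, and Cook's distance $D_i = r_i^2 h_i / (p(1 - h_i))$ with $p = 2$. The data are made up, the function name `diagnostics` is hypothetical, and the module itself may use a different variant of the studentized residual.

```python
import math


def diagnostics(x, y):
    """Return (leverages, studentized residuals, Cook's distances) for y = a + b*x."""
    n, p = len(x), 2                      # p = number of estimated coefficients
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)          # residual variance estimate
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    stud = [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(resid, lev)]
    cook = [r ** 2 * h / (p * (1.0 - h)) for r, h in zip(stud, lev)]
    return lev, stud, cook


# Made-up data; these values are not taken from the paper.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.6, 2.1, 2.4, 3.1, 3.4, 4.1, 4.4, 5.1, 5.4, 6.1]

y_moved = list(y)
y_moved[8] = y[8] + 6.0   # move one point upwards, as in the Figure 6 example

lev, stud, cook = diagnostics(x, y)
lev_m, stud_m, cook_m = diagnostics(x, y_moved)

print("leverages unchanged by the move:", lev == lev_m)
print("Cook's distance of the moved point, before and after:",
      round(cook[8], 3), round(cook_m[8], 3))
```

Moving a point vertically leaves every leverage unchanged, since leverages depend only on x, but it changes the studentized residuals and Cook's distances of all points, which is why the module redraws those panels while the point is dragged.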

One of the difficult topics in introductory econometrics is multiple regression. Although students develop some intuition about the simple regression model with the help of graphics and other illustrations, the multiple regression model is much more difficult for them. In a simple linear regression model, the relationship between the dependent variable y and the independent variable x can easily be illustrated by a scatter plot of y versus x. The scatter plot gives a visual impression of the influence of the independent variable x on the dependent variable y. In a multiple regression model, the scatter plot does not have a comparable analogue. This is because the scatter plot of the dependent variable y against any one of the independent variables x does not account for the effect of the other independent variables. Therefore a scatter plot is not useful for showing the contribution of any particular independent variable. To overcome this difficulty one can use the so-called added-variable plot. The added-variable plot visually displays the influence of a particular independent variable after the effects of the other independent variables have been removed. A module that produces added-variable plots is shown in the left panel of Figure 7. The right panel of this figure displays a scatter plot matrix. Each cell in this scatter plot matrix is a simple two-dimensional scatter plot, which gives an impression of the degree of relationship between the variables in the corresponding column and row.

Suppose that we are interested in the relationship between y and one of the independent variables, denoted by $x_k$. First we adjust both y and $x_k$ for the effect of the other independent variables. This is achieved by running separate least squares regressions of y and of $x_k$ on the other independent variables. The added-variable plot is then obtained by plotting the residuals from these two regressions against each other. This plot displays the relationship between y and $x_k$ net of the other variables. The module in Figure 7 allows a model with two or three variables. In each case a 3-dimensional added-variable plot is displayed. When this module is started, the main window (left frame in Figure 7) shows a (y, x1, x2) spin-plot with buttons labeled Pitch, Roll, and Yaw. These buttons are used to rotate the 3-dimensional point cloud around any one of three fixed axes. The module is useful for discovering relationships in multiple regression models. There are also additional controls in the module, such as highlighting individual observations, which allow further exploration.

Figure 7: Illustrating Multiple Regression in Three Dimension
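The two-step construction of the added-variable plot can be checked numerically. The Python sketch below simulates a model with two correlated regressors (all coefficients and the seed are made-up illustration values), computes the residuals of y and of x1 after regressing each on x2, and confirms that the slope through the resulting added-variable plot equals the coefficient of x1 in the full multiple regression. The helper names are hypothetical, and the full regression coefficient is obtained from the closed-form solution for two regressors.

```python
import random


def centred(v):
    """Deviations of v from its mean."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def residuals_on(target, regressor):
    """Residuals from regressing target on an intercept and one regressor."""
    t, r = centred(target), centred(regressor)
    slope = dot(t, r) / dot(r, r)
    return [ti - slope * ri for ti, ri in zip(t, r)]


# Simulated data with correlated regressors (made-up illustration values).
rng = random.Random(11)
x2 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x1 = [0.7 * v + rng.gauss(0.0, 1.0) for v in x2]
y = [1.0 + 2.0 * a - 1.5 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]

# Added-variable plot coordinates: both variables adjusted for x2.
e_y = residuals_on(y, x2)
e_x1 = residuals_on(x1, x2)
av_slope = dot(e_y, e_x1) / dot(e_x1, e_x1)

# Coefficient of x1 in the full regression of y on x1 and x2 (closed form).
cy, c1, c2 = centred(y), centred(x1), centred(x2)
det = dot(c1, c1) * dot(c2, c2) - dot(c1, c2) ** 2
b1_full = (dot(c1, cy) * dot(c2, c2) - dot(c2, cy) * dot(c1, c2)) / det

# Slope of the ordinary scatter plot of y against x1, ignoring x2.
naive_slope = dot(cy, c1) / dot(c1, c1)

print("added-variable slope:", round(av_slope, 4))
print("multiple regression: ", round(b1_full, 4))
print("simple scatter slope:", round(naive_slope, 4))
```

The first two numbers agree up to rounding error, while the slope of the ordinary scatter plot differs because it absorbs part of the effect of x2. This is the reason the text gives for preferring the added-variable plot in multiple regression.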

5. Conclusion

Teaching undergraduate econometrics courses is one of the challenges that econometrics instructors face. Unless some basic econometric and statistical concepts are clearly understood by students, teaching more complicated topics usually turns out to be a frustrating experience for both students and instructors. In this paper, we presented dynamic graphical methods that are useful for teaching undergraduate courses. The modules are very effective visual methods made possible by recent developments in computer hardware and visual programming. The paper shows several examples of these modules, which are very useful for supplementing undergraduate econometrics course material. The modules illustrate concepts such as confidence intervals, the central limit theorem, sampling distributions, and least squares regression using dynamic graphical methods. In accord with the coverage of introductory econometrics courses, particular emphasis is given to least squares regression. The paper also compares the Monte Carlo simulation method with dynamic graphical methods and shows the disadvantages of the former. Some of the modules are also useful for teaching advanced regression topics.

References

Marasinghe, M. G., W. Q. Meeker, D. Cook, and T. Shin. (1996). “Using Graphics and Simulation to Teach Statistical Concepts,” Department of Statistics, Iowa State University.

Marasinghe, M. G., T. Shin, and W. M. Duckworth. (2000). “Tools for Teaching Regression Concepts Using Dynamic Graphics,” Department of Statistics, Iowa State University.

Tierney, L. (1990). Lisp-Stat: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics, New York: John Wiley & Sons.
