Classification And Regression Trees (CART)


Monitor

placed in the CONFIG.SYS file. On 386- and 486-based systems, the EMM386.SYS driver must be installed first in the CONFIG.SYS file. This allows application programs to use LIM 4.0 expanded memory and gives drivers and application programs access to upper memory. The DR DOS kernel can also be placed in upper or high memory. The HILOAD and HIINSTALL commands place programs into upper memory; the HIDEVICE and HIDOS commands place device drivers and the DR DOS kernel into upper memory. In my case I used the HIDOS and HIDEVICE commands in the CONFIG.SYS file to gain additional space in conventional memory.

The MEM command tracks which memory type is being used for what purpose. It gives a summary of total and available memory, a graphical display of memory usage and, if desired, the memory locations of drivers and programs. This can be very useful for fine tuning the memory management commands.

DR DOS has a CACHE command which establishes a buffer in memory that holds data frequently read from the hard disk. The cache is checked before the application program reads data from the hard disk. If the required data are in the cache, they are used instead of accessing the slower disk drive. The CACHE command may be placed in the CONFIG.SYS file as a driver or called from the command line. The size of the buffer and its location in conventional, extended or expanded memory may be specified.
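The cache lookup described above can be sketched in a few lines of Python. This is only an illustration of the general idea of checking a memory buffer before going to the disk, not a description of how the DR DOS CACHE command is implemented. The class name `ReadCache`, the `read_block` callback and the least-recently-used eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class ReadCache:
    """Toy read cache: a fixed-size memory buffer in front of a slow block reader.

    Hypothetical sketch; eviction of the least recently used block is an
    assumption, not something the review states about DR DOS.
    """

    def __init__(self, read_block, capacity):
        self.read_block = read_block  # slow function: block number -> data
        self.capacity = capacity      # maximum number of blocks kept in memory
        self.blocks = OrderedDict()   # cached blocks, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        # Check the cache first; only go to the slow device on a miss.
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # drop the least recently used block
        return data


def fake_disk(block_no):
    """Stand-in for a slow hard disk read (hypothetical)."""
    return "data for block %d" % block_no


cache = ReadCache(fake_disk, capacity=2)
for n in (1, 1, 2, 1):
    cache.read(n)
print(cache.hits, cache.misses)
```

The `capacity` argument plays the role of the buffer size that the review says can be specified; a larger buffer trades memory for fewer disk accesses.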

DR DOS has other features that distinguish it from MS-DOS. Very large hard disk partitions (up to 512 MB) are supported without additional drivers. A question mark may be inserted in front of a line in the CONFIG.SYS file; when the computer loads this file, the user is asked whether that particular line should be ignored. This feature allows a variety of drivers to be loaded or not, depending on the application, and saves the trouble of keeping several different CONFIG.SYS files. An optional graphical user interface is available, and the serial file transfer utility is useful if you need to transfer files between computers. DR DOS has eighteen new or extended commands, ranging from CACHE to XDIR, which displays a file list including files in subdirectories. The HISTORY command creates a buffer for storing commands, which can then be recalled and edited.

Conclusion

DR DOS is apparently totally compatible with programs running under MS-DOS. The advanced memory management capabilities of DR DOS alone make it invaluable for my statistical programs, which require very large RAM allocations. These memory management features will be most useful to users of 386- or 486-based computers. The queried optional lines in the CONFIG.SYS file are very helpful when using programs with different driver, buffer and file size requirements. The editor is a great improvement over the normal DOS editor.

If the additional memory or other features would be useful in your particular personal computer applications, then this operating system is recommended. The alternative is to wait for version 5 of MS-DOS, which is rumored to have many of these same features.

D.R. SCOTT

Classification And Regression Trees (CART)

Several statistical methods, both parametric and nonparametric, are currently available for solving classification problems; examples are Fisher’s linear discriminant analysis, logistic regression, and the kth nearest neighbor method. These methods perform well when the structure of the data set satisfies their assumptions. But data sets with a large number of dimensions, a mixture of data types or a nonstandard data structure do not satisfy those assumptions. Breiman et al. proposed a recursive partitioning methodology for classification problems called CART. It imposes minimal assumptions and can be applied to data sets of various structures. The methodology is explained in the book Classification And Regression Trees (1984) by Breiman, Friedman, Olshen and Stone. The CART program can be adapted to

Chemometrics and Intelligent Laboratory Systems


Software:

Classification And Regression Trees (CART)

Publisher:

California Statistical Software, Inc., 961 Yorkshire Court, Lafayette, CA 94649, USA

Price:

US $2900.00; non-profit institutions: US $1900.00; colleges and universities: US $600.00

Technical specifications:

Computer:

Sun 3/260

Operating system:

SunOS 4.1 (UNIX); also available on CMS and VMS

CMS, UNIX and VMS. The program for this review was installed under the UNIX operating system.

CART is applied to three classification examples for this review:

I. The ratio of current assets to current liabilities and the ratio of current assets to net sales are used to judge the financial status of a company. Of the 46 observations, 25 are ‘nonbankrupt’ companies and 21 are ‘bankrupt’ companies. The observations are multinormally distributed.

II. The variables age of patient, pain after therapy (y or n), degree of dorsiflexion, objective ulnar deviation, degree of objective supination, distal radioulnar pain (y or n), and improvement of grip strength (y or n) are used to determine the condition of a patient’s affected hand after therapy. Of the 32 patients, 9 are labeled as being in ‘excellent’ condition, 9 in ‘good’ condition, and 14 in ‘fair’ condition.

III. Two varieties of rice are to be classified. Twenty observations are collected; 10 are ‘rice G’ and 10 are ‘rice L’. Each observation has 150 variables (peaks).

The CART program is used to obtain the CART results, and SAS is used to obtain the results of Fisher’s linear discriminant method. CART was run on a UNIX system and SAS on WYLBUR.

Example I. This data set satisfies the assumption of Fisher’s linear discriminant method (the observations are multinormally distributed). We use ‘PROC DISCRIM’ (discriminant analysis) in SAS, and univariate splits with the Gini construction rule in CART. Equal prior probabilities are chosen for both CART and SAS. The misclassification rate is 0.152 for discriminant analysis and 0.166 for CART.

Example II. The data set of this example consists of a mixture of numerical and categorical variables. We use the same options for SAS and CART as in example I. For both runs, we use a cross-validation method to obtain the misclassification rate. Fisher’s linear discriminant method gives an unsatisfactory result; the misclassification rate is 0.4606. CART gives a better result; the misclassification rate is 0.2255.

Example III. This data set can be analyzed only with CART, because Fisher’s linear discriminant method is not applicable (the number of observations is less than the number of variables). We used a 20-fold cross-validation estimation method. The misclassification rate is 0.2.

From the three examples above, we see that CART can be used not only with data sets having standard structures (e.g. example I) but also with data sets having unusual structures (e.g. examples II and III). It analyzes data sets of various structures and provides sensible results.
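The two ingredients used in these examples, a univariate split chosen by the Gini rule and a cross-validated misclassification rate, can be illustrated with a minimal Python sketch. It is not the CART program: it fits only a single split (a one-level tree), whereas CART applies the split search recursively and then prunes the tree. All function names are hypothetical, the variables are assumed to be numerical, and the six toy observations are invented for illustration and are not the data from the review.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the univariate split x[j] <= t with the lowest weighted Gini
    impurity of the two child nodes. Returns (impurity, j, t) or None."""
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Fit a one-split tree: (variable, threshold, left class, right class)."""
    split = best_split(rows, labels)
    if split is None:  # no variable takes two distinct values
        m = majority(labels)
        return (0, float("inf"), m, m)
    _, j, t = split
    left = [y for r, y in zip(rows, labels) if r[j] <= t]
    right = [y for r, y in zip(rows, labels) if r[j] > t]
    return (j, t, majority(left), majority(right))


def predict(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def loo_error(rows, labels):
    """Leave-one-out cross-validated misclassification rate."""
    errors = 0
    for i in range(len(rows)):
        stump = fit_stump(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        errors += predict(stump, rows[i]) != labels[i]
    return errors / len(rows)


# Invented toy data: two financial ratios per company (not the review's data).
toy_rows = [(0.5, 0.2), (0.8, 0.4), (1.0, 0.3), (1.9, 0.35), (2.3, 0.25), (2.8, 0.5)]
toy_labels = ["bankrupt"] * 3 + ["nonbankrupt"] * 3
print(best_split(toy_rows, toy_labels))
print(loo_error(toy_rows, toy_labels))
```

Because each split looks at one variable at a time, the search works unchanged when there are more variables than observations, which is the situation in example III.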

CHYON-HWA YEH
Statistics Department, Texas A&M University, College Station, TX 77843, USA
