TECHNICAL REPORT 97-CSE-14

Class.3.0 : A Probabilistic Classifier Using Decomposable Models

Ted Pedersen

Department of Computer Science and Engineering
Southern Methodist University
Dallas, TX 75275-0122 USA
[email protected]

August 1997

Abstract

Class.3.0 is a probabilistic classifier designed for use with decomposable models. It classifies held out test instances based on parameter estimates made from training data. This document is a tutorial introduction to the classifier. Class.3.0 is written in Perl and is freely available from the author.

1 Introduction

This is a general overview of the probabilistic classifier Class.3.0. This classifier is designed for supervised learning experiments. It has been used in a series of published studies (e.g., [5], [6], [7]). All of this work used the class of decomposable probabilistic models as classifiers for word sense disambiguation, as proposed in [2]. The class of decomposable models was introduced in [3] and described in detail in [4].

2 Using Class.3.0

Class.3.0 is designed to be used in conjunction with the public-domain program CoCo [1] that performs model selection. Each model selected by CoCo is used by Class.3.0 as a probabilistic classifier. Class.3.0 also allows the user to bypass CoCo and input their own decomposable model to use as a classifier. Both modes of operation will be described in this paper.

STEP 1: Running prepare_dat

The input data for Class.3.0 may be numeric or character. It must be contained in an ASCII file such that each column represents the values of a single variable. Each row in this file represents the feature values associated with a single observation of the event to be classified. The classification variable may be located in any column. A file of training data and a file of test data must be created. Both files must include the correct classification of each event, although in the test data this value is only used for evaluation purposes.

The command prepare_dat converts the data to the format required by CoCo. Variable names are assigned by column: the first column is A, the second B, and so on up to Z. Note that this implies a limit of 26 variables; experiments with more variables have been performed, but this is risky given the unpredictable nature of the variable names. Prior to running prepare_dat the following must be determined:

1. the name of the training data set,
2. the name of the test data set,
3. the classification variable name (using the scheme that column 1 = A, column 2 = B, and so on), and
4. the unique id to be used in naming output files.

For instance, in the following example the training data is found in interest.train, the test data is in interest.test, the classification variable is I (column 9), and output files will be preceded by the unique identifier SAMPLE.

prepare_dat interest.train interest.test I SAMPLE

prepare_dat creates the following files:

- SAMPLE.FACTORS
- SAMPLE.LIST
- SAMPLE.MAPPING
- SAMPLE.TRN
- SAMPLE.TST
- SAMPLE.TST.OUT
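The column-to-name mapping described above can be sketched as follows (a minimal illustration in Python; prepare_dat itself is a Perl script, and the row shown here is made-up data):

```python
import string

# Map columns to CoCo variable names: column 1 -> A, column 2 -> B, ..., 26 -> Z.
# A sketch of the naming scheme only, not the prepare_dat script itself.
def column_names(n_columns):
    if n_columns > 26:
        raise ValueError("only 26 reliable variable names (A through Z)")
    return list(string.ascii_uppercase[:n_columns])

# A hypothetical row with nine columns; column 9 (variable I) holds the class.
row = "NN 1 0 0 1 VB 0 1 sense_2".split()
print(dict(zip(column_names(len(row)), row)))
```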

STEP 2: Running CoCo

SAMPLE.TRN is the input data file for CoCo. This must be specified in a file that consists of a series of CoCo commands that will instruct CoCo to perform a model search and display the sequence of models selected. An example of such a file, coco.bat, is shown below. This file must specify the names of the particular input and output files to be used by CoCo. The name of the file output by CoCo (containing the models selected) is coco.SAMPLE. In general the output id specified with prepare_dat is used as part of the file names created later by Class.3.0.

set outputfile diary coco.SAMPLE
#                    ^^^^^^^^^^^
# output file must be named coco.uniqid where uniqid is the
# name of the output files as specified in prepare_dat

set inputfile data SAMPLE.TRN
#                  ^^^^^^^^^^
# this file is generated by prepare_dat

read data
# tell CoCo to read in SAMPLE.TRN

set decomposable model on
# restrict CoCo to searching for decomposable models (REQUIRED)

set adjusted df on
# tell CoCo to adjust/not adjust degrees of freedom

set aic
# the evaluation criterion in the model search.
# other options exist; consult the CoCo manual for a full description.

read model * only sorted recursive follow backward
# specify the search strategy to CoCo. other options exist.

current
# make the last model CoCo selected the current model.

print all models
# show the models selected by CoCo.

The input to CoCo is the file SAMPLE.TRN and the output will be stored in coco.SAMPLE. Execute CoCo as follows:

CoCo < coco.bat

coco.SAMPLE will contain a listing of all the models found in the model selection process. coco.SAMPLE is also used by classify_all, so it is important to observe the naming conventions.

STEP 3: Running classify_all

classify_all will classify the test data contained in interest.test using each of the models found by CoCo. The classification variable (I) must be specified, as must the unique output file identifier previously specified (SAMPLE). A new name for the output of classify_all must also be indicated. This name should be an alpha string that is not the same as the output file id previously specified (SAMPLEALL). The values in parentheses are example values and would be used with classify_all as follows:

classify_all I SAMPLE SAMPLEALL

The files created by classify_all are:

- SAMPLEALL.ACCURACY
- SAMPLEALL.PRECISION
- SAMPLEALL.RECALL
- SAMPLEALL.MIXTURE
- SAMPLEALL.RUNALL

The files SAMPLEALL.{ACCURACY,PRECISION,RECALL} consist of two columns of data. The first column shows the number of edges in the model and the second shows the {accuracy,precision,recall} obtained using the model as a classifier. The actual model form can be found in SAMPLEALL.RUNALL or coco.SAMPLE. In the *.RUNALL output it can be observed that classify_all throws out any marginals that do not contain the classification variable. This does not affect classification accuracy, in that the values of those marginals have no impact on the classification variable. The file SAMPLEALL.MIXTURE shows the results of "mixing" the probabilities for all the models found and using that mixture as the basis of classification. The mixture created is known as the Naive Mix and is described in detail in [5] and [6]. The file SAMPLEALL.RUNALL shows the series of classify_dat_cond and grep commands that are used to create the {accuracy,precision,recall} results. This file also shows the actual form of the models selected.
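The Naive Mix can be illustrated with a short sketch (hypothetical Python, assuming each model yields a probability distribution over senses for a test instance and that the mixture is a simple average; see [5] and [6] for the actual definition):

```python
# Average the per-sense distributions produced by each model in the sequence,
# then classify with the sense that maximizes the mixed probability.
def naive_mix(distributions):
    senses = distributions[0].keys()
    return {s: sum(d[s] for d in distributions) / len(distributions)
            for s in senses}

# Made-up distributions from three models of increasing complexity.
models = [{"sense_1": 0.7, "sense_2": 0.3},
          {"sense_1": 0.4, "sense_2": 0.6},
          {"sense_1": 0.9, "sense_2": 0.1}]
mixed = naive_mix(models)
print(max(mixed, key=mixed.get))   # sense_1
```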

Processing a single model

If there is no need or desire to select a model with CoCo, then it is possible to input a decomposable model directly to Class.3.0 and bypass CoCo processing. This is done via the command classify_dat_cond. In order to use classify_dat_cond it is necessary to understand how the parameter estimates for a decomposable model are computed. When using classify_all this is masked from the user.

Suppose we wish to use the decomposable model [ABEHI][ABFH] as a classifier. Assume that the variables are A, B, C, D, E, F, G, H, I. Given this decomposable model, the parameter estimates are computed via the following product form, where the probabilities are based upon marginal frequency counts in the training data:

P(ABCDEFGHI) = P(ABEHI) * P(ABFH) / P(ABH)    (1)

This closed-form product is specified to classify_dat_cond using exponents. The marginal probabilities in the numerator are followed by a 1 and those in the denominator are followed by a -1. In addition we must specify the previously chosen output id (SAMPLE) and a new id for the classify_dat_cond output (SAMPLE1). The command is submitted as follows:

classify_dat_cond SAMPLE SAMPLE1 ABEHI 1 ABFH 1 ABH -1
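The way these exponents combine marginal probabilities can be sketched as follows (hypothetical Python with made-up marginal tables; the real script estimates the marginals from frequency counts in the training data):

```python
# Combine marginal probabilities under the product form of a decomposable
# model: multiply marginals with exponent 1, divide by those with exponent -1.
def joint_estimate(event, marginals):
    # marginals: (variables, exponent, table) triples, where table maps a
    # tuple of values for those variables to a marginal probability
    p = 1.0
    for variables, exponent, table in marginals:
        key = tuple(event[v] for v in variables)
        p *= table[key] ** exponent
    return p

# A toy event and toy marginal probabilities for the model [ABEHI][ABFH].
event = {"A": 1, "B": 0, "E": 1, "F": 0, "H": 1, "I": "sense_1"}
marginals = [("ABEHI", 1, {(1, 0, 1, 1, "sense_1"): 0.25}),
             ("ABFH", 1, {(1, 0, 0, 1): 0.5}),
             ("ABH", -1, {(1, 0, 1): 0.5})]
print(joint_estimate(event, marginals))   # 0.25 * 0.5 / 0.5 = 0.25
```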

Files created by classify_dat_cond:

- SAMPLE1.EVAL
- SAMPLE1.JOINT
- SAMPLE1.TAGGED
- SAMPLE1.TRN.ABFH
- SAMPLE1.TRN.ABEHI
- SAMPLE1.TRN.ABH

All of these files are also created during classify_all processing; however, they are deleted before the conclusion of classify_all to save space. The results of the classification process are shown in SAMPLE1.EVAL. The joint parameter estimates are shown in SAMPLE1.JOINT and the marginal parameter estimates are shown in SAMPLE1.TRN.*.

If you are unable or unwilling to determine the product form manually, it is possible to use Class.3.0 to determine the product form. This is done by tricking classify_all into thinking that CoCo has performed a model search. Create a file called coco.SAMPLE using a text editor rather than having CoCo create it. Then specify the decomposable model to use as a classifier. The user MUST have verified that these models are decomposable! If they are not decomposable then the parameter estimates will be computed incorrectly and the results will be unpredictable. For example, a one-line version of coco.SAMPLE can specify the model form above as follows:

Model no. 1 [[ABEHI][ABFH]]

Once this file is created, then use classify_all as described above and Class.3.0 will process as if CoCo had selected this model.
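The precision and recall measures produced during classification can be illustrated with a sketch (hypothetical Python; it assumes precision is computed over the instances the classifier actually tagged and recall over all test instances, a common convention; consult the scripts for the exact definitions used):

```python
# Precision: correct / attempted; recall: correct / total test instances.
# A None tag marks an instance the classifier declined to tag.
def precision_recall(gold, assigned):
    attempted = [(g, a) for g, a in zip(gold, assigned) if a is not None]
    correct = sum(1 for g, a in attempted if g == a)
    return correct / len(attempted), correct / len(gold)

# Made-up gold-standard and assigned sense tags for four test instances.
gold     = ["s1", "s2", "s1", "s1"]
assigned = ["s1", "s2", None, "s2"]
p, r = precision_recall(gold, assigned)
print(p, r)   # precision 2/3, recall 2/4
```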

3 Availability

Class.3.0 is available free of charge from http://www.seas.smu.edu/~pedersen/

4 Acknowledgments

This research was supported by the Office of Naval Research under grant number N00014-95-1-0776.


References

[1] J. Badsberg. An Environment for Graphical Models. PhD thesis, Aalborg University, 1995.

[2] R. Bruce and J. Wiebe. Word-sense disambiguation using decomposable models. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 139-146, 1994.

[3] J. Darroch, S. Lauritzen, and T. Speed. Markov fields and log-linear interaction models for contingency tables. The Annals of Statistics, 8(3):522-539, 1980.

[4] S. Lauritzen. Graphical Models. Oxford University Press, New York, NY, 1996.

[5] T. Pedersen. Naive mixes for word sense disambiguation. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, page 841, Providence, RI, July 1997.

[6] T. Pedersen and R. Bruce. A new supervised learning algorithm for word sense disambiguation. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 604-609, Providence, RI, July 1997.

[7] T. Pedersen, R. Bruce, and J. Wiebe. Sequential model selection for word sense disambiguation. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 388-395, Washington, DC, April 1997.

