
Sociological Methods & Research http://smr.sagepub.com

APL Applications To Social Science Research: An Alternative to Macrocomputer Dependent Statistical Packages Kimball P. Marshall and Stanley E. Wilson Sociological Methods & Research 1978; 6; 469 DOI: 10.1177/004912417800600403 The online version of this article can be found at: http://smr.sagepub.com/cgi/content/abstract/6/4/469

Published by: http://www.sagepublications.com


Downloaded from http://smr.sagepub.com at PENNSYLVANIA STATE UNIV on April 10, 2008 © 1978 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution.

Although social science research has been greatly facilitated by the advent of preprogrammed "statistical packages," such as SPSS and SAS, these generally require relatively large scale computer resources which are often unavailable to social science researchers at smaller colleges and universities. However, recent advances in computer technology have increased the availability of mini- and microlevel computers while significantly reducing equipment costs. Complementary advances in interactive programming languages have enhanced the research and instructional potentials of these new systems. In this paper, one such interactive language, APL, is discussed as to its potential applications in social science research. This relatively easy to learn language is available on many large scale installations, as well as on mini- and microlevel, low cost systems. Specifically, this paper presents generalizable APL programming techniques to guide the social science researcher in the use of this language. Topics covered include: (1) organization of data for input to the computer; (2) selection of subsets of cases for mathematical manipulation; (3) data modification and creation of new variables; and (4) application of analytical statistical procedures.

APL APPLICATIONS TO SOCIAL SCIENCE RESEARCH
An Alternative to Macrocomputer Dependent Statistical Packages

KIMBALL P. MARSHALL
Syracuse University

STANLEY E. WILSON
University of St. Thomas

AUTHORS' NOTE: Research was sponsored by the Texas Agricultural Experiment Station as a contribution to both TAEX Research Projects H-2811 and H-1995 and by the Maxwell Policy Center on Aging at Syracuse University. Authors shared equally in responsibility for this paper. Authors' names appear in alphabetical order.

SOCIOLOGICAL METHODS & RESEARCH, Vol. 6 No. 4, May 1978 © 1978 Sage Publications, Inc.

Recent years have seen significant advances in computer applications to social science research. Greater emphasis on large scale data bases and the increased complexity of social science quantitative techniques have led to ever more reliance on preprogrammed "statistical packages," such as SPSS (Nie et al., 1975) and SAS (Barr et al., 1976). Unfortunately, these computational tools generally assume relatively large scale computer hardware and software resources (IBM OS/370, IBM OS/360, CDC 6000, CYBER 70, Univac 1100). Thus, these packages may be unavailable to social scientists at smaller universities which provide only mini- or microlevel computer systems. Furthermore, even when such packages are available, limited funding and equipment may inhibit extensive use of quantitative techniques as well as sophisticated statistical training of advanced students.

The present paper reports on an alternative to such packaged programs and macrolevel computer systems. Recent advances in computer technology (notably the technology of integrated circuitry) have increased the availability of mini- and microlevel computers while significantly reducing equipment costs. Complementary advances in interactive programming languages have simultaneously enhanced the research and instructional potentials of these new systems. One such interactive language, APL (an acronym for A Programming Language), holds particular promise for both research- and education-oriented sociologists on small and large campuses alike. Based on concepts of matrix algebra, APL is an algebraic programming language which is at once easy to learn, inexpensive to operate, and appropriate to statistical analysis. APL is also available on a wide range of computer systems, from the very large to the very small. Most macrolevel systems permit APL interactive programming, and at least one major manufacturer of computer equipment now offers desk-top microlevel models capable of processing from 16,000 to as many as 64,000 data bytes at purchase prices ranging from $9,000 to $20,000.
As a universal language, APL is more appropriate than BASIC, whose interactive dialects may vary from system to system. APL also offers many of the advantages of conventional statistical packages, in that various installations throughout the country have begun to compile tested programs which may be exchanged and shared. In the succeeding sections the reader is introduced to techniques by which APL may be applied to sociological research.


Since the intent is to inform the reader as to the benefits of APL while suggesting a strategy of data analysis, rather than to instruct explicitly, the discussion will be general in nature. Those readers who are stimulated to investigate APL further are referred to several manuals, including those by Gilman and Rose (1974), Grey (1973), and Katzan (1970, 1971). In the succeeding sections data analysis is conceptualized in terms of four major procedures: (1) organization of data for input to the computer, (2) selection of subsets of cases for mathematical manipulation, (3) data modification and the creation of new variables, and (4) application of analytical statistical procedures. Although the authors realize that actual research requires movement among these procedures, it is hoped that this order of presentation will provide the most lucid demonstration of APL applications to the process of data analysis.

ORGANIZATION OF DATA: THE DATA MATRIX

In organizing data for statistical analysis using APL, it is useful to conceptualize the data set as a two-dimensional matrix. In this matrix, which will be referred to by the symbol "M," the rows, designated by the symbol "i," represent the units of analysis (or cases) which were "observed." The columns, designated by the symbol "j," represent the variables which are to be considered. Thus the cell "M[i;j]" contains the value of variable j for case i. APL permits the use of both alphanumeric symbols (symbols to which mathematical connotations are not attached) and numeric symbols (the Arabic numerals, with or without decimals and/or in scientific notation, to which mathematical connotations are attached). When defining the matrix for input to an APL system, the matrix must be specified as alphanumeric or numeric. A numeric matrix may later be respecified as an alphanumeric matrix. Unfortunately, if characters other than numeric symbols are used in an alphanumeric matrix, that matrix cannot later be respecified as a numeric matrix. Therefore, in the interest of research flexibility, and since statistical analysis usually involves numeric considerations, it is recommended that only numeric characters be used in coding the data, and that the data matrix be specified as numeric when input to an APL system.

Before introducing a sample data matrix, it is also useful to consider the question of missing data. In order for the data matrix to be input into the computer and analyzed through the use of APL, an element (in the present discussion a numeric element) must be included in every cell. The researcher should, therefore, select some numeral to designate missing data. To avoid confusion, it is also recommended that the same missing data symbol be used for all variables. An unusually large number may be specified, as for example "9999."¹ Such a number will stand out both visually and in the researcher's mind and so may reduce the possibility of inadvertent inclusion in statistical processing. Later, depending upon the statistical procedures to be carried out, the user may replace missing data with values obtained through some estimation or randomization technique or may delete the case from consideration.

TABLE 1
A Sample Data Matrix
a. Racial codes: 1 = White, 2 = Black, 3 = Indian.
b. SES coded as socioeconomic index scores.
c. Sex codes: 1 = Male, 2 = Female.
d. Missing data code: 9999.

Table 1 represents a sample data matrix which is designed for input to an APL system. Once entered, the data matrix will be contained in the computer's core memory in what is termed an "APL workspace" and may also be permanently stored in an "APL library workspace." On large scale systems, such library workspaces are stored on magnetic disks; magnetic tape cartridges are used on micro or desk-top computers. In either case, the data matrix may be entered by typing out the matrix using the APL keyboard of the terminal or of the desk-top computer. Had the data originally been recorded on cards, disks, or tape, the user could make arrangements to have these read by the main computer system for input to an assigned "APL file" allowing direct access by the user. If the user directly inputs the data matrix using the APL keyboard, he must specify in APL format: (1) the name of the matrix, (2) the dimensions of the matrix, (3) the form of the matrix (numeric or alphanumeric), and (4) the actual elements. For example, the matrix in Table 1 could be input by the following APL instructions (instructions are in IBM APL typeface):

    Step 1    M ← 5 4 ⍴ 1
    Step 2    M[1;] ← 1 …
    Step 3    M[2;] ← 2 …
    Step 4    M[3;] ← 3 …
    Step 5    M[4;] ← 4 …
    Step 6    M[5;] ← 5 …

Step 1 defines a numeric matrix by the name of "M" which is to contain 5 rows and 4 columns. The symbol "⍴" (rho) informs the machine to recognize the 5 and 4 as the row and column dimensions of the matrix. The 1 following the symbol informs the machine to place numeric 1s in all cells of the matrix. Had the symbol following "⍴" been enclosed in single quotation marks ('1'), the system would have treated the matrix as alphanumeric. Steps 2 through 6 redefine each row in the matrix according to the elements to the right of the arrow. In this case, the elements correspond to the values of the data matrix presented in Table 1. For example, the symbols "M[1;]" indicate the first row of matrix "M." Had the one appeared to the right of the semicolon (M[;1]), the first column of the matrix would have been specified. The arrow (←), which may be read as "is," causes the elements listed on the right to replace the 1s in the designated row. Accomplishing this procedure for each row completes input of the data matrix.

At this point it may be noted that the ability to designate rows of the data matrix by the format "M[X;]" in effect permits the user to specify (or select) cases for special treatment. Similarly, variables may be indexed by specifying the columns of the data matrix in which they are contained. The format for specifying columns is "M[;X]," where "M" is the matrix and "X" is the number of the desired column. These conventions become particularly important in the next section, which presents procedures for selecting subsets of cases for special consideration.
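For readers who wish to try the same data-matrix conventions outside APL, the following is a minimal Python sketch, not part of the original paper, of the three input operations just described: filling a 5 × 4 matrix with 1s (M ← 5 4 ⍴ 1), overwriting each row, and indexing a row (M[i;]) or a column (M[;j]). The helper names reshape, row, and column are hypothetical, the row values are invented placeholders rather than the contents of Table 1, and the column order (ID, race, SES, sex) is assumed from the examples later in the paper. Python lists are indexed from 0, so the helpers convert from the 1-based indices used in the APL statements.

```python
# Minimal sketch (not from the paper): APL-style data matrix input in Python.
# APL:  M ← 5 4 ⍴ 1   then   M[1;] ← ...   through   M[5;] ← ...
# Helper names and all data values are hypothetical; columns are assumed
# to be ID, race, SES, sex, with 9999 as the missing-data code.

MISSING = 9999


def reshape(rows, cols, fill):
    """Analogue of 'rows cols ⍴ fill': a rows x cols matrix of one value."""
    return [[fill] * cols for _ in range(rows)]


def row(m, i):
    """Analogue of M[i;] using the 1-based index i of the APL text."""
    return list(m[i - 1])


def column(m, j):
    """Analogue of M[;j] using the 1-based index j of the APL text."""
    return [r[j - 1] for r in m]


# Step 1: a numeric matrix of 5 rows and 4 columns, every cell set to 1.
m = reshape(5, 4, 1)

# Steps 2-6: overwrite each row. Invented values, NOT those of Table 1.
hypothetical_rows = [
    [1, 1, 40, 2],
    [2, 2, 25, 1],
    [3, 3, 31, 2],
    [4, 1, 28, 1],
    [5, 2, MISSING, 2],
]
for i, values in enumerate(hypothetical_rows, start=1):
    m[i - 1] = list(values)  # M[i;] ← values

print(row(m, 1))     # [1, 1, 40, 2]
print(column(m, 3))  # [40, 25, 31, 28, 9999]
```

Because every cell must hold a number, the missing-data code sits in the matrix like any other value; the selection sketches that follow show how such cases are screened out.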

SELECTION OF SUBSAMPLES

For present purposes the term subsample may be taken to refer to a subset of cases contained in the data matrix which are selected on the basis of some criteria defined by the researcher. The researcher may wish to generate such subsamples temporarily in order to transform (or recode) values of variables in some desired manner, or to separate out cases appropriate for application of some statistical procedure. In the latter regard, for example, the user may wish to eliminate from consideration cases holding missing values for certain variables. Similarly, testing of certain hypotheses may require grouping of cases in some prescribed manner.

In APL, cases meeting prescribed criteria may be selected by means of a two-statement procedure. In this procedure a conditional statement is presented, and a new matrix is created which contains only those cases meeting the designated criteria. The general format of this procedure is:

    Step 1    I ← (M[;X] = A) / ⍳⍴M[;1]
    Step 2    SS ← M[I;]

where:  X = the column number of the criterion variable,
        A = the desired value of the criterion variable,
        M = the name of the data matrix.


In discussing the procedure we move from right to left, since this is the order in which APL executes commands. The combination of the rho symbol, "⍴," with the specification for column one of the data matrix causes the system to count the number of cases in the file. (As the matrix is rectangular with entries in every cell, the researcher could specify any column of the data matrix and obtain the same number of cases. Column 1 is suggested because every matrix will contain a column 1.) Having calculated the number of cases in the data matrix (M), the iota symbol, "⍳," causes the generation of a vector of sequence numbers from 1 to the number of rows, or cases, observed in the data matrix. The compression symbol, "/," in conjunction with the conditional statement "(M[;X] = A)," eliminates the sequence numbers of those cases which do not meet the designated criteria. The remaining sequence numbers are stored as the vector I, and step 2 creates the subsample matrix SS from the rows of M indexed by I. In addition to equal to (=), conditional statements may use not equal to (≠), greater than (>), greater than or equal to (≥), less than (<), and less than or equal to (≤). Furthermore, through the use of symbols for "and" (∧) and "or" (∨), and through the use of parentheses, complex conditional statements involving more than one criterion may be constructed. In the example below, data matrix "M" (Table 1) is referenced, and cases would be selected if sex was male (1) and SES was less than 30, or if race was black (2).


    Step 1    I ← (((M[;4] = 1) ∧ (M[;3] < 30)) ∨ (M[;2] = 2)) / ⍳⍴M[;1]
    Step 2    SS ← M[I;]
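The two-statement selection procedure may be mirrored in Python as sketched below, under the same assumptions as before (hypothetical data, columns ordered ID, race, SES, sex, with 9999 as the missing-data code). The hypothetical helpers iota and compress stand in for APL's ⍳ and /, so that the general form, the compound example above, and the exclusion of cases with missing SES can be compared line by line.

```python
# Minimal sketch (not from the paper): APL-style case selection in Python.
# APL:  I ← (condition) / ⍳⍴M[;1]   followed by   SS ← M[I;]
# Helper names and data are hypothetical; columns: ID, race, SES, sex.

MISSING = 9999


def iota(n):
    """Analogue of ⍳n: the sequence numbers 1 through n."""
    return list(range(1, n + 1))


def compress(mask, values):
    """Analogue of mask/values: keep the items whose mask entry is true."""
    return [v for keep, v in zip(mask, values) if keep]


def select_rows(m, indices):
    """Analogue of M[I;] with 1-based row numbers."""
    return [list(m[i - 1]) for i in indices]


m = [
    [1, 1, 40, 2],
    [2, 2, 25, 1],
    [3, 3, 31, 2],
    [4, 1, 28, 1],
    [5, 2, MISSING, 2],
]
race = [r[1] for r in m]  # M[;2]
ses = [r[2] for r in m]   # M[;3]
sex = [r[3] for r in m]   # M[;4]
n = len(m)                # ⍴M[;1]

# General form with X = 2 (race) and A = 2 (black).
i_black = compress([v == 2 for v in race], iota(n))

# Compound form: (male and SES < 30) or black.
mask = [(s == 1 and e < 30) or r == 2 for s, e, r in zip(sex, ses, race)]
i_compound = compress(mask, iota(n))
ss = select_rows(m, i_compound)

# Excluding cases whose SES holds the missing-data code.
i_valid = compress([e != MISSING for e in ses], iota(n))

print(i_black, i_compound, i_valid)  # [2, 5] [2, 4, 5] [1, 2, 3, 4]
```

APL evaluates strictly from right to left with no operator precedence, which is why the compound APL statement needs its nested parentheses. Python's `and` binds more tightly than `or`, but the parentheses are kept so that the two versions read alike.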

Following such selection routines, values of variables in the subsample matrix (here named SS) may be transformed by means of data modification statements, new variables may be computed as functions of old variables, and descriptive and analytical statistical routines may be applied. The remaining sections introduce several APL commands and procedures for accomplishing such manipulations. Techniques for entering transformed values into the original data matrix and for adding newly created variables are also discussed.

DATA MODIFICATION STATEMENTS

In the process of preparing data for statistical analyses and in response to such analyses, the researcher may wish to transform values of existing variables and to create new variables (as, for example, when building a scale). Two of the most common types of transformations are what may be called mathematical functions and recode procedures (SPSS uses the term recode in the same sense as is intended here). In both cases new values replace old values of a variable, but in mathematical functions the values are modified in accord with a generalized formula, whereas in recode statements each transformation, that is, the mapping of each old value to its new value, must be specified.

Mathematical function transformations are among the easiest procedures to specify in APL. Such functions take the general form "M[;X] ← F M[;X]." In this format, "M" specifies the matrix and "X" specifies the column holding the values of the variable to be transformed. The symbol "F" represents a mathematical procedure desired by the researcher. Table 2 provides a list of the most common mathematical operators which the researcher may use in conjunction with constants or variables to specify the desired mathematical function.

TABLE 2
A Partial List of APL Mathematical Operations
a. X and Y represent variables.
b. Following the APL direction of execution, expressions are read right to left.
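The general form M[;X] ← F M[;X] amounts to applying one function to every element of a column. A minimal Python sketch follows, with a hypothetical helper transform_column and invented data. It departs from the APL one-liner in one respect, labelled in the code: cells holding the missing-data code 9999 are left untouched, whereas in APL the researcher would first exclude such cases with a selection statement.

```python
# Minimal sketch (not from the paper) of the general form M[;X] ← F M[;X]:
# apply one formula F to every value in column X (1-based, as in the text).
# Assumption added here: cells holding the missing-data code are skipped,
# which the APL one-liner would not do without a prior selection step.

MISSING = 9999


def transform_column(m, x, f, missing=MISSING):
    """Replace column x of matrix m, in place, with f(value)."""
    for r in m:
        if r[x - 1] != missing:
            r[x - 1] = f(r[x - 1])
    return m


# Hypothetical data (NOT Table 1): ID, race, SES, sex.
m = [
    [1, 1, 40, 2],
    [2, 2, 25, 1],
    [3, 3, MISSING, 2],
]

# F = "multiply by 3"; the APL counterpart is M[;3] ← 3 × M[;3].
transform_column(m, 3, lambda v: 3 * v)
print([r[2] for r in m])  # [120, 75, 9999]
```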

The researcher may wish to apply the procedure to only a subset of the cases in the data matrix. This may be accomplished by combining data selection statements, discussed previously, with the desired mathematical function. In the example below, which references the data matrix M (Table 1), females are selected and their SES scores are multiplied by 3.

    Step 1    I ← (M[;4] = 2) / ⍳⍴M[;1]
    Step 2    M[I;3] ← 3 × M[I;3]

In step 1 the female cases are selected and their sequence numbers (I) are noted. In step 2, the SES values (column 3) for these female cases are indexed and multiplied by 3. These new SES values then replace the old SES values in data matrix M.

Recode data modification procedures must be used when the researcher cannot define a mathematical function to cover all values of the variable to be modified. In such a case the researcher must specify both the old values to be modified and the new replacement values. In APL such a transformation may be accomplished through data selection statements used with the APL operator for assigning values. This procedure is illustrated below.

In the above procedure the variable whose values are contained in column "X" of data matrix "M" (designated by the set of symbols "M[;X]") is recoded. In steps 1 and 2, those cases with values of X greater than A and less than or equal to B are selected and assigned to the new subset matrix "ST" (these procedures have been discussed above). In step 3, the new value "T" (a numeric value defined by the researcher) replaces the old values of the variable for the selected cases. In steps 4 and 5, cases with values of variable X which are less than or equal to A or greater than B are selected and assigned to a new subset matrix "SU." Step 6 replaces the old values of variable "X" with the new value "U." In steps 7 and 8, the new X values of those cases selected in steps 1 and 4 replace the old X values in data matrix M. It may be noted that the new X values of the selected cases are not entered into the data matrix until all desired recodes are completed. This avoids confounding old values with recoded values.²

This basic procedure may be modified according to the needs of the researcher. Single values, and discontinuous sets of values, to be given new codes may be designated by appropriate modifications of conditional statements such as those in steps 1 and 4. The procedure may be expanded to include as many recodes as desired by the researcher. Values which do not meet the criteria of any of the conditional statements will remain unchanged.

In addition to transformations involving mathematical functions and recode procedures, data modifications may also involve the creation of new variables. Depending upon the purposes of the researcher, new variables may be created in at least two ways: (1) by defining as a new variable the outcome of a mathematical function involving one or more old variables, and (2) by defining a new variable and assigning values. In either case, these procedures will be simplified if the data matrix is expanded by adding a new column of some arbitrary set of values to the right side of the data matrix. This may be most simply accomplished by the following statement.

    M ← M,(⍴M[;1])⍴1

The above statement causes a new column containing only the figure 1 in each cell to be attached to the right side of data matrix M. This column, in effect the new variable, may be indexed by the set of symbols "M[;N+1]," where N is the number of columns (or old variables) which were in the matrix before the addition of the new column. Having expanded the data matrix to accommodate the new variable, the actual values may be generated by the use of the previously discussed mathematical functions and recode data transformation procedures.
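The recode procedure's key safeguard, evaluating every condition against the original values and writing the recoded values back only at the end, is easiest to see by contrast with a version that writes immediately. The Python sketch below is an illustration under stated assumptions: the helper names, the three SES ranges, and their new codes (1, 2, 3) are hypothetical, and the ranges are ordered so that a freshly recoded value would fall inside a later range. It closes with an analogue of M ← M,(⍴M[;1])⍴1, which adds a column of 1s as column N + 1.

```python
# Minimal sketch (not from the paper): recoding with deferred write-back,
# then appending a new column of 1s (APL: M ← M,(⍴M[;1])⍴1).
# Helper names, ranges, codes, and data are hypothetical.

MISSING = 9999


def recode(m, x, rules):
    """Recode column x (1-based) of m using (predicate, new_value) rules.

    Every predicate is tested against the ORIGINAL values, and the new
    values are written into m only after all rules have been evaluated.
    """
    old = [r[x - 1] for r in m]
    pending = {}  # row position -> new value
    for predicate, new_value in rules:
        for i, v in enumerate(old):
            if predicate(v):
                pending[i] = new_value
    for i, new_value in pending.items():  # write back last
        m[i][x - 1] = new_value
    return m


def recode_immediately(m, x, rules):
    """Naive variant for contrast: each rule writes its result at once."""
    for predicate, new_value in rules:
        for r in m:
            if predicate(r[x - 1]):
                r[x - 1] = new_value
    return m


def append_column(m, fill=1):
    """Analogue of M ← M,(⍴M[;1])⍴1: a new rightmost column of 1s."""
    return [list(r) + [fill] for r in m]


def sample():
    """Fresh copy of hypothetical data (NOT Table 1): ID, race, SES, sex."""
    return [[1, 1, 40, 2], [2, 2, 25, 1], [3, 3, 31, 2],
            [4, 1, 28, 1], [5, 2, MISSING, 2]]


rules = [
    (lambda v: 25 < v <= 35, 2),      # middle SES range -> code 2
    (lambda v: v <= 25, 1),           # low SES range    -> code 1
    (lambda v: 35 < v < MISSING, 3),  # high SES range   -> code 3
]

deferred = recode(sample(), 3, rules)
naive = recode_immediately(sample(), 3, rules)
print([r[2] for r in deferred])  # [3, 1, 2, 2, 9999]
print([r[2] for r in naive])     # [3, 1, 1, 1, 9999]

widened = append_column(deferred)  # the new variable is column N + 1 = 5
print(widened[0])                  # [1, 1, 3, 2, 1]
```

With deferred write-back the two middle-range cases keep the code 2; written back immediately, those 2s are matched again by the "25 or less" rule and wrongly become 1s, which is the confounding the procedure is designed to avoid. The missing-data code falls outside every range and so remains unchanged.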


STATISTICAL PROCEDURES

The researcher, having organized the data in accordance with his research design, is now prepared to "begin" statistical analysis. Usually, the first step would be to check the validity of the data by generating descriptive statistical summaries. Following this, procedures for testing hypotheses and for drawing inferences with reference to some larger population may be applied. Since APL permits specification of a wide range of mathematical procedures, the individual researcher could conceivably write his own statistical routines. However, since the present paper seeks to increase awareness of APL's social science research potentials rather than to instruct explicitly, details of statistical programming will not be presented. Attention will be directed instead to available program libraries which provide tested APL statistical functions on which the researcher may draw. Researchers desiring detailed programming information are referred to the manual by Gilman and Rose (1974). Much useful information, including discussions of statistical research and teaching applications, is also contained in the published proceedings of the 1974 and 1975 international APL users' conferences (Sixth International APL Users' Conference, 1974; APL 75 Congress, 1975).

Although detailed programming procedures will not be discussed here, it is useful to note one APL capability which facilitates the development, storage, and sharing of conventional statistical routines. This capability is referred to as "function definition" (Gilman and Rose, 1974). Function definition permits the writing of a program which may be permanently stored under a specific name for future access and use. Two methods are available for sharing statistical routines as defined functions. First, the user may record the function in written format, i.e., in terms of the actual symbols reproduced on paper.
A second user may then enter the program into his own APL workspace by typing out the program on his terminal. This method is adequate for sharing many statistical procedures, and several examples are provided in the appendix. A more practical technique for sharing large numbers of statistical procedures among many APL users and installations is to record defined functions on magnetic tapes which may then be shipped throughout the country. When received at a system capable of reading such tapes, programs may be entered into "library workspaces" which may be accessed by users from their own terminals. The procedure below provides one example of a statistical "program" as a defined function (this example has been provided by Microcomputer Machines of Ontario, Canada).

    ∇ DSTAT X;MEAN;VAR;V;W;N

The "del" symbol (∇), which begins the first line (or "header" line), informs the computer that one is initiating function definition. In its second occurrence the del (∇) indicates the end of a defined function. The information between the del and the first semicolon defines the name of the function and indicates temporary variable names which the user later defines and on which the function will be applied. In APL terminology such names are called "arguments." The variables which are to correspond to the arguments are the columns of a data matrix which contain the values and cases to be statistically analyzed. In most cases a data matrix which is specified in conjunction with a statistical program will contain only that subset of cases selected by the researcher so as to exclude observations with missing data (unless a decision was made to handle missing data in some manner other than deletion).

In the format of the header line only one program name and up to two arguments may be specified. If two names appear between the del (∇) and the first semicolon, the first is taken to be the program name and the second is taken to be the argument name. Thus, in the above example, the program name is "DSTAT" and the argument is "X." If three names were specified between the del and the first semicolon, the middle name would refer to the program and the first and third names would refer to the arguments to be defined (i.e., to which the researcher would assign the variables to be assessed). No more than two data sets, or arguments, may be specified in conjunction with a given function. The symbols following the first semicolon in the header line (and separated by succeeding semicolons) define "new" variables relevant to the internal functioning of the program, but which cease to exist in memory after the program terminates execution. These are called "local" variables because they exist only within the specified program. The second line of the sample program "DSTAT" (labeled line 1) causes the number of observations in the data set (or sets) to be counted and printed with the label "sample size." Similarly, the remaining lines of this program cause the calculation and printing of the various descriptive statistics corresponding to the written labels. In order for a "defined function" to be executed, it must first be entered into the researcher's APL workspace. This may be done either by typing the program directly into the terminal, or by accessing the "library" workspace maintained by the system.
Once the function has been entered into the researcher's workspace, it may be "executed" by simply typing out the name of the function, followed by the column of the data matrix which defines the argument.³ For example, the researcher may have used the previously discussed data selection statements to generate, from the primary data matrix "M" (Table 1), a subset of cases containing valid (nonmissing) data on SES. For present purposes, this subset is defined as the new data matrix "K" and contains only the observations corresponding to ID numbers 1 through 4. Since this subset data matrix has the same column organization as the original data matrix, column 3 contains the data on SES. Thus, to obtain descriptive statistics on valid observations for SES, the researcher may type:

    DSTAT K[;3]

Since the header line of "DSTAT" contained only one argument, "X," only one variable is specified. In this example, the elements (or values) in "K[;3]" become the elements of local variable "X." Upon receiving the above command, the computer would respond:

    SAMPLE SIZE           4
    MAXIMUM               41
    MINIMUM               26
    RANGE                 15
    MEAN                  31.75
    VARIANCE              47.58333333
    STANDARD DEVIATION    6.898067362
    MEAN DEVIATION        5.25
    MEDIAN                30
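As a check on the printed output, the Python sketch below computes the same nine summaries. The function name dstat merely echoes DSTAT and is not a transcription of it, since the body of the APL function is not reproduced here; the definitions are inferred from the output. In particular, the variance is assumed to use the n − 1 divisor, and the "mean deviation" is taken to be the mean absolute deviation about the mean. Taking the four valid SES values to be 27, 33, 41, and 26, which are recoverable from the original listing of the data and are consistent with every figure printed above, these definitions reproduce that output.

```python
# Minimal sketch (not a transcription of DSTAT): the same nine summaries.
# Assumed definitions, inferred from the printed output: variance with an
# n - 1 divisor, and mean deviation = mean absolute deviation about the mean.
import statistics


def dstat(x):
    """Print labelled descriptive statistics for the values in x."""
    n = len(x)
    mean = sum(x) / n
    variance = statistics.variance(x)  # sample variance, divisor n - 1
    results = {
        "SAMPLE SIZE": n,
        "MAXIMUM": max(x),
        "MINIMUM": min(x),
        "RANGE": max(x) - min(x),
        "MEAN": mean,
        "VARIANCE": variance,
        "STANDARD DEVIATION": variance ** 0.5,
        "MEAN DEVIATION": sum(abs(v - mean) for v in x) / n,
        "MEDIAN": statistics.median(x),
    }
    for label, value in results.items():
        print(f"{label:<20}{value:.10g}")
    return results


stats = dstat([27, 33, 41, 26])  # the four valid SES values
```

As with DSTAT K[;3], the function is handed only the valid observations; passing a column that still contained 9999 would silently distort every figure except the sample size.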

While the above example has focused on a relatively simple set of descriptive statistics, APL is sufficiently flexible to permit the programming of more complex routines. Given its basis in matrix algebra, APL may be particularly conducive to certain types of multiple regression approaches to data analysis (Nygreen, 1971; Kerlinger and Pedhazur, 1973).

Although nonparametric routines occasionally present challenging problems to the computer specialist, they may also be effectively developed in APL. However, whether concerned with parametric or nonparametric techniques, in the interest of research flexibility, time, and computational accuracy, the individual researcher is advised to investigate APL defined functions and statistical libraries which are already available. Two of the more comprehensive libraries are those provided by Professor K. W. Smillie of the Department of Computer Science, the University of Alberta (1970), and by Professor J. Prins of State University College in New Paltz, New York (1973).

TABLE 3
A Partial Listing of STATPACK2 APL Statistical Functions
SOURCE: K. W. Smillie (1970) STATPACK2: An APL Statistical Package. Alberta, Canada: Department of Computer Science, University of Alberta.

The defined functions developed by Dr. Smillie are available in both written form and on magnetic tape as a statistical program library entitled "STATPACK2." "STATPACK2" contains a variety of procedures ranging from basic frequency tables and histograms to cross tabulations, correlation matrices, and multiple regression techniques. Table 3 provides a partial listing of the available procedures. The "NPLSP" statistical program package developed by Dr. Prins (1973) is similar and complementary to "STATPACK2," and Table 4 provides a partial listing of these routines. Both packages are available in written form as well as on magnetic tape. Thus the benefits of these packages may be obtained by users of computer systems interfaced with magnetic tape readers, as well as by users of relatively inexpensive, microlevel, desk-top APL computers.

Despite the value of available program packages, the statistical and mathematical techniques of social science research continually undergo expansion and refinement. Thus, access to predeveloped routines, however carefully prepared, may be insufficient for the needs of the methodologically sophisticated researcher. Through the use of the function definition capability of APL, the researcher may supplement the procedures available in these statistical packages. Furthermore, once having written a new routine or having updated an old one, the researcher may share the program with his colleagues. In order to facilitate the systematic dissemination of such information, the computer center of the State University of New York at Binghamton has organized an APL program and information exchange system. This system, referred to by the acronym "APL/PIE," serves as a clearing house for hard copy documentation regarding APL defined functions which may be of general interest.
APL users and installations seeking to contribute programs to this service, or to obtain catalogued procedures, may join APL/PIE by communicating directly with the center in Binghamton. Should such exchanges become widespread among statistically oriented APL users, the value of this interactive programming language to social science researchers may be expected to grow accordingly.
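As an illustration of the kind of routine a researcher might define to supplement the packaged procedures, the sketch below builds a two-way cross-tabulation of two coded variables. It is written in Python rather than APL and is not taken from STATPACK2 or NPLSP; the function name `crosstab` and the sample codes are hypothetical. In APL the same table is commonly formed from outer products, (U∘.=X)+.×⍉W∘.=Y, where U and W hold the distinct codes of X and Y.

```python
from collections import Counter


def crosstab(row_var, col_var):
    """Two-way frequency table of two equal-length coded variables.

    Returns (row_codes, col_codes, table), where table[i][j] counts the
    cases with row_var == row_codes[i] and col_var == col_codes[j].
    """
    if len(row_var) != len(col_var):
        raise ValueError("variables must have the same number of cases")
    row_codes = sorted(set(row_var))
    col_codes = sorted(set(col_var))
    # Count each observed (row code, column code) pair, one per case.
    pair_counts = Counter(zip(row_var, col_var))
    # Counter returns 0 for pairs that never occur, giving empty cells.
    table = [[pair_counts[(r, c)] for c in col_codes] for r in row_codes]
    return row_codes, col_codes, table


# Hypothetical codes for six respondents: sex (1, 2) by response (1-3).
sex = [1, 2, 1, 1, 2, 2]
response = [3, 1, 3, 2, 1, 3]
rows, cols, table = crosstab(sex, response)
print(rows, cols, table)
```

Counting pairs in a single pass keeps the work proportional to the number of cases; the APL outer-product idiom instead trades storage for brevity by building boolean matrices with one row per code and one column per case.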


TABLE 4
A Partial Listing of NPLSP APL Statistical Functions

SOURCE: J. Prins (1973) APL New Paltz Library of Statistical Programs. New Paltz, New York: University College.

CONCLUSIONS AND IMPLICATIONS

Recent advances in computer technology and the development of the APL interactive programming language promise research and educational benefits to social scientists in both large and small institutional settings. For the researcher with access to large scale computer systems, the interactive nature of APL offers potential savings in both professional time and, depending upon billing algorithms, computer charges. As an interactive language, APL provides essentially immediate results. Furthermore, time lost to programming errors becomes negligible, since a faulty command produces an immediate error message that permits the user to make corrections and to continue the analysis without repeating previous commands. Even when the desired analytical techniques require batch-oriented programs and statistical packages, researchers may find APL useful for inspecting and "cleaning" data files. Social scientists in institutions that do not provide access to the computer facilities required by "conventional" statistical program packages may find APL particularly beneficial, since it is available on small scale desk-top machines capable of handling data matrices as large as 64,000 cells. For such researchers, APL may greatly expand data management capabilities. Researchers in institutions providing APL systems "on line" with a central computer will be limited only by the administrative requirements of the institution. Although the present paper has focused on social science research applications, the interactive nature and algebraic format of APL also promise educational benefits (Smillie, 1974). Operating in algebraic and matrix notation, which allows simple description of involved computations, and providing immediate "feedback," APL may become an invaluable teaching and laboratory tool, permitting demonstration of, and "experimentation" with, mathematical transformations and statistical techniques. Such practices may help concretize abstract theories, reinforce understanding of statistical procedures, and stimulate innovative thinking regarding operationalization, social indicators, and research design.
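A one-way frequency count is the usual first step in inspecting a data file, because stray codes show up immediately as unexpected values. The sketch below is in the spirit of the appendix function TF but is written in Python under our own assumptions, not as a translation of the authors' code; the names `frequency_table` and `stray_codes`, the sample data, and the use of 9 as a missing-data code are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count) pairs for each distinct value, in ascending order."""
    return sorted(Counter(values).items())


def stray_codes(values, valid_codes):
    """Return the positions (0-based case numbers) whose code is not valid."""
    return [i for i, v in enumerate(values) if v not in valid_codes]


# Hypothetical data: education coded 1-3, with 9 as the missing-data code.
education = [1, 3, 2, 3, 9, 1, 7, 3]
for value, count in frequency_table(education):
    print(value, count)
print("cases to check:", stray_codes(education, valid_codes={1, 2, 3, 9}))
```

Here the frequency table reveals a code of 7 that falls outside the valid range, and `stray_codes` locates the case so it can be corrected before analysis.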
In summary, as an interactive programming language, APL may serve social science researchers as a powerful but inexpensive tool for statistical analysis, applicable to a broad range of research needs and institutional settings. Because both statistical formulas and APL are expressed through algebraic symbol systems, the quantitative knowledge of the social scientist may facilitate his development of APL programming skills. Once developed, such skills allow rapid and accurate communication directly with the computer while freeing the researcher from the limitations of programmers who lack statistical expertise and from the restraints of cost and equipment availability imposed by dependence on batch-oriented program packages.
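As a small illustration of this correspondence (our own example, not the authors'), the sample mean and sample variance of a variable X with n cases are shown below in conventional notation. In APL the mean can be written (+/X)÷⍴X, where +/X forms the sum and ⍴X gives the number of cases n; with M←(+/X)÷⍴X, the variance can be written (+/(X-M)*2)÷(⍴X)-1. Each APL symbol stands in for one term of the formula.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
s^{2} = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^{2}
```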

NOTES

1. When stored in APL files, single-digit and multidigit numbers occupy the same storage space. Should the researcher be concerned with data storage space on punched cards, or on tape or disk as "card image," it may be desirable to limit the number of characters in the missing data designation to the number of digits required for valid data codes.

2. One reviewer has offered a more concise version of the recode procedure. These statements produce the same final results as the recode statements discussed in the text, without the intermediate steps of creating new data matrix subsets.

3. If the function (program) to be executed has two or more arguments, all of them must be supplied when the function is executed. To execute the function, one designates the data matrix column of the variable corresponding to the left (or first) argument. This is followed by the name of the function, which is in turn followed by the data matrix column(s) of the variable(s) defining the argument(s) to the right of the function name. For a more complete discussion of function arguments and related considerations, the reader is referred to Gilman and Rose (1974).
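The reviewer's concise recode statements mentioned in Note 2 are not reproduced here. As a hedged illustration of the general idea of recoding a whole variable in one step, without first building intermediate subsets of cases, the following Python sketch applies a lookup table of old-to-new codes to every case at once. The function name `recode`, the mapping, and the single-digit missing-data code 9 (in keeping with Note 1) are hypothetical, and this is not the reviewer's APL version.

```python
MISSING = 9  # hypothetical single-digit missing-data code


def recode(column, mapping, missing=MISSING):
    """Recode every case of one variable in a single pass.

    Codes listed in `mapping` are replaced, the missing-data code is
    kept as is, and any other code raises an error so that stray
    values are not silently carried into the analysis.
    """
    recoded = []
    for case, code in enumerate(column):
        if code == missing:
            recoded.append(missing)
        elif code in mapping:
            recoded.append(mapping[code])
        else:
            raise ValueError(f"case {case}: unexpected code {code}")
    return recoded


# Hypothetical example: collapse a five-point scale (1-5) into three groups.
scale = [1, 2, 3, 4, 5, 9, 2]
collapse = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
print(recode(scale, collapse))
```

Raising an error on unlisted codes, rather than passing them through, is a design choice that makes the recode double as a data-cleaning check.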


REFERENCES

APL 75 Congress (1975) APL 75. New York: Association for Computing Machinery.

BARR, A. J., J. H. GOODNIGHT, J. P. SALL, and J. T. HELWIG (1976) A User's Guide to SAS 76. Raleigh, NC: SAS Institute.

GILMAN, L. and A. J. ROSE (1974) APL: An Interactive Approach. New York: John Wiley.

GREY, L. D. (1973) A Course in APL/360 With Application. Reading, MA: Addison-Wesley.

KATZAN, H., Jr. (1971) APL User's Guide. New York: Van Nostrand-Reinhold.

--- (1970) APL Programming and Computer Techniques. New York: Van Nostrand-Reinhold.

KERLINGER, F. N. and E. J. PEDHAZUR (1973) Multiple Regression in Behavioral Research. New York: Holt, Rinehart & Winston.

NIE, N. H., C. H. HALL, J. G. JENKINS, K. STEINBERGER, and D. H. BENT (1975) Statistical Package for the Social Sciences. New York: McGraw-Hill.

NYGREEN, G. T. (1971) "Interactive path analysis." Amer. Sociologist 6 (February): 37-43.

PRINS, J. (1973) APL New Paltz Library of Statistical Programs. New Paltz, NY: University College.

Sixth International APL Users' Conference (1974) Proceedings of the Sixth International APL Users' Conference. Anaheim, CA: Coast Community College District.

SMILLIE, K. W. (1976) "Regression analysis: theory and computation," pp. 401-405 in APL 76 Congress, APL 76. New York: Association for Computing Machinery.

--- (1974) "The use of APL in the teaching of probability," pp. 475-483 in Proceedings of the Sixth International APL Users' Conference. Anaheim, CA: Coast Community College District.

--- (1970) STATPACK2: An APL Statistical Package. Alberta, Canada: Department of Computer Science, University of Alberta.

APPENDIX

The following APL statistical procedures are provided as examples of the analytical potential that APL holds for social science research. A short instructional statement accompanies each listing. Unless otherwise indicated, programs were developed by Stanley Wilson.

ONE-WAY FREQUENCY TABLE

The following function produces a one-way frequency table of observed values for a specified variable. The argument "V" is a vector or single-column data matrix that contains the variable to be considered. The program uses the formatting features of APLSV; users of other versions of APL will have to rewrite lines 7 through 11. Lines 1 through 6 will run on any version of APL.

    ∇ TF V;T;R
[1]   R←((⍳⍴R)=R)/R←V⍳V
[2]   R←V[R]
[3]   M←((⍴R),2)⍴0
[4]   M[;1]←R←R[⍋R]

[5J]

~[;2j~--t-/F0)/7P),+/(/?