Combining variables, creating and validating indices

Workshops on writing research papers with TIMSS data, Pretoria, South Africa, May 27th – 31st

Table of Contents
• Simple recoding of variables
• Changing categories – reverse and simple change
• Collapsing categories
• Convert continuous into categorical variable
• Aggregation of variables
• Creating indices

Simple recoding of variables
• Changing categories
  – When the order is reversed
  – When dichotomous variables need to be included in regression, but the categories are not coded as a true (0/1) dichotomy
• All transformations can be done in two ways
  – Using the SPSS graphical user interface
  – Writing code (syntax)
• The data file we will use is the Student Background file: C:\Workshop\Data\bsgzafm5.sav

Recode Variable – Variable Information [BSDGEDUP]
• Derived variable: BSDGEDUP – Parents' Highest Education Level
• Source: T11_UG_Supplement3 (p. 7)

Recode Variable – Variable Information [BSDGEDUP]
• Variable codes:
  – 1 – University or higher
  – 2 – Post-secondary but not university
  – 3 – Upper secondary
  – 4 – Lower secondary
  – 5 – Some primary, lower secondary or no school
  – 6 – Not applicable
• STEPS:
  – I. Recode '6' (not applicable) to SYSMIS
  – II. Reverse the scale so that higher values stand for higher education (we expect higher parental education to be associated with higher achievement scores): 5->1 / 4->2 / 3->3 / 2->4 / 1->5
  – III. Recode all other values to SYSMIS

Recode BSDGEDUP – I. Define input/output variable
Transform -> Recode into Different Variables…
1. Select variable [BSDGEDUP]
2. Name output variable [BSDGEDUPr]
3. Label output variable (optional)
4. Press 'Change'
5. Press 'Old and New Values…'

Recode BSDGEDUP – II. All missing to SYSMIS
• Recode all missing values (user-defined or system) into system missing
• Confirm each recoding step with 'Add'

Recode BSDGEDUP – III. Reverse coding of categories
• Reverse the scale

Recode BSDGEDUP – Confirm
• Confirm with 'OK'

RECODE BSDGEDUP (6=SYSMIS) (5=1) (4=2) (3=3) (2=4) (1=5) (ELSE=SYSMIS) INTO BSDGEDUPr.
VARIABLE LABELS BSDGEDUPr 'PARENTS HIGHEST EDUCATION LEVEL'.

Recode BSDGEDUP – Define Value Labels (optional)
• Tab: Variable View
• Click to edit
• Enter values [1…5]
• Enter corresponding labels
• Confirm each entry with 'Add'

Recode BSDGEDUP – do all at once
RECODE BSDGEDUP (1=5) (2=4) (3=3) (4=2) (5=1) (6=SYSMIS) (MISSING=SYSMIS) INTO BSDGEDUPr.
VARIABLE LABELS BSDGEDUPr "PARENTS HIGHEST EDUCATIONAL LEVEL - REVERSED".
VALUE LABELS BSDGEDUPr
  1 "SOME PRIMARY, LOWER SECONDARY OR NO SCHOOL"
  2 "LOWER SECONDARY"
  3 "UPPER SECONDARY"
  4 "POST-SECONDARY BUT NOT UNIVERSITY"
  5 "UNIVERSITY OR HIGHER".
EXECUTE.
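A quick way to check a recode is to compare the original and the recoded variable side by side. The sketch below is a suggestion rather than part of the workshop steps; it assumes that BSDGEDUP and the newly created BSDGEDUPr are both in the active file.

```spss
* Compare the distributions (including missing counts) before and after recoding.
FREQUENCIES VARIABLES=BSDGEDUP BSDGEDUPr.

* Each original category 1 to 5 should fall into exactly one reversed category.
CROSSTABS /TABLES=BSDGEDUP BY BSDGEDUPr.
```

Cases recoded to system missing do not appear in the cross-tabulation, so the FREQUENCIES output is the place to confirm how many cases became missing.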

Recode ITSEX
• Recode the variable for regression
  – Original values: 1 – girl; 2 – boy
  – Recoded values: 0 – girl; 1 – boy
• With the original 1/2 coding, the regression intercept is the predicted score at ITSEX = 0, a value that does not exist in the data; with 0/1 coding, the intercept is the predicted score for girls
• Syntax:
RECODE ITSEX (1=0) (2=1) (MISSING=SYSMIS) INTO ITSEXr.
VARIABLE LABELS ITSEXr "SEX OF STUDENTS - RECODED".
VALUE LABELS ITSEXr 0 "GIRL" 1 "BOY".
EXECUTE.
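To see the effect of the 0/1 coding on the intercept, the recoded variable can be entered into a simple regression. This is only an illustrative sketch: the outcome BSMMAT01 (first mathematics plausible value) is an assumption made here, and a proper TIMSS analysis would use all plausible values together with the sampling weights and variance estimation for the sampling design.

```spss
* BSMMAT01 is assumed as the outcome for illustration only.
* With ITSEXr coded 0/1, the constant is the predicted score for girls.
* The ITSEXr coefficient is the boy-girl difference.
REGRESSION
  /DEPENDENT BSMMAT01
  /METHOD=ENTER ITSEXr.
```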

Collapsing categories
• When we need fewer categories than the variable provides
• Books at home [BSBG04]
  – 0-10 BOOKS, 11-25 BOOKS, 26-100 BOOKS -> UP TO 100 BOOKS
  – 101-200 BOOKS, MORE THAN 200 -> MORE THAN 100 BOOKS

Collapsing categories
• Syntax:
RECODE BSBG04 (1 THRU 3 = 0) (4 THRU 5 = 1) INTO BSBG04r.
VARIABLE LABELS BSBG04r "GEN\AMOUNT OF BOOKS IN YOUR HOME - COLLAPSED".
VALUE LABELS BSBG04r 0 "UP TO 100 BOOKS" 1 "MORE THAN 100 BOOKS".
EXECUTE.

Convert continuous into categorical variable
• When we need the values along the continuum of a variable to be represented as categories
• Students Confident with Mathematics [BSBGSCM]
  – Derived variable (IRT)
  – Continuous scale:
    • Mean: 10.0
    • SD: 2
• Recode into categorical variable:
  – Lowest through 5 = 1
  – Above 5 through 10 = 2
  – Above 10 through highest = 3

Convert continuous into categorical variable
• Syntax (specifications are applied from left to right, so exactly 5 is coded 1 and exactly 10 is coded 2; the shared boundaries leave no gap for values such as 5.05):
RECODE BSBGSCM (MISSING=SYSMIS) (LOWEST THRU 5 = 1) (5 THRU 10 = 2) (10 THRU HIGHEST = 3) INTO BSBGSCMr.
VARIABLE LABELS BSBGSCMr "STUDENT CONFIDENCE WITH MATHEMATICS/SCL - CATEGORIZED".
VALUE LABELS BSBGSCMr 1 "LOW" 2 "MEDIUM" 3 "HIGH".
EXECUTE.

Aggregation of variables
• When we need a school-level measure based on the characteristics of the sampled students in each school
• This could give us a measure of the student intake in the schools
• For example, aggregated SES measures give us the average student SES in each school, which can be used to control for student intake in linear regression

Aggregation of variables
• Home Educational Resources Scale (HER)
  – Derived variable (IRT)
  – Continuous scale
    • Mean: 10.0
    • SD: 2

Aggregation of variables
1. Select break variable [IDSCHOOL] (NOTE: If you have more than one country, you will need to add IDCNTRY as the first break variable)
2. Select variables to aggregate [BSBGHER] (the default function is 'mean')
3. Name the resulting variable [BSBGHERm]
4. Add the data as a new variable

Aggregation of variables
• Syntax:
SORT CASES BY IDSCHOOL.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /PRESORTED
  /BREAK=IDSCHOOL
  /BSBGHERm=MEAN(BSBGHER).
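For a file with more than one country, the note about IDCNTRY could be applied as sketched below. This is meant as an alternative to the single-country command, not something to run after it. The variable BSBGHERn (the number of students with a valid HER score per school) is a hypothetical addition for checking how many students each school mean rests on.

```spss
* Pooled multi-country file: a school is identified by IDCNTRY and IDSCHOOL together.
* Without /PRESORTED the file does not have to be sorted first.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=IDCNTRY IDSCHOOL
  /BSBGHERm=MEAN(BSBGHER)
  /BSBGHERn=NU(BSBGHER).
```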

What is an Index?
• A variable, a manifestation, an indicator, etc.
• Something that tells us something about something
• Could be a single variable or a combination of several variables

Combining Variables
• Usually, when we talk about an index, we refer to a combination of two or more variables into one
  – Economic index, stock exchange index, etc.
  – A ratio of two variables (student-to-teacher ratio)
  – A score on a test
  – Any other variable that results from the combination of two or more other variables

Combining Variables
• One classification
  – Simple indices: arithmetic transformation or recoding of one or more items
  – Scale indices: constructed through the scaling of items (using IRT, factor scores, sum of individual scores, etc.)
• Many considerations and decisions go into computing an index

Steps in Creating Indices
• Select the variables based on a theoretical background
• Review variability of each of the variables
• Check relationship with achievement (if the goal is to study achievement)
• Conduct an exploratory or confirmatory factor analysis
• Perform necessary recoding
• Check the reliability
• Compute the index
• Check your work

Steps in Creating Indices
• Are there variables that need to be recoded or reversed?
• What's the incidence of missing data?
  – How will you deal with missing data?
  – How much is tolerable?
  – Listwise or pairwise deletion?
  – Relative contribution of each item?
• Are the components measuring something similar?
• How will the variables be grouped?

Computing Indices
• How will you compute the index?
  – Sum or average score
  – Factor score
  – IRT score
    • Deciding on model
    • Deciding on estimation method
      – Selecting parameters
• Advantages and disadvantages?
  – Computing indices over time
  – Comparability across countries
  – Relative contribution from countries to the parameter estimation
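The simplest of these options, an average score, could look like the sketch below. It uses the five recoded confidence items from the example later in this workshop (BSBM16Ar, BSBM16Dr, BSBM16Fr, BSBM16Gr, BSBM16Hr). The name SCMavg and the rule "at least 4 valid items out of 5" are assumptions chosen for illustration; the tolerable amount of missing data is a decision for the analyst.

```spss
* Average of the five recoded items, computed only when at least 4 of them are valid.
COMPUTE SCMavg = MEAN.4(BSBM16Ar, BSBM16Dr, BSBM16Fr, BSBM16Gr, BSBM16Hr).
VARIABLE LABELS SCMavg "CONFIDENCE IN MATHEMATICS - ITEM AVERAGE (ILLUSTRATIVE)".
EXECUTE.
```

Using the average rather than the sum keeps the index on the original 1 to 4 metric even for students with one missing item.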

Setting the Scale
• How many units?
• Where does it begin and where does it end?
• Where do you set the mean?
• Do you set the dispersion?
• Same or different metric across countries?
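As one possible answer to the questions about mean and dispersion, a standardized score can be moved to another metric with a linear transformation. The sketch assumes a standardized factor score named FAC1_1 (mean 0, SD 1) already exists in the file, as produced in the PCA example later in this workshop. The target metric (mean 10, SD 2) and the name SCMscale are illustrative choices.

```spss
* Move a standardized score (mean 0, SD 1) to a metric with mean 10 and SD 2.
COMPUTE SCMscale = 10 + 2 * FAC1_1.
EXECUTE.

* Check the mean and standard deviation of the rescaled variable.
DESCRIPTIVES VARIABLES=SCMscale /STATISTICS=MEAN STDDEV MIN MAX.
```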

Principal Component Analysis (PCA)
• A method for data reduction
• "Many correlated variables are replaced by a few variables that are uncorrelated"
[Figure: observed variables V1 to V7 grouped under two factors, Factor 1 and Factor 2]

Partial correlation
[Figure: x and y are each linked to a "synthetic" variable z, the latent factor, through the factor loadings rzx and rzy; once z is taken into account, the remaining correlation rxy between x and y approaches 0]

PCA as a method for generating hypotheses
• The latent factor estimates the joint variance of the variables that are in the model
• By itself, it does not provide any information about the context-related meaning of the latent factor
• But: the factor loadings (which are partial correlations between the latent factor and the manifest variables) allow us to generate hypotheses about the context-related interpretation of the latent factor

PCA as a method for analysis of the dimensions
• Factor analysis allows for testing the multidimensionality of complex sociological and psychological constructs
• Unidimensionality is one of the criteria for the quality of a construct
  – When the variables contain only one dimension, the latent variable can be used for further analysis
  – In the case of multidimensionality, further model development is needed (e.g. excluding variables, or defining more than one factor, each to be modeled separately)

Reliability
• Cronbach's α coefficient
  – Estimates the reliability (internal consistency) of the score; the ideal case is that all variables in the score correlate perfectly with each other (correlation of 1)
    • A correlation equal to 1 means that the construct is very consistent
  – Normally ranges from 0 to 1
  – A value near 1 means that the variables are close to the ideal of perfect correlation, i.e. high reliability
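For reference, the usual formulas behind the coefficient are given below. The notation is introduced here and is not from the slides: k is the number of items, σ_i² the variance of item i, σ_X² the variance of the sum score, and r̄ the average correlation between items. The second (standardized) form shows the link to the "correlation of 1" ideal: when r̄ = 1, α = 1.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)
\qquad
\alpha_{\text{standardized}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}
```

For example, with k = 5 items and an average inter-item correlation of r̄ = 0.5, the standardized coefficient is 5(0.5) / (1 + 4(0.5)) = 2.5 / 3 ≈ 0.83.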

Example
• Index of Student Confidence in Mathematics
• Watch for the direction!

Variable   Question
BSBM16A    I usually do well in mathematics
BSBM16B    Mathematics is more difficult for me than for many of my classmates
BSBM16C    Mathematics is not one of my strengths
BSBM16D    I learn things quickly in mathematics
BSBM16E    Mathematics makes me confused and nervous
BSBM16F    I am good at working out difficult mathematics problems
BSBM16G    My teacher thinks I can do well in mathematics with difficult materials
BSBM16H    My teacher tells me I am good at mathematics
BSBM16I    Mathematics is harder for me than any other subject
BSBM16J    I think learning mathematics will help me in my daily life
BSBM16K    I need mathematics to learn other school subjects
BSBM16L    I need to do well in mathematics to get into the of my choice
BSBM16M    I need to do well in mathematics to get the job I want
BSBM16N    I would like a job that involves using mathematics

List of response categories
• Response categories
  – 1 – AGREE A LOT
  – 2 – AGREE A LITTLE
  – 3 – DISAGREE A LITTLE
  – 4 – DISAGREE A LOT
  – 9 – OMITTED (missing value)
• Again, watch for the direction!

Recode variables
• Positive statements to be reversed
  – 1 -> 4
  – 2 -> 3
  – 3 -> 2
  – 4 -> 1
  – 6 -> system missing
  – 9 (user missing) -> system missing
• Negative statements remain the same, but…
  – Recode the "not applicable" and user missing values into system missing

Recode variables
RECODE BSBM16A BSBM16D BSBM16F BSBM16G BSBM16H BSBM16J BSBM16K BSBM16L BSBM16M BSBM16N
  (1=4) (2=3) (3=2) (4=1) (MISSING=SYSMIS)
  INTO BSBM16Ar BSBM16Dr BSBM16Fr BSBM16Gr BSBM16Hr BSBM16Jr BSBM16Kr BSBM16Lr BSBM16Mr BSBM16Nr.
RECODE BSBM16B BSBM16C BSBM16E BSBM16I
  (MISSING=SYSMIS) (6=SYSMIS) (ELSE=COPY)
  INTO BSBM16Br BSBM16Cr BSBM16Er BSBM16Ir.
EXECUTE.
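One way to "watch for the direction" after recoding is to look at the correlations among the recoded items: if all items now point the same way (higher value = more favourable towards mathematics), the correlations should be positive. This check is a suggestion added here, not one of the workshop steps.

```spss
* Negative correlations would suggest an item that still points in the opposite direction.
CORRELATIONS
  /VARIABLES=BSBM16Ar BSBM16Br BSBM16Cr BSBM16Dr BSBM16Er BSBM16Fr BSBM16Gr
             BSBM16Hr BSBM16Ir BSBM16Jr BSBM16Kr BSBM16Lr BSBM16Mr BSBM16Nr
  /MISSING=PAIRWISE.
```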

Producing the scale using PCA
• Principal Component Analysis (PCA) involves extracting linear composites of observed variables
• Data reduction method
• Reduces correlated observed variables to a smaller set of important independent composite variables

In the graphical user interface of SPSS…
• Load the merged file after recoding the variables
• Apply the Total Student Weight (TOTWGT)
  NOTE: If you do it for more than one country, use the Senate Weight (SENWGT)

In the graphical user interface of SPSS…
• Run PCA
  – Open the dialog box
  – Add all recoded variables into the list box
  – Extraction settings
  – Rotation settings
  – Factor scores settings
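The same steps can be written as syntax. The sketch below is one plausible version: the dialog settings used in the workshop are not reproduced in this text, so the eigenvalue-greater-than-1 criterion and the varimax rotation are assumptions, and factor scores are deliberately not saved at this exploratory stage.

```spss
* Weight by the total student weight (use SENWGT instead when pooling several countries).
WEIGHT BY TOTWGT.

* Principal components on all 14 recoded items.
* MINEIGEN(1) and VARIMAX are assumed settings for illustration.
FACTOR
  /VARIABLES BSBM16Ar BSBM16Br BSBM16Cr BSBM16Dr BSBM16Er BSBM16Fr BSBM16Gr
             BSBM16Hr BSBM16Ir BSBM16Jr BSBM16Kr BSBM16Lr BSBM16Mr BSBM16Nr
  /MISSING LISTWISE
  /PRINT INITIAL EXTRACTION ROTATION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION VARIMAX.
```

The rotated component matrix in the output shows which items load on which component, which is the basis for grouping the items.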

Output

Revision

Factor 1 – Students' confidence in mathematics
Variable   Question
BSBM16A    I usually do well in mathematics
BSBM16D    I learn things quickly in mathematics
BSBM16F    I am good at working out difficult mathematics problems
BSBM16G    My teacher thinks I can do well in mathematics with difficult materials
BSBM16H    My teacher tells me I am good at mathematics

Factor 2 – Students' opinion on mathematics utility
Variable   Question
BSBM16J    I think learning mathematics will help me in my daily life
BSBM16K    I need mathematics to learn other school subjects
BSBM16L    I need to do well in mathematics to get into the of my choice
BSBM16M    I need to do well in mathematics to get the job I want
BSBM16N    I would like a job that involves using mathematics

Factor 3 – Students' experience of difficulty in mathematics
Variable   Question
BSBM16B    Mathematics is more difficult for me than for many of my classmates
BSBM16C    Mathematics is not one of my strengths
BSBM16E    Mathematics makes me confused and nervous
BSBM16I    Mathematics is harder for me than any other subject

Rerun PCA for Factor 1

Variable   Question
BSBM16A    I usually do well in mathematics
BSBM16D    I learn things quickly in mathematics
BSBM16F    I am good at working out difficult mathematics problems
BSBM16G    My teacher thinks I can do well in mathematics with difficult materials
BSBM16H    My teacher tells me I am good at mathematics

Output 2

Factor score – our scale
• FAC1_1 – standardized score
  – Mean: 0
  – SD: 1
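In syntax, the rerun for the first factor with saved scores might look like the sketch below. Extracting exactly one component without rotation is an assumption made here. SPSS names saved scores by factor number and by the order in which score sets were saved, so the new variable is FAC1_1 only if no factor scores were saved earlier in the same file.

```spss
* One-component PCA on the five recoded confidence items, saving regression factor scores.
FACTOR
  /VARIABLES BSBM16Ar BSBM16Dr BSBM16Fr BSBM16Gr BSBM16Hr
  /MISSING LISTWISE
  /PRINT INITIAL EXTRACTION
  /CRITERIA FACTORS(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION NOROTATE
  /SAVE REG(ALL).

* The saved score should have a mean of about 0 and an SD of about 1.
DESCRIPTIVES VARIABLES=FAC1_1 /STATISTICS=MEAN STDDEV MIN MAX.
```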

Reliability

• Add all five variables for the first factor identified by the previous analysis and click on “Statistics”
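The equivalent syntax could look as follows. The requested statistics (item descriptives, scale statistics, inter-item correlations, and item-total statistics) are an assumed selection, since the options chosen in the "Statistics" dialog are not reproduced in this text; the scale label is also illustrative.

```spss
* Cronbach's alpha for the five recoded confidence items.
RELIABILITY
  /VARIABLES=BSBM16Ar BSBM16Dr BSBM16Fr BSBM16Gr BSBM16Hr
  /SCALE('CONFIDENCE IN MATHEMATICS') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE CORR
  /SUMMARY=TOTAL.
```

The item-total table requested by /SUMMARY=TOTAL includes "Cronbach's Alpha if Item Deleted", which helps to spot an item that lowers the reliability of the scale.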

Output

Any questions?

Thank you for your attention!
