Educational Measurement: Issues and Practice Winter 2013, Vol. 32, No. 4, pp. 16–27
Defining and Measuring College and Career Readiness: A Validation Framework

Wayne Camara, ACT

This article reviews the intended uses of these college- and career-readiness assessments with the goal of articulating an appropriate validity argument to support such uses. These assessments differ fundamentally from today's state assessments employed for state accountability. Current assessments are used to determine if students have mastered the knowledge and skills articulated in state standards; content standards, performance levels, and student impact often differ across states. College- and career-readiness assessments will be used to determine if students are prepared to succeed in postsecondary education. Do students have a high probability of academic success in college or career-training programs? As with admissions, placement, and selection tests, the primary interpretations that will be made from test scores concern future performance. Statistical evidence between test scores and performance in postsecondary education will become an important form of evidence. A validation argument should first define the construct (college and career readiness) and then define appropriate criterion measures. This article reviews alternative definitions and measures of college and career readiness and contrasts traditional standard-setting methods with empirically based approaches to support a validation argument.

Keywords: accountability, career readiness, college readiness, validity

Wayne J. Camara is the Senior Vice President of Research at ACT Inc. in Iowa City, IA. He can be reached at [email protected].
Two multistate consortia have been formed to develop assessment systems aligned to the Common Core State Standards (CCSS), which will be used to determine whether students are college- and career-ready (CCR), and several states are independently designing similar assessments. The U.S. Department of Education's Race to the Top grant competition awarded $362 million to the two consortia1 and required higher education's involvement in the assessment design and development, with the goal that the tests would be used to determine student readiness for entry-level, credit-bearing courses. To support such uses, performance-level descriptors (PLDs), cut scores, and benchmarks will be established for the high school assessments, which correspond to college and career readiness. Colleges and career-training programs are expected to exempt students who meet a CCR benchmark from remediation in mathematics and/or English. At lower grades, the assessments would determine if students are "on track" to be CCR by the end of high school (Smarter Balanced Assessment Consortia, 2012). Their purpose is to serve as an early warning to students who are not on track for credit-bearing college coursework in English and mathematics and to provide them with an opportunity to correct deficiencies while they are still in high school, thereby decreasing the need for remediation (King, 2011).
Recently, the U.S. Department of Education has required states receiving a waiver from some of the central provisions and sanctions of the No Child Left Behind law to agree to adopt "CCR" standards, such as the CCSS. Given the intended purposes of the standards and assessments (i.e., CCR and on track for CCR), PLDs, cut scores, and benchmarks should be anchored in both the definition of CCR and the CCSS. Evidence of such alignment is an important component of the validity argument, but it is not sufficient in and of itself (Kane, 2001). The purpose of this article is to review the intended uses of college- and career-readiness high school assessments and to articulate a validity argument for these new types of assessments. In addition, this article will highlight differences between current state accountability tests and the new assessments intended to measure college and career readiness, describe the types of evidence required to support the intended uses, and identify potential challenges the assessments may encounter. In determining whether students are prepared or ready to succeed in college or career-training programs, direct evidence of the relationship between test scores and performance in postsecondary education may provide the strongest form of evidence. CCR assessments will be used for multiple purposes, but their primary purpose is to determine if students who score above a cut score are ready or prepared to succeed in college or career-training programs. The explicit implication is that there should be a strong statistical relationship between performance on such assessments, particularly the high school assessments, and subsequent postsecondary success. These assessments will be used to determine proficiency and readiness in much the same way that placement and certification tests are currently used.
FIGURE 1. Validation framework for CCR assessments. Empirical pathway for validation argument.
A weak relationship between scores on CCR assessments and subsequent postsecondary success would be reason to question the validity of inferences associated with the assessments.2 Figure 1 illustrates the framework of a validity argument for CCR high school assessments. The top portion of the figure illustrates the conceptual relationship: the CCSS capture the prerequisite knowledge, skills, and abilities (KSAs) for entry-level, credit-bearing courses in college and postsecondary career-training programs. These KSAs informed the development of the CCSS (see arrows); the resulting assessment blueprint will derive from the CCSS and provide evidence to support inferences related to postsecondary outcomes. The bottom portion of Figure 1 illustrates where empirical evidence is critical in providing a validation argument to support the intended interpretations of test scores on CCR high school assessments. Significant empirical relationships have been found between college outcomes and college admissions tests, high school grades, and academic rigor. It is reasonable to expect that student performance on the new CCR assessments will show similar relationships to postsecondary outcomes and other predictors of college and career success (Wiley, Wyatt, & Camara, 2010). Any CCR assessment must first specify the criteria for college and career success in terms of the construct (e.g., course grades, FGPA, placement in credit-bearing courses), the performance level (e.g., FGPA of 2.0, grade of C or higher), and the probability of success (e.g., 50%, 70%). Ultimately, the assessment benchmark and cut score should be related to the criteria as well as to the CCSS. A validation strategy employing multiple methods and a variety of evidence (some of which is empirical, but which also includes judgments) is then developed to support the implied relationships among these three pillars of college and career readiness, as illustrated in Figure 1.
College Success: Construct and Criteria

Much of the interest and attention on college and career readiness has resulted from the high remediation and low completion rates in postsecondary education. A variety of criteria3 have been proposed as measures of postsecondary success:
1. Persistence and successive completion of courses resulting in a certificate or degree (e.g., persistence to the second year).
2. Graduation or completion of a degree or certification program.
3. Time to degree or completion of a certification program (e.g., 6-year graduation for a Bachelor's degree).
4. Placement into college credit courses.
5. Exemption from remediation courses.
6. Grades (and performance) in specific college courses (e.g., college algebra, freshman composition), or grades in specific subjects during the freshman year (e.g., mathematics, English composition).
7. Grade-point average (GPA) in college, which can also be described as successful performance across a range of college courses.
These criteria are highly related but not mutually exclusive.
It is quite possible for students to require remedial courses but still graduate and attain good grades, just as it is possible for students to attain high test scores, require no remediation, succeed academically, and still drop out of college. What criterion of college success should states adopt? This is not a trivial question, because the assessment design and validation argument must be tied to the definition of college success. It is also important to distinguish between definitions of CCR that are constrained to academic elements and those that extend beyond academics. Because CCR assessments will only measure mathematics and English Language Arts (ELA) as described in the CCSS, the relationship to nonacademic criteria or "softer" skills and knowledge may be less relevant at the present time. In addition, if schools are not actively engaging students in acquiring such skills, then the assessment of these factors for accountability would be problematic. There are many individual and institutional factors that may be as influential as cognitive ability tests in predicting some criteria of postsecondary success.

Persistence, Graduation, and Time to Degree

Several criteria such as persistence, graduation, and time to degree are frequently employed in studies of college success or college effectiveness, yet they are heavily influenced by many nonacademic factors that are not directly measured by cognitive ability tests and are not included in the CCSS (Bowen, Chingos, & McPherson, 2009; Camara, 2005a). Although such broad criteria are often attractive to policymakers, they are not a direct outcome of academic preparedness alone. Many factors have been shown to relate to such outcomes, which would threaten the validity of statements we may wish to make about test scores and their impact on college success.

Financial factors. Finances are highly related to persistence, graduation, and time to degree. Despite increases in enrollment rates among all racial, ethnic, and income groups, participation gaps between affluent students and those from less privileged backgrounds have persisted. Also, gaps in degree attainment are larger than gaps in enrollment because lower-income students who are able to overcome the financial barriers to enter college are less likely to complete degrees (A. Camara, 2009, unpublished data). "Fewer than 40% of the academically high scoring/low SES students who do enroll in college earn bachelor's degrees," according to Baum and McPherson (2008, p. 5).

Institutional factors. "By itself, Carnegie classification level (e.g., institutional selectivity and prestige as measured by external funding and research) has a profound effect on graduation rates" (Hamrick, Schuh, & Shelley, 2004, p. 10). Institutional demographic characteristics, geographic region, institutional financial assistance, and per-student spending are also related to graduation rates. Social adjustment (whether students integrate socially and academically into their institutions) and level of engagement are also strong predictors of persistence (Tinto, 1987). Students are expected to devote time to their education and assume responsibility for investing time and energy in learning. Student motivation, attendance, and engagement in learning are related to the outcomes of persistence and graduation (Tinto, 1987). A wide range of psychological and social factors have also been shown to impact persistence, graduation, and time to degree.
These factors include maturation, roommate conflicts, dating problems, health, nutrition, fitness, and time management (Harmston, 2004; Purcell & Clark, 2002; Robbins et al., 2004). Research has consistently shown that cognitive measures of academic performance, such as high school grades and test scores, are highly predictive of grades earned in college, but less so of retention and graduation (e.g., Robbins, Allen, Casillas, Peterson, & Le, 2006; Robbins et al., 2004; Schmitt et al., 2009). In fact, Robbins et al. (2004) found that the correlations between cognitive measures and first-year GPA were roughly two to three times larger than the correlations between cognitive measures and retention. Burton and Ramist (2001) reported that the combination of admission test scores, grades, and academic rigor offers the best prediction of graduation, but the correlations are generally about half as large as those found in predicting college grades. Grades on Advanced Placement Examinations have been shown to be a superior predictor of graduation when compared to admission tests, but they still placed second to high school grades even when the quality of the high school was considered (Bowen, Chingos, & McPherson, 2009). This research also reported that noncognitive factors such as academic discipline, commitment, and persistence are related to graduation (Camara, 2005b; Robbins et al., 2004, 2006; Schmitt et al., 2009).
Placement and Exemption from Remedial Courses

Placement into college credit courses, and the associated exemption from taking remedial courses, is a principal objective of college and career readiness. Setting a college-readiness benchmark that is associated with the KSAs required for entry-level courses such as college algebra and composition can be accomplished through content-based and judgmental approaches (e.g., surveys, standard settings), but validation through empirical approaches will present greater challenges. Many colleges use the same assessment for both remediation and placement decisions (Achieve, 2007). That is, students are generally required to take placement tests prior to matriculation at an institution, and their performance on such tests is used to determine (1) whether they are placed in remedial or credit-bearing courses and (2) a specific course placement (e.g., college algebra, precalculus/trigonometry). Today, many colleges have established an institutional score on admissions tests to exempt students from remediation or placement testing (Achieve, 2007). However, placement tests generally serve three purposes for colleges. First, they provide a narrow function of identifying deficiencies in student preparation. Second, they serve a broad function of certifying students as ready for college-level work across a variety of courses and institutions. Third, they serve a classification function, determining in which credit-bearing course (or remedial course) a student should enroll. This is especially true in mathematics4 (Achieve, 2007). Both assessment consortia propose to use the high school CCR assessments for the first two purposes, but are less clear on whether the assessments will provide precise placement recommendations across the range of skills and knowledge in a content domain. Typically, computer adaptive tests have been used for placement decisions because they are highly efficient and provide precise estimates of student ability across a wide distribution of ability for many different courses.
Table 1. Percentage of Students Taking the ACT or SAT and Completing Math Courses During Their Freshman Year in College

Math Courses During Freshman Year | ACT (ACT, 2007; Allen & Sconing, 2005; n = 80,000; 90 institutions) | SAT (Wyatt, Kobrin, Wiley, Camara, & Proestler, 2011; Shaw & Patterson, 2010; n = 164,331; 110 institutions) | Adelman (2006)
Any math course | Not reported | 72% | 55%
Calculus(a) | Not reported | 34% | 18%
College algebra(a) | 36% | 18% (22%9) | 22%
Statistics(a) | Not reported | 10% (13%10) | 5%
Precalculus(a) | Not reported | 9% | 19%

(a) Percentages based on total students completing any math course.
College Grades

Selecting a specific course (or courses) as the criterion in mathematics may present a challenge because there is significantly more variability in freshman course-taking behavior. States and consortia should weigh the benefits and limitations of at least three potential postsecondary outcomes in an academic domain such as mathematics before selecting a criterion measure: (1) grades in a specific college mathematics course such as college algebra; (2) average grades in college mathematics courses (e.g., the first credit-bearing mathematics course taken in college); and (3) freshman grades averaged across college courses in science, technology, engineering, and mathematics (STEM), or grades averaged across all academic courses in the freshman year (FGPA). All three outcomes employ a predictive model, and each answers a slightly different question. ACT (Allen & Sconing, 2005) established benchmarks based on grades in specific courses (e.g., biology, college algebra). However, such an approach is likely to exclude a large number of college freshmen and to exclude students in specific types of programs based on their course requirements. For example, 28% of college freshmen taking the SAT did not complete a math course, and this number is likely to be higher at 2-year colleges and career-training programs (Shaw & Patterson, 2010). Even among students taking math, there is significant variability in which math courses are completed. Table 1 illustrates that 36% of students in ACT's college readiness benchmark study took college algebra, but 64% of students either took another math course or did not enroll in math. Adelman (2006) used longitudinal samples and reported on math courses completed by freshmen. Table 1 also shows that basing a CCR benchmark on specific courses could exclude nearly three quarters of college-going students.
Some universities have no remedial courses, but a significant percentage of students will still receive a grade of C or below in their freshman math course. These students met the existing requirement for placement into a credit-bearing course, but may not have been college ready. Basing a college readiness benchmark on a specific course such as college algebra or precalculus could make such a benchmark irrelevant to those students who have completed a more rigorous high school curriculum. Maruyama (2012) notes that there is a logical inconsistency in developing college readiness benchmarks from a test score when research consistently demonstrates that multiple factors are the best predictor of college success. In 2010, Wiley, Wyatt, and Camara used a conjunctive model composed of SAT scores, high school GPA, and an index of academic rigor to establish college readiness benchmarks. The academic rigor index weighted the high school courses completed by students based on predictive modeling for various courses (e.g., number of courses completed, highest level course completed, additional weight for honors, dual enrollment, and advanced placement; Wyatt, Wiley, Camara, & Proestler, 2011). Benchmarks composed of multiple predictors will improve prediction, and inconsistencies in conclusions about readiness will decrease (Maruyama, 2012). In addition, students are perceived to have greater control over their level of effort and subsequent course grades, and the academic rigor of course-taking behavior can be more easily improved by schools and districts than standardized test scores.

ACT and the College Board have set CCR benchmarks using college grades as opposed to placement decisions or measures that also include student persistence. The advantages of using a purely academic criterion such as grades include:
• A strong statistical relationship between the predictor (test scores) and criterion (grades; r = .50 to .62 adjusted for restriction of range).
• A criterion that minimizes construct-irrelevant variance (e.g., higher educational decision consistency, many of the factors noted earlier).
• A criterion for which data are more easily available.
• An outcome that appears logical and rational.
There is an overwhelming body of research which uses college grades and GPA as the primary criteria of college success.6 This article will not review this literature, but whether one uses course grades or combined grades (GPA), the relationship with cognitive ability tests is quite strong. Using both high school grades and admissions tests results in the highest validity coefficients, and the incremental validity contributed by the second predictor, whether grades or test scores, is approximately .08 (Patterson & Mattern, 2012).
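To make the conjunctive logic concrete, the following minimal sketch (in Python) flags a student as college ready only when every benchmark is met. The benchmark values and student records are entirely hypothetical illustrations, not the values reported by Wiley, Wyatt, and Camara (2010).

# Minimal sketch of a conjunctive readiness rule; all values are hypothetical.
BENCHMARKS = {"sat_total": 1550, "hs_gpa": 3.0, "rigor_index": 10}

def meets_all_benchmarks(student):
    """Conjunctive rule: the student must meet every benchmark to be flagged ready."""
    return all(student[key] >= cut for key, cut in BENCHMARKS.items())

students = [
    {"sat_total": 1620, "hs_gpa": 3.4, "rigor_index": 12},  # meets all three benchmarks
    {"sat_total": 1700, "hs_gpa": 2.8, "rigor_index": 14},  # misses the HSGPA benchmark
]
percent_ready = 100 * sum(meets_all_benchmarks(s) for s in students) / len(students)
print(f"{percent_ready:.0f}% of this (tiny) sample meets all three benchmarks")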
Colleges typically establish their own cut scores for placement in different courses, and faculty committees are often involved in making such determinations. A common cut score from the CCR assessments may serve as a “minimum” standard which exempts students from remedial courses, but a higher cut score may still be appropriate for many institutions when making actual placement decisions (e.g., college algebra, precalculus). However, if the CCR assessments are designed with adequate precision along a broad range of ability levels, institutions could set local cut scores for placement purposes and negate the need for an additional placement test. National placement tests are administered on computer adaptive platforms to meet this demand. Colleges and universities currently have different entry-level courses, remedial courses, and requirements for entry into the same and different courses5 (NAGB, 2009; Shaw & Patterson, 2010).
FIGURE 2. Chronology of CCR assessments and major milestones in college and career success.
The many facets of college success, and potential criteria for these, have been discussed. The purpose of this article is not to develop a comprehensive model of college success but to provide guidance on a validation framework for CCR assessments. The focus of such assessments is on academic performance and, more specifically, on the mathematics and ELA domains as defined in the CCSS. Therefore, the choice of appropriate criteria for college success should be directly related to the interpretative arguments which will be made from scores on CCR assessments. Figure 2 illustrates three primary uses of the CCR high school assessments, which argue for criteria that are relevant to placement, including remediation, and to academic success in college courses. The intended uses and validity argument for CCR assessments are more similar to those of traditional college placement tests than to those of current state assessments, and the relevant criteria for college success should initially focus on the same criteria: academic performance such as course grades. Grades in college courses will be affected by the accuracy of course placement, which includes two decisions: (a) remediation or college credit placement, and (b) appropriate placement within each of these two streams (remedial or credit-bearing courses). Therefore, the accuracy of placement decisions, as well as students' ultimate academic success in courses which rely on the mathematics and ELA skills in the CCSS, should be the primary focus of validation arguments for CCR assessments in high school. Certainly other types of evidence, such as consequential evidence, will be required to support these assessments. However, this article argues that the CCR assessments require a significant paradigm shift from current state accountability tests and a greater focus on predictive evidence of future performance. Appropriate criteria for career readiness are discussed separately.

Benchmarks for Which Colleges?

When describing CCR benchmarks to higher education officials, the first question is typically about the types of colleges used in the study. Clearly, colleges differ greatly in their selectivity7 and in the admissions test scores, high school grades, and backgrounds of their admitted classes.
Can the same benchmark serve all colleges? The fact that different institutions may require different levels of skill for entrance and success in entry-level classes does not, by itself, threaten the validity of CCR assessments. There are logical arguments favoring different cut scores (or benchmarks) by institutional type (e.g., 2-year college, 4-year college, and postsecondary vocational training program), selectivity, or academic major. Yet wide variability would still be expected across institutions within each selectivity range, and separate benchmarks would be inconsistent with the broader policy goal of increasing transparency between K-12 and higher education in terms of requirements and readiness. CCR assessments will be explicitly designed to identify students who are prepared to succeed in postsecondary education. Developing a consensus on a CCR benchmark that would be acceptable across states and hundreds of different 2- and 4-year institutions may present the greatest hurdle to the state consortia. As noted earlier, many colleges and universities have conducted validation studies to set their own cut scores for both placement and remediation decisions. One option is to establish a common benchmark across institutions that will be used to determine whether students require remediation or may enter credit-bearing courses. Institutions could then set separate cut scores for different placement decisions within a remediation sequence or across different credit-bearing courses, as they currently operate. However, even under this scenario, many institutions may be reluctant to sacrifice the predictive efficiency afforded by a local validation study for a uniform benchmark. Several states (e.g., Florida, Texas) have established a minimum cut score for remediation on placement tests, but individual institutions generally employ a higher cut score, which is permitted as long as it exceeds the state mandate. If the CCR benchmark serves as a minimum standard but the majority of institutions require a higher benchmark, it could undermine much of the utility and effectiveness envisioned for the new assessments. A common CCR benchmark could produce significantly different remediation outcomes across different types of institutions, but should produce similar impact among institutions where student selectivity and characteristics are comparable.
For example, if we used a common cut score on an admissions test (e.g., SAT Math 440 or ACT Math 18), we would find smaller differences in the percentage of students below that score among comparable schools (e.g., state flagships, community college systems, the typical 4-year public institution). A cut score set too low would offer limited utility to many moderately selective institutions which still find a need for some remedial courses, whereas a cut score set too high would likely place a majority of entry-level freshmen into remedial courses. A common cut score may be unlikely to meet the needs of all institutions. Different cut scores have been suggested for colleges based on their selectivity, and different cut scores may be most useful within institutions. Specifically, more rigorous benchmarks or cut scores may be justified in mathematics for students who major in STEM fields. Generally, students admitted to STEM majors are expected to have completed more rigorous courses and to demonstrate higher achievement than non-STEM majors within many of the same institutions. There are several challenges to higher education's adoption of a single definition of CCR and a single cut score:
• Will institutions use a common cut score for placement out of remediation?
• Will the score be too low for some institutions and too high for other institutions?
• Will a common cut score result in a dramatic rise in remediation rates at some institutions, and will they be able to accommodate the demand?
• Will institutions waive placement tests for students attaining the CCR benchmark, or will students be required to take a second institutional placement test?
• Is it possible to get state systems to accept one standard when the test is introduced, or is a phased approach more viable?
• What would be the impact of a CCR benchmark if some institutions demonstrate that a significant percentage of students reaching that benchmark do not succeed in their entry-level courses?
Empirical studies will need to address these issues to verify that a common benchmark represents adequate preparation for success across different types of institutions and majors. If such results are not found, then establishing separate benchmarks may eventually need to be revisited. Placement into college credit courses without remediation is dichotomous, and studies of classification accuracy are more appropriate than predictive validity studies, which will also be required to examine the validation argument.

College Ready and Career Ready: The Same or Different?

Can a single cut score or benchmark serve both college and career readiness? The types of empirical benchmark studies described above are comparatively straightforward when college readiness is the criterion because we are matching performance on the predictor (test) with outcome data (college grades) and establishing some probability level (e.g., 67%, 50%) of a specific outcome (e.g., grade in biology, FGPA). This does not work quite as well for career readiness, for several reasons:
• Career readiness has not been defined as a measurable construct. Possessing the academic skills and knowledge required to be placed and succeed in a postsecondary vocational or career-training program has been suggested as an appropriate construct, but the actual criterion has not been defined.
• The metric for the criterion has not been defined, and research has not explored the availability of data. Career-training programs are not likely to make either remediation or placement decisions, and there is little research available on selection. Grades in career-training courses may seem like an appropriate criterion measure, but no multi-institutional validity studies employing cognitive tests have been cited. Such programs may not maintain data on grades, and acquiring such data where they exist may be difficult and unsystematic.
• A single cut score or benchmark may not serve the diverse types of postsecondary career-training institutions and programs which exist. Loomis (2012) reported differences in the knowledge, skills, and abilities required for entry into five career clusters.
The lack of criterion data from postsecondary training institutions complicates efforts to define and measure career readiness. Such a limitation has not deterred either consortium or many policymakers from pronouncing that college and career readiness are the same thing.
Studies by the American Diploma Project (ADP, 2004) and ACT (2006) found that employers and colleges often agreed on the level of knowledge and skills required for entry. The ADP study used data from the National Educational Longitudinal Study (NELS) to determine that highly paid workers required 4 years of high school English and the skills taught in Algebra II. Next, they worked with content experts to extract the knowledge and skills taught in such courses. Finally, front-line managers from diverse industries reviewed the preliminary workplace expectations and, during interviews, confirmed the importance of the content and skills emphasized in these courses. ACT (2006) also concluded that it was reasonable to expect the same level of academic preparedness for college and for workforce training programs which provide a sufficient wage and opportunity for advancement. A concordance between students who took both WorkKeys and the ACT was created, and College Readiness Benchmarks for Math and Reading were linked to ratings of the average KSAs required in job profiles. The National Assessment Governing Board (NAGB) implemented judgmental standard-setting studies to set a score on NAEP which would represent minimal academic preparedness for college coursework and for entry into job training programs. These studies focused on colleges as well as job training programs for five high-trajectory jobs requiring at least 3 months of training but not a bachelor's degree (Loomis, 2011, 2012). Many items on the NAEP 12th-grade reading and mathematics tests were judged to be irrelevant for job training programs in these five occupational areas. For example, between 56% and 100% of NAEP items in the four major math domains were deemed irrelevant by panelists representing three of the occupational clusters. Literary content was deemed irrelevant, and reading panelists noted the need for more relevant reading tasks such as instructional manuals. In mathematics, expectations differed between college and occupational panelists. The latter group found NAEP much more oriented toward academic math than applied math, although the college panelists were less willing to choose objectives in the framework which were more oriented to applications (Kilpatrick, 2012). Overall, panelists viewed career readiness as less relevant to the NAEP frameworks, yet they still completed the standard-setting process. A number of significant differences in the required skills were found across the career clusters, and a standard could not be set.
Loomis (2012) concludes: "The clear indication of the findings of these studies is that there are important and significant differences in the academic requirements associated with preparedness for college coursework and job training programs. This is truly an area for which much more research is needed" (pp. 22–23). For policy purposes, it would be convenient to conclude that the competencies and performance levels required of the typical student entering all types of postsecondary institutions are identical. However, there is simply not sufficient evidence to make such claims or assumptions at this time. It may be difficult to conduct such research, but it appears there are substantial differences in the ability levels of students entering career-training programs and 2- or 4-year colleges. Both consortia recognize that there is considerably less empirical data on career readiness and that initial benchmarks will apparently be based on college readiness (SBAC, 2012). Yet the consortia have concluded that equivalent determinations can eventually be established for "college-ready" and "career-ready," as the academic mastery established through the CCSS is one and the same. CCR determinations will have to focus squarely on the academic preparation needed for careers as articulated in the CCSS, not on other key elements of career readiness (Foughty, Monsaas, & Nellhause, 2012). It is also important to anticipate the consequences which could result if different and less rigorous benchmarks were established for college and career readiness. Different benchmarks could result in unintended consequences such as tracking students into less rigorous academic programs, which could prevent them from later enrolling in college without remediation.
Readiness Versus Preparedness

College and career success is the ultimate goal of the CCSS and current educational reforms. Readiness or preparedness should improve students' odds of success, and readiness is generally considered a synonym of preparedness in most disciplines. College and career readiness should result in greater levels of success in both environments. Again, it is important to distinguish definitions of CCR which focus solely on academic and cognitive factors from those which involve personality, predispositions, and other noncognitive factors. Conley (2011) defines four key dimensions of college readiness. These dimensions are semi-independent, meaning a student may possess one or more dimensions, and to some extent they are all related to a successful transition to college. Most of what Conley describes is related to cognition, but some factors include elements which are not primarily academic or cognitive. Key cognitive strategies include higher-order thinking skills (e.g., problem formulation, interpretation, and analysis), and key content knowledge most closely corresponds to the CCSS and other disciplinary documents which specify the concepts and knowledge in the disciplines. The two remaining dimensions are not generally specified in state or disciplinary standards, but are related to college success. Key learning skills include time management, persistence, metacognition, goal setting, and self-awareness. College environments generally require much more self-management from students than high school courses do.
Students are increasingly responsible for accessing readings and submitting assignments through web interfaces, engaging in collaborative learning with other students, and planning ahead to complete more complex assignments. Key transition knowledge and skills include knowledge and awareness of the admissions and financial aid processes, as well as how to interact with faculty and navigate college systems. Some of this information may be considered tacit knowledge and is often not obvious to students entering a new environment and culture. Transition skills do include behaviors and skills which can be learned but are less centered in cognition. For example, successful students must communicate and interact with a range of different types of individuals, as well as adjust to different systems and organizations.

The NAGB established a technical panel to determine how the results from the 12th-grade National Assessment of Educational Progress (NAEP) could be used as a tool to report CCR. NAGB and the technical panel defined preparedness as a subset of readiness (NAGB, 2009). College preparedness is defined as the academic knowledge and skills required to qualify for placement into entry-level college credit coursework without remediation. Preparation for workplace training refers to the academic knowledge required to qualify for job training. This definition of preparedness does not mean that the prepared student currently has the skills required to succeed in those entry-level college courses or to be hired for a job; rather, the panel is saying that a student has the prerequisite knowledge and skills needed to be placed in a credit-bearing course or training program (Loomis, 2011; NAGB, 2009). This definition would base CCR on the skills students possess on Day 1 of college and not on success during their first semester. Conceptually this definition may be attractive, but there are no empirical data available that attest to a student's preparation to succeed on the first day of college other than data from other tests (e.g., admissions and placement tests). This is a particularly acute issue if we find a gap between a CCR benchmark set on the basis of Day 1 readiness and ultimate success in college courses.
Metrics of College and Career Readiness

Until recently, state standards and assessments have focused on "challenging content standards" and performance standards which determine how well children are mastering the material in the state academic content standards (No Child Left Behind Act, 2002). Current state assessments have been developed to measure a state's content standards, and content-based standard-setting approaches such as the Bookmark method (Lewis, Mitzell, Mercado, & Schultz, 2012) and modified Angoff (1971), as well as norm-referenced approaches, have been appropriately applied to determine cut scores and performance levels. There are many advantages to such approaches, including the extensive process documentation, which provides procedural evidence for the standard setting and coherence among performance levels, performance standards, test items, and student performance (Haertel, Beimers, & Miles, 2012). Most judgmental approaches rely on content and employ a broad range of panelists who are highly familiar with the standards, students, and performance levels, which lends legitimacy and consistency to the process (Haertel & Lorie, 2004).
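As a point of contrast with the empirical approaches discussed below, the arithmetic behind a modified Angoff judgment is simple: each panelist estimates, for each item, the probability that a borderline (minimally qualified) examinee answers correctly, and the cut score is the sum of the item means. The sketch below, in Python with invented ratings, is illustrative only.

import numpy as np

# Rows are panelists, columns are items; each entry is the judged probability that a
# borderline examinee answers the item correctly (invented ratings for illustration).
ratings = np.array([
    [0.60, 0.40, 0.80, 0.70],
    [0.50, 0.50, 0.70, 0.80],
    [0.70, 0.30, 0.90, 0.60],
])

item_means = ratings.mean(axis=0)   # average judgment per item
cut_score = item_means.sum()        # expected raw score of the borderline examinee
print(f"Modified Angoff cut score: {cut_score:.1f} of {ratings.shape[1]} raw-score points")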
However, two weaknesses of these judgmental standard-setting approaches are the lack of reference to external outcomes or criteria and the significant variation in performance levels and impact that results across states (Haertel et al., 2012). Empirical evidence against a criterion measure has not been required for a validation argument in the current generation of state accountability tests. The proposed CCR assessments, designed to gauge the preparedness of students to succeed in college and career-training programs, are fundamentally different and call for much greater emphasis on empirical associations with relevant criteria. CCR assessments are intended to be predictive of college and career success, which can be measured. CCR assessments will employ a common definition and common performance standards as opposed to state- or college-specific definitions (i.e., cut scores on tests). The major difference between previous state accountability assessments and the proposed CCR assessments lies in the interpretative argument and the definition of the construct. This article does not argue that judgmental approaches cannot be employed, just that such approaches alone would be insufficient given the intended predictive uses of CCR assessments. Phillips (2012) states that "it is uncritically accepted that the performance standards must be based on the content standards and the PLDs written by the content experts, and that they should not be contaminated by empirical data" (p. 323). Panelists in each state likely believe they have set rigorous standards and cut scores based on their experiences in the classroom, but without any external referent the validation argument rests solely on one line of evidence (i.e., content). Linn (2003) has shown how separate judgmental standard-setting processes based on state-specific standards have resulted in significant discrepancies between state rankings of student proficiency and national indicators. Given the intended purposes of CCR assessments, if performance levels and benchmarks are inconsistent with empirical data on performance in college and career-training programs, they will not only lack credibility but will also raise concerns about the validity of the interpretative argument. Determining CCR performance levels and benchmarks from empirical data alone for the high school CCR assessments might be defensible if those data were representative across different types of postsecondary institutions and students and if similar benchmarks were equally effective across students and institutions. However, such data will not be available when the assessments are developed and initially used. In fact, acquiring appropriate criterion data that are representative across colleges and career-training programs will continue to present significant challenges to the consortia (Camara & Quenemoen, 2012). The challenge for CCR assessments is to develop an approach to standard setting which is consistent with and sensitive to empirical evidence related to CCR, to collect such evidence across a large and representative sample of institutions, and to incorporate multiple approaches that also consider alignment to the CCSS. Phillips (2012) describes the benchmark method, which relies on statistical links between assessments and ensures that policy decisions are explicit at the start of the process. The briefing book method begins by identifying and evaluating relevant data and translating the data in terms of impact (or the percent meeting and exceeding each threshold).
For example, in setting a performance standard at the proficient level for CCR assessments, one might report the percent of students proficient or above on 12th-grade NAEP, the percent of students meeting or exceeding college readiness benchmarks on admissions tests, the percent of students requiring remediation in college, the proportion of students proficient on previous state assessments, and the percent of students exceeding certain raw scores (e.g., 50% correct; Haertel et al., 2012). Several states, as well as the assessment consortia, are exploring ways to incorporate external data into standard-setting processes as their assessments focus on CCR. The external data can provide additional transparency and external validity evidence for state assessments and establish direct links to postsecondary outcomes. Texas has recently adopted college- and career-readiness benchmarks for its end-of-course assessments, and other states have similarly been looking for mechanisms to establish cut scores which correspond to CCR (McClarty & Davis, 2012). There are several additional methods for incorporating empirical data in a standard-setting or benchmark process (Camara, 2012). For example, a separate policy panel could establish ranges or neighborhoods for performance levels based on external data prior to a formal standard-setting process, as was done recently with new CCR assessments in Texas (Keng, Murphy, & Gaertner, 2012). A second way to incorporate external data is to provide external impact data to panelists during the standard-setting process, much as traditional impact data for various cut scores are shared in later rounds. Some states have elected to conduct two separate standard-setting processes and have policymakers make final recommendations based on all available data. Statistical data could certainly be used as a second method, and policymakers could compare the impact and results from judgmental processes with predictive or statistical models in making final recommendations. Williams, Keng, and O'Malley (2012) emphasize the importance of addressing how evidence is introduced and presented to panelists and note the difficulties of presenting a large amount of complex external data to panelists. Of course, there are several disadvantages to employing external data in such processes. Typically, a full year of operational data on a test is needed to employ many of these methods, whereas content-based methods can be conducted before the test results are operational (Haertel et al., 2012). Some would argue that methods which constrain panelists to neighborhoods and establish values or policy guidance prior to a judgmental process can devalue judgments and alienate panelists. Finally, much of the external data that would be introduced may be of limited value and relevance. Typically, the external data come from assessments which differ significantly from the CCR assessment in their rigor, test-taking population, motivational conditions, and alignment (Keng et al., 2012; Williams et al., 2012). For example, the percentage of students considered proficient or above on NAEP could be an important external indicator, yet NAEP differs in several important ways from a typical state assessment. Similarly, knowing the percent of students meeting ACT or College Board benchmarks for college readiness is a relevant metric for CCR tests. However, the participation rates on the ACT and SAT will affect the impact data, as will differences in the motivation for taking the tests, test content, and the methods used to establish the benchmarks.
The consortia's assessment frameworks are also likely to depart from the frameworks for admissions tests and NAEP.
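A briefing book of the kind described above is, computationally, just a set of impact statistics placed side by side. The sketch below (Python, with simulated scale scores and illustrative external percentages that are not actual results) shows how the percent of students at or above candidate cut scores might be tabulated next to external referents for panelists to compare.

import numpy as np

rng = np.random.default_rng(2013)
scale_scores = rng.normal(150, 10, 5000)   # simulated scores on the new assessment

# Illustrative external referents (percent of students meeting each); not actual data.
external_referents = {
    "Proficient or above on 12th-grade NAEP": 26,
    "Met ACT/SAT college readiness benchmark": 43,
    "Did not require remediation in college": 60,
    "Proficient on previous state assessment": 75,
}

def percent_at_or_above(scores, cut):
    """Impact of a candidate cut score: percent of examinees at or above it."""
    return 100 * np.mean(np.asarray(scores) >= cut)

for cut in (145, 150, 155):
    print(f"Cut score {cut}: {percent_at_or_above(scale_scores, cut):.1f}% at or above")
for label, pct in external_referents.items():
    print(f"{label}: {pct}%")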
Prediction Models

ACT and the College Board employed empirical models to establish benchmarks. Each focused on a single academic predictor (test score) and criterion (college grades). Some states have developed their own links to college outcomes, either directly or through benchmarks set on admissions tests. For example, Texas used a variety of methods to determine the college readiness of students (Miller, Twing, & Meyers, 2008, unpublished data). They found strong correlations between scores on state and admissions tests, and they were able to develop predictive relationships and benchmarks for college success through admissions tests. Several states are currently undertaking similar studies and intend to establish direct links between state tests and college success criteria as well as indirect links through benchmarks on admissions tests. ACT and the College Board used similar methodologies based on linear and logistic regression, but selected different probabilities and criteria. ACT modeled grades in specific college courses as a function of the corresponding subtest scores8 (ACT, 2007; Allen & Sconing, 2005). The College Board used logistic regression to set the SAT College Readiness Benchmark based on freshman GPA (Wyatt, Kobrin, Wiley, Camara, & Proestler, 2011). An alternative method is to create expectancy tables which illustrate a range of probabilities associated with specific scores, as recommended by Maruyama (2012). Wyatt, Remigio, and Camara (2013) produced such expectancy tables for different probabilities associated with different grades (B, B−, and C). Rather than base the benchmark on a specific course or the entire FGPA, they calculated benchmarks that link SAT section scores to a range of relevant college courses. For example, SAT Math was associated with all STEM courses, whereas SAT Critical Reading and Writing scores were benchmarked to performance in a broader set of freshman courses which require extensive reading and writing skills, respectively. A third methodology, using multiple predictors, was proposed by Wiley et al. (2010). Arguing that multiple predictors produce the highest validity coefficients in predicting college grades, Wiley et al. (2010) developed a CCR prediction model that included (a) SAT scores, (b) HSGPA, and (c) a quantitative measure of the academic rigor of high school courses. They employed logistic regression to derive a cut score on each metric that corresponded to a 65% probability of a B FGPA. Thirty-two percent of test takers met all three benchmarks, in comparison to 43% who met the benchmark for the SAT alone, whereas 23% met none of the benchmarks. Greene and Winters (2005) also employed multiple measures in defining college readiness. They required students to have a high school diploma, to read at the basic level or above on NAEP, and to complete the minimum course requirements at less selective colleges; only 34% of all high school graduates in 2002 met all criteria. Berkner and Chavez (1997) used four subject matter experts to set cut scores on admissions tests which would correspond to five categories of readiness, from not qualified to very highly qualified. Students were moved up a category if they completed the college core curriculum and down one category if they did not. High school grades, class rank, and test scores from the 1992 NELS were also used to compute a college qualification index. About 65% of 1992 high school graduates were minimally qualified.
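The logistic regression logic described above can be illustrated with a short sketch. The simulated scores, the success criterion, and the 65% target below are placeholders rather than the operational ACT or College Board models; the point is only that, once a logistic model of a dichotomous outcome on a test score is fit, the benchmark is the score at which the fitted probability equals the chosen target, and an expectancy table simply reports the fitted probabilities at selected score points.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
scores = rng.normal(500, 100, 5000)                 # simulated test scores
p_true = 1 / (1 + np.exp(-(scores - 480) / 60))     # underlying success probabilities
success = rng.binomial(1, p_true)                   # 1 = met the criterion (e.g., B or higher)

# Essentially unregularized logistic regression of the outcome on the score.
model = LogisticRegression(C=1e6).fit(scores.reshape(-1, 1), success)
b0, b1 = model.intercept_[0], model.coef_[0, 0]

def benchmark(p_target):
    """Score at which the fitted probability of success equals p_target."""
    return (np.log(p_target / (1 - p_target)) - b0) / b1

print(f"Benchmark for a 65% chance of success: {benchmark(0.65):.0f}")

# Expectancy table: fitted probability of success at selected score points.
for s in range(300, 801, 100):
    p = model.predict_proba(np.array([[s]]))[0, 1]
    print(f"score {s}: P(success) = {p:.2f}")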
Finally, several districts have attempted to define college readiness. For example, Montgomery County in Maryland developed an index of college readiness that included seven key indicators ranging from advanced levels on K-8 reading tests, completion of Grade 6 math in 5th grade, and success in algebra courses and an AP examination, as well as a high minimum score on admissions tests (Wiley et al., 2010).
Most recently, NAGB undertook a program of research which forms the validation framework for college and career preparedness. Four broad types of studies have been designed (NAGB, 2009). First, content alignments were conducted between NAEP and the ACT, ACCUPLACER, SAT, and WorkKeys in reading and mathematics. Overall, there was considerable overlap between the ACT, SAT, and NAEP. There were similarities, but important differences, between items on NAEP and the ACT. There were similar levels of depth of knowledge across NAEP and the SAT, but stronger alignment with SAT Math than with SAT Critical Reading. Alignment results were somewhat weaker with ACCUPLACER and substantially weaker with WorkKeys (NAGB, 2011). Second, statistical linkages between NAEP and the SAT were created, with correlations of .91 and .74 for math and critical reading, respectively. Results indicated that the SAT readiness benchmark of 500 for critical reading and math is very close to the NAEP Proficient cut scores. NAGB (2011) noted that the "highest priority is generally placed on empirical studies" (p. 7). Third, judgmental standard setting was conducted for five career clusters and for college readiness; this process was described earlier. Fourth, a national survey of 2- and 4-year higher education institutions was conducted to collect information about the assessments and cut scores used for course placement (remedial and credit-bearing courses). Preliminary cut scores have been proposed for describing the percent of students who are likely to need remediation in math or reading based on the surveys, linked to SAT cut scores of 470 and 450 in math and reading, respectively.
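Statistical linkages such as those described above (e.g., between NAEP and SAT scores) are often built with equipercentile methods: a score on one test is mapped to the score on the other test that has the same percentile rank in a common or matched sample. The sketch below is a bare-bones, unsmoothed version on simulated data; operational concordance and linking studies use matched samples, presmoothing, and standard errors.

import numpy as np

def equipercentile_link(x_sample, y_sample, x_points):
    """Map scores on test X to test Y by matching percentile ranks (no smoothing)."""
    x_sample = np.sort(np.asarray(x_sample))
    y_sample = np.asarray(y_sample)
    ranks = 100 * np.searchsorted(x_sample, np.asarray(x_points), side="right") / len(x_sample)
    return np.percentile(y_sample, ranks)

rng = np.random.default_rng(1)
ability = rng.normal(0, 1, 10000)
naep_like = 150 + 35 * ability + rng.normal(0, 10, 10000)   # simulated "NAEP-like" scores
sat_like = 500 + 100 * ability + rng.normal(0, 40, 10000)   # simulated "SAT-like" scores

for x, y in zip((120, 150, 180), equipercentile_link(naep_like, sat_like, (120, 150, 180))):
    print(f"NAEP-like {x} maps to SAT-like {y:.0f}")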
Validation and Concluding Thoughts

As noted at the outset of the article, the primary purpose of the new college- and career-readiness assessments is to determine if students completing high school are CCR and if students in earlier grades are on track to reach that milestone. The term CCR needs to be operationally defined, and then specific criteria and metrics are required to measure whether students have in fact attained this level of success. If the definition of CCR is vague or ambiguous, the assessment results and interpretations are more likely to be confused and misused. The benchmarks, cut scores, and PLDs should be informed by empirical data which correspond to the expected outcomes in postsecondary education (e.g., grades, placement in college credit courses), the desired performance outcome (e.g., grades of C or better), and the desired probability or likelihood of success (e.g., 50%, 70%). For example, an advanced performance level for the high school English assessment could state, "Students at this level have a strong probability (or 70% probability) of obtaining a grade of C or higher in first-year English composition courses." Defining and measuring readiness for success in career-training programs is more nebulous and elusive. Despite the efficiency offered by a single standard, at this point there is insufficient evidence to conclude that the same standard or benchmark will serve college and career readiness equally well.
Many advocates of a single standard will argue that even among colleges a single standard is ineffective, but here we have data that can be brought to bear on the issue. We can conduct research to determine the classification accuracy a cut score has across types of colleges (e.g., selectivity, 2-year vs. 4-year), courses, and even programs (e.g., STEM vs. non-STEM). Such data are largely absent for any large and representative sample of career-training programs and entry-level jobs. That is, we can estimate error over time and across institutions with the type of college outcome data that are generally used in higher education validity studies, but such data are not available to inform decisions about career readiness. A series of important studies sponsored by NAGB challenges the assumption that the same academic skills are required at similar levels across career clusters and between career and college readiness (Loomis, 2011, 2012; NAGB, 2009). Content validation approaches, including surveys, reviews of job requirements, and perhaps some local concurrent and predictive validation studies, will be required to provide a base of evidence related to how a CCR benchmark relates to career readiness and success. Relevant validation evidence emerged with the actual development of the CCSS (CCSS, 2010), which states, "the development of these Standards began with research-based learning progressions detailing what is known today about how students' mathematical knowledge, skill and understanding develop over time" (p. 4). Additional sources of construct validation evidence for the standards themselves have come from surveys of higher education faculty (Conley, Drummond, Gonzalez, Rooseboom, & Stout, 2011a) and comparisons of the CCSS with other rigorous standards of college readiness (Conley et al., 2011b). Content validation evidence for the assessments can evolve once the assessment frameworks have been completed and comparisons to the CCSS and the specified learning progressions are completed. The general validation approach and assumptions of the NAGB research agenda provide a blueprint for the types of research and questions states need to consider before establishing a CCR benchmark or cut score. Empirical studies conducted over time and across different postsecondary institutions will provide the most compelling evidence for CCR benchmarks and standards. However, there are a variety of studies that should be planned which can provide different forms of evidence to address the complex and broad interpretative arguments which have been proposed for the CCR high school assessments. The recommended studies refer to the lines of evidence in Figure 1 and are listed in the order in which they may be conducted, based on the timeline for the development of the consortia's CCR assessments:
• Conducting concurrent validation studies with high school students taking the new CCR assessments and linking their performance to existing measures that have benchmarks or evidence of predictive validity for college and career-training program performance (e.g., ACT, Advanced Placement examinations, high school grades, placement test scores, PLAN, PSAT/NMSQT, and SAT), using different probabilities and performance levels (line 1).
• Conducting concurrent validation studies with college freshmen by administering the CCR assessments to entering students and using college grades as the outcome measure, after adjusting for the effects of motivation (line 3).
• Linking scores on the high school CCR assessments to cut scores on rigorous state assessments, NAEP, ACT and SAT benchmarks, and other external measures to examine impact (line 1).
• Policy-capturing approaches in which panels review a variety of interim data (including linking studies, impact studies, and standard-setting results) to derive ranges or "neighborhoods" of scores that would correspond to the performance levels (lines 1 and 2).
• Traditional judgmental standard-setting studies to determine the appropriate cut score given the items and tasks contained on the assessments (line A).
• Studies examining the decision accuracy and classification accuracy of benchmarks and cut scores for individual students, schools, and states (lines 2 and 3); a minimal computational illustration follows this list.
• Studies addressing inferences about the knowledge and skills in the CCSS, since the requirement for validation evidence is not restricted to empirical relationships. Is there evidence that the knowledge and skills addressed in the CCSS are required for success in postsecondary education (line B)? Do students lacking some of these skills succeed, and do students possessing these KSAs fail (line C)?
• Longitudinal predictive studies that follow students who completed the high school CCR assessments as they enter postsecondary education, using a variety of outcomes as potential criteria (e.g., enrollment, grades, persistence) (line 3).
• Differential prediction studies examining whether the predictor and criteria work equally well across different types of institutions (public vs. private, selectivity), academic programs (e.g., majors, careers), and groups of students (demographic groups, in-state vs. out-of-state) (line 3).
• Finally, consequential validation evidence is required to support the many intended uses and proposed interpretations of the CCR assessments, which range from diagnostic purposes to prediction of future success in postsecondary education.
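As a companion to the decision- and classification-accuracy studies listed above, the following sketch uses wholly hypothetical scores, outcomes, and a hypothetical cut score (none drawn from any actual program) to show the basic computation: overall agreement between the cut-score classification and the observed outcome, plus the two error rates described in Note 2.

```python
# Minimal sketch with hypothetical data: classification accuracy of a CCR cut score
# judged against an observed postsecondary outcome (1 = earned a C or higher).
import numpy as np

scores = np.array([14, 16, 18, 19, 21, 22, 24, 25, 27, 29])   # hypothetical CCR scores
succeeded = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 1])           # hypothetical outcomes
cut = 22                                                        # hypothetical benchmark

classified_ready = scores >= cut

# Overall agreement between the cut-score classification and the outcome.
accuracy = np.mean(classified_ready == succeeded.astype(bool))

# The two errors Note 2 warns about, conditional on the classification made:
# students judged "ready" who did not succeed, and students judged "not ready" who did.
false_ready_rate = np.mean(succeeded[classified_ready] == 0)
false_not_ready_rate = np.mean(succeeded[~classified_ready] == 1)

print(f"Classification accuracy: {accuracy:.2f}")
print(f"'Ready' but did not succeed: {false_ready_rate:.2f}")
print(f"'Not ready' but succeeded:   {false_not_ready_rate:.2f}")
```

Computed separately by institution type, course, or program, these same rates speak directly to whether a single cut score classifies students equally well across settings.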
Notes
1. The two consortia are the Partnership for Assessment of Readiness for College and Careers (PARCC) and the SMARTER Balanced Assessment Consortia; together they are composed of 44 states and the District of Columbia.
2. For example, if the majority of students scoring above a CCR cut score failed, or a majority of students scoring below a CCR cut score succeeded, there would be significant concern and caution about using scores to support CCR inferences and decisions.
3. Other criteria have been used to evaluate the effectiveness of postsecondary education, including economic factors (e.g., starting salary), employment, life events (e.g., divorce), and self-reported attitudes and behaviors (e.g., interest in lifelong learning, community service), which do not appear relevant as primary criteria for cognitive assessments.
4. It also is true of placement in foreign languages and within course sequences in STEM areas for some selective colleges.
5. The requirements and expectations for success in calculus in an Engineering department may differ from those associated with calculus taught in a Business or Social Science department within the same institution. Research demonstrates that average college grades differ significantly across departments.
6. FGPA is the most popular criterion measure for research on admissions, but other criteria used in studies include cumulative or final GPA, GPA in major, and grades in specific courses (Camara, 2005b).
7. Selectivity can be measured by traits of enrolled or admitted students (e.g., mean HSGPA, percent ranked in the top 10% of their class) but is most often related to standardized test scores.
8. ACT English to Composition, ACT Math to College Algebra, ACT Science to Biology, and ACT Reading to Social Science.
9. An additional 3% took a course labeled as "algebra/trigonometry."
10. An additional 3% took a course labeled as "probability/statistics"; 1% took a course labeled as "business statistics."
References
Achieve. (2007). Aligned expectations: A closer look at college admissions and placement tests. Washington, DC: Achieve. Retrieved October 22, 2011, from http://www.achieve.org/files/Admissions_and_Placement_FINAL2.pdf
ACT. (2006). Ready for college, ready for work: Same or different. Retrieved November 1, 2011, from http://www.act.org/research/policymakers/pdf/ReadinessBrief.pdf
ACT. (2007). ACT technical manual. Iowa City, IA: ACT.
Adelman, C. (2006). The toolbox revisited: Paths to degree completion from high school through college. Washington, DC: U.S. Department of Education. Retrieved October 21, 2011, from http://www2.ed.gov/rschstat/research/pubs/toolboxrevisit/toolbox.pdf
Allen, J., & Sconing, J. (2005). Using ACT assessment scores to set benchmarks for college readiness. ACT Research Report 2005–3. Retrieved October 21, 2011, from http://www.act.org/research/reports/pdf/ACT_RR2005--3.pdf
American Diploma Partnership (ADP). (2004). Ready or not: Creating a diploma that counts. Retrieved October 21, 2011, from http://www.achieve.org/files/ADPreport_7.pdf
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Baum, S., & McPherson, M. (2008). Fulfilling the commitment: Recommendations for reforming federal student aid. New York, NY: College Board.
Berkner, L., & Chavez, L. (1997). Access to postsecondary education for the 1992 high school graduates (NCES 98–105). Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Bowen, W. G., Chingos, M. M., & McPherson, M. S. (2009). Crossing the finish line: Completing college at America's public universities. Princeton, NJ: Princeton University Press.
Burton, N. W., & Ramist, L. (2001). Predicting success in college: SAT studies of classes graduating since 1990. College Board Research Report 2001–2. Retrieved October 25, 2011, from http://professionals.collegeboard.com/profdownload/pdf/rdreport200_3919.pdf
Camara, W. J. (2005a). Broadening criteria of college success and the impact of cognitive predictors in admissions testing. In W. J. Camara & E. Kimmel (Eds.), New tools for admissions to higher education (pp. 81–107). Mahwah, NJ: Lawrence Erlbaum.
Camara, W. J. (2005b). Broadening predictors of college success. In W. J. Camara & E. W. Kimmel (Eds.), Choosing students: Higher education admissions tools for the 21st century (pp. 81–105). New York, NY: Routledge.
Camara, W. J. (2012, April). Defining and measuring college and career readiness: Developing performance level descriptors and defining criteria. Paper presented at the meeting of the National Council on Measurement in Education, Vancouver, Canada.
Camara, W. J., & Quenemoen, R. (2012). Defining and measuring college and career readiness and informing the development of performance level descriptors. Retrieved March 16, 2012, from http://www.parcconline.org/sites/parcc/files/PARCC%20CCR%20paper%20v14%201--8--12.pdf
Common Core State Standards (CCSS). (2010). Retrieved November 1, 2011, from http://www.corestandards.org/the-standards
Conley, D. T. (2011). Redefining college readiness (Vol. 5). Eugene, OR: Educational Policy Improvement Center.
Conley, D., Drummond, K. V., Gonzalez, A., Rooseboom, J., & Stout, O. (2011a). Reaching the goal: The applicability and importance of the common core state standards to college and career readiness. Eugene, OR: Educational Policy Improvement Center.
Conley, D., Drummond, K. V., Gonzalez, Seburn, M., Stout, O., & Rooseboom, J. (2011b). Lining up: The relationship between the common core state standards and five sets of comparison standards. Eugene, OR: Educational Policy Improvement Center.
Foughty, Z., Monsaas, J., & Nellhause, J. (2012). College- and career-ready determination (CCR-D) policy and policy-level performance level descriptors (PLDs). Retrieved September 30, 2012, from http://www.parcconline.org/sites/parcc/files/PARCCDraftCRDPolicyandPerformanceLevelPLDs_FINAL.ppt
Greene, J. P., & Winters, M. A. (2005). Public high school graduation and college-readiness rates: 1991–2002 (Education Working Paper No. 8, February 2005). Manhattan Institute for Public Research. Retrieved April 23, 2008, from http://www.manhattan-institute.org
Haertel, E. H., Beimers, J., & Miles, J. (2012). The briefing book method. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods and innovations (2nd ed., pp. 283–299). New York, NY: Routledge.
Haertel, E. H., & Lorie, W. A. (2004). Validating standards-based test score interpretations. Measurement, 2(2), 61–103.
Hamrick, F. A., Schuh, J. H., & Shelley, M. C., II. (2004). Predicting higher education graduation rates from institutional characteristics and resource allocation. Education Policy Analysis Archives, 12(19), 1–24.
Harmston, M. T. (2004). Cross-validation of persistence models for incoming freshmen (AIR Professional File, No. 93). Tallahassee, FL: Association for Institutional Research.
Kane, M. (2001). So much remains the same: Conception and status of validation in setting standards. In G. Cizek (Ed.), Setting performance standards (pp. 53–88). New York, NY: Routledge.
Kilpatrick, J. (2012, April). The standard for minimal academic preparedness in mathematics to enter a job-training program. Paper presented at the meeting of the National Council on Measurement in Education, Vancouver, Canada.
Keng, L., Murphy, D., & Gaertner, M. (2012, April). Supported by data: A comprehensive approach for building empirical evidence for standard setting. Paper presented at the meeting of the National Council on Measurement in Education, Vancouver, Canada.
King, J. (2011). Implementing the common core state standards: An action agenda for higher education. American Council on Education. Retrieved February 26, 2012, from http://www.acenet.edu/AM/Template.cfm?Section=Home&CONTENTID=39580&TEMPLATE=/CM/ContentDisplay.cfm
Lewis, D. M., Mitzel, H. C., Mercado, R. L., & Schultz, M. (2012). The bookmark standard setting procedure. In G. Cizek (Ed.), Setting performance standards (pp. 225–254). New York, NY: Routledge.
Linn, R. L. (2003). Performance standards: Utility for different uses of assessments. Education Policy Analysis Archives, 11(31), 1–4.
Loomis, S. (2011, April). College readiness, career readiness: Same or different? Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
Loomis, S. (2012, April). A study of "irrelevant items": Impact on bookmark placement and implications for college and career readiness. Paper presented at the meeting of the National Council on Measurement in Education, Vancouver, Canada.
Maruyama, G. (2012). Assessing college readiness: Should we be satisfied with ACT or other threshold scores? Educational Researcher, 41(7), 252–261.
McClarty, K. L., & Davis, L. L. (2012, April). Enriched by policy-making performance standards meaningful in educational outcomes. Paper presented at the meeting of the National Council on Measurement in Education, Vancouver, Canada.
National Assessment Governing Board (NAGB). (2009). Making new links, 12th grade and beyond: Technical panel on 12th grade preparedness research final report. Retrieved October 28, 2011, from http://www.nagb.org/publications/PreparednessFinalReport.pdf
National Assessment Governing Board (NAGB). (2011). COSDAM briefing materials for the August 2011 meeting of the National Assessment Governing Board. Retrieved March 1, 2012, from http://www.nagb.org/content/nagb/assets/documents/publications/PreparednessFinalReport.pdf
No Child Left Behind. (2002). Pub. L. 107–110, 115 Stat. 1425 (January 8, 2002).
Patterson, B. F., & Mattern, K. D. (2012). Validity of the SAT for predicting FYGPA: 2009 SAT validity sample. College Board Statistical Report No. 2012–5. New York, NY: The College Board.
Phillips, G. W. (2012). The benchmark method of standard setting. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods and innovations (2nd ed., pp. 323–345). New York, NY: Routledge.
Purcell, J., & Clark, A. (2002, October). Assessing the life and times of first-semester academic failures. Paper presented at the conference of the Southern Association for Institutional Research, Baton Rouge, LA.
Robbins, S., Allen, J., Casillas, A., Peterson, C. H., & Le, H. (2006). Unraveling the differential effects of motivational and skills, social, and self-management measures from traditional predictors of college outcomes. Journal of Educational Psychology, 98, 598–616.
Robbins, S., Lauver, K., Le, H., Davis, D., Langley, R., & Carlstrom, A. (2004). Do psychosocial and study skill factors predict college outcomes? A meta-analysis. Psychological Bulletin, 130, 261–288.
Schmitt, N., Billington, A. Q., Keeney, J., Oswald, F. L., Pleskac, T., Sinha, R., & Zorzie, M. (2009). Prediction of four-year college student performance using cognitive and noncognitive predictors and the impact on demographic status of admitted students. Journal of Applied Psychology, 94, 1479–1498.
Shaw, E. J., & Patterson, B. F. (2010). What should students be ready for in college? A first look at coursework in four-year postsecondary institutions in the U.S. College Board Research Report 2010–1. Retrieved October 22, 2011, from http://professionals.collegeboard.com/profdownload/pdf/10b_1417_FirstYrCollCourseRR_WEB_100611.pdf
Smarter Balanced Assessment Consortia (SBAC). (2012). How will Smarter Balanced validate its college and career readiness benchmarks? Retrieved October 7, 2012, from http://www.smarterbalanced.org/resources-events/faqs/
Tinto, V. (1987). Leaving college. Chicago, IL: University of Chicago Press.
Wiley, A., Wyatt, J., & Camara, W. J. (2010). The development of a multidimensional college readiness index. College Board Research Report 2010–3. Retrieved October 21, 2011, from http://professionals.collegeboard.com/profdownload/pdf/10b3110_CollegeReadiness_RR_WEB_110315.pdf
Williams, N. J., Keng, L., & O'Malley, K. O. (2012, April). Maximizing panel input: Incorporating empirical evidence in a way the standard setting panel will understand. Paper presented at the meeting of the National Council on Measurement in Education, Vancouver, Canada.
Wyatt, J., Kobrin, J., Wiley, A., Camara, W. J., & Proestler, N. (2011). SAT benchmarks: Development of a college readiness benchmark and its relationship to secondary and postsecondary school performance. College Board Research Report 2011–5. Retrieved October 21, 2011, from http://professionals.collegeboard.com/profdownload/pdf/RR2011--5.pdf
Wyatt, J., Remigio, M., & Camara, W. J. (2013). Benchmarks for STEM, reading and writing. College Board Research Report 2013–4. Retrieved April 8, 2013, from http://research.collegeboard.org/sites/default/files/publications/2013/1/researchnote-2013-1-sat-subject-readiness-indicators_0.pdf
Wyatt, J., Wiley, A., Camara, W. J., & Proestler, N. (2011). The development of an index of academic rigor for college readiness. Retrieved March 7, 2012, from http://professionals.collegeboard.com/profdownload/pdf/RR2011--11.pdf