
Assessment & Evaluation in Higher Education Vol. 33, No. 6, December 2008, 631–645

Design-focused evaluation

Calvin Smith*

Griffith Institute for Higher Education, Mt Gravatt campus, Griffith University, Nathan, Queensland, Australia

In this paper an approach to the writing of evaluation questions is outlined and developed which focuses attention on the question of the effectiveness of an educational design for bringing about the learning it is intended to facilitate. The approach develops from the idea that all educational designs rely on instructional alignment, implicitly or explicitly, and succeed or fail to the extent to which the implementation of that alignment is effective. The approach to evaluation that is described, design-focused evaluation, utilises students' experiences of instructional designs and strategies and focuses questions on students' awareness of the effectiveness of those strategies for facilitating the intended learning outcomes. Detailed advice is given on the construction of items that fit with the approach described and that maximise the generalisability of the approach to most if not all educational settings and designs.



Approaches to educational programme evaluation

The evaluation of educational programmes has a long history, with many meta-theoretical accounts written of the various approaches (for a comprehensive overview see Kellaghan and Stufflebeam 2003). Whether the focus is on outcomes beyond those related to assessing and grading students, as in Smith and Tyler's (1942) classic, or more tightly on educational processes and their immediate products, at the heart of all evaluation, as Scriven (2003, 26) argues, there is 'an analysis of an evaluand in terms of [its] specific properties of merit, worth or significance'. Some evaluation models attempt comprehensive coverage of the components that together constitute the educational system or programme-as-a-whole. Examples include Stufflebeam's (2003) CIPP four-types model (context, input, process, product) and Kirkpatrick's (1994) four-levels model (reaction, learning, transfer, impact). Others focus on the standards against which the evaluand is compared (e.g. Stake 2004) or on the utilisation of the results of evaluation (e.g. Patton 1997). Still others use the opportunity of evaluation to conduct a theoretical analysis of the internal or external factors associated with the results obtained, in order to explore causal explanations of programme-level outcomes (Scriven 2003, 24). The approach to evaluation described in this paper focuses on the effectiveness of the educational processes used to bring about intended educational goals. In this regard, the approach bridges Kirkpatrick's levels 1 and 2 and Stufflebeam's process and product evaluation types. That is, the focus is on the link between the educational design and the outcomes, on the question of the effectiveness of the process for producing the programme's outcomes.

*Email: [email protected]
ISSN 0260-2938 print/ISSN 1469-297X online. © 2008 Taylor & Francis. DOI: 10.1080/02602930701772762


Approaches to educational evaluation in higher education

Educational evaluation in higher education often uses information gathered via student surveys in order to make judgements about the quality or effectiveness of teaching or course design. To have any plausibility, this approach requires that the questions put to students are derived from theories of effectiveness in either teacher behaviour or course design. Such theories, often with an empirical research basis, implicitly link teacher behaviours or course design elements with learning outcomes or the quality of learning. Put another way, these approaches allow inferences about the quality of learning to be made from the quality of the learning environment, because a prior body of research shows that such relationships are observed reliably. On the basis of this research, one can argue that from measures of the antecedent learning environment variables (e.g. teacher qualities such as enthusiasm, expertise and care for students, or course qualities such as coherence, pace, volume of content or assessment, and personal relevance to students), one can infer the consequent quality of learning.

However, it is very unlikely that, once accepted or adopted, the theories that underpin the questions used in evaluation are re-tested for reliability, or tested in the context of implementation or in other contexts, in order to check validity across contexts. As a consequence, such evaluation systems rely heavily on the assumption that the data gathered about the quality of teaching or course design represent the quality of the learning that has been produced. That is, they tend to assume that if scores on these environmental indicator variables are sufficiently high, then the quality of learning produced is also sufficiently high. Further, because the constructs measured are assumed to correlate with quality learning, it is also assumed that they can stand as measures of the effectiveness of the teaching or of the course design in bringing about quality learning. This general approach therefore assumes both that quality learning has been achieved, and that it has been achieved effectively, whenever measures of teaching or course design qualities are sufficiently high.

In this paper an alternative approach is proposed which focuses on the alignment between the teaching or course design elements, on the one hand, and the achievement of the learning objectives on the other. The proposed alternative provides systematic guidance in designing questions about the effectiveness of teacher practices or course design elements for producing the intended learning outcomes. In this way the questions used inform teachers and course designers directly of the effectiveness of the design implemented for different learning outcomes. Whilst still drawing on student perceptions, I argue that this approach is an improvement on any approach that requires, each time it is used, an underlying theory, yet must assume rather than test the theoretical links between the measures of the learning environment and the learning outcomes developed in that environment. Notwithstanding this claim, it remains true that the merits of any system of evaluation derive from the degree to which the evaluation approach adopted meets the needs of the evaluation project it serves. For instance, if measures of the quality of the learning environment are at the heart of, or serve the purposes of, an evaluation, then a process that gathers data on the environment is highly appropriate and is the best approach for those purposes. Thus, I do not propose that the approach described herein is a panacea for all known or alleged shortcomings of other approaches to evaluation, nor do I propose that it should serve all evaluation purposes. Whilst there is a long tradition of research and theorising on evaluation, its forms, purposes and methods, I merely propose in this paper to make one small contribution to this tradition which, if it has merit, may supplement other approaches.

Although little is published of the details of local institutional approaches to evaluation, common experience and anecdote suggest that any student evaluation of teaching system is underpinned by varying degrees of theorisation in the creation of the items used.

Table 1. Three types of evaluation question.

A (methods-focused)
Purpose or focus: Features of the teaching or course that are causally imputed to impact on the quality of learning.
Examples:
• There was too much content in this course
• The teacher in this course was knowledgeable
• The role play instructions were clear
• The demonstrator was clearly audible

B (learning/outcomes-focused)
Purpose or focus: The existence or quality of the learning outcomes.
Examples:
• Please rate your ability to: critically appraise the arguments of others using logic and evidence
• I am better able to correctly reference research articles in my reports

C (design-focused)
Purpose or focus: Links between the features of the teaching or design and the quality of the learning.
Examples:
• The teachers gave feedback that helped me with my learning
• The video presentation helped me to learn how to cook spaghetti

Some systems, like the SEEQ reported by Marsh and Roche (1994) and the UQSES (Smith and Bath 2006), are based on fully validated latent construct measurement approaches. Others are based on some plausible range of generic dimensions of teaching practice or course design elements that theory suggests are likely to produce quality learning. Still others are underpinned by specific theories, such as the system operating at the University of Sydney, which is based on the CEQ (Ramsden 1991) and the research linking the CEQ with students' approaches to their learning and the quality of student learning outcomes (Entwistle 1981; Entwistle and Ramsden 1983; Entwistle and Tait 1990; Entwistle and Entwistle 1997). Sometimes the survey instruments focus on the actual learning outcomes, containing items or scales (see Smith and Bath 2006) that measure students' perceptions of their learning development. Finally, there are occasions when the links between the measured constructs and the learning they were meant to produce are explicitly captured in the wording of the items. An example of this is: 'The course was designed in a way that helped me to learn the content'. It is this last category of items that I aim to develop in this paper.

In other words, when designing evaluation measures, one may focus on the antecedents of learning (as given by theory), the learning outcomes themselves, or the links between these two. Table 1 gives examples of each of these types, labelled respectively 'A', 'B' and 'C'. I will refer to Type 'A' as methods-focused questions – those that concentrate on the various aspects or characteristics of the teaching or the designed activities implemented in the classroom (the teaching and learning activities). Type 'B' I will refer to as learning-focused or outcome-focused questions – those concentrating on measures of the learning that has been achieved. Finally, Type 'C' questions I will refer to as design-focused questions – those that concentrate on the alignment of the design elements (the teaching and learning activities, materials, etc.) and the learning outcomes they were meant to achieve. Further, since items may also be derived from either a measurement model approach, using several items together as measures of each of a set of latent variables, or a single-item-per-construct approach, we can cross-tabulate the focus of the questions against the measurement model used to generate them, as in Table 2.

Design-focused evaluation

The purpose of this paper is to describe an approach to educational evaluation focusing on the effectiveness of educational designs for bringing about intended learning outcomes, using student ratings and opinions as the main data source.

Table 2. Examples of items according to focus and measurement model.

A (methods-focused)

One item per dimension model:
• The teacher encouraged students to interact in class

Multiple items per dimension:
Please indicate your level of agreement with the following statements:
• Students were encouraged to participate in class discussions
• Students were invited to share their ideas and knowledge
• Students were encouraged to ask questions and were given meaningful answers
• Students were encouraged to express their own ideas and/or question the lecturer
(Marsh and Roche 1994)

Thinking about the majority of the lecturing staff in your major area of study or discipline, how much would you agree that they:
• Were experts in their fields
• Were enthusiastic and committed to their teaching
• Drew on current research and developments in their teaching
• Taught in a way that increased your understanding of the discipline
• Treated you with courtesy and respect
• Were available for consultation
• Taught in a way that stimulated your interest in the discipline
• Intellectually challenged and extended you
• Encouraged you to think in new ways
(Smith and Bath 2006)

B (learning/outcomes-focused)

One item per dimension model:
• I have achieved the graduate attributes which the course aimed to develop (e.g. oral/written communication, team work, critical thinking, problem-solving, ethical sensitivity) (Smith 2003; University of Qld 2007)

Multiple items per dimension:
Thinking about your major area of study or discipline, how much has your experience at UQ contributed to the development of the following skills and outcomes?
• Your knowledge of ethical issues and standards in your discipline
• Your appreciation of the philosophical and social contexts of your discipline
• Your awareness and understanding of cultures and perspectives other than your own
• Your openness to new ideas and perspectives
• Your ability to evaluate the perspectives and opinions of others
• Your understanding of social and civic responsibility
(Smith and Bath 2006)

C (design-focused)

One item per dimension model:
• The role plays were an effective way for me to practise my skills as an environmental consultant

Multiple items per dimension:
Indicate the extent to which each of these teaching and learning activities was useful in preparing you for the lab report assignment:
• Lab exercise 1 (practice at summarising a journal article)
• Lab exercise 2 (referencing tasks)
• Lab exercise 3 (mini lab report)
• Sample template for lab exercise 3
• Feedback on lab exercises
• Lab classes
• Writing guide
• Discussion with group in lab class
(Bath 2006)

As a result of this training I have learnt:
• that the [name of policy] is a decision-making guide for staff when dealing with [type of incident] incidents
• the organisation's safety philosophy
• the need for staff to conduct continuous threat assessments
• a range of communication strategies which will enable me to employ 'good practice' when dealing with persons apparently suffering a disorder

To do this it is necessary to develop a systematic approach to the construction of items, to be used in student questionnaires, that explicitly link the learning outcomes to the teaching or design elements; these are the items in column 'C' of Table 2, referred to as design-focused questions.

Let us first consider in more detail what it is that these items do. Any learning environment can be characterised by reference to the learning objectives (or LOBs) on the one hand and the methods used to help students achieve these on the other. Those methods, including teaching behaviours (such as giving feedback, interacting in a dialogical manner, being respectful or supportive) and design elements (such as group work, videos or other resources, practice exercises, etc.), we will call the teaching and learning activities (or TLAs). For the sake of this paper we will include assessment as a TLA. The TLAs are implemented in order to achieve the learning objectives, so that the latter are, at the end of the process, learning outcomes (or LOCs). Constructive alignment theory (Biggs 1996; Cohen 1987) exhorts educators to ensure that there is alignment between the LOBs, the TLAs and the assessed LOCs, as in Figure 1. What is meant by 'alignment' is that the TLAs should cause students to engage in the kinds of activities that are required of them to succeed in the course. If the course requires deeply analytic thought, clear exposition of ideas and appropriate use of evidence to support an argument, then the TLAs should cause the students to engage in just these kinds of activities. Similarly, assessments should require students, as closely as possible, to produce performances of, or evidence of, the kinds of learning described in the course learning objectives.

Figure 1. Alignment between LOBs, TLAs and assessment of LOCs.

To validly establish that a course's design has been effective in bringing about the intended learning objectives in a particular implementation, design-focused evaluation questions must therefore be concerned mainly with students' perceptions of the effectiveness of the TLAs for facilitating the development of the LOCs (turning them from objectives into outcomes).

The appropriateness of taking a design-focused approach to evaluation arises not just from educational design theory, but also from programme logic theories of curriculum design and evaluation (Wholey 1979). Programme logic helps designers to articulate what it is that they intend to achieve by the educational design. This articulation lines up the stated (espoused) objectives with the proposed learning design (the characteristics of the educational intervention). The logic is that the alignment of these with each other will produce the desired learning outcomes (the learned curriculum). In evaluation theory, programme logic helps evaluators to make decisions about what learning outcomes they might focus on if adopting an outcomes-focused evaluation, or what teaching and learning activities they might focus on if doing a methods-focused evaluation. However, the alignment espoused in programme logic theories of curriculum design and evaluation is rarely explicitly and thoroughly evaluated; it is exactly that alignment that is emphasised in design-focused evaluation.
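One way to see what this alignment demands in practice is to treat the design itself as data: every objective should have at least one TLA and one assessment aligned with it. The following Python sketch is illustrative only; the course content and the structure used to represent it are invented, not drawn from the paper.

```python
# A course described in constructive-alignment terms: each learning objective (LOB)
# is linked to the TLAs meant to produce it and the assessment meant to evidence it.
# The course content below is invented purely for illustration.
course = {
    "critically appraise arguments using logic and evidence": {
        "tlas": ["tutorial debates", "feedback on draft essay"],
        "assessment": ["argumentative essay"],
    },
    "correctly reference research articles": {
        "tlas": [],                      # a gap: no TLA has been designed for this LOB
        "assessment": ["lab report"],
    },
}

# Flag objectives that lack an aligned TLA or assessment, i.e. places where the
# espoused design cannot plausibly turn objectives into outcomes.
for lob, links in course.items():
    if not links["tlas"]:
        print(f"No TLA is aligned with: {lob}")
    if not links["assessment"]:
        print(f"No assessment is aligned with: {lob}")
```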

What benefits would be derived from design-focused evaluation?

It is sometimes useful to think of the different curricula that may be operating in any learning environment. Up to five different curricula have been identified by various authors (Jackson 1968; Snyder 1970; Lynch 1989; Margolis 2001; Porter and Smithson 2001; Gatto 2002; Bath et al. 2004), as follows:


Intended or espoused – That which curriculum designers intend to teach; the one containing the learning objectives; the one that contains the intended TLAs and their intended outcomes.

Taught, enacted or explicit – That curriculum which teachers teach; this is the actual TLAs as enacted in situ.

Hidden or implied – That which is taught implicitly through evaluative judgements made explicitly or implicitly in talk, through the design of the TLAs, the relationships between teachers and students, and so on.

Learnt – The learning the students take away from the experience. Sometimes (possibly always) this is more than, or goes outside the scope of, what is intended, taught and assessed.


Assessed – The learning that is assessed in the assessment protocols.

To this list we might fruitfully add two more:

Experienced – The curriculum as experienced by students; this encompasses the idea that the student experience itself (created by factors such as the enacted curriculum and inconsistencies between the espoused, enacted and assessed curricula) is a worthy object of inquiry. Apart from their knowledge of the espoused learning objectives, the taught curriculum is the only one the students can comment on, since it is the one they have experienced; for the students, from their perspective, it is the experienced curriculum that counts.

Evaluated – The curriculum as construed or implied by the evaluation protocols used to assess it. Typically, programme evaluation must specify the scope and design details of an evaluation before proceeding, which implies the specification of the evaluand. In the case of any curriculum, since not every component would typically be included in an evaluation design, and since in any case no component can be totally, definitively and objectively observed, the definition given in the evaluation design constitutes the curriculum for this particular evaluation occasion.

There can sometimes be a gap between the espoused/intended learning objectives and the actual learnt outcomes, which gives us a further reason to evaluate the effectiveness of the (taught/enacted) curriculum for producing the objectives (i.e. converting them from objectives to outcomes through the enacted TLAs). Any deficiencies in the effectiveness of the enacted curriculum for producing the learning objectives may well account (along with other things, no doubt) for the gap between the objectives and the actual learning outcomes. Therefore, it is with an aim to discover these deficiencies that design-focused evaluation would be done. So these two curricula, the taught and the learnt (Figure 2), are pitched against each other in design-focused evaluation. The question to be answered in this kind of evaluation approach, then, is: Were the activities we enacted, for the students to engage in, effective in producing the outcomes we thought they would produce?

How we teach things is not an accident; rather, how we teach is the result of deliberate decisions about which TLAs should be used, because they are judged most likely to help students achieve the learning objectives. That is, the TLAs are chosen because they are meant to ('designed' to) bring about the learning objectives. More specifically, what we do in designing learning activities for students (e.g. lectures, group work, scenarios), learning materials (e.g. textbooks, readings, work books) and assessment is not accidental; there is an intention that the designs will be effective for bringing about the intended learning outcomes (i.e. the learning objectives). Therefore it is appropriate to evaluate the programme's effectiveness for bringing about learning. This is the same thing as evaluating the effectiveness of the programme's design for bringing about learning.

Figure 2. Relationships between taught and learnt curricula.

One way to answer this question is to focus strictly upon the learned outcomes: if the process is effective, then this will show up in the learned outcomes. To strengthen our inferences, we might take pre-test as well as post-test measures of student knowledge or skill and, with only the educational or training programme in between these measures, infer that changes in student knowledge and skill are a consequence of the educational process. However, there are two problems with this approach. First, in higher education it is seldom the case that the educational process is the only thing sandwiched between pre- and post-tests. This may be an appropriate idea for a limited number of short training programmes, but it will not provide the right kind of controls when learning occurs over a long period of time, say a semester, with all of life's ordinary vicissitudes intervening in the meantime. The second problem follows from the first: if our capacity to attribute the learning to the educational process is weakened by the complex multivariate nature of the context in which the process is situated, then a focus strictly on outcomes will not give strong evidence for the effectiveness of the process's design for bringing about the learning. Furthermore, the relationship between process and outcome is never a simple one – different teaching and learning activities might together produce a single outcome, or a single activity might produce multiple learning outcomes. What is needed is a way to focus attention on the relationships between the enacted teaching and learning activities and the learning we intended them to produce.

Constructing DFE questions

Since our interest in the learning environment then reduces to the implemented TLAs on the one hand and the intended learning outcomes or learning objectives on the other, design-focused evaluation questions have two parts:

Methods (or TLA) component – that part which comprises the teaching and learning activities and assessment;


Learning component – that part which comprises the learning that was meant to occur as a consequence of engaging in the TLAs.

The two parts are joined by a grammatical conjunction as in this example:


The flight simulator session on bad weather landings (TLA-component)… helped me to learn how to (grammatical link) … land the Boeing 747 in bad weather (LOB-component).
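If it helps to see the anatomy of such an item in a more formal notation, the two components and their grammatical link can be modelled as a small data structure. The following Python sketch is purely illustrative and is not part of the evaluation method itself; the class and field names are invented for this example.

```python
from dataclasses import dataclass

@dataclass
class DFEQ:
    """A design-focused evaluation question: a TLA part linked to a LOB part."""
    tla_component: str   # the teaching and learning activity being evaluated
    link: str            # the grammatical conjunction expressing the causal claim
    lob_component: str   # the learning objective/outcome the TLA was meant to produce

    def text(self) -> str:
        # Compose the item wording from its three parts.
        return f"{self.tla_component} {self.link} {self.lob_component}."

# The flight-simulator example from the text, expressed in this structure.
item = DFEQ(
    tla_component="The flight simulator session on bad weather landings",
    link="helped me to learn how to",
    lob_component="land the Boeing 747 in bad weather",
)
print(item.text())
```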

If we consider this example of a design-focused evaluation question (hereafter DFEQ), we can see immediately that there are different levels of specificity that could be used in the construction of such questions. Just taking the methods component ('The flight simulator session on bad weather landings…'), imagine a situation in which there is not one such session but several, and that these together contributed to the development of the learning outcome(s). One can then imagine that the questions used in evaluating this educational episode might be very general ('The flight simulator sessions on bad weather landings…') or very specific ('The first flight simulator session on bad weather landings…'; 'The second flight simulator session on bad weather landings…', etc.). Indeed, the question could be asked in a way that made only implicit reference to the sessions ('The use of the flight simulator…' or 'The training package…'), where the training package included the flight simulator sessions (perhaps among other TLAs). Implicit, generic and specific levels of reference to the TLAs constitute three levels of generality-specificity for the methods component of a DFEQ.

The same idea can be applied to the learning component of a DFEQ. For instance, landing safely in bad weather may not be the only learning outcome designed to be achieved through the simulator session(s). Perhaps correctly using instruments in the approach (because visibility is poor) and remembering to adjust the approach speeds are also part and parcel of the learning objectives (along with getting the plane on the ground in one piece). In this case '…helped me achieve the learning objectives' is very generic, whereas '…helped me learn how to determine the appropriate approach speed for the conditions' is very specific. The implicit version of the learning component of a DFEQ will typically be something like '…was excellent' or '…helped me to learn', where the achievement of learning objectives is implied either by the high quality of the TLA(s) or by reference to the fact that learning has occurred. Table 3 details these levels for both sides of the DFEQ structure.

In Table 4 I have displayed the combinations of specificity and generality for each side of the DFEQ structure, revealing that there are nine basic types of DFEQ. This will henceforth be referred to as the 'SS-II Grid', to indicate the range of item construction types defined by the cross-tabulation of three levels of generality for the TLA part of an item against three levels of generality for the learning objective/outcome part.

Table 3. Specific, generic and implicit question design.

Teaching and learning activities and assessment
• Specific: Specifically mention the TLAs (e.g. 'The lecture on…')
• Generic: Generically mention the TLAs (e.g. 'The assessment…' or 'The teachers…' or 'The activities…')
• Implicit: Implicitly mention the TLAs (e.g. 'This course…')

Learning objectives/outcomes
• Specific: Specifically mention the learning objective or outcome (e.g. '…learn how to interpret case law')
• Generic: Generically mention learning objectives or outcomes (e.g. '…learn' or '…achieve the objectives')
• Implicit: Implicitly mention the learning objectives or outcomes or the links with good quality teaching practices (e.g. '…were relevant' or '…were interesting')

Table 4. SS-II grid combining specificity and generality for TLAs and LOBs in DFEQs.

Rows: teaching, learning and assessment activities; columns: learning objectives and outcomes.

                    LOB Specific (S)   LOB Generic (G)   LOB Implicit (I)
TLA Specific (S)    SS                 SG                SI
TLA Generic (G)     GS                 GG                GI
TLA Implied (I)     IS                 IG                II
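As a rough illustration (not part of the original exposition), the nine coordinates of the grid can be generated by crossing the three levels of generality on each side; by the convention used below, and explained in the following paragraph, the first letter of each label indexes the TLA level and the second the LOB/LOC level.

```python
from itertools import product

# Three levels of generality-specificity for each side of a DFEQ.
TLA_LEVELS = {"S": "Specific", "G": "Generic", "I": "Implicit"}
LOB_LEVELS = {"S": "Specific", "G": "Generic", "I": "Implicit"}

# Cross the two dimensions to obtain the nine SS-II grid coordinates.
for tla_code, lob_code in product(TLA_LEVELS, LOB_LEVELS):
    label = tla_code + lob_code  # e.g. 'GS' = generic TLA reference, specific LOB/LOC
    print(label, "-", TLA_LEVELS[tla_code], "TLA /", LOB_LEVELS[lob_code], "LOB/LOC")
```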

For instance, where both the TLA and LOB/LOC are referred to specifically, the label given to the intersection cell is 'SS' (Specific-Specific); where the TLAs are referred to generically but the LOB/LOC is referred to specifically, the combination would be labelled 'GS' (Generic-Specific), and so forth. This labelling may help evaluators remain aware of the way in which their evaluation design is evolving, and facilitate monitoring of the comprehensiveness of that design, thereby reducing the risk that some important aspect of the educational design will not be explored in sufficient detail, or that some trivial aspect will be explored in superfluous detail. There are always two competing forces vying for survey space in any evaluation design – brevity and comprehensiveness; respondents want brevity, but evaluators and their clients sometimes need comprehensiveness. This systematic approach to monitoring evaluation design therefore helps to manage the competing needs of brevity and comprehensiveness in a way that ensures the best design for the purposes of the evaluation.

Any TLA may be designed to produce one or more than one LOB/LOC. Sometimes there is a one-to-one relationship between the TLA implemented in an educational context and the LOB/LOC it is designed to achieve. At other times a single TLA may be expected to contribute to the learning of multiple LOB/LOCs. In a similar vein, depending on the way they are construed or worded, some LOB/LOCs may require more than one TLA to produce them, whilst others may be produced by just one TLA. For instance, 'Have a sense of the breadth of scholarship in the field of historiography' or 'Develop a sound knowledge of principles of adult learning and their application' might be learning objectives that require exposure to an entire course of TLAs to be achieved. On the other hand, 'Learn the correct way to acid-bath bones' may be an objective that is achieved through just one TLA. Of course a single TLA, such as a role-play, may produce multiple learning objectives (e.g. communication, critical thinking, drawing on relevant evidence for a particular kind of circumstance or problem). These combinations of relationships between learning objectives and the teaching and learning activities that are designed to allow students to achieve them are represented schematically in Figure 3.

It is useful then to contemplate how the SS-II grid would serve these different kinds of relationships between TLAs and LOBs/LOCs. Table 5 shows the possibilities. For one-to-one relationships between TLAs and LOB/LOCs the specific-specific structure (SS) would be appropriate (e.g. 'The video was effective in helping me learn how to cook spaghetti'). For one-to-many relationships, structures with specific references to the TLAs are most appropriate (e.g. 'The role-play…'), but references to the LOB/LOC can be either generic (e.g. '…helping me learn the skills of group leadership') or implicit (e.g. '…was effective'; '…was well designed'), depending on the evaluator's needs (SG or SI). For many-to-one relationships, references to TLAs can be either generic or implicit whilst the LOB/LOC part of the construction will be specific (GS or IS). With many-to-many relationships the references to the TLAs can be generic or implicit, as can references to the LOB/LOCs, giving us GG, GI, IG or II patterns.

Figure 3. Logically possible range of relationships between LOBs and TLAs.

Of course, in any case, a range of SS-patterned items could be used, and may indeed give the greatest diagnostic leverage.

To give some concrete expression to these ideas, let us imagine that a course dealing with leading groups effectively is to be evaluated. The course design features a practice session, two role-plays and two lectures. Let us suppose that the relationships between the course design and the learning objectives are as depicted in Figure 4, wherein the arrows represent the intended causal links between the TLAs and the LOBs. Thus, the Practice Session (PS), Lecture 1 (L1) and one of the Role Plays (RP1) together are designed to give the students the ability to use the Belbin Team Role Inventory (TRI) to manage role diversity in a practice setting, whilst the Role Plays and Lecture 2 are designed to help them learn how to manage dynamics within groups. More specific learning objectives exist for these two broader sets of learning goals (e.g. to manage diversity in groups requires that students can identify types of difficult group situations and manage these once identified).

Table 5. SS-II grid with one and many relationship patterns between TLAs and LOBs/LOCs.

Rows: teaching, learning and assessment activities; columns: learning objectives and outcomes.

              LOB/LOC: One       LOB/LOC: Many
TLA: One      SS                 SG, SI
TLA: Many     GS, IS             GG, II, IG, GI

Figure 4. Example of curriculum segment showing levels of specificity in TLAs and LOB/LOCs and different hypothetical relationships between them.
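To make the mapping depicted in Figure 4 concrete, the intended TLA-to-LOB links can be written down as data and the item patterns suggested by Table 5 read off from the counts on each side. The sketch below is illustrative only; the data structure and variable names are not taken from the paper.

```python
# Intended causal links in the hypothetical group-leadership course (after Figure 4):
# each learning objective is listed with the TLAs designed to produce it.
design = {
    "use the Belbin TRI to manage role diversity in a practice setting":
        ["Practice Session (PS)", "Lecture 1 (L1)", "Role Play 1 (RP1)"],
    "manage dynamics within groups":
        ["Role Play 1 (RP1)", "Role Play 2 (RP2)", "Lecture 2 (L2)"],
}

# Invert the mapping to see how many LOBs each TLA is meant to serve.
lobs_per_tla = {}
for lob, tlas in design.items():
    for tla in tlas:
        lobs_per_tla.setdefault(tla, []).append(lob)

# Read off suggestions in the spirit of Table 5: many TLAs feeding one LOB suggests
# GS or IS items; one TLA feeding many LOBs suggests SG or SI items; a range of
# SS items is always an option where survey space allows.
for lob, tlas in design.items():
    if len(tlas) > 1:
        print(f"'{lob}' relies on {len(tlas)} TLAs: consider GS or IS items (or several SS items).")

for tla, lobs in lobs_per_tla.items():
    if len(lobs) > 1:
        print(f"'{tla}' contributes to {len(lobs)} LOBs: consider SG or SI items (or several SS items).")
```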

Table 6 shows examples of DFEQs relating to the role-plays designed to facilitate students' learning about group work. Each TLA in the curriculum model in Figure 4 counts as a specific instance of itself; therefore a reference to a TLA will count as a 'specific' reference in terms of its placement in the SS-II grid. The same will be true of references to the specific learning objectives. Thus, in the SS coordinate of Table 6 is an item dealing with one role-play (a specific TLA) and one learning outcome (a specific LOB/LOC). Going along the top row of Table 6, to the SG coordinate, we have the specific reference to the same role-play, but this time the LOB/LOC is referred to generically as 'manage role diversity', which is the generic objective associated with 'understanding Belbin's TRI' and 'use the Belbin TRI in a practice setting', both specific LOB/LOCs. Similarly, there are two specific role-plays (RP1 and RP2), so a reference to 'the role-plays' is a generic reference in this context.

Table 6. SS-II grid exemplar items with role-play as TLA and work in groups as LOB/LOC.

Specific TLA:
• Specific LOB/LOC (SS): The role-play on using the Belbin TRI in group work developed my ability to use the Belbin TRI in a practice setting
• Generic LOB/LOC (SG): The role-play on using the Belbin TRI in group work developed my abilities to manage role diversity
• Implicit LOB/LOC (SI): The role-play on leading group work was effective

Generic TLA:
• Specific LOB/LOC (GS): The role-plays in this course developed my ability to use the Belbin TRI in a practice setting
• Generic LOB/LOC (GG): The role-plays in this course were a useful method for helping me develop my ability to manage role diversity
• Implicit LOB/LOC (GI): The role-plays in this course were well conducted

Implicit TLA:
• Specific LOB/LOC (IS): This course has developed my ability to use the Belbin TRI in a practice setting
• Generic LOB/LOC (IG): This course helped me learn how to lead group work effectively
• Implicit LOB/LOC (II): This course was excellent

Were there only one role-play, the same words ('the role-play') would mark out a specific reference to a TLA. In this sense, whether a reference is specific or generic is given not by the word(s) used per se but by the relationship of the word(s) to context, i.e. to the design element(s) to which the words refer.

Together, the role-plays in this example contribute to two separate, specific learning objectives/outcomes; thus items written for the GS coordinate must number two to be comprehensive. The two items would be the one already displayed in Table 6 ('The role-plays in this course developed my ability to use the Belbin TRI in a practice setting') along with 'The role-plays in this course were effective in developing my ability to manage identified types of tricky situation'. Let me be clear that I am not advocating in a case such as this the construction of 'complex' or 'double-barrelled' items – those that ask two questions but only provide the capacity for a respondent to answer one of them (Bradburn et al. 2004, pp. 142–143; Groves et al. 2004, p. 233). Thus if one has three outcomes designed to be achieved from some collection of similar TLAs (e.g. six videos), then one could, if (a) one wanted to be comprehensive and (b) one wanted to use the GS coordinate to design the items, write three separate GS questions. Of course, better information would be garnered from 18 (six by three) SS questions instead. However, in the interests of not overburdening the respondent, compromises of this kind are de rigueur in evaluation and evaluation research, and it may make more sense to compromise and use the three GS-style questions. These decisions are not guided by theory so much as by pragmatics, such as the goals of the evaluation and how the data will be used (see, for example, Patton 1997).
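The item-budget arithmetic behind this compromise can be made explicit. The following sketch is a rough heuristic, not a rule from the paper: it simply treats a specific reference as multiplying items by the number of elements on that side, and a generic or implicit reference as collapsing that side to a single mention.

```python
def item_count(n_tlas: int, n_lobs: int, coordinate: str) -> int:
    """Rough item budget for full coverage under one SS-II coordinate.

    The first letter of the coordinate gives the TLA level, the second the
    LOB/LOC level. A specific ('S') reference multiplies items by the number of
    elements on that side; a generic or implicit reference counts as one mention.
    """
    tla_side = n_tlas if coordinate[0] == "S" else 1
    lob_side = n_lobs if coordinate[1] == "S" else 1
    return tla_side * lob_side

# Six videos designed to produce three outcomes, as in the example above:
print(item_count(6, 3, "SS"))  # 18 items - maximum diagnostic detail
print(item_count(6, 3, "GS"))  # 3 items - the pragmatic compromise
```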
These differing levels of specificity, ranging from specific through generic to implicit, are a powerful and yet flexible device in writing DFEQs. The power derives from the fact that the device facilitates evaluation designers in thinking comprehensively through the evaluation design. The flexibility derives from the fact that what counts as a specific, generic or implicit reference to the TLAs is entirely a function of the evaluator's concerns at the time, defined by the purposes of the evaluation. As already mentioned, it is the context, not the words themselves, that defines whether an item would be seen as having a specific, generic or implicit focus.

Further considerations

So far I have discussed only question wording. In this section I make some brief comments about response categories for design-focused evaluation questions. First, it is to be noted that the general logic of the DFE approach does not suggest a preference for a particular response set. Questions using this general form can be based on either agreement ('The demonstration of the first aid technique for splinting a broken bone was an effective way for me to learn how to splint a broken bone myself' – ratings from strongly agree to strongly disagree) or ratings of effectiveness ('Rate how effective the demonstration of the first aid technique for splinting a broken bone was for helping you to learn how to splint a broken bone yourself' – ratings from completely effective to completely ineffective). Further, taking a DFE approach does not release us from caveats that have emerged through experimentation on question/item construction. For example: inclusion of all points in the response range along with their semantic anchors is to be preferred to the inclusion or labelling of only the end-point extremes of the response scales (Frisbie 1979); care should always be taken with the scaling intervals used in response scales related to estimations of quantity, so as to avoid various 'context effect' biases (Rockwood et al. 1997); and pilot-testing is an important part of the design process (Reynolds et al. 1993).

Although well established in public opinion research and used extensively in educational evaluation, agreement scales (Likert 1932) must be handled carefully when adopting a design-focused approach to evaluation.


This is because it is very easy to construct items with which respondents will too readily agree because of weakness in the item wording. For example, if asked whether a video helped him or her to learn how to cook spaghetti, a respondent may well agree (or even strongly agree) even though the video was not an effective TLA for helping the person to achieve this learning outcome. This is because the interpretation of 'helped me to learn' requires little generosity on the part of a respondent to garner agreement; even if some TLA helped learning only a little, then agreement is the only appropriate answer to give. Therefore I would recommend alternatives to this combination of weak items and their attendant agreement scales; to achieve this, one can either strengthen the statement or change the scale.

Finding alternatives to agreement with weak items involves finding items that capture what we want to know about our designs. Since we want to know how effective the TLAs are for helping students to learn the LOBs, I would strongly recommend the use of either (a) scales that ask respondents to estimate the degree of effectiveness with which a TLA can be said to contribute to a LOB, or (b) items that contain reference to those degrees of effectiveness and/or the standards of attainment of the LOB. Thus, strengthening of items can be achieved if reference to the degree of effectiveness or the standard of learning attainment is included within the wording of the item – e.g. 'The TLA was very effective in helping me to learn the LOB [to such-and-such a standard]' (rating: agreement). This is generally commensurate with advice typically given about avoiding the use of statements that are too 'mild' to work well with agreement scales (DeVellis 2003, pp. 79–80), because it is too easy for respondents to agree with them. Changing the scale requires the use of effectiveness scales – e.g. 'How effective was the TLA in helping you to learn the LOB?' (rating: highly effective through to quite ineffective). The best option might well be a combination of both these strategies – e.g. 'How effective was the TLA in helping you to learn the LOB, to such-and-such a standard?' (rating: effectiveness).

Apart from caveats relating to instrument size, return on investment for evaluation expenditure and related matters, it is clear that Guttman scaling (Guttman 1950) and Rasch analysis (Andrich 1978) would be potentially useful approaches to adopt on some occasions. These questions should be empirically explored in future research. Finally, to be absolutely clear, in describing design-focused evaluation I am not advocating a solely quantitative approach to evaluation. Of course, one would want to supplement quantifying DFEQs with open-ended questions to elicit richer, deeper explanatory and elaborative responses.
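As a simple illustration of the two remedies just described (a strengthened statement with an agreement scale, or an effectiveness scale), item specifications might be drafted as follows. The structure is invented for this example, and the intermediate scale labels and the standard clause are placeholders rather than wording prescribed by the paper.

```python
# Option 1: strengthen the statement and keep an agreement scale.
strengthened_item = {
    "text": ("The video was very effective in helping me learn to cook spaghetti "
             "to the standard required in the practical assessment"),  # standard clause is a placeholder
    "scale": ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"],
}

# Option 2: keep a plain design-focused stem and use an effectiveness scale.
effectiveness_item = {
    "text": "How effective was the video in helping you learn to cook spaghetti?",
    "scale": ["Highly effective", "Effective", "Somewhat effective",
              "Somewhat ineffective", "Quite ineffective"],  # all points labelled, per Frisbie (1979)
}

for item in (strengthened_item, effectiveness_item):
    print(item["text"])
    print("  ", " / ".join(item["scale"]))
```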
Conclusions and critical reflections

In this paper I have elaborated an approach to designing or constructing items to be used in educational evaluations so that the focus of the questions is on the intended relationships between the teaching and learning activities and the learning outcomes they are designed to produce. Calling this design-focused evaluation makes clear the intent to use this approach to focus evaluations of educational products and processes on their designs. The basic idea is to construct items that have two parts, one that refers to a TLA element and one that refers to an objectives/outcomes element, joined by a logical/grammatical conjunction focused on the causal relationship between the two.

The SS-II grid encapsulates a range of logical alternatives in question construction that takes into account the fact that references to either the TLAs or the LOBs/LOCs can have varying degrees of specificity. Decisions as to which construction to use are not given by this approach but are, as always, appropriately determined by the goals of the evaluation and by other contingencies, such as concerns about the effect of the length of instrumentation on non-response rates.

There are a number of limitations to design-focused evaluation, things that this approach will not do. For example, it does not solve the problem of validity in question responses.


DFE will not eliminate concerns about the internal and external validity of measures and the conclusions we draw from them (Campbell 1969). Most important among the threats to external validity (a variant of Campbell's 'irrelevant replicability of treatments') is the situation in which the question text, even at a fine-grained level of specificity, does not actually get to the heart of the cause of the learning to which it relates. To put this another way, in spite of the appearance of clarity and specificity in the TLA part of the question, students' answers may not validly reflect that teaching or design process. Following from this point, DFE does not eliminate or reduce the relevance, applicability and appropriateness of triangulation of data-gathering approaches, so that the picture formed from the evaluation results is more likely to be valid. Because DFE questions may sometimes fail to give a valid picture of the relationship between the design aspects and the learning objectives of interest, supplementary procedures for increasing confidence in the external validity of conclusions remain appropriate (such as interviews, focus groups and the like).

Finally, DFE does not eliminate the relevance and utility of other approaches to evaluation, dependent on the purposes and scope of the evaluation being designed or conducted. Design-focused evaluation is proposed here as one way in which evaluation questions might be written to target a specific concern with the impact of the teaching and learning design on the learning outcomes; this is not to imply that other evaluation foci are invalid or redundant. Depending on the purposes of an evaluation, an approach (focus, methods) should be chosen which serves that purpose. Thus, appreciative evaluation is appropriate, say, when what is sought is insight into the learning that was unanticipated ('F' in Figure 2); similarly, utilisation-focused evaluation (Patton 1997) focuses on the intended use, by the intended users, of evaluation results. When methods or outcomes are being evaluated independently of each other, then approaches that focus on these aspects are highly appropriate (see Table 2). Indeed, the whole point of this paper is to add to the arsenal of evaluation approaches another that might fit evaluators' purposes from time to time; the intention is that a DFE approach would complement, rather than supplant, other approaches. Design-focused evaluation is thus proposed as additional to the present collection of evaluation strategies available to educators and evaluators. It is an approach that seeks to provide guidance in systematically directing evaluation questions at the links between curriculum designs and the learning they elicit.

Notes on contributor

Calvin Smith (PhD) is a senior lecturer in higher education at Griffith University. His research has focused on the evaluation of teaching, courses, programmes and university environments; staff development for enhancing teaching quality and scholarship; the measurement of learning environments for quality improvement and assurance purposes; and the relationships between learning environments and student learning outcomes, both discipline-specific and generic.

References

Andrich, D. 1978. Relationships between the Thurstone and Rasch approaches to item scaling. Applied Psychological Measurement 2: 449–460.
Bath, D. 2006. Enhancing the first-year learning environment: report on Griffith Teaching Fellowship. Brisbane: Griffith University (internal report – contact author for access).
Bath, D., C.D. Smith, S. Stein, and R. Swann. 2004. Beyond mapping and embedding graduate attributes: bringing together quality assurance and action learning to create a validated and living curriculum. Higher Education Research and Development 23, no. 3: 313–328.
Biggs, J. 1996. Enhancing teaching through constructive alignment. Higher Education 32: 347–364.
Bradburn, N., S. Sudman, and B. Wansink. 2004. Asking questions: the definitive guide to questionnaire design. San Francisco: Jossey-Bass.


Campbell, D.T. 1969. Reforms as experiments. American Psychologist 24, no. 4: 409–429.
Cohen, S.A. 1987. Instructional alignment: searching for a magic bullet. Educational Researcher 16, no. 8: 16–20.
DeVellis, R.F. 2003. Scale development: theory and applications. Vol. 26. Newbury Park, CA: Sage Publications.
Entwistle, N. 1981. Styles of learning and teaching. Chichester: Wiley.
Entwistle, N., and A. Entwistle. 1997. Revision and the experience of understanding. In The experience of learning: implications for teaching and studying in higher education, eds. F. Marton, D. Hounsell, and N. Entwistle. 2nd ed. Edinburgh: Scottish Academic Press.
Entwistle, N., and P. Ramsden. 1983. Understanding student learning. London: Croom Helm.
Entwistle, N., and H. Tait. 1990. Approaches to learning, evaluations of teaching and preferences for contrasting academic environments. Higher Education 19, no. 2: 169–194.
Frisbie, D.A. 1979. Equivalence of questionnaire items with varying response formats. Journal of Educational Measurement 16, no. 1: 43–48.
Gatto, J.T. 2002. Dumbing us down: the hidden curriculum of compulsory education. 2nd ed. British Columbia: New Society Publishers.
Groves, R., F.J. Fowler, M. Couper, J. Lepkowski, E. Singer, and R. Tourangeau. 2004. Survey methodology. New York: Wiley.
Guttman, L. 1950. The basis for scalogram analysis. In Measurement and prediction, ed. S.A. Stouffer. Studies in Social Psychology in World War II, Vol. 4. New York: Wiley.
Jackson, P.W. 1968. Life in classrooms. New York: Holt, Rinehart & Winston.
Kellaghan, T., and D.L. Stufflebeam, eds. 2003. International handbook of educational evaluation (Part 1). Vol. 9. Dordrecht: Kluwer.
Kirkpatrick, D.L. 1994. Evaluating training programs: the four levels. San Francisco: Berrett-Koehler.
Likert, R. 1932. A technique for the measurement of attitudes. Archives of Psychology 140: 5–53.
Lynch, K. 1989. The hidden curriculum: reproduction in education, a reappraisal. New York: Falmer Press.
Margolis, E., ed. 2001. The hidden curriculum in higher education. New York: Routledge.
Marsh, H., and L. Roche. 1994. The use of students' evaluations of university teaching to improve teaching effectiveness. Canberra: AGPS.
Patton, M.Q. 1997. Utilization-focused evaluation. Thousand Oaks, CA: Sage Publications.
Porter, A.C., and J.L. Smithson. 2001. Are content standards being implemented in the classroom? A methodology and some tentative answers. In From the capitol to the classroom: standards-based reform in the states, ed. S.H. Fuhrman. Chicago, IL: National Society for the Study of Education and University of Chicago Press.
Ramsden, P. 1991. A performance indicator of teaching quality in higher education: the course experience questionnaire. Studies in Higher Education 16, no. 2: 129–150.
Reynolds, N., A. Diamantopoulos, and B. Schlegelmilch. 1993. Pretesting in questionnaire design: a review of the literature and suggestions for further research. Journal of the Market Research Society 35, no. 2: 171–182.
Rockwood, T.H., R.L. Sangster, and D.A. Dillman. 1997. The effect of response categories on questionnaire answers: context and mode effects. Sociological Methods and Research 26, no. 1: 118–140.
Scriven, M. 2003. Evaluation theory and metatheory. In International handbook of educational evaluation, eds. T. Kellaghan and D.L. Stufflebeam, Vol. 9, 15–30. Dordrecht: Kluwer.
Smith, C.D. 2003. Institutional course evaluation (iCEVAL). Report to DVC Academic (Friday, 27 June 2003). Brisbane: University of Queensland.
Smith, C.D., and D.M. Bath. 2006. The role of the learning community in the development of discipline knowledge and generic graduate outcomes. Higher Education 51: 259–286.
Smith, E.R., and R.W. Tyler. 1942. Appraising and recording student progress. Vol. III. New York: Harper & Brothers.
Snyder, B.R. 1970. The hidden curriculum. Cambridge, MA: MIT Press.
Stake, R.E. 2004. Standards-based and responsive evaluation. Thousand Oaks, CA: Sage Publications.
Stufflebeam, D.L. 2003. The CIPP model for evaluation. In International handbook of educational evaluation, eds. T. Kellaghan and D.L. Stufflebeam, Vol. 9, 31–62. Dordrecht: Kluwer.
University of Queensland. 2007. Institutional course evaluation. http://www.tedi.uq.edu.au/downloads/evaluations/iCEVAL_TEVAL_SmplRpt.pdf.
Wholey, J.S. 1979. Evaluation: promise and performance. Washington, DC: Urban Institute.