EUR. J. ENG. ED., 2002, VOL. 27, NO. 2, 209–218

Evaluating the performance of calculus classes using operational research tools

JOÃO CARLOS C. B. SOARES DE MELLO†*, MARCOS P. E. LINS‡, MARIA HELENA C. SOARES DE MELLO† and ELIANE G. GOMES‡

This paper compares the efficiency of calculus classes. Two kinds of classes are evaluated: the traditional ones and others that use computational methods in teaching. This experiment was performed at Fluminense Federal University, Niterói, Brazil, from 1998 to 2000. The emphasis of this paper is on the quantitative evaluation using two operational research tools: multicriteria decision aid methods (mainly using the MACBETH approach) and data envelopment analysis. The evaluating variables are the level at which students enter the university and the performance of the students after studying calculus.

† Production Engineering Department, Fluminense Federal University, Rua Passo da Pátria, 156, São Domingos, 24210–240, Niterói, RJ, Brazil.
‡ Production Engineering Programme, Federal University of Rio de Janeiro, Cidade Universitária, Ilha do Fundão, Centro de Tecnologia, Bloco F, Sala F-105, 21945–970, Rio de Janeiro, RJ, Brazil.
* To whom correspondence should be addressed. e-mail: jcsmello, mhelenamello@bol.com.br

European Journal of Engineering Education ISSN 0343-3797 print/ISSN 1469-5898 online © 2002 Taylor & Francis Ltd http://www.tandf.co.uk/journals DOI: 10.1080/03043790210129577

1. Introduction

It is accepted that educational evaluation is necessary. It is equally accepted that attempts at evaluation are usually not well received by those being evaluated. This attitude tends to discourage the development of evaluation techniques, since they seem condemned to be of little use. As a consequence, curricula are changed, new teaching methods are tried and many other changes are made without validation, although it is well known that any model has to be validated in order to be useful.

What usually happens, and is called ‘evaluation’, is the collection of qualitative opinions expressed by those involved in the experiment, be they students or lecturers. Sometimes simple, incomplete parameters are used, such as the number of students who pass, the number of students who finish the course and final test marks.

According to Boclin (1999), it is important to create and use methods of evaluation that are solid, independent, quantitative and comparative. In this paper we use the method developed by Soares de Mello et al. (2001), which combines two operational research tools, data envelopment analysis (DEA) (Cooper et al. 2000) and the MACBETH (measuring attractiveness by a categorical based evaluation technique) approach (Bana e Costa and Vansnick 1994), to perform a comparative evaluation of calculus classes, considering those that use traditional ways of teaching and those that make use of computational aid. This experiment was performed at Fluminense Federal University (UFF), Niterói, Brazil, from 1998 to 2000.

2. First calculus at UFF

It is known that, for technological careers, the first calculus course is the one that most marks the student. It is usually the first time students face a new way of understanding mathematics, and the infinitely small is not an easy concept to grasp. Because of these difficulties many students give up studying engineering and some myths are created, the main one being that the students do not have sufficient background knowledge to follow this course. In part this is true, but it is not the only reason.

In the early 1980s, some changes were made based on this paradigm: the number of class hours for this course was increased from 4 to 6 h a week. Afterwards, the system of tests was changed, because people thought that the problem was that the lecturers were not evaluating their classes properly.

Calculus Part One is compulsory for all students who enter the UFF engineering course. Linear algebra is among the disciplines of the first period, and both Calculus Part One and linear algebra are required for the student to go on to Calculus Part Two.

In 1994, there was a profound change in the curriculum of the UFF engineering courses. Against all tendencies, the number of hours was reduced to 4 h a week. The idea was that the major problem was the number of new concepts students had to learn in such a short time, so there was a noticeable reduction in the topics to be learnt. As usual, no quantitative evaluation was made but, judging by the number of students who did not give up in the first semester, the change seems to have had some success. It is important to remember that none of these experiments had any objective, quantitative or comparative evaluation.

Finally, there was a new experiment in 1998: some classes had 6 h a week and used a methodology that relied on computer software. This paper is the first attempt to evaluate this last experiment.

3. Calculus classes with computer aid

The experiment began in the first semester of 1998 with two of the nine classes of that discipline. It continued until the second semester of 1999, and then only one class used the method until the second semester of 2000, when the project was interrupted.

There were arguments for and against the experiment, and for its extension to all classes or simply its termination. Those who argued for it said that the possibility of seeing graphs being constructed, the interactivity, or the simple fact of using the computer as a modern tool makes calculus more attractive to students. Those who argued against it said that it is exactly the use of a new tool that creates a new hurdle for the student to overcome. A third group thought that the use of the computer as a tool is very important, although not in the first semester; their proposal was to perform the experiments in the third period, in the classes of differential equations.

This debate is never-ending. It resembles the discussion about the use of calculators: when should children begin to use them?


4. Quantitative evaluation

In order to avoid passionate arguments, it is important to make quantitative evaluations in education. Our aim is to make a quantitative analysis of a set of classes in the same discipline, with different students, different means, different numbers of hours, and so forth.

Since educational resources are used to improve the knowledge of students, classes can be seen as productive units to be evaluated, where resources (educational means and the previous knowledge of students, i.e. the level of knowledge they had at the end of secondary school) are used to produce outputs (the final knowledge of students).

The concept of classes as productive units suggests the use of DEA, a tool for efficiency evaluation. Meanwhile, the need to combine multiple criteria and to express experts’ value judgements about the relevance of each criterion requires multicriteria decision aid methods. In this paper we use the MACBETH multicriteria approach. This technique requires the criteria to be non-redundant, exhaustive and mutually coherent (Roy and Bouyssou 1993). The criteria deal with all the facets of the problem, taking account of the model’s limitations.

4.1. Data envelopment analysis models

DEA, developed by Charnes et al. (1978), is a tool that measures the relative efficiency of each productive unit, usually called a decision-making unit (DMU), taking into consideration the means at its disposal (inputs) and the results achieved (outputs). The aim of DEA is to compare a number of DMUs that perform similar tasks but use different quantities of inputs and achieve different outputs. Besides identifying efficient DMUs, DEA models allow inefficiencies to be measured and located, and they estimate a piecewise linear production function that provides a benchmark for the DMUs.

The DEA model used is the CCR model (Charnes et al. 1978), which considers constant returns to scale, i.e. assumes proportionality between inputs and outputs in the efficient DMUs. For each DMU k, the CCR model solves the mathematical programming problem presented in equation (1), in which $ef_k$ is the efficiency of DMU k; $x_{ik}$ is input i of unit k, i = 1, ..., n; $y_{jk}$ is output j of unit k, j = 1, ..., m; and $u_{jk}$ and $v_{ik}$ are the weights that DMU k assigns to outputs and inputs. The constraint is imposed for every DMU c in the set, each evaluated with the weights of DMU k.

$$
\begin{aligned}
\max\; ef_k = {} & \frac{\sum_{j=1}^{m} u_{jk}\, y_{jk}}{\sum_{i=1}^{n} v_{ik}\, x_{ik}} \\
\text{subject to}\quad & \frac{\sum_{j=1}^{m} u_{jk}\, y_{jc}}{\sum_{i=1}^{n} v_{ik}\, x_{ic}} \le 1 \quad \forall c, \\
& u_{jk},\, v_{ik} \ge 0 .
\end{aligned}
\tag{1}
$$
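For computation, the fractional program in equation (1) is usually rewritten as a linear program. A sketch of the standard input-oriented linearization is given below, assuming the weighted input of DMU k is normalized to one. The index c, which runs over all DMUs being compared, is introduced here for clarity and is not part of the original notation.

```latex
\begin{aligned}
\max_{u,v}\quad & ef_k = \sum_{j=1}^{m} u_{jk}\, y_{jk} \\
\text{subject to}\quad & \sum_{i=1}^{n} v_{ik}\, x_{ik} = 1, \\
& \sum_{j=1}^{m} u_{jk}\, y_{jc} - \sum_{i=1}^{n} v_{ik}\, x_{ic} \le 0 \quad \text{for every DMU } c, \\
& u_{jk} \ge 0,\; v_{ik} \ge 0 .
\end{aligned}
```

One such linear program is solved for each DMU k, so a study with 12 classes requires 12 runs.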

This model does not allow any value judgement about the relevance of inputs and outputs to be made. The weights $v_{ik}$ and $u_{jk}$ obtained from equation (1) express the relevance of each one. This characteristic makes DEA an extremely objective tool, but with some drawbacks: some weights can take unrealistic values, in some cases zero, meaning that some criteria are not considered for some units. The weights are calculated so as to be as favourable as possible to each unit, subject to the mathematical constraints, which leads to a large number of 100% efficient DMUs and consequently to a poor ranking.

To improve the ranking, lower and upper bounds must be established for each weight. These bounds are obtained from the specialists’ value judgements about the criteria and are introduced in equation (1) as new constraints. Usually the value judgements are qualitative and must be quantified before they can be used as constraints in the performance evaluation. We use the MACBETH multicriteria method (Bana e Costa and Vansnick 1994, 1997, 1999) to quantify these qualitative judgements.

5. Evaluating students

5.1. The selection of new students for UFF

One way of assessing students’ level of knowledge before they join the Calculus Part One course is to look at their performance in the vestibular, the Brazilian university entrance exam. The vestibular is conducted in two phases. The first consists of multiple choice tests covering all the secondary school disciplines. The second is a series of written exams whose subjects depend on the chosen university course. For engineering the subjects are mathematics, physics and a Portuguese composition test.

A student is eliminated if he/she gets a zero mark in any discipline, or does not get at least 50% of the multiple choice questions right. The second phase is used only to rank candidates, with no need for them to achieve any given mark. In this case the term ‘approval’ cannot be used, as there is no threshold for new students to be admitted.

5.2. Evaluating students in UFF calculus classes

The minimum mark to pass in any discipline is 6.0 (on a scale from 0.0 to 10.0). The same class contains students from different courses (engineering, computer science, physics, mathematics), new students (freshmen) and students who are repeating the discipline.

6. Selecting the criteria

As stated previously, there are three broad classes of criteria: the quality of the student who joins the course, the quality of the student who finishes it and the quantity of resources used for that purpose.

To quantify the resources, the material and physical means used, the lecturer’s salary and the number of students cared for should all be measured. Some of these are difficult to ascertain or are not available, or involve so many variables that the study would never end. So, the number of hours in each of the Calculus Part One classes will be used instead. This is not a serious restriction, as the classes that used more resources were also those that logged more hours. The lecturer’s salary is not usually taken into account in academic decision-making, especially in public universities such as UFF, so it will not be used here either.

On the other hand, it is important to have the number of students in each class. This number is not equal to the number of freshmen, as there are different kinds of students in each class. The higher the number of students, the greater the dilution of the resources and the less attention the lecturer can give to each freshman. So the resources will be taken per capita: with $H_j$ the number of hours per week of class j and $N_j$ the total number of students in this class, $R_j = H_j / N_j$ will be used as a parameter to quantify the material resources.

The decision to use only freshmen was based on the assumption that their level of knowledge and their ‘quality’ are closely related to the marks they gained in the university entrance exam. As the study concerns calculus classes, it is reasonable to take only the marks the students obtained in mathematics. So, another criterion will be the marks achieved in mathematics in both phases of the university entrance exam. As the marks refer not to a single student but to the class, the arithmetic average will be used as a measure of central tendency, representative of the set of marks. The gauges for measuring the quality of the incoming students are therefore as follows:
• $MD_j$: average of the marks the students of class j achieved in the written exam of the university entrance exam;
• $MM_j$: average of the marks the students of class j achieved in the multiple choice test of the university entrance exam.

A fact that must not be overlooked is that using marks from different years and, consequently, from different exams and tests will lead to distortions. It is usually said that the tests and exams of the university entrance exam at UFF, especially in mathematics, have been at a similar level since 1995. Even so, the results may be affected, as the students sit exams or tests in other disciplines on the same day, with different degrees of difficulty.

Finally, there is the ‘quality’ of the students who complete the course. One way of measuring this would be to use the marks the students obtained in that discipline. This measure could produce distortions, as each lecturer has his or her own criteria for evaluating students. In order to minimize this distortion, the marks the students obtained in the following discipline (Calculus Part Two) will also be used.
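The three input criteria defined in this section can be computed directly from class records. The following is a minimal sketch under stated assumptions: the function name `class_inputs`, the record layout and the example figures are hypothetical and are not taken from the UFF data, and no normalization is applied.

```python
from statistics import mean


def class_inputs(hours_per_week, n_students, freshmen_marks):
    """Return the input criteria R, MM and MD for one class.

    freshmen_marks holds one (multiple_choice_mark, written_mark) pair per
    freshman. Students who are not freshmen count only towards n_students.
    """
    r = hours_per_week / n_students                 # R_j = H_j / N_j
    mm = mean(mc for mc, _ in freshmen_marks)       # MM_j: multiple choice average
    md = mean(wr for _, wr in freshmen_marks)       # MD_j: written exam average
    return {"R": r, "MM": mm, "MD": md}


# Hypothetical class: 6 h per week, 40 students, three freshmen listed.
example = class_inputs(6, 40, [(7.0, 6.0), (8.0, 5.0), (6.0, 7.0)])
print(example)
```

Only freshmen contribute to the two averages, while every enrolled student dilutes the hours, which mirrors the distinction the text draws between the number of students and the number of freshmen.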
To minimize the drawbacks of the arithmetic average, the following parameters will be used:
• $A1_j$: percentage of freshmen who passed class j of Calculus Part One;
• $M1_j$: average mark of the freshmen in class j of Calculus Part One;
• $A2_j$: percentage of students from class j of Calculus Part One who passed Calculus Part Two in the following semester;
• $M2_j$: average mark in Calculus Part Two of the students who passed class j of Calculus Part One.

7. Data

The database available includes the classes in 1997 and the first half of 1998. Although the classes under experiment began only in 1998, the 1997 classes have been included in order to increase the possibility of comparison. One letter and three digits identify each class: the first digit gives the semester and the other two the year (e.g. J198 is class J in the first semester of 1998). The data were normalized and are shown in table 1. The classes under experiment are J198 and K198.

According to Lins and Moreira (1999), there are techniques to determine which variables must be used. An alternative approach is to consider that all are relevant and eliminate only those that have a high correlation coefficient. The correlations between pairs of inputs are shown in table 2, and those for the outputs in table 3. When the correlation is greater than 0.90, one of the two variables is not used in the model (e.g. M1 in table 3).
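The screening rule described above can be sketched in a few lines. This is an illustration only: the output columns are copied from table 1 in class order A197 to K198, the Pearson coefficient is assumed to be the correlation measure used, and the helper names `pearson` and `redundant_pairs` are introduced here.

```python
from math import sqrt

# Output columns of table 1, classes in the order A197 ... K198.
OUTPUTS = {
    "A1": [0.620, 0.800, 0.420, 0.540, 0.180, 0.430,
           0.620, 0.500, 0.800, 0.680, 0.770, 0.700],
    "M1": [0.504, 0.572, 0.384, 0.522, 0.303, 0.413,
           0.487, 0.416, 0.629, 0.534, 0.617, 0.618],
    "A2": [0.570, 0.640, 0.280, 0.490, 0.110, 0.300,
           0.380, 0.180, 0.430, 0.630, 0.580, 0.530],
    "M2": [0.706, 0.616, 0.658, 0.691, 0.652, 0.586,
           0.572, 0.515, 0.825, 0.726, 0.645, 0.734],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(columns, threshold=0.90):
    """List the pairs of variables whose correlation exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            r = pearson(columns[p], columns[q])
            if r > threshold:
                flagged.append((p, q, r))
    return flagged


pairs = redundant_pairs(OUTPUTS)
for p, q, r in pairs:
    print(f"{p} and {q} are candidates for elimination (r = {r:.2f})")
```

The rule only says that one variable of a flagged pair is dropped; which one to keep remains a modelling decision.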


Classes (DMUs)   A1 {O}   M1 {O}   A2 {O}   M2 {O}   R {I}    MM {I}   MD {I}
A197             0.620    0.504    0.570    0.706    0.476    0.715    0.647
B197             0.800    0.572    0.640    0.616    0.455    0.719    0.678
D197             0.420    0.384    0.280    0.658    0.556    0.718    0.630
E197             0.540    0.522    0.490    0.691    0.571    0.713    0.707
A297             0.180    0.303    0.110    0.652    0.444    0.657    0.502
B297             0.430    0.413    0.300    0.586    0.500    0.660    0.522
D297             0.620    0.487    0.380    0.572    0.588    0.615    0.517
E297             0.500    0.416    0.180    0.515    0.625    0.643    0.486
A198             0.800    0.629    0.430    0.825    0.370    0.716    0.533
D198             0.680    0.534    0.630    0.726    0.417    0.677    0.511
J198             0.770    0.617    0.580    0.645    0.698    0.704    0.537
K198             0.700    0.618    0.530    0.734    1.000    0.693    0.459

Table 1. Data used for evaluation. Inputs are marked {I} and outputs {O}.

      R        MM
MM    –0.01    1.00
MD    –0.34    0.64

Table 2. Correlation between pairs of inputs.

      A1      M1      A2
M1    0.95    1.00    1.00
A2    0.83    0.82    1.00
M2    0.37    0.52    0.41

Table 3. Correlations between outputs.

The variables MD and A2 (input and output, respectively) have a surprisingly low correlation (0.35). Exploring the data makes it evident that those two variables would be highly correlated if the classes were split into two groups, one for 1997 and the other for 1998. This and the data in table 1 suggest that the marks of the university entrance exam of 1998 were lower than expected.

8. Measuring the efficiency of calculus classes

8.1. DEA CCR basic model

Using the inputs and outputs referred to above, a basic DEA CCR model was built with three inputs (R, MM and MD) and three outputs (A1, A2 and M2). The efficiencies were calculated and the results are shown in table 4. As the marks in the 1998 exam were too low, all the classes in this year had an efficiency of 100%. One class in 1997 had an efficiency of 100% as well. Among 12 classes, the model considered five as efficient, which leaves too little discrimination for any evaluation to be made.

DMU     Efficiency (%)
A197     90.96
B197    100.00
D197     79.54
E197     87.51
A297     86.13
B297     77.06
D297     90.34
E297     69.60
A198    100.00
D198    100.00
J198    100.00
K198    100.00

Table 4. DEA CCR efficiencies.

8.2. DEA CCR model with weight restrictions

There are more elaborate DEA models that improve the ranking of the units. In this paper we use the weight restrictions technique (Allen et al. 1997) to improve the discrimination among efficient DMUs. In order to reduce arbitrariness in choosing the weight restrictions, the MACBETH multicriteria methodology (Bana e Costa and Vansnick 1994, 1997, 1999) is used.

In the MACBETH approach the decision-maker must state which of two alternatives is the more attractive and how much more attractive it is, on a semantic scale that corresponds to an ordinal scale (0 indifferent, 1 very weak, 2 weak, 3 moderate, 4 strong, 5 very strong and 6 extreme). The cardinal (transitivity) and semantic (relationship between differences) coherence of the judgements is then analysed and, in the case of incoherence, suggestions to resolve it are given. A weight scale is suggested via linear programming, together with the interval in which each weight can vary without the problem becoming inconsistent. These intervals are the ones used in the DEA model.

The preferences about the importance of the criteria were as follows:
• All the academic inputs (MM and MD) are more ‘important’ than the resource input (R).
• Marks in written exams are more ‘important’ than those in multiple choice tests.
• All the variables related to student evaluation in Calculus Part Two are more ‘important’ than the equivalent ones related to Calculus Part One. This minimizes the distortion caused by differences among lecturers when they evaluate students.
• The pass rate of each class is more ‘important’ than the mean mark of its students. This minimizes distortions caused by central tendency measures.
• All the weights must be strictly positive. This means that all the criteria (inputs and outputs) will be taken into account by the mathematical model.


One criterion is more ‘important’ than another when an alternative with a ‘good’ evaluation in the first criterion and a ‘bad’ one in the second is more attractive than another with a ‘bad’ evaluation in the first criterion and a ‘good’ one in the second. Table 5 shows the value judgements described above on the MACBETH scale, for inputs and outputs. Table 6 shows the transformation of this semantic scale into an ordinal one, with the corresponding thresholds. The Art (artificial) criterion in table 5 ensures that no real criterion will have a zero weight. It corresponds to an alternative less attractive than all the others in all criteria. The thresholds in table 6 are not the original MACBETH ones; they have been changed (using the decision-maker sensitivity analysis in the MACBETH software) to make all the linear programs in the DEA model feasible.

8.3. Results

Using these weight restrictions, the efficiency was recalculated. The results are shown in table 7. Only one class was efficient. The classes under experiment lost efficiency once all the restrictions were imposed so that no criterion was left out. The classes of the second semester of 1997 were also inefficient. The probable reason is that students who enter in the second semester have problems because they spend 6 months without classes.

It is also remarkable that the classes under experiment were less efficient than the others in the same semester. Their result was inferior even to that of some classes of 1997, despite the advantage the 1998 classes derive from their lower values of the variable MD.

Inputs    MD    MM    R     Art
MD        0     1     5     6
MM              0     5     6
R                     0     6
Art                         0

Outputs   A2    M2    A1    Art
A2        0     3     4     6
M2              0     1     6
A1                    0     6
Art                         0

Table 5. Value judgements.

Criteria   Minimum threshold   Maximum threshold
A2         16.70               54.47
M2         12.26               33.32
A1         11.12               33.27
MD         16.69               43.82
MM         16.68               39.98
R          10.01               33.28

Table 6. MACBETH results.
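One way such thresholds could enter the CCR model is as bounds on the share of each weight within its group (outputs or inputs). The sketch below is an assumption made for illustration: the text does not state the exact form of the constraints, and the symbols $\alpha_j$ and $\beta_j$ (the minimum and maximum thresholds of table 6 divided by 100) are introduced here. It also assumes that the output weights of DMU k sum to a positive number.

```latex
\alpha_j \le \frac{u_{jk}}{\sum_{l=1}^{m} u_{lk}} \le \beta_j
\quad\Longleftrightarrow\quad
u_{jk} - \alpha_j \sum_{l=1}^{m} u_{lk} \ge 0
\;\;\text{and}\;\;
u_{jk} - \beta_j \sum_{l=1}^{m} u_{lk} \le 0 .
```

Both inequalities on the right are linear in the weights, so under this assumption they could be appended to each linear program without changing the solution method. The input weights $v_{ik}$ would be bounded in the same way.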

Classes   Basic DEA CCR model   DEA CCR model with weight restrictions
A197       90.96                 81.27
B197      100.00                 90.53
D197       79.54                 50.99
E197       87.51                 67.63
A297       86.13                 27.72
B297       77.06                 59.08
D297       90.34                 72.60
E297       69.60                 46.73
A198      100.00                 93.56
D198      100.00                100.00
J198      100.00                 86.58
K198      100.00                 82.47

Table 7. Efficiencies (%) in both DEA models.
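The gain in discrimination can be checked directly from the figures in table 7. The short sketch below copies the two columns and lists the efficient classes and the ranking under each model; the helper names `efficient` and `ranking` are introduced here for illustration.

```python
CLASSES = ["A197", "B197", "D197", "E197", "A297", "B297",
           "D297", "E297", "A198", "D198", "J198", "K198"]

# Efficiencies (%) copied from table 7.
BASIC = [90.96, 100.00, 79.54, 87.51, 86.13, 77.06,
         90.34, 69.60, 100.00, 100.00, 100.00, 100.00]
RESTRICTED = [81.27, 90.53, 50.99, 67.63, 27.72, 59.08,
              72.60, 46.73, 93.56, 100.00, 86.58, 82.47]


def efficient(classes, scores, tol=1e-9):
    """Classes whose efficiency reaches 100%."""
    return [c for c, s in zip(classes, scores) if s >= 100.0 - tol]


def ranking(classes, scores):
    """Classes ordered from the most to the least efficient."""
    ordered = sorted(zip(classes, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ordered]


print("Efficient, basic model:      ", efficient(CLASSES, BASIC))
print("Efficient, restricted model: ", efficient(CLASSES, RESTRICTED))
print("Ranking, restricted model:   ", ranking(CLASSES, RESTRICTED))
```

In the restricted ranking the two classes under experiment, J198 and K198, fall below A198 and D198 from the same semester, which is the comparison the text draws.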

9. Conclusions

The use of quantitative methods for evaluation proved to be an advance over evaluation based only on opinions. Although such methods ought to be used, some care must be taken. Every model has its own limitations: models represent reality; they are not reality. Knowing when to use a model is also an important consideration in being able to validate results.

As the marks obtained in the 1998 university entrance exam were lower than those in 1997, the basic DEA CCR model gave better results for the 1998 classes, because it looked as if lower level students got better results at the end of one semester. This was false. In such a case, some other parameter, such as the standard deviation, should be used to allow the marks to be compared.

It was fortunate that classes under experiment and classes outside it ran simultaneously, so that their students could be compared; this is why DEA could be used.

Finally, the classes under experiment do not appear to be efficient, especially when their results are compared with those of Calculus Part Two. Despite the small universe of only two classes under experiment, it looks as if the objective of this study was not achieved. In fact, the number of classes with computational aid was reduced to only one in 1999 and the experiment ended in 2001.

References

ALLEN, R., ATHANASSOPOULOS, A., DYSON, R. G. and THANASSOULIS, E., 1997, Weight restrictions and value judgements in DEA: evolution, development and future directions. Annals of Operations Research, 73, 13–34.
BANA E COSTA, C. A. and VANSNICK, J. C., 1994, MACBETH: an interactive path towards the construction of a cardinal value function. International Transactions in Operations Research, 1, 489–500.
BANA E COSTA, C. A. and VANSNICK, J. C., 1997, Applications of the MACBETH approach in the framework of an additive aggregation model. Journal of Multi-criteria Decision Analysis, 6, 107–114.
BANA E COSTA, C. A. and VANSNICK, J. C., 1999, Cardinal value measurement with MACBETH. Research Paper, 13, CEG-IST, Lisbon.
BOCLIN, R., 1999, Indicadores de desempenho: novas estratégias da educação superior. Ensaio: Avaliação e Políticas Públicas em Educação, 7, 299–308.
CHARNES, A., COOPER, W. W. and RHODES, E., 1978, Measuring the efficiency of decision-making units. European Journal of Operational Research, 2, 429–444.
COOPER, W. W., SEIFORD, L. M. and TONE, K., 2000, Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software (Boston, MA: Kluwer Academic).
LINS, M. P. E. and MOREIRA, M. C. B., 1999, Método I–O para seleção de variáveis em modelos de Análise de Envoltória de Dados. Pesquisa Operacional, 19, 39–49.
ROY, B. and BOUYSSOU, D., 1993, Aide Multicritère à la Décision: Méthode et cas (Paris: Economica).
SOARES DE MELLO, J. C. C. B., LETA, F. R., FERNANDES, A. J. S., VAZ, M. R., SOARES DE MELLO, M. H. C. and BARBEJAT, M. E. R. P., 2001, Avaliação qualitativa e quantitativa: uma metodologia de integração. Ensaio: Avaliação e Políticas Públicas em Educação, 9, 237–251.

About the authors

João Carlos C. B. Soares de Mello graduated in Mechanical Engineering with a Master’s degree in mathematics. He has been head of the Applied Mathematics Department at Fluminense Federal University (UFF) and has been responsible for the operational matters in the entrance exam at the same university for 5 years.

Marcos P. E. Lins graduated in Electrical Engineering with a Doctorate degree in operational research. He is currently Director of the Brazilian Operational Research Society and Vice Co-ordinator of the Production Engineering Programme at Federal University of Rio de Janeiro.

Maria Helena C. Soares de Mello graduated in Mechanical Engineering and teaches calculus and operational research at UFF. She currently co-ordinates engineering courses at UFF.

Eliane G. Gomes graduated in Chemical Engineering with a Master’s degree in operational research. She is currently developing her doctoral work in the operational research field.