Methodology for the Design of a Student Pattern Recognition Tool to Facilitate the Teaching-Learning Process through Knowledge Data Discovery (Big Data)

Amelec Viloria1, Jenny-Paola Lis-Gutiérrez2, Mercedes Gaitán-Angulo2, Abel Ramiro Meza Godoy1, Gloria Cecilia Moreno1, Sadhana J. Kamatkar3

1 Universidad de la Costa, Barranquilla, Colombia. {aviloria7, ameza24, gmoreno}@cuc.edu.co
2 Fundación Universitaria Konrad Lorenz, Bogotá, Colombia. {jenny.lis, mercedes.gaitana}@konradlorenz.edu.co
3 University of Mumbai, Mumbai, India. [email protected]

Abstract. Imagine a platform that lets a teacher identify patterns in the learning styles of the students enrolled in a course and, from those patterns, learn which pedagogical techniques are most likely to succeed in the classroom. What if students could use the same tool to identify the teacher who best suits their learning style? And what if the tool improved its predictions of academic performance as time passes? Such a tool would require software specialized in handling large volumes of data. This research and development work addresses these questions by proposing a methodology for designing a student pattern recognition tool that facilitates the teaching-learning process through Knowledge Data Discovery (Big Data). After an extensive document review and validation by experts in various areas of knowledge, the resulting methodology was structured in four phases: identification of patterns; analysis of the teaching-learning process; Knowledge Data Discovery; and development, implementation and validation of software.

Keywords: Identification of patterns, Teaching-learning process, Knowledge Data Discovery and Development.

1 Introduction

The educational sector faces a great challenge today in forecasting the trajectories that students follow as they interact with teaching-learning systems. Educational institutions constantly generate large amounts of information and want to understand what this flow of information reveals: for example, what leads students to take an interest in fields such as mathematics, physics, chemistry, computer science, the humanities or the arts, given that the educational strategies applied are varied; which entry and exit profiles best suit their educational model; and where problems of dropout or loss of interest may arise [1, 6]. This setting has given rise to the field known as educational data mining (EDM) [6-9].

In the education sector, data mining techniques are used to understand student behavior. EDM emerges as a paradigm oriented to the generalization of models, tasks, methods and algorithms for exploring data that come from an educational context. It also seeks to find and analyze patterns that characterize student behavior, based on the achievements, evaluations and mastery of content that students show in the teaching-learning mechanisms offered by public and private institutions. The aim is to generate educational models that promote new techniques or tools to analyze and increase student participation in teaching-learning systems [7,12]. Examples include recommending activities that offer new learning experiences, and issuing warnings or predictions of student performance to improve the effectiveness of a course or to promote group work [6, 12].

All these modalities generate information directly and indirectly, through the student's interactions with peers, with the teacher and with the technological tools available for receiving instruction. The data come from several sources, mainly the classrooms where teacher and student exchange information and apply learning strategies supported by Information and Communication Technologies (ICT) [1].
Therefore, from a general point of view, EDM involves evaluating a curricular program or learning unit that is meant to influence the student: the instructor or researcher acquires knowledge and converts it into learning, and the student, considered the end user, takes ownership of it and carries it into the context of daily life [10]. It follows that EDM has two different orientations:

Oriented to instructors and researchers, with the main objective of helping educators improve the functioning and performance of teaching-learning systems, using knowledge about the flow of information derived from students and predictive models that can be assessed qualitatively and quantitatively.

Oriented to students, with the main objective of helping them interact with teaching-learning systems by enriching their experience, introducing new tools that ease their study of the various educational topics, and suggesting activities in the course according to their learning progress.

On this basis, the present research takes the Knowledge Data Discovery methodology (an EDM method) as the foundation for a software tool that allows teachers and students of university institutions to determine which pedagogical strategies should be used to improve the teaching-learning process, given the characteristics of the students. The tool will work by identifying patterns in variables associated with the university's students (socioeconomic, demographic, behavioral and motivational, among others) and then associating these patterns with the teaching methods used by teachers. The application must be able to optimize the teaching-learning process for each individual, based on the results the student obtains over time at the institution. As a first approximation, the tool will be based on data from university admission exams. As noted in [8]: "the large amount of data produced by these evaluations is a valuable input for the development of research on the quality of education that allows generating knowledge about aspects relevant to the educational agenda, and that can contribute to the design of public policies and educational practices".

3 Methodology

A documentary review and expert interviews were carried out to design each phase of the proposed methodology. The main foundations were the designs already proposed in [13], [14], [15], [16] and [17]. The methodology was validated using the Delphi method and the expert competence index.

4 Results

4.1 State of the art

There is an extensive literature in the areas of Academic Analytics (AA), Educational Data Mining (EDM) and Learning Analytics (LA), which has been synthesized by [1,5]. The main elements of these areas can be summarized as follows. AA corresponds to the application of Business Intelligence tools in higher education institutions to support administrative decision-making [7]. In general, AA focuses on the political and economic aspects of education [9]. EDM, on the other hand, focuses on the application of computational and algorithmic techniques, such as classification, clustering and rule detection, to detect patterns in large collections of data that would otherwise be difficult or impossible to analyze [5], with the purpose of supporting teachers and students in analyzing the learning process. In other words, EDM focuses on technical aspects, oriented particularly to virtual environments [10]. Finally, LA focuses on the measurement, collection, analysis and reporting of data about students and their educational context, in order to understand and optimize learning and the environments in which it occurs [8]. Although EDM and LA focus on the same domain, and their data, processes and objectives are quite similar, LA makes use of methods from the social sciences, such as social network analysis, which makes it possible to examine and promote collaborative and cooperative connections between participants [11].

In general, an EDM or LA project is composed of the following tasks: (a) data collection and pre-processing; (b) analysis and action; and (c) post-processing. Collection and pre-processing refers to the synthesis of information from different sources and systems. During this process, the data can be cleaned, integrated, reduced and transformed into an appropriate format. Analysis and action refers to the application of the methods themselves to discover and visualize relevant patterns, make predictions, schedule interventions and modify types of evaluation, among others. Finally, post-processing involves refining the data, determining new variables, or selecting new methods of analysis for a subsequent study [10,13].

At the same time, an EDM or LA project should try to answer the following questions:

• What type of data does the system collect, manage and use for analysis? Data can come from centralized systems, such as an LMS, or distributed ones, such as laboratory equipment [14]. It is even possible to obtain physiological data such as eye movement, blink rate, breathing and pulse, which reflect levels of attention, tension and fatigue, or the incidence of cognitive overload [15]. The challenge is how to aggregate and integrate data from multiple, heterogeneous sources, usually available in different formats, to create a set that reflects the student's activities [15,18].

• Towards whom is the analysis directed? In other words, it is necessary to define the consumer of the results, who may be students, teachers, administrators, researchers or designers, among others, each with different perspectives, objectives and expectations.

• Why is a system required to collect and analyze data?
In other words, what are the objectives of the analysis? These can be: (a) monitoring and analyzing the student's activities so that the teacher or the institution can make decisions; (b) predicting future student outcomes based on their activities and achievements; (c) helping students identify areas for improvement in specific tasks; (d) improving the evaluation process; (e) adapting the contents to the individual needs of each student; or (f) promoting reflection among students and teachers about their practice.

• How does the system analyze the data collected? In other words, which tools are to be used: statistics, visualization, data mining or social network analysis, among others.

Despite great advances in the area, the functionality available to synthesize, analyze, report and visualize these data is relatively basic [14], partly because learning activities take place in different places and contexts. In some cases, these activities cannot be recorded at all because they take place offline. In other cases, the data may be recorded at different sites whose standards, owners and access levels differ [15]. Despite these limitations, multiple studies have shown the advantages of collecting and leveraging the information already present in an LMS, in particular to classify students according to their learning style and results, identify abnormal behavior patterns, and adapt the sequence of contents accordingly [16]. For example, [17,24] describe systems capable of capturing the learning style and skills of each student and relating the results obtained to the type, sequence and difficulty of the contents. Apart from customizing the contents for each student, these systems discover patterns of interest. For example, [23] finds that students who spend little time completing exercises, participate little in forums and have low grades in short exams go on to fail the subject. On the other hand, [24] finds that the effect of traditional reading is minimal, as long as students regularly access the content and contribute frequently to the proposed activities.

Among local experiences (Atlántico, Colombia) in creating tools that support the teaching-learning process, we can mention the 2013 work of Adel Alfonso Mendoza Mendoza and Roberto José Herrera Acosta [25], "Proposal for the Prediction of the Academic Performance of the Students of the Universidad del Atlántico, Based on the Application of Discriminant Analysis". It proposes that the Universidad del Atlántico adopt discriminant analysis, a multivariate statistical model that seeks the linear combination of independent variables that best differentiates (discriminates between) groups. Once found, that combination (the discriminant function) can be used to classify new cases. In this case, the goal is to determine which factors discriminate between the group of students who successfully complete an academic period (or a subject) and the group of students who fall short of the required achievements and do not pass the academic period.

4.2 Methodological proposal

The proposal builds on successful methodologies for data and text mining applied in [15, 16, 25]. The methodology is divided into four stages. The first, or preliminary, stage identifies the academic space for which the software will be designed. During this stage, the information needed to catalog and store the data is collected. The second to fourth phases, or simply creation, cover the construction of the software following the five steps of the life cycle of a software engineering project (analysis, design, development, implementation and validation), taking into account three axes: knowledge, didactics and educational materials [23].
In the analysis phase (phase 2), the current and desired states of the academic space are determined in relation to pedagogical and didactic aspects and educational materials. In the design phase (phase 3), the models of knowledge and learning events are constructed. Phase 4 is subdivided into three parts: in the development phase, the software architecture is established; in the implementation phase, the software elements are deployed on a platform; and in the validation phase, technical, pedagogical and communication errors are identified so that the pertinent corrections can be made [25], [26], [27]. The methods belonging to each phase of the project are detailed below.

Phase 1. Identification of patterns: The data for this phase of the project are obtained from university admission tests, among other tests; these constitute the primary source of information. The data commonly include, for example, socioeconomic information about the student and the student's family, information about the school, and the results obtained in the test [19].
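To make the idea of pattern identification on such records concrete, the following is a minimal sketch of grouping students by two numeric fields with a plain k-means routine. It is an illustration only: the paper does not name a clustering algorithm, and the `AdmissionRecord` fields, their scales and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AdmissionRecord:
    # Hypothetical fields; the paper only says records hold socioeconomic
    # information, school information and test results.
    student_id: str
    socioeconomic_index: float  # assumed scale 0-1
    test_score: float           # assumed scale 0-100


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(pt, centroids[c]))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids


records = [
    AdmissionRecord("s1", 0.20, 40), AdmissionRecord("s2", 0.25, 45),
    AdmissionRecord("s3", 0.80, 85), AdmissionRecord("s4", 0.90, 90),
    AdmissionRecord("s5", 0.30, 50), AdmissionRecord("s6", 0.85, 80),
]
# Rescale the score to 0-1 so both features weigh equally in the distance.
points = [(r.socioeconomic_index, r.test_score / 100) for r in records]
labels, centroids = kmeans(points, k=2)
for record, label in zip(records, labels):
    print(record.student_id, "-> group", label)
```

On this toy data the routine separates the three lower-index, lower-score students from the other three. A real study would need a principled choice of features, scaling and number of groups.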

It is recommended to use admission data from at least 3 consecutive years, in order to identify general student behavior patterns at a specific stage for the region where the university is located. It should be noted that the study is not comparative; its purpose is the recognition and validation of patterns. To this end, data mining will be carried out on the open-source PHP and MySQL platform. Large data sets can be stored in the Cloud. The processing will follow these steps [28]:

• Prediction: development of a model that can infer a variable from the combination of available data.
• Grouping: finding data points that group together naturally, separating the complete set into a series of categories.
• Relationship mining: discovery of relationships between variables.
• Discovery through models: a model of a phenomenon, obtained through prediction, grouping or knowledge engineering, is used as a component in a later prediction or relationship mining.
• Data distillation: the data are distilled so that a human can quickly identify or classify their properties.

Phase 2. Analysis of the teaching-learning process: Text mining will be used to extract information that identifies successful methodologies applied in other universities of the region to obtain positive results in the teaching-learning process. The first part of this phase will be the gathering of information, which will include:

• Interviews with teachers from universities and colleges with outstanding results in the teaching-learning process.
• Analysis of teaching platforms, for example Moodle.
• Analysis of documents linked to the educational models of selected universities and colleges.
• Design of experiments related to the teaching-learning process.
• Analysis of studies conducted by state institutions.
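As a rough illustration of the kind of text mining that could be applied to the material gathered above, the sketch below finds terms that recur across several interview transcripts. It is an assumption-laden toy: the paper does not specify a text-mining technique, and the stopword list, the sample transcripts and the function names are invented for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real study would use a fuller one.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "with", "for", "we", "is", "on", "use"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def document_frequency(documents):
    """Count in how many documents each term appears at least once."""
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return df


def recurring_terms(documents, min_docs=2):
    """Terms mentioned in at least `min_docs` documents, sorted alphabetically."""
    df = document_frequency(documents)
    return sorted(term for term, n in df.items() if n >= min_docs)


# Hypothetical interview excerpts.
transcripts = [
    "We use peer tutoring and weekly quizzes in first-year mathematics.",
    "Weekly quizzes with immediate feedback improved attendance.",
    "Project work and peer tutoring for laboratory courses.",
]
print(recurring_terms(transcripts))
# prints ['peer', 'quizzes', 'tutoring', 'weekly']
```

Terms that recur across sources (here "peer tutoring" and "weekly quizzes") are candidates for the successful practices this phase is meant to surface; they would still need expert review before being treated as findings.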
The second part will be text mining, carried out with the KDD software on the open-source PHP and MySQL platform. The large data sets will be stored in the Cloud. The information processing steps are as follows:

I) Identification of source data and data linkage: this component will collect the data gathered during the first part of phase 2. In addition, the data will be linked logically.
II) Loading and cleaning data: this component will load the data into a SQL Server database.
III) Data summary: to speed up queries, the data can be summarized based on the results of phase 1.
IV) Extraction of information: consists of query operations that transform the data in order to improve the extraction and retrieval of information.

V) Interactive visualization: this component interacts with the front-end of the system, which communicates with the query module to retrieve knowledge from the database.

Each component will consist of sub-components and processes. Together, the components form an intelligent framework that gives managers and users the ability to create predictive intelligence by detecting patterns and relationships.

Phase 3. Knowledge Data Discovery: This phase links successful methodologies to student patterns in order to facilitate the teaching-learning process through Knowledge Data Discovery (KDD). It will be applied to the results obtained in phases 1 and 2, adapting the analysis to the patterns of the students of the university under study. The KDD process consists of a series of steps. It begins with the selection, preparation, cleaning and formatting of the data according to the patterns analyzed; this stage is known as pre-processing. The data mining stage follows, whose task is to search for and discover hidden patterns in the databases using a chosen algorithm. The last stage is evaluation, where the validity and reliability of the acquired knowledge are determined; that is, the patterns must be valid and of high impact for the end user. The methodology to be applied will be based on [2, 5]. See the theoretical framework for a more detailed explanation. For reasons of confidentiality, the mathematical models used in the KDD software are not detailed [25].

Phase 4. Development, implementation and validation of software: An initial prototype will be developed using the open-source Hadoop software. The prototype will generate patterns for groups of students of the university under study and associate successful methodologies with them, in order to facilitate the teaching-learning process. This is done in two sub-phases.

Phase 4a. Software architecture design

1. Choice of reference architecture
• Discuss the most appropriate styles and patterns that give the support required to achieve the desired quality attributes.
• Build on reference architectures recognized by both academia and industry.
• Recognize the size of the target application.

2. Assignment of components
The objective is to define the main components that will make up the design. The reference architecture defines the general communication patterns for the components. It also seeks to:
• Identify how the components conform to the patterns.
• Identify the interfaces and services that each component supports, in order to validate the allocation of responsibilities among the components.
• Identify dependencies among them.
• Identify the candidate parts of the architecture to be distributed across several servers.

Phase 4b. Software development and validation

The development includes the following software modules, built with Hadoop:
• Data linkage
• Data loading and cleaning
• Data summary
• Information extraction
• Interactive visualization
• Query module

The prototype must be validated with first-semester students of the university under study, with follow-up over two semesters.
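The way the modules listed for Phase 4b could hand data to one another can be sketched in a few lines. This is only a hedged illustration in plain in-memory Python, not Hadoop and not the authors' implementation; the field names (`student_id`, `stratum`, `grade`), the sample rows and the function names are hypothetical, and the visualization module is left out.

```python
from collections import defaultdict
from statistics import mean


def link(admissions, grades):
    """Data linkage: join the two sources on student_id."""
    by_id = {row["student_id"]: row for row in admissions}
    return [
        {**by_id[g["student_id"]], **g}
        for g in grades
        if g["student_id"] in by_id
    ]


def clean(rows):
    """Loading and cleaning: drop rows whose grade is missing."""
    return [r for r in rows if r.get("grade") is not None]


def summarize(rows, key):
    """Data summary: mean grade for each value of `key`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["grade"])
    return {k: mean(v) for k, v in groups.items()}


def query(summary, minimum):
    """Query module: groups whose mean grade reaches `minimum`."""
    return sorted(k for k, v in summary.items() if v >= minimum)


# Hypothetical sample data.
admissions = [
    {"student_id": "s1", "stratum": 1},
    {"student_id": "s2", "stratum": 1},
    {"student_id": "s3", "stratum": 2},
]
grades = [
    {"student_id": "s1", "grade": 3.0},
    {"student_id": "s2", "grade": 4.0},
    {"student_id": "s3", "grade": None},  # missing value, removed by clean()
    {"student_id": "s3", "grade": 4.5},
    {"student_id": "s9", "grade": 5.0},   # no admission record, dropped by link()
]

summary = summarize(clean(link(admissions, grades)), key="stratum")
print(summary)
print(query(summary, minimum=4.0))
```

Keeping each module as a function with a list-of-records interface makes the stages easy to test in isolation; in a Hadoop deployment the same stages would map onto distributed jobs rather than in-memory lists.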

4.3 Validation of methodological proposal

To achieve the objective of the research, and in accordance with the theoretical postulates of the Delphi method [10], two groups were formed to validate the designed instrument: the coordinating group and the group of experts. The first consists of the members of the research group of the university that leads the project, and the second of 20 experts in big data. Table 1 shows the calculated competence coefficient of the experts, highlighting the 10 whose scores reached a medium or high level.

Table 1. Competency coefficient of experts

| Expert | Knowledge coefficient (Kc) | Coefficient of argumentation (Ka) | Competency coefficient of experts (K) | Assessment |
|---|---|---|---|---|
| 1 | 0.9 | 0.9 | 0.9 | High |
| 2 | 0.8 | 0.7 | 0.75 | Medium |
| 3 | 0.8 | 0.8 | 0.8 | High |
| 4 | 1 | 1 | 1 | Medium |
| 5 | 0.85 | 0.85 | 0.85 | High |
| 6 | 1 | 1 | 1 | High |
| 7 | 1 | 1 | 1 | High |
| 8 | 0.9 | 1 | 0.95 | High |
| 9 | 0.75 | 0.9 | 0.83 | Medium |
| 10 | 0.9 | 0.9 | 0.9 | High |

The criteria used were: functionality, replicability, clarity, consistency, cost-benefit. At the end of the Delphi method, the percentage of consensus among the experts was 90%, which according to [5] [29] is optimal.
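The K column of Table 1 is consistent with the usual expert competence formula K = (Kc + Ka) / 2. The paper does not state the formula, so it is treated here as an assumption; the short check below recomputes K from the Kc and Ka columns, using decimal arithmetic with half-up rounding so that 0.825 rounds to 0.83 as in the table.

```python
from decimal import Decimal, ROUND_HALF_UP


def competence(kc, ka):
    """Assumed formula K = (Kc + Ka) / 2, rounded half-up to two decimals."""
    k = (Decimal(kc) + Decimal(ka)) / 2
    return k.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Kc and Ka columns of Table 1, experts 1 to 10, kept as strings for exactness.
KC = ["0.9", "0.8", "0.8", "1", "0.85", "1", "1", "0.9", "0.75", "0.9"]
KA = ["0.9", "0.7", "0.8", "1", "0.85", "1", "1", "1", "0.9", "0.9"]

K = [competence(kc, ka) for kc, ka in zip(KC, KA)]
for expert, k in enumerate(K, start=1):
    print(expert, k)
```

The recomputed values match the K column for all ten experts. The Assessment column is not reproduced here because the paper does not give the thresholds that separate the medium and high levels.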

5 Conclusions

The methodology makes it possible to obtain:

• An analysis of socioeconomic, demographic, family and gender patterns, among others, of the selected students.
• An analysis of the teaching-learning process for a sample of universities in the region under study.
• An association of student patterns with successful teaching-learning methodologies.
• A prototype of a tool that facilitates the teaching-learning process, based on pattern recognition through big data.

The aim of the tool is for universities to make use of ICT so that, for example, a teacher can access the different student patterns and receive guidance on which teaching-learning methods to apply to those patterns, or on which methodology has the highest probability of success in the teaching-learning process of a course in a specific academic section. In turn, before enrolling in a course, a student can use the application to obtain recommendations on which teacher is best suited to the student's learning style. Over time, the tool will become more accurate for a given student and teacher as the results of their evaluations accumulate.

References

1. Sánchez Guzmán, D., Agentes Inteligentes; Diseño e Implementación para la Enseñanza de la Física, Tesis Doctoral en Tecnología Avanzada, Centro de Investigación en Ciencia Aplicada y Tecnología Avanzada, Instituto Politécnico Nacional, México (2009).
2. Mondragón Becerra, R., Exploraciones sobre el Soporte Multi-Agente BDI en el Proceso de Descubrimiento de Conocimiento en Bases de Datos, Tesis de Maestría en Inteligencia Artificial, Departamento de Inteligencia Artificial, Universidad Veracruzana, México (2015).
3. Olmos Pineda, I., González-Bernal, J. A., Minería de Datos, Universidad Politécnica de Puebla, México (2013).
4. Martínez, M. D., Minería de datos, Universidad Nacional del Noroeste Facultad de Ciencias Exactas, Naturales y Agrimensura, Argentina (2016).
5. Reyes Saldaña, J. F., García Flores, R., El proceso de descubrimiento de conocimiento de bases de datos. Revista Ingenierías VIII, No. 26, pp. 37-47 (2015).
6. Ballesteros Román, A., Minería de Datos Educativa Aplicada a la Investigación de Patrones de Aprendizaje en Estudiante en Ciencias, Centro de Investigación en Ciencia Aplicada y Tecnología Avanzada, Instituto Politécnico Nacional, México (2012).
7. Gómez Arenas, L. I., Evaluación Comparativa de Herramientas para la Minería de Datos y sus Aplicaciones, Instituto Tecnológico de León, Guanajuato, México (2015).
8. Romero, C., Ventura, S., Pechenizkiy, M., Baker, R., Handbook of Educational Data Mining, 1st Edition, CRC Press, Chapman & Hall/CRC Data Mining and Knowledge Discovery Series (2010).
9. Han, J., Kamber, M., Data Mining: Concepts and Techniques, 2nd Edition. Morgan Kaufmann Publishers (The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor), USA (2016).
10. Romero Morales, C., Ventura Soto, S., Hervás Martínez, C., Estado actual de la aplicación de la minería de datos a los sistemas de enseñanza basada en web, Actas del III Taller Nacional de Minería de Datos y Aprendizaje, TAMIDA2005, pp. 49-56 (2015).
11. Luan, J., Aplicaciones de Minería de datos en la Educación Superior, IBM Press and IBM Corporation, Estados Unidos de America (2012).
12. Peña-Ayala, A., Educational data mining: A survey and a data mining-based analysis of recent works, WOLNM & ESIME Zacatenco, Instituto Politécnico Nacional, México (2013).
13. La Red Martínez, D. L., Karanik, M., Giovannini, M., Pinto, N., Perfiles de Rendimiento Académico: Un Modelo basado en Minería de datos. Campus Virtuales, Vol. IV, num. 1, pp. 12-30 (2015). Accessed [12/11/2015] at www.revistacampusvirtuales.es
14. Thakuriah, P. V., Tilahun, N. Y., Zellner, M., Big data and urban informatics: innovations and challenges to urban planning and knowledge discovery. In Seeing Cities Through Big Data. Springer International Publishing, pp. 11-45 (2017).
15. Roiger, R. J., Data mining: A tutorial-based primer. CRC Press (2017).
16. Khan, A., Uddin, S., Srinivasan, U., Understanding chronic disease comorbidities from baseline networks: knowledge discovery utilising administrative healthcare data. In Proceedings of the Australasian Computer Science Week Multiconference. ACM, p. 57 (2017).
17. Bajorath, J., Compound Data Mining for Drug Discovery. Bioinformatics: Volume II: Structure, Function, and Applications, pp. 247-256 (2017).
18. Bandaru, S., Ng, A. H. C., Deb, K., Data mining methods for knowledge discovery in multi-objective optimization: Part A-Survey. Expert Systems with Applications, vol. 70, pp. 139-159 (2017).
19. Chen, C., et al., KDD 2016-Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery (2016).
20. Witten, I. H., et al., Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann (2016).
21. Paulheim, H., et al., Joint Proceedings of 5th Workshop on Data Mining and Knowledge Discovery meets Linked Open Data (Know@LOD 2016) and 1st International Workshop on Completing and Debugging the Semantic Web (CoDeS 2016). In 1st International Workshop on Completing and Debugging the Semantic Web, Heraklion, Greece, May 30th, 2016. CEUR Workshop Proceedings (2016).
22. Blanco Villafañe, V. P., Análisis del desempeño académico del examen de estado para el ingreso a la educación superior aplicando minería de datos (Doctoral dissertation, Universidad Nacional de Colombia-Sede Bogotá).
23. Zobaa, A. F., Vaccaro, A., Lai, L. L., Guest Editorial Enabling Technologies and Methodologies for Knowledge Discovery and Data Mining in Smart Grids. IEEE Transactions on Industrial Informatics, vol. 12, no. 2, pp. 820-823 (2016).
24. Jiang, H., et al., Research on Pattern Analysis and Data Classification Methodology for Data Mining and Knowledge Discovery. International Journal of Hybrid Information Technology, vol. 9, no. 3, pp. 179-188 (2016).
25. Mendoza, A. A. M., Acosta, R. J. H., Propuesta para la predicción del rendimiento académico de los estudiantes de la Universidad del Atlántico, basado en la aplicación del análisis discriminante. In WEEF 2013 Cartagena (2013, August).
26. Kim, J., et al. (eds.), Advances in Knowledge Discovery and Data Mining: 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings. Springer (2017).
27. Cañon, M., Jimenez, S., Enfrentando resultados programa de ingeniería de sistemas de la USB con las pruebas Saber Pro. Revista Investigación y Desarrollo en TIC, vol. 3, no. 1 (2017).
28. Caicedo, E. J. C., Guerrero, S., López, D., Propuesta para la construcción de un índice socioeconómico para los estudiantes que presentan las pruebas Saber Pro. Comunicaciones en Estadística, vol. 9, no. 1, pp. 93-106 (85-97 English) (2016).
29. Viloria, A., Gaitán-Angulo, M., Statistical Adjustment Module Advanced Optimizer Planner and SAP Generated the Case of a Food Production Company. Indian Journal of Science and Technology, vol. 9, no. 47 (2016).
