Visualizing the Asynchronous Discussion Forum Data with Topic Detection #
Simon, Y. K. Li 1 , Gary K. W. Wong 2* 1 The Education University of Hong Kong, Hong Kong 2 The University of Hong Kong, Hong Kong #
[email protected] *
[email protected]
Abstract
2 New Experiment Design
Visualization is a crucial part in learning analytics, where ordinary teachers could comprehend the depth of textual learning (e.g. discussion forum) through the easy-to-interpret figures (e.g. keygraph). However, the open source tools are often not fully developed with plug-in to common learning management system such as Moodle. In this paper, we are going to present the preliminary results of an ongoing project learning analytics extended based on Li & Wong (2016) and Wong & Li (2016), which sets the direction for the next stage of our experiment to aim for a better educational technology application in helping teacher evaluate the learning process of students through analytics. In this project, contents of discussion forums of students were extracted into text files, which were imported to a software tool called Polaris developed by Oshawa Lab (http://www.panda.sys.t.utokyo.ac.jp/KeyGraph/) to generate keygraphs to visualize the scenarios for mining patterns. However, as keygraphs are difficult to comprehend by humans and therefore, more effective tools are needed. In our latest experiments, we deployed novel text mining algorithms and data visualization tools to improve educational analytics intuitively.
In the latest experiment environment, we deployed a Moodle environment to host discussion forum of reflective postings from students. The contents from discussion forum previously hosted at Schoology for a general education course were extracted again into this Moodle environment for hosting and social interaction analysis was performed by a tool called Forum Graph (Andy, 2013). The data was then exported to a program written in R with packages related to Latent Dirichlet Allocation (LDA) (Blei, 2012). The analyzed results would then be visualized to reveal the hidden patterns by using analytical tools. We used improved text mining and visualization tools to design a better platform.
Keywords: LDA, LDAvis, Education Data Mining Concepts: • Information System~Information System Applications~Data Mining;
1 Problems of prior experiment There have been some prior efforts in visualization for learning analytics which provided insights for this project. For example, Coffrin et al. (2015) proposed visualization methods to realize the student engagement and performance in massive open online course (MOOC) environment. Inspired by this, we turned to use social interaction analysis in the discussion forum of our Moodle environment. Atapattu, Falkner and Tarmazdi (2016) raised the ideas using topic-wise classification of discussion threads on MOOC. Ezen-Can et al. (2015) used the ideas of clustering to group discussion topics. By referring to these ideas, we selected probabilistic topic model and a clustering visualization approach in this project. Keygraphs (Ohsawa, Benson and Yachida, 1998) in one existing technology and yet are difficult to interpret, especially based on cooccurrence of keywords, which may not make any sense to some viewers. The writing styles and the choices of keywords of the forum content contributors would have huge impacts on the readability. Those keywords, if used improperly, may lead to confusion. The clusters produced by keygraph may not be the logical grouping of concepts but the statistical co-occurrence of the keywords. A sample keygraph for the discussion forum is listed Fig. 1 below.
2.1 Improved model of text mining LDA was used as the text mining tool in our latest experiment. It is an algorithm for topic discovery based on generative statistical / probabilistic model, which assumes that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. LDA aims at uncovering the hidden thematic structure in a collection of document to help identify useful patterns. A topic is in fact a multinomial distribution over many different ranked keywords of the corpus of document. The levels of details provided for analysis can be made deeper. This approach is better than just relying on using co-occurrence of keywords to determine concepts as the relevant keywords would be identified to form topics.
2.2 Data Visualization According to Duval, E. (2011, February), learning analytics can be facilitated by collecting and analyzing the traces that learners left behind to improve learning. One of the well-developed learning analytics systems is called Gradient's Learning Analytics System (GLASS) according to Leony et al. (2012, April). This system captures and visualizes the events of learning with a dashboard as a presentation layer. Inspired by the above approach using visualization as a dashboard, we selected some software tools with similar functions. First of all, Forum Graph (in Fig. 2) was adopted, modified, and deployed in the Moodle environment to extract discussion forum to provide data for social interaction analysis between the students and teachers. Forum Graph is a tool based on directed graph analysis. Each node represents a content contributor (a student or teacher) to the discussion forum. The edges linking the nodes represent the interactions amongst them. The node size represents the numbers of posts in the forum while the thickness of the edge between two nodes represents the frequency of interactions. By using the Forum Graph, the teachers can know who the active and inactive participants of discussion forums are. They can then encourage the less active students to participate more in the discussion forums.
Furthermore, LDAvis (Sievert and Shirley, 2014) was used as a data visualization tool in our experiment to analyze the contents of the discussion forum. It is a web-based interactive tool deployed in R using D3 library. LDAvis provides a high-level overview of the topics spotted from the corpus of document to show their similarity and differences by calculating the distances amongst them. Then the viewer can think about the meaning and prevalence of these topics. The relevant keywords inside each spotted topic are displayed so that the viewer can much better understand how a particular topic can be formed. The left panel indicates the distance amongst each topic while the right panel indicates the composition of keywords inside a highlighted topic. The size of the bubble represents the prevalence of the topic. The two panels are linked together so that viewers can browse all the spotted topics together with their components to understand the correlations. This important feature allows viewers to interactively explore the themes of corpus of document and associated keywords constituting the themes with relevance figures.
3 Discussion and Further Development Based on the above improved text mining methods and data visualization of analyzing academic discussion forum, we are going to invite serving teachers to test drive the platform by importing their students’ works. Feedback would be collected to improve and further develop the platform based on field testing, which is the next few stages of our experiment. Due to the ongoing development process, the full prototype will be available toward the end of the research work in a later stage. Another important enhancement we used was network analysis using igraph and other R packages (Katya, 2014). This especially supplements the LDAvis. There are two ways to use network analysis. The first one is to investigate the evolution of themes over a period of time. Usually the students and teachers would create posts to the discussion forum over a period of a single semester or even longer. There may be changes in the directions of discussion. Using network analysis for the spotted topics periodically (e.g. weekly) may further help spot those changes. The second one is to model the changes of the components (i.e. keywords and linkage) that constitute the themes. Finally, the default user interface layout of LDAvis can be enhanced so that more analytical figures can be reviewed. The linkage of the keywords inside each topic with the discussion contents were not fully explore from the default LDAvis panels. It is desirable to have some drill-up and drill-down features to perform detailed analysis if necessary for some focused studies.
4 Conclusion The LDAvis visualization tool has a huge potential to help teachers understand the existing and growing topics of discussion and make accentual findings. This would help teachers spot whether their students can meet regular learning objectives and some unexpected learning outcomes by spotting some themes being closely located with themselves or being separately scattered as outliners, which is a major improvement to our experiment.
Acknowledgement
This project is funded and supported by the Teaching Development Grant of The Education University of Hong Kong (Ref: TDG 2015/16) in collaboration with the University of Hong Kong.
References Atapattu, T., Falkner, K. and Tarmazdi, H. (2016). Topic-wise classification of MOOC discussions: A visual analytics approach. Proceedings of the 9th International conference on Educational Data Mining (EDM), Raleigh, NC, USA Andy, C. 2013, Reports: Forum Graph., https://moodle.org/plugins/report_forumgraph
Moodle.
Blei, D. M. 2012. Probabilistic Topic Models. Communications of the ACM 55(4):77–84 C. Coffrin, L. Corrin, P. de Barba, and G. Kennedy. (2015). Visualizing patterns of student engagement and performance in MOOCs. pages 83–92, New York, New York, USA, 2014. ACM Duval, E. (2011, February). Attention please!: learning analytics for visualization and recommendation. In Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 9-17). ACM. Ezen-Can, A., Boyer, K. E., Kellogg, S., & Booth, S. (2015). Unsupervised modeling for understanding MOOC discussion forums: a learning analytics approach. In Proceedings of the Fifth International Conference on Learning Analytics And Knowledge (pp. 146-150). Katya, O., 2014. Network Analysis with R. http://kateto.net/network-visualization Leony, D., Pardo, A., de la Fuente Valentín, L., de Castro, D. S., & Kloos, C. D. (2012, April). GLASS: a learning analytics visualization tool. in proceedings of the 2nd international conference on learning analytics and knowledge (pp. 162-163). ACM. Li, S. & Wong, G. 2016, Educational Data Mining using Chance Discovery from Discussion Board. In Proceedings of GCCCE’16, The Hong Kong University of Education (pp. 712715). Ohsawa, Y., Benson, N. E., & Yachida, M., 1998. KeyGraph: Automatic indexing by co-occurrence graph based on building construction metaphor. In Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on (pp. 12-18). IEEE. Sievert, C., and Shirley, K. E. 2014. LDAvis: A Method for Visualizing and Interpreting Topics. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces 63–70. Wong, G & Li, S. 2016, Academic Performance Prediction Using Chance Discovery from Online Discussion Forums. In Proceedings of COMPSAC’16, IEEE (pp. 706-711).
Figure 1: A sample Keygraph for discussion forum. Source: SIMON, L., GARY, W. 2016
Figure 2: A sample Forum Graph for discussion forum
Figure 3: A sample LDAvis analysis results panels.