
Recent Advances in Electrical and Electronic Engineering

A Semi-Automated approach using Kanban to build taxonomies for Multimedia Contents

ALBERTO BUSCHETTU1, SIMONE PORRU2, GIULIO CONCAS2, FILIPPO EROS PANI2

1 Experteam s.r.l., Spin Off of University of Cagliari, Via Zara 11, Cagliari, ITALY
{alberto.buschettu}@e-xperteam.com

2 Department of Electrical and Electronics Engineering, University of Cagliari, Piazza d'Armi, Cagliari, ITALY
{simone.porru, concas, filippo.pani}@diee.unica.it

Abstract: - Multimedia repositories have to manage more and more multimedia objects; this leads to the need for robust systems for taxonomy building and, in turn, for metadata well suited to specific domains. The knowledge base must be dynamic, so that it can change over time, since new kinds of multimedia objects have to be retrieved with properly selected metadata. Automated systems are widely employed for repository management, but many of them could be enhanced through widespread methodologies that are especially useful in a collaborative context such as taxonomy building. We propose a semi-automated system for taxonomy construction, based on a Kanban-driven approach to metadata selection and validation. The main goal is to improve the accuracy of automated systems, which are not capable of identifying all the useful metadata. This will be achieved through both top-down and bottom-up phases, aided by a validation process where Kanban is employed. The multimedia repository could then be fully embedded in Content Management Systems (CMS), or used through e-learning platforms.

Key-Words: - Lean Kanban, taxonomy, folksonomy, knowledge base, Knowledge Management System, machine learning.

1 Introduction
Knowledge Management is an important aspect of organizations [1][2]. Documents are not always organised in a way that makes them easy to use for those who need to access them. The classic folder-based approach becomes a limitation as soon as the volume of internal document resources grows. It can become a problem much earlier if those same resources need to be presented in different forms, not only for internal use, but also for external users who need to view a very specific set of documents as required by law. In order to correctly create a multimedia repository that can be used by other systems, e.g. CMS or e-learning platforms, each aiming at presenting a specific view, a system of taxonomies and folksonomies can be used to filter and find the exact set of documents needed by each system that makes use of them.

However, creating taxonomies and folksonomies without adopting a methodology would be inadvisable. Some methodologies cannot support the creation of such a semantic system and its immediate updating whenever the need arises. In our work, we use a methodology based on Lean Kanban, where taxonomies/folksonomies can be assigned to resources possibly as soon as the resources are published, so that they can be updated iteratively. The implementation of a semantic system would benefit organised document management in a company, directly influencing its competitiveness as well. The purpose of this paper is to formalize knowledge through an approach which, starting from the output of an automated metadata extraction system, both validates those metadata and adds new ones. Domain experts would manually achieve that goal by leveraging a Kanban table built specifically for that purpose.

Experts who, in this context, aim to create taxonomies are called originators. They use a mixed-iterative approach, applying a top-down (TD) and a bottom-up (BU) analysis to the domain of interest. The resulting taxonomy would provide innovative metadata for the classification of resources, such as e-books, that play a fundamental role in modern multimedia repositories. The paper is structured as follows: Section 2 gives a brief introduction to automated taxonomy extraction, whereas Section 3 describes the Lean Kanban approach; Section 4 gives an overview of taxonomies and folksonomies for multimedia objects; Section 5 presents our proposed approach and Section 6 explains the case study; Section 7 contains our conclusions.

ISBN: 978-960-474-399-5

2 Semi-Automated Taxonomy Extraction
Nowadays, automated metadata extraction from multimedia content has become a mainstream discipline, thanks to advances in fields such as artificial intelligence and information retrieval. Automated systems can perform different kinds of analysis:
• low-level analysis [3,4];
• high-level multimedia annotation [5].
The increasing size of multimedia content repositories has made those automated systems fundamental tools for effective information classification; nevertheless, they also suffer from the so-called “semantic gap” [6]. Only recently has research begun employing semantic content functionalities in this scenario. By leveraging a manual system, driven by an ontological approach, it is possible to increase the accuracy of the extracted data. This leads to a full semantic characterization of multimedia objects. The proposed work refers to a semi-automated system that employs machine learning techniques [7]. Although machine learning systems are fast and do not require human intervention, they are limited in various respects. In particular, the results they produce lack accuracy. Accuracy can nevertheless be improved via corrections provided by humans, which leads to semi-automated systems.

3 Lean Kanban
Among Agile methodologies, the Lean development approach has rapidly grown in popularity, finally acquiring a leading position [8][9][10]. The term “Lean” comes from Lean Manufacturing, a set of best practices that aim to maximise the value created by an organisation and delivered to the end user. This goal is pursued by minimising waste, controlling variation, maximising delivery flow, focusing on the whole process rather than on local issues, and pursuing continuous improvement instead of abrupt and large changes. One of the reasons behind the success of the Lean development approach is the use of the Kanban system. The Kanban system features a visual approach aimed at maximising flow and removing bottlenecks, together with other common issues [11]. The simplicity of an approach that is both structured and powerful has made Lean development quite popular in software engineering. It is the first attempt to apply the Toyota approach to software development [12]. Kanban, meaning "signboard", is a concept related to lean and just-in-time (JIT) production. According to Taiichi Ohno, Kanban is one of the means through which JIT is achieved [13]. It can be considered a system for visualizing work, making it flow, reducing waste, and thus maximising value for the end user. As it uses the demand rate to control the production rate, it is considered a pull system: it makes the demand move from the customer up through the chain of customer-store processes. Setting up a Kanban system typically includes the following steps [14]:
1. map the flow, so as to identify each activity;
2. represent requirements through a set of features;
3. devise an upper limit for the number of features under work in each activity;
4. set up the Kanban board, highlighting the activities and how to deal with specific issues;
5. define the policy to follow in order to assign developers to activities and tasks, and to deal with issues related to flow;
6. specify the format and typical scheduling of meetings;
7. decide which features will be implemented in each release, and when to produce working versions of the system;
8. decide which technical practices to use;
9. choose which tools, statistical methods and diagrams to use for process management.

One of the key Lean principles is to minimise Work In Progress (WIP), so as to optimise the whole process. A process depends on several parameters, so global optimisation is a hard goal to achieve. The parameters are usually not fixed, as they often depend on factors such as the number of developers, their skills, the practices adopted by the team, the number of features to implement, and the overall effort needed. Those factors vary not only among projects, but can also change within a single project. The Kanban approach can be considered an excellent tool for supporting both taxonomy creation and validation, since it intrinsically allows for interaction among stakeholders [15], who are responsible for the building and/or validation of the taxonomies.
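The WIP limit and the pull mechanism described in this section can be illustrated with a minimal sketch. It is not taken from the paper: the class names, the column names and the limit of one card are assumptions chosen only for illustration. A card moves forward only when the next column has spare capacity, which is what makes the board a pull system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class WipLimitExceeded(Exception):
    """Raised when a column already holds as many cards as its WIP limit."""


@dataclass
class Column:
    name: str
    wip_limit: Optional[int] = None  # None means the column is unlimited
    cards: List[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return self.wip_limit is None or len(self.cards) < self.wip_limit


class KanbanBoard:
    """Hypothetical board: an ordered list of columns (activities)."""

    def __init__(self, columns: List[Column]) -> None:
        self.columns = columns

    def _index_of(self, card: str) -> int:
        for i, column in enumerate(self.columns):
            if card in column.cards:
                return i
        raise KeyError(card)

    def add(self, card: str) -> None:
        """Place a new card in the first column, respecting its limit."""
        first = self.columns[0]
        if not first.has_capacity():
            raise WipLimitExceeded(first.name)
        first.cards.append(card)

    def pull(self, card: str) -> str:
        """Pull a card into the next column; return that column's name."""
        i = self._index_of(card)
        if i == len(self.columns) - 1:
            raise ValueError(f"{card!r} is already in the last column")
        target = self.columns[i + 1]
        if not target.has_capacity():
            raise WipLimitExceeded(target.name)
        self.columns[i].cards.remove(card)
        target.cards.append(card)
        return target.name


# Usage: two resources wait for validation, but only one may be in progress.
board = KanbanBoard(
    [Column("To Do"), Column("Validation", wip_limit=1), Column("Done")]
)
board.add("ebook-42")
board.add("video-7")
board.pull("ebook-42")  # moves into "Validation"
try:
    board.pull("video-7")  # blocked until "Validation" frees up
except WipLimitExceeded as exc:
    print("blocked by the WIP limit on", exc)
```

Keeping the limit on the column rather than on the board reflects step 3 of the setup list, where an upper limit is devised for each activity.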

Fig.1 A typical example of a Kanban board (https://bobsleanlearning.wordpress.com/tag/kanban/)

4 Taxonomies and Folksonomies for Multimedia Objects
A taxonomy structures the classification of entities in the form of a hierarchy, according to the assumed relations among the corresponding entities they represent in the real world [16]. Entities are represented as nodes of a tree. In particular, every node represents a class, and nodes are bound to one another by the relation “is subclass of” or “is a” (these relations are usually called parent-child relations). The root of the taxonomy represents the most general class. A taxonomy is thus the simplest form an ontology can have: a simple hierarchical classification of the entities of a field of application. Creating a taxonomy is an iterative activity (with a TD approach), and the participants in this activity are coordinated experts in the domain where the documents are used. Folksonomies, used as a research tool, are based on a BU system: users use the system without following strict rules, but referring to a defined and controlled vocabulary [17][18]. Folksonomies are affected by some issues, like ambiguity of meaning and the social aspects that are often used to connect users and resources.

5 Proposed Approach
Metadata (a pivotal means to attach semantic content to documents) can be assigned to a document by using taxonomies and folksonomies. Metadata are a substantial tool for distributing documents to external systems connected to a document manager. Those systems are made of different sections or “views” (usually hierarchical) [19]. Every view has its own “mapping”, which determines the documents that appear in it. Most often, a folder-based system (Foldering Classification) is used. Foldering Classification mirrors precisely the external user's view. It enables an Automated Classification, by which copying a document into a specific folder automatically classifies that document as part of the related view page. However, this approach has some clear disadvantages:
• the need to create a specific set of folders just for some specific external views;
• the distribution of documents across different folders, with possible duplication in several folders;
• the impossibility of using the folder system in other applications.
In light of all this, we are going to define, formalise and validate taxonomies and folksonomies (Taxonomy Construction) [20][21] using a semi-automated system, whose output is refined through a mixed-iterative approach, based on a Kanban-driven (through the use of a Kanban Module) TD and BU analysis applied to the knowledge domain we need to investigate. Manual interventions are considered burdensome and costly, but their use is not ruled out: they often improve metadata accuracy, since they provide a higher level of disambiguation [22]. Kanban is well suited for this job, because its main feature lies in fulfilling the need to define taxonomies quickly, involving all the stakeholders of a company. The definition of taxonomies cannot be done through a waterfall model, but fits well within an agile approach that combines top-down and bottom-up analysis.
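As an illustration of how a taxonomy (a tree of "is a" relations) and folksonomy tags can select the documents of a view without relying on folders, consider the following sketch. The class names, documents and tags are invented for the example and do not come from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: each class points to its parent ("is a" relation).
# The root, "MultimediaObject", has no parent.
TAXONOMY: Dict[str, Optional[str]] = {
    "MultimediaObject": None,
    "Text": "MultimediaObject",
    "Ebook": "Text",
    "Video": "MultimediaObject",
    "Lecture": "Video",
}


def is_a(node: str, ancestor: str) -> bool:
    """Walk up the parent links; True if `ancestor` is met on the way."""
    current: Optional[str] = node
    while current is not None:
        if current == ancestor:
            return True
        current = TAXONOMY[current]
    return False


# Each document carries one taxonomy class plus free folksonomy tags.
DOCUMENTS = [
    {"id": "doc-1", "cls": "Ebook", "tags": {"kanban", "lean"}},
    {"id": "doc-2", "cls": "Lecture", "tags": {"kanban"}},
    {"id": "doc-3", "cls": "Video", "tags": {"tourism"}},
]


def view(cls: str, tag: Optional[str] = None) -> List[str]:
    """Return the ids of documents under `cls`, optionally filtered by tag."""
    return [
        d["id"]
        for d in DOCUMENTS
        if is_a(d["cls"], cls) and (tag is None or tag in d["tags"])
    ]


print(view("Video"))                       # ['doc-2', 'doc-3']
print(view("MultimediaObject", "kanban"))  # ['doc-1', 'doc-2']
```

Because a view is just a query over metadata, the same document can appear in several views without being copied, which addresses the duplication drawback of folder-based classification.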


Fig. 2 Proposed approach schema


5.1 Mixed-Iterative Process
When perception is influenced by our knowledge or our expectations, we speak of schema-driven or TD elaboration. A schema is a model previously built from our experience. General or abstract contents are placed at a higher level, while concrete details are placed at a lower level. A TD elaboration happens whenever a higher-level concept influences the interpretation of lower-level information. Generally, the TD process is an information process based on prior knowledge or acquired mental schemes; it allows us to make inferences: to “perceive” or “know” more than what can be found in the data. The TD methodology therefore starts by identifying a target to reach, and then pinpoints the strategy to use in order to reach that target [23]. The target is represented by a Kanban card; this card is shared by the originators, with the aim of defining the taxonomy following a shared set of guidelines. Our aim is to begin with a formalization of the reference knowledge (ontology, taxonomy, metadata schema) to start classifying the information on the reference domain. The model could be, for instance, a formalization of one or more classifications of the same domain.

In the bottom-up phase, the knowledge to be represented is analysed by pinpointing, among the available information, what is needed in order to define a reference terminology to describe the data. This will be done through Kanban cards, which will be shared among the originators. We are going to analyse the objects of interest in a domain, i.e. the objects that contain the information of the domain itself; both the structure of the information, which needs to be extrapolated, and the information itself must be pinpointed. One of the limits of this phase could lie in the creation of the KB, because each object can have a different structure and a different way of presenting the same information. Therefore, it will be necessary to pinpoint the information of interest that is present, defining and outlining it.

In the iterations phase, we try to reconcile the two representations of domain knowledge obtained in the previous phases. Thus, for each metadata element found in the TD phase, we want to pinpoint where the corresponding information can be found in the metadata describing each object. What matters here is the semantic concept rather than the way it is represented, which is entirely subjective for each knowledge object.

Starting from this KB, further iterative refining can be made by re-analysing the information in different phases:
• with a TD phase, checking whether the information that is not represented by the chosen formalization can be formalized;
• with a BU phase, analysing whether some information found about the actual objects might be connected to formalized items;
• with the iterations of phases by which these concepts are reconciled.
This is obviously only needed for the information to be represented. The knowledge we want to represent is the one considered of interest by the domain users: for this reason, the most important pieces of information are chosen. At the end of this analysis we define a formalization, in the form of ontologies, taxonomies or folksonomies, and metadata schemas, able to represent the knowledge of interest for this domain. The final result of these phases will be formalized knowledge that can be represented, reused and managed through Knowledge Management Systems, where the knowledge of interest is available.
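The reconciliation between the TD and BU representations can be sketched as a mapping step. The field names and the synonym table below are hypothetical: the TD schema stands for fields taken from a reference formalization, while the BU table stands for the terminology the originators agreed on after inspecting real objects.

```python
from typing import Dict, Set, Tuple

# Hypothetical TD schema: fields defined by the reference formalization.
TD_SCHEMA: Set[str] = {"title", "creator", "date", "format"}

# Hypothetical BU terminology: labels found on actual objects, mapped by the
# originators to the TD field expressing the same semantic concept.
BU_TO_TD: Dict[str, str] = {
    "name": "title",
    "author": "creator",
    "published": "date",
}


def reconcile(
    found: Dict[str, str],
) -> Tuple[Dict[str, str], Set[str], Dict[str, str]]:
    """Split the metadata found on one object into three groups.

    mapped:    values attached to a TD field (directly or via BU_TO_TD)
    missing:   TD fields for which the object provides no information
    unmatched: BU labels with no TD counterpart yet (candidates for a new
               TD field or a new mapping in the next iteration)
    """
    mapped: Dict[str, str] = {}
    unmatched: Dict[str, str] = {}
    for label, value in found.items():
        td_field = label if label in TD_SCHEMA else BU_TO_TD.get(label)
        if td_field is None:
            unmatched[label] = value
        else:
            mapped[td_field] = value
    missing = TD_SCHEMA - set(mapped)
    return mapped, missing, unmatched


mapped, missing, unmatched = reconcile(
    {"name": "Lean Basics", "author": "A. Rossi", "format": "epub", "shelf": "B3"}
)
print(mapped)     # {'title': 'Lean Basics', 'creator': 'A. Rossi', 'format': 'epub'}
print(missing)    # {'date'}
print(unmatched)  # {'shelf': 'B3'}
```

In this reading, the `missing` and `unmatched` groups are what a following TD or BU iteration would examine, for example as new Kanban cards.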


Fig. 3 Kanban Board activity

6 Case Study
Throughout this case study, we see how such a Kanban-driven methodology, which leverages a mixed-iterative approach based on TD and BU analyses, could be efficient in formalizing knowledge. Our real goal is to make knowledge manageable, shareable and reusable; we will focus our attention on the information of interest in domain-specific knowledge. The system must produce a knowledge base in an open format, avoiding proprietary ones, as they could bring lock-in effects with them. In doing so, we allow diverse metadata extraction systems to exploit the knowledge base.


As we previously stated, in this case study we want to formalise knowledge related to multimedia content. The main goal is then to optimise the metadata classification of multimedia objects. The basic starting concept is the definition of a KB: in our study, the knowledge base is made up of all kinds of multimedia objects of interest for our specific domain. Through our approach, knowledge is extracted so that a common structure can be modelled through a taxonomy, in order to classify most of that knowledge and make it available; the resulting taxonomy would thus allow for the definition of a reference knowledge for multimedia content. First, we are going to analyse the metadata standards used in multimedia content management. Then, we strive to define a taxonomy to represent the semantics of these multimedia contents, leveraging the Kanban approach throughout the process. Our specific case involves setting up a system to create a “Digital Library” of a Public Administration. The platform hosts a number of multimedia objects that the institution must make available to the public.
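The paper does not name the open format of the knowledge base. Purely as an assumption, the sketch below uses JSON and invented field names to show how a record could be exported and read back by any other tool without loss.

```python
import json

# Hypothetical knowledge-base record; every field name here is illustrative.
record = {
    "id": "obj-001",
    "type": "Ebook",
    "metadata": {"title": "Regional statute", "language": "it"},
    "tags": ["statute", "public-administration"],
}

# Serialise to plain text in an open format, then parse it back.
text = json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)
restored = json.loads(text)

print(restored == record)  # True
```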

6.1 Implementation
In the proposed study, a CMS for multimedia object storage and a semi-automated system based on machine learning techniques will be employed. The metadata extracted by the semi-automated system will be made available through the CMS administrative area, where they will be used and will be editable via its interface. Those responsible for editing the metadata are the originators, who must be registered users of the system. The metadata will be made available via a REpresentational State Transfer (REST) service, which provides the system with CRUD (Create, Read, Update, Delete) capability on the metadata associated with a multimedia object. This service allows for interfacing with external software that implements a Kanban table, or with any other client that permits metadata editing in a shared environment; those metadata refer to a specific object and to classifications as different as taxonomies and folksonomies. Each object is then associated with a task among those reported on the Kanban board. As soon as that task is completed, it is moved into the “Done” column, and the chosen metadata (metadata release) can be made persistent inside the CMS; thus, that resource can be considered fully semantically described and can then be automatically published by the CMS. The same resource can then be reinserted into the Kanban workflow so that it can be further modified and/or enriched. As previously mentioned, metadata are automatically published by the semi-automated system, which could also enrich its training set and refine the accuracy of the extractions for highly similar multimedia objects.

Fig. 4 Simplified system schema

7 Conclusions
We studied a process to identify existing formalizations and knowledge sources within the domain, paying attention to multimedia objects. Valuable knowledge is put into explicit form through the formalization and codification of information, in order to facilitate the availability of knowledge. Our real goal is to make interesting knowledge available for sharing and reuse. In order to do this, we employ a process which, starting from the output of a semi-automated system for metadata extraction from multimedia content, is capable of defining comprehensive taxonomies, via a Kanban-driven mixed-iterative approach built upon top-down and bottom-up interventions from both the stakeholders and the originators. A future extension of the proposed work could be the addition of a system capable of assessing the quality of the generated taxonomies. In addition, as digital libraries have to manage an ever-growing number of multimedia objects, and given that those objects must be described through metadata stored in large repositories, a natural extension of our methodology could be one related to the digital library context. As digital libraries are deeply interested in effective ways to both categorize and present the content they manage, this would likely be the most interesting extension of the proposed work.
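The metadata service of Section 6.1 can be illustrated with an in-memory sketch of its CRUD operations and of the metadata release triggered when a Kanban task reaches “Done”. This is not the authors' implementation: the class, its method names and the resource paths mentioned in the comments are assumptions, and no HTTP layer is included.

```python
from typing import Dict, Optional


class MetadataService:
    """In-memory stand-in for the REST metadata service (hypothetical API)."""

    def __init__(self) -> None:
        self._draft: Dict[str, Dict[str, str]] = {}     # edited by originators
        self._released: Dict[str, Dict[str, str]] = {}  # persisted in the CMS

    # Create, e.g. POST /objects/{id}/metadata (illustrative path)
    def create(self, object_id: str, metadata: Dict[str, str]) -> None:
        if object_id in self._draft:
            raise KeyError(f"{object_id} already has metadata")
        self._draft[object_id] = dict(metadata)

    # Read, e.g. GET /objects/{id}/metadata
    def read(self, object_id: str) -> Dict[str, str]:
        return dict(self._draft[object_id])

    # Update, e.g. PUT /objects/{id}/metadata
    def update(self, object_id: str, changes: Dict[str, str]) -> None:
        self._draft[object_id].update(changes)

    # Delete, e.g. DELETE /objects/{id}/metadata
    def delete(self, object_id: str) -> None:
        del self._draft[object_id]

    def on_task_done(self, object_id: str) -> Dict[str, str]:
        """Persist a metadata release once the Kanban task is in "Done"."""
        release = dict(self._draft[object_id])
        self._released[object_id] = release
        return release

    def published(self, object_id: str) -> Optional[Dict[str, str]]:
        """Return the last release, or None if nothing was released yet."""
        return self._released.get(object_id)


# Usage: extracted metadata are corrected by an originator, then released.
service = MetadataService()
service.create("obj-001", {"title": "Regional statue"})  # extractor output
service.update("obj-001", {"title": "Regional statute", "type": "Ebook"})
service.on_task_done("obj-001")
service.update("obj-001", {"language": "it"})  # reinserted into the workflow
print(service.published("obj-001"))  # {'title': 'Regional statute', 'type': 'Ebook'}
```

Keeping drafts and releases separate means later edits do not affect what the CMS publishes until the corresponding task reaches “Done” again.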


Acknowledgments
This publication has been produced as part of the research project entitled "Servizi Avanzati in Cloud" developed at the "Experteam s.r.l." company. The project is financially supported through a research grant funded by the Autonomous Region of Sardinia with European local funds (P.O.R. SARDEGNA F.S.E. 2007-2013 - Obiettivo competitività regionale e occupazione, Axis IV Human Resources, Line of Activity l.1.1 and l.3.1).

Simone Porru gratefully acknowledges Sardinia Regional Government for the financial support of his PhD scholarship (P.O.R. Sardegna F.S.E. Operational Programme of the Autonomous Region of Sardinia, European Social Fund 2007-2013 - Axis IV Human Resources, Objective l.3, Line of Activity l.3.1).

References:
[1] F. E. Pani, M. I. Lunesu, G. Concas, C. Stara, M. P. Tilocca, Knowledge Formalization and Management in KMS. In: Proceedings of the 4th International Conference on Knowledge Management and Information Sharing, KMIS 2012, Barcelona, Spain, 2012.
[2] M. I. Lunesu, F. E. Pani, G. Concas, An Approach to manage semantic informations from UGC. In: International Conference on Knowledge Engineering and Ontology Development, KEOD 2011, Paris, France, 2011.
[3] F. Ciravegna, Adaptive information extraction from text by rule induction and generalization, Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.
[4] K. Rapantzikos, Y. Avrithis, S. Kollias, On the use of spatiotemporal visual attention for video classification. Proceedings of international workshop on very low bitrate video coding (VLBV '05), Sardinia, Italy, 2005.
[5] G. Tsechpenakis, G. Akrivas, G. Andreou, G. Stamou, S. Kollias, Knowledge-assisted video analysis and object detection. Proceedings of European symposium on intelligent technologies, hybrid systems and their implementation on smart adaptive systems (EUNITE 2002), Albufeira, Portugal, 2002.
[6] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, 2000, pp. 1349–1380.
[7] F. Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys, Vol.34, No.1, 2002, pp. 1–47.
[8] J. Highsmith, Agile Software Development Ecosystems, Addison Wesley, 2002.
[9] M. Poppendieck, T. Poppendieck, Lean software development: An agile toolkit, Addison Wesley, Boston, Massachusetts, USA, 2003.
[10] L. Seiyoung, Y. Hwan-Seung, Agile Software Development Framework in a Small Project Environment, Journal of Information Processing Systems, Vol. 9, No.1, 2013, pp. 69-88.
[11] D. J. Anderson, Kanban: Successful Evolutionary Change for Your Technology Business, Blue Hole Press, 2010.
[12] D. Leffingwell, Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise, Addison-Wesley Professional, 2011.
[13] T. Ohno, Just-In-Time for Today and Tomorrow, Productivity Press, 1988.
[14] E. Corona, F. E. Pani, A Review of Lean-Kanban Approaches in the Software Development, WSEAS Transactions on Information Science and Applications, Vol.10, No.1, Print ISSN: 1790-0832, E-ISSN: 2224-3402, 2013.
[15] J. Dutra and J. Busch, Enabling Knowledge Discovery Taxonomy Development for NASA, NASA technical whitepaper, 2003.
[16] E. J. Jagerman, Creating, maintaining and applying quality taxonomies, Ed. Jagerman, 2006.
[17] T. Vander Wal, Folksonomy Coinage and Definition, 2007. http://www.vanderwal.net/folksonomy.html.
[18] A. Mathes, Folksonomies-cooperative classification and communication through shared metadata. Computer Mediated Communication, Vol.47, No.10, 2004, pp.1-13.
[19] R. Trigg, J. Blomberg, L. Suchman, Moving Document Collections Online: The Evolution of a Shared Repository. In: Proceedings of the European Conference on Computer-Supported Cooperative Work ECSCW’99, 1999.
[20] R. S. Sharma, M. Chia, V. Choo, E. Samuel, Using A Taxonomy For Knowledge Audits: Some Field Experiences. In: Journal of Knowledge Management Practice, Vol. 11, No.1, 2010.
[21] E. Tsui, W. M. Wang, C. F. Cheung, A. S. M. Lau, A concept-relationship acquisition and inference approach for hierarchical taxonomy construction from tags. In: Information Processing & Management, Vol. 46, No.1, 2010, pp. 44-57.
[22] R. Sujatha, R. K. Rao Bandaru, Taxonomy Construction Techniques - Issues and Challenges. In: Indian Journal of Computer Science and Engineering, Vol. 3, No.5, 2011.
[23] F. E. Pani, M. I. Lunesu, G. Concas, C. Stara, M.P. Tilocca, Knowledge Formalization and Management in KMS, Proceedings of the 4th International Conference on Knowledge Management and Information Sharing, Barcelona, 2012.
