FTC 2016 - Future Technologies Conference 2016 6-7 December 2016 | San Francisco, United States
Imparting Data Knowledge in Discrete Data Volumes using Crowded Agent Approach for MultiPerspective and Visualized Big Data
Hiba Khalid
Usman Qamar
Usman Akram
Department of Computer Engineering College of Electrical & Mechanical Engineering, NUST Rawalpindi, Pakistan
[email protected]
Department of Computer Engineering College of Electrical & Mechanical Engineering, NUST Rawalpindi, Pakistan
[email protected]
Department of Computer Engineering College of Electrical & Mechanical Engineering, NUST Rawalpindi, Pakistan
[email protected]
Abstract—The modern world faces the issues and concerns of business intelligence. Methodologies and techniques have been developed to facilitate business analysis and comprehension. One such scientific field focuses on making data intelligent before it is used for intelligent analysis. The current volume of information is huge, and the tasks expected of analysis are complex. These challenges can be handled by choosing suitable techniques from artificial intelligence. This paper focuses on a multi-agent perspective architecture that turns raw data and its discrepancies into data intelligence and opportunities. The multi-agent system (MAS) technique is used to speed up data processing and to impart knowledge to the data itself.
Keywords—Big Data; Topological data; multi agent system
I. INTRODUCTION
Multi-agent systems (MAS) are a field of artificial intelligence widely used in software engineering to understand and solve programming and architectural problems across many domains. As the name indicates, a multi-agent system is a distributed system containing more than one agent, each actively or passively involved in resolving problems. Multi-agent systems have existed for a long time, but their importance has grown recently with emerging problems in data, collaboration, independent learners, cognitive science, computational modeling and related fields. Business intelligence is one of the fields that has adopted these concepts. Business intelligence and big data have been active research topics for a long time; recently, growing data sizes and the difficulty of knowledge interpretation have led to the use of agent services to resolve complex business intelligence situations. Business intelligence is directly linked to big data. Big data is a collection of data so large that it cannot be analyzed within a few minutes the way standard data in relational tables can. What turns a standard collection of data into big data is its volume and size.

Business intelligence is one of the most current topics and scientific problems in computer science. Its purpose is to support complex, informed decisions based on raw volumes of data. It is a complex, multi-valued task of acquiring solutions from data, i.e. extracting meaning from the data itself. The data can be of any form, type or shape [1], and it can be structured or unstructured; the purpose is to extract knowledge from it. Different logical approaches can therefore be used to obtain the best results from a data set. The definition of big data is centered on a few important concepts, as follows.

A. Volume
Volume is the amount of data present. It covers both the amount of data that has been stored and the amount of generated or streamed data made available for use.

B. Velocity
Velocity is the speed with which data is processed, curated, stored or acquired. Processing speed and storage speed are independent variables and may differ from each other.

C. Variety
Variety, as the name indicates, is the varying nature of the data in a designated data set. It is a decisive parameter in big data science and one of the most useful, because the nature and type of the data involved can accommodate changes in bands or lengths of data. It also helps define the categories into which data can be classified for a specific domain or data set. Consequently, it facilitates the extraction of knowledge and conclusions that depend on the data types available.
1|Page 978-1-5090-4171-8/16/$31.00 ©2016 IEEE
D. Veracity
Veracity is the quality of data. It involves analyzing and recognizing degraded data quality, and it is measured on data that has been acquired and stored. Processing either improves the degraded data or excludes it.

E. Complexity
Data complexity is one of the most crucial aspects of data science. It describes how many 'morphs' or forms a datum can exist in and how it co-exists in other places. Complexity is thus the multiplicity present in a big data collection. It is also the combination and pertinence of varied key features in a single atomic data value of a data set. This feature also incorporates data linking, data-based correspondence, data-oriented connections and data correlations.

F. Multi-Agent Systems & Big Data Features Mapping
Big data is a very active research field. The problem is that of business intelligence: technologies and tools work to resolve real-time constraints on how data can be perceived and analyzed to produce results. However, because of the volume of data and the lack of knowledge at the raw data level, obtaining results is expensive in time and cost. Multi-agent systems function as groups of agents working together to achieve tasks at lower cost and in less time. The two fields can be merged to derive an intermediate solution that supports a multi-agent perspective on extracting or imparting information. The following features are mapped to business intelligence parameters to outline the research conducted and how each parameter can best be used to support agent-based knowledge-imparting systems.

G. Volume and MAS
Volume issues in big data can be addressed by using more than one agent. Agents can work in parallel to support the analysis and curation of data. Data lines can have flexible agents that track the distribution and percentage of data shifting involved.

H. Velocity and MAS
Velocity parameters can be handled by activator agents that learn the speed of processing, storage, retrieval and curation from experience and hand over to distributor and manager agents accordingly; this has a direct relation to agent speeds.

I. Variety and MAS
Agents can also work together to understand and relate to variety. Learning agents can learn from big data patterns and execute or distribute data accordingly into the next phases. A deep learning or self-evolving agent serves this purpose best.

J. Veracity and MAS
Veracity is associated with the quality of the data under consideration. Many agents can be left working in the background on restoration processes or on identifying discrepancies in data. However, this task is costly, since the volume of data and the error rate exceed the number of agents that can practically be deployed to achieve a probable result.

K. Complexity and MAS
In the current research, the complexity of the data and of the analysis-based agent system is a derived, dependent entity, since the volume and nature of the data directly determine its complexity and how many data connections have to be maintained and observed. This layer of execution was therefore achieved alongside the parametric volume analysis and the variety distribution calculation.

L. Problem Statement
Recently the market has been flooded with technologies, software and algorithms that address many data-oriented issues. However, the underlying cause is still under development: imparting knowledge into the raw data itself so that it can be treated as part of an intelligent system in its own right. This research addresses that concern, i.e. making the data partially intelligent to reduce stress and increase coverage. It was conducted to understand the process required for making data intelligent on its own and for providing end users with less complex data volumes before analysis. The research statement is as follows:

“The aim of the study is to achieve a working multi-agent system for establishing ground rules and inhibition in raw data technologies. The logical layer of system comprises to schedule complex data into a structured known entity; independent of rawness from original inconsistent state. Collaborative intelligence is supposed to convert the raw implicated data into partially independent intelligent cubes or icosahedrons of data”.

II. METHODOLOGY
This section describes the architectural processes and the conclusions derived from implementing and analyzing multi-agent systems on data science and data topologies. First, it presents the multi-agent system concepts used to resolve problems in big data and to achieve an intelligent, collaborative multi-agent system. Second, it provides a mathematical investigation for analyzing data sets, with the results of that investigation. Third, it gives the system overview and the algorithms associated with the designated MAS tasks.

A multi-agent system is a computational collection of agents that interact and collaborate to achieve a set of goals or tasks. Typically, a MAS consists of a group of agents together with their functional and working environment. Several types of agents are involved in the definition of a MAS.
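As an illustration of agents collaborating on a shared task, the following minimal Python sketch lets several agents summarize disjoint partitions of a data set in parallel and then combines their local results. It is not the system proposed in this paper: the names `PartitionAgent`, `split` and `run_agents`, the round-robin partitioning and the count/total summary are all assumptions chosen for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PartitionAgent:
    """Hypothetical agent that only ever sees its own partition (local view)."""
    name: str

    def summarize(self, partition: list[float]) -> dict:
        # Each agent reports a local summary; no agent sees the whole data set.
        return {"agent": self.name, "count": len(partition), "total": sum(partition)}


def split(records: list[float], n_parts: int) -> list[list[float]]:
    """Deal records round-robin into n_parts partitions (assumed strategy)."""
    return [records[i::n_parts] for i in range(n_parts)]


def run_agents(records: list[float], n_agents: int) -> dict:
    """Run one agent per partition in parallel and merge the local summaries."""
    agents = [PartitionAgent(f"agent-{i}") for i in range(n_agents)]
    partitions = split(records, n_agents)
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        local = list(pool.map(lambda pair: pair[0].summarize(pair[1]),
                              zip(agents, partitions)))
    # The merge step is the only place where a global view is assembled.
    return {
        "count": sum(s["count"] for s in local),
        "total": sum(s["total"] for s in local),
        "local": local,
    }
```

Calling `run_agents([1.0, 2.0, 3.0, 4.0, 5.0], 2)` splits the five records between two agents; the merged result reports the same count and total that a single pass over the data would give.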
Three agent types are distinguished.

Active Agents: Active agents function according to defined goals. The goals depend on the type of task required from the system. These agents are aware of their goals, and this awareness is what classifies them as active.

Passive Agents: Passive agents have no predefined set of aims or goals. Their functionality is flexible, but they never hold goals or knowledge of goals.

Cognitive Agents: Cognitive agents are programmed as part of the computational complexity. They contain the computational logic of a program or system, with the calculated complexities as part of the system.

Agent environments can be generically classified into three categories:
Virtual environment
Discrete environment
Continuous environment

In the current research, the agent environment is the data field. All agents operate and participate on the logical layer. According to MAS theory, every agent in the system has characteristics based on its nature, and some inherent features define the types of agent that can exist. Some of the important characteristics are given below.

Autonomy: Agents are independent. If they are not completely independent, the minimum requirement is partial independence and partial awareness. Their behavior is thus informed, aware and autonomous.

Local View: Only local views and perceptions are available to the agents in a working system. No agent is ever completely aware of the environment or of the causal effects of a situation.

Decentralization: No agent acts as the mastermind. There is no single controlling entity or agent.

A. Mathematical Investigations
Fig. 1 gives an overview of how the multi-agent system functions alongside data circles or data topologies [1]. The agent sections are designated. In the current research, only two priority structures have been considered, for simplicity of analysis. There are three types of defined agents in the proposed system. The characteristics of each agent are defined below.

B. Agent: Learner John
This agent is an active agent that works on deep mathematical learning and reasoning to update its inventory of knowledge. Its knowledge is revised using mathematical rules defined specifically for this agent. John is programmed to work towards achievable goals and to analyze its own performance after a specified data length. The following characteristics are assigned to John for data analysis. The data domains and sections are chosen according to the volume and complexity of the data available.

C. Study Variables-I
The following study variables are defined for the learner agent:
Historical Summarization
Data Complexity Level
Data Insensitivity
Analysis Factor
Error Rate
Soft Boundaries
Weighted Conclusion
Measured References
Data Links
Communication Channel
Availability Switch
Update Rate
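The behaviour described for the learner agent (revising its knowledge with a rule and reviewing its own performance after a specified data length) can be sketched in Python as below. This is only an assumed illustration: the class `LearnerAgent`, the simple error-proportional update and the mean-absolute-error review are hypothetical stand-ins, not the study-variable formulas of this paper.

```python
class LearnerAgent:
    """Hypothetical learner that reviews its own error every `review_every` records."""

    def __init__(self, review_every: int, learning_rate: float = 0.5):
        self.review_every = review_every    # the "specified data length"
        self.learning_rate = learning_rate  # assumed update-rate parameter
        self.estimate = 0.0                 # the agent's current knowledge
        self.errors = []                    # absolute errors since the last review
        self.reviews = []                   # mean error recorded at each review

    def observe(self, value: float) -> None:
        error = value - self.estimate
        self.errors.append(abs(error))
        # Assumed update rule: move the estimate towards the observed value.
        self.estimate += self.learning_rate * error
        # Self-analysis after the specified number of records.
        if len(self.errors) == self.review_every:
            self.reviews.append(sum(self.errors) / len(self.errors))
            self.errors.clear()
```

A falling sequence in `reviews` indicates that the agent's knowledge is converging on the data it has seen; a rising one could be used as a signal to change strategy.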
The learner agent uses the formulas for these study variables to function according to its programmed algorithm; its functionality follows directly from them. All effective data calculations are performed using informed decisions, and the agent keeps updating its experience based on the number of iterations performed in every execution cycle.

D. Algorithm-I: Learner John
Begin
  Start_process()
  Look for data()
  useKND()
  find similar datasets()
  construct mapping()
  locate()
  initiate()
  work()
  delete_discrepancy()
  help_find()
  if not then
    leave()
    schedulerequest()
    banish veracity()
    start volume*1+n*9
    delete_sector()
    complete cycle()
    update rule()
    use rating()
    sent to helper()
    function_rest()
    use memory()
    update scenario()
    update knd()
    update logic()
  else
    shift path()
    look for availability()
    collaboration generation()
    calculate volume()
    design base()
    send base()
    send math()
    get forest()
    update network()
    assign weights()
    changeoverchannel()
  Exit
  Rest()
  Store()
  Experience()
  learn_validate
  Reboot()
End.

E. Agent: Aware Opportunist
The second agent in the system is an aware agent that uses contextual, textual and multimedia information residing in the raw data to select sections of data and process them. It is called the 'opportunist' because it becomes active only when certain sets or types of data appear in the volume or executions. The agent has also been programmed to be an active participant in case of resource conflicts or agent unavailability. It has its own set of variables and its own programmed algorithm, which distinguish its functionality from that of the other two agents.

F. Study Variables-II
The following study variables, numbered (1) to (3), were considered in defining the nature of the opportunist agent:
Opportunity Detection
Aware Decision Making
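The activation behaviour described for the opportunist agent can be sketched in Python as follows. The sketch assumes that each record carries a `type` field and that availability of the other agent is passed in as a flag; `OpportunistAgent`, `trigger_types` and `learner_available` are hypothetical names, and the paper's own opportunity detection and decision formulas are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class OpportunistAgent:
    """Hypothetical agent that wakes up only for selected data types."""
    trigger_types: set[str]
    processed: list[dict] = field(default_factory=list)

    def should_activate(self, record: dict, learner_available: bool) -> bool:
        # Active when a trigger type appears, or as a stand-in when the
        # other agent is unavailable (assumed reading of the description).
        return record.get("type") in self.trigger_types or not learner_available

    def handle(self, record: dict, learner_available: bool = True) -> bool:
        """Process the record if the agent activates; report whether it did."""
        if self.should_activate(record, learner_available):
            self.processed.append(record)
            return True
        return False
```

With `trigger_types={"image", "video"}`, a text record is ignored while the learner is available and picked up only when `learner_available=False`.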
G. Agent: Sleep Walker or Handle
This agent is the error handler, whereas the other two agents are processors and decision makers. It learns the flaws of the system over time and uses mathematical calculation to identify threats and opportunities. Opportunities are passed to the opportunist agent, while this agent dedicates itself to analyzing problems in data shapes and errors in arriving data. It works in sleep mode and looks for discrepancies in the storage techniques and retrieval processes.
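A minimal Python sketch of such an error-handling agent is given below. It assumes that a "threat" is a record with a missing required field and that every complete record counts as an "opportunity" to be handed to the opportunist agent; both definitions, and the names `SleepWalkerAgent`, `required_fields` and `flaw_counts`, are illustrative assumptions rather than the paper's method.

```python
from collections import Counter


class SleepWalkerAgent:
    """Hypothetical background agent that separates flawed records from usable ones."""

    def __init__(self, required_fields):
        self.required_fields = tuple(required_fields)
        # Running tally of which fields go missing; this is how the sketch
        # "learns the flaws of the system over time".
        self.flaw_counts = Counter()

    def scan(self, records):
        """Return (threats, opportunities) for a batch of dict records."""
        threats, opportunities = [], []
        for index, record in enumerate(records):
            missing = [f for f in self.required_fields if record.get(f) is None]
            if missing:
                self.flaw_counts.update(missing)
                threats.append((index, missing))
            else:
                # Complete records are passed on, e.g. to the opportunist agent.
                opportunities.append(record)
        return threats, opportunities
```

After several batches, `flaw_counts.most_common()` shows which fields are most often damaged, which could guide where restoration effort is spent.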
Fig. 1. System overview of the proposed MAS working system
III. EXPERIMENTAL RESULTS & ANALYSIS

TABLE I. RULE BOOK ANALYSIS FOR SPECIFIED AGENT TYPES
Rule No: Rule No 1 to Rule No 12
Condition: d1 < d2 and a1 < a2; d2 < d1 and a2 < a1; d1 < d2 and a2 [remaining conditions missing]
Timing values [column header missing]: 5 min; Approx. 1-2 hr; >2 sec approx.; >2 sec approx.; >2t approx.; >2t approx.; Approx. 1-2 hr; -
RECorrection (REC): 50%-80%; 69%; 99%; 89%; 89%; 89%; 89%; 99%; -
Overall weight (OV): 8/10; 7/10; 10/10; 9/10; 9/10; 9/10; 9/10; 10/10; -
analytics can be deployed and observed on multiple datasets with the same implementation. One particular implementation of ACO tuning can be carried out on multiple data sets created according to the rational nature of execution and adoption [22]. Big data does not necessarily present itself as information in the usual sense; it can also present itself as shapes or grids. Such a grid of information can then be further exploited and operated on, and grid nodes and peripherals can be both generated and computed using ACO [20].

V. CONCLUSION
A multi-agent approach can help both in gathering information and in imparting information to raw data. Data is increasing day by day and many tools have been developed for big data analysis, so it is crucial to understand how this data can be useful. Finding meaning in data cues and topologies with the help of artificial agents can completely change the perspective on the big data knowledge domain. Data clustering and data science have always been important in computer science. With the advent of data science and analysis, new algorithms have been proposed to bring the best solutions to the market. Many computational problems have been resolved by these tools and techniques; however, the process of improvement should never end. This research was therefore carried out to understand the capacities and applications of multi-agent systems that can be achieved using knowledge domains from big data. The datasets used were publicly available, and the agents were programmed on the basis of mathematical and computational complexities so that they understand and perform according to the given data capacities. In future, we intend to work on the logical inception of the data layer and to convert raw data into bits seconds for agent analysis, which will ease the cost of execution per 100 million records.

ACKNOWLEDGMENT
I would like to thank Dr. Usman Qamar for his patience and deep knowledge in multi-agent systems and data science.
He has been generous with his time and knowledge to help complete the research in a dignified manner.

REFERENCES
[1] M. Usman Nisar, Arash Fard, and John A. Miller, Techniques for Graph Analytics on Big Data. 2013 IEEE International Congress on Big Data, pp. 255-262.
[2] Yang Song, Gabriel Alatorre, Nagapramod Mandagere, and Aameek Singh, “Where IT Management Meets Big Data Analytics,” Storage Mining. 2013 IEEE International Congress on Big Data, pp. 421-422.
[3] M. Riedel, A. S. Memon and M. S. Memon, High Productivity Data Processing Analytics Methods with Applications at MIPRO 2014, 26-30 May 2014, Opatija, Croatia, pp. 289-294.
[4] Kapil Bakshi, “Architecture and Approach,” Considerations for Big Data. 978-1-4577-0557-1/12/$26.00 ©2012 IEEE, pp. 1-7.
[5] Sang-Woo Jun, Ming Liu, Kermin Elliott Fleming, and Arvind, Scalable Multi-Access Flash Store for Big Data Analytics. Cambridge, MA 02139.
[6] Xiongpai QIN, Huiju WANG, Furong LI, Baoyao ZHOU, Yu CAO, Cuiping LI, Hong CHEN, Xuan ZHOU, Xiaoyong DU, and Shan WANG, “Paving the Way toward a Unified System for Big Data Analytics,” Beyond Simple Integration of RDBMS and MapReduce. 2012 Second International Conference on Cloud and Green Computing, pp. 716-725.
[7] Hua Luan, Mingquan Zhou, and Yan Fu, Parallel Techniques for Improving Three-dimensional Models Storing and Accessing Performance, 2013 Ninth International Conference on Natural Computation (ICNC), pp. 1177-1182.
[8] Shweta Pandey and Dr. Vrinda Tokekar, Prominence of MapReduce in BIG DATA Processing. 2014 Fourth International Conference on Communication Systems and Network Technologies, pp. 556-560.
[9] Hyoung Woo Park, Il Yeon Yeo, Jongsuk Ruth Lee, and Haengjin Jang, Study on big data center traffic management based on the separation of large-scale data stream. 2013 Seventh International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, pp. 591-594.
[10] Marwa Elteir, Heshan Lin, and Wu-chun Feng, Enhancing MapReduce via Asynchronous Data Processing. 2010 16th International Conference on Parallel and Distributed Systems, pp. 397-405.
[11] Adams, J, Woodard, D.L, Dozier, G and Miller, P, Genetic-Based Type II Feature Extraction for Periocular Biometric Recognition: Less is More. Pattern Recognition (ICPR), 2010 20th International Conference on 23-26 Aug, pp. 205-208.
[12] Zhe Wang, Songcan Chen, Jun Liu and Daoqiang Zhang, Pattern Representation in Feature Extraction and Classifier Design: Matrix Versus Vector. Neural Networks, IEEE Transactions on 12 March 2008, pp. 758-769.
[13] Mohammadi, H, Venetsanopoulos, A.N and Sadeghian, A, Bouncing and raindrop image search algorithms, two novel feature detection mechanisms. Digital Signal Processing (DSP), 2013 18th International Conference on 1-3 July 2013, pp. 1-6.
[14] Chorianopoulos, K, Giannakos, M.N, Chrisochoides, N and Reed, S, Open Service for Video Learning Analytics. Advanced Learning Technologies (ICALT), 2014 IEEE 14th International Conference on 7-10 July 2014, pp. 28-30.
[15] Honghai Liu, Shengyong Chen, Kubota, N, Intelligent Video Systems and Analytics: A Survey. Industrial Informatics, IEEE Transactions on 01 April 2013, pp. 1222-1233.
[16] Wang En Dong, Wu Nan and Li Xu, QoS-Oriented Monitoring Model of Cloud Computing Resources Availability. Computational and Information Sciences (ICCIS), 2013 Fifth International Conference on 21-23 June 2013, pp. 1537-1540.
[17] Jadeja, Y, Modi, K, Cloud computing - concepts, architecture and challenges. Computing, Electronics and Electrical Technologies (ICCEET), 2012 International Conference on 21-22 March, pp. 877-880.
[18] Wenhao Huang, Haikun Hong, Guojie Song, Kunqing Xie, Deep process neural network for temporal deep learning. International Joint Conference on Neural Networks (IJCNN),
[19] Agneeswaran, V. S. (2012). Big data - theoretical, engineering and analytics perspective. In S. Srinivasa & V. Bhatnagar (Eds.), Big Data Analytics SE - 2. Berlin, Germany: Springer-Verlag, 7678, 8-15.
[20] Brzezniak, M., Meyer, N., Flouris, M., Lachaiz, R. & Bilas, A. (2008). Analysis of grid storage element architectures: high-end fiberchannel vs. emerging cluster-based networked storage. In M. Brzezniak, N. Meyer, M. Flouris, R. Lachaiz & A. Bilas (Eds.), Grid middleware and services SE - 13, US: Springer, 187-201.
[21] Bullnheimer, B., Hartl, R. F. & Strauss, C. (1999). A new rank-based version of the ant system: a computational study. Central European for Operations Research and Economics, 7(1),
[22] Chen, J., Chen, Y., Du, X., Li, C., Lu, J., Zhao, S. & Zhou, X. (2013). Big data challenge: a data management perspective. Frontiers of Computer Science, 7(2), 157-164.
[23] M. H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2003.
[24] A. A. Freitas, S. H. Lavington, Mining very large databases with parallel processing. Dordrecht, The Netherlands, Kluwer Academic Publishers, 1998.
[25] I. Foster, C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure. Morgan
[26] T. G. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization. Machine Learning Vol. 40, 2000, pp. 139-158.
[27] Cheng-Fa Tsai, Chun-Yi Sung, "DBSCALE: An Efficient Density-Based Clustering Algorithm for Data Mining in Large Databases" (PACCS 2010) Second Pacific-Asia Conference on Circuits, Communications and System, 91201 Pingtung, Taiwan, October 2010.
[28] R. Agrawal, J. C. Shafer, Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering, Vol. 8, 1996, pp. 962-969.
[29] E. Januzaj, H-P. Kriegel, M. Pfeifle, DBDC: Density-Based Distributed Clustering. Proc. 9th Int. Conf. on Extending Database Technology (EDBT), Heraklion, Greece, 2004, pp. 88-105.
[30] N-A. Le-Khac, L. Aouad, and M-T. Kechadi, A new approach for Distributed Density Based Clustering on Grid platform. The 24th British National Conference on Databases (BNCOD'07), Springer LNCS 4587, July 3-5, 2007, Glasgow, UK.
[31] C. J. Merz, M. J. Pazzani, A principal components approach to combining regression estimates. Machine Learning Vol. 36, 1999, pp. 932
[32] J. Kivinen, and H. Mannila, "The power of sampling in knowledge discovery," Proceedings of the ACM SIGACT-SIGMOD-SIGART, Minneapolis, Minnesota, United States, May 24-27, 1994, pp. 77
[33] K. Sayood, Introduction to Data Compression, 2nd Ed., Morgan Kaufmann, 2000.
[34] M. Usman Nisar, Arash Fard, and John A. Miller, Techniques for Graph Analytics on Big Data. 2013 IEEE International Congress on Big Data, pp. 255-262.
[35] Yang Song, Gabriel Alatorre, Nagapramod Mandagere, and Aameek Singh, “Where IT Management Meets Big Data Analytics,” Storage Mining. 2013 IEEE International Congress on Big Data, pp. 421-422.
[36] M. Riedel, A. S. Memon and M. S. Memon, High Productivity Data Processing Analytics Methods with Applications at MIPRO 2014, 26-30 May 2014, Opatija, Croatia, pp. 289-294.
[37] Kapil Bakshi, “Architecture and Approach,” Considerations for Big Data. 978-1-4577-0557-1/12/$26.00 ©2012 IEEE, pp. 1-7.
[38] Sang-Woo Jun, Ming Liu, Kermin Elliott Fleming, and Arvind, Scalable Multi-Access Flash Store for Big Data Analytics. Cambridge, MA 02139.
[39] Xiongpai QIN, Huiju WANG, Furong LI, Baoyao ZHOU, Yu CAO, Cuiping LI, Hong CHEN, Xuan ZHOU, Xiaoyong DU, and Shan WANG, “Paving the Way toward a Unified System for Big Data Analytics,” Beyond Simple Integration of RDBMS and MapReduce. 2012 Second International Conference on Cloud and Green Computing, pp. 716-725.
[40] Hua Luan, Mingquan Zhou, and Yan Fu, Parallel Techniques for Improving Three-dimensional Models Storing and Accessing Performance, 2013 Ninth International Conference on Natural Computation (ICNC), pp. 1177-1182.
[41] Shweta Pandey and Dr. Vrinda Tokekar, Prominence of MapReduce in BIG DATA Processing. 2014 Fourth International Conference on Communication Systems and Network Technologies, pp. 556-560.
[42] Hyoung Woo Park, Il Yeon Yeo, Jongsuk Ruth Lee, and Haengjin Jang, Study on big data center traffic management based on the separation of large-scale data stream. 2013 Seventh International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, pp. 591-594.
[43] Marwa Elteir, Heshan Lin, and Wu-chun Feng, Enhancing MapReduce via Asynchronous Data Processing. 2010 16th International Conference on Parallel and Distributed Systems, pp. 397-405.