FTC 2016 - Future Technologies Conference 2016 6-7 December 2016 | San Francisco, United States

Business Intelligence: Self-Adapting and Prioritizing Database Algorithm for Providing Big Data Insight in Domain Knowledge and Processing of Volume-Based Instructions Based on Scheduled and Contextual Shifting of Data

Mazhar Hameed, Usman Qamar, and Usman Akram
Department of Computer Engineering, College of EME, National University of Sciences and Technology (NUST), H-12, Islamabad, Pakistan
[email protected]

Abstract—The modern world is not only about software and technology; as the world advances, it is becoming more data-oriented and mathematical in nature. The volume of information that is brought in and processed today is large and complex. Working at this scale does not mean using every single data point that is reported: the information must be sized down and understood according to the application at hand. Data size is one issue; the other is the knowledge that must be extracted from the data in order to obtain its purposeful meaning. In-memory and column-oriented databases have presented viable and efficient solutions for optimizing query time and column compression. Beyond storing and retrieving data, the information world has stepped into big data, with terabytes of data arriving as influx every second. With this increase in the influx of data, and in the out-flux of the responses generated and required, the world now needs both systems that are efficient at storing huge data and application-layer algorithms that are efficient enough to extract meaning from layered or topologically dependent data. This paper focuses on analyzing the in-column store technique for managing the mathematical and scientific big data involved in multiple markets, using topological data meaning to analyze and understand the information from adaptive database systems. For efficient storage, the column-oriented approach to big data analytics and its query layers is analyzed and optimized.

Keywords—In-column memory; equivalent sets; evolutionary algorithm; topology

I. INTRODUCTION

Business Intelligence is one of the most active research topics in computer science. It is the complex task of achieving solutions using the knowledge gathered from available and unavailable data. The objective of Business Intelligence is to reach commendable, executable business or corporate solutions based on analysis and information extraction. The core of business intelligence is the constituent data from a certain perturbed class or domain; data is indeed its most utilized asset. The amount of information achievable from analysis depends on comprehending the size and volume of the data associated with the particular domain. Business Intelligence provides informed decisions that rely on the impact and nature of the data; data is thus used throughout the business process to provide achievable, actionable conclusions.

Business Intelligence uses multiple types of tools and technologies to assist the process of decision making. At its core, business intelligence is the conversion of abstract, structured, or unstructured raw data into something informative and decision-supporting, and it is often linked to concepts of data surfacing. The purpose is to facilitate the understanding of huge volumes of data using different methodologies, tools, and techniques. Recently, business intelligence tools and methodologies have surfaced in the market and resolved many corporate and analytical problems. The contribution of data orientation, data volume, and data context is of great importance in business intelligence solutions: the nature of the data reveals how information can be optimally extracted from a particular run length of data or a data set.

Big data is the term coined alongside business intelligence or data intelligence. The concept of big data revolves around a few ground rules, and the sub-constituents of those rules change depending on the type of task executed or the type of analysis and application being run. Big data challenges include not only analysis and data mining but also the processes of acquiring, storing, and curating data. With these aspects come the problems of storage space, query results, retrieval of information, passage collaboration, virtualization, transformation, data access and controls, and, most importantly, data independence and the associated information privacy.

Generally, big data is associated with the concept of the 3 V's; recently, two more additions have been contributed to the same definition. Big data can be categorized as follows.

A. Volume
Volume is defined in terms of the quantity of data that is either stored by a process or generated as part of gathering, curation, or processing. The size of the data determines whether it can be classified as big data; volume also determines its displayed or hidden value.

B. Velocity
Velocity refers to the speed of processing involved in data generation or data processing, including the speed with which the data is acquired or stored in the first place. Velocity is thus the speed of data processing, storage, and collection required to provide conclusions or to meet designated standards and demands.

C. Variety
Variety defines the exact or dispersed nature of the data under consideration. Variety helps in deriving conclusions, since it facilitates extraction and analysis based on the data's domain or type.

D. Veracity
Veracity defines the actual quality of the data that has been acquired and stored. The levels of quality are definitive and contribute greatly to processing and solution generation.

E. Complexity
Complexity covers the aspect of multiplicity in big data: the management of data that is inherited or gathered from multiple resources of different natures. It also includes the processes of data linking, correspondence, connections, and correlations.

F. Problem Statement
Recently, many technologies have surfaced to accommodate business requirements and the challenges posed by big data. Most traditional data warehousing and database systems operate on row relations or row-set relations; however, the concept of in-column memory usage and databases has recently moved to the front line of business intelligence and big data concerns. Both approaches are suitable for data processing, depending on the nature of the task to be achieved. This study was undertaken to provide an algorithm that suggests the selective use of storage types based on the nature of the input data, the domain under consideration, and the context of the experiment. To determine whether a given workload will be faster and better optimized in row-relational or in-column storage, the research examines the recurring nature of data and how its processing helps in developing adaptive database systems.

II. METHODOLOGY

In this section we give a detailed description of the processes and studies undertaken to develop a methodology for self-adaptive databases. The objective is to simulate and provide an algorithm that shifts between types of data stores at run time, based on the classification, nature, and domain of the data. Typically, in-column memory or storage is much faster and more efficient than relational tables or relational databases; however, for certain big data tasks, relational databases tend to be faster and more comprehensive than in-column storage techniques. The difference between the two techniques is crucial: since a business does not necessarily confine itself to a single domain or to data of a single nature, and both business and data evolve in context, an adaptive system needs to be created to support the change in demand based on certain achievable levels or constructs.

In-column memory is also referred to as a column-oriented database. In this type of DBMS the data is not stored in row format; instead, it is stored vertically, in the form of columns, as shown in Fig. 1. The database analysis switch can be triggered by user requests or querying requirements, and the same process can be applied to cloud data, dispersed data, or data already in a standard format. In general, row-based relational systems are well equipped for OLTP workloads, whereas column-oriented databases are more efficient for OLAP workloads; a sketch contrasting the two layouts follows the checklist below.

On-Line Transaction Processing (OLTP) involves transaction-oriented applications such as retrieval, merges, sales orders, and typical business operations. OLTP is a category in which the system is required to display prompt results or transactions in response to a user query. The on-line part of OLTP mostly concerns processing throughput and the stream of input, insert, and delete requests, whether excessive or not. OLTP is, however, not well served by clustering and indexing, which decrease its efficiency when incorporated. For the performance of OLTP systems, the following aspects need to be analyzed for optimal results:
- Rollback segments
- Clusters
- Discrete transactions
- Block size
- Buffer/cache size
- Dynamic allocation
- Transaction processing
- Partitioned databases
- Database tuning
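To make the row/column contrast concrete, here is a minimal Python sketch (an illustration, not the paper's implementation) of the two layouts: an OLTP-style point lookup touches one whole record in the row layout, while an OLAP-style aggregate scans a single contiguous column in the columnar layout.

from collections import defaultdict

# Row layout: each record is kept together (favors OLTP point lookups).
row_store = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "B", "amount": 80.0},
    {"order_id": 3, "customer": "A", "amount": 200.0},
]

# Column layout: each attribute is kept contiguously (favors OLAP scans
# and per-column compression).
column_store = defaultdict(list)
for record in row_store:
    for field, value in record.items():
        column_store[field].append(value)

# OLTP-style query: fetch one complete record by key.
order = next(r for r in row_store if r["order_id"] == 2)

# OLAP-style query: aggregate one attribute across all records; only the
# "amount" column is read, not the full records.
total_sales = sum(column_store["amount"])
print(order, total_sales)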


Online Analytical Processing (OLAP) is a solution devised to provide answers to multi-dimensional analytical queries in a cleaner, faster, and easier way. OLAP is a part of business intelligence and includes many aspects of data mining, relational databases, and report writing. The heart of an OLAP system is a multidimensional cube, the hypercube, based on numerical measurements and constructive dimensions; typically the cube metadata is generated from a star, snowflake, or fact-constellation schema.

The main purpose of the research was to devise a system that handles workloads from both domains, OLAP and OLTP. The system categorizes the data and the nature of the task involved, and stores the data accordingly; this increases efficiency and decreases cost per retrieval. To investigate the requirements of different types of markets, and the data available in relation to user requirements, a mathematical study and investigation was carried out to understand the data domain and user requests.

A. Mathematical Investigation
The market for business intelligence and business solutions was analyzed based on the scientific parameters described below.

B. Study Variables
The following study variables were evaluated in every data domain to answer the research questions on generating an adaptive database system:
- Variable of customer-sensitive data
- Variable of organization-sensitive data
- Generic benefit
- Customer benefit
- Organization benefit
- Context-oriented output
- Variability velocity
- Consistency rate
- Data repetition rate
- Data individuality
- Data co-dependency factor
- Task type (OLAP | OLTP)

These variables were cast and mapped onto the different business requirements and data developments extracted from the investigation, and were further used to identify the rate of switching and the cost associated with each single shift from relational to columnar database systems. Research on the available market and associated data was performed on a dataset of 20 variable entities, or companies. A scientific information-analysis phase was carried out to understand how companies react to changes in query results and operations. Based on the findings, the following mathematical and data-oriented research questions were evaluated to understand the fluctuation rate:
- the scope of data knowledge with respect to the business requirement (1);
- the approximate data variation, calculated from the classes or sub-domains of the perturbed data (2);
- the cost exhibited per fluctuation or change in a business requirement, i.e. cost per fluctuation (3);
- the average expected sub-classes or domains from a set of data (4), together with the data nature and the type of output required from that data, an important and crucial part of the research (5).

These factors were a constituent part of classifying the nature of the data and the algorithmic fluctuations for a real-time change from relational to in-column databases. The data in Table I is a sample definition of information or knowledge based on understanding the possible, executable operations that can be expected from the nature of the data. Data analysis was performed as part of prior research using mathematical topologies [1] to define groups and shapes of data for efficient retrieval and storage. In the context of Table I, no definite boundaries can be placed on data outputs; however, using the customer's input and the application requirements, certain classifications and sub-domains can be generated. These can then be used to understand the type of process or output required, such as OLAP or OLTP. These sub-classifications are adaptive in nature, since the database system as a whole needs to shift data orientation depending on the output required.

As shown in Table I below, a factor of functional disposition is evaluated for each data type. Functional disposition in this context refers to the incorporation of factors or indicators from other domains; the mixing of components is also considered a functional disposition. Discrepancy rates and error rates have been calculated for each type of data: some data types, such as images, are considered degraded if the pixels of two images merge accidentally. Functional disposition is thus based on discrepancy and error factors, and these factors are flexible to change according to the customer's needs.
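As a rough illustration of how these study variables could drive the switching decision, the following Python sketch scores a workload and suggests a storage orientation. The weights, threshold, and the subset of variables used are hypothetical assumptions chosen for illustration; they are not values taken from the study.

def suggest_store(task_type, data_repetition_rate, variability_velocity,
                  consistency_rate):
    """Suggest a storage orientation from a subset of the study variables.

    task_type is "OLAP" or "OLTP"; the other arguments are rates in [0, 1].
    All weights below are hypothetical, chosen only to illustrate the idea.
    """
    score = 0.4 if task_type == "OLAP" else -0.4  # analytical work favors columns
    score += 0.3 * data_repetition_rate           # repeated values compress well per column
    score += 0.2 * variability_velocity           # fast context shifts favor re-evaluation
    score -= 0.3 * (1.0 - consistency_rate)       # strict transactional work favors rows
    return "in-column" if score > 0.0 else "relational"

# Example: an analytical task over highly repetitive, consistent data.
print(suggest_store("OLAP", 0.8, 0.6, 0.9))  # -> in-column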


TABLE I. Different data domains, their associated functional disposition, and the standard output required from each dataset

Nature of Data           | Output Required                                                                                                            | Functional Disposition
Integer                  | Calculated sums, multiples, or mathematical formulas on the data set                                                       | Yes; multi-valued and multi-variable
Float                    | Calculated sums, multiples, or mathematical formulas on the data set; sensor data, collaborative devices, coordinates, etc.| Yes; multi-valued and multi-variable
Graphic (PNG, JPEG)      | Slideshows, categorization, photo editing, etc.                                                                            | No
Multimedia text/images   | Multi-purpose: documentation, slideshows, image text extraction, etc.                                                      | Partial
Videos                   | Video-based demonstrations, game articulation, etc.                                                                        | No
Audio                    | Audio messages, songs, mixes, presentations, lectures, descriptions, etc.                                                  | No
Sales data               | Sales operations, increases, decreases, customer requests, order processing, multi-basket operations, card information summaries, total sales, loss, profit, margin calculations, etc. | Yes; multi-valued and multi-variable
Stock data               | Stock market analysis, predictions, company orientations, etc.                                                             | Yes; multi-valued and multi-variable
Search engine data       | N searches by n users; multiple media searches (text, image, video, or links to other engines), website databases, customer requirements, etc. | Yes; perspective- and search-based, a classifier function
Medical imaging          | Diagnosis of disease, research in medicine, prescription medicines, etc.                                                   | No
Mathematical conversions | Calculations and formulas, educational and non-educational, etc.                                                           | Yes; multi-valued and multi-variable
Currency rates           | Conversion rate changes, multi-currency exchange platforms                                                                 | Yes; multi-valued and multi-variable
Map locations            | Location-based services, map guidance, etc.                                                                                | No

Fig. 1. Dual-natured databases for the same data; the processing orientation is adaptive, based on the query or demand under consideration.

C. Real-Time Adaptive Database Systems
The adaptive methodology rests on the concept of switching between database types in real time, depending on the type of operation required. Using the case study undertaken for this research, let us explore one example in detail to understand the functioning architecture of adaptive database systems in real time.

D. Case Study
The organization is building a Customer Relationship Management System (CRMS). The customer for whom the system is being built is in the automobile manufacturing industry; among its many clients, a few noticeable ones are Toyota, Lexus, and Honda. The business requirements change over time, based on region and the locations of sales centres. Under this research, three main aspects for one client have been analyzed to understand the working of adaptive database systems and to calculate the cost of load and dispositions in the system. All these reputed organizations require the standard customer-management tasks as well as some business analytics to ensure growth and customer satisfaction. These primary and secondary tasks combine into a composition heterogeneous in nature, typically consisting of tasks of both OLTP and OLAP type.


On observing the volume and influx of data, relational databases were not considered sufficient to support the different contextual results required from the system. To ensure that the organization succeeds in satisfying its customers and also grows its future ventures, the process of business intelligence was incorporated for real-time analytics. The real-time analytics involved understanding customer emotions at the time of purchase, before purchase, and after purchase. The sentiment calculator is a real-time process that changes according to the views of the customer; these sentiments are gathered via many different sources, such as phone, email, text, and video surveys. Extracting information from such different sources is a complex, non-OLTP task that requires a shift away from the standard storage of relational databases.

This research and experiment were conducted using the knowledge of the independent and dependent variables in the study. For every volume of data in a separate domain, specific variables were defined to measure the output and the changes caused by fluctuations or variations in those variables.

E. Independent Research Variables
The independent variables in the adaptive database system are as follows:
- Customer choice
- Customer requirements
- Raw data
- Feedback data display
- Reporting

F. Dependent Research Variables
The dependent research variables, according to the case study under consideration, are:
- Sentiments
- Purchasing power
- Load power
- Shift function
- Profit margins
- Loss margins
- Cost of shift

G. Algorithm I: Pre-Processing Data for Logical Analysis of Probable Data Type
The volume and velocity of the data are taken as the necessary inputs to initiate the process of logical analysis. In the case study under consideration, the volume of data was approximately 8 GB/day, and the main influx of data was calculated based on time zones from different outlets at different locations. A historical summarization of the data was performed every Friday to analyze the week's performance and to highlight customers ranging from good to average and from average to poor rating considerations. The first step in the process was to create a filter service that performs real-time analysis on incoming media, such as audio or text files, using specific headers for identification.

H. Case No. 1
In this case, the data values of fields that correspond to sentiments or contribute to sentiment analysis in any way are separated into fact and dimension tables to present a layer of subjective in-column storage. The algorithm then runs on the necessary data fields and extracts the information that serves the purpose of sentiment analysis or sentiment display.

Begin
  AnalyzeHeaderInformation() {
    If IsClassifierSentiment Then SubjectToInColumn();
    Else SubjectToRelation();
  }
  SendInfo();
  WaitForResponse();
  EvaluateRequest();
  GenerateConstructionalValueSet();
  GatherTopology();
  AnalyzeVolume();
  AnalyzeVelocity();
  Categorize();
  GenerateColumns();
  OperateOnLogic() {
    CalculateCostOfShift();
    GenerateGraph();
    LocateDiscrepancy();
  }
  SendLogic()
End.
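A minimal Python sketch of this routing step is given below. The header key, the sentiment classifier label, and the sent_ field prefix are hypothetical stand-ins, since the paper does not spell out its header format; the sketch only mirrors the classify-then-route shape of the algorithm above.

# Toy stores standing in for the two storage engines.
column_buffer = {}      # in-column side: sentiment fields, stored per column
relational_rows = []    # relational side: everything else, stored as rows

def subject_to_in_column(record):
    # Separate the sentiment-bearing fields (fact/dimension split) and
    # append them to per-field column lists.
    for field, value in record.items():
        if field.startswith("sent_"):           # hypothetical field prefix
            column_buffer.setdefault(field, []).append(value)
    return "in-column"

def subject_to_relation(record):
    relational_rows.append(record)
    return "relational"

def analyze_header_information(record):
    # Route by the classifier carried in the record header.
    header = record.get("header", {})           # hypothetical header layout
    if header.get("class") == "sentiment":
        return subject_to_in_column(record)
    return subject_to_relation(record)

analyze_header_information(
    {"header": {"class": "sentiment"}, "sent_score": 0.7, "sent_source": "email"}
)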


I. Case No. 2
In this case, a logical shift function and the real-time load and costs are analyzed to understand how the data will be shifted from rows into columns on specific requests.

BEGIN
  SendRecords();
  Transform() {
    IntermediateRecordMatrix()
    Transpose()
    AdjustedValues()
    WeightedCoverage()
    MeshFarm()
    RunLengthPartitions()
    SwarmColony()
    For Each partition
      CalculateCost
      CalculateDependency
      CalculateLoad
      CalcNewCost()
      CalcNewLoad()
      GenNewGraph()
      MinFirst();
      MergeSecond();
      MaxThird();
      RoughLengthFourth();
      DependentsFifth();
      IndependentsSixth();
      CorrelativeMatrixSeventh();
      SectionalsEighth();
    END FOR
    GO
    CalculateLoad()
    CalculateFaultLine()
    GenerateLoadBalancerGraph()
    CreateDataPoints()
    ColumnOptimizer()
    RunLengthSpecification()
    FormalCorrectness()
    DataIntegrity()
    AssignWeightsToRunLengths();
    InitializeShift();
  }
  SEND;
  Extract;
  SENDTOCLIENT()
  END Iteration.
END.
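The transform and run-length steps above can be read in several ways; the sketch below is one plausible Python reading, under the assumption that the cost of the shift is dominated by how well each transposed column run-length encodes. The sort step, the cost ratio, and the function names are illustrative, not the paper's definitions.

def transpose_to_columns(rows):
    """Transform/Transpose step: turn a batch of row records into columns."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns

def run_length_encode(values):
    """Collapse a column into (value, run length) pairs."""
    runs, prev, count = [], object(), 0
    for value in values:
        if value == prev:
            count += 1
        else:
            if count:
                runs.append((prev, count))
            prev, count = value, 1
    if count:
        runs.append((prev, count))
    return runs

def shift_cost(rows):
    """Crude cost of shifting to columns: the fewer runs that remain after
    transposing and sorting, the better the columnar layout compresses, and
    the lower the relative cost (1.0 means no compression benefit at all)."""
    columns = transpose_to_columns(rows)
    total_runs = sum(len(run_length_encode(sorted(col, key=str)))
                     for col in columns.values())
    total_values = sum(len(col) for col in columns.values())
    return total_runs / total_values

print(shift_cost([{"region": "north", "sku": 1},
                  {"region": "north", "sku": 1},
                  {"region": "south", "sku": 2}]))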

III. EXPERIMENTAL RESULTS & ANALYSIS

TABLE II. Performance results obtained based on data volume and data size; the shifting factors and load balancing have been incorporated into the analysis and calculations

Feature Set                    | Variability | Flexibility        | Classification
Set 1 (1000000)                | 25%         | 67.9%              | Yes
Set 2 (34900023)               | 55%         | 45.90%             | Yes
Set 3 (112288990299)           | 100%        | -- (non-coherent)  | Yes
Set 4 (9933556)                | 10%         | 87%                | No
Set 5 (1019188292)             | 30%         | 65%                | Yes
Set 6 (replicated sets 4 & 5)  | 60%         | 89%                | Yes
Set 7 (1014569188292)          | 30%         | 65%                | Yes
Set 8 (replicated sets 4 & 5)  | 60%         | 89%                | Yes
Set 9 (89364884477484101918)   | 40%         | 89%                | Yes
Set 10 (1090464841918)         | 40%         | 89%                | Yes


TABLE III. Results in the form of complexity, performance, and data relativity associated with each dynamic shift or real-time changeover between types of databases

Data Nature               | Data Complexity Equation                                  | Data Performance | Data Relativity
Text                      | {Tx(i) + Ty(z)/t(t)} * P(i_R)                             | 89%              | 50-70%
Numbers                   | [[{N(i) * P(n(i))} + {N(k) + P(N(k))}]/Tt] * P(i_R)       | 67%              | 90-93%
Multimedia                | [[{N(i) * P(n(i))} + {N(k) + P(N(k))}]/Tt] * P(i_R)       | 85%              | 89-100%
Video                     | [[{N(i) * P(n(i))} + {N(k) + P(N(k))}]/Tt] * P(i_R)       | 79%              | 100%
Irrelevant search context | {Tx(i) + Ty(z)/t(t)} * P(i_R)                             | 63%              | 45-67%

IV. LITERATURE REVIEW

Big data is a field presented with many different challenges, each from a different domain. Businesses and customer needs and requirements ask for various aspects of the data, while researchers and tool developers who handle and manage big data ask for a totally different perspective on it. The management of classical data was somewhat different: current data is huge, voluminous, unstructured, and drawn from different domains, and this reflective change in technology has made big data a primary concern of researchers across the world.

There are various phases in the analysis of big data. Modern big data tools such as Hadoop [6] and SciDB have rendered some level of conscious development in the past few years. The HBase database [4] as well as Apache Hadoop [6][4] have demonstrated successful processing of high volumes of big data. There are also unstructured methods for unstructured data sets [4][6], where the processing is both complex and multifaceted. The question at the beginning was thus how quantities varying in type and nature, huge sets of data volume, and deep data consistencies can be processed and analyzed without too much power consumption and cost; that is, how data can be analyzed and stored without such implications. Flash storage [5] is highly scalable and provisions multi-access storage for big data. Big data is not just a data concern or an application-specific issue; it is more a matter of storage and knowledge [2]. Many decent graph techniques have been used and employed by authors to understand the storage and design of big data and knowledge bases [1]. The major methods for running and processing big data, such as Hadoop and Google's MapReduce [4], have been deployed as process and application contributions and have been used to serve this resolution.

Ant colony optimization, along with many other algorithms, has been utilized for the analysis of big data [19], as have deep neural networks (deep learning) and tool-based analysis such as Azure Machine Learning and Hadoop jobs; each has a specific manner of operation and functions toward a different outcome as the inputs change. One ACO division and contribution can be carried out on multiple data sets, based on the similarity of execution and adoption [22]. Data has its own form and shape, and it is not necessary that data will present itself in the same execution method or form. Data grids are another concept where the ACO algorithm [20] is used to utilize and manipulate peripherals and nodes, and can thus be easily generated and computed with the help of ACO [20].

V. CONCLUSION

Data clustering has always been of importance in the field of computer science. The technology shift from small, scalable, and manageable data towards big data has opened up many complex problems with even more complex yet efficient solutions. In this paper we have analyzed columnar database design and storage to understand the need for it and the effectiveness it brings to applications. Market analysis was used as a sample application topic to understand how in-column memory behaves with big data and continuously fluctuating data values. Moreover, the paper analyzes the concept of logical and formal relationships between multiple attributes and topologies; topologies extract knowledge for predictive analysis. In future work we intend to pursue performance-based predictions using topological shapes in columnar databases. This research will involve using the deployment logic of in-column databases to develop the topology, or shape, of a predictive knowledge base.

ACKNOWLEDGMENT

The authors would like to thank Dr. Usman Qamar for his generous help and assistance with big data and columnar databases.

REFERENCES
[1] M. Usman Nisar, Arash Fard, and John A. Miller, "Techniques for Graph Analytics on Big Data," 2013 IEEE International Congress on Big Data, pp. 255-262.
[2] Yang Song, Gabriel Alatorre, Nagapramod Mandagere, and Aameek Singh, "Where IT Management Meets Big Data Analytics: Storage Mining," 2013 IEEE International Congress on Big Data, pp. 421-422.
[3] M. Riedel, A. S. Memon, and M. S. Memon, "High Productivity Data Processing Analytics Methods with Applications," MIPRO 2014, 26-30 May 2014, Opatija, Croatia, pp. 289-294.
[4] Kapil Bakshi, "Considerations for Big Data: Architecture and Approach," ©2012 IEEE, pp. 1-7.
[5] Sang-Woo Jun, Ming Liu, Kermin Elliott Fleming, and Arvind, "Scalable Multi-Access Flash Store for Big Data Analytics," Cambridge, MA 02139.
[6] Xiongpai Qin, Huiju Wang, Furong Li, Baoyao Zhou, Yu Cao, Cuiping Li, Hong Chen, Xuan Zhou, Xiaoyong Du, and Shan Wang, "Paving the Way toward a Unified System for Big Data Analytics: Beyond Simple Integration of RDBMS and MapReduce," 2012 Second International Conference on Cloud and Green Computing, pp. 716-725.
[7] Hua Luan, Mingquan Zhou, and Yan Fu, "Parallel Techniques for Improving Three-Dimensional Models Storing and Accessing Performance," 2013 Ninth International Conference on Natural Computation (ICNC), pp. 1177-1182.
[8] Shweta Pandey and Vrinda Tokekar, "Prominence of MapReduce in Big Data Processing," 2014 Fourth International Conference on Communication Systems and Network Technologies, pp. 556-560.


[9] Hyoung Woo Park, Il Yeon Yeo, Jongsuk Ruth Lee, and Haengjin Jang, "Study on big data center traffic management based on the separation of large-scale data stream," 2013 Seventh International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, pp. 591-594.
[10] Marwa Elteir, Heshan Lin, and Wu-chun Feng, "Enhancing MapReduce via Asynchronous Data Processing," 2010 16th International Conference on Parallel and Distributed Systems, pp. 397-405.
[11] J. Adams, D. L. Woodard, G. Dozier, and P. Miller, "Genetic-Based Type II Feature Extraction for Periocular Biometric Recognition: Less is More," 2010 20th International Conference on Pattern Recognition (ICPR), 23-26 Aug. 2010, pp. 205-208.
[12] Zhe Wang, Songcan Chen, Jun Liu, and Daoqiang Zhang, "Pattern Representation in Feature Extraction and Classifier Design: Matrix Versus Vector," IEEE Transactions on Neural Networks, 12 March 2008, pp. 758-769.
[13] H. Mohammadi, A. N. Venetsanopoulos, and A. Sadeghian, "Bouncing and raindrop image search algorithms, two novel feature detection mechanisms," 2013 18th International Conference on Digital Signal Processing (DSP), 1-3 July 2013, pp. 1-6.
[14] K. Chorianopoulos, M. N. Giannakos, N. Chrisochoides, and S. Reed, "Open Service for Video Learning Analytics," 2014 IEEE 14th International Conference on Advanced Learning Technologies (ICALT), 7-10 July 2014, pp. 28-30.
[15] Honghai Liu, Shengyong Chen, and N. Kubota, "Intelligent Video Systems and Analytics: A Survey," IEEE Transactions on Industrial Informatics, 1 April 2013, pp. 1222-1233.
[16] Wang En Dong, Wu Nan, and Li Xu, "QoS-Oriented Monitoring Model of Cloud Computing Resources Availability," 2013 Fifth International Conference on Computational and Information Sciences (ICCIS), 21-23 June 2013, pp. 1537-1540.
[17] Y. Jadeja and K. Modi, "Cloud computing - concepts, architecture and challenges," 2012 International Conference on Computing, Electronics and Electrical Technologies (ICCEET), 21-22 March 2012, pp. 877-880.
[18] Wenhao Huang, Haikun Hong, Guojie Song, and Kunqing Xie, "Deep process neural network for temporal deep learning," International Joint Conference on Neural Networks (IJCNN).
[19] V. S. Agneeswaran, "Big data - theoretical, engineering and analytics perspective," in S. Srinivasa and V. Bhatnagar (Eds.), Big Data Analytics, Berlin, Germany: Springer-Verlag, 2012, vol. 7678, pp. 8-15.
[20] M. Brzezniak, N. Meyer, M. Flouris, R. Lachaiz, and A. Bilas, "Analysis of grid storage element architectures: high-end fiber-channel vs. emerging cluster-based networked storage," in Grid Middleware and Services, US: Springer, 2008, pp. 187-201.
[21] B. Bullnheimer, R. F. Hartl, and C. Strauss, "A new rank-based version of the ant system: a computational study," Central European Journal for Operations Research and Economics, 7(1), 1999.
[22] J. Chen, Y. Chen, X. Du, C. Li, J. Lu, S. Zhao, and X. Zhou, "Big data challenge: a data management perspective," Frontiers of Computer Science, 7(2), 2013, pp. 157-164.
[23] M. H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2003.
[24] A. A. Freitas and S. H. Lavington, Mining Very Large Databases with Parallel Processing, Dordrecht, The Netherlands: Kluwer Academic Publishers, 1998.
[25] I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann.
[26] T. G. Dietterich, "An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization," Machine Learning, vol. 40, 2000, pp. 139-158.

[27] Cheng-Fa Tsai and Chun-Yi Sung, "DBSCALE: An Efficient Density-Based Clustering Algorithm for Data Mining in Large Databases," 2010 Second Pacific-Asia Conference on Circuits, Communications and System (PACCS), Pingtung, Taiwan, October 2010.
[28] R. Agrawal and J. C. Shafer, "Parallel mining of association rules," IEEE Transactions on Knowledge and Data Engineering, vol. 8, 1996, pp. 962-969.
[29] E. Januzaj, H.-P. Kriegel, and M. Pfeifle, "DBDC: Density-Based Distributed Clustering," Proc. 9th Int. Conf. on Extending Database Technology (EDBT), Heraklion, Greece, 2004, pp. 88-105.
[30] N.-A. Le-Khac, L. Aouad, and M.-T. Kechadi, "A new approach for Distributed Density Based Clustering on Grid platform," The 24th British National Conference on Databases (BNCOD'07), Springer LNCS 4587, July 3-5, 2007, Glasgow, UK.
[31] C. J. Merz and M. J. Pazzani, "A principal components approach to combining regression estimates," Machine Learning, vol. 36, 1999, pp. 9-32.
[32] J. Kivinen and H. Mannila, "The power of sampling in knowledge discovery," Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium, Minneapolis, Minnesota, United States, May 24-27, 1994, pp. 77.
[33] K. Sayood, Introduction to Data Compression, 2nd ed., Morgan Kaufmann, 2000.
