A NEW APPROACH TO REDUCTION OF LARGE DATA SET

D.A. Viattchenin
United Institute of Informatics Problems of the NAS of Belarus
Minsk, Belarus
e-mail: [email protected]

Abstract. The paper deals with the problem of large data reduction. The proposed approach is based on reducing the number of objects in the initial large data set and representing each new object from the reduced data set as a vector of intervals. An illustrative example is given and preliminary conclusions are formulated.

1 Introduction

In applied statistics, large data usually means high-dimensional data sets as well as data sets with a large number of objects. Large data typically include data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Given the ubiquity of large high-dimensional data sets, and the need not only to transmit, archive, and reduce them, but also to represent and analyze their content, corresponding tools must be developed for the computational dissection, analysis, and representation of complex data sets whose size defies simplistic analysis. These tools should include a representation which allows exploitation of nonlinearity, supports fast multiresolution algorithms, incorporates a priori information and constraints, and provides flexibility of adaptation to the application.

Large data sets are difficult to work with for several reasons. They are difficult to visualize, and it is difficult to understand what sorts of errors and biases are present in them. They are computationally expensive to process, and the cost of learning is often hard to predict: for instance, an algorithm that runs quickly on a data set that fits in memory may be exorbitantly expensive when the data set is too large for memory.

The most common methods of data reduction are methods that reduce the dimensionality of the attributes, such as principal component analysis and multidimensional scaling. However, the problem of reducing the number of objects in the studied data set is also very relevant, and the exclusion of repeated objects is the most common approach to solving it. A new approach to this problem is presented in the paper; the approach is based on representing the initial large data set by a data set with a smaller number of observations, where each object is described by a vector of intervals.
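To make the contrast concrete, the conventional attribute-wise reduction mentioned above (principal component analysis) can be sketched as follows. This is a minimal illustration, not taken from the paper; the function name and sample data are invented for the example. Note that PCA reduces the number of attributes, while the number of objects stays the same.

```python
# A minimal sketch of attribute-wise reduction via PCA:
# project the n x m data matrix onto its first k principal components.
import numpy as np

def pca_reduce(X, k):
    """Return the n x k matrix of scores on the first k principal components."""
    Xc = X - X.mean(axis=0)                      # center each attribute
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # n objects, k attributes

X = np.array([[2.0, 0.1], [4.0, 0.2], [6.0, 0.3], [8.0, 0.4]])
Z = pca_reduce(X, 1)
print(Z.shape)  # (4, 1): same number of objects, fewer attributes
```

The approach proposed in the paper acts along the other axis of the data matrix: it reduces the number of rows (objects) rather than columns (attributes).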
In other words, the value of some attribute for an object should be considered as an interval of values. So, a problem of interval-valued data processing arises.
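The interval-valued representation of a reduced object can be sketched as follows: a group of ordinary (point-valued) objects is collapsed into a single object whose attributes are intervals spanning the group's values. This is a minimal illustration; the function name and data are hypothetical, not from the paper.

```python
# Minimal sketch: one reduced object as a vector of intervals.
def to_interval_vector(points):
    """Collapse a group of m-dimensional points into one object whose
    t-th attribute is the interval [min, max] over the group."""
    m = len(points[0])
    return [(min(p[t] for p in points), max(p[t] for p in points))
            for t in range(m)]

group = [[1.0, 5.0], [2.0, 4.5], [1.5, 6.0]]
print(to_interval_vector(group))  # [(1.0, 2.0), (4.5, 6.0)]
```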


2 Heuristic for Reduction of Large Data

Object data clustering methods can be applied if the objects are represented as points in some multidimensional space $I^{m_1}(X)$. In other words, the data composed of $n$ objects and $m_1$ attributes are denoted as $\hat{X}_{n \times m_1} = [\hat{x}_i^{t_1}]$, $i = 1, \dots, n$, $t_1 = 1, \dots, m_1$, and such data are sometimes called two-way data [4]. Let $X = \{x_1, \dots, x_n\}$ be the set of objects. So, the two-way data matrix can be represented as follows:

$$\hat{X}_{n \times m_1} = \begin{pmatrix} \hat{x}_1^1 & \hat{x}_1^2 & \dots & \hat{x}_1^{m_1} \\ \hat{x}_2^1 & \hat{x}_2^2 & \dots & \hat{x}_2^{m_1} \\ \dots & \dots & \dots & \dots \\ \hat{x}_n^1 & \hat{x}_n^2 & \dots & \hat{x}_n^{m_1} \end{pmatrix}. \quad (1)$$

Equivalently, the two-way data matrix can be represented as $\hat{X} = \{\hat{x}^1, \dots, \hat{x}^{m_1}\}$ using $n$-dimensional column vectors $\hat{x}^{t_1}$, $t_1 = 1, \dots, m_1$, composed of the elements of the $t_1$-th column of $\hat{X}$.

Let us assume that the value $n = \mathrm{card}(X)$ is very large. The matter of the proposed method can be formulated as follows: the initial large data set must be pre-processed by some clustering procedure for some value $\tilde{n} = \dots$
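The pre-processing step described above can be sketched as follows: the $n$ objects are partitioned into $\tilde{n}$ groups by a clustering procedure, and each group is then replaced by one interval-valued object. The sketch below uses a plain k-means clustering as the pre-processing procedure; this is an assumption for illustration, since the text leaves the concrete choice of clustering procedure open, and the function names and sample data are invented.

```python
# Sketch of the reduction heuristic: cluster the n objects into k groups,
# then describe each group by a vector of per-attribute [min, max] intervals.
# Plain k-means is used here as a stand-in for "some clustering procedure".
import random

def kmeans_groups(data, k, iters=50, seed=0):
    """Partition the points into at most k non-empty groups (naive k-means)."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            groups[j].append(x)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return [g for g in groups if g]

def reduce_to_intervals(data, k):
    """Replace each cluster by one object described by a vector of intervals."""
    return [[(min(x[t] for x in g), max(x[t] for x in g))
             for t in range(len(g[0]))]
            for g in kmeans_groups(data, k)]

data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
print(sorted(reduce_to_intervals(data, 2)))
```

The output is a data set of $\tilde{n} = 2$ interval-valued objects in place of the original $n = 4$ point-valued objects, matching the representation described in the introduction.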
