Petra Perner (Ed.)

Machine Learning and Data Mining in Pattern Recognition
5th International Conference, MLDM 2007
Leipzig, Germany, July 18-20, 2007
Proceedings

Springer

Table of Contents

Invited Talk

Data Clustering: User's Dilemma (Abstract)
Anil K. Jain

Classification

On Concentration of Discrete Distributions with Applications to Supervised Learning of Classifiers
Magnus Ekdahl and Timo Koski

Comparison of a Novel Combined ECOC Strategy with Different Multiclass Algorithms Together with Parameter Optimization Methods
Marco Hülsmann and Christoph M. Friedrich

Multi-source Data Modelling: Integrating Related Data to Improve Model Performance
Paul R. Trundle, Daniel C. Neagu, and Qasim Chaudhry

An Empirical Comparison of Ideal and Empirical ROC-Based Reject Rules
Claudio Marrocco, Mario Molinara, and Francesco Tortorella

Outlier Detection with Kernel Density Functions
Longin Jan Latecki, Aleksandar Lazarevic, and Dragoljub Pokrajac

Generic Probability Density Function Reconstruction for Randomization in Privacy-Preserving Data Mining
Vincent Yan Fu Tan and See-Kiong Ng

An Incremental Fuzzy Decision Tree Classification Method for Mining Data Streams
Tao Wang, Zhoujun Li, Yuejin Yan, and Huowang Chen

On the Combination of Locally Optimal Pairwise Classifiers
Gero Szepannek, Bernd Bischl, and Claus Weihs

Feature Selection, Extraction and Dimensionality Reduction

An Agent-Based Approach to the Multiple-Objective Selection of Reference Vectors
Ireneusz Czarnowski and Piotr Jędrzejowicz

On Applying Dimension Reduction for Multi-labeled Problems
Moonhwi Lee and Cheong Hee Park

Nonlinear Feature Selection by Relevance Feature Vector Machine
Haibin Cheng, Haifeng Chen, Guofei Jiang, and Kenji Yoshihira

Affine Feature Extraction: A Generalization of the Fukunaga-Koontz Transformation
Wenbo Cao and Robert Haralick

Clustering

A Bounded Index for Cluster Validity
Sandro Saitta, Benny Raphael, and Ian F. C. Smith

Varying Density Spatial Clustering Based on a Hierarchical Tree
Xuegang Hu, Dongbo Wang, and Xindong Wu

Kernel MDL to Determine the Number of Clusters
Ivan O. Kyrgyzov, Olexiy O. Kyrgyzov, Henri Maitre, and Marine Campedel

Critical Scale for Unsupervised Cluster Discovery
Tomoya Sakai, Atsushi Imiya, Takuto Komazaki, and Shiomu Hama

Minimum Information Loss Cluster Analysis for Categorical Data
Jiří Grim and Jan Hora

A Clustering Algorithm Based on Generalized Stars
Airel Perez Suárez and Jose E. Medina Pagola

Support Vector Machine

Evolving Committees of Support Vector Machines
D. Valincius, A. Verikas, M. Bacauskiene, and A. Gelzinis

Choosing the Kernel Parameters for the Directed Acyclic Graph Support Vector Machines
Kuo-Ping Wu and Sheng-De Wang

Data Selection Using SASH Trees for Support Vector Machines
Chaofan Sun and Ricardo Vilalta

Dynamic Distance-Based Active Learning with SVM
Jun Jiang and Horace H.S. Ip

Transductive Inference

Off-Line Learning with Transductive Confidence Machines: An Empirical Evaluation
Stijn Vanderlooy, Laurens van der Maaten, and Ida Sprinkhuizen-Kuyper

Transductive Learning from Relational Data
Michelangelo Ceci, Annalisa Appice, Nicola Barile, and Donato Malerba

Association Rule Mining

A Novel Rule Ordering Approach in Classification Association Rule Mining
Yanbo J. Wang, Qin Xin, and Frans Coenen

Distributed and Shared Memory Algorithm for Parallel Mining of Association Rules
J. Hernández Palancar, O. Fraxedas Tormo, J. Feston Cárdenas, and R. Hernández Leon

Mining Spam, Newsgroups, Blogs

Analyzing the Performance of Spam Filtering Methods When Dimensionality of Input Vector Changes
J.R. Mendez, B. Corzo, D. Glez-Pena, F. Fdez-Riverola, and F. Diaz

Blog Mining for the Fortune 500
James Geller, Sapankumar Parikh, and Sriram Krishnan

A Link-Based Rank of Postings in Newsgroup
Hongbo Liu, Jiahai Yang, Jiaxin Wang, and Yu Zhang

Intrusion Detection and Networks

A Comparative Study of Unsupervised Machine Learning and Data Mining Techniques for Intrusion Detection
Reza Sadoddin and Ali A. Ghorbani

Long Tail Attributes of Knowledge Worker Intranet Interactions
Peter Geczy, Noriaki Izumi, Shotaro Akaho, and Kôiti Hasida

A Case-Based Approach to Anomaly Intrusion Detection
Alessandro Micarelli and Giuseppe Sansonetti

Sensing Attacks in Computers Networks with Hidden Markov Models
Davide Ariu, Giorgio Giacinto, and Roberto Perdisci

Frequent and Common Item Set Mining

FIDS: Monitoring Frequent Items over Distributed Data Streams
Robert Füller and Mehmed Kantardzic

Mining Maximal Frequent Itemsets in Data Streams Based on FP-Tree
Fujiang Ao, Yuejin Yan, Jian Huang, and Kedi Huang

CCIC: Consistent Common Itemsets Classifier
Yohji Shidara, Atsuyoshi Nakamura, and Mineichi Kudo

Mining Marketing Data

Development of an Agreement Metric Based Upon the RAND Index for the Evaluation of Dimensionality Reduction Techniques, with Applications to Mapping Customer Data
Stephen France and Douglas Carroll

A Sequential Hybrid Forecasting System for Demand Prediction
Luis Aburto and Richard Weber

A Unified View of Objective Interestingness Measures
Celine Hebert and Bruno Cremilleux

Comparing State-of-the-Art Collaborative Filtering Systems
Laurent Candillier, Frank Meyer, and Marc Boulle

Structural Data Mining

Reducing the Dimensionality of Vector Space Embeddings of Graphs
Kaspar Riesen, Vivian Kilchherr, and Horst Bunke

PE-PUC: A Graph Based PU-Learning Approach for Text Classification
Shuang Yu and Chunping Li

Efficient Subsequence Matching Using the Longest Common Subsequence with a Dual Match Index
Tae Sik Han, Seung-Kyu Ko, and Jaewoo Kang

A Direct Measure for the Efficacy of Bayesian Network Structures Learned from Data
Gary F. Holness

Image Mining

A New Combined Fractal Scale Descriptor for Gait Sequence
Li Cui and Hua Li

Palmprint Recognition by Applying Wavelet Subband Representation and Kernel PCA
Murat Ekinci and Murat Aykut

A Filter-Refinement Scheme for 3D Model Retrieval Based on Sorted Extended Gaussian Image Histogram
Zhiwen Yu, Shaohong Zhang, Hau-San Wong, and Jiqi Zhang

Fast-Maneuvering Target Seeking Based on Double-Action Q-Learning
Daniel C.K. Ngai and Nelson H.C. Yung

Mining Frequent Trajectories of Moving Objects for Location Prediction
Mikolaj Morzy

Categorizing Evolved CoreWar Warriors Using EM and Attribute Evaluation
Doni Pracner, Nenad Tomasev, Milos Radovanovic, and Mirjana Ivanovic

Restricted Sequential Floating Search Applied to Object Selection
J. Arturo Olvera-Lopez, J. Francisco Martinez-Trinidad, and J. Ariel Carrasco-Ochoa

Color Reduction Using the Combination of the Kohonen Self-Organized Feature Map and the Gustafson-Kessel Fuzzy Algorithm
Konstantinos Zagoris, Nikos Papamarkos, and Ioannis Koustoudis

A Hybrid Algorithm Based on Evolution Strategies and Instance-Based Learning, Used in Two-Dimensional Fitting of Brightness Profiles in Galaxy Images
Juan Carlos Gomez and Olac Fuentes

Gait Recognition by Applying Multiple Projections and Kernel PCA
Murat Ekinci, Murat Aykut, and Eyup Gedikli

Medical, Biological, and Environmental Data Mining

A Machine Learning Approach to Test Data Generation: A Case Study in Evaluation of Gene Finders
Henning Christiansen and Christina Mackeprang Dahmcke

Discovering Plausible Explanations of Carcinogenecity in Chemical Compounds
Eva Armengol

One Lead ECG Based Personal Identification with Feature Subspace Ensembles
Hugo Silva, Hugo Gamboa, and Ana Fred

Classification of Breast Masses in Mammogram Images Using Ripley's K Function and Support Vector Machine
Leonardo de Oliveira Martins, Erick Correa da Silva, Aristófanes Correa Silva, Anselmo Cardoso de Paiva, and Marcelo Gattass

Selection of Experts for the Design of Multiple Biometric Systems
Roberto Tronci, Giorgio Giacinto, and Fabio Roli

Multi-agent System Approach to React to Sudden Environmental Changes
Sarunas Raudys and Antanas Mitasiunas

Equivalence Learning in Protein Classification
Attila Kertesz-Farkas, András Kocsor, and Sándor Pongor

Text and Document Mining

Statistical Identification of Key Phrases for Text Classification
Frans Coenen, Paul Leng, Robert Sanderson, and Yanbo J. Wang

Probabilistic Model for Structured Document Mapping
Guillaume Wisniewski, Francis Maes, Ludovic Denoyer, and Patrick Gallinari

Application of Fractal Theory for On-Line and Off-Line Farsi Digit Recognition
Saeed Mozaffari, Karim Faez, and Volker Märgner

Hybrid Learning of Ontology Classes
Jens Lehmann

Discovering Relations Among Entities from XML Documents
Yangyang Wu, Qing Lei, Wei Luo, and Harou Yokota

Author Index