Investigating Boolean Matrix Factorization

Investigating Boolean Matrix Factorization Václav Snášel

Jan Platoš

Pavel Krömer

FEECS, VŠB – Technical University of Ostrava 17. listopadu 15 708 33 Ostrava – Poruba, Czech Republic



[email protected] Dušan Húsek

[email protected] Roman Neruda

[email protected] Alexander A. Frolov

ICS - Acad. of Sci. of Czech Republic ˇ 2 Pod Vodárenskou veží 182 07 Prague, Czech Republic

ICS - Acad. of Sci. of Czech Republic ˇ 2 Pod Vodárenskou veží 182 07 Prague, Czech Republic

IHNA Russian Acad. of Sci. Butlerova 5a 117 485 Moscow, Russia

[email protected]

[email protected]

ABSTRACT Matrix factorization or factor analysis is an important task helpful in the analysis of high dimensional real world data. There are several well known methods and algorithms for factorization of real data but many application areas including information retrieval, pattern recognition and data mining often require processing of binary rather than real data. Unfortunately, the methods used for real matrix factorization fail in the latter case. In this paper we focus on the Boolean Matrix Factorization (BMF), introduce the task and present neural network, genetic algorithm and nonnegative matrix facrotization based BMF solvers. The algorithms are tested on several data sets and their results are compared.

1.

INTRODUCTION

In order to perform object recognition (no matter which one) it is necessary to learn representations of the underlying characteristic components. Such components correspond to object-parts, or features. These data sets may comprise discrete attributes, such as those from market basket analysis, information retrieval, and bioinformatics, as well as continuous attributes such as those in scientific simulations, astrophysical measurements, and sensor networks. The feature extraction if applied on binary datasets, addresses many research and application fields, such as association rule mining [1], market basket analysis [4], discovery of regulation patterns in DNA microarray experiments [21], etc. Many of these problem areas have been described in tests of PROXIMUS framework (e.g. [14]).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DMMT’08, August 24, 2008, Las Vegas, USA. Copyright 2008 ACM 978-1-60558-307-5/08/08 ...$5.00.

[email protected]

Many applications in computer and system science involve analysis of large scale and often high dimensional data. When dealing with such extensive information collections, it is usually very computationally expensive to perform some operations on the raw form of the data. Therefore, suitable methods approximating the data in lower dimensions or with lower rank are needed. In the following, we focus on the factorization of two-dimensional binary data (matrices, second order tensors). The rest of the paper is structured as follows: first, a brief introduction to matrix factorization is given. In the following section, the basics of Evolutionary and Genetic Algorithms are presented. The rest of the paper brings description of Boolean Factor Analysis, Genetic Binary Matrix Factorization and summarizes performed computer experiments and conclusions drawn from our work.

2.

MATRIX FACTORIZATION

Matrix factorization (or matrix decomposition) is an important task in data analysis and processing. A matrix factorization is the right-side matrix product in

A ≈ F1 · F2 · . . . · Fk

(1)

for the matrix A. The number of factor matrices depends usually on the requirements of given application area. Most often, k = 2 or k = 3. There are several matrix decomposition methods reducing data dimensions and simultaneously revealing structures hidden in the data. Such methods include Singular Value Decomposition (SVD) and Nonnegative Matrix Factorization (NMF), which is subject of our interest in this paper.

2.1

Non-negative Matrix Factorization

Non-negative matrix factorization (NMF) [2, 15] is recently very popular unsupervised learning algorithm for efficient factorization of real matrices implementing the non-negativity

constraint. NMF approximates real m× n matrix A as a product of two non-negative matrices W and H of the dimensions m × r and r × n respectively. Moreover, it applies that r

Investigating Boolean Matrix Factorization

Investigating Boolean Matrix Factorization

Suggest Documents

Developing Genetic Algorithms for Boolean Matrix Factorization

Model Order Selection for Boolean Matrix Factorization - People

Probabilistic Matrix Factorization (PMF)

Matrix Factorization as Searchâ

Linked Matrix Factorization

NONNEGATIVE MATRIX FACTORIZATION WITHOUT ...

Randomized Nonnegative Matrix Factorization

Kernelized Bayesian Matrix Factorization - arXiv

using coupled nonnegative matrix factorization

Robust Asymmetric Nonnegative Matrix Factorization

Clustering by Nonnegative Matrix Factorization

Collaborative Filtering, Matrix Factorization and

Nonnegative Matrix Factorization with Temporal

probabilistic non-negative matrix factorization

Projective Nonnegative Matrix Factorization based

Nonnegative Matrix Factorization for Clustering

Data Fusion by Matrix Factorization

NONNEGATIVE MATRIX FACTORIZATION AND SPATIAL ...

Projective Nonnegative Matrix Factorization: Sparseness ...

Constrained Nonnegative Matrix Factorization for

Matrix factorization in Cross-domain

Nonnegative Matrix Factorization for Rapid Recovery of ... - Parra Labhttps://www.researchgate.net/.../Nonnegative-Matrix-Factorization-for-Rapid-Recover...

Factorization and the Schur-Cohn matrix of a matrix

A PRACTICAL ALGORITHM FOR BOOLEAN MATRIX ...

Investigating Boolean Matrix Factorization