ACTIVE LEARNING FOR SOUND EVENT CLASSIFICATION BY CLUSTERING UNLABELED DATA
Zhao Shuyang, Toni Heittola, Tuomas Virtanen
Department of Signal Processing, Tampere University of Technology, Finland

Background

- Training a sound event classifier requires annotated recordings: audio data is easy to collect, but annotation is time-consuming.
- Idea: utilize the abundant unlabeled data to optimize the effectiveness of the annotation effort.

Proposed method

- Medoid-based active learning (MAL): partition the unlabeled data into clusters and annotate only the medoid segments, the most central members of the clusters.
- Medoids are assured to span different local distributions.
- A labeled medoid is used to derive predicted labels for the other members of its cluster (see the sketch below).

Figure 1: Overview of the proposed method. Medoid segments are marked with a red border. Annotated labels are filled with black and predicted labels are filled with grey.
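
The label bookkeeping implied by Figure 1 is simple: every cluster member inherits the label given to its cluster's medoid as a predicted label, and a directly annotated label overrules any predicted label on the same segment. Below is a minimal NumPy sketch of this step under a simulated (oracle) annotator; the function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def propagate_labels(assignments, medoid_idx, annotated):
    """Derive a label for every segment from the annotated medoids.

    assignments : (n,) array, assignments[i] = cluster index of segment i
    medoid_idx  : (k,) array, medoid_idx[c] = segment index of cluster c's medoid
    annotated   : dict {segment index: label} collected so far
    """
    labels = np.full(len(assignments), None, dtype=object)
    # Predicted labels: each member inherits its cluster medoid's label.
    for c, m in enumerate(medoid_idx):
        if m in annotated:
            labels[assignments == c] = annotated[m]
    # An annotated label overrules any predicted label on the same segment.
    for i, y in annotated.items():
        labels[i] = y
    return labels

# Illustrative run with an oracle annotator and a limited labeling budget.
rng = np.random.default_rng(0)
n_segments, n_clusters = 40, 10                      # k = 1/4 of the unlabeled data
ground_truth = rng.integers(0, 3, size=n_segments)   # pretend class ids
assignments = rng.permutation(np.repeat(np.arange(n_clusters), n_segments // n_clusters))
# Stand-in medoids; in MAL these come from k-medoids clustering.
medoid_idx = np.array([np.flatnonzero(assignments == c)[0] for c in range(n_clusters)])
budget = 5                                           # number of labeling responses allowed
annotated = {int(m): int(ground_truth[m]) for m in medoid_idx[:budget]}
print(propagate_labels(assignments, medoid_idx, annotated))
```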

Noticeable technical details

- In the clustering stage, each sound segment is represented with a single Gaussian, based on its MFCC statistics. Segment-to-segment dissimilarity is measured by the symmetric KL divergence.
- K-medoids clustering is performed on the resulting dissimilarity matrix (see the sketch after this list).
- The medoids are initialized by farthest-first traversal, starting from a random point.
- An annotated label overrules predicted labels on the same segment.
- The produced labels are used to construct training examples for supervised learning.
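
The clustering stage described above can be sketched as follows, assuming diagonal-covariance Gaussians over each segment's frame-level MFCCs (the covariance structure is not specified on the poster) and a basic alternating k-medoids; this is an illustrative reconstruction, not the authors' implementation.

```python
import numpy as np

def fit_gaussian(mfcc_frames):
    """Fit a single diagonal-covariance Gaussian to a segment's MFCC frames.
    mfcc_frames: (n_frames, n_coeffs) array."""
    mu = mfcc_frames.mean(axis=0)
    var = mfcc_frames.var(axis=0) + 1e-8   # small floor for numerical stability
    return mu, var

def symmetric_kl(g1, g2):
    """Symmetric KL divergence KL(p||q) + KL(q||p) for diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    kl_pq = 0.5 * np.sum(v1 / v2 + (m2 - m1) ** 2 / v2 - 1.0 + np.log(v2 / v1))
    kl_qp = 0.5 * np.sum(v2 / v1 + (m1 - m2) ** 2 / v1 - 1.0 + np.log(v1 / v2))
    return kl_pq + kl_qp

def dissimilarity_matrix(gaussians):
    """Pairwise symmetric-KL dissimilarities between all segments."""
    n = len(gaussians)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = symmetric_kl(gaussians[i], gaussians[j])
    return d

def farthest_first_init(d, k, rng):
    """Pick k initial medoids by farthest-first traversal from a random start."""
    medoids = [int(rng.integers(len(d)))]
    while len(medoids) < k:
        # Next medoid: the point farthest from its nearest already-chosen medoid.
        dist_to_chosen = d[:, medoids].min(axis=1)
        medoids.append(int(dist_to_chosen.argmax()))
    return np.array(medoids)

def kmedoids(d, k, n_iter=50, seed=0):
    """Basic alternating k-medoids on a precomputed dissimilarity matrix."""
    rng = np.random.default_rng(seed)
    medoids = farthest_first_init(d, k, rng)
    for _ in range(n_iter):
        assignments = d[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(assignments == c)
            if len(members):
                # Medoid = member minimizing total dissimilarity within the cluster.
                within = d[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, d[:, medoids].argmin(axis=1)
```

In the poster's experiments the number of clusters k is set to one quarter of the number of unlabeled segments, i.e. k = n_segments // 4.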

Evaluation

Dataset: UrbanSound8K
- 8732 labeled sound segments.
- 10 sound event classes in open urban spaces.
- Cross-validation: 10-fold (see the sketch below).
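
A small sketch of the 10-fold protocol on UrbanSound8K, assuming the dataset's standard metadata CSV with `fold` and `classID` columns; the path and column names are assumptions about the public release, not taken from the poster.

```python
import pandas as pd

# Hypothetical path; the public UrbanSound8K release ships a metadata CSV
# with one row per labeled segment (8732 rows, 10 classes, folds 1-10).
meta = pd.read_csv("UrbanSound8K/metadata/UrbanSound8K.csv")

# 10-fold cross-validation over the predefined folds: hold one fold out for
# testing, run the annotation simulation and training on the other nine.
for test_fold in sorted(meta["fold"].unique()):
    train_meta = meta[meta["fold"] != test_fold]
    test_meta = meta[meta["fold"] == test_fold]
    # ... cluster, simulate annotation, train, and evaluate on test_meta ...
```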

Setup
- Labels are produced by simulating a limited number of labeling responses (the labeling budget), according to the ground truth.
- The number of clusters is set to 1/4 of the number of unlabeled data points.
- The supervised learning setup (features and model) follows the UrbanSound SVM baseline. Features are various statistics of the MFCCs within a segment: mean, median, variance, minimum, maximum, skewness, etc. (see the sketch after this list).
- Compared with reference methods: random sampling (baseline), certainty-based active learning (CRTAL), and semi-supervised learning (SSL).
- Experiments are repeated five times and the average performance is reported.
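
A minimal sketch of such a feature/classifier pipeline, using librosa and scikit-learn; the number of MFCC coefficients, the RBF kernel, and the helper names are assumptions rather than the published baseline settings.

```python
import numpy as np
import librosa
from scipy.stats import skew
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def segment_features(path, n_mfcc=25):
    """Summarize a segment's frame-level MFCCs with per-coefficient statistics."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    stats = [mfcc.mean(axis=1), np.median(mfcc, axis=1), mfcc.var(axis=1),
             mfcc.min(axis=1), mfcc.max(axis=1), skew(mfcc, axis=1)]
    return np.concatenate(stats)

def train_classifier(paths, labels):
    """Train the SVM on whatever training examples the labels (annotated
    medoids plus propagated labels) define."""
    X = np.stack([segment_features(p) for p in paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)
    return clf
```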

Result

Figure 2: Classification accuracy as a function of the labeling budget, simulated using an oracle annotator.

- Classification accuracy is improved by 8% when the labeling budget is lower than 10% of the unlabeled data.
- The proposed method needs 50% to 60% less labeling budget than the best reference method to achieve the same accuracy.

Conclusions

- The proposed method effectively saves labeling budget for sound event classification.
- Future work: study different datasets, especially larger-scale datasets.
- Future work: study the weak-annotator case, simulated using real human labeling responses.

[email protected]
