Incremental K-Means

From Emcap

Contents

Incremental K-Means

by Ricard Marxer <email@ricardmarxer.com>

Music Technology Group, Universitat Pompeu Fabra

2008/2/4

Copyright (c) 2006-2008 All Rights Reserved.

Download Emerge 1.0 (GNU/Linux)

Download Emerge 1.0 (Windows)

What Is The Module?

It is in charge of classifying in an online, unsupervised and incremental manner a stream of incoming feature events. The module outputs representation events consisting of class assignments to each of the feature events received. The module also outputs changes to the representation space such as the creation of a new empty class, the creation of a new merged class or the removal of an existing class. The classification is done on the basis of the feature values, the timings values are left unchanged and unclassified.

This online classifier is based on the paper "An online learning technique for coping with novelty detection and concept drift in data streams" by Eduardo J. Spinosa et al. 2006.

It is an abstract model of online classifier based on the idea of having a "normal model" and an "unknown dataset" which corresponds to the short term memory and that holds all the data that cannot be explained or classified using the normal model. There are several internal steps:

  • initialization with first data
  • on new data arrival:
  • if normal data:
    • classify the data
    • update the normal model
  • if unknown data:
      • add data to unknown dataset
      • update the unknown dataset
      • if unknown dataset contains "novel concepts":
        • update the normal model by adding a new cluster
      • if unknown dataset contains "concept drift":
        • update the normal model by modifying the drifted cluster

This module is often used between a feature extractor and an expectator which acts on symbolic data such as the Conceptual Boltzmann Machine.

What Are The Inputs/Outputs?

The input to the model is a stream of feature events, each of them consisting of a timestamp a feature vector and their associated precisions. The output consists of a series of classes assigned to the input feature events. The classes assigned are the output of the process symbolization given as input a point in a continuous multidimensional space.

Since we are in an online and incremental configuration, the partitions of the space (the available symbols) are continuously dynamically changing and appearing, thus a complementary output informs of the partition changes. These notifications are in the form of three basic commands:

  • creation of a new empty set inside a superset
  • creation of a set inside a superset merging a group of subsets
  • removal of a set

This allows for connected modules to keep the interpretation of the symbols up to date and in synchrony with this module.

What Are The Parameters?

The parameters available to control this module are the minimum number of instances for a concept to be accepted, the acceptance coefficient for an event to be part of a concept and the plotting option.

Usage: emem-inckmeans [options]
Options:
 -h, --help            show this help message and exit
 -n MIN_INSTANCES, --min-instances=MIN_INSTANCES
                       the value for the minimun instances for a concept to
                       be considered novel [default: 10]
 -a ACCEPTANCE_COEFF, --acceptance-coeff=ACCEPTANCE_COEFF
                       the value defining the coefficient for concept to
                       accept a token [default: 1.0]
 -p, --plot            plot the representation module during exposure
                       [default: False]

What Are The System Requirements?

GNU/Linux

The package can be installed in several different ways, depending on your system. The easiest way is to install the setuptools package. On Debian based systems:

$ sudo apt-get install python-setuptools

and then just write the following:

$ sudo easy_install http://www.ricardmarxer.com/emcap/Emerge-1.0-py2.5.egg

If this method does not work correctly, one must install manually all the dependencies, and build the package himself. For the software to fully work you must have the following dependencies installed:

Then one must download the package here and run:

$ python setup.py build
$ sudo python setup.py install

Windows

A binary package has been made for ease of installation and use. To run, one must unzip the package somewhere and run the following executable from the command line:

emem-inckmeans.exe [OPTIONS] INPUT_FILENAME OUTPUT_FILENAME

How To Cite

@incollection{marx07,
 author = {Ricard Marxer and Piotr Holonowicz and Hendrik Purwins},
 title = {Dynamical Hierarchical Self-Organization of Harmonic, Motivic, and
 Pitch Categories},
 address = {Vancouver, Canada},
 booktitle = {Music, Brain and Cognition. Part 2: Models of Sound and
 Cognition, held at NIPS)},
 note = {(to appear)},
 year = {2007}
}

How To Use

$ emem-inckmeans input_filename.fev output_filename.rev

Input File Format

A file containing one feature event per line. The events are of the form:

ONSET-TIME,ONSET-TIME-VAR;FEAT1-VAL,FEAT1-VAR FEAT2-VAL,FEAT2-VAR ... FEATn-VAL,FEATn-VAR

Output File Format

A file containing one representation event per line. The events are of two kind:

  • Assignemnts of classes to incoming feature events in the form
ONSET-TIME,ONSET-TIME-VAR;CLASS-ID1 CLASS-ID2 ... CLASS-IDn
  • Changes to the representation space in the form:
CREATE CLASS-IDnew IN CLASS-IDparent
REMOVE CLASS-IDremoved
MERGE CLASS-ID1 CLASS-ID2 ... CLASS-IDn IN CLASS-IDparent

Example Of Use

$ emem-inckmeans test.fev test.rev