PRTools manual in preparation -- A basic PRTools program

A basic PRTools program

We will develop here a very basic but complete PRTools-based recognition system. It is meant to give the reader a first impression of the concept. More worked-out examples, as well as a proper description of the commands, will follow. The main point readers should notice in the following lines is that in PRTools both the application of a mapping to a dataset or a datafile and the concatenation of mappings are written with the overloaded *-operator.
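A minimal sketch of the two uses of *, assuming PRTools is on the MATLAB path. No raw data is assumed here: the PRTools generator gendatd produces a synthetic labeled dataset with 10 features as a stand-in, and all variable names are illustrative only.

```matlab
% Minimal sketch; assumes PRTools is on the MATLAB path.
% gendatd generates a synthetic 2-class dataset (a stand-in for real data).
A  = gendatd([50 50],10);   % 50 objects per class, 10 features

W1 = pca(A,3);              % trained mapping: 10 -> 3 features
B  = A*W1;                  % dataset * mapping: apply the mapping
W2 = fisherc(B);            % classifier trained in the 3-D space

W  = W1*W2;                 % mapping * mapping: concatenation

D_stepwise = (A*W1)*W2;     % apply the two mappings one after the other
D_combined = A*W;           % apply the concatenated mapping at once

% Both routes should yield the same classifier outputs (up to rounding).
disp(max(max(abs(+D_stepwise - +D_combined))))
```

The unary + extracts the plain data matrix from a PRTools dataset, so the two results can be compared as ordinary MATLAB arrays.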

The total recognition chain in PRTools terms consists of the following steps:

Collect raw data on disk
Define a datafile A pointing to the raw data
Define a mapping W_preproc that performs an appropriate preprocessing and analysis of the datafile
Apply the mapping to the datafile, resulting in a dataset, B = A*W_preproc
Define a suitable conversion of the feature space, e.g. by PCA: W_featred
Apply this mapping to B: C = B*W_featred
Train a classifier in this space: W_classf

Apply this classifier to the dataset:

 labels = C*W_classf
        = B*W_featred*W_classf
        = A*W_preproc*W_featred*W_classf.

As the mappings W_preproc, W_featred and W_classf are stored in variables, and as the concatenation of a sequence of mappings is defined in PRTools, the entire recognition system can be stored in a single variable: W_recsys = W_preproc*W_featred*W_classf. New objects, e.g. images stored on disk and accessed as a datafile A, can now be classified by labels = A*W_recsys.
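Because the whole system is one MATLAB variable, it can, for instance, be saved to a MAT-file and applied to new objects in a later session. A minimal sketch, assuming PRTools is on the MATLAB path; synthetic gendatd data replaces the datafiles of the text, so W_preproc is omitted, and the temporary file name is made up for the example.

```matlab
% Minimal sketch; assumes PRTools is on the MATLAB path.
A_train = gendatd([50 50],10);   % synthetic labeled design set
A_new   = gendatd([50 50],10);   % synthetic "new" objects

W_featred = pca(A_train,5);
W_classf  = fisherc(A_train*W_featred);
W_recsys  = W_featred*W_classf;  % the entire system in one variable

fname = [tempname '.mat'];       % hypothetical temporary file
save(fname,'W_recsys');          % store the system ...
S = load(fname);                 % ... and read it back later
delete(fname);

D_new = A_new*S.W_recsys;        % classifier outputs for the new objects
```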

In this example three mappings have to be specified by the user. The first, W_preproc, is usually based entirely on the user's background knowledge of the type of images to be classified. The other two, the feature reduction and the classifier, have to be derived from data, by optimizing a cost function or by estimating parameters under a model assumption. In pattern recognition terms, these mappings are thus the result of training. Training requires datasets based on the same preprocessing and representation as the data to be classified later. PRTools offers many routines for training mappings and classifiers; they are in fact the core of the toolbox.
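In PRTools a training routine called without data, such as pca([],5) or fisherc, returns an untrained mapping, and multiplying a labeled dataset with it performs the training. A minimal sketch, assuming PRTools is on the MATLAB path and using gendatd as a synthetic stand-in for a training set. The last lines compare the result with training the two steps by hand; that both routes agree is an assumption worth verifying in your PRTools version.

```matlab
% Minimal sketch; assumes PRTools is on the MATLAB path.
A = gendatd([50 50],10);        % synthetic labeled training set

% Untrained mappings: the recipe, without any data yet.
U = pca([],5)*fisherc;

% dataset * untrained mapping = training: PCA is derived from A,
% then the Fisher classifier from A mapped onto the 5 PCA features.
W = A*U;

% The same two training steps done by hand, for comparison.
W_featred = pca(A,5);
W_manual  = W_featred*fisherc(A*W_featred);

disp(max(max(abs(+(A*W) - +(A*W_manual)))))
```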

Consequently, we distinguish two sets of objects: a training set with given labels (class memberships), used for designing the system, and an unlabeled set for which the class memberships have to be found. The first step of the program is to define these sets such that they can be handled by PRTools. Let us assume that the raw data has been stored in two directories, 'directory_1' and 'directory_2':

  A_labeled = datafile('directory_1');
  A_unlabeled = datafile('directory_2');

How the labels of A_labeled have to be supplied and how they are stored will be described later. The first mapping has to define features for the objects. A simple choice is to use histograms, which can be specified by the following mapping:

  W_preproc = histm([],[1:256]);
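To give an idea of the resulting representation: with 256 bins, each image becomes one object with 256 features, the counts of its gray values. The plain-MATLAB sketch below computes such a histogram for one synthetic 8-bit image; it only illustrates the idea and is not the implementation of histm, whose exact bin conventions are given in its help text.

```matlab
% Illustration only (plain MATLAB, no PRTools): a 256-bin gray-level
% histogram of one synthetic 8-bit image, used as a feature vector.
im = uint8(randi([0 255],64,64));     % synthetic 64x64 image

% h(k) counts the pixels with gray value k-1, for k = 1..256.
h = accumarray(double(im(:))+1, 1, [256 1]);

feature_vector = h';                  % one object = one row of 256 features
```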

The preprocessing of the two datafiles and their conversion to datasets is performed by

  B_labeled = dataset(A_labeled*W_preproc);
  B_unlabeled = dataset(A_unlabeled*W_preproc);

Let us assume that the feature space has to be reduced to 5 features by PCA. This mapping has, of course, to be derived from the preprocessed data:

  W_featred = pca(B_labeled,5);

Suppose, finally, that the Fisher classifier is used. It has to be trained in the reduced feature space:

  W_classf = fisherc(B_labeled*W_featred);
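Since B_labeled already holds preprocessed data, the classifier is trained on B_labeled*W_featred. A minimal sketch of this step, assuming PRTools is on the MATLAB path; gendatd generates a synthetic 10-feature dataset that stands in for B_labeled.

```matlab
% Minimal sketch; assumes PRTools is on the MATLAB path.
B_labeled = gendatd([50 50],10);     % stand-in for the preprocessed labeled set

W_featred = pca(B_labeled,5);        % PCA derived from the labeled data
C_labeled = B_labeled*W_featred;     % the labeled set in the reduced space
W_classf  = fisherc(C_labeled);      % Fisher classifier trained in that space

disp(size(+C_labeled))               % 100 objects, 5 features
```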

The labels for B_unlabeled can now be estimated by

  labels = B_unlabeled*W_featred*W_classf*labeld;

in which labeld is a standard PRTools mapping that maps classifier outcomes to labels. The classification system can also be stored in a single variable W_class_sys. As it includes W_preproc, it is applied to the raw datafile A_unlabeled:

  W_class_sys = W_preproc*W_featred*W_classf*labeld;
  labels = A_unlabeled*W_class_sys;
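When a separate set with known labels is available, the estimated labels can be checked. A minimal sketch, assuming PRTools is on the MATLAB path; two synthetic gendatd sets stand in for the preprocessed labeled data and a labeled test set, and testc, the PRTools routine for estimating the classification error, is applied to the classifier outcomes, i.e. before labeld.

```matlab
% Minimal sketch; assumes PRTools is on the MATLAB path.
B_labeled = gendatd([50 50],10);    % stand-in design set
B_test    = gendatd([50 50],10);    % stand-in set with known labels

W_featred   = pca(B_labeled,5);
W_classf    = fisherc(B_labeled*W_featred);
W_class_sys = W_featred*W_classf*labeld;

labels = B_test*W_class_sys;            % one estimated label per object
e = testc(B_test*W_featred*W_classf);   % fraction of misclassified objects
fprintf('estimated error: %.3f\n', e);
```

Note that testc needs the true labels of the objects, so it cannot be applied to a truly unlabeled set.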

Some worked-out examples are presented in the next subsections.

R.P.W. Duin, January 28, 2013