
PRTools structural indexing

The fields of the variables of the PRTools programming classes dataset, datafile and mapping can be retrieved by get-commands, e.g. getdata or getlabels, and can be set by corresponding set-commands like setdata and setlabels. A different way to achieve roughly (but not exactly) the same is the use of substructures, e.g.

    a = dataset(randn(250,5))
    a.featsize
%       5
    a.objsize
%       250

These commands show the same information as can be retrieved by getfeatsize(a) or getobjsize(a). The substructure construct can also be used for assignment:

    a.featsize = 6;
    a.objsize = 100;

The big and significant difference with the set- and get-commands is that substructures give direct access to the fields of the variables without any error checking. This is evident from the example above: after the assignments there is a conflict between the size of the data field and the featsize and objsize fields.
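The difference can be made explicit from the command line. The following sketch assumes that a corresponding set-command, setfeatsize, exists and validates its argument against the stored data (an assumption based on the error checking described above):

    a = dataset(randn(250,5));
    a.featsize = 6;           % accepted silently, although the data has 5 features
    % a = setfeatsize(a,6);   % expected to fail: setfeatsize is assumed to
    %                         % check consistency with the data field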

Substructure handling is implemented only for maintenance by the PRTools designers, who are aware of all underlying constructions. Users are strongly discouraged from using the substructure retrieval and assignment described in this section, certainly in their own coded commands. Changes in PRTools may affect such operations and make them incompatible.
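In user code the checked route is therefore safer. A minimal sketch using only the get- and set-commands already named above (getfeatsize, setdata):

    a  = dataset(randn(250,5));
    sz = getfeatsize(a);            % instead of a.featsize
    a  = setdata(a,randn(250,5));   % instead of a.data = ...; inconsistencies
                                    % are caught by the set-command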

An exception: direct access to the structure fields can be very useful for debugging from the command line. Here is an example, useful only for advanced users who are able and prepared to study the PRTools sources. It shows how the data field of a combined classifier, including some preprocessing, is inspected.

    A = gendatb;
    U1 = parzenc([],1);
    U2 = treec([],'maxcrit');
    U3 = qdc;
    U = scalem([],'variance')*[U1 U2 U3]*maxc;
    W = A*U
%       Maximum combiner, 2 to 2 trained  mapping   --> fixedcc

The last line shows that the final classifier is named 'Maximum combiner' and is executed by the routine fixedcc, the general routine that executes all fixed combiners.

    W.data
%        [2x6 mapping]    'max'    'Maximum combiner'     []

The data field of W stores a cell array with the above four elements. The second, 'max', is the type of combiner; the third is the name; and the last field is empty because the max combiner has no parameters. The first element is inspected by:

    W.data{1}
%       unit-var+, 2 to 6 trained  mapping   --> sequential

So this field stores a mapping named unit-var+, to be executed by the procedure sequential. It is a 2-to-6 classifier, which fits the fact that there are three 2-class base classifiers operating in 2 dimensions. The data field of this sequential classifier contains:

    W.data{1}.data
%       [2x2 mapping]    [2x6 mapping]

The two mappings that are combined by sequential are:

    W.data{1}.data{1}
%       unit-var, 2 to 2 trained  mapping   --> affine
    W.data{1}.data{2}
%       2 to 6 trained  mapping   --> stacked

These are apparently an affine transform and a stacked combination. Details can be found by

    W.data{1}.data{1}.data
%       rot: [0.2143 0.3535]
%       offset: [0.5279 0.8172]
%       lablist_in: [2x1 double]

which is the result of the feature normalization by scalem([],'variance'), and

    W.data{1}.data{2}.data
%       [2x2 mapping]    [2x2 mapping]    [2x2 mapping]
    W.data{1}.data{2}.data{1}
%       Parzen Classifier, 2 to 2 trained  mapping   --> parzen_map
    W.data{1}.data{2}.data{2}
%       Decision Tree, 2 to 2 trained  mapping   --> tree_map
    W.data{1}.data{2}.data{3}
%       Bayes-Normal-2, 2 to 2 trained  mapping   --> normal_map
    

These are the three base classifiers. The final details can be found at the deepest level, e.g.

    W.data{1}.data{2}.data{1}.data
%       [100x2 dataset]    [2x2 double]
    W.data{1}.data{2}.data{1}.data{2}
%       1     1
%       1     1

showing that the Parzen classifier stores the entire training dataset of size [100,2] as well as the smoothing parameters for both classes and both features, all equal to 1.
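The same information may also be retrievable by the safer get-commands. Assuming getdata is available for mappings as well as for datasets (an assumption consistent with the get-commands mentioned at the top of this page), the last step could read:

    parzen = W.data{1}.data{2}.data{1};   % the trained Parzen classifier
    d = getdata(parzen);                  % presumably equivalent to parzen.data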

A quick summary of the above analysis is produced by the command parsc, which breaks down the data fields of mappings recursively:

    parsc(W)
%       Maximum combiner, 2 to 2 trained  mapping   --> fixedcc
%           unit-var+, 2 to 6 trained  mapping   --> sequential
%               unit-var, 2 to 2 trained  mapping   --> affine
%               2 to 6 trained  mapping   --> stacked
%                   Parzen Classifier, 2 to 2 trained  mapping   --> parzen_map
%                   Decision Tree, 2 to 2 trained  mapping   --> tree_map
%                   Bayes-Normal-2, 2 to 2 trained  mapping   --> normal_map

It shows that fixedcc operates on a sequential combination of an affine transform (defined by scalem) and the stacked combination of the three base classifiers.


R.P.W. Duin, January 28, 2013

