
Label types and conversion rules

This page describes in more detail the three label types that can be used in datasets, and the rules for converting between them.

> Commands for labeling datasets and setting the label type
setlabels     Define the labels for all objects. Used for crisp and soft labels.
getlabels     Retrieve the labels.
setnlab       Redefine the numeric labels (pointers in a lablist) of a crisp labeled dataset.
getnlab       Retrieve the numeric labels (pointers in a lablist) of a crisp labeled dataset.
settargets    Define the targets of the objects of a dataset of type 'targets'. Can also be used for defining soft labels instead of setlabels.
gettargets    Retrieve the targets or soft labels of a dataset.
setlabtype    Change the label type. Existing labels will be converted.
getlabtype    Retrieve the label type.
setlablist    (Re)define the class names.
getlablist    Retrieve the class names, also called label list or lablist. For crisp datasets the pointers retrieved by getnlab refer to this lablist.

Label type

Objects can be labeled with one of three label types: crisp labels, soft labels and targets. All three can be assigned to the objects of a dataset A by

    A = setlabels(A,labels)

The labels can be retrieved by the getlabels command, and the list of class names is returned by getlablist. The setlabels command, however, only works correctly if A already is a dataset of the right label type. If it is not, the label type has to be converted first:

    A = setlabtype(A,labtype)

in which labtype is 'crisp', 'soft', or 'targets'. The conversion is unproblematic if A is still unlabeled, or if its present labels are no longer needed and a subsequent setlabels command will assign new ones. If the present labels are still needed and have to be converted from one type to another, conversion rules apply. They are described below.

Crisp labels

For crisp labeling, labels should be either a column vector of integers (one for every object in A), or a character array or cell array supplying the name of the desired class for every object. Some examples for an arbitrary set of 4 randomly generated objects:

    D = dataset(rand(4,2)); % unlabeled dataset
    A = setlabels(D,[2 5 5 2]');
    B = setlabels(D,{'apple','apple','banana','banana'});

Although the dataset D is unlabeled, it has, like any dataset, a label type, which is crisp by default. The datasets A and B therefore have crisp labels, with two classes each. The names of these classes are stored in a label list, called a lablist in PRTools terminology. It can be retrieved by getlablist:

    getlablist(A)
    %   2
    %   5
    getlablist(B)
    %   apple
    %   banana

PRTools stores the labeling of the objects as indices into these label lists. These indices are called numeric labels, and the corresponding vector (with one element for every object) is usually called nlab. It can be retrieved by getnlab:

    getnlab(A)
    %   1
    %   2
    %   2
    %   1
    getnlab(B)
    %   1
    %   1
    %   2
    %   2
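
The relation between labels, lablist and nlab can be mimicked in plain MATLAB. The sketch below only illustrates the bookkeeping and is not the PRTools implementation; it makes no PRTools calls, and all variable names are chosen here for the example.

```matlab
% Illustration only: plain MATLAB, no PRTools calls.
labels = [2 5 5 2]';                      % crisp labels, one per object

% unique returns the sorted distinct labels (playing the role of the
% lablist) and, as third output, the index of every object's label in
% that list (playing the role of nlab).
[lablist, ~, nlab] = unique(labels);
nlab = nlab(:);                           % force a column vector

% The original labels are recovered by indexing the list with nlab.
recovered = lablist(nlab);

% The same bookkeeping for class names stored in a cell array.
names = {'apple','apple','banana','banana'}';
[namelist, ~, nnames] = unique(names);
nnames = nnames(:);

disp(isequal(recovered, labels));         % checks the numeric round trip
disp(isequal(namelist(nnames), names));   % checks the round trip for names
```

Because every object only stores an index, a class can be renamed by changing one entry of the list while the index vector stays untouched.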

The class names can be changed by the setlablist command, e.g.:

    A = setlablist(A,{'car','bicycle'});
    getlabels(A)
    %   car
    %   bicycle
    %   bicycle
    %   car

Soft labels

The idea of soft labels is that an object can potentially belong to all classes. Multiple crisp labels (e.g. an object is a 'pen' as well as a 'pointer'), fuzzy labels and probabilistic labels are some examples. In PRTools, soft labels should have values between 0 and 1 and have to be specified for every class. They do not necessarily sum to one over all classes. The total set of soft labels for a dataset has size [m,c], where m is the number of objects and c the number of classes. To make it possible to name the classes as well, the soft labels themselves can be specified as a dataset. In this label dataset the soft labels are stored as features and the class names as their feature labels. The resulting label dataset is then used for labeling, e.g.:

    a = dataset(rand(4,3));                      % unlabeled dataset with 4 objects, 'crisp' by default
    a = setlabtype(a,'soft');                    % make it 'soft'
    labs = dataset([1 0; 0.8 0.7; 0 1; 0.3 0.3]); % soft labels of 4 objects for 2 classes
    labs = setfeatlab(labs,{'pen','pointer'});   % name the classes 'pen' and 'pointer'
    a = setlabels(a,labs);                       % assign the labels
    getlablist(a)
    %   pen
    %   pointer

As soft labels are very similar to targets, the settargets command may be used for soft labels as well; see the section on target labels below.
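
The constraints on a soft label matrix stated above (size [m,c], values in [0,1], no requirement on the row sums) can be checked with a few lines of plain MATLAB. This is a minimal sketch using the same values as the example; it makes no PRTools calls, and the variable names are chosen here for illustration.

```matlab
% Illustration only: plain MATLAB, no PRTools calls.
S = [1 0; 0.8 0.7; 0 1; 0.3 0.3];   % soft labels: 4 objects (rows), 2 classes (columns)

[m, c] = size(S);                    % m objects, c classes

% Every soft label has to lie in the interval [0,1].
in_range = all(S(:) >= 0 & S(:) <= 1);

% The row sums are not constrained: for these values they lie below,
% at, and above one.
row_sums = sum(S, 2);

fprintf('%d objects, %d classes, all values in [0,1]: %d\n', m, c, in_range);
disp(row_sums');
```

The second object, with row sum 1.5, corresponds to the 'pen' as well as 'pointer' case mentioned above; the fourth, with row sum 0.6, belongs only weakly to either class.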

Target labels

Calling targets 'labels' is somewhat confusing, but conceptually they are very similar to labels. Apart from their use and interpretation, the only difference from soft labels is that targets are not restricted to the interval [0,1]. The commands setlabels and getlabels, the naming of the targets, and the retrieval of these names by getlablist work exactly as for soft labels. To make the naming easier there is an additional command, settargets, which may be applied to soft labels as well. Here is an example of its use:

    a = dataset(rand(4,3));                         % unlabeled dataset with 4 objects, 'crisp' by default
    a = setlabtype(a,'targets');                    % make it 'targets'
    targets = [-3 1020; -8 1060; 4 960; 1 950];     % define targets for every object
    a = settargets(a,targets,{'temperature','pressure'}); % assign targets to data and name them
    getlablist(a)
    %   temperature
    %   pressure

The procedure shown above for soft labels (building a label dataset and assigning it with setlabels) can be used for targets as well.

Conversion rules

The preferred ways of defining crisp labels, soft labels and targets are described above. Sometimes it is desirable to convert the label type of an existing dataset to another one using the setlabtype command. The rules PRTools applies for this conversion are listed below.

> Conversion rules of labels and targets in changing dataset types by setlabtype
crisp -> soft       the crisp class gets soft label value 1, all other classes 0.
crisp -> targets    the crisp labels are first converted to soft labels and then to targets.
soft -> crisp       the label is set to the class with the maximum soft label value.
soft -> targets     the [0,1] interval is linearly converted to [-1,+1].
targets -> crisp    the label is set to the class with the maximum target value.
targets -> soft     targets are mapped to [0,1] by the sigmoid function (sigm).
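
The arithmetic behind these rules can be sketched in plain MATLAB. The code below is an illustration under stated assumptions, not the PRTools implementation: it makes no PRTools calls, the variable names are chosen here, the sigmoid is assumed to be the logistic function 1/(1+exp(-x)), and "the maximum value" is read as "the class holding the maximum value".

```matlab
% Illustration only: plain MATLAB, no PRTools calls.
nlab = [1 2 2 1]';                        % crisp labels as class indices
c    = 2;                                 % number of classes
m    = numel(nlab);                       % number of objects

% crisp -> soft: value 1 for the object's own class, 0 for the others.
soft = zeros(m, c);
soft(sub2ind([m c], (1:m)', nlab)) = 1;

% soft -> targets: map the interval [0,1] linearly onto [-1,+1].
targets = 2*soft - 1;

% crisp -> targets is the composition of the two steps above.

% soft -> crisp and targets -> crisp: pick the class with the largest value.
% max returns the first index on ties; this is a property of this sketch only.
[~, nlab_from_soft]    = max(soft, [], 2);
[~, nlab_from_targets] = max(targets, [], 2);

% targets -> soft: squash by a sigmoid (logistic function assumed here).
soft_from_targets = 1 ./ (1 + exp(-targets));

disp(targets);
disp(soft_from_targets);
```

Under these assumptions the conversions are not each other's inverse: a soft label 1 becomes the target +1, which the logistic function maps back to about 0.73 rather than 1. The crisp labels, on the other hand, survive a round trip through soft labels or targets, since the maximum stays in the same column.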


R.P.W. Duin, January 28, 2013

