
Dataset overload

Operations and operators

Although datasets may carry various kinds of additional information, they can be treated like 2-dimensional matrices in many common Matlab constructs. The table below gives examples, in which A, B and C are datasets unless otherwise specified. As far as applicable, a resulting dataset has the same fields as the original dataset. Where needed, fields such as sizes and labels are adapted to ensure that the result is a consistent dataset.

Examples of standard Matlab operations applied to datasets:

    B = A(:,[3 5])        Construct a new dataset from features 3 and 5 of A.
    A(:,4) = []           Delete feature 4 of A.
    B = A(1:10,:)         Select the first 10 objects of A.
    B = A(A(:,2)>0,:)     Select those objects of A for which feature 2 is positive.
    C = [A;B]             Add the objects of dataset B to those of A. A and B should have the same number of features.
    C = [A B(:,3)]        Add feature 3 of B to the features of A. A and B should have the same number of objects.
    B = A+7               Add 7 to all elements of A.
    B = A+[0 4 6]         Add the vector [0 4 6] to every object of A. A should have 3 features.
    A(:,7) = A(:,7)*5     Multiply feature 7 by 5.
    B = A-mean(A)         Subtract the mean over all objects from every object of the dataset.
    B = A./std(A)         Divide each feature of every object by the standard deviation of that feature over the dataset.
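Several of the operations in the table can be combined. The following is a minimal sketch of standardizing a dataset, assuming PRTools is on the Matlab path; it uses only gendath and the operations listed above, and the variable names are illustrative.

```matlab
% Sketch only: assumes PRTools is installed and on the Matlab path.
A = gendath;                 % example dataset with 2 features, as used elsewhere in this text
B = A - mean(A);             % subtract the per-feature mean from every object
C = B ./ std(A);             % divide every feature by its standard deviation
m = double(mean(C));         % per-feature means of the result as plain doubles
disp(m)                      % should be close to zero for every feature
```

Since B and C are built from A, they are expected to keep the labels and other field settings of A, as described above.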

The following Matlab operators are defined for datasets:

    +, -, *, .*, .^, /, \, ./, |, &, ~, xor, abs, '

In dyadic (two-operand) operations between datasets, the field settings of the first dataset are copied to the result. Dyadic operations between a dataset and a double array are supported, provided that the dimensions fit. Scalars and vectors used as doubles in such operations are expanded to the appropriate size with the Matlab repmat command.
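The repmat expansion can be illustrated without PRTools. In the sketch below a plain double array X stands in for the data field of a dataset; the variable names are illustrative, and the exact internal steps PRTools takes are not shown here.

```matlab
% Plain double arrays stand in for the dataset data field.
X = [1 2 3; 4 5 6];              % 2 objects (rows), 3 features (columns)

v = [0 4 6];                     % one value per feature
V = repmat(v, size(X,1), 1);     % copy the row vector once per object: 2-by-3
Y = X + V;                       % element-wise sum, as described for A+[0 4 6]

s = 7;
S = repmat(s, size(X));          % expand the scalar to the full 2-by-3 size
Z = X + S;                       % element-wise sum, as described for A+7

disp(Y)                          % [1 6 9; 4 9 12]
disp(Z)                          % [8 9 10; 11 12 13]
```

A vector whose length matches neither the number of objects nor the number of features cannot be expanded this way, which is what "provided that the dimensions fit" rules out.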

The relational operators

    >, >=, <, <=, ==, ~=

applied to datasets return logicals, as they do for ordinary (non-PRTools) Matlab arrays.
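Because a comparison returns logicals, its result can be used directly as an index. A short sketch, assuming PRTools is on the Matlab path and using only constructs that appear in the table of examples:

```matlab
% Sketch only: assumes PRTools is installed and on the Matlab path.
A = gendath;                 % example dataset with 2 features
mask = A(:,2) > 0;           % compare feature 2 of every object with 0
disp(islogical(mask))        % expected to be true, per the text above
B = A(mask,:);               % keep only the objects with a positive feature 2
n = size(B,1);               % number of selected objects
disp(n)
```

The number of objects in B should equal the number of true entries in mask.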

Indexing

The dataset fields can be accessed by structure-style indexing. Indexing by B = A(I,J) yields a dataset containing only the objects I and the features J. After such a selection some classes may have zero objects. These empty classes can be removed with the setlablist command:

    A = gendath
    %   Highleyman Dataset, 100 by 2 dataset with 2 classes: [50  50]
    B = A(1:10,:)
    %   Highleyman Dataset, 10 by 2 dataset with 2 classes: [10   0]
    C = setlablist(B)
    %   Highleyman Dataset, 10 by 2 dataset with 1 class: [10]

Overloaded Matlab commands

The following Matlab commands are overloaded for datasets: abs, conj, corrcoef, double, sum, cumsum, min, max, mean, median, det, eig, inv, pinv, log, exp, find, hist, isempty, isfinite, isnan, size, length, plot, real, repmat, sort, sqrt. They operate on the data field of the dataset and, where appropriate, return a dataset again.

Many Matlab commands without a dataset overload can still be applied to the data field of a dataset: extract the field, operate on it and reassign the result with setdata, e.g.:

    B = setdata(A,round(getdata(A)));

which performs the operation B = round(A), for which no overload exists, on a dataset A. A somewhat easier alternative is filtm:

    B = filtm(A,'round')
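The two routes can be compared side by side. The sketch below assumes PRTools is on the Matlab path and uses only the calls shown above (gendath, getdata, setdata, filtm); the variable names B1, B2 and same are illustrative.

```matlab
% Sketch only: assumes PRTools is installed and on the Matlab path.
A = gendath;                             % example dataset
B1 = setdata(A, round(getdata(A)));      % extract, round, reassign
B2 = filtm(A, 'round');                  % same operation through filtm
same = isequal(getdata(B1), getdata(B2));
disp(same)                               % true if both routes give the same data
```

In both cases the result is a dataset again, so labels and other fields of A should remain available in B1 and B2.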


R.P.W. Duin, January 28, 2013

