
Missing labels

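Objects can be labeled or stay unlabeled. Even entire datasets can be unlabeled. The two cases differ slightly: an unlabeled dataset has no class names at all, whereas a labeled dataset may contain individual objects whose labels are missing.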

In case a dataset is defined from raw data by

    a = dataset(data)

no labels are defined, so there are no class names and the lablist field in the dataset definition remains empty (getlablist(a) is empty).

If a dataset has to be labeled but some labels are unknown, they should be coded as missing. How this is done depends on the way the known labels are defined.

Coding of missing labels in datasets

Label type                       Code for a missing label
crisp labels given by numbers    NaN
crisp labels given by strings    '' (empty string)
crisp labels given by cells      {} (empty cell) or {''} (empty cell string)
soft labels                      NaN
targets                          NaN
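
The codings listed above can be illustrated with a minimal sketch. The toy data, the variable names, and the label names 'low' and 'high' are hypothetical. The two-argument call dataset(data, labels) is an assumption (the text above only shows dataset(data)), and the check for prtver is just one assumed way of testing whether PRTools is on the path.

```matlab
% Hypothetical toy data: 5 objects, 2 features.
data = [0.1 0.2; 0.3 0.1; 0.9 0.8; 0.5 0.5; 0.8 0.9];

% Crisp numeric labels: the 4th label is unknown, coded as NaN.
num_labels = [1; 1; 2; NaN; 2];

% Crisp labels in a cell array: the 4th label is unknown, coded as ''.
cell_labels = {'low'; 'low'; 'high'; ''; 'high'};

% Count the missing labels in each coding with plain MATLAB.
n_missing_num  = sum(isnan(num_labels));
n_missing_cell = sum(cellfun(@isempty, cell_labels));

% Build the datasets only if PRTools appears to be on the path
% (assumption: prtver.m is part of the installed PRTools version).
if exist('prtver', 'file') == 2
    a = dataset(data, num_labels);   % assumed two-argument constructor
    b = dataset(data, cell_labels);
end
```

Counting the missing entries before building the dataset is a cheap check that the coding matches the label type in use.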

Objects with missing labels are ignored when they are used for training supervised procedures (training procedures that are based on labeled objects). For soft labels and targets, objects or classes with missing values should first be deleted, or the missing values should first be filled in, as is done for missing data in the features.
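
Both options for targets can be sketched in plain MATLAB, before the data is put into a dataset. The toy values and variable names are hypothetical, and filling with the mean of the known targets is only one simple choice, assumed here for illustration; it is not necessarily what a PRTools routine would do.

```matlab
% Hypothetical toy data: 4 objects, 2 features, one target per object.
data    = [1 2; 3 4; 5 6; 7 8];
targets = [0.5; NaN; 1.5; 2.5];

% Option 1: delete the objects whose target is missing.
keep         = ~any(isnan(targets), 2);
data_kept    = data(keep, :);
targets_kept = targets(keep, :);

% Option 2: fill the missing targets with the mean of the known ones
% (an assumed, very simple fill strategy).
targets_filled = targets;
targets_filled(isnan(targets)) = mean(targets(~isnan(targets)));
```

Deleting is the safer choice when only a few objects are affected; filling keeps all objects but introduces values that were never observed.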


R.P.W. Duin, January 28, 2013

