PRTools manual in preparation -- Dataset definition

The two items data and labels are essential for the operation of PRTools. If data is omitted (data = []), an empty dataset is defined. If labels is not supplied, the objects remain unlabeled. If only some of the objects are unlabeled, the corresponding entries of labels should contain NaN for numeric labels or an empty string ('') for string labels.
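The cases above can be sketched as follows. This is a minimal illustration, assuming PRTools is on the MATLAB path and that the constructor is called as `dataset(data, labels)`; later PRTools releases name the constructor `prdataset`. The numbers are made up for the example.

```matlab
% Sketch only: requires PRTools; the data values are arbitrary.
data   = [1.0 2.0; 1.2 1.9; 3.1 0.4; 2.9 0.6];  % 4 objects, 2 features
labels = [1; 1; 2; NaN];                         % NaN marks the unlabeled object

a = dataset(data, labels);   % partly labeled dataset
u = dataset(data);           % no labels supplied: all objects unlabeled
e = dataset([]);             % no data supplied: empty dataset

nl = getnlab(a);             % index vector into the class list, 0 = unlabeled
```

For string labels the same pattern applies, with `''` in place of `NaN` for the unlabeled objects.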

Various items can be stored in a dataset. A full list can be found by converting a dataset variable into a structure.
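A short sketch of that conversion, assuming PRTools is installed and that `struct` is overloaded for dataset variables as the sentence above implies; the exact set of field names printed depends on the PRTools version.

```matlab
% Sketch only: requires PRTools.
a = dataset([0 0; 1 1], [1; 2]);  % tiny two-object dataset
s = struct(a);                    % assumption: converts the dataset to a plain struct
disp(fieldnames(s));              % lists the fields stored in the dataset
```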

All fields have a corresponding set-command (e.g. setdata) to store them and a get-command (e.g. getdata) to retrieve them. Users are discouraged from using the '.'-constructs (e.g. a.files), as these do not guarantee consistency with other fields. In some cases it is not the exact field that is retrieved but some derived data. The table below gives more information.
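As an illustration of the set/get pattern, the sketch below stores and retrieves a few fields. It assumes PRTools is installed and that `setname`, `setprior`, `getname`, `getprior` and `getdata` follow the naming rule described above; the name and prior values are arbitrary.

```matlab
% Sketch only: requires PRTools.
a = dataset(rand(6, 2), [1; 1; 1; 2; 2; 2]);

% Set-commands return the modified dataset, so the result must be assigned.
a = setname(a, 'toy problem');
a = setprior(a, [0.3 0.7]);

x = getdata(a);    % the 6-by-2 data matrix
n = getname(a);    % 'toy problem'
p = getprior(a);   % the priors stored above
```

Going through the set-commands rather than writing to `a.name` or `a.prior` directly lets PRTools check the new value against the other fields.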

> The fields of the dataset structure
| Field | Description |
| --- | --- |
| `data` | This is the main field, storing the data as supplied by calling the `dataset` constructor or by `setdata`. The size of the dataset, the number of objects (`m`) by the number of features (`k`), is derived from `data`. |
| `lablist` | This is a cell array that encodes the class names derived from the objects' labels. Datasets can have multiple sets of labels for their objects, of which just one is active (multi-labeling). The `lablist` field stores the necessary administration. The active set of labels can be retrieved by the commands `classnames` and `getlablist`. See also the `nlab` entry below. |
| `nlab` | The labels supplied in the `dataset` definition are summarized by `lablist` and `nlab`. `lablist` contains the unique labels (class names) and `nlab` is an index vector into `lablist`. Its values range between 1 and the total number of classes (the size of `lablist`). Entries in `nlab` for unlabeled objects are set to 0 (zero). The `setnlab` command should be used with care, as it changes the labeling of the dataset. |
| `labtype` | The labeling type (crisp, soft or targets, see above) is stored here. `setlabtype` changes the label type, but may also change the `nlab` and `lablist` fields. The conversion rules are described elsewhere. |
| `featlab` | The feature labels are strings or numbers and are used by `PRTools` (if given) to annotate plots. |
| `featdom` | Feature domains are stored here. If domains are set, tests are performed whenever the values in the `data` field change, to check whether the new data lie within the supplied domains. |
| `prior` | Classes in a dataset may have prior probabilities. These are used in density-based classifiers, in error evaluation by `testc` and in some other places. If not set, the `prior` field is empty, and when prior probabilities are needed the class frequencies in the dataset are used. |
| `cost` | In this field a cost matrix can be stored for performance evaluation and for procedures that explicitly minimize classification costs. Unless explicitly stated otherwise, `PRTools` ignores this field. |
| `objsize` | The object size is the number of rows (objects) in the `data` field. It is retrieved by the `size` and `getsize` commands (`getsize` may return the number of classes as well). Although the routines `getobjsize` and `setobjsize` exist, users are discouraged from using them except in connection with image handling. |
| `featsize` | The feature size is the number of columns (features) in the `data` field. It is retrieved by the `size` and `getsize` commands (`getsize` may return the number of classes as well). Although the routines `getfeatsize` and `setfeatsize` exist, users are discouraged from using them except in connection with image handling. |
| `ident` | Various object identifiers can be stored in subfields of the `ident` field. One subfield is always available: `ident`. Unless changed by the user, it contains the object indices at creation of the dataset. Every `ident` subfield stores vectors or arrays of doubles, strings or cells with as many rows as there are objects in the dataset. |
| `version` | At creation of the dataset, `PRTools` stores its version and the date here. |
| `name` | The user may supply a name here. It is displayed in the command window when a command returning a dataset is executed without a semicolon. The dataset name may also be used for annotating plots. |
| `user` | In this field the user can add and retrieve any additional annotation for the dataset in its entirety. |
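The relation between labels, `lablist` and `nlab` described in the table can be mimicked in plain MATLAB. The sketch below is not the PRTools implementation; it only illustrates the encoding with the built-in `unique`, using made-up string labels in which the empty string marks an unlabeled object.

```matlab
% Plain MATLAB illustration of the lablist/nlab encoding (no PRTools needed).
labels  = {'apple'; 'pear'; ''; 'apple'; 'banana'};  % '' = unlabeled object
labeled = ~cellfun(@isempty, labels);                % mask of labeled objects

% Unique class names and, for each labeled object, its index into them.
[lablist, ~, idx] = unique(labels(labeled));

nlab = zeros(numel(labels), 1);   % 0 encodes "unlabeled"
nlab(labeled) = idx;

% lablist is {'apple'; 'banana'; 'pear'} and nlab is [1; 3; 0; 1; 2].
% The original labels of the labeled objects are recovered by indexing:
recovered = lablist(nlab(labeled));
```

Storing the class names once and keeping only integer indices per object makes class counts and per-class selections cheap, which is why changing `nlab` directly alters the labeling.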

When datasets are changed, e.g. by a transformation of the data or by taking a subset of features or objects, all relevant information is copied, including the name and user fields.
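A small sketch of this behavior, assuming PRTools is installed, that datasets can be subscripted like matrices (objects by features), and that `setuser(a, value)` replaces the whole user field; the name and annotation are arbitrary.

```matlab
% Sketch only: requires PRTools.
a = dataset(rand(10, 3), [ones(5, 1); 2 * ones(5, 1)]);
a = setname(a, 'toy problem');
a = setuser(a, 'collected for the manual example');  % assumed call form

b = a(1:4, [1 3]);   % first four objects, features 1 and 3 only

disp(getname(b));    % the name is carried over to the subset
disp(getuser(b));    % so is the user annotation
```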