
Datafile definition

Constructor

The datafile constructor has the form

    a = datafile(directory,type,readcommand,par1,par2,... )

Datafile types

There are five types supported for datafiles:

The raw type is the original, standard type. It is useful for raw data like images, time signals and spectra. The user should organize the files class-wise in sub-directories before the datafile constructor is called.

Note that the datafile constructor stores a mat-file with all information in directory. This speeds up loading of an existing, previously defined datafile, as the constructor may be time-consuming: all directories and, for some types, all files have to be inspected.
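
As an illustration of the constructor call for the raw type, a minimal sketch follows. The directory name faces and its class-wise layout are hypothetical, and passing 'imread' as readcommand is an assumption about how the image files would be read; substitute whatever read routine suits the data.

    % Hypothetical layout, one sub-directory per class:
    %   faces/person01/*.png, faces/person02/*.png, ...
    a = datafile('faces','raw','imread');   % 'imread' as readcommand is an assumption
    % The constructor stores a mat-file with all information in faces,
    % which speeds up later loading of this datafile.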

Datafile structure

The information below is relevant only for maintenance, not for users.

The datafile class is a child of the programming class dataset and adds a few fields to it. These can be found by converting a datafile variable into a structure. The following example uses the ORL face database. If it is not available, it will be downloaded from the PRTools website as a datafile.

    a = orl;
    %   ORL faces, raw datafile with 400 objects in 40 crisp classes: [10  10  10  10 ...
    struct(a)
    %
    %       files: {{1x41 cell}  {1x41 cell}}
    %    rootpath: 'D:\bduin\Desktop\prdatafiles\orl'
    %        type: 'raw'
    %     preproc: [1x1 struct]
    %    postproc: [0x0 mapping]
    %     dataset: [400x10304 dataset]

Most fields have set- and get-routines (e.g. setfiles and getfiles) to define or retrieve their values. Users are discouraged from using the '.'-constructs (e.g. a.files), as these do not guarantee consistency with other fields. The table below gives more information on the fields.

> The fields of the datafile structure
files One or two cell arrays holding the names of the directories and files. The organization differs somewhat between datafile types.
rootpath The absolute path of the datafile. To avoid problems, it cannot be reset by the user (there is no setrootpath routine).
type One of the types discussed above.
preproc A structure array with preprocessing commands and related parameters. It starts with readcommand. Subsequent commands have to be specified by the user using mappings. They are only stored, and are executed at the moment the datafile is converted to a dataset. The last preprocessing command should allow this conversion: all objects should have the same size at that moment, e.g. through the use of imresize in a setpreproc command or by applying the related mapping im_resize to the datafile.
postproc An optional structure array with mappings. Postprocessing starts by converting the result of preprocessing to a dataset. Formally it is applied object by object, but PRTools may combine a number of objects for efficiency reasons.
dataset Following the Matlab rules for inheritance, one of the fields of datafile is dataset. All dataset commands thereby normally apply to a datafile, with a few exceptions. See below. At the time a datafile is converted to a dataset, the settings of the dataset fields are used.
dataset.data This field is left empty as a datafile refers to data in the file system.
dataset.featsize This is, by definition, undetermined. It should be 0, which implies that feature size checking is skipped. Some routines, however, accidentally set the field. The user may correct this with the command a = setfeatsize(a,0) for a datafile a.
dataset.ident This is again a structure with at least two fields (others may be set by the user with setident). These should be arrays holding some annotation for every object. During the construction of a datafile a, an array with the numbers from 1 to the total number of objects is stored in the field a.dataset.ident.ident, indicating the order in which objects are found in the file system. In a.dataset.ident.file_index pointers are given to the names of the directories and files stored in a.files. They are used to locate objects when needed.
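
The field conventions in the table can be illustrated with a short sketch. It reuses the orl datafile from the example above. The target size [32 32] is an arbitrary choice, and the calls im_resize([],[32 32]) and dataset(b) are assumptions about the usual PRTools calling conventions rather than something this page specifies.

    a = orl;                       % ORL faces datafile, as in the example above
    files = getfiles(a);           % use the get-routine instead of a.files
    a = setfeatsize(a,0);          % reset a feature size that was set accidentally
    b = a*im_resize([],[32 32]);   % assumed syntax; stored with the datafile, executed only on conversion
    x = dataset(b);                % assumed call; converting to a dataset runs the stored commands

After the resize step all objects have the same size, which is what allows the final conversion to a dataset.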


R.P.W. Duin, January 28, 2013

