PRTools contents

PRTools manual

datafiles

DATAFILES

Info on the datafile class construction for PRTools

This is not a command, just an information file.

Datafiles in PRTools are in the MATLAB language defined as objects of the  class DATAFILE. They inherit most of their properties of the class DATASET.  They are a generalisation of this class allowing for large datasets  distributed over a set of files. Before conversion to a dataset  preprocessing can be defined. There are four types of datafiles

raw Every file is interpreted as a single object in the dataset. These  files may, for instance, be images of different size.
cell All files should be mat-files containing just a single variable being  a cell array. Its elements are interpreted as objects. The file names  will be used as labels during construction. This may be changed by the  user afterwards.
pre-cooked In this case the user should supply a command that reads a  file and converts it to a dataset.
half-baked All files should be mat-files, containing a single dataset.
mature This is a datafile by PRTools, using the SAVEDATAFILE command after  execution of all preprocessing defined for the datafile.

 A datafile is, like a dataset, a set consisting of M objects, each described  by K features. K might be unknown, in which case it is set to zero, K=0.  Datafiles store an administration about the files or directories in which  the objects are stored. In addition they can store commands to preprocess  the files before they are converted to a dataset and postprocessing  commands, to be executed after conversion to a dataset.

Datafiles are mainly an administration. Operations on datafiles are  possible as long as they can be stored (e.g. filtering of images for raw  datafiles, or object selection by GENDAT). Commands that are able to  process objects sequentially, like NMC and TESTC can be executed on  datafiles.

Whenever a raw datafile is sufficiently defined by pre- and postprocessing  it can be converted into a dataset. If this is still a large dataset, not  suitable for the available memory, it should be stored by the  SAVEDATAFILE command and is ready for later use. If the dataset is  sufficiently small it can be directly converted into a dataset by  DATASET.

Intermediate results of datafiles that by the defined preprocessing  cannot yet be converted into a dataset, can be stored as a new, raw  datafile by CREATEDATAFILE.

The main commands specific for datafiles are

 DATAFILE constructor. It defines a datafile on a directory.
 ADDPREPROC adds preprocessing commands (low level command)
 ADDPOSTPROC adds postprocessing commands (low level command)
 FILTM user interface to add preprocessing to a datafile.
 CREATEDATAFILE executes all defined preprocessing and stores the result  as a new, raw datafile.
 SAVEDATAFILE executes all defined pre- and postprocessing and stores  the result as a dataset in a set of matfiles.
 DATASET conversion to dataset

Datafiles have the following fields, in addition to all dataset fields.

 ROOTPATH Absolute path of the datafile
 FILES names of directories (for raw datafiles) or mat-files  (for converted datafiles)
 TYPE datafile type
 PREPROC preprocessing commands in a struct array
 POSTPROC postprocessing commands as mappings
 DATASET stores all dataset fields. Note that the DATA field as well  as the target field are empty and that the IDENT.FILE_INDEX field is used to store for every object a pointer to a file  or directory in FILES.

Almost all operations defined for datasets are also defined for  datafiles, with a few exceptions. Also fixed and trained mappings can  handle datafiles, as they process objects sequentially. The use of  untrained mappings in combination with datafiles is a problem, as they  have to be adapted to the sequential use of the objects. Mappings that  can handle datafiles are indicated in the Contents file.

Subscription of datafiles is only defined for the first arguement, the  objects, e.g. A(M,:) or even, irregulary, A(M) refer to object number M.  As the objects in datafiles (e.g. images or time signals) may have different  lengths, the second subscript, for datasets refering to the feature  number, is undefined. A(M,N) causes an error of any N. Formally the feature  size of a dataset is set to 0. Checking of feature sizes in applying mappings  to datafiles is disabled.

The possibility to define preprocessing of objects (e.g. images) with  different sizes makes datafiles useful for handling raw data and  measurements of features.

See also

datafile, addpreproc, addpostproc, filtm, filtim, createdatafile, savedatafile,

PRTools contents

PRTools manual