
Datafile background

Datasets as offered by PRTools represent objects by vectors, traditionally interpreted as features, in a vector space. The definition of such a representation and the procedures to compute it are an important aspect of pattern recognition. This may be done on the basis of other, larger feature vectors by feature selection or feature extraction. In practice, however, we may have to start from more raw data, such as images, time signals or spectra of varying sizes. These do not fit the PRTools definition of a dataset, as it assumes vectors of a fixed and well-defined size.

The datafile construct was created to integrate raw data into PRTools. Programmatically, a datafile is a child of the dataset and inherits many of its properties. The difference is that datafiles do not store the data as such, but only keep links to data directories on disk. Typically, every file relates to an object. In addition, pre-processing commands may be stored in a datafile. When a datafile is converted to a dataset, all these commands are executed on all objects. They should therefore result in feature vectors of the same size for every object.
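The mechanism described above (links to raw objects, stored pre-processing commands that only run at conversion time, and the requirement of equally sized feature vectors) can be sketched in Python. This is a hypothetical analogue for illustration, not the PRTools API (PRTools is written in MATLAB): the class `LazyDatafile`, its methods `with_step` and `to_dataset`, the `summary_features` command and the in-memory `raw` dictionary standing in for files on disk are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical sketch of the datafile idea; NOT the PRTools (MATLAB) API.
RawObject = Sequence[float]
Step = Callable[[RawObject], List[float]]


@dataclass
class LazyDatafile:
    paths: List[str]                                  # one link per object; nothing loaded yet
    loader: Callable[[str], RawObject]                # assumed reader for a single raw object
    steps: List[Step] = field(default_factory=list)   # stored pre-processing commands

    def with_step(self, step: Step) -> "LazyDatafile":
        # Only record the command; no data is loaded or processed here.
        return LazyDatafile(self.paths, self.loader, self.steps + [step])

    def to_dataset(self) -> List[List[float]]:
        # Conversion: load every object and execute all stored commands on it.
        rows = []
        for path in self.paths:
            obj = self.loader(path)
            for step in self.steps:
                obj = step(obj)
            rows.append(list(obj))
        # A dataset requires feature vectors of one fixed size.
        sizes = {len(r) for r in rows}
        if len(sizes) > 1:
            raise ValueError(f"feature vectors differ in size: {sorted(sizes)}")
        return rows


# Hypothetical raw signals of varying length, standing in for files on disk.
raw = {"a.sig": [1.0, 2.0, 3.0], "b.sig": [4.0, 0.0], "c.sig": [2.0, 2.0, 2.0, 2.0]}


def summary_features(signal: RawObject) -> List[float]:
    # Map a signal of any length to a fixed 3-element vector: min, max, mean.
    return [min(signal), max(signal), sum(signal) / len(signal)]


df = LazyDatafile(list(raw), raw.__getitem__).with_step(summary_features)
X = df.to_dataset()   # [[1.0, 3.0, 2.0], [0.0, 4.0, 2.0], [2.0, 2.0, 2.0]]
```

Without the `summary_features` step the three signals would give vectors of lengths 3, 2 and 4, and `to_dataset` would raise a `ValueError`, mirroring the requirement that the stored commands produce feature vectors of the same size.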

Besides integrating raw data into PRTools, datafiles offer the possibility to handle very large datasets. A number of mapping routines accept datafiles where datasets are expected. These are typically the routines that process the data sequentially, so that the full data never has to be held in memory at once.
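As an illustration of a routine that processes data sequentially, the sketch below computes per-feature means and (population) variances in a single pass with Welford's update, reading one object at a time. Neither function is part of PRTools; the generator `objects_from_disk` is a hypothetical stand-in for reading objects from a datafile.

```python
from typing import Iterable, Iterator, List, Tuple


def streaming_mean_var(vectors: Iterable[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population variance in one pass (Welford's method)."""
    n = 0
    mean: List[float] = []
    m2: List[float] = []
    for x in vectors:
        if n == 0:
            mean = [0.0] * len(x)
            m2 = [0.0] * len(x)
        elif len(x) != len(mean):
            raise ValueError("all feature vectors must have the same size")
        n += 1
        for j, xj in enumerate(x):
            delta = xj - mean[j]
            mean[j] += delta / n
            m2[j] += delta * (xj - mean[j])   # uses the updated mean
    if n == 0:
        raise ValueError("no objects")
    return mean, [s / n for s in m2]


def objects_from_disk(count: int) -> Iterator[List[float]]:
    # Hypothetical: yields one feature vector at a time instead of loading all.
    for i in range(count):
        yield [float(i), 2.0 * i]


mean, var = streaming_mean_var(objects_from_disk(5))
```

Only one object and the running sums are in memory at any time, which is what makes this kind of routine suitable for data that is too large to convert to a dataset in one go.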


R.P.W. Duin, January 28, 2013

