Datafile background
Datasets as offered by PRTools represent objects by vectors in a vector space; the vector elements are traditionally interpreted as features. Defining such a representation, and the procedures to compute it, is an important aspect of pattern recognition. This may be done on the basis of other, larger feature vectors by feature selection or feature extraction. In practice, however, we may have to start from rawer data such as images, time signals or spectra of varying sizes. These do not fit the PRTools definition of a dataset, as it assumes vectors of a fixed and well-defined size.
The datafile construct was created to integrate raw data into PRTools. Programmatically, a datafile is a child of the dataset and inherits many of its properties. The difference is that a datafile does not store the data itself but only keeps links to data directories on disk. Typically, every file relates to one object. In addition, pre-processing commands may be stored in a datafile. When a datafile is converted to a dataset, all these commands are executed on all objects. They should therefore result in feature vectors of the same size for every object.
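The mechanism described above (store links and commands, execute them only at conversion, require equal-sized results) can be illustrated with a small sketch. It is written in Python rather than in PRTools' own MATLAB and is not the PRTools API: the names `LazyDatafile`, `add_preproc`, `to_dataset`, `summary_features` and `write_demo_files`, as well as the one-object-per-text-file storage format, are hypothetical and chosen only to show the idea.

```python
import os
import tempfile
from typing import Callable, List, Sequence

Vector = List[float]


def load_signal(path: str) -> Vector:
    """Hypothetical loader: one raw object per text file, whitespace-separated numbers."""
    with open(path) as fh:
        return [float(tok) for tok in fh.read().split()]


class LazyDatafile:
    """Illustrative stand-in for a datafile: keeps file links and pending commands, not data."""

    def __init__(self, paths: Sequence[str], loader: Callable[[str], Vector] = load_signal):
        self.paths = list(paths)  # links to raw data on disk, one file per object
        self.loader = loader
        self.preproc: List[Callable[[Vector], Vector]] = []  # recorded, not yet executed

    def add_preproc(self, step: Callable[[Vector], Vector]) -> "LazyDatafile":
        # Nothing is read or computed here; the command is only stored.
        self.preproc.append(step)
        return self

    def to_dataset(self) -> List[Vector]:
        """Execute all stored commands on every object and stack the results as rows."""
        rows = []
        for path in self.paths:
            x = self.loader(path)
            for step in self.preproc:
                x = step(x)
            rows.append(x)
        sizes = {len(r) for r in rows}
        if len(sizes) > 1:
            raise ValueError(f"objects yield feature vectors of different sizes: {sorted(sizes)}")
        return rows


def summary_features(x: Vector) -> Vector:
    """Example pre-processing: map a raw signal of any length to [mean, min, max]."""
    return [sum(x) / len(x), min(x), max(x)]


def write_demo_files(signals: Sequence[Sequence[float]], directory: str) -> List[str]:
    """Hypothetical helper: write each raw signal to its own text file, return the paths."""
    paths = []
    for i, signal in enumerate(signals):
        path = os.path.join(directory, f"obj{i}.txt")
        with open(path, "w") as fh:
            fh.write(" ".join(map(str, signal)))
        paths.append(path)
    return paths


if __name__ == "__main__":
    paths = write_demo_files([[1, 2, 3], [4, 5, 6, 7, 8]], tempfile.mkdtemp())
    df = LazyDatafile(paths).add_preproc(summary_features)
    print(df.to_dataset())  # [[2.0, 1.0, 3.0], [6.0, 4.0, 8.0]]
```

Without the `summary_features` step the two raw signals (lengths 3 and 5) cannot be stacked, and `to_dataset` raises an error; this mirrors the requirement that the stored commands produce feature vectors of the same size.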
Besides integrating raw data into PRTools, datafiles make it possible to handle very large datasets. A number of mapping routines accept datafiles where datasets are expected; these are typically the routines that process the data sequentially, object by object.
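Why sequential processing suits very large datafiles can be shown with a one-pass computation that holds only one object in memory at a time. This is again a Python sketch of the concept, not a PRTools routine: `iter_objects`, `sequential_mean` and `write_demo_files` are hypothetical names, and the per-feature mean stands in for any mapping that can be trained by a single pass over the objects.

```python
import os
import tempfile
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def iter_objects(paths: Sequence[str], steps: Sequence[Callable[[Vector], Vector]]) -> Iterator[Vector]:
    """Yield one pre-processed object at a time; only a single file is in memory at once."""
    for path in paths:
        with open(path) as fh:
            x = [float(tok) for tok in fh.read().split()]
        for step in steps:
            x = step(x)
        yield x


def sequential_mean(objects: Iterator[Vector]) -> Vector:
    """Per-feature mean in one pass, keeping only a running sum and a count."""
    total: Vector = []
    n = 0
    for x in objects:
        if n == 0:
            total = [0.0] * len(x)
        elif len(x) != len(total):
            raise ValueError("all objects must map to feature vectors of the same size")
        total = [t + v for t, v in zip(total, x)]
        n += 1
    if n == 0:
        raise ValueError("no objects to process")
    return [t / n for t in total]


def write_demo_files(signals: Sequence[Sequence[float]], directory: str) -> List[str]:
    """Hypothetical helper: write each raw signal to its own text file, return the paths."""
    paths = []
    for i, signal in enumerate(signals):
        path = os.path.join(directory, f"obj{i}.txt")
        with open(path, "w") as fh:
            fh.write(" ".join(map(str, signal)))
        paths.append(path)
    return paths


if __name__ == "__main__":
    paths = write_demo_files([[1, 2, 3], [4, 5, 6, 7, 8]], tempfile.mkdtemp())
    summary = lambda x: [sum(x) / len(x), min(x), max(x)]  # [mean, min, max] per object
    print(sequential_mean(iter_objects(paths, [summary])))  # [4.0, 2.5, 5.5]
```

Memory use here is bounded by one object plus one running-sum vector, regardless of how many files the datafile links to; routines that need all objects at once (for example, a full pairwise distance matrix) do not fit this pattern as directly.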
R.P.W. Duin, January 28, 2013