Dataset, creation of artificial datasets |
Datasets can be created in the following ways:
- Matlab mat-files. Data collected in the past and converted to PRTools datasets can be stored by the Matlab save command and loaded by the load command. Users are encouraged to write a function for doing this, as the help comments in the function can be used to give additional information. Alternatively, or inside such a function, the PRTools command prdataset may be used to load and handle the mat-file.
- Datasets in PRTools format can be downloaded from the PRTools web site.
- The Matlab rand and randn commands (in combination with genlab) and the PRTools gauss routine may be used to generate artificial data.
- PRTools offers a set of commands for generating artificial datasets, see below.
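
As an illustration of the randn/genlab route and of storing the result in a mat-file, the following is a minimal sketch rather than a prescribed procedure. It assumes PRTools 5 is on the Matlab path and that prdataset(data, labels) constructs a dataset there; the class sizes, the mean shift and the file name are arbitrary choices made for this example.

```matlab
% Sketch only; assumes PRTools 5 is on the Matlab path.
n = [40 60];                         % objects per class (illustrative values)
x = [randn(n(1),2); ...              % class 1: standard normal around the origin
     randn(n(2),2) + 3];             % class 2: same spread, mean shifted to (3,3)
lab = genlab(n);                     % one label per object, ordered by class
a = prdataset(x, lab);               % combine data and labels into a dataset

% Hypothetical file name, placed in the system temp folder.
fname = fullfile(tempdir, 'my_artificial_data.mat');
save(fname, 'a');                    % store with the standard Matlab save command
s = load(fname);                     % load returns a struct with a field 'a'
b = s.a;                             % the restored dataset
```

Wrapping the two load lines in a small function of its own gives a natural place for help comments that describe where the data came from.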
Commands for generating artificial datasets | |
gendatb | Generation of two banana shaped classes. |
gendatc | Generation of two circular classes. |
gendatd | Generation of two normally distributed 'difficult' classes. The problem is 'difficult' because, for small sample sizes, a good classifier can only be constructed if the distribution is known. |
gendath | Generation of two normally distributed classes according to Highleyman. |
gendatl | Generation of the 'Lithuanian' classes as proposed by Raudys. |
gendats | Generation of two simple normally distributed classes. |
gendatm | Generation of eight 2d classes. |
gentrunk | Generation of Trunk's dataset, used to illustrate the peaking phenomenon (curse of dimensionality). |
gendatgauss | Generation of multivariate Gaussian distributed data. |
gencirc | Generation of a one-class circular dataset. |
circles3d | Create a dataset containing 2 circles in 3 dimensions (for MDS examples). |
lines5d | Create a dataset containing 3 lines in 5 dimensions (for MDS examples). |
gendatr | Generate regression dataset from data and target values (for regression examples). |
gendat | Random sampling of datasets for training and testing. |
gensubsets | Generation of a series of consistent subsets of a dataset. |
gendatk | Nearest neighbor data generation. |
gendatp | Parzen density data generation. |
The last four commands (gendat, gensubsets, gendatk and gendatp) generate datasets out of existing datasets.
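
A brief sketch of how a generator command and the commands that work on existing datasets can be combined. It assumes PRTools 5 is on the Matlab path; the class sizes, the split fraction and the number of generated objects are illustrative values, not recommendations.

```matlab
% Sketch only; assumes PRTools 5 is on the Matlab path.
a = gendatb([100 100]);        % two banana-shaped classes, 100 objects each
[trn, tst] = gendat(a, 0.5);   % random split: half of each class for training, the rest for testing
c = gendatp(trn, 200);         % new objects generated from a Parzen density estimate of trn
```

Here gendat only resamples the objects already in a, whereas gendatp creates new objects that follow the estimated density of the training set.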
R.P.W. Duin, January 28, 2013