Dataset, creation of artificial datasets |
Datasets can be created in the following ways:
Mat-files. Data collected in the past and converted to PRTools datasets can be stored with the Matlab save command and loaded with the load command. Users are encouraged to wrap this in a function, since the help comments in that function can give additional information about the data. Alternatively, or inside such a routine, the PRTools command prdataset may be used to load and handle the mat-file.
Datasets in PRTools format can be downloaded from the PRTools web site.
The Matlab rand and randn commands (in combination with genlab) and the PRTools gauss routine may be used to generate data directly.
PRTools offers a set of commands for generating artificial datasets, see below.
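A minimal sketch of the mat-file route and the randn/genlab route: the fragment builds a two-class dataset from randn output, labels it with genlab, wraps it with prdataset, and round-trips it through a mat-file. The sample sizes, the class shift of 3, and the file name mydata.mat are illustrative assumptions, and PRTools is assumed to be on the Matlab path.

```matlab
% Illustrative values only: 50 objects per class, 2 features.
n = [50 50];
x = [randn(n(1), 2); randn(n(2), 2) + 3];  % class 2 shifted by 3 in both features
labs = genlab(n);                          % 50 labels for class 1, then 50 for class 2
a = prdataset(x, labs);                    % wrap data and labels in a PRTools dataset

save('mydata.mat', 'a');                   % hypothetical file name
clear a
s = load('mydata.mat');                    % load returns a struct with field 'a'
a = s.a;

% Comparable data from the PRTools gauss routine:
% class means in the rows of the second argument, default covariances.
b = gauss(n, [0 0; 3 3]);
```

Putting the load step inside a small function of your own, as suggested above, gives a place for help comments that describe where the data came from.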
Commands for generating artificial datasets | |
gendatb | Generation of two banana shaped classes. |
gendatc | Generation of two circular classes. |
gendatd | Generation of two normally distributed 'difficult' classes. The problem is 'difficult' because, for small sample sizes, a good classifier can only be constructed if the distribution is known. |
gendath | Generation of two normally distributed classes according to Highleyman [], see also []. |
gendatl | Generation of the 'Lithuanian' classes as proposed by Raudys. |
gendats | Generation of two simple normally distributed classes. |
gendatm | Generation of eight 2D classes. |
gentrunk | Generation of Trunk's dataset [], used to illustrate the peaking phenomenon (curse of dimensionality). |
gendatgauss | Generation of multivariate Gaussian distributed data. |
gencirc | Generation of a one-class circular dataset. |
circles3d | Create a dataset containing 2 circles in 3 dimensions (for MDS examples). |
lines5d | Create a dataset containing 3 lines in 5 dimensions (for MDS examples). |
gendatr | Generate regression dataset from data and target values (for regression examples). |
gendat | Random sampling of datasets for training and testing. |
gensubsets | Generation of a series of consistent subsets of a dataset. |
gendatk | Nearest neighbor data generation. |
gendatp | Parzen density data generation. |
The last four commands (gendat, gensubsets, gendatk and gendatp) generate datasets out of existing datasets.
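As a hedged illustration of how the generators and the resampling commands combine, the sketch below creates a banana set, splits it at random into training and test halves, and then generates new objects from the training half. The sample sizes are arbitrary, and the argument conventions shown are the ones assumed here; check help gendat and help gendatk in your PRTools version.

```matlab
a = gendatb([50 50]);         % two banana-shaped classes, 50 objects each
[trn, tst] = gendat(a, 0.5);  % random split: half of each class for training, rest for testing
b = gendatk(trn, [100 100]);  % new objects per class, generated near neighbours in trn
scatterd(a);                  % 2D scatter plot of the original dataset
```

The same pattern applies to the other generators in the table, for example replacing gendatb by gendath or gendats.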
R.P.W. Duin, January 28, 2013