DataSet

An associative data container

The DataSet is an important object in the second FluCoMa toolbox. Whenever you need to create a collection of data (such as the input to a machine learning process), or a place to hold the output of such a process, DataSet is the tool that you’ll want to reach for.

The DataSet starts out empty and we add any number of points to it. A point is made up of two pieces of information: an identifier and some data. The data is a series of numbers stored in a buffer. The identifier is a string or symbol that is associated with the data. If you have used the coll, dict, text or Dictionary objects in your environment of choice, you will already be familiar with this idea of storing data associatively. If you haven’t, you can think of it like a filing system. The identifier gives us a human-readable label which we can use to look up the data that it is attached to.

A small DataSet with five points is depicted below. Each identifier is an instrument name, and we might imagine that the data associated with each identifier could be, for example, descriptor values or parameters.

DataSet

Identifier    Data
guitar        0.26  0.82  0.64  0.39  0.84
synth         0.18  0.28  0.85  0.30  0.93
trombone      0.01  0.21  0.27  0.96  0.23
saxophone     0.79  0.24  0.94  0.05  0.32
noise         0.39  0.84  0.55  0.68  0.43
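
To make the associative idea concrete, here is a minimal sketch of the same five points held in a plain Python dictionary. This is only an analogy for how identifiers map to data; it is not the FluCoMa DataSet itself, and the variable names are hypothetical.

```python
# Hypothetical stand-in for a DataSet: each identifier maps to a list of numbers.
dataset = {
    "guitar":    [0.26, 0.82, 0.64, 0.39, 0.84],
    "synth":     [0.18, 0.28, 0.85, 0.30, 0.93],
    "trombone":  [0.01, 0.21, 0.27, 0.96, 0.23],
    "saxophone": [0.79, 0.24, 0.94, 0.05, 0.32],
    "noise":     [0.39, 0.84, 0.55, 0.68, 0.43],
}

# Look up a point's data using its human-readable identifier.
trombone_data = dataset["trombone"]

# Collect the distinct data lengths; a single value means every point is the same size.
sizes = {len(values) for values in dataset.values()}
```

In a real patch or script the data would live in a buffer rather than a list, but the lookup pattern (identifier in, data out) is the same.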

Some Caveats to Remember

  1. The data for every point needs to be the same size. In other words, the first point you add to a DataSet dictates how many numbers each point’s data should have. If you added a point that had 10 numbers in its data, and then tried to add a new point with 3 numbers, it wouldn’t work.

  2. Identifiers are unique. You cannot have the same identifier twice in a single DataSet.

  3. When data is transformed and passed between many instances of FluCoMa objects, the identifiers are preserved, meaning that you can trace a result at the end of a processing pipeline back to the point it came from at the beginning.
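
The three caveats above can be modelled with a small toy class. This is a sketch for illustration only: `ToyDataSet`, `add_point` and `scale` are hypothetical names, not the FluCoMa API, and raising an error on a duplicate identifier is an assumption about how such a rule might be enforced.

```python
class ToyDataSet:
    """Illustrative model only; not the FluCoMa DataSet API."""

    def __init__(self):
        self.points = {}        # identifier -> list of floats
        self.dimensions = None  # fixed by the first point added

    def add_point(self, identifier, data):
        # Caveat 2: identifiers are unique.
        if identifier in self.points:
            raise KeyError(f"identifier already exists: {identifier!r}")
        data = [float(x) for x in data]
        # Caveat 1: the first point dictates the size of all later points.
        if self.dimensions is None:
            self.dimensions = len(data)
        elif len(data) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} values, got {len(data)}"
            )
        self.points[identifier] = data


def scale(dataset, factor):
    """Return a new ToyDataSet with every value multiplied by factor."""
    out = ToyDataSet()
    # Caveat 3: the transformed data keeps the original identifiers.
    for identifier, data in dataset.points.items():
        out.add_point(identifier, [x * factor for x in data])
    return out


ds = ToyDataSet()
ds.add_point("guitar", [0.26, 0.82, 0.64])
ds.add_point("synth", [0.18, 0.28, 0.85])
scaled = scale(ds, 2.0)
```

Adding a point with the wrong number of values, or reusing `"guitar"`, raises an error in this sketch, while `scaled` holds new values under the same two identifiers.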

Usage

The help files in each environment are the best place to see the most common uses of the DataSet, including adding, updating and deleting points.