From Data To Examples

> There is much talk about "Big Data" lately. Big Data can be helpful, but it is not sufficient: every example in CAEBM has to be a complete set of values for the OPs and IPs of the problem at hand. Missing data in incomplete examples have to be filled in appropriately. Sets of examples have to be tested for dependencies between the different examples, because only independent examples can contribute to the modeling process.
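> How such a preparation step might look in practice is shown in the following minimal sketch, assuming the examples are rows of a NumPy array with NaN marking missing entries. The mean fill and the duplicate test are illustrative placeholders only: the text does not specify which fill strategy or which dependency test CAEBM prescribes.

```python
import numpy as np

def complete_and_filter(examples: np.ndarray, tol: float = 1e-9) -> np.ndarray:
    """Fill missing values (NaN) with the per-parameter mean, then drop
    examples that duplicate an already-kept example (one simple kind of
    dependency; subtler dependencies may need statistical tests)."""
    # Fill missing IP/OP values with the per-parameter mean (a simple
    # placeholder strategy; domain knowledge may suggest better fills).
    col_means = np.nanmean(examples, axis=0)
    filled = np.where(np.isnan(examples), col_means, examples)

    kept: list[np.ndarray] = []
    for row in filled:
        # An example that coincides with a kept one adds no information.
        if any(np.allclose(row, k, atol=tol) for k in kept):
            continue
        kept.append(row)
    return np.vstack(kept)

# Three (IP1, IP2, OP) examples: the NaN is filled with the column
# mean, and the duplicate of the first example is discarded.
raw = np.array([[1.0, 2.0, 3.0],
                [4.0, np.nan, 6.0],
                [1.0, 2.0, 3.0]])
print(complete_and_filter(raw))
```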


> There should be at least three times as many independent examples as there are IPs, but more examples are better for reliable modeling. If the number of examples is lower, special measures for "sparse problems" have to be taken at modeling time; and if the number of examples sinks below the number of IPs, the modeling problem becomes underdetermined and therefore unsolvable.
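> This rule of thumb reduces to a simple check, sketched below; the function name and the three category labels are illustrative choices, not CAEBM terminology.

```python
def check_example_count(n_examples: int, n_ips: int) -> str:
    """Classify a modeling problem by its example-to-IP ratio,
    following the rule of thumb stated above."""
    if n_examples < n_ips:
        return "underdetermined: unsolvable"
    if n_examples < 3 * n_ips:
        return "sparse: special measures needed at modeling time"
    return "ok: enough examples for reliable modeling"

print(check_example_count(n_examples=25, n_ips=10))  # sparse
print(check_example_count(n_examples=40, n_ips=10))  # ok
```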


> Because the vast majority of our computers (as a consequence of the predominance of CASBM!) are set up for dealing with numerics, we have to encode our non-numeric parameters as numerical ones. This has to be done carefully in order to preserve the character of the non-numeric parameters: being sequential, being periodic, having a fully independent set of values, etc. Alternatively, appropriate modeling techniques like Decision Tree Forests (DTF) can be applied directly to arbitrary (mixed) parameter sets.
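> The three parameter characters named above correspond to three standard encodings, sketched below; the concrete value sets (sizes, months, colors) are hypothetical placeholders.

```python
import math

# Sequential (ordinal) values keep their order as increasing integers.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}

def encode_ordinal(value: str) -> int:
    return SIZE_ORDER[value]

# Periodic values (e.g. month of the year) are mapped onto a circle,
# so that December and January end up close to each other.
def encode_periodic(value: int, period: int) -> tuple[float, float]:
    angle = 2.0 * math.pi * value / period
    return (math.sin(angle), math.cos(angle))

# Fully independent values get one-hot vectors, which impose no
# artificial order or distance between them.
COLORS = ["red", "green", "blue"]

def encode_one_hot(value: str) -> list[int]:
    return [1 if c == value else 0 for c in COLORS]

print(encode_ordinal("medium"))        # 1
print(encode_periodic(12, period=12))  # approx. (0.0, 1.0)
print(encode_one_hot("green"))         # [0, 1, 0]
```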


> In any case, the examples have to be normalized parameter by parameter, so that each parameter reaches the same modeling accuracy with respect to its range of values. If different modeling accuracies are needed for different parameters, this can be accomplished by applying different normalizations to the parameters at hand.
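> A minimal sketch of per-parameter min-max normalization follows; the weight argument is one assumed way to realize different accuracies for different parameters, not a mechanism the text specifies.

```python
import numpy as np

def normalize_min_max(column: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Map one parameter's values onto [0, weight]. A weight > 1
    stretches that parameter's range, giving it a higher modeling
    accuracy relative to the others."""
    lo, hi = column.min(), column.max()
    return weight * (column - lo) / (hi - lo)

examples = np.array([[10.0, 0.001],
                     [20.0, 0.004],
                     [40.0, 0.002]])

# Normalize each parameter (column) separately; the second parameter
# gets twice the weight, i.e. twice the relative accuracy.
normalized = np.column_stack([
    normalize_min_max(examples[:, 0]),
    normalize_min_max(examples[:, 1], weight=2.0),
])
print(normalized)
```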


> Based on the above tasks it is clear how helpful computers are for the preparation of meaningful sets of examples. And since most of these tasks can be carried out more or less automatically, the formula CA + EBM = CAEBM becomes obvious.