> There has been much talk about "Big Data" lately. Big Data can be helpful, but it is not sufficient: **every example** in CAEBM has to be **a complete set of values for the OPs and
IPs** of the problem at hand. **Missing data** in incomplete examples have to be filled in appropriately. Sets of examples have to be tested for dependencies between the individual examples. Only
**independent examples** can contribute to the modeling process.
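
The completeness and independence checks described above could be sketched roughly as follows. This is only an illustration under stated assumptions: missing values are marked as `None` and filled with the column mean (one of many possible ways to add missing data), and "dependency" is reduced to its simplest form, a repeated example. The function names and the sample values are hypothetical.

```python
from statistics import mean

def fill_missing(examples):
    """Replace None entries with the mean of the observed values in that column."""
    columns = list(zip(*examples))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in examples]

def drop_duplicates(examples, tol=1e-9):
    """Keep an example only if it differs from every example kept so far."""
    kept = []
    for row in examples:
        if all(max(abs(a - b) for a, b in zip(row, k)) > tol for k in kept):
            kept.append(row)
    return kept

# Each row is one example: two IPs followed by one OP (made-up values).
raw = [
    [1.0, 2.0, 10.0],
    [3.0, None, 20.0],   # incomplete example
    [1.0, 2.0, 10.0],    # repeats the first example
    [5.0, 6.0, 30.0],
]
complete = fill_missing(raw)
independent = drop_duplicates(complete)
print(len(independent))  # 3
```

Real dependency tests would usually go further than duplicate detection, for instance by looking for examples that are derived from one another.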

> There should be at least **3 times as many independent examples** as the **number of IPs**, and more examples make the modeling more reliable. If the number of examples is
lower, special measures for "**sparse problems**" have to be taken at modeling time, and if the number of examples falls below the number of IPs, the modeling problem becomes
"**unsolvable**" because it is **underdetermined**.

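As a minimal sketch, the rule of thumb for the number of examples can be written as a small check. The function name `classify_problem` and the returned labels are hypothetical; only the factor of 3 and the two thresholds come from the text.

```python
def classify_problem(n_examples, n_ips, factor=3):
    """Classify a modeling problem by its number of independent examples."""
    if n_examples < n_ips:
        return "unsolvable"   # underdetermined: fewer examples than IPs
    if n_examples < factor * n_ips:
        return "sparse"       # needs special measures at modeling time
    return "ok"               # at least `factor` times as many examples as IPs

print(classify_problem(40, 10))  # ok
print(classify_problem(15, 10))  # sparse
print(classify_problem(8, 10))   # unsolvable
```
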
> Because the vast majority of our computers (as a consequence of excessive CASBM!) are set up for dealing with numerics, we have to **encode** our **non-numeric parameters**
as numerical ones. This has to be done carefully to preserve the character of the non-numeric parameters, such as being sequential, being periodic, or having a fully independent set of values.
Alternatively, appropriate modeling techniques like **Decision Tree Forests** (DTF) can be applied to arbitrary (mixed) parameter sets.
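
To illustrate what preserving the character of a non-numeric parameter can mean, here is a rough sketch of three common encodings, one for each case named above. The choice of encodings (scaled rank, sine/cosine pair, one-hot vector) is an assumption made for illustration, not something prescribed by the text, and all function names are hypothetical.

```python
import math

def encode_ordinal(value, ordered_levels):
    """Sequential parameter: map the rank of the value to the range [0, 1]."""
    return ordered_levels.index(value) / (len(ordered_levels) - 1)

def encode_periodic(position, period):
    """Periodic parameter: map to a point on the unit circle so that the
    last position ends up next to the first one."""
    angle = 2 * math.pi * position / period
    return (math.sin(angle), math.cos(angle))

def encode_one_hot(value, categories):
    """Fully independent values: one separate 0/1 input per category."""
    return [1.0 if value == c else 0.0 for c in categories]

print(encode_ordinal("medium", ["low", "medium", "high"]))  # 0.5
print(encode_one_hot("green", ["red", "green", "blue"]))    # [0.0, 1.0, 0.0]
december, january = encode_periodic(11, 12), encode_periodic(0, 12)
```

With the periodic encoding, December and January are as close to each other as any other pair of neighboring months, which a plain numbering from 1 to 12 would not achieve.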

> In any case, **normalization** of the examples has to be done for every parameter, so that each parameter is modeled with the same accuracy relative to its range of
values. If **different modeling accuracy** is needed **for different parameters**, this can be accomplished by applying different normalizations to the
parameters in question.
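
One possible way to picture this is min-max scaling with an optional per-parameter weight, sketched below. The scaling method and the `weights` argument are assumptions for illustration: the idea is that a parameter scaled to a wider range contributes more to an error measure such as a squared error, so it tends to be fitted more tightly.

```python
def normalize(examples, weights=None):
    """Scale every parameter (column) to [0, weight]; the weight defaults to 1."""
    columns = list(zip(*examples))
    lows = [min(c) for c in columns]
    # Guard against constant columns, which would otherwise divide by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    weights = weights or [1.0] * len(columns)
    return [[w * (v - lo) / s for v, lo, s, w in zip(row, lows, spans, weights)]
            for row in examples]

examples = [[0.0, 100.0], [5.0, 300.0], [10.0, 200.0]]  # made-up values
print(normalize(examples))
# [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
print(normalize(examples, weights=[1.0, 2.0]))
# [[0.0, 0.0], [0.5, 2.0], [1.0, 1.0]]
```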

> Given the above tasks, it is obvious how helpful computers are for preparing meaningful sets of examples. And since most of these tasks can be done more or less automatically, the
formula **CA+EBM=CAEBM** follows naturally.
