NGS Blog - i View

Archives: June 2017

Decision Trees vs. Coding on IBM i

Posted on June 20, 2017 by David Gillman

In machine learning, decision trees are a great algorithm family for working with business information. They are neither the most precise nor considered cutting edge, but they are a first-pass algorithm for many data scientists. In version two of a project, another algorithm family might produce a more reliable model, but for most types of transaction or ERP data, decision trees as a class are where most data scientists start.

One of the great things for business use is that decision trees can be deciphered and understood by people. That transparency lends them credibility when managers and executives can look at the logic of the tree and follow how the final answer is reached, branch by branch.

It also lets IBM i programmers code the decision tree splits in familiar programming languages. Realistically, this is the only way decision trees are going to work with IBM i programs natively on the box without making calls out to other servers.
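As a rough sketch of what hand-coding the splits looks like, here is a small tree written as nested conditions. The function name, field names, and thresholds are all invented for illustration and do not come from any real model. Python is used for brevity; the same nested IF structure carries over to RPG or COBOL.

```python
def credit_decision(order_total, days_past_due, years_as_customer):
    """Hand-coded splits of a hypothetical two-level decision tree.

    Every threshold below is an assumed value; a real tree would take
    its split points from the trained model.
    """
    if days_past_due > 30:            # first split: payment history
        if order_total > 5000:        # second split: size of the order
            return "hold"
        return "review"
    if years_as_customer < 1:         # second split on the other branch
        return "review"
    return "approve"


print(credit_decision(order_total=6000, days_past_due=45, years_as_customer=3))
```

Even this toy tree bakes three thresholds into program source. A production tree with hundreds of splits would bake in hundreds.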

In reality, you'll want to make those calls out from your programs rather than code the decision tree. There are many reasons that go beyond just the simple work of coding hundreds or even thousands of decision points into a program. The easiest way to explain is to ask the question, “What happens when they change the model?”

It will happen. It always happens.

Posted in Analytics | Comments


Missing Data – The “Green” Revolution Continues

Posted on June 13, 2017 by David Gillman

Many machine learning and predictive processes struggle when they encounter missing data; an entire record may be bypassed if a single field the algorithm needs has no value. For example, in a decision tree, if no value exists for the field where the tree splits, that record is useless because the algorithm cannot tell which branch the record should follow.

Most software implementations of machine learning processes get around this problem by offering the data scientist the option of ignoring missing value records or imputing a value. Often, the imputed value is used so as not to waste what is otherwise a good record. Most of the time an average, median, or similar generic value is used in place of the missing value. Null often looks like a missing value, too, and usually receives the same treatment by data scientists.
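A minimal sketch of that generic imputation, assuming records are simple dictionaries and a missing or null value shows up as `None`. The helper name and the `qty` field are hypothetical.

```python
from statistics import median


def impute_with_median(records, field):
    """Fill missing values of `field` with the median of the known values.

    Assumes at least one record has a value for `field`.
    """
    known = [r[field] for r in records if r[field] is not None]
    fill = median(known)
    # Copy records that need a fill so the input list is left untouched.
    return [{**r, field: fill} if r[field] is None else r for r in records]


orders = [{"qty": 10}, {"qty": None}, {"qty": 30}]
filled = impute_with_median(orders, "qty")
print(filled[1]["qty"])
```

The record with the missing quantity is kept, but it now carries a value that describes the whole file rather than that particular order.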

Most IBM i IT professionals are close enough to operations to know that average values across the entire database are unlikely to be good substitutes. Using domain knowledge, IBM i professionals can easily create levels or classes based on experience that better substitute for the missing values. This work is best done on IBM i before it gets to the data scientist.
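One way to picture the domain-knowledge approach is to impute within a class instead of across the whole file. The sketch below assumes the class is already a field on the record; the `ship_method` and `lead_days` names are hypothetical, and the per-class median is only one possible substitute.

```python
from collections import defaultdict
from statistics import median


def impute_by_class(records, field, class_field):
    """Fill missing `field` values with the median of the record's class.

    Falls back to the overall median when a class has no known values.
    Assumes at least one record has a value for `field`.
    """
    by_class = defaultdict(list)
    for r in records:
        if r[field] is not None:
            by_class[r[class_field]].append(r[field])

    overall = median([v for vals in by_class.values() for v in vals])
    fills = {cls: median(vals) for cls, vals in by_class.items()}

    return [
        {**r, field: fills.get(r[class_field], overall)}
        if r[field] is None else r
        for r in records
    ]


shipments = [
    {"ship_method": "air", "lead_days": 2},
    {"ship_method": "air", "lead_days": 4},
    {"ship_method": "sea", "lead_days": 30},
    {"ship_method": "sea", "lead_days": None},
    {"ship_method": "air", "lead_days": None},
]
for row in impute_by_class(shipments, "lead_days", "ship_method"):
    print(row)
```

With these made-up numbers, a single file-wide median would put the same lead time on both incomplete records, while the per-class fill keeps air and sea shipments apart.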

Posted in Analytics | Comments


The “Green” Revolution Rolls On

Posted on June 6, 2017 by David Gillman

Someone messaged me to point out my title of the “Green” Revolution last week might also refer to IBM i and its heritage with green screen terminal interfaces.

I think that idea is a valid and reasonable path of thought to go down this week. Machine learning and advanced analytics need real data to create meaningful and useful models. IBM i is at the heart of your real data, meaning the data that is really useful. IBM i contains transaction records, customer records, payment records, and other concrete data points for businesses.

While your IBM DB2 on i database probably does not store production machine sensor data, product environmental condition information, or other high-volume data flows, those same data flows almost always need to be tied to the transaction, product, and customer data from IBM DB2 on i to create useful machine learning models, advanced analytic visualizations, and so on.

IBM DB2 on i data is critical to the success of many commercial analytics projects. I wish IBM would give a nod to that heritage in its current marketing.

Posted in Analytics | Comments
