Good-quality data is never a bad thing. For fueling analytic processes, it is a must. To maximize the return on their investment in machine learning and predictive analytics, companies need clean data as a foundation for analysis. (For readers outside the US: “green” in the title refers to making money.)
Facing some real facts, no one is going to do machine learning on IBM i — not going to happen.
However, for many companies IBM i holds important data that is needed to build meaningful processes based on machine learning. Getting that data to a machine learning environment seems like a no-brainer; just extract the data and send it over. In the real world, many data fields in databases on IBM i need a little massaging before they can be used effectively in other applications.
Big-picture problems include multi-member files. Those are almost impossible for non-IBM i tools to deal with. I have seen companies where the analysts didn’t know a file was multi-member, so when they wrote an SQL statement to retrieve the data, only data from the first member was pulled. As a result, they wasted precious time trying to figure out the problem before they were forced to throw in the towel and talk to IBM i people. Another common challenge is dates stored in non-date fields, or worse yet, split across multiple fields, with one field for the century and year, another for the month, and another for the day.
There are a few other pointers I will elaborate on in the next few weeks.
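To make the split-date problem mentioned above concrete, here is a minimal Python sketch of the kind of massaging involved once the data has been extracted. It assumes the century-and-year field holds a four-digit value such as 2019; the function name and parameter names are hypothetical, and your files may use a different layout.

```python
from datetime import date
from typing import Optional


def combine_date_fields(ccyy, mm, dd) -> Optional[date]:
    """Build a real date from separate century/year, month, and day fields.

    Assumption: ccyy is a four-digit century-and-year value (e.g. 2019).
    Returns None when the parts do not form a valid calendar date,
    such as an all-zero placeholder or a missing value.
    """
    try:
        # int() accepts the numeric or character values an extract may deliver
        return date(int(ccyy), int(mm), int(dd))
    except (TypeError, ValueError):
        return None
```

Returning None rather than raising an error lets the extract continue, so rows with unusable dates can be counted and reviewed with the IBM i people afterward.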