NGS Blog

Clean Data—A Current “Green” Revolution

Posted on May 30, 2017 by David Gillman

Good-quality data is never a bad thing. For fueling analytic processes, it is a must. To maximize the return on their investment in machine learning and predictive analytics, companies need clean data as a foundation for analysis. (For readers outside the US: “green” in the title refers to making money.)

Let's face facts: no one is going to do machine learning on IBM i. It is not going to happen.

However, for many companies, IBM i holds important data that is needed to create meaningful processes based on machine learning. Getting that data to a machine learning environment seems like a no-brainer: just extract the data and send it over. In the real world, though, many data fields in databases on IBM i need a little massaging before other applications can use them effectively.

Big-picture problems include multi-member files, which are almost impossible for tools not based on IBM i to deal with. I have seen companies where the analysts didn’t know a file was multi-member, so when they wrote an SQL statement to retrieve the data, only data from the first member was pulled. As a result, they wasted precious time trying to figure out the problem before they were forced to throw in the towel and talk to the IBM i people. Another common challenge is dates stored in non-date fields or, worse yet, split across multiple fields, with one field for the century and year, another for the month, and another for the day.
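To make the date problem concrete, here is a minimal Python sketch of the kind of massaging involved once the data has been extracted. It is only an illustration under some assumptions: the function names are hypothetical, the century and year are assumed to arrive together as a four-digit number, and the "non-date field" case is assumed to be an 8-digit YYYYMMDD number. Your files may well use a different layout.

```python
from datetime import date
from typing import Optional


def date_from_parts(century_year: int, month: int, day: int) -> Optional[date]:
    """Combine separate century/year, month, and day values into one date.

    Assumes century_year is a four-digit year such as 2017.
    Returns None when the parts do not form a valid calendar date
    (for example, an all-zero placeholder).
    """
    try:
        return date(century_year, month, day)
    except ValueError:
        return None


def date_from_packed(value: int) -> Optional[date]:
    """Convert a number assumed to be in YYYYMMDD form into a date.

    Returns None when the digits do not form a valid calendar date.
    """
    century_year, month_day = divmod(value, 10000)  # 20170530 -> (2017, 530)
    month, day = divmod(month_day, 100)             # 530 -> (5, 30)
    return date_from_parts(century_year, month, day)
```

Returning None for values that are not real dates, rather than raising an error, is a deliberate choice in this sketch: it lets the analyst count and inspect the bad rows instead of having the whole load fail on the first one.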

There are a few other pointers I will elaborate on in the next few weeks.

Posted in Analytics Tips


Copyright © New Generation Software, Inc. All rights reserved.
