In machine learning, decision trees are a great algorithm family for working with business data. They are neither the most precise nor considered cutting edge, but they are a first-pass algorithm for many data scientists. Maybe in version two of a project another algorithm family will deliver a more reliable model, but over most types of transaction or ERP data, decision trees as a class are where most data scientists start.
One of the great things for business use is that decision trees can be deciphered and understood by people. That capability lends them an air of credibility: managers and executives can look at the logic of the tree and follow how the final answer is reached by tracking each branch.
It also lets IBM i programmers code the decision tree splits in familiar programming languages. Realistically, this is the only way decision trees are going to run natively on IBM i without making calls out to other servers.
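The hand-coded approach described above boils down to nested conditionals, one IF per split, much as an RPG or COBOL program would express it. This sketch uses Python; the field names and split thresholds are purely illustrative, not taken from any real model:

```python
# A hypothetical hand-coded decision tree for order approval.
# The fields (credit_score, order_total, years_as_customer) and the
# thresholds (680, 50,000, 5) are illustrative assumptions only; real
# values would come from the trained model the data scientist hands over.

def approve_order(credit_score, order_total, years_as_customer):
    """Walk the tree one split at a time, top to bottom."""
    if credit_score >= 680:
        if order_total < 50_000:
            return "approve"
        return "review"          # good credit, but a large order
    if years_as_customer >= 5:
        return "review"          # weak credit, but a long relationship
    return "decline"

print(approve_order(700, 10_000, 2))   # approve
```

A real tree would have hundreds or thousands of these splits, which is exactly why hand-coding one is so painful to maintain, as the next paragraph explains.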
In reality, you'll want to make those calls out from your programs rather than code the decision tree. There are many reasons that go beyond just the simple work of coding hundreds or even thousands of decision points into a program. The easiest way to explain is to ask the question, “What happens when they change the model?”
It will happen. It always happens.
Many machine learning and predictive processes struggle when they encounter missing data; entire records are bypassed if a single field value needed by the algorithm is missing. For example, in a decision tree, if no value exists for the field where the tree splits, that record is useless because the algorithm cannot determine which branch the record should follow.
Most software implementations of machine learning get around this problem by offering the data scientist the option of ignoring records with missing values or imputing a value. Imputation is often chosen so as not to waste what is otherwise a good record; most of the time an average, median, or similar generic value stands in for the missing one. Null often looks like a missing value, too, and usually receives the same treatment from data scientists.
Most IBM i IT professionals are close enough to operations to know that average values across the entire database are unlikely to be good substitutes. Using domain knowledge, IBM i professionals can easily create levels or classes based on experience that better substitute for the missing values. This work is best done on IBM i before it gets to the data scientist.
Someone messaged me to point out my title of the “Green” Revolution last week might also refer to IBM i and its heritage with green screen terminal interfaces.
I think that idea is a valid and reasonable path of thought to go down this week. Machine learning and advanced analytics need real data to create meaningful and useful models. IBM i is at the heart of your real data – as in data that is really useful. IBM i contains transaction records, customer records, payment records, and other concrete data points for businesses.
While your IBM DB2 on i database probably does not store production machine sensor data, product environmental condition information, or other larger volume data flows, those same data flows almost always need to be tied to the transaction, product, and customer data from IBM DB2 on i to create useful machine learning models, advanced analytic visualizations, and so on.
IBM DB2 on i data is critical to the success of many commercial analytics projects. I wish IBM would give a nod to that heritage in its current marketing.
Good quality data is never a bad thing. For fueling analytic processes, it is a must. In order to maximize return on the investment in machine learning and predictive analytics, companies need clean data as a foundation for analysis. (My use of “green” in the title refers to making money for those outside the US.)
Let's face some real facts: no one is going to do machine learning natively on IBM i. It's not going to happen.
However, for many companies IBM i holds important data which is needed for creating meaningful processes based on machine learning. Getting that data to a machine learning environment seems like a no-brainer; just extract the data and send it over. In the real world, many data fields in databases on IBM i need a little massaging to use effectively in other applications.
Big-picture problems include multi-member files, which are almost impossible for non-IBM i tools to deal with. I have seen companies where the analysts didn't know a file was multi-member, so when they wrote an SQL statement to retrieve the data, only data from the first member was pulled. They wasted precious time trying to figure out the problem before they were forced to throw in the towel and talk to the IBM i people. Another common challenge is dates stored in non-date fields, or worse yet, stored in multiple fields, with one field for the century and year, another for the month, and another for the day. There are a few other pointers I will elaborate on in the next few weeks.
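Reassembling those split date fields before the extract leaves IBM i is usually a few lines of code. This sketch assumes the common IBM i "CYY" convention (0YY = 19YY, 1YY = 20YY); the field names are hypothetical:

```python
from datetime import date

# Hypothetical legacy record layout: the order date is split across
# three numeric fields, with century and year packed together in
# CYY style (0YY = 19YY, 1YY = 20YY, so 117 -> 2017).
row = {"ORDCYY": 117, "ORDMM": 3, "ORDDD": 15}

def cyy_to_year(cyy):
    """Unpack a century-and-year field into a four-digit year."""
    return 1900 + cyy

def to_date(cyy, mm, dd):
    """Assemble a real date so downstream tools receive a proper
    date type instead of three disconnected numbers."""
    return date(cyy_to_year(cyy), mm, dd)

order_date = to_date(row["ORDCYY"], row["ORDMM"], row["ORDDD"])
print(order_date.isoformat())   # 2017-03-15
```

Handing the analyst a true date column up front avoids the same kind of head-scratching the multi-member file example caused.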
Descriptive Analytics on IBM i
While the term "descriptive analytics" is not brand new, it is unfamiliar to most people in the IBM i ecosystem. Despite all the hoopla surrounding advanced analytics and hot new technologies for analyzing data, old-fashioned reporting still dominates how companies interact with their IBM i databases. In fact, descriptive analytics is simply another way of saying query and reporting.
Traditional reporting methodologies are not sexy, but they are useful. That level of practicality is embraced by the vast majority of IBM i professionals. Relating to these people by using both new and old terms simultaneously is a quick way we have found to build trust. Being able to equate the new with old concepts is comforting to IBM i professionals who have seen the marketing spin change throughout their careers.
Supplying Watson with Data
You could not be blamed for being a little confused as to what Watson is: it is so multifaceted that narrowing in on one area means you completely ignore other possible uses for it. IBM, trying its best, is messaging everything simultaneously.
Regarding our customers, a few are dabbling in some of the advanced analytic functions, but most are playing with Watson more than meaningfully using it.
Many of those who have experimented with these advanced functions have concluded that good data going into advanced analytics is crucial (long known) and that NGS-IQ is great for getting that good data together from IBM i ERP databases.
Keep that nugget of information in mind. Getting clean, targeted data together using NGS-IQ on the IBM i is easier than uploading a lot of random data and then using code on Watson to filter it.
Unlike past years, NGS will extend its travel plans into the summer for visiting customers and prospects as well as attending conferences and trade shows. We have had customers tell us that summer is a great time for refreshers and skills update sessions.
Regions we will be traveling to include the mid-Atlantic, Southern California, and the Great Lakes region.
We are actively looking for additional areas we can visit in order to start some prospect evaluations. If you would like to organize an event or some customer meetings, contact me and we can probably arrange some time for joint sales calls.
There is a lot of confusion as to what companies can do with IBM Watson. At least Watson is architected so that any system can access it through program calls over the Internet. Programmers need only send input and receive output to make use of Watson-based analytics.
That is the main message you hear from IBM, but it is only part of the story.
Watson is not a magic box which mystically does whatever you tell it to do. Someone needs to create procedures and analytical models that produce a result from a future input. It is not much different from creating a formula in an Excel cell that uses the value in a second cell in a calculation. In this example, Watson is equivalent to the first cell, consuming the value in the second cell to create an output. In operation, your program on the IBM i supplies the value in the second cell so that Watson can process the formula. The missing key in the marketing is that someone needs to create the formula in the first place.
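The call-and-response pattern itself is just an HTTPS request carrying the "second cell" value. The endpoint URL, credentials, and JSON field names below are placeholders, since the real ones depend on the specific Watson service and deployed model:

```python
import json
from urllib import request

# Placeholder endpoint -- the real URL and authentication come from
# the particular Watson service you deploy a model to.
WATSON_URL = "https://example.com/watson/v1/score"

def build_scoring_request(customer_record):
    """Package a record from the IBM i database as the 'input' to a
    deployed model; the payload shape here is an assumption."""
    body = json.dumps({"input": customer_record}).encode("utf-8")
    return request.Request(
        WATSON_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_scoring_request({"credit_score": 700, "order_total": 10_000})
# Sending it and reading the model's output would then look like:
# with request.urlopen(req) as resp:
#     score = json.load(resp)["output"]
```

The point of the sketch is that the program only moves values in and out; the "formula", the model behind the endpoint, still has to be built by someone.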
Certainly, the analysis is going to be done on a platform other than the IBM i, but for businesses with critical data on the i, knowledge of the existing database and the historical information are vital in making that data useful for creating analytic models in Watson.
In a recently completed survey of non-customers, we found that analysts who used Query/400 reported spending an average of 1.625 hours per day extracting, manipulating, and distributing data. We know from previous studies that people who move to NGS-IQ typically cut the time they spend on these tasks by approximately 50%.
That reduction in time is due to NGS-IQ having many more features which let analysts and business users write and run fewer queries and automate data transfers, spreadsheet updates, and report distribution. The math works out to 0.8125 hours per day in labor savings or about 10% of an eight-hour work day. Using a national average of $70,000 annual salary for a business analyst, the financial savings equate to $7,000 per year.
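For readers who want to check the figures, the arithmetic works out as follows (the exact annual figure is about $7,109, which the paragraph rounds to $7,000):

```python
# Reproducing the labor-savings arithmetic from the survey discussion.
hours_per_day = 1.625     # reported time extracting/manipulating/distributing
reduction = 0.50          # typical cut after moving to NGS-IQ

saved_per_day = hours_per_day * reduction   # 0.8125 hours per day
share_of_day = saved_per_day / 8            # ~10% of an eight-hour day

annual_salary = 70_000
annual_savings = annual_salary * share_of_day   # ~$7,109, rounded to $7,000

print(f"{saved_per_day} h/day saved, {share_of_day:.1%} of the workday, "
      f"${annual_savings:,.0f}/yr")
```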
These productivity savings don't include the intangible business value and the impression you make on your customers when staff members regularly have meaningful, accurate, timely data at hand.
While it’s unlikely that many companies will store their IoT device messages in the IBM i environment, it's easy to imagine most IBM i customers having systems (maybe cloud based) that store IoT message streams alongside their DB2 on i/ERP database.
While the data is stored separately, there is value to be realized from “merging” IoT and ERP data. Think about sensor data (IoT data) captured from products being used by thousands of customers. This data, once parsed and placed into a searchable format, needs to be viewed in different ways – by product, by customer, by order or install date, and so on. That product, customer, and order information is in the ERP database. Business people need this combination of data to give meaning and perspective to the IoT data.
Depending on the format and volume of your IoT data, with a little data cleansing and filtering, you could probably upload extracts of IoT data to DB2 on i. Once the extracted IoT data is there, forward thinking IBM i customers can begin to discover its business value.
2017 Marketing and Software Sales in IBM i
Despite what some have said is the worst name of any server or computer brand, IBM i is still going strong. (As you have heard many times, try adding “i” to any search and see if anything different turns up.) Fortunately, it is more effective to add “NGS” to an internet search. Give it a try and encourage your customers to do the same.
Seriously, NGS’ Business Intelligence is going strong, with great response to Web searches and ads. The Web presence is just one aspect of our marketing efforts. Our marketing plan for 2017 will emphasize Webinars for prospects and customers as well as on-site sessions with business users. We'll also exhibit at many of the regional and national conferences in the IBM i ecosystem.
For and With Partners
With partners we are always happy and available to do one-on-one discovery sessions and demonstrations, which usually lead to a proof of concept. While we can do all of these activities remotely, our travel to customers provides us many opportunities to go on site with prospects to develop personal relationships during the evaluation process. When we are in the area, we can add in a few prospecting visits with your other customers.
Partners can always drive attendance to our Webinars. We do about six Webinars per year just for prospects. Outside the general schedule, I am happy to organize Webinars with a partner. We can jointly drive attendance in your area through email and telemarketing invites along with your personal contact.
After the individual Webinar playback is recorded, it is usable for months as a destination or call to action in a marketing message, or embedded in a website.
There are still several large conferences around the country for the IBM i ecosystem. Some of these are vendor specific while others, like COMMON, are general.
If NGS will be exhibiting at a conference in your area, we will work with you to get the message to your customers. Schedule permitting, we can hang around the area and do some prospecting meetings, too. Keep this in mind as the year goes along.
I am not the only one to say it – business intelligence is integral to enterprise resource planning.
ERP does a great job of working with individual items, transactions, orders, and so on. Aggregate views are generally available in current-generation ERP applications, but older versions usually lack the cool, built-in reporting features (often marketed as "analytics" by ERP vendors).
Due to the steep cost of an ERP upgrade or conversion, most small and midsize companies need to keep running their current ERP system and maximize their return on investment by surrounding the ERP system with reporting and analytics software. They may not be as slickly integrated, but third-party reporting products do a better job than hard-coded reports built into the ERP screens.
With custom or "homegrown" ERP software, a reporting solution can make or break a company. Obviously, there isn't an ERP vendor to turn to, and many of the developers who originally wrote the ERP system have probably left the company. Deciphering the database to run reports and create "analytics" around the ERP is much simpler than modifying the old custom code to do the same.
This tactic can put off the need (and expense) of installing a new ERP for many years.
Follow us on Twitter.
Subscribe to our blog.