It is a given that CEOs have deep knowledge about how to monitor and manage their enterprise’s financial assets, physical assets and, of course, human assets. However, it is quite rare for a CEO to have even a basic understanding of what is involved in managing and leveraging all the data assets within the enterprise. These assets are mostly dispersed among the various organizational units in large central databases, smaller departmental databases, and, at all levels, in hundreds if not thousands of spreadsheets. With the proven value of data from news, social media, industry reports, and other external sources, the value of the company’s potential data assets goes up tremendously, but so do the issues associated with enabling business end users to get full value out of that data. Meeting that need is typically delegated to IT, yet in most cases business end users find themselves frustrated that IT cannot meet their ongoing needs in a timely fashion. A big part of the problem is that senior management lacks an understanding of the issues IT has to deal with, and of the need for its own active involvement in dealing with those issues effectively. This article presents a brief overview of the issues in meeting end user demand, and some perspectives on what senior management, and CEOs in particular, can do to better guide and support IT in enabling end users to get the most out of the enterprise’s data resources.
A Bit of History
By the early 1980s, most of the transactional activity of enterprises was computerized. This included basic financial information as well as purchase and sales orders. Demand by operational and management staff to access and analyze this data led to a number of major IT trends, including the advent of the relational database with its so-called end user query language, Structured Query Language (SQL). A number of more user-friendly query languages also became popular, but IT generally still had to provide most end user reports. With the popularity of spreadsheets by the late 1980s, everything began to change, and business end users started collecting and analyzing data on their own. Still, much of the data they needed had to be obtained through IT, although over time more and more was self-maintained by end user communities in their spreadsheets or in local databases.
At roughly the same time, more sophisticated analysis tools, now called Business Intelligence (BI) tools, began to appear and enabled some more advanced end users to analyze much larger amounts of data than possible with a spreadsheet, as well as do much more sophisticated analysis. However, most of the needed enterprise data was still hard to obtain in a usable form and needed constant IT intervention. This created another major IT trend: the Data Warehouse. The concept here was that IT would feed all the relevant enterprise data in a cleansed, digested and integrated format into the Data Warehouse where end users could either query it directly or load it into their BI tool of choice.
Although the Data Warehouse and the associated BI tools were a big step forward in enabling end users to analyze enterprise data, most data is still analyzed in spreadsheets. These have proliferated throughout the organization, and by some estimates there is more data, and certainly more data analysis, in spreadsheets than in anything controlled by IT. In addition, because a single Data Warehouse that could serve all end user needs turned out to be a pipe dream, many more purpose-driven so-called Data Marts grew up. With data resources proliferating in different data “silos”, each with its own formats and even its own interpretations, managing them became a giant challenge, and one that went beyond the control of IT. When you further add the proliferation of internal and external data sources and data types (structured, text, charts, pictures, etc.), it is no surprise that IT finds itself little able to provide the right data in the right format at the right time to all the business end users who could benefit from it.
One of the biggest end user challenges is to get a complete picture of a business entity, whether it is a customer, vendor, product, or employee. The 80/20 rule applies here: end users, often along with IT, spend 80% of their time seeking, obtaining and preparing the data they need and only 20% extracting value from it. The problem grows more complex over time as the amount of data grows by orders of magnitude and the number of data types, both structured and unstructured, grows as well. Especially now, with so many sales channels, how, for example, can you get a 360-degree, omni-channel view of a customer? The answer has many parts, but it first requires knowing where the data about a customer is kept. Perhaps it is in an operational database, perhaps in one or more spreadsheets, perhaps in a departmental Data Mart, perhaps on the enterprise’s website or even a social media website, and more typically all of the above. So the first challenge is taking stock of all of the company’s data resources and identifying them in a way that makes sense beyond their local use. The next step is having some way of accessing all of this disparate data and integrating it. This is a major challenge that requires not only a lot of IT resources and specialized tools, but business resources as well. And that is where the concept of “data semantics” comes in.
A Common Semantic View of Corporate Data
We are now all familiar with the term “metadata” as it has been applied to telephone records: the information about the call, but not the contents of the call. In general, metadata refers to data about data, but not the actual data values. Keeping good, up-to-date metadata is critical for the management and use of the data. There is now an industry-standard way of keeping metadata, called “data semantics”, which can describe the underlying meaning of the data, its context and its history. To illustrate underlying meaning, take a customer’s name. In one database it is spelled out completely, in another it is abbreviated, and in a third the name of a subsidiary is used, all meaning the same entity. While IT can help with many next-generation tools that try to match names, it often takes business user intervention to map all the different names to one common name, and this mapping is required to integrate the disparate data and get the 360-degree view. While having a common name is great, many business users still want to refer to the customer by the name they are used to, and data semantics allows both the common name and the local names to be used by different audiences. Data semantics also enables an understanding of what kind of customer it is (i.e., where it fits in a typology of customer types). Further, it is important to always keep track of where the data came from, who changed it and when. This too is part of what we call the “semantic framework”. In addition to tracking data, it is important to understand and track “models”, that is, analyses of the data. A model may have been developed years ago to determine who the “A” customers are, and marketing programs may have depended on it for years. But perhaps the model was created in a spreadsheet and the person who created it has left the company. Is that still the best model for determining “A” customers?
Creating and keeping all of this semantic information requires specialized tools that IT can provide. It also requires senior management to mandate making the data available outside of its ‘silo’ and for business users to help define their data by the enterprise semantic standards. Once this is all done, with of course the right security and privacy permissions, the silos are broken down and a complete picture of any business entity can be available to the appropriate end users.
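For readers who want to see the idea in concrete terms, the mapping of local names to one common name, together with a record of who made each mapping and when, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular semantic tool works: the class names (SemanticRegistry, Entity, ChangeRecord), the source-system labels ("crm", "billing"), the customer names and the user "j.doe" are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeRecord:
    """Provenance: which source a mapping came from, who made it, and when."""
    source: str
    changed_by: str
    changed_at: datetime


@dataclass
class Entity:
    """One business entity (here a customer) with its common and local names."""
    common_name: str
    customer_type: str  # position in a hypothetical typology, e.g. "A"
    local_names: dict = field(default_factory=dict)  # source system -> local name
    history: list = field(default_factory=list)      # list of ChangeRecord


class SemanticRegistry:
    """Hypothetical registry that resolves any known name to a single entity."""

    def __init__(self):
        self._by_name = {}

    @staticmethod
    def _key(name):
        # Normalize case and whitespace so trivial variants resolve together.
        return " ".join(name.lower().split())

    def register(self, entity):
        self._by_name[self._key(entity.common_name)] = entity

    def map_local_name(self, common_name, source, local_name, changed_by):
        # A business user confirms that local_name means the same entity.
        entity = self._by_name[self._key(common_name)]
        entity.local_names[source] = local_name
        entity.history.append(
            ChangeRecord(source, changed_by, datetime.now(timezone.utc))
        )
        self._by_name[self._key(local_name)] = entity

    def resolve(self, name):
        # Returns the entity for a common or local name, or None if unknown.
        return self._by_name.get(self._key(name))


# Illustrative usage with made-up names.
registry = SemanticRegistry()
registry.register(Entity("Acme Corporation", "A"))
registry.map_local_name("Acme Corporation", "crm", "ACME Corp.", "j.doe")
registry.map_local_name("Acme Corporation", "billing", "Acme Widgets Ltd", "j.doe")
entity = registry.resolve("acme corp.")
```

Each audience can keep using the name it knows, while every name resolves to the same entity and each mapping carries its own history.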
End User Analysis
The ideal state is for all business end users to have the exact data they need, at the exact time they need it, and to have easy-to-use tools to analyze and understand it. As we have seen, there are great challenges in getting the “right” data together, more challenges in integrating the data, and even more in putting the data in terms that business end users can understand and use. And of course, since data can be complex, with many business entities inter-related in various (and often unforeseen) ways, having an easy-to-use tool to comprehend and analyze that data is quite a challenge as well. While there are a variety of good BI tools, the big challenge for IT remains getting the right data in the right integrated format to those tools. Here the technology is less mature, and it is only with a new generation of semantic-based tools that it is becoming possible for IT to get the data together more easily and quickly, and in some cases for end users to self-serve the data they need.
Even when all this is made possible, there is another obstacle to getting the full value out of the data: most business end users are not data analysts. That does not prevent them from asking the right questions in the right way for 90% plus of what they want to do. Most of what is called analysis is simply looking up facts, generating reports and sometimes producing straightforward graphs. However, getting at some of the hidden insights behind the data, and getting them with some assurance of accuracy, requires a specialist, namely a data analyst trained in statistics and the new generation of Artificial Intelligence tools. These tools, such as neural networks, can sift through giant amounts of data (now called Big Data) and find patterns and relationships, but they still require specialists to use properly. Once these insights are gained, often in the form of recognized patterns or perhaps indices (such as the degree to which a customer is likely to buy again), the outputs of the analysis become data in themselves and can then be used directly by the business end user.
Governance and Security
Like any valuable asset, data needs to be managed and secured through prudent practices designed to ensure that the asset is designed, created, developed and deployed as intended, and that it stays that way for the life of the asset. Without proper governance, the meaning, use and integrity of the data may not be what is needed at the outset or may diverge over time, reducing the value of the asset. As in many cases like this, entropy will prevail if not countered by good governance, meaning governance that is neither too lax nor too stringent and that evolves over time as new situations dictate. Similarly, the privacy and security practices surrounding data should be defined and followed from the outset, because once necessary protocols are ignored or breached, it is difficult to regain the trust of those who provide or use the data, not to mention the legal issues that can result from faulty practices. The difficulty with governance, privacy and security is that they are not technically necessary to use the data asset and can be ignored or bypassed by those who are uninformed or not conscientious. They are, however, essential to using the data asset properly, so the understanding and cooperation of all involved must be secured.
Utilizing the “Cloud”
The so-called Cloud simply means outsourcing processing to remote computers, managed by someone else and accessed via the Internet. Many companies provide that service, including large ones such as Amazon, Google and Microsoft. Cloud technologies are an important tool in the IT toolkit. In some cases a Cloud provider can be a cheaper, more flexible path to running applications and storing data for analysis. In other cases managing your own processing assets may be more suitable. As general guidance, when looking to deploy new applications it is probably best to start with Cloud providers, as they offer a quicker path to deployment and often a more cost-effective one when you consider all of the hidden costs of doing it internally. Another area where Cloud deployment may be the better choice is in analyzing Big Data, where you will likely need a tremendous amount of computing power for short bursts. Rather than having those assets internally and paying for them 24/7, you can use one of the many Cloud providers that allow you to call up, on a moment’s notice, all the compute power you need and pay for it only while it is being used. One of the negatives that has been associated with Cloud deployment is integration with existing internal systems. Over time this has become much less of an issue, thanks to public data interface standards that many Cloud-based products adhere to. In addition, semantics-based integration technology is very capable of introspecting data from any source, including the Cloud, and integrating it with internal sources.
Role of CDO
The Chief Data Officer, a role often combined with that of Chief Analytics Officer, can serve multiple functions, the most important of which is data custodian. In most organizations, the question asked is who owns the data. The default answer is usually IT, but because data is a critical enterprise asset, it may be best if it is owned by a function outside of IT while IT stays focused on the day-to-day processing that is the lifeblood of the enterprise. As with most assets that span the enterprise, ownership does not imply that all work must be done within that unit, but rather that ownership and key management and utilization decisions originate from an analytics governance function. The CDO should take on this role and govern the use of existing data as well as new data. Often the CDO is also responsible for many analytics activities.
Role of Advanced Technology Groups
With the rapid change of technology, the expectation that systems will last many years and still have maximum effectiveness is waning. One school of architecture, termed Chaotic Architecture, suggests that changes are happening so quickly that, in order to take advantage of new advances in technology, platform architects should assume systems and modules will be short-lived and should design systems so that modules are easy to replace, almost in a “plug and play” mode. Most architects consider the first two of the three costs of building (or buying) technology: start-up and ongoing. But the third cost, the exit cost, does not get enough attention. The flexibility to improve technologies quickly will be important for the most successful enterprises; without it they risk being left behind in innovation, and catching up is difficult. This requires a group whose sole function is to stay in touch with the latest innovations in technology and, where appropriate to the business need, to do proofs of concept. To be effective, this group needs the support of senior management, since IT will often resist new technologies, especially given that many in IT have their careers wedded more to a technology than to the enterprise itself. This inertia can only be overcome with senior management support and business user involvement.
The Most Important CEO Role
Obtaining, storing, utilizing and monetizing data can involve significant capital investment and operational expense, as well as the cooperation of most functional units within the enterprise. For that reason alone, the CEO has to understand the opportunities and associated costs and help set the proper priorities for these efforts. At a certain level this can be delegated to business unit holders, but since enterprise data assets span business units, it can only come together at the highest levels of the enterprise. It is not just about funding. It is also critical that business stakeholders across all organizational units understand the need to participate in the enterprise semantic framework to break down the otherwise inevitable silos.
In summary, data is one of the most important assets of an enterprise, and it is growing in value. Obtaining, managing and leveraging that data for both tactical and strategic use is a very complex process that cannot be left to IT alone. As we have seen, there are many issues involved in making the right data available, in an accurate and digestible format, to the wide range of data consumers, at the right time and with the right tools to understand and analyze that data. This requires the CEO to provide leadership, and senior management throughout the organization not only to provide the necessary resources, but also to put in place and enforce the processes and funding to obtain, secure, manage and share data.
About the Authors
Dr. Steven Rubinow is President, Infocology Inc. He is an award-winning Chief Information and Technology Officer, global executive, strategist and transformation expert. He started his IT career as a data scientist and software engineer and later served as CIO and CTO in several small and large organizations including the New York Stock Exchange and Catalina Marketing. He has received frequent acclaim for his industry-leading data applications.
Jeffrey Stamen was a pioneer in analysis-oriented database technology at the joint MIT-Harvard Cambridge Project. He was the CTO of early Business Intelligence entrant Management Decision Systems and then President of IRI Software, a senior executive at Oracle, and a co-founder of a number of database-oriented startups. He is currently Co-Chairman of Cambridge Semantics, applying innovative semantic technology to leveraging enterprise data.