Records vs. Signals: The Landscape of Digital
By Geoffrey Moore, Mar 13, 2018
Everyone gets that data is the new oil in the digital economy, but not everyone gets that there is a critical difference between data as records—data in databases—and data as signals—data from log files, sensors, social media posts, and the like. Let me explain.
Data as records represent verified facts that express the essence of the activity they record, be that in the form of tables, text, graphs, or images. They are the foundation of Systems of Record, upon which rests the integrity of the digital economy and the digital society. Such data feed programmatic decisions that are deterministic, meaning they follow an explicit and transparent decision tree leading to one and only one correct answer.
Data as signals, by contrast, represent unverified facts that testify to the occurrence of the activity they record, be that a phone call, tweet, temperature reading, or website click. They are foundational to Systems of Engagement as well as to the Internet of Things, upon which rests the productivity of the digital economy and the digital society. Such data feed algorithmic decisions that are probabilistic, meaning they are based on the intensity of the signal as a proxy for the present likelihood of a given situation or the future likelihood of success for a given response.
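To make the contrast concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything drawn from a real system: the AccountRecord schema, the withdrawal rule, and the use of add-one smoothing to turn click counts into a likelihood estimate. The first function is a records-style decision with exactly one correct answer; the second is a signals-style estimate that only gets firmer as more signal arrives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountRecord:
    """A verified fact held in a system of record (hypothetical schema)."""
    account_id: str
    balance_cents: int


def approve_withdrawal(record: AccountRecord, amount_cents: int) -> bool:
    """Deterministic: an explicit rule that yields exactly one correct answer."""
    return 0 < amount_cents <= record.balance_cents


def estimate_interest(clicks: int, impressions: int) -> float:
    """Probabilistic: signal intensity used as a proxy for likelihood.

    Add-one (Laplace) smoothing is an assumption of this sketch; it keeps a
    handful of events from producing an overconfident 0.0 or 1.0.
    """
    return (clicks + 1) / (impressions + 2)


record = AccountRecord("acct-001", balance_cents=50_000)
print(approve_withdrawal(record, 20_000))  # True
print(approve_withdrawal(record, 80_000))  # False
print(estimate_interest(clicks=3, impressions=10))
print(estimate_interest(clicks=300, impressions=1000))
```

The record either covers the withdrawal or it does not. The click estimate, by contrast, is never right or wrong about a single click; it is a running guess that more data sharpens.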
Anyone who has learned to code is familiar with data as records, but unless you have taken courses in creating machine learning algorithms, you probably are not familiar with data as signals. I am no data scientist either, but I have been learning about these systems from readings in biology and complex systems (most recently Complexity: A Guided Tour, by Melanie Mitchell). It turns out that the immune system, for example, is basically a signals-based machine learning algorithm for detecting and dealing with antigens. Ant colonies operate as signals-based machine learning algorithms for dealing with food discovery and task distribution. And our very metabolism is a signals-based machine learning system for building the right proteins at the right times for each of our cells to function properly.
What all these systems have in common is that data is acquired by sampling, and actions are triggered by concentrations of signal that exceed whatever threshold for activation exists. Thus, for example, ants that find food excrete pheromones on their way back to the colony, and these chemicals signal to other ants the path to follow to get more food. The more ants, the more pheromones, the stronger the signal becomes, until all the food has been harvested—at which point, with few ants and weaker signals, the next troop are led elsewhere.
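The pheromone example can be sketched as a toy simulation. This is not a model of real ant behavior; the deposit amount, evaporation rate, and activation threshold below are made-up parameters chosen only to show signal building up, crossing a threshold, and fading once the ants stop coming.

```python
def simulate_trail(ants_per_step, deposit=1.0, evaporation=0.5, threshold=3.0):
    """Return a list of (level, active) pairs, one per time step.

    level  : pheromone concentration after evaporation plus new deposits
    active : True when the concentration meets the activation threshold
    All parameter values are illustrative assumptions.
    """
    level = 0.0
    history = []
    for ants in ants_per_step:
        # Part of the old signal evaporates, then returning ants add to it.
        level = level * (1.0 - evaporation) + ants * deposit
        history.append((level, level >= threshold))
    return history


# Traffic rises while food remains, then drops off once it is harvested.
ants = [1, 2, 4, 4, 1, 0, 0]
for step, (level, active) in enumerate(simulate_trail(ants), start=1):
    print(step, level, "follow trail" if active else "ignore")
```

With these numbers the trail becomes active at step 3 and goes quiet again at step 6. No individual reading matters; only the concentration does.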
Digital marketers take the same approach to website clicks when they are looking where to place digital ads. Algorithmic traders take the same approach to high-frequency trading. Security software takes the same approach to cyber-attacks. Predictive maintenance takes the same approach to sensor readings. It is all about letting signals that are concentrated in location and time operate as proxies for a given state that implies a given response.
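Here is one way "concentrated in location and time" might look in code: count events per location inside a trailing time window and flag any location whose count reaches a threshold. The function name, the host labels, the 60-second window, and the threshold of three are all hypothetical choices for illustration, not a description of how any particular security or ad product works.

```python
from collections import defaultdict, deque


def detect_concentrations(events, window_seconds=60.0, threshold=3):
    """Flag (location, timestamp) pairs where signals cluster.

    events: iterable of (location, timestamp_in_seconds) tuples.
    A location is flagged whenever at least `threshold` of its events
    fall within the trailing `window_seconds`.
    """
    recent = defaultdict(deque)  # location -> timestamps still in the window
    alerts = []
    for location, t in sorted(events, key=lambda e: e[1]):
        window = recent[location]
        window.append(t)
        # Drop events that have aged out of the trailing window.
        while t - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((location, t))
    return alerts


events = [
    ("host-a", 0), ("host-b", 10), ("host-a", 200),
    ("host-b", 400), ("host-b", 410), ("host-a", 415), ("host-b", 420),
    ("host-b", 900),
]
print(detect_concentrations(events))  # [('host-b', 420)]
```

The scattered events on host-a never trigger anything, while three events on host-b within twenty seconds do. The same shape of logic applies whether the events are clicks, trades, login failures, or sensor readings.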
One key to such systems is that they do not understand. This is the big difference between AI and Machine Learning. AI does understand, or tries to. Machine learning doesn’t. It just operates, and it lets the feedback of natural selection guide its development. The more shots on goal, the better natural selection works, which is why machine learning algorithms all hunger for more data as signals.
As our digital economy evolves, one can see that AI and Machine Learning will interoperate more and more, much the way our conscious cerebral cortex interoperates with our cerebellum and autonomic nervous system. That is, we will begin to take AI stabs at understanding why Machine Learning algorithms are succeeding, and the more we are able to understand, the better our future strategic decision-making will be. Right now, machine learning is in the lead, at least here in Silicon Valley, but one can imagine a future in which each discipline takes turns pulling the other forward.
Finally, when it comes to setting public policy about data, or when it comes to estimating the economic value of data, it is critical that we distinguish between data as records and data as signals. The former are individually valuable and normally proprietary, so they need to be secured, and they warrant the protection of law. By contrast, data as signals are only collectively valuable and are normally not proprietary, so they warrant a different treatment. There is still some level of privacy risk to account for, but there is virtually no economic value at the level of the individual occurrence, and as a result it is important not to impose a data-as-records regulatory regime onto a data-as-signals enterprise.
All in all, referring to signals as data just creates confusion, so I am hoping going forward we can use the data versus signals distinction to keep our thoughts straight and our policies reasonable.
That’s what I think. What do you think?
This blog was originally posted on Geoffrey Moore’s LinkedIn.
About the author
Geoffrey Moore is an author, speaker, and advisor who splits his consulting time between start-up companies in the Mohr Davidow portfolio and established high-tech enterprises, most recently including Salesforce, Microsoft, Intel, Box, Aruba, Cognizant, and Rackspace.
Moore’s life’s work has focused on the market dynamics surrounding disruptive innovations. His first book, Crossing the Chasm, focuses on the challenges start-up companies face in transitioning from early-adopting to mainstream customers. It has sold more than a million copies, and its third edition has been revised so that the majority of its examples and case studies reference companies that have come to prominence in the past decade. Moore’s most recent work, Escape Velocity, addresses the challenge large enterprises face when they seek to add a new line of business to their established portfolio. It has been the basis of much of his recent consulting. Irish by heritage, Moore has yet to meet a microphone he didn’t like and gives between 50 and 80 speeches a year. One theme that has received a lot of attention recently is the transition in enterprise IT investment focus from Systems of Record to Systems of Engagement. This is driving the deployment of a new cloud infrastructure to complement the legacy client-server stack, creating massive markets for a next generation of tech industry leaders.
Moore has a bachelor’s degree in American literature from Stanford University and a PhD in English literature from the University of Washington. After teaching English for four years at Olivet College, he came back to the Bay Area with his wife and family and began a career in high tech as a training specialist. Over time he transitioned first into sales and then into marketing, finally finding his niche in marketing consulting, working first at Regis McKenna Inc, then with the three firms he helped found: The Chasm Group, Chasm Institute, and TCG Advisors. Today he is chairman emeritus of all three.