Data-driven world: An easy victim of “fake news”?

The convergence of AI and IoT is said to fundamentally improve supply chains and customer experience. But the choice of data will make or break the vision.

The hope regarding artificial intelligence (AI) lies in enabling machines to (better) solve problems that today only humans can solve. In this development, humans serve as the model for technology to improve our world. So, we’re basically creating AI in our own image – just better? With no disrespect to our own kind, I believe this holds challenges and pitfalls. The internet of things (IoT) will suffer from issues similar to those of the “internet of humans”. Let me explain what I mean.

From enlightenment to overload: where the internet has failed

The impact of the internet on how the public forms opinions was one of my biggest disappointments of the past two decades. When it was first established, I had high hopes that it would enlighten people and lead to a freer world. Free access to the same information for everybody – that was the ultimate promise. A cure for oppressive governments, movements with dubious intentions, and other powers trying to indoctrinate people with false facts. Maybe my hope was overly optimistic, or even naive. Back in the 90s, I could not foresee the evolution that began with Web 2.0.

Fake news on social media

Information inconsistency

Over time, the internet became interactive and made it very easy for everybody to share any information. This not only led to a great increase in the number of information sources but naturally also to massive growth in the total information available. That by itself is no issue – it’s a good thing on the path to enlightenment. It gets trickier, however, when information on one subject is based on different sources and published by different individuals and organizations with different perspectives and opinions. This naturally leads – in most cases – to inconsistent sets of information.

Speed over accuracy

The abundance of information then led to competition based on speed. Sources that publish a piece of information first generate the most attention – and profit, too. As a result, information is often spread hastily without sufficient verification of its accuracy and completeness.

Intentional manipulation

Exerting influence through information is as old as information itself. But the internet has amplified both the challenges and the opportunities, creating an ideal playing field for spreading false or biased information. The internet is jammed with information – partially or completely fabricated – solely intended to influence opinions. The sources and intentions are manifold and include professional (political) organizations, individual spammers, and businesses that go to great lengths to circulate such information – manually and, increasingly, automatically. The MIT Technology Review published an interesting article last month on how social bots play a key role in spreading fake news.

The echo chambers we create

People are no longer able to digest and analyze all the information available due to its sheer volume. So, we mostly consume information from sources we deem trustworthy. To a great extent, this is news or information from our own network: social media, news platforms, or other publications that generally publish news in line with our own ideology and perspective. As a result, people sooner or later glide into echo chambers of like-minded people. In this environment, the internet has also become a powerful tool for radicalizing people for certain political motives.

The vision of a data-driven world

Endless amounts of data are collected these days. This is driven by the growing digitalization of businesses, authorities, and infrastructure combined with an ever-increasing number of sensors in almost everything. The shared data provides valuable insight into the past and present. It also helps to identify patterns that serve as a basis to synchronize processes for maximum efficiency and best results – in the supply chain, for example. Many predict that a data-driven new age of analytics will evolve.

Though this vision is compelling, experts today have quite a few reservations concerning data-based decision-making. There are major technological challenges in managing incredibly large volumes of data at the required pace. But the most difficult task involves the process of capturing, cleaning, curating, and analyzing data from a vast array of disparate sources (structured, unstructured, semi-structured) to derive a meaningful picture as the basis for decisions. This challenge alone could fill a few blog posts – but not today; let’s stay on topic.

What we will be facing in a truly data-driven world

All these challenges in the process from data capture to data analysis seem to come down to one assumption: once we get it right, we’ll get one single set of data with the right quality. In short: there will be a single and correct piece of information on anything. This does not mean that the decision-making process is easy, but we expect it to provide a clear set of information to base decisions on.

This brings me back to my idealistic hopes in the 90s regarding the “internet of humans”. After some sobering up, I believe that we will see some analogy between the developments of the “internet of humans” and the internet of things. In the same way that we find inconsistent information on the internet today, business-relevant data for AI-based decisions will also be, to some extent, inconsistent, inaccurate, and manipulated. Here is why.

Information inconsistency

In a data-driven world, sets of information will be produced, sold, and exchanged in massive volumes. Eventually, a whole ecosystem of data/content providers will emerge: they will offer services to collect, aggregate, and analyze data for input into AI systems.

Isn’t that comparable to how the media industry delivers information to people today? And, just as with traditional media, the more providers analyze the data, the more the results will vary.

Good AI, bad AI?

Examples of this include weather or traffic forecasts, projections of political or economic developments, predictions of raw material prices, analyses of consumer demand and behavior, and so on.

Today, different weather apps show different forecasts, and raw material price predictions vary by platform. Looking at projections of economic developments that form the basis for investments, the underlying data might (mostly) constitute facts. But what is analyzed and interpreted based on it is again strongly influenced by perspectives, opinions, and individual interests.
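To make this concrete, here is a minimal sketch of what such inconsistency looks like to a downstream system. The provider names and forecast values are purely illustrative assumptions; the point is that a wide spread between providers means there is no single “correct” input to act on, only a consensus you can construct:

```python
import statistics

# Hypothetical next-day temperature forecasts (°C) from three imaginary providers
forecasts = {"provider_a": 21.0, "provider_b": 24.5, "provider_c": 19.5}

# How far apart the providers are – a rough measure of inconsistency
spread = max(forecasts.values()) - min(forecasts.values())

# One simple way to act despite the disagreement: take the median
consensus = statistics.median(forecasts.values())

print(f"spread={spread:.1f}, consensus={consensus:.1f}")
```

A large spread is itself useful information: it tells an AI system that the “facts” it consumes are competing interpretations rather than a single truth.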

Speed over accuracy

The sooner information is available, the more valuable it is – to gain benefits or to minimize losses. As such, data providers will increasingly be under pressure to share information as early as possible. Just as with news today, this will lead to vague or incorrect data. Initial information forecasting, for example, that an airline might go bankrupt may trigger AI systems to book air cargo space before market prices increase. It may turn out later that these were just rumors based on false information.
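The trade-off in the airline example can be sketched as a simple decision rule. Everything here – the confidence score, the prices, the threshold – is an illustrative assumption, not a real trading system; the point is that acting early on unverified signals only pays off if the confidence in the signal is weighed against the cost of being wrong:

```python
def should_act(signal_confidence: float, price_now: float,
               expected_price_later: float, threshold: float = 0.8) -> bool:
    """Act on early information only when its confidence clears a
    threshold and acting now actually avoids a cost increase."""
    expected_saving = expected_price_later - price_now
    return signal_confidence >= threshold and expected_saving > 0

# A bankruptcy rumor at 60% confidence should not trigger a booking...
print(should_act(0.6, 100.0, 140.0))
# ...while a verified report at 95% confidence might.
print(should_act(0.95, 100.0, 140.0))
```

Systems that skip this weighing step – because speed is rewarded – are exactly the ones that end up acting on rumors.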

Intentional manipulation

Countless interest groups – be they countries, political parties, organizations, or others – have successfully figured out how to use the internet as a powerful tool to influence opinions. Similar interest groups will likewise make use of the power that tweaking data in a data-driven world may unleash. The success or failure of companies and whole economies can be influenced by data corruption.

This potential will unleash massive resources. We already experience some of this today with cyber-attacks that increasingly make media headlines. But it also includes bots, for example, that infiltrate unstructured data with biased statements or even fake other data right at the source. Will there be swarms of virus-infected IoT devices broadcasting wrong data?
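One classic defense against a compromised device is redundancy plus robust aggregation. The sketch below is a deliberately simplified illustration with made-up readings and thresholds: several sensors measure the same quantity, and values far from the median are discarded before averaging, so a single hijacked device broadcasting wrong data cannot drag the result with it:

```python
import statistics

def robust_reading(readings: list[float], max_dev: float = 3.0) -> float:
    """Aggregate redundant sensor readings, discarding any value that
    deviates from the median by more than max_dev."""
    med = statistics.median(readings)
    trusted = [r for r in readings if abs(r - med) <= max_dev]
    return statistics.mean(trusted)

# Four honest temperature sensors and one hijacked device broadcasting 85.0
readings = [20.1, 19.8, 20.3, 20.0, 85.0]
print(robust_reading(readings))  # the outlier is ignored
```

Of course, this only works while manipulated devices are a minority – which is precisely why swarms of infected devices are the worrying scenario.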

The echo chambers we create

Looking at this scenario, you might say: AI will become intelligent enough to differentiate between fake and correct data. But why should AI that generates and circulates fake news be any dumber than AI that processes it? I am not sure we are always aware of the far-reaching impact of the data sources we select.

Finally, will there be echo chambers in the IoT? Well, I think the behavioral patterns of things will remain different from those of humans. Nevertheless, echo chambers for algorithms already exist – just look at this Business Insider article on how trading algorithms react to each other, or at this article on the “Flash Crash” at the NYSE in May 2010.
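The algorithmic echo chamber can be illustrated with a toy model – this is not how real trading systems work, just a sketch of the feedback mechanism: two momentum bots each buy whenever the price rose in the last step. Since each bot’s orders move the price that the other bot reads as a signal, a tiny initial nudge turns into a self-reinforcing one-way trend:

```python
price = 100.0
history = [price]

for step in range(10):
    # Each bot reads the last price move as its signal
    momentum = history[-1] - history[-2] if len(history) > 1 else 1.0
    # Both bots react in the same direction, amplifying the very move
    # they are reacting to – an echo chamber with no outside input
    demand = 2 * (1 if momentum > 0 else -1)
    price += demand * 0.5
    history.append(price)

print(history)  # a small initial nudge snowballs into a steady climb
```

Nothing in this loop ever checks the outside world; the bots only ever hear each other – which is the defining property of an echo chamber.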

Which news portal do you rely on in your echo chamber? And which data provider would you subscribe to and trust to deliver data that your business will base decisions on?

Let’s put things in perspective and look ahead with hope

Today, these issues are hardly on anyone’s mind in the context of using data to drive business. Successful examples of companies using AI are mostly limited to those big enough to generate, collect, and control the relevant data across a major part of their value chain themselves. And for the additional third-party data required for analysis, the general rule applies: using any data is better than using no data at all.

The weather forecast is a good example here, too: merely monitoring it – even without the highest accuracy – already delivers benefits to retailers such as Amazon. But in a world where it is common practice to use extensive data as the foundation to automate operations, optimize processes, and make business decisions, it will become crucial to have better data and better analyses faster than your competition.

You know from my previous blog posts that I am very enthusiastic about new technologies. And AI has great potential to improve things in numerous areas – the report “Bot.Me: A revolutionary partnership” by PwC gives some good examples. Yes, I’m a true fan of the benefits of progress, but I’m also a fan of informed decisions and the right approach – and for AI, this has many aspects. In my view, the one considering data integrity as described here today is of vital relevance.

Hope for the future

This is why I struggle with the omnipresent statements praising the business opportunities that can be tapped with the “2.5 quintillion bytes of data generated every day”. While focusing on how we can capture, process, and analyze more and more data faster and faster, are we neglecting to think about its source, credibility, and integrity?

The most successful companies will be the ones focusing on manageable amounts of verified data with specific relevance, so they can derive effective actions from it. It will be interesting to see how the challenges and opportunities in this area come together on our continued path of technological advances.

What’s your opinion on this? I’m keen to engage in discussions and look forward to your comments – on LinkedIn.