We’re in the pre-phase of AI and fundraising. The pre-phase, a concept coined by Kevin Kelly in New Rules for the New Economy, is marked by high hopes and grand ambition. Everyone can and does compete, including lots of non-experts.
Which horse should you hitch your wagon to? I’d wager heavily against the “data-driven” approach, as articulated way back in 2008 by Chris Anderson, former editor-in-chief of Wired magazine:
“We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot […] Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all”
Here is a seminal case in point for why I’d wager heavily in Vegas against the Anderson point of view.
Recently, 440 scientists from 32 labs produced 15 terabytes of human cellular data and used AI techniques to decode the human genome. Their blockbuster finding was that 80% of our genetic material serves a useful function. This ran directly counter to the old-school, theory-based view, which held the opposite: about 80% of our DNA is “junk DNA”, mostly dead matter serving no function.
Was the data-driven or the theory-driven view correct? It turns out the inside of a cell is a chemically active place, and so the machine tagged darn near every piece of DNA as “alive and chemical” and, therefore, functional. But tagging something as alive and chemical is very different from saying that part of our DNA does something useful for us.
Scoreboard: Correlation 0, Theory 1.
Much more work is required to understand whether a certain part of the genome has a biological function and how this works. This requires, above all, smaller-scale, hypothesis-driven research.
More data do not necessarily generate more knowledge. Data by themselves are meaningless.
The idea that, with enough data, the numbers speak for themselves hardly makes sense.
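A minimal sketch makes the point. The data below are made up for illustration (pure random noise, not fundraising data): if you mine enough variables for patterns, “strong” correlations appear by chance alone, and the numbers will happily “speak” nonsense.

```python
import numpy as np

# Toy dataset: 1,000 metrics that are, by construction, independent noise.
# There are NO real relationships here to find.
rng = np.random.default_rng(42)
data = rng.standard_normal((50, 1000))  # 50 observations x 1,000 variables

# Mine every pairwise correlation, as a "let the data speak" approach would.
corr = np.corrcoef(data, rowvar=False)            # 1,000 x 1,000 matrix
upper = corr[np.triu_indices_from(corr, k=1)]     # ~500,000 unique pairs

# Count "strong" correlations -- every one of them guaranteed spurious.
strong = int(np.sum(np.abs(upper) > 0.5))
print(f"{strong} of {upper.size} pairs look 'strongly' correlated by chance")
```

With only 50 observations per metric, dozens of pairs typically clear the 0.5 threshold despite having no relationship at all; a hypothesis made before looking at the data is what separates signal from this kind of noise.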
AI techniques can increase our capacity to find relevant patterns within huge amounts of data. The correlations may not tell us precisely why something is happening, but they alert us that it is happening. And in some situations, this is good enough – e.g., selecting for efficiency whom to include in an appeal if all we care about is campaign-level “winning”.
However, in most cases, understanding the why is crucial for achieving knowledge that can be applied with confidence. Understanding why only comes from having ground truths, and those only come from theory, hypotheses, experiments, and small-data insights.
The losers in this race will be those who think they can find optimal messages, and answers on channel, frequency, and ask amount, just by feeding the machine all their historical data and peppering it with third-party appends.
No algorithm can find what isn’t there. It takes think-time, theory, and experimentation to produce the small data sets and other documentation that guide your big data strategy, and it takes small data to make sense of big data exploration and pattern detection.
In The Hitchhiker’s Guide to the Galaxy, a supercomputer called Deep Thought identified 42 as the answer. Unfortunately, no one knew what the question was.
I grant that framing AI/Big Data in terms of oppositions – deduction versus induction, hypothesis-driven versus data-driven, or human versus machine – misses the winning strategy: both are necessary and complementary. But calling out the flaws in “data-driven” seems necessary as the pre-phase hype machine gets rolling.
In science, we strive to go from data-starved to data-rich. Yet a more-is-better, throw-it-in-the-blender approach, as often advocated by the most enthusiastic Big Data/AI neophytes, will take you from data-rich to data-buried: a bazillion correlations of sub-par outcomes producing little juice for lots of squeeze.