Small Data and the Earthly Peak

HaAI Labs
Feb 12, 2023

The massive amounts of data that we generate in our everyday lives, online and offline, have been referred to as “big data” and have been the focus of much attention and investment in recent years. However, many signals now point to the increased importance and value of small data.

Small data, as the name suggests, refers to small, curated, high-significance data sets. Big data is useful for identifying statistical patterns and significance; small data is carefully selected to provide meaningfulness (which may or may not coincide with statistical significance).

For instance, big data doesn’t help much in picking venture investments, and the most influential startup incubator in the world is a small family business built on interpersonal psychology. Big data is an impeccably fitted Gaussian, based on millions of observations; small data is the one observation that shatters our Gauss-Markov assumptions.
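A minimal Python sketch of the fitted-Gaussian half of that comparison, under stated assumptions: the data is synthetic, and the sample size, the seed, the outlier value, and the helper names `fit_gaussian` and `upper_tail_probability` are all hypothetical choices made for illustration. It fits a Gaussian to many observations, then asks how likely the fit considers one extreme observation, and how little that observation moves the fit.

```python
import math
import random
import statistics


def fit_gaussian(samples):
    """Return the (mean, standard deviation) of a Gaussian fitted to samples."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def upper_tail_probability(x, mean, std):
    """P(X >= x) under a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Synthetic stand-in for "big data" (an assumption, not real observations).
big_data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

mean, std = fit_gaussian(big_data)
outlier = 10.0  # hypothetical single observation far outside the bulk

# The fitted model says an observation this extreme should essentially never occur.
tail = upper_tail_probability(outlier, mean, std)

# Refitting with the outlier included barely moves the fitted mean.
mean_with, std_with = fit_gaussian(big_data + [outlier])

print(f"fitted mean={mean:.4f}, std={std:.4f}")
print(f"P(X >= {outlier}) under the fit: {tail:.3e}")
print(f"shift in mean after adding the outlier: {abs(mean_with - mean):.6f}")
```

The fit stays almost unchanged, yet it treats the observation as practically impossible. Under these assumptions, the statistics alone cannot say whether that one data point is noise or the most important thing in the data set; that judgment is the curation step.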

Raw computing power has become the critical factor in developing state-of-the-art AI systems, and large centralized entities such as Google and OpenAI have an unbeatable advantage. This means that most AI projects are now focused on fine-tuning large language models to the peculiarities of specific communities. The future of AI will be about working with these large entities rather than competing against them.

However, the importance of small data has been spectacularly highlighted by the example of Google, which lost $100 billion in value because of a small data blunder. This incident shows the significant impact that small data can have and highlights the need for careful human curation.

This shift in focus to small data is reflected in a recent blog post by a big data pioneer at Google, who declares that “big data is dead”. A recent tweet by Greg Brockman, a co-founder of OpenAI, highlights the value of data curation by humans. This view is further reinforced by the work of the Cornell Small Data Research Group, which has made some exciting breakthroughs in the application of small data in healthcare.

The advent of Large Language Models means that most content and interactions online will be AI-generated. We already have infinite amounts of machine-generated text, visual art, and symphonies, but no one is interested in consuming them. What makes a poem, a wooden table, or a tailored suit beautiful are the implied hours of human ingenuity and labor that they carry.

Large Language Models make verifiably human input all the more valuable. We refer to this as the Earthly Peak. It is the antithesis of the Uncanny Valley, and is just as ineffable. The Uncanny Valley refers to the phenomenon where, as artificial intelligence becomes more advanced but still falls short, its output creates an eerie and unsettling experience for the viewer. The Earthly Peak, on the other hand, highlights the increased value of human intelligence, taste, and creativity as machine output improves.

The Earthly Peak

We believe DAOs can play a crucial role in the small data revolution. These communities have the potential to be at the forefront of curating, creating, and monetizing small data. The blockchain is simply the best tech around for access control and incentives management. This is the thesis behind HaAI Labs (Human-augmented Artificial Intelligence), which we believe is an important piece of the human-machine symbiosis puzzle.