Underutilised knowledge is the new underutilised asset
A lot of VCs are excited to see Scale’s outcome. Accel just made $2.5B as an early investor in Meta’s Scale AI deal. Everyone is talking about the insane success of Scale AI and other labellers, including their revenue growth, but they are missing the real story.
It’s not about being a better Mechanical Turk. It’s about timing the supply waves perfectly.
Three waves of labelling marketplace supply:
- Wave 1: generic labour.
- Wave 2: the COVID talent surplus.
- Wave 3: the underutilised idle specialists.
Just like idle cars and homes created Uber and Airbnb, idle expertise (cultural, legal, financial) will now fuel a new supply wave in data labeling.
Like Uber or Airbnb, data labeling operates as a two-sided marketplace.
- Demand: AI labs, product companies (copilots for X).
- Supply: human labellers/evaluators, ranging from freelancers in India, Africa, the Philippines, LatAm, etc. to domain experts with PhDs.
Samasource was one of the earliest players to work with these foundational models. It positioned itself as the “ethical” player in this space, sourcing labellers in Africa and focusing on ‘impact’.
Scale is the one that grew to massive ‘scale’.
There are now many players, such as Surge AI, Invisible Labs, Turing AI, Deccan AI (based in Hyderabad) and Handshake, that have a distribution advantage and existing relationships with “demand”.
But I think there will be more players, and not just because of more demand. Demand is proven, not only from foundational labs that spend a lot but also from enterprise clients (verticals such as insurance and telcos, copilots for X, and more). The differentiation will be in capturing specialised supply better, cheaper and at scale. And there has never been a better time to source it.
Let’s start with the ‘supply’ side of marketplaces and how it changes over time. Initially, you overpay to attract supply. At one point, Uber drivers in India were making six figures (INR) per month.
Drivers were so happy that they bought more cars on loan and started running their own fleets. The incentives were that good.
Scale AI and others paid generously to build their initial workforce. Good money for people in developing countries looking for flexible work.
There are some good examples of how they seeded the supply side in the book ‘Empire of AI’. Over time, your buyers (in this case, AI labs) start squeezing margins. They want cheaper, faster labeling. They want more from their spend. You have two choices: subsidise the cost, or cut the incentives you generously provided to your labellers earlier. Labeller payouts declined over time. For basic tasks, you could always launch in a new country where people were just as desperate for work. There was an oversupply of labour.
And the models got better over time. For basic tasks, you no longer need labellers, so you start going after specialised tasks.
For specialised tasks you need domain experts. Domain experts (supply) need a reason to come to your platform.
This is why we need to understand not just the demand side, but also supply.
I think it is the best time to start a data labelling company if you can figure out how to gather differentiated supply, assuming demand from the big labs persists.
The best marketplaces grow during economic crises. Not because they’re predatory, but because that’s when skilled supply becomes available. The 2008 recession gave us Airbnb and Uber. People needed money. They had idle assets (homes, cars).
See Kevin Kwok’s essay on underutilised fixed assets.
Covid created a similar moment for data labeling companies. Suddenly there were tech workers laid off in the first wave of cuts, people working from home, and professionals with time on their hands.
These were not labellers earning a few dollars per hour in the Philippines, but experts who could charge far more for their domain knowledge.
Surge AI capitalised on this. Their supply quality shot up overnight.
Scale could still provide labellers at scale, but Surge positioned itself as the more premium option, with higher-quality experts.
Now we’re in another interesting moment.
AI is slowly displacing knowledge workers. This is the worst hiring market in a long time.
Many college graduates are unemployed.
Lawyers, analysts, domain experts are finding themselves with idle time.
Models don’t need basic labeling anymore. They need specialised knowledge.
That is because they are getting better at general tasks but still struggle with domain-specific knowledge. Every AI company trying to build vertical copilots (for lawyers, for doctors, and others) needs this specialised labeling.
People who did not need these jobs before, considered them low status, and perhaps even feared that their expertise would help train the models that eventually replace them now have no choice but to join these platforms and monetise their knowledge.
A corporate lawyer reviewing contract clauses. A Bengali literature professor annotating cultural references. A former Goldman analyst explaining financial models. All these folks will be helping frontier models with the remaining 10% needed to deploy fully specialised agents that replace human workers.
One might ask: If AI is displacing knowledge workers and creating a surplus of desperate talent, and if these same workers can command premium prices for specialised labeling, why would a “Copilot for X” pay top dollar for their annotations rather than hiring them directly at lower wages?
The answer lies in how AI companies optimise for revenue per employee. They aim to minimise operational overhead and avoid adding non-core roles to their payroll.
If a “Copilot for legal” company employed hundreds of lawyers, it would raise doubts about whether the company is truly leveraging AI to perform legal tasks. This is the kind of scenario that invites satire, such as the jokes about Builder AI hiring thousands of Indians instead of having any real “AI”, with “AI” standing for “Artificial Indians.” Ultimately, this means operationally heavy tasks will always be outsourced to trusted, high-quality marketplaces rather than handled in-house.
While reading Tegus interviews, I came across examples of labs paying $1,000 for labelling tasks.
It’s not just foundational model labs: even companies like Apple might ask Mercor for 50-100 MDs to help train models on their health data.
A copilot for legal doesn’t need 10,000 random people telling it about law. It needs 100 lawyers who can spot when its legal reasoning is flawed. A Gujarati literature professor is more valuable for cultural alignment than 1,000 Gujarati mechanical annotators.
Companies are innovating too. Mercor, for example, is betting on a more flexible, pure marketplace model that is low touch and simply connects demand with specialised supply. I think it is more scalable in the long term.
They are the anti-Scale AI.
The Namma Yatri to Scale AI’s Uber.
They are more Tegus than GLG, if that makes sense.
Data labeling started with basic image tagging. Moved to text annotation. Now heading toward expert knowledge curation. The platforms that can attract and retain domain experts will win.
Smart founders should be building for this future now. Not competing with Scale on volume. But creating boutique marketplaces for specific expertise. Law. Medicine. Finance. Regional knowledge and nuance. Each with its own supply dynamics and quality requirements.
AI will simultaneously destroy jobs and create new ones: knowledge workers will lose traditional roles but find new work teaching AI their expertise.
Note 1: I’m using ‘labeling’ as a catch-all. In practice, these firms ship full data-ops stacks (APIs, Python SDKs, in-platform QA, and RLHF/SFT workflows), and it is not easy to recreate all of this. I still think there are opportunities in this space, and Mercor is a prime example of a new kind of player that can enter late and capture significant revenue, even though players like Scale exist.
Note 2: One underutilised supply source: retirees. Retired doctors. Retired lawyers. Many retirees feel lonely and seek purpose. They don’t want full-time jobs, which makes them an ideal supply source for marketplaces needing specialised skills. Gig work on typical on-demand marketplaces tends to be low status and doesn’t use their specialised knowledge. Roles aligned with their expertise offer a much better alternative. This supply will not be as high intent as a specialised worker who is out of work. You can operate a marketplace with fewer but highly dedicated freelancers, almost acting as full-time employees, working longer hours and achieving higher utilisation, or you can choose to have a larger pool of freelancers working fewer hours with moderate utilisation. A retiree marketplace will be the latter.