> GPT-3 — The Scale Laws_
175 billion parameters. "Scale is all you need."
> DEEP DIVE_
In June 2020, OpenAI unveiled GPT-3, and the world learned what happens when you scale a language model to 175 billion parameters. Trained on roughly 570 gigabytes of text at an estimated cost of $4.6 million in compute alone, GPT-3 was more than 100 times larger than GPT-2. But the real surprise was not its size; it was what that size enabled. GPT-3 could perform tasks it had never been explicitly trained on simply by being given a few examples in its input prompt. This "few-shot learning" capability meant that a single model could translate languages, write code, compose poetry, answer trivia, summarize documents, and generate creative fiction, all without any fine-tuning or architectural modification.
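To make "few-shot learning" concrete, here is a minimal sketch of what such a prompt looks like. The translation task and example sentences are hypothetical; the point is only that the "training examples" live in the prompt text itself, not in any gradient update.

```python
# Illustrative sketch of few-shot prompting: the task is specified entirely by
# examples placed in the prompt; the model is expected to continue the pattern.
# The task and sentences below are hypothetical, not taken from the GPT-3 paper.

few_shot_prompt = """Translate English to French.

English: The cat sits on the mat.
French: Le chat est assis sur le tapis.

English: I would like a cup of coffee.
French: Je voudrais une tasse de café.

English: Where is the train station?
French:"""

# Sent as-is to the model, this text typically elicits the French translation
# of the final sentence, with no fine-tuning and no architectural change.
print(few_shot_prompt)
```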
The paper "Language Models are Few-Shot Learners" demonstrated that scaling up model size, dataset size, and compute led to smooth, predictable improvements in performance across a stunning range of benchmarks. This was formalized by Jared Kaplan and colleagues at OpenAI in a separate paper on "Scaling Laws for Neural Language Models," which showed that model performance followed power-law relationships with compute, data, and parameter count. The implication was profound and unsettling: if you wanted a better model, you didn't need a better algorithm. You just needed to spend more money. Rich Sutton's "Bitter Lesson," his 2019 essay arguing that general methods leveraging computation always ultimately win over methods that leverage human knowledge, had found its most dramatic confirmation.
OpenAI released GPT-3 not as an open model but as a commercial API, marking a fundamental shift in how AI capabilities were distributed. Developers could access GPT-3 through a simple HTTP request, paying per token. Within months, hundreds of startups emerged, building products on top of GPT-3's capabilities: copywriting tools, code assistants, chatbots, content generators, and applications no one had anticipated. Jasper AI, Copy.ai, and dozens of other companies raised millions of dollars building thin wrappers around GPT-3's API. The startup explosion demonstrated both the economic potential of large language models and their capacity to create entirely new product categories.
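For developers, "a simple HTTP request" looked roughly like the sketch below, which assumes the original 2020-era engines/completions endpoint and its basic JSON fields. The endpoint, model names, and pricing have all changed since, so treat this as a historical illustration rather than current usage.

```python
# Illustrative sketch of a 2020-era GPT-3 API call: one HTTP POST, billed per
# token. Endpoint and fields reflect the original engines/completions API,
# which has since been replaced.
import os
import requests

API_KEY = os.environ["OPENAI_API_KEY"]  # key issued to API beta users

response = requests.post(
    "https://api.openai.com/v1/engines/davinci/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "prompt": "Write a tagline for an online bookstore:",
        "max_tokens": 32,      # caps the completion length, and the bill
        "temperature": 0.7,    # sampling randomness
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```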
The release of GPT-3 also forced a reckoning with the economics and politics of AI. Training a 175-billion-parameter model was beyond the budget of any university lab and most companies. AI research, which had historically been an academic pursuit, was rapidly becoming an industrial one, with the largest models accessible only to organizations with tens of millions of dollars in compute budgets. This concentration of capability raised urgent questions about power, access, and equity that the field is still grappling with today. GPT-3 proved that scale was a path to capability, but it also proved that this path had a very expensive toll.
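A back-of-envelope calculation shows why. The sketch below uses the common approximation that training a transformer costs about 6 × parameters × tokens in floating-point operations, the roughly 300 billion training tokens reported for GPT-3, and assumed V100-class throughput, utilization, and cloud pricing; the result lands in the same few-million-dollar ballpark as the estimate quoted above.

```python
# Rough back-of-envelope estimate of GPT-3's training compute, assuming the
# standard ~6 * N * D approximation for transformer training FLOPs.
# Token count is from the GPT-3 paper; GPU throughput, utilization, and hourly
# price are illustrative assumptions, so treat the dollar figure as
# order-of-magnitude only.

n_params = 175e9           # model parameters
n_tokens = 300e9           # training tokens (GPT-3 paper)
total_flops = 6 * n_params * n_tokens      # roughly 3e23 FLOPs

peak_flops = 125e12        # V100 tensor-core peak, FLOP/s (assumed)
utilization = 0.30         # realistic fraction of peak (assumed)
gpu_hours = total_flops / (peak_flops * utilization) / 3600

price_per_gpu_hour = 1.50  # assumed cloud rate, USD
print(f"total compute : {total_flops:.2e} FLOPs")
print(f"GPU-hours     : {gpu_hours:,.0f}")
print(f"compute cost  : ${gpu_hours * price_per_gpu_hour:,.0f}")
```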