2006: Dawn of Deep Learning

> Hinton's Revival_

Geoffrey Hinton revived deep neural networks by showing that deep belief networks could be trained one layer at a time.

> DEEP DIVE_

In 2006, Geoffrey Hinton, a British-Canadian computer scientist who had been working on neural networks for over 25 years, through two AI winters and countless rejections, published the work that reignited the field. A paper co-authored with Simon Osindero and Yee-Whye Teh in Neural Computation introduced "deep belief networks," and a companion paper with Ruslan Salakhutdinov in the journal Science showed that the same ideas could train deep autoencoders. Both relied on a technique called greedy layer-wise pretraining, which could effectively train neural networks with many layers. The key insight was counterintuitive: instead of trying to train all the layers at once (which failed because of the vanishing gradient problem), you could train each layer individually as a restricted Boltzmann machine, building up a deep network one layer at a time. Once this unsupervised pretraining was complete, the network could be fine-tuned with standard backpropagation.
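
To make the layer-wise idea concrete, here is a minimal sketch of stacking restricted Boltzmann machines trained with one-step contrastive divergence (CD-1), written in Python with NumPy. The layer sizes, learning rate, epoch count, and toy data are assumptions chosen for illustration, not values from the 2006 papers.

```python
# Minimal sketch of greedy layer-wise pretraining with restricted
# Boltzmann machines, in the spirit of Hinton, Osindero & Teh (2006).
# Hyperparameters and data shapes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Binary-binary restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        """One contrastive-divergence update on a batch of visible vectors."""
        # Positive phase: sample hidden units given the data.
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back down and up.
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # Approximate log-likelihood gradient: data term minus model term.
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=5):
    """Greedy layer-wise pretraining: each RBM models the layer below."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # feed activations upward to the next RBM
    return rbms

# Toy usage: 200 random binary "images", stacking 784 -> 256 -> 64 units.
X = (rng.random((200, 784)) < 0.3).astype(float)
stack = pretrain_stack(X, [256, 64])
```

Each RBM is trained only on the hidden activities of the layer below it, so the stack is built one layer at a time; fine-tuning the assembled network with backpropagation would be the second stage described above.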

The publication in Science was itself a minor miracle. For years, getting neural network papers into top venues had been nearly impossible. The mainstream machine learning community had moved on to kernel methods, support vector machines, and Bayesian approaches, and reviewers routinely rejected neural network submissions as old-fashioned or unpromising. Hinton's stature (he had co-authored the famous 1986 backpropagation paper and was a Fellow of the Royal Society) gave him just enough credibility to get the paper through review. He later acknowledged that adopting the term "deep learning" in place of "neural networks" was a deliberate rebrand: the field needed a new name to escape its stigma.

Hinton's persistence had been sustained in part by the Canadian Institute for Advanced Research (CIFAR), which had funded a small program on "Neural Computation and Adaptive Perception" since 2004. The funding was modest — a few hundred thousand dollars per year — but it provided crucial support for a community of researchers who believed in neural networks when almost no one else did. This group, centered around Hinton at the University of Toronto, Yoshua Bengio at the University of Montreal, and Yann LeCun at New York University, became known informally as the "Canadian Mafia" of deep learning. They had kept the faith through the long winter, continuing to train students, publish papers, and refine their ideas even as the rest of the field looked elsewhere.

The 2006 papers did not immediately transform AI. Their practical impact was initially limited because the computational resources needed to train deep networks on large datasets were still insufficient. But they opened a door. They showed that deep networks could learn hierarchical representations, simple features in early layers and complex abstractions in later layers, and that this hierarchical learning was the key to handling complex real-world data. Within six years, deep learning would dominate computer vision (Krizhevsky's AlexNet in 2012), speech recognition (Hinton's group's work with Microsoft in 2012), and natural language processing. All three members of the Canadian Mafia, Hinton, Bengio, and LeCun, would share the 2018 Turing Award, computing's highest honor, for their decades of work on deep learning. The 2006 work was the crack in the dam. Within a decade, the flood would reshape the entire technology industry.