The Failed Promise of Large Language Models

Once upon a time

Major trends in technology rarely have a clear date of birth. The internet is a classic example. From the “Intergalactic Computer Network” imagined in the early 1960s, to packet switching, to ARPANET in 1969, to email in 1971, to the World Wide Web and the Mosaic browser – the modern internet developed slowly, then seemingly all at once. Generative AI is different. It was conceived on June 12, 2017, with the publication of the seminal paper “Attention Is All You Need” by researchers at Google. Generative AI was born in February 2019 with the release of OpenAI’s GPT-2. It came of age on November 30, 2022, with the release of ChatGPT, built on GPT-3.5. In the roughly two years since, the Large Language Models (LLMs) descended from the Attention paper have dominated discussions about the future of technology.

That domination is starting to show some cracks.

Scaling Laws

Early versions of LLMs were interesting but inaccurate. GPT-2’s success paved the way for successors like GPT-3 and GPT-4, which used more data and more sophisticated training to significantly improve both factual accuracy and reasoning ability. At first, it seemed the sky was the limit. The transformer architecture was in place; all that was needed was more training data, more compute, and bigger models. Each model generation was much more powerful and much more accurate than the one before. The concept of emergent capabilities came into vogue: the observation that sufficiently scaled models can handle certain complex tasks they were never explicitly trained for.

Artificial General Intelligence (AGI) – the ability of an AI system to undertake any intellectual task a human can – seemed to be only a matter of scaling up the infrastructure.
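
That optimism had empirical backing. Kaplan et al. (2020) reported that a language model’s test loss falls as a smooth power law as parameters, data, and compute grow. A minimal sketch of the parameter-count law in Python – the constants are the paper’s approximate published values, used here purely for illustration:

```python
# Illustrative power-law scaling in the style of Kaplan et al. (2020):
# loss(N) ~ (N_c / N) ** alpha_N, with N the non-embedding parameter count.
# The constants below are the paper's approximate values for this law.

N_C = 8.8e13      # "critical" parameter scale reported in the paper
ALPHA_N = 0.076   # reported exponent for the parameter-count law

def predicted_loss(n_params: float) -> float:
    """Predicted test loss (nats per token) at a given parameter count."""
    return (N_C / n_params) ** ALPHA_N

for n in (1.5e9, 175e9, 1.75e12):   # roughly GPT-2, GPT-3, 10x GPT-3
    print(f"{n:9.2e} params -> predicted loss {predicted_loss(n):.3f}")
```

The shape is the story: each tenfold increase in parameters buys a roughly constant, and modest, drop in loss – hence the appetite for ever-bigger models, and the eventual diminishing returns.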

7 Trillion

In February 2024, reports emerged that Sam Altman, CEO of OpenAI, was in discussions to raise between $5 trillion and $7 trillion for an ambitious project aimed at significantly expanding global semiconductor manufacturing capacity. This initiative sought to address the growing demand for AI processing power by building a network of chip fabrication facilities. Altman engaged with various potential investors, including representatives from the United Arab Emirates and SoftBank CEO Masayoshi Son, to explore funding opportunities for this large-scale endeavor.

Then, a funny thing happened … the models’ capabilities started to level off.

The illusion of life

Today’s LLMs are incredibly good at generating human-like responses that sound thoughtful and intelligent. They convincingly mimic emotion. On closer inspection, however, stubborn inaccuracies remain. They struggle with tasks as simple as counting the number of “Rs” in “strawberry.” Each word they produce is a prediction based on statistical patterns learned from vast amounts of text, and that prediction repeats as the words are generated one at a time. Unlike humans, LLMs have no persistent memory and no capacity for self-reflection; they simply output the next word in a sequence. They are masters of language manipulation, but they fall far short of the promise of AGI.
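
That “next word” loop is simple enough to sketch. Below is a minimal greedy autoregressive decoder in Python; `model` is a hypothetical stand-in for a transformer’s forward pass that returns a probability distribution over the vocabulary:

```python
import numpy as np

def generate(model, prompt_tokens: list[int], max_new_tokens: int = 50) -> list[int]:
    """Greedy autoregressive decoding: repeatedly predict one next token.

    `model(tokens)` is assumed to return a probability distribution over the
    vocabulary (a NumPy array of shape [vocab_size]) -- a stand-in for a real
    transformer's forward pass.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model(tokens)               # one full forward pass per new token
        next_token = int(np.argmax(probs))  # greedy: take the likeliest token
        tokens.append(next_token)           # the prediction becomes new context
    return tokens
```

Everything the model “knows” lives in the `tokens` list that gets fed back in; no other state persists between steps. It also hints at the “strawberry” failure: the model operates on tokens, not individual letters, which is one commonly cited reason letter-counting trips it up.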

What are the main components of AGI?

Language manipulation, reasoning, memory, adaptability, and learning. That’s Infinitive’s list, anyway. Of course, these capabilities must also communicate with one another at blinding speed. For the sake of argument, let’s say that today’s transformer-based Large Language Models (with continual improvement) will take care of the language-manipulation requirement of AGI. Unfortunately, no amount of additional compute or training data will turn a transformer-based architecture into the reasoning, memory, adaptability, and learning components that AGI requires.

One down, four to go

If today’s LLMs do evolve to handle the language-manipulation aspects of AGI, several research directions might cover the other four areas:

Reasoning – Neuro-Symbolic Systems combine neural networks, such as transformers, with symbolic AI to enable structured, rule-based reasoning. Example: IBM’s Neuro-Symbolic Framework.
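
A toy illustration of the neuro-symbolic pattern – not IBM’s framework, just the general idea: a neural component scores soft predicates about an input, and a symbolic rule base performs the actual inference over them. Everything here (the predicates, the rules, the lookup standing in for a network) is invented for illustration:

```python
# Toy neuro-symbolic pattern: neural perception feeds symbolic rules.
# The "neural" step is faked with a lookup; in practice it would be a network.

def neural_predicates(image_id: str) -> dict[str, float]:
    """Stand-in for a network that scores predicates about an input."""
    return {"is_bird": 0.94, "can_fly": 0.10}  # e.g., a penguin photo

RULES = [
    # (conclusion, premises): conclusion holds if all premises hold.
    ("is_animal", ["is_bird"]),
    ("lays_eggs", ["is_bird"]),
]

def infer(facts: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Forward-chain symbolic rules over thresholded neural outputs."""
    known = {p for p, score in facts.items() if score >= threshold}
    changed = True
    while changed:  # keep applying rules until a fixed point is reached
        changed = False
        for conclusion, premises in RULES:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

print(infer(neural_predicates("photo_123")))  # is_bird, is_animal, lays_eggs
```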

Memory – Memory-Augmented Neural Networks (MANNs) incorporate an external memory bank that models can read from and write to, allowing them to retain information over extended periods. Example: DeepMind’s Differentiable Neural Computer (DNC).
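
The core mechanic of a MANN such as the DNC is differentiable reading and writing against an external memory matrix. Here is a stripped-down sketch of content-based addressing in Python; the real DNC adds considerably more machinery (usage tracking, temporal link matrices) that is omitted here:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

class ExternalMemory:
    """Minimal content-addressable memory: N slots, each of width W."""

    def __init__(self, n_slots: int = 16, width: int = 8):
        self.M = np.zeros((n_slots, width))

    def _attention(self, key: np.ndarray) -> np.ndarray:
        """Soft weights over slots by (cosine-style) similarity to the key."""
        norms = np.linalg.norm(self.M, axis=1) * np.linalg.norm(key) + 1e-8
        return softmax(self.M @ key / norms)

    def read(self, key: np.ndarray) -> np.ndarray:
        """Soft read: attention-weighted blend of slots (differentiable)."""
        return self._attention(key) @ self.M

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        """Soft write: add the value to slots in proportion to attention."""
        self.M += np.outer(self._attention(key), value)

mem = ExternalMemory()
k = np.random.randn(8)
mem.write(k, np.ones(8))     # store a value under key k
print(mem.read(k).round(2))  # a later read with k recovers a blend of it
```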

Adaptability – Meta-Learning trains models to learn how to learn, enabling rapid adaptation to new tasks with minimal data. Example: Model-Agnostic Meta-Learning (MAML). Still in research.
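
MAML’s trick is two nested optimization loops: an inner loop adapts a copy of the parameters to each sampled task, and an outer loop updates the shared initialization so that one-step adaptation works well across tasks. The sketch below uses the first-order simplification (real MAML backpropagates through the inner step) on invented toy regression tasks:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    """A toy task: fit y = a * x with a task-specific slope a."""
    a = rng.uniform(-2, 2)
    x = rng.uniform(-1, 1, size=10)
    return x, a * x

def grad(theta: float, x: np.ndarray, y: np.ndarray) -> float:
    """d/d_theta of mean squared error for the model y_hat = theta * x."""
    return np.mean(2 * (theta * x - y) * x)

theta = 0.0                      # the shared initialization being meta-learned
inner_lr, outer_lr = 0.5, 0.1

for step in range(200):          # outer loop: improve the initialization
    meta_grad = 0.0
    for _ in range(5):           # sample a small batch of tasks
        x, y = make_task()
        adapted = theta - inner_lr * grad(theta, x, y)  # inner loop: one step
        meta_grad += grad(adapted, x, y)  # first-order MAML approximation
    theta -= outer_lr * meta_grad / 5

# After meta-training, one inner step from `theta` should fit a new task well.
x, y = make_task()
adapted = theta - inner_lr * grad(theta, x, y)
print(f"meta-init {theta:.2f} -> adapted {adapted:.2f}")
```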

Learning – Continual Learning allows a model to learn continuously, integrating new knowledge over time without overwriting previous learning. Example: Elastic Weight Consolidation (EWC). Still in research.
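
EWC’s mechanism fits in a single penalty term: while training on a new task B, anchor each parameter to the value it held after old task A, weighted by a per-parameter importance estimate (the Fisher information computed on task A). A minimal sketch of that objective, with `fisher` assumed to be precomputed:

```python
import numpy as np

def ewc_loss(task_b_loss: float,
             theta: np.ndarray,       # current parameters
             theta_star: np.ndarray,  # parameters after finishing task A
             fisher: np.ndarray,      # per-parameter importance from task A
             lam: float = 100.0) -> float:
    """EWC objective: new-task loss plus a weighted pull toward old weights.

    Important parameters (high Fisher value) are expensive to move, so the
    model learns task B mostly by adjusting weights task A never relied on.
    """
    penalty = 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
    return task_b_loss + penalty
```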

When will AGI be achieved?

The 2023-era predictions that AGI would emerge from simply scaling up Large Language Models appear to have been overly optimistic. While LLMs have provided a stunning preview of what AI can accomplish, their “next word prediction” architecture is a poor fit for reasoning, memory, adaptability, and learning. Architectures that could address those areas remain in research. While breakthroughs (like the transformer architecture itself) are always possible, the path to AGI will likely prove long and hard. Infinitive’s guesstimate – 2040.