AI's Next Frontier: From Token Prediction to World Models

Mar 27, 2025 14:52 CST Updated 14:52
Editor’s Note: This article is from Egen Therapeutics, authored by Dr. Li Changqing. VCBeat has obtained permission to republish it.


Event Background and Speaking Occasion


In March 2025, NVIDIA held its annual GPU Technology Conference (GTC) in San Jose, California, attracting more than 25,000 attendees. During his keynote address and media Q&A session, NVIDIA CEO Jensen Huang shared his views on the development of artificial intelligence, including an optimistic outlook on Artificial General Intelligence (AGI). Meanwhile, Meta’s Chief AI Scientist Yann LeCun, as a special guest, joined NVIDIA Chief Scientist Bill Dally for an on-stage discussion about the future direction of next-generation AI models. At this conference, a thought-provoking “debate” over how AI should reason unfolded between these two industry leaders. One side sees the current trajectory of large language models, built on token sequence prediction, as a promising path toward more powerful AI, while the other questions the limitations of this approach and advocates introducing new concepts such as “world models.” The following sections present both viewpoints in turn and analyze the relevant technical details and industry responses.


Jensen Huang’s View: An Optimistic Outlook from Token Inference to AGI


Jensen Huang is very optimistic about the progress of AI as represented by large language models. He regards the way current models reason and generate answers word by word, token by token, as a continuously scalable process of producing intelligence, and believes that model capabilities improve rapidly as scale and data increase. Huang has stated on multiple occasions that AGI is just around the corner. For example, he predicted that if passing various human examinations were adopted as the benchmark for AGI, AI would be expected to achieve outstanding results in all such exams within the next five years (though he also emphasized that the definition of AGI remains controversial). He has also proposed treating the “manufacturing of intelligence” like industrial manufacturing: building “AI factories” that mass-produce intelligent outputs (text, code, and even actions) using tokens as raw material.


At the GTC conference, Jensen Huang emphasized that the reasoning capability of AI systems is being continuously strengthened, and he named it a key focus for the next stage of development. NVIDIA has released new inference acceleration libraries (such as Dynamo) to enhance the efficiency of multi-step reasoning and complex decision-making in large models. Huang is confident that these token-based predictive models will evolve toward more advanced intelligence, believing that with tool use, longer context, and multimodal perception, their human-like reasoning performance will continue to improve. Furthermore, he is highly optimistic about the prospects of “Physical AI,” which integrates AI with robotics so that agents can act directly upon the physical world. At the conference, he showcased next-generation general-purpose robotics platforms (such as Isaac GR00T), declaring that “the era of general-purpose robots has arrived.” This indicates that Huang believes combining physical robots and reinforcement learning with existing AI technologies (such as large models) can accelerate the evolution toward artificial general intelligence.


In summary, Jensen Huang has painted a roadmap of gradual evolution: building on today’s token-based large language models, and by continuously investing computational resources and refining algorithms so that AI acquires more knowledge and skills, we can ultimately “manufacture” intelligence that approaches human-level capabilities. He tends to believe that AGI is not a distant science-fiction goal, but rather a reality that can be gradually approached within the next few years through the concerted efforts of industry and academia. This optimism is representative of the broader tech sector, reflecting confidence in the current AI paradigm and a belief that breakthroughs can be achieved by scaling existing technologies.


Yann LeCun’s Viewpoint: Critiquing the Token Prediction Paradigm and the “World Model” Concept


In stark contrast to Jensen Huang’s optimism, Yann LeCun sharply criticized the limitations of the currently popular large language models (LLMs). He bluntly stated that he was “no longer interested in LLMs,” describing such models as nothing more than large-scale “token generators” (i.e., systems that continuously predict the next token based on context) whose capabilities are inherently limited because they operate in a discrete token space. LeCun pointed out that existing AI systems suffer from four major defects: a lack of understanding of the physical world, no persistent long-term memory, an inability to truly understand causality and reason with it, and difficulty in performing complex planning. These weaknesses mean that relying solely on next-word prediction over massive corpora cannot produce behavior truly akin to human intelligence. He even asserted that relying purely on such autoregressive language models is a dead end: in five years, possibly no one will still be using the current pure LLM paradigm, which will have been replaced by more efficient new AI architectures.


To address the aforementioned limitations, Yann LeCun proposed developing the concept of **“world models.”** A world model means that an AI internally builds a simulation and understanding of the external physical environment, much as human infants begin to form intuitive models of the physical world within the first few months after birth. LeCun emphasized the **embodiment** of human intelligence: we learn and reason through interactions with the real world, not merely through language. In contrast, current large models have only read internet text and have never truly “experienced” the world. This results in a lack of commonsense physical understanding, making them prone to generating outputs that are inconsistent with reality. As LeCun explained in his speech: “We need a predictor that, given the state of the world and an action you intend to perform, can predict the next state of the world. With such a world model, AI can plan a series of actions to achieve specific goals.” In short, he advocates giving AI a human-like capability for causal prediction: the ability to simulate “if I do this, what will happen next” in the mind, thereby enabling true reasoning and planning.
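The predictor-plus-planning idea in the quote above can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the state is a single integer position, `world_model` is a hand-written stand-in for what would in practice be a learned predictor, and `plan` uses brute-force search rather than any method LeCun has proposed.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical action set: step left, stay, step right


def world_model(state: int, action: int) -> int:
    """Toy predictor: given the current state and an action, return the next state.

    Hand-written stand-in for a learned model (an assumption of this sketch).
    """
    return state + action


def rollout(state: int, actions: tuple) -> int:
    """Simulate a sequence of actions purely inside the model, without acting."""
    for action in actions:
        state = world_model(state, action)
    return state


def plan(state: int, goal: int, horizon: int) -> tuple:
    """Search every action sequence of length `horizon` and return the one
    whose predicted final state is closest to the goal."""
    return min(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: abs(rollout(state, seq) - goal),
    )


best = plan(state=0, goal=3, horizon=3)
print(best, rollout(0, best))
```

The planner never touches a real environment: it evaluates “if I do this, what will happen next” entirely through `world_model`, then commits to the sequence with the best predicted outcome. Exhaustive search is only workable for toy problems; it is used here to keep the example short.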


In terms of technical approach, Yann LeCun advocates exploring new architectures and training paradigms rather than blindly scaling up existing Transformer-based autoregressive models. In his position paper “A Path Towards Autonomous Machine Intelligence,” he argues for building world models through self-supervised learning and for incorporating principles such as energy functions to overcome the shortcomings of current generative models in understanding and reasoning. He also criticizes currently popular reinforcement learning approaches as yielding minimal results in cultivating general intelligence, and argues that language generation alone cannot enable AI to truly “understand” the world. By way of example, he points out that human infants absorb, through observation and perception within a few months, an amount of information several orders of magnitude greater than that seen by the largest LLMs (for instance, the environmental information acquired by a four-month-old infant through sensory inputs is approximately 450 times the volume of training data used for the current largest language models). This, he argues, demonstrates that efficiency and interaction with the environment are the key to intelligence, not merely the scale of data. LeCun even proposed the term “Advanced Machine Intelligence” (AMI) to replace AGI, arguing that human intelligence has its own particular strengths and that labeling it “general” is inaccurate, while holding that more advanced machine intelligence is achievable within the next three to five years.


In summary, Yann LeCun’s viewpoint is as follows: **The current path of large models has fundamental limitations; AI needs to “move beyond text” and acquire genuine understanding and reasoning capabilities by building world models and engaging in embodied learning.** Only by adopting learning mechanisms similar to those of humans (perceiving the world and predicting feedback) and designing entirely new model architectures can we achieve human-level intelligence without endlessly scaling up computational power. His stance serves as a wake-up call to the industry, reminding everyone not to be carried away by the superficial success of current LLMs, but rather to focus on the long-term bottlenecks in AI development.


Technical Details: Token Prediction vs. World Models and Embodied Intelligence


To facilitate a clearer understanding of this debate, the following provides a brief overview of the key technical concepts involved:


· The Token Prediction Mechanism of Language Models: Current mainstream large language models (such as GPT-4 and Llama) generate text autoregressively: they predict the most likely next token based on the sequence of tokens produced so far, building up text step by step. These models are trained on massive corpora to learn the probability distribution of token sequences, which enables them to answer questions and generate articles. However, their essence remains statistical correlation; they do not truly “understand” semantics or facts. Since the training objective is merely next-token prediction, they may exhibit hallucination (fabricating untruthful outputs) or inconsistent logical reasoning. This token-level reasoning currently demonstrates strong capabilities in language and coding tasks, but it also has obvious limitations: the model lacks long-term memory (it is limited to a finite context window), cannot actively perceive the external environment (it learns only indirectly from training data), and is prone to errors on problems that require multi-step reasoning. As Yann LeCun has put it, such a model functions more like a powerful “autocomplete” tool than a system that genuinely understands the question and works out the answer through logical reasoning.


· Limitations of Existing Large Language Models: In addition to the aforementioned issues of limited memory length and lack of environmental awareness, large models also fall short in reasoning depth and planning capability. They tend to focus on surface-level correlations while lacking the ability to reason causally. For example, it remains challenging for pure language models to solve complex mathematical or physics-based reasoning problems, and techniques such as chain-of-thought prompting are required to improve accuracy even marginally. Additionally, the models lack the capacity for autonomous exploration and for executing actions: they do not proactively verify answers, nor can they interact with the real world to acquire new information. This closed, static training paradigm confines their intelligence to the scope covered by the training data. When confronted with questions that fall outside the training distribution or require real-world common sense, the models often reveal the boundaries of their capabilities. Critics therefore argue that merely scaling up parameters and data to enhance existing LLMs is insufficient to overcome these fundamental barriers.


· The Embodiment of Human Intelligence: The greatest difference between humans and AI is that human intelligence is immersive and embodied. Infants continuously interact with the world through their senses and limbs, thereby learning concepts of objects, physical laws, and cause-and-effect relationships. The common sense acquired through this experience is the cornerstone of human reasoning. “Embodiment” means that an agent has a body and senses and acts within an environment, which tightly couples the acquisition and application of knowledge with specific contexts. Embodied intelligence theory posits that cognition does not occur in a vacuum; understanding often requires interaction with the world to construct meaning. For instance, we know that tipping over a cup of water will spill it because we have personally observed or even experienced similar phenomena. Therefore, many AI researchers hold that equipping AI with sensors or virtual interactive environments for autonomous trial-and-error learning (such as combining reinforcement learning with self-supervised learning) is essential for cultivating human-like intuition and commonsense reasoning. Current LLMs are considered unable to acquire true common sense and physical intuition because they lack a direct connection with the world.


· The Concept of World Models: The world model is a key concept proposed by Yann LeCun and others to address the aforementioned shortcomings, drawing on ideas from cognitive science and robotics. A world model is an AI’s internal model of the state of the external world and how that state changes. With a world model, an AI can try out hypothetical actions “in its mind” and predict their outcomes. This is analogous to the human cognitive process of mentally simulating “what would happen if I took this action.” For instance, an AI robot equipped with a world model can anticipate during manipulation that an object might drop if handled with excessive force, and adjust its grip strength accordingly. In implementation, world models require AI to represent continuous spaces in order to understand the environment, rather than being limited to discrete tokens. This may involve training AI to build causal models of the real world through multimodal perception, such as vision and hearing. Once equipped with a world model, an AI can plan: because it can coherently anticipate the consequences of multi-step actions, it can select the correct sequence of actions to achieve its goals. Such capabilities extend beyond the scope of pure language models and are regarded as a critical step toward human-level cognition. Of course, building world models is technically very challenging, requiring solutions to difficulties such as representation learning in high-dimensional continuous spaces, the accuracy of simulated environments, and integration with decision-making and planning. If successful, however, it will endow AI with a human-like “internal simulator,” significantly deepening AI’s understanding and reasoning.
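The autoregressive mechanism described in the first bullet above can be sketched with a toy bigram model. This is an illustrative assumption, not how production LLMs are built: real models use neural networks over subword tokens, whereas this sketch simply counts which word follows which. The function names and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens):
    """Greedy autoregressive decoding: repeatedly append the most
    frequent successor of the last token generated so far."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        successors = counts.get(out[-1])
        if not successors:  # token never seen with a successor: stop
            break
        out.append(successors.most_common(1)[0][0])
    return out


corpus = "the cup tips and the cup tips and the water spills".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], 4)))  # the cup tips and the
```

The output is fluent-looking but purely a product of co-occurrence counts: the model has no notion of cups or water, and with greedy decoding it would repeat “the cup tips and” indefinitely. That is the “statistical correlation” point the bullet makes, shown at miniature scale.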


Overall, the technical discussion centers on the differences between two paradigms. The first is pure symbolic prediction driven by big data and high computational power (the token prediction paradigm), which relies on intelligence emerging from correlations. The second is the world model paradigm driven by embodied interaction, which aims to enable AI to learn the true laws of the world from causal relationships. The former has achieved remarkable results in the short term but has been criticized for having a ceiling; the latter is highly anticipated, yet its implementation path remains under exploration. This is the core technical issue underlying the debate.


Industry Response and Controversy


The debate over AI reasoning pathways has sparked widespread discussion in the industry and a clash of differing perspectives. Some researchers and developers agree with Yann LeCun, believing that the current hype surrounding large language models is somewhat overheated and that AGI may not be as imminent as the optimists have portrayed. Some have bluntly stated that the LLM field is currently rife with a “circus of exaggeration and hype” (referring to the practice of overstating model capabilities to chase investment and attention), praising LeCun for daring to expose the fundamental limitations of large models and urging people to confront the unresolved issues in AI. For instance, senior practitioners have supported LeCun’s criticism regarding the lack of long-term memory and genuine understanding, arguing that current models are still far from achieving the flexibility of human cognition, and that AGI is more likely to be a long-term challenge than a near-term gain. Proponents of this view argue that while pursuing larger model scales, greater investment should be directed toward research into novel principles and multimodal approaches.


However, other industry insiders are reserved about, or even opposed to, Yann LeCun’s assertions. They believe that although current LLMs have shortcomings, they are not a “dead end” with no merit. In fact, large language models are likely to become an important foundational module of future intelligent systems. As one commentator pointed out, the claim that some in the industry believe “large language models are the only way forward” is actually a straw man argument: most researchers recognize the need to integrate multiple approaches, such as memory, tools, and feedback, with LLMs serving as one of many key components. This perspective emphasizes incremental improvement: by incorporating long-term memory modules, integrating retrieval tools (such as web search), and adding multimodal perception and action interfaces, AI is evolving toward more general intelligence. Several positive examples have been cited to counter LeCun’s pessimistic outlook. For instance, OpenAI’s newly launched research agent, capable of autonomously retrieving information online and drafting reports, is regarded as a successful case of enhancing reasoning capabilities by combining LLMs with tools. Furthermore, while LeCun at Meta has expressed skepticism about pure LLMs, Meta’s own open-source large model, Llama, has achieved 100 million downloads, highlighting the substantial industry demand for LLMs. This is seen as evidence that the AI community continues to improve and apply existing large models while exploring new approaches.


On the feasibility of AGI and the timeline for achieving it, there is no consensus in the industry, and the controversy is significant. Both Jensen Huang and Yann LeCun belong to the relatively optimistic school of thought (the former believes that continued breakthroughs along the current trajectory are within reach, while the latter holds that the goal will be reachable soon after shifting to a new approach), but many renowned experts remain cautious or skeptical. Some researchers question the practicality of the concept of “general intelligence” itself, arguing that intelligence is multidimensional and difficult to measure by a single standard. They maintain that even if current AI surpasses humans in specific tasks, this does not equate to possessing true common sense and self-awareness. Others worry that an excessive emphasis on the goal of AGI may lead to neglecting more immediate, practical issues and risks. For instance, regarding the ability of AI to “innovate independently and propose entirely novel solutions,” LeCun has explicitly stated that current systems do not possess it; they are more akin to scholarly assistants with vast memory and retrieval capabilities than to scientists capable of truly independent thinking. Overall, there is a tension in the industry between short-term optimism and long-term caution: on the one hand, recent progress is encouraging; on the other hand, achieving human-level artificial general intelligence is still regarded by many as a grand ambition that requires further major breakthroughs. This debate has thrust these divergences into the spotlight, prompting profound reflection on the boundaries of AI capabilities and its developmental trajectory.


Concise Summary of the Debate's Significance


The debate between Jensen Huang and Yann LeCun on AI reasoning approaches at GTC 2025 offers significant guidance. It represents two distinct yet not entirely opposing schools of thought currently prevailing in the AI field: one emphasizes continuing the existing large model paradigm with rapid iteration, while the other advocates breaking through the current paradigm to build new models that more closely resemble human cognitive mechanisms. Such discussions have helped the industry recognize more clearly the following points:


· Identify Weaknesses, Clarify Direction: Even the most powerful language models have their inherent shortcomings, which need to be addressed through new approaches. As Yann LeCun stated, true intelligence may require “understanding” the world, rather than merely reading it. This points researchers toward the next direction of effort—how to endow AI with persistent memory, commonsense physics, and autonomous planning capabilities.


· Balancing Incremental Improvement with Disruptive Innovation: Jensen Huang’s optimism shows that the potential of existing technologies is far from exhausted, and that AI capabilities continue to rise rapidly through ongoing engineering improvements. Yann LeCun’s warning serves as a reminder that we need forward-looking innovation to avoid wasting excessive resources on the wrong path. The future development of AI may need to both draw on the strengths of current large models and boldly explore entirely new architectures, pursuing a “dual-track” approach.


· Promoting Multidisciplinary Integration: The debate touches on concepts such as embodied AI and world models, prompting the AI field to draw more extensively on advances in cognitive science, neuroscience, and robotics. This will encourage academia and industry to place greater emphasis on cross-disciplinary collaboration, such as increasing investment in research on simulated environments, reinforcement learning, and their integration with large models, thereby paving the way toward more advanced AI.


· A Rational Perspective on AGI: This discussion has also encouraged the public to view AGI more rationally. We should neither blindly believe that AGI will arrive of its own accord nor regard it as an unreachable illusion. Instead, we should pragmatically define our desired goals for “advanced machine intelligence” and assess the feasibility and risks of the pathways to achieve them.


Yann LeCun prefers the term “advanced machine intelligence” to underscore this point: rather than fixating on what constitutes “general,” it is more productive to focus on enhancing specific capabilities. As industry leaders openly discuss the feasibility of AGI, regulators and the public can better engage in conversations about the future of AI.


In summary, the debate at GTC 2025 provides valuable insights into the future direction of AI. Jensen Huang’s vision is inspiring, showcasing a bright future for evolution along existing paths; Yann LeCun’s critique is thought-provoking, sounding an alarm for the industry and opening up new lines of exploration. The clash between these two perspectives is not a zero-sum game of winners and losers; rather, together they outline a more complete picture: true AI breakthroughs may require “both quantitative accumulation and a qualitative leap.” For AI researchers and practitioners, the significance of this debate lies in inspiring us to make the most of current technologies while having the courage to break with conventional thinking and forge new pathways toward higher intelligence. This will undoubtedly have a profound impact on AI development strategies in the coming years, guiding a steadier advance into a new era of human-machine intelligence.


References: The content of this review draws on reports and industry commentary from the GTC 2025 conference, including quotes from Jensen Huang and Yann LeCun as cited by media outlets such as Reuters, National Technology, and SiliconANGLE, as well as discussions among researchers on social platforms like LinkedIn.