Large Language Model (LLM)
Also known as: LLM, large language model. Related term: foundation model, a broader category that includes LLMs alongside large models for images, audio, and other modalities.
What is a Large Language Model?
A large language model (LLM) is a neural network trained on vast amounts of text data to understand and generate human language. LLMs work by predicting the next token (word or subword) in a sequence, and through this seemingly simple objective applied at enormous scale, they develop sophisticated capabilities including summarization, translation, code generation, reasoning, and creative writing. Prominent examples include OpenAI’s GPT series, Anthropic’s Claude, Google’s Gemini, and Meta’s Llama.
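The next-token objective can be sketched with a toy model. Everything here is invented for illustration (the vocabulary, the scores, the greedy decoding loop); a real LLM is a transformer producing logits over tens of thousands of subword tokens, but the shape of the loop is the same: score candidates, convert scores to probabilities, append a token, repeat.

```python
import math

def softmax(logits):
    # Convert raw scores ("logits") into a probability distribution.
    m = max(logits.values())
    exps = {tok: math.exp(s - m) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def toy_model(context):
    # Hypothetical next-token scores given the context; a real LLM
    # computes these with a transformer over the full token sequence.
    if context[-1] == "large":
        return {"language": 3.0, "model": 1.0, "dog": -2.0}
    return {"large": 2.0, "the": 0.5, "dog": -1.0}

def generate(prompt_tokens, steps):
    tokens = list(prompt_tokens)
    for _ in range(steps):
        probs = softmax(toy_model(tokens))
        # Greedy decoding: always pick the most probable next token.
        tokens.append(max(probs, key=probs.get))
    return tokens

print(generate(["a", "large"], 1))  # → ['a', 'large', 'language']
```

Real systems usually sample from the distribution (with temperature) rather than always taking the argmax, which is why the same prompt can yield different completions.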
How LLMs Work
LLMs are built on the transformer architecture and trained in two main phases. During pre-training, the model processes trillions of tokens from books, websites, code, and other text, learning statistical patterns of language. During post-training (including supervised fine-tuning and reinforcement learning from human feedback, RLHF), the model is aligned to follow instructions, be helpful, and refuse harmful requests. The “large” in LLM refers both to parameter count (ranging from billions to over a trillion) and training data volume. More parameters and more data generally yield more capable models, following predictable scaling laws.
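The scaling laws mentioned above can be made concrete with a parametric fit in the style of Hoffmann et al. (2022): predicted loss decreases as a power law in both parameter count N and training tokens D. The constants below are approximate published fits from that paper; treat the specific numbers as illustrative, not authoritative.

```python
def predicted_loss(n_params, n_tokens,
                   E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    # Chinchilla-style scaling law: irreducible loss E plus power-law
    # terms that shrink as parameters (N) and training tokens (D) grow.
    return E + A / n_params**alpha + B / n_tokens**beta

# A larger model trained on more data predicts a lower loss.
small = predicted_loss(1e9, 20e9)     # ~1B params, 20B tokens
large = predicted_loss(70e9, 1.4e12)  # ~70B params, 1.4T tokens
print(small, large)
assert large < small
```

The practical takeaway from such fits is that parameters and data should be scaled together: a huge model trained on too few tokens is compute-inefficient.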
Why LLMs Matter
LLMs have become the general-purpose reasoning engine at the center of modern AI applications. They power chatbots, coding assistants, search engines, content generators, and AI agents. When combined with tool use, LLMs can interact with external systems — querying databases, calling APIs, writing and executing code — transforming them from text generators into autonomous workers. Understanding LLM capabilities, limitations (such as hallucination and context window constraints), and cost characteristics is essential for anyone building AI-powered products or deploying AI in enterprise settings.
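The tool-use pattern described above follows a simple loop: the model either requests a tool call or produces a final answer; the application executes requested tools and feeds results back. The model stub, tool names, and message format below are invented for this sketch; real function-calling APIs follow the same propose–execute–feed-back shape with provider-specific schemas.

```python
def fake_llm(messages):
    # Stand-in for a model call: first requests a tool, then, once a
    # tool result is in the conversation, produces a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Paris"}}
    return {"answer": "It is sunny in Paris."}

# Hypothetical tool registry mapping tool names to callables.
TOOLS = {"get_weather": lambda city: f"sunny in {city}"}

def run_agent(user_query):
    messages = [{"role": "user", "content": user_query}]
    while True:
        reply = fake_llm(messages)
        if "answer" in reply:  # model produced a final answer: stop
            return reply["answer"]
        # Otherwise execute the requested tool and feed the result back.
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})

print(run_agent("What's the weather in Paris?"))
```

Production agent loops add what this sketch omits: validation of tool arguments, error handling when a tool fails, and a cap on iterations so the model cannot loop forever.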
Related Reading
- Pre-training - The first phase of LLM training
- Hallucination - A key limitation of LLMs
- Scaling Laws - Why bigger LLMs perform better