Large Language Model Defined: A large language model is a type of artificial intelligence model trained on vast amounts of text data. The “large” in its name refers to the number of parameters the model has, which is often in the billions.
Large language models, or LLMs, learn to generate human-like text by predicting the next word in a sentence, given all the previous words. They are trained on a diverse range of internet text, but do not know specifics about which documents were in their training set or have access to any personal data unless explicitly provided.
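The next-word objective described above can be sketched with a toy bigram model, a deliberately tiny stand-in for the neural networks real LLMs use, which counts which word follows each word in a small corpus and predicts the most frequent follower:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count, for each word, which words follow it,
# then predict the most frequent follower. Real LLMs learn far richer
# statistics with neural networks, but the training objective -- predict
# the next token given the previous ones -- is the same.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word):
    """Return the word most often observed after `word` in the corpus."""
    return follow_counts[word].most_common(1)[0][0]

print(predict_next("sat"))  # "sat" is always followed by "on" here
```

An LLM replaces these raw counts with a learned function over the entire preceding context, but generation still proceeds one predicted token at a time.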
The concept of large language models can be traced back to the 1950s, when the field of AI was first established. An early and notable forerunner of today’s conversational systems was ELIZA, a program developed by Joseph Weizenbaum at MIT in 1966.
ELIZA doesn't stand for anything as an acronym. The name ELIZA was chosen as a reference to Eliza Doolittle, a character in George Bernard Shaw's play "Pygmalion" who is taught to speak with an upper-class accent. ELIZA was one of the first chatbots and was designed to simulate a psychotherapist by using pattern matching and substitution methodology.
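The pattern-matching-and-substitution approach can be sketched in a few lines of Python. The rules below are hypothetical stand-ins, not Weizenbaum’s original script: each pairs a regular expression with a response template, and captured text is reflected back with pronouns swapped:

```python
import re

# A minimal ELIZA-style responder (illustrative rules, not the original
# DOCTOR script): match the input against patterns and echo captured
# fragments back inside canned templates.
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I),   "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I),     "Tell me more about your {0}."),
]
PRONOUNS = {"my": "your", "me": "you", "i": "you"}

def reflect(fragment):
    """Swap first-person pronouns so the fragment reads back naturally."""
    return " ".join(PRONOUNS.get(w.lower(), w) for w in fragment.split())

def respond(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # default when no pattern matches

print(respond("I feel anxious about my exams"))
# -> Why do you feel anxious about your exams?
```

No statistics or learning are involved; the apparent understanding comes entirely from the user reading meaning into the reflected text.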
The transformer architecture, which forms the basis of many modern large language models, was introduced by Google researchers in a landmark paper titled “Attention Is All You Need,” presented at the 2017 NeurIPS conference (then known as NIPS).
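The attention mechanism at the heart of that paper can be sketched in plain Python. This is an illustrative single-head, unbatched version, not a production implementation: each query scores every key, the scores are softmaxed, and the values are averaged with those weights:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q.K^T / sqrt(d_k)) . V, row by row."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score the query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the weight-averaged mixture of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# One query attending over two key/value pairs: the query aligns with the
# first key, so the output leans toward the first value vector.
out = attention([[1.0, 0.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[10.0, 0.0], [0.0, 10.0]])
```

Real transformers run this as batched matrix multiplications across many heads and layers, but the core computation is exactly this weighted lookup.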
NeurIPS, the Conference on Neural Information Processing Systems, is a leading annual conference in the field of machine learning and artificial intelligence. The conference brings together a broad community around machine learning, artificial intelligence, and neural information processing.
Following the development of the transformer architecture, BERT, a machine learning model for natural language processing developed by Google, was released in 2018.
With its ability to consider context by analyzing the relationships between words in a sentence bidirectionally, BERT (an acronym for Bidirectional Encoder Representations from Transformers) was a dramatic improvement over previous state-of-the-art models and quickly became ubiquitous.
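Why bidirectional context matters can be shown with a toy example (an illustration of the idea, not BERT itself): predicting a masked word from the left context alone is often ambiguous, while seeing both sides resolves it.

```python
# Toy masked-word filler over a two-sentence "corpus". Left-context-only
# prediction is ambiguous; using both neighbors (bidirectional context)
# pins the answer down. BERT learns this effect at scale with attention.
corpus = [
    "he sat by the river bank fishing",
    "she went to the bank to deposit money",
]

def fill_mask_left(left):
    """Left context only: every word that follows `left` anywhere."""
    found = []
    for sentence in corpus:
        words = sentence.split()
        for i in range(1, len(words)):
            if words[i - 1] == left:
                found.append(words[i])
    return found

def fill_mask(left, right):
    """Bidirectional: words seen between `left` and `right`."""
    found = []
    for sentence in corpus:
        words = sentence.split()
        for i in range(1, len(words) - 1):
            if words[i - 1] == left and words[i + 1] == right:
                found.append(words[i])
    return found

print(fill_mask_left("the"))      # ambiguous: multiple candidates
print(fill_mask("the", "to"))     # the right-hand word disambiguates
```

Earlier left-to-right models only had access to the first function’s view; BERT’s masked-language-model training gives it the second.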
Notably, OpenAI’s GPT-1, the first model in the GPT (an acronym for Generative Pretrained Transformer) series, was also introduced in 2018.
A generative pretrained transformer is a neural network based on the transformer architecture that is first pretrained on large volumes of text to predict the next token, then adapted to downstream tasks such as interpreting and producing language, answering questions, and solving problems.
GPT-1 had a number of limitations, such as generating repetitive text and failing to reason over multiple turns of dialogue. Nonetheless, GPT-1 laid the foundation for larger and more powerful models based on transformer architecture.
GPT-1 was trained on the BookCorpus dataset, a collection of roughly 7,000 unpublished books spanning a variety of genres. (Web-scale corpora such as Common Crawl, a massive dataset of web pages with billions of words, were used for later models in the series, notably GPT-3.)
Following GPT-1, GPT-2 was released in 2019 and, in 2020, the larger and more capable GPT-3 was introduced.
In 2019, GPT-2 drew widespread attention because OpenAI, fearing malicious use, initially deemed it too powerful to release publicly.
With the launch of GPT-4 in March 2023, OpenAI debuted its most advanced model in the GPT series to date. A large multimodal model that accepts image and text inputs and emits text outputs, GPT-4 has broad general knowledge and advanced reasoning capabilities, enabling it to solve difficult problems with greater accuracy than its predecessors.
The GPT-4 model was further improved by a November 2023 update dubbed GPT-4 Turbo. While GPT-4 launched with a knowledge cutoff of September 2021, meaning it could only respond to user prompts with information current through that date, GPT-4 Turbo included knowledge of world events up to April 2023. (GPT-4 Turbo is also able to accept in excess of 300 pages of text in a single prompt.)
Modern LLMs, such as GPT-4, can be used for a variety of tasks, such as translation, question answering, and text generation. They can also assist in writing, brainstorming ideas, learning new topics, and more.
However, even powerful LLMs like GPT-4 do have limitations, including knowledge cutoffs, and should be used responsibly as they can sometimes produce incorrect or misleading information, and their understanding of nuanced or complex topics can be superficial.
Though sophisticated models like GPT-4 demonstrate human-level performance on various professional and academic benchmarks, LLMs remain less capable than humans in many real-world scenarios.
LLMs also don’t have beliefs or desires, and any statement they make about their thoughts or feelings is a simulated output.
"Our biggest technology that we ever, ever invented was articulated language with built-out grammar. It is that that allows us to imagine things far in the future and things way back in the past." - Margaret Atwood, Canadian poet, novelist, activist, and inventor