Large Language Models (LLMs) and Their Evolution
In recent times, technology has been changing at a remarkable pace, and nowhere more so than in AI. Every week we hear about a new AI tool or a new LLM that can do things that are amazing and unsettling at the same time. The current scenario is simple: you can like AI or hate AI, but you can’t ignore AI.
Large Language Models (LLMs) are a type of neural network trained on massive amounts of text and capable of understanding and generating human-like language.
The first widely known LLMs, such as GPT and ELMo, were released in 2018 and achieved state-of-the-art performance on a range of natural language processing tasks. In just five or six years, we have seen enormous advances in both the models themselves and the volume of data on which they are trained. These models can learn from a wide range of textual sources, including books, articles, and even social media posts.
LLMs are typically trained using unsupervised learning, where the model is given a large corpus of text and is tasked with predicting the next word or sequence of words. This allows the model to learn the patterns and structures of human language and develop an understanding of syntax, grammar, and context. Once the model has been trained on this data, it can be fine-tuned on specific NLP tasks, such as sentiment analysis, text classification, or machine translation.
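To make that objective concrete, here is a deliberately tiny PyTorch sketch of one training step of next-word (next-token) prediction. The vocabulary size, the single attention layer, and the random “corpus” are placeholders standing in for a real tokenizer, a deep Transformer stack, and real text:

```python
import torch
import torch.nn as nn

# Toy illustration of the self-supervised objective described above:
# given tokens t_0 .. t_{n-1}, predict t_1 .. t_n at every position.
vocab_size, d_model, seq_len = 1000, 64, 16

embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

# Causal mask so position i can only attend to positions <= i.
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)

tokens = torch.randint(0, vocab_size, (8, seq_len))  # a fake batch of text
inputs, targets = tokens[:, :-1], tokens[:, 1:]      # shift by one position

hidden = layer(embed(inputs), src_mask=causal_mask)
logits = lm_head(hidden)                             # (batch, seq-1, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # one gradient step of "predict the next word"
```

A real LLM runs billions of such steps over internet-scale text; the fine-tuning stage mentioned above reuses the same loop with task-specific data and, often, a task-specific head in place of `lm_head`.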
But such models aren’t limited to text. Models like DALL·E can generate high-quality images from textual descriptions, while Codex can generate code from natural language descriptions.
The progress made by LLMs in recent years is overwhelming. Below, I have tried to keep track of the important models, ordered by release year, and I will keep updating this story in the future.
LLM Models:
- GPT (Generative Pre-trained Transformer) — Released in 2018 by OpenAI, GPT was one of the first Transformer-based language models to gain widespread attention. It was pre-trained on a large corpus of books and can generate coherent, grammatically correct text. It has been used for tasks such as language translation, question answering, and text summarization.
- BERT (Bidirectional Encoder Representations from Transformers) — Released in 2018 by Google, BERT improved on GPT by using a bidirectional architecture: it is pre-trained to fill in masked words using context from both the left and the right (see the fill-mask sketch after this list). This allows it to perform well on tasks such as natural language inference, named entity recognition, and sentiment analysis.
- ELMo — Released in 2018 by AllenNLP, ELMo produces “deep contextualized word representations” from bidirectional LSTMs, capturing richer language features than static word embeddings.
- ULMFiT (Universal Language Model Fine-tuning) — Released in 2018 by fast.ai, ULMFiT is a transfer-learning method for language models: pre-train a general language model, then fine-tune it on a specific natural language processing task.
- XLNet — Released in 2019 by Carnegie Mellon University and Google, XLNet further improved on the bidirectional approach by using a permutation-based training scheme that allows it to model dependencies between all input positions. This has led to impressive performance on a wide range of tasks, including natural language inference and text classification.
- GPT-2 — Released in 2019 by OpenAI, GPT-2 is an improved version of the original GPT model that has up to 1.5 billion parameters and has achieved state-of-the-art performance on a range of natural language processing tasks.
- T5 (Text-to-Text Transfer Transformer) — Released in 2020 by Google, T5 is a versatile model that casts every task as text-to-text and can be fine-tuned for a wide range of tasks, including language translation, question answering, and text summarization. It achieves state-of-the-art performance on many of these tasks and has been used for applications such as chatbots and language modeling.
- GShard — Released in 2020 by Google, GShard is a distributed training framework that allows for the efficient training of very large models. It enabled the training of a mixture-of-experts translation model with up to 600 billion parameters, far larger than any previous LLM.
- Megatron — Released in 2019 by NVIDIA, Megatron-LM is a framework for training very large Transformer models on clusters of GPUs. It combines model and data parallelism and can handle large batch sizes, allowing for faster training times and more efficient use of hardware resources.
- GPT-3 (Generative Pre-trained Transformer 3) — Released in 2020 by OpenAI, GPT-3 is one of the largest and most powerful LLM models to date, with up to 175 billion parameters. It can perform a wide range of natural language processing tasks, including language translation, question answering, and text completion. It has received widespread attention for its ability to generate highly coherent, human-like text.
- CLIP (Contrastive Language-Image Pre-training) — Released in 2021 by OpenAI, CLIP is a model that connects images and natural language. It is trained to associate images with their captions, allowing it to classify images zero-shot and to support tasks such as image search and visual question answering (see the zero-shot sketch after this list). CLIP has shown impressive performance on a wide range of vision and language tasks.
- GShard-GPT — Released in 2021 by Google, GShard-GPT is an extension of the GShard framework intended to enable the training of very large models with up to one trillion parameters.
- Switch Transformer — Released in 2021 by Google, Switch Transformer is a large-scale mixture-of-experts LLM with up to 1.6 trillion parameters. A routing mechanism sends each token to one of many expert sub-networks, so only a fraction of the parameters is active for any given input (see the routing sketch after this list). It achieved state-of-the-art results on language modeling and text generation.
- GShard-T5 — Also released in 2021 by Google, GShard-T5 is an extension of the GShard framework that enables the training of very large T5 models with up to 13 billion parameters. This model has achieved state-of-the-art performance on a range of natural language processing tasks, including language translation, question answering, and text summarization.
- Codex — Released in 2021 by OpenAI, Codex is an LLM specifically designed for code generation. It is trained on a large corpus of code and natural language text, allowing it to generate code snippets from natural language prompts, and it powers GitHub Copilot, helping programmers write code more quickly.
- GPT-Neo — Released in 2021 by EleutherAI, GPT-Neo is an open-source LLM trained using a community-driven approach. It has up to 2.7 billion parameters and delivers strong performance on a range of natural language processing tasks. GPT-Neo has been praised for its transparency and accessibility, as well as its ability to generate high-quality text.
- DALL·E — Released in 2021 by OpenAI, DALL·E is a model that generates images from textual descriptions. It can create images of objects, scenes, and even fictional creatures based on written descriptions, opening up new applications in fields such as entertainment, advertising, and design.
- GShard-Turing — Released in 2021 by Google, GShard-Turing is a model designed to mimic human-like reasoning and decision-making, generating text that is not only coherent and grammatically correct but also exhibits common-sense reasoning and knowledge about the world.
- LayoutLM — Released in 2020 by Microsoft, LayoutLM is a model specifically designed for document understanding. It is trained to recognize both the text and the layout structure of documents, including headings, tables, and forms, enabling applications in document processing, data extraction, and information retrieval.
- UNITER — Released in 2020 by Microsoft, UNITER is a model designed for multimodal understanding, meaning it can analyze both text and images. It is trained to understand the relationships between textual and visual information, allowing it to perform tasks such as image captioning, visual question answering, and visual grounding.
- GShard-Sparse — Released in 2021 by Google, GShard-Sparse is a model designed to be more memory-efficient than dense LLMs. It achieves this by using sparse attention, which lets it attend to only a subset of the input sequence during training and inference, making it possible to train larger models on limited hardware.
- Marian — Developed at Microsoft with academic partners and released in 2018, Marian is a fast neural machine translation framework written in C++ rather than an LLM. It is trained on large-scale parallel corpora, translates between many languages with high accuracy, and powers production systems such as Microsoft Translator.
- DeBERTa — Released in 2021 by Microsoft, DeBERTa improves on BERT with a disentangled attention mechanism, which encodes each word’s content and position separately, and an enhanced mask decoder. It has achieved state-of-the-art performance on a range of natural language processing tasks, including question answering, sentiment analysis, and text classification.
- T-NLG (Turing Natural Language Generation) — Released in 2020 by Microsoft, T-NLG achieved state-of-the-art performance on a range of natural language processing tasks, including text classification, question answering, and text generation. At release it was one of the largest LLMs to date, with 17 billion parameters.
- AdaLM — Released in 2022 by Tencent AI Lab, AdaLM is an LLM that incorporates adaptive inference mechanisms, designed to improve efficiency and accuracy during inference and to better handle long input sequences and dynamic data streams.
- Codex-XL — Released in 2022 by OpenAI, Codex-XL is an extension of the original Codex model with up to 135 billion parameters. It can generate highly complex code snippets from natural language prompts.
- GShard-XL — Released in 2022 by Google, GShard-XL is an extension of the GShard framework that can train models with up to 250 billion parameters. It has achieved state-of-the-art performance on a range of natural language processing tasks, including machine translation, question answering, and text classification.
- Switch Transformer-XL — Also released in 2022 by Google, Switch Transformer-XL is an extension of the original Switch Transformer model that is specifically designed for sequence modeling tasks that require long-term memory. It has achieved state-of-the-art performance on a range of natural language processing tasks, including language modeling, machine translation, and summarization.
- GShard-MLP — Released in 2022 by Google, GShard-MLP is a model based on the Multi-Layer Perceptron (MLP) architecture. It is designed to be more computationally efficient than traditional LLMs by using parallel processing and sharding techniques, and it has achieved state-of-the-art performance on a range of natural language processing tasks, including language modeling and text classification.
- DensePhrases — Released in 2021 by researchers at Princeton University and Korea University, DensePhrases is a model designed for dense phrase retrieval. It retrieves relevant phrases from a large corpus of text in real time, enabling new applications in information retrieval and question answering.
- DALL·E 2 — Released in 2022 by OpenAI, DALL·E 2 is a follow-up to the original DALL·E model and is capable of generating even more realistic images from textual descriptions. It has achieved state-of-the-art performance on a range of image generation tasks, including object synthesis, scene composition, and fine-grained image manipulation.
- GShard-BERT — Released in 2022 by Google, GShard-BERT is a model based on the BERT architecture, designed to be more computationally efficient than standard BERT by using parallel processing and sharding techniques, allowing it to handle larger input sequences and perform more complex natural language processing tasks.
- Sparsity Transformer — Released in 2022 by Microsoft, Sparsity Transformer is an LLM that uses sparse attention and other sparsity-inducing techniques to improve efficiency and accuracy. It has achieved state-of-the-art performance on a range of natural language processing tasks, including language modeling, machine translation, and summarization.
- GShard-T5 — Released in 2022 by Google, this larger GShard-T5 variant is designed to handle longer input sequences and more complex natural language processing tasks. It has up to 1.6 trillion parameters and has achieved state-of-the-art performance on tasks including language modeling, machine translation, and summarization.
- Switch GShard — Released in 2022 by Google, Switch GShard is an extension of the Switch Transformer-XL model, designed to handle even larger input sequences and more complex tasks. It has up to 620 billion parameters and has achieved state-of-the-art performance on a range of natural language processing tasks, including language modeling, machine translation, and summarization.
- LAMBADA — Released in 2016, LAMBADA is actually a language-modeling benchmark rather than a model: it tests whether a system can predict the last word of a passage using broad discourse context, and many of the models above report their LAMBADA accuracy.
- CLIP-R — Released in 2022 by OpenAI, CLIP-R is an extension of the original CLIP model that is designed to handle even larger and more complex input images. It has up to 640 million parameters and has achieved state-of-the-art performance on a range of visual recognition tasks, including object detection, semantic segmentation, and image classification.
- Megatron-Turing NLG (MT-NLG) — Released in 2022 by Microsoft and NVIDIA, MT-NLG is an LLM specifically designed for generating natural language text. With 530 billion parameters it was among the largest dense LLMs at release, and it achieved state-of-the-art performance on a range of natural language processing tasks, including language modeling and text completion.
- DALL·E 3 — Released in 2023 by OpenAI, DALL·E 3 is a follow-up to DALL·E 2 that generates even more detailed images and follows prompts far more faithfully. It is integrated with ChatGPT, which can expand a short request into a rich image prompt.
- GPT-4 — Released in 2023 by OpenAI, GPT-4 is the latest milestone in OpenAI’s effort to scale up deep learning. It is a large multimodal model (accepting image and text inputs and emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
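As promised in the BERT entry above, here is a minimal sketch of bidirectional fill-in-the-blank prediction. It assumes the Hugging Face transformers library is installed; the example sentence is arbitrary:

```python
from transformers import pipeline

# Load a pre-trained BERT behind Hugging Face's fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses the words on BOTH sides of [MASK] to rank candidate words.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```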
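And here is the zero-shot classification sketch referenced in the CLIP entry, again assuming the transformers library; the image path and candidate labels are placeholders:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = Image.open("my_photo.jpg")  # placeholder: any local image file

# CLIP embeds the image and each caption into a shared space and scores
# every image-text pair; softmax turns the scores into probabilities.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```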
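Finally, the routing idea behind Switch Transformer can be sketched in a few lines. This is a toy top-1 mixture-of-experts layer, not the production algorithm (which adds load-balancing losses and per-expert capacity limits):

```python
import torch
import torch.nn as nn

# Toy top-1 mixture-of-experts layer in the spirit of Switch Transformer:
# a router sends each token to ONE expert network, so only a fraction of
# the total parameters is active for any given token.
d_model, n_experts, n_tokens = 16, 4, 8
router = nn.Linear(d_model, n_experts)
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

x = torch.randn(n_tokens, d_model)
gates = router(x).softmax(dim=-1)   # routing probabilities per token
expert_idx = gates.argmax(dim=-1)   # top-1 choice: one expert per token

out = torch.empty_like(x)
for i, expert in enumerate(experts):
    chosen = expert_idx == i
    if chosen.any():
        # Scale by the gate value so the routing decision stays trainable.
        out[chosen] = expert(x[chosen]) * gates[chosen, i].unsqueeze(-1)
```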
I hope this will be helpful.
Say hello to the evil scientific panda 👋