I Built A GPT

I built a GPT. As in, ChatGPT, a Generative Pre-trained Transformer.  

Why, you ask. The rollout of Transformer models, such as DALL-E (which generates images from text input) and ChatGPT (which generates text instead of pictures), has ignited a public frenzy. Businesses trumpeted AI supremacy; stock prices exploded. That was not the reaction to spam filters or navigation apps such as Waze, both powered by machine learning – a kind of AI – and both already on everyone’s phones.

In a previous article, I posited that AI was just a tool. As with all tools, we decide how, when, where and for what the technology is used. Any good-versus-bad debate is ultimately about humans and the things humans make. Still, the battle of narratives raged on, inundating social feeds, client conversations and every single networking event. Perhaps by building one, I would finally understand what all the fuss is about.

Since I know how spam filters (a Classification model) and navigation apps (an Optimization model) work, I figured it would be an easy hop onto YouTube for a quick feature update. The hours-long “code from scratch” tutorials seemed useful, but ten minutes in I had to admit that something more basic was required, and quit. The shorter introductory clips (here, here and here) were so chock-full of jargon that I struggled to breathe. They also assumed a mastery of computer science that I do not have.

But like learning a Bach fugue, the first thing to do is to find the main theme. 

* * *

Among the weeds of GPT instructional videos, one term keeps coming up: Attention. Specifically, references to a whimsically titled research paper, “Attention Is All You Need”. A little digging reveals that this Attention mechanism is the core of the Transformer, the breakthrough that makes it so impressive. Transformers are designed to predict an answer when given a prompt, like filling in the blanks, completing sentences or translating French into English. Attention is how GPT understands the meaning of a prompt, beyond a jumble of words and punctuation. This sounds rather complicated, but the mechanism is simply a series of matrix multiplications, a basic algebraic procedure (Figure 1).

 

Figure 1: A Simple Example of Matrix Multiplication
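To make this concrete, here is a minimal sketch (in Python with NumPy; the words and numbers are toy values, not my actual model) of the scaled dot-product Attention described in the paper. Each word asks a question (a query), every word offers a label (a key) and some content (a value); two matrix multiplications and a softmax later, each word has gathered context from all the others.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Rescale a list of scores into proportions that sum to one (a common scale).
    e = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product Attention from "Attention Is All You Need":
    # softmax(Q @ K^T / sqrt(d_k)) @ V -- just matrix multiplications plus a softmax.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how strongly each word relates to every other word
    weights = softmax(scores)         # turn the scores into weights that sum to one per word
    return weights @ V                # blend the values according to those weights

# Toy prompt: 4 "words", each represented by a vector of 8 numbers.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (4, 8): one context-aware vector per word
```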

 

The next step is to work out the remaining components (Figure 2). Some elements are familiar, for instance softmax, a mathematical procedure that rescales a list of scores into proportions that sum to one, so they can be compared on a common scale. Others, such as the Feedforward Neural Network, require more work. The Feedforward mechanism is a biologically inspired algorithm, a stylized version of our neurological wiring. Information moves from one perceptron (analogous to a neuron) to another, performing calculations as it goes. These calculations refine the context that Attention has processed so that the model can recognize more complex patterns, say sarcasm or other sentiments. Layer these many times and ChatGPT starts to seem sentient, amusing, uncanny.

 

Figure 2: The Transformer – Model Architecture (Vaswani et al., 2017)
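For the Feedforward Neural Network, a bare-bones sketch with made-up sizes: information flows in one direction through two layers of perceptrons, and each layer is, once again, a matrix multiplication, here followed by a simple rule that keeps only the positive signals.

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # The two-layer feedforward network used in the Transformer paper:
    # expand each word's vector, apply a nonlinearity (ReLU), then project it back down.
    hidden = np.maximum(0, x @ W1 + b1)  # each perceptron sums its inputs and "fires" if positive
    return hidden @ W2 + b2              # combine the perceptrons' outputs into a refined vector

# Toy sizes: 8-number word vectors expanded to 32 hidden perceptrons (sizes are illustrative).
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))                    # 4 word vectors coming out of Attention
W1, b1 = rng.normal(size=(8, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 8)), np.zeros(8)
print(feed_forward(x, W1, b1, W2, b2).shape)   # (4, 8): same shape, refined content
```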

 

Once all the concepts were clear, translating them into code was surprisingly uncomplicated, especially with the help of coding tutorials. Debugging, as usual, was an absolute hellhole. Even so, writing about this took thrice as long as programming the model, and a gazillion times longer than GPT-4 would have taken. Working things out manually may be superfluous, but the clarifying, sorting and killing of one’s darling notions create a depth of understanding that regurgitating GPT output cannot replicate.
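As a rough illustration of what that translation looks like, the two sketches above stack into a single Transformer block roughly like this (a sketch only: it reuses the attention and feed_forward functions from earlier and leaves out layer normalization, multiple heads and, crucially, trained weights).

```python
def transformer_block(x, Wq, Wk, Wv, W1, b1, W2, b2):
    # Project each word's vector into queries, keys and values, let Attention mix in context,
    # then let the feedforward layers refine each word on its own. The "x +" steps are residual
    # connections, which keep the original signal alongside each refinement.
    attended = attention(x @ Wq, x @ Wk, x @ Wv)
    x = x + attended
    x = x + feed_forward(x, W1, b1, W2, b2)
    return x

# Continuing the toy example (x, rng, W1, b1, W2, b2 as above):
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(transformer_block(x, Wq, Wk, Wv, W1, b1, W2, b2).shape)  # (4, 8)
```

A real GPT repeats this block dozens of times and learns every weight matrix from data; the stacking itself is the easy part.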

* * *

The consensus is that bigger is better when it comes to GPT. A bigger model learns from more data, captures more patterns and responds more fluently to prompts. But are these the same as better? When it comes to GPT, how do we define good? Economic gains? Human approval? Human likeness? I seem to have looped back to the question I asked a year ago: What is Good (Note 1)? Could making more advanced models, more powerful processors and more AI-generated content help us answer these questions?

Research is underway to study the emergent effects of our interactions with AI. The field of ML (machine learning) alignment aims to steer AI “toward a person's or group's intended goals, preferences, and ethical principles”. Part of this involves painstakingly disassembling and reconstructing models to understand what really happens when individual components – Attention, softmax, Feedforward – interact in a wider whole, making it possible to evaluate and intervene in AI processes and behaviors. Learning how to build a GPT is step zero toward answering our questions. Perhaps this path of inquiry will also lead us to deeper knowledge about ourselves and our values.

 

Figure 3: Output from My GPT - Trained Using the Full Works of Shakespeare, Tokenized by Character
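“Tokenized by character” means the model reads and writes one letter at a time rather than whole words. A tiny sketch of the idea, using an illustrative snippet instead of the full corpus:

```python
text = "To be, or not to be"                    # illustrative snippet, not the full corpus
vocab = sorted(set(text))                       # every distinct character becomes a token
encode = {ch: i for i, ch in enumerate(vocab)}  # character -> integer ID
decode = {i: ch for ch, i in encode.items()}    # integer ID -> character

ids = [encode[ch] for ch in text]               # what the model actually trains on
print(ids)                                      # a list of small integers
print("".join(decode[i] for i in ids))          # decodes back to "To be, or not to be"
```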


Table 1: Comparing the Size of My GPT vs. OpenAI's GPT-3

 

Note 1: And its corollary: Good for Whom?

Click here for a list of learning resources to build a GPT

 