PART 1: Understanding AI
Beyond the Hype
Is AI the saviour of the world, or its undoing? An accelerant of progress, or of chaos? You don’t have to look far in 2025 to find vocal proponents of either view. Indeed, “AI” is everywhere right now. Maybe it’s the spaces we spend time in, but it feels all but impossible to avoid the term wherever we go: every email newsletter, every social media post, every water-cooler-colleague-catch-up, every post-work-drinks-gathering seems incomplete without some mention of it. Whatever your take, this is perhaps a sign that the ‘friend or foe’ question is now largely a rhetorical one, and that AI is here to stay regardless. Or, in the language of Rogers’ Diffusion of Innovations theory, that “the late majority” have now come to terms with its usefulness, and that even “the laggards” are starting to resign themselves to its ubiquity, reluctantly filing into the party with the rest of us.

But as “AI” assimilates into the cultural canon of twenty-first-century life, what started as a precise scientific and philosophical term for a specific range of technological innovations is giving way to a hazy, catch-all term that gestures toward some inscrutable cliché (“clever tech stuff?”), with the majority of those who speak its name relating more to the sheer popularity of the field than to the inner workings of any particular tool. Much the same seems to have happened with “blockchain”, for example. There is a danger here, though: that we normalize the term so extensively that it seems to need no explanation, as though it were a harmless and vague fact of life, generic enough to contain any number of possible meanings without being individually accountable to any of them, permanently defanging our critique of how it really works, what it really represents, and whether or not it’s something we actually want in our lives.

So, before we discuss what “AI” can and can’t do, what risks “it” does and does not pose, and whether or not we are witnessing its hand in the end of human civilization, let us first disambiguate. And apologies in advance to any readers with a highly sophisticated understanding of AI, for whom the summary below may be a cringe-worthy simplification: we welcome your corrections!
What is AI, though?
When people speak about “AI”, they’re typically referring to a variety of applications of “machine learning” (ML): a set of techniques, under development since the 1950s, that enable computers to spot and contextualise patterns or rules in data without the need for humans to codify and input those patterns or rules beforehand. This is really at the core of all modern “AI” applications: autonomous pattern recognition. ML often uses algorithms (an algorithm is essentially a set of instructions, like a recipe) known as “neural networks” (yes, loosely modelled on the neurons of the human brain) to process the information needed to infer those patterns. But it’s advances in “deep learning” (which uses “deep neural networks”, or DNNs, so called because they are structured into multiple stacked layers) that are really causing all the fuss. DNNs – especially those using “transformer architecture” – are behind recent breakthroughs in, for instance, “natural language processing” (NLP), whereby computers can process linguistic data (give a computer words that a human spoke or wrote and it determines what that human meant), and “computer vision” (CV), whereby computers can process visual data (give a computer a picture of a train and it determines that it’s… a picture of a train) – both of which can be considered distinct subfields of AI.
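If “autonomous pattern recognition” still feels abstract, here’s a minimal, hypothetical sketch in Python (using the scikit-learn library purely as a convenient stand-in; it’s not something this project relies on). Note that we never write the rule hiding in the data into the code: the model has to infer it from labelled examples alone.

```python
# A toy illustration of "autonomous pattern recognition": we never tell the
# model the rule (label = 1 when x1 + x2 > 1); it infers it from examples.
# Hypothetical example - scikit-learn is just a convenient stand-in here.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 2))            # 1,000 random points in the unit square
y = (X.sum(axis=1) > 1).astype(int)  # the hidden "pattern" we want it to find

# A small neural network: it adjusts its internal weights to fit the data.
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X, y)

# The rule was never written into the code, yet the model has picked it up.
print(model.predict([[0.9, 0.8], [0.1, 0.2]]))  # expected: [1 0]
```

Stack up far more layers, far more parameters, and far more data, and you’re heading (a very long way) in the direction of the deep neural networks described above.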
Another subfield making waves right now is “generative AI” (GenAI). Essentially, GenAI refers to the creation and use of “large models” (LMs), which are trained to infer patterns from vast amounts of data, to generate “new” content, like text, images, sound, or video (whether or not it’s really “new” is a matter of spirited scholarly debate, but we’ll get to that later). With their algorithmic wizardry, LMs condense immense amounts of complexity about, for example, how languages tend to work, or how images tend to look, into high-fidelity representations that can be used to generate predictions for what the next word in a given sentence might be, or what the missing patch in a given image might look like. After a few rounds of tweaking, an LM’s capacity for prediction can be used to generate endless reams of new content from a simple prompt. “Large language models” (LLMs), like those underlying household names such as ChatGPT, Claude, and Gemini, have typically been trained on tens (sometimes hundreds) of thousands of gigabytes of linguistic data, and their resulting proficiency in generating human-like language is already enough to fool humans into thinking they’re real people. But you probably already knew that bit.
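To give a flavour of what “generating new content by predicting what comes next” means, here’s a deliberately tiny, hypothetical Python sketch. It’s a toy stand-in rather than how any real GenAI system is built: it just counts which word follows which in a miniature “corpus” and then strings together “new” text from those counts.

```python
# A miniature "generative model": learn which word tends to follow which in a
# tiny corpus, then generate "new" text by repeatedly predicting the next word.
# Everything here is a toy stand-in for what an LLM does at vastly greater scale.
import random
from collections import defaultdict

corpus = "the fox jumped onto the roof and the cat watched the fox".split()

# Record, for each word, the words that follow it (the "patterns" in the data).
following = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    following[current].append(nxt)

def generate(start, length=8):
    """Start from a word and keep appending a plausible next word."""
    words = [start]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:          # dead end: no known continuation
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("the"))  # e.g. "the fox jumped onto the cat watched the fox"
```

An LLM works on a conceptually similar principle, but with billions of learned parameters standing in for this little lookup table of word pairs.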
LLMs: Under the Hood
In this research project, we’re using the term “AI” in reference to GenAI in particular, and especially to tools based on LLMs. So it’s worth going into a little more detail about how these models work. Let’s say you want to build an LLM: where would you start? Well, once you’ve developed your deep learning algorithm (no small feat), you need to expose it to data. A lot of it. This first stage is generally known as “pre-training”, and it provides the LLM with its modelling of grammar and semantics, its memorisation of basic world knowledge, and its discernment of human sentiment. The most common source of pre-training data is “the Common Crawl”, a regularly updated repository of a vast swathe of the freely available text on the open internet: every blog, every Wikipedia entry, every Medium article, every angry YouTube comment thread. All of that text is first converted into strings of “tokens” (numbers which stand for the clusters of letters that form the building blocks of words – roughly akin to “morphemes”, if you’re a linguistics nerd) and then analysed for patterns.

After this “unsupervised learning” period, an LLM will be able to call on billions of “parameters” (the factors and weightings it develops in order to model human language) to select the most plausible and contextually appropriate token to add to a given sequence, based on a dynamic probability distribution over all the possible options. So, if you were to feed the model a sentence beginning “the fox jumped onto the…”, the model would rank the probability of various options for the next word. “Spaceship” would presumably be ranked very low; “skateboard”, just slightly higher but still fairly low; “cat”, quite highly; and “roof”, perhaps highest of all. And so it would select “roof”, and then move on. (It should be noted, though, that how exactly an LLM arrives at its parameters and these probability distributions is nearly impossible to trace – a phenomenon often referred to as the “black box” or “explainability” problem of AI.)

The sequence of tokens that precedes and informs a prediction is known as its “context window”. A context window isn’t necessarily just a few words long, as in our jumping fox example: it can also take in the content of the preceding paragraphs, or even the sum total of all a user’s previous interactions with the model. A recent Gemini model from Google, for example, has a maximum context window of one million tokens – more text than the entirety of Tolstoy’s War and Peace. One other thing to note is that a model’s predictions are “autoregressive”, meaning that each new token selected and added to a given sequence then becomes part of the context window used to predict the next token, and so on, again and again. An LLM’s unsettlingly human-like linguistic proficiency emerges, then, simply by virtue of its capacity to keep adding new tokens based on these probabilistic calculations.
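To make the fox example concrete, here’s a small, hand-wired Python sketch of autoregressive next-token prediction. The candidate words and their probabilities are invented for illustration; in a real LLM they would be computed on the fly from those billions of parameters, over a vocabulary of tens of thousands of tokens rather than four.

```python
# A hand-wired sketch of next-token prediction for the fox example.
# These probabilities are invented; a real LLM computes them on the fly
# from its learned parameters, over a vocabulary of tens of thousands of tokens.
import random

context = ["the", "fox", "jumped", "onto", "the"]

# An imagined probability distribution over a few candidate next words:
next_word_probs = {
    "roof": 0.55,
    "cat": 0.30,
    "skateboard": 0.10,
    "spaceship": 0.05,
}

def pick_next_word(probs, greedy=True):
    """Either take the single most probable word, or sample from the distribution."""
    if greedy:
        return max(probs, key=probs.get)
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights, k=1)[0]

# Autoregression: the chosen word is appended to the context, which would then
# be used to predict the word after that, and so on.
context.append(pick_next_word(next_word_probs))
print(" ".join(context))  # -> "the fox jumped onto the roof"
```

(Real systems don’t always pick the single most probable token, either: they often sample from the distribution – a behaviour usually controlled by a “temperature” setting – to keep their outputs from becoming repetitive.)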
The next stage is known as fine-tuning, and consists of training the algorithm on datasets of sample prompts, so that it develops the ability to read cues in the requests and questions likely to be posed by users and to adjust its responses accordingly, and so that it is conditioned to exhibit certain human-like qualities in those responses, like agreeableness or helpfulness. The result is an LLM that is either attuned to a specific use case, or that can adapt to a variety of types of task.

Finally, your LLM is ready for “Reinforcement Learning from Human Feedback” (RLHF). Here, human workers label or score a range of outputs offered by the model in response to a given prompt, according to certain criteria of desirability – for example, that a clear answer is preferable to an ambiguous one, but that an ambiguous answer is preferable to one containing information that could threaten public safety – all in order to further align the model’s behaviour with a user’s needs and expectations. Whip up a user interface that integrates the ability to search the web or call on curated databases of subject-specific expert knowledge alongside the core functions of the LLM, and there you have it: a market-leading AI product. Congratulations.
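As a rough illustration of what that human feedback looks like as data, here’s a schematic Python sketch. The data structure, the example prompt, and the scoring rule are all invented for illustration; real RLHF pipelines train a separate “reward model” on many thousands of such human judgements and then use it to nudge the LLM’s behaviour.

```python
# A schematic sketch of the kind of human-feedback data described above.
# The data structure, example prompt, and scoring rule are all invented for
# illustration; real pipelines train a learned "reward model" on many
# thousands of such judgements.
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    response_a: str
    response_b: str
    human_prefers: str  # "a" or "b", as labelled by a human rater

pair = PreferencePair(
    prompt="How do I unclog a drain?",
    response_a="Pour boiling water down it, then use a plunger.",
    response_b="There are many possible approaches one might consider.",
    human_prefers="a",  # a clear, actionable answer beats a vague one
)

def reward(response: str, preferred: str) -> float:
    """Toy stand-in for a learned reward model: reward responses that match
    the human's choice, penalise those that don't."""
    return 1.0 if response == preferred else -1.0

preferred = pair.response_a if pair.human_prefers == "a" else pair.response_b
print(reward(pair.response_a, preferred))  # 1.0  -> reinforce behaviour like this
print(reward(pair.response_b, preferred))  # -1.0 -> discourage behaviour like this
```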
As you behold your creation, then, you may ask yourself: to which kinds of tasks should this “machine intelligence” be applied? And to which should we refrain from applying it? What inherent capabilities does it really possess, if any, and where do we meet the limits of those capabilities? And these would be very good questions. Check out the next blog in the series to discover (our take on) the answers!