August 31, 2023 – Kenneth Wenger, author of the fascinating new book, Is the Algorithm Plotting Against Us?, sits down with FS Insider host Cris Sheridan to explain what most people don't know about large neural network models like ChatGPT, Google Bard and others. For example, did you know that they map language and information into a hyperdimensional space? Or that these models just so happened to learn a method of processing information that is very similar to the way the human eye works? And, oh yeah, let's not forget about the 'Minority Report' incident that occurred a few years back, where AI was used to predict crimes...that ended up happening! These are just some of the fascinating topics we discuss today about what is likely the most important technological breakthrough of our lifetimes.
Timestamps:
00:00:00: Introduction and Kenneth Wenger's New Book
00:00:40: Addressing AI Concerns and Understanding AI
00:03:07: Origins of Artificial Neural Networks
00:03:54: Evolution of AI and Neural Networks
00:06:03: Function and Power of Neural Networks
00:09:00: Difficulties and Power of Modeling Natural Processes
00:10:57: The Accuracy and Downside of Neural Networks
00:11:35: Challenges and limitations of machine learning models
00:12:13: Explanation of Multidimensional Hyperspace
00:14:42: Classifying Images with Vectors
00:15:26: Utilizing Neural Networks
00:16:40: Unpredictability of Models
00:17:44: Convolutional Neural Network Discussion
00:18:34: Layers of Neural Networks
00:20:18: Complexity and Compression of Neural Networks
00:21:32: Similarities to Visual Systems
00:21:47: Emergence of Neural Network Properties
00:22:34: Designing Neural Networks
00:22:39: Understanding the Functioning of Machine Learning
00:25:17: Training the Model: Process and Optimization
00:25:43: Discovery of Information Organization in Machine Learning
00:26:05: Building a Model of Reality Using Neural Networks
00:27:09: Description and Compression of The Input Data
00:28:46: Understanding Convolutional Neural Networks
00:29:23: Emergent Properties in Complex Systems: Natural and Artificial
00:30:18: Converting Language to Numerical Input
00:31:25: The Role of Tokenizer in Natural Language Processing
00:32:22: Predicting the Next Best Word in Language Models
00:33:26: Probability Distributions in Text Processing
00:33:47: Making Text Inputs Profound with AI
00:34:26: Understanding large language models
00:35:22: The process of pretraining and fine-tuning models
00:36:56: Lack of philosophical understanding in models
00:38:12: Perception of 'magic' in AI
00:39:28: Why hallucinations happen in language models
00:40:33: The role of quality and quantity of data in training set
00:41:44: Mistakes in AI models and strategies to minimize them
00:42:32: The future of AI performance monitoring
00:42:55: The value of data in the world's largest companies
00:43:36: Immediate vs future concerns over AI
00:44:50: Current state of AI capabilities and understanding
00:45:38: Application of AI in banking and loan approvals
00:46:15: Neural Networks in Loan Prediction
00:49:16: Impact of Neural Networks on Advertising and User Behavior
00:51:29: Effects on Social Division and Cultural Compartmentalization
00:52:17: Real Life Case of Predictive Policing - Robert McDaniel
00:54:31: The Rise and Expansion of Neural Networks in the Last Decade
00:55:09: The Consequences of Using Statistical Models in The Judicial System and the Problem of "No Redemption"
00:57:56: The Limitations of Automating Decision Making
00:59:14: The Need for Better Systems in Automating Decisions
01:00:49: Applying AI in Hiring Processes
01:02:16: The Challenge of Training Models for Redemption
01:03:06: The Trade-Off between Efficiency and Fairness
01:04:12: Reconsidering Cost beyond Monetary Terms
01:05:11: The Real-World Consequences of AI Development
Transcript:
Cris Sheridan: Kenneth Wenger is the author of a new book released this year titled Is the Algorithm Plotting Against Us? A Layperson's Guide to the Concepts, Math and Pitfalls of Artificial Intelligence. He is also the Senior Director of Research and Innovation at CoreAVI. The publisher of the book is Workingfires.org, and you can also follow more of Ken's work and views on this important topic at workingfires.org, where he regularly posts. Ken, thank you for joining us on the show today.
Kenneth Wenger: Hey, Cris, thank you very much.
Cris Sheridan: So, Ken, let's start off with how you introduce your book on how we should understand artificial intelligence from the perspective of living alongside the threat of lions.
Kenneth Wenger: The reason I chose the lion analogy in the book is because I find that I spend a lot of time talking with people who are concerned with AI, and maybe because of what they hear in the news. And what I like to tell them is that the best way to deal with these concerns is to try to understand for themselves how these algorithms work, what are the real capabilities, and what are their limitations. And this is part of why I wrote the book, is to try to make this as simple as possible for people to understand. The idea with things that may be dangerous in some respect is that almost nothing is dangerous all the time and in every situation. So you want to understand when do you have to be concerned with them and how do you protect yourself so that you're not always afraid of these things?
Kenneth Wenger: So, for example, with lions, lions are dangerous, but they're dangerous if you are in front of them in the African savannah and you have nothing to protect yourself from a lion. But if you are at a safe distance, or if you are in a vehicle, or if you are in North America, you probably don't have to be afraid of lions. And you can instead just learn to admire them and learn about them. And so that's how I like to propose that we deal with things that we're not familiar with, is first we have to understand what they are, how they work, when they actually pose a threat, and then we can kind of design our lives around that. But first we have to understand them.
Cris Sheridan: So if we understand the predator that's out there, which many people do characterize artificial intelligence as a predator or as a terminator, well, if we understand, like you said, their capabilities, limitations, and how we can coexist or work alongside this threat in such a way where we can minimize risks, that's really the goal. And a lot of what you detail in your book is understanding how AI works, how neural networks work. But let's start off, because I think one of the interesting things that I did not know, and this may be news to many people listening, is that when we think about the current breakthroughs and developments in the field of artificial intelligence, currently most of it surrounds the use of artificial neural networks, as they're referred to, and that's detailed extensively in your book. But what you say, and this is what I was not aware of, is that ANNs, artificial neural networks, were developed by researchers to help understand neurons and the way the brain works.
Cris Sheridan: They weren't developed to advance the field of artificial intelligence. That just came about as a byproduct.
Kenneth Wenger: Right. A lot of the early development was, in fact, because researchers, especially in the field of psychology and neuroscience at the time, wanted to understand a little bit better how our own brains work and how our intelligence work. And so they wanted to build a model that would resemble it in some way. And then with a model that's not human based, you could actually change it a little bit and see the effect of those changes, which you actually can't do with a person. And so that was really the early motivation for developing neural networks.
Kenneth Wenger: But then once we understood what we had, then it developed into its own field, and now it's really the pursuit of artificial intelligence, not so much understanding ourselves.
Cris Sheridan: And it's fascinating, too, because you talk about the history of how this field has evolved. It wasn't a straightforward process, right?
Kenneth Wenger: That's right. There were many periods when the hype dried up and the development also stopped, for decades even. And then it picked up again. That happened in different stages and for different reasons.
Kenneth Wenger: In the early 70s, there was a point where it didn't live up to the hype, and so the public got tired of it, and so funding dried up. The reason it picked up again this time, though, is because of an interesting new piece of technology that was developed for a different purpose but serendipitously came in and ignited a new wave of AI, which is the GPU, which was invented really for graphics. But the idea that we could now use GPUs as these really powerful computers to perform computations in parallel that could then accelerate neural networks, that's really what gave way to the current wave of AI. It's because now we have the technology to actually run these systems.
Kenneth Wenger: They take a lot of computing power, as you probably know.
Cris Sheridan: Yeah. And as you talk about, the AI winter, I believe it was, correct me if I'm wrong, came about because the massive amount of computational power that would be required to train these models wasn't available at the time. We now have that.
Kenneth Wenger: That's right. So a lot of the current understanding of artificial neural networks and the processes that we use to develop these models and to train them, they were developed decades ago, but we didn't have the computational power, as you say, to actually carry out these processes at scale, with real large data sets and with models that have millions or billions of parameters that could actually make use of this technology. So with the scale of the computing power that we had back then, the success was very limited. And so it wasn't as interesting to the public especially, and to funding agencies and so on.
Cris Sheridan: So again, we're talking about neural networks here, which is a subfield of artificial intelligence. It's where we're seeing the most advancements currently, and for good reason, which you discuss when it comes to the similarities between how our own brain works and how neural networks function. There are some critical differences, of course, but the similarities themselves lend to the fact that it just seems to be the best way of modeling the world around us through this neural-like structure. But at the basis of all neural networks, of all these architectures, is math and statistics. And you really hit that home numerous times.
Cris Sheridan: So tell us about the way in which these function at a general level, if you could.
Kenneth Wenger: First, I'd like to say we don't want to overplay the similarities between artificial neural networks and our brains. I think one way to think about that is that the early, say, intuitions about developing the artificial models came from things we learned from our own brains. But the similarities are not to the extent that our brains work exactly like these models. And that's important because sometimes I know people get confused about that. The power of neural networks, the artificial neural networks that we have today, which are, as you said, the basis for the current form of artificial intelligence, is that you can think of them as these little engines for equations.
Kenneth Wenger: So a neural network essentially is a universal approximation function. So what that means is that they are able to approximate any equation. Now, if we park that for a second okay, let's park that on the side for a second. Now, if we consider an equation, what makes an equation powerful? The reason equations are useful is because they are essentially simulators, right?
Kenneth Wenger: So if you think of any process like if you want to predict the weather or if you want to predict whether a patient may have a certain disease or if you want to try to understand the financial market, for example, all of these things could be described by some equation. And now when you have an equation, what that means is that you have a certain number of parameters. Those are the variables in the equation. And what the equation does is that it lets you predict over time how the environment reacts to certain inputs, right? So if you change the state of the variables, you can see what the equation would predict.
Kenneth Wenger: So you can see what the future will look like for any sort of changes in the state of the variables. That's what makes equations very powerful. The problem with trying to model natural processes or complex processes like weather patterns or financial trends, as I said, is that it is very difficult to actually sit down with a pen and paper and develop an equation that properly and accurately models complex processes for long periods of time, right? So you may be able to predict the weather for the next few hours, but you may not be able to predict it over a week if you just develop your equations with a pen and paper. And the same thing is true for trying to predict whether a patient has cancer based on some biopsy scans or symptoms or clinical data from the patient.
Kenneth Wenger: It is very difficult to just from first principles develop equations that if you put in this input information, it will give you an accurate prediction for weather, for patients, or for anything. Now, the beautiful thing about neural networks and why they're so powerful is because you can train them in a very simple way, essentially to learn to come up with these equations on their own. And that's why they're powerful and why they're used in so many different use cases, right? Like you can think of neural networks today, they're being used for language models, for computer vision, medical use cases, as I said. So how come we can use them in so many different ways?
Kenneth Wenger: The reason is because all of these things can be described using, as I said, equations. And neural networks are the best tool we have found for systematically finding an equation that can explain or simulate a process.
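To make the "engine for equations" idea concrete, here is a minimal sketch in Python (illustrative only, not from the book): a tiny one-hidden-layer network is nudged by gradient descent until it approximates a process, sin(x) here, that it has only seen through input/output samples. The function, layer size, and learning rate are arbitrary choices for the example.

```python
import numpy as np

# Minimal sketch: a one-hidden-layer network learning to approximate
# an unknown process (here, y = sin(x)) purely from input/output samples.
rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, (200, 1))
y = np.sin(x)                      # the "process" we want an equation for

W1, b1 = rng.normal(0, 1, (1, 32)), np.zeros(32)
W2, b2 = rng.normal(0, 1, (32, 1)), np.zeros(1)
lr = 0.05

for step in range(5000):
    h = np.tanh(x @ W1 + b1)       # hidden layer
    pred = h @ W2 + b2             # the network's current "equation"
    err = pred - y
    # gradient descent: nudge parameters to reduce the prediction error
    grad_W2 = h.T @ err / len(x)
    grad_b2 = err.mean(0)
    grad_h = err @ W2.T * (1 - h**2)
    grad_W1 = x.T @ grad_h / len(x)
    grad_b1 = grad_h.mean(0)
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print("mean squared error:", float((err**2).mean()))
```

The same loop, scaled up in parameters and data, is what lets neural networks "discover" equations for weather, vision, or language without anyone writing those equations by hand.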
Cris Sheridan: And like you said, the real power there is that in many cases neural networks discover the equations that are most applicable or the most efficient in matching the output, the prediction, to what we would expect, right?
Kenneth Wenger: With a neural network, what you can do is you can train this system without really understanding the equation, to find that equation for you specifically that will give you a prediction. And when you do this over a data set, over many, many samples, what you find is that if you look at how accurately it predicts the output, typically on a data set, it would function a lot better, it would be more accurate than if you had actually developed the equations manually. And this is true for the financial case. It's true for computer vision if you set up a set of rules for when an image depicts a person or a car or a bus. If you do that in a classical way, or you do that with a neural network, you find that neural networks are able to find that equation and produce a much higher accuracy than a human could.
Kenneth Wenger: The downside, though, which is very important, is that because the neural network finds this equation on its own and we don't really understand, we can't understand very well what that equation looks like. We don't understand its limitation. And that can have real repercussions, especially on the same example I gave you with the financial case, because then I don't know, for example, when it fails, when it gets it wrong, how is it getting it wrong, and what demographic of a population, for example, is it affecting the most? And then we get into bias questions and so on.
Cris Sheridan: Yeah, and that's towards the latter part of your book, which I definitely want to get into. But there is one thing that you said that I want to key in on before we get there. And that's where you said that we really don't even have the ability to understand the equation that is being used or the set of equations that are being used by these machine learning models. And you talk about how I mean, they're essentially working in multidimensional hyperspace. That part of your book was, for me, probably the most profound.
Cris Sheridan: Do you mind explaining a little bit of that part of your book?
Kenneth Wenger: So the idea with vectors in hyperspaces of multiple dimensions, the idea of that is if you think of a neural network, let's use a simple example that it's trying to classify images of cats and dogs, right? So you have a number of pictures of cats and dogs, and you want to present them to a neural network, and you want it to tell you, well, this is a picture of a dog and this one is of a cat. The way it does that is that you can think of these images as vectors in a multidimensional space. And the way you do that is you imagine a picture of a dog, and let's say the picture has 100 by 100 pixels. So the dimension of the image, right, it's a square, so it's 100 pixels width, 100 pixels height.
Kenneth Wenger: So in that case, you can consider that image a vector of 10,000 dimensions, where each dimension is essentially the value of the pixel. If you think of an image as a vector, and it's a vector of 10,000 dimensions, what that actually means is that you can imagine that image represents a point. So a single point in a universe that is 10,000 dimensions instead of three dimensions like ours, okay? And so if you have 100 images of cats, let's say, and you treat them as vectors, then you would imagine that each one of them is a point in this universe that is 10,000 dimensions. Now, if you have an image of a dog and you have 100 images of dogs, for example, the same thing, you would imagine that each image is a point in this multidimensional universe.
Kenneth Wenger: The way we try to separate and create a model that classifies between dogs and cats, because they are vectors, is that we understand that, first of all, a vector has a direction. And so images of cats would typically be pointing in somewhat similar directions, because they are similar, because they're cats. And if you have images of dogs, they would also be pointing in a similar direction to each other, but separate or away from the cats. And so the idea of a model is we want to find the hyperplane, essentially we want to find the line. You can think of it as a line that separates the cat and the dog vectors, and once you have that line, then that acts as a classifier, because then the next time you get an image of, let's say, a cat, you don't know whether it's a cat or a dog.
Kenneth Wenger: You just get an image. All you have to do is figure out what side of the line it falls on and then you know whether it's a cat or a dog. So that's why it helps to think about these things geometrically. Now what a neural network does is because a vector of 10,000 dimensions is just so large and it is very difficult sometimes to separate images of cats and dogs in that many dimensions. What a neural network does is it tries to compress those images into smaller vectors.
Kenneth Wenger: So it tries to find what is the hyperspace where those vectors are separable. In other words, if we start compressing them and getting rid of pixels that we don't need in the image can we get to a certain compression of the image which maybe it's 100 dimensions instead of 10,000 or 50 dimensions where in that hyperspace, in that new universe essentially those points are actually separable. And now the more complicated you make the problem. So like now if instead of just cats and dogs you have ten different classes or 1000 different classes of objects you want to separate, you can imagine that things become more complex. And finding that line that separates between each of these separate classes is complicated because there could be objects that kind of look similar to each other.
Kenneth Wenger: So it's difficult to know where is the line going to be drawn. Like literally so that we have all the images of, let's say bananas on one side and all the images of lemons on the other without having any ambiguity in between. And that's why it gets very complicated. That's essentially an explanation of why sometimes models get it wrong. The reason we can't understand their prediction is a little bit different.
Kenneth Wenger: The reason why we call these things black boxes is because there are so many parameters in these models. Essentially, trainable parameters are like the connections of the neural network. They are the things that we change and modify when we're training a model. And there are so many; there could be millions or billions of parameters in state-of-the-art models. The answer to why it gave you a given prediction is not because of the states of a certain few parameters.
Kenneth Wenger: It's because of all of the data it was trained with and because of the training process. So it's not like there is a set of rules where you can say, well, the reason I got this answer is because of rule A or B. The explanation for why you got a certain answer is essentially the data it was trained with and the way it was trained. And so it's very difficult to get to a specific answer of why a mistake happened. And that's an active field of research that we're working on right now.
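Here is a minimal sketch of the geometric picture described above (illustrative only): each 100x100 "image" becomes a 10,000-dimensional vector, and a simple perceptron finds a hyperplane, the "line", that separates the two classes. The data is synthetic noise nudged in two directions, standing in for cats and dogs.

```python
import numpy as np

# Minimal sketch: treating 100x100 "images" as 10,000-dimensional vectors
# and finding a hyperplane (a linear classifier) that separates two classes.
rng = np.random.default_rng(1)
d = 100 * 100                               # each image -> a 10,000-dim vector
direction_cat = rng.normal(0, 1, d)         # "cat-like" direction in hyperspace
direction_dog = rng.normal(0, 1, d)         # "dog-like" direction in hyperspace

cats = rng.normal(0, 1, (100, d)) + direction_cat   # 100 cat points
dogs = rng.normal(0, 1, (100, d)) + direction_dog   # 100 dog points
X = np.vstack([cats, dogs])
y = np.array([1] * 100 + [-1] * 100)                 # +1 = cat, -1 = dog

# Perceptron-style training: w defines the separating hyperplane.
w = np.zeros(d)
for _ in range(20):
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:          # point on the wrong side of the line
            w += yi * xi                # nudge the hyperplane toward it

new_image = rng.normal(0, 1, d) + direction_cat      # an unseen "cat"
print("cat" if new_image @ w > 0 else "dog")          # which side of the line?
```

Real models differ in that they first compress those 10,000 dimensions down to a much smaller space where the classes are easier to separate, which is exactly the point Wenger makes next.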
Cris Sheridan: I want to talk about, because as you're discussing how these neural networks function, a core part of this is the convolutional neural network, which you also discuss, how these work. And you touched upon some of that just now. Can you tell us about how that works in terms of understanding an image? And the reason I think it's important to land on this for a little while is because you do discuss how this relates to the way our own vision systems work, and how this relates to some of the emergent properties of complex systems, which I think is a fascinating topic. So if you wouldn't mind, tell us about the importance of convolutional networks and how they function as well.
Kenneth Wenger: So the way convolutional, well, the way neural networks work in general, is that you can think of them as layers. And so a modern computer vision neural network, a convolutional neural network, could be 100 layers, let's say. And so the idea is that you can think of each layer as extracting features from an image. So if you present an image to a model, what you want to do is you want to be able to figure out, well, what are the features in this image that are predictive of a certain quality that you want to predict for. So, for example, going back to detecting cats in an image, you want to figure out, what are all the features of cats that I could match in this image so that it tells me that it's a cat.
Kenneth Wenger: And if you think of an image, typically a cat is a portion of the image, right? If you think of a cat standing in some field, well, most of the image will be covered by the field, and there may be trees and maybe there's the sky and all these things, and they're not predictive of a cat. So what the model has to do during training is it has to learn to pick out features that belong to the cat and features that are part of the background, for example. And so the idea with these different layers is that layers that are early, so layers that are closer to the input, they tend to pick up and learn through the training process. And we can get into that if you want.
Kenneth Wenger: They learn to extract features that are low level. We call them low level because they are things like edges, edge information. So they become sensitive to things like the edge between an object and the background. They pick up things like horizontal edges, circular shapes, diagonal edges, and those things. The deeper you go into the neural network, the more complexity builds.
Kenneth Wenger: So, as I said earlier, if you think as you're going through the neural network, you're compressing the image, you're trying to find that smaller version of the image that actually contains the features that you want. As you go deeper, as you're compressing the image, or the information in the image, it means you're losing dimensions, right? Instead of a 10,000-dimension vector, in the end you have, let's say, a 50-dimension vector. But as you're losing information in the size of the vector, you're gaining information in the depth of each bit of information. So whereas early at the input, where you had 10,000 elements in your vector, each element just had basically the color of the pixel.
Kenneth Wenger: When you get to the point where you have a compressed image that's 50 elements in your vector, each element actually has a lot more information than just a single pixel. It contains information about things like the texture of an object, things like faces of animals and wheels of a car, for example. More complex concepts are starting to form inside the deeper levels of the model. And when we look at how our own visual systems work, it is very similar in the sense that early layers closer to the eye, they tend to be very sensitive to edge information as well. But the deeper we go back into the visual cortex in our brain, the more complexity forms into the information that's captured.
Kenneth Wenger: And so again, you get neurons that are more responsive to things like shapes of faces or textures of objects and so on. And that, by the way, in the case of the artificial neural network, just happened; it was an emergent property. We did not design the neural network to do that. The fact that deeper layers have more complex concepts is something that emerged out of the training process, and we did not actually program that in, which is surprising and very interesting.
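Here is a minimal sketch of the kind of feature an early convolutional layer becomes sensitive to (illustrative only): a small kernel slides over the image and responds strongly wherever pixel values change from left to right, i.e. at a vertical edge. In a real CNN the kernel values are learned during training rather than written by hand.

```python
import numpy as np

# Minimal sketch: an early-layer "edge detector" implemented as a 2D convolution.
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.zeros((8, 8))
image[:, 4:] = 1.0                      # dark half / bright half: a vertical edge

vertical_edge_kernel = np.array([[-1.0, 0.0, 1.0],
                                 [-1.0, 0.0, 1.0],
                                 [-1.0, 0.0, 1.0]])

feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)                      # large values only near the edge
```

Deeper layers then combine many such feature maps, which is how more complex concepts like textures and faces build up from these simple responses.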
Cris Sheridan: Right? So it wasn't like researchers were looking at the anatomy or the functioning of human vision and then saying, let's replicate that in a neural network. It was just the opposite. It was just through the process of tinkering and experimentation with neural networks that we ended up seeing, oh, hey, this is actually very similar to the way human vision system works, right?
Kenneth Wenger: It was not done on purpose. So basically, we had the neural network. We designed a neural network. We trained it. It started to function well, meaning that it was predicting the correct classes of objects to a high degree of accuracy, but we still didn't know how it was doing that.
Kenneth Wenger: Right. This came later, as you said, when we started to investigate and prod and take a look at what was actually happening inside of these layers. That's when we realized, okay, so this is what's happening. Each layer is learning these different concepts. And when you look at the distribution of concepts throughout the layers, it turns out that, hey, this is similar to how mammals do it as well.
Kenneth Wenger: Which I guess it was surprising.
Cris Sheridan: Yeah. So basically, just to clarify, to make sure that we are characterizing this correctly: the machine learned on its own, in terms of optimizing the process it was using in order to make its output as accurate as possible against the training data, and through that process of machine learning, it landed upon a process that is very similar to the way human vision systems work. So there was some human experimentation and tinkering, but the machine itself was also converging upon that same process as well. Is that the right way to think about it?
Kenneth Wenger: Yes, that's exactly right. Our tinkering was with respect to training the model, essentially. Training basically you can think of it as an optimization process where you say, well, what we want the model to do is we know that this is an image of a cat. Okay? This is during training, right.
Kenneth Wenger: We have a data set of images of cats. So they're all labeled, meaning that each one of them has a label that says this is a cat, and these other ones are dogs. And so we present the images to the model. Initially, the model has all its parameters randomized, so whatever prediction it gives you will be a mistake. The predictions are typically wrong initially.
Kenneth Wenger: And so what happens is you want to figure out, okay, if it tells you that the image is a dog when it should be a cat, what you want to do is figure out how do I modify the parameters such that next time it'll tell me that this is a cat. And you do this many, many times. But the process of updating the parameters so that next time it gives you a better answer has nothing to do with the features. There is no place there where we are inputting the way we want these features to be discovered. We're not telling it, look, if you start discovering edge information at the beginning and texture information later, you'll have more success. All we're doing is we're looking at the output, we're looking at what the output should be, and then we go through a process called gradient descent, via backpropagation.
Kenneth Wenger: Basically, we're doing a lot of calculus, and then we figure out, okay, how do we modify these parameters so that the answer is closer to the truth. And then once you go through that many, many times, and the answer starts getting better and better and better, you then realize, okay, this is what it's discovering. It's discovering that to get the right answer, it has to organize the information in this way, and that way just happens to be the same as what we do.
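Here is a toy version of that training loop, sketched in PyTorch (the framework is an assumption; the conversation does not name one, and the data is random stand-in "images" with an invented labeling rule): show labeled examples, measure how wrong the guesses are, let backpropagation compute the gradients, and let gradient descent nudge the parameters.

```python
import torch
import torch.nn as nn

# Minimal sketch of supervised training: forward pass, loss, backprop, update.
torch.manual_seed(0)
X = torch.randn(200, 100)            # 200 fake images, 100 features each
y = (X[:, 0] > 0).long()             # label 0 = "dog", 1 = "cat" (a toy rule)

model = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    logits = model(X)                 # forward pass: the model's current guesses
    loss = loss_fn(logits, y)         # how wrong are we vs. the labels?
    optimizer.zero_grad()
    loss.backward()                   # backpropagation computes the gradients
    optimizer.step()                  # gradient descent updates the parameters

accuracy = (model(X).argmax(dim=1) == y).float().mean()
print(f"training accuracy: {accuracy:.2f}")
```

Nothing in the loop tells the model which features to discover; only the gap between its output and the label drives the updates, which is the point Wenger makes above.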
Cris Sheridan: Yeah, that's so fascinating. So at a philosophical level, something that I wrote down as I was reading your book is that basically what these neural networks are doing is they're building a model of reality from the bottom up. They're taking what is continuous, very complex analog information and converting that into numerical information, basically breaking it up into discrete computational pieces. They extract key features. Like you said, if it's an image, it'll be the edge information, or it may be certain features having to do with geometry, color, et cetera.
Cris Sheridan: And then it reconstructs that through this multilayer process of landing more and more on whatever it is that it's trying to predict, or whatever the output is that it's trying to converge to, focusing more on that to the point where it then constructs it back together, but in a much more condensed form compared to the original input. Obviously, I'm not an artificial intelligence expert. So would you say that that is a good way to think about it?
Kenneth Wenger: Yes, it is. You can even simplify that more and just say that you have an input that's high dimension. In the case of an image, it's all the pixels in the image. In the case of a patient or a person, it's all of the features that describe that person, which would be, for example, the age of the person, if that's relevant. Okay.
Kenneth Wenger: We have to be very careful when it comes to people. We have to make sure that the features that we select don't lead to bias and so on. But I'm just giving you an example here. So features of describing a person would be things like an age, their income, where they live and so on, depending on what is the purpose of what you're trying to find out. But initially the input would be very large.
Kenneth Wenger: You would have many dimensions, right? As much as you can possibly gather to describe your person or the image. And then as you go through the neural network the neural network is just compressing that input basically trying to figure out what it can get rid of and what is the fundamental set of features that describe that input. In the case of a person you may decide well, the age is not important to predict this thing. So it'll get rid of the age feature in the compression process and then maybe it'll land on just three features that it needs to predict the outcome for your problem.
Kenneth Wenger: But that's a way to think about it: you start with a high dimensional input, and then the only thing the neural network is doing is trying to figure out, how do I compress this to some, we call it latent state, some smaller vector that better describes that input.
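The compression-to-a-latent-state idea can be sketched with an autoencoder, shown below in PyTorch (an assumption; the sizes and the random stand-in data are purely illustrative). The network squeezes a 10,000-feature input through a 50-number bottleneck and tries to rebuild the input from it; whatever survives in those 50 numbers is the description the network found most useful.

```python
import torch
import torch.nn as nn

# Minimal sketch: compress a high-dimensional input into a small latent state.
torch.manual_seed(0)
encoder = nn.Sequential(nn.Linear(10_000, 256), nn.ReLU(), nn.Linear(256, 50))
decoder = nn.Sequential(nn.Linear(50, 256), nn.ReLU(), nn.Linear(256, 10_000))
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

X = torch.randn(64, 10_000)                    # a batch of high-dimensional inputs
for step in range(100):
    latent = encoder(X)                        # 10,000 dims -> 50-dim latent state
    reconstruction = decoder(latent)           # try to rebuild the input from 50 numbers
    loss = ((reconstruction - X) ** 2).mean()  # how much information was lost?
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("latent vector shape:", tuple(encoder(X[:1]).shape))   # (1, 50)
```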
Cris Sheridan: Okay. So convolutional neural networks are really just the process of extracting key features from an image. These are primarily used for vision processing, as you explained. I mean, it just happens that this is very similar to the way human vision works, through the various layers from the retina all the way down to the vision processing center of the human brain, extracting out features and then using that to model whatever it is that is being looked at.
Cris Sheridan: That's very fascinating. You talk about emergent properties in these complex systems, and how we see that both with natural biological systems and now also with artificial systems, machine learning converging upon what appears to be the most efficient means of processing information. Because you talk about, at one point with convolutional networks, I mean, if you break an image into a series of pixels, you can assign every pixel a number, and then that can be computed upon in this multidimensional space to try to figure out what it is that's being looked at, and then compressed into a much easier form to work with. What about in language?
Cris Sheridan: Right? So in that case, is a neural network literally just taking every letter and converting that into a number just like you discussed in your book with convolutional neural networks and images? Or is there something else happening?
Kenneth Wenger: Yeah, great question. So at a very high level, that's exactly what it is. So neural networks of any kind, they're still mathematical concepts. As I said, they're really very much just trying to find a function or an equation that models your problem as accurately as possible. And so because of that, they need to deal with numbers.
Kenneth Wenger: And so the way that that's done is that we need to figure out a way to convert your data into numbers first. And there are many different strategies for doing that. Typically it's either done at either the word or subword token level, that's what we call it usually where it's not really at a letter level, but it's more like either a word or a subword. So basically a whole word gets changed to a number or a certain subpart of a word that's often used would get converted to a number. And those are called tokens.
Kenneth Wenger: Typically the tool to convert the text into numbers is called the tokenizer, which takes your text as input and then it gives you essentially a whole array of numbers that you can then use to process.
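Here is a minimal sketch of what a tokenizer does (illustrative only): map pieces of text to integer IDs. Real systems, GPT-style models included, use learned subword vocabularies such as byte-pair encoding; this word-level version just shows the idea of text becoming numbers the model can compute on.

```python
# Minimal sketch of a word-level tokenizer: text in, integer IDs out.
def build_vocab(corpus):
    words = sorted({w for sentence in corpus for w in sentence.lower().split()})
    return {w: i for i, w in enumerate(words)}

def tokenize(text, vocab, unknown_id=-1):
    return [vocab.get(w, unknown_id) for w in text.lower().split()]

corpus = ["the cat sat on the mat", "the dog chased the cat"]
vocab = build_vocab(corpus)
print(vocab)                                        # e.g. {'cat': 0, 'chased': 1, ...}
print(tokenize("the cat chased the dog", vocab))    # the numbers the model actually sees
```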
Cris Sheridan: So in that case, let's just go with the example of ChatGPT, OpenAI's ChatGPT, right. So you're entering in all this text, a prompt. What it's doing is it's converting each of those words into a numerical input? And then is it just extracting?
Cris Sheridan: Kind of like with the convolutional neural network example that we talked about with the image, extracting key features of whatever it is you're saying, and then through this multilayer process converging on an output, which it's been trained to do over millions and millions of times, to provide an answer that is the best fit for whatever you asked? Is that basically the way that we should think about it, essentially?
Kenneth Wenger: Yes. And the way it works is that these models, the large language models, what they're essentially doing is they're trying to predict the next best word. So in the case of the images that we were talking about, the task was predicting some label for an image, cat or dog. With language models like ChatGPT, what they're essentially trying to do is figure out, well, given the text that you provided as input, what is the best next word to follow that sentence?
Kenneth Wenger: And the way they do that is that as they're being trained through large, very large data sets of text, they essentially learn what are called probability distributions for each word in text. So they learn to internalize this information as distributions. So, for example, for any word, it figures out, well, what is the probability that this word will fit in this sentence, given the context of the sentence? And then you pick the word with the highest probability. And most of the time you're right, that's why they work.
Kenneth Wenger: But sometimes, because it's still a probability, sometimes it's wrong. And that's why sometimes you get like really funny text.
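Here is a minimal sketch of that final step (illustrative only; the vocabulary and probabilities are invented): the model assigns a probability to every word in its vocabulary given the context, and generation picks from that distribution, usually the most probable word, sometimes a sampled one, which is where the occasional odd output comes from.

```python
import numpy as np

# Minimal sketch of next-word selection from a probability distribution.
rng = np.random.default_rng(0)
vocab = ["mat", "moon", "refrigerator", "cat"]
context = "the cat sat on the"
probs = np.array([0.80, 0.12, 0.05, 0.03])    # what a trained model might assign

greedy_choice = vocab[int(np.argmax(probs))]   # most probable word: usually right
sampled_choice = rng.choice(vocab, p=probs)    # sampling: occasionally picks an odd word

print(context, greedy_choice)
print(context, sampled_choice)
```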
Cris Sheridan: Yeah. Or hallucinations. Right? Yeah, exactly. Which is still a problem.
Cris Sheridan: That's how it's supposed to function. Right. It's finding the most probable word to come after, and that's the way it's been described. But it seems like there's a lot more happening than just that, because one of the examples I give is where I enter something into ChatGPT. I say, here's a quote, just something that I thought of, please rewrite it to make it more profound.
Cris Sheridan: And then it spits back at me something that is actually much more advanced, or is changing the construction of the way in which you understand the concepts at a much higher level. So in that case, it's hard for me to see, oh, this is just a mathematical calculation predicting the most probable word to come next; it seems like there's something happening that's simulating intelligence and understanding. What would you say about that?
Kenneth Wenger: There's a couple of things to understand. So with large language models, the way they typically work is that there's one process that's called pretraining. This is where you train them on a large data set, and all they're training to do in that step is just to generate the next best word in a sequence. And this is not task oriented. Once a model is just pretrained, you can't ask it to do tasks, as you said.
Kenneth Wenger: You can't say, well, summarize this text, or ask it questions and then expect an answer. They're not ready for that yet for an actual, what we call a downstream task. At that point, they're just being essentially trained to understand language and structure and to learn really what makes a word fit well in a certain sequence. Once you have a model that's pretrained, which means that now it understands language to a certain extent and it has a good set of probability distributions for each word, where now it is able to use that information to then generate text that is coherent. Then the next step is where you do what we call fine tuning.
Kenneth Wenger: And this is where now we train either a subset of the model, or an extra few layers added to the model, for a specific downstream task, whether it's summarizing text or answering questions or finding the subject of a conversation and so on. But even in the fine tuning part, what happens is you're still giving it a set of contextual information. That's the window. So you give it your prompt, and then it takes that prompt, and then it learns to give you a result that fits the task. So if the task is to summarize a given input, then it'll take your input and it'll produce text.
Kenneth Wenger: That essentially would be a summary of the input. But the way it generates text, it's still a word at a time and it is still based on whether the next word fits well within the context of the previous words. Does that make sense? There is no real deeper kind of philosophical understanding there from the model. All it's doing is it is trying to figure out what is the next best word to output, given what it's already output, and given what the task is trying to do, whether it's summarizing an input or finding the subject in a sentence and so on.
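The pretrain-then-fine-tune split can be sketched as follows (PyTorch is an assumption, and the stand-in "pretrained" backbone, sizes, and labels are invented for illustration): the pretrained part is frozen so its knowledge is kept, and only a small added head is trained for the downstream task.

```python
import torch
import torch.nn as nn

# Minimal sketch of fine-tuning: freeze a "pretrained" backbone, train a task head.
torch.manual_seed(0)
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
# ... imagine `backbone` was already pretrained on next-word prediction ...

for p in backbone.parameters():
    p.requires_grad = False            # keep the pretrained knowledge fixed

task_head = nn.Linear(128, 3)          # small extra layer for a downstream task
optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(32, 128)               # stand-in token representations
y = torch.randint(0, 3, (32,))         # stand-in task labels

for step in range(100):
    features = backbone(X)             # reuse what pretraining learned
    logits = task_head(features)       # only this part is being trained
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```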
Cris Sheridan: And I fully understand, when ChatGPT or Google Bard or any of the other large language models hallucinate or make inferences that aren't true, if there's information that's not factual that they're providing, I certainly lower my expectations that the thing on the other side that I'm dealing with is simulating understanding or has any inherent understanding. But then when it does provide novelty, or even something more advanced than what I was giving it, there's that part of me that's like, aha, there's something strange going on here. And I know a lot of your book is devoted towards explaining these processes and how they work and the math behind it, and that there is no magic.
Cris Sheridan: But every once in a while it does seem like, okay, there's something happening here. I don't know, it seems magical.
Kenneth Wenger: Yeah, well, I would say this. Look, the process of getting you the right answer and the process of hallucination in large language models are actually the exact same thing. And the reason why they hallucinate, in this case when they're talking about nonsensical or nonfactual things, is because these are actually language models. They're not world models. So they're not fact models, they are language models.
Kenneth Wenger: So essentially what that means is that they're trained specifically to give you text that's coherent. Their measure of success is: is this text coherent? And the reason that's a metric of success is specifically because all they're doing is giving you the next best word in a sequence. And it's only the best word in a sequence if it has been found often in similar sequences in its training data set. And that would only be the case if the word actually fits and people use it in that way. And therefore you would assume that it's coherent.
Kenneth Wenger: So you can expect these models, these advanced models, to actually produce coherent language. But what we can't do with our training process, at least the standard, most common training process, of course all of this is still in research right now and the state of the art can change over time. But right now, the reason that you get these hallucinations is because the primary goal of the models is not to give you a fact or to tell you exactly what would be the most valuable to you, given your question. All they're trying to do is give you text that's coherent based on its history of data in the data set. That's why hallucinations happen.
Kenneth Wenger: And it is done by the same process that it follows when it gives you the right answer, or even profound sounding answers. The difference is that when you hear something that sounds profound, you tend to obviously value it more. But the process is exactly the same and the model has less insight or has the same amount of insight when it's giving you false information or when it's giving you great information.
Cris Sheridan: How much does that depend on the quality and the quantity of data used in the training set?
Kenneth Wenger: Oh, it depends greatly. It depends entirely on that. And as I said, when I checked myself, I was saying, well, the only metric is the next best word. Well, there are other processes that are being developed and used. I don't know if you've heard of something like RLHF, reinforcement learning from human feedback.
Kenneth Wenger: So that's a technique that's used now, and it was recently introduced by OpenAI, I believe. And what that means is that basically, in the training process, you have humans actually step in and say, for a number of samples, well, this is the right one; even though they're both coherent, this is what a human would say. And then essentially what you're doing there is you're training the model to learn to differentiate not just between coherent and non-coherent text, but also what is more human-like in the response. So reinforcement learning from human feedback has a lot to do with how well a model performs as well. And that's part of the training process, and you also need training data for that.
Kenneth Wenger: But it is still not perfect, as we know. We still haven't solved this problem, and they still make mistakes. But all of these different strategies are trying to get us to a point where we minimize the number of mistakes that models make. But at the end of the day, I think part of the work we're doing is trying to develop platforms that go beyond just the training process, trying to understand how models make mistakes, why they make mistakes, and then develop monitors so that at runtime we can start figuring out, well, based on the response and the quality of the response, what is the likelihood of a mistake here, beyond just the training phase. I think that's what's going to be the future.
Kenneth Wenger: I think we're going to see more and more examples of frameworks trying to monitor the performance of these models at runtime.
Cris Sheridan: I think what I'm getting at when it comes to the quality and the quantity of data is that these models are getting bigger and bigger. There's more and more data that are being fed into them. And as you discussed in your book, data is the most important commodity today. It's no coincidence that the largest companies in the world are these big technology firms.
Cris Sheridan: They're all specializing in how to process and use data, and they're all developing artificial intelligence as a core part of their business model. Multi-trillion dollar businesses now, whether we're talking about Microsoft, Meta, Google and the like. I mean, these are the largest companies in the world with trillion dollar or more market caps. So data is obviously extremely important, and not just technology but artificial intelligence increasingly is going to play a major role in our world. Let's talk about some of the immediate versus the future concerns over AI, which you do discuss in your book.
Cris Sheridan: And you do start off talking about really focusing in on data and bias. So if you wouldn't mind, tell us about some of the immediate concerns over this.
Kenneth Wenger: This in the book. When I make the distinction between immediate and future concerns is most of the time when you talk to people and they're concerned about AI, at least in my experience, when I talk to people, their fear is more along the lines of robots taking over and they really mean consciousness, right? Can AI become self aware, take over and then develop this plan to actually destroy us, but on a conscious level, right? Can they get to the stage where they'll develop their own agenda to destroy us? And so that's what I call the future problems of AI, because I think we can't dismiss that that will ever be a problem.
Kenneth Wenger: We have to consider what happens if we get to that point. But what I really try to emphasize in the book is that there are many more problems right now that we have to deal with before we get to that level, even with the current state of AI that we have, with our current capabilities. And they stem from the fact that, as I said at the start of our conversation, these are essentially really good systems for discovering equations. But unfortunately, we are not very good at understanding what those equations look like and what they're really doing. So going back to our example: if a person walks into a bank and wants to apply for a loan, a bank would typically, or classically, have a model, which is essentially an equation that takes information about you and then predicts whether you should get a loan, the size of the loan you should get, and the risk involved in giving you that loan.
Kenneth Wenger: When you have an equation like that, I mean, it's not perfect. People are still victims, often, of the models we have today. But at least when you control the equation, you understand every single input you're giving the equation and why you're providing those inputs. We have to remember that those equations are developed by statisticians, who you hope would understand why they developed that equation and why they're providing those inputs. They can tell, for each input that you give to the equation, whether it's your age or your living situation, credit score and so on, how much weight each of these components has in the final prediction of whether you should get a loan or not.
Kenneth Wenger: The problem with neural networks is that, because we don't really understand the equation they're discovering, it isn't always clear which of the input features that you provide it's actually emphasizing more in its prediction. Right. So again, let's say that we are not very careful, which often happens, unfortunately, and we generate a data set that has a lot of information.
Kenneth Wenger: Like it has information about where you live, your name, your income, credit score and so on. But for some reason, let's say that the neural network starts to pick up your name as an actual predictor for whether you should get a loan. And this could happen because, let's say, for some reason the data set that you compiled happens to have a certain set of names that is overrepresented, and it just happens that these people with these names have great credit in this data set. And then you have another set of people with somewhat similar names, but different from the ones with the great credit scores, and these ones don't actually fare that well when it comes to loans. And so now what happens is you have, well, your model telling you who you should give loans to and who you shouldn't, based on something that doesn't hold up if you think about it statistically, on a large data set.
Kenneth Wenger: If the data set was unbiased and it was balanced, so that you had the same number of samples from each demographic, you would find that the name is not a good predictor of credit score or of how well you would do with a loan. But in the particular data set you had, it learned that it can just look at the name to make the prediction. That would be a problem if you now took your neural network into production and started using that to decide who you give loans to or not. That's the problem with a neural network: you can't tell exactly what it's picking up, or it's very difficult to tell. Whereas with an equation that you develop, you would know exactly what you're looking for in each input.
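Here is a minimal sketch of that failure mode (illustrative only; the features, numbers, and the "name group" flag are all invented). In this particular, unrepresentative sample the name group happens to track repayment, so the trained model leans on it, even though over a balanced population the name would carry no signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal sketch: a spurious feature gets picked up from a skewed dataset.
rng = np.random.default_rng(0)
n = 1000
credit_score = rng.normal(650, 60, n)              # genuinely predictive feature
name_group = (rng.random(n) < 0.5).astype(float)   # should be irrelevant

# Accident of this dataset: group 1 applicants also happen to repay more often.
repaid = (credit_score + 60 * name_group + rng.normal(0, 30, n)) > 680

X = np.column_stack([credit_score, name_group])
model = LogisticRegression(max_iter=1000).fit(X, repaid)

print("weight on credit score:", round(model.coef_[0][0], 3))
print("weight on name group:  ", round(model.coef_[0][1], 3))   # clearly nonzero
```

With a two-feature linear model you can at least read the weights and spot the problem; with a deep network and thousands of input features, that same spurious signal can hide inside millions of parameters, which is the concern Wenger describes.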
Cris Sheridan: So that's an example for the credit loan process. You also detail that we see neural network algorithms being used for criminal justice, for advertising, for all sorts of different areas, each of which have their own consequences given the limitations of neural networks. So would you mind touching upon some of the other areas in which you see immediate concerns?
Kenneth Wenger: Sure. With advertising it's a little bit different. With advertising, the concern, and this is true for advertising, it's true for any system that does this, they're called recommender systems, is that essentially the objective of the system, of the neural network in this case, for example, is to figure out how to keep you on a particular platform for longer. Right.
Kenneth Wenger: Whether it's watching more videos in a particular platform, reading more news articles or whatever. They just want to basically monopolize your attention.
Cris Sheridan: Yeah, or purchasing something too.
Kenneth Wenger: Right, exactly. Or purchasing something as well. So the problem that I see with systems in these use cases is that neural networks are basically optimizers. Great optimizers. They are trying to figure out how to change something, how to change an environment, whether it's themselves in the case of training, so that they get to the output that they want.
Kenneth Wenger: In the case of advertising or the recommender systems that I mentioned, you actually become the variable. So the user becomes the variable and then the game becomes what do I have to change in the user? What do I have to do to get them to do what I want to do? And it turns out that these algorithms are extremely good at optimizing and finding strategies for reaching their goals. They're not always obvious.
Kenneth Wenger: And so what could happen is you end up with these very complicated social dynamics, where you end up further isolating people, you create more echo chambers where you're only exposed to the things that you want to hear and less interested in any diverging opinions. And that's what I'm really worried about, is that these algorithms are going to take the problem of the kind of compartmentalized and segregated society that we have today because of social media, and make it worse.
Cris Sheridan: Yeah, that is something that we are experiencing and we're seeing. I can say anecdotally, even just with Twitter, for example, I follow a lot of different people on it, especially just for the podcast, various guests, research, staying up to date. And I've noticed that over time I'm seeing less and less of the people I follow, their content and their tweets, and more sponsored content, or things that maybe aren't even sponsored, but other things that I don't want to see. But I can understand how they're very appealing to watch and that people get sucked into it really easily, even though it's content that is not at all related to the people I follow or being generated by them. And that's clearly, whether that's Twitter or Facebook, each of these big tech companies, they have, like you say, these platforms where they're going to generate content or provide that hook to keep you there as long as possible.
Cris Sheridan: It's very easy to get sucked into that and all of a sudden realize you've lost hours of your day watching stupid videos about whatever it is. It's not very productive. And of course, like you say, the larger concerns of widening social divisions and leading to those other areas which are all real and consequential as well. You do have in your book one really interesting scenario and I'm surprised I never heard about this, but this was basically something that you would expect to hear right out of The Minority Report. Can you tell us about what we saw in 2013 with a man named Robert McDaniel and what he experienced?
Kenneth Wenger: Right, so this was a case where the police essentially came to Robert McDaniel and told him that an algorithm had predicted that he would be involved in a shooting, and the police didn't know whether he would be the shooter or whether he would be shot. That's not part of what the algorithm predicted, but it predicted he would be involved in a shooting, and so therefore they were keeping an eye on him. And it turns out that he eventually did get shot, twice. Thankfully he did survive, but as you said, it is straight out of a Minority Report kind of movie. And it's been hypothesized that if you live in a neighborhood that is not great, where there may be gang violence and so on, and you start getting attention from the police, you may start getting attention also from people who are not as agreeable to the police, and they may start worrying about whether you are an informant of some sort. And so it turns out that he did end up getting shot in the end, but it is hard to know why he got shot.
Kenneth Wenger: Essentially, it could have very well been that he got shot because of the attention he started getting from police, where he had nothing to do with anything. And so this is part of the problem when we start involving algorithms irresponsibly, in the way I think this was carried out.
Cris Sheridan: Yeah, right. Like you write, it could have been a self-fulfilling process, where the algorithm was saying there's a high likelihood of this person being involved in some type of gunfire or shooting, and then the police go to inform him of this. And at the very same time, because of the increased police presence around Robert McDaniel, it just so happens that that draws more attention upon him and ends up leading to more suspicion by his neighbors or those who could have been involved in crimes. So like you say, I mean, it's a very fascinating situation and scenario.
Cris Sheridan: And again, this was ten years ago, so we've seen the use of neural networks of AI explode massively since then. So it's even hard to tell exactly how far things have come in their use. I do want to quote something. I think that this is a very important part of your book. You say, with a statistical model, there is no redemption, there is just history.
Cris Sheridan: If we lose the hope for redemption in the judicial process, however flawed it may already be, it will be a step in the wrong direction. Do you mind explaining that for us?
Kenneth Wenger: Sure. So that was with respect to whether we should use algorithms like neural networks in the decision-making process of the judicial system. Basically, should we use a model to determine whether an accused should be granted parole, or granted any sort of privilege or rights during the judicial process? The concern we have to consider here is that when you're standing in front of a judge, and this is the point that I make in the book, the judge is human and has all kinds of biases as well. But at least when you're standing in front of the judge and arguing your case, you're hoping that you can convince them that you're a little bit different from whatever biases that person may have.
Kenneth Wenger: You're trying to make your case. And you want to convince them that even if you've made mistakes in the past, even if you've made mistakes in this one instance that got you in front of the judge in the first place, you want to convince them that you're going to do better and you're going to turn your life around. And I understand that many people may actually use that process to game the system, and they may not in fact be trying to change their lives, but some people will. Some people will try to change their lives, and some people actually succeed in changing their lives. And so that's your one shot, is to talk to a judge and convince them that you will actually change.
Kenneth Wenger: If you replace a judge with an automated system, then essentially it's impossible to change their mind. I'm being metaphorical here; it's impossible to change the outcome, because the outcome, at least the way we've built these systems thus far, is based on statistics, on a distribution. How many people in your situation actually changed their lives? And if overwhelmingly most did not, then you're not going to fare well, regardless of what your actual intentions are.
Kenneth Wenger: And that's the problem I see here. If we move toward that kind of system, where decisions follow the overall distribution, then if 90% of the people in your situation did not turn their lives around, the system assumes you won't either. I think that would be a worse system. And that's exactly what will happen if we completely automate decision-making processes using neural networks the way they're being used today.
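To make this point concrete, here is a minimal sketch, not taken from the book, of the kind of purely statistical decision rule being described: one that applies a group's historical base rate to every individual in that group. The group names, rates, and threshold below are entirely hypothetical.

```python
# A minimal sketch of a decision rule driven purely by historical base rates.
# Every group name, rate, and threshold here is hypothetical.

# Fraction of past cases in each (hypothetical) group that reoffended.
historical_reoffense_rate = {
    "group_a": 0.90,
    "group_b": 0.40,
}

def grant_parole(group: str, threshold: float = 0.5) -> bool:
    """Grant parole only if the group's historical reoffense rate is below the threshold."""
    return historical_reoffense_rate[group] < threshold

# Every individual in group_a gets the same answer, regardless of their
# personal intentions or circumstances: there is only history.
print(grant_parole("group_a"))  # False: denied, based solely on the group's past
print(grant_parole("group_b"))  # True
```

Nothing an individual says or does at decision time changes the output; only retraining on different history would.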
Cris Sheridan: Because again, these AIs are not looking at you as you; they're looking at you as a statistical probability. So if you're part of this class, even if you're in the 25% rather than the 75%, or an even smaller minority, it doesn't matter. It doesn't matter if you are going to be the exception; you're still part of that class. And that's the issue. You can hopefully see some sort of redemption if you are standing before a human judge who is able to look at you as you, hear whatever it is you're saying, and pick up on the earnestness in your voice.
Cris Sheridan: For example, to pick up on all of the perhaps the subconscious cues that they may have developed themselves over time to specialize their own discernment in whether or not a person is more likely to go back to commit a crime or if they're not. I mean, there's their own training and ways that they develop that but you don't necessarily have that with a machine. So I guess that's kind of the issue.
Kenneth Wenger: Well, to be clear, we don't have that with a single model in the way they've been employed in the past. As I say in the book, this is not just a hypothetical example; this was used in the judicial system. The point is, I cannot say that it is impossible in principle to automate decision making, but it is very difficult to do it in a way that will produce a fair system, and we don't have a good way today. So my point is that if we're going to start deploying artificial neural networks, or any sort of algorithm, to automate decisions in very important, life-altering processes like the judicial system, then we need to do the best we can to make sure these systems are better, and not worse, than what we already have.
Kenneth Wenger: And we need to make sure that they deal with individuals and they enable individuals to be able to redeem themselves regardless of what they've done in the past. That's my point. And there may be ways to do that, but that's not what has been done in the past and that's not how I see this field currently developing. And so I just want to bring awareness to that and that's why I wrote it in the book.
Cris Sheridan: Yeah. And I don't think that in order to correct that limitation of neural networks, their inability to see people as individuals, and the danger you describe of losing the redemption a human judge may be willing to offer someone, society is going to say, oh, we're just not going to use them. It seems the way we're moving is that there's just going to be more and more data, right? So instead it just means, well, we need to have a camera, and we need to be collecting video footage of this person as they're giving their testimony, for example, or whatever it is.
Cris Sheridan: And we need to analyze that. We see that even now with the hiring process, right? I'm sure you've seen examples of this, where during the hiring process they're analyzing all of the cues you're giving on the video, and that can all be modeled and given an output saying, hey, this person meets certain criteria. When you asked these questions, here was their answer; this is their level of anxiety, confidence, all of these things that human interviewers would normally judge in the hiring process.
Cris Sheridan: AI is now being used for this. So it seems that we're just going to be moving toward more data, bigger models, right?
Kenneth Wenger: Which, as we've been talking about, has its problems, because more data doesn't necessarily mean more information, or better information. What you want is for every possible outcome you care about to be represented in the data, and if the data has many examples but all of the same types of outcomes, then it's not really helpful. So going back to the redemption example: let's say that the data set we use to train a model has thousands and thousands of individuals, but in every single case in that data set, it just happens that those individuals don't redeem themselves. Well, how is the model going to learn that redemption does actually happen?
Kenneth Wenger: And how is it going to learn to pick up clues, if that's possible? How is it going to learn to pick up clues of when to know whether somebody might redeem themselves or not? That's a problem. I think if you want to have a system that's fair, then you need to think about these problems and you need to think about, well, so how am I going to construct a data set so that it equally represent every possible scenario, every possible circumstance that I want to model for? Because otherwise then it is biased and it will affect people, but it will affect people in a way that you can't escape.
Kenneth Wenger: Again, even humans, even the most terrible person has a good day and you can convince them. But with a model that is biased, you're not going to change that. It is going to stay like that unless you retrain it.
Cris Sheridan: And I think the big trade-off we see taking place in society here, whether we're talking about criminal justice or even advertising, is weighing the risks of the output and the impact it has, either on an individual or on us collectively, against the cost. Right. And when I say cost, I mean in terms of efficiency and lowering costs. So it becomes: let me weigh the cost of a human judge against the cost of an artificial intelligence system that I don't have to pay a yearly salary.
Cris Sheridan: Well, it seems that society is shifting ever more and more towards I'm willing to take the risk that the AI system is not giving me exactly the same level of output or decision making as a human because it can do so at a much cheaper cost.
Kenneth Wenger: I would say that may very well be what happens. My hope is that by understanding how these systems work, we are better equipped to make better decisions. The one thing I hope people take from this is that you have to think of cost a little differently. Cost is not just money. So when you talk about the cost of a human judge versus an algorithm, you have to factor in the monetary cost, but also the cost of the mistakes, the cost to people's lives. Because who knows, you might end up in front of an algorithm that's making decisions about your life.
Kenneth Wenger: Are you willing to pay that cost?
Cris Sheridan: Yeah. Right. And hopefully we don't see too many Minority Report-like incidents take place, as you talk about near the end of your book, where this technology goes awry in, I would say, very fascinating and somewhat scary ways. So there are real-world consequences, of course, to what we discussed. You feature a lot of that in the book, and I know you shared a number of those examples with our audience today.
Cris Sheridan: So I appreciate you coming on. Ken again, the title of the book is is the Algorithm Plotting Against US? A layperson's guide to the concepts, math and pitfalls of artificial intelligence. And again, you can follow more of his work[at]workingfires[dot]org or if you want to just understand, I would say probably the most important technological development of our lifetimes currently with the development of neural networks and how they're impacting the world in a variety of ways, then this is certainly a must read book. So you will have a good understanding of this important development through his book.