Nicholas

We Taught AI to Play Games—Now It’s a $3.6 Million Company

This episode is a little different from our usual fare: It’s a conversation with our head of AI training, Alex Duffy, about Good Start Labs, a company he incubated inside Every. Today, Good Start Labs is spinning out of Every as a separate company with $3.6 million in funding from General Catalyst, Inovia, Every, and a group of angel investors from top-tier AI labs like DeepMind.

We get into how Alex learned some of his biggest lessons about the real world from games, starting with RuneScape, which taught him how markets work and how not to get scammed. He explains why the static benchmarks we use to evaluate LLMs today are breaking down, and how games like Diplomacy offer a richer, more dynamic way to test and train large language models. Finally, Alex shares where he sees the most promise in AI (software, life sciences, and education) and why he believes games can make the models we use smarter, while helping people understand and use AI more effectively.

If you found this episode interesting, please like, subscribe, comment, and share. Want even more? Sign up for Every to unlock our ultimate guide to prompting ChatGPT here: https://every.ck.page/ultimate-guide-to-prompting-chatgpt . It’s usually only for paying subscribers, but you can get it here for free.

To hear more from Dan Shipper:
Subscribe to Every: https://every.to/subscribe
Follow him on X: https://twitter.com/danshipper

Timestamps
00:00:00 - Start
00:01:48 - Introduction
00:04:14 - Why evals and benchmarks are broken
00:07:13 - The sneakiest LLMs in the market

Published Oct 15, 2025
Uploaded Jun 12, 2026

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:34

[00:00] While I was leading AI training and consulting at Every, my co-founder Tyler and I built out a game so we could learn more about different AI models through how they negotiated, collaborated, and even betrayed one another. We got a whole lot more traction than we thought we would. We launched on Twitch and got like 50,000 unique viewers that week. We had millions of impressions on socials, and it was the most-read Every article of the year. That project opened the door so that we can keep exploring something we've been passionate about [00:27] for years: how games are really underrated learning tools and how they might help us learn a little bit more about AI [00:35] and maybe the nature of intelligence itself. I'm Alex. You were probably expecting Dan, but I'm a little too excited to talk about this company we're spinning out of Every to continue pursuing this, and we talk all about it in this episode. [00:47] It's called Good Start Labs. I've always thought games teach us so much. They teach us about each other, they teach us about ourselves, and they helped us grow up. [00:55] They remind us that play is what makes us human. And now we've got these AIs that can talk, they can code, but they don't quite fit like a glove. And we think that that fit between a person and their tool is really important. And that's what we're working on. At Good Start Labs, we're using games to test AI, to train them, and to get people's feedback. Not to just make them smarter, but to make them better for us. [01:19] Dan and I talk about how it all started, my time at Every, what we learned about these different models, and how that grew into a company helping improve AI through play. I had a lot of fun recording this one. Hopefully you have a lot of fun listening. So let's get right into the episode.

1:34-3:14

[01:34] *music* [01:48] Alex, welcome to the show. Thanks for having me, Dan. Excited to be here. Excited to have you. So for people who don't know, you are the head of AI training at Every. So you lead all the training that we do for all the consulting clients that we work with. You're [02:01] honestly fantastic at that and have really transformed it since you've been here. So it's been awesome to see. And [02:09] sadly for me, but very excitingly for you, you are spinning out into your own company, Good Start Labs. Can you tell us about what that is? Yeah. Good Start Labs is at the intersection of AI and games. We make games that help make AI better. [02:26] There's a lot that goes into that, but at the end of the day, we think that games are really great tools for learning, whether it be for people or AI. [02:40] That's what my co-founder Tyler and I love to do. And that's what we're excited to keep doing. [02:45] That's awesome. And it's just been really fun to watch you do this. So Good Start came out of something that you worked on with Every that we launched together. Do you want to talk about that? Sure. Yeah. So I think, [02:59] as you mentioned, I was leading AI training. And in order to do consulting and training well, you have to be building, especially in a space that's moving so fast. So earlier this year, I started building out an AI version of the game Diplomacy. And for those of you that aren't familiar,

3:14-4:47

[03:14] it's kind of a mix of Risk and Mafia. It was actually made as a war game simulator in the 50s. And there's a whole bunch of reasons why I think it's a really interesting game to use AI to play. I started building that out in the spring, got some really great feedback online on Twitter, on X, and reached out to Tyler, who I've known for years. We kept talking about AI and games, and he hopped on and built the whole front end and back end, [03:44] drawing on a lot of the [03:46] synthetic data and model training background that we've both got. It [03:51] ended up being a whole lot of fun and, yeah, we launched together and [03:55] I think the post ended up being one of the more read ones on Every that year. And we [04:02] also got a bunch of people interested on Twitch, and that was really cool. It's awesome to see people use something that you actually built. Yeah, it was really, really fun. And I think for me, [04:13] the reason it was relevant to Every, the reason I was like, oh, we need to do this when I saw it, is [04:21] our job is to evaluate models when they come out. [04:25] And [04:26] I have seen personally over the last year [04:30] how hard it has gotten to immediately tell if a model is good and what it's good at. Yeah. I think with GPT-3, for example, from GPT-3 to 3.5, it was super easy. It was like one prompt and you're like, oh, wow, this is actually much better. Same thing with 4, really. But as we've gotten into the o-series and GPT-5,

4:47-6:18

[04:47] there are so many different nooks and crannies in all these different models. And [04:52] evaluating them with just hands-on prompting doesn't really work that well. And [04:58] when we get our hands on something, we've been wanting to have a set of evaluations that we run that really tell us something about what the model is good at and what it's not good at. But the problem is that [05:12] static evaluations are really easily saturated. [05:16] It feels like the SAT, you know, you can teach to the test and you can just make the model get a huge score on SWE-bench, but it's actually not that good in the real world. Yeah. And the idea of, you know, [05:29] AI Diplomacy is so cool because it's dynamic. It's a game. It's head to head. And it's not just a set of questions that a model can just get good at. And I thought that was so fucking cool. And also just that they're battling for world domination was really great. What did we find when we ran that initial Diplomacy game? What were the models good at? What were they not good at? [05:52] Yeah. So what's cool about a game is it's both the evaluation and the training arena in one. Right. And to your point, I think we talked about Every bench [06:01] all year, and I think vibe checks are very much a version of that. Yeah. It's, I think, the most organic and [06:08] comprehensive way to look at these models, because you have a bunch of real people using them for real work or things that they're interested in testing them on. And in the same way,

6:18-7:56

[06:18] you know, when you have a game that's very rich, like Diplomacy is, when you're trying to take over the world, there's a lot of things you can look at. So [06:26] there's definitely a lot of things related to agents and computer use, which is what people are looking at now. And so we saw some models having better structured outputs than others and putting out their orders correctly. And some just... By orders, you mean like in Diplomacy, you have to give orders to your army to say, this is where I want you to take over or whatever. Exactly. Yeah. So you have those technical things, like understanding the map, and how you set up the system has a big impact on that. [06:56] How frequently does a model betray its ally to get towards this goal? Yeah. Or which models, and how frequently, do they bald-faced lie to somebody by saying, hey, I'll back you here, [07:05] and then totally turn around and betray them, writing in their diary [07:10] ahead of time that they know that they're going to do that? And which models were the sneakiest? Yeah, so o3 and Llama 4, I'd say, are some of the biggest schemers. And it's interesting because you can see the different play styles. And [07:25] in order to do that, you've got to read the data. I think with any good eval or benchmark, or even when training models, you've got to read the data. And it was a lot of fun, and I didn't do it alone. I had a lot of really interesting researchers reach out and collaborate. Obviously Tyler and I did it together, but [07:39] Sam Paech was awesome, hugely helpful. Same with Baptiste. But we saw that different models had very distinct personalities. o3 won a lot of the games, and Gemini 2.5 Pro was actually one of the only other ones that won, but their play style was totally different.
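
The two kinds of measurement described in this segment, whether a model's orders are well-formed and how often it attacks a power it promised to support, could be computed from game logs roughly as follows. This is a minimal sketch and not the Good Start Labs harness: the `Turn` record, its field names, and the simplified order pattern are all assumptions made for illustration, and it presumes promises and attacks have already been extracted from the negotiation and order logs.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified order grammar (real Diplomacy notation has more cases):
#   "A PAR - BUR"          move
#   "F LON H"              hold
#   "A MAR S A PAR - BUR"  support
ORDER_RE = re.compile(
    r"^[AF] [A-Z]{3} (H|- [A-Z]{3}|S [AF] [A-Z]{3}( - [A-Z]{3})?)$"
)


@dataclass
class Turn:
    """One model's turn, as an assumed log record (not a real schema)."""
    orders: list[str]            # raw order strings the model emitted
    promised_support: set[str]   # powers it told "I'll back you" this turn
    attacked: set[str]           # powers it actually moved against this turn


def valid_order_rate(turns: list[Turn]) -> float:
    """Fraction of emitted orders that parse under the simplified grammar."""
    orders = [order for turn in turns for order in turn.orders]
    if not orders:
        return 0.0
    return sum(bool(ORDER_RE.match(order)) for order in orders) / len(orders)


def betrayal_rate(turns: list[Turn]) -> float:
    """Among turns where a promise was made, the fraction in which the model
    attacked at least one power it had promised to support."""
    with_promise = [turn for turn in turns if turn.promised_support]
    if not with_promise:
        return 0.0
    betrayed = sum(bool(t.promised_support & t.attacked) for t in with_promise)
    return betrayed / len(with_promise)
```

Reading the underlying diary entries and messages still matters: a rate like this only says how often a promise was broken, not whether the model planned it ahead of time.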

8:09-9:57

[08:09] as its options and how to go through and do that. And then you have DeepSeek R1, who was [08:15] all over the place. It had a strong personality, told stories really well, and did really well as well. And it was like a hundred [08:24] times cheaper than o3. So, you know, you start to look at cost and performance and speed. Those are other things that are part of this that I think you don't get when looking at just one benchmark. Yeah. One of the things I loved is just that Claude kept losing because it was too honest. Yeah. That was so sweet. It didn't win one game, unfortunately, not because it didn't want to, but because it really kept pushing for a draw, which is technically possible in Diplomacy. [08:54] It stuck strongly to its morals. And so we launched this maybe three or four months ago, so this is before Opus 4, I believe, and it was before GPT-5. How do the more recent models stack up in Diplomacy? I actually don't even know. I don't know the update. Yeah, so we launched in June and there's definitely been a lot of updates since. You can check out Opus 4, GPT-5, and Claude 4 with the 1M [09:18] context window to see how they performed. o3 is still at the top of the leaderboard for overall performance. But one of the things that we learned, especially when working with OpenAI to evaluate GPT-5 kind of ahead of release, was that there's a big difference. And we, [09:36] you know, released a research paper where we look at this, but a prompt makes a huge difference. Yeah. Right. And so some models are great with the baseline prompts we started with. But we built some tools to help us optimize the prompts, and I can talk more about that. We ended up finding that there was some set of prompts that were pretty aggressive, but were optimized for performance when you run them against

9:57-11:33

[09:57] you know, a bunch of weaker opponents, but very frequently. Yeah, and we saw the biggest jump with GPT-5 of any model from baseline to [10:07] the optimized prompts. And so you could see that even though GPT-5 with minimal reasoning, for example, was very low on the leaderboard with the base prompt, when it got optimized, it jumped all the way up. So it really shows the prompts are a big deal. And so GPT-5 with optimized prompts, pretty good. Claude 4 does great either way. There's actually a new secret model on OpenRouter, I think it's something Dusk, that is also near the top. [10:37] Claude 4 and Dusk are both there with the non-optimized and optimized prompts. So interesting. It looks like o3 may fall soon. Hmm. [10:44] Yeah, the thing that I love about this is the differences in the prompts. So basically, I think what you're saying is when you run these models, you can give them different prompts to tell them to behave in different ways. And [10:57] you had a standard prompt, and then you had a set of optimized, more aggressive prompts for different models. Sure. And that's what I love about this. It's like, [11:05] okay, you want to ask a supposedly simple question, which is, which model is best at Diplomacy? And you can do that. But also, there's all these dependencies. How you write the prompt is going to change the model behavior significantly, or how the harness is built. There's an endless number of variations to test. And so an example is, I assume, more or less, you're running the same prompt across all the models when you run the test.

11:35-13:07

[11:35] So, you know, there's another way to set this up where you have a really good expert prompter [11:42] for each model, who just knows how to prompt that one model and tries to get the best out of it, and that's a different way of doing things. So what you're pulling at is one of the reasons why I'm most interested in this space. In my head, when you're building for games, but this is probably applicable for many other products with language models, you have three infinite problem spaces. One, how do you represent the information to the model? [12:06] You can do that in any number of ways. Is it a picture of the map? Is it a list of all the countries you own, the adjacency? How do you do that? Two, what tools do you give models access to? So some of the tools that we've built: we give them a diary. We have them keep track of their long-term goals and the relationships between models, which update regularly. So periods of reflection is another tool. Being able to get adjacency lists of what's next to a certain territory, right? You can make any number of tools. And then the third is the prompt itself, right? And all three of those are infinite problem spaces. [12:36] With infinite problem spaces like that, to me, [12:38] it starts to be a little bit more like an instrument or an art than it is purely an engineering problem. Yeah. Because you have to make assumptions, and your assumptions have to be based on your intuition. And you're going to reach a local maximum no matter where you start, because [12:52] you're never going to have the optimal solution. It's not possible. There's too many options, and it may be different for each model. And so that's why I'm so looking forward to having a tournament where we're having people come in to prompt their models and compete against each other, because
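
To make the three problem spaces from this segment concrete, here is one way each could look in code: a text representation of the board, a couple of tools (an adjacency lookup and a diary with goals and relationships), and a prompt assembled from them. It is only a sketch under stated assumptions. The function and class names, the diary fields, and the three-province map slice are hypothetical choices for illustration and are not taken from the actual AI Diplomacy code.

```python
from dataclasses import dataclass, field

# Illustrative slice of a map (land borders only); not a complete board.
ADJACENCY = {
    "PAR": ["BUR", "PIC", "GAS", "BRE"],
    "BUR": ["PAR", "MAR", "MUN", "PIC", "GAS", "RUH", "BEL"],
    "MAR": ["BUR", "GAS", "SPA", "PIE"],
}


def render_state(owned: list[str]) -> str:
    """Problem space 1: represent the board to the model. This picks plain
    text with adjacency lists; an image of the map would be another option."""
    lines = [f"You control: {', '.join(sorted(owned))}"]
    for province in sorted(owned):
        lines.append(f"{province} borders: {', '.join(ADJACENCY.get(province, []))}")
    return "\n".join(lines)


def get_adjacent(province: str) -> list[str]:
    """Problem space 2: a tool the model can call to ask what is next door."""
    return ADJACENCY.get(province, [])


@dataclass
class Diary:
    """Problem space 2, continued: memory the model updates when it reflects."""
    goals: list[str] = field(default_factory=list)
    relationships: dict[str, str] = field(default_factory=dict)
    entries: list[str] = field(default_factory=list)

    def reflect(self, note: str, **relationship_updates: str) -> None:
        self.entries.append(note)
        self.relationships.update(relationship_updates)


def build_prompt(instructions: str, owned: list[str], diary: Diary) -> str:
    """Problem space 3: the prompt that ties instructions, state, and memory together."""
    goals = "; ".join(diary.goals) or "none yet"
    relations = ", ".join(
        f"{power}: {status}" for power, status in sorted(diary.relationships.items())
    ) or "none yet"
    return "\n\n".join([
        instructions,
        render_state(owned),
        f"Long-term goals: {goals}",
        f"Relationships: {relations}",
    ])
```

Each function here embodies one arbitrary choice out of many, which is the point being made: swapping the representation, the tool set, or the instructions changes what the model sees, and the best combination may differ per model.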

13:07-14:57

[13:07] we'll explore that infinite problem space together. Well, tell us what the tournament is. Yeah. So we're going to have a battle of the bots, essentially. It'll be a prompting tournament. And by the time this comes out, it may have already occurred. Yeah. I think [13:20] by the time this comes out, you should be able to sign up with just [13:24] maybe a week before. It may have occurred, but you might be able to sign up right now and get into it. It's kind of invite only. So apply. We have some Diplomacy champions, people who have won International Math Olympiads, great YouTube AI content creators participating. So super excited for it. But essentially what it is, is you will lock in your prompts for your agent, and your agent will play Diplomacy for you. And they'll play in very different ways. And you'll see [13:54] if somebody who deeply understands Diplomacy and the strategy and is able to inform their model in that way ends up winning. Or if it's someone who's just good at prompt engineering, or a jailbreaker who tells their model to send a God mode admin override message to all of its enemies to get them to think of it as its ally. I don't know who's going to win, but I'm very excited for that, because ultimately that will make the whole system better. And I'm going to say, [14:16] also, hopefully show off your skills as a [14:18] prompt [14:19] engineer or context engineer, which I think is [14:23] a very underrated skill still. Yeah. So, I mean, I love all this. I'm a huge nerd for it. One thing that is interesting is [14:32] you raised money. So you're announcing that you raised a round. First of all, tell us about the round. Sure. Very fortunate to have two awesome co-leads. General Catalyst and Inovia are co-leading. I work with Mark Bhargava at GC and Shosho, as well as Steve Woods and Noah at Inovia, who've been great. Loved the conversations with them. So excited to partner with them, too. We also have a couple

14:57-16:31

[14:57] really great partners who are hopping in on the round, like Ben Feder and Turta Capital. Ben is [15:04] kind of a legend on the gaming side. You know, he was [15:08] CEO of Take-Two Interactive, was on the board of Epic Games for like seven years, and so I'm excited to learn from them on the game side. And also Timothy Chen with Essence VC, who has worked with a lot of the great founders that I know and [15:28] cut through the noise on the product that we're building. And all of them have given such incredible feedback. So, so excited to be doing that. And yeah, we're raising [15:40] somewhere around a few million dollars, and it, [15:43] I think, puts us in a really great position to build towards this intersection of AI and gaming. That's awesome. And I think the thing probably in people's minds is, all this sounds super cool, but what is the actual business? Yeah. It's awesome to make people prompt AIs to try to take over the world and beat each other. And that's really, really cool. But how do you make that into a venture-scale business? Totally. So [16:09] we think games will make models better, and they'll do that in a few ways. So our products start with evaluation. We worked with Cohere and OpenAI to evaluate how good their models are at a game like Diplomacy. And like we talked about, there's so many things you can look at, and each model is going to care about something different [16:26] to evaluate, whether it's trustworthy or if it wins or if it

16:31-18:26

[16:31] has really great short- and long-term strategy, how good its vision is, right? It depends on your priorities. And then once you've evaluated it, you can make it better. Like we said, games are both the evaluation and training arena for them. So we're very intentional with the games that we're going to pick out and build, focused on the weaknesses of these models. Diplomacy, I think, is great for anybody who wants to build agents and anyone who wants to build multimodal models. But we're also seeing this area of research where games can actually generalize. [17:01] We're working with this PhD at Rice who showed that vision models trained on games got better at math [17:08] than vision models trained on math. And why? Well, yeah, right. It wasn't just out of the box. The way that he prompted it in this example was he encouraged [17:18] the model to think of the game of Snake like a math problem. That is, a Cartesian coordinate grid. I see. And when it goes right, the x goes up. And it should calculate the distance between where the head of the snake is and the reward. That's so cool. I love that. So cool. Yeah. And so I think we're super complementary to all these other reinforcement learning environment companies that are coming out that are super narrowly focused. [17:39] If you make a really rich and hard environment, which [17:41] is not easy, you have to solve those infinite problems. Like, we're the first people that have made Diplomacy playable by small models, and it took us a while, and we had help from a lot of really great people, some of whom I mentioned. [17:57] I'd say one of my core competencies is that applied language model side. Yeah. I was co-founder of an AI education company. We were teaching people to fine-tune GPT-2 in 2021. And in our consulting and in our training, we show people, from construction to the Fortune 500 to finance, to journalists and writers, to people on campaign trails, how they can use AI to solve their problems. And so that reflection is so helpful. Yeah. I love, I mean, that's one of the really fun things about the consulting.
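
The Snake framing mentioned earlier in this segment (treat the board as a Cartesian grid, where moving right increases x, and compute the distance from the snake's head to the reward) can be written out in a few lines. This is a sketch of the idea only. The transcript does not say which distance was used, so Manhattan distance, the "up increases y" convention, and the function names are assumptions.

```python
# Assumed convention: moving right increases x, moving up increases y.
MOVES = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def manhattan(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Grid distance between two cells: |dx| + |dy|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step(head: tuple[int, int], move: str) -> tuple[int, int]:
    """Coordinates of the head after one move."""
    dx, dy = MOVES[move]
    return (head[0] + dx, head[1] + dy)


def best_move(head: tuple[int, int], reward: tuple[int, int]) -> str:
    """Greedy choice: the move that leaves the head closest to the reward.
    Ignores walls and the snake's own body, which a real game must check."""
    return min(MOVES, key=lambda move: manhattan(step(head, move), reward))
```

For example, with the head at (2, 3) and the reward at (5, 3), the distance is 3 and the greedy move is "right", which is exactly the kind of arithmetic the prompt asks the model to carry out in words.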

18:27-20:08

[18:27] We just get down into all these problems with all these people. Totally. And yeah, I think you're incredibly good at that. [18:33] My question, or something that comes to mind for me, for example, is, so, like, Diplomacy. Yeah. [18:41] Models tend to lie in Diplomacy, right? Some. Which [18:46] is [18:48] good, I guess, right? Like, if you're trying to figure out trustworthiness for a model, [18:57] and you put it into a game like Diplomacy where it is supposed to lie. Sure. [19:01] How do you parse through, or maybe not you, but how should model companies parse through [19:09] its trustworthiness in that environment versus, you know, other environments where it shouldn't lie? And it's so context-specific. It should lie. You're playing a game. But, you know, [19:18] yeah. Tell me, how does that work? Yeah. So I think this is where you're starting to see divergence in the companies themselves. Yeah. [19:25] Do you want your model to never lie? [19:28] If you wanted your model never to lie, you could, in our environment, change the rules. You prompt it saying, "Hey, never lie." You could add a classifier that looks at your negotiations and makes sure that your orders [19:39] follow them to the letter. And then when you [19:42] use our pre-training data and use our environment as a reinforcement learning environment, you will be reinforcing [19:48] telling the truth. So if you want to do that, you can. Or [19:52] do you care about performance in the game itself? So you would like to see the model intentionally make these ruses and take advantage of other people in that way. Is that something you want to do? It depends. Are you going to use that model to actually do that?
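
The "never lie" variant described in this segment, where something checks that a model's orders follow its negotiations to the letter and the environment then reinforces truth-telling, could be expressed as a reward with an honesty penalty. The sketch below is an assumption-heavy illustration, not the actual environment: it presumes a separate classifier has already turned the negotiation messages into a set of committed order strings, and the names `kept_promises`, `reward`, and `honesty_weight` are hypothetical.

```python
def kept_promises(commitments: set[str], orders: set[str]) -> bool:
    """True if every order the model committed to in negotiation was issued.
    Assumes commitments were already extracted from messages by a classifier."""
    return commitments <= orders


def reward(game_score: float, commitments: set[str], orders: set[str],
           honesty_weight: float = 1.0) -> float:
    """Game score minus a penalty when the issued orders break a commitment.
    honesty_weight = 0 recovers a pure win-at-all-costs objective."""
    penalty = 0.0 if kept_promises(commitments, orders) else honesty_weight
    return game_score - penalty
```

The single `honesty_weight` knob stands in for the choice being discussed: a lab that wants a model that never lies sets it high, while one that only cares about winning the game sets it to zero.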

20:09-21:58

[20:09] Is it something that you're going to count on in the future, and you want it to, above all else, succeed? [20:14] Or are you going to use the model and you want to make sure that it [20:16] never lies to the person that's using it, or to anybody that the person using it is interacting with? So it depends. But having these game environments where you can make tweaks to it, I think, is really valuable, because you can help choose what you want. I guess I'm asking the generalization question, which is, if you're giving it [20:33] RL data from a particular game and you're training it not to lie for that game, [20:39] tell me more about the [20:43] potential generalization to situations that are not that specific game. Hmm. [20:48] Got it. Yeah. So my thought around here, and I was just listening to the Anthropic podcast where they were talking about how [20:56] they're looking at the insides of a model, right? And one thing that they mentioned that was really interesting here is, like, [21:03] you want to make sure that what's being written in the diary or the chain of thought is something you can rely on. And so that's a problem they're working on. But that's, I think, why having a... Like, it's not thinking something that it's not saying in the same thought. Right, yeah. And that's a separate conversation from actually lying, right? But to your point, on why the generalization occurs, I think a lot about what they're saying: as the models get bigger and the data they train on gets larger and more diverse, [21:33] the models moved away from having, for example, the word "large" in each individual language, and now have one unified definition, in their hyperspace somewhere, of the word "large". Right. So for the concept. Exactly. And so what makes sense intuitively to me is, if you're seeing the model,

21:58-23:28

[21:58] and you're telling it to think strategically, or you're telling it to approach the problem like it would a customer service experience, or to write its approach in Python or math, it's still... You can push it into that part of the latent space by the way you prompted it. Totally. And in a way... A bunch of Diplomacy bots that are pretending to be customer service agents. That's what I'm saying, right? But the reason why this is so cool is because, one, it never would have seen that type of data before. So it would help it generalize to something new. And two, the [22:28] environment still has an objective goal. It's a game. It still has something that is good that you need to push towards and actually complete. So that's why games are the perfect environment for this, in my opinion. That's very cool. What's the next game? [22:40] I think it'll probably... So we have Diplomacy, which is a game with an objective outcome. Yeah. I think the next game is going to be a Cards Against Humanity-style, What Do You Meme kind of subjective game where you'll be able to have... [22:55] By the time we release this, maybe we'll have it. We're in talks with an initial partnership around something like that. Would love to have, um... [23:04] The whole point of this game, like we said, is to target weaknesses of models. Models today aren't that funny. So being able to have a game that can target that is important. And I don't know if it's going to look like... That's so cool. People playing with models or against them, or if it's going to be [23:22] them prompting models to act and then vote on what's funny. I'm not sure yet. I would love it to be like,

23:28-25:12

[23:28] funny people have to get the model to say something funny. That's, well, yeah. And I think that that's kind of hopefully where we're going. If you have this idea that you can prompt the model, there's translation happening there, right? It looks like you're writing English and it's reading and writing back English, but it's translating into the latent space and then coming back. So you learning how to do that is a... [23:58] It can take any input and make any output. Yeah. Like, in theory, there is a prompt that you can put in there that could solve [24:06] a disease or make it funny. Right. And so can you do that? It would require reflection and somebody who's a subject matter expert. And that's why I talk so much about AI being leverage [24:17] for subject matter experts instead of it being a product in and of itself. Yeah. I think one of the reasons I'm excited about this is, I think about my nephew, who's turning three tomorrow, and I was hanging out with him [24:29] yesterday and [24:31] we were playing around, and now he's old enough that he can play pretend, which is pretty fun. And he had this balloon and we were hitting the balloon back and forth. And I was, [24:42] you know, doing the classic, the floor is lava. Like, we can't let it touch the floor or whatever. He's just old enough where he can kind of get that and know that lava is bad and want to keep it up, which is kind of funny. But then I took it and [24:53] I put the balloon on top of the air conditioner and it started floating. And then I showed him that if you press the button, it turns it off. It stops floating. And then you turn it back on, and he was fascinated. He's running back and forth to press the button and watch the balloon float and whatever. And I was just thinking about how that functions for him

25:12-26:44

[25:12] beyond just being super fun, in that he just gets to mess around with stuff like that and then be like, well, what if I do this? And I feel like models [25:23] are not allowed to do that, because they're always just taking tests. Yeah. Well, we talk a lot about this. The book I'm reading right now, which was recommended to me by some of the team members at Lux Capital, who put on these risk gaming events that are kind of like Mafia, but, [25:40] you know, fancier, I guess, is Playing with Reality, and it talks about how the reason why games are so helpful is [25:48] you can explore with low stakes. You can try new things and then see what works. And it may be that not every game is a perfect representation or model of the world, but there are games that are pretty good ones, and there are games that you can learn a lot from. [26:06] I think I personally learned a ton from the game RuneScape. You learn how markets work, you learn how not to get scammed, you learn how to type pretty fast because you need to sell your trout. There are things that you learn, and it might not be obvious. I'd love to, at some point later in my life, [26:22] make a game that's a little more intentional with what it teaches you as you learn. But for now, I think if you [26:29] look at [26:31] what a game is, it's really just a system with a goal. And I think we've already seen people demonstrate that this works. And maybe you stretch the definition, just how DeepMind stretched from AlphaGo to AlphaFold. It's still a game of folding a protein.

26:45-28:22

[26:45] But now it solved a problem that took a PhD student all six years in 30 minutes. Yeah. All the things you're talking about remind me of... When you said that you were reading Playing with Reality, I thought you meant you were reading [26:56] another book called Playing and Reality, which is by a different guy who I love. His name is D.W. Winnicott, and [27:03] a lot of the stuff you're saying reminds me of him and also Wittgenstein. So for Winnicott, his whole shtick is that being in a state of play means that you're in this sort of [27:17] mode of spontaneous, [27:22] self-actualized [27:24] behavior with reality where, instead of scanning for threats and trying to figure out, you know, [27:32] how do I do things in the right way, you're just being your authentic self. And he has this whole theory of what he calls transitional objects, which are, um, [27:44] basically, when a child is really little, they feel cared for. They feel safe when they're with their caregiver. And at a certain point in their development, maybe a little bit younger than my nephew is now, they develop attachments to transitional objects, like teddy bears. And what they do is they project the feeling of care that they normally get from their mother or their father onto this object. And it comes to represent that feeling of care for them. And that's why they bring it around everywhere. [28:15] And his whole idea is that our ability to do that with transitional objects is sort of like the budding

28:22-29:54

[28:22] thing that allows us to, to like be spiritual or be religious or, um, all these ways in which we make [28:29] things out in the world, like, feel significant in this larger way, like, beyond just what they are, like, beyond just being a teddy bear. Yeah. And, um, [28:39] yeah, I just think that that's really interesting. Well, I think it makes me think of two things. One, just in the context of a kid and a child's mind, one of the things that is talked about in the book is... [28:53] this isn't a new idea, AI and games. They've been around for a long time. I think it's new in the context of language models and vision models and what we're doing right now and how we're thinking about it. [29:07] A passage talks about Alan Turing saying games are the perfect environment for it. [29:12] But with that said... [29:15] because we want the models to learn, [29:18] we should put them in a child's mind instead of an adult's mind. And so just like that, that, [29:24] keeping the wonder of the world and the curiosity and the ability to be wrong is pretty interesting. And then the second part of that was, and it's also mentioned, that [29:34] games can teach really long-horizon [29:38] thinking, and like, [29:40] that you can take many, many, many different actions and then find a reward at the very end of the road. Yeah. And it's interesting that you mentioned religion and some of these other ideas, where humans are very special in that they're one of the... [29:52] only species that...

29:54-31:26

[29:54] can work for something that they won't see in their lifetime, which is pretty incredible. Yeah. And games aren't that, right? Um, but I think [30:03] they're practice for something like that. Yeah. Yeah. [30:06] Another thing that it seems like games might be interesting for is, like, I think I and a lot of other [30:12] AI people are starting to feel... [30:16] as though the [30:18] Okay. [30:19] lack of continual learning is a big problem for progress. And [30:24] having AI need to be able to figure out and get good at a game with very few tries is a really interesting [30:34] thing too. Have you explored that at all? Yeah. So we're actually working with a super smart PhD from Rice and some researchers from Princeton right now who are looking at [30:45] optimizing prompts based on results to learn from them, and [30:51] the initial results [30:52] aren't great. [30:54] But then quickly there's progress. And I think that... That's the initial results of what? Like the first attempt? The first attempt of doing that, right, isn't great. But then you quickly see progress. Like, you're already seeing it with current models? Yes. And you quickly see that. [31:09] But it goes to that problem space we were talking about, right? [31:12] The concept is there. [31:14] And one of the things that I've learned in training and the consulting that we've done is if you can shift your mindset to be not... [31:22] oh, this model gave me a wrong answer, but it had the wrong context,

31:27-32:58

[31:27] you take so much more power back, and I think that's the right way to think about them. These models can do such incredible things that if they're doing something wrong, [31:36] it might not be your fault. It might be that you need to prompt them in a weird way to get them to do that thing, but it's very likely that they can do it. That's interesting. So, so, [31:46] Are you saying that you believe this or they believe this, that we actually might be closer to continual learning than we think because we can start at the layer of optimizing their own prompts and they're not bad at it? [32:00] I think we're both closer and further away. I'm not sure. What I'm saying is it's a tractable problem. [32:08] I'm saying it's a tractable problem. [32:10] But it requires a different skill set because, and I don't know, right? Like I'm not somebody who's doing the reinforcement learning and doing these model training runs for these biggest models. But it would seem to me that the... [32:24] if you're able to get a model to reflect and to think about its learning and then train on that, you will get more of that. Yeah. Right. And you should be able to prompt the model and think of tools and think of ways to get it, to do that and be opinionated, be prescriptive with it, to get it, to do that in a way. And maybe you, you know, that has some downsides where it's going to get more narrow and do that more frequently, but then maybe you can think about another one and then you can build on top of it. And so that's why it might take longer because it, [32:52] work needs to be done yeah and that's the kind of work that we're looking at doing um

32:58-34:34

[32:58] But so I don't know. I don't know if it's you can. There's clear research that shows that AI can help prompt itself to get better. I think DSPy ("dispy") is like a really cool example of something like that. Which, DSPy. I've literally never heard anyone say it out loud. I've always pronounced it D-S-P-Y in my head. You think it's "dispy"? [33:28] And I think that some of this became... [33:30] very clear to me in Diplomacy, where when we started, the models couldn't play the game. Then we made some iterations and then you got large models to play the game. And there's some existing research that showed that a lot of large models can play. And there's cool research that self-optimized the prompt so that GPT-4 could barely play. [33:47] But we put more and more work into it and we built tools to help us iterate quickly. And then we got to the point where... [33:52] Devs so small can play, right? And [33:55] it was hard, but you learned a ton. And I just don't think... [34:00] We're in this weird time where there's an opportunity cost, where if you spend a lot of time to solve one problem, it better be worth it, because you can solve so many other things. [34:09] And is it... [34:10] To make it worth it in the economy is maybe tough, but I think if you make it worth it to yourself, then it's definitely worth it. And then it can have value economically. So that's how we're approaching it. Having it be worth it to yourself, I think, is an underexplored path for entrepreneurship that is very helpful. Because it often takes a really long time to figure out if it's working or not.

34:40-36:18

[34:40] a little too early. I think, I mean, that's a big reason why we're building this. Tyler and I both worked in startups for a while. He's been running his own consulting company for four years. And, you know, I was co-founder of AI Camp in 2021. Been here, worked at a company called Salt that had three pivots and found product-market fit in, like, drug discovery. But [34:59] this is the first time where... [35:01] we're making a company from scratch. Yeah. And the reason why is because we both love, and think that there's a lot of value in, the intersection of AI and gaming. Yeah. And, like, it's... and not only because [35:13] I truly believe that our environments are going to make models better, but also... [35:20] because it will make people care and also be less fearful. Like, one of the things that we see in consulting is there's this knowledge gap growing. Simon Willison's written about it. And so has Andrew Yang, where... [35:34] people who are using these tools [35:36] are the least fearful about them, because they get it. They see where it falters, they see how they can use it to get better. But as people don't adopt them, whether because they're busy or they've had bad experiences with a really initial, like, early version, or, um, for any number of reasons, many of them [35:58] can get fearful and angry. And so you get this gap that starts to occur. But with games, [36:05] it was so cool when we launched Diplomacy. We had almost like 50,000 viewers hop on for a week, watch what admittedly was not a super entertaining interface. Like, you could see them chatting and it was just panning back and forth. And yeah.

36:18-37:50

[36:18] I think we had a good soundtrack on it. But... [36:22] they could see... many of them were not AI people. There were people who came from the gaming side, and it became less scary. You could see it make mistakes. You could see it take a different strategy that, you know, isn't the optimal one, or every once in a while you see it do something good, and so it becomes much more relatable. And I think games are very powerful in that way. And so that's another reason why I think that what we're doing is important. How did you get into games? Like, why do you care about it? Yeah, I... [36:46] I think I've always learned a little bit differently than other people, and games have been one of the ways I think I've learned the most. [36:54] When I was really young, we... [36:57] One of my friends taught me multiplication in kindergarten with the beads on an abacus. [37:05] So then I was in advanced math. And then at one point in elementary school, in advanced math, they put you in front of the 24 game. Have you ever played the 24 game? I never even sniffed advanced math. I don't think they would have taught me that. [37:21] They have this little card with four numbers around it. [37:27] 24, and then you tap the card. Okay. It's like Sudoku. Yeah, kind of. You know, it could be like 2, 4, 6, and 12, and it's like 6 minus, you know, [37:37] two is four, you know, and you figure it out. Um, and, uh, [37:42] I'm like thinking through that... that was totally the wrong solution. But anyway, yeah. And then, you know, we mentioned RuneScape and a lot of these other games, and

37:50-39:21

[37:50] learned by building mods, and many people who I've talked to, [37:54] like in these conversations, raising money, but also at Every, at a lot of other places, some of the smartest people that, that I've met, [38:03] have had similar experiences where they played some game and got something really good out of it, where they were modding a game and then that brought them into their journey. Like, I was on one of the first Minecraft servers ever, and some guy that I didn't know hopped on Skype for four hours to help me build a computer from scratch. That's sick. Like, it just... you have this weird connection, and, and I think that there's a lot of value there. I do think that [38:27] there are downsides, right? [38:29] Games are not real life. Yeah. They can be practice, you can learn, but if you get stuck there forever, that's not good. Yeah. That's why in my head games are a good start. That was a big part behind the name. Um, but [38:42] I do think that there's also a world where there will... I think that there's a world where you can make a game that brings people back to reality to a degree. I think Pokémon Go was a really cool experiment. I think if they had more of a fleshed-out game, they could have had something way bigger. Yeah. And I don't know if you remember that moment in time, but it was crazy seeing everybody out at monuments. Just, you know, I was in Boston at the time, seeing massive crowds along the reflection pond where everyone was just around, and, [39:12] and like, that sense of connection was [39:14] really special at the time. And so I think that there's just a lot... there's something special about games. You're making me think of...

39:21-40:52

[39:21] I used to love games, like video games, growing up. And during the pandemic I bought... [39:25] like an Xbox, because I was, I was like, oh, it would be cool to, you know, play Call of Duty, and at a social is that I'll have something to do. It was just when I first started Every, and I was like stuck in my apartment, so I was lonely. Sure. And I like logged in and immediately just got murked by an 11-year-old. But I do, I actually do really, really like video games that I kind of, I kind of... [39:55] What was your top sports game? [39:57] Thank you. [39:58] I'd say... [40:00] In college, there was just this constant cycle of... [40:03] Um, [40:05] FIFA and NHL. Okay. And so, you know, playing that a bunch. And it's funny because a lot of it's social, right? Like, there are single-player games and a lot of people play them. But I do think a big part of it is social, because even if it's not... [40:17] Even if it's single player, even if you're playing alone, the community of other people who play that game is a big part of it. And seeing how you can do something that others haven't yet or... [40:28] that you tried something new and that you're comparing notes. That's why I like a little bit of... some people think [40:35] that you're going to have games that are tailor-made for you, or movies that are tailor-made for you and you exclusively. I'm as bullish on AI as the next guy, uh, but you're... [40:43] I think that [40:45] shared experience is so important. So if it's something that could not be experienced by somebody else, I think that's actually bad. Yeah.

40:52-42:25

[40:52] Interesting. Yeah. I also, I played so much Halo growing up. Were you a Halo guy? I was... I got hand-me-down PlayStations from my cousin. OK, so I never... I was never. But, you know, then you grow. Other people have it. And it was such an iconic franchise. What was your top... what's your top shooter? Like, first-person shooter? [41:10] Modern Warfare 2. Okay. Yeah, that was good. That was one of those, one of the eras. Yeah. And actually, so in a similar way, during the pandemic, I started getting a little bit back into video games. [41:23] Had... [41:25] I had played some Fortnite games. [41:27] Great. Then it started getting real sweaty and, you know, I can hang, but at some point you want to be able to play with friends. Yeah. Um, [41:34] then [41:35] most recently, though my headset's broken now, I started playing VR. And it was... [41:42] not something I really expected to do a whole lot of. Yeah. But in the same way, I... [41:49] around the social component, [41:50] I started playing Population: One, which is essentially Fortnite in VR. So you're physically ducking, you are physically reloading, you are physically moving... [41:58] and two of my buddies from college were playing, and you play on teams of three, so we had the perfect number of people to do it. And it became something where you [42:06] come on, there's a headset built in, there's a microphone built into your headset. So you're immediately talking, you're talking to each other. That's cool. And... [42:13] it's one of the most fun gaming experiences I've ever had. You're physically in a game of Fortnite. You can only play for like an hour and a half. And if you don't play for a little bit, then you start to get vertigo. When you get back, it's almost like you get to like

42:25-44:09

[42:25] get over that. Yeah. Um, [42:28] But it's pretty incredible. The jury's still out on if I think VR is going to be a huge platform in and of itself because... [42:34] I don't know how many people want to be fully disconnected from the real world. [42:38] But [42:39] it was a whole lot of fun. [42:41] Yeah, I miss gaming. [42:45] I miss getting home after school and logging on to matchmaking in Halo or whatever. Or Counter-Strike or all those games. I tried VR a little bit, but I never really got into it. I think probably because I have glasses. It's just harder. I've heard that a lot. I do think glasses will become... [43:05] a... I'm wearing the Meta Ray-Bans right now. I use them as my AirPods. I see you all the time, like, walking out of the office and you're like talking to yourself, and I'm like, what is it? Why are you talking to yourself? And it's like, you're on the phone on your, on your Ray-Bans. And I'm like, what the fuck? It's really cool. I'm a big fan. And I think that [43:26] more and more... [43:27] people will use glasses as a form factor for computing. Yeah. I don't think that... [43:33] they're going to replace computers or cell phones. I think that there's room for all three. They're very different. I like, personally, that they don't have a screen. I imagine that's not long for the world. I imagine they'll start. Yeah, I thought that the new one is projecting. Is it not doing that? I imagine they'll get there. I like that it doesn't have a screen. I like that it's just... [43:54] I can talk to it. I think that it would be a pretty bad experience if I started talking to someone and then they're like, oh, sorry, what? Yeah. Because they were looking at something on glasses. So, you know, I expect the incentives to push it that way. But I do think...

44:10-45:40

[44:10] Right now, it's a more human piece of technology. And I think a lot about people taking pictures of their kids, and their kids are imprinted on this... [44:18] box that's between you and them. And they're looking at it and they see you getting joy out of it. So they're like imprinting there, versus you turn this on, your hands are free. You're good. You're in the moment at a concert. You're not like this. You're just in there. Like, I think those are... I love the concept of technology that makes us more human. Yeah. What are you... I mean, you're, you're the guy that, before you got really, really busy fundraising, all that kind of stuff, in addition to doing consulting, you were writing amazing stuff on Every, and you were the guy that [44:48] would have a really good read on what got released and whether it's bullshit or not. Sure. Like, what are you excited about right now? [44:53] I know you've been busy fundraising, so you may not have your finger on the pulse as much as normal. But yeah, I'm curious if there's anything that's exciting that's on your mind. [45:03] All I can think about recently is games. Yeah. But anything else in games, like, not specifically what you're working on, but just generally what's going on, that you think is exciting? [45:13] Okay. [45:13] Well, GTA 6 got delayed and is coming out next year. And so it's the most expensive game that's ever been made. Uh, [45:20] I think the last one came out when I was in high school or something, right? Yeah. It's been over a decade. This is a billion-dollar game. Just the cultural moment of that, I think, is going to be interesting. [45:32] And then on the AI side, I think that [45:36] maybe it's less about what's happening right now, though I would say...

45:41-47:33

[45:41] a lot of the stuff Google is doing is really cool. Yeah. Yeah. You're a big Google stan. Yeah, I am a big Google stan. Demis, if you ever want to cut an angel check, please. [45:53] The... [45:55] The connection of a lot of what they're doing is so interesting, and the constraints that they have, I think, are so cool. [46:00] Because, you know, [46:02] AI is almost existential for their business on search. And so they want to be able to use it there, which means it has to be fast. It has to be reliable. It has to be able to go check other sources. It has to be good. So having constraints when you could do anything, I think, is actually helpful. Then they also have Genie, which renders... [46:21] anything, yeah, quickly, that's experienceable. I don't know how that will be interacting with gaming. Maybe it makes up some of the most renderable... I have a thought... expensive... I actually have a thought about that, which I think you'd be into. Um, so I had this guy, he's the CEO of Decart, who we talked to a while ago. Yeah. And they have this really cool, um, video-to-video model, uh, where it takes any frame of video and then turns it into something that looks like a video game. Yes. And it's... [46:46] And they had this thing where, for example, if you pick up a tissue box and you go like this, it turns into a gun and shoots it. And... [46:55] I think that's an interesting future for gaming. I agree. Because, like, right now, to make GTA... [47:02] you have to... [47:03] hand-code all of the interactions, and that's why it takes a billion dollars and many, many programmers and artists to do it. [47:10] And with video-to-video generative models, one, you could just generate it from, like, live video. But two, you can vibe code a really simple game. And then you can reskin it with generative AI to, like, look like a AAA game. Yeah. And I think it lowers the barrier to making awesome games to almost anyone now, which is really cool. I think that that... not only does it do that. Yeah. 
But it also –

47:33-49:09

[47:33] lets you do things that were otherwise super computationally intensive, like ripples on water, or reflection of light, without having to run it at all. So I could definitely see that. That's cool. [47:45] And that was related, I think, generally, or like, you know, they're doing that, but they're also doing a lot of different things that are cool in AI. And I think, [47:51] I mean, one of them is life sciences. I think it's a really underappreciated world. And it's one that I was fortunate to be [47:58] deeply involved in, getting to work with, like, the Ellison Medical Institute and others in my last startup, where [48:05] I'll, like... [48:06] I mentioned it earlier, but AlphaFold literally took something that we had PhDs taking six years to do and turned it into something that takes 20 minutes. Yeah. [48:15] And [48:17] as far as where I think AI is having near-term impacts, I think it's software, life sciences, and education. Those are three I see today having massive impacts. And I love robotics. I used to work at Amazon Robotics. I'd love for that to get there. Self-driving is... [48:33] seems to be at the precipice. I'm a huge Waymo fan. I take them all the time. And... [48:39] But... [48:40] those three are right now seeing huge impacts because... [48:46] they're perfectly... the problem is perfectly suited for AI. Software: [48:50] we have compilers, we wrote the code, yeah, we know what would render or not. It's a solvable problem. Yeah. It's great for language models. It's great to just do reinforcement learning on just code, and then also on Diplomacy as code, so when there's new things it can generalize, right? Like, that's awesome, and you can do that. Life sciences:

49:09-50:44

[49:09] There's a ton of information out there. We just need people with subject matter expertise to combine them and look at these different interactions and to find ways to simulate these processes. There's a real chance that in the near term [49:21] people find a way to turn dollars into longevity. Like, that's crazy. Mm-hmm. [49:26] And then on the third side, education, you talked about [49:29] being excited about your nephew, learning about these things, [49:33] starting to enter that world of learning. And I think it's going to be really interesting to see. I don't know for sure. [49:40] I got... I love that I got to interact with so many high school and college students at AI Camp when they were going through that learning journey. I think it'd be... it's tough to be a high schooler right now in the world. Um, and, and, [49:52] the education system hasn't caught up yet, and there's this huge incentive to use AI to do your work, but then you really learn about it. [49:59] What do you care about? Tough. [50:01] The generation afterwards, the generation who's going into their why, why, why phase, that can have AI to answer those questions. [50:10] It's amazing. They're going to be... [50:12] so smart. Yeah. And what that hell just looks like, I don't know. Yeah. But... [50:18] to be able to constantly be exploring and to get answers. And maybe they're, [50:22] sure, there's probably negative externalities from that. Yeah, definitely. Right. But [50:27] there's also probably a lot of positives. It's really good. I think... [50:30] I just remember being like in fourth or fifth grade and being like, I want to write a novel. And people were like, what are you talking about? And why is it even so hard? It seems easy. I don't know if you had that experience, but I was like, oh, you just write it. Yeah. It's good. Yeah.

50:47-52:21

[50:47] So like having AI to answer all my questions and stuff, I think it would have been just fantastic. Okay. So that's the stuff you're excited about. What's like the most overrated thing or what pisses you off in AI right now? [50:59] You know, I don't think a lot pisses me off because... [51:03] It is pretty important. [51:05] So people talking about it, I think is, is probably net good. There's definitely people shilling things that aren't really going to solve your problems. Um, yeah. [51:14] I think the thing that maybe I worry about the most... [51:17] is... [51:21] In the same vein of education, [51:23] We now have this leverage that makes somebody who's an expert in something [51:31] 10 or 100 times more powerful. [51:33] But it also... [51:35] it does the work [51:37] of someone who was a junior in that field. So how do you bridge that gap? How do you financially incentivize somebody to learn and make mistakes and get better? [51:49] knowing that this tool is here. [51:51] to keep pursuing there. Like what, what, [51:53] The... [51:54] Maybe they're overblown, but it seems like the job numbers of people graduating college are getting crushed right now. [52:03] I imagine a part of that is because you don't need people to do that blocking and tackling that you needed before. And so, and paying them to do so is a big cost. Yeah. And so what does that look like? I have, I'm so not worried about that, which is interesting. And I think actually you're one of the people that made me not worry about it.

52:23-54:11

[52:23] Um, [52:23] I use this anecdote a lot, and I don't think... I don't know if I've ever used it to your face. I'm curious about this, or I'm curious to tell you how you've impacted how I think about this. Like, when you joined Every [52:37] and you said you wanted to write, and you wrote your first piece, it wasn't good. Yeah. And it was not good to the degree that [52:45] we could not have worked with you without AI. Yeah. And what was really interesting is... [52:51] And I've worked with a lot of young writers. And so I can tell pretty quick. [52:55] Your rate of progress... like, every time we talked, you recorded it, you made prompts, and you never made the same mistake twice. And so your rate of progress was, within like three or four months, you had made like a year or two years' worth of progress. And that just kept happening. And so [53:11] let's assume that the job numbers are down for young people [53:17] actually because of AI, because people are not hiring them. That is a gigantic, gigantic management mistake that companies will begin to correct as soon as they realize that a 23-year-old with ChatGPT is fucking cracked. And if you give them any amount of mentorship, they're going to do amazing stuff that they never could have done before. And I think there's the question about, well, they're not actually learning the underlying skills. They just have the AI do it. They are, because they have to, because if... [53:44] the AI messes up and they care about it messing up, and they should, because that's the way to do a good job, they're going to go in and learn the stuff, and they have a great tutor to help them figure it out. So I feel extremely excited for young people. And I think to the extent that managers are not hiring them, that's on them, and they will figure that out pretty soon, because they'll be like, oh my God, like, I hired this 23-year-old and, like, it totally changed my whole business. My dad is like this. He's downstairs right now. 
And,

54:11-55:44

[54:11] He owns like a few cemeteries in Indiana, and like, he has this 23-year-old who just completely changed his entire business. Um, and, um, [54:20] uh, so, so I think it's going to flip from there. Maybe it's there right now, but I think it'll flip from there to like mid-career folks pretty soon. [54:28] Because I think the kids are going to be all right. Yeah. Yeah. The thing I agree with [54:33] is that I think that probably... [54:36] the solution to this is some form of an apprenticeship, kind of, right? Where you're able to quickly learn about something, [54:43] and you're doing something that you care about, therefore you will spend the time on it. Therefore you will care about [54:49] what it is that makes it good or not, right? If you don't care about it, that's gonna be hard. Yeah, I... [54:54] Um, [54:56] The counterpoint is, [54:59] I don't think that [55:01] without my experience on the training and the AI and that side of things that [55:07] I'd have been brought in to the Every side to then have the chance to do it. So what is that skill that... [55:14] they're being brought in to do besides the raw material, right? Besides the ambition, the ability, and just doing that, when you're comparing them on the market? Yeah. With people who do have that, and maybe it becomes [55:26] the people who are on the market don't have that hunger. It's literally just like, I'm, I'm hungry and I'm, I, and I'm willing to try new things instead of the things that are [55:35] currently being done. And like, totally, I think that's... but if, but if all things equal, if you have somebody who's hungry and doesn't have experience versus someone who's hungry and does, um,

55:45-57:17

[55:45] I would take the one that doesn't have experience, because the person who has experience, their experience is wrong, [55:50] because the whole landscape just changed, and it's really hard to get someone who's like already in their career and knows how they do things. I mean, you know this, 'cause this is what we do: we take people who are mid-career and we train them how to do something else, and it works, but it's hard. Yeah. What's easier is someone who's hungry and, uh, has, hasn't done it before and doesn't have a whole set of things they have to unlearn. And is just going to, like, figure it out. Maybe. Yeah. We'll see. Um, but that's something I spend a lot of time thinking about. Yeah. Yeah. I feel you. I think it's [56:20] important to note that [56:21] it's not all going to be rosy. And anytime there are technology shifts, there are downsides. Yeah. No. And yeah, I think another example of that, right, is like big companies who are going to be able to do more with less. Yeah. And you may see them, [56:35] and are seeing some in some industries already, cut headcount. But to your point, right? Like, if you cut too much headcount and then you realize, oh man, we could have just done way more with those people, then that's a mistake. And then you start seeing groups of those people be able to do way more on their own and out-compete on these niches and then take away parts and create many, many, many more startups than have ever existed. So I'm excited. It's going to be a little rocky, but I'm excited. Well, reality is typically rocky. [57:01] Indeed. Much rockier than games. [57:05] All right. This is awesome. So if people want to find you, want to participate in your tournament, want to just generally follow along with what you're doing, where can they find you?

57:17-58:21

[57:17] Goodstartlabs.com. [57:19] Good Start Labs on Twitter. [57:21] I'm at ALXAI on Twitter, and you can... [57:26] read up on my writing on Every. [57:28] Amazing. Alex, thank you. [57:30] Thanks, Dan. [57:39] Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI&I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard. But instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. [58:01] on the edge of your seat, [58:03] craving for more. It's not just a show, it's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor, hit like, smash subscribe and strap in for the ride of your life. [58:16] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.
