Nicholas

OpenAI's Codex: This Model Is So Fast It Changes How You Code

OpenAI’s hottest app isn’t ChatGPT; it’s Codex. In the last few weeks alone, the Codex team shipped a desktop app, GPT-5.3 Codex (a new flagship model), and Spark, the fastest coding model I’ve ever used. Usage has grown fivefold since January, and over a million people now use Codex weekly. Codex was also the app that OpenAI chose to run an ad for in the Super Bowl. Dan Shipper talked to Thibault Sottiaux, head of Codex, and Andrew Ambrosino, a member of technical staff who built the Codex app, for Every’s AI & I about what OpenAI is building and how they’re using it internally.

If you found this episode interesting, please like, subscribe, comment, and share! Want even more? Sign up for Every to unlock our ultimate guide to prompting ChatGPT here: https://every.ck.page/ultimate-guide-to-prompting-chatgpt . It’s usually only for paying subscribers, but you can get it here for free.

To hear more from Dan Shipper:
Subscribe to Every: https://every.to/subscribe
Follow him on X: https://twitter.com/danshipper

Head to granola.ai/every and get 3 months free with the code EVERY.

Timestamps:
00:00:00 - Start
00:01:27 - Introduction
00:05:27 - OpenAI's evolving bet on its coding agent
00:09:42 - The choice to invest in a GUI (over a terminal)
00:20:38 - The AI workflows that the Codex team relies on to ship
00:26:45 - Teaching Codex how to read between the lines
00:28:45 - Building affordances for a lightning-fast model
00:33:15 - Why speed is a dimension of intelligence
00:36:30 - Code review is the next bottleneck for coding agents

Published: Feb 18, 2026
Uploaded: Jun 12, 2026
File type: POD

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:29

[00:00] The first time I showed it to someone, they were like, [00:02] "No way, this is like a fake demo. This cannot be this fast." This will change everything, especially because it's not yet the fastest that we can actually get it to be. My experience was trying the app. I didn't really want to go back to a terminal. [00:15] What I realized is actually GUIs are great, IDEs are just a problem. There's something that's a GUI for programming that's not an IDE. [00:22] And it seems like you're figuring that out, but I don't even know what that's called. [00:25] It's called the Codex app. [00:29] *laughs* [00:29] *music* [00:44] Dan here, and I want to take a second away from the episode to tell you about Granola. Granola is an AI note taker for your meetings, and I use it pretty much every day. That may sound a little bit weird or a little bit creepy, like, transcribe all your meetings? Well, for me, it's actually kind of indispensable as a leader. Every is about 20 people now, and it's really important to me that I understand how decisions get made, how I'm showing up in meetings, and how I can help my team the best way I can. Granola acts a little bit like a leadership log for me so I can see how I've done in meetings [01:13] and do better next time. If you're trying to improve as a leader and scale your company, try Granola as your AI-powered notepad for meetings. Head to granola.ai/every, code EVERY, to get three months free. And now, back to the episode. Thibault, Andrew, welcome to the show. [01:28] Hey, thanks for having us.

1:30-3:06

[01:30] Thanks for having us. [01:32] Great. Great to get to chat with you. So for people who don't know, Thibault, you are the head of Codex at OpenAI. And Andrew, you are a member of the technical staff on the Codex app at OpenAI. And you are the people of the moment. OpenAI just ran a Super Bowl commercial about Codex. How are you feeling? Yeah, that Super Bowl was quite surprising, wasn't it? [01:53] It really was. I think the core thing, and I think the place I want to start this conversation, is... [02:01] It feels like that is a strategic shift. [02:05] You would expect OpenAI to have run a ChatGPT commercial during the Super Bowl. And, especially if you looked at Codex's positioning like three or four months ago, for professional engineers, maybe not have run an ad targeted at a much broader audience. It felt like for a long time there was this divide where Codex was for professional engineers. And if you want to do vibe coding, you do that in the ChatGPT app. It seems like that has shifted a lot over the last month or two. [02:35] Tell me about that. [02:36] Yeah, I think especially like in, you know, we can talk about last week, right? So like last week on Monday, we released the Codex app. [02:42] Immediately, we saw a ton of downloads, more than a million downloads in the first week. And then we knew that we were releasing an extremely strong model, 5.3 Codex, on Thursday. That just made, I think, this... [02:56] It's very visible that, you know, we're here today [02:59] to put incredible experiences out there. We're very committed to Codex. And also agents are really starting to work and be able to create

3:06-4:43

[03:06] these things, even if you're a little bit less technical. I think the app really showed that. It's much more inviting for people to just try it and run multiple agents, [03:16] with our models being very, [03:18] very good at sort of like allowing for multitasking and being reliable for long-running sessions. So it like allows you to create a lot more. So it just felt like [03:29] maybe we can inspire more people to build and then show that agents are here, right? It's like, [03:34] it's coming, it's going to be mainstream. Why don't you try and create something new and inspire people? It felt like the right thing that we wanted to reinforce. [03:44] Yeah, while we were designing and developing the app, [03:48] one of our... [03:50] like internal mandates to ourselves the whole time [03:53] was that we had to make something that we love to use and that we used for all of our work. [03:59] And if we couldn't do that, then we weren't going to put this out. And this was back when we started. [04:04] And I think that we surprised ourselves a lot with how fun it was. [04:08] And especially as we started to build this app [04:12] before we started to build agent skills. And then once we kind of paired them together, [04:18] it became this really rich interactive experience where you could open the browser [04:22] or you could connect to these various services. And so all of a sudden we started to feel this like really connected interactive experience and [04:30] wanted to share. [04:32] Like, [04:33] I kind of see the ad as like a love letter to builders, right? I have never seen a Linux CD in a Super Bowl ad. And so, you know, like that was really cool to watch.

4:43-6:19

[04:43] What was the impact of the ad? [04:46] Um, we're still to measure that. Uh, we'll see like, you know, how it plays out over the long term, but we saw a giant surge of traffic, actually like remarkably, like, you know, very, very quickly after 4 PM, like PST, when it aired. Like the surge, and like our systems were under heavy load. So it felt kind of weird to me that, you know, people are watching the Super Bowl and then going and like, you know, installing the app and they're just like trying it out right there and then. [05:11] But it happened. And a lot of [05:15] people reached out saying they were really inspired by it and just wanted to build afterwards, which is what we're aiming for as well. [05:24] Coming back, I still want to talk a little bit about the strategic shifts. So, um, [05:30] the Codex app, or Codex in general, moving from something that is [05:34] really for professional developers to something that has a broader audience, [05:39] and maybe moving some of the [05:42] vibe coding from ChatGPT into the Codex app. Tell me about that. [05:45] I don't think we're trying to move vibe coding from ChatGPT into Codex. We're very much... [05:53] two things are happening. Like one, we're pushing the frontier on like professional software development. Like 5.3 Codex, like, [05:58] beats every single other model on the top benchmarks for coding. So it is a very, very capable model. [06:05] And, you know, it's also like, [06:07] at the speed and cost, it's like, you know, [06:10] it is a top performer, rather. I think the second thing is the app does make things more accessible. And so it does appeal to a wider audience.

6:20-7:50

[06:20] But internally, we're also seeing the app, you know, it is very much used within research, within our own team, like the entire Codex team uses the app. It makes people more productive. [06:29] So it's like very much leaning into... [06:32] how we think agents are best used, the patterns that we were seeing that were making people very [06:38] productive here at the company and outside. [06:41] And then it's just sort of like going all in on that. [06:43] It does happen at the same time. Also, it's like, [06:46] hey, delegation is finally here. It works. It's much more accessible. We're going to [06:51] try and see how we can package that and actually ship this to a much, much wider audience. But that might not be the Codex app. [06:59] - You use that all day, like you just build in there? - 99% of the code that I write is using the Codex app. - Same, I mean, I live in there now. [07:07] Okay, well, that's actually really interesting. I definitely want to talk about the app in particular, but I want to go back to the thing you just said, which is, maybe if I'm reading you right, you're kind of like, we're pushing the frontier, we're seeing lots of people who are maybe broader than just like [07:25] senior engineers using this. However, [07:27] the overall idea of like, who is doing what in which app, like maybe you haven't totally figured out yet. And it's not as clean of a line as like, no longer vibe coding in ChatGPT, only vibe coding in Codex. It's like, you can do it in both, but we haven't figured out exactly like which thing you're going to do where. [07:45] Yeah, I think... [07:47] Codex is like the most powerful experience, right? There, so...

7:50-9:21

[07:50] You should be fairly technical so that you understand like, hey, you know, code is actually getting written and it's going to get executed on your machine. By default it's executed in the sandbox, [07:59] but you should probably be able to read code in order to use Codex to its fullest. [08:06] We will bring a similar experience to ChatGPT at some point, [08:10] which will have different properties in terms of the sandbox and how concepts are represented. Maybe we won't be showing, "Hey, this scary terminal command thing is running and you should probably approve it." It's like, of course you shouldn't do that to someone who is not technical. [08:26] And Codex is really there to like [08:29] appeal to, you know, just [08:32] all coders, builders, you know, people who are either technical themselves or like technical adjacent, you know, like data science, these kinds of things. [08:41] Yeah. [08:42] And, you know, if you use the Codex app for any amount of time, you can see the inspirations from chat. [08:48] The layout's very similar. We auto-name your conversations. We've got contextual actions, but it's pretty clean. The composer looks very similar. [08:57] And you'll see some of that inspiration... [09:00] back in chat for other types of things. [09:03] But we still believe that, [09:06] uh, [09:08] when we set out to make something that was for the professional software developer and for us, [09:13] that [09:14] it deserved a dedicated experience that could really showcase the power of the models. And the way that the models could change

9:21-10:58

[09:21] the development lifecycle. [09:23] And so we made something very tailored to that. And we've had a lot of success internally with research teams, with product teams. [09:32] And so, you know, [09:35] we'll look beyond, but I think we're really happy with where we've ended up on the kind of tailored, [09:39] the tailored approach to this. [09:42] Can you tell me about the decision to invest in a GUI over a TUI? I feel like TUIs are so hot right now. And obviously you have one for Codex already. And you could have said, okay, we're going to double down and just make the terminal experience even better than it is now and really invest in that versus, okay, we're going to go... [10:00] You know, I think, [10:02] yeah, making a GUI is a little bit of like a counterintuitive or like... [10:07] counter-narrative thing to do. So tell me about that decision process. [10:11] Thank you. [10:12] I think it wasn't counterintuitive. It's more, maybe, it's not mainstream. [10:16] And so we experiment with a lot of different approaches. I very much consider that we're still [10:22] in the experimentation phase. [10:24] And we're responsible primarily for two things. It's like building the most powerful entity out there that's capable of coding. And then increasingly this will become like a multi-agent system and it will become like more and more capable and you will have to [10:39] figure out how to steer and supervise its outcome and its behavior. That's one thing that we're building. And then we're also building, [10:46] how do you even interact with this? It's like, what is the optimal way [10:50] to [10:51] have visibility into what this very capable entity or system of entities is doing? How do you steer them? How do you supervise them?

10:58-12:29

[10:58] And so we were very much still experimenting with what that is. It's like, sure, you can do it in the TUI. It's like at some point, it starts to feel very limiting, especially on multimodal. Actually, [11:10] the models can draw little diagrams and generate images, or you can talk over it using voice. Maybe you have many of them going in parallel and so you start to lose track. So we felt like we needed to start experimenting with something else. And it is only when we saw it become super, super popular internally, we were like, we have to ship this externally. This has come to a point where it's too good to just keep it to ourselves. [11:38] I mean, that was like the journey that you went on, you know, you're now building in the app. Although like, when did you start building in the app? That was actually like fairly quickly, like when the app was building itself. That was, yeah, that was pretty quickly. And yeah, because I was starting with the TUI and with the IDE extension. [11:54] And I think that my goal personally was how can I get to [11:58] fully building the app on the app [12:00] as fast as possible. [12:01] It's really easy when building this stuff [12:04] to slip into the mode of like, [12:06] "Oh, this will be good for somebody." [12:08] Like somebody will love this, a certain type of like, they will love this. Right. So we really wanted to get quickly to like, [12:15] I want to be able to build the app on the app. I want it to be able to run itself [12:19] with skills. [12:20] I want it to click around on the app that it spawned. And I want this to be like part of my workflow as soon as possible. And there, [12:28] um,

12:29-14:00

[12:29] I still use the TUI sometimes when I want to fire something quick, but I think that there is something about the flexibility of controlling UI [12:37] and being able to have some panes be persistent and others be ephemeral and be, you know... [12:43] We shipped voice with the app so you can [12:47] prompt with voice. [12:49] We have Mermaid diagrams in the app. We have full image rendering. [12:53] So all of those things [12:54] I think are like the tip of the iceberg in what we want to do with a dedicated UI. [12:58] Um, [12:59] and it's pretty simple, and it's simple intentionally, but I think we're going to do a lot with, [13:05] um, [13:06] dynamic stuff there. I mean, yeah, the ceiling is just much higher. [13:10] Yeah, it's interesting. [13:13] My experience was, trying the app, [13:16] I didn't really want to go back to a terminal. [13:19] And [13:20] I had been coding mostly in Claude Code and some Codex in the terminal for the last, like, for several months before that. [13:28] And I think what I realized is actually GUIs are great. IDEs are just the problem. [13:35] And like there's something that's a GUI for programming that's not an IDE. [13:40] And it seems like you're kind of in that, [13:43] figuring that out, but I don't even know what that's called. [13:46] It's called the Codex app. [13:51] You know, there was a moment during the development of this [13:56] where everybody [13:58] and their mother was forking, uh,

14:00-15:32

[14:00] the same IDE. [14:02] And we kind of looked at each other. And we were like, hey, should we have done a fork of VS Code as well? Like, very seriously. [14:12] I remember exactly which day it was. [14:14] And I think... [14:16] I don't know if I would say that IDEs are... [14:20] the problem. [14:21] But I go back to like the truck analogy sometimes with them, [14:26] which is that I will open an IDE here and there. I opened one today. [14:31] It was something very specific that I wanted to do [14:35] that [14:36] I don't even remember what it was. [14:38] But then I closed it and I went back to using the Codex app. [14:41] And I think that there is something there with like the Codex app being a great [14:46] daily driver. [14:47] And like occasionally you need an IDE, occasionally you need like [14:51] a really complex terminal setup, [14:53] but that this should be your home base, it should be your command center for the agents that are running, and a place that you can come back to and track all this stuff. [15:00] And, [15:01] you know, there were a lot of design decisions around like, do we allow free-form panels like an IDE? And we kind of came to the conclusion that [15:09] a lot of what these models are great at [15:12] is knowing what [15:14] is needed in the moment for what type of task. [15:17] And so we wanted to have kind of [15:20] more full control [15:22] over what was able to show at what point. [15:24] And you can see that in plan mode, [15:26] where [15:27] you're not necessarily getting a composer, you're getting a really quick way to answer questions.

15:32-17:05

[15:32] Um, you can, you know, and you've got your plan and you can edit your plan, and [15:36] I think we only want to do more with that as we go. [15:40] It seems like you were surprised that you didn't want to go back to the TUI after. [15:46] I was. [15:48] Yeah, is that... [15:50] Were you like a... [15:53] Greg did an interview and Greg was like, I'm a power user. I thought I would never leave the terminal. Greg lives in Emacs. Are you at like a... [16:02] I was a TUI power user for like six months, [16:05] starting with, like, when Claude Code first got really good and I was like, holy, this is so much better than being in Cursor or Windsurf or whatever. [16:12] And now I feel like I speedran my TUI era and I'm back. I'm kind of going back and forth right now, but I can [16:20] sort of see the light, where, [16:21] um, [16:23] especially if you have a bunch of them going at once, [16:26] the affordances of a GUI just make it much nicer. [16:30] Yeah, and there's a lot more to come there. And it was a very intentional thing for us. We see agents [16:39] will act and are already acting on much more than code. [16:43] And so they need to be a companion to every single app and every single thing that you can do on your computer. It's like we integrate with Linear, Slack, and... [16:53] of course, you also need to be able to read the code and produce code, but maybe it can do a deploy through Vercel as well. Are you going to do all these things from your IDE? That would sort of feel very odd.

17:05-18:39

[17:05] And so it's like, [17:07] it's like this command center for your agent. We optimize the entire experience around that, you know, around the idea that you have a very capable, intelligent entity that you're like controlling, steering and supervising. [17:20] And, you know, you never need to like sort of like go in there and, you know, do the things yourself. It's like, you know, the thing is very capable of like, you know, [17:27] being delegated to. Like, I think, you know, when you [17:30] accept that that is what we're headed towards, and with 5.3 Codex, it just feels like we're getting, [17:37] it's like, almost there, right? [17:39] Then [17:41] you're like, well, you know, it's the same with you, right? You know, like when I talk to you about like a feature idea or something, it's just like, you know, you go and you get inspired and you go and do it. It's just like, you know, I don't suddenly jump into your IDE and like, you know, just go and like implement it. [17:54] You could. [17:55] I mean, I think you would find it disturbing, right? It's like... [18:00] So that's the way that you will, you know, everyone will work with agents. It's like, [18:04] you just talk to them. [18:05] How has your workflow changed with 5.3 Codex versus 5.2? [18:10] I was surprised at how much faster it was, [18:13] and sort of like I have to adjust. [18:16] I have been optimizing a lot more for like long-running, sort of like multitasking. [18:21] Um, [18:22] and, you know, I sort of like had an expectation of like, okay, this type of task will take like, you know, 10, 15 minutes. I'm going to kick off like, you know, [18:29] four different things and then come back. [18:33] I'm able to maybe do a little bit less multitasking and be more in the flow. So that felt really good.

18:39-20:09

[18:39] And then [18:41] it just feels now very satisfying as well to kick off [18:44] automations with it, using skills. It's like it's a more generally capable model. It's like less sort of like super focused on code, right? And so I find it like much more reliable, like, you know, sort of like going through like [18:58] Twitter replies and like, you know, summarizing like the important themes or like filing bugs in like Linear and then, you know, coming back to that and using automation so that, you know, things are like [19:07] implemented like daily. Um, feels like it's like much more robust for these things. Um, I mean, but you're really like the super power user here and there's like, you know, it's just like the kind of stuff like, you know, he does, it's just like, you know, it's like, [19:19] I have very vanilla usage of Codex compared to Andrew. No, I mean, well said. [19:26] I had a series [19:29] that I had intentions to run for a while, and I only ran it for three days on... [19:34] on X, [19:35] Twitter, which was that I was setting up a prompt to basically add a feature to the Codex app. Like, [19:42] some random... [19:43] like non-shippable feature to the Codex app. [19:46] I had this long prompt, like, about the [19:48] quality bar that we had to do, and [19:52] once I switched it to 5.3 Codex, the results got actually much more interesting. Like we did a Subway Surfers panel on the right, was one of them. Like a little Minecraft UI for the sub-agents was another one that we did that, I don't know, maybe... [20:06] Maybe we'll show it. I was like, get back to work. Yeah, yeah.

20:11-21:42

[20:11] Why do we have Minecraft in the Codex app now? Yeah, but we've got to explore. [20:16] No, I mean, 5.3 Codex, like, it's... [20:20] It's neat. It's fast. It's capable. It's multimodal. [20:26] What are... [20:27] Thibault says you have a lot of cool use cases, like, what are the more interesting ways that you're using the Codex app that [20:34] maybe people... [20:35] should try but haven't [20:36] thought of yet? [20:37] Andrew came up with automations. [20:39] And I think that sort of like [20:41] shifts the way that you're thinking about these things, when they can just like sort of like happen in the background, you know, on a specific trigger, at a specific time. [20:49] And then you can sort of program it yourself. Yeah. You're using that a lot. There are a lot of things that I use [20:56] the app for that are a little bit outside of just like coding features. I use it to keep my PRs mergeable [21:05] with automations. [21:07] And so [21:08] it'll resolve merge conflicts, it'll keep them updated, it will fix like build issues, so that basically, [21:14] like, [21:15] as soon as they're ready to go, like, they're ready to go. There's no like, oh, hey, somebody merged a big thing and there's a conflict now. So I do that. So you said like, so at what point is the automation triggered? 'Cause I thought the automation triggers like [21:28] on a certain time schedule, but it sounds like there are other triggers I didn't know about. Um, yeah, we're looking at a lot of things. I have it right now just on a time schedule. And I use our GitHub skill and some internal skills for our CI.
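Editor's sketch: the "keep my PRs mergeable" automation described above boils down to a triage pass over open pull requests. A minimal illustration in Python follows, assuming a hypothetical `PullRequest` record whose three status flags have already been fetched from the code host. The record, the field names and the action strings are all invented for illustration; this is not how Codex automations or the GitHub skill are implemented.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical snapshot of one open PR (all fields are assumptions)."""
    number: int
    behind_main: bool
    has_conflicts: bool
    build_failing: bool


def plan_actions(pr: PullRequest) -> list[str]:
    """Return the clean-up steps the agent would be asked to take for one PR."""
    actions = []
    if pr.has_conflicts:
        # Resolving conflicts also brings the branch up to date with main.
        actions.append("resolve merge conflicts")
    elif pr.behind_main:
        actions.append("update branch from main")
    if pr.build_failing:
        actions.append("fix build")
    return actions


def sweep(prs: list[PullRequest]) -> dict[int, list[str]]:
    """One scheduled pass: map each PR that needs work to its planned steps."""
    plans = {pr.number: plan_actions(pr) for pr in prs}
    return {number: steps for number, steps in plans.items() if steps}
```

Run on a time schedule, `sweep` would hand each non-empty plan to the agent, and PRs that are already clean are skipped entirely.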

21:43-23:14

[21:43] And that, [21:44] that runs hourly or every two hours and kind of just cleans everything up. [21:48] I see. So it's like, you know, if there are any changes on main, it just looks through any PRs and just like makes sure that they're all up to date so that whenever you're ready to go, it's never... [21:58] Bye. [21:59] That's actually, that's good. I like that. [22:00] Yeah, it's actually really helpful. It's surprisingly helpful. [22:04] Um, [22:05] I have one where every day at like 9am I get sent [22:10] all of the contributions that have merged to the Codex app over the last day. [22:15] And so it'll do like a nice report of who merged what. And I have it group by theme. [22:21] So I can be like, all right, like three people worked on this part of the composer, [22:26] two people worked on automations, like here's what happened, so that I can at least be like [22:30] knowledgeable about what's happening, because, [22:32] you know, things... [22:33] things get chaotic, [22:35] uh, [22:36] right before launch. One automation I have is... [22:40] I run it like multiple times a day and it's like, [22:43] pick a random file and find and fix like a subtle bug. [22:47] Amen. [22:48] And then it's kind of funny because it actually does pick a random file. So it will run like, you know, [22:55] Python, like random, and it will like, you know, find a random file and it will start from there. And so it's like every time it like explores like a new one. [23:03] Has it caught anything in sight? [23:05] Oh, yeah, yeah. It's like we catch, like, it's often latent bugs that are not triggering actually like on the critical path, but they're actually bugs.
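Editor's sketch: the "pick a random file" automation described above says the agent runs Python's `random` to choose its starting point. A minimal version of that step might look like the following. The function names, the default `.py` suffix filter and the prompt wording are assumptions for illustration, not the actual automation.

```python
import random
from pathlib import Path


def pick_random_file(root, suffixes=(".py",), rng=None):
    """Pick one source file uniformly at random under `root`.

    Returns None when no file matches. `suffixes` and `rng` are
    illustrative parameters, not part of any Codex API.
    """
    rng = rng or random.Random()
    # Sort so that a seeded rng gives a reproducible choice.
    candidates = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
    if not candidates:
        return None
    return rng.choice(candidates)


def build_prompt(path):
    """Compose the instruction handed to the agent (wording is hypothetical)."""
    return f"Start from {path}. Find and fix one subtle bug."
```

Because each run draws a fresh file, repeated runs spread across the codebase instead of revisiting the same hot paths, which fits the remark that the bugs found are often latent ones off the critical path.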

23:14-24:47

[23:14] And then, you know, just... [23:16] It's trivial to fix it, merge it. [23:18] It takes very little time. [23:20] And it's a thing that I would have never found. It found an issue in constrained sampling the other day. [23:26] That's really cool. Do you have other automations that are worth sharing? [23:30] Let's see. I feel like I have 60 that are running at all times. Some for testing and some for real. [23:38] Some of the members on the team really like this one [23:41] that looks at the PRs that you've done in the past day or so, and quietly cleans up any bugs you shipped. [23:49] Um, [23:50] and kind of like looks at a few of the observability platforms to see, and like [23:55] tries to basically ship a fix before anyone's noticed that you shipped a bug. [24:00] That's cool. I have one that's not coding related, which is like marketing research. It runs daily. And it's just sort of like, it's [24:06] prompted with like a specific skill to do like deep marketing research, which I've like sort of like tuned over time. And then that just goes and searches the web on, you know, any sort of like new things that sort of like came up [24:19] in terms of how users are perceiving, talking about Codex. [24:24] And then I just receive that little report. And it always makes for an interesting read. [24:30] We can just go on. These are just examples that we do rely on. They run. Do you have any particular skills that you guys... [24:40] like, that [24:42] are beyond the normal kind of, you know, I have a GitHub skill and that kind of stuff?

24:47-26:18

[24:47] I love Andrew's, uh, yeet, yeet skill, which, um, it just like takes like the change and then, you know, does the commit, does the PR, writes like the draft, um, puts it in draft and like, you know, publishes a PR with like a PR title and body. Yeah. It's very satisfying. Yeah. It just does everything. Um, that one definitely makes people like productive. What are the top used ones for you? Um, [25:15] ImageGen is a cool one. Yeah. [25:17] For both silly automation purposes, like, "Hey, make me an image that characterizes my last day of work." [25:25] Not my last day of work, my previous day of work. [25:28] Yes, yes, yes, Andrew. I, you know, the... [25:33] The ImageGen skill was actually really cool. [25:37] I used the Codex app to make a book for my daughters. [25:42] And so I had, I like, you know, put together this prompt [25:46] for teaching it about like a script that I wanted written. So like 24 pages, here are my daughters' ages, here's like where we've lived in the past, like we were in [25:55] Boston and moved to New York and then moved over here. [25:58] Um... [25:59] And then I said, like, after that, we went through that, I agreed on the script. And then we went through and I said, like, all right, now it's time to use the ImageGen skill. [26:08] And it made, um, like it prompted for every page in the book based on the script, [26:12] that prompted for the image. [26:13] And then it kind of put them all together and used the PDF skill to put together the book's PDF.

26:18-27:52

[26:18] And then I printed it. [26:20] And so we've got like a super custom book that, you know, [26:24] I read to my kids and it's really cool. It's just this awesome thing when you can combine like the intelligence of like the agent and then it's like it works in a programmatic way, like by [26:34] using skills, and then you can [26:37] just combine them in like novel ways. So like, [26:39] yeah, I think the PDF and ImageGen one is like a common combo that we see. [26:45] It feels like the Codex model, it obviously has gotten faster, which makes it much more usable. And it also feels... [26:52] a little more Opus-y, like it has a little more emotional intelligence, but it still has a little bit of that, like, [26:59] it-does-exactly-what-you-say thing, in a way that can be annoying. How are you guys thinking about how you shape the way the model feels and which way you're pushing it? [27:08] It's something that we obsess over. So we definitely want... [27:13] the model to excel at coding and be really good at instruction following. [27:17] At the same time, when we optimize a little bit too much in that direction, it can over-index on specific words or misunderstand the intent in ways that humans wouldn't. [27:30] Um, [27:31] sometimes I will just have a typo and then the typo actually finds its way into the file. And I'm like, obviously, I didn't mean the typo. I meant this name of this class. So that's something that we're definitely continuing to push on. But the thing that we're pushing on the most right now is really

27:52-29:27

[27:52] efficiency, speed, and then also what we now refer to as personalities. How supportive is it? And we understand that not everybody has the same preferences there. [28:03] Like the [28:04] previous default was definitely a super blunt, pragmatic personality. Now we've also introduced a more supportive, friendly personality, and you can just switch [28:13] between those. And I think for things that don't have like sort of like a universal, like, [28:18] accepted thing that everybody should just use, we're probably going to introduce some way for you to just make it your own, right? You should feel like you have your own little personal Codex that works in exactly the way that you want it to work. [28:34] Do you use the friendly or the pragmatic one? Pragmatic. Pragmatic. Yeah. Okay. I'll say it's pragmatic. Yeah. [28:40] Um, interesting. I think, um... [28:46] You guys recently put out a model that is [28:48] so fucking fast. I was testing it before it came out and I was just like, [28:54] I can't really keep up with this thing. [28:56] So I'm curious how that changes how you think about, um, [29:01] what is now possible with coding with a model like this, and also the affordances that you need in order to manage models that are so quick effectively. [29:08] Yeah. [29:10] Yeah, the first time we used this model in the app, [29:14] we had kind of that same thing happen where all of a sudden there was just like this wall of text and we were at the bottom of the scroll. [29:22] And we were immediately like, all right, we need to smooth this thing out coming in. And so we actually do slow it down

29:28-30:58

[29:28] ever so slightly, just so that you can see the words come in like a little bit smoother. [29:33] That's so funny. [29:34] It's like a really funny problem. But this thing has been super fun. And I think [29:40] what I'm most excited about is [29:43] what sort of capabilities we can start to add to the app that are really, really dynamic, [29:50] that we couldn't with a model that wasn't this fast. So yes, this model is going to allow you to iterate really, really quickly. [29:57] But it also opens up a lot of new opportunities for how you [30:02] code and how you interact with the Codex app. [30:06] Thank you. [30:06] The first time I showed the very first prototype, when we hooked everything up, and obviously the model is powered by Cerebras, and it's like, [30:18] we've talked about the partnership there, and we're very excited to put the first model that we're serving through that out there. It's obviously still very early. It's literally the first time we hooked it all up, and we're just so excited that we want to share it. [30:36] But the first time I showed it to someone, [30:41] they were like, no way, this is like a fake demo. It's like, you know, this is not real. Like, this cannot be this fast. And then they tried like a few prompts. They were just like, oh, I literally cannot keep up. It's like, this is insane.
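
[Editor's note] The smoothing described at 29:22-29:28 (slowing a very fast token stream slightly so the words arrive evenly instead of as a wall of text) might look roughly like the sketch below. This is only an illustration of the general idea, not how the Codex app does it: the names pace_chunks and render, the 8-character slice size, and the 20 ms delay are all assumptions made for the example.

```python
import sys
import time
from typing import Iterable, Iterator


def pace_chunks(chunks: Iterable[str], max_chars: int = 8) -> Iterator[str]:
    """Re-slice bursts of streamed text into pieces of at most max_chars.

    Hypothetical helper: a fast model may deliver large chunks at once;
    cutting them into small, even pieces lets the UI reveal text steadily.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit full-size pieces while enough text is buffered.
        while len(buffer) >= max_chars:
            yield buffer[:max_chars]
            buffer = buffer[max_chars:]
    # Flush whatever is left once the stream ends.
    if buffer:
        yield buffer


def render(chunks: Iterable[str], delay_s: float = 0.02) -> None:
    """Print paced pieces with a small, assumed delay between them."""
    for piece in pace_chunks(chunks):
        sys.stdout.write(piece)
        sys.stdout.flush()
        time.sleep(delay_s)
```

Calling render(["hello world, ", "this is fast"]) would print the same text as the raw stream, just in evenly sized steps. A real client would likely adapt the delay to how much text is still buffered so it never falls far behind the model.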

30:58-32:28

[30:58] And yeah, I think this will change everything, especially because it's not yet the fastest that we can actually get it to be. [31:06] With the preview, we're putting it out quite early. We're actually going to layer a number of optimizations on top of it, which should be able to make it maybe 2 to 3x faster than what you have experienced. [31:20] That's going to change things. And we're thinking about this also from a point of view of delegation. We think this model has a huge role to play [31:28] as part of, you know, multi-agent systems. [31:32] And as a way to speed up maybe the slower... [31:37] more intelligent agent as well. So we're going to be experimenting in that way. [31:42] Hmm. [31:43] And do you expect the same hardware speedups on, like, the more intelligent agents to come out soon? [31:50] So a lot of the things that we worked on were interesting, so like distributed systems and, like, infra problems that we uncovered because [32:03] we were able to sample from the model at unprecedented speeds. And then if you're getting tokens back this fast, you need to go and optimize the entire set of bottlenecks that you uncover on the critical path of serving. [32:17] All of those benefit, you know, like, [32:20] the current, they benefit like GPT-5.3 Codex and, you know, all future models. And there's one thing that we've been doing as well,

32:28-34:09

[32:28] which I'm sure we're going to put in a more detailed blog post at some point, which is we wrote the entire serving stack to be based on WebSockets and a persistent connection and to do things a lot more incrementally and statefully. [32:40] And that decreases the overall latency across all models. We haven't shipped it by default yet, but it's [32:47] something that we are making the default for this new super fast model. And then we're also going to enable it on the other models. And it makes things... [32:56] It decreases overall turn latency by something like 30%, 40%. [33:01] I can. [33:02] We can look into the exact numbers, like [33:05] Yeah. What are the most surprising things that you've seen [33:08] using the model internally, in terms of what a speedup like this enables? [33:14] It just allows you to be super, super in the flow. [33:19] And, you know, you're almost just in real time, you know, sculpting the experience or, like, the code. It's just a very different feel to it. [33:27] It's very... [33:30] unsettling at first. And then once you get into it, it's very hard to go back to any other model. That's the feedback that we've seen. That's what I have felt myself. [33:39] And so it's like this very... [33:43] It takes like five minutes to adapt and then you're sort of like, no, okay, this is how I'm going to use this thing. [33:48] Yeah. [33:49] I also don't think that we've poked at [33:52] the full extent of what we could do with it. Yeah, it's true. It's very early. We haven't had it for very long. Yeah, someone on the team like Channing was just showing like, "Oh yeah, it's so fast and it can actually like play Pong." You know, not very well. But it's like the model is able to react to things, you know, almost in real time, right?

34:10-35:43

[34:10] It like... [34:12] You start to see how it might... [34:14] replace some deterministic steps. [34:17] So we have... [34:18] We have in the Codex app [34:20] a set of Git actions. [34:22] Right. [34:23] And as everybody knows with Git, [34:25] like certain configurations of things or certain states that you can be in can make it really hard to run those without a ton of error handling and all sorts of error messages and guidance. And it's really hard to create a good Git experience, [34:40] which is why nobody ever has. [34:43] But if you have a model that's almost as fast as running these scripts, [34:48] then you can imagine a world where these things turn into skills [34:51] or something like that. [34:52] And you can have... [34:54] your operations run a little bit differently with some, [34:58] like, some intelligence, [34:59] and [35:00] not have the same latency that you have today when you're asking it to go track something down in the codebase. [35:05] You can kind of vaguely gesture and be like, hey, send this up, and have that be fast enough for a button. [35:15] What I'm very excited about is when it's going to come together. One thing that we shipped with 5.3 Codex as well is this thing [35:21] that we call, like, mid-turn steering, you know, where you're... [35:25] you start with your prompt, it gets to work, and then you send another prompt, like, [35:30] while it's still working, and it adapts in real time as well. It will just sort of receive that message, acknowledge it, and then continue its work. Like if you start to think about, okay, what would this look like with voice

35:43-37:27

[35:43] and then with a model that is as fast as the one that we just shipped, then [35:49] that's like a whole other experience that we would be very excited to bring, [35:54] you know, hopefully very quickly. [35:55] Hmm, because you can easily interrupt [35:58] as you're talking. Yeah, if you're just talking and engaging with, you know, natural language, and then doing the mid-turn steers, [36:05] and then the implementation happens like almost, [36:09] not instantly, [36:10] because of the speed, it becomes a very pleasant thing to use. Right now you can sort of emulate it with voice dictation, and then send it, and mid-turn steering, and then watch the model implement. And it's a very cool thing. I think we're going to have a step change [36:26] in that experience when we just really polish it. [36:30] If speed as a bottleneck is like [36:32] close to being solved, what do you think is the next [36:35] bottleneck, the next limit on making the thing you want? [36:39] Thank you. [36:39] The bottleneck that is very apparent is how fast can you verify that things are correct? [36:49] So we can generate code faster than ever before. We can implement entire features. [36:56] And, you know, I saw like someone just, [37:00] based on a description of the Codex app, if you synthesize that into a plan, [37:05] just based on screenshots, like the models are very much capable of reproducing 95% of the features and just rebuilding the app from scratch. Now, is it going to be bug free? Is everything implemented to, you know, perfection in the same way that the actual app is? That takes a lot of time still, you know, for a human to go and click and verify and, you know, make sure that, you know,

37:27-39:00

[37:27] um, you know, the designs are consistent, that there's no bugs here or there, um, that the settings panel, you know, when you click that button it actually does the thing that you expect. I think verification, you know, definitely becomes a bottleneck. Like we have people on the team complain, you know, like, [37:43] there's too much code to review. It's like, you know, that's what we're trying to solve for. [37:48] Thank you. [37:48] I mean, you complain about that. I complain about that. There's so much code to review now, [37:54] um, [37:55] both on your own machine and from another peer. It's like, [38:01] we're going to have to figure that out. [38:02] Yeah, you're already reviewing the code the first time because the agent is just presenting it to you. And then you have to review the code [38:10] produced by your peers, you know, who are like, [38:13] there's like these two rounds of reviews and then [38:15] Yeah. [38:16] Yeah, I mean, this is something that we're working on. A lot of us still do have to review code. [38:22] And we want... [38:23] You know, we're taking a look at what that experience should look like [38:27] with the model involved, right? [38:29] We've got [38:30] the review mode in the Codex app that works really nicely and kind of annotates your diffs on the side [38:35] with findings and [38:37] stylistic things and, um, [38:40] lots to do. [38:41] Yeah, it's one thing I'm also excited about, making the models faster. And then this one that we just put out, which is [38:49] mind-blowingly fast, is like, you can also use it, you know, you can imagine using it [38:53] in a way to understand code, understand features, you know, helping you with code review, like helping you understand, like, you know,

39:00-40:31

[39:00] what a peer wrote. And it's like, [39:03] much more pleasant because this is something that you want to do, like, you know, you want to be [39:07] there in the flow. It's something that has to be synchronous. It's not something that you delegate. You cannot delegate understanding, right? It's like, you know, you're trying to get to [39:16] understand something. And so speed there is a real advantage. So it sort of helps offset as well, you know, the fact that [39:24] models are producing more and more code. Speed helps you understand this code faster as well. [39:30] Yeah, I mean, I definitely think... [39:33] I found this already with this new model: [39:37] speed, especially for end-to-end testing, is better, because if you're having it do end-to-end testing, like manual integration testing, [39:44] often there's like a toast that pops up, and it pops up for like a second. And if the model's not fast, it's not going to [39:49] get it. [39:50] And it seems like it's better for that because the cycle times are much, much shorter. And I definitely find this, too. [39:58] I can produce so much code, but when I see a PR come in, or when I make a PR, my first question is like, [40:05] is there evidence that you've actually tested this and this actually works, like not just unit tests, like you've gone through it end to end? [40:12] How do you handle this? [40:15] I mean, I've seen a lot of PRs that I have the same question about. [40:19] It's so easy to code things now, right? [40:23] Um, [40:25] Yeah, I mean... [40:27] We have gotten the Codex app to be pretty good at...

40:32-42:10

[40:32] through some skills that we have, [40:34] running itself, [40:36] clicking around, screenshotting itself for evidence, and uploading it to the PR. [40:42] There's a lot that's pretty interesting there, especially when we... [40:46] make this more async or when, you know, [40:50] the models get really fast at this stuff. Like, [40:52] I don't know exactly what it looks like yet, but there is a lot there around like, hey, here's a bug fix. This is exactly what it looked like when it was happening. [41:02] And here's exactly what it looks like now with the same exact click path. [41:05] And so maybe that's the turning point where code review becomes [41:10] less important, when you can verify that part instead. So you have to kind of do less through the code as a proxy. But there's definitely more to explore there. [41:20] Last couple of questions. I'm curious, what have you guys learned from Anthropic and Claude Code, and how do you think about your positioning [41:30] in the market versus them? Like, how do you think about the differences? [41:34] I think they were... [41:37] first to put something out there. And that was interesting to us because we had been working on similar ideas for a bit. But I think our models were a little bit, [41:48] at the time, not ready. They were not reliable [41:53] on long-horizon tasks, so they were not able to do like [41:56] reliable tool calls and stay on topic. [41:59] As soon as we started to really invest in that, and especially with GPT-5, we were like, okay, the models are there. We know how to make them even better.

42:10-43:43

[42:10] 5.2, like, brought, you know, even better, like, [42:15] long-horizon reliability, long-context understanding. And what we were seeing is that [42:22] Anthropic was sort of, like, [42:24] to us, losing a little bit of steam when it came to the model. And [42:30] we were in this fortunate position where the way that we run Codex is like, you know, we've got product, we've got engineering, but we've also got research, and we just all work together and sit together and solve problems together. [42:41] And it's like a highly creative space where, you know, at times we decide to solve problems in the product, in the harness. But at times we're also like, hey, how can we actually improve the model? And like, let's just, you know, talk about it and, you know, ideate together. And then research will come and be like, hey, you know, we've got this breakthrough that we're sitting on. Would this be something we can ship? And then we just sort of get excited about that. One of the examples was we had a lot of complaints on compaction. [43:11] Whenever you would hit compaction, people would complain, it's losing too much context. And so we sort of solved that end to end, and we decided to do end-to-end RL training and introduce compaction within research and then make the model itself very familiar with the concept of compaction and producing optimal delegating to itself across time. [43:34] And once we had that and we had solved it at the model level, the harness problem became so much easier because it was just like, oh, just let the model do it and it's going to be very reliable.
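
[Editor's note] For readers unfamiliar with the term, compaction (43:11) generally means replacing older conversation history with a shorter summary once the context grows too large. The sketch below shows only that generic, harness-side idea; it is not OpenAI's approach, which per the speaker was solved at the model level through training. The function names, the character-count budget (a stand-in for a token budget), and the placeholder summarizer are all assumptions for illustration.

```python
from typing import Callable, List


def naive_summary(older: List[str]) -> str:
    """Placeholder summarizer; a real system would ask a model to summarize."""
    return f"[summary of {len(older)} earlier messages]"


def compact_history(
    history: List[str],
    budget_chars: int,
    summarize: Callable[[List[str]], str],
    keep_recent: int = 2,
) -> List[str]:
    """Return history unchanged if it fits, else summarize the older part.

    Hypothetical helper: character count is used as a rough proxy for
    context size, and the most recent messages are always kept verbatim.
    """
    total = sum(len(message) for message in history)
    if total <= budget_chars or len(history) <= keep_recent:
        return list(history)
    older = history[:-keep_recent]
    recent = history[-keep_recent:]
    return [summarize(older)] + recent
```

The complaint mentioned in the episode, losing too much context, corresponds to the summarize step here: whatever it drops is gone for the rest of the session, which is why the quality of that step matters so much.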

43:45-45:20

[43:45] Through that collaboration, it just felt like [43:49] the momentum has been very strong and we're able to improve models and ship a model roughly on a monthly cadence. [43:57] And then we took like a bit of a different bet and a different approach with the Codex app, which turned out to be, you know, an awesome thing, you know, to just try and do. Is it going to not just like sort of like force ourselves, you know, and like, [44:10] trying to cram everything into the TUI. [44:12] I mean, it was like a great challenge, right? You know, you were like, I'm just like, [44:17] let's build an app, like, where do I get started? And then, you know, you just got obsessed by it. [44:23] It's hard not to. Yeah. I mean, it's like, how was it to just like... [44:27] you know, build something that was quite contrarian, I suppose? [44:32] Yeah, I mean, I remember you and I talking about [44:35] whether or not, like, [44:37] early on we were like, we don't know if we'll ship this. [44:40] We'll try it out. [44:43] We'll see if we can get there with something that we love and see if we can get... [44:47] I remember saying, let's get some PMF internally. [44:51] Let's get everybody at OpenAI to want to use this thing without being forced to use it. [44:56] Let's see if we can do it, right? We did. And it was like, [45:00] adopted very quickly. I mean, the minute it was barely usable, [45:05] the research folks, [45:06] like, [45:07] put dev boxes on it, right? Like, which was this crazy hack at the time. Yes, yes. But now they use it like for everything. [45:14] Yeah. [45:15] Yeah, it was like including in training, like, the 5.3 Codex. And so like I think

45:20-46:39

[45:20] I feel really good about having hit the point where like, you know, there's like, [45:24] everyone technical at the company, like almost everyone technical at the company, uses Codex, but the people who use it the most are, you know, actually building Codex and building the models. [45:32] And so, you know, we're just able to, like, you know, improve things at, like, [45:36] crazy, crazy speeds. And, you know, there's like no signs of it slowing down. [45:40] Amazing. Well, I'm excited for what you ship next. [45:43] Um, thank you guys for your time. I really appreciate it. [45:47] Thank you. Thank you for having us. Thanks. [46:17] insights, and laughter that will leave you on the edge of your seat, [46:21] craving for more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. [46:28] So do yourself a favor, hit like, smash subscribe, and strap in for the ride of your life. [46:34] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.
