The Secrets of Claude's Platform From the Team Who Built It
In the future, you’ll be able to accomplish a goal by just giving Claude an outcome and a budget. That’s the direction Anthropic is building in with its new Managed Agents features, announced at this week’s Code with Claude developer event. The basic idea: Claude, wrapped in a computer in the cloud, that you can spin up, scale, and manage as needed. Anthropic is taking on the infrastructure that kills most agent products, and making sure that it scales to meet the needs of agents running 24/7.

On this week’s AI & I from @every, I talk with Angela Jiang (@angjiang), head of product for the Claude platform, and Katelyn Lesse (@katelyn_lesse), head of engineering for the Claude platform, about what Anthropic is building and what it takes to make agents reliable in production.

If you found this episode interesting, please like, subscribe, comment, and share!

To hear more from Dan Shipper:
- Subscribe to Every: https://every.to/subscribe
- Follow him on X: https://twitter.com/danshipper

Timestamps:
- 00:01:48 - How the Claude platform evolved from API to agents
- 00:04:09 - The primitives that make up Claude Managed Agents
- 00:10:37 - Why the harness and the model are becoming a single unit
- 00:18:49 - The infrastructure wall that kills most agent projects in production
- 00:24:49 - Why team agents need a different shape than individual productivity tools
- 00:26:36 - How Anthropic's legal team uses an agent to review marketing copy
- 00:34:24 - Using multi-agent orchestration for advisor strategies, adversarial pairs, and swarms
- Published: May 8, 2026
- Uploaded: Jun 12, 2026
- File type: POD
- Source: share.transistor.fm
Full transcript
AI-generated transcript with timestamped sections.
[00:00] A year from now, where do you think the platform will be? We'd want to experiment with directions where Claude actually gets so good at understanding itself. It figures out what model you should be using. It figures out how to spin up all the sub-agents. You don't have to think so much about what kind of architectures are there, because Claude is actually able to understand itself enough that it can write itself on the fly. In that world, if Claude is on the fly or agents on the fly are becoming what they need to become in order for you to do what you're trying to do, the platform has to seriously scale. [00:30] really what I'm asking. [00:44] Angela, Katelyn, welcome to the show. [00:47] Thanks for having us. Yeah, thank you. So for people who don't know, you both work on the platform at Anthropic. So Angela, you're the head of product for the Claude platform, and Katelyn, you are the head of engineering for the Claude platform. Um, [00:59] I'm really psyched to talk to you because, A, you've been launching a bunch of stuff. You have Claude Managed Agents that came out recently. You've been launching new features for it. And I think that it... [01:09] It comes at this really interesting time where it makes me think about what actually is a platform in AI for a model company, because... [01:18] In the GPT-3 days, the platform was a completion endpoint. You just, like, send a prompt to get a response. After that, it was, like, a completion endpoint with tool calling and a couple – and, like, chat sessions, like that kind of stuff. And now, like, with Claude Managed Agents, you're essentially getting a Claude on a computer with memory and all this other stuff. So I'm just trying to – I'd love to help –
[01:41] I'd love for you to help me unpack that trajectory and what it means to build a platform in AI. [01:47] Yeah. I think your characterization is very accurate. I think as a lot of these technologies have evolved, with the LLM first starting, and then I think putting that behind an API was very fun. A lot of people were like, wow, I could... [02:01] do some at the time, I think it was very cool. Now we'll probably look back and be like, oh, that was like really basic. And then, you know, I think like we've moved more and more towards like a slightly more like stateful world as you kind of like want to persist the session state to be able to make sure that the kind of performance of the model is like better and better. I think that that's probably like actually the through line. Like as a lot of these kind of like as we make improvements to Claude and as it continues to get better and like more [02:31] basically needing to evolve the platform to be a higher- and higher-order abstraction, but it's in the pursuit of helping you get the best outcomes out of something. [02:42] I think in the very beginning, you know, we were very, like, everyone was very exploratory. It's like you have no idea what people are going to build with these LLMs, and you wanted to kind of have as much possibility out there as available. And then as those use cases started to kind of narrow down, like, people started building products with it. People started now, like, building agents with it. And more and more of that is about, you know, like, customers coming to us and being like, how do I get the best out of Claude? How do I, like, set up my tools? How do I run the loop? And so on and so forth.
[03:12] the edges, and that's great. And then you have just a whole host of other folks that are coming in who are like, "I kind of want a lot of this stuff out of the box." And in our pursuit of making sure that Claude is basically producing the best outcomes, we find ourselves enriching the platform to be richer and richer and richer, and that's an [03:30] contained in that is like both the state, it's like the tools that you start to see us adding. It contains a lot of kind of like almost like [03:37] sort of like cloud components of a lot of these types of things. But it's in pursuit of the same mission of like just making things literally as easy as possible. And I think in probably, you know, the forward state of a lot of these things in terms of maybe the philosophy of what a platform ultimately ends up doing, it probably ends up just being like whatever it's like the set of primitives and infrastructure that enables you to basically get the outcome as fast as possible with actually as little work as possible. And I think that that tends to follow [04:07] Um, but yeah. How would you characterize like what the primitives are today? So maybe that's just asking, what are the primitives in Claude Managed Agents? [04:15] Yeah, so Claude Managed Agents is built on all of our same primitives that you could otherwise build on directly, so the Messages API. And within the Messages API, we've built a whole bunch of, I guess, maybe innovations around the API. Like, you could just get tokens in and out if you really wanted to, but, you know, you can use some of our built-in tools. You can use stuff like code execution, spawn a sandbox, and execute work.
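A rough illustration of what "tokens in and out" plus tools means in practice: the harness, not the model, runs the loop and executes the tools. The sketch below is a hypothetical, self-contained Python example with a scripted stand-in for the model; `ToolCall`, `FinalAnswer`, `run_agent`, and `scripted_model` are illustrative names, not the Messages API, whose real tool-use format (content blocks, tool results, stop reasons) is richer than this.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical types for illustration only; this is not the Messages API.
@dataclass
class ToolCall:
    name: str
    argument: str

@dataclass
class FinalAnswer:
    text: str

Step = Union[ToolCall, FinalAnswer]
Model = Callable[[list], Step]  # takes the transcript, returns the next step

def run_agent(model: Model, tools: dict, task: str, max_turns: int = 10) -> tuple[str, list]:
    """Ask the model for a step, run any requested tool, feed the result back."""
    transcript = [("user", task)]
    for _ in range(max_turns):
        step = model(transcript)
        if isinstance(step, FinalAnswer):
            transcript.append(("assistant", step.text))
            return step.text, transcript
        # The harness executes the tool and records both the call and its result.
        result = tools[step.name](step.argument)
        transcript.append(("tool_call", f"{step.name}({step.argument})"))
        transcript.append(("tool_result", result))
    raise RuntimeError("agent did not finish within max_turns")

def scripted_model(transcript: list) -> Step:
    """Stand-in for a real model: calls one tool, then answers using its result."""
    last_role, last_text = transcript[-1]
    if last_role == "tool_result":
        return FinalAnswer(f"The file has {last_text} lines.")
    return ToolCall("count_lines", "notes.txt")

# An in-memory "file system" so the example runs without a sandbox.
fake_files = {"notes.txt": "a\nb\nc"}
tools = {"count_lines": lambda path: str(len(fake_files[path].splitlines()))}

answer, transcript = run_agent(scripted_model, tools, "How many lines are in notes.txt?")
print(answer)  # The file has 3 lines.
```

The point of the sketch is the division of labor: the model only proposes steps, while the harness decides how tools run and what goes back into the transcript.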
[04:45] search and all these sorts of different things. And so I think we've taken what we see as all the most powerful of those things and put them together into a harness and a set of infrastructure that is, you know, just the way to get what we think are the best outcomes out of Claude. So I'm sitting here feeling this sense of, I've been thinking of it as like time deflation, like my time gets more valuable [05:09] in the future as opposed to the opposite. Whatever the opposite would be, my time gets less valuable in the future. And the reason is because we're, so for example, internally for us, we're building an agent. We're building some agent products where it's like agents that do specific things for us internally and then hopefully for customers. [05:28] And in order to do that, we have a couple Mac minis with Claude running in a loop on the Mac mini, right? And a lot of that... [05:37] And it's like a thousand-line Python file or whatever. And a lot of that mirrors what you guys are building in Claude Managed Agents. [05:45] So for me, and I think for a lot of people building on Claude or on the Claude platform or ecosystem, there's – [05:53] at least I feel this, maybe we should just wait for you guys to build it. Um, but then I don't know what the lines are and, uh, [06:01] And I, yeah, I'm sort of wondering if I want to build an agent, like what is the best path to do that in a way that aligns with what you guys are doing? [06:11] Yeah, I think this part of the platform business is actually somewhat similar to any other form of platform business where you do have customers like yourself who are building and you're kind of thinking, should I...
[06:23] go ahead and do it because maybe I have this like immediate need, but at the same time, I don't kind of want to like, you know, repeat the work per se, when you could have just gotten it for free, um, out of the platform. Um, and also infrastructure sucks. It's so, it sucks so much to like spin up servers. I can't believe you do that. [06:42] That's like, I hate to start the job. That part, everyone's like, that's the worst. But I will [06:53] Anthropic ourselves had gone through enough of these iterations where we built products that were agents that you could run autonomously in the cloud. And we did that sort of work, standing up the infrastructure so that it works well, enough times that we ourselves were like, okay, we're done building this for ourselves. We're doing it once in a way that's going to really work from everything that we've learned, but also for all the people who are doing it. Like, you can run whatever you're running on a couple of Mac minis maybe, right? And for a lot of people that could work. [07:23] But I think if you're building agents into your product and you're running something really at scale, right, like that's where it really starts to become more and more challenging to get that infrastructure right. That's really interesting. Yeah. And then maybe to answer the other part of your question, I think we have like two pieces of the philosophy here. One is a bit in the way that we kind of design managed agents, which is that we try to have it be modular enough. Like we want to be opinionated about some pieces that we feel like should be very well, like, married to the Claude model. But then we... [07:50] Uh, like oftentimes, like the way we want, for example, we want Claude to like very specifically use like file systems. Um, that's like a very particular, like, Claude... In a specific way or just file systems in general? Just file systems in general. We also really want to lean into skills.
I know like a lot of folks like skills, but like that's something that we want to have our harness be really opinionated about. And so we're kind of particular about like those kind of primitives being the case. So like use the file systems, use the skills. They're really basic. Um, but at the same time, like we still find people who are like still trying other methodologies to go do that.
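To make "use the file systems, use the skills" concrete, here is a minimal sketch of one way a harness might keep skills on disk and load them lazily: only short descriptions are listed up front, and a skill's full instructions are read when the agent picks it. The folder-per-skill layout with a `SKILL.md` file whose frontmatter holds a `name` and `description` loosely follows how Anthropic packages Agent Skills, but the parser and the helpers (`write_skill`, `skill_index`, `load_skill`) are hypothetical simplifications, not Anthropic's implementation.

```python
import tempfile
from pathlib import Path

# Hypothetical helpers; a simplified stand-in for how a harness might use skills.
def write_skill(root: Path, name: str, description: str, body: str) -> None:
    """Create <root>/<name>/SKILL.md with a small frontmatter header."""
    folder = root / name
    folder.mkdir(parents=True)
    (folder / "SKILL.md").write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n{body}\n"
    )

def parse_frontmatter(text: str) -> dict:
    """Read simple 'key: value' lines between the opening and closing '---' markers."""
    lines = text.splitlines()
    meta = {}
    if lines and lines[0] == "---":
        for line in lines[1:]:
            if line == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def skill_index(root: Path) -> dict:
    """Cheap listing: only names and descriptions, suitable for the prompt up front."""
    index = {}
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text())
        index[meta["name"]] = meta["description"]
    return index

def load_skill(root: Path, name: str) -> str:
    """Full instructions, read from disk only when the agent decides it needs them."""
    return (root / name / "SKILL.md").read_text()

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_skill(root, "legal-review", "Check marketing copy against legal rules",
                "1. Flag claims that cannot be substantiated.")
    index = skill_index(root)
    full = load_skill(root, "legal-review")

print(index)  # {'legal-review': 'Check marketing copy against legal rules'}
```

Keeping skills as plain files means the same tools the agent already uses on a file system are enough to discover and read them.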
[08:20] you, you know, when you build, to just kind of start on the best foot. So that's one piece on some of the kind of more opinionated ones. But as each one of these kind of like, you know, endpoints or APIs that we have as part of the suite, we try to like open them up a little bit in certain areas. So there's like things that, you know, we're looking kind of forward to and being like, you know, from maybe it's not available today, but in our design, we are trying to make it flexible enough for people to kind of like add in different pieces because we recognize that this API or suite of APIs is not [08:50] like maybe everything in its original construct, and there are going to be pieces that need to kind of open up. And then the second bit is like, you know, we're kind of public about this, is like when we do design a lot of these things, we do put out like blog posts and sort of like reference implementations. So if you did want to kind of at least be inspired by that construct, but still maybe make your own on the Messages API, you can definitely do that. I think that's, to the point you just made, that's something that's coming up for us. [09:20] like, you know, bigger, more serious implementations on, like, you know, cloud infrastructure that we're trying to figure out what to do with. [09:28] And I think I told the team that we were talking today, and I think one of the... [09:34] One of the questions that they have or one of the feelings of consternation that they have considering using Claude Managed Agents for this kind of thing, for spinning up agents for our customers, is just... [09:45] Right now, it's like... [09:47] we have a playground. We have... We just have, like, a little... We have a server or a Mac mini. We can just, like...
[09:53] pipe stuff to, to Claude. It can do anything that Claude Code can do. It has a file system. It has a browser. It has like all this stuff. If we want to, you know, switch it, switch it out to GPT 5.5 or Gemini or whatever, it's like pretty easy to, to do that. Um, [10:08] So I feel like they feel like if we use Claude Managed Agents, we're going to get locked in and we're not going to have the flexibility to do all the stuff that we want. [10:18] And it... [10:20] There's also a worry that features are going to come to Claude Code itself that won't be in Claude Managed Agents for a little while, and that it'll prevent us from being at the edge, which is sort of what we promise to our customers and really to ourselves. Like, we just love being like just doing whatever the new thing is. How do you think about that? [10:37] Yeah, so I think what's nice about the way that we work internally, I guess, is like, so we run the platform and the platform for what most people think of it as is our externally facing APIs and our suite of APIs. [10:51] The rest of what our team actually does is internal platform in the sense that all of our first-party products are built directly on the same platform as everybody else. And so what's cool about that is we spend all of our time, not all of our time, but a lot of our time working with the teams internally who are building on top of the platform and kind of enabling the features that they will build, sharing ideas and these sorts of things. And so I think over time, you'll maybe see less and less divergence of – [11:19] You know, like what might be available in Claude Managed Agents, what might be available in Cowork or Claude Code that might sit on top of the same infrastructure, right? Like that's, I think, one way to think about that.
[11:32] yeah [11:32] And then I think on your point around, or your team's point around, having some kind of model lock-in fear, I think that that's valid. [11:39] Many folks kind of have that consternation. And I think we're kind of at this place where there's a bit of like an evolution here where, you know, if you look back [11:47] maybe even just a couple months ago, it was very standard to kind of build a very, very, very generic harness. It's super generic, and then you can kind of hot-swap models across all of those things. And I think for kind of an older generation of models across labs, that kind of worked like okay. A lot of things were moving at a pace where I think that that was like mildly reasonable. I think now for the next kind of generation of models, and as we kind of see it forward, I think you kind of see this a little bit from every lab. [12:17] form of the model. And so in theory, I guess you could do kind of the superset of all those things. But more often than not, I think, you know, like when you build agents for your company or for your customers, you do want to deliver like an outcome ultimately for them. And so I think that that level of abstraction of like what you're actually hot-swapping stops becoming this like really generic harness and hot-swapping the model. And it gets more to like the harness and the model get very paired. You still need redundancy and you still might want to use other models for things, but you probably do it at the layer of like the agent, meaning like the harness plus the model, [12:47] rather than necessarily the other architecture of like, you know, really, really generic harness and hot-swapping everything underneath. That's really interesting. Is that how, I don't know, the Cursors of the world are doing things? Like, do they have a separate harness for each model or is it a generic harness that they're kind of hot-swapping the models in and out of? Do you know?
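The shift described here, from one generic harness with hot-swapped models to redundancy "at the layer of the agent", can be sketched as follows. This is a hypothetical Python example: `Agent`, `run_with_fallback`, the two harness functions, and the model names `model-a` and `model-b` are placeholders, and the failing harness simply simulates an outage.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical unit of redundancy: a harness tuned for one specific model.
@dataclass(frozen=True)
class Agent:
    name: str
    model: str
    harness: Callable[[str, str], str]  # (model, task) -> output; raises on failure

def run_with_fallback(agents: list, task: str) -> tuple[str, str]:
    """Try each harness+model pair in order; never reuse a harness with a different model."""
    errors = []
    for agent in agents:
        try:
            return agent.name, agent.harness(agent.model, task)
        except Exception as exc:  # e.g. a timeout or provider outage
            errors.append(f"{agent.name}: {exc}")
    raise RuntimeError("all agents failed: " + "; ".join(errors))

def primary_harness(model: str, task: str) -> str:
    raise TimeoutError("provider unavailable")  # simulated outage

def backup_harness(model: str, task: str) -> str:
    return f"{model} finished: {task}"

agents = [
    Agent("primary", "model-a", primary_harness),
    Agent("backup", "model-b", backup_harness),
]
used, output = run_with_fallback(agents, "summarize the report")
print(used, output)  # backup model-b finished: summarize the report
```

Each fallback candidate is a complete harness-plus-model pair, so a backup never runs through a harness that was tuned for a different model.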
[13:04] I'm not entirely sure. My intuition would be that, like, I don't know about Cursor in particular, but there have been, like, teams that we have talked to who have kind of fallen on similar kind of perspectives. And it's mostly because they're just trying to squeeze the most out of each model to kind of, like, [13:18] almost like harness engineer, like every single like nuance. And, you know, one example that we have, it's not an external customer per se, but something that we've done a lot internally, like we recently launched like memory, for example, with with managed agents. And we tried a bunch of different harnesses ourselves, like we tried one that was like the one that we ended up launching, we tried a bunch of others using a bunch of different other techniques. And at least personally for myself, like when I saw the kind of like eval suite from like the team, but each one of these [13:48] And so I think just even looking at something like that shows you that you can actually hill climb a tremendous amount by just harness engineering the right pieces together. And I think if you were to just take that forward across all model combinations, across all different labs, all different kinds of providers, there is a lot of alpha in that kind of construct. And so I wouldn't be surprised if more than just ourselves have experimented with that level of unit tying. [14:18] you tool calls or whether you have the model want to use file systems or not. And then that sort of like – [14:25] changes the trajectory of all these different models. And it feels like maybe at the time, like such a small... [14:30] almost like, you know, kind of like footnote. But it ends up becoming very big. Do you think that that will end up affecting the model's generalizability in the sense that at some point, they'll just have these sort of maybe locked in lanes of stuff that they're good at because they're, you know,
[14:48] Claude is really good at file systems and OpenAI is, you know, GPT is good at some other things. Like, [14:54] Yeah, how is that going to... [14:56] How's that going to... [14:57] flow through the model's like personality and behavior if it's like locked into a specific way of doing things. [15:03] I do think it does actually kind of tend to lock the model. So what we end up... [15:08] kind of treating as, like, the right path and the right primitives need to be... [15:12] like very carefully thought through. And so like I think in some eras, you know, like of other models, they become really, really, really good at like reasoning. And then they almost like over-optimize on that level of reasoning. And there's other perspectives around like, okay, like, yes, we want to be really good at like a computer. Like maybe the computer part is the interesting part. And so if you think through maybe some of the primitives, which we could get right, we could get wrong, but at least we'll like go through the thought process of like, that will probably at least lead us, you know, one path or the other. [15:42] to say like you know in which direction per se will ultimately be true, but I do think there's a lot of like path dependency it ends up taking, so being really like thoughtful about what you choose to actually include or give kind of the model more natively is really important. Are there any of those path dependencies that you've had to undo? [15:59] Hmm. [16:00] Okay. [16:00] Um... [16:02] Probably. I can't speak to that enough at the Anthropic level. I've only been here like a couple of months, but I have to imagine that that has been the case. [16:12] I mean, we've experimented [16:14] like even at other labs, like the kind of like primitives that we have to take a look at are constantly changing.
[16:21] And you do kind of hit, like, a little local maximum and rethink, like, okay, maybe there's, like, a more generic approach. Yeah, yeah, yeah. Interesting. I want to take a step back and ask you something that maybe I should have asked at the beginning, which is... [16:32] Who is Claude Managed Agents for? [16:35] I set one up earlier today. We've got some people already using it in production inside of Every, and I just did one today. I really loved the getting started chat experience that you had and some of the examples that you had, and it felt to me like... [16:52] even if I was not technical, I might want to use this to set up an agent. It might be a little bit complicated, but what I actually did is I just, and I'm sorry to say this, but I did it in the Codex in-app browser. So I had Codex driving the managed agent setup, and I had a Slack bot working pretty quickly. It was really cool. So how do you think about when you're designing stuff, when you're designing Claude Managed Agents, who it's for? Yeah. So it's interesting because I think you're right that especially with that quick start experience, which we actually felt [17:22] strongly about launching, not specifically for the sake of making it so that non-technical people could go and build agents, but actually just for anybody, technical or not, to be able to wrap their head around the primitives, like the APIs. Here's what I can do, and here's what fits together. Yeah, exactly. Like, you know, the kind of education portion of it. But I think when we think about who it's for, we think about a couple different things. One is we're seeing people internally within companies build automation or build really powerful platforms or systems. Like,
[17:52] say, I want, you know, a full end-to-end software development platform, right? And like Managed Agents is a perfect solution for something like that. Or, you know, I want to automate a little process over here where like legal has to review my marketing copy, right? And things like that. And so you shouldn't have to re-implement memory and like all that stuff every time you're doing that, right. You can get started really quickly and you can get something running quickly. The other user that's top of mind for us is people building into their products that they [18:22] And so that's the other one where actually, yes, like you do still want a lot of customization. You do still want to make something that's going to be really powerful for your product. But we still like definitely, definitely believe that not spending your engineering resources on the infrastructure and on all the little harness engineering tweaking sort of stuff is worthwhile. Why couldn't we have talked like a month ago? You would have saved us so much time. [18:46] We'll just need to talk more. But I am sort of curious... [18:51] Okay, so maybe infrastructure is one of these things, but... [18:53] When you see people setting up agents, what do you see them think the hard thing is and what ends up actually being the hard thing? Are they the same? [19:01] Good question. Maybe this is, I don't know, spicy, I'm not sure. But I think people think the harness engineering part is the hard part. And so actually, like, you know, in the past, we launched the Agent SDK, which is what you guys, I think, are using on your Mac minis. [19:18] And for a lot of people, they were like, okay, great. I don't have to do the harness engineering part where I have to do prompt caching and I have to maximize my context window and all these sorts of things. I think we're actually just using Claude in bash, like the claude -p command. Oh, wow. Yeah. Okay. It's pretty good. Yes. Yeah. Cool. Nice. Okay, cool.
But regardless, like you guys did that because it takes off your hands building the harness, right?
[19:48] scale it and everybody hits an infrastructure wall. Like everyone hits the same problem of like, oh, wow, I either need to like keep a server constantly running, or I need to use infrastructure that will spin up and spin down, and I need to store the transcript data and I need secure sandboxing and all these sorts of things. And so, you know, and like, if you boot a Claude Code session, or you boot the Agent SDK in a sandbox, and like, that's the thing that you have running, but your sandbox loses connection and dies or whatever, your whole agent dies, right. And so [20:18] part especially is the wall that most people end up hitting, but they're more expecting that the actual harness engineering and getting the most out of the model is the part that's going to be harder. Yeah, I totally agree with that. I was just going to say, we talk to so many people who are [20:33] now at a place where they're prototyping really quickly, and they're super excited, and it's doing the thing. And yet there's a class of people who are really pushing and being like, okay, I do want to hill climb. I really want to edit the harness. But then once you have that thing, productionizing is just a freaking nightmare, especially for the more interesting, long-running, async ones that you want to do a bit more remotely, that are a bit more autonomous. And everyone kind of runs into that wall. It was a big inspiration for why we built what we built. [20:59] I feel like one of the... [21:02] like, er, examples of the shape of an agent is OpenClaw. [21:06] And in particular, the thing that it has brought to us internally is you have an always-on agent in Slack that has its own personality and has its own like part of the world that it like ends up working on.
[21:20] Are you guys like, is that a possible future for like, OK, a one-click agent that lives in my Slack that, yes, I can go set up all the internals, but like I don't have to really think about all of the, you know, the technical infrastructure stuff, because I think you all have [21:36] the beginnings of that, but it's still like a lot of steps from the current managed agent to something that's always on in my Slack that I have to like set up and customize. So is that, does that fall in the realm of the platform's job or is it like too far in the product direction? [21:51] No, it definitely is something that we really want to do. I think, like, you know, we focused a lot on kind of the infrastructure piece to start because that's where we just see a lot of these, like, pain points. But, yes, like, I think in, like, it's, like, you know, I don't want to exactly say final shape, but in its, like, advanced shape, we actually want to make it so that you can kind of deploy these agents really, really easily. Like, we've made, like, some light steps in this direction. Like, for example, we included Vault as one of the primitives as just kind of... And Vault stores your, like, keys and stuff, like your OAuth keys. Credentials. Credentials, yeah. [22:21] Yeah. As like, you know, kind of solving some of the lower-level pieces as a starting point. But once you kind of wrap some of these more sort of like agent identity type of primitives in a more secure way and you can handle it really easily and it works with like the whole system, then, you know, I think it's very natural for us to get to a place where maybe you are either one-clicking a Slack integration or alternatively, you may be just telling, you know, Claude, like, add Slack and it just like handles absolutely everything. And then before you know it, your little bot is just pinging you on Slack.
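One common way to keep credentials out of an agent's context, in the spirit of the Vault primitive mentioned here, is to hand sessions opaque handles and resolve them only at the tool boundary. The sketch below is hypothetical: the `Vault` class, its methods, and `post_to_slack` are illustrative names, not the Claude Managed Agents API, and no real Slack call is made.

```python
import secrets

class Vault:
    """Hypothetical credential store: sessions hold opaque handles, never raw secrets."""

    def __init__(self) -> None:
        self._secrets: dict[str, str] = {}
        self._grants: dict[str, set[str]] = {}  # session id -> handles it may use

    def store(self, secret: str) -> str:
        handle = "vault:" + secrets.token_hex(8)
        self._secrets[handle] = secret
        return handle

    def grant(self, session_id: str, handle: str) -> None:
        self._grants.setdefault(session_id, set()).add(handle)

    def resolve(self, session_id: str, handle: str) -> str:
        if handle not in self._grants.get(session_id, set()):
            raise PermissionError(f"{session_id} may not use {handle}")
        return self._secrets[handle]

def post_to_slack(vault: Vault, session_id: str, token_handle: str, text: str) -> str:
    """Tool boundary: the secret is resolved here, so it never enters the model's context."""
    token = vault.resolve(session_id, token_handle)
    # A real tool would call Slack with `token`; this sketch only reports success.
    return f"posted {text!r} using a token of length {len(token)}"

vault = Vault()
handle = vault.store("xoxb-example-token")  # placeholder, not a real token
vault.grant("session-1", handle)
result = post_to_slack(vault, "session-1", handle, "hello")
print(result)  # posted 'hello' using a token of length 18
```

The agent only ever sees strings like `vault:…`, and a session that was never granted a handle gets a `PermissionError` instead of the credential.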
[22:51] I can't wait for that world. [22:55] What are the best internal use cases of agents? Because I think there's this big question happening right now where... [23:01] OK, yeah, everyone's in Codex or Claude Code, but then now we have these agents that are out in the cloud. [23:06] Now everyone inside of a company can like have their own agent. There are team agents that are company-wide agents. So what are the patterns that you see for when people make really useful internal agents, what they do and what they look like? [23:16] Yeah, I would say we, similar to, and we've actually seen a few examples of these in some of the more like AI-pilled, AGI-pilled companies like Stripe built Minions. And they talked about that a lot as their kind of like end-to-end development platform that their engineers could use. I think Ramp did something similar and we've done similar things as well, right? That's interesting. Yeah, we've built kind of platforms internally that are, you know, I have agents running that I can talk to from Slack or from wherever, right? [23:46] Um, at a certain point that becomes actually like a pretty thin layer on top of managed agents. Like you don't have to do very much to accomplish it. That's what I was thinking. Like I looked at Minions or whatever Ramp does and I was like, [23:58] Why? Why? You know, so is it actually useful to have a sort of like thin... [24:03] coding agent that anyone in the company can use? Or like, why not just install the Claude app in Slack? Yeah, I would say the difference in a platform like that, and some of the things that we've done internally, is there's a lot of customization that you might want to do on, you know, the development environment where an agent is actually running and able to verify its changes, right? And things like that. It's like, here's how our CI/CD works. Yeah, exactly. And so, you know, I think for lots and lots and lots of people, like Claude Code is an excellent
[24:33] agents with Claude Code and that is really great. But I think if you're trying to do a bit more end-to-end development, right, and you maybe want to bake in more custom things, then you could start with something like managed agents and build a layer on top of that and end up with something that's maybe closer to [24:47] that end-to-end experience. It also seems to me like there's something in particular about [24:52] having a team that you need to work with that makes the managed agent shape important, as opposed to it all just working in Claude Code. Like, I guess technically you could like sync the skills between everyone's Claude Code. But like there's something about just we all have one agent that does this thing that seems to work. [25:07] Yeah, I'm really glad you brought that one up because I think that's actually one of the more common areas where we see a lot of the opportunity is that, to your point, there's a lot of individual productivity that's happening, whether you're a developer or non-developer. There are so many tools that you're using to just make yourself more automated, more high-leverage. But then when you get to the team layer, suddenly everything gets massively more complex. Number one, obviously, you can't sit on your laptop [25:37] with your laptop closed. But then you go to like, okay, well now like the three of us want like, you know, a couple agents that interface with each other and work with each other. And then maybe we're automating a process kind of end to end. And especially for some of the more complex processes that you kind of envision being like really transformed with AI, you do need like, you do need that kind of like team orientation. And that needs to happen at like a layer that's a slightly higher bit of abstraction than just a single agent. And I think some of the teams exploring, you know, kind of multi-agent architectures and things like that are really exciting.
[26:07] But it needs to be built on top of a little bit of a platform that everyone can kind of spin up and down and control. And I think G from Vercel had a really good perspective on this. I think his company, Vercel, is obviously incredibly AI-pilled. And he kind of describes it as sort of like an AI software factory internally. And I think that's exactly the right mindset. And that produces an extremely high-leverage organization that's really just creating a tremendous amount of productivity, not just for themselves, but for every single process that they have in the company. [26:37] to go back to this like, okay, agent use cases. We've got [26:40] coding agents that anyone can use in the company. What are the other ones [26:44] that you see people standing up that are really useful? [26:47] We've seen a few. [26:49] One of the fun things that we get to do is just kind of work with our internal teams of different functions and, like, help them agentify, because we actually just get to learn a lot as a result of doing that. And so the silly example I brought up earlier of, like... [27:02] legal team needs to review marketing copy. It was one of the ones that- Very real. Yeah. Like extremely real. We like blew people's minds with like very basic agents that just give people the right setup to be able to do that. So we've seen that. Well, what does that actually do? So it's like there's marketing copy and there's a legal agent that is just like watching everything marketing does and is like, stop. Like- No. Yeah. No. It is more like, okay, I'm a marketer and I've written some copy, right? And
[27:32] please review this copy. But instead, you submit it to this little app that we built on top of agents that is like, okay, cool. Now I'm going to go, as an agent, review it first and then put it in legal's inbox with a first-pass review already done. And maybe the agent is actually clear enough that it can say, okay, marketing, you're good, right? Or maybe it's still like, no, this needs an extra human review. That's really interesting. And that's the sort of thing where [27:59] again, it's just a thin layer on top, but you can build the... you know, you have access, I have access. We can both see the outputs and we can work together on it. Okay, but then, for example, why is that not a skill? [28:09] So it very much can be a skill. You would probably build that agent as, you know, a legal reviewer agent, right? And so you would have MCP servers or whatever it is that help you access external context. You would have skills that help you understand, like, here are the rules we have to follow and not follow, right, and all those things. And you put all those things together, but then you can just fire off a session with that agent. [28:39] the form factor on top where, like, [28:42] different people can collaborate together and work with that agent, and multiple agents can be involved in the system. And so I think it goes a little bit broader than a skill, because you still need the right form factor for the agent to be able to go run and then for people to be able to interact with it. [29:00] Another core bit of why it's not a skill, or not exclusively a skill, is because you actually do need a human in the loop. And so, like, if you were to automate the whole thing and you were just, you know, like, taking the skill and looking at yourself from, like, legal skill, for example, like, in that world, of course, you could have just, like...
[29:16] done a pure skill. But if you need a human in the loop to be like, okay, I want to review and I do want to check, and we're looking at legal things, then there's a bit of, you know, authentication that's necessary. In order to automate that entire process, you need agents to go do the thing. And because you need to spin up separate sessions for that to happen, some sort of stitching is necessary that can't be instantiated in a single skill. That's really interesting. Yeah. [29:39] Um... [29:40] Okay, so just to push on that a little bit. What is the best practice for you? You create an agent [29:45] whose job is to make sure that when marketing is writing something, they can get it approved really quickly by legal. [29:51] And sometimes it'll approve things immediately. Sometimes it sends stuff to legal, [29:56] and ideally, it's getting better all the time, so it can do more and more, right? [30:01] What is the best practice for who owns that agent once it's built? Because one of the things that we found is if you don't have a human who's responsible for the agent, it gets stale very quickly and it ends up being this dead thing that's just out there doing stuff, but it's not necessarily good. And also, even if it kind of works, there are going to be all these times where legal is like, you asked me to approve this, but I don't really need to approve this thing, let's update your prompt. So how does that all work when it works well? [30:31] So it's actually really interesting, because the form factor, right, the app that sits on top of that that we originally built, one of our teams worked on that, sitting with these teams and understanding what they needed. And they were kind of like, OK, here you go, we're going to go do other stuff now, let us know how this goes for you.
And then a really cool thing actually ended up happening where people on those teams who were using the tool were like,
[31:01] and they popped open Claude Code and made some of the changes themselves. And then is your team responsible for approving the PR, or does it just go in? Usually my team is responsible for reviewing the PR if it's a system that we actually own. But yeah, people can kind of self-serve making changes to those things, which I think is really cool. I do think we're still in a stage, for a lot of teams and a lot of companies... Even going back to Stripe, Stripe has Minions, right? [31:31] Stripe has a large developer productivity team. We used to work at Stripe, so we spend a lot of time with them. They're awesome. And they're obviously putting a lot of work and energy into building platforms and tools like this. And so I think we're definitely still in a place where something like managed agents, or being able to build on top of our platform, is really powerful. [32:01] That's interesting, yeah. I love that anyone can open a PR to do this, because everyone's using Claude Code. One of the things that I find talking to people who are [32:11] in infrastructure roles at companies where this is starting to happen is, you know, the meme where there's a person going like this, with daggers in his back, and he's covering it? That's infrastructure people, now that anyone can submit PRs. How do you deal with that? And how do you do that well? Because obviously,
[32:36] In an ideal world, you would love for legal to be able to submit PRs to improve this agent. [32:41] And also... [32:42] sometimes they're probably going to submit stupid stuff that wastes time. So what are the right ways, either organizationally, like culturally, or technically, to make that possible without ruining your lives? [32:56] For this particular one that we've constructed, that Katelyn's given as an example, we actually have a couple layers of abstraction away from that kind of PR layer. At the very beginning, it kind of started that way. And to basically... [33:09] prevent users from foot-gunning themselves a little bit, they got to a place where, oftentimes, their way of interacting with the agent that they own, whether it's the marketing team who owns the marketing agent requesting, or the legal team owning the agent that does the review, is to engage with those agents through Claude itself. So they actually spend more of their time talking directly to Claude, and then Claude will oftentimes figure out the right way for them to go and handle it, so that they're not kind of [33:39] bit and doing something that may result in some complications. And they're talking to Claude or Claude Code? Like Claude chat or Claude Code or Cowork? It's a different instantiation of Claude that we made that actually is a managed agent in and of itself. So it's kind of managed agents all the way down in that construct. But we found that if we tune and prompt each variant of the managed agent at each layer, it helps to solve [34:03] different parts of the problem for users. So at the end state, for that marketing person or that legal person, it is a really simple interface, where the way we describe it to them is: you're just talking to Claude.
But under the hood, it's many, many Claudes engaging with each other to get to the part where the Claudes themselves are doing the more complex work that the human doesn't necessarily need to interpret. Interesting. You guys just launched multi-agent orchestration. What are the coolest things that people are doing with that?
[34:30] One of the more interesting ones is, um, I think people are using it to construct different harness techniques, and that one I'm personally very excited by, because there are different techniques that people have experimented with. For example, we recently did the advisor strategy one, but really, if you were to genericize it, you just separate execution from advice. There's also one where you can have two modes, where one is generating something and the other one's adversarial. [35:00] It could also be sort of like, you know, you split it into a bunch of different little tiny pieces and then they kind of recombine. And then there are ones that are closer to a best-of-N style of thing. And then there are so many more. Each one of these different types of architectures or strategies is good for very specific use cases. So some of them are much better for deep research or wide research style use cases, right? And others, like the ones where they all sort of swarm together, are better for bug hunting, for example. And so, like, [35:30] that if we can make the primitives very Lego-like, then people can put them together to solve things at a slightly higher form factor, which is more like... [35:38] an architecture or a strategy. And they get much more interesting results out of that. That's really exciting to see, because it also suggests that you can actually hill climb [35:47] at multiple layers of abstraction. How do you know if an agent is successful? How do you measure success for an agent? [35:54] Yeah, I mean, there are evals and stuff like that, which everyone has talked about ad nauseam. One direction that we really like is, like...
[36:01] this kind of verifiable outcome. We've been somewhat opinionated on that one. It's almost like the absolute end state of... you know, we talked a little bit about what a platform is, the end of things. Going from that philosophy, [36:13] our kind of principle is that maybe the end state of some of these things is that [36:17] everything should compress down to an outcome and a budget. And that's probably about it. Everything else should be figured out for you, to resolve exactly across those parameters. And so for us, yes, we still have evals. We have a lot of these other things that we measure that are domain-specific. For example, with some coding evals, you might want to measure just the actual PR getting merged. Those are more verifiable. But as we get to the place where, you know, an outcome is actually a spec that you as a human are able to define [36:47] interpret that and regrade itself over and over as close to what we care about. Claude, make me a billion dollars. Your budget is $10. Exactly. I meant to say, no mistakes. Go. Exactly. Maybe Mythos could do that. [37:03] And then one of the things that we were running into that I'm curious if you have a solution for is, [37:07] agents get outdated pretty quickly. [37:09] Um... [37:10] Sometimes it's because there's no human attached to them; sometimes they're just running an old model or an old architecture or whatever. [37:18] And it feels like there needs to be an... [37:22] end-of-life cycle for agents. We've talked about having a little funeral for them, and having a little page on our website that's like, here are all the decommissioned agents and stuff. Oh wow.
[37:32] How do you manage, especially in a really big company, all the agents that are out there? Maybe they're in Slack pinging stuff once a week, but you're like, this is super stale. How do you make sure that you... [37:46] retire them as quickly as you are making them? So one of the things we have actually done is we have [37:53] made skills... [37:54] that help you do things like upgrade to a new model when a new model comes out, right? We've actually put a good amount of work into making it easier to do exactly what you're talking about. And I think maybe some of the most AGI-pilled people are running agents that monitor their agents to see if those agents are outdated and in need of that sort of stuff. But I think, for the way that we like to talk to customers who ask us this question, [38:24] it's: there's a new model, and now I need to go upgrade my agents, or maybe be done with those agents, because the new model enables me to build agents that are way more powerful and do more interesting things than the old agents did, right? But I think that upgrade process and that migration process is something people have had to wrap their heads around as a breaking change, and I have to put actual energy into making that work. [38:50] And obviously, sorry to talk about evals, but if you have evals, this process is easier. But I do think one of the things we've tried to do is figure out how we give you skills and the right tools to make that process easier. And then you could go be AGI-pilled and choose to actually automate more of that with more agents. Yeah. Yeah.
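A minimal sketch of the "agents monitoring agents" idea discussed above, in Python: an audit pass over a hypothetical inventory of deployed agents that flags ones with no accountable owner, ones pinned to an older model, and ones that haven't done useful work recently. The `AgentRecord` fields, the `CURRENT_MODEL` string, and the 30-day threshold are all illustrative assumptions, not part of Claude Managed Agents or any Anthropic API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical identifier for the model the organization currently standardizes on.
CURRENT_MODEL = "model-v2"
# Hypothetical threshold: an agent with no useful run in this window is flagged.
STALE_AFTER = timedelta(days=30)


@dataclass
class AgentRecord:
    name: str
    owner: Optional[str]      # human accountable for the agent, if any
    model: str                # model identifier the agent is pinned to
    last_useful_run: datetime


def audit(agents: list[AgentRecord], now: datetime) -> dict[str, list[str]]:
    """Map each agent that needs attention to a list of recommended actions."""
    report: dict[str, list[str]] = {}
    for agent in agents:
        actions: list[str] = []
        if agent.owner is None:
            actions.append("assign an owner")
        if agent.model != CURRENT_MODEL:
            actions.append("upgrade model and rerun evals")
        if now - agent.last_useful_run > STALE_AFTER:
            actions.append("review for retirement")
        if actions:
            report[agent.name] = actions
    return report


if __name__ == "__main__":
    now = datetime(2026, 6, 1)
    agents = [
        AgentRecord("legal-review", "legal-ops", CURRENT_MODEL, now - timedelta(days=2)),
        AgentRecord("weekly-slack-digest", None, "model-v1", now - timedelta(days=90)),
    ]
    for name, actions in audit(agents, now).items():
        print(f"{name}: {', '.join(actions)}")
```

The model check only recommends an upgrade; as the conversation notes, actually migrating an agent behaves more like a breaking change, which is where evals make the process easier.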
[39:10] So a year from now, we're back at Code with Claude. Where do you think the platform will be? What will I be able to do, and how will it be different from what I can do today? [39:24] Do you want to go first? [39:25] You can go first. A year is a long time. In this industry especially. How close are we to "Claude, make me a billion dollars"? That's really what I'm asking. [39:41] I mean, yeah, we want to get closer and closer to that state where I think we kind of... Okay, so a couple things. I think a year from now, I mean... [39:49] One thing that we'd love to get really, really close to is that kind of simplicity. And this might be at a significantly higher order of abstraction. I don't know what the form factor will look like. But the parameters we will care about from users will be that outcome (and, of course, it has to be verifiable; there are some parameters that have to be restrictive) and the budget. And I think we'd want to experiment with directions where Claude actually gets so good at understanding itself that it figures out what model you should be using. It figures out how to spin up all the subagents. [40:19] in that world. Today, you know, you don't have to think so much... [40:21] more aggressively about tool construction, for example. We've kind of made that a little easier, and you get to delete a little bit of that scaffolding. Less prompt engineering too. Yeah, exactly. Exactly. And I think if you just keep going up that stack, today a lot of the innovation is happening at this kind of, like, [40:34] really high level, almost harness-architecture level, which is really fun. But I think a lot of that, honestly, also goes away, where you don't have to think so much about model selection.
You don't have to think so much about what kind of architectures are there, because we probably would have gone through enough iterations with Claude that Claude is actually able to understand itself well enough to almost write itself on the fly, to figure out what is necessary in that kind of, like, two-parameter
[41:04] year, but I feel like we might be able to do the outcome part of that with maybe... [41:08] you know, some error bars on the budget side. Really cool. [41:12] Yeah. Okay, that was really cool. I'm going to give you a slightly more boring answer, which is: in that world, [41:18] if Claude is, like, on the fly, or agents on the fly are, like... [41:22] becoming what they need to become in order for you to do what you're trying to do, the platform has to, like, [41:27] seriously scale. That it is. And so I do think some of this will be: what are the right abstractions that actually enable that, right? Somewhere in the primitive-to-higher-order realm. But I do think so much of what our team is going to be doing is making sure that the tokens people want to send in and out of Claude can actually flow in and out of Claude, because our system is scaled to meet not just the demand, but, like, that world where you have [41:57] and recreating themselves and doing this sort of work. You just need a system that can handle long-running requests and a bunch of differently shaped things. And so for us, I never want the platform's ability to scale to get in the way of what people would otherwise be able to accomplish with these things. That's something that's probably going to be very front of mind when we're talking in a year. Awesome. I'm excited. Thank you so much for joining. [42:27] learned a lot. [42:28] Thanks for having us.
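To make the "outcome and a budget" end state a little more concrete, here is a toy Python sketch of a generate-and-verify loop: it keeps launching attempts until a verifiable check passes or the next attempt would exceed the budget. It is an illustration under stated assumptions, not how the Claude platform works: `attempt` is a hypothetical stand-in for starting an agent session, `verify` for the verifiable outcome, each attempt is assumed to cost a fixed `cost_per_attempt`, and budgets are in abstract units.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    output: Optional[str]  # first output that passed verification, or None
    spent: float           # budget consumed, in abstract units
    attempts: int          # number of attempts launched


def run_to_outcome(
    attempt: Callable[[int], str],   # hypothetical: launch attempt n, return its output
    verify: Callable[[str], bool],   # hypothetical: the verifiable outcome check
    budget: float,
    cost_per_attempt: float,
) -> Outcome:
    """Try attempts in sequence until one verifies or the budget would be exceeded."""
    if cost_per_attempt <= 0:
        raise ValueError("cost_per_attempt must be positive")
    spent = 0.0
    n = 0
    # Only launch another attempt if it still fits within the budget.
    while spent + cost_per_attempt <= budget:
        output = attempt(n)
        spent += cost_per_attempt
        n += 1
        if verify(output):
            return Outcome(output, spent, n)
    return Outcome(None, spent, n)


if __name__ == "__main__":
    # Toy stand-ins: attempt n proposes n squared; the outcome is "a value above 20".
    result = run_to_outcome(
        attempt=lambda n: str(n * n),
        verify=lambda out: int(out) > 20,
        budget=10.0,
        cost_per_attempt=1.0,
    )
    print(result)
```

A fixed per-attempt cost keeps the budget check simple; a real system would have to estimate cost before each attempt, which is part of why the budget side is described above as carrying error bars.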
[42:58] and laughter that will leave you on the edge of your seat. [43:01] craving for more. It's not just a show, it's a journey into the future with Dan Shipper as the captain of the spaceship. [43:08] So do yourself a favor, hit like, smash subscribe and strap in for the ride of your life. [43:14] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.