How to Win With Prompt Engineering - Ep. 38 with Jared Zoneraich

Prompt engineering isn’t just about telling AI to solve your problems. It’s about knowing which ones to solve. Yet there’s a mismatch between the people who can identify the right problems (experts with deep domain knowledge) and the technical infrastructure required for developing and refining prompts. Jared Zoneraich, the cofounder and CEO of prompt engineering platform PromptLayer, is bridging the gap with a platform on which non-technical experts can manage, deploy, and evaluate prompts quickly. The role of human prompt engineers, however, has been the topic of controversy, with some arguing that AI can optimize prompts better than us, while others suggest that more capable LLMs eliminate the need for meticulously crafted prompts altogether. I spent an hour talking to Jared about why he believes prompt engineering isn’t becoming obsolete. He also tells me everything he’s learned about writing a good prompt and what the future of AI tools looks like. This is a must-watch for prompt engineers, people interested in building with AI systems, or anyone who wants to generate predictably good responses from LLMs. To hear more from Dan Shipper: Subscribe to Every: https://every.to/subscribe Follow him on X: https://twitter.com/danshipper

Published: Nov 13, 2024
Uploaded: Jun 13, 2026
File type: POD
Source: share.transistor.fm

Full transcript


AI-generated transcript with timestamped sections.

0:00-1:47

[00:00] I really don't like these research papers about prompting. People love sharing them: oh, this research paper came out of this new prompting strategy. Prompting doesn't feel like that sort of science. It's more of, I'm trying a bunch of stuff. And as long as I have a good data set and a good framework to iterate and see if it worked, that's what I should be focusing on. What's the gnarliest prompt you've run into? I'm going to be honest, it's the ChatGPT or Anthropic system prompts. Those are bad. Wow, hot take. [00:30] For most companies, you're going to win by working with the domain experts who can write down that problem, as we were talking about earlier, who could define the specs of what you're solving. The best prompt engineers treat it as a black box and say, the LLM is getting more complicated, not less. Let's not think about how it works. All I want to think about is how do I map the inputs to the outputs I want? And if you're not getting the outputs you want, that's a skill issue. [01:04] Jared, welcome to the show. Thanks for having me. [01:08] So for people who don't know, you are the co-founder and CEO of PromptLayer. And how do you describe PromptLayer these days? Yeah, we are a prompt engineering platform. We came out about a year ago, a year and a half ago, actually. [01:19] Whenever ChatGPT came out, not too long ago. I remember that. [01:24] It was an internal tool originally. We just put it on Twitter and people liked it. And we're like, maybe we'll do this. Kind of like early Slack vibes a little bit. Yeah, exactly. Exactly. So I guess like you've been around for a year and a half. So the prompt engineer is not dead yet. Have reports of the death of the prompt engineer been exaggerated? Or where are we in the timeline? I think so. I think so.

1:49-3:23

[01:49] The question [01:50] we hear a lot is, is prompt engineering going to be automated? What's going to happen here? [01:55] I'll tell you my take on it. [01:59] So we're focused on building the prompt engineering workflow. How do you iterate on these things? How do you make the source code of the LLM application better? [02:07] There's three primitives we see here: [02:09] the prompt, [02:10] the eval, and the data set. [02:14] You can automate the prompt, but you still have to build the eval and the data set. You can automate the data set. You need the eval and the prompt. You can take out one of these elements from the triangle, but... [02:24] At the end of the day, our core goal, [02:27] thesis, our core theory is that prompt engineering is just about putting domain knowledge into your LLM system. And whether you have to say please and thank you to the AI, that'll probably go away, but you still need to iterate on this core source code, we can call it. [02:43] Okay, so you're basically like, probably you're not going to have to be like, hey, like I'll tip you $5,000. And like, in a couple years, like that thing is going to go away. But like the sort of core primitives that you're talking about. So the prompt, the eval and the data set, can you make that concrete for me? Like, what do you mean by those things? Give me a specific example. [02:59] Yeah, yeah, let's talk about an example. [03:02] Let's say I'm building a... [03:06] AI math tutor. Um... [03:10] Pre-AI, there's a lot of math tutors you can go to. Some are good, some are bad. There's almost infinite solutions to this problem. So there's never going to be one prompt that rules them all,
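
To make the three primitives named at 02:07 concrete, here is a minimal Python sketch using the math tutor example. It is an illustration only and not PromptLayer's API: the names `Example`, `PROMPT`, `DATASET`, `evaluate`, and `fake_model` are all hypothetical, and the substring check stands in for whatever eval a real team would design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    """One row of the data set: a question plus text a good answer must contain."""
    question: str
    must_mention: str


# Primitive 1: the prompt (a template; the wording is illustrative).
PROMPT = "You are a patient math tutor. Explain step by step.\nStudent: {question}"

# Primitive 2: the data set (a tiny hand-written sample).
DATASET = [
    Example("What is 3/4 + 1/4?", "1"),
    Example("How do I factor x^2 - 9?", "(x - 3)(x + 3)"),
]


# Primitive 3: the eval (here, a simple substring check per example).
def evaluate(model: Callable[[str], str], prompt: str, dataset: list[Example]) -> float:
    """Return the fraction of examples whose answer contains the expected text."""
    passed = 0
    for ex in dataset:
        answer = model(prompt.format(question=ex.question))
        if ex.must_mention in answer:
            passed += 1
    return passed / len(dataset)


def fake_model(rendered_prompt: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline."""
    if "3/4" in rendered_prompt:
        return "3/4 + 1/4 = 4/4 = 1"
    return "I am not sure."


score = evaluate(fake_model, PROMPT, DATASET)
print(f"pass rate: {score:.0%}")
```

Any one of the three pieces could be generated automatically, but the other two still have to encode what a good tutor answer looks like, which is the point made in the segment above.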

3:23-4:53

[03:23] Build an AI math tutor and it makes you [03:25] the only solution you can imagine, [03:28] 30 companies competing for this. So when I was talking about those primitives, [03:32] when you're building this you have the prompt or [03:35] multiple prompts on how you respond to each question the student asks. You have the ways to test that and the sample data you're testing that on. [03:44] But, [03:45] all those are a different kind of, [03:47] a more technical way to say you have [03:50] the actual knowledge the math tutor [03:53] has. That's interesting. Uh, you're, you're making me think about like, um, um, [03:59] I [04:00] I love the kind of like, there's no one prompt to rule them all. There's no one data set to rule them all. And even in a particular domain. And the thing it made me think of is... [04:11] Do you know the philosopher Leibniz? I'm sorry to take it to Leibniz. No. Okay. So he's like around Newton's time. He also invented calculus. They fought over that. But one of the things that Leibniz was really into is the idea of creating a language where you could only say true things. So the syntax of the language made it so that you could only say... [04:42] They would just say, like, write it in the language and then calculate and get the answer. Just like if you have a question about, like, you know, how far is the cannonball going to travel? You can use calculus to figure that out. Same thing for, like...

4:54-6:29

[04:54] statements of fact or statements about the world. [04:57] with this hypothetical language. And obviously he never figured it out. Um, [05:02] And the reason is because when you try to produce something that's so... [05:10] all-encompassing or totalizing of truth, it gets really brittle, it gets really hard. A lot of early AI attempts were sort of like that. And I think there's a thinking error in a lot of what people think of when they think of, okay, what's the future of AI? Like, oh, I'm just going to tell it to be a math tutor and it's going to be the best possible math tutor. It's the same kind of thing. It's going to be... [05:30] What that means is incredibly different in many different contexts. And so there's a lot of room for many different... [05:40] attempts at finding the truth or representing the truth of what is best in any given scenario. And I think you're kind of, you're getting at that with the answer of like, yeah, we're going to have to do prompt engineering because like there's going to be 30 different ways to make a good math tutor and like who knows which one's going to be the right one for which person. [05:59] Yeah, no, I mean... [06:01] Forget about the prompt. [06:02] in this world of [06:04] we have the best AGI possible. [06:06] I like that reference a lot because it's the left. How do you even define the problem to solve? That's the hard part. Let's assume you had... [06:14] millions of dollars, you just [06:17] got a big VC round to start a new company and [06:20] What would you build knowing AGI is going to come? [06:23] The hard part is the problem. You would start working on what problem are you going to solve? Because even with the best tool...

6:29-8:21

[06:29] you need to define the exact scope of the problem. And that's kind of the irreducible part. [06:34] That's really interesting. Yeah. And I think we're so used to building things being expensive that finding the problem is like this invisible thing that like everyone has. Like, I don't know if this is your first company. I assume it's not. Like, Every is not my first company. You spend like freaking years trying to figure out what to solve. And the building is expensive and hard, but like... [06:55] finding the right problem is quite a bit harder. And I guess you could sort of like posit, okay, the AGI is going to figure out the problem for you to solve. But like, I don't know, that requires a lot of real world interaction and creativity and a viewpoint. I don't know, that feels quite a bit more complicated than building software. Well, this is my first company, actually. Oh, it is? Oh, congratulations. You're doing great. [07:25] before this that probably was, you know, it was like, [07:28] our pivot. But, uh, yeah, I like this simple example. I like to think about, [07:33] If I wanted to build an AI secretary that booked me flights and I have a conference in Japan, [07:40] there's a lot of different possible solutions. Do I want an aisle seat? Do I want a window seat? Would I rather have a non-stop over business class with a stop? And these are the differences that mean there's no one solution. These are the... [07:54] I love the word irreducible here, and I stole it from... [07:59] Stephen Wolfram wrote a few articles on ChatGPT. Like computational irreducibility? Yeah, yeah. I love that. I love that theory. And I think the irreducible part. Can you define that for people who are listening and haven't heard that before? Totally. So, and I'm not going to do as good of a job at defining it. So I encourage everyone to look it up and read the original. Use ChatGPT to look it up if you're listening to this or watching this.

8:23-9:53

[08:23] Yeah. [08:23] So I would define computational irreducibility as [08:27] There's... [08:29] when you're solving a problem, [08:31] You can collapse a lot of parts and speed up a lot of parts of solving the problem. But there's one part that you'll never be able to [08:39] collapse. I like to think of when in school, in math class, you're, what was it, factoring, right? When you're kind of taking things out into the parentheses, but there's always that [08:51] irreducible part that you can't factorize or you can't simplify. And in this case, [08:56] we're saying if you have an amazing AGI that can solve any problem, [09:01] the hard part is, [09:02] what do you even tell it to solve? Yeah. I think another way to think about computational irreducibility is if you imagine the world to be like a big computer or the universe to be a big computer, there are certain operations that like you actually have to run the computer to find out what happens. You have to run out the reality of the system in order to know. There's no, like, you don't have calculus to be like, OK, I know where it's going to be in 10 steps, because like the universe doesn't know yet. And I think that's super cool. Yeah. Halting problem-esque. Yeah. [09:32] ...like, pluralistic framework of the future of AI. How did you come to that? [09:36] Well, it depends, I guess. What are you referring to as pluralistic? Like multiple models or multiple stakeholders? Multiple approaches to any given problem. Yeah, I think... [09:47] I think it just... [09:48] Just observation. Honestly, I'm a hacker. That's my background. I'm not...

9:53-11:38

[09:53] necessarily a researcher and not necessarily, to be honest, and this might be a controversial thing, AGI kind of doesn't interest me, whether AI is going to take over. [10:04] I love that. You're the first person to say that on this show. Hot take alert. It is. And I get into, you know, we have a lot of debates in it. We actually, when we interview someone, we usually take them to lunch and ask them what they think about AGI. Just curiously, to get their takes. What's the bad response? There is no bad response. [10:23] There's no bad response. So why do you ask it? I'm curious what they think. We have a whole lot of different ideas. [10:31] What I'm most interested in is not so much, [10:34] is AI going to take over the world? Is it going to kill us all? I'm interested in, like, how do you build with it? I think it's just a really cool technology, and as a hacker, I am like, [10:46] my mental model of AI is not [10:49] the reasoning part as much as it's language to data and data to language. And how much does that open for us? So [10:59] That's how I get here. It's a tool in the toolbox. We have a lot of tools in the toolbox. And what can you solve? [11:06] In the same way that like my non-tech friends will ask me, why does Facebook have engineers? The site works for me, right? Why do you have to hire more engineers? I think it's the same thing. [11:15] I look at AI the same way. You're going to always be iterating. There's so many degrees of flexibility in these things. Yeah. You said earlier before we started recording that one of the things you're kind of excited about or thinking about a lot is the non-technical prompt engineer. Can you just open it up for me? What does that mean and how did you start knowing that that was a thing you wanted to pay attention to?

11:39-13:09

[11:39] Yeah, yeah. [11:41] So when we launched PromptLayer, as I was saying, it was kind of an internal tool we built for ourselves and people started liking it. So we didn't have this idea when we started. [11:51] But I'll tell you, there was kind of a light bulb moment I look back on. It's this team, ParentLab. They're building a very cool app. It's basically an AI parenting coach [12:00] to help you [12:01] be a better parent. We got on a call with them [12:04] just for feedback on our product about a year ago, and asked them, how do you like PromptLayer? What can you improve? [12:11] But the most interesting thing is someone on the call, one of their prompt engineers, was a teacher, 15 years as a teacher, [12:18] not technical at all. And we're like, what are you doing? Why are you using PromptLayer? What's going on? And she had explained to us that the engineers had set it up for her. [12:28] She would go into PromptLayer, edit the prompts, [12:31] and then pull up the app on her phone and it would pull down the latest version of the prompt. And from there kind of the gears started turning, and [12:38] where we're at now is the core thesis of what we're building and [12:42] PromptLayer itself is that you're not going to win in the age of Gen AI by hiring the best ML engineers. You're not going to build defensibility [12:50] through machine learning. [12:52] For most companies, you're going to win by working with the domain experts who can [12:56] write down that problem, as we were talking about earlier, who could define the specs of what you're solving. And [13:02] in... [13:03] cases like [13:04] building AI teachers, AI therapists, AI doctors, lawyers,

13:09-14:43

[13:09] I'm an engineer. [13:10] I don't know how a therapist should respond to someone who's depressed, right? Like, I'm not the right person to be in the driver's seat of [13:17] building that. [13:18] Does that make sense? Yeah, I think it makes sense. Can you give me like a concrete example of like what this teacher was prompting? Like, I don't really understand like what she's going in to change and then what changes she's like able to see in the app. [13:29] Yeah, so she's, when I say she's prompting, she's [13:34] editing the prompt that powers the app. So, [13:38] maybe... [13:40] they're opening the app and it's... [13:44] a user could be saying, "How do I discipline my child?" The AI needs to respond very specifically to that, and maybe it was saying, [13:51] I'm a language model trained by OpenAI. I don't support any discipline. And then she needs to go into it and say, okay, in that use case, [13:59] let me make sure the AI responds. [14:01] I don't have children, so maybe I'm not the best one to say how it should respond, but she makes it respond in the right way. Smack them. Yeah, yeah. Hit them with a belt. [14:10] And... [14:11] You don't want to ruin all the other cases, though, where they're asking about what I should feed them. You shouldn't also hit them with the belt. So it's kind of that systematic process. And it's just an iteration. We really believe prompt engineering is best. [14:26] How do you close the feedback loop? How do you iterate as quick as possible? Yeah. [14:32] Okay, let me just play devil's advocate for a second. So like, I think I agree with you. I really, I really like this. And it resonates a lot with some of the things that we've been building internally at Every, which I'll tell you about in a second. But um,
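
The workflow described here, where a domain expert edits a prompt and the app picks up the newest version without a redeploy, can be sketched as a small versioned store. This is a hedged illustration in Python: `PromptRegistry`, `publish`, and `latest` are hypothetical names for an in-memory stand-in, not PromptLayer's actual interface.

```python
class PromptRegistry:
    """In-memory stand-in for a prompt-management service (hypothetical)."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}

    def publish(self, name: str, template: str) -> int:
        """Store a new version and return its 1-based version number."""
        self._versions.setdefault(name, []).append(template)
        return len(self._versions[name])

    def latest(self, name: str) -> str:
        """Return the newest template, as the app would on each request."""
        return self._versions[name][-1]


registry = PromptRegistry()

# Version 1: set up once by the engineers.
registry.publish("parenting_coach", "You are a parenting coach. Answer: {question}")

# Version 2: edited later by the domain expert; the app code does not change.
v2 = registry.publish(
    "parenting_coach",
    "You are a parenting coach. Give calm, practical advice. Answer: {question}",
)

# The app fetches whatever is newest at request time.
rendered = registry.latest("parenting_coach").format(
    question="How do I handle tantrums?"
)
print(v2, rendered)
```

Keeping every version, rather than overwriting, is what makes it possible to compare a new edit against the old behavior before shipping it to all users.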

14:43-16:16

[14:43] I think an AGI-pilled response would be, sure, you can have the teacher do that, or you can just put the AI in the loop and have it measure parents who say thumbs up or thumbs down on a response, and then it can just try new things until it learns what responses to give. [15:06] And so you don't really even need human prompt engineers anymore? What do you say to that? Yeah. So... [15:13] Let's switch gears to AI therapy because I think that's an even better example. [15:18] Do you want the therapist that just gives the responses that you want to hear and people are giving thumbs up to? Maybe that's not the best metric. Maybe there's a better metric that you can find. Maybe the metric is how long people... [15:31] Maybe just an exit survey or something like that. Yeah, it's like long-term well-being or whatever, you know, something like that. Yeah, I think the data-driven approaches need to be used as well. [15:40] But... [15:41] at the end of the day, I think you're going to reach a local minimum. You're not going to... [15:47] When there's 20 other teams working on this exact problem and you're facing off in the market against them, the differentiation is, everyone's going to do this data-driven approach. [15:56] The differentiation is, [15:58] what? [15:59] Going back again, what is that problem you can write? What is that domain expertise you can bake into the application? And [16:07] as you have more and more users using it, you have them using it in different ways. [16:10] What edge cases can you find? [16:12] What is the trade-off you're willing to take between latency and quality?

16:16-17:57

[16:16] And there's no one answer to that. Every company has to decide that themselves. And that's the job of the prompt engineer, in my opinion. [16:24] Yeah, I think people forget with the data-driven stuff that... [16:29] Having a perspective is really important. And... [16:34] the way that you're collecting data and the way that you're doing the loop of response to changing the prompt or whatever, that's not neutral. That embodies a perspective on the world, even if it's like, [16:49] implicit or you haven't thought about it too much. And having a different loop that embodies a different perspective will get much different results [16:57] and will create behaviors in people and all that kind of stuff that are just different. And what's really interesting is... [17:06] If you're a human with domain expertise, like... [17:10] You've developed a perspective over many, many thousands and thousands of data gathering, you know, [17:18] loops that maybe you could get with an AI, but it would take a while. [17:23] We're back to the computational irreducibility thing. You just have to... [17:28] You can't just be a superintelligence sitting on a server and theorizing about what might make people react well in a therapy situation. You probably have to try it. There's maybe some weirdness there where if you have enough simulated data, if you have enough data, you can simulate different things and do self-play or whatever. But I don't know. We can just put that to the side for a second. It's much harder to do that. It's much harder and it's much easier and better probably to take people who are smart and have a lot of domain expertise. Also...

17:57-19:45

[17:57] Who sets up the loop, like you were saying? Even the supercomputer setting up the loop to learn. It's superintelligence all the way down, Jared. [18:07] All right. So... [18:09] the superintelligence all the way down, it's still, [18:12] like you said, it's still taking an opinion. There's still... [18:17] bit [18:18] I think, [18:19] it's the same. I think also to go back, Stephen Wolfram also talked about this in that article of, uh, [18:25] You have a computational irreducibility and then you have the [18:29] Because of that, [18:30] There's not going to be one model that rules them all because there's multiple... [18:34] answers. Uh, there's, uh, it's irreducible how to find it. And because of that, [18:40] there is no one way to gather data and come to the conclusion. And by setting up these data-driven approaches, [18:46] Instagram is different than TikTok. [18:48] They're both data-driven, but the users are different and how they make decisions are different. Totally. I think that's totally right. And so the thing that you're making me think of is this product that we have. And I think this product, it's a pattern that we're replicating across a lot of different products. So I think it actually... [19:07] It's a general thing in AI, and it relates to what you're doing, but it's a sort of slightly different take on the same idea, which is... [19:13] We have this product called Spiral, and it helps you automate a lot of repetitive creative work. So let's say for this podcast, I'm constantly taking the podcast transcript and turning it into a tweet. And... [19:27] And Claude, with a few-shot prompt, is really good, if I have historical examples of podcasts that I've turned into tweets, at getting me to 80% on my tweets. So I just throw it into Claude and whatever. But the prompt is kind of hard to construct. And it's kind of messy to do it in Claude. So we built Spiral.
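
The few-shot prompt mentioned at 19:27 is assembled from past (transcript, tweet) pairs followed by the new transcript. A minimal sketch of that assembly in Python is below; the function name `build_few_shot_prompt`, the instruction wording, and the sample data are assumptions for illustration and do not reflect how Spiral is implemented.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], new_transcript: str) -> str:
    """Assemble a few-shot prompt from past (transcript, tweet) pairs.

    The final "Tweet:" is left open so the model completes it.
    """
    parts = ["Turn the podcast transcript into a tweet in the same voice as the examples."]
    for i, (transcript, tweet) in enumerate(examples, start=1):
        parts.append(f"Example {i}\nTranscript: {transcript}\nTweet: {tweet}")
    parts.append(f"Transcript: {new_transcript}\nTweet:")
    return "\n\n".join(parts)


# Invented sample data standing in for real historical pairs.
history = [
    ("We talked about why evals matter more than clever wording.",
     "Evals beat clever prompts. Here's why."),
    ("Our guest explained how teachers can edit prompts directly.",
     "Your best prompt engineer might be a teacher."),
]

prompt = build_few_shot_prompt(history, "Today we covered prompt routers and backtests.")
print(prompt)
```

The resulting string would be sent to whichever model is in use; the more representative the historical pairs, the closer the first draft tends to land to the author's voice.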

19:45-21:20

[19:45] And in Spiral, you can just create a spiral that converts podcast transcripts to tweets. And then you give it examples. It writes a little guidebook for itself. And then you have a little form where you can just paste in your podcast transcripts and it makes tweets in your voice and with your examples or whatever. And there's no one podcast transcript to tweet converter. You can make many different spirals with many different tones or voices or styles. [20:10] And you can share it with your team. You can make them public, whatever. We launched that like three or four months ago. [20:15] It has that same flavor of not having one single answer to a problem, and sort of there's some differentiation in having like this diversity or pluralistic like approach to like what prompt is best for a given situation. [20:28] Um, but one interesting difference that I'd be curious to like pull apart with you is in, I think in the PromptLayer example, you've got like a teacher who is like the expert for everyone else who's using the app. Like she's the expert for the parents more or less. Um, and then in, in the Spiral example, like each user is their own expert to some degree. Cause they're all, uh, either constructing their own prompts or taking a spiral that exists and cloning it and maybe changing it. [20:58] Seems like there's two... [21:00] There's two places to prompt engineer. One is to turn your users into prompt engineers without them really knowing it. And the other is to have domain experts that have crafted something that users are using. [21:13] Or both. That's also, yeah, I mean, for sure. Like, yeah. I mean, there's probably a skeleton prompt that you guys are using, right?

21:22-22:54

[21:22] Yeah, I guess my take on this is that, [21:25] at the core, [21:27] what Spiral is doing, what PromptLayer is doing, what these end user applications are doing, [21:32] is taking a knowledge expert... [21:34] and distributing their knowledge. Whether it's distributing it across a workflow and across data and distributing it by writing tweets from a... [21:45] podcast transcript or whether it's taking someone's knowledge of parenting and [21:50] distributing it to the user base of their app. So, [21:54] I actually see them as the same problem. And it's less about [21:58] how many degrees of flexibility do I want my user to have at first? [22:02] What workflows can I... [22:04] provide them? What [22:06] What workflows can I kind of scale on my own? Can I hire a team of [22:11] salespeople and build a sales AI application that does what they're doing? [22:16] Same thing, I think. [22:17] What have you learned about the most effective ways to make good prompts? So you talked about the primitives of prompts, data, evals. When you think about the people who are most effective at... [22:32] uh, making good prompts and improving them over time without backsliding, especially, you know, dealing with new models and all this kind of stuff. Like what are the, what are the characteristics of good prompting? [22:42] Yeah, I have a really annoying answer here, which is it's just the scientific method, just trying and fixing it. I'll give another... [22:50] AGI style hot take, which is

22:54-24:25

[22:54] I really don't like these research papers about prompting. I think people love sharing them, "Oh, this research paper came out of this new prompting strategy." [23:02] Prompting doesn't feel like that sort of science. It's more of... [23:06] I'm trying a bunch of stuff, [23:08] and [23:09] as long as I have a good data set [23:11] and a good framework to iterate and see if it worked, [23:14] that's what I should be focusing on. How can I edit my prompts in an operational way as opposed to, [23:20] how do I discover the correct prompt? [23:24] For me. [23:25] And do you have a sense about why it's not that kind of science? [23:31] That's a good question. And maybe, maybe it has to do with it being a language. I mean, like, listen, I took machine learning in school. I understand how it works behind the scenes, but I actually [23:41] think the best prompt engineers kind of treat it as a black box [23:44] and say, [23:46] the LLM is getting more complicated, more hard to understand, not less. [23:50] Let's not think about how it works. All I want to think about is, how do I map the inputs to the outputs I want? And if you're not getting the outputs you want, that's a skill issue. That's not something else. And I think there's another... [24:03] There's another degree here, besides just the prompt, which is, [24:06] what is the combination of prompts you're using? Now you have exponentially more... [24:12] ways to improve it. Are you [24:14] breaking down the problems? We have a lot of opinions we've learned about the right architecture to use to build it with a team, but that's more about... [24:23] How do you actually...

24:25-25:56

[24:25] ship [24:26] a product that works in practice as opposed to these are the right ways to prompt. [24:31] Yeah, I think that makes sense. I mean, if I had to guess, it's like, it's sort of the same things that we've been talking about, like, in order to say, like, X technique is better, [24:41] you have to define what better is, which, you know, depending on how you define better, you're going to get different results. And also, so it's very dependent on that. And then it's also dependent on the data set. And so, like, you can kind of find some hand wavy things, but like, it's, it's so contextual that like, it's probably, I mean, maybe you start with some of the things that are like best practices or whatever, but you're probably just going to like discover stuff for your particular use case where group level statistics don't really apply, [25:11] which, you know, I love that kind of stuff. I'm a huge nerd for that. Yeah, and there's like... [25:19] Can you speak the language of a certain model family? [25:22] Obviously, I mean, the popular ones are: Claude, it's better with XML; GPT, it's better with Markdown. And those are kind of model level things that you can learn. [25:32] But I look at it as like, [25:35] What idioms does the AI understand? And can you speak that? [25:38] I think the best example here is tool calling and function calling, whereas... [25:44] Personally, I love using function calling when I can, even for things that are not functions, because implicitly that's the language that it knows. And you're actually... [25:53] by having the AI...

25:56-27:27

[25:56] return or interpret a tool call, [25:59] you're conveying... [26:01] much more information, [26:02] at least in my mental model, than you would by writing, this is the data I got back. [26:08] If that makes sense. Yeah, I think so. But I don't want to let you off the hook. Like, I agree that it is like scientific method, but like, I just think that as someone who spends all their time thinking about how to help people do this, like you must have some opinions on like... [26:24] what do you see that works and how do you think about it and all that kind of stuff that goes deeper than try it and see what happens? Sure, sure, sure. Yeah, the first thing I always start with: try and see what happens, then make your iteration loop as quick as possible so you can try as much. [26:37] Then we can go to kind of best practices that we recommend. [26:42] The biggest one... [26:43] I think is this... [26:47] we call it the prompt router approach, but there's a lot of different words and it's kind of overloaded. So I think there's, [26:53] The naive approach to prompting is to just stack messages. So I just have one prompt [26:58] that does everything and the user sends a message and I stack another one and I stack, stack, stack. [27:03] I think the better approach is to kind of build [27:06] a workflow, a DAG, a graph. And [27:09] route them to the right prompt because [27:12] if you can make your prompts do one thing and do one thing really well, it's going to be easier to test, easier to collaborate. [27:20] And it's just going to work better. As you... [27:23] try to make the prompt do more and more things, [27:26] uh, it's...
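
The point about using function calling "even for things that are not functions" can be illustrated with a tool definition whose only purpose is to force structured output. The Python sketch below is an assumption-laden illustration: the tool `record_sentiment` is hypothetical, the dictionary is loosely modeled on the JSON Schema style that function-calling APIs commonly use (exact field names vary by provider), and the model's reply is simulated rather than fetched from any real API.

```python
import json

# Hypothetical tool definition in JSON Schema style. No real function needs to
# exist behind it; the schema is just a way to tell the model what shape to return.
CLASSIFY_TOOL = {
    "name": "record_sentiment",
    "description": "Record the sentiment of a customer message.",
    "parameters": {
        "type": "object",
        "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
            "reason": {"type": "string"},
        },
        "required": ["sentiment", "reason"],
    },
}


def parse_tool_call(raw_arguments: str, tool: dict) -> dict:
    """Parse the JSON arguments of a tool call and check them against the schema."""
    args = json.loads(raw_arguments)
    params = tool["parameters"]
    missing = [key for key in params["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    allowed = params["properties"]["sentiment"]["enum"]
    if args["sentiment"] not in allowed:
        raise ValueError(f"unexpected sentiment: {args['sentiment']}")
    return args


# Simulated model output; a real integration would receive this string from the provider.
simulated = '{"sentiment": "negative", "reason": "The order arrived late."}'
result = parse_tool_call(simulated, CLASSIFY_TOOL)
print(result["sentiment"], "-", result["reason"])
```

Compared with asking for free text and parsing it afterwards, the schema carries the allowed values and required fields in a form the model has seen many times, which is the "idiom" argument made in the conversation.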

27:27-29:00

[27:27] it's going to be more likely to fail and harder to actually build unit tests on. So it's [27:31] It's around operationalizing and then kind of, [27:35] structuring your prompts in this discrete way. Although, [27:39] I'll argue with myself here a little bit, that as models get better, you'll have to do that less probably. [27:45] But there's always a trade-off because... [27:48] If you're building... [27:50] individual prompts to do one and only one thing, [27:53] it's going to [27:55] work much more of the time and have far fewer failure cases, but it's also going to have fewer degrees of flexibility. So you might get new user messages you're not expecting. [28:04] And then the opposite is also true. If everything's one prompt, [28:07] you're going to be able to answer any message the user sends. But [28:10] Might not always be private. [28:11] Yeah, that's interesting. [28:14] That makes a lot of sense. And it also sounds like maybe... [28:20] Towards the beginning of the life cycle of a prompt, when it's less known what the distribution of questions you're going to get or messages from a user you're going to get, the better it is to start with a single prompt, [28:34] so that you can iterate more quickly and cover the whole variety of like cases. And then as you're maturing, you're probably learning like, OK, these are the kinds of things I typically get. And then you can build that directed graph of prompts and then you can start to tune each individual prompt once things are a little bit more set. Does that sound right? [28:55] Totally. It's also a solution to jailbreaking, right? If each prompt does one thing,
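
The prompt router approach discussed above, where an incoming message is first classified and then handed to a narrow, single-purpose prompt, can be sketched as follows. This is a simplified Python illustration under stated assumptions: the intents, prompt texts, and the keyword-based `classify` function are hypothetical, and in practice the routing step would more likely be its own classifier prompt or small model.

```python
# Each prompt does one job; the wording and intent names are illustrative only.
PROMPTS = {
    "billing": "You answer billing questions only. Never discuss shipping.",
    "shipping": "You answer shipping questions only. Never discuss billing.",
    "fallback": "You are a general support assistant. Ask a clarifying question.",
}


def classify(message: str) -> str:
    """Keyword router standing in for a classifier prompt or small model."""
    text = message.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "delivery" in text or "tracking" in text:
        return "shipping"
    return "fallback"


def route(message: str) -> tuple[str, str]:
    """Return the chosen intent and the single-purpose prompt that handles it."""
    intent = classify(message)
    return intent, PROMPTS[intent]


for msg in ["Why is there a second charge?", "Where is my tracking number?", "Hi there"]:
    intent, system_prompt = route(msg)
    print(f"{msg!r} -> {intent}")
```

Because each node is small, each can get its own test set, which is the testability benefit described in the conversation. The `fallback` entry reflects the stated trade-off: narrow prompts fail less often but need somewhere to send messages nobody anticipated.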

29:01-30:34

[29:01] Say I'm making a sales bot to issue... [29:04] so users can place orders and refunds and that sort of thing. Yeah. [29:10] I can... [29:12] make individual prompts to do a refund or to do a new sale or [29:17] whatever and [29:18] if I route to the right prompt, if the user asks me to refund a million dollars and I'm in [29:24] the sales prompt, [29:25] it's not... [29:26] there's no risk of it kind of breaking out. But then you can ask for both. [29:32] What about best practices for evals? There's a lot of ways you can go here, I think. [29:38] And I have a very similar answer, which is starting easy. The easiest thing and the 80% case for evals is just backtest on your old data [29:46] and see how it's changing because [29:48] the most... [29:51] information-dense thing you can learn from an eval is that [29:54] it changed everything. [29:57] That's what you don't want to do. So what we have a lot of our users do is they just [30:02] create a backtest based on, like, [30:04] all right, [30:04] our last thousand or our last 10,000 prompt responses, [30:08] let's just run the new prompt using that data. See how much it changes. Maybe you want to do a cosine similarity, or maybe you just want to scroll through and eyeball the diffs. [30:16] Once you do that, [30:18] then you can kind of [30:20] get a little bit more fancy. And it depends on the use case. Again, if you have a ground truth. So if you can plug into... [30:27] thumbs up, thumbs down, you can plug into was the sale made, did the ticket close, that sort of thing,
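
Editor's sketch: the backtest described in this segment (rerun logged inputs through the new prompt and see how much the output changes) could be approximated as below. This is a minimal illustration under stated assumptions, not PromptLayer's implementation: `new_prompt` is a hypothetical callable standing in for a model call, the bag-of-words cosine similarity stands in for whatever embedding-based similarity a team would actually use, and the 0.8 threshold is arbitrary.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts (0.0 to 1.0)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[word] * vb[word] for word in va)
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def backtest(
    history: List[Tuple[str, str]],
    new_prompt: Callable[[str], str],
    threshold: float = 0.8,
) -> List[Tuple[str, str, str, float]]:
    """Rerun logged inputs through the new prompt and flag large changes.

    history holds (user_input, old_output) pairs from past requests.
    Returns the rows whose new output is less similar than the threshold,
    which are the ones worth eyeballing first.
    """
    flagged = []
    for user_input, old_output in history:
        new_output = new_prompt(user_input)
        score = cosine_similarity(old_output, new_output)
        if score < threshold:
            flagged.append((user_input, old_output, new_output, score))
    return flagged
```

If nearly every row is flagged, the new prompt "changed everything", which is the signal the speaker says you want to catch before shipping.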

30:35-32:20

[30:35] then you can really... [30:37] throw some A/B testing in there and anchor it on real metrics or build an eval that gives you a real score. [30:44] Unfortunately, if you're doing something like [30:48] AI summarizing or summarizing calls or something like that where there isn't [30:52] a ground truth, [30:54] then it gets a little more complicated and [30:57] what we recommend is either, if you're [31:00] having human graders read it, you could do that, or you have your prompt engineer, you need to sit down and say, [31:06] what are the heuristics when you're [31:09] looking at the output yourself and trying to decide if it's [31:11] real, and then trying to mimic each and every heuristic and build metrics based on that. [31:17] Hmm. [31:17] Interesting. And then what about data sets? The three primitives, like we said, you can skip one of them. So if you care a lot about evals, you care a lot about prompts, you could just build backtests and those could be your data sets. [31:31] But [31:32] if you don't have the backtest data, you're going to want to focus a lot on building ground truth data sets. And if you can really get that ground truth, [31:40] then you're sailing, then prompt engineering is kind of easy because it's, do you get 100 or do you not? Um, [31:47] so data sets, I think there's a lot of room for synthetically generating them as well. [31:51] I've done that to kind of bootstrap data sets myself. Yeah. What about situations where you don't have ground truth? That's just something I'm thinking about personally is like, how do you eval on that? And it's funny because I think a lot of the AI labs are like going hard only on problems that have ground truth. So like o1, for example, like they're doing reasoning and the reasoning chains, like they can validate the reasoning chains because it's all math problems. Something about that seems like kind of fishy to me.
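
Editor's sketch: when a ground truth data set exists, the "do you get 100 or not" style of eval mentioned above reduces to a simple score. The Python below is an illustration only; `run_prompt` is a hypothetical callable standing in for a model call, and exact match after trimming and lowercasing is an assumed comparison rule that many real tasks would need to loosen.

```python
from typing import Callable, List, Tuple


def score_against_ground_truth(
    run_prompt: Callable[[str], str],
    dataset: List[Tuple[str, str]],
) -> float:
    """Return the percentage of examples where the output matches the label.

    dataset holds (input, expected_output) pairs.
    """
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    correct = 0
    for user_input, expected in dataset:
        output = run_prompt(user_input)
        if output.strip().lower() == expected.strip().lower():
            correct += 1
    return 100.0 * correct / len(dataset)
```

With a score like this, two prompt versions can be compared on the same data set, which is what makes iteration fast once the ground truth is in place.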

32:21-33:54

[32:21] able to, like, eval on things without ground truth. Like, yeah, talk about that. Yeah, well, first of all, [32:29] my o1 take is that [32:31] it's just prompt engineering. And that's the cool part of it. Everything is prompt engineering. You heard it here first. I really think o1 is just a bunch of different... Just feeding it into it a few times. And maybe they do some low-level stuff there to make it a little bit better. But I think the core... [32:47] of that innovation is prompt engineering. [32:50] To answer your question of, uh, [32:53] what if you don't have the ground truth? What's the important? They don't eval ground truth because... [32:58] They don't eval problems without ground truth because you can't really do that in a generic way. It's kind of hard. [33:05] That's why I don't trust these kind of eval benchmarks that people... [33:09] come out with. They're not a little bit useful. [33:12] Uh, [33:14] yeah, maybe we'll talk about an example because I think it is very example based. So let's go back to this, [33:19] transcript summaries, [33:20] because I was [33:22] thinking about that in depth the other week. [33:25] What I would do is if I'm building... [33:28] Let's say it's an email summarizer. Every day it sends me a summary of all my emails. We're building that internally right now. Are you reading my mind? No, but that sounds good. It'll probably be out in a couple weeks. So let's talk about how you eval it. I'll tell you what I would do and you tell me what you're doing. [33:47] If I was building that, [33:49] I would just make something quick and then look at a few summaries and say, what do I like? What do I not like?

33:55-35:15

[33:55] and start to generate [33:56] individual test cases based on that. So maybe the first test case is, [34:01] and I'm sure you've experienced this in this case, sometimes the AI will give you a little excerpt at the bottom. This summary is about blah, blah, blah. That would be my first check. [34:11] Does it have that excerpt? I don't want it to have that excerpt. Second check: [34:16] does it use Markdown? Maybe it's too indented. [34:19] Does it only have one level of indent? And then these really simple... [34:23] The hard part of this is understanding what your brain does when you eyeball a summary to see if it's good or not, and then breaking that down to individual heuristics. That's really interesting. So I'm not close enough to the actual engineering happening on this to know exactly the evals. The person building it is one of our entrepreneurs in residence, Kieran. [34:44] And... [34:44] I know we have an eval set for another component of the product, which is drafting. So I know we have an eval set for that. And that's basically just like labeled email responses. So it's like email responses that we wished we had gotten. And then it like checks. There's a couple of different similarity scores between what actually got sent and what got drafted. So it's like... [35:07] it checks to see if the prompt is generating an email that is close to the email that I actually generated or someone else in the data set actually generated.
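
Editor's sketch: the summary checks listed in this segment (no trailing "This summary is about..." excerpt, uses Markdown, only one level of indent) can each become a small programmatic test. The Python below is one hypothetical way to write them; the check names, the "this summary" prefix test, and the bullet pattern are assumptions chosen for illustration, not rules taken from either speaker's product.

```python
import re
from typing import Dict

# Matches a Markdown bullet line and captures its leading whitespace.
BULLET = re.compile(r"^(\s*)[-*] ")


def check_summary(summary: str) -> Dict[str, bool]:
    """Run simple pass/fail heuristics over one generated summary."""
    lines = [line for line in summary.splitlines() if line.strip()]
    last_line = lines[-1].strip().lower() if lines else ""

    # Collect the distinct indentation widths used by bullet lines.
    indents = set()
    for line in lines:
        match = BULLET.match(line)
        if match:
            indents.add(len(match.group(1)))

    return {
        "no_trailing_excerpt": not last_line.startswith("this summary"),
        "uses_markdown_bullets": len(indents) > 0,
        "single_indent_level": len(indents) <= 1,
    }
```

Each heuristic stays deliberately narrow so that a failing check points at one specific thing to change in the prompt.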

35:18-36:54

[35:18] The summary, email summaries, I can tell you the way that we approached it is I literally just sat down, looked at my inbox and wrote... [35:29] in a Google document, the, the summary of my inbox that I wish I'd gotten. [35:34] And that's how we started, [35:36] like handwriting it. And then from there, you can pull out all these like little principles, like the summary, um, you know, [35:44] should not be passive voice. It should be active voice or whatever. And I don't actually know if there's evals yet for that, but we'll have that eventually. In the next couple of weeks, I'm sure we'll have something. But yeah, it required one person, I guess me, just to write it out. And then from there, you can pull out principles and stuff. [36:06] Yeah. [36:06] It's like, [36:08] when you build these applications, you really want to skip... [36:11] You want to skip the hard work, but the hard work just changes from... [36:16] writing the code to [36:18] figuring out how [36:20] it'll actually be correct or not. And in that case, it's like you said, you have to create the summaries and read it. And yeah, I'm a little skeptical of that similarity thing you were talking about. I mean... [36:33] I think it's good as an approximation, but... [36:37] In the email, maybe, I think there's a lot of... I use the Superhuman Command-J response thing. [36:46] I click retry like six times when I'm doing that. I'm not trying to hate on the team. I'm sure they've done a great job, but I click retry a lot and,

36:55-38:28

[36:55] the retries are all very similar to each other, but [36:58] very different from what I would send, uh, in terms of wording. That's interesting. [37:04] Yeah, it's not one similarity score. It's like five different ones. And we are like, we are like eyeballing it too and using it every day. So that's one of the things that I think is really great, which I think goes to your point is like, [37:18] we're building all this stuff internally. [37:20] We've launched three apps so far. We have a bunch of other ones that we're in the process of. And what's really cool is we're all building stuff that we want to use. [37:30] And we're all kind of similar because everyone's a fan of Every and sort of like came in from Every. And so someone makes something and then we're all using this email summarizer. And so like every day there's like feedback in the channel being like, hey, like, [37:46] this thing didn't work or I want this or blah, blah, blah, whatever. So it creates this really... [37:51] rich feedback loop. Because not only are you using it yourself, but everyone else around you is using it too. And I think that's sort of unique. I mean, there's something unique about it to Every, I think right now, but I think what's really unique is just the time. [38:07] Right now is one where... [38:10] the low-hanging fruit hasn't been picked, so you can build something for yourself. I assume PromptLayer to some degree is an internal tool, so it came out of something that you just needed. And we're early enough in the AI wave that...

38:29-40:05

[38:29] you can still do that and it's not like 15 other people have tried it over the last 10 years or whatever. And I think that's so special to like be in a place where you can just make stuff for yourself. I love it so much that the hacker, the hacker, uh, energy is back. We, we don't have to buy a fleet of cars to build a startup in the weekend anymore. But I, uh, I, I, yeah, I... [38:54] I actually think it also [38:56] is another phenomenon that's going on, which is, [38:59] LLMs unlock a new type of kind of [39:03] software you can build where, for example, one of the internal tools I have that I use all the time is a very simple natural language to SQL, [39:12] specifically for our schema. So I have a lot of information on like, this is a request log, whatever. [39:19] I don't know if I, maybe I would pay for a tool like that if it was really good. But this also might just exist in that subset of [39:29] software, like people call it like the single-use software that you're not really going to sell to other people, but it's easy enough to make. It's one prompt, [39:36] pretty simple. I mean, I could, [39:37] I could really spend months and make it really good, but for my use case, [39:41] that's as good as I need it. [39:43] Yeah, I love that. I think there's this single-use thing. And I think that that extends even further away from software than people realize. There's a lot of consternation right now about... [39:53] like AI replacing writing or like art or podcasting or whatever. And it certainly is lowering the cost of telling stories.

40:05-41:44

[40:05] But my experience thus far is what that actually does, rather than crowd out people who are professional storytellers, it allows you to tell stories about things that it used to cost too much to tell stories about. So a good example is just NotebookLM. The thing that's so compelling about that... [40:27] is you can get an NPR-style talk show about something that NPR would never cover. [40:33] And I think that that's so cool because there are so many stories that are happening all the time in our lives that... [40:41] maybe they don't deserve a Netflix special, but... [40:45] they certainly deserve better storytelling than we have. And I think that's sort of the promise of these things. And I think that's kind of beautiful. Yeah. My friend made a, [40:55] on the software side, my friend made like a website that helps you just make a button with sliders for the button radius and the color. And she just made that for herself. Nobody's ever going to [41:04] spend the time to set up the server and pay for the hosting for that. But [41:09] if you can make it with one Claude command, [41:11] why not? [41:12] But, uh, [41:13] I don't know. I also like, [41:15] not to get too in the clouds here a little bit, but I look at it as like a junk food analogy where, uh, [41:26] so about like AI music, for example. I think we'll have a lot of AI music. [41:30] And a lot will be junk food, meaning a lot of people will consume it and will love it. And it'll be great. But there'll still be the organic farm-to-table musicians where a human makes it. And it's just going to be...

41:44-43:14

[41:44] It's just going to be different. It's going to solve different things. Yeah, I definitely agree with that. What's the gnarliest prompt you've run into? [41:52] Um, [41:55] I'm going to be honest. It's the ChatGPT or Anthropic system prompts. Those are bad. [42:01] Hot take. [42:04] Yeah, I mean... [42:05] This is actually, I like this because this connects to the other stuff we were talking about. [42:10] They don't... [42:11] If you're building ChatGPT or Claude and... [42:14] I encourage everyone to check out... [42:15] People leak their system prompts. You should be able to find it with a quick Google search. [42:20] Uh, [42:21] what are they evaluating on? They don't have a specific use case they're evaling or tying their prompt into. So you see these prompts are long [42:30] and just run on. I call it prompt debt. And it's just, they're accumulating all this debt [42:37] of, oh, do this. If they say this, don't say this. You know, the classic, uh, [42:43] the Google, uh, make sure every historical figure is diverse, all that stuff. And, uh, it just, you keep adding it to your prompt and I don't think that's a good prompt. I think that's a... [42:52] You're not... [42:54] clear and concise with the problem you're solving. You're just tacking on more stuff and [42:59] maybe that's overfitting, in my opinion. Yeah. Well, how would you solve for that? Because they really need to build a general purpose tool that hundreds of millions of people can use. So how would you do it differently? I don't know. But I think...

43:14-44:45

[43:14] I think [43:16] The fact that [43:17] They're building a general purpose tool shows you why. [43:21] your prompt shouldn't look like that because you should have a better way than them to evaluate if your prompt works and then be able to trim it down. [43:29] If... [43:31] I don't think they should do this because this is not their business model. But if I started a company to build a general purpose AI tool, [43:38] I would probably have different prompts for different types of thing and try to route it. But then you have the latency concern. [43:44] I spoke with someone from the Snap... [43:46] AI team. [43:47] And I think there's [43:49] You don't hear about it a lot, but it's one of the most popular AI chatbots, the Snapchat AI, and [43:55] Yeah, because all the kids who don't know about ChatGPT or aren't old enough to sign up, I think, use it. [44:00] And, uh, [44:01] They have tons of models from last I spoke to someone there. Maybe it's different now, but... [44:06] they're constantly trying to understand, do I send it to this model? Do I send it to this prompt? And they have a lot of... [44:13] Thought. [44:14] that went into making that good, I think. [44:17] So do you think that's sort of like the future of this? Like, do you think, um, these kind of, even these kinds of general purpose ones are gonna do explicit routing or is like, cause there's another way to do this, which is like, it's sort of implicit or internal in the model. Like the models are mixtures of experts where it's like, it is sort of routing your request to a certain part of the model, depending on what it is. Like there are trade-offs between the explicit and implicit versions of, of this kind of thing. Like which one do you think the

44:47-46:19

[44:47] the future. [44:48] I think for the end user, [44:50] probably... I, I mean, that's a UX problem, but look at ChatGPT's evolution. You had to select which, uh, tools you wanted and which plugins you wanted and that was just a horrible experience. Uh, and I think [45:02] they quickly moved to this world where... [45:05] ChatGPT will choose whatever tool they want you to use and will let you use the code interpreter. You don't have to use the code interpreter style ChatGPT. From the technical perspective of someone building these applications, [45:20] bad answer, but it's very use case dependent. It's very, what are you building? What are your trade-offs? What's your latency? What's your cost trade-offs? [45:27] Hard to say. [45:28] Hmm. What are like, what's on your mind right now? Like, what are you thinking about currently in the evolution of the business? And like, what's, you know, what does the next couple months look like? [45:38] I think we're shifting into the next... [45:40] phase of this company where we spent the last year and a half really [45:45] trying to understand [45:47] where we fit in, [45:49] what the prompt layer is, what layer of the stack this is, and, uh, what the, what the value prop is. And I think [45:56] now we have a really strong thesis of kind of [45:59] what we were talking about earlier, bringing these [46:02] domain experts, these non-technical subject matter experts, and putting them in the driver's seat of prompt engineering. [46:08] And our product does that. But I think a lot of the next few months is [46:13] making kind of the happy path a little bit easier, making, encouraging that and

46:20-47:51

[46:20] and kind of scaling it up from this point, running at that. Like, how do you make that happen? Like, this is not a... [46:29] role that I think a lot of people are hiring for right now or that people haven't realized it yet. [46:37] So if you're kind of [46:40] pushing the company in that direction and you're sort of skating to where you think the puck is going to be, how do you make sure the puck gets there? Are you doing anything in particular? Is it just about building the right tools and people will figure it out? Or like, yeah, how do you think about it? [46:54] So we have a lot of people doing this today already. [46:58] But you're right. [46:59] People haven't realized this. Wait, actually, before you tell me that, what are the characteristics of people who figured out that this is a thing that they're doing? Like your early adopter crowd. So there's two things that bring people to the thesis of non-technical prompt engineers. The first thing is, [47:16] they just are kind of... [47:20] not a visionary, but they understand it. They're just smart. [47:26] They have this like, [47:27] they come in with this as a preconceived notion. [47:30] The second thing, [47:31] and this is the thing we're kind of locking into more because the first you get a few of those, but you can't count on that. Second is [47:38] companies that are [47:40] actually building something revenue-generating, building something that you're going to put a team behind, because when you put a team behind something, [47:48] collaboration becomes a big issue. They're going to,

47:51-49:23

[47:51] almost always, especially in non-technical domains. If you're building Cursor, might be different because your experts are [47:57] software engineers, but [47:59] in these other cases... [48:00] they're ending up having a lot of handoffs with QA testers who are asking a lot of legal questions or whatever, and then they have to keep coming back, [48:10] telling the engineers to fix something. So they experience the handoff and then they're also [48:15] trying to hire, but I think at the core, [48:18] the [48:19] interesting thing here and the big bet [48:22] is that we don't have to convince anybody of this. The market is going to do the convincing for us because the winners [48:28] are going to be the ones who do this. And then everyone's going to say, oh, [48:31] they're winning. I got to do that. And, uh, [48:34] and so if you're building the tools to connect domain experts to companies, and one way to do that is to kind of rope them into the loop of prompts, is part of this also like recruiting domain experts and the sort of marketplace thing where it's like, if you're building a teacher app, you'll hook me up with a teacher? Is that the user's job? Today it's the user's job. But I have heard from a lot of teams that... [48:59] that that's hard for them too, because it's, [49:03] it's not just data labelers. You're not just mTurking it. It's [49:07] expert data labelers in a specific category and [49:11] people do hire out for this. I've heard of legal AI teams doing this, but [49:15] you have to find the right person. So, [49:17] I don't think we're going to become a marketplace for that, but maybe we'll help people. Maybe we'll help connect them.

49:23-51:19

[49:23] That's always been kind of a pet thing of mine is like... [49:30] language models, like their summaries are all like RLHFed on like human feedback from data labelers that are not professional writers. And even among professional writers, the like level of skill and quality is highly variable. And then there's no one good metric for like what is a good summary or a good writer. [50:00] Like, you know, my favorite, a couple of my favorite writers, like doing data labeling, I feel like it would be different. Would they argue with each other though? Because maybe it's the nature of them being your favorite writer that they're a little unique [50:14] in their own thing. And RLHF's not going to do that. It's possible, but it would definitely be... Even if they argued and didn't agree, and maybe that makes the final output a little more unreliable, I feel like that's... I'm fine with it. It's better than the bland, whatever the bland thing is now. [50:32] Because I think there's this thing happening with language model summaries where they're very adept at saying a lot without saying anything, [50:40] which is a specific kind of bad summary, you know, that a data labeler... Because I know this because... [50:49] I edit a lot of articles at Every, and I mean, I don't edit as much anymore, but like I used to, and there's such a big difference. Like you can glance at an article and be like, yeah, this is pretty much, pretty much right. Um, and I would just like give it the, give it the check mark and just be like, yep. Um, approved, human approved. Um, and then you actually like sit down with the article and you're really like pulling it apart to be like, does this really make sense? And then you just see all the stuff that you would not have seen before. Um, and I can imagine that

51:19-52:57

[51:19] happening with these labelers. You know, it's like you're, you're sort of like tuned out at some point and you're just like, yeah, this isn't like garbage. And then you just get this sort of milquetoast thing, especially with all the like, they want it to be like not controversial and whatever. I don't know. Interesting. Interesting. Maybe, maybe you do that with prompt engineering. Maybe you don't need to train your own model. Maybe. I mean, I can get Claude to do some of this, but like not, it's still not amazing. It depends on the thing. It depends on the task. [51:49] It's hard. Yeah, it's hard to know will it ever... [51:52] I'm sure it'll be good, it'll get better. I, when I summarize with AI, like I have to, [51:58] I almost always say use bullets, don't use complete sentences. And like, it helps it a little bit, but I totally see what you're saying. Yeah, I don't... [52:07] can't. [52:08] That's an interesting one. That's an interesting one. Because it's, what is the core that makes [52:13] the writing good? Is it that it's concise or is it something a little bit more, to piss off the AGI crowd, or again, a little more human that you can't really codify and that's not just logic? I don't know. Maybe not. I think that you can, but it's... [52:32] If you said it, it would just be like a high level thing. Like, um, a good summary has to be concrete and it has to tell you the most important thing that you need to know and leave out everything else in a concise way. Right. But like, what's the most important thing is like incredibly dependent on the situation and the person. Um, and language models are, one of the beauties of language models is they're super sensitive to context. So they should be able to like pick that up.

53:02-54:53

[53:02] no generically best summary. There's just like, there's a principle, but the way that that principle, like, [53:09] plays out operationally is like, [53:12] very different depending on the context. [53:16] So... [53:17] what you said sounds like multiple tasks to me. [53:21] And maybe that's the solution. [53:24] When I was in high school, in some classes I tried a little too hard. Then I learned it doesn't really help. But I would make summaries of my notes and then make summaries of that to learn. And... [53:37] it is hard to figure out what the most important thing is. Right. Um, so maybe, um, [53:43] that needs to be the pre-processing task. Maybe the fact that the LLM doesn't generate [53:48] what to you is a good summary is because its task is just to [53:54] condense the [53:55] data, but not to make the reader understand the important thing. Yeah. This is really interesting. I just spent, I was in SF like twice over the last like three weeks. And so I got like super AGI pilled because you just like, you just see enough of the like thousand-yard stare of the like internal researcher at the like AI labs. And you're just like, oh, fuck. And you're just like, no, this is gonna be fine. Like, [54:25] you're not saying it's all going to be fine, but I think that you have a different perspective, at least on the fundamental limits of whatever, at least near-term AGI might look like, or more advanced AI might look like. I'm getting that vibe from you. And I actually, I'm swinging back a little bit. And I always sort of swing. Come back to the dark side. I think it's... No, I think that there's something...

54:54-56:32

[54:55] It's really important to realize that people have been predicting the end of the world because of the latest new technology since the world started, basically. And so if you're going to place bets... [55:06] you got to bet against those people probably, um, is my, is my thought. And there, there are way more limits... Until they're right. Until they're right. Well, you know, but yeah, so you never know when you're the turkey and it's Thanksgiving, but like, and if they are right, like the bet is, you know, it's like a Pascal's wager type thing. Like, uh, if they are right, then the bet doesn't matter because you're dead anyway. So you might as well believe that. Yeah. Yeah. [55:31] I... [55:31] It's also, how do you define AGI? Because when I kind of, [55:35] when I like to [55:36] rail on AGI, I don't know. I'm using kind of the hypothetical, oh, it's this [55:42] human-like AI that takes everything over definition. But I think the real definition is that it's just [55:49] shit like [55:50] an AI that can do most tasks humans can do. [55:54] And [55:55] I think we're there already. [55:56] I think you could do that with [55:59] enough prompt engineering, um, and you can kind of build systems using that toolbox, but [56:05] does that [56:06] matter? Yeah, it matters for the economy for a little bit. Maybe it'll correct after that. But, but I think by the time you're like talking about prompt engineering though, like that's already, you're eliminating the AGI thing. What about o1 or Sora? o1 doesn't even know modus ponens. I don't know what that is either. Exactly. Uh, you do know what it is. Um, basically, um, for example, with o1 you can be like, who is John Mayer's mother?

56:33-58:06

[56:33] It'll know the name. It'll think it through and it'll know the name. I don't remember what her name is. Let's say it's Cynthia Mayer. [56:38] And then if you ask, you just start a new thing and you're like, who is Cynthia Mayer? [56:42] It won't know. [56:43] And you can even be like, I want you to like think about like other like famous people who have the last name Mayer. So it gets in the, gets in its thought process and it still doesn't get it all the time. And that's something that like, yeah, like a human just knows basically. We don't know her name, though. [57:00] I don't know her name both ways, but if I knew it one way, I would know it the other. [57:06] I see. I see. I see. Well... [57:08] Yeah, I mean, this is why... [57:11] I know this is not a complete... [57:14] way to think about it, but I think the simplest way to think about LLMs is just, [57:18] again, solving [57:20] language. And language doesn't know who John Mayer's mom is. But if you give language [57:26] the repository of information, [57:28] it can tell you that. I don't know what it means to solve language. Tell me, like, what is, what do you mean? So, I mean... [57:36] a big part of computer science for many years is [57:39] processing [57:41] natural language or language and turning it into data. [57:45] LLMs are really good at that and basically have solved the problem of going from human speech or human text to... [57:54] information that a computer can understand, [57:56] and vice versa. [57:58] Before that, that was [58:00] completely... [58:01] at least going from data to language was basically impossible to do well.

58:06-59:36

[58:06] That's solved. I'm... [58:08] I think... [58:08] anything with regard to bringing in [58:12] real world data, John Mayer's mom, that sort of stuff, [58:16] is just [58:17] what you hook this to, [58:19] what you hook this technology up to. I'm, I think, like, [58:23] how you get to the... I think what you're saying is like, you could probably take the same architecture that we have now and with a different training method or, you know, data set or whatever, you could get it to answer both ways. And I totally think that that's right. But what I'm trying to [58:43] point to is like, that's why I'm like, I think it's possible we're going to get there. I just don't think we're there right now. o1 is like, clearly the thing that people miss is like, in my opinion is, oh, they're like, oh yeah, o1 can do physics problems better than I can. And it's like, yeah, but like, [58:59] it gets all this other basic stuff wrong. So, um, so for, so I'm not ready to call AGI. I'm ready to call it smart on certain, in certain domains. And I think that's really important. Like that shouldn't be overlooked. It's not like, I'm not like being Gary Marcus here where I'm like, you know, they suck or whatever. Um, uh, I think that they're, I think that they're truly incredible. But I also think there's a lot of, there's a lot of worship. I wouldn't, [59:24] I wouldn't call it AGI. So... [59:27] hypothetically, [59:28] say all we needed to do is train the LLM on [59:33] 100x or 1000x the data size.

59:36-1:01:16

[59:36] Probably not, but [59:37] let's say that was the way to do it. [59:40] Uh, [59:41] what's the difference between having an o2 model that [59:45] can answer John Mayer's mom both directions versus having an o1 that's plugged into the data set and the kind of end user application that lets you ask it both ways? [59:56] You have to predefine the data set. So you have to have an explicit knowledge store that has all the knowledge in it. So you're kind of like cheating. And also having that predefined knowledge store is an impossible problem. [1:00:15] You can't have all of that in a database, and that's why it took so long to... [1:00:21] get even the models we have now. Because you have to have, for a knowledge store like that to work, you have to have a more bounded problem, a less general problem. A lot of hypothetical, I mean, the weights... [1:00:32] are technically a data set. [1:00:34] They are, but it's inexplicit. It's a totally different thing. It's not the same thing as a database. It's much more flexible. I like where this is going. I think this is a very fun conversation. We are at time. I really, really appreciate you coming on the show. For people who want to try out PromptLayer or find you, where can they find you? Yeah, you can find me on Twitter, on X, at imjaredz. I-M-J-A-R-E-D-Z. Also, check out PromptLayer, [1:01:04] up and use for hackers and hobbyists and small teams. And then not so expensive for startups either. And it's a great way to build AGI. Sweet. Thanks, Jared. Thank you.

1:01:24-1:02:07

[1:01:24] Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI & I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard. But instead of gold, it's filled with pure unadulterated knowledge bombs about ChatGPT. [1:01:47] on the edge of your seat, craving for more. It's not just a show, it's a journey into the future, with Dan Shipper as the captain of the spaceship. [1:01:56] So do yourself a favor. Hit like, smash subscribe, and strap in for the ride of your life. [1:02:01] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.
