Nicholas

Claude Fable 5 review: what the new Mythos model gets right (and very wrong)

Claude Fable 5 is the first Mythos-class intelligence model to be generally available, and I got early access to test it before launch. In this episode, I walk through what Anthropic is promising, what actually stood out when I used it on real work, and where I think it fits in your AI stack.

In this episode, we cover:
(00:00) Introduction: Fable 5 is finally here
(00:31) What Anthropic says about the model
(05:14) Token-intensive by design
(06:28) Safety classifiers and the new fallback concept
(07:46) Is this or is this not Mythos?
(08:30) New product launches: Managed Agents and more
(09:20) Crushing benchmarks
(09:55) What it’s actually like to use (the good and the bad)
(11:40) Test 1: product graph spec
(12:56) Test 2: designing a skills registry
(14:04) Conservative on execution
(14:43) Test 3: multi-agent orchestration
(15:39) My takeaways

Tools referenced:
• Claude Fable 5: https://www.anthropic.com/news/claude-fable-5-mythos-5
• Claude Managed Agents: https://platform.claude.com/docs/en/managed-agents/overview

Other reference:
• SWE-Bench Pro benchmark: https://www.swebench.com/

Where to find Claire Vo:
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
X: https://x.com/clairevo

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [redacted email].

Published Jun 9, 2026
Uploaded Jun 12, 2026

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:36

[00:00] It's here! The model, the myth, the legend. Mythos from Anthropic has finally dropped. Well, [00:08] baby Mythos. We're calling it Fable 5. And this new model is crushing benchmarks. But the question is, [00:16] "Can it crush my backlog?" [00:18] I got early access to the model, and of course I have my own opinions on where it does really well, where it needs a little work, and the question on everyone's mind: [00:27] does it live up to the terrifying marketing hype? [00:31] Let's get to it. Okay, let's talk about what Anthropic is telling us about this model, and then we'll get into what I think about it. So this is Claude Fable 5, the first Mythos-class intelligence model to reach GA. Now, if you haven't been paying attention, Anthropic has been marketing, scaring, [00:49] warning us about the unbelievable capabilities of Mythos, and it is finally here. Now, they had originally been rolling this out with a couple of select companies. I got [00:59] early access to test what I thought [01:02] of the model, but you have to know this is not Mythos, capital M, big Mythos. This is theming mythos. This is Fable. And so it's going to have some guardrails on it, in particular around cybersecurity exercises and biology exercises. Now, [01:17] good news. Your girl's working on PRDs. She's shipping SaaS. She's not working on biology quite yet. Although give me, give me [01:24] a little time and some time to experiment and maybe I'll get there. So this is really going to be focused on [01:28] what the everyday user, what the everyday software engineer is going to think about when they're using this model. Although I did run into some things that I suspect...

1:36-3:12

[01:36] are a result of the tuning and training of this particular model to be extra safe. Now, [01:41] quick: it's not cheap. It's $10 per input token and $50 per output token. It's going to be a new tier above Opus. And so if you're going to use this model, you're going to pay the price. [01:54] So what is Anthropic saying? Basically, it's a completely new model class. So we had Sonnet, we had Opus, and now we have Mythos, the first of which is Fable 5. It's completely state-of-the-art. It is exceeding every benchmark they tested by a significant amount. This 80% on SWE-Bench Pro, you'll look at that compared to some of the more recent models that have come out: [02:13] very, very good benchmark performance. [02:17] And then they're saying it's really good for long, complex tasks. [02:21] Now, what are some things that earlier models couldn't do that they are saying Fable 5 can now do? It's very autonomous, including running days-long asynchronous tasks. It's really an engineer's engineer. And that's some of the downside I experienced with this model. I'm going to show you a very specific example of where you don't want an engineer [02:39] doing your work with an engineer's point of view. [02:42] It's proactive. It's very good at vision, exceptionally good at vision. This is a place where I actually really loved the model. And you know me, I'm pretty critical of models, but I did see a step ahead on vision. So that's something we're going to dive into. And then effort: it works hard. [02:57] It builds harder. It verifies more. It's built for ambitious work. Now, [03:02] guess what it also can do? It can consume those tokens. So Anthropic has said it consumes rate limits and tokens at about 2x the rate of other models. So again...

3:12-4:46

[03:12] This is a big boy model and it's going to consume tokens. And some of the things that it's good at, and even some things that they have done in the harness, seem like they're, intentionally or not, [03:23] token consumers. So we're going to keep an eye out on costs and an eye out on efficiency when using this model. [03:29] Again, talking about long-running tasks, Fable 5 is supposed to be able to run for days. So doing long-running planning, being able to spin up sub-agents. And I show a little bit about dynamic workflows, which are different architectures of sub-agents, and holding multi-day sessions. Now, I have done probably day-long, days-long sessions with other models. [03:53] I didn't have Fable for many days, so I cannot verify that it ran for days. I did get it to run, however, for several hours on some tasks that [04:02] may or may not have merited that several-hour effort, but it definitely seems like it has both the harness and the intelligence capability to run for a very long time, if that's appropriate for your task. Now, here's your pros and here's your cons. They explicitly say that Fable works like a seasoned engineer. [04:18] Unfortunately, if you have worked with a seasoned engineer, you know there's good to this and you know there's bad to this. So it is [04:25] very complete in its investigation. [04:29] It's definitely going to go search out all the corners. It's definitely going to think about how it can be 120% sure [04:36] that it's shipping the right thing. [04:37] But guess what? That's not always in service of launching. And that's honestly not always in service of building a great

4:46-6:26

[04:46] product. So while you can give it a goal and it will be very autonomous and it will be very thorough, [04:53] honestly, sometimes you want like a slightly less thorough [04:57] engineer. [04:58] Product manager talking, even engineer talking. Sometimes you want it to be a little bit dumber. We'll talk about some of the prompting techniques it says and when to use this model. But it's just something to think about [05:09] when you're working with any high-intelligence model: how much intelligence does the task actually take? [05:14] Now, as I said before, it is token intensive by design. And I did most of my tasks on extra high. And so it was like [05:24] token burning on token burning. [05:27] And so they say that high is probably the sweet spot for most work. I used extra high just because I don't want anybody in the comments saying, "Claire, [05:34] you picked high for this task and it should have been extra high and you would have had a better experience." I used extra high. I used all of the brains of Fable. But again, it is very, very token intensive. My question for any of these models (this is not an Anthropic model question, this is not a Fable question) is: [05:50] does this token intensity actually output [05:55] the right results? And that's a place where I'm just not 100% sure. But again, as us humans in the loop, [06:01] we're going to have to be much more intelligent about where to put what model and where to use what [06:07] reasoning and what effort level to match what we're doing. And again, I think that the untrained of us will say, "Oh, well, I have this Fable model. I should use it. It's better than anything." And honestly, I still think there's a place for good old Sonnet. I think there's a place for Opus. And I think there's a place for other models in the ecosystem.
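
To put rough numbers on the token-intensity concern, here is a minimal cost sketch. It rests on two assumptions that the episode does not confirm: that the quoted $10 and $50 prices are per million input and output tokens, and that the "about 2x" consumption figure can be modeled as a flat multiplier on both token counts. The names `Pricing`, `FABLE_PRICING`, and `estimate_cost`, and the example token counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


# Assumption: the $10 / $50 figures quoted in the episode are per million tokens.
FABLE_PRICING = Pricing(input_per_million=10.0, output_per_million=50.0)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    pricing: Pricing,
    token_multiplier: float = 1.0,
) -> float:
    """Estimate the dollar cost of one task.

    token_multiplier scales both token counts. Using 2.0 is a crude way to
    model the "about 2x" consumption figure; real usage will vary by task.
    """
    scaled_input = input_tokens * token_multiplier
    scaled_output = output_tokens * token_multiplier
    total = (
        scaled_input * pricing.input_per_million
        + scaled_output * pricing.output_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical task size: 200k input tokens, 50k output tokens.
    baseline = estimate_cost(200_000, 50_000, FABLE_PRICING)
    heavy = estimate_cost(200_000, 50_000, FABLE_PRICING, token_multiplier=2.0)
    print(f"baseline: ${baseline:.2f}, with 2x token use: ${heavy:.2f}")
```

Running the same estimate for each candidate model, with that model's own prices plugged in, is one way to decide whether a task justifies the top tier.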

6:27-7:57

[06:27] Now, there are safeguards in this model. And so this was one of the first things that Anthropic told me when testing the model. And this is one of the headlines that they're making in the release, which is there are specific classifiers in this model for cybersecurity, biology, chemistry, and distillation. Basically, they don't want anybody doing bad stuff in those categories in particular with this very intelligent model. [06:57] There's also a new fallback concept, and so if you get classified into one of these categories, [07:01] instead of saying, like, "do not pass go, you may no longer Fable," [07:04] it just falls you back to Opus 4.8. This is also a capability in the API now, where you can do this graceful fallback to [07:13] 4.8 if you're using a Mythos-class model. [07:17] They also have a 30-day retention policy used only to catch misuse, and it's not used to train Claude. So while it's still not training Claude, they do want to check the use of this model because they have been and will forever be very cautious about [07:32] us normies using their intelligent models. And just, you know, for context, 95% of sessions on this model did not hit a fallback. I don't believe I hit a fallback. [07:43] But again, I'm not doing anything in cybersecurity, biology, or chemistry, at least, yet. [07:48] Okay, so this is the question. Is this or is this not Mythos? It is Mythos. Fable has the safeguards. Mythos does not. Fable...

7:57-9:30

[07:57] all us normies can have in general availability. Mythos is still restricted to these Project Glasswing partners, some of these enterprise-level partners that are really checking it against [08:08] cybersecurity use cases. I would suspect that at some point we get some access to a Fable 5.0 whatever, or that the Project Glasswing class opens up. [08:20] But for now, [08:21] we get Fable; Project Glasswing, or these preselected companies, get Mythos. But they are all fundamentally the same underlying model. [08:31] A couple product things that are also launching today, along with the Fable 5 model: [08:35] Claude Managed Agents are going into public beta. If you haven't paid attention, this is Anthropic's hosted harness, hosted sandbox [08:43] for running long-running agentic work. I am still trying to figure out what a good use case for Claude Managed Agents is. I will get there. But Fable ships out of the box in [08:53] Claude Managed Agents. There's also a new advisor strategy where you can use Fable 5 as a senior advisor and use cheaper models as an execution layer. A lot of people are doing this with Opus and Sonnet. [09:05] And so this is going to work today in the API and in Claude Code and is a strategy you can use. And then, as I mentioned, this fallback API, [09:13] where you can put an optional parameter on the Messages API [09:16] that allows you to continue blocked requests by using 4.8 [09:20] at Opus pricing. [09:22] Okay, as we said, crushing benchmarks. Look at this. Fable 5 compared to Opus 4.8, GPT-5.5, and Gemini 3.1 Pro.
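
The advisor strategy mentioned in this segment can be sketched generically: one expensive model produces a plan, and a cheaper model carries out each step. This is a minimal illustration of the pattern only, not Anthropic's actual API. The `advisor` and `executor` callables are hypothetical stand-ins for whatever client calls you use, and the stub functions exist just to make the example runnable.

```python
from typing import Callable, List

# A model is represented here as any function from prompt text to response text.
# In practice this would wrap a real client call; that wiring is left out on purpose.
ModelFn = Callable[[str], str]


def run_with_advisor(task: str, advisor: ModelFn, executor: ModelFn) -> List[str]:
    """Ask the advisor for a step-by-step plan, then run each step on the executor."""
    plan = advisor(f"Break this task into short numbered steps, one per line:\n{task}")
    # Treat every non-empty line of the plan as one step.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [
        executor(f"Task: {task}\nCarry out this step: {step}")
        for step in steps
    ]


# Hypothetical stubs so the sketch runs without any network access.
def fake_advisor(prompt: str) -> str:
    return "1. Draft schema\n2. Write tests"


def fake_executor(prompt: str) -> str:
    # Echo the last line of the prompt, which names the step being executed.
    return "done: " + prompt.splitlines()[-1]


if __name__ == "__main__":
    for result in run_with_advisor("Build a skills registry", fake_advisor, fake_executor):
        print(result)
```

The design choice is that the expensive model is called once per task while the cheaper model is called once per step, so most tokens are spent at the lower price.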

9:31-11:02

[09:31] A significant increase in the SWE-Bench Pro benchmark, very far ahead of these other models. And [09:40] while I wasn't testing the most advanced use cases, I didn't find something that technically it failed at. So I think these benchmarks are really going to hold. And these benchmarks have outperformed across the board. So this is Anthropic's state-of-the-art [09:55] model. Okay, so enough about what they say. Let's talk about what I say. What is it actually like to use? So I ran Fable 5 on [10:03] a bunch of different work, and I want to give you my feedback on where I thought it did well, where it needed a little bit of work, and where I was really surprised. [10:11] As I said before, it's really good at vision. And where is it good at vision [10:15] that really impressed me? It's really good at document formatting. So this is super simple, [10:21] but we've been doing these handwriting things, [10:23] documents for my seven-year-old based on classic texts and classic poems. [10:29] And on the right is Opus 4.8, and on the left is Mythos 5. And it looks so silly, [10:35] but I really do think Mythos 5 did a much better job of a second-grade layout for a handwriting sheet. There's just like the right spacing. It's very clear to read. [10:48] There's enough white space. I think on the one on the right, it's just very dense. [10:53] And even the lines themselves are sort of hard to tell: do you write above? Do you write below? So I do think that on PDF formatting documents (I tested this against a bunch of different models),

11:02-12:36

[11:02] Mythos 5 really did a good job. So very simple eval for me, but a very, very good one. [11:08] Now here's the problem, though. [11:11] The writing is nearly unreadable. [11:14] So if you're thinking about Mythos for prose, for spec writing, for PRDs, [11:20] unfortunately, it's an engineer. And what's the problem with engineers? They just really get wrapped around the axle on details. [11:29] This is a real struggle with these more intelligent frontier models: they're like too smart. [11:35] And so it's just very, very hard to parse what they're saying. And I'm going to show an example of this [11:41] in, [11:42] actually, Claude Code. [11:44] So I have this concept of a product graph that I'm working on for ChatPRD. It's actually a fairly complex open source project. [11:51] And I had Fable 5 go through that and actually do like an adversarial review of my requirements to try to figure out [12:00] where there were internal inconsistencies in the logic. And [12:03] it gave me this markdown document that looks very long and intelligent, but [12:09] if you actually go through it, it's just really hard [12:14] to parse. It's like internal references. It's very detailed, but not in a way where you can zoom out. [12:20] There are these big blocks of paragraphs. Like, look at this. [12:24] It is just really hard to see the forest for the trees in this particular model. [12:30] And I saw this sort of over and over again working on it with specs: it was very complete design

12:36-14:14

[12:36] but nearly unparsable. [12:39] And that's a real challenge when working with these very, very high-intelligence models. Again, I would actually suggest pulling back to [12:46] maybe a Sonnet or Opus model for specs, and then looking at Fable as an orchestrator of execution, where that detail really matters but you don't have to read it. The other thing that shock, shock, shocked me was how [13:00] like actually legitimately terribly bad [13:04] it was at design, or at least at one-shot design. [13:07] And so I asked Fable to design a skills registry and, [13:13] man alive, did it do a very poor job. I mean, [13:17] I'm not even talking like [13:19] AI slop bad. It's like fundamentally terrible design. Gray, black, red. [13:26] Simple outlines. Just really, really terrible. [13:30] The Anthropic team suggested that I just needed to be a little bit more detailed in my prompting. I've never had to do this before in, I would say, the last year of models in terms of front end. [13:40] But even when I prompted it, it was still just not very impressive design. [13:45] I think there's this real balance between design slop and specificity and just shipping a terrible design. I'm not sure what about Fable 5 [13:54] resulted in this. I'm going to have to keep testing it as it rolls out today. [13:57] But this was a real disappointment in terms of design. So again, you might want to toss an Opus in the mix instead of relying on Fable for design. [14:06] It's really conservative on execution. So when I was trying to do that ambitious days-long work, I took a spec and I said, can you...

14:14-15:47

[14:14] ship the v0 of this, the MVP? I said enough that a customer could get value. [14:20] And the MVP, they just really took "minimal" to heart. It was like very, very narrow, not actually that useful. [14:26] And I'm curious if this comes from some of the safeguards on this model. It's been a challenge I've seen since the kind of later Opus models: they're not super ambitious. [14:36] And so, again, you'll have to think about how to prompt this to get that long-running outcome paired with the right [14:41] product ambition. [14:44] And then I really doubled down trying to test these Claude dynamic workflows and these sub-agent designs, trying to see if [14:53] this would really add value. And the multi-agent capability is definitely there. And I definitely had some successful [14:59] multi-agent runs kicked off in Fable. [15:02] But I also ran into a lot of stalls and errors in using multi-agent orchestration. Now, I made the mistake: I walked away from my laptop and came back to these sub-agents that had stalled after about [15:14] three hours. [15:16] And so, like, egg on my face. But I really want to see how, [15:20] technically, the Claude Code model holds up to the promise of multi-agent orchestration. [15:26] I had some successes and some bugs. I think this is a Claude Code issue, [15:30] not necessarily a model issue, although with this promise of long-running, days-long prompts, [15:36] you really got to deliver technically on the outcome. [15:39] Thank you. [15:40] So what's my takeaway? I would hand it hard problems. Of course, not cybersecurity, bio, or chemistry problems, but hard problems.

15:47-17:20

[15:47] Technical problems where being extremely detailed [15:51] matters, long-horizon work. [15:53] I would also hand it vision problems where you really want something to [15:57] look good or you want it to parse PDFs or other documents. It's done exceptionally well there. [16:04] I probably wouldn't [16:06] hand it my front-end work, or I definitely wouldn't hand it my front-end work, and I definitely wouldn't hand it [16:11] strategy or spec work. I think it overthinks things. I think its prose is nearly unparsable. [16:17] And so maybe I'll test it again with the effort level lower on sort of prose and spec writing, but it wasn't it for that. That being said, I'm [16:25] not a hater on this model. I'm definitely not. It definitely has a place [16:29] in your stack. I'm going to test it. [16:31] If you want to learn more, definitely look up the prompting guide for Fable. It's going to probably repeat a lot of what I said: [16:38] hand it your hardest problems, what this model is good for and what it's not, and how to get a good outcome. [16:43] That being said, [16:45] Mythos is here. I cannot wait to hear what you build, what you overbuild, [16:50] and what you make ugly with this new model. [16:52] Thanks for joining How I AI. [17:03] You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. [17:09] Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com.

17:20-17:21

[17:20] See you next time.
