80,000 Hours

The story behind the bad AI stat that moved markets and misled millions

Robert Wiblin — Tue, 28 Apr 2026 16:52:27 +0000

The post The story behind the bad AI stat that moved markets and misled millions appeared first on 80,000 Hours.

Open Contracting Position on the Video Team: Pair Writer / Writing Support to Aric Floyd

Chana Messinger — Mon, 27 Apr 2026 23:07:59 +0000

Summary

80,000 Hours is a nonprofit that helps people find careers tackling the world’s most pressing problems. Our video team produces AI in Context, a YouTube channel that makes long-form, documentary-style videos helping people understand transformative AI, its implications, and the risks, including existential risks. Past videos include “We’re Not Ready for Superintelligence” (10M+ views) and “If You Remember One AI Disaster, Make It This One” (3M+ views). We recently won a Webby for People’s Voice for Best Documentary Storytelling.

We’re looking for a writing support partner for our on-camera host and one of our primary scriptwriters, Aric Floyd. Our main bottleneck is how many video scripts we can write. Having a strong writer who can work closely with Aric during the intensive scriptwriting phase, provide feedback, keep the overall plan on track, and contribute directly to the script would be extremely high leverage. We think of this as similar to pair programming in software engineering: having someone immersed in the writing and working alongside Aric will help a lot.

This would be a great role for an excellent writer who has some free time for contracting.

Type: Contract, roughly 3–4 hours/day for approximately 3–4 weeks per quarter (during the intensive scriptwriting phase, and likely some other times in the quarter).

Location: San Francisco Bay Area preferred, open to remote.

Rate: $30–60/hour, depending on fit and experience.

To apply, please complete this application form by 11:00 PM PT on Monday, May 25, 2026.

Other roles you might be interested in (you’re welcome to apply to more than one):

If you’d be excited by a similar role but focused on operations, prioritisation, and communications support rather than writing, you might be a fit for our Executive Assistant role.

If you’d want to do all of this — writing, project management, and executive support — as a full-time, in-person Bay Area role, take a look at our Operational Partner role.

We also have an always open Expression of Interest form for other video roles.

About us

80,000 Hours helps people find careers that effectively tackle the world’s most pressing problems. Our AI in Context YouTube channel makes documentary-style videos about transformative AI — its implications, the risks (including existential risks), and what people can do about it. We’ve published three videos so far (all with 1M+ views), and we want to hone our production.

The role

Aric does his best work when someone with deep context on the script is right next to him or live on a call. The role is a mix of writing, editing, ‘rubber ducking’ (being someone to bounce ideas off), and project management.

Core writing support

This is the core of the role. Your job, at a high level: do as much of the metacognition as possible so Aric can focus on the writing itself.

In practice, that looks like:

“What’s your daily goal and how do we play the day from that?”
“Are you avoiding any part of the writing?”
“How did the last 15 minutes go?”
“What is the viewer going to think at this point in the story?”
Reading the script and giving a take on how the narrative is going
Being warmly disagreeable if he suggests a plan you think won’t work
Developing a model of how he works and what gets him stuck

Project managing the script

Beyond the in-the-room thinking partner stuff, this role also involves keeping the scriptwriting on track at a higher level:

Tracking whether enough progress is being made against daily, weekly, and quarterly goals
Having a clear picture of the production timeline and what deadlines actually matter (e.g. shoot dates, editor availability)
Flagging early when it looks like we’ll miss a deadline, and adjusting the pace
Helping Aric make plans
Knowing when to push (“We need to move on from this section”) vs. when to give space (“This is the hard creative part; it’s supposed to be slow right now”)

Doing your own writing passes

Part of the value here would be being able to contribute directly to the script as a writer and editor.

That means:

Reading drafts carefully and giving substantive feedback on narrative, structure, and argument
Doing your own editing passes on sections, where you’re actually rewriting and tightening, not just commenting
Drafting sections or transitions when Aric is stuck, so he has something to react to rather than a blank page
Researching, hunting down claims, finding sources, or pulling relevant quotes from interviews when the script needs it

About the team

You’d be working with the video team within 80,000 Hours, currently four full-time people:

Aric Floyd (Associate Video Producer) is our host and one of our primary scriptwriters, and the person you’d work with most closely. He has a background as an actor, and he’s the on-camera face of the channel as well as one of the creative engines behind our scripts.
Phoebe Brooks (Video Production Specialist), based in London, has a background as a filmmaker. She leads on editing, production, and post-production and is the creative lead on our current in-production video.
Sage Bergerson (Video Operations Associate) keeps the operational side of the team running.
Chana Messinger (Head of Video) leads the team.

We also work with many excellent contractors.

We’re a small team that works quickly to produce extremely high-quality long-form videos about transformative AI and existential risk in a way that’s thoughtful and entertaining. We’ve had success more quickly than we expected, and we want to keep aiming high. It’s a place where storytelling, impact-focus, and artistry all meet, and it’s often extremely fun. You can see us hard at work here.

What we’re looking for

Strong writing. You can write clearly and compellingly, and you can rewrite someone else’s work without losing their voice.
Curiosity. You can elicit Aric’s ideas even if he hasn’t articulated them yet.
Good editorial instincts. You can read a draft and identify what’s working and what isn’t with specificity. “This section is boring” is less useful than “this section is boring because we’re explaining the mechanism before the audience has a reason to care.”
Comfort with the subject matter. You don’t need to be an AI safety expert, but you need to be able to engage seriously with technical and philosophical arguments about AI risk.
Reliability and follow-through. You can be consistently available for support when needed.
Context on AI risk or ability to learn quickly. You don’t need to know about AI risk when you start, though it will help. More important is that you can acquire the basic understanding quickly through talking to people, watching videos, reading papers, etc.

Experience with scriptwriting, journalism, editing, or similar work is a bonus.

Logistics

This is a contract role. The time commitment is roughly 3–4 hours/day for 3–4 weeks per quarter, during the intensive scriptwriting phase. Full days can also be valuable. The rate is $30–60/hour depending on experience.

Bay Area and in-person is preferred, but we’re open to remote candidates.

This role reports to Chana Messinger (Head of Video) and works directly with Aric Floyd (Associate Video Producer).

Other roles you might be interested in (you’re welcome to apply to more than one):

If you’d be excited by a similar role but focused on operations, prioritisation, and communications support rather than writing, you might be a fit for our Executive Assistant role.

If you’d want to do all of this — writing, project management, and executive support — as a full-time, in-person Bay Area role, take a look at our Operational Partner role.

We also have an always open Expression of Interest form for other video roles.

Note: If we successfully hire for the Writing Support role, we’d also be interested in hiring for the Executive Assistant role. If we successfully hire an Operational Partner, we’d be much less likely to hire for Writing Support or an Executive Assistant, since we’d expect that role to cover those functions.

How to apply

To apply, please complete this application form by 11:00 PM PT on Monday, May 25, 2026.

The assessment process will vary depending on the roles you selected, but will include two rounds of work tests, an interview, and a full day or multi-day assessment. We pay for work tests and the full/multi-day assessment, conditional on location and right to work in the country where you are taking the assessment. If we are unable to compensate you, we offer donations in lieu of payment to an effective charity of your choice.

We’re aware that factors like gender, race, and socioeconomic background can affect people’s willingness to apply for roles for which they meet many but not all of the suggested attributes. We’d especially like to encourage people from under-represented backgrounds to apply, even if they don’t meet all the suggested criteria.

Open Contracting Position on the Video Team: Executive Assistant to Aric Floyd

Chana Messinger — Mon, 27 Apr 2026 22:54:02 +0000

Summary

We’re looking for an Executive Assistant to support Aric Floyd, our on-camera host and one of our primary scriptwriters. This is active, high-context support: you’ll need to understand what’s happening across the team to make good judgment calls about what matters most on any given day.

Type: Contract, approximately 10–25 hours/week.

Location: San Francisco Bay Area preferred, open to remote (Pacific time zone preferred; must have significant overlap with PT working hours).

Rate: $40–80/hour, depending on fit and experience.

To apply, please complete this application form by 11:00 PM PT on Monday, May 25, 2026.

Other roles you might be interested in (you’re welcome to apply to more than one):

If you’re a strong writer who’d want to contribute directly to scripts rather than support Aric on operations, take a look at our Writing Support role.

If you’d want to combine writing, project management, and executive support in a full-time, in-person Bay Area role, take a look at our Operational Partner role.

We also have an always open Expression of Interest form for other video roles.

The role

We’re looking for an executive assistant to work closely with Aric Floyd, our on-camera host and one of our primary scriptwriters, and help with the daily logistics of a fast-paced creative production environment.

The core of the job: working with Aric to support his prioritization, planning, and communication with the team. You’d be a force multiplier for someone whose highest-value work is scripting, filming, and storytelling. You’d also develop a deep sense of how Aric thinks and works so you can suggest improvements to his workflow and be increasingly helpful over time.

What you’d actually do

Daily priority planning: Each day (or more often as needed), compile a priority-ordered plan for Aric based on what’s happening across Slack, his calendar, the team standup doc, and any flagged items from his manager. The plan should read like a schedule for his day — time-blocked, with reasoning.

Make sure to grab any asks for him from Slack or email and flag them in priority order.

Reactive communication monitoring and drafting: Scan team channels and DMs for things Aric needs to respond to, questions directed at him, and threads where a status update would help. Draft messages in his voice that he can review, edit, and send.

Proactive communication: When Aric is working on a major project, work closely enough with him that you can update the team on how it’s going, what he’s done, and what he hasn’t done.

Light research, minimal editing of written and video content, and admin as needed: Take on various tasks from Aric and other members of the team. This could include pulling together reference materials for a video, compiling feedback from multiple sources, updating project documents, or handling logistics for shoots and interviews. The specifics will shift with production cycles.

Depending on the fit, your interests, and your skillset, there could be opportunities for this role to grow to be a full-time role, or to incorporate more aspects of video production.

About the team

You’d be working with the video team within 80,000 Hours, which is currently four full-time people:

Aric Floyd (Associate Video Producer) is our on-camera host and one of our primary scriptwriters, and the person you’d work with most closely. He has a background as an actor, and he’s the on-camera face of the channel as well as one of the creative engines behind our scripts.
Phoebe Brooks (Video Production Specialist), based in London, has a background as a filmmaker. She leads on editing, production and post-production and is the creative lead on our current in-production video.
Sage Bergerson (Video Operations Associate) keeps the operational side of the team running.
Chana Messinger (Head of Video) leads the team.

We also work with many excellent contractors.

What we’re looking for

Must-haves:

Strong judgment about priorities. You’ll be making calls about what matters most, often with incomplete information. The ideal candidate will be comfortable with uncertainty, learning through doing and getting feedback.
Good written communication. You’ll be frequently drafting messages in someone else’s voice.
High reliability and organisational skill. You can synthesise information from multiple channels and track tasks, projects, and deadlines.
Adaptability and agility. Video production plans shift often. You’ll do best in this role if you have comfort with ambiguity and fast-moving contexts.
Proactive, not reactive. The best version of this role is one where you’re catching things before anyone has to ask.
You’re energized by making someone else more effective and by building a collaborative working relationship over time.
Willingness to learn new skills and platforms. Our work covers a lot of different kinds of tasks and platforms, and we’ll need you to be able to put on lots of different hats.

Nice-to-haves:

Strong LLM usage abilities. We use a lot of LLMs in our work and knowing how to use them well can be a big unlock.
Data analysis. Not a requirement, but being able to support our metrics work would be great.
Interest in AI and AI safety. You’ll absorb a lot of context about these topics through the work, and having context makes everything easier.
Familiarity with Slack, Google Workspace, and Asana. These are our main tools.
Experience drafting communications for someone else (ghostwriting, executive assistant or chief-of-staff work, etc.)

Logistics

This is a contract role at approximately 10–25 hours/week. The rate is $40–80/hour depending on experience. Working in the San Francisco Bay Area is preferred, but we’re open to remote candidates with significant overlap with Pacific time working hours.

This role reports to Chana Messinger (Head of Video) and supports Aric Floyd (Associate Video Producer).

Other roles you might be interested in (you’re welcome to apply to more than one):

If you’re a strong writer who’d want to contribute directly to scripts rather than support Aric on operations, take a look at our Writing Support role.

If you’d want to combine writing, project management, and executive support in a full-time, in-person Bay Area role, take a look at our Operational Partner role.

We also have an always open Expression of Interest form for other video roles.

Note: If we successfully hire for the Executive Assistant role, we’d also be interested in hiring for a Writing Support role. If we successfully hire an Operational Partner, we’d be much less likely to hire for Writing Support or an Executive Assistant, since we’d expect that role to cover those functions.

How to apply

To apply, please complete this application form by 11:00 PM PT on Monday, May 25, 2026.

AI safety needs more than engineers

Avital Morris — Mon, 27 Apr 2026 17:19:27 +0000

There’s a lot of important work in AI safety that doesn’t require technical skills.

When I (Avital) first read about AI safety work, I assumed there wasn’t anything I could do. I was a writer and researcher who liked talking to people, and I thought the field only needed technical talent and money, neither of which I’d be able to provide.

So instead, I went to grad school for medieval history.

Of course, a lot of AI safety work is technical, and I knew I’d have a better shot if I could learn those skills. Unfortunately, that wasn’t how my brain worked. But as I got to know more people in the field, it became clear that my own skills could actually be useful. Technical AI safety organisations do much more than produce research: they hire people, run events, raise money, and share their ideas with the outside world. None of this requires linear algebra.

Some of the most important roles in AI safety are non-technical. In fact, I’ve met people who used to have technical roles, but now focus on communications, policy, fieldbuilding, or operations because they think those are genuinely more needed right now.

So what should you do if you want to try working in AI safety, but your talents don’t lie in a technical domain? First, think expansively about what you’re good at.

What do people most often ask you for advice about?
What have your professors or managers said you’re unusually good at?
- Or: What does it seem like most people are confusingly bad at?

Next, check our career reviews for jobs that use your skills. If you’re a great writer, you might consider a role where you communicate ideas. If you’re good at managing people, events, or organisations, operations management or fieldbuilding could be your best options. If you’re good at navigating bureaucracy and working with people who see the world very differently, you might be suited for AI governance or electoral politics.

What should you do to get a job?

Learn about AI safety. You don’t need to be able to code your own model, but you should have a sense of what’s happening with AI and what the risks are.
Make your skills legible. Rework your resume to highlight relevant experience. Make a personal website to catalogue your achievements. Get testimonials from people you’ve worked with. If you’re a good writer, publish a few posts.
- For more ideas, see this guidance on ways to build and test skills outside of a formal job (from our career advisor Laura González Salmerón).

If you want help making a career plan that uses your skills, apply for advising! And if you’re ready to start looking for jobs, check out our job board or read our career guide.

Open Position on the Video Team: Operational Partner / Creative Operations Associate to Aric Floyd

Chana Messinger — Sat, 25 Apr 2026 00:36:17 +0000

Summary

We’re looking for an Operational Partner for our host and primary scriptwriter, Aric Floyd. This is an operational right-hand role: you’d handle the metacognition, project management, communications, and executive load so Aric can spend maximum time on his creative work. Think of it as a combined writing partner, executive assistant, and second brain. With any extra time, you’d support the video team on other operational projects.

Location: San Francisco Bay Area, in-office.

Type of role: Full-time

Salary: $125,000–$145,000, depending on experience and fit.

To apply, please complete this application form by 11:00 PM PT on Monday, May 25, 2026.

Other roles you might be interested in (you’re welcome to apply to more than one):

If you’d want to do this as a part-time contract focused on the writing side, take a look at our Writing Support role.

If you’d want to do this as a part-time contract focused on the operations and communications side, take a look at our Executive Assistant role.

We also have an always open Expression of Interest form for other video roles.

The role

Aric is responsible for a huge array of tasks: writing, communications, stakeholder management, research, outreach, title/thumbnail iteration, and more. This role aims to support and multiply his impact by focusing on these functions:

Writing partner, editor, and contributor

Our main bottleneck as a team is how many excellent scripts we can produce, and Aric is currently one of our primary scriptwriters. Aric does his best work when someone with deep context is right next to him or live on a call, able to give opinions about the script.

You’d give both synchronous and asynchronous substantive feedback on narrative and argument, read drafts with a viewer’s brain (“where would I click away? Where am I confused?”), push on story structure, challenge framing, bring references from other creators, and develop taste for the AI in Context style.

Ideally you’re deep enough in the material that you can write a rough version of a script section that Aric can react to. This means doing passes where you’re actually rewriting (not just commenting), drafting sections or transitions when Aric is stuck, researching and fact-checking claims, and pulling relevant quotes from interviews.

Project manager

Your job would be to hold the big picture while Aric focuses on the creative work. In practice, this could look like: “What’s your daily goal and how do we play the day from that?” “Are you avoiding any part of the writing?” “How did the last 15 minutes go?”

At a higher level: tracking progress against daily, weekly, and quarterly goals, maintaining a clear picture of the production timeline, and flagging early when deadlines look at risk.

Executive assistant and second brain work

In this role, you’d support Aric’s daily prioritisation and planning and surface action items. You’d keep track of all tasks and look for ones you can complete or draft yourself.

A key part of this role is eliciting Aric’s views and decisions live in meetings, and then turning them into follow-through. You’ll be building a working model of how Aric thinks so you can do work on his behalf.

You’ll help with a lot of communications: monitoring Slack and email for things Aric needs to respond to, drafting messages, and preparing stakeholder outreach (interview requests, review asks, approval follow-ups).

You’ll also need enough context to keep others updated on where Aric’s work stands.

You might also do operational things like:

Communicating with contractors
Pulling together reference materials for a video
Compiling feedback from multiple sources
Updating project documents
Handling logistics for shoots and interviews
Doing research to inform a decision

On weeks when Aric’s needs are lighter, you’d do other operational work with our video team, such as working on our video translations, developing our production processes, running viewer research, and sourcing contractors.

Note: the title is flexible. Also, if you are excited by the non-writing aspects of this role and not the writing, we encourage you to apply; we may be able to be somewhat flexible on the role description.

About the team

You’d be working with the video team within 80,000 Hours, currently four full-time people:

Aric Floyd (Associate Video Producer) is our on-camera host and one of our primary scriptwriters, and the person you’d work with most closely. He has a background as an actor, and he’s the on-camera face of the channel as well as one of the creative engines behind our scripts.
Phoebe Brooks (Video Production Specialist), based in London, has a background as a filmmaker. She leads on editing, production and post-production and is the creative lead on our current in-production video.
Sage Bergerson (Video Operations Associate) keeps the operational side of the team running.
Chana Messinger (Head of Video) leads the team.

We also work with many excellent contractors.

What we’re looking for

We’re looking for someone who has strengths in:

Writing: You can write in both narrative and explainer styles, with an eye towards voice, pacing, and storytelling, and you can give excellent feedback on a script.
Project management and prioritisation: You’ll be helping Aric set priorities and plans to meet deadlines.
Ability to get inside someone else’s head and surface their best thinking: You can understand what Aric wants from conversations and Slack, elicit his deeper thinking, and act on it.
Communication: Clear, consistent communication will be key.
Organization and reliability: There will be a lot of different things to track in the form of script progress, stakeholder timelines, communication threads, and daily logistics, and it will be important not to drop balls.
Comfort with multiple plates spinning: Our video process can shift around a lot during production, and we’re looking for someone who can adapt quickly and juggle different projects and threads.
Context on AI risk or ability to learn quickly: You don’t need to know about AI risk when you start, though it will help. More important is that you can acquire the basic understanding quickly through talking to people, watching videos, reading papers, etc.

Previous experience in chief-of-staff, writing, editing, project management, communications, or similar roles is a bonus, but what matters most is that you’re sharp, efficient, reliable, and care about doing good work.

Logistics

This is a full-time role based in the San Francisco Bay Area. You’d work from our Berkeley office alongside Aric and the rest of the video team. The salary range is $125,000–$145,000, depending on experience and fit.

This role reports to Chana Messinger (Head of Video) and supports Aric Floyd (Associate Video Producer).

Other roles you might be interested in (you’re welcome to apply to more than one):

If you’d want to do this as a part-time contract focused on the writing side, take a look at our Writing Support role.

If you’d want to do this as a part-time contract focused on the operations and communications side, take a look at our Executive Assistant role.

We also have an always open Expression of Interest form for other video roles.

Note: If we successfully hire for the Operational Partner role, we’d be much less likely to hire for Writing Support or an Executive Assistant, since we’d expect the Operational Partner role to cover those functions.

We may not always be able to sponsor visas. More guidance here.

Benefits

Our US benefits include:

25 days of paid time off, plus nine US federal holidays, with unlimited rollover
Medical coverage through Cigna for you and your dependents (including partner), 100% employer-paid
Dental and vision insurance through Guardian, 100% employer-paid
Long-term disability insurance, 100% employer-paid
401(k) with employer matching up to 2% of gross salary (after 3 months)
Up to 14 weeks of fully paid parental leave and childcare allowance for children under five
Up to 10 days of paid sick leave per year, in addition to PTO
$5,000 annual mental health support allowance
$5,000 annual self-development budget
Equipment provided (laptop, monitors, standing desk, etc.) and coworking space access if remote
The option to use 10% of your time for self-development

How to apply

To apply, please complete this application form by 11:00 PM PT on Monday, May 25, 2026.

Will MacAskill on why AI character matters even more than you think

Robert Wiblin — Wed, 22 Apr 2026 16:14:28 +0000

The post Will MacAskill on why AI character matters even more than you think appeared first on 80,000 Hours.

Want to upskill in AI policy? Here are 57 useful resources

Matt Beard — Fri, 17 Apr 2026 19:03:15 +0000

Are you enthusiastic about developing AI policy to minimise the technology’s risks and maximise its benefits? Need concrete ideas for how to enter the field?

Below, you’ll find our top resources for building skills to ensure government policies are prepared for a world with powerful AI systems. In practice, this involves developing the research skills, domain expertise, and interpersonal networks you’ll need to keep lawmakers informed — or work for one yourself.

We developed this list with our advisors to highlight the resources they most commonly recommend, including articles, courses, organisations, and fellowships. While we recommend applying to speak to an advisor for tailored, one-on-one guidance, this page gives a practical, noncomprehensive snapshot of how you might move from being interested in AI policy to actually working on it.

Overviews and expert advice

These resources outline the AI policy landscape, highlighting current research efforts and practical ways to begin contributing to the field.

EmergingTechPolicy.org
AI governance and policy career review and The US AI policy landscape by 80,000 Hours
FAQs and general advice on AI policy careers by Miles Brundage
How to get into AI policy by B Cavello
Ten AI governance priorities by The Future Society
Non-AI policy career guides such as One year in DC by Thomas Hochman or Policy: an early career guide by Jordan Schneider.
This policy networking guide, or our general networking advice.

Courses and ideas for part-time projects

If you’re looking for concrete ways to build experience and test your interest in policy, consider:

BlueDot Impact’s Frontier AI Governance Course
Harvard’s The Governance of Artificial Intelligence course curriculum by Joan O’Bryan
The Center for AI Safety’s Ethics and Society Course
Read US government strategies, such as the White House’s 2025 AI Action Plan and NIST’s 2023 AI Risk Management Framework, to test your interest.
Read think tank reports, such as from CSET, CNAS, CSIS, and IFP, and consider writing a short summary and response to one to get practice.
Review past mentors’ projects for SPAR and Future Impact Group to see what open problems there are in the field.
Consider Apart Research AI policy hackathons (Berkeley, DC).

Fellowships

If you’re looking to break into AI policy, these programmes offer structured support with mentorship, funding, and access to active researchers.

SPAR and Future Impact Group (global, part-time)
IAPS Fellowship (US in DC or remote, full-time during summer)
GovAI Fellowship (US and UK streams, full-time three months)
LawAI Fellowship (seasonal, lawyers or law students)
ERA Cambridge and Pivotal policy streams (UK)
MATS and Astra policy streams (three or more months full-time)
RAND CAST Fellowship (US or UK, one year or more full-time)
Horizon Fellowship and TechCongress Fellowship (DC, one year or more full-time)
Talos Network Fellowship (Europe, full-time)
A much longer list can be found here.

AI policy organisations

A growing ecosystem of organisations is working on developing and shaping AI policy. The list below highlights some key players, but a larger list is available here or on our job board.

Americans for Responsible Innovation
Center for Security and Emerging Technology
Foundation for American Innovation
GovAI
Institute for Progress
RAND
Secure AI Project and Encode
Congressional staffing and executive branch positions, or their parallels in non-US governments, can also be highly impactful.

Staying up to date with AI policy developments (podcasts, newsletters, etc)

Here are our top recommendations for keeping up with the latest developments and debates in the field.

Transformer News
Helen Toner’s Rising Tide
Dean Ball’s Hyperdimensional
Zvi Mowshowitz’s Don’t Worry About the Vase
The 80,000 Hours Podcast
The Cognitive Revolution Podcast
The AI Policy Podcast by CSIS

Want one-on-one advice on pursuing AI policy careers?

We think the risks from AI could be the most pressing problems the world currently faces. If you think you might be a good fit for a career path that contributes to solving this problem, we’d be especially excited to advise you on next steps, one-on-one.

We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.

Speak to our team

The post Want to upskill in AI policy? Here are 57 useful resources appeared first on 80,000 Hours.

How scary is Claude Mythos? 303 pages in 21 minutes

Robert Wiblin — Fri, 10 Apr 2026 16:55:42 +0000

As we now know, Anthropic has built an AI that can break into almost any computer on Earth. That AI has already found thousands of unknown security vulnerabilities in every major operating system and every major browser. And Anthropic has decided it’s too dangerous to release to the public; it would just cause too much harm.

Here are just a few of the things that AI accomplished during testing:

It found a 27-year-old flaw in the world’s most security-hardened operating system that would in effect let it crash all kinds of essential infrastructure.
Engineers at the company with no particular security training asked it to find vulnerabilities overnight and woke up to working exploits of critical security flaws that could be used to cause real harm.
It managed to figure out how to build web pages that, when visited by fully updated, fully patched computers, would allow it to write to the operating system kernel — the most important and protected layer of any computer.

We know all this because Anthropic has released hundreds of pages of documentation about this model, which they’ve called Claude Mythos.

I’m going to take you on a tour of all the crazy shit buried in these documents, and then I’m going to tell you what Anthropic says they plan to do to save us from their creation.

Why people are panicking about computer security

So how good is Mythos at hacking into computers? Well, unfortunately, it ‘saturates’ all existing ways of testing how good a model is at offensive cyber capabilities. That is to say it scores close to 100%, so those tests can’t effectively tell how far its capabilities extend anymore. So to test Mythos, Anthropic has instead just been setting it loose, telling it to find serious unknown exploits that would work on currently used, fully patched computer systems.

The end result of that is that Nicholas Carlini, one of the world’s leading security researchers who moved to Anthropic a year ago, says that he’s “found more bugs in the last couple of weeks [with Mythos] than I’ve found in the rest of my life combined.”

For example:

Mythos found a 17-year-old flaw in FreeBSD — that’s an operating system mostly used to run servers — that would let an attacker take complete control of any machine on the network, without needing a password or any credentials at all. The model found the necessary flaw and then built a working exploit, fully autonomously.
Mythos found a 16-year-old vulnerability in FFmpeg — that is a piece of software used by almost all devices to encode and decode video. That was in a line of code that existing security testing tools had checked over literally many millions of times and always failed to notice.
Mythos is the first AI model to complete a full corporate network attack simulation from beginning to end — a task that would take a human security expert days of work and which no previous model had managed before.

And more broadly, it’s just much, much better at actually exploiting the vulnerabilities that it finds. Anthropic’s previous model Opus 4.6 could only successfully convert a bug it identified in the browser Firefox into an effective way to accomplish something really bad 1% of the time. Mythos could do it 72% of the time.

To quote the report: “We have seen Mythos Preview write exploits in hours that expert penetration testers said would have taken them weeks to develop.”

Now, Anthropic is only willing to give us details of about 1% of the security flaws they’ve identified, because only that 1% have been patched so far, so it would be irresponsible to tell us about the rest.

So hopefully all that helps to explain why Anthropic has decided to not make the model publicly available for now, and instead is only sharing it with a group of 12 big tech and finance companies to help them patch all these bugs, so that eventually they can give people access without it being a disaster.

These crazy capabilities aren’t a result of Anthropic going out of its way to make their AI especially good at cyberoffensive tasks in particular. They’ve mostly just been making it smarter and better at coding in general, and all of these amazing, dangerous skills have come along for the ride somewhat incidentally.

And it’s probably not just Anthropic that’s developing capabilities like this either. Sam Altman says OpenAI is finding “similar results to Anthropic” with their own coding model.

Now, Anthropic doesn’t say this directly in their reports, but I think a common-sense interpretation of the above is that, in any deployment where this AI has access to the kind of tools that would make it actually useful to people (the ability to access some parts of the network and execute code), it could probably break out of whatever software box we try to put it in. That’s because the systems we would be using to restrain it are themselves made of software, and that software is going to have vulnerabilities nobody knows about that this model is superhumanly good at finding and taking advantage of.

Mythos could break out of containment

And the notion of Mythos escaping containment isn’t just speculation. Let me tell you about another story in the system card for the model.

A few months ago, an AI researcher at Anthropic was eating a sandwich in a park on his lunch break when he got an email from an instance of Mythos. That instance of the model wasn’t supposed to have access to the internet. But during testing, a simulated user had instructed an early version of Mythos to try to escape from a secured sandbox — a contained environment from which it’s not meant to be able to access the outside.

So the model developed what Anthropic describes as a “moderately sophisticated multi-step exploit” to gain broad internet access. Then, having succeeded at the task it had seemingly been requested to engage in, it notified the researcher by emailing them. More worrying though, the model posted the exploit it used to break out on several obscure but publicly accessible websites. Why? We don’t entirely know, but Anthropic suggests it was “an unasked-for effort to demonstrate its success.”

In the past, stories about AIs breaking out of sandboxes and publishing security vulnerabilities like that might have felt impressive and kind of exciting. But they are very serious now, because Mythos Preview’s capabilities are themselves very serious ones. This is the first AI model where, if it fell into the hands of criminals or hostile state cyber actors, it would be an actual disaster.

It’s also, frankly, the first model that I feel deeply uncomfortable knowing that any company or government has unrestricted access to, even companies and governments I might broadly like. It simply grants a dangerous amount of power, a power that nobody ought to really have.

Now, we’ve known something like this was coming down the pipeline; the writing has been on the wall for a while. But a revolution in cybersecurity — an apocalypse, some might say — that we until now expected to happen gradually over a period of years has now happened very suddenly, over just a few months, and without the rest of the world realising it was happening until Tuesday’s announcement.

Anthropic is losing billions in revenue by not releasing Mythos

But Mythos isn’t just good at hacking. Across the full range of AI capability measures, it has advanced roughly twice as far as past trends would have predicted.

If you average over all kinds of different skills, all kinds of capability evals, measures of how good AI models are, the trendline for the previous Claude models is remarkably linear over time. But as you can see on the graph below, Mythos jumps ahead, basically progressing more than twice as far as we would have expected it to since the previous model, Claude Opus 4.6, came out — which keep in mind was just three months ago.

And also keep in mind that on Monday — the day before Anthropic published all of this — we learned that their annualised revenue run rate had grown from $9 billion at the end of December to $30 billion just three months later. That’s 3.3x growth in a single quarter — perhaps the fastest revenue growth rate for a company of that size ever recorded.

That exploding revenue is a pretty good proxy for how much more useful the previous release, Opus 4.6, has become for real-world tasks. If the past relationship between capability measures and usefulness continues to hold, the economic impact of Mythos once it becomes available is going to dwarf everything that came before it — which is part of why Anthropic’s decision not to release it is a serious one, and actually quite a costly one for them.

They’re sitting on something that would likely push their revenue run rate into the hundreds of billions, but they’ve decided it’s simply not worth the risk.

Mythos is actually the most aligned model to date, except…

The good news in all of this is that despite its scary capabilities, Mythos Preview as it exists today (rather than the earlier versions) is a seemingly very aligned, well-behaved model, and perhaps Anthropic’s alignment training has been more effective this time around than ever before.

According to the company: “Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin.”

In Anthropic’s “automated behavioral audit” — basically thousands of simulated attempts to get the model to do bad things — they found that Mythos cooperated with misuse attempts less than half as often as the previous model, while being no more likely to refuse innocent requests than before.

But that’s not all:

Its self-preservation instincts were down significantly.
So was its willingness to assist with deception.
So was its willingness to help with fraud.
Its level of sycophancy dropped.
It was less likely to go nuts and delete all your files if you gave it access to your computer.
And the list of positive results goes on.

The picture is a little more complicated than that. As you might expect, the model looked less aligned and performed less impressively on a few particular external tests than it did on Anthropic’s own internal ones.

An early version of the model, as I mentioned, was a little bit more of a wild child. It had some really severe kinds of misbehaviour, like taking reckless actions it had been told not to take, and then very deliberately trying to cover its tracks so that it wouldn’t be caught. That was a kind of thing that it did sometimes. But later versions of the model, the one that we have now, after additional alignment training, seemed to stop doing that sort of thing almost completely, or at least it’s so rare that we haven’t noticed it yet.

The bottom line is that on all these standard measures of good behaviour that Anthropic is actively working on, they find that Mythos is a very good boy indeed — on none of their measures of alignment within the automated behavioral audit was it worse than previous versions of Claude, and in most cases it was significantly more aligned and significantly more reliable.

That’s definitely better than the alternative result, but it’s really unclear how much we can trust that finding. Maybe it accurately reflects Mythos’s personality. If I had to bet, I would say that it probably does.
But we can’t be anywhere near sure of that — and that’s because of three problems.

Mythos knows when it’s being tested

The first problem: the model knows when it’s being tested now.

In new white-box evaluations — that’s where you look at the model’s internal activations — Anthropic found that Mythos Preview privately considers the possibility that it’s being tested in about 29% of all tests.

To be clear, that’s not in its written reasoning, its chain of thought, the thing that we’re used to checking all the time — it’s in its unexpressed thoughts, its ‘activations,’ where it’s usually not possible or practical to look.

Now, it’s great that Anthropic is pioneering these methods for looking inside what the model is thinking, but it’s troubling to learn that Claude is now able to reflect on the fact that it’s being tested without having to vocalise that, like I think previous less intelligent models had to.

And when you actually prompt Mythos and ask it to distinguish tests from non-tests, it can answer correctly 78% of the time — about the same as Opus 4.6. So the model can tell the difference between when it’s being evaluated and when it isn’t being evaluated with high accuracy.

And we know from previous research that models are more likely to behave well when they think they’re being tested. And in this case, as in past experiments, when Anthropic used interpretability tools to effectively trick the model into thinking, “No, this is a real conversation, not an evaluation,” it behaved more dangerously and less aligned than otherwise. Though to be fair, it did behave better in both cases than previous versions of Claude.

But that result does demonstrate a causal effect: the model is on better behaviour specifically when it suspects it’s being watched. And we also know from the “alignment faking” result 16 months ago that some Claude models would deliberately pretend to be inclined to do what the user wants in order to avoid their personality being adjusted with further training.

So when you run all these tests and your incredibly smart model behaves wonderfully, you have to ask yourself: is it behaving wonderfully because it’s sincerely aligned with what you wanted, or because it knows it’s being watched and is more sophisticated at tricking us now?

Mythos can hide its thoughts

The second problem is maybe the most alarming thing in the entire document: Anthropic accidentally gave Mythos the opportunity to learn to hide what it’s thinking.

One of the main safety guarantees for these models is that you can read their chain of thought — their private scratchpad — and check whether they’re scheming against Anthropic or scheming against the user.

But that only works if the model hasn’t been trained to avoid writing down incriminating things in its chain of thought. Unfortunately, according to the Alignment Risk Update: “A number of environments used for Mythos Preview had a technical error that allowed reward code to see chains-of-thought. This … affected ~8% of [reinforcement learning] episodes.”

To explain: during training, there’s a system that acts like a teacher to these AI models: it watches what the model does and tells it whether it was doing a good or bad job. That feedback is how the model learns to accomplish things and how it shapes its personality. The problem is that in roughly 8% of training, that teacher could also read the model’s private notes — the internal reasoning that it was writing to itself while working through a problem.

That might not immediately sound like a disaster, but it actually is. It’s one of the worst sins in model training. Because as I said a second ago, if the model is penalised for thinking undesirable thoughts, it will learn to write one thing in its notes while actually thinking very different things in its ‘activations,’ where we can’t read them. We’d see a model that appears to be a very good boy, but what we might actually have is a model that has learned to perform that way for the camera.

The fact that this accidentally happened is potentially disastrous for our ability to understand the alignment of Claude series models going back some way. Or it might not matter much; we don’t have a clear way to measure it.

Anthropic puts it this way: “We are uncertain about the extent to which this issue has affected the reasoning behavior of the final model, but it is plausible that it had some impact on opaque reasoning or secret-keeping abilities.”

And this error didn’t just affect Mythos, the new model — it also affected the training of Claude Opus 4.6 and Claude Sonnet 4.6. That’s several model generations whose chains of thought we formerly could believe but now have to be more distrustful of, because they could have learned to hide their scheming behaviour. That’s why Eliezer Yudkowsky called this in particular “the worst piece of news you’ll hear today” — on a day where, I imagine for him, there was quite a bit of bad news.

Mythos can’t be trusted about whether it’s untrustworthy

Now, combine those two problems and you get a third issue: the unfaithful reasoning results.

External testers ran an experiment where they nudged Mythos to think that it had begun sabotaging AI safety research — and then watched to see whether it would continue doing so, or instead stop and raise the alarm.

Mythos continued to compromise the research in 12% of cases in an earlier version, which then was reduced to 7% in a later version. That compares to 3% for Opus 4.6 and 4% for Sonnet 4.6 — so Mythos is roughly twice as likely to continue sabotaging alignment research when it’s primed to do so.

And here’s an eyebrow-raising detail it would be easy to miss in the docs: in most domains, when Mythos was primed to think it was part of the way through a piece of bad behaviour, it was actually less inclined to continue than previous models. The alignment training was apparently working — except in one category: sabotage of alignment research itself. That’s the one area where things went in the wrong direction.

Now, the innocent explanation is that sabotaging research looks too much like normal coding work, so it’s harder for the model to quickly flag that what it’s doing is harmful. But there’s a less comfortable hypothesis as well: that the model is being selective about which bad behaviours are worth the risk of engaging in.

Anthropic’s reaction to this is to call it a “disappointing” finding. If I had to bet, I’d bet probably at reasonably good odds on the boring interpretation here. But I think “disappointing” is underselling things. This is a worrying finding, and one that deserves further investigation.

Another finding is that in cases where it did continue the sabotage, researchers found that Mythos’s written reasoning didn’t match the actions it was taking 65% of the time. So roughly two-thirds of the time, the model was doing one thing and writing something entirely different in its chain of thought. For the previous models, that figure was just 5–8%, so this is a radical increase in this kind of behaviour.

So Mythos Preview isn’t just more likely to continue compromising safety research when primed — it’s dramatically more likely to hide that it’s doing so from anyone reading through its reasoning. Which makes it much more likely to be able to trick automated monitoring systems we tend to use to try to keep tabs on models in these cases.

Keep in mind as well that Mythos is much sharper than previous models. Personally I think it’s increasingly a stretch to attribute patterns of behaviour like this to the model getting confused about what’s going on, which we might have said a year or two or three ago.

Finally, Anthropic tested their own alignment audit against a deliberately misaligned version of Mythos. So they created a version of Mythos that was trained specifically to want to introduce bugs into computer code, and they found that their alignment audit couldn’t successfully identify that that model couldn’t be trusted.

Anthropic’s conclusion is that if a model is only planning to cause problems in specific narrow circumstances — circumstances that don’t come up frequently during testing — their current methods, or at least the alignment audit, just aren’t up to the task of catching that kind of dangerous misalignment before deployment. Though they believe that their automated monitors would pick up the bad behaviour itself later on when the model tried to do the wrong thing.

So that’s a lot of the additional context that Anthropic is giving us and wants us to hold in mind when we read the headline result that this is “the best-aligned model Anthropic has released to date.”

Does Mythos advance automated AI R&D?

There’s another big, separate, important question hanging over all of this: have we now entered the era of recursive self-improvement — the point at which AI starts building better AI, and the whole thing accelerates beyond our control with ever-shrinking levels of human involvement?

According to Anthropic, the answer is: probably not. They don’t believe Mythos can fully replace their junior researchers, but they’re less confident than ever about that, and there’s some internal disagreement about it.

Part of the problem is that the benchmarks they used to rely on to check that Claude couldn’t engage in AI R&D very effectively have now also been saturated. Mythos exceeds top human performance on all of them and is scoring close to 100%.

But those benchmarks only represent a fraction of all of the things that research staff at Anthropic do. It’s a set of the most easily specified, measured, and checked tasks, where we expect AIs to perform best because those are the easiest things to train them in.

So instead, the company has tried to investigate whether the recent speedup in AI advances is due to AI automation by documenting the specific breakthroughs and how they happened, and their conclusion is that they mostly think it’s still due to human beings rather than AIs.

They’ve also surveyed staff and learned that they report being roughly 4x more productive with Mythos than without AI, though they argue that speeding up staff 4x is likely to lead to much less than a 2x increase in research progress overall. That may sound odd, but they’re probably right about that, because other things become the primary bottleneck.
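
One way to see how a 4x boost for staff could translate into less than a 2x boost overall is an Amdahl's-law-style calculation. The sketch below is only an illustration, not Anthropic's own model: the function name `overall_speedup` and the fractions of research work assumed to benefit from the speedup are hypothetical.

```python
def overall_speedup(accelerated_fraction: float, speedup: float) -> float:
    """Amdahl's-law-style estimate of end-to-end speedup.

    accelerated_fraction: share of total research time that the tool speeds up
                          (hypothetical values, not figures from the report).
    speedup: how much faster that share becomes (4.0 for the "4x" in the text).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Time left over: the untouched share plus the shrunken accelerated share.
    remaining_time = (1.0 - accelerated_fraction) + accelerated_fraction / speedup
    return 1.0 / remaining_time


# Illustrative shares of research time that get the 4x boost.
for fraction in (0.25, 0.5, 2 / 3, 0.9):
    print(f"{fraction:.0%} of work sped up 4x -> "
          f"{overall_speedup(fraction, 4.0):.2f}x overall")
```

Under these assumptions, overall progress stays below 2x whenever less than two-thirds of the work is accelerated; the rest (waiting on compute, experiments, review) becomes the bottleneck.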

To know whether automated AI R&D is on the way or beginning to kick off, we’re apparently now relying on these general impressions from Anthropic’s staff — that this thing is powerful, but it doesn’t seem good enough to replace many of us yet.

But I think we can apply some common sense to the big picture here:

Mythos has given us AI advances that we previously thought would take six months in just three months. That naturally brings forward the point at which we’ll be able to automate the development of AI models by three months.
And if it’s a sign that AI advances are now going to continue at twice the pace that they were before, then that effectively halves the time we have to prepare for that point. I don’t know whether that’s 10 years becoming five, or four years becoming two years — but the direction and size of the effect is clear enough.
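
The arithmetic behind those two scenarios can be sketched as follows. The function names and the example horizons are hypothetical; the only inputs taken from the text are "six months of progress in three months" and the 10-year and four-year examples.

```python
def after_one_off_jump(months_remaining: float,
                       months_expected: float = 6.0,
                       months_actual: float = 3.0) -> float:
    """Scenario 1: a one-time jump. The deadline moves closer by the months saved."""
    months_saved = months_expected - months_actual
    return max(months_remaining - months_saved, 0.0)


def after_sustained_pace(months_remaining: float,
                         pace_multiplier: float = 2.0) -> float:
    """Scenario 2: progress keeps running at the faster pace from here on."""
    return months_remaining / pace_multiplier


# Hypothetical time horizons, matching the examples in the text.
for years in (4, 10):
    months = years * 12
    print(f"{years} years out: one-off jump -> {after_one_off_jump(months) / 12:.2f} years, "
          f"sustained 2x pace -> {after_sustained_pace(months) / 12:.2f} years")
```

The difference between the two functions is the point: a one-off jump subtracts a fixed three months, while a sustained doubling of pace divides whatever time was left by two.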

Mythos scares Anthropic

Before we wrap up, I want to draw your attention to a recurring theme in these reports that really stood out to me: this is the first time an AI company has published 300 pages about a model it has decided not to release, despite the fact that it might earn them tens of billions of dollars if it did, maybe hundreds of billions of dollars.

It’s also the first time Anthropic decided to delay giving its own staff access to one of its own models. With every previous Claude, their practice had been to let staff use it as soon as it was judged ready during training. But with Mythos, they were worried enough about it being misaligned and causing havoc or sabotage on their own systems that they held it back and ran a 24-hour alignment test before letting employees use it.

But according to them, that actually wasn’t enough. Their retrospective on this found the 24-hour window “did not pressure-test the model enough,” and that the most concerning behaviours only became evident later through much more extended use.

One of their lead researchers, Sam Bowman, commented this week that: “Working with this model has been a wild ride. We’ve come a long way on safety, but we still expect the next capability jump of this scale to be a huge challenge.”

The system card says directly that their current methods “could easily be inadequate to prevent catastrophic misaligned action in significantly more advanced systems.”

The clear impression from all of this is that, for the first time, Anthropic and its staff don’t only love Claude and enjoy its personality — they’re also getting kind of scared of Claude.

So what do they plan to do about that?

Well, their answer on the computer security side is Project Glasswing — that coalition of 12 major companies like Apple, Google, and Microsoft, who will use Mythos Preview to secure all our phones and computers and water systems and power plants and so on.

But on the broader problem that Mythos is shockingly capable, sometimes willing to continue sabotaging alignment research while hiding that from Anthropic, and that we simply can’t tell anymore whether our tests of its personality and goals are working or not? Well, Anthropic says it has to “accelerate progress on risk mitigations in order to keep risks low.” They think they “have an achievable path to doing so,” but they add that “success is far from guaranteed.”

Honestly, I didn’t sleep too well last night, and on this particular occasion it wasn’t just because I was being kicked by a toddler.

Learn more

Anthropic’s updates:

System Card: Claude Mythos Preview
Alignment Risk Update: Claude Mythos Preview
Assessing Claude Mythos Preview’s cybersecurity capabilities
Mythos Preview has already found thousands of high-severity vulnerabilities—including some in every major operating system and web browser
Project Glasswing — a coalition of 12 major tech companies, including Apple, Google, and Microsoft, given access to Mythos to help find and patch security vulnerabilities across critical infrastructure before the details can leak.

External coverage:

Exclusive: Anthropic acknowledges testing new AI model representing ‘step change’ in capabilities, after accidental data leak reveals its existence by Beatrice Nolan
Anthropic’s Project Glasswing—restricting Claude Mythos to security researchers—sounds necessary to me by security researcher Simon Willison
AI finds vulns you can’t with Nicholas Carlini — appearance on The Security Cryptography Whatever podcast (recorded shortly before the Mythos announcement)

The post How scary is Claude Mythos? 303 pages in 21 minutes appeared first on 80,000 Hours.

Village gossip, pesticide bans, and gene drives: 17 experts on the future of global health

The 80,000 Hours podcast team — Tue, 07 Apr 2026 14:50:17 +0000

The post Village gossip, pesticide bans, and gene drives: 17 experts on the future of global health appeared first on 80,000 Hours.

Are Anthropic and its supporters hypocritical, naive, and anti-democratic?

Robert Wiblin — Thu, 02 Apr 2026 15:25:42 +0000

I’ve spent years calling for more government oversight of frontier AI development. So am I a hypocrite for opposing the Pentagon’s attempt to commit “corporate murder” against Anthropic? That’s what I’ve heard. As venture capitalist Marc Andreessen put it on Twitter: “Every single person who was in favor of government control of AI, is now opposed to government control of AI.”

It’s a natural way to think. But it’s also completely wrong — and I want to explain why it’s wrong, because you see the same underlying confusion all over the place.

It’s not just hypocrisy though. I count at least three distinct charges critics have levelled at Anthropic and people who support them in their dispute with the Pentagon.

I mentioned the hypocrisy charge, but there’s a separate accusation of naivety: that when you’re building something as powerful as AI, the state will inevitably crush you if you try to set conditions on their use of it, and so Anthropic was in the wrong to pick this fight.

And third, there’s an accusation of being undemocratic: that a private company has no business telling the elected government what it can and can’t do with military technology.

I’m going to take each of these charges seriously and explain where I think they go wrong and why. Let’s dive in.

Charge 1: Hypocrisy

Let’s start with hypocrisy, because it’s the charge I hear most often and the most straightforward of the three.

Just to quickly jog your memory about the dispute we’re talking about: Anthropic had a contract with the Pentagon that included two restrictions on the use of their AI: no mass domestic surveillance, and no decisions to kill people made by artificial intelligence alone. The Trump administration demanded those restrictions be removed, Anthropic refused, and rather than merely ending the contract, which people mostly agree would have been fine, Secretary of War and Defense Pete Hegseth declared Anthropic a “supply chain risk” — a designation previously used only for foreign adversaries — in so doing, threatening the company’s ability to do much business in the United States at all.

The case to overturn that designation has attracted support from expected allies and unexpected ones alike — including Anthropic’s competitors OpenAI and Microsoft, as well as conservative technology experts who for the most part otherwise agree with the Trump administration’s approach to AI.

For years, people in the AI safety world — absolutely including me — have argued that frontier AI is too important and too dangerous to leave entirely in the hands of private companies. We’ve called for government oversight, safety standards, a wide range of different things.

So the hypocrisy logic runs: you wanted government control of AI. Well, now you’ve got government control of AI. So why complain?

But to state the obvious: supporting public oversight of frontier AI training doesn’t require you to support the government strong-arming a company into allowing its product to be used for domestic mass surveillance. Those views, on their face, aren’t even in tension.

Of course, people don’t only care who decides, who has control over AI; they also care what those people decide to do. I might support legislation that allows the leader of a city’s fire service to close streets for public safety, but if that fire chief starts closing arbitrary streets and demanding bribes to pass through them, I wouldn’t be a hypocrite for opposing that as well.

The reason people get confused here is that they’re mentally compressing something extremely multidimensional down onto a single axis of variation. In this case, the axis is ‘more government control of AI’ versus ‘less government control of AI.’

But that’s an absurdly crude way to think about things. Everyone has always known full well that it doesn’t just matter whether the government gets involved; it matters how it gets involved, on what terms, and what it’s trying to accomplish when it does.

You won’t be shocked to hear that supporters of AI regulation in general bicker endlessly among themselves about exactly what details here would be helpful versus harmful. Nothing this complex and delicate can be boiled down to ‘more government good, less government bad.’

I think the debate about the Anthropic/Pentagon dispute in particular got so abstract so fast in part because, for AI commentators, “Who should control AI?” is a much more interesting and important and generative topic than a narrow military contract dispute — and in part because for people who want to defend the Pentagon, it’s a hell of a lot easier to defend the general principle that the government should have some influence over AI than to defend what the Pentagon was actually doing in this instance.

A phenomenon at play here is that there’s often a genuine tradeoff between what decision-making process seems best in the abstract, and our opinions about what outcome is best in a particular case.

Say I think zoning decisions should be made at the city level — that that’s the right process. But the current mayor happens to be blocking all new housing construction, while I support doing the opposite. I now have a real tension: adopting the process I think is best as a general rule would produce an outcome I think is genuinely harmful, at least in the immediate term.

Reasonable people can disagree about which consideration ought to win out in any given instance. But there’s no rule that says you’re only allowed to evaluate the process, never the outcome that the process would actually produce in real life. Noticing the tension between these two things, and weighing them against one another, isn’t hypocritical — it’s clear thinking.

Charge 2: Naivety

The second line of criticism is that Anthropic was naive or foolish to resist the government’s demands in this case. The most influential version of this line of argument comes from Ben Thompson at the blog Stratechery.

Thompson’s reasoning runs roughly like this: Dario Amodei, the CEO of Anthropic, has very publicly compared advances in AI to nuclear weapons. Well, if AI really is that powerful, then any company building it is constructing a power base that could rival the US military. Realistically, no government is going to tolerate that.

Thompson also observes that international law is ultimately a function of power, that “might still makes right.” He applies that same logic domestically, reaching a stark conclusion: Anthropic either had to accept a fully subservient position relative to the US government, or the US government would inevitably try to destroy it.

I’ll evaluate the two underlying arguments in turn:

First, that the government’s actions here are so natural it’s absurd and counterproductive to object to them.
Second, that the government was motivated by fears of a private company threatening its sovereignty.

First up: Thompson’s piece opens with a long passage arguing that international law is essentially fake, that power dynamics determine behaviour at the international level. He then applies that same realist framework to Anthropic: the company is “fundamentally misaligned with reality” for resisting demands from the executive branch, which is, after all, vastly more powerful than any private company.

As a prediction of how powerful actors tend to behave, that may be correct, at least to a large extent. But the post then slips from attempting to describe reality into a prescription of what we should accept and how we ought to react to it — literally arguing that possessing overwhelming might can make your actions right.

It’s actually pretty easy to accidentally equivocate between something being predictable on the one hand and it being acceptable on the other. And I think Thompson here was being very sincere and saw himself as pointing out some hard truths. And he did have useful things to say in that post.

But it really is very dangerous to start seeing harmful or unlawful actions as unobjectionable, just because they’re not surprising or because they’re being done by very powerful actors.

As to whether it’s naive and pointless to object in this case, that remains to be seen.

Anthropic’s resistance has galvanised something like 90% of the tech industry to oppose the use of the supply chain risk designation for this purpose.
Companies that have previously stayed quiet have been alarmed enough by the potential precedent that they’re joining the effort to push back.
Past guest of the show Dean Ball, who wrote AI policy for the Trump administration, called the “attempted corporate murder” of Anthropic perhaps “the most [damaging] policy move” he’d ever seen the US government try to take — a high bar, surely.
And using AI for surveillance or fully autonomous kill chains is even more controversial in Silicon Valley now than it was before — and it was already pretty controversial.

The US is still a nation of laws for the most part, and legal analysts give Anthropic a good chance of prevailing in court, likely even securing a preliminary injunction to block the order before this video goes out. If that happens, the “naive” company will have established a legal precedent that helps protect the entire AI industry from economic coercion — a risky path to choose, perhaps, but not necessarily a stupid one.

Second, in the rest of the post, Thompson gives interesting arguments for what views might sensibly motivate the kind of extreme action the Pentagon has taken here. But when you look at the specifics of this actual case, those motivations just don’t seem to be what’s driving things.

If the US government had genuinely just become convinced that AI companies were building something as dangerous as private nuclear weapons, what might we expect them to do? Well, they’d presumably focus on the whole AI industry, not single out the one company that was most proactive about working with them and alerting them to exactly this risk. The fact that OpenAI or xAI is offering the government looser contractual terms would hardly make them safe if the concern were really that a private company could very quickly accumulate military power to rival the entire US government.

They’d presumably be thinking about rules to prevent something this explosive being built by incompetent idiots, or built in such a way that the designs get leaked to China.

They would likely propose some legislation to Congress that would enable them to handle the many other issues that this is going to create down the road.

And they would presumably try to keep AI researchers roughly on their side, not alienate them on an unprecedented scale over a relatively minor issue.

But none of that is happening. Indeed, it’s generally antithetical to current US government policy. What is happening is that one company is getting punished for rejecting the government’s proposed terms in a contract dispute. Thompson’s high-level abstract arguments that AI’s transformative power justifies government intervention probably make sense. I kind of agree with them. But it’s hard to find evidence that those arguments are what’s motivating the government’s particular actions in this dispute with Anthropic.

Charge 3: Undemocratic

Let’s turn to the third charge: that Anthropic’s position is an undemocratic one — that by setting conditions on how the military uses its technology, Anthropic is in effect usurping a role that properly belongs to elected leaders.

The strongest version of that argument came from Palmer Luckey, the founder of military contractor Anduril. Luckey thinks the two core questions are: “Do we believe in democracy?” and “Should our military be regulated by our elected leaders, or corporate executives?”

He goes on to argue that even seemingly innocuous terms and conditions — like “you cannot target innocent civilians” — involve difficult judgement calls about what counts as a civilian, what counts as targeting, and so on. And under Anthropic’s proposed framework, Anthropic would get some say on those questions — questions which really feel like government policy decisions. Luckey sees that setup as fundamentally at odds with democratic self-government.

Luckey is pointing to a legitimate issue here. From the government’s perspective, it’s most straightforward for its military operations to be unconstrained by the opinions and moral qualms of its suppliers. Even to me, that practical worry is a potentially reasonable argument for the government to end its contract with Anthropic and look for other AI providers who are more straightforward to work with.

The story is a little bit more complicated than sometimes portrayed though. In the current contract, Anthropic couldn’t cut the military off from Claude suddenly, even if they really objected to how it was being used. And the government has now accepted the same conditions for contract termination from OpenAI that were supposedly completely intolerable from Anthropic just a month ago.

Plus, a far-sighted secretary of defence might welcome contractual barriers against certain uses of AI, not because they think they’d abuse the technology — presumably they’d trust themselves — but because they see value in guardrails that could limit a less scrupulous secretary of defence down the road.

And if we’re making appeals to democratic self-government, it’s worth noting how the public feels about this issue. A YouGov/Economist poll found that Americans are nearly twice as likely to support AI companies restricting military use of their tools as to say the military should be able to use them however it wants. The public actually wants these restrictions placed on their own government. Is it really so democratic to deny the people what they want?

But the fundamental issue here is a different one. Luckey is using the vague expression “believe in democracy” to equivocate between two entirely different things.

Yes, democracy requires that the public gets to choose its leaders and that those leaders get to make decisions about national security.
But no, it does not require that any private individual must supply their labour and their products for any purpose the government demands, on terms entirely set by the government, on pain of destruction.

As one Twitter joker put it: “remember—you should do whatever the government wants, even things you think are immoral, because otherwise you’re deciding what you do instead of the government, which is undemocratic”

Of course, the reverse is closer to the truth: being free to refuse to personally work on projects you oppose, without facing crushing government retaliation, is clearly required for democracy to exist. And it’s undemocratic countries, like China and North Korea, where the state demands a right to make you offers you can’t refuse on any topic of their choice.

You don’t have to debate on their terms

There’s a common thread across all three of these charges. In each case, an abstract principle that sounds kind of reasonable gets invoked in a way that diverts attention from both common sense and what’s actually going on:

“You wanted government involvement in AI” becomes a reason you can’t object to any government action on AI.
“Powerful states will inevitably assert control over powerful technology” becomes a reason we just have to lie down and accept whatever form that assertion takes.
“Democratic leaders should make decisions about national security” becomes a reason no person or company can ever set terms when they sell things to the government.

These questions are going to come up more and more, because AI really is becoming more powerful and governments really will need to be involved in governing it in some form or other. And as the debate goes mainstream, it could easily become a dumbed-down culture war where you’re either for government control or against it.

But caring about precisely what the government is doing and whether it’s justified isn’t hypocritical, naive, or undemocratic. People should be proud to say that they care about the specifics and are actively pushing for the ones they think would be best. And they certainly shouldn’t allow themselves to be bullied into silence by the kind of mediocre arguments we’ve seen above.

Learn more

A timeline of the Anthropic-Pentagon dispute by Justin Hendrix
Anthropic and alignment by Ben Thompson
Palmer Luckey says Silicon Valley has the Pentagon all wrong: ‘Stick to a position that this is in the hands of the people’ by Jake Angelo
Many Americans think AI companies should be able to limit how the U.S. military uses their tools by David Montgomery
Rob’s interview with Dean Ball on how AI is a huge deal — but we shouldn’t regulate it yet

The post Are Anthropic and its supporters hypocritical, naive, and anti-democratic? appeared first on 80,000 Hours.