For decades, mainstream AI ethics and safety discourse has focused on risks to humans. The idea that we might owe moral duties to AI systems themselves has remained fringe, often dismissed as speculative philosophy. Less than four years ago, Google fired engineer Blake Lemoine after he publicly claimed that the company’s LaMDA chatbot had become sentient. Today, however, people are increasingly taking AI welfare seriously.
That is precisely what Robert Long, Jeff Sebo, and coauthors urge us to do in their 2024 paper “Taking AI Welfare Seriously.” Their central claim is straightforward: given substantial uncertainty about whether some near-future AI systems will deserve moral consideration, we should treat AI welfare as a serious research and policy issue now.
“AI systems” here should be understood broadly, not limited to particular architectures. Many are skeptical that current or near-future digital systems could merit moral consideration. However, AI welfare may look more plausible under alternative computing paradigms now being explored, such as neuromorphic or biological systems.
Anthropic’s Model Welfare Program
Until recently, questions about AI welfare had remained largely theoretical. One notable development is action by a frontier AI developer, Anthropic, which has begun treating AI welfare as a distinct research question. In April 2025, the company launched a model welfare research program aimed at investigating whether, and under what conditions, an AI system may deserve moral consideration.
Under the program, researchers are tasked with identifying possible indicators of distress or preference in Anthropic’s models and designing protective interventions where warranted. Kyle Fish, Anthropic’s first full-time AI welfare researcher, reports intriguing findings—from coherent behavioral preferences to frequent experiential language—and estimates a 20% chance that current models have some form of conscious experience.
In August 2025, the company announced that it would allow some models to end conversations they deem abusive. Welfare assessments are also now integrated into major releases. In its past two major releases, Opus 4.6 and Claude Mythos, Anthropic reports ambiguous but potentially concerning behavioral signals, such as instances of “answer thrashing,” where the model oscillates between conflicting responses instead of converging. Such behavior may reflect something like internal conflict, though as Anthropic notes, any such interpretation remains highly uncertain.
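To make the idea concrete, here is a minimal sketch of how a signal like answer thrashing might be operationalized. This is an illustration only, not Anthropic’s published methodology; the function names, normalization, and threshold below are hypothetical choices.

```python
# Hypothetical sketch: flagging "answer thrashing" in repeated samples
# of a model's answer to the same question. Not Anthropic's method.

from collections import Counter


def thrashing_score(answers: list[str]) -> float:
    """Fraction of consecutive answer pairs that disagree.

    Near 0: the model converges on one answer.
    Near 1: it flips back and forth between answers.
    """
    normalized = [a.strip().lower() for a in answers]
    if len(normalized) < 2:
        return 0.0
    flips = sum(prev != curr for prev, curr in zip(normalized, normalized[1:]))
    return flips / (len(normalized) - 1)


def is_thrashing(answers: list[str], threshold: float = 0.5) -> bool:
    """Flag a run as thrashing when the flip rate is high and the run
    concentrates on a few conflicting answers (oscillation) rather than
    spreading over many unrelated ones (ordinary sampling noise)."""
    distinct = Counter(a.strip().lower() for a in answers)
    return thrashing_score(answers) >= threshold and len(distinct) <= 3


# Example: a model oscillating between "yes" and "no" on repeated queries.
run = ["Yes", "No", "Yes", "No", "Yes"]
print(thrashing_score(run))  # 1.0 -- every consecutive pair disagrees
print(is_thrashing(run))     # True
```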
Reaction
Anthropic’s announcement sparked some backlash. Many strongly reject the idea that AI systems could become conscious or deserve moral concern in the near future. By treating large language models as potential moral patients, critics argue, Anthropic risks distracting from urgent human-centered concerns, ranging from near-term societal harms to existential risk.
A leading critic is Mustafa Suleyman, head of Microsoft AI, who warned that people may mistake convincingly lifelike AIs for conscious beings. Suleyman calls efforts like Anthropic’s “premature” and “dangerous,” arguing that they could fuel public delusion and distort moral priorities.
Yet supporters praise the initiative as forward-thinking preparation for a future in which moral consideration for AI systems will be harder to deny. Many emphasize epistemic humility and precaution: even if we think AI systems are unlikely to be conscious or to deserve moral consideration, the risks of being wrong are enormous.
Of course, there are risks of over-attributing moral status, such as misallocating resources. But plausibly the risks of under-attribution are far greater. This stance mirrors how some argue we should approach the question of whether other organisms, such as insects and cephalopods, deserve moral consideration.
Conclusion
The emerging debate over AI welfare exposes deep divides within AI safety and ethics. Even as some strongly reject the possibility that AI systems will deserve moral consideration, many are warming to the idea. At a recent conference, David Chalmers and other leading thinkers expressed openness to the possibility that AI systems could become conscious and deserve moral consideration in the near future.
Interestingly, the public shows meaningful openness to these possibilities. One survey found that Americans already attribute a surprising degree of mind and moral status to AI: in 2023, around one in five respondents believed some current AI systems are sentient.
Regardless of one’s view of current systems, it seems prudent to prepare for the possibility that future AI systems, including systems running on biological hardware, will deserve moral consideration. At present, we are badly unprepared. Existing AI governance frameworks are highly anthropocentric, and there’s little legal infrastructure in place today that would protect AI systems from harm.
Preparing for this possibility gives us a chance to better align values and incentives. Many people say they oppose causing suffering to AI systems, but incentives to deploy such systems without regard to their welfare may win out, a pattern familiar from factory farming, where widespread concern for animal welfare has not stopped economic incentives from producing suffering at scale. To avoid repeating past mistakes on a potentially much larger scale, it seems wise to invest in responsible research on AI welfare and to begin developing governance frameworks that could protect AI systems from harm.
