Technical AI Safety and Alignment Work
How can we build AI systems that reliably do what we intend, and how can we understand them well enough to know when they don't? We are looking to back new technical research organizations and companies advancing safety and alignment science.
Mechanistic interpretability: Understanding what is actually happening inside AI models
Scalable oversight: Methods for humans to supervise AI systems that are more capable than the humans evaluating them
AI control: Techniques to ensure powerful AI systems remain under meaningful human control even as their capabilities grow
Reinforcement learning and training for safety: Training regimes, reward signals, and curated datasets that reliably produce safe, prosocial AI behavior
Open-weight model safety: Ways to keep the world safe when anyone can access and modify open-weight models, potentially stripping them of guardrails and safety features
AI consciousness and moral status: If AI systems develop something like genuine experience or suffering, there will be profound implications for ethics and governance
Critical Cybersecurity
Attackers are already using AI to exploit security vulnerabilities. Meanwhile, new models like Claude Mythos have been deemed too dangerous to release publicly. The world’s critical systems are poorly protected and the window for action is closing quickly.
AI-native cyber defense: AI systems purpose-built for defense, trained on security-specific datasets, and capable of recognizing and stopping novel attacks at machine speed
Critical infrastructure hardening: Protecting critical software, power grids, financial networks, and communications infrastructure from AI-enabled attacks
AI agent (and multi-agent) security: Preventing individual agents from being manipulated into misusing their capabilities, and addressing the new threats that emerge when agents interact at scale
Hyper-secure AI compute: Military-grade physical and digital security for the facilities where frontier models are trained and stored
Open-source intelligence and monitoring: Systematically tracking AI-relevant developments in the real world, such as datacenter buildouts, large training runs, AI misbehavior in the wild, and nation-state AI activity
Biosecurity and Pandemic Prevention
Advanced AI is lowering the knowledge barrier for engineering dangerous pathogens and bioweapons. The tools are growing more powerful and more accessible, our defenses remain thin, and the window for action is closing quickly.
Prevent misuse: Reducing the probability that dangerous biological capabilities are created, accessed, or misused through frontier model safeguards and evaluations, differential access to powerful biological models, and stronger governance of the supply chain in areas like DNA synthesis
Threat detection and surveillance: Enabling rapid detection, identification, attribution, and response to emerging or engineered pathogens through early warning systems, disease surveillance, and platforms for reconfigurable diagnostics
Societal defense and biohardening: Limiting a threat's spread and its impact on critical infrastructure and society once it emerges, through solutions that harden the built environment against biological threats (Far-UVC, air filtration) and platforms for rapid medical countermeasure design
Cross-cutting interventions: Enabling the field across domains through market shaping and demand generation, policy and advocacy for increased government funding and stockpiling, and research into neglected threat classes
Governance, Policy, Standards, and International Coordination
We need the legal, diplomatic, and organizational infrastructure that makes AI development accountable. We must create the standards, watchdogs, and international frameworks that turn safety from a voluntary commitment into a reliable and durable system.
Independent AI evaluators, auditors, and accountability mechanisms: Rigorous third-party testing of frontier AI models before and after deployment, carried out by independent organizations that test for dangerous capabilities and track whether frontier labs are following their safety commitments.
Compute governance: Policy and technical tools to keep the most powerful AI hardware from reaching bad actors or being used for highly destructive purposes.
US-China AI dialogues and coordination: Track 1.5 and Track 2 diplomacy, academic exchanges, and joint research on shared risks.
International AI safety frameworks: The verification mechanisms, political coalitions, and institutional infrastructure needed for meaningful international agreements on AI risk.
Safety and security standards: Certifiable standards for AI systems that enterprises, insurers, and regulators can rely on. Think SOC 2 for AI safety.
AI insurance and risk markets: Insurance products and financial mechanisms that price AI risk accurately and create market incentives for safer development.
Sensemaking and Societal Resilience
A broader category for civilizational hardening – making societies, institutions, and infrastructure resilient to the full range of risks that advanced AI creates or amplifies.
Democratic resilience and anti-authoritarian technology: AI dramatically lowers the cost of surveillance, censorship, and social control. We need censorship circumvention tools, privacy-preserving infrastructure, and AI systems designed with democratic principles at their core.
Information integrity: Content provenance infrastructure, AI-enabled disinformation defense, and tools to detect and attribute AI-generated influence operations.
Fraud prevention and deepfake detection: Technical tools to verify the authenticity of media and prevent AI-enabled fraud, impersonation, and manipulation at scale.
Economic disruption and labor transition: Mass AI-driven unemployment is itself a pathway to catastrophic instability – one that could destabilize democracies and fuel authoritarian backlash.
Field-Building and Infrastructure
We need organizations dedicated to finding, developing, and routing exceptional talent toward the most important work – and to building the communications infrastructure the field currently lacks.
Talent pipelines into AI safety, critical cybersecurity, and biosecurity: Programs that identify exceptional people in adjacent fields and help them transition into AI safety and resilience work. We would gladly fund another Halcyon!
Recruiting infrastructure: A professional recruiting firm focused specifically on placing talent into AI safety organizations. This does not exist and the field badly needs it.
Public communications: Organizations and media outlets that help the public understand what is happening at the AI frontier. Perhaps also a professional services firm serving as a dedicated communications partner to the organizations already doing important work.




