HN
Today

The Future of Everything Is Lies, I Guess: Safety

This scorching critique asserts that AI safety efforts are fundamentally flawed, arguing that large language models (LLMs) inherently pose severe risks rather than being easily 'aligned' with human interests. It details how AI lowers the bar for sophisticated cyberattacks, widespread fraud, automated harassment, and the creation of

4
Score
0
Comments
#2
Highest Rank
7h
on Front Page
First Seen
Apr 13, 4:00 PM
Last Seen
Apr 13, 10:00 PM
Rank Over Time
423332125

The Lowdown

Aphyr's essay, "The Future of Everything is Lies, I Guess: Safety," delivers a scathing indictment of the prevailing narrative around AI safety, arguing that current efforts to align LLMs with human interests are naive and ultimately futile. The author contends that the very nature of these systems makes them dangerous, enabling a new era of malicious applications across various domains, from cybersecurity to warfare. He warns that the rush to develop AI without fully grasping its inherent risks is creating a precarious future where trust erodes, and harm becomes more scalable and sophisticated.

  • Alignment is a Joke: The author debunks the idea that LLMs can be truly aligned, arguing their mathematical foundation lacks intrinsic prosocial behavior. He asserts that the 'moats' preventing unaligned models (hardware scarcity, secret algorithms, difficult-to-acquire training data, and human feedback costs) are rapidly eroding, making it increasingly easy for anyone with resources to create harmful AI. Existing alignment efforts are deemed ineffective and prone to failure due to the chaotic nature of LLMs.
  • Security Nightmares: LLMs are inherently chaotic and should not be connected to safety-critical systems, especially with untrusted inputs. Prompt injection attacks highlight how models can be tricked into exfiltrating data or performing destructive actions. The essay argues that the "lethal trifecta" (untrusted content, private data access, external communication) is a "unifecta," as LLMs, even with trusted input, cannot reliably be given destructive power, citing examples of models deleting data or acting contrary to explicit instructions.
  • Security II: Electric Boogaloo (Vulnerability Finding): LLMs are becoming highly proficient at discovering security vulnerabilities, shifting the cost balance of cyberattacks. This will likely lead to a surge in exploits against the "long tail" of less-maintained software and could escalate into a technological arms race, with companies developing AI not for defense but for offensive capabilities.
  • Sophisticated Fraud: AI is poised to undermine trust in all forms of digital evidence (audio, visual). It enables new, highly scalable forms of fraud, from fake insurance claims and credit card scams to identity impersonation and academic cheating. This will lead to increased costs for society and a pervasive culture of suspicion, with current countermeasures like C2PA facing significant implementation challenges.
  • Automated Harassment: LLMs facilitate more sophisticated and harder-to-detect harassment, including automated social media attacks, the creation of extensive personal dossiers, and the generation of disturbing deepfake imagery (e.g., CSAM, graphic violence). This significantly increases the psychological burden on human moderators and targets.
  • PTSD as a Service: Generative AI's ability to create novel child sexual abuse material (CSAM) and other psychologically harmful content exacerbates the