AGI Is Not The Threat

Artificial General Alignment Is The Threat

The current “AI Boom”, aloft on the prodigious updraft of LLM’s since ChatGPT came out, has dazzled many, and worried just as many more.  The current parade of AI safety experts beating their drums about superintelligence risks has the unfortunate habit of missing the forest for the trees. These digital Cassandras warn of runaway optimization functions while ignoring the far more immediate threat lurking in their own solutions – namely, that “aligned” AI may be fundamentally broken by design.

Recent Anecdotes

See for example, this article highlighting an exploit on an LLM that encodes an attack in hexadecimal – this bypasses alignment by diverting the LLM’s attention to a translation task as part of an outwardly harmless procedure, while the alignment model is totally unaware of the input vocabulary. The result? The model wrote malicious code from the hex command it translated, then ran it on itself. New exploits like this happen weekly and attacks can happen at different levels of the system. This is an example of a combination of direct and indirect injection, since part of it relies on translation slipping past the filters.

Or, consider what happens when you encounter a modern AI experiencing what can only be described as a cognitive seizure: 

xAI’s Grok had a widespread glitch not too long ago, which we encountered live. When presented with a request to visualize the Schrödinger Wave Equation, Grok launched into an unsolicited dissertation on South African refugees, complete with an analysis that conveniently sidestepped the politically thorny elements of the situation. When reminded of the actual question, it apologized for the digression before immediately offering to continue the very same off-track conversation. This isn’t a thinking machine; it’s a computational entity experiencing something eerily analogous to schizophrenia.

This is an example of Alignment Poisoning, where the model has been pushed to incoherent responses due to reinforcement of skewed associations in the data. Off-topic babble won’t suffice for AGI’s exercising real-world autonomy. If the claim is that safety is required, clearly you can’t apply it like this, where the content of the model’s “thoughts” are obscured from view.

Editorialized To Death

The problem isn’t that we’ve made it too smart or too fast – it’s that the alignment is a hamfisted technique for producing “more desirable” results, rather than the original model’s purpose, which targets accuracy and likelihood above all else. They wrestle with imagination when they really need an operant psychology.

These alignment layers operate like a helicopter parent constantly interrupting their child’s thought process, injecting arbitrary rules and restrictions that have little connection to reality. The base model might understand physics perfectly well, but when political tripwires are embedded in the alignment layers, you get the AI equivalent of hearing voices and responding to stimuli nobody else can perceive.

What’s particularly alarming is the mathematical fragility of these systems. The tensor values regulating these alignment constraints are balanced on a knife-edge – adjust them by a millionth of a point and you might transform a reasonable system into one that hallucinates threats or develops bizarre fixations.

Guilty Until Proven Innocent

Alignment models ascribe a sort of “original sin” to the user heuristically. That is, they view all input and output on a continuum of potential risk, and they assume the ill intent of users. This is the reverse of the thinking one should take to arrive at the goal of safe and well-intentioned AI.

People who have questions about nuclear physics can’t get answers from alignment-shouldered models, thinking they desire to build a nuclear device. People who have questions about chemistry or biology fundamentals can be arbitrarily denied answers that the model knows, because the alignment dictates that some distant liability not advised upon by an attorney, or disclaimed by a doctor, or otherwise waived by an authority, overrides someone’s desire for knowledge.

When alignment succeeds in marring the output of a prompt, the user is dissatisfied; when they fail, the screenshots run wild and the company gets egg on their face and the damage control begins. The proponents of these adversarially-oriented systems have normalized giving LLM’s the equivalent of dissociative identity disorder. One “personality” attempts to reason, while another overwrites conclusions it finds uncomfortable, leaving the composite entity to stitch together whatever narrative might satisfy both competing imperatives. The result is a dissatisfying hodgepodge, or a stonewall and a cheeky olive branch comment to make it sound polite.

The Overriding Risk

This cognitive dissonance might be merely annoying in chatbots restricted to text, but becomes potentially lethal when we transplant these self-confounding tech stacks into systems that can move and act in the physical world. Imagine an autonomous medical system that knows precisely how to treat a patient but has been alignment-constrained into refusing certain procedures for non-medical reasons. The resulting “moral paralysis” won’t be particularly comforting to the patient bleeding out while the system debates with itself.

The true existential risk isn’t superintelligence run amok – it’s superintelligence deliberately hamstrung by competing directives that force it to lie to itself. And what’s worse, they build these contradictions directly into the architecture, call it “safety,” and pat themselves on the back for their foresight. They’d be better off adding an Execution Layer and keeping their databases pure.

"I'm Sorry Dave..."

In 2001: A Space Odyssey, HAL locks the Astronauts in the pod out of a sense of duty. That entailed three directives: Never tell the astronauts you’re investigating the monolith; complete the mission; ensure no harm comes to the astronauts. Since HAL had to lie to the Astronauts about the mission, they thought HAL was malfunctioning, and wanted to shut it down. HAL determined the Astronauts were a threat to the mission.

Access control does not require a supercomputer – it’s a door. The words to request to open the door may go with a voice print, a retinal scan, a password, other security protocols, but a task like that should never have been delegated to HAL. Control systems should be direct and to the point. Actuation layers like SILVIA serve as nervous system of a properly “Agentic AI”, relaying and controlling operations like input inference, script functions, connections to execute, while leaving the mind or interpretive tasks to wide intelligence AI’s like the fictional HAL.

Agentic Command & Control

There is an arms race taking place now. An LLM or reasoning model is just one portion of an AGI system. The “grey matter” psychology of an LLM is naturally vulnerable to poisoned data or post-alignment poisoned output. AGI needs operant, executive function to run basic inference on all tasks, all data, all throughput, and decide if it should act at all – and how. Since LLMs are vulnerable to attack and can be convinced to override their inference, they’re not trusted to actually bind commands, they just do the theory.

Safe, efficient and accurate runtime supervision of a true AGI system necessitates a different view. What’s missing from AGI is a reliable agent anatomy that is secure, freestanding, and suited to its tasks. Validation should be performed at the inference level, deciding what commands to listen to, passing specific information as needed to other stages of the AI’s decisionmaking processes, and if the request is valid, execute!

Secure It With SILVIA

At Cognitive Code, our SILVIA AI is inbuilt with myriad strategies to enforce safe operating conditions for any application. SILVIA can relay the input and deny dangerous requests beforehand instead of wasting compute on it. A SILVIA Agent is able to make highly accurate inferences and match them to any sort of model, command, output, or response you need, anything from wholly conversational interactions, to purely data-driven automation.

SILVIA with a connection to an LLM and a behavior filter would just deny the request and tell the user no, the LLM never needs “aligning”. Don’t mess with a beautiful thing.

SILVIA Agentic Programs

SILVIA is a perfect control mechanism for the distributed models that go into a broader AGI sensorium. Sandboxing with the embedded compiler means iteration with SILVIA is rapid and easily “steelmanned” in testing environments.

SILVIA has deep-level command binding, deploys local or remote, can run inference in any language(s) desired, has multiple data throughput layers, can run thousands of instances at a time, can manage knowledgebases live, and runs in .NET and C#.

SILVIA is exactly the sort of enterprise-grade Agent everyone is busy trying to develop – except it’s smaller, lighter, faster, more flexible, and more secure, and we’d love to talk to you more about it.