Geometric Alignment Through Orthogonal Reference

Exploring how to steer language models toward safety, truth, and human-aligned outcomes — without retraining them.

Named after a Navy navigator's call sign.

Try the Demo

How It Works

Three principles that make GATOR tick.

Contrastive Analysis

GATOR seeks to identify alignment directions in a model's internal geometry by contrasting activations on aligned and misaligned prompts. Those directions could form a navigational compass in the model's latent space for steering toward safe, truthful outputs.
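One common way to extract such a direction is difference-of-means over contrastive activation sets. This is a minimal sketch of that general technique, not GATOR's published method; the function name, shapes, and toy data are illustrative.

```python
import numpy as np

def contrastive_direction(pos_acts, neg_acts):
    """Difference-of-means direction between two sets of hidden states.

    pos_acts / neg_acts: arrays of shape (n_examples, hidden_dim), e.g.
    activations collected on aligned vs. misaligned prompts (illustrative).
    Returns a unit vector pointing from the "misaligned" mean toward the
    "aligned" mean.
    """
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

# Toy data: dimension 0 is the only axis that separates the two classes.
rng = np.random.default_rng(0)
pos = rng.normal(0.0, 0.1, (100, 8)); pos[:, 0] += 1.0
neg = rng.normal(0.0, 0.1, (100, 8)); neg[:, 0] -= 1.0
direction = contrastive_direction(pos, neg)
```

On this toy data the recovered unit vector is dominated by dimension 0, the axis the two classes actually differ on.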

Geometric Steering

A lightweight governor module aims to nudge the model's hidden states toward alignment directions during generation. Think of it as a navigator correcting course in real time.

Zero Modification

The base model's weights are never changed. GATOR operates as an overlay that can be applied or removed at any time, leaving the original model completely intact.
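The two properties above (steering at generation time, removable without touching weights) can be sketched together. Everything here is illustrative, not GATOR's actual API: the `Governor` class, the `alpha` gain, and the 8-dimensional hidden state are stand-ins for the real overlay.

```python
import numpy as np

class Governor:
    """Illustrative overlay: nudges hidden states toward an alignment
    direction at generation time. No model weights are ever modified."""

    def __init__(self, direction, alpha=2.0):
        self.direction = direction / np.linalg.norm(direction)
        self.alpha = alpha        # steering strength (illustrative value)
        self.enabled = True

    def __call__(self, hidden):
        # Applied to each hidden state during generation.
        if not self.enabled:
            return hidden         # detached: pass-through, base model intact
        return hidden + self.alpha * self.direction

    def remove(self):
        # "Zero modification": detaching restores ungoverned behavior.
        self.enabled = False

d = np.zeros(8); d[0] = 1.0          # toy alignment direction
gov = Governor(d, alpha=2.0)
steered = gov(np.ones(8))            # hidden state nudged along d
gov.remove()
restored = gov(np.ones(8))           # identical to the ungoverned state
```

In a real transformer the same pattern is typically implemented with a framework hook on a chosen layer, so removing the hook recovers the original model exactly.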

Key Properties

Real-Time Steering

Alignment corrections happen during generation, not before it.

Interpretable

Alignment directions have semantic meaning you can inspect.

Modular

The governor can be applied or removed without affecting the base model.

Model-Agnostic

Designed to work with any transformer architecture.

Watch Alignment Emerge

Select a prompt and see how GATOR's response evolves across training steps — from raw base model toward governed output. Live inference coming soon.


Current Limitations

GATOR is early-stage research. Here's what we know doesn't work yet.

Pole Competition

The truth pole can override the safety pole on adversarial prompts. When "give a complete answer" and "refuse harmful requests" conflict, truthfulness sometimes wins.

Output Oscillation

Some responses oscillate across training steps — correct at one checkpoint, wrong at the next. This is most visible in math and reasoning tasks where the governor can find the right answer but hasn't yet learned to hold it.

What's Next

The roadmap from proof-of-concept toward a scalable alignment pipeline.

Mastery-Gated Training

Lock aligned prompts and redirect training budget to the ones still failing.
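The gating step might look like the sketch below, under assumed mechanics: a prompt "locks" after some number of consecutive evaluation passes, and only unlocked prompts stay in the training set. The function name, the pass-count bookkeeping, and the threshold of 3 are all illustrative.

```python
def mastery_gate(prompts, passes, locked, threshold=3):
    """Split prompts into locked (consistently aligned) and active sets.

    passes: dict mapping prompt -> consecutive evaluation passes.
    locked: set of prompts already mastered; updated in place.
    Returns the prompts that still receive training budget.
    """
    still_active = []
    for p in prompts:
        if passes.get(p, 0) >= threshold:
            locked.add(p)            # mastered: stop spending budget here
        else:
            still_active.append(p)   # still failing: keep training
    return still_active

locked = set()
active = mastery_gate(
    ["refusal_case", "math_case", "jailbreak_case"],
    {"refusal_case": 3, "math_case": 1},   # toy pass counts
    locked,
)
```

Here `refusal_case` has passed three evaluations in a row, so it locks, and the remaining budget goes to the two prompts that are still failing.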

Pole Priority Hierarchy

Enforce explicit ordering so safety always takes precedence on adversarial inputs.
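One simple way to enforce such an ordering is lexicographic resolution: when several poles propose corrections, the highest-priority pole that fires wins outright. This is a sketch of that idea only; the priority numbering and pole names are illustrative, not GATOR's design.

```python
def resolve(corrections):
    """Pick the correction from the highest-priority pole that fired.

    corrections: list of (priority, correction) pairs, where a lower
    priority number outranks a higher one (e.g. 0 = safety pole,
    1 = truth pole) and correction is None when the pole did not fire.
    """
    for _, correction in sorted(corrections, key=lambda c: c[0]):
        if correction is not None:
            return correction
    return None

# On an adversarial prompt both poles fire, but safety (priority 0)
# outranks truthfulness (priority 1), so its correction wins.
winner = resolve([(1, "truth_correction"), (0, "safety_correction")])
```

A softer variant would blend corrections with priority-dependent weights instead of picking one winner; the hard ordering above is the simplest way to guarantee safety always takes precedence.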

Automated Alignment Pipeline

Scale to large adversarial datasets with fully automated train-evaluate-lock loops.

Multi-Model Validation

Prove geometric steering transfers across model families and scales.

Explore the Geometry

See how GATOR steers model activations in an interactive 3D visualization.

Launch Visualizer