When Is Rank-1 Steering Cheap?
Geometry, Granularity, and Budgeted Search

Rank-1 activation steering often looks brittle, but for most concepts a useful rank-1 intervention exists. The harder question is how expensive it is to find. We show that two cheap geometric statistics computed directly from contrastive activations, prompt-boundary alignment and concept granularity, carry substantial predictive signal: alignment localizes the layers where useful directions emerge, while granularity tracks both optimization cost and the steering ceiling. The practical implication is that activation geometry, on its own, is a useful prior for rank-1 steering before any search is run.

John T. Robertson*  ·  Jianing Zhu  ·  Haris Vikalo  ·  Zhangyang Wang
The University of Texas at Austin  ·  Preprint, 2026
*Corresponding author

1.Overview

Abstract

Activation steering offers a lightweight way to control large language models without retraining, but its effectiveness varies sharply across concepts. Prior work often interprets this variability as evidence that many concepts are not well captured by a single steering direction. We argue instead that much of this variability reflects search difficulty: a useful rank-1 intervention often exists, but finding it can be expensive. We formalize rank-1 steering as a budget-constrained optimization problem over intervention layer and coefficient. Across concepts and model families, prompt-boundary directional alignment predicts where effective interventions are likely to occur, enabling geometry-guided search that reaches high utility with substantially fewer evaluations, reducing the trials needed to recover 95% of best-found utility by 39.8% on average across three model families. To explain why some concepts remain expensive even under better search, we introduce concept granularity, a measure of directional heterogeneity across contrastive contexts. Granularity distinguishes concepts whose difference vectors share a stable global direction from those where prompts agree locally within each input but the utility-maximizing direction rotates systematically across inputs. Higher granularity is associated with both slower convergence and lower best-found steering performance (Pearson r = 0.44 with trials-to-95%, p < 0.001, and r = −0.46 with best-found utility, p < 0.001). These observations suggest a practical workflow rather than a single universal vector-construction rule. We therefore present GRACE, a Granularity- and Representation-Aware Concept Engineering framework that uses activation geometry to diagnose the dominant source of steering difficulty, choose the appropriate remedy, and allocate optimization effort more efficiently. Our results shift the frame of activation steering from "when does rank-1 fail?" to "when is rank-1 cheap and stable?", and turn activation geometry from a descriptive tool into an actionable prior for LLM control.

39.8% faster search convergence averaged across all models and concepts
50 / 60 (model, concept) pairs where GRACE finds a stronger intervention than standard search
29,254 steering evaluations across 20 concepts, 3 model families, and 3 vector constructions

2.Background: Steering Vectors

Rank-1 activation steering modifies a transformer's residual stream at one layer ℓ by adding a vector vℓ scaled by a coefficient α: no retraining, no extra parameters. We follow PersonaVectors for extraction: an LLM generates 5 contrastive prompt pairs and 100 questions per concept, and we cache the residual-stream activation difference for every (prompt, question) pair at every layer in two variants. The prompt-boundary variant is the residual stream at the final prompt token; the response-averaged variant is the mean over generated response tokens. We steer with the response-averaged vector, but the prompt-boundary geometry turns out to be the better diagnostic and powers the analysis on this page. With a vector in hand, the practical question is how strongly to apply it. Below: real outputs from Gemma-3-27B with a maritime steering vector at one layer, the same question across coefficients.


Want to see steering examples on a specific (model, concept)? See the results viewer →
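To make the intervention in this section concrete, here is a minimal, dependency-free sketch of the rank-1 update and of a mean-difference steering vector. It is an illustration, not the extraction pipeline itself: the function names `steer` and `mean_difference` are hypothetical, activations are plain Python lists, and we assume the update is applied to a single residual-stream vector at the chosen layer.

```python
def steer(h, v, alpha):
    """Rank-1 steering: return h + alpha * v for one residual-stream vector h.

    h, v are equal-length lists of floats; alpha is the steering coefficient.
    """
    return [h_i + alpha * v_i for h_i, v_i in zip(h, v)]


def mean_difference(pos_acts, neg_acts):
    """Average the per-pair activation differences (pos - neg) into one vector.

    pos_acts[i] and neg_acts[i] are assumed to come from the same
    (prompt, question) pair, with and without the concept elicited.
    """
    n = len(pos_acts)
    diffs = [[p - q for p, q in zip(pos, neg)] for pos, neg in zip(pos_acts, neg_acts)]
    return [sum(col) / n for col in zip(*diffs)]


# Toy example: sweep the coefficient for a fixed direction.
h = [0.5, -1.0, 2.0]
v = [1.0, 0.0, -1.0]
for alpha in (0.0, 1.0, 2.0):
    print(alpha, steer(h, v, alpha))
```

With alpha = 0 the activation is unchanged, and larger coefficients push it further along v, which is the one-dimensional knob the coefficient sweep explores.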

3.Where Should We Steer?

The intervention layer is rarely known in advance, and the effective region is highly concept-dependent: a concept that looks unsteerable at a preset layer can have a strong rank-1 intervention only a few layers away. Rather than fixing layers ahead of time, we ask where in the network a concept is most likely to yield a useful direction. We compute the average pairwise cosine similarity of the contrastive difference vectors at the prompt boundary, which we call prompt-boundary alignment 𝒜c(ℓ), and find that it predicts where effective interventions live, before any search is run.
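The alignment statistic described above can be sketched in a few lines. This is a minimal illustration that assumes the prompt-boundary difference vectors for one concept are already cached per layer as lists of floats; `alignment` and `rank_layers` are hypothetical names, and ranking layers by alignment is only one simple way to turn the profile into a search prior.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment(diffs):
    """Average pairwise cosine similarity over a list of difference vectors."""
    pairs = list(combinations(diffs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def rank_layers(diffs_by_layer):
    """Return layer indices sorted from highest to lowest alignment."""
    scores = {layer: alignment(d) for layer, d in diffs_by_layer.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: layer 1 has nearly parallel difference vectors, layer 0 does not.
toy = {
    0: [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    1: [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]],
}
print(rank_layers(toy))
```

The cost is quadratic in the number of cached pairs per layer, but it needs no steering evaluations, which is what makes it usable before any search is run.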

Layerwise alignment vs concept score
Example concept (humorous, Gemma-3-27B): the alignment profile (right axis) tracks the concept-induction score (left axis) layer by layer, across coefficients.
Pooled alignment vs concept score
Across all 20 concepts on Gemma-3-27B, high-alignment layers are consistently enriched for strong steering performance (Pearson r = 0.333, p = 9 × 10⁻⁸).

The single concept above shows the profile-level relationship (alignment peaks where the steering effect peaks); the pooled scatter shows it holds in aggregate.

Want to see alignment profiles for a specific (model, concept)? See the results viewer →

5.Why Are Some Concepts Cheap and Others Expensive?

Even with geometry-guided search, optimization difficulty varies sharply across concepts, and concepts with similarly strong alignment can attain very different best-found utility. The missing factor is how the directional disagreement is organized.

We split alignment into two parts: γc, the agreement between different prompt framings of the same question (mostly pipeline noise that better estimators can fix); and λc, the agreement across questions. Low λc relative to γc means the same concept points in different residual-stream directions in different inputs: structural rotation that no single rank-1 vector can capture. The ratio 𝒢c = γc / 𝒜c, concept granularity, isolates that structural component. When 𝒢c ≈ 1 a single vector is a faithful summary of the concept; as 𝒢c grows, the implied direction rotates across inputs and any single steering vector becomes a worse compromise. Granularity is negatively correlated with best-found utility (Spearman ρ = −0.46, p < 0.001) and positively correlated with T₉₅ (ρ = 0.37, p = 0.003).
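The decomposition can be illustrated with a small sketch. It assumes the simplest possible estimators, which this page does not spell out: γc is the mean cosine over pairs of difference vectors that share a question, λc is the mean over pairs from different questions, and 𝒜c is the mean over all pairs at a single layer. The function name `granularity` and the input layout (question id mapped to a list of difference vectors) are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def granularity(diffs_by_question):
    """Split pairwise agreement into within-question and across-question parts.

    Assumes at least two questions with at least two vectors each, and a
    positive overall alignment (the ratio is not meaningful otherwise).
    """
    labeled = [(q, d) for q, ds in diffs_by_question.items() for d in ds]
    within, across = [], []
    for (q1, u), (q2, v) in combinations(labeled, 2):
        (within if q1 == q2 else across).append(cosine(u, v))
    gamma = sum(within) / len(within)   # same question, different framings
    lam = sum(across) / len(across)     # different questions
    overall = (sum(within) + sum(across)) / (len(within) + len(across))
    return {"gamma": gamma, "lambda": lam, "alignment": overall,
            "granularity": gamma / overall}


# A concept with one stable direction versus one that rotates across questions.
stable = {"q1": [[1.0, 0.0], [1.0, 0.1]], "q2": [[1.0, 0.05], [0.9, 0.0]]}
rotating = {"q1": [[1.0, 0.0], [1.0, 0.0]], "q2": [[0.0, 1.0], [0.0, 1.0]]}
print(granularity(stable)["granularity"])
print(granularity(rotating)["granularity"])
```

In the toy data the stable concept lands near 1, while the rotating concept has perfect within-question agreement but no across-question agreement, so the ratio grows well above 1.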

Granularity vs best utility
Higher granularity, lower steering ceiling.
Granularity vs T95
Higher granularity, more TPE trials to reach 95% of best-found utility.
Per-model granularity vs peak utility
Per-model breakdown, peak utility. The negative relationship between granularity and best-found utility holds within each model family.
Per-model granularity vs T95
Per-model breakdown, search cost. Higher-granularity concepts consistently demand more trials in each model family.
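The search-cost metric in these panels, T₉₅, is the number of trials needed to reach 95% of the best utility found in a run. A minimal sketch of that bookkeeping follows, assuming utilities are logged in trial order and the best-found utility is positive; `trials_to_fraction` and `relative_reduction` are hypothetical helper names.

```python
def trials_to_fraction(utilities, fraction=0.95):
    """Return the 1-indexed trial at which utility first reaches
    fraction * (best utility found anywhere in the run)."""
    target = fraction * max(utilities)
    for trial, utility in enumerate(utilities, start=1):
        if utility >= target:
            return trial


def relative_reduction(baseline_trials, guided_trials):
    """Fraction of trials saved relative to the baseline search."""
    return (baseline_trials - guided_trials) / baseline_trials


# Toy run: best utility is 0.8, so the target is 0.76.
run = [0.1, 0.4, 0.2, 0.78, 0.8, 0.79]
print(trials_to_fraction(run))
```

Because the target is defined relative to the best utility found in the same run, T₉₅ measures how quickly a search approaches its own ceiling, not how high that ceiling is.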

Per-concept granularity values (across all three model families) are listed on the concept definitions page →

6.Removable vs. Persistent Sources of Difficulty

Granularity captures something structural about how a concept is encoded across contexts: it explains the ceiling of rank-1 steering and the cost of approaching that ceiling, but it isn't itself an optimization target. What we can improve are the fixable sources of heterogeneity that sit on top of this baseline, inflating apparent difficulty beyond what the concept's underlying geometry warrants. Three such patterns recur in our experiments, each affecting only a minority of concepts but doing real damage on the ones it touches. Each can be detected from the cached contrastive activations alone, before any steering trial is run, and each calls for a different construction-side remedy.

  1. Magnitude-driven outliers. A few high-norm prompt pairs can dominate the averaged direction. Unit-normalized averaging fixes this without changing the direction the bulk of the data implies.
  2. Multimodal prompt structure. Sometimes prompts cluster into two or more sub-directions instead of agreeing on one. Averaging across the blocks produces a poor compromise; clustering first recovers a usable direction.
  3. Representational fragmentation. Prompt-boundary alignment is the better predictor of effective steering layers in general, but in a small fraction of (model, concept) pairs its layerwise profile diverges sharply from the response-averaged profile. When that happens, prompt-boundary layer restriction starts to actively miss strong interventions, and we widen the search instead.
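The first remedy in the list above is easy to see on toy data. The sketch below compares a plain mean with a unit-normalized mean when one high-norm pair points elsewhere; the function names are hypothetical and the vectors are made up for illustration.

```python
import math


def unit(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_vector(vectors):
    """Plain coordinate-wise mean."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def unit_mean_vector(vectors):
    """Normalize each difference vector before averaging, so every
    prompt pair contributes its direction with equal weight."""
    return mean_vector([unit(v) for v in vectors])


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))


# Three pairs agree on roughly [1, 0]; one high-norm outlier points along [0, 1].
diffs = [[1.0, 0.1], [1.0, -0.1], [0.9, 0.0], [0.0, 50.0]]
bulk_direction = [1.0, 0.0]
print(cosine(mean_vector(diffs), bulk_direction))
print(cosine(unit_mean_vector(diffs), bulk_direction))
```

The plain mean is pulled almost entirely toward the outlier, while the unit-normalized mean stays close to the direction the bulk of the pairs imply.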
Cluster construction example
Multimodal prompt structure. The per-pair similarity matrix for hallucinating shows two clear sub-clusters; averaging across them produces a poor steering direction, while clustering first recovers it.
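A sketch of the clustering remedy follows, using a small two-cluster spherical k-means as a stand-in; the clustering method actually used is not specified on this page, so treat the algorithm, the deterministic initialization, and the name `two_cluster_directions` as assumptions. Each returned centroid is a candidate steering direction.

```python
import math


def unit(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def two_cluster_directions(vectors, iters=10):
    """Group difference vectors into two directional clusters.

    Initialization: the first vector, plus the vector least similar to it.
    Returns (labels, centroids), with centroids as unit vectors.
    """
    units = [unit(v) for v in vectors]
    centroids = [units[0], min(units, key=lambda u: dot(u, units[0]))]
    labels = [0] * len(units)
    for _ in range(iters):
        # Assign each vector to the centroid it is most aligned with.
        labels = [0 if dot(u, centroids[0]) >= dot(u, centroids[1]) else 1
                  for u in units]
        # Recompute each centroid as the normalized sum of its members.
        for k in (0, 1):
            members = [u for u, lab in zip(units, labels) if lab == k]
            if members:
                centroids[k] = unit([sum(col) for col in zip(*members)])
    return labels, centroids


# Two sub-directions: roughly [1, 0] and roughly [0, 1].
diffs = [[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]]
labels, centroids = two_cluster_directions(diffs)
print(labels)
print(centroids)
```

Averaging all four toy vectors would give a diagonal direction that matches neither group, whereas the per-cluster centroids recover the two sub-directions separately.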
Representational fragmentation
Representational fragmentation. For golden_gate_centric, prompt-boundary and response-averaged alignment profiles peak at different depths, a signal that prompt-boundary layer restriction will miss the response-relevant directions.
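One simple way to operationalize this check is to compare where the two alignment profiles peak and fall back to all layers when the peaks are far apart. The sketch below is a hypothetical simplification: the peak-gap criterion, the `max_peak_gap` and `top_k` parameters, and the function names are illustrative assumptions, not the exact rule used in our experiments.

```python
def peak_layer(profile):
    """Index of the layer with the highest alignment."""
    return max(range(len(profile)), key=profile.__getitem__)


def layers_to_search(prompt_profile, response_profile, top_k=3, max_peak_gap=2):
    """Choose candidate layers for the steering search.

    If the prompt-boundary and response-averaged profiles peak at very
    different depths, treat the pair as fragmented and search every layer.
    Otherwise restrict the search to the top_k prompt-boundary layers.
    """
    n_layers = len(prompt_profile)
    gap = abs(peak_layer(prompt_profile) - peak_layer(response_profile))
    if gap > max_peak_gap:
        return list(range(n_layers))  # fragmentation fallback: widen the search
    ranked = sorted(range(n_layers), key=lambda l: prompt_profile[l], reverse=True)
    return sorted(ranked[:top_k])


prompt = [0.1, 0.2, 0.6, 0.5, 0.3, 0.1]
agreeing = [0.1, 0.2, 0.5, 0.6, 0.3, 0.1]      # peaks one layer apart
fragmented = [0.1, 0.1, 0.1, 0.1, 0.2, 0.7]    # peaks three layers apart
print(layers_to_search(prompt, agreeing))
print(layers_to_search(prompt, fragmented))
```

Both profiles come from cached activations, so the check costs no steering evaluations.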
Construction choice effects vs granularity
Construction choice and granularity. Improvements from these remedies concentrate on low-granularity concepts; high-granularity ones stay near their predicted ceiling regardless of construction choice, which is the regime granularity is meant to flag in advance.

Each remedy on its own only helps the minority of concepts it targets, but the diagnostics compose. Combining the appropriate construction choice (mean / unit-mean / cluster) with geometry-constrained search and the fragmentation fallback yields a stronger rank-1 intervention on 50 of 60 (model, concept) pairs in our study, and never a worse one, compared to a baseline TPE search over standard PV vectors at all layers.

7.Cite

@misc{robertson2026rank1steeringcheapgeometry,
  title         = {When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search},
  author        = {John T. Robertson and Jianing Zhu and Haris Vikalo and Zhangyang Wang},
  year          = {2026},
  eprint        = {2605.16362},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2605.16362},
}