Rank-1 activation steering often looks brittle, but for most concepts a useful rank-1 intervention exists. The harder question is how expensive it is to find. We show that two cheap geometric statistics computed directly from contrastive activations, prompt-boundary alignment and concept granularity, carry substantial predictive signal: alignment localizes the layers where useful directions emerge, while granularity tracks both optimization cost and the steering ceiling. The practical implication is that activation geometry, on its own, is a useful prior for rank-1 steering before any search is run.
Activation steering offers a lightweight way to control large language models without retraining, but its effectiveness varies sharply across concepts. Prior work often interprets this variability as evidence that many concepts are not well captured by a single steering direction. We argue instead that much of this variability reflects search difficulty: a useful rank-1 intervention often exists, but finding it can be expensive. We formalize rank-1 steering as a budget-constrained optimization problem over intervention layer and coefficient. Across concepts and model families, prompt-boundary directional alignment predicts where effective interventions are likely to occur. This enables geometry-guided search that reaches high utility with substantially fewer evaluations: the trials needed to recover 95% of best-found utility drop by 39.8% on average across three model families. To explain why some concepts remain expensive even under better search, we introduce concept granularity, a measure of directional heterogeneity across contrastive contexts. Granularity distinguishes concepts whose difference vectors share a stable global direction from those where prompts agree locally within each input but the utility-maximizing direction rotates systematically across inputs. Higher granularity is associated with both slower convergence and lower best-found steering performance (Pearson r = 0.44 with trials-to-95%, p < 0.001, and r = −0.46 with best-found utility, p < 0.001). These observations suggest a practical workflow rather than a single universal vector-construction rule. We therefore present GRACE, a Granularity- and Representation-Aware Concept Engineering framework that uses activation geometry to diagnose the dominant source of steering difficulty, choose the appropriate remedy, and allocate optimization effort more efficiently. Our results shift the frame of activation steering from "when does rank-1 fail?" to "when is rank-1 cheap and stable?", and turn activation geometry from a descriptive tool into an actionable prior for LLM control.
Rank-1 activation steering modifies a transformer's residual stream at one layer ℓ by adding a vector v_ℓ scaled by a coefficient α: no retraining, no extra parameters. We follow PersonaVectors for extraction: an LLM generates 5 contrastive prompt pairs and 100 questions per concept, and we cache the residual-stream activation difference for every (prompt, question) pair at every layer in two variants. The prompt-boundary variant is the residual stream at the final prompt token; the response-averaged variant is the mean over generated response tokens. We steer with the response-averaged vector, but the prompt-boundary geometry turns out to be the better diagnostic and powers the analysis on this page. With a vector in hand, the practical question is how strongly to apply it. Below: real outputs from Gemma-3-27B with a maritime steering vector at one layer, the same question across coefficients.
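The update itself is simple enough to sketch in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it treats residual-stream states as plain Python lists, and the names `steer` and `steer_sequence` are hypothetical. Applying the same shift at every token position is also an assumption; the page does not say which positions are steered.

```python
from typing import List

Vector = List[float]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Return hidden + alpha * direction for a single residual-stream vector."""
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same dimension")
    return [h + alpha * v for h, v in zip(hidden, direction)]


def steer_sequence(states: List[Vector], direction: Vector, alpha: float) -> List[Vector]:
    """Apply the same rank-1 shift to every token position (an assumption)."""
    return [steer(h, direction, alpha) for h in states]


# Toy example: a 4-dimensional residual stream with two token positions.
states = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
v_layer = [1.0, 0.0, -1.0, 0.0]  # illustrative steering vector for one layer
steered = steer_sequence(states, v_layer, alpha=2.0)
```

Setting `alpha=0.0` leaves the states unchanged, which is a convenient sanity check when wiring a hook like this into a real model.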
Want to see steering examples on a specific (model, concept)? See the results viewer →
The intervention layer is rarely known in advance, and the effective region is highly concept-dependent: a concept that looks unsteerable at a preset layer can have a strong rank-1 intervention only a few layers away. Rather than fixing layers ahead of time, we ask where in the network a concept is most likely to yield a useful direction. We compute the average pairwise cosine similarity of the contrastive difference vectors at the prompt boundary, which we call prompt-boundary alignment 𝒜_c(ℓ), and find that it predicts where effective interventions live, before any search is run.
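As a rough sketch of this statistic, the code below computes the mean pairwise cosine similarity of a set of difference vectors at each layer. It assumes the cached prompt-boundary differences are available as a dict from layer index to a list of vectors; that layout and the function names are illustrative, not taken from the paper's code.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors; defined as 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment(diff_vectors):
    """Average cosine similarity over all unordered pairs of difference vectors."""
    pairs = list(combinations(diff_vectors, 2))
    if not pairs:
        raise ValueError("need at least two difference vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def alignment_profile(diffs_by_layer):
    """Map each layer index to its alignment score."""
    return {layer: alignment(vs) for layer, vs in diffs_by_layer.items()}


# Toy data: layer 0 has scattered directions, layer 1 has a shared direction.
diffs_by_layer = {
    0: [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    1: [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]],
}
profile = alignment_profile(diffs_by_layer)
best_layer = max(profile, key=profile.get)
```

Because cosine similarity ignores vector norms, the score reflects only directional agreement, which is why layer 1 scores highest here even though its vectors differ in length.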
The single concept above shows the profile-level relationship (alignment peaks where the steering effect peaks); the pooled scatter shows it holds in aggregate.
Want to see alignment profiles for a specific (model, concept)? See the results viewer →
Layer and coefficient interact, so practical rank-1 steering is a budgeted search over (ℓ, α): every trial costs a generation plus an LLM-judge call. We measure search cost with T₉₅, the number of trials needed to reach 95% of the best-found utility within a run. Smaller T₉₅ means strong interventions are easier to find under a fixed evaluation budget.
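One way to compute T₉₅ from a run's trial history is sketched below. It assumes utilities are non-negative and that trials are counted from 1; the function name `trials_to_fraction` is hypothetical.

```python
def trials_to_fraction(utilities, fraction=0.95):
    """Return the 1-indexed trial at which the running best first reaches
    `fraction` of the best utility found anywhere in the run."""
    if not utilities:
        raise ValueError("need at least one trial")
    best = max(utilities)
    if best < 0:
        # Assumption: utility is non-negative, so a fraction of it is a lower bar.
        raise ValueError("this sketch assumes non-negative utilities")
    target = fraction * best
    running_best = float("-inf")
    for trial_index, utility in enumerate(utilities, start=1):
        running_best = max(running_best, utility)
        if running_best >= target:
            return trial_index
    raise RuntimeError("unreachable: the best trial always meets the target")


# Toy run of six trials: best-found utility is 60, so the target is 57.
run = [10.0, 40.0, 35.0, 58.0, 60.0, 59.0]
t95 = trials_to_fraction(run)
```

In this toy run the fourth trial (utility 58) is the first to clear the target, so `t95` is 4 even though the best trial comes later.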
We use Tree-structured Parzen Estimation (TPE, via Optuna) with a fixed budget of 50 trials and 3 seeds per concept. TPE substantially outperforms grid search at the same budget, but still wastes trials probing layers with little chance of success. Restricting it to the top 15 layers ranked by 𝒜_c(ℓ) closes that gap. Across the three models (Gemma-2-2B, Gemma-3-27B, Llama-3.3-70B), T₉₅ drops from 13.7 to 8.2 trials on average (39.8% fewer), with final best-found utility within 0.16 points of unrestricted search and improving in 58% of runs.
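The layer restriction can be sketched independently of the optimizer. The helper below picks the k layers with the highest alignment from a per-layer score dict; the name `top_k_layers` and the toy scores are illustrative assumptions, and the TPE search itself is not reproduced here.

```python
def top_k_layers(alignment_by_layer, k=15):
    """Return the k layer indices with the highest alignment, in layer order."""
    ranked = sorted(
        alignment_by_layer,
        key=lambda layer: alignment_by_layer[layer],
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy alignment profile over six layers (values are made up for illustration).
alignment_by_layer = {0: 0.10, 1: 0.35, 2: 0.62, 3: 0.58, 4: 0.20, 5: 0.05}
candidate_layers = top_k_layers(alignment_by_layer, k=3)
```

In an Optuna objective, one plausible way to apply the restriction would be to sample the layer with `trial.suggest_categorical` over `candidate_layers` while sampling the coefficient with `trial.suggest_float`; that wiring is our assumption about how the restriction could be expressed, not a description of the authors' code.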
Want to see convergence curves and best-found configs per (model, concept)? See the results viewer →
Even with geometry-guided search, optimization difficulty varies sharply across concepts, and concepts with similarly strong alignment can attain very different best-found utility. The missing factor is how the directional disagreement is organized.
We split alignment into two parts: γ_c, the agreement between different prompt framings of the same question (mostly pipeline noise that better estimators can fix); and λ_c, the agreement across questions. Low λ_c relative to γ_c means the same concept points in different residual-stream directions in different inputs: structural rotation that no single rank-1 vector can capture. The ratio 𝒢_c = γ_c / 𝒜_c, concept granularity, isolates that structural component. When 𝒢_c ≈ 1 a single vector is a faithful summary of the concept; as 𝒢_c grows, the implied direction rotates across inputs and any single steering vector becomes a worse compromise. Granularity is negatively correlated with best-found utility (Spearman ρ = −0.46, p < 0.001) and positively correlated with T₉₅ (ρ = 0.37, p = 0.003).
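The decomposition can be illustrated with a small sketch under several assumptions that the page does not spell out: difference vectors are grouped by question at a single layer, within-question agreement averages cosine similarity over pairs that share a question, across-question agreement averages over pairs from different questions, and the granularity denominator is the overall alignment over all pairs. All function and key names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors; defined as 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean(values):
    """Arithmetic mean that refuses an empty input."""
    if not values:
        raise ValueError("no pairs to average")
    return sum(values) / len(values)


def granularity_stats(diffs_by_question):
    """Compute within-question agreement, across-question agreement,
    overall alignment, and their ratio (granularity) for one concept."""
    tagged = [(q, v) for q, vs in diffs_by_question.items() for v in vs]
    within, across = [], []
    for (q_a, u), (q_b, v) in combinations(tagged, 2):
        (within if q_a == q_b else across).append(cosine(u, v))
    gamma = mean(within)            # same question, different prompt framings
    lam = mean(across)              # different questions
    overall = mean(within + across)  # alignment over all pairs (assumed denominator)
    if overall <= 0.0:
        raise ValueError("granularity is undefined for non-positive alignment")
    return {"gamma": gamma, "lambda": lam, "alignment": overall,
            "granularity": gamma / overall}


# Stable concept: every question implies the same direction.
stable = granularity_stats({"q1": [[1.0, 0.0], [2.0, 0.0]],
                            "q2": [[1.0, 0.0], [3.0, 0.0]]})
# Rotating concept: framings agree within a question, but the direction
# turns by 45 degrees between questions.
rotating = granularity_stats({"q1": [[1.0, 0.0], [1.0, 0.0]],
                              "q2": [[1.0, 1.0], [1.0, 1.0]]})
```

In the toy data the stable concept has granularity close to 1, while the rotating concept scores above 1 because agreement within each question stays perfect as agreement across questions falls.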
Per-concept granularity values (across all three model families) are listed on the concept definitions page →
Granularity captures something structural about how a concept is encoded across contexts: it explains the ceiling of rank-1 steering and the cost of approaching that ceiling, but it isn't itself an optimization target. What we can improve are the fixable sources of heterogeneity that sit on top of this baseline, inflating apparent difficulty beyond what the concept's underlying geometry warrants. Three such patterns recur in our experiments, each affecting only a minority of concepts but doing real damage on the ones it touches. Each can be detected from the cached contrastive activations alone, before any steering trial is run, and each calls for a different construction-side remedy.
Each remedy on its own only helps the minority of concepts it targets, but the diagnostics compose. Combining the appropriate construction choice (mean / unit-mean / cluster) with geometry-constrained search and the fragmentation fallback yields a stronger rank-1 intervention on 50 of 60 (model, concept) pairs in our study, and never a worse one, compared to a baseline TPE search over standard PV vectors at all layers.
@misc{robertson2026rank1steeringcheapgeometry,
title = {When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search},
author = {John T. Robertson and Jianing Zhu and Haris Vikalo and Zhangyang Wang},
year = {2026},
eprint = {2605.16362},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2605.16362},
}