Hi, I'm John.

I’m a first-year ECE PhD student at the University of Texas at Austin, co-advised by Dr. Haris Vikalo and Dr. Atlas Wang.

My research focuses on interpretability and AI safety, with additional interests in healthcare and computational biology. Lately I’ve been working on activation engineering: lightweight interventions on model internals that make LLMs more reliable. An early version of this work appears in the updated preprint below.

If you would like to connect, please email me using the contact button above.

John Robertson

Recent Publications.

Granularity in submission

When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search

John T. Robertson, Jianing Zhu, Haris Vikalo, Zhangyang Wang
We formalize rank-1 activation steering as a budget-constrained search over intervention layer and coefficient, and introduce concept granularity: a measure of directional heterogeneity that predicts how hard a concept is to steer. The resulting GRACE workflow makes interventions on LLMs cheaper and more reliable. In submission, 2026.
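
As a rough illustration of the setup, and not the GRACE workflow itself, the sketch below adds a scaled unit direction to an activation vector (a rank-1 edit) and spends a fixed evaluation budget searching over (layer, coefficient) pairs. The function names, the toy scoring function, and the use of plain random search are all assumptions made for this example.

```python
import itertools
import random


def diff_of_means(pos, neg):
    """Unit-norm difference between the mean activations of two example sets."""
    dim = len(pos[0])
    mean_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    diff = [p - n for p, n in zip(mean_pos, mean_neg)]
    norm = sum(x * x for x in diff) ** 0.5
    return [x / norm for x in diff]


def steer(activation, direction, coeff):
    """Rank-1 steering: shift the activation by coeff along one direction."""
    return [a + coeff * d for a, d in zip(activation, direction)]


def budgeted_search(score_fn, layers, coeffs, budget, seed=0):
    """Score at most `budget` (layer, coeff) pairs and return the best one.

    Random search is a stand-in here; any search strategy could be swapped in.
    """
    rng = random.Random(seed)
    candidates = list(itertools.product(layers, coeffs))
    rng.shuffle(candidates)
    best = None
    for layer, coeff in candidates[:budget]:
        score = score_fn(layer, coeff)
        if best is None or score > best[0]:
            best = (score, layer, coeff)
    return best


def toy_score(layer, coeff):
    """Hypothetical steering score that peaks at layer 3, coefficient 2.0."""
    return -((layer - 3) ** 2) - (coeff - 2.0) ** 2


direction = diff_of_means([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
steered = steer([1.0, 1.0], direction, coeff=2.0)
best = budgeted_search(toy_score, layers=[1, 2, 3, 4], coeffs=[0.5, 1.0, 2.0, 4.0], budget=6)
```

In a real setting `score_fn` would run the steered model and measure concept strength and coherence, which is why the number of evaluations is the quantity being budgeted.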
NextVir PLOS Comp Bio 2025

NextVir: Enabling Classification of Tumor-Causing Viruses with Genomic Foundation Models

John T. Robertson, Shorya Consul, Haris Vikalo
We adapt genomic foundation models with LoRA fine-tuning for viral mixture separation, achieving state-of-the-art oncoviral DNA classification. PLOS Computational Biology, 2025.

Selected Experience.

Course Assistant, Probability and Random Processes

2026
University of Texas at Austin (Instructor: Dr. Vivek Telang)
  • Teaching assistant for Probability and Random Processes, taught at J. F. Oberlin University in Shinjuku, Japan.

Graduate Researcher

2025 to Present
University of Texas at Austin (Advised by Dr. Haris Vikalo & Dr. Atlas Wang)
  • Developing interpretable machine learning methods for AI safety, with a focus on activation engineering and applications in computational biology and healthcare.
  • Leading several early-stage projects on activation steering.

AI Research Intern

2024
Kilby Labs, Texas Instruments (Advised by Dr. Arthur Redfern)
  • Sole undergraduate intern; developed two patent-pending methods for efficient deep learning on edge devices.
  • TIedNet: a CNN architecture using shared weights and LoRA-like perturbations for memory-efficient image classification.
  • Conditional PTQ: a method for post-training static quantization that predicts optimal scales per sample.

Projects.

This section is under construction. Detailed project pages are coming soon.

TWIIRL: Token-Wise Interpretable Interventions via Reinforcement Learning

We reformulate activation steering as a token-level decision problem: a small GRU controller emits per-token coefficients on a fixed difference-of-means (diffmeans) direction and is trained via offline preference-based RL with an explicit KL trust region. On Gemma 2 9B, TWIIRL strictly dominates fixed-coefficient steering on the concept-coherence Pareto frontier at less than 0.01% per-token overhead.
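
A minimal sketch of the inference-time idea only: a single-unit GRU cell reads each token's projection onto the steering direction and emits a bounded coefficient for that token. The weights are untrained placeholders, and `ScalarGRU`, `steer_tokens`, and `max_coeff` are hypothetical names for this example; the controller's real inputs and the RL training loop are not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ScalarGRU:
    """Single-unit GRU cell. Each gate holds (input weight, hidden weight, bias)."""

    def __init__(self, wz=(0.5, 0.5, 0.0), wr=(0.5, 0.5, 0.0), wn=(1.0, 1.0, 0.0)):
        self.wz, self.wr, self.wn = wz, wr, wn  # untrained placeholder values

    def step(self, x, h):
        z = sigmoid(self.wz[0] * x + self.wz[1] * h + self.wz[2])  # update gate
        r = sigmoid(self.wr[0] * x + self.wr[1] * h + self.wr[2])  # reset gate
        n = math.tanh(self.wn[0] * x + self.wn[1] * (r * h) + self.wn[2])  # candidate
        return (1.0 - z) * n + z * h


def steer_tokens(activations, direction, controller, max_coeff=4.0):
    """Apply h_t' = h_t + alpha_t * v with a per-token alpha_t from the controller."""
    hidden = 0.0
    steered, coeffs = [], []
    for act in activations:
        # Feature fed to the controller: projection of the token onto the direction.
        proj = sum(a * d for a, d in zip(act, direction))
        hidden = controller.step(proj, hidden)
        coeff = max_coeff * sigmoid(hidden)  # bounded to (0, max_coeff)
        coeffs.append(coeff)
        steered.append([a + coeff * d for a, d in zip(act, direction)])
    return steered, coeffs


tokens = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.5]]
steered, coeffs = steer_tokens(tokens, direction=[1.0, 0.0], controller=ScalarGRU())
```

Fixed-coefficient steering is the special case where every entry of `coeffs` is the same constant, which is the baseline the per-token controller is compared against.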

DNA-ADLM: Anchored Diffusion for DNA Inpainting

We frame DNA inpainting as constrained generation under an Anchored Diffusion Language Model: observed anchor tokens are pinned while missing positions are iteratively resampled. The model is built on a masked discrete diffusion backbone (MDLM-style with a DiT denoiser) and pretrained on chromosome 11 with 5-mer tokenization.
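
The pin-and-resample loop can be sketched roughly as follows. This is a toy under loud assumptions: it works on single nucleotides rather than 5-mers, `toy_denoiser` proposes random bases in place of a trained model, and the linear re-masking schedule is invented for the example.

```python
import random

MASK = "?"
ALPHABET = "ACGT"


def toy_denoiser(seq, rng):
    """Stand-in for a trained denoiser: proposes a random base at every position."""
    return [rng.choice(ALPHABET) for _ in seq]


def anchored_inpaint(observed, steps=4, seed=0):
    """Fill MASK positions over several steps while keeping observed bases fixed."""
    rng = random.Random(seed)
    anchors = {i: b for i, b in enumerate(observed) if b != MASK}
    missing = [i for i, b in enumerate(observed) if b == MASK]
    seq = list(observed)
    for step in range(steps):
        proposal = toy_denoiser(seq, rng)
        # Resample every missing position from the denoiser's proposal.
        for i in missing:
            seq[i] = proposal[i]
        # Pin the anchors again so observed tokens can never drift.
        for i, base in anchors.items():
            seq[i] = base
        # Re-mask a shrinking share of the missing positions, except on the last step.
        if step < steps - 1:
            n_remask = len(missing) * (steps - 1 - step) // steps
            for i in rng.sample(missing, n_remask):
                seq[i] = MASK
    return "".join(seq)


filled = anchored_inpaint("AC??G?T")
```

Whatever the denoiser proposes, the anchor positions in `filled` match the input, which is the constraint that separates inpainting from unconditional generation.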

Audio Spectrogram Transformer + MIL: Interpretable ALS Severity Classification

We build an interpretable model for ALS severity classification from speech by combining an audio spectrogram transformer with multiple instance learning. The model won second place at the Speech Analysis for Neurodegenerative Diseases Grand Challenge. The paper was accepted to IEEE ICASSP 2026 (oral) but was not published because of attendance conflicts.
