Ongoing Work

Research

Investigating the internal mechanisms of neural networks and how architectural choices shape model behavior. Two active research threads, plus papers in submission and in draft.

Active Research

Mesa-Optimization Probe

Apr 2026 -- Present
PyTorch · TransformerLens · Mechanistic Interpretability · Activation Patching

Research Question

Do small transformers trained on in-context learning tasks implement gradient-descent-like optimization internally during their forward pass? If so, where in the network does this emerge, and what computational signatures distinguish a mesa-optimizer from a lookup table?

Approach

Training small transformers (1--4 layers) on synthetic in-context linear regression tasks, then using mechanistic interpretability tools to reverse-engineer the algorithms they learn. The goal is to identify whether the internal computation resembles an optimization process -- specifically gradient descent on an implicit loss -- rather than simple pattern matching or memorization.

Three-Phase Methodology

1. Train & Baseline

Train small transformers on in-context linear regression following von Oswald et al.'s experimental setup. Establish baseline performance against ordinary least squares and ridge regression. Vary model depth (1, 2, 4 layers) and context length.
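The Phase 1 setup can be sketched in a few lines: generate a synthetic in-context linear regression task and compute the ridge baseline the transformer is compared against. This is a minimal NumPy sketch; the dimensions, noise level, and regularization strength are illustrative assumptions, not the actual experimental configuration.

```python
import numpy as np

def make_icl_regression_task(n_context=16, dim=8, noise=0.1, rng=None):
    """One synthetic in-context task: context pairs (x_i, y_i) drawn from a
    random linear function w, plus a held-out query point.
    Hyperparameters here are illustrative, not the exact setup."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(size=dim)                   # ground-truth linear function
    X = rng.normal(size=(n_context, dim))      # context inputs
    y = X @ w + noise * rng.normal(size=n_context)
    x_query = rng.normal(size=dim)
    return X, y, x_query, float(x_query @ w)

def ridge_baseline(X, y, x_query, lam=1e-2):
    """Closed-form ridge prediction: one of the baselines the trained
    transformer is measured against (lam -> 0 approaches ordinary least
    squares when X has full column rank)."""
    d = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return float(x_query @ w_hat)
```

In the experiments, the transformer receives the same (X, y, x_query) serialized as a token sequence, and its per-task prediction error is compared against this closed-form baseline.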

2. Activation Analysis

Use TransformerLens to extract activations at every layer. Apply activation patching and causal tracing to identify which attention heads and MLPs are causally responsible for the in-context prediction. Compare activation trajectories against the weight updates that gradient descent would produce.
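The core patching operation can be illustrated without any interpretability library: cache an activation from a clean run, substitute it into a corrupted run, and measure how much of the clean output is restored. This is a toy NumPy sketch of the logic only; the actual experiments use TransformerLens hooks on attention heads and MLPs rather than this two-layer network.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))  # toy 2-layer net

def forward(x, patch_hidden=None):
    """Forward pass with an optional activation patch: if patch_hidden is
    given, it overwrites the hidden activation mid-run."""
    h = np.maximum(x @ W1, 0.0)   # the activation site we can cache or patch
    if patch_hidden is not None:
        h = patch_hidden
    return h @ W2

x_clean, x_corrupt = rng.normal(size=4), rng.normal(size=4)
h_clean = np.maximum(x_clean @ W1, 0.0)            # cached from the clean run
out_clean = forward(x_clean)
out_corrupt = forward(x_corrupt)
out_patched = forward(x_corrupt, patch_hidden=h_clean)
# Patching the only hidden layer fully restores the clean output in this toy;
# in a real transformer, partial restoration localizes the causal components.
```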

3. Mesa-Optimizer Classification

Develop quantitative criteria to distinguish mesa-optimizers from lookup tables. Test whether the model generalizes to out-of-distribution function classes (following Garg et al.'s OOD evaluation protocol), and run ablation studies to identify the minimal circuits required for the optimization behavior.
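One candidate criterion from this phase can be made concrete: compare the model's per-task predictions against those of explicit gradient descent on the in-context loss, evaluated on out-of-distribution tasks. The predictor and the agreement metric below are illustrative assumptions about how such a criterion might be scored, not the finalized methodology.

```python
import numpy as np

def gd_predictor(X, y, x_query, lr=0.1, steps=1):
    """Prediction after `steps` of gradient descent on the in-context
    squared loss, starting from w = 0 -- the hypothesized internal
    algorithm a mesa-optimizing transformer might implement."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y)) / len(y)  # gradient of mean squared loss / 2
    return float(x_query @ w)

def agreement_score(model_preds, gd_preds):
    """Mean squared distance between model and GD predictions; a lookup-table
    model should diverge from GD on OOD function classes, while a
    mesa-optimizer should track it."""
    a, b = np.asarray(model_preds), np.asarray(gd_preds)
    return float(np.mean((a - b) ** 2))
```

Sweeping `steps` also gives a depth probe: if a k-layer model's predictions match k-step GD better than (k-1)- or (k+1)-step GD, that is evidence for one optimization step per layer.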

Key References

  • von Oswald, J. et al. (2023). "Transformers Learn In-Context by Gradient Descent." ICML.
  • Garg, S. et al. (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes." NeurIPS.
  • Hubinger, E. et al. (2019). "Risks from Learned Optimization in Advanced Machine Learning Systems." arXiv:1906.01820.

Encoder Archaeology

Apr 2026 -- Present
BEIR · Mamba · Dense Retrieval · RWKV · Hyena

Research Question

Are bidirectional transformer encoders truly optimal for dense retrieval, or can alternative architectures (SSMs, recurrent models, long-convolution models) achieve comparable quality at lower latency? What does the latency-accuracy Pareto frontier actually look like?

Encoder Architectures Under Study

  • BERT -- Transformer
  • Mamba -- State-Space
  • RWKV -- Linear RNN
  • Hyena -- Long Convolution
  • RetNet -- Retention
  • GLA -- Gated Linear Attention

Three Analyses

Error Analysis

Per-query NDCG@10 breakdown across BEIR datasets. Identifying query categories where alternative encoders systematically outperform or underperform BERT. Categorizing failure modes by query length, domain, and complexity.
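The per-query metric driving this breakdown is NDCG@10. A self-contained sketch of the standard computation (log2 discount, graded relevance labels), whose per-query scores can then be bucketed by category:

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    """NDCG@k for a single query. `relevances` are the graded relevance
    labels of the returned documents, in ranked order (standard log2
    position discount)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))  # ranks 1..k
    dcg = float(np.sum(rel * discounts))
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts))
    return dcg / idcg if idcg > 0 else 0.0
```

Averaging these scores within buckets (query length, domain, complexity) is what surfaces the categories where an alternative encoder systematically beats or trails BERT.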

Pruning Sensitivity

Structured pruning at 30%, 50%, 70% sparsity levels. Measuring how retrieval quality degrades across architectures under compression. Testing whether SSMs are more robust to pruning than transformers due to their recurrent structure.
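The compression step can be illustrated with the simplest variant: magnitude pruning at a target sparsity. This is a hedged sketch -- the study uses structured pruning, which removes whole heads or channels rather than individual weights -- but the sensitivity measurement is the same: prune, re-run retrieval, record the quality drop.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of a weight array
    (unstructured variant; structured pruning would instead drop whole
    rows/columns or attention heads)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# Sweep the study's sparsity levels on a toy weight matrix:
W = np.random.default_rng(0).normal(size=(8, 8))
sparsity_sweep = {s: magnitude_prune(W, s) for s in (0.3, 0.5, 0.7)}
```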

Position Sensitivity

Evaluating how each architecture handles positional information in passages. Testing retrieval quality with shuffled passages, truncated contexts, and varying document lengths. Probing whether position-free architectures (like SSMs) produce more robust embeddings for long documents.
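The shuffled-passage probe reduces to a simple measurement: embed a passage and a word-shuffled copy, then take the cosine similarity. The toy order-invariant embedding below is a deliberately extreme stand-in (it scores exactly 1.0) to show the mechanics; the study runs the same probe with each real encoder in place of `bag_embed`.

```python
import numpy as np

def bag_embed(text, dim=64):
    """Toy order-invariant embedding: normalized bag of hashed tokens.
    The real encoders under study replace this function."""
    v = np.zeros(dim)
    for tok in text.split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def shuffle_sensitivity(embed, text, rng=None):
    """Cosine similarity between a passage and its word-shuffled copy;
    1.0 means the encoder ignores word order entirely."""
    rng = np.random.default_rng(0) if rng is None else rng
    shuffled = " ".join(rng.permutation(text.split()))
    a, b = embed(text), embed(shuffled)
    return float(a @ b)
```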


Papers

TITAN: Trustworthy Identity Tokenization and Attestation Network

Submitted

Submitted to SECRYPT 2026 (International Conference on Security and Cryptography)

End-to-end identity verification pipeline combining AI-powered document forensics, zero-knowledge proofs for private on-chain attestation, and GNN-based transaction graph analysis for anti-money laundering. The system enables KYC compliance without exposing raw identity documents to verifiers.

ZK Proofs · Circom · GNN · KYC/AML

Emergent Features in Large Language Models

Working Draft

Investigating feature emergence patterns in LLMs across scale. Analyzing how capabilities appear discontinuously as models grow, and what mechanistic signatures precede emergent behavior. Connecting findings from the mesa-optimization probe to broader questions about what large models learn to compute.

LLMs · Emergence · Interpretability