Anthropic Maps Claude's Mind With Interpretability
Anthropic researchers use mechanistic interpretability to extract millions of interpretable features from Claude, reveal…
1 article about 'Sparse Autoencoders'