Anthropic
AI safety company. Produces a substantial fraction of the published empirical work on LLM bias, sycophancy, calibration, and alignment that is directly relevant to this wiki, including the Sharma sycophancy paper, the Perez model-written evaluations paper, the Kadavath introspection work, and the Durmus global opinions paper.
The methodological approach across Anthropic’s publications is consistent: large-scale empirical measurement of behaviors, with explicit attention to scaling effects (whether a behavior gets better or worse as model capability increases) and to the role of RLHF in shaping behavior.