Why it matters
Too much of the market still treats evaluation, interpretability and reliability as secondary concerns. That works until teams need to confidently choose a model, mitigate hallucinations or explain how an AI decision was reached.
Our approach
We tackle these challenges from the ground up. Using a first-principles approach to evaluation, we build methods that rigorously assess AI systems at every stage: input, output and internal decision-making.
p-less sampling: A robust LLM decoding strategy
Published: December 15, 2025
The next frontiers in AI — according to industry leaders
Published: June 04, 2025
More from the labs
Steering smarter
Concept consistency score
p-less sampling: A robust hyperparameter-free approach for LLM decoding
p-less sampling: A robust LLM decoding strategy
Evaluating LLM-generated summaries using the Lie algebra framework
Beyond I am sorry, I can’t: dissecting large language model refusal
Distribution-aware feature selection for SAEs
Towards transparent AI grading: Entropy as a signal for human-AI disagreement
The next frontiers in AI — according to industry leaders
Beyond linear steering: Unified multi-attribute control for language models
Calculating uncertainty in generative AI
TinySQL
Evaluating LLMs using semantic entropy
LLM benchmarks, evals and tests
Turning up the heat: Min-p sampling for creative and coherent outputs
Decoding LLM uncertainties for better predictability
A surprisingly effective way to estimate token importance in LLM prompts
Probabilistic machine learning and weak supervision
A gentle introduction to machine teaching
Partners and collaborations
Thoughtworks AI labs sit within a wider network of organizations spanning public AI research, semiconductor innovation, cloud platforms, open source and AI engineering.
These relationships strengthen the lab’s ability to contribute to the methods, tools and technical standards shaping reliable AI.
For partnership and collaboration inquiries, email ai-labs@thoughtworks.com