Advancing AI reliability

Research for trusted, scalable AI deployments

Our AI research team advances model interpretability, control and robustness. Focusing on rigorous evaluation, interpretability and decision-making, we deliver insights through open-source solutions and peer-reviewed publications.

Why it matters

Too much of the market still treats evaluation, interpretability and reliability as secondary concerns. That works until teams need to confidently choose a model, mitigate hallucinations or explain how an AI decision was reached.

Our approach

We tackle these challenges from the ground up. Using a first-principles approach to evaluation, we build methods that rigorously assess AI systems at every stage: input, output and internal decision-making.

p-less sampling: A robust LLM decoding strategy

By Phillip Howard, Parag Mahajani, Runyan Tan and Shuang Wu

Published: December 15, 2025

p-less sampling improves autoregressive text generation by offering a new alternative to existing sampling techniques for LLMs.

The next frontiers in AI — according to industry leaders

By Parag Mahajani

Published: June 04, 2025

AI is evolving fast. Keynotes and events reveal the most important emerging trends and narratives shaping generative AI today.

Evaluating LLM-generated summaries using the Lie algebra framework

By Parag Mahajani and Manikandan Ravikiran

Published: October 10, 2025

We model summarization as a geometric flow, using Lie algebra to detect incompleteness via source-text contribution vectors.

Calculating uncertainty in generative AI

By Parag Mahajani and Runyan Tan

Published: May 06, 2025

Applied at inference time, dropout gives a cost-effective estimate of model uncertainty, measured as prediction variance or softmax entropy. During training, it also improves generalization by reducing overfitting.

More from the labs

Anti-slopping — An innovation for rectifying LLM writing clichés

Steering smarter

Concept consistency score

p-less sampling: A robust hyperparameter-free approach for LLM decoding

p-less sampling: A robust LLM decoding strategy

Evaluating LLM-generated summaries using the Lie algebra framework

Beyond I am sorry, I can’t: dissecting large language model refusal

Distribution-aware feature selection for SAEs

Towards transparent AI grading: Entropy as a signal for human-AI disagreement

The next frontiers in AI — according to industry leaders

Beyond linear steering: Unified multi-attribute control for language models

Calculating uncertainty in generative AI

TinySQL

Evaluating LLMs using semantic entropy

LLM benchmarks, evals and tests

Turning up the heat: Min-p sampling for creative and coherent LLM outputs

Decoding LLM uncertainties for better predictability

A surprisingly effective way to estimate token importance in LLM prompts

Probabilistic machine learning and weak supervision

A gentle introduction to machine teaching

Partners and collaborations