Ranjay Krishna

I teach machines to see like people and interact with people. As modern machines struggle to fully conceptualize the visual world, my research bootstraps machine learning using frameworks from behavioral and social sciences.

Bio: Ranjay Krishna is an Assistant Professor at the Allen School of Computer Science & Engineering. He co-directs the RAIVN lab at UW and is a member of technical staff at Microsoft SuperIntelligence. Prior to this, he directed the multimodal and embodied AI team at the Allen Institute. His work has been recognized with an NSF CAREER Award '26 and as one of MIT Technology Review's 35 under 35 Asia Pacific '25. He holds a bachelor's degree in Electrical & Computer Engineering and in Computer Science from Cornell University, a master's degree in Computer Science from Stanford University, and a Ph.D. in Computer Science from Stanford University.

His research lies at the intersection of computer vision, natural language processing, robotics, and human-computer interaction. This research has received best paper honorable mentions at CVPR'25 and CSCW'23, a best paper award nomination at CVPR'26, outstanding paper awards at NeurIPS'21 and ACL'21, and dozens of orals at CVPR, ACL, CSCW, NeurIPS, UIST, ICCV and ECCV, and has been reported by Science, Forbes, the Wall Street Journal, and PBS NOVA. His research has been supported by Google, Apple, Ai2, Amazon, SRC, Sony, Samsung, Cisco, Toyota Motor Inc, Toyota Research Institute, NSF, ONR, and Yahoo.

RECENT PAPER HIGHLIGHTS

[Jul 2026] Our DiScoFormer paper received a Spotlight award at ICML 2026, awarded to top 2.2% of submissions.
[Jun 2026] Our Molmo2 paper was nominated for the Best Paper award at CVPR 2026, awarded to top 74 submissions.
[Jun 2026] Our Agile Deliberation paper received a Highlight award at CVPR 2026, awarded to top 5% of submissions.
[Jun 2026] Our VideoNet paper received a Highlight award at CVPR 2026, awarded to top 5% of submissions.
[Oct 2025] Our Latte paper received an Oral award at EMNLP 2025, awarded to top 1.5% of submissions.
[Oct 2025] Our TrajVIT paper received a Highlight award at ICCV 2025, awarded to top 5% of submissions.
[Jun 2025] Our Molmo paper received Best Paper Honorable Mention at CVPR 2025.
[Jun 2025] Our Molmo paper will appear as an Oral at CVPR 2025, awarded to top 0.7% of submissions.
[Apr 2025] Our interleaved scene graph paper will appear as a Spotlight at ICLR 2025, awarded to top 5% of submissions.

RECENT TALKS

[Jul 2026] Keynote talk at WWW 2026 workshop on Agents for Recommendations & Online Marketplaces
[Jun 2026] Public talk at Technology Alliance Discovery Series titled "What does it mean for a machine to understand the world through sight?"
[Jun 2026] Keynote talk at CVPR'26 workshop on Multi-Modal Reasoning for AI Agents titled "It is Time to Rethink Grounding"
[Jun 2026] Keynote talk at CVPR'26 workshop on Multimodal Learning and Applications titled "One Model Enabling Robotics, Computer Use and Motion Modeling"
[Jun 2026] Keynote talk at CVPR'26 workshop on Visual Concepts titled "Why does it still suck? What is going on inside a vision-language model?"
[Jun 2026] Keynote talk at CVPR'26 workshop on Unified Robotic Vision with Cross-Modal Sensing and Alignment titled "Reasoning models for Robotics"
[Jun 2026] Keynote talk at CVPR'26 workshop on Embodied Reasoning in Action titled "Making Open Robotics Reasoning Models"
[Jun 2026] Keynote talk at CVPR'26 workshop on Evaluation of Generative Foundation Model titled "Updating generative evaluations to meet modern demands"
[Jun 2026] Keynote talk at CVPR'26 workshop on 3D-LLM/VLA titled "Zero-shot Sim2Real is possible for Manipulation (Introducing MolmoSpaces & MolmoBot)"
[Jun 2026] Keynote talk at CVPR'26 workshop on Test-time Scaling for Computer Vision titled "Visual Reasoning by Simulating Future Steps"
[Jun 2026] Keynote talk at CVPR'26 workshop on Transformers for Vision titled "Architectures for Grounded Visual Reasoning (Introducing MolmoPoint & MolmoMotion)"
[Jun 2026] Keynote at CVPR'26 workshop on Emerging Directions in Data for Multimodal Foundation Models titled "Truly Open, Reproducible Video-Language Models"
[Jun 2026] Keynote talk at CVPR'26 workshop on AI for Visual Arts titled "User Controllability through Models, Interfaces Interventions"
[Jun 2026] Keynote at CVPR'26 workshop on Multimodal Spatial Intelligence titled "Visual Reasoning is not working and we don’t know why"
[Jun 2026] Keynote at CVPR'26 workshop on Agentic AI for Visual Media titled "From Vision to Action: Extracting Structure and Agency from Flat Pixels"
[Jun 2026] Keynote at CVPR'26 workshop on Multimodal Alignment for a Pluralistic Society titled "Multilingual Pluralism through Dataset Interventions"

ACADEMIC PUBLICATIONS

MolmoSpaces: Large-Scale Open Ecosystem for Robot Manipulation and Navigation
Yejin Kim*, Wilbert Pumacay*, Omar Rayyan*, Max Argus*, Winson Han, Eli VanderBilt, Jordi Salvador, Abhay Deshpande, Rose Hendrix, Snehal Jauhri, Shuo Liu, Nur Muhammad Mahi Shafiullah, Maya Guru, Ainaz Eftekhar, Karen Farley, Donovan Clay, Jiafei Duan, Arjun Guru, Piper Wolters, Alvaro Herrasti, Ying-Chun Lee, Georgia Chalvatzaki, Yuchen Cui, Ali Farhadi, Dieter Fox, Ranjay Krishna
RSS 2026
[pdf] [blog] [demo] [benchmark] [datasets] [code]

DiScoFormer: Plug-In Density and Score Estimation with Transformers
Vasily Ilin, Petr Sushko, Ranjay Krishna
ICML 2026 [ICML Spotlight awarded to top 2.2% of submissions]
[pdf]

Spurious Rewards: Rethinking Training Signals in RLVR
Rulin Shao, Shuyue Stella Li, Rui Xin, Scott Geng, Yiping Wang, Sewoong Oh, Simon Shaolei Du, Nathan Lambert, Sewon Min, Ranjay Krishna, Yulia Tsvetkov, Hannaneh Hajishirzi, Pang Wei Koh, Luke Zettlemoyer
ICML 2026
[pdf] [code]

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
Christopher Clark, Jieyu Zhang, Zixian Ma, Jae Sung Park, Rohun Tripathi, Sangho Lee, Mohammadreza Salehi, Jason Ren, Chris Dongjoo Kim, Yinuo Yang, Vincent Shao, Yue Yang, Weikai Huang, Ziqi Gao, Taira Anderson, Jianrui Zhang, Jitesh Jain, George Stoica, Ali Farhadi, Ranjay Krishna
CVPR 2026 [CVPR Oral awarded to top 0.7% of submissions]
[pdf] [blog] [video] [datasets] [code] [models]

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
Tanush Yadav, Mohammadreza Salehi, Jae Sung Park, Vivek Ramanujan, Hannaneh Hajishirzi, Yejin Choi, Ali Farhadi, Rohun Tripathi, Ranjay Krishna
CVPR 2026 [CVPR Highlight awarded to top 5% of submissions]
[pdf] [website]

Agile Deliberation: Concept Deliberation for Subjective Visual Classification
Leijie Wang, Otilia Stretcu, Wei Qiao, Thomas Denby, Krishnamurthy Viswanathan, Enming Luo, Chun-Ta Lu, Tushar Dogra, Ranjay Krishna, Ariel Fuxman
CVPR 2026 [CVPR Highlight awarded to top 5% of submissions]
[pdf]

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
Xia Su, Ruiqi Chen, Benlin Liu, Jingwei Ma, Zonglin Di, Ranjay Krishna, Jon E. Froehlich
CVPR 2026
[pdf] [code]

OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation
Henry Herzog, Favyen Bastani, Yawen Zhang, Gabriel Tseng, Joseph Redmon, Hadrien Sablon, Ryan Park, Jacob Morrison, Alexandra Buraczynski, Karen Farley, Joshua Hansen, Andrew Howe, Patrick Alan Johnson, Mark Otterlee, Ted Schmitt, Hunter Pitelka, Stephen Daspit, Rachel Ratner, Christopher Wilhelm, Sebastian Wood, Mike Jacobi, Hannah Kerner, Evan Shelhamer, Ali Farhadi, Ranjay Krishna, Patrick Beukema
CVPR 2026
[pdf] [website] [blog] [code] [models]

Synthetic Object Compositions for Scalable and Accurate Learning in Detection, Segmentation, and Grounding
Weikai Huang, Jieyu Zhang, Taoyang Jia, Chenhao Zheng, Ziqi Gao, Jae Sung Park, Ranjay Krishna
CVPR 2026
[pdf] [code] [datasets]

TrajTok: Learning Trajectory Tokens enables better Video Understanding
Chenhao Zheng, Jieyu Zhang, Jianing Zhang, Weikai Huang, Ashutosh Kumar, Quan Kong, Oncel Tuzel, Chun-Liang Li, Ranjay Krishna
CVPR 2026
[pdf]

Mull-Tokens: Modality-Agnostic Latent Thinking
Arijit Ray, Ahmed Abdelkader, Chengzhi Mao, Bryan A. Plummer, Kate Saenko, Ranjay Krishna, Leonidas Guibas, Wen-Sheng Chu
CVPR 2026 Findings
[pdf] [website] [models] [code]

MolmoAct: Action Reasoning Models that can Reason in Space
Jason Lee*, Jiafei Duan*, Haoquan Fang*, Yuquan Deng, Shuo Liu, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang, Sangho Lee, Winson Han, Wilbert Pumacay, Angelica Wu, Rose Hendrix, Karen Farley, Eli VanderBilt, Ali Farhadi, Dieter Fox, Ranjay Krishna
ICRA 2026
[pdf] [website] [video] [data] [code]

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
Yi Ru Wang, Carter Ung, Christopher Tan, Grant Tannert, Jiafei Duan, Josephine Li, Anh Le, Rishabh Oswal, Markus Grotz, Wilbert Pumacay, Yuquan Deng, Ranjay Krishna, Dieter Fox, Siddhartha Srinivasa
ICRA 2026
[pdf] [website] [code]

The One RING: a Robotic Indoor Navigation Generalist
Ainaz Eftekhar, Rose Hendrix, Luca Weihs, Jiafei Duan, Ege Caglar, Jordi Salvador, Alvaro Herrasti, Winson Han, Eli VanderBilt, Aniruddha Kembhavi, Ali Farhadi, Ranjay Krishna, Kiana Ehsani*, Kuo-Hao Zeng*
ICRA 2026
[pdf] [website]

Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
Linjie Li, Mahtab Bigverdi, Jiawei Gu, Zixian Ma, Yinuo Yang, Ziang Li, Yejin Choi, Ranjay Krishna
ICLR 2026
[pdf]

MindCube: Spatial Mental Modeling Capability from Limited Views
Qineng Wang, Baiqiao Yin, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Manling Li, Jiajun Wu, Li Fei-Fei
ICLR 2026
[pdf]

Theory of Space: Can Foundation Models Construct Spatial Beliefs Through Active Perception?
Pingyue Zhang, Zihan Huang, Yue Wang, Jieyu Zhang, Letian Xue, Zihan Wang, Qineng Wang, Keshigeyan Chandrasegaran, Ruohan Zhang, Yejin Choi, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Manling Li
ICLR 2026
[website] [pdf]

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
Mingyang Song, Haoyu Sun, Jiawei Gu, Linjie Li, Ranjay Krishna, Yu Cheng
ICLR 2026
[pdf]

Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training
Ziqi Gao, Weikai Huang, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna
ICLR 2026
[pdf]

TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models
Huang et al.
ICLR 2026
[pdf]

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Jiawei Gu, Yunzhuo Hao, Huichen Will Wang, Linjie Li, Michael Qizhe Shieh, Yejin Choi, Ranjay Krishna, Yu Cheng
ICLR 2026
[pdf]

Towards Acyclic Preference Evaluation of Language Models via Multiple Evaluators
Zhengyu Hu, Jieyu Zhang, Zhihan Xiong, Alexander Ratner, Kaize Ding, Ranjay Krishna
AAAI 2026
[pdf]

Scale Can’t Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning
Amita Kamath, Jack Hessel, Jena Hwang, Kai-Wei Chang, Ranjay Krishna
TACL 2026
[pdf]

RDoFlow: Automatically assessing under-specified statistical analyses in HCI
Madeleine Grunde-McLaughlin, Weixuan Liu, Ria Patil, Nino Migineishvili, Emily Reif, Ranjay Krishna, Daniel S. Weld, Jeffrey Heer
IUI 2026
[pdf]

2025

Convergent Functions, Divergent Forms
Hyeonseong Jeon, Ainaz Eftekhar, Aaron Walsman, Kuo-Hao Zeng, Ali Farhadi, Ranjay Krishna
NeurIPS 2025
[pdf] [website] [code]

Reinforcing Visual State Reasoning for Multi-Turn VLM Agents
Kangrui Wang, Pingyue Zhang, Zihan Wang, Yaning Gao, Linjie Li, Qineng Wang, Hanyang Chen, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li
NeurIPS 2025
[pdf] [website] [docs] [code]

Seeking and Updating with Live Visual Knowledge
Mingyang Fu, Yuyang Peng, Dongping Chen, Zetong Zhou, Benlin Liu, Yao Wan, Zhou Zhao, Philip S. Yu, Ranjay Krishna
NeurIPS 2025
[pdf] [website] [code] [data]

MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives
Wisdom O. Ikezogwo, Kevin Zhang, Mehmet Saygin Seyfioglu, Fatemeh Ghezloo, Linda Shapiro, Ranjay Krishna
NeurIPS 2025
[pdf] [website]

LATTE: Learning to Think with Vision Specialists
Zixian Ma, Jianguo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Caiming Xiong, Ranjay Krishna, Silvio Savarese
EMNLP 2025 [EMNLP Oral awarded to top 1.5% of submissions]
[pdf]

Wait, Do We Really Need to "Wait"? Towards Training-Free Efficient Reasoning in R1-style Models
Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou
EMNLP 2025
[pdf]

GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
Abhay Deshpande, Yuquan Deng, Arijit Ray, Jordi Salvador, Winson Han, Jiafei Duan, Kuo-Hao Zeng, Yuke Zhu, Ranjay Krishna, Rose Hendrix
CoRL 2025
[pdf]

ManiFlow: A Dexterous Manipulation Policy using Flow Matching
Ge Yan, Jiyue Zhu, Yuquan Deng, Shiqi Yang, Ri-Zhao Qiu, Xuxin Cheng, Marius Memmel, Ranjay Krishna, Ankit Goyal, Xiaolong Wang, Dieter Fox
CoRL 2025
[pdf] [website]

MultiRef: Controllable Image Generation with Multiple Visual References
Ruoxi Chen, Dongping Chen, Siyuan Wu, Sinan Wang, Shiyun Lang, Petr Sushko, Gaoyang Jiang, Yao Wan, Ranjay Krishna
ACM MM 2025
[pdf] [website] [data] [benchmark]

Visual Representations inside the Language Model
Benlin Liu, Amita Kamath, Madeleine Grunde-McLaughlin, Winson Han, Ranjay Krishna
CoLM 2025
[pdf]

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains
Scott Geng, Hamish Ivison, Chun-Liang Li, Maarten Sap, Jerry Li, Ranjay Krishna, Pang Wei Koh
CoLM 2025
[pdf]

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
Arijit Ray, Jiafei Duan, Ellis L Brown II, Reuben Tan, Dina Bashkirova, Rose Hendrix, Kiana Ehsani, Aniruddha Kembhavi, Bryan A. Plummer, Ranjay Krishna, Kuo-Hao Zeng, Kate Saenko
CoLM 2025
[pdf]

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi, Ziqi Gao, Vishnu Iyengar, Norimasa Kobori, Quan Kong, Ranjay Krishna
ICCV 2025 [ICCV Highlight awarded to top 5% of submissions]
[pdf]

Contrastive Flow Matching
George Stoica, Vivek Ramanujan*, Xiang Fan*, Ranjay Krishna, Judy Hoffman
ICCV 2025
[pdf]

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
Mehmet Saygin Seyfioglu*, Fatemeh Ghezloo*, Rustin Soraki*, Wisdom O. Ikezogwo*, Beibin Li*, Tejoram Vivekanandan, Joann G. Elmore, Ranjay Krishna, Linda Shapiro
ICCV 2025
[pdf] [website]

CoSyn: Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Yue Yang*, Ajay Patel*, Matt Deitke, Tanmay Gupta, Luca Weihs, Andrew Head, Mark Yatskar, Chris Callison-Burch, Ranjay Krishna, Aniruddha Kembhavi, Christopher Clark
ACL 2025
[pdf] [data] [code] [website]

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox, Ranjay Krishna, Jiafei Duan
ICML 2025
[pdf] [benchmark] [code] [website] [mentioned in AI Index]

Unsettling the Hegemony of Intention: Agonistic Image Generation
Andre Ye, Andrew Shaw, Ranjay Krishna, Amy Zhang
FAccT 2025
[pdf]

Improving Interpersonal Communication by Simulating Audiences with Language Models
Ryan Liu, Howard Yen, Raja Marjieh, Thomas L. Griffiths, Ranjay Krishna
CogSci 2025
[pdf] [code]

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh, Ethan Shen, Dongping Chen, Linda G. Shapiro, Ranjay Krishna
CVPR 2025
[pdf] [website]

Synthetic Visual Genome
Jae Sung Park, Zixian Ma, Linjie Li, Chenhao Zheng, Cheng-Yu Hsieh, Ximing Lu, Khyathi Chandu, Quan Kong, Norimasa Kobori, Ali Farhadi, Yejin Choi, Ranjay Krishna
CVPR 2025
[pdf] [website] [dataset] [model] [code]

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Shivam Duggal, Yushi Hu, Oscar Michel, Aniruddha Kembhavi, William T. Freeman, Noah A. Smith, Ranjay Krishna, Antonio Torralba, Ali Farhadi, Wei-Chiu Ma
CVPR 2025
[pdf]

RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Petr Sushko, Ayana Bharadwaj, Zhi Yang Lim, Vasily Ilin, Ben Caffee, Dongping Chen, Mohammadreza Salehi, Cheng-Yu Hsieh, Ranjay Krishna
CVPR 2025
[pdf]

NVILA: Efficient Frontier Visual Language Models
Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Haotian Tang, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Jinyi Hu, Sifei Liu, Ranjay Krishna, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu
CVPR 2025
[pdf] [website] [code] [demo]

One Diffusion to Generate Them All
Duong H. Le, Tuan Pham, Sangho Lee, Christopher Clark, Aniruddha Kembhavi, Stephan Mandt, Ranjay Krishna, Jiasen Lu
CVPR 2025
[pdf] [code]

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yiqin Wang, Yuhao Dong, Yongming Rao, Yansong Tang, Wei-Chiu Ma, Ranjay Krishna
CVPR 2025
[pdf] [website]

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Ai2 + UW
CVPR 2025 [CVPR Oral awarded to top 0.7% of submissions]
[pdf] [live demo]

Semantic and Expressive Variations in Image Captions Across Languages
Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna
CVPR 2025
[pdf]

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment
Dongping Chen, Ruoxi Chen, Shu Pu, Zhaoyi Liu, Yanru Wu, Caixi Chen, Benlin Liu, Yue Huang, Yao Wan, Pan Zhou, Ranjay Krishna
ICLR 2025 [ICLR Spotlight awarded to top 5% of submissions]
[pdf] [website] [code]

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
Jiafei Duan, Wilbert Pumacay, Nishanth Kumar, Yi Ru Wang, Shulin Tian, Wentao Yuan, Ranjay Krishna, Dieter Fox, Ajay Mandlekar, Yijie Guo
ICLR 2025
[pdf] [website]

Self-Enhancing Video Data Management System for Compositional Events with Large Language Models
Enhao Zhang, Nicole Sullivan, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska
SIGMOD 2025
[pdf]

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback
Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian
NAACL 2025
[pdf]

Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
Madeleine Grunde-McLaughlin, Michelle S. Lam, Ranjay Krishna, Daniel S. Weld, Jeffrey Heer
TOCHI 2025
[pdf]

2024

Task Me Anything
Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna
NeurIPS 2024
[pdf] [website] [UI] [code]

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li, Zhiqiu Lin, Wenxuan Peng, Jean de Dieu Nyandwi, Daniel Jiang, Zixian Ma, Simran Khanuja, Ranjay Krishna, Graham Neubig, Deva Ramanan
NeurIPS 2024
[pdf] [website]

ActionAtlas: A VideoQA Benchmark for Fine-grained Action Recognition
Mohammadreza Salehi, Jae Sung Park, Aditya Kusupati, Ranjay Krishna, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi
NeurIPS 2024
[pdf]

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu*, Weijia Shi*, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith*, Ranjay Krishna*
NeurIPS 2024
[pdf] [website] [code]

Multilingual Diversity Improves Vision-Language Representations
Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna
NeurIPS 2024 [NeurIPS Spotlight awarded to top 5% of submissions]
[pdf]

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh*, Ranjay Krishna*
NeurIPS 2024
[pdf]

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Ethan Shen, Alan Fan, Sarah M Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati
NeurIPS 2024
[pdf]

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James R. Glass
EMNLP 2024
[pdf] [blog 1] [blog 2] [video]

Is C4 Dataset Enough for Pruning? An Investigation of Calibration Data for LLM Pruning
Abhinav Bandari, Lu Yin, Cheng-Yu Hsieh, Ajay Kumar Jaiswal, Tianlong Chen, Li Shen, Ranjay Krishna, Shiwei Liu
EMNLP 2024
[pdf]

ImageInWords: Unlocking Hyper-Detailed Image Descriptions
Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut
EMNLP 2024
[pdf] [website] [code]

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna
CoRL 2024
[pdf] [website]

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox
CoRL 2024
[pdf]

I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
Zihan Wang, Brian Liang, Varad Dhat, Nick Walker, Zander Brumbaugh, Ranjay Krishna, Maya Cakmak
CoRL 2024
[pdf]

EVE: Enabling Anyone to Train Robots using Augmented Reality
Jun Wang, Chun-Cheng Chang, Jiafei Duan, Dieter Fox, Ranjay Krishna
UIST 2024
[pdf]

BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna
ECCV 2024
[pdf] [website] [code] [dataset] [eval]

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
Xiang Fan, Anand Bhattad, Ranjay Krishna
ECCV 2024
[pdf] [website] [code]

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna
ECCV 2024
[pdf] [huggingface] [code]

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville
ECCV 2024
[pdf] [code]

The Hard Positive Truth about Vision-Language Compositionality
Amita Kamath, Cheng-Yu Hsieh, Kai-Wei Chang, Ranjay Krishna
ECCV 2024
[pdf]

Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Jiwen Lu, Ranjay Krishna, Yongming Rao
ECCV 2024
[pdf] [code]

Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long Le, Abhishek Kumar, James R. Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna*, Tomas Pfister*
ACL Findings 2024
[pdf]

The Colosseum: A Benchmark for Evaluating Generalization for Robotic Manipulation
Wilbert Pumacay*, Ishika Singh*, Jiafei Duan*, Ranjay Krishna, Jesse Thomason, Dieter Fox
RSS 2024
[pdf] [project] [code] [website]

Training Language Model Agents without Modifying Language Models
Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, Qingyun Wu
ICML 2024
[pdf] [code] [blog]

Iterated Learning Improves Compositionality in Large Vision-Language Models
Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna
CVPR 2024
[pdf] [code] [website] [video]

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig
CVPR 2024
[pdf]

Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark
CVPR 2024
[pdf] [code] [website]

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu, Wisdom O. Ikezogwo, Fatemeh Ghezloo, Ranjay Krishna, Linda Shapiro
CVPR 2024
[pdf] [code] [website] [data]

Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi
CVPR 2024
[pdf] [code and data] [website]

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman
CVPR 2024 [CVPR Oral awarded to top 0.7% of submissions]
[pdf] [website]

Selective Visual Representations Improve Convergence and Generalization for Embodied AI
Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna
ICLR 2024 [ICLR Spotlight awarded to top 5% of submissions]
[pdf] [code] [website] [slides]

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation
Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
ICLR 2024
[pdf] [website] [code]

VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building
Maureen Daum, Enhao Zhang, Dong He, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska
VLDB 2024
[pdf] [code]

2023

OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel, Anand Bhattad, Eli VanderBilt, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta
NeurIPS 2023
[pdf] [website] [code] [dataset]

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
Cheng-Yu Hsieh, Jieyu Zhang, Zixian Ma, Aniruddha Kembhavi, Ranjay Krishna
NeurIPS 2023
[pdf] [code]

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang
NeurIPS 2023
[pdf] [code]

Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom Oluchi Ikezogwo, Mehmet Saygin Seyfioglu, Fatemeh Ghezloo, Dylan Stefan Chan Geva, Fatwir Sheikh Mohammed, Pavan Kumar Anand, Ranjay Krishna, Linda Shapiro
NeurIPS 2023 [NeurIPS Oral awarded to 0.6% of submissions]
[pdf] [code]

Cola: How to adapt vision-language models to Compose Objects Localized with Attributes?
Arijit Ray, Filip Radenovic, Abhimanyu Dubey, Bryan A. Plummer, Ranjay Krishna, Kate Saenko
NeurIPS 2023
[pdf] [project] [data]

DataComp: In search of the next generation of multimodal datasets
Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
NeurIPS 2023 [NeurIPS Oral awarded to 0.6% of submissions]
[pdf] [website] [code]

AR2-D2: Training a Robot Without a Robot
Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna
CoRL 2023
[pdf]

Agile Modeling: From Concept to Classifier in Minutes
Otilia Stretcu, Edward Vendrow, Kenji Hata, Krishnamurthy Viswanathan, Vittorio Ferrari, Sasan Tavakkol, Wenlei Zhou, Aditya Avinash, Enming Luo, Neil Gordon Alldrin, MohammadHossein Bateni, Gabriel Berger, Andrew Bunner, Chun-Ta Lu, Javier A Rey, Giulia DeSalvo, Ranjay Krishna, Ariel Fuxman
ICCV 2023
Also published at NeurIPS 2023 ReALML workshop [Best paper nominee]
[pdf]

TIFA: Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah Smith
ICCV 2023
[pdf] [website] [code]

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alex Jason Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
ACL 2023 Findings
[pdf] [code] [video] [blog]

EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions
Enhao Zhang, Maureen Daum, Dong He, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska
VLDB 2023
[pdf] [code]

CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma*, Jerry Hong*, Mustafa Omer Gul*, Mona Gandhi, Irena Gao, Ranjay Krishna
CVPR 2023 [CVPR Highlight awarded to 2.5% of submissions]
[pdf] [code]

Explanations can Reduce Overreliance on AI Systems during Decision-Making
Helena Vasconcelos, Matthew Jorke, Madeleine Grunde-McLaughlin, Tobias Gerstenberg, Michael Bernstein, Ranjay Krishna
CSCW 2023 [Best paper honorable mention awarded to the top 23 papers]
[pdf]

2022

Alignment as a Multi-Agent Intrinsic Reward
Zixian Ma, Rose Wang, Li Fei-Fei, Michael Bernstein, Ranjay Krishna
NeurIPS 2022
[pdf] [code]

Socially situated artificial intelligence enables learning from human interaction
Ranjay Krishna, Donsuk Lee, Li Fei-Fei*, Michael Bernstein*
* = equal last authors
PNAS 2022
[main paper] [appendix] [science article] [techxplore article]

Searching for Computer Vision North Stars
Li Fei-Fei, Ranjay Krishna
Book: Daedalus Special issue on "AI & Society"
Daedalus Spring 2022
[book] [pdf] [website]

Measuring Compositional Consistency for Video Question Answering
Mona Gandhi*, Mustafa Omer Gul*, Eva Prakash, Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala
CVPR 2022
[pdf] [website] [dataset] [code]

VOCAL: Video Organization and Interactive AnaLytics
Maureen Daum*, Enhao Zhang*, Dong He, Magdalena Balazinska, Brandon Haynes, Ranjay Krishna, Apryle Craig, Aaron Wirsing
CIDR 2022
[pdf] [video]

EARLIER PUBLICATIONS

Visual Intelligence through Human Interaction
Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein
Book: Artificial Intelligence for Human Computer Interaction: A Modern Approach
Springer 2021
[book] [chapter] [preprint]

On the Opportunities and Risks of Foundation Models
Center for Foundation Models @ Stanford
Report 2021
[pdf] [website] [workshop]

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning through the Lens of Visual Question Answering
Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher Manning
ACL 2021 [Outstanding paper awarded to top 6 papers]
[pdf] [code]

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala
CVPR 2021
[pdf] [website] [dataset] [blog] [video]

AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning [pdf]

Conceptual Metaphors Impact Perceptions of Human-AI Collaboration
Pranav Khadpe, Ranjay Krishna, Li Fei-Fei, Jeffrey Hancock, Michael Bernstein
CSCW 2020 [Best paper honorable mention award]
[pdf] [blog] [press] [video]

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs
Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles
CVPR 2020
[pdf] [website]

AI-based Request Augmentation to Increase Crowdsourcing Participation
Junwon Park, Ranjay Krishna, Pranav Khadpe, Li Fei-Fei, Michael Bernstein
HCOMP 2019
[pdf]

Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction
Apoorva Dornadula, Austin Narcomey, Ranjay Krishna, Michael Bernstein, Li Fei-Fei
ICCV 2019 - Scene Graph Representation and Learning workshop
[website] [pdf]

Scene Graph Prediction with Limited Labels
Vincent Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei
ICCV 2019
[website] [pdf] [code]

HYPE: Human eYe Perceptual Evaluation of Generative Models
Sharon Zhou*, Mitchell Gordon*, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael Bernstein
NeurIPS 2019 [Oral awarded to top 0.53% of submissions]
[website] [pdf]

Information Maximizing Visual Question Generation
Ranjay Krishna, Michael Bernstein, Li Fei-Fei
CVPR 2019
[website] [pdf] [code]

Referring Relationships
Ranjay Krishna*, Ines Chami*, Michael Bernstein, Li Fei-Fei
* indicates equal contribution
CVPR 2018
[website] [pdf] [code]

Dense-Captioning Events in Videos
Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles
ICCV 2017
[website] [pdf] [dataset] [eval code] [challenge] [poster]

Crowd Research: Open and Scalable University Laboratories
Rajan Vaish, Snehalkumar Gaikwad, Geza Kovacs, Andreas Veit, Ranjay Krishna, Imanol Arrieta Ibarra, Camelia Simoiu, Michael Wilber, Serge Belongie, Sharad C. Goel, James Davis, Michael Bernstein
UIST 2017 [Awarded best paper honorable mention]
[website] [pdf]

A Hierarchical Approach for Generating Descriptive Image Paragraphs
Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei
CVPR 2017 [Spotlight awarded to top 6% of papers]
[website] [pdf] [dataset]

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Accuracy
Kenji Hata, Ranjay Krishna, Li Fei-Fei, Michael Bernstein
CSCW 2017
[website] [pdf]

Visual Genome: Crowdsourced Visual Knowledge Representations
Ranjay Krishna
Master's Thesis - Stanford University 2016
[pdf] [Christofer Stephenson Memorial award for best Stanford CS Thesis]

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David Ayman Shamma, Michael Bernstein, Li Fei-Fei
IJCV 2016
[website] [article] [pdf] [download] [api] [twitter] [press]

Visual Relationship Detection with Language Priors
Cewu Lu*, Ranjay Krishna*, Michael Bernstein, Li Fei-Fei
* indicates equal contribution
ECCV 2016 [Oral awarded to top 1% of papers]
[pdf] [dataset] [images (2GB)] [code] [project] [slides] [poster] [video]

Embracing Error to Enable Rapid Crowdsourcing
Ranjay Krishna, Kenji Hata, Stephanie Chen, Joshua Kravitz, David Ayman Shamma, Li Fei-Fei, Michael Bernstein
CHI 2016
[pdf] [talk] [slides] [demo] [code]

DAEMO: A Self-Governed Crowdsourcing Marketplace
S. Gaikwad, D. Morina, R. Nistala, M. Agarwal, A. Cossette, R. Bhanu, S. Savage, V. Narwal, K. Rajpal, J. Regino, A. Mithal, A. Ginzberg, A. Nath, K. R. Ziulkoski, T. Cossette, D. Gamage, A. Richmond-Fuller, R. Suzuki, J. Herrejon, K. V. Le, C. Flores-Saviaga, H. Thilakarathne, K. Gupta, W. Dai, A. Sastry, S. Goyal, T. Rajapakshe, N. Abolhassani, A. Xie, A. Reyes, S. Ingle, V. Jaramillo, M.D. Godinez, W. Angel, M. Godinez, C. Toxtli, J. Flores, A. Gupta, V. Sethia, D. Padilla, K. Milland, K. Setyadi, N. Wajirasena, M. Batagoda, R. Cruz, J. Damon, D. Nekkanti, T. Sarma, M.H. Saleh, G. Gongora-Svartzman, S. Bateni, G. Toledo-Barrera, A. Pena, R. Compton, D. Aariff, L. Palacios, M. P. Ritter, Nisha K.K., A. Kay, J. Uhrmeister, S. Nistala, M. Esfahani, E. Bakiu, C. Diemert, L. Matsumoto, M. Singh, V. Jaramillo-Lopez, K. Patel, R. Krishna, G. Kovacs, R. Vaish, M. Bernstein
UIST 2015
[pdf]

Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval
Sebastian Schuster, Ranjay Krishna, Angel Chang, Li Fei-Fei and Christopher D. Manning
EMNLP 2015 - Vision and Language Workshop
[oral] [pdf]

Image Retrieval using Scene Graphs
Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David Ayman Shamma, Michael Bernstein, Li Fei-Fei
CVPR 2015
[pdf] [bib] [dataset (2GB)]

NON-ARCHIVAL PAPERS

Lasagna: Layered Score Distillation for Disentangled Object Relighting
Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Jianming Zhang, Ranjay Krishna, Kate Saenko
ArXiv 2023
[pdf] [code]

EcoAssistant: Using LLM Assistant More Affordably and Accurately
Jieyu Zhang, Ranjay Krishna, Ahmed H. Awadallah, Chi Wang
ArXiv 2023
[pdf] [code] [blog]

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
ArXiv 2023
[pdf]

MIMIC: Masked Image Modeling with Image Correspondences
Kalyani Marathe*, Mahtab Bigverdi*, Nishat Khan, Tuhin Kundu, Aniruddha Kembhavi, Linda G. Shapiro, Ranjay Krishna
CVPR 2024 Workshop for Learning 3D with Multi-View Supervision
[pdf] [code]

Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning
Rachel Gardner, Maya Varma, Clare Zhu, Ranjay Krishna
EMNLP 2020 - Workshop on Noisy User-Generated Text [Oral awarded to top 10% of submissions]
[pdf] [code]

Deep Bayesian Active Learning for Multiple Correct Outputs
Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei
ArXiv 2019
[pdf]

The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary
Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao
CVPR 2018 - The ActivityNet Large-scale Activity Recognition Challenge Workshop
[website] [pdf] [leaderboard]

Engagement Learning: Expanding Visual Knowledge by Engaging Online Participants
Ranjay Krishna, Donsuk Lee, Li Fei-Fei, Michael Bernstein
UIST 2018 [Poster]
[pdf]

ActivityNet Challenge 2017 Summary
Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao
CVPR 2017 - The ActivityNet Large-scale Activity Recognition Challenge Workshop
[website] [pdf] [leaderboard]

Assistant Professor
Computer Science & Engineering
University of Washington

Co-director of RAIVN Lab

Member of Technical Staff @ Microsoft SuperIntelligence

PAST AFFILIATIONS

Director of multimodal & embodied AI
@ Allen Institute for Artificial Intelligence

Visiting Faculty @ Google Research

Visiting Faculty @ Apple Research

Ph.D. @ Stanford University, 2021
Co-advised by Fei-Fei Li
and Michael Bernstein.

Curriculum Vitae [2026]
Google scholar

Research statement [2021]
Teaching statement [2021]
Diversity statement [2021]

CONTACT

ranjay [at] cs [dot] washington [dot] edu

Bill & Melinda Gates Center
Room 304
3800 E Stevens Way NE,
Seattle, WA 98195

Follow @RanjayKrishna

TEACHING

University of Washington:
CSE 599H: AI vs IA [2023]
CSE 493G1: Deep Learning [2025] [2024] [2023]
CSE 455: Computer Vision [2025] [2024]

Stanford University:
CS231N: Convolutional Neural Networks for Visual Recognition [2021] [2020]
CS131: Computer Vision: Foundations and Applications [2019] [2018] [2017]
[crowdsourced class notes]

RESEARCH GROUP

Prospective students: read this.

PostDocs

Zhongzheng (Jason) Ren
(2025-)

Jaemin Cho
(2025-)

PhD students

Jieyu Zhang
(2020-)

Benlin Liu
(2021-)

George Stoica with Judy Hoffman
(2021-)

Jiafei Duan with Dieter Fox
(2022-)

Ainaz Eftekhar with Ali Farhadi
(2022-)

Amita Kamath with Kai-Wei Chang
(2022-)

Mahtab Bigverdi with Linda Shapiro
(2022-)

Wisdom O. Ikezogwo with Linda Shapiro
(2022-)

Zixian (Sunnie) Ma
(2023-)

Xiang Fan
(2023-)

Linjie Li with Yejin Choi
(2023-)

Chenhao Zheng
(2024-)

Arjun Guru
(2025-)

Long-term collaborating PhD students

Madeleine Grunde-McLaughlin with Dan Weld and Jeff Heer

Arijit Ray with Kate Saenko

Enhao Zhang with Magdalena Balazinska

Former PostDocs

Wei-Chiu Ma (2023-2024)
Assistant Professor @ Cornell

Former PhD students

Cheng-Yu Hsieh (2020-2025)
Research Scientist @ Apple

M. Saygin Seyfioglu (2020-2025)
Applied Scientist @ Amazon

Jae Sung Park (2020-2026)
Research Scientist @ Allen Institute

SELECTED TALKS

Venue: CVPR 2024 - Computer Vision and Pattern Recognition
Panel: CVPR: past, present, and future

Venue: CVPR 2020 - Computer Vision and Pattern Recognition
Title: Compositionality in Computer Vision
[slides][video][workshop]

Venue: CVPR 2020 - Computer Vision and Pattern Recognition
Title: Dense-Captioning Events in Videos
[slides][video][workshop]

Venue: ECCV 2016 - European Conference on Computer Vision
Title: Visual Relationship Detection with Language Priors
[pdf][project][slides][poster][video]

Venue: CHI 2016 - Conference on Human Factors in Computing Systems
Title: Embracing Error to Enable Rapid Crowdsourcing
[pdf][slides]

MISCELLANEOUS

Trailer for a documentary
Venue: PBS NOVA
Title: Can we build a brain?
Year: 2018

Complete documentary
Venue: PBS NOVA
Title: Can we build a brain?
Year: 2018