Ai2
Non-profit Organizations
Seattle, WA 64,690 followers
Breakthrough AI to solve the world's biggest problems.
About us
We are a Seattle-based non-profit AI research institute founded in 2014 by the late Paul Allen. We develop foundational AI research and innovation to deliver real-world impact through large-scale open models, data, robotics, conservation, and beyond.
- Website
- http://allenai.org
- Industry
- Non-profit Organizations
- Company size
- 201-500 employees
- Headquarters
- Seattle, WA
- Type
- Nonprofit
- Founded
- 2014
- Specialties
- Artificial Intelligence, Deep Learning, Natural Language Processing, Computer Vision, Machine Reading, Machine Learning, Knowledge Extraction, Common Sense AI, Machine Reasoning, Information Extraction, and Language Modeling
Locations
- Primary: Seattle, WA 98013, US
Updates
-
We're releasing 𝗠𝗼𝗹𝗺𝗼𝗠𝗼𝘁𝗶𝗼𝗻, a 3D motion forecasting model, plus the full training data & a new benchmark. 👇 Given one or a few video frames, 3D points on an object, & an instruction like "Put the white bowl on the table," MolmoMotion predicts where those points will go over the next few seconds in a shared 3D world frame. MolmoMotion can predict different motions across scenes, like a bowl sliding and rotating on a table, a flamingo dipping its beak as it walks, & a lint roller working back and forth on cloth. MolmoMotion represents motion as 3D points attached to an object, tracked in a shared world frame; each predicted path follows the instruction and stays close to the ground-truth motion. This approach doesn't need templates for objects, stays stable as camera perspectives change, & is compact enough to feed straight into downstream applications. Training MolmoMotion required data that didn't exist: web-scale video with 3D point tracks grounded to objects & paired with actions. So we built a pipeline to extract that data from ordinary video, now released as 𝗠𝗼𝗹𝗺𝗼𝗠𝗼𝘁𝗶𝗼𝗻-𝟭𝗠: 1.16M videos, 736 motion types, & 5.6K objects. We also built 𝗣𝗼𝗶𝗻𝘁𝗠𝗼𝘁𝗶𝗼𝗻𝗕𝗲𝗻𝗰𝗵, a human-validated benchmark for object-centric 3D motion forecasting. On it, MolmoMotion outperforms every motion prediction method we tested, including pixel-space video generators, parametric 3D methods, & a constant-velocity baseline. Motion forecasters like MolmoMotion have many potential applications. Fine-tuned, MolmoMotion can predict object paths to help grasping robots plan where to move objects, or help image generators capture motion more accurately, particularly motions that are hard to describe in a prompt. We think motion forecasting is as fundamental to machine intelligence as perceiving what's stationary. MolmoMotion is a step toward it: 3D motion prediction that works across object types, learned from everyday video. 
Everything is open—download the MolmoMotion weights, inspect the training data, & customize for your applications. ✏️ Blog: https://lnkd.in/gfq3wWqn 🤗 Models: https://lnkd.in/gFbUs6QV 📊 Data: https://lnkd.in/gsxzhvqQ 📄 Paper: https://lnkd.in/gFWSS_92
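The constant-velocity baseline named in the post above is simple enough to sketch. The version below is an illustration only: it assumes each tracked point is an (x, y, z) tuple in the shared world frame and that two consecutive observed frames are available. The function name and data layout are hypothetical and are not taken from the MolmoMotion release.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def constant_velocity_forecast(
    prev_pts: List[Point], curr_pts: List[Point], steps: int
) -> List[List[Point]]:
    """Extrapolate each 3D point along its last observed displacement.

    Hypothetical helper: prev_pts and curr_pts hold the same points at two
    consecutive frames; the result has one list of points per future step.
    """
    if len(prev_pts) != len(curr_pts):
        raise ValueError("both frames must track the same points")
    # Per-point displacement between the two observed frames.
    velocities = [
        tuple(c - p for p, c in zip(prev, curr))
        for prev, curr in zip(prev_pts, curr_pts)
    ]
    forecast = []
    for k in range(1, steps + 1):
        # Step k moves every point k displacements past its current position.
        forecast.append([
            tuple(c + k * v for c, v in zip(curr, vel))
            for curr, vel in zip(curr_pts, velocities)
        ])
    return forecast
```

A baseline like this ignores the instruction and the scene entirely, which is why it serves as a useful floor when comparing learned forecasters.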
-
Today we’re releasing 𝗼𝗹𝗺𝗼-𝗲𝘃𝗮𝗹, a workbench built for iterative AI model development. 👇 Building an LLM means evaluating it over and over as it changes. Tweak a hyperparameter or scale the model up, and every new checkpoint sends you back through the same benchmarking loop. olmo-eval is designed for this—it extends our OLMES project, which made benchmark scores comparable and reproducible by standardizing how models are evaluated, to the intermediate experiments teams compare throughout model development: ⚡ Running every benchmark in a locked-down sandbox – as many eval platforms do – is compute-heavy. 𝗦𝗼 𝗼𝗹𝗺𝗼-𝗲𝘃𝗮𝗹 𝗶𝗻𝘀𝘁𝗲𝗮𝗱 𝘁𝗿𝗲𝗮𝘁𝘀 𝗯𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝘀 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁𝗹𝘆 𝗱𝗲𝗽𝗲𝗻𝗱𝗶𝗻𝗴 𝗼𝗻 𝘁𝗵𝗲𝗶𝗿 𝗿𝘂𝗻𝘁𝗶𝗺𝗲 𝗻𝗲𝗲𝗱𝘀. For example, a plain Q&A benchmark runs directly—faster and cheaper than sandboxing. 🔁 𝗜𝗻 𝗼𝗹𝗺𝗼-𝗲𝘃𝗮𝗹, 𝗲𝘃𝗲𝗿𝘆 𝗰𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁 𝗶𝘀 𝘀𝘄𝗮𝗽𝗽𝗮𝗯𝗹𝗲: the model being evaluated, its tools, LLM-as-a-judge graders, and more. You can change one without touching the rest. 📊 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 𝗿𝗲𝘀𝘂𝗹𝘁𝘀 𝗹𝗮𝗻𝗱 𝗶𝗻 𝗮 𝘂𝗻𝗶𝗳𝗼𝗿𝗺 𝘀𝗰𝗵𝗲𝗺𝗮, so checkpoints stay comparable across a long project. 🔍 After training a model with a new intervention, 𝗼𝗹𝗺𝗼-𝗲𝘃𝗮𝗹 𝗹𝗲𝘁𝘀 𝘆𝗼𝘂 𝗹𝗶𝗻𝗲 𝘁𝘄𝗼 𝗺𝗼𝗱𝗲𝗹 𝗰𝗵𝗲𝗰𝗸𝗽𝗼𝗶𝗻𝘁𝘀 𝘂𝗽 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻 𝗯𝘆 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻—𝗵𝗼𝗹𝗱𝗶𝗻𝗴 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴 𝗲𝗹𝘀𝗲 𝗳𝗶𝘅𝗲𝗱. The comparison view makes it easier to see real gains and regressions. If you find yourself asking "how does this model checkpoint differ from the last, and where did it improve/regress?", that's what olmo-eval is for. We're releasing it openly so the community can build on it. 💻 Code: https://lnkd.in/gmfTAkdY 📝 Blog: https://lnkd.in/gwgz5mDb
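A question-by-question comparison of two checkpoints can be sketched as follows. This is not the olmo-eval API: the function name and the record layout (a dict from question ID to a correct/incorrect flag) are assumptions made for illustration.

```python
from typing import Dict, List


def compare_checkpoints(
    before: Dict[str, bool], after: Dict[str, bool]
) -> Dict[str, List[str]]:
    """Bucket shared question IDs by how the outcome changed.

    Hypothetical record layout: question ID -> whether the answer was correct.
    Questions present in only one run are ignored.
    """
    shared = sorted(before.keys() & after.keys())
    return {
        "gains": [q for q in shared if not before[q] and after[q]],
        "regressions": [q for q in shared if before[q] and not after[q]],
        "unchanged": [q for q in shared if before[q] == after[q]],
    }
```

Two checkpoints with the same aggregate score can still differ on many individual questions, which is the kind of change a per-question view exposes and an average hides.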
-
We’re excited to share that Ai2 is hiring Senior Research Engineers on our Olmo and Molmo teams to help build and advance some of the world's leading open models. There are opportunities to contribute to a range of research challenges, from novel training infrastructure to multimodality to agentic system development. As a member of our tight-knit team, you’ll collaborate with researchers and engineers to shape new research directions, push the boundaries of open models, and make meaningful contributions to truly open research that moves the field forward. If you're passionate about advancing foundational AI research and excited about being part of an innovative team committed to AI for the common good, we'd love to hear from you! https://lnkd.in/gUCXNTfA
-
Introducing 𝗠𝗼𝗱𝗦𝗹𝗲𝘂𝘁𝗵, a tool for tracing the models and datasets behind modern LLMs. 👇 LLMs are no longer created with human data alone. They rely on other models to generate and filter data, evaluate outputs, and guide development work. We made ModSleuth to track this. Modern LLM dependencies are scattered, recursive, and hard to see. So how do we even find them all? ModSleuth helps by reading papers, model and dataset cards, code configs, and upstream artifacts, then reconstructing a model's “family tree.” ModSleuth found that Olmo 3 has 89 model and 183 dataset dependencies, while Nemotron 3 has 273 model and 560 dataset dependencies. Some dependency chains go 8 hops deep, forming a web of models and data that contributed to an LLM’s core. Turns out AI supply chains may be more tangled than we thought. A model's lineage is broader than its training data, and every step can affect what – and how – the final model learns. Without provenance, it's harder to know where dependencies came from, whether benchmark scores are accurate, and which upstream licenses/terms may apply. ModSleuth generates a graph that surfaces what's nearly impossible to find manually, including: 📜 Hidden license inheritance 🔗 Train/eval coupling 📝 Documentation inconsistencies 🤖 Models used as judges, filters, OCR systems, and data generators As LLM pipelines become more complex, we need tools like ModSleuth to identify the artifacts models are built on. ▶️ Demo: https://lnkd.in/gs8AivRn 📄 Paper: https://lnkd.in/g49NCSt8
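The two kinds of figures quoted above (total dependencies and the deepest chain in hops) are properties of a dependency graph. The sketch below shows how they could be computed. It is not ModSleuth's implementation: the function name is hypothetical, the graph is assumed to be acyclic, and the toy artifact names are made up.

```python
from typing import Dict, List, Set, Tuple


def lineage_stats(graph: Dict[str, List[str]], root: str) -> Tuple[int, int]:
    """Return (number of transitive dependencies, longest chain in hops).

    Assumes the graph is acyclic; `graph` maps an artifact to the artifacts
    it was directly built from.
    """
    seen: Set[str] = set()
    depth_cache: Dict[str, int] = {}

    def depth(node: str) -> int:
        # Longest chain of hops starting at `node`, memoized per artifact.
        if node in depth_cache:
            return depth_cache[node]
        deepest = 0
        for dep in graph.get(node, []):
            seen.add(dep)
            deepest = max(deepest, 1 + depth(dep))
        depth_cache[node] = deepest
        return deepest

    max_hops = depth(root)
    return len(seen), max_hops


# Toy graph with made-up artifact names.
toy = {
    "final-model": ["sft-data", "base-model"],
    "sft-data": ["judge-model"],
    "base-model": ["web-corpus"],
    "judge-model": ["web-corpus"],
}
print(lineage_stats(toy, "final-model"))  # (4, 3)
```

In the toy graph, the shared `web-corpus` node is counted once even though two artifacts depend on it, and the longest chain runs through the judge model, not the base model.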
-
Ai2 reposted this
Our ACE2 models released last year had a subtle but important drawback. Although they got the right overall atmospheric warming response to steady increases in both sea surface temperature (SST) and CO2 concentration, they did not get the right response to abrupt changes in either of these forcing variables, nor the right sensitivity to changes in just one or the other. This paper shows how to resolve this issue by generating training data from a wider range of SST/CO2 values. With this new dataset we are able to train a global atmospheric climate emulator which faithfully captures signals such as the response to +4K SST perturbations, abrupt quadrupling of CO2, and simulations with increasing SST but fixed CO2. The new training dataset actually has fewer overall samples than our prior datasets, but leads to a model with superior performance across multiple evaluation scenarios. Preprint: https://lnkd.in/g9aqQpGq Model: https://lnkd.in/gmdGgnp5
Today we're introducing 𝗔𝗖𝗘𝟮𝗦-𝗦𝗛𝗶𝗘𝗟𝗗+, a climate emulator that learns to separate the effects of sea surface temperature & CO2. 👇 Climate emulators are AI models that simulate global weather & climate. They run about 100x faster than the physics-based models they learn from, making it practical to run many simulations + explore a wider range of scenarios. We've been developing the ACE family for several years. ACE2-SHiELD was trained on historical simulations from a physics-based model. ACE2-SOM came next, coupling ACE2 to a slab ocean: a simplified ocean representation where ocean temperature responds to CO2. Sea surface temperature & CO2 have major impacts on climate, and they typically change together: SST tends to rise as CO2 increases. Because earlier ACE models had only seen them move in sync, they couldn't accurately predict what happens when one changes and the other doesn't. Earlier ACE models produced unrealistic results on two scenarios climate scientists often use to probe model behavior: AMIP +4 K, which raises sea surface temperature by 4 degrees with CO2 unchanged, and abrupt 4xCO2, which quadruples CO2 against a still-cold ocean. To address this, we generated a new class of training data where sea surface temperature rises steadily at 1 degree per year while CO2 jumps to a new randomly-chosen value every 30 days, spanning from well below to well above present-day levels. Trained on the new & existing data, ACE2S-SHiELD+ accurately handles the scenarios earlier ACE models were good at as well as the ones they struggled with. It's more flexible than ACE2-SHiELD + ACE2-SOM combined, using ~25% fewer training samples than either alone. This work was done in collaboration with the National Oceanic & Atmospheric Administration's (NOAA) Geophysical Fluid Dynamics Laboratory. → Download the model: https://lnkd.in/gMja2U5d → Read more about ACE2S-SHiELD+ in our preprint: https://lnkd.in/gYmZp3bW
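The decoupled forcing described above (a steady SST ramp combined with CO2 that jumps every 30 days) can be pictured as a simple schedule. The sketch below is only an illustration of that idea, not the data-generation code behind ACE2S-SHiELD+. It assumes a 365-day year, uniform sampling of CO2, and caller-supplied CO2 bounds, since the post gives no numeric range; the function and parameter names are hypothetical.

```python
import random
from typing import List, Tuple


def forcing_schedule(
    n_days: int,
    co2_min_ppm: float,
    co2_max_ppm: float,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Return one (sst_offset_degrees, co2_ppm) pair per simulated day.

    Assumptions for illustration: a 365-day year, uniform CO2 sampling, and
    caller-supplied CO2 bounds (the post gives no numeric range).
    """
    rng = random.Random(seed)
    schedule = []
    co2 = 0.0
    for day in range(n_days):
        if day % 30 == 0:
            # CO2 jumps to a fresh random value every 30 days.
            co2 = rng.uniform(co2_min_ppm, co2_max_ppm)
        # SST offset ramps steadily at 1 degree per year.
        sst_offset = day / 365.0
        schedule.append((sst_offset, co2))
    return schedule
```

Because the CO2 value is redrawn independently of the SST ramp, the two forcings stop moving in sync, which is the property the post says the earlier training data lacked.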
-
We're extending 𝗔𝘂𝘁𝗼𝗗𝗶𝘀𝗰𝗼𝘃𝗲𝗿𝘆 early access through July 31. 👇 New accounts start with 500 Hypothesis Credits (one credit = one hypothesis), & any credits you already have will still work. Most AI research tools need prompting. AutoDiscovery analyzes your data instead, generating its own hypotheses & writing code to test each one, then surfacing the most surprising results—the ones most likely to be genuine discoveries. AutoDiscovery has already surfaced mutual-exclusivity patterns in cancer mutations, trophic relationships in 20 years of marine data, & social science findings later published in a peer-reviewed paper. → Try it in AstaLabs: https://lnkd.in/gvfF7xTx
-
Now you can fine-tune 𝗠𝗼𝗹𝗺𝗼𝗔𝗰𝘁 𝟮 for more robots and tasks. 👇 MolmoAct 2 artifacts have been downloaded 400K+ times in under 1 month, and today, we’re releasing the full code & training data. It’s everything you need to customize or build on our fully open robotics foundation model. What's now open alongside the model: 1️⃣ Fine-tuning scripts 2️⃣ Every dataset used to train MolmoAct 2 3️⃣ All of our evaluation rollouts 4️⃣ Training recipe for the open source MolmoAct 2 tokenizer MolmoAct 2 now also officially supports Hugging Face’s LeRobot platform. Teams already working in the LeRobot ecosystem can drop the model into their existing setup without retooling. 🤗 Learn more: https://lnkd.in/e8mY5y9n Open robotics gets stronger when researchers can evaluate models like MolmoAct 2 themselves. Try it on new robots and tasks and tell us what you discover. 💻 Code: https://lnkd.in/eZnB-zK4 📝 Read our blog: https://lnkd.in/gaaGv8bp