Our computer vision textbook is now available for free online here:
visionbook.mit.edu
We are working on adding some interactive components like search and (beta) integration with LLMs.
Hope this is useful, and feel free to submit GitHub issues to help us improve the text!
Our computer vision textbook is released!
Foundations of Computer Vision
with Antonio Torralba and Bill Freeman
mitpress.mit.edu/9780262048972/…
It’s been in the works for >10 years. Covers everything from linear filters and camera optics to diffusion models and radiance fields.
1/4
Language-conditional models can act a bit like decision transformers, in that you can prompt them with a desired level of "reward".
E.g., want prettier #dalle creations? "Just ask" by adding "[very]^n beautiful":
n=0: "A beautiful painting of a mountain next to a waterfall."
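A minimal sketch of how the "[very]^n beautiful" prompts above could be generated programmatically. The function name `beautiful_prompt` and its `subject` parameter are hypothetical, chosen only for illustration; the sketch builds the prompt string and does not call any image model.

```python
def beautiful_prompt(n: int, subject: str = "painting of a mountain next to a waterfall") -> str:
    """Build a prompt with n repetitions of "very" before "beautiful".

    Hypothetical helper: the prompt template follows the n=0 example in the
    post, but the function itself is an illustration, not the author's code.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    intensifiers = "very " * n  # empty string when n == 0
    return f"A {intensifiers}beautiful {subject}."


if __name__ == "__main__":
    # Print a few prompts with increasing "requested reward".
    for n in (0, 1, 3):
        print(f"n={n}: {beautiful_prompt(n)!r}")
```

Each resulting string would then be passed to a text-to-image model as the prompt, with larger n acting as a request for a higher level of "reward".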
A simple, fun example to refute the common story that ML can interpolate but not extrapolate:
Black dots are training points. Green curve is true data generating function. Blue curve is best fit.
Notice how it correctly predicts far outside the training distribution!
1/3
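A small sketch of the general idea, under loud assumptions: the post does not say which data-generating function or model was used in the plot, so here the true function is assumed to be a straight line and the model is an ordinary least-squares line fit. The names `true_fn` and `fit_line` are hypothetical. The point is only that a model whose hypothesis class matches the data-generating process can predict accurately far outside the training range.

```python
def true_fn(x: float) -> float:
    # Assumed data-generating function (not taken from the post).
    return 2.0 * x + 1.0


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training points lie only in [0, 1].
train_x = [i / 10 for i in range(11)]
train_y = [true_fn(x) for x in train_x]

slope, intercept = fit_line(train_x, train_y)

# Query a point far outside the training range.
far_x = 100.0
prediction = slope * far_x + intercept
error = abs(prediction - true_fn(far_x))

if __name__ == "__main__":
    print(f"fitted slope={slope:.6f}, intercept={intercept:.6f}")
    print(f"prediction at x={far_x}: {prediction:.6f} (abs error {error:.2e})")
```

Whether extrapolation succeeds here depends entirely on the match between the model class and the true function; swapping in a mismatched model would generally not extrapolate this well.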
We’re releasing a new image similarity metric and dataset!
--> DreamSim: a metric which outperforms LPIPS, CLIP, and DINO on similarity and retrieval tasks
--> NIGHTS: a dataset of synthetic images with human similarity ratings
paper+code+data: dreamsim-nights.github.io
1/n
An interesting thing about ChatGPT is you can script in it a bit like you would in a programming language.
You can define functions, compose them, etc. Except all in natural language!
This means you can write out common tasks and attach them to command names. For example:
1/n
arXiv has been such a wonderful service, but I think this is a step in the wrong direction.
We have other venues for peer review. To me, the value of arXiv lies precisely in its lack of excessive moderation.
I'd prefer it as "GitHub for science," rather than yet another journal.
Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis.
Today I want to share two new works on this topic:
Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired rep learning: arxiv.org/abs/2510.08492
1/9
Back in 2018 at OpenAI, a few of us wrote a story with GPT as an AI "co-author". We didn't have an AI illustrator back then, but now we sort of do, so I tried plugging the text into #dalle.
Here is the result! “The Bees”, a short story by humans & AIs: web.mit.edu/phillipi/www/t…
This looks like one of those results that marks a phase transition in science: for years people have anticipated that synthetic data would eventually outperform or boost real data, but an ImageNet-scale result has been elusive. Finally, models are good enough that it works!
n=22: "A very very very very very very very very very very very very very very very very very very very very very very beautiful painting of a mountain next to a waterfall."