Meta has just introduced V-JEPA 2, the latest version of its AI “world model” built to help machines understand how the physical world works.
It builds on last year’s V-JEPA and has been trained on more than a million hours of video.
The idea is to help AI systems, especially robots, spot patterns and predict what’s likely to happen next in everyday situations.
Think of how a dog chases a ball: it runs not to where the ball is, but to where it's going.
That’s the kind of common sense Meta wants to replicate.
One example Meta gives is a robot carrying a plate and spatula as it approaches a stove with eggs cooking in a pan.
The model can work out that the next logical step is moving the eggs onto the plate. That's not just object recognition; it's anticipating what comes next.
Meta also claims V-JEPA 2 runs 30 times faster than Nvidia's Cosmos model, which tackles similar physical-reasoning tasks.
But it’s worth noting that the two may be using different benchmarks, so the speed comparison might not tell the full story.
Here’s what you should know:
V-JEPA 2 helps AI make real-world predictions using video-based training.
Meta says it's far faster than Nvidia's Cosmos, though the two weren't measured on the same benchmarks.
The goal is smarter AI that can handle physical tasks with less manual training.
According to Meta’s AI chief Yann LeCun, world models like V-JEPA 2 could help robots carry out everyday tasks without needing endless training data.
It’s another step towards building AI that can navigate the real world with a bit more human-like understanding.
One million hours of video later and AI still won’t clean your room.