https://github.com/facebookresearch/vggt
While drafting my Repo Roundup post this week, I was intrigued by the VGGT project that came out of Facebook Research. Rather than review a few projects, I wanted to spend some time diving deeper into this one to understand it better. It is very much still in research project mode, but as a relative layperson in this area, I think I can see where it's headed from a practical use perspective.
This project is demo code from an award-winning paper titled "VGGT: Visual Geometry Grounded Transformer." It takes one or many images (or video) of a subject and reconstructs it as a 3D scene. Behind the scenes, it uses a neural network to predict cameras, point maps, depth maps, etc., as opposed to simpler approaches that more or less attempt to stitch scenes together, or to actual 3D scanning hardware. It can do these calculations in less than a second, which is very close to real time.
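To make "cameras" and "depth maps" a little more concrete, here is a minimal sketch of how a depth map plus camera intrinsics can be turned into 3D points using the standard pinhole camera model. This is not VGGT's code or API; the function name `unproject_depth` and the parameters `fx`, `fy`, `cx`, `cy` (focal lengths and principal point, in pixels) are my own illustrative assumptions.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Convert a depth map into camera-space 3D points (pinhole model).

    depth is a list of rows; depth[v][u] is the distance along the
    camera's Z axis for pixel (u, v). Hypothetical helper, not part of VGGT.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                # Treat non-positive depth as "no measurement" and skip it.
                continue
            x = (u - cx) * d / fx  # horizontal offset scaled by depth
            y = (v - cy) * d / fy  # vertical offset scaled by depth
            points.append((x, y, d))
    return points


# A tiny 2x2 depth map with one missing pixel.
tiny_depth = [[2.0, 2.0],
              [0.0, 4.0]]
tiny_points = unproject_depth(tiny_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

As I understand it, the interesting part of VGGT is that the network predicts the depth and camera values itself, so a step like this only illustrates how those outputs relate to a point cloud.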
Previously, if you wanted to create a 3D model of something, you would need a 3D scanner (like a Leica BLK360) or something like the LiDAR scanner on an iPhone. While this method works well and can produce excellent results, the high-end hardware is expensive and the process is time-consuming. There is still work to be done with VGGT, but it is just a matter of time until we can generate real 3D models on consumer-grade hardware from just 2D input sources in real time!
Anywhere 3D models are needed, this could be applied. I would wager we'll see this or similar technology introduced on our mobile phones within 12-24 months. Other practical applications of this type of technology could be found in:
I tested it using just a single photo I took of a LEGO figurine that was near my desk. While the 3D model still has a number of artifacts, I think that with either additional images or some post-processing, it would produce a relatively good 3D model. It outputs the results as a GLB file, which I imported into the three.js sandbox.
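If you want to sanity-check a GLB file before loading it into a viewer, the binary glTF container starts with a 12-byte header: the ASCII magic `glTF`, a version number, and the total file length, all little-endian. The sketch below reads that header with Python's standard library; `read_glb_header` is a name I made up for illustration, and it only inspects the header rather than validating the whole model.

```python
import struct

GLB_MAGIC = 0x46546C67  # the ASCII bytes "glTF" read as a little-endian uint32


def read_glb_header(data):
    """Parse the 12-byte GLB header from raw bytes (hypothetical helper)."""
    if len(data) < 12:
        raise ValueError("too short to be a GLB file")
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic bytes")
    return {
        "version": version,
        "declared_length": length,   # size the file claims to be
        "actual_length": len(data),  # size of the bytes we were given
    }


# Build a header-only GLB in memory so the example runs without a real file.
sample = b"glTF" + struct.pack("<II", 2, 12)
header = read_glb_header(sample)
```

For a real file, you would pass in the bytes from something like `open("model.glb", "rb").read()`, where the filename is just a placeholder. A mismatch between the declared and actual lengths is a quick hint that the export was truncated.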
Input Image:
Output Model: (a GLB file I loaded in the three.js sandbox)