On 'Simulated Worlds'

OpenAI’s video generation model Sora is incredible even in its current iteration, and it will undoubtedly get better in the coming months. The post they have on their website makes for fascinating reading; a few excerpts:

Extending generated videos. Sora is also capable of extending videos, either forward or backward in time.

Long-range coherence and object permanence. A significant challenge for video generation systems has been maintaining temporal consistency when sampling long videos. We find that Sora is often, though not always, able to effectively model both short- and long-range dependencies. For example, our model can persist people, animals and objects even when they are occluded or leave the frame. Likewise, it can generate multiple shots of the same character in a single sample, maintaining their appearance throughout the video.

Interacting with the world. Sora can sometimes simulate actions that affect the state of the world in simple ways. For example, a painter can leave new strokes along a canvas that persist over time, or a man can eat a burger and leave bite marks.

These advancements, alongside how far LLMs and other transformer-based technologies have come in the past few months, have been quite something to behold. Equal parts exciting and terrifying, they make it hard not to think about how, and how much, they’ll impact industries and society at large. It will likely become harder (and more time-consuming) to separate genuine advancements from just another grift (NFTs, anyone?). Art, music, technology, video games, programming, editing, writing, law, disinformation, misinformation, capital markets, cybersecurity, democracy, and medicine will all invariably see some impact. A small part of me thinks that, as amazing as all this is right now, ‘AI’ (quotes intentional) may not be immune to enshittification: not just from the pressure to monetise but also from the unstoppable deluge of low-quality and unimaginative generated content.