Speed in the space of AI
The speed at which generative AI progresses is unfathomable. In 2017, a harmless little paper was published that introduced the transformer architecture, a new type of neural network that has since become the backbone of modern large language models and all kinds of visual applications. GPT? Transformers. Prompt embeddings of diffusion networks? Transformers. Classification networks? Transformers. This 2017 paper has been cited three times as much as the 1953 paper by Watson & Crick in which they proposed that the structure of the DNA molecule is a double helix. Three times as much. In just five years!
The transformer is such a standard architecture by now that it bores me to even write about it. Now think about the following: two years ago, OpenAI published DALL-E, an incredible text-to-image model. There were some reports about how impressive this technology was, but it wasn't until the open-source alternative Stable Diffusion and the easily accessible Midjourney popped up that this space accelerated to nothing short of light speed. Stable Diffusion was released only half a year ago, and what has happened since? The fact alone that the weights of this neural network were open at all times led to fine-tuning of the models to produce customized outputs, to many, many millions of dollars earned (by indie hackers like Pieter Levels and Danny Postma, as well as by companies like LensaAI and TikTok), and to the awesome open-source library Deforum, which makes it possible to produce videos with these types of networks.
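To make the "open weights" point concrete, here is a minimal sketch of what that openness enables: with Hugging Face's diffusers library, anyone can pull published Stable Diffusion weights and generate an image in a handful of lines. The model ID and prompt below are just illustrative, and a CUDA-capable GPU is assumed.

```python
# Minimal sketch: generating an image from openly published Stable Diffusion
# weights. Assumes the `torch` and `diffusers` packages and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

# Download the open weights from the Hugging Face hub (illustrative model ID).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# One text prompt in, one image out.
image = pipe("an astronaut riding a horse, oil painting").images[0]
image.save("astronaut.png")
```

Because the weights themselves are on disk, nothing stops you from fine-tuning them on your own images, which is exactly what spawned the wave of customized models.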
It's truly a testament to the power of open source. Free and open access can unleash a tremendous amount of force. I want to show the open-source movement to all the cynics who are so vocal about how bad the world is and how humans are screwing up everything. It seems to be purely good, purely productive, although StabilityAI, the company behind Stable Diffusion, does seem to have financial problems, at least according to some reports. It is also a mystery to me how a company that puts out everything for free, except for a web UI to its model, can finance itself.
Let me sleep
I have two hearts beating inside my chest. One is in constant awe of the rapid progress of this field. I love seeing how quickly things are getting crazy. People are programming computer games without coding knowledge. I can build a web-based video generation tool with the React knowledge of a five-year-old. People produce videos on neural frames, from music videos to even porn. And Runway, a company backed by tens of millions of dollars that has been pouring R&D work into this space for years, empowers people with really advanced AI editing tools for their videos.
And this is only video. Midjourney is going completely crazy with the beautiful outputs it can produce. The "photo" of the pope in a puffer jacket went around the world, and it is not so crazy to think that these types of images will penetrate deeper and deeper into society.
Then there is Adobe with new generative AI products, to say nothing of Microsoft seemingly going all-in on this front.
I love progress, I love technology, and I love this.
On the other hand: I really need to sleep at some point. I am building products in this space, and the speed is so crazy that it is impossible to build on the one hand and experiment with all the new stuff coming out on the other. And I desperately want to experiment. But building a product with technology that I only kind of understand (through months of work with it) is hard enough. Should I really pivot now because this new type of diffusion model seems to produce nicer results? Or because I have ten ideas for products built on large language models while progress with neural frames is slow? What metric do you even use to decide which direction to go when the available technology landscape changes on a weekly basis? How could you take the time to fundraise for a company when the whole technological landscape changes within a month?
Typically, progress happens almost as a step function, in stages, be it with human aging, with technology, or with economic progress. Will it be the same here? Is this just some crazy phase of progress that lasts until the low-hanging fruit has been picked and only harder problems that take time remain? Or does it just go vertical from here, enabled by ever-growing technological capabilities?
And now there's AutoGPT…
Gonna be a fun night ahead.