OpenAI's New Model Is Crazy Powerful

OpenAI's new model can code a simple video game from a prompt by thinking through the structure before giving the final answer.

OpenAI's new model can code an entire simple video game from a prompt. I want to show an example of a coding prompt that o1-preview is able to handle but previous models might struggle with. The coding prompt is to write the code for a very simple video game called Squirrel Finder. The reason o1-preview is better at handling prompts like this is that it thinks before giving the final answer, planning out the structure of the code to ensure it fits the constraints.

To give a brief overview of the prompt: in Squirrel Finder, you play as a koala that you move with the arrow keys. Strawberries spawn every second and bounce around the screen, and if one touches you, you die. After three seconds, a squirrel icon spawns, and you win by finding it. There are additional instructions, such as displaying "openai" on the game screen and showing instructions before the game starts. One impressive aspect of this model is its ability to take conversational English, transform it into technical language, and then act on that technical language.
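To make those mechanics concrete, here is a minimal pygame sketch of a Squirrel Finder-style game. This is not the code the model generated; every name, constant, and color here is my own assumption, and the pre-game instructions screen is omitted for brevity.

```python
# Minimal sketch of the Squirrel Finder mechanics (illustrative, not the model's code).
import random
import sys
import pygame

pygame.init()
W, H = 640, 480
screen = pygame.display.set_mode((W, H))
pygame.display.set_caption("Squirrel Finder")
clock = pygame.time.Clock()
font = pygame.font.SysFont(None, 28)

koala = pygame.Rect(W // 2, H // 2, 24, 24)
strawberries = []                    # each entry: [rect, dx, dy]
squirrel = None                      # spawns after 3 seconds
start = pygame.time.get_ticks()
last_spawn = start
state = "playing"                    # "playing" | "won" | "lost"

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.quit()
            sys.exit()

    now = pygame.time.get_ticks()
    keys = pygame.key.get_pressed()
    if state == "playing":
        # Arrow keys move the koala; keep it on screen.
        koala.x += (keys[pygame.K_RIGHT] - keys[pygame.K_LEFT]) * 5
        koala.y += (keys[pygame.K_DOWN] - keys[pygame.K_UP]) * 5
        koala.clamp_ip(screen.get_rect())

        # A strawberry spawns every second and bounces off the walls.
        if now - last_spawn >= 1000:
            strawberries.append([pygame.Rect(random.randrange(W - 16),
                                             random.randrange(H - 16), 16, 16),
                                 random.choice([-4, 4]), random.choice([-4, 4])])
            last_spawn = now
        for s in strawberries:
            rect, dx, dy = s
            rect.x += dx
            rect.y += dy
            if rect.left < 0 or rect.right > W:
                s[1] = -dx
            if rect.top < 0 or rect.bottom > H:
                s[2] = -dy
            if rect.colliderect(koala):
                state = "lost"       # touching a strawberry ends the game

        # The squirrel appears after 3 seconds; touching it wins.
        if squirrel is None and now - start >= 3000:
            squirrel = pygame.Rect(random.randrange(W - 24),
                                   random.randrange(H - 24), 24, 24)
        if squirrel and squirrel.colliderect(koala):
            state = "won"

    screen.fill((30, 30, 30))
    pygame.draw.rect(screen, (160, 160, 160), koala)        # koala
    for rect, _, _ in strawberries:
        pygame.draw.rect(screen, (220, 40, 40), rect)       # strawberries
    if squirrel:
        pygame.draw.rect(screen, (150, 100, 40), squirrel)  # squirrel
    label = {"playing": "openai", "won": "You found the squirrel!",
             "lost": "A strawberry got you!"}[state]
    screen.blit(font.render(label, True, (255, 255, 255)), (10, 10))
    pygame.display.flip()
    clock.tick(60)
```

With pygame installed (`pip install pygame`), this runs as a single script; the point is just how little structure the prompt's plain-English rules actually require.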

First, you can see that the model thought for 21 seconds before giving the final answer. During its thinking process, it gathered details on the game's layout. This development could mean a significant shift in the coding industry, potentially reducing the number of coders needed. However, some argue that AI tools are still far from perfect and often serve as templates that require significant refinement by experienced coders.

Much like AI art, AI coding is expected to keep getting better. Early AI-generated art had noticeable flaws, but it has since improved dramatically, and AI coding tools will likely follow the same trajectory. My understanding, based on limited formal programming education, is that these tools will handle much of the syntax and busywork, producing boilerplate free of syntax errors. The actual coder, with higher-level experience, will then refine this boilerplate.


=> 00:05:14

Teaching AI to solve puzzles it creates is like watching the birth of the singularity.

"Refines." For any of you guys who work in this field: is that effectively what this is doing? Because that's what I'm guessing. Yeah, I would assume so. Right, indeed, and currently very poorly. Well, it'll keep getting better, but yeah, you're right. Anyway, the model is mapping out the instructions, setting up the screen, and so on. Here's the code that it gave; I'll paste it into a window and we'll see if it works.

So you see, there are instructions, and let's try to play the game. Oh, the squirrel came very quickly, but oops, this time I was hit by a strawberry. Let's try again. Oh, there he is. You can see the strawberries appearing. Let's see if I can win by finding the squirrel. Oh, looks like I won. Keep in mind, right now at this very moment, this game has more players than Concord. That's crazy, isn't it? Wow, look at these other demo videos: Korean Cipher, Building OpenAI o1. What's this here, Coding? Let's see if there's anything else interesting. The Coding one is the demo we just saw; there are also ones on writing puzzles and math. I'll look at maybe this one and then maybe that one.

One of my favorite puzzles that I would do when I was a little kid is called a nonogram. You're given an empty grid and some numerical clues that tell you which squares in the grid you have to fill in. I thought we could have the model play a little game where it first generates a puzzle, then we ask another instance of the model to solve the puzzle it generated. So yeah, I'll ask it to generate, say, a 5x5 nonogram where the final answer is the letter M. Okay, we'll see what it comes up with. Alright, it just gave us a little puzzle. We'll copy it, open up another window of o1, and ask it to solve the following puzzle.
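As a rough sketch, that generate-then-solve loop looks something like this with the OpenAI Python SDK. The prompts here are my paraphrase of what was typed in the demo, not a transcript of it.

```python
# Hypothetical sketch of the demo's generate-then-solve loop (prompts are assumptions).
from openai import OpenAI

client = OpenAI()

# First instance: generate the puzzle.
gen = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user",
               "content": "Generate a 5x5 nonogram whose solution is the letter M. "
                          "Give only the row and column clues."}],
)
puzzle = gen.choices[0].message.content

# Fresh instance: hand the generated puzzle over to be solved.
sol = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user",
               "content": f"Solve the following nonogram and visualize the answer:\n\n{puzzle}"}],
)
print(sol.choices[0].message.content)
```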

Bro, so he's making the puzzles and teaching the AI to solve the puzzles. Isn't this literally how the singularity happens? Yeah, theoretically, right? Oh God. Let's also say "visualize the answer in some pretty way," why not? This puzzle doesn't look too hard. The way a nonogram works is that for each row and each column, you're given a list of numbers telling you how many squares are filled in. Two consecutive filled squares show up as a "2"; if there's a space between them, you'll see "1, 1". You're supposed to figure out which squares to fill in. It looks like the model got this right and illustrated a nice little letter M.
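For anyone unfamiliar with the format, here is a small sketch of how those clues are derived from a grid. The 5x5 letter "M" below is an illustrative guess at the shape, not necessarily the exact puzzle from the video.

```python
# How nonogram clues are read off a grid (illustrative example).

def clues(line):
    """Return the run lengths of filled cells ('#') in a row or column."""
    runs, count = [], 0
    for cell in line:
        if cell == "#":
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs or [0]  # an empty line is conventionally clued as 0

grid = [
    "#...#",
    "##.##",
    "#.#.#",
    "#...#",
    "#...#",
]

row_clues = [clues(row) for row in grid]
col_clues = [clues(col) for col in zip(*grid)]

print(row_clues)  # [[1, 1], [2, 2], [1, 1, 1], [1, 1], [1, 1]]
print(col_clues)  # [[5], [1], [1], [1], [5]]
```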

I think one of the things that's nice about examples like this is that it's similar to sudoku or a crossword: you have to make a guess, see whether that guess is right or wrong, and then backtrack if you get it wrong. For any task where you have to search through a space with different pieces pointing in different directions but with mutual dependencies, a model like o1 is really good at refining the search space. I feel like this is probably the easiest kind of thing for an AI to do, because it's completely mathematical and logical.
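Here is a compact sketch of that guess-and-backtrack search: enumerate candidate placements for each row, prune any that make a column clue impossible, and backtrack on dead ends. This is a toy solver for illustration, not a claim about how o1 searches internally.

```python
# Toy backtracking nonogram solver (illustrative sketch).

def runs(line):
    """Run lengths of '#' cells; [0] for an empty line."""
    out, n = [], 0
    for cell in line:
        if cell == "#":
            n += 1
        elif n:
            out.append(n)
            n = 0
    if n:
        out.append(n)
    return out or [0]

def row_candidates(clue, width):
    """Every way to place the clued runs of '#' in a row of `width` cells."""
    if clue == [0]:
        return ["." * width]
    first, rest = clue[0], clue[1:]
    out = []
    max_offset = width - (sum(clue) + len(clue) - 1)
    for offset in range(max_offset + 1):
        head = "." * offset + "#" * first
        if rest:
            out += [head + "." + tail
                    for tail in row_candidates(rest, width - len(head) - 1)]
        else:
            out.append(head + "." * (width - len(head)))
    return out

def partial_ok(col, clue):
    """Necessary condition: a partially filled column can still match its clue."""
    r = runs(col)
    if r == [0]:
        return True
    if len(r) > len(clue) or r[:-1] != clue[:len(r) - 1]:
        return False
    # The last run may still grow if the column prefix ends in '#'.
    target = clue[len(r) - 1]
    return r[-1] <= target if col.endswith("#") else r[-1] == target

def solve(row_clues, col_clues):
    width, height = len(col_clues), len(row_clues)

    def backtrack(rows):
        if len(rows) == height:  # all rows placed: verify columns exactly
            cols = map("".join, zip(*rows))
            return rows if all(runs(c) == cc for c, cc in zip(cols, col_clues)) else None
        for cand in row_candidates(row_clues[len(rows)], width):  # guess a row
            cols = map("".join, zip(*rows, cand))
            if all(partial_ok(c, cc) for c, cc in zip(cols, col_clues)):
                found = backtrack(rows + [cand])  # recurse; undo on failure
                if found:
                    return found
        return None

    return backtrack([])

# The clues for the 5x5 letter "M" from the earlier sketch:
solution = solve(
    row_clues=[[1, 1], [2, 2], [1, 1, 1], [1, 1], [1, 1]],
    col_clues=[[5], [1], [1], [1], [5]],
)
print("\n".join(solution))
```

The `partial_ok` pruning is only a necessary condition; the full column check at the end is what guarantees correctness, which mirrors the guess-check-backtrack loop described above.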


=> 00:10:02

AI can help, but human intuition and reasoning are irreplaceable.

Human intuition combined with logical and methodical thinking is something that AI can't fully replicate. While AI has advanced significantly, it still can't do everything. For instance, abstract ideas might be challenging for AI to grasp. Then again, I've never seen a puzzle like this before, so I could be misunderstanding how it works.

In the "Building OpenAI o1" video, the team explains: we're starting a series of new models with the new name o1. This is to highlight the fact that you might feel different when using o1 compared to previous models such as GPT-4o. As others will explain later, o1 is a reasoning model, so it will think more before answering your question. We are releasing two models: o1-preview, which previews what's coming for o1, and o1-mini, which is a smaller and faster model trained with a similar framework to o1. We hope you like our new naming scheme.

What is reasoning, anyway? One way of thinking about it is that there are times when we ask questions and need immediate answers because they are simple. For example, if you ask, "What's the capital of Italy?" you know the answer is Rome. On the other hand, kids today can just take a picture of a math question and have an app give them the answer, which might lead to a decline in basic skills like math. Math and logic are deeply intertwined, and without a foundation in math, life can become very challenging.

Reasoning is the ability to turn thinking time into better outcomes, whatever the task. There are moments in research where something surprising happens and things click together: these are the "aha" moments. For instance, during our training process, we put more compute into our models, and they started generating coherent chains of thought. This was a meaningful difference from before.

Training a model for reasoning can involve having humans write out their thought processes and training the model on that. An aha moment for me was when we saw that training the model using reinforcement learning to generate its own chains of thought could be even more effective. This realization showed us that we could scale and explore model reasoning significantly.

For a long time, we've been trying to make models better at solving math problems. Despite various methods, the models often failed to question their mistakes. However, with one of the early o1 models, we saw it start to question itself and reflect, which was a powerful moment. This indicated that we had uncovered something new and meaningful.

Thank you, and congrats on releasing this. I'm curious to see what will happen next; this is nuts. Interestingly, Microsoft recently laid off 650 people from its gaming division, which seems ironic. Some fields in tech are mostly guys and others mostly women; trying to count how many women versus men are in a group is unnecessary and doesn't reflect the complexity of the situation.