Meta's new Llama 3.1 model is a big swing at AI supremacy: 405 billion parameters, free to download, and arguably open source.
When Mark Zuckerberg isn't wake surfing, wearing a tuxedo and a puka shell necklace at his Lake Tahoe mansion, crushing Coors yellow bellies, and waving the American flag, he clocks into work with a sunburn to battle Google and OpenAI for artificial intelligence supremacy. Yesterday, Meta released its biggest and baddest large language model ever, which also happens to be free and arguably open source. It took months to train on 16,000 Nvidia H100 GPUs, which likely cost hundreds of millions of dollars and used enough electricity to power a small country. The end result is a massive 405-billion-parameter model with a 128,000-token context length, which, according to Meta's benchmarks, is mostly superior to OpenAI's GPT-4 and even beats Claude 3.5 Sonnet on some key tests. However, benchmarks can be misleading, and the only way to find out if a new model is any good is to vibe with it. In today's video, we'll try out Llama 3.1 405B and find out if it actually doesn't suck like most Meta products.
It is July 24th, 2024, and you're watching The Code Report. AI hype has died down a lot recently, and it's been almost a week since I've mentioned it in a video, which I'm extremely proud of. But Llama 3.1 is a model that cannot be ignored. It comes in three sizes: 8B, 70B, and 405B, where B refers to billions of parameters, the learned weights the model uses to make predictions. In general, more parameters can capture more complex patterns, but a bigger parameter count doesn't always mean a better model. GPT-4 has been rumored to have over 1 trillion parameters, but we don't really know the true numbers because companies like OpenAI and Anthropic don't publish them.
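To put those sizes in perspective, here's a rough back-of-the-envelope sketch (my own numbers, not Meta's) of what the raw weights cost in memory at 16-bit precision, i.e. two bytes per parameter:

```python
# Back-of-the-envelope memory footprint for Llama 3.1's three sizes,
# assuming 2 bytes per parameter (fp16/bf16) and ignoring activations
# and KV-cache overhead.
SIZES = {"8B": 8e9, "70B": 70e9, "405B": 405e9}

for name, params in SIZES.items():
    gigabytes = params * 2 / 1e9  # 2 bytes per parameter
    print(f"Llama 3.1 {name}: ~{gigabytes:,.0f} GB of raw weights in fp16")

# Llama 3.1 8B:   ~16 GB
# Llama 3.1 70B:  ~140 GB
# Llama 3.1 405B: ~810 GB
```

That roughly 810 GB figure for the 405B model is why self-hosters reach for quantized downloads, as we'll see in a moment.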
Llama's open model weights are the real story for developers, making state-of-the-art AI accessible without breaking the bank.
The cool thing about Llama is that it's open source. Well, kind of. You can make money off of it as long as your app doesn't have 700 million monthly active users, in which case you need to request a license from Meta. What's not open is the training data, which might include your blog, your GitHub repos, all your Facebook posts from 2006, and maybe even your WhatsApp messages. What's interesting is that we can look at the actual code used to train this model, which is only about 300 lines of Python and PyTorch, along with a library called FairScale to distribute training across multiple GPUs. It's a relatively simple decoder-only transformer, as opposed to the mixture-of-experts approach used in a lot of other big models, like its biggest open-source rival, Mixtral. Most importantly, the model weights are open, which is a significant advantage for developers building AI-powered apps: instead of paying substantial amounts to use the GPT-4 API, they can self-host their own model and pay a cloud provider for GPU rentals. However, self-hosting a big model is not cheap. For instance, using Ollama to download and run it locally, the weights weigh in at around 230 GB, and even with an RTX 4090 it was a struggle to manage.
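If you do want to self-host, here's a minimal sketch of what that looks like in practice, assuming Ollama is installed and running on its default port and you've already pulled a llama3.1 tag (the default 8B tag fits on a single consumer GPU; the 405B tag is the 230 GB download mentioned above):

```python
# Minimal sketch: query a locally hosted Llama 3.1 through Ollama's REST API.
# Assumes Ollama is running at its default address and `ollama pull llama3.1`
# (the 8B tag) has already been run; swap in "llama3.1:405b" if you have the
# hardware (and the disk space) for it.
import requests

def ask_llama(prompt: str, model: str = "llama3.1") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    print(ask_llama("Explain the difference between open weights and open source."))
```

The same endpoint works for any model tag Ollama can pull, so switching between the 8B and 70B variants is just a string change.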
The good news is that you can try it for free on platforms like Meta AI, Groq, or Nvidia's AI playground. Initial feedback from users online suggests that Big Llama is somewhat disappointing, while the smaller Llamas are quite impressive. The real power of Llama lies in its ability to be fine-tuned with custom data (more on that below), promising some amazing uncensored fine-tuned models like Dolphin in the near future. A favorite test for new LLMs is to ask them to build a Svelte 5 web application with Runes, a new yet-to-be-released feature. So far, the only model to do this correctly in a single shot is Claude 3.5 Sonnet. Unfortunately, Llama 405B failed miserably and seemed unaware of this feature. Overall, it is decent at coding but still clearly behind Claude. In terms of creative writing and poetry, it performs well but is not the best.
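For the curious, here's a hedged sketch of what fine-tuning the smallest Llama with custom data can look like, using Hugging Face's transformers and peft libraries to attach LoRA adapters; the repo id and hyperparameters are illustrative assumptions, not an official recipe:

```python
# Sketch: attach LoRA adapters to Llama 3.1 8B for parameter-efficient
# fine-tuning on custom data. Assumes the transformers and peft libraries
# are installed and that you've accepted Meta's license on Hugging Face;
# the model id and hyperparameters below are illustrative, not canonical.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Train small low-rank adapters on the attention projections instead of
# updating all 8 billion base weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model
```

LoRA is what makes this practical outside a data center: you train a few million adapter weights instead of all 8 billion, then load or merge them at inference time.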
Reflecting on the current state of AI, it is astonishing that multiple companies have trained massive models with massive computers, yet they all seem to plateau at the same level of capability. OpenAI made a significant leap from GPT-3 to GPT-4, but since then, there have only been small incremental gains. Last year, Sam Altman practically begged the government to regulate AI to protect humanity, but a year later, we still haven't seen the apocalyptic Skynet human extinction event that was predicted. AI hasn't even replaced programmers yet. It's reminiscent of the time when airplanes transitioned from propellers to jet engines, but the advancement to light-speed engines never happened. Artificial superintelligence is still nowhere in sight, except in the imagination of the Silicon Valley elite.
Interestingly, Meta appears to be the only big tech company keeping it real in the AI space. While there might be an ulterior motive, Llama represents one small step for man, one giant leap for Zuckerberg's redemption arc. This has been The Code Report. Thanks for watching, and I will see you in the next one.