Meta's Big Leap Forward
Welcome to edition #35 of the “No Longer a Nincompoop with Nofil” newsletter.
Here’s the tea ☕
Meta releases their new AI models 🤖
Open thy eyes
You can’t see it, but it’s happening. The world is changing. Technology is advancing faster than we can adapt, faster than we can even comprehend.
The cost of intelligence is reaching near zero. If you had access to an incredibly intelligent entity, what would you do with it? What would you ask? What will you build?
Look at this video. This is an AI model almost as good as GPT4, producing output at 800 tokens a second (for simplicity’s sake, let’s say a token is roughly a word). The implications of this can’t be conveyed even if I spent this entire newsletter talking about them.
The speed at which you can produce information, iterate on it, manipulate it, is truly extraordinary. Mind you, this was made available only 12 hours after the model was released. This is an unprecedented time of technological advancement.
What was released
Llama 3 8B
Meta has released and open sourced two versions of their new Llama 3 model. There’s an 8B and 70B model. Both of these models are the best in their category.
I know what you’re thinking: both of these are rather small. Are they going to be useful for you? Well, the 8B model can answer questions that only GPT4 and Claude Opus can answer. That’s how good it is. Can it code snake too? Yes it can.
A lot more examples here.
How is it so good? Meta trained this (comparatively) nugget-sized model on 15 trillion tokens. 15 TRILLION. Do you know how much text that is…
Like I don’t even understand how they got that many tokens. This is especially wild considering they claim they did not use any user data from FB, Instagram etc. No idea how they did this, but that’s not the crazy part.
So there’s something called the Chinchilla scaling laws. Roughly speaking, they estimate the compute-optimal amount of training data for a model of a given size (this is an oversimplification). For an 8B model, the optimal amount would be to train on only about ~200B tokens, which is obviously vastly less than 15 trillion.
But why does this matter? Meta mentioned that even after training this tiny model on 15 trillion tokens, it was still learning and improving! We can keep improving even the smallest models simply by training on more data. The implications for the larger models are staggering.
The fact that a model as small as 8B can take in 15 Trillion tokens and is still learning, means that the models we’re currently using, the GPT4s and the Claudes, are undertrained by up to 1000X.
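As a rough back-of-the-envelope check, here’s a minimal sketch of that Chinchilla arithmetic, assuming the commonly cited rule of thumb of ~20 training tokens per parameter (the exact ratio depends on assumptions, but it lands in the same ballpark as the ~200B figure above):

```python
# Back-of-the-envelope Chinchilla check for Llama 3 8B.
# Rule of thumb (an assumption here): ~20 training tokens per parameter
# is roughly compute-optimal.
params = 8e9                 # Llama 3 8B
tokens_trained = 15e12       # 15 trillion tokens, per Meta

chinchilla_optimal = 20 * params                      # ~160B tokens
overtrain_ratio = tokens_trained / chinchilla_optimal

print(f"Chinchilla-optimal tokens: ~{chinchilla_optimal / 1e9:.0f}B")
print(f"Actually trained on:       {tokens_trained / 1e12:.0f}T")
print(f"Trained ~{overtrain_ratio:.0f}x past the 'optimal' point")
# -> ~160B optimal vs 15T actual, i.e. roughly 90x past Chinchilla-optimal
```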
The models being trained right now by the biggest AI labs are 100X bigger, and will be trained on 1000X the training data. We’re still so, so early.
So why did they stop training it? Why not just keep going? Considering they already went way past the optimal training point, they should’ve just kept going.
Well, they only stopped training because they needed the GPUs for Llama 4...
It took the 8B model 1.5M H100 hours to train. On a 16k H100 cluster, that’s about 4 days of training [Link]. Mind you, Meta is aiming for around 350k H100s by the end of this year (roughly 600k H100-equivalents of compute overall). They’re GPU rich.
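For what it’s worth, the 4-day figure is just GPU-hours divided by cluster size; a quick sanity check (assuming perfect utilisation, which real runs never hit, so treat it as a lower bound):

```python
# Sanity check on the training-time claim.
gpu_hours = 1.5e6        # reported H100-hours for Llama 3 8B
cluster_size = 16_000    # H100s in a single training cluster

wall_clock_hours = gpu_hours / cluster_size
print(f"~{wall_clock_hours:.0f} hours, i.e. about {wall_clock_hours / 24:.1f} days")
# -> ~94 hours, roughly 4 days at 100% utilisation
```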
It’s also been a year since the first Llama model release. The smallest Llama 3 model is better on the benchmarks than the biggest Llama 1 model.
Again, it has been a single year.
If you’d like to learn more about Chinchilla scaling laws and how we determine how many tokens to use when training LLMs [Link]
Llama 3 8B’s in-context learning is really good [Link]
Llama 3 8B in 4-bit, 8-bit and 16-bit [Link] (see the loading sketch just after this list)
Llama 3 8B already running on an iPhone [Link]
Someone has it running on a Raspberry Pi?? [Link]
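If you want to poke at the 8B model yourself, here’s a minimal sketch of loading it in 4-bit with Hugging Face transformers and bitsandbytes. Assumptions: you’ve been granted access to the meta-llama repo on Hugging Face, you have a CUDA GPU with ~6GB+ of VRAM, and transformers, accelerate and bitsandbytes are installed.

```python
# Minimal 4-bit loading sketch for Llama 3 8B Instruct (not an official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Quantise the weights to 4-bit at load time to fit on a modest GPU.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a haiku about open source AI."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```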
Llama 3 70B
Llama 3 70B is now third on the LLM leaderboard. This is an open source model sitting right behind GPT4. A 70 billion parameter model being compared to a (reportedly) 1.8 trillion parameter model… What this means is that we have a bloody long way to go with model capabilities. Also, Meta clearly has some of the best, if not the best, training data in the world. This raises the question -
What does a properly trained trillion parameter model even look like?
Also, notice how Claude Opus has dropped significantly. This is the annoying thing that happens with closed-source models. They are released and we’re amazed at how good they are. Then they suddenly aren’t as good anymore and we have no idea why. Although Anthropic has said they haven’t made any big changes, its performance has noticeably dropped.
This 70B model is good enough to use in production apps, and if you can run it locally, it won’t cost you a cent. If you’re planning on using it, use it with Groq. At the time of writing, Groq will give you ~800 tokens/second and a million output tokens for about 80¢.
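If you want to try that, here’s a minimal sketch calling Llama 3 70B through Groq’s OpenAI-compatible endpoint. Assumptions: the openai Python package is installed, a GROQ_API_KEY is set in your environment, and the "llama3-70b-8192" model ID is still current (check Groq’s docs; pricing and model IDs change).

```python
# Minimal sketch: Llama 3 70B on Groq via the OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
    api_key=os.environ["GROQ_API_KEY"],
)

response = client.chat.completions.create(
    model="llama3-70b-8192",  # assumed model ID at the time of writing
    messages=[
        {"role": "system", "content": "You are a concise financial analyst."},
        {"role": "user", "content": "Summarise the main risks of holding long-duration bonds."},
    ],
    temperature=0.3,
)

print(response.choices[0].message.content)
```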
What can you build with this type of intelligence and speed? This is a financial analyst, calling 4 different tools to provide real time info in <10 seconds.
Just the beginning
Meta has 400B+ models cooking behind the scenes. We don’t know how many models, but we do know that one of their models is already as good as GPT4 and Opus on benchmarks, and it is still in training…
If Meta decides to open source this model, it’ll be the first open source model to beat GPT4 on benchmarks.
Will this happen?
I think there’s a good chance they’ll release it.
Will GPT4 be the best model out there?
No, I think OpenAI might release a new model in the next few months. Actually, there’s a lot of speculation that they might release something on the 22nd, Altman’s birthday.
Why is Meta doing this?
Destroying the technological advantage
Here’s the thing - Meta already has the most important thing in the world. Distribution. Do you know how many daily users Meta has across all their apps?
3,000,000,000. Three billion. Imagine your apps having combined access to over a third of the entire world. Every. Single. Day.
For Meta, open sourcing LLMs destroys any technological advantage their competitors can possibly have. Imagine being OpenAI and your competitor is open sourcing work that’s costing them billions of dollars.
Let me put it another way.
Meta makes $134B a year.
OpenAI makes $1.6B a year.
Right now, OAI is nothing but a small tumour that could turn into a cancer in the future. By open sourcing state of the art models now, they can neutralise the tumour.
If everyone has it, no one does.
This is why OpenAI needs Microsoft and well, now they’ve been engulfed by them. OpenAI doesn’t exist as a sole entity anymore.
Comes back to one thing
I’ve been covering everything in AI since ChatGPT came out and I used to think Microsoft and OpenAI were the biggest threats to Google. I think after this Meta release, the biggest threat to Google, besides Google themselves of course, might actually be Meta.
Meta is pushing AI assistants very hard on Whatsapp, Facebook, Instagram and Messenger. All with real-time access to the internet. Have a question? Just ask the Meta Assistant, no need to go to Google.
Plus, Meta’s assistants can also know basically everything about you (I mean, Meta probably knows everything about you already). If they become the default assistant + search tool on phones, it’s game over.
Once again, in the end, it comes back to one thing - advertising.
It was and will continue to be an advertising game.
Side note: This is why Google pays Apple $20 Billion a year to be the default search engine on iPhones.
Own the industry
Since releasing AI models doesn’t really threaten Meta’s core business, they have another compelling reason to open source their own - standardisation.
If Meta open sources their models and the standards they use become the industry standards, they stand to gain tremendously.
They did the same thing with React. They did the same thing with PyTorch. They did the same thing with the Open Compute Project. Listen to Zuck himself talk about why they open source [Link]
The real reason
Zuck went on Dwarkesh’s podcast and discussed a lot of things. Here’s a breakdown of what he said.
Open sourcing Llama
No, Meta isn’t open sourcing Llama out of the goodness of Zuck’s heart.
Zuck spoke about open sourcing Llama 3 and basically said that they open source the models because the models themselves aren’t the product.
Energy
As I’ve mentioned a number of times before, Zuck also believes that energy is the next big bottleneck and that we won’t be able to meet the necessary energy requirements.
He points out that training a frontier model alone (not even counting inference) will eventually need on the order of 1 gigawatt of dedicated power - roughly a full nuclear plant’s worth.
No data centre with that kind of dedicated supply exists right now.
It will take time for us to build the necessary infrastructure to scale models further. We’ll be restricted by regulatory, not technological pace.
Will they be built?
Yes. We’re entering a new era of nuclear energy production.
Side note: Would suck to be Germany right now.
Amazon recently paid $650 Million for a data centre campus attached to a nuclear plant that can supply it with up to 960MW [Link].
We’re headed toward a future where the energy consumption for AI is going to be more than most countries [Link].
Chips
Meta has also made extensive progress building out their own chips to run models. Zuck mentions that they use their own chips to run inference and only use NVIDIA for training.
Eventually, they want to get rid of NVIDIA entirely, though I don’t think this is happening anytime soon.
This is one of the reasons why even Musk’s xAI is looking to maximise compute efficiency per watt. They understand the energy constraints that are looming. Musk has been saying this for years, long before ChatGPT even came out, and I thought he was loony; I was definitely wrong there.
Other
An interesting thing to note is that Zuck doesn’t believe we can get to AGI soon, or that we can have models 100x better than GPT4. This goes against what a lot of other industry leaders are saying - Elon Musk, Sam Altman and Dario Amodei (co-founder and CEO of Anthropic) are all sounding the alarm bells, claiming we are very close to AGI.
Is it in their interest to make it seem like we’re close to AGI? Well, all 3 are trying to raise insane amounts of money to get there. There might be some connection there, who knows really. Zuck is the only one not raising money and also not claiming we’re on the brink of AGI. The whole situation is a game of BS, who’s bluffing? It’s anyone’s guess at this point.
Regardless, if GPT5 isn’t far and away better than GPT4, then it would be clear that we have hit some sort of limit.
Don’t fret though. Even if LLMs plateau at some point, we are still so very early in our exploration of robotics and the combination of robots and AI. We have a long, long way to go. Where will it lead us? Hopefully to a point where I can buy a damn house.
Here’s the link to the interview on Youtube [Link], and here is the official blog post by Meta [Link].
Zuck also mentions that the next iterations of Llama 3 and beyond will focus on multimodality. People will probably build a multimodal Llama 3 before an official release using things like the Cauldron, a dataset of images paired with Q&A examples [Link].
Meta is also working on consumer neural interfaces that read your neural activity to control devices [Link]. We are not ready for the future of human x AI.
If Meta trains their 400B model on 15 trillion tokens like they did for the 8B model, it will actually exceed the EU AI Act’s compute threshold for general purpose AI and be categorised as carrying systemic risk [Link]. It would still be just below the US reporting threshold.
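The rough arithmetic behind that claim, using the standard ~6 × parameters × tokens estimate for training FLOPs (a sketch, not Meta’s numbers; the thresholds are the EU AI Act’s 10^25 FLOPs presumption of systemic risk and the US executive order’s 10^26 reporting threshold):

```python
# Rough training-compute estimate using the standard ~6 * N * D FLOPs rule.
params = 400e9        # a 400B-parameter model
tokens = 15e12        # 15 trillion training tokens

train_flops = 6 * params * tokens      # ~3.6e25 FLOPs

eu_systemic_risk_threshold = 1e25      # EU AI Act: presumed systemic risk above this
us_reporting_threshold = 1e26          # US executive order: reporting required above this

print(f"Estimated training compute: {train_flops:.1e} FLOPs")
print(f"Over the EU threshold: {train_flops > eu_systemic_risk_threshold}")
print(f"Over the US threshold: {train_flops > us_reporting_threshold}")
# -> ~3.6e25 FLOPs: over the EU line, under the US one
```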
Llama 3 also confirms that fine-tuning definitely adds knowledge to a model, though I’m not sure why anyone argued otherwise. Also, Meta used a fine-tuning dataset containing 10 million human-annotated examples, and none of it was Meta user data. Where on Earth did they get this data from? [Link]
JSON structured output from an invoice document with Llama 3 8B [Link]
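For the curious, here’s a minimal local sketch of that kind of extraction using Ollama’s chat API with JSON output enforced. Assumptions: Ollama is running on localhost with the llama3 model pulled, and the invoice text and field names below are purely illustrative.

```python
# Minimal sketch: JSON extraction from invoice text with a local Llama 3 8B via Ollama.
import json
import requests

# Illustrative placeholder invoice text.
invoice_text = """ACME Corp  Invoice #1042  Date: 2024-04-19
Widgets x 10 @ $12.50 ......... $125.00
Total due: $125.00"""

payload = {
    "model": "llama3",
    "format": "json",   # ask Ollama to constrain the output to valid JSON
    "stream": False,
    "messages": [
        {
            "role": "user",
            "content": "Extract invoice_number, date and total as a JSON object "
                       "from this invoice:\n" + invoice_text,
        }
    ],
}

resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120)
data = json.loads(resp.json()["message"]["content"])
print(data)  # e.g. {"invoice_number": "1042", "date": "2024-04-19", "total": "$125.00"}
```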
With Meta open sourcing such good models, it raises the question - what happens to the other AI labs in the open source space, like Mistral? Meta’s new models beat all of their models, even the closed-source ones. Will Meta doing this hinder new AI labs being formed? Will it make it harder for labs, especially new ones, to raise money? Considering Meta has basically infinite money to spend on training LLMs compared to most companies, it definitely makes competition harder. It almost seems like Meta doing this, although good for us, means they’ll end up being the only ones doing it at all. Just a thought.
I was planning to cover everything else that happened last week but this newsletter is now past 2k words. Sign up to premium ($5/month) to get every single newsletter, every week, covering the AI landscape in detail.
Your support keeps this newsletter going ❤️.
How was this edition?
As always, Thanks for Reading ❤️
Written by a human named Nofil