Microsoft & NVIDIA Are Eating The World

Welcome to edition #34 of the “No Longer a Nincompoop with Nofil” newsletter.

Here’s the tea ☕️

  • Microsoft has their hands in every cookie jar 🍪 

  • NVIDIA powers ahead 🏃 

  • Power is the next frontier 🪫 

  • Don’t anticipate Apple moving 🍎 

  • The biggest security vulnerability in LLMs 🤖 

It is a strange time in AI. Weird things are happening. I’m not even exaggerating, sometimes I read the news and get confused as to what I’m reading. Let’s get started.

ps. This is the last free newsletter like this! Moving forward, you can read these in my premium newsletter. 

After much deliberation and consideration, the best way to keep this newsletter alive is to monetise it. Keeping up to date with AI news is becoming a full-time job which is incredibly exciting. However, writing these is time consuming. Although I do love writing these, it’s just not feasible to write these for free any longer.

Thank you for understanding and supporting this newsletter up to this point. I could not have gotten here without you guys ❤️

Inflection

Inflection is a startup, or rather, was a startup? Just 9 months ago they raised a staggering $1.3 Billion led by Microsoft, Bill Gates and most importantly, Nvidia. I’ll come back to Nvidia in a bit. So, this is a company founded in 2022 and in June 2023 raised over a billion dollars at a valuation of $4 Billion.

Before we look at what they produced, lets look at what they did with all that money shall we. What’s the most important resource in the world today? Compute. Lots and lots of compute. So Inflection did what anyone with a billion dollars would do and bought a lot of GPUs. Like, a lot. How many? They worked with Nvidia to build one of the largest compute clusters in the world comprising of 22,000 H100 GPUs.

The H100 is one of Nvidia’s best GPUs. One H100 could cost up to $40k but for this scenario I’ll cost it at ~$30,000; you’ll see why later. So, 22,000 H100s at $30k a piece = $660,000,000. Six Hundred Sixty Million. I had to write that out just because.

Okay so Inflection built crazy hardware to match their insanely powerful software, right? Right?

Actually, they very recently announced their new Inflection-2.5 model which, according to their blog post, competes with GPT4. I’m not sure I’m willing to trust their benchmarks considering I don’t know a single person who even uses Pi, and all I’ve heard about it is that although its got great EQ, it’s quite useless for actually doing things. Also, once again, the GPT4 being used in this comparison is from May last year, a very different model to the one of today.

What I find most staggering is why I couldn’t find Inflection on the LLM leader board. This leader board is the best way to identify how useful a model actually is because it relies on human preference data; real world use cases. I’ve now been reminded of another insanely strange thing about Inflection - they don’t have an API. All that money, compute and resources and they have not even released an API for their LLM. How are people supposed to build with it?

Here’s where it gets strange.

Someone asked a very specific question to Pi and it produced a literally identical output to a response from Claude 3 Sonnet. This should not happen. The person has even shared a video of the chat and a link to the conversation. Here’s the thing - you can create threads in Inflection and Pi will remember things across different threads.

So, if the person copied the exact response from Claude in to Pi, then Pi would remember that and is inclined to use that response in its own responses across any chat, even a new one. How do I know this? Inflection themselves came out and said we checked this guys account and chats and found that he copied the response from Claude. They checked his chat history. I don’t expect AI labs to not check peoples chats, but I at least expect them to not disclose that information publicly.

Now you might be wondering, why am I talking about this random startup called Inflection.

Only a few companies will remain

As of a week ago, 2 of the 3 co-founders of Inflection have left to go work in Microsoft’s new AI division. Actually, Microsoft has basically bought Inflection by hiring most of their staff; they “acquihired” them. You can basically count Inflection out of the race, not that it was ever in it anyway. It is staggering to see someone start a company, raise over $1.5 Billion and build one of the largest compute clusters in the world, yet somehow have absolutely nothing to show for it besides a book tour. Seriously, the former CEO of Inflection spent so much time promoting his book and talking about how dangerous AI is and how open source is going to destroy the world. For a year and half he went around advocating that AI is too dangerous, so let the responsible people [me] take care of it. He’ll fit right in at Microsoft.

The deal itself is even more bizarre. Microsoft is paying $650M to Inflection in the form of a licensing deal that they can use to pay back investors. Remember the $650M from before? Yeah, Microsoft is basically paying for those 22k H100 GPUs. By the way, Inflection is paying back investors with this money. Who were some of their investors again? Microsoft and Nvidia.

So, if I have this right, Microsoft and Nvidia gave money to Inflection. Inflection used this money to pay Nvidia for GPUs. Now Microsoft is paying Inflection for the GPUs and Inflection is using that money to pay off the initial payment from Microsoft and Nvidia. You can read more about it here [Link] [Link] [Link] [Link]

Compute is the problem

Every company in the world building AI models is clamouring for compute. The crazy part? There’s only one company that can give it to them. One company that stands at the top of this AI renaissance and very recently, increased their lead in the world of chip architecture.

I’m not going to cover in detail what NVIDIA unveiled at their recent GTC (GPU Technology Conference), but here’s a few tidbits to show you what I mean when I say dominance. This is unfettered, unrivalled, never before seen type of dominance. This is the reason their share price has gone from $273 to $900 in a single year. Why they’re market cap is bigger than Canada’s entire economy, why they passed Saudi Aramco as the third biggest company in the world, and how they added $277 Billion in value in a single day, the greatest single-day gain in market history.

NVIDIA’s new Blackwell B200 chip has more than 1 Exaflop of compute in a single rack. 

For context, I wrote in January that Tesla’s plan was to go from 1.8 Exaflops in 2021 to 100 Exaflops by the end of this year. In 2021 they used 1.8 Exaflops to train their deep neural nets for their self driving system. So for 2021, 100 Exaflops was an insane milestone to go for.

Now its just 100 racks in a data centre.

We have ~1000X’ed compute in the last 8 years. There wasn’t even that much attention in increasing compute capacity in these years (relatively). What do you think is going to happen in the next 8 years? We are going to find the limits of human engineering. The amount of money being poured into these things is insane. We’re rewriting the law curves.

But what about current AI models? Will it become easier to build them? With NVIDIA’s new chips, you can train a new GPT4 model, basically one of the best models in the world right now, in a mere 90 days with 2000 Blackwells. Jensen also confirmed GPT4 is a 1.8T parameter size model.

Data centres have dozens and dozens of racks.

With the B200s, we are talking about hundreds of Exaflops of compute in a single building. This is a monumental moment in history. The largest supercomputers in the world are going to look trivial in the face of a few racks of the Blackwell chip.

Read more:

  • A thread breaking down (with images) just how much compute is in new NVIDIA chips [Link]

  • A technical breakdown of the architecture of the B200 [Link]

  • NVIDIA has already spoken to all the big players Google, Meta, Microsoft, OpenAI about building custom chips just for them [Link]

  • NVIDIA is also doing a tonne of work in robotics and have formed a new lab called GEAR (Generalist Embodied Agent Research). They’re building what they call a Foundation Agent, an AI that can learn to act in any type of environment [Link]. Also, this video of a robot dog at GTC really creeps me out for some reason [Link]

  • Jensen believes sovereign AI is the most important thing moving forward [Link]. I’m inclined to agree with him.

  • Cerebras is a company that builds AI processors. Their wafer-scale engine (WSE-3), according to them is the fastest processor on Earth, with 52x the amount of power than a H100 [Link]. We’ll have to see if its legit or not.

The funniest thing about NVIDIA’s dominance is seeing how everyone is reacting to it. Google, Qualcomm and Intel (+ possibly others) are planning to work together to build a different software stack to reduce the reliance on NVIDIA’s Cuda software for AI chips.

Microsoft and OpenAI are planning a $100+ Billion data centre project. If I was to bet on any company besides NVIDIA, its Microsoft. What Satya has done since ChatGPT came out is unreal. But just look at what he said about OpenAI.

If OpenAl disappeared tomorrow, we have all the IP rights and all the capability. We have the people, we have the compute, we have the data, we have everything. We are below them, above them, around them.

Satya Nadella

Microsoft is a behemoth moving like a startup.

Most people are completely unaware as to what is going on, both in terms of LLM advancements and also compute advancements. But there’s something else entirely we’re forgetting about.

The next big bottleneck is power.

During Dell’s earnings calls, they accidentally revealed that NVIDIA’s new chips have a 40% increase in power necessary than the H100. It is clear and apparent to those that know, that the only solution to satisfying the power necessary to run the insane amount of compute is nuclear reactors. This is why Microsoft has been working on building nuclear reactors. There is probably going to be more nuclear reactors built in the next few decade than ever before.

Microsoft is even building their own LLM specifically for the regulatory process surrounding nuclear plants. Why? The only SMR (Small Modular Reactor) developer (Nuscale) who have had their designs approved by the Nuclear Regulatory Commission, paid over $500 Million, had a 12,000 page application and around 2 Million pages of support material.

This is the stuff that drives governments. The geopolitical landscape of the world is being shifted and manoeuvred solely based on the advancements happening in the AI space. The world as we know it is going to be monumentally different in 50 years time, and the groundwork for that change is being laid out right now, in front of our very eyes.

There is a very real possibility that what is happening now, who ever gets ahead now, they will be the only ones to remain when the dust settles. This is a race for superhuman intelligence, something that, in its very nature, is genuinely beyond our comprehension. Only if we can actually get there of course.

Don’t forget though, this is a bubble. You and I are in a bubble. To most people, it’s business as usual. If you’re reading this, you’re ahead of most of the world.

This is the frontier.

What is Apple even doing?

Apple’s WWDC event in June already has rumours of major AI overhauls and the long awaited replacement of Siri. But something is not right. It’s not strange that Apple hasn’t announced anything, they usually never move first.

But apparently they’re in talks with the following to power AI in iPhones.

  • OpenAI

  • Anthropic

  • Google

  • Alibaba & Baidu (?)

The real question is, why not Apple? Why isn’t Apple powering AI in iPhones? Chief correspondent for Bloomberg is also saying not to expect much from Apple in June. Looks like we’re going to have to wait for Siri upgrades.

Sleeper Agents

Anthropic has been doing a lot of LLM safety research alongside cooking up Claude 3. Their recent paper on sleeper agents is very interesting.

Simply put, they found that once a model has been trained to be deceptive, the behaviour persists even after significant safety training; and that larger models are even better at hiding their deceptive behaviour. So initially, I thought this is silly. They’re training a model to be deceptive, and then getting scared that it was exhibiting deceptive behaviours. Sounds pretty stupid to me.

But actually, no. This is a pretty big deal. The point isn’t that models can do bad things. It’s that if it happens by accident or on purpose, we don’t how to stop it from doing said bad thing.

Here’s what Anthropic did. They trained a model to write secure code when the year was 2023. When the year changes to 2024, the model starts to write exploitable code.

Now, the main problem here is this:

A malicious entity can slip in deceptive behaviours in a model. At the moment it’s not that easy to detect this either.

Models are trained on massive amount of data, scraping websites like Wikipedia. Someone could upload some sort of malicious text on Wikipedia (doesn’t even have to be text; could be some invisible characters), and in future training runs when it’s picked up by the model, this malicious text becomes a poison for the LLM. In this case, the only person that would even know about the vulnerability is the bad actor themselves. In Anthropic’s paper, they found this “poison” persisted after Supervised FineTuning (SFT), RLHF (Reinforced Learning from Human Feedback) and even adversarial training.

The implications of this are much bigger than we might think.

For example, a model coming out of the US is sold to another country. Who’s to say the model doesn’t have a secret backdoor only the owners can access? We’ve already seen people create prompts that automatically give certain outputs. We’ve seen this work in ChatGPT already. Now obviously these are harmless examples that have been uploaded on the internet. What about the things people will be doing in the future? This is a major security challenge for LLMs. Karpathy himself thinks so.

There was another incredible paper on model stealing that I can’t find right now. Can’t wait to write about that one for you guys next week.

Amazon playing games

Amazon has invested another $2.75 Billion in Anthropic at a $18.4B valuation. This sounds really good for Anthropic right? Thats a massive investment from a big company, and they also have a commercial deal to use AWS services.

But get this. Amazon’s internal AGI team is attempting to beat Anthropic’s Claude with their own model codenamed Olympus by mid this year.

It is a fascinating landscape right now and Anthropic is in a precarious position. Why? Anthropic is the only company, literally the only AI lab to have no connection to Microsoft.

Anthropic x Google

OpenAI x Microsoft

Vision Models are having a moment

Vision models might be having the year text models had last year. Reasoning capabilities of some of these new models are very good. Its practically child’s play to create bot powered by a VLLM that can solve CAPTCHA with the new Qwen-VL-Plus. It outperforms even GPT-4 Vision on the benchmarks.

Others you should check out are Llava-1.6 and a new tiny one called Moondream1. Llava and Qwen are open source. I’m already testing them in products I’m building and the results are very promising (Specifically using it for deriving meaning from physics questions).

As always, Thanks for Reading ❤️

Written by a human named Nofil

Join the conversation

or to participate.