The Future of Using Computers is Simply Talking to Them

Here’s the tea 🍵

  • AI will eat software 🍽️

  • How to best find AI apps 🧐

  • AI will control your computer 💻

I’m not sure if you’ve been paying attention, but AI models have changed. Models like GPT-4 and Claude are old school.

Yes, Claude has gotten old… It’s been my go-to model for a year, but there are simply better models available now. I still use it regularly, but for challenging problems I use R1 or o3.

Reasoning models like R1, o1 and o3 are just better. Their ability to think through problems deeply completely changes what AI systems can do and what problems they can solve.

For example, I’m using reasoning models to build an estimation engine for the construction industry. The AI can read drawings and diagrams and understand all the work that needs to be done.

This was not possible even a year ago.

An entirely new use case in one of the largest industries has opened up. This is happening on almost a weekly basis, across every industry.

Let me show you how simple it really is.

NVIDIA’s simple discovery

Take a look at this image. Simple, right?

It’s just a loop. It uses R1 to get an output and a verifier checks if the output is good enough. If not, it refines the prompt and tries again.

An incredibly simple process.
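To make the loop concrete, here’s a minimal Python sketch of a generate → verify → refine cycle. It is not NVIDIA’s code. `generate_verify_refine`, `toy_generate` and `toy_verify` are hypothetical names. In NVIDIA’s setup the generator was R1; here a toy “model” and toy verifier stand in so the control flow runs without any API. The refinement strategy (appending the failed attempt and the verifier’s feedback to the prompt) is an assumption for illustration.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: a "model" maps a prompt to an output, and a
# "verifier" returns (accepted?, feedback to use on the next attempt).
GenerateFn = Callable[[str], str]
VerifyFn = Callable[[str], Tuple[bool, str]]


def generate_verify_refine(
    generate: GenerateFn,
    verify: VerifyFn,
    prompt: str,
    max_iters: int = 5,
) -> Tuple[Optional[str], int]:
    """Loop generate -> verify -> refine until the verifier accepts.

    Returns (accepted output or None, number of attempts made).
    """
    current_prompt = prompt
    for attempt in range(1, max_iters + 1):
        output = generate(current_prompt)
        accepted, feedback = verify(output)
        if accepted:
            return output, attempt
        # Refine: append the failed attempt plus the verifier's feedback
        # so the next generation has something concrete to fix.
        current_prompt += (
            f"\n\nPrevious attempt:\n{output}\n"
            f"Verifier feedback: {feedback}\nPlease fix this and try again."
        )
    return None, max_iters


# Toy stand-ins so the loop runs without any API calls.
def toy_generate(prompt: str) -> str:
    # Pretend each round of feedback makes the "model" one step better.
    return str(prompt.count("Verifier feedback:"))


def toy_verify(output: str) -> Tuple[bool, str]:
    # Accept once the answer reaches 2; otherwise say what is wrong.
    if int(output) >= 2:
        return True, ""
    return False, f"{output} is too small"


result, attempts = generate_verify_refine(toy_generate, toy_verify, "Write a kernel")
print(result, attempts)  # 2 3
```

In a real setup the verifier is the important part: for kernels it might compile the code and compare outputs and timings against a reference, and its feedback is what makes each retry better than a blind resample.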

NVIDIA engineers used this process to test how well R1 could write GPU kernel code.

This is low-level code that most engineers will never write. You have to be a solid programmer to write it.

The results?

“The results turned out to be better than the optimised kernels developed by skilled engineers in some cases”.

AI is unbelievably good at coding, and it’s only going to get better.

Any kind of work that is done solely on a computer and is verifiable will be automated by AI.

This is something people are not grasping. How many jobs will AI automate away?

In my opinion, long term, the number is easily in the tens of millions.

Funnily enough, coding is the most common thing people use AI for, and it’s likely to be the first thing automated away.

According to a new report from Anthropic, over a third of Claude’s usage is for coding. That doesn’t even include their API usage, which powers massive coding apps like Cursor, Lovable and Replit. I wouldn’t be surprised if at least half of their total usage is coding.

One of the most interesting findings from Anthropic’s report was their observations on augmentation vs automation.

This aligns with much of the work I’ve done as well. AI isn’t necessarily automating entire workflows away (although that is very possible); rather, it’s being used in tandem with other tools and processes to make work easier.

This is what I tell businesses, and it’s how everybody should be using AI. If you’re not consulting AI for the work you do, either your work is manual labor or you’re not using AI effectively.

Personally, I can’t stop thinking about the implications of AI for software, simply because it’s impossible to know what the world will be like in the coming decades.

Are we really going to build the cyberpunk worlds we’ve imagined?

What does this mean for businesses?

For one, it means you don’t need such large teams.

All of these businesses scaled to $10M+ with tiny teams.

There are many more examples of small teams doing crazy numbers. A perfect example is Cal AI, which is doing $1M/month with a team of just 12.

The appetite for AI apps is simply ridiculous. Even now, over two years after the release of ChatGPT, we are only scratching the surface of how AI will reshape consumer software.

But what I really want to explore is… how will we interact with software?

The future of software

There are so many AI models, so many different use cases, so many different applications.

Talking about AI models now sounds like talking about where to watch a TV show.

In no particular order, we have:

  • Gemini Pro, Gemini Flash, Gemini Flash Thinking, Gemma

  • GPT-4o, 4o-mini, o1, o1-pro, o1-mini, o1-preview, o3-mini, o3-mini-high

  • Claude 3.5 Sonnet, Claude 3 Opus, Claude 3.5 Haiku

  • Llama 7B, 13B, 30B, 70B, 405B + all the distillations and fine-tunes

  • DeepSeek R1 & V3

  • Qwen Plus, Qwen Max, Qwen Turbo, Qwen QvQ, Qwen2.5 Coder + more

  • + many, many more

These are all real models.

There are way, way more, and the names don’t get any better. The point is, how is anyone supposed to stay on top of it all?

Realistically, you don’t. So how does AI change the way we will use software? How does AI affect the person who isn’t going to build an app themselves or connect to an API?

Well, Hugging Face are giving us a glimpse.

Need anything done with AI?

Just search for your use case and find a Space running the code for it.

With over 400k Spaces, this is the absolute best place to find AI apps.

Right now, this is a manual process: you have to search for and find the exact app for your use case.

Why don’t we automate it?

OpenAI’s Operator

OpenAI have released their first iteration of an AI that can browse the internet on its own. It uses screenshots to decide how to control the mouse and keyboard, and takes actions in its own web browser.
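Under the hood, screenshot-driven agents like this run an observe → decide → act loop. The Python sketch below is a minimal illustration of that loop, not Operator’s implementation. `run_agent`, `fake_screenshot`, `fake_policy` and `fake_execute` are hypothetical. The “screenshot” is a small dict standing in for real pixels, and a hard-coded policy stands in for the model that would normally pick each action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    x: int = 0       # screen coordinates for clicks
    y: int = 0
    text: str = ""   # text to type


def run_agent(
    task: str,
    take_screenshot: Callable[[], Dict],
    choose_action: Callable[[str, Dict, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, ask the policy for an action, act; repeat until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = take_screenshot()                     # observe
        action = choose_action(task, screen, history)  # decide (the model's job)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                                # act via mouse/keyboard
    return history


# Toy environment (hypothetical): a single search box on the screen.
screen_state = {"focused": False, "typed": ""}


def fake_screenshot() -> Dict:
    return dict(screen_state)  # a copy, like a snapshot of the screen


def fake_policy(task: str, screen: Dict, history: List[Action]) -> Action:
    # Stand-in for the model: click the box, type the task, then stop.
    if not screen["focused"]:
        return Action("click", x=100, y=40)
    if not screen["typed"]:
        return Action("type", text=task)
    return Action("done")


def fake_execute(action: Action) -> None:
    # In this toy, any click focuses the search box.
    if action.kind == "click":
        screen_state["focused"] = True
    elif action.kind == "type" and screen_state["focused"]:
        screen_state["typed"] += action.text


history = run_agent("cheap flights to Tokyo", fake_screenshot, fake_policy, fake_execute)
print([a.kind for a in history])  # ['click', 'type', 'done']
```

Keeping the policy as a pluggable callable is the point of the design: swap `fake_policy` for a vision model and `fake_execute` for real input events, and the loop itself doesn’t change.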

Here it is using Replit’s AI agent. An agent running an agent to build an app. The future of software.

Eventually we’ll have AI models that can use computers flawlessly. They will be able to use Excel, they’ll be able to manage your calendar, they’ll be able to build apps for you - all you’ll have to do is ask.

The future of using computers is simply talking to them.

OpenAI isn’t the only company working on this.

Qwen’s latest AI model, Qwen2.5-VL 72B, is a very, very good vision model.

It can classify objects and return structured data, but most importantly, it is natively agentic.

What does that mean?

It can be used to operate your computer.

Finding flights for you

Microsoft also just released OmniParser V2, an open-source tool that turns screenshots into structured UI elements so AI models can understand your screen and control your computer.

It absolutely destroys Claude Computer Use and is the best open-source option for building an AI-powered computer-use agent.

I’m mentioning this because OpenAI’s Operator is only available on the $200/month Pro plan. Most people aren’t paying that kind of money for tech that doesn’t work half the time.

I think the underrated competitor here is actually Google. Considering Gemini Flash is a near-perfect OCR machine, I imagine it will be a phenomenal computer-use model.

Oh, and yes, if you have any kind of data extraction workflow or pipeline, feel free to plug Gemini Flash into it; it works very well.
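For a sense of what dropping a model into an extraction pipeline can look like, here’s a minimal Python sketch of one step: ask for JSON, parse it, and check the required fields before anything goes downstream. The model call is a hypothetical `call_model` callable (you’d wrap your Gemini Flash client in it). This is not the Gemini SDK, and `extract_fields`, `fake_model` and the field names are made-up examples.

```python
import json
from typing import Callable, Dict, List

# Hypothetical schema for a construction document; adjust to your pipeline.
REQUIRED_FIELDS: List[str] = ["project_name", "total_area_m2", "line_items"]


def extract_fields(document_text: str, call_model: Callable[[str], str]) -> Dict:
    """Ask a model for JSON, then validate it before it enters the pipeline."""
    prompt = (
        "Extract these fields from the document and reply with JSON only: "
        + ", ".join(REQUIRED_FIELDS)
        + "\n\nDocument:\n"
        + document_text
    )
    raw = call_model(prompt)
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"model output is missing fields: {missing}")
    return data


# Stand-in for a real model call that returns a canned JSON reply.
def fake_model(prompt: str) -> str:
    return json.dumps({
        "project_name": "Warehouse A",
        "total_area_m2": 1200,
        "line_items": ["slab", "framing", "roofing"],
    })


record = extract_fields("Project: Warehouse A. Floor area: 1,200 m2.", fake_model)
print(record["total_area_m2"])  # 1200
```

The validation step matters more than the model choice: whichever model you plug in, rejecting malformed or incomplete output at this boundary keeps bad records out of everything downstream.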

You know what’s even better than Gemini Flash?

Believe it or not, DeepSeek R1.

Uploading docs to R1 and having it read them has been genuinely amazing. The model can read extremely complex construction documents perfectly. No other model, not even OpenAI’s o3, has worked as well.

There are, however, a few problems.

  1. Privacy. As with any AI model, we can never really know what the provider does with the data we send.

  2. We have no idea how DeepSeek has implemented its document processing, so there’s no way for us to try to replicate it. I imagine it uses some kind of vision model, but we have no details on this.

  3. They don’t offer their document processing via API, so you can only use it on their website.

If you aren’t restricted by privacy concerns, I highly recommend trying it out. I’ve given it documents with tiny writing and extremely complex drawings and images, and it’s worked flawlessly. Superb model.

Yes, this is a relatively short newsletter for me. With the last few being 2,500+ words, I thought I’d shorten it this week.

A lot is happening next week, so keep an eye out for the next one.

Grok 3 is coming. Considering it was trained on 100k+ GPUs, it’s going to be very interesting to see whether it’s any good. Anything below o1 would be a failure, imo.

Most likely, the next iteration of Claude is also coming very soon, possibly next week. Over the next few weeks I’ll be sharing a lot more tools and repos as well.

See you next week 🙂.

Please consider supporting this newsletter or going premium. It helps me write more :).

As always, thanks for reading ❤️

Written by a human named Nofil
