Aaron Levie, the co-founder and CEO of Box, a public company specializing in cloud file storage, tweeted his thoughts on how AI agents will operate in our world. There are a ton of different perspectives on how AI Agents will impact our work. I mean, come on, even IBM is making videos about them!
It's been almost two years since ChatGPT set off a Cambrian explosion in AI, specifically in Large Language Models (LLMs). Millions of people use LLMs daily, and rumours say that OpenAI makes $3.4 billion in annual revenue! That is insane growth for any company whose main product launched less than two years ago. But AI has so much more room to grow. We are still in the early days. The next natural step for AI is taking it out of ChatGPT and embedding it into our workflows, and we can do this using AI Agents.
https://x.com/levie/status/1776673128573284523
Before diving in, let’s quickly go through what an AI Agent is.
<aside> <img src="/icons/groups_green.svg" alt="/icons/groups_green.svg" width="40px" /> Agents are AI software that performs tasks on behalf of a user. They can automate processes, make decisions, and intelligently interact with their environment.
</aside>
Agents leverage multiple technologies, including LLMs, to provide more fluid and dynamic ways of interacting with software. With agents, you can work in natural language and have the agent perform a series of tasks. LLMs help ensure the flexibility of inputs, allowing for easier-to-use tools and faster job execution.
That sounds a little wordy. Let’s have Notion AI rewrite this for us.
<aside> <img src="/icons/book_green.svg" alt="/icons/book_green.svg" width="40px" /> Prompt: Rewrite to make it understandable by a five-year-old
</aside>
Simply put, agents are like smart helpers who use many different technologies, including large language models (LLMs), to make talking to computers and software easier. With these helpers, you can just talk to them like you talk to a friend, and they will do many tasks for you. LLMs help these helpers understand many different ways of talking, so using them is simple and quick.
Thanks, Notion AI 😊
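To make that definition a bit more concrete, here is a minimal sketch of the loop most agents run: take a request in natural language, let a model decide the next step, call a tool, and repeat until the job is done. Everything in it is an assumption made up for illustration: `fake_llm` is a hard-coded stand-in for a real LLM call, and `search_files` and `send_summary` are hypothetical tools, not part of any real product or library.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools the agent can use. A real agent would wrap APIs,
# file systems, or other software here.
def search_files(query: str) -> str:
    return f"found 2 files matching '{query}'"


def send_summary(text: str) -> str:
    return f"summary sent: {text}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search_files": search_files,
    "send_summary": send_summary,
}


def fake_llm(request: str, history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call (assumption): returns the next tool and its input.

    A real model would read the request and the results so far and decide
    what to do next. Here the decisions are hard-coded.
    """
    if not history:
        return "search_files", request
    if len(history) == 1:
        return "send_summary", history[-1]
    return "done", ""


def run_agent(request: str, max_steps: int = 5) -> List[str]:
    """Ask the model for the next step, run the tool, and repeat until done."""
    history: List[str] = []
    for _ in range(max_steps):  # cap the steps so the loop always ends
        tool, tool_input = fake_llm(request, history)
        if tool == "done":
            break
        history.append(TOOLS[tool](tool_input))
    return history


if __name__ == "__main__":
    for step in run_agent("quarterly report"):
        print(step)
```

The "decide, act, look at the result, decide again" loop is the part that makes this an agent rather than a single chat reply. Swapping `fake_llm` for a real model is where the flexibility of natural-language input comes from.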
Since the idea and technology behind agents are still very new, a ton of money is being spent on research into improving them and making the technology commercially viable.
Here is a leaderboard from SWE-bench that shows the percentage of tasks an agent can complete successfully. The agents are run through a dataset that tests their ability to resolve real GitHub issues. This leaderboard changes almost weekly, with new tools, methods, or startups taking the top spot.