Stop Asking, Start Showing: Why GUIs Still Win in the Age of AI
The AI revolution is here, and consumer founders have noticed.
It has brought a flood of chat- and voice-based consumer apps. But there's a problem: chat may be the wrong interface for most of them.
Since the breakout success of ChatGPT, we have seen a wave of AI-native consumer applications and experiences. Some, like Lovable in vibe coding and Suno in music generation, have seen significant traction, while others have faltered. The one common theme among this generation of products is that they assume that chat (or voice) is now a “mainstream interface”. We’ve all heard breathless takes about how natural language is the ultimate goal of human-computer interaction. If it works for ChatGPT, it should also work for your product, right?
Wrong. With a few exceptions, chat (or voice) is a terrible interface for interacting with technology. I’ll get to the exceptions in a minute, but first let me explain why. Imagine the following scenario:
You walk into a restaurant and sit down at a table. In a few minutes, a waiter walks up to you and you have the following conversation:
Waiter: “What would you like?”
You: … (blanking)
You: “Can I have the menu?”
Whether you are a leading technologist or a construction worker, your response is unlikely to be any different. Why? The number of possible options is staggering: do they have pepperoni pizza? What about sushi? Maybe Pad Thai? Processing a large number of possible choices taxes the human brain; the term for this state is high cognitive load. The way to reduce cognitive load is to limit the number of choices. Luckily, humans solved this problem long ago with the invention of the menu – a simple list of dishes you could have. Anything not on it is off the table.
The problem of high cognitive load reared its head again at the birth of personal computing in the 1970s. Back then, you could only interact with a computer by typing in a specific command. Not very different from a (much more restrictive) chat interface. You had to know all possible commands in order to use the device. If you didn’t, tough luck. As a result, personal computers were the realm of hobbyists. This changed when the industry brought the equivalent of a “menu”, or the graphical user interface (GUI), to PCs. With a GUI, users could see the possible options and click (or later tap) on what they wanted to do. Much easier.
Going back to our original scenario, the waiter’s question is a proxy for the chat (or voice) interface. And the menu plays the role of the GUI. Now it should hopefully be obvious why the chat/voice interface is a terrible default choice. That does not mean it can never be used. It obviously works for ChatGPT, for example, and many other products. The key is to figure out when chat/voice interfaces are an acceptable choice and when they are not.
The key rule to remember is this:
Chat and voice interfaces work well only when they reduce cognitive load, i.e., when they replace an input method with an even higher cognitive load. In every other context, default to a GUI.
While products like OpenAI’s ChatGPT and Anthropic’s Claude are “generalized” in theory, their dominant use cases adhere to this rule. Anthropic’s own data shows that writing and coding account for roughly 50% of LLM usage. Therapy and research are also rapidly growing use cases. The reasons are obvious:
- Writing a 1000-word blog post has a much higher cognitive load than a few sentences to prompt an LLM to do it. Similarly, describing the image you want to an artist has much higher cognitive load than a few sentences to create one with an LLM.
- Writing code has a much higher cognitive load than asking an LLM.
- Finding and talking to a therapist has a much higher cognitive load than a simple chat with an LLM.
- Research involving multiple, iterative text inputs into Google search and/or a back-and-forth conversation with an analyst has a much higher cognitive load than a short LLM prompt. It’s no surprise that the most popular search terms on Google run just one to three words – not quite research-level input.
On the flip side, poor uses of chat/voice interfaces include things like:
- Ordering food
- Booking a restaurant
- Booking flights
- Shopping
- And so on…
Why? A GUI can show users their options, and a visual filter makes that even easier. In these cases, a chat interface simply increases the cognitive load for the end user – users have to imagine what they want, ask the LLM for it, then ask it to filter the results, and so on. Even if your app is AI-powered on the back-end, you need to abstract the complexity away with a GUI on the front-end.
Remember, as technology evolves, the limitations that matter are not technological, they are human. So don’t ask users what they want. Give them a menu.
If you're building in consumer AI and have questions about the ideal user interface, feel free to drop me a note at sameer.singh@speedinvest.com or sign up for my office hours here.