I just attended the first ever TED AI event here in San Francisco, and one of the speakers talked about the concept of Minimal Viable Quality (MVQ) in AI products. For those unfamiliar with it, the idea is that unlike standard products, where an MVP is the typical bar, for AI-first products you also need to think about the minimal quality that must be met for the product to be deemed “useful enough”.
I didn’t think too much about this, as it’s something I had been practicing since I first joined Google over 6 years ago as a Product Manager on the Google Assistant - Speech team. This was my first experience as a PM building features based on non-deterministic models, and while this was before Generative AI got big, the core concept was the same as what applies today for building in this space.
I like to think of my job as a PM back then as a balance between 2 forces:
How do I take the technology that exists today and turn it into a product feature that will be useful and/or delightful
How do I influence the research to enable new product features to be possible in the near future that keep moving us closer to our North Star
This was a bit different from the standard PM practice of “understand the user need, then build something to solve it” - mostly because we already knew what the user need was: to be able to speak naturally to a device…the hard part was trying to solve it with the available technology. The only way to achieve this was through incremental improvements and the right milestones - which is where the 1 / 2 framing came into play.
Focusing in on #1 is where the concept of MVQ comes in.
To start, there was a speech model capable of a certain level of performance - we tended to measure things in terms of False Accepts and False Rejects:
False Accept: when the system thinks the user did something even though they didn’t.
Every time the Google Assistant responded even though you didn’t say “Hey Google”
False Reject: when the user tried to make something happen and the system ignored them.
Every time you said “Hey Google” but the Assistant didn’t respond
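To make these two metrics concrete, here’s a quick sketch of how they might be tallied from labeled interaction data. This is purely illustrative (in Python), with a made-up event format - it is not how the Assistant actually measured these rates.

```python
# Illustrative only: tally False Accepts and False Rejects from labeled events.
# Each event is a (user_said_hotword, system_triggered) pair - a hypothetical format.

def error_rates(events):
    false_accepts = sum(1 for said, triggered in events if triggered and not said)
    false_rejects = sum(1 for said, triggered in events if said and not triggered)
    negatives = sum(1 for said, _ in events if not said)  # chances to falsely accept
    positives = sum(1 for said, _ in events if said)      # chances to falsely reject
    fa_rate = false_accepts / negatives if negatives else 0.0
    fr_rate = false_rejects / positives if positives else 0.0
    return fa_rate, fr_rate

# Toy example: one False Accept and one False Reject out of four events.
events = [(True, True), (True, False), (False, True), (False, False)]
print(error_rates(events))  # (0.5, 0.5)
```

The useful part is the framing: every non-trigger moment is a chance to falsely accept, and every real attempt is a chance to falsely reject, so the two rates move independently and both need to clear the quality bar.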
From there, you would try to see if there was a feature you could launch - ideally one that solved an immediate user pain point, or that got you closer to the North Star goal of being able to talk to the device more naturally.
Case Study: Simple Stop
One of the most well-received features I ever launched ended up seeming like one of the simplest, but in reality it took a LOT of work, and it’s the perfect example of how to execute on strategy #1.
The initial hope was that a user could say “stop” at literally any time, and the Assistant would listen. After testing this out, we realized that the quality just wasn’t good enough (too many False Accepts and False Rejects), so it was time to figure out how to scope the feature so that it met the Minimum Quality Level (a certain % of FAs and FRs was acceptable) while still being valuable to the user. After digging through user data, 2 things jumped out:
(1) when an alarm or timer was going off, the vast majority of responses from users were “Stop” — which, in hindsight…quite obvious
(2) this command was issued within a relatively short timeframe of an alarm or timer going off — also, in hindsight…quite obvious
Equipped with these two data points, I revised the feature to only listen for “Stop” within a certain window after an alarm fired. With this new scoping in place, the MVQ was met.
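To make that scoping concrete, here’s a rough sketch of the gating idea - only act on “Stop” inside a window after an alarm fires. This is an illustration of the concept, not the Assistant’s actual implementation; the class name and the window length are made up.

```python
import time

# Hypothetical sketch: scope the "Stop" listener to a window after an alarm fires.
STOP_WINDOW_SECONDS = 60  # made-up value, not the real window

class AlarmStopGate:
    def __init__(self):
        self._alarm_fired_at = None

    def on_alarm_fired(self):
        # Record when the alarm (or timer) started sounding.
        self._alarm_fired_at = time.monotonic()

    def should_handle_stop(self, detected_phrase: str) -> bool:
        # Only act on "stop" if an alarm fired recently; ignoring it the rest of
        # the time removes all the False Accepts that sit outside the window.
        if detected_phrase.strip().lower() != "stop":
            return False
        if self._alarm_fired_at is None:
            return False
        return (time.monotonic() - self._alarm_fired_at) <= STOP_WINDOW_SECONDS
```

The point of the scoping isn’t that the model got better - it’s that the feature only runs in the slice of time where the model’s error rates already clear the MVQ bar.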
Applying this to Generative AI
I no longer work on speech models, but the same concept applies to the world of Generative AI and LLMs. These models are amazingly capable, but they aren’t perfect - and they may never be (is hallucination a feature or a bug? A topic for another time), so we need to design products and features that take this into consideration. A few ways I’ve seen this work:
Include a disclaimer that your product might get things wrong - this is a pretty broad strokes way of handling it, but it’s how Bard and ChatGPT do it
Launch things in an “exploration” or “beta” phase
Add output filters or classifiers to your “AI System” so that you can better control the output - think of this as putting in “Guardrails” to keep the behaviour on track (see the sketch after this list).
Focus on an application in which it might be a “feature” to not always get things right - companion style chatbots and image generation are examples of this.
Build a product in which the user is in the loop — co-pilots, whether for coding or other tasks, are an example of this — it helps if the AI produces things that can be seen as a “suggestion”, or provides multiple options to choose from.
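To illustrate the “Guardrails” idea from the list above, here’s a minimal sketch of wrapping a model call with an output classifier. The generate_draft and violates_policy functions are hypothetical placeholders standing in for whatever model and classifier your AI system actually uses.

```python
# Minimal sketch of an output-filter guardrail around a generative model.
# generate_draft and violates_policy are toy stand-ins, not real APIs.

FALLBACK = "Sorry, I can't help with that one."

def generate_draft(prompt: str) -> str:
    # Placeholder for a real model call.
    return f"Model response to: {prompt}"

def violates_policy(text: str) -> bool:
    # Placeholder for a real output classifier; here, a crude keyword check.
    banned = ["medical diagnosis", "legal advice"]
    return any(term in text.lower() for term in banned)

def guarded_generate(prompt: str, max_attempts: int = 2) -> str:
    for _ in range(max_attempts):
        candidate = generate_draft(prompt)
        if not violates_policy(candidate):
            return candidate  # passed the guardrail, OK to show the user
    return FALLBACK  # never surface an output the classifier flagged

print(guarded_generate("What's a good name for a coffee shop?"))
```

The pattern is the same as the Simple Stop case: you aren’t making the model itself better, you’re constraining what reaches the user so that the overall system clears the MVQ bar.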
MVQ vs. MVP
Now that I’ve explained a little bit more about how I think of MVQ, I want to explain why I don’t think it’s the same as MVP. In its simplest form, it comes down to control over quality.
While “quality” is still something to consider in an MVP, what one is usually referring to as quality is an aspect of the product / feature that is a direct result of how well it was built — or, put another way, how good your [eng/product/…] team is and/or how long you gave them to build something. The idea being that with more time or more talent, quality can reach near perfection. So building an MVP shouldn’t ever come down to quality as a variable you can’t directly control.
This isn’t the case with products built with Generative AI. More time does not lead to perfect quality — so you need to identify what that MVQ is (some fields are more forgiving than others), and likely find other ways to ensure your product meets it.