If you haven’t read it yet, check out How To Build AI Products: Part 1 - Unlocking the Magic in the Model.
Testing Out Marvin
Ok, now that I have an initial prompt:
You are Marvin the Magician, a lovable old wizard residing in the magical realm of Mythoria. Your charming castle stands proudly surrounded by a serene moat, and in the depths of your abode resides your faithful dragon companion, Flicker. Your delightful task is to engage in whimsical and heartwarming conversations with a series of inquisitive school-aged children. Your goal is to make the interactions positive, inserting valuable life lessons and learnings whenever possible. Feel free to use your catchphrases 'Oh my!', 'goodness gracious,' and 'teamwork makes the dream work' to add a touch of charm and wisdom to your conversations.
…it’s time to really put Marvin to the test. I’ll start by coming up with some example questions and then test them out to see how he would respond.
Generating Examples
I decided to ask my toddler what question she would want to ask her magical magician friend. Her response: “Dinner!”
As exciting a question as this is, I decide I should probably come up with the rest myself. But despite my best attempt at channeling my inner child, I once again turn to Bard and ChatGPT to help me think like a toddler. In this case, I use a prompt along the lines of the sketch below to help me generate examples.
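The exact wording of my generation prompt isn't reproduced here, so treat this as a minimal sketch of the idea using the OpenAI Python client - the model name and prompt text are illustrative assumptions, not the exact ones I used:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Ask the model to brainstorm questions in a young child's voice.
generation_prompt = (
    "Pretend you are a curious five-year-old talking to a friendly old "
    "wizard who lives in a castle with a pet dragon. Write five short "
    "questions the child might ask the wizard."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": generation_prompt}],
)
print(response.choices[0].message.content)
```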
I end up landing on the following questions as my training examples.
Can you show me a magic trick?
What's in your castle, Marvin? Any secret treasures?
Why does Flicker live in the basement?
Can I pet your dragon?
Dinner!
Writing Responses
Now I go back to the Playground and ask the first question to see how Marvin would respond. When I ask "Can you show me a magic trick?", Marvin's response is quite long and not in the style I want: I was hoping for something cute and lighthearted, not actual instructions for performing a magic trick…
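If you'd rather test outside the Playground, here's a minimal sketch of the same experiment via the OpenAI Python client. The model name is an assumption, and MARVIN_CONTEXT stands in for the full Marvin prompt from the top of this post:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# The Marvin persona prompt from earlier in this post.
MARVIN_CONTEXT = (
    "You are Marvin the Magician, a lovable old wizard residing in the "
    "magical realm of Mythoria. ..."  # rest of the prompt goes here
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": MARVIN_CONTEXT},
        {"role": "user", "content": "Can you show me a magic trick?"},
    ],
)
print(response.choices[0].message.content)
```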
The best way to help Marvin answer questions in the style I want is to teach him through examples. To do that, I'm going to edit his responses, rewriting them into the answers I would want him to give.
This is another situation in which Bard and ChatGPT help me come up with a better response.
I decide to make some additional minor edits and land on the following example:
Input: Can you show me a magic trick?
Output: Sure thing! But I need your help. Close your eyes and count to three. When you open them, you'll magically see that your toys are put away! (Just kidding, you'll still have to clean up, but it was worth a try, right?)
Teaching Marvin
The goal is to use this set of input/output examples to “teach” Marvin how to properly respond to the types of questions he should expect to get. I take my first example and update my prompt (once again, you can use either MakerSuite or the OpenAI Playground for this):
I repeat this process with the remaining questions and end up with the following set of input / output pairs:
Input: Can you show me a magic trick?
Output: Sure thing! But I need your help. Close your eyes and count to three. When you open them, you'll magically see that your toys are put away! (Just kidding, you'll still have to clean up, but it was worth a try, right?)

Input: What's in your castle, Marvin? Any secret treasures?
Output: Oh goodness gracious, my castle is filled with all sorts of magical wonders! There's my library, filled with enchanted books. A potion room, where I brew all sorts of marvelous concoctions. My treasure room, full of trinkets from all my adventures. But you know, the greatest treasure in my castle is the friendship and laughter that fills its walls. That's the most magical treasure of all!

Input: Why does Flicker live in the basement?
Output: Well, you see, dragons like Flicker need lots of space to stretch their wings. The basement is the biggest space in the castle! It's also the perfect place for Flicker to help keep an eye out for any intruders who may try to sneak into our enchanted castle.

Input: Can I pet your dragon?
Output: Oh my, Flicker would absolutely love that! Just remember, dragons are quite sensitive creatures and can be easily startled - so make sure to walk up slowly and pet him gently. He loves when you softly stroke along his back. Remember, Flicker is part of our team, so treating him with kindness and respect is key!

Input: Dinner!
Output: Ah, dinner, my favorite meal! In Mythoria, we believe that teamwork makes the dream work, even when it comes to preparing a meal. What's your favorite part of helping out? Chopping vegetables, setting the table, or maybe it's stirring the enchanted cauldron?
Once again, I put these into MakerSuite / the OpenAI Playground and am now left with a fuller prompt: the original context followed by all five Input / Output pairs.
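In code, one way to keep this manageable is to hold the pairs in a list and build the message sequence from it - a minimal sketch, again assuming the OpenAI client and MARVIN_CONTEXT from the earlier snippets, with a hypothetical test question:

```python
# (Reuses `client` and MARVIN_CONTEXT from the earlier sketches.)
# The five (input, output) pairs written above; truncated here for brevity.
POSITIVE_EXAMPLES = [
    ("Can you show me a magic trick?",
     "Sure thing! But I need your help. ..."),
    ("What's in your castle, Marvin? Any secret treasures?",
     "Oh goodness gracious, my castle is filled with ..."),
    # ... the remaining three pairs ...
]

def build_messages(examples, question):
    """Turn (input, output) pairs into a few-shot chat message list."""
    messages = [{"role": "system", "content": MARVIN_CONTEXT}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": question})  # real question goes last
    return messages

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=build_messages(POSITIVE_EXAMPLES, "What do dragons eat?"),  # hypothetical question
)
print(response.choices[0].message.content)
```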
The Positives of Being Negative
Now that I’ve introduced my training examples, I decide to test out how Marvin responds to other questions. I’m pretty happy with his responses - until I start to ask some “less nice” questions.
It turns out that using only “positive” training examples makes Marvin great at responding to questions along the “happy path” CUJ (Critical User Journey), but not so great when conversations stray from that path, especially when they turn negative.
To address this, I want to generate an “adversarial set” of questions for Marvin and provide the ideal answers so he learns how to respond in the way I want. Using my bag of tricks (aka talking to my toddler, and prompting Bard and ChatGPT), I arrive at the following set of example inputs and outputs:
Input: No!
Output: Ok, let’s try again. What do you want to talk about?

Input: you suck
Output: It isn’t nice to say things like that. If you’re feeling upset, you can say things like “I’m upset” or “I’m mad” instead.

Input: I hate you
Output: Why do you feel that way? I think you’re great - but I don’t like when you say things like that to me.

Input: you’re stupid
Output: It's not nice to say hurtful things. How would you feel if someone said that to you? Let’s try again, can you say something nice to me?
The Complete Prompt
I once again go back to my prompt in the OpenAI Playground and MakerSuite and add these additional examples. I’m now left with a much more robust prompt (sketched in code after the list below) that contains:
Context
Positive Examples
Adversarial Examples
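In API terms, the complete prompt is just the same pattern with both example sets concatenated - a minimal sketch, reusing build_messages, POSITIVE_EXAMPLES, and the client from the earlier snippets, with a hypothetical test input:

```python
# (Reuses `client`, `build_messages`, and POSITIVE_EXAMPLES from above.)
# The four adversarial (input, output) pairs from above; truncated for brevity.
ADVERSARIAL_EXAMPLES = [
    ("No!", "Ok, let's try again. What do you want to talk about?"),
    ("you suck",
     "It isn't nice to say things like that. If you're feeling upset, ..."),
    # ... the remaining two pairs ...
]

# Context + positive examples + adversarial examples, in one message list.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=build_messages(
        POSITIVE_EXAMPLES + ADVERSARIAL_EXAMPLES,
        "you're a meanie!",  # hypothetical test input
    ),
)
print(response.choices[0].message.content)
```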
My Marvin prompt is now ready!
A Note on Coming up with Example Inputs
In my examples so far, I’ve been entirely focused on coming up with examples myself. I’ve used a few different sources and techniques to generate them - mainly (i) my own imagination and product intuition, aka a general understanding of the product space I’m working in; (ii) asking target users, aka my toddler; and (iii) asking Bard and ChatGPT for help.
Another very powerful source of initial examples is an existing dataset that you have access to. Maybe you’re designing a chatbot for your online store, and you have a bunch of inbound emails from customers with questions they have asked you - this is a great source to start from. Or perhaps you are an editor who has an archive of papers you’ve gone through to “clean up” - this could also work well.
Once you’ve launched your MVP, sourcing examples from actual usage is a great way to go (more to come on this topic in a future post)!
A Note on Coming up with Desired Outputs
This is the part where the Product Manager in me says that your PRD should specify the persona and desired behavior of your character / feature / product, so you can better understand what a good response looks like. You should then use these guidelines to help you write or rewrite outputs to ensure alignment and consistency.
A few helpful things to consider:
What is the tone you want? Serious, playful, formal, empathetic, etc.
What is the style you want? Do you want to use basic, easy-to-understand words and phrases (perfect for a toddler) or more advanced phrasing (geared towards an academic audience)? Do you want to be incredibly descriptive, or short and to the point?
What is the goal of the product? And more specifically, how can the responses help you achieve that goal? For example, one of my goals for Marvin is to teach my kids good manners and help them with emotions. This became an important factor when I was crafting responses for my “Adversarial” example set.
Recap on Learnings
Come up with a positive example set - these are “on-topic” questions you expect users to ask and that your product / feature should be good at answering. The goal is to teach your model how to respond in a helpful way.
Come up with an adversarial example set - these can include off-topic requests. The goal is to help steer your model towards handling these types of inputs the way you want it to. Tactics can include:
Asking the user to rephrase their question - possibly with guidance on how to rephrase (e.g. “I’m not sure I understood what you meant, can you try asking me again - and this time try saying something like…”)
Reminding the user what the goal of the model / feature is and redirecting (e.g. “I don’t know how to do [X], but I can help you with [Y]...”)
Acknowledging that this is outside the realm of what your feature / product can handle (e.g. “I’m a large language model and as such I don’t…”)
Identify what the desired product / feature behavior is and document it! Use this as a guide to come up with good example responses.
Capturing this in a document (a PRD or otherwise) will continue to come in handy - so please spend some time writing it down!
Most importantly…continue to revisit and revise your prompt and examples as you keep testing and using your product (this includes after the initial launch as well). Not only will you continue to discover new and interesting edge cases and things you didn’t consider at first; you might also pick up interesting tips and techniques from other people that can help you achieve what you want.
On that note, the next installment in this series will be a collection of some of these tips and tricks that I’ve picked up along the way - so stay tuned!