[Launch Day] The Gemini API is now live!
Start building with Google’s newest Generative AI model
I’m breaking from my regular Thursday morning post routine because today is a very special day. After months of hard work with an amazing group of people, I’m so excited to finally be able to talk about, and showcase, what I’ve been working on with my team. We have officially launched the Gemini API in Google AI Studio!
Here’s a roundup of some of the helpful links we’ve released about Gemini:
Google AI Studio - where you can start interacting with Gemini and get an API key
Developer Docs - where you can read about how to get started
Introducing Gemini: our largest and most capable AI model - a note from Sundar + Demis
Welcome to the Gemini era - Google DeepMind site
Hands-on with Gemini: Interacting with multimodal AI - YouTube video showcasing what’s in store for Gemini capabilities
You can now get started building for free with Gemini Pro, one of Google’s newest multimodal Generative AI models that we announced last week. Simply go to Google AI Studio to sign up and start prompting (or jump to the code sketch after these tips if you’d rather call the API directly). A few helpful tips:
There are 3 types of prompts: freeform, structured, and chat (I like to use freeform, which is our default).
You can add photos via Google Drive, or drag and drop them from your computer into your prompt.
There are 2 models you can try right now: gemini-pro for text input and gemini-pro-vision for text + image input. Both models output text.
Once you’ve written your prompt, hit Run to see the results!
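If you’d rather skip the UI and call the API from code, here’s a minimal sketch using the google-generativeai Python SDK. It assumes you’ve installed the package with pip and copied your API key from Google AI Studio; the prompt is just a placeholder.

```python
# Minimal sketch: text-only prompting with the google-generativeai Python SDK.
# Assumes `pip install google-generativeai` and an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # paste the key you got from Google AI Studio

# gemini-pro handles text in, text out
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Give me three fun ideas for a holiday newsletter post.")
print(response.text)
```

Swap the model name for gemini-pro-vision and add an image to the request when you want multimodal input (there’s an example of that further down).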
The things I’m most excited about are the new types of use cases and interactions a truly multimodal model enables. As the blog posts say,
[Gemini] can generalize and seamlessly understand, operate across and combine different types of information
but if you’re like most people, it’s hard to grasp what these broad, sweeping statements (as exciting as they sound) actually mean, or what you can build with Gemini.
I was talking to a coworker about this, and what really resonated with me was how he described this leap forward in capability as going from Recognition to Reasoning. I thought this was a great way to differentiate between simple object identification and actual understanding.
But I want to go a step further than that and show you some amazing ways in which you can start interacting with Gemini.
So to celebrate this monumental launch, I’m going to run a special series on my Substack. For the next week I’ll be releasing a new post each day that dives into a fun and novel way that you can try using Gemini for multimodal tasks. I hope to spark some inspiration and creativity within you!
Getting Festive
Since we're embracing the full spirit of Christmas at my home, I'll begin with a few festive examples. My wish is that these vignettes plant some ideas, or that you’re able to adapt them to your own unique holiday customs, whatever those may be!
Elf on the Shelf - Scene Ideas
Full disclosure: work has been a little crazy this year, so I definitely opted to buy daily pre-packaged “scenes” for my Elf on the Shelf. That said, I wanted to see if Gemini could creatively generate some ideas, in case I felt like taking a more DIY approach next year. Gemini did not disappoint.
Elf on the Shelf - Where Are You?
In previous years, my husband and I have taken it upon ourselves to simply hide “Sir Elf” (who this year finally got the name Sir Elmer). By the end of the month, it became a real struggle to find new and creative places to hide the elf, so I decided to see if Gemini could help me identify where a good place might be to hide the elf…
Elf on the Shelf - Photo Captions
Part of my annual tradition is taking pictures of my girls each day as they find the elf. I decided to use Gemini to help me generate captions for these photos…turns out, it’s more creative than I am ;)
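If you’d like to try the caption idea outside of AI Studio, here’s a minimal sketch along the same lines as the earlier one, assuming the google-generativeai SDK, Pillow for loading the image, and a hypothetical local photo named elf_photo.jpg:

```python
# Minimal sketch: multimodal (text + image) prompting with gemini-pro-vision.
# Assumes the google-generativeai SDK, Pillow, and a local photo (hypothetical filename).
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-pro-vision")
photo = Image.open("elf_photo.jpg")

# Pass a list of parts: the text prompt plus the image
response = model.generate_content(
    ["Write a playful one-line caption for this Elf on the Shelf photo.", photo]
)
print(response.text)
```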
Happy prompting!