Gemini Omni Flash: Google's Any-to-Any Video AI for Marketers
Google's Gemini Omni Flash turns text, photos, and footage into 10-second videos with audio. What it means for marketers — and how to use it in MITPO.

What is Gemini Omni Flash?
Gemini Omni Flash is Google's first "any-to-any" video model. Give it text, photos, reference images, audio, or your own footage, and it generates short videos with sound — or edits existing clips through plain-English instructions. Google unveiled it in May 2026, and marketers can now use it inside MITPO's Creative Studio.
Why this matters for marketers right now
Every AI video tool until now has really been a text-to-video tool with extras bolted on. You typed a description, crossed your fingers, and got a clip that looked nothing like your product, your brand, or your last post.
Gemini Omni Flash flips that. Google built it to accept any combination of inputs — words, pictures, sound, and video — and reason across all of them at once. In Google's own head-to-head tests with human raters, it earned leading marks for video editing and reference-based video, the two capabilities marketers actually need. It also understands real-world physics (things fall, liquids pour, fabric moves), so the results feel filmed rather than dreamed.
The timing matters too. Google announced Omni at its annual I/O event in May 2026 and rolled it out through the Gemini app, its Flow creative studio, and YouTube Shorts — with business access following in the weeks after. In other words: your audience is about to see a lot more AI video in their feeds. The brands that learn this tool first will look native. The rest will look late.
One more detail worth knowing: every video Omni creates carries Google's invisible SynthID watermark, so the content is verifiable as AI-generated behind the scenes. That's a quiet plus for brands worried about trust and disclosure.
What Gemini Omni Flash can actually do
The model works in four modes, and each one maps to a real marketing job:
- Text to video. Describe the scene (for example, "a slow pan across a candle-lit dinner table, warm light, soft jazz") and Omni generates the clip and the matching audio. Good for concept ads, mood pieces, and B-roll you'd otherwise buy from a stock library.
- Image to video. Upload one photo and Omni brings it to life. A still product shot becomes a slow rotating hero clip; a founder headshot becomes a subtle, breathing portrait. Google reports leading performance on the standard image-to-video benchmark, which matches what we see: the motion respects what's in the photo instead of melting it.
- Reference to video. This is the big one for brands. Upload two or more reference images (your product, your packaging, your model, your setting) and Omni combines them into one coherent scene. Your actual product, in a scene that never existed, looking consistent shot to shot.
- Video editing. Upload footage you already have and change it with a sentence. Swap the background, change the season, replace an object, or adjust the mood, while the parts you didn't mention stay put. Google calls this conversational editing: you refine the clip step by step, like giving notes to an editor, without starting over each time.
Clips come out in both portrait (9:16) and landscape (16:9), so the same idea works for Reels, Shorts, TikTok, and YouTube without awkward cropping.
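If it helps to see the logic written down, the four modes boil down to what you attach. Here's a minimal Python sketch of that rule of thumb. The function name `pick_mode` and its parameters are made up for illustration; this is not MITPO's or Google's API. It also assumes that uploaded footage takes priority when both footage and images are attached, which the description above doesn't spell out.

```python
from typing import Optional, Sequence

# The two output shapes mentioned above.
ASPECT_RATIOS = {"portrait": "9:16", "landscape": "16:9"}


def pick_mode(images: Sequence[str] = (), video: Optional[str] = None) -> str:
    """Return which of the four modes a set of attachments corresponds to.

    Hypothetical helper: `images` and `video` are just file names here.
    """
    if video is not None:
        # Assumption: existing footage means you want to edit it.
        return "video editing"
    if len(images) >= 2:
        # Two or more reference images get combined into one scene.
        return "reference to video"
    if len(images) == 1:
        # A single photo is brought to life.
        return "image to video"
    # Nothing attached: the prompt alone drives the clip.
    return "text to video"


if __name__ == "__main__":
    print(pick_mode())
    print(pick_mode(images=["candle.jpg"]))
    print(pick_mode(images=["candle.jpg", "kitchen.jpg"]))
    print(pick_mode(video="launch_2024.mp4"))
```

In practice you never call anything like this yourself; the point is simply that the number and type of attachments decide which mode you're in.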
Real ways marketers are using it
Product demo clips without a shoot. Take the product photos already sitting in your brand folder, add a two-line prompt, and get a short demo-style clip with ambient sound. What used to need a videographer, a light kit, and an afternoon now needs a photo and a sentence.
UGC-style ads at scale. Reference-to-video lets you place your real product into casual, handheld-feeling scenes — kitchen counters, gym bags, car dashboards. It reads like creator content, but every frame shows your packaging, not a lookalike.
One photo, a week of posts. This is the quiet superpower. A single strong product shot can become five different clips, one for each weekday: a slow zoom for Monday, a lifestyle scene for Wednesday, a seasonal variant for Friday, and so on. Same asset, five moods, zero reshoots.
Refreshing old footage. That launch video from two years ago with the outdated backdrop? Upload it and ask for a new setting, updated colors, or a summer version. Editing beats regenerating because the parts that already worked — the pacing, the product moments — stay exactly as they were.
How to use Gemini Omni Flash inside MITPO
MITPO offers Omni Flash in two places: Creative Studio's Video mode for quick generations, and Canvas when you're building out a bigger visual project. Either way, the flow is the same and takes about a minute to learn:
- Open Creative Studio and switch to Video. Pick Gemini Omni Flash from the model choices.
- Type what you want. Plain language works best — describe the scene, the mood, and the motion the way you'd brief a freelancer.
- Attach your visuals (optional but powerful). One photo turns a product shot into video. Two or more reference images tell Omni "keep these consistent" — ideal for brand work. Or upload your own footage and describe the change you want made.
- Generate. Each run costs credits from your plan and returns a 10-second, 720p clip with audio — sound effects and ambience included, no separate audio step.
- Refine and reuse. Not quite right? Adjust the prompt and go again. Happy with it? It's ready for your social calendar.
Everything runs on Google's enterprise-grade infrastructure, the same foundation Google offers its business customers — not a consumer app wrapper. Your prompts and uploads stay in a professional environment built for commercial work.
From one product photo to a week of videos: a walkthrough
Theory is nice. Here's what this actually looks like on a Monday morning.
Meet Maya. She runs a three-person candle company. Her entire video budget is whatever's left after wax and shipping, and her entire asset library is one good photo: an amber jar candle on a walnut shelf, shot by a friend with a decent camera. Here's how that one photo becomes five days of channel-ready video.
Monday — the hero clip. Maya opens Creative Studio, switches to Video, and picks Gemini Omni Flash. She uploads the shelf photo and types: "Slow push-in on the candle, flame flickering gently, soft evening light, quiet crackle of the wick." One generation later she has a 10-second clip with the flame moving and the ambient sound baked in. She sets it to portrait for an Instagram Reel. The candle in the video is her candle — same jar, same label — because the motion starts from her photo instead of a text description.
Tuesday — the UGC-style clip. She wants something that feels like a customer filmed it. This time she uploads two reference images: the product photo plus a snapshot of her own kitchen counter. Her prompt: "Handheld feel, the candle sitting on this counter next to a mug of coffee, morning light, casual and cozy." Omni combines the two into a scene that never existed but looks like it did. That one goes to TikTok, where polished ads die and casual ones thrive.
Wednesday — the seasonal variant. Rather than generating from scratch, Maya takes Monday's hero clip and edits it: "Make it autumn — rain on the window behind the shelf, warmer light." The candle, the framing, and the pacing stay exactly as they were; only the world around it changes. Same product moment, new mood, no reshoot.
Thursday — the before/after. One more generation: the candle unlit in a dim room, then lit, the light blooming across the shelf. A classic transformation beat, and it's her actual product doing the transforming. That's a natural Short for YouTube.
Friday — the assembled ad. Now she has four strong clips. She drops the best three into MITPO's Motion mode, which stitches them into a 30-second branded video — her logo, her colors, a text hook up front, and an end card with a call to action. That's the piece she'll actually put ad spend behind.
Total cost: a handful of credit-priced generations and maybe forty minutes, most of it spent choosing between versions she liked. The old version of this week involved a videographer, a location, and a four-figure invoice — or, more honestly, it involved not making the videos at all.
Prompt recipes you can copy today
You don't need to be a prompt engineer. You need to describe three things: what's in the scene, how it moves, and how it feels. Sound is optional but worth mentioning — Omni generates audio with the video, so telling it what to hear pays off. Here are five recipes to start from. Swap in your own product and adjust the mood.
1. The product demo (image to video)
Upload your best product photo, then:
"Slow rotating view of the product on a clean, neutral surface, soft studio lighting, gentle shadows, calm and premium feel, subtle ambient room tone."
Works for almost any physical product. The word "slow" matters — fast motion is where AI video most often looks artificial.
2. The UGC-style ad (reference to video)
Upload your product photo plus a photo of a casual setting (a counter, a desk, a gym bag), then:
"Handheld phone-camera feel, the product being picked up and looked at in this setting, natural daylight, everyday and unpolished, quiet background noise of a real room."
The "unpolished" instruction is the whole trick. You're asking for creator-content energy with your real packaging in frame.
3. The before/after (text or image to video)
"The product sitting unused in a dull, dim room. Then the scene transforms — the same room now bright, warm, and inviting with the product in use. One smooth transition between the two moments."
Transformation is the oldest ad structure there is, and a 10-second clip is exactly the right length for it.
4. The brand sting (reference to video)
Upload your product and something that carries your brand palette, then:
"A quick, energetic five-beat sequence: close-up details of the product — texture, edge, label — cutting together with rhythm, bold lighting in the brand's colors, punchy sound design, ending on the full product held in frame."
This is your attention hook for the opening seconds of Reels and Shorts. Pair it with Motion mode to put your actual logo on the end.
5. The footage refresh (video editing)
Upload an existing clip, then:
"Keep everything the same, but change the setting to a bright summer morning and update the background to feel current and fresh."
The "keep everything the same, but" framing is the key habit for editing: name what should stay before you name what should change.
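If you write a lot of prompts, the pattern behind these recipes (scene, motion, feel, plus optional sound) is easy to turn into a reusable template. The sketch below is one way to do it in Python. Both function names are hypothetical and nothing here talks to MITPO or Google; it only assembles the text you would paste into the prompt box.

```python
def build_prompt(scene: str, motion: str, mood: str, sound: str = "") -> str:
    """Join the three required parts (and optional sound) into one prompt."""
    parts = [scene, motion, mood]
    if sound.strip():
        # Sound is optional, but the model generates audio, so name it if you can.
        parts.append(sound)
    # Trim stray spaces and trailing periods so the pieces join cleanly.
    cleaned = [p.strip().rstrip(".") for p in parts if p.strip()]
    return ", ".join(cleaned) + "."


def build_edit_prompt(change: str) -> str:
    """Apply the 'keep everything the same, but' habit for editing footage."""
    return f"Keep everything the same, but {change.strip().rstrip('.')}."


if __name__ == "__main__":
    print(build_prompt(
        scene="the product on a clean, neutral surface",
        motion="slow rotating view",
        mood="calm and premium feel",
        sound="subtle ambient room tone",
    ))
    print(build_edit_prompt("change the setting to a bright summer morning"))
```

Keeping the parts separate makes it painless to hold the scene fixed and vary only the mood, which is how one photo turns into several distinct clips.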
Pair it with Motion mode for full brand videos
Omni Flash is brilliant at single clips. But most campaigns need more than one shot — an opening hook, product moments, text on screen, a call to action.
That's where MITPO's Motion mode comes in. Motion builds editable, multi-scene brand videos with your logo, your colors, and your copy — and Omni Flash clips slot in as footage. A practical combo: generate three product clips with Omni Flash, then let Motion assemble them into a polished 30-second ad with branded titles and an end card. Generation and editing in one place, instead of five browser tabs and an export folder.
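The arithmetic behind that combo is simple enough to sketch. Assuming each generation returns a 10-second clip (as described in this article) and ignoring any extra time that titles or an end card might add, a few lines of Python give the number of clips to plan for. The helper name `clips_needed` is hypothetical and is not part of MITPO.

```python
import math

CLIP_SECONDS = 10  # length of one Omni Flash generation, per this article


def clips_needed(target_seconds: float, clip_seconds: float = CLIP_SECONDS) -> int:
    """Return how many clips are needed to fill a video of the target length."""
    if target_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # Round up: a 45-second video still needs a fifth clip for the last 5 seconds.
    return math.ceil(target_seconds / clip_seconds)


if __name__ == "__main__":
    print(clips_needed(30))
```

Because results vary between runs, it is worth budgeting for a couple of extra generations beyond this minimum so you can pick the best takes.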
Omni Flash vs. Motion mode vs. stock footage: which do you reach for?
Three tools now compete for the same job — putting moving pictures on your feed — and picking the right one saves both credits and afternoons.
Reach for Omni Flash when the shot itself is the problem. You need footage that doesn't exist: your product in a scene you never filmed, a photo brought to life, an old clip moved to a new season. Omni gives you realistic single shots with sound — but to change a clip, you edit or regenerate; you can't open it up and nudge a headline.
Reach for Motion mode when the structure is the problem. You have the pieces — clips, photos, a message — and you need them assembled into a branded video: hook, scenes, text on screen, logo, end card. Motion videos stay editable, so when the offer changes, you swap a line instead of regenerating everything.
Reach for stock footage when speed beats specificity. Stock is instant and cheap, but it's someone else's kitchen and someone else's product — fine for generic B-roll behind a voiceover, wrong for anything where your product should be on screen.
The rule of thumb: stock for filler, Omni for shots, Motion for the finished video. The strongest workflow uses all three.
Honest limitations to know before you start
No tool is magic, and it's better you hear the caveats from us:
- Clips are short. Think 10 seconds — perfect for social, not for a webinar. For longer pieces, stitch clips together in Motion mode.
- The technology is new. Google still labels the business version a preview, which means occasional quirks and busy periods. Results can vary between runs — generating two or three options and picking the best is the professional move.
- English works best. Google notes full support for English prompts; other languages are less tested for now.
- Editing uploaded footage has regional limits. Google currently restricts that specific feature in the EEA, Switzerland, and the UK.
- It won't clone voices. Voice editing isn't supported, and that's a feature, not a bug — it keeps the tool on the right side of trust.
- Each generation costs credits. Ten seconds of AI video is genuinely expensive to compute. Sketch your idea cheaply first (a still image, a rough prompt), then spend credits on the version you believe in.
Frequently asked questions
Is Gemini Omni Flash free?
It depends where you use it. Google rolled the model out through its own consumer apps, with business access following separately. Inside MITPO, each Omni Flash generation costs credits from your plan — no separate Google subscription or setup required.
How long can the videos be?
Each generation produces a 10-second clip at 720p, with audio included. That's deliberately social-length — the size of a hook or a product moment. For anything longer, generate several clips and assemble them in MITPO's Motion mode, which stitches them into a branded 30-second (or longer) video with titles and an end card.
Does it add sound?
Yes. The model generates audio together with the video: ambient room tone, sound effects, the crackle of a candle wick, background atmosphere. Describe the sound in your prompt the same way you describe the visuals. One thing it won't do is clone voices — voice editing isn't supported, by design.
Can it edit my existing videos?
Yes. Upload footage you already own and describe the change in a sentence (a new background, a different season, updated colors). The parts you didn't mention stay put, and you can keep refining step by step, like giving notes to an editor. One caveat: Google currently restricts editing of uploaded footage in the EEA, Switzerland, and the UK.
Will the videos work for Instagram Reels and TikTok?
Yes. Clips generate in both portrait (9:16) and landscape (16:9), so the same idea fits Reels, TikTok, YouTube Shorts, and standard YouTube without awkward cropping. And with sound already attached, most clips are post-ready the moment they finish.
Is Gemini Omni Flash worth trying for your brand?
If you post video anywhere — and in 2026, every brand does — yes. The honest pitch is simple: this is the first video model where your actual product can appear in generated scenes, and where footage you already own can be updated instead of reshot. Those two things alone change the math on video content.
A sensible first experiment: pick your best product photo, write one sentence about the scene you wish it lived in, and generate. You'll know within a minute whether this belongs in your workflow.
You can try MITPO — including the Creative Studio where Omni Flash lives — at mitpo.io/demo. No pitch, no meeting. Bring a photo and see what it becomes.