How I Built Alphabet Reels Using Antigravity Gemini Flash

May 31, 2026

I gave Antigravity a detailed prompt specifying the folder structure, tools, pipeline stages, and final output. From that one prompt it generated the entire codebase: code, scripts, scene definitions, background removal logic, image prompts, and config. I reviewed the output, ran it, identified issues, researched fixes, and refined the prompt or code until the pipeline worked as expected.

This is Alphabet Reels: a Python pipeline that generates 26 TikTok-ready educational videos for hamrobarnamala.com.

The Idea

I wanted short, vertical videos for young kids learning the English alphabet. Each video covers one letter, shows two words starting with that letter, and runs 45 to 60 seconds. The target platform is TikTok under the hamrobarnamala account.

Instead of writing the code myself, I wrote a detailed prompt for Antigravity. It asked for a specific folder structure, content generation for all 26 letters, image generation using Nano Banana, audio using Edge-TTS, video rendering using MoviePy, error handling with retries, and a dry-run mode.

Antigravity generated the full project structure, all source files, the config, and the requirements file in one session. I then ran it, tested each stage, and fixed what broke.

How the Pipeline Works

The project has four stages that run in sequence. Each can also run independently.

1. Content Generation

For every letter A to Z, this stage produces:

Two kid-friendly words starting with that letter
A simple sentence using the first word
A TikTok caption with hashtags
A background color
An 8-scene script

The words and scripts sit in a hardcoded dictionary rather than being generated live. The reason is practical. If the AI picked words dynamically, it might choose a word that had no matching image ready. A fixed dictionary keeps words and images in sync.

I wrote custom scripts for letters A to E by hand. They feel more playful and specific. For letters F to Z, a template fills in the letter and words automatically. Those work fine, but they lack the character of the hand-written ones.
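A minimal sketch of how a fixed dictionary and a fallback template can fit together. The words, script lines, and names (WORDS, CUSTOM_SCRIPTS, TEMPLATE, build_script) are hypothetical placeholders, not the project's actual content.

```python
# Hypothetical entries; the real dictionary would cover all 26 letters.
WORDS = {
    "A": ("Apple", "Ant"),
    "F": ("Fish", "Frog"),
}

# Hypothetical hand-written script for one letter (8 scenes).
CUSTOM_SCRIPTS = {
    "A": [
        "Hi friends! Today we meet the letter A!",
        "Here comes a big, bouncy A!",
        "Big A, small a. They are best friends.",
        "A is for Apple. Crunch!",
        "Can you say A?",
        "A is also for Ant. Tiny but strong!",
        "Apple and Ant both start with A.",
        "Great job! A is for Apple and Ant.",
    ],
}

# Generic 8-scene template used when no custom script exists.
TEMPLATE = [
    "Let's learn the letter {letter}!",
    "This is the letter {letter}.",
    "Big {upper} and small {lower}.",
    "{letter} is for {word1}.",
    "Can you say {word1}?",
    "{letter} is also for {word2}.",
    "{word1} and {word2} both start with {letter}.",
    "Great job! {letter} is for {word1} and {word2}.",
]


def build_script(letter):
    """Return the 8 narration lines for a letter, preferring a custom script."""
    word1, word2 = WORDS[letter]
    if letter in CUSTOM_SCRIPTS:
        return CUSTOM_SCRIPTS[letter]
    return [
        line.format(
            letter=letter,
            upper=letter.upper(),
            lower=letter.lower(),
            word1=word1,
            word2=word2,
        )
        for line in TEMPLATE
    ]
```

Because the words are looked up rather than generated, every word that reaches the image stage is one that was chosen in advance.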

2. Image Generation

Each letter needs three images: one for each word, plus a background. I used Nano Banana with a seed tied to each letter. That means the same letter always gets the same image. The pipeline tries three sources in order and only calls the API as a last resort:

Pre-generated images in the assets folder
Local images in the images folder
Nano Banana, if neither is found

I can override any image by dropping a replacement file into the right folder.
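The lookup order above can be sketched roughly like this. The function names, the .png naming scheme, and the generate callable (a stand-in for the Nano Banana call) are assumptions for illustration; the CRC-based seed is just one way to get a value that stays the same across runs.

```python
import zlib
from pathlib import Path


def seed_for_letter(letter):
    """Derive a seed that is identical on every run for the same letter.

    Python's built-in hash() is salted per process for strings, so a CRC
    is used here instead. This is an assumed scheme, not the project's.
    """
    return zlib.crc32(letter.encode("utf-8"))


def resolve_image(name, assets_dir, images_dir, generate):
    """Return a path to the image for `name`, generating it only if needed.

    `generate(name, target_path)` is a hypothetical callable that writes
    a new image to `target_path`.
    """
    # 1 and 2: look for an existing file, pre-generated assets first.
    for folder in (assets_dir, images_dir):
        candidate = Path(folder) / f"{name}.png"
        if candidate.is_file():
            return candidate

    # 3: nothing found, so fall back to the image API.
    target = Path(images_dir) / f"{name}.png"
    target.parent.mkdir(parents=True, exist_ok=True)
    generate(name, target)
    return target
```

With this shape, dropping a replacement file into either folder is enough to override an image, since the generator is never reached.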

The hardest part was removing backgrounds. AI-generated images come with a flat background, and placing them over a video background creates edge artifacts. I wrote a custom function that samples pixels from the corners and edges, identifies the background color, makes those pixels transparent, and blurs the edges slightly. It works because Nano Banana produces flat backgrounds. It would not handle gradients well.
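To make the idea concrete, here is a simplified sketch of the technique, not the project's actual function. It works on a plain grid of (r, g, b) tuples so it runs without any image library; a real version would more likely use Pillow or NumPy. The function names and the tolerance value are assumptions.

```python
from collections import Counter


def border_pixels(pixels):
    """Yield every pixel on the outer edge of a grid of (r, g, b) tuples."""
    height, width = len(pixels), len(pixels[0])
    for x in range(width):
        yield pixels[0][x]
        yield pixels[height - 1][x]
    for y in range(1, height - 1):
        yield pixels[y][0]
        yield pixels[y][width - 1]


def remove_flat_background(pixels, tolerance=30):
    """Return a grid of (r, g, b, a) with the flat background made transparent."""
    height, width = len(pixels), len(pixels[0])

    # The most common colour along the border is taken as the background.
    background = Counter(border_pixels(pixels)).most_common(1)[0][0]

    def is_background(color):
        return all(abs(c - b) <= tolerance for c, b in zip(color, background))

    # Hard mask: background pixels fully transparent, the rest opaque.
    alpha = [[0 if is_background(p) else 255 for p in row] for row in pixels]

    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Soften edges with a 3x3 box average of the mask.
            neighbours = [
                alpha[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            average = sum(neighbours) // len(neighbours)
            # min() keeps background pixels at 0 so no halo is added.
            row.append((*pixels[y][x], min(alpha[y][x], average)))
        result.append(row)
    return result
```

The border vote is what ties this approach to flat backgrounds: with a gradient, no single colour dominates the edges, and the tolerance check would miss most of the background.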

3. Audio Generation

Audio comes from Edge-TTS using the en-US-AriaNeural voice. Each of the 8 scenes gets its own audio file, not one long file per letter. The narrations include deliberate pauses. After asking “Can you say A?” there is a 1.3-second gap so kids have time to respond.

A metadata file records each scene’s exact duration. The video renderer reads that metadata to match visuals to audio precisely.
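As an illustration of how a renderer might use that metadata, the sketch below turns per-scene durations into start times and frame counts. The JSON layout and field names (scenes, duration, pause_after) are assumed for the example; the actual metadata format is not shown in this post.

```python
import json

FPS = 15  # the frame rate the renderer uses


def build_timeline(metadata_json, fps=FPS):
    """Convert scene durations (seconds) into start offsets and frame counts."""
    scenes = json.loads(metadata_json)["scenes"]
    timeline = []
    start = 0.0
    for scene in scenes:
        # A deliberate pause after the narration extends the scene.
        length = scene["duration"] + scene.get("pause_after", 0.0)
        timeline.append(
            {
                "scene": scene["scene"],
                "start": round(start, 3),
                "frames": round(length * fps),
            }
        )
        start += length
    return timeline


example = json.dumps(
    {
        "scenes": [
            {"scene": 1, "duration": 2.0},
            {"scene": 2, "duration": 1.7, "pause_after": 1.3},
        ]
    }
)
timeline = build_timeline(example)
```

Rounding each scene to whole frames can drift by a fraction of a frame per scene, which is usually acceptable at this length but worth knowing about.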

4. Video Rendering

MoviePy handles the rendering. The interesting logic lives in scenes.py. Each scene is a function that draws one frame at any given timestamp.

Scene 1: Intro text on a themed background
Scene 2: A large letter bounces in
Scene 3: Uppercase letter transitions to lowercase
Scene 4: Word 1 image slides in with a bounce
Scene 5: Interactive moment with Word 1; confetti appears
Scene 6: Word 2 image appears
Scene 7: Both images face off with sparkle particles
Scene 8: Review of both words, then a fade out
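A scene written as a function of time can be sketched as follows. This is not the code in scenes.py; it only returns drawing parameters instead of pixels, and the easing curve (the common "ease out back" formula) and the function names are my assumptions for how a bounce-in could be done.

```python
def ease_out_back(progress):
    """Easing that overshoots slightly past 1 and then settles at 1."""
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    c1 = 1.70158
    c3 = c1 + 1.0
    return 1.0 + c3 * (progress - 1.0) ** 3 + c1 * (progress - 1.0) ** 2


def scene_letter_bounce(t, duration=1.0):
    """Return draw parameters for the large letter at time t (seconds).

    A real scene function would draw the letter at this scale onto the
    frame; only the timing logic is shown here.
    """
    return {"scale": ease_out_back(t / duration)}
```

Because the function depends only on t, any frame can be rendered on its own, which makes it easy to tweak one animation without touching the others.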

The renderer runs at 15 FPS. It caches the last 8 frames to avoid redrawing static backgrounds on every frame. The output is an MP4 at 1080x1920 with a hamrobarnamala.com watermark at the bottom of every frame.
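One way to get a small cache like this in Python is functools.lru_cache with maxsize=8. The sketch below is an assumption about the approach, not the project's code: render_background is a hypothetical function, and the drawing is stubbed out as a flat block of bytes.

```python
from functools import lru_cache

WIDTH, HEIGHT = 1080, 1920  # output resolution


@lru_cache(maxsize=8)
def render_background(color, scene_index):
    """Stand-in for an expensive draw; returns raw RGB bytes for one frame.

    Arguments must be hashable (a tuple for the colour), because
    lru_cache uses them as the cache key.
    """
    return bytes(color) * (WIDTH * HEIGHT)


# The second call with the same arguments is served from the cache.
first = render_background((255, 200, 0), 2)
second = render_background((255, 200, 0), 2)
```

At 15 FPS, a 45 to 60 second video is 675 to 900 frames, so skipping even a static background redraw on most of them adds up.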

What Worked Well

The modular design saved me repeatedly. When audio timing was off, I re-ran only the audio stage. When an animation felt too fast, I edited the renderer alone. Each stage validates its output before passing it forward. Failures surface early and close to their source.
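The pattern of independent, self-validating stages can be sketched like this. The stage names follow the four stages in this post, but run_pipeline and the handler and validator dictionaries are hypothetical names for illustration.

```python
STAGES = ["content", "images", "audio", "video"]


def run_pipeline(handlers, validators, only=None, dry_run=False):
    """Run all stages in order, or just one, validating each result.

    `handlers` maps a stage name to a zero-argument callable.
    `validators` maps a stage name to a callable that returns True
    when that stage's output is acceptable.
    """
    if only is not None and only not in STAGES:
        raise ValueError(f"unknown stage: {only!r}")

    selected = [only] if only else STAGES
    completed = []
    for name in selected:
        if dry_run:
            print(f"[dry-run] would run stage {name!r}")
            continue
        result = handlers[name]()
        # Stop here so a bad output never reaches the next stage.
        if not validators[name](result):
            raise RuntimeError(f"stage {name!r} produced invalid output")
        completed.append(name)
    return completed
```

Calling run_pipeline(handlers, validators, only="audio") re-runs just the audio stage, which is the workflow described above.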

Pre-generating images offline was the biggest time saver. Because the pipeline checks for existing files first, most letters skip the API entirely. A full 26-letter run finishes in minutes.

Edge-TTS was better than expected for a free tool. The voice works well for kids’ content.

What I Would Change

The template scripts for F to Z work, but they are blander than the A to E versions. I will write custom scripts for all 26 letters if the project continues.

The background removal function assumes flat backgrounds. It would struggle with gradients. A cleaner option is to ask Nano Banana for transparent backgrounds directly, or switch to a dedicated tool like rembg.

The TikTok uploader is not connected yet. The code is ready, but the integration is still pending.

The Result

Here is the actual reel from the pipeline:

@hamro.barnamala

The animations are simple, the voice is synthetic, and the background removal has rough edges. But the video is live, and it was generated entirely by a Python pipeline using free tools.

Final Thoughts

Antigravity wrote the initial codebase. I ran it, tested it, fixed issues, and refined the prompt and code until it worked. The division matters. Pasting a prompt and accepting the output is not what happened here. The pipeline needed debugging, iteration, and judgment calls that the AI could not make alone.

What Antigravity actually bought me was a solid scaffold in one session. That let me focus my time on the parts that determine output quality: animation timing, background removal, word selection, and scene scripts. Those parts required real work no matter how the surrounding code was written.