We usually talk about alt text as an SEO “trick”, but that’s actually the least impressive thing about it. At its core, alt text is the primary bridge between your visual content and the millions of people navigating the web via screen readers.
When you skip it, you are telling a significant portion of your audience that they aren’t invited to the conversation.
But we also know the logistical reality. Manually writing meaningful, descriptive text for every product shot, blog header, and social asset is the kind of repetitive labor that leads to burnout, or worse, “image-01.png” as a default.
We’re going to build a Python tool that looks at your images, understands the context, and drafts your accessibility tags for you.
How the “Eyes” Work
The process is straightforward. We aren’t guessing based on the filename (which is usually IMG_5672.png). We are sending the actual pixels to a Vision model, asking it for a concise summary, and saving that output.
In this case, we’ll use Gemini 3 Flash, but you can use any vision-capable model you want, as long as you adjust the script to that model’s specifics.
Setting Up the Gemini API Key
Before our script can see anything, we need to connect it to a brain through an LLM API. Google handles this through Google AI Studio. It is the fastest way to get a key without getting lost in the broader Google Cloud bureaucracy.
- Access AI Studio – Head over to aistudio.google.com and sign in with your standard Google account.
- Navigate – In the left-hand sidebar, click on “API keys.”
- Create a Key – Click “Create API key.” This generates a long string of characters (usually starting with AIza).
- Secure It – Copy that key immediately. Treat this like your house keys.
Important: For illustrative purposes and personal use, we will hardcode the API key in the script. That is fine for local testing, but do not hardcode it in any script that you plan to upload to GitHub. For production, use an environment variable.
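If you want to go the environment variable route from day one, here is a minimal sketch of what that could look like. The helper name load_api_key and the variable name GEMINI_API_KEY are assumptions made for this example, not something the script below requires; use whatever name you exported in your shell.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are hypothetical;
    adjust them to match your own setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a readable message instead of a confusing API error
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the script."
        )
    return key


# In the script, the client could then be created like this:
# client = genai.Client(api_key=load_api_key())
```

On macOS or Linux you would set the variable with export GEMINI_API_KEY="your-key" before running the script, so the key never lands in a file that can be committed by accident.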
The Code
We’ll use Python and Google’s Gen AI SDK (the google-genai library) with the Gemini 3 Flash model, because it’s fast and handles image reasoning exceptionally well.
We have added a few safeguards, like a fixed delay between requests to stay under rate limits, per-image error handling, and MIME type detection based on the file extension, to make sure it doesn’t crash halfway through your folder.
First, install the dependencies: pip install google-genai
Then, use the script:
import os
import time

from google import genai
from google.genai import types

# Initialize the client with your API key
client = genai.Client(api_key="YOUR_API_KEY_HERE")

def generate_alt_text(client, image_path):
    """
    Sends an image to Gemini and returns a concise alt text description.
    """
    with open(image_path, 'rb') as f:
        image_bytes = f.read()

    # Detect MIME type based on file extension
    ext = image_path.lower()
    if ext.endswith('.png'):
        mime_type = "image/png"
    elif ext.endswith('.webp'):
        mime_type = "image/webp"
    else:
        mime_type = "image/jpeg"

    # Generate alt text using Gemini 3 Flash
    # Using "low" thinking level instead of "minimal" to avoid thought signature requirements
    response = client.models.generate_content(
        model="gemini-3-flash-preview",
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type=mime_type),
            "Write a concise alt text description for this image (max 125 characters). "
            "Focus on the subject and context for accessibility. Avoid 'image of' or 'picture of'."
        ],
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_level="low")
        )
    )
    return response.text.strip()

def process_folder(client, folder_path):
    """Process all images in the specified folder."""
    if not os.path.exists(folder_path):
        print(f"❌ Folder not found: {folder_path}")
        return

    print(f" Scanning folder: {folder_path}\n")

    # Find all supported image files
    image_files = [f for f in os.listdir(folder_path)
                   if f.lower().endswith(('.png', '.jpg', '.jpeg', '.webp'))]

    if not image_files:
        print("⚠️ No image files found in the folder.")
        print(" Supported formats: .png, .jpg, .jpeg, .webp")
        return

    print(f"Found {len(image_files)} image(s) to process\n")

    # Process each image
    successful = 0
    failed = 0
    for i, filename in enumerate(image_files, 1):
        path = os.path.join(folder_path, filename)
        try:
            print(f" [{i}/{len(image_files)}] Analyzing: {filename}")
            alt_text = generate_alt_text(client, path)
            print(f"✅ Alt Text: {alt_text}\n")
            successful += 1
            # Add delay between images to avoid rate limits (except after last image)
            if i < len(image_files):
                time.sleep(2)
        except Exception as e:
            print(f"❌ Error processing {filename}: {e}\n")
            failed += 1

    # Summary
    print(f"\n{'='*50}")
    print(f"✅ Successfully processed: {successful}/{len(image_files)}")
    if failed > 0:
        print(f"❌ Failed: {failed}/{len(image_files)}")
    print(f"{'='*50}")

def main():
    """Main entry point for the script."""
    # Image folder path - UPDATE THIS to your actual folder path
    IMAGE_DIR = 'my_images'
    print(f" Looking for images in: {os.path.abspath(IMAGE_DIR)}\n")
    try:
        # Process all images
        process_folder(client, IMAGE_DIR)
    finally:
        # Clean up client resources
        client.close()
        print("\n✅ Client connection closed")


if __name__ == "__main__":
    main()

Technical Details to Keep in Mind
Specifying the Directory
In the main() function, you will see a variable named IMAGE_DIR. This is where you tell the script where your files are hiding.
- Relative Path – If your images are in a folder named assets inside your script’s directory, set it to 'assets'. Relative paths resolve against the directory you run the script from, so launch it from the script’s own directory.
- Absolute Path – If they are on your desktop, use the full path like '/Users/YourName/Desktop/Images'.
The script uses os.path.abspath() to print the exact location it is searching, which is a great sanity check if the script says it can’t find your files.
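If editing IMAGE_DIR by hand gets old, one option is to pass the folder on the command line instead. The sketch below uses Python’s standard argparse module; the parse_args helper and the alt_text.py script name are hypothetical names chosen for this example.

```python
import argparse
import os


def parse_args(argv=None):
    """Read the image folder from the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Draft alt text for every image in a folder."
    )
    parser.add_argument(
        "folder",
        nargs="?",             # the argument is optional
        default="my_images",   # same default as IMAGE_DIR in the script
        help="Relative or absolute path to the image folder",
    )
    return parser.parse_args(argv)


# Simulates running: python alt_text.py assets
args = parse_args(["assets"])
# Same sanity check as the script: show the fully resolved location
print(f"Looking for images in: {os.path.abspath(args.folder)}")
```

In main(), you would call parse_args() with no arguments so it reads the real command line, then hand args.folder to process_folder() in place of IMAGE_DIR.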
The “Thinking” Level
We are using thinking_level="low". In the new Gemini 3 models, “Thinking” allows the model to reason through complex visual data before outputting text.
For simple alt text, “high” reasoning is overkill and adds unnecessary latency. “Low” provides enough smarts to distinguish a cat from a cushion without making you wait ten seconds per image.
MIME Type Logic
The script includes a small logic block that maps the file extension to a MIME type: image/png for .png, image/webp for .webp, and image/jpeg for everything else (including .jpg and .jpeg). Vision models need to know the “MIME type” to process the binary data correctly. This prevents the script from throwing a generic error just because you mixed different file types in the same folder.
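Extension checks trust the filename, which can be wrong (a PNG saved as photo.jpg, for example). If that worries you, a possible hardening step is to look at the first few bytes of the file instead. This is a sketch of that alternative, not what the script above does, and sniff_mime_type is a hypothetical helper name.

```python
def sniff_mime_type(image_bytes: bytes) -> str:
    """Guess the MIME type from the file's leading bytes (hypothetical helper)."""
    # PNG files start with a fixed 8-byte signature
    if image_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # JPEG files start with the bytes FF D8 FF
    if image_bytes.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # WebP files are RIFF containers with "WEBP" at byte offset 8
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("Unsupported or unrecognized image format")
```

Since generate_alt_text() already reads the whole file into image_bytes, you could call sniff_mime_type(image_bytes) there and let the ValueError be caught by the per-image error handling in process_folder().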
The Final Move
SEO automation is a tool for efficiency, but it isn’t a replacement for human empathy. While the AI is excellent at identifying “A person wearing a blue shirt holding a laptop,” it might miss the specific brand of that laptop or the emotional tone of your marketing campaign.
The best workflow is to run this script to generate a baseline. You can easily modify the code to save these results into a CSV file.
Once you have that list, spend ten minutes scanning it for accuracy. It is significantly faster to edit an AI’s homework than it is to write the entire assignment from scratch.
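For the CSV step mentioned above, here is one way it could look using Python’s built-in csv module. The save_alt_texts helper, the alt_text.csv filename, and the column names are assumptions made for this sketch.

```python
import csv


def save_alt_texts(rows, csv_path="alt_text.csv"):
    """Write (filename, alt_text) pairs to a CSV file for review.

    Hypothetical helper: the "length" column makes it easy to spot
    descriptions that ran past the 125-character target in the prompt.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "alt_text", "length"])
        for filename, alt_text in rows:
            writer.writerow([filename, alt_text, len(alt_text)])
```

To wire it in, process_folder() could collect each (filename, alt_text) pair in a list as it goes and pass that list to save_alt_texts() after the loop. Open the file in any spreadsheet app and your ten-minute review becomes a quick scroll.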

