Gate2Ai Prompt Extractor

Web Scraper, CSV Formatter, and Metadata Generator

Overview

This script combines web scraping, CSV formatting, and metadata generation in one tool. It scrapes prompts from Gate2AI, formats them into a CSV file, optionally generates variations, and creates metadata for different versions of the prompts. A graphical user interface collects the inputs and shows progress updates throughout the process.

Technologies Used

Python: The primary programming language used.
customtkinter: For creating the graphical user interface.
Selenium: For web scraping.
google.generativeai: For generating prompt variations and metadata using Google's Gemini AI model.
csv: For reading and writing CSV files.
chardet: For detecting the character encoding of files.
threading: For running tasks concurrently.
queue: For thread-safe data exchange.
webdriver_manager: For managing WebDriver binaries.
time: For adding delays in the script.
os and sys: For operating-system-dependent functionality.
re: For regular expression operations.
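
As an illustration of how threading and queue can work together to deliver progress updates, here is a minimal sketch. It is not taken from the script itself: the function names `worker` and `run_with_progress` are hypothetical, and the `time.sleep` call is only a stand-in for real scraping or API work.

```python
import queue
import threading
import time


def worker(items, progress_queue):
    """Process items in a background thread and report progress."""
    total = len(items)
    for index, item in enumerate(items, start=1):
        time.sleep(0.01)  # stand-in for scraping or an API call (assumption)
        progress_queue.put((index, total, item))
    progress_queue.put(None)  # sentinel: no more updates will follow


def run_with_progress(items):
    """Start the worker thread and collect its progress messages."""
    progress_queue = queue.Queue()
    thread = threading.Thread(
        target=worker, args=(items, progress_queue), daemon=True
    )
    thread.start()

    messages = []
    while True:
        update = progress_queue.get()  # blocks until the worker posts
        if update is None:
            break
        index, total, item = update
        messages.append(f"{index}/{total}: {item}")
    thread.join()
    return messages


if __name__ == "__main__":
    for line in run_with_progress(["prompt A", "prompt B", "prompt C"]):
        print(line)
```

In a GUI, the blocking `get()` loop would normally be replaced by periodic polling from the main thread (for example with the widget `after` method) so the window stays responsive.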

Usage

Go to Gate2AI and select a Midjourney category.
Copy the URL of the category page. It should look like this: https://www.gate2ai.com/prompts-midjourney/{category}
Run the script and enter the following information in the GUI:
- URL to scrape (the one you copied)
- Output file name (e.g., "art.csv" if you're scraping the art category)
- Starting serial number for the prompts
- File name prefix for output files
- Three API keys for Google's Gemini AI (create a free account to get the API keys)
Optionally, check the "Generate Variations" box if you want to create variations of the scraped prompts.
Click "Start Process" to begin.
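
Before starting, it can help to check that the copied URL matches the expected category pattern. The sketch below is only an illustration under that assumption: `extract_category` and `default_output_name` are hypothetical helpers, not functions from the script, and the script may validate its input differently or not at all.

```python
import re

# Pattern based on the URL shape shown above:
# https://www.gate2ai.com/prompts-midjourney/{category}
CATEGORY_URL = re.compile(
    r"^https://www\.gate2ai\.com/prompts-midjourney/([^/?#]+)/?$"
)


def extract_category(url):
    """Return the category part of the URL, or None if it does not match."""
    match = CATEGORY_URL.match(url.strip())
    return match.group(1) if match else None


def default_output_name(url):
    """Suggest an output file name such as 'art.csv' for the art category."""
    category = extract_category(url)
    return f"{category}.csv" if category else None


if __name__ == "__main__":
    print(default_output_name("https://www.gate2ai.com/prompts-midjourney/art"))
```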

The script will scrape prompts from the provided URL, format them into a CSV file, optionally generate variations, and create metadata for different versions (V1, V2, V3, V4) of the prompts.
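
The CSV formatting step with a starting serial number could look roughly like the sketch below. The column names `serial` and `prompt` and the function `write_prompts` are assumptions made for illustration; the script's actual CSV layout may differ.

```python
import csv
import io


def write_prompts(prompts, stream, start_serial=1):
    """Write one row per prompt, numbering rows from start_serial."""
    writer = csv.writer(stream)
    writer.writerow(["serial", "prompt"])  # hypothetical column names
    for serial, prompt in enumerate(prompts, start=start_serial):
        # csv.writer quotes fields that contain commas automatically
        writer.writerow([serial, prompt])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_prompts(["a cat, studio light", "a dog"], buffer, start_serial=5)
    print(buffer.getvalue())
```

To write to disk instead, open the file with `open(name, "w", newline="", encoding="utf-8")` and pass the file object as `stream`.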

Note: The script uses multiple API keys because a single API key cannot handle a large number of requests. It automatically switches to the next API key after processing 350 prompts with the current one.
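
The key-switching rule can be sketched as simple index arithmetic. This is an illustration, not the script's code: `key_for_prompt` is a hypothetical helper, and wrapping back to the first key after the last one is an assumption, since the note above only says the script moves to the next key.

```python
def key_for_prompt(prompt_index, api_keys, prompts_per_key=350):
    """Return the API key to use for the prompt at a 0-based index.

    Every block of `prompts_per_key` prompts uses one key; wrapping
    around to the first key after the last is an assumption.
    """
    key_index = (prompt_index // prompts_per_key) % len(api_keys)
    return api_keys[key_index]


if __name__ == "__main__":
    keys = ["KEY_A", "KEY_B", "KEY_C"]  # placeholder values
    for index in (0, 349, 350, 700, 1050):
        print(index, key_for_prompt(index, keys))
```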

At the end of the process, you'll find:

A CSV file containing the scraped (and optionally varied) prompts
Metadata files for V1, V2, V3, and V4 versions of the prompts
Reduced metadata files with a maximum of 48 keywords per prompt
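
Capping the keywords at 48 could be done along the lines of the sketch below. It assumes the keywords are stored as a single comma-separated string, which the source does not confirm; `reduce_keywords` is a hypothetical name, and dropping duplicates before truncating is an extra assumption.

```python
def reduce_keywords(keywords, limit=48):
    """Keep at most `limit` keywords from a comma-separated string.

    Blank entries and case-insensitive duplicates are dropped first
    (an assumption), then the first `limit` keywords are kept in order.
    """
    seen = set()
    kept = []
    for raw in keywords.split(","):
        keyword = raw.strip()
        key = keyword.lower()
        if keyword and key not in seen:
            seen.add(key)
            kept.append(keyword)
    return ", ".join(kept[:limit])


if __name__ == "__main__":
    many = ", ".join(f"k{i}" for i in range(60))
    print(len(reduce_keywords(many).split(", ")))
```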

Image Naming Convention and Metadata Correlation

The script relies on a specific naming convention for the generated images and their corresponding metadata. This convention maintains the relationship between prompts, generated images, and metadata, especially when uploading to platforms like Freepik.

Image Naming

When you generate images using Midjourney based on the prompts from this script, name your images as follows:

[prefix]V1-[number].jpg
[prefix]V2-[number].jpg
[prefix]V3-[number].jpg
[prefix]V4-[number].jpg

Where:

[prefix] is the file name prefix you entered in the script
[number] is the serial number of the prompt

For example, if your prefix is "art" and you're working with the first prompt, your image files should be named:

artV1-1.jpg
artV2-1.jpg
artV3-1.jpg
artV4-1.jpg
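
The same names can be produced programmatically. The helper below is a small sketch of the convention described above; `image_names` is a hypothetical function, not part of the script.

```python
def image_names(prefix, number, versions=("V1", "V2", "V3", "V4"), extension="jpg"):
    """Build the expected image file names for one prompt."""
    return [f"{prefix}{version}-{number}.{extension}" for version in versions]


if __name__ == "__main__":
    for name in image_names("art", 1):
        print(name)
```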

Metadata Correlation

The metadata files generated by the script follow the same naming convention. This ensures that when you upload the images to Freepik or similar platforms, the metadata will automatically be connected to the corresponding images.
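
One way to check that images and metadata line up before uploading is sketched below. It assumes each metadata entry carries the image file name it belongs to, which the source does not spell out; `parse_image_name` and `missing_images` are hypothetical helpers.

```python
import re

# Matches names such as artV1-1.jpg: prefix, version V1..V4, serial number.
NAME_PATTERN = re.compile(
    r"^(?P<prefix>.+)(?P<version>V[1-4])-(?P<number>\d+)\.jpg$"
)


def parse_image_name(filename):
    """Split a file name into (prefix, version, number), or return None."""
    match = NAME_PATTERN.match(filename)
    if not match:
        return None
    return match.group("prefix"), match.group("version"), int(match.group("number"))


def missing_images(metadata_filenames, image_filenames):
    """Return metadata file names that have no matching image, sorted."""
    return sorted(set(metadata_filenames) - set(image_filenames))


if __name__ == "__main__":
    print(parse_image_name("artV2-17.jpg"))
    print(missing_images(["artV1-1.jpg", "artV1-2.jpg"], ["artV1-1.jpg"]))
```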

This system allows you to keep accurate records of which prompts correspond to which images and their associated metadata, streamlining your workflow and maintaining consistency across your generated content.

Use the generated prompt file to create images with your Midjourney bot, and follow this naming convention so the images integrate smoothly with the metadata files.