# Comic Story Generator: Code Handover Document

**Date:** 2025-07-22
**Document Purpose:** This document provides a comprehensive technical handover for the Comic Story Generator project. It is intended for developers and future maintainers responsible for the deployment, maintenance, and extension of the application.

---

## 1. Project Overview

The Comic Story Generator is a web application that creates textless comic stories of 1 to 10 pages from a user-provided description. It uses generative AI to produce visually coherent narratives, with a focus on character consistency, expressive emotion, and logical panel sequencing.

### 1.1. Core Functionality

The application is designed to translate a textual story concept into a purely visual comic strip. Key characteristics include:

*   **AI-Powered Narrative:** Uses Google's Gemini to interpret the user's concept and break it down into a structured, panel-by-panel narrative.
*   **Visual Generation:** Uses OpenAI's GPT-Image model to render complete comic pages from the AI-generated narrative structure.
*   **Intelligent Panel Detection:** Uses Gemini Vision to analyze the generated full-page image and detect the boundaries of each panel, so the page is split along the detected borders rather than along an assumed grid.
*   **Customization:** Offers users control over the output, including:
    *   **Layout:** Choice of panel count (from 4 to 24).
    *   **Length:** Generation of 1 to 10 pages.
    *   **Art Style:** A selection of visual styles, including "Classic Comic," "Manga," "Cartoon," "Digital Paint," and a high-contrast "Accessible" style designed for users with special needs.
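
The limits above can be enforced before any API call is made. The following is a minimal sketch of such a check; the `ComicSettings` class and its field names are hypothetical and may not match the actual code, while the ranges and style names are taken from the list above.

```python
from dataclasses import dataclass

# Style names as listed in this document.
ART_STYLES = ("Classic Comic", "Manga", "Cartoon", "Digital Paint", "Accessible")


@dataclass(frozen=True)
class ComicSettings:
    """Hypothetical container for the user-selectable options."""

    panel_count: int = 4
    pages: int = 1
    art_style: str = "Classic Comic"

    def __post_init__(self) -> None:
        # Reject out-of-range values early, before any paid API request.
        if not 4 <= self.panel_count <= 24:
            raise ValueError("panel_count must be between 4 and 24")
        if not 1 <= self.pages <= 10:
            raise ValueError("pages must be between 1 and 10")
        if self.art_style not in ART_STYLES:
            raise ValueError(f"unknown art style: {self.art_style!r}")
```

Validating in `__post_init__` means an invalid combination can never exist as an object, so downstream code does not need to re-check the values.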

### 1.2. High-Level Workflow

The generation process follows a six-step pipeline:

1.  **User Input:** The user submits a short description of the desired story.
2.  **Story Generation:** The `StoryGenerator` component uses Gemini to create a detailed, scene-by-scene description for each comic panel.
3.  **Page Generation:** The `ComicGenerator` takes the panel descriptions and instructs the GPT-Image model to generate a single, composite image representing a full comic page with panels arranged in a grid.
4.  **Layout Analysis:** The generated page is passed to the `GeminiVision` component, which analyzes the image to identify the precise coordinates and boundaries of each panel.
5.  **Panel Splitting:** The application uses the coordinates from the vision analysis to accurately split the composite image into individual panel images.
6.  **Final Output:** The processed panels are presented to the user as a complete, multi-page visual story.
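
Steps 2 to 5 can be sketched as a single function for one page. The method names follow the class diagram in Section 2.1; the `generate_comic_page` function and the `Fake*` stand-in classes are illustrative assumptions, not the actual orchestration code in `app.py`.

```python
from typing import Any, List


def generate_comic_page(description: str, story_gen: Any, comic_gen: Any, vision: Any) -> List[Any]:
    """Run steps 2-5 of the workflow for a single page."""
    panel_descriptions = story_gen.generate_story(description)  # Step 2
    page_image = comic_gen.generate_page(panel_descriptions)    # Step 3
    layout = vision.analyze_layout(page_image)                  # Step 4
    return comic_gen.split_panels(page_image, layout)           # Step 5


# Stand-ins so the sketch runs without API access; they are not the real classes.
class FakeStory:
    def generate_story(self, description):
        return [f"{description}, panel {i}" for i in (1, 2)]


class FakeComic:
    def generate_page(self, panel_descriptions):
        return {"page": panel_descriptions}

    def split_panels(self, page_image, layout):
        return [page_image["page"][i] for i in layout["order"]]


class FakeVision:
    def analyze_layout(self, page_image):
        return {"order": list(range(len(page_image["page"])))}
```

Passing the three components in as arguments keeps the pipeline independent of how each one talks to its AI service, which also makes it straightforward to exercise with stand-ins like the ones shown.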

---

## 2. System Architecture

The application is built on a modular architecture composed of three primary classes, each responsible for a distinct part of the generation pipeline.

### 2.1. System Diagram

```mermaid
classDiagram
    class StoryGenerator{
        +generate_story(description: string) : list[string]
        +enhance_visuals(panel_descriptions: list) : list[string]
    }
    class ComicGenerator{
        +generate_page(panel_descriptions: list) : Image
        +split_panels(page_image: Image, grid_layout: dict) : list[Image]
    }
    class GeminiVision{
        +analyze_layout(page_image: Image) : dict
    }
    
    StoryGenerator "1" -- "1" ComicGenerator : Provides panel descriptions
    ComicGenerator "1" -- "1" GeminiVision : Uses for layout analysis
```

### 2.2. Data Flow

The end-to-end data flow illustrates the interaction between the user, the application, and the underlying AI models.

```mermaid
sequenceDiagram
    participant User
    participant App
    participant Gemini as Gemini (Text/Story)
    participant GPTImage as GPT-Image (Visuals)
    participant GeminiVision as Gemini Vision (Analysis)

    User->>+App: Submits story description
    App->>+Gemini: Requests story structure from description
    Gemini-->>-App: Returns panel-by-panel text descriptions
    App->>+GPTImage: Requests comic page generation from descriptions
    GPTImage-->>-App: Returns single full-page image
    App->>+GeminiVision: Requests layout analysis of the image
    GeminiVision-->>-App: Returns coordinates of each panel
    App->>App: Splits the image into panels using the coordinates
    App-->>-User: Displays final, split-panel comic
```

---

## 3. Setup & Installation

### 3.1. Prerequisites

*   **Python:** Version 3.9 or higher.
*   **API Keys:**
    *   An active OpenAI API key.
    *   An active Google API key with access to the Gemini family of models.

### 3.2. Installation Steps

1.  **Clone the Repository:**
    ```bash
    git clone https://github.com/yourusername/Comic-Story-Generator.git
    cd Comic-Story-Generator
    ```

2.  **Create and Activate a Virtual Environment:**
    ```bash
    # Create the environment
    python -m venv venv
    
    # Activate the environment (macOS/Linux)
    source venv/bin/activate
    
    # Or, activate on Windows
    # venv\Scripts\activate
    ```

3.  **Install Dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

4.  **Configure Environment Variables:**
    Create a `.env` file in the project root and add your API keys.
    ```bash
    # Note: ">" overwrites any existing .env file; ">>" appends to it.
    echo "OPENAI_API_KEY=your_openai_key" > .env
    echo "GOOGLE_API_KEY=your_google_key" >> .env
    ```
    *Note: Ensure the `.env` file is added to your `.gitignore` file to prevent committing secrets.*

---

## 4. Environment Variables / Secrets

The application requires the following environment variables to be set in a `.env` file at the project's root.

| Variable | Description | Required | Example |
| :--- | :--- | :--- | :--- |
| `OPENAI_API_KEY` | API key for the OpenAI service, used for GPT-Image generation. | Yes | `sk-xxxxxxxxxxxxxxxxxxxxxxxx` |
| `GOOGLE_API_KEY` | API key for Google AI services, used for Gemini (story structure) and Gemini Vision (layout analysis). | Yes | `AIzaSyxxxxxxxxxxxxxxxxxxxxx` |
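
A startup check along these lines can surface a missing key before the first API call fails. This is a sketch only: the `missing_keys` helper is hypothetical, and it reads from the process environment, so it assumes the `.env` file has already been loaded (for example with `load_dotenv()` from `python-dotenv`).

```python
import os
from typing import List, Mapping

# Variable names from the table above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

At startup, the application could call `missing_keys()` and exit with a clear message if the returned list is not empty.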

---

## 5. How to Run

After completing the setup and installation steps, launch the application with the following command from the project's root directory:

```bash
python app.py
```

The application will start a local web server, and the interface will be accessible at the URL provided in the console (typically `http://127.0.0.1:7860`).

---

## 6. Deployment Instructions

[TODO] This section requires documentation for deploying the application to a production environment. Steps should include:
*   Recommended hosting provider (e.g., AWS, Heroku, DigitalOcean).
*   Instructions for setting up a production-grade web server (e.g., Gunicorn).
*   Configuration of a reverse proxy (e.g., Nginx).
*   Management of production environment variables/secrets.
*   Process management (e.g., using `systemd`).

---

## 7. Core Components & Logic

The application logic is encapsulated in three main classes.

### 7.1. `StoryGenerator`

*   **Responsibility:** Handles the narrative creation phase.
*   **`generate_story()`:** Takes the raw user description as input. It constructs a prompt for the Gemini model to elicit a structured response containing a list of detailed text descriptions, one for each comic panel.
*   **`enhance_visuals()`:** Processes the panel descriptions to add specific visual cues and optimizations, particularly for the "Accessible" style, ensuring high contrast and simplified object representation.
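
The prompt-and-parse step behind `generate_story()` could look roughly like the sketch below. The function names, the prompt wording, and the choice of a JSON array as the reply format are all assumptions made for illustration; the actual prompt and response handling may differ.

```python
import json
from typing import List


def build_story_prompt(description: str, panel_count: int) -> str:
    """Build a prompt asking for one description per panel (wording is hypothetical)."""
    return (
        f"Break the following story into exactly {panel_count} comic panels. "
        "The comic is textless, so describe only what is visible. "
        "Reply with a JSON array of strings, one per panel.\n\n"
        f"Story: {description}"
    )


def parse_panel_descriptions(raw: str, panel_count: int) -> List[str]:
    """Extract the JSON array from a model reply and validate its shape."""
    # Models sometimes add text around the JSON, so locate the array itself.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in model response")
    panels = json.loads(raw[start : end + 1])
    if not all(isinstance(p, str) for p in panels):
        raise ValueError("every panel description must be a string")
    if len(panels) != panel_count:
        raise ValueError(f"expected {panel_count} panels, got {len(panels)}")
    return [p.strip() for p in panels]
```

Checking the panel count at parse time catches a malformed reply before it reaches the more expensive image generation step.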

### 7.2. `ComicGenerator`

*   **Responsibility:** Manages the visual generation and processing of the comic page.
*   **`generate_page()`:** Aggregates the panel descriptions from `StoryGenerator` into a single, complex prompt for the GPT-Image model. This prompt instructs the AI to create one composite image with all panels laid out in a grid.
*   **`split_panels()`:** Receives the generated page image and the layout data from `GeminiVision`. It uses the detected panel coordinates to crop the page into individual panel images.

### 7.3. `GeminiVision`

*   **Responsibility:** Performs visual analysis on the generated comic page.
*   **`analyze_layout()`:** This is the core of the intelligent panel-splitting feature. It takes the full-page image as input and uses the Gemini Vision model to visually identify the boundaries of each panel. It returns a dictionary containing the coordinates and dimensions of the detected grid, which is more robust than assuming a fixed grid layout.
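
The hand-off between `analyze_layout()` and `split_panels()` can be sketched as a coordinate conversion. The exact shape of the layout dictionary is not documented here, so this example assumes a hypothetical `"panels"` key holding `[ymin, xmin, ymax, xmax]` boxes normalized to a 0-1000 scale, the convention Gemini's documentation describes for bounding boxes. The `to_crop_boxes` helper is likewise illustrative.

```python
from typing import Dict, List, Tuple

# Pixel box in the (left, upper, right, lower) order that Pillow's Image.crop() expects.
Box = Tuple[int, int, int, int]


def to_crop_boxes(layout: Dict[str, list], width: int, height: int) -> List[Box]:
    """Convert normalized panel boxes into pixel crop boxes for a page image."""
    boxes: List[Box] = []
    for ymin, xmin, ymax, xmax in layout["panels"]:  # assumed key and box order
        # Scale from the 0-1000 range to pixels and clamp to the image bounds.
        left = max(0, round(xmin / 1000 * width))
        upper = max(0, round(ymin / 1000 * height))
        right = min(width, round(xmax / 1000 * width))
        lower = min(height, round(ymax / 1000 * height))
        # Skip degenerate boxes that would produce an empty crop.
        if right > left and lower > upper:
            boxes.append((left, upper, right, lower))
    return boxes
```

With Pillow, each resulting tuple can be passed directly to `page_image.crop(box)` to obtain one panel image.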

---

## 8. Third-party Dependencies

The complete list of Python packages is specified in `requirements.txt`. Key dependencies include:

*   **`openai`**: Python client for the OpenAI API.
*   **`google-generativeai`**: Python client for the Google AI (Gemini) API.
*   **`python-dotenv`**: For loading environment variables from the `.env` file.
*   **`Pillow`**: For image manipulation (cropping and saving).
*   **[Info Needed]**: The web framework used to build `app.py` (e.g., `gradio`, `flask`, `fastapi`).

---

## 9. Testing Instructions

[TODO] A testing framework has not been established for this project. Future work should include:
*   **Test Suite Setup:** Choose and configure a testing framework (e.g., `pytest`).
*   **Unit Tests:** Create unit tests for individual methods in `StoryGenerator`, `ComicGenerator`, and `GeminiVision`. This should involve mocking the API calls to AI services to test the data processing logic in isolation.
*   **Integration Tests:** Develop tests for the entire generation pipeline, from user input to final split panels.
*   **Continuous Integration:** Set up a CI pipeline (e.g., using GitHub Actions) to run tests automatically on pull requests.
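
As a starting point for the unit tests described above, the sketch below mocks the AI client with the standard library's `unittest.mock`. The `StoryGenerator` shown here is a simplified stand-in, and its injected `client` with a `generate()` method is an assumption; the real class may construct and call its client differently.

```python
from unittest.mock import Mock


class StoryGenerator:
    """Simplified, hypothetical stand-in for the real class."""

    def __init__(self, client):
        self.client = client  # injected so tests can replace it with a mock

    def generate_story(self, description: str) -> list:
        reply = self.client.generate(f"Describe comic panels for: {description}")
        # One panel description per non-empty line of the reply.
        return [line.strip() for line in reply.splitlines() if line.strip()]


def test_generate_story_splits_reply_into_panels():
    client = Mock()
    client.generate.return_value = "A fox finds a map\n\nThe fox follows it\n"

    panels = StoryGenerator(client).generate_story("a fox adventure")

    assert panels == ["A fox finds a map", "The fox follows it"]
    client.generate.assert_called_once()  # no real API request is made
```

Because the test is a plain function with bare assertions, `pytest` would collect and run it without further configuration.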

---

## 10. Troubleshooting & Common Issues

[TODO] This section should be populated as common issues are identified. Potential areas to document include:
*   **API Key Errors:** Steps to verify that API keys are correctly configured and have the necessary permissions.
*   **Incoherent Stories:** Guidance on how to write effective initial descriptions to improve narrative quality.
*   **Poor Panel Splitting:** Troubleshooting steps for when Gemini Vision fails to detect the layout correctly (e.g., checking image complexity, trying a different art style).
*   **Long Generation Times:** Explanation of typical performance and factors that can cause delays (e.g., API provider latency, number of panels).

---

## 11. TODOs / Future Work

Based on the project's focus areas, the following are key areas for future development and contribution:

*   **Core Generation Logic:**
    *   Improve character consistency across multiple pages.
    *   Experiment with different AI models for potentially better visual or narrative results.
    *   Add support for including text (dialogue, captions) as an optional feature.
*   **UI/UX Enhancements:**
    *   Develop a more interactive interface for viewing and arranging panels.
    *   Allow users to regenerate individual panels without restarting the entire process.
    *   Add an option to export the final comic as a PDF or other formats.
*   **Accessibility Improvements:**
    *   Further refine the "Accessible" art style based on user feedback.
    *   Implement ARIA attributes and ensure full keyboard navigability for the web interface.
    *   Add an "image description" feature where a text-to-speech engine can describe the generated panels.
*   **Documentation:**
    *   Create a detailed API reference for developers looking to build on the platform.
    *   Write user-facing guides on how to get the best results from the generator.

---

## 12. Contact / Ownership Info

*   **Source Code:** [https://github.com/yourusername/Comic-Story-Generator](https://github.com/yourusername/Comic-Story-Generator)
*   **License:** This project is licensed under the **MIT License**. For full details, see the `LICENSE` file in the repository.
*   **Primary Contact:** [Info Needed: Add primary maintainer's name and contact information (e.g., GitHub handle or email).]