Image Generation Agent

This example demonstrates how to create an agent that generates images based on text prompts using Dhenara Agent DSL (DAD). The Image Generation Agent showcases how DAD can integrate with different AI modalities beyond text.

Agent Overview

The Image Generation Agent can:

Accept text prompts defining the images to generate
Select appropriate image generation models
Generate multiple images with different parameters
Save the generated images to a specified directory

This example demonstrates integrating with image generation models like DALL-E, Imagen, and Midjourney through their respective API interfaces.

Agent Structure

The Image Generation Agent consists of the following components:

src/agents/image_gen/
├── __init__.py
├── agent.py           # Main agent definition
├── handler.py         # Event handlers
└── flows/             # Flow definitions
    ├── coordinator.py     # Main coordinator flow
    └── image_generator.py # Image generation flow

Agent Definition

The main agent definition connects the coordinator flow:

from dhenara.agent.dsl import AgentDefinition

from .flows.coordinator import coordinator_flow

agent = AgentDefinition()
agent.flow(
    "imagegen_flow",
    coordinator_flow,
)

Data Models

The agent uses structured data models to define image generation tasks:

from dhenara.agent.types import BaseModel

class ImageTaskSpec(BaseModel):
    filename: str   # Base filename for the generated image
    prompt: str     # Text prompt describing the image to generate

class ImageTask(BaseModel):
    task_specifications: list[ImageTaskSpec]  # List of image specs to generate

Image Generation Flow

The core image generation flow handles single image generation:

from dhenara.agent.dsl import (
    PLACEHOLDER, AIModelNode, AIModelNodeSettings, EventType, FlowDefinition
)
from dhenara.ai.types import AIModelCallConfig, Prompt

# Define available models with their parameters
models_with_options = {
    "gpt-image-1": {
        "quality": "medium",
        "size": "1536x1024",
        "n": 2,
    },
    "imagen-3.0-generate": {
        "aspect_ratio": "16:9",
        "number_of_images": 1,
        "person_generation": "dont_allow",
    },
    "imagen-3.0-fast-generate": {
        "aspect_ratio": "16:9",
        "number_of_images": 1,
        "person_generation": "dont_allow",
    },
}

# Image directory configuration
global_image_directory = "$var{run_root}/global_images"

# Single image generation flow
single_img_flow = FlowDefinition()
single_img_flow.vars(
    {
        "task_spec": PLACEHOLDER,  # Will be populated by the multi-image flow
    }
)

# Image generation node
single_img_flow.node(
    "image_generator",
    AIModelNode(
        models=list(models_with_options.keys()),
        pre_events=[EventType.node_input_required],
        settings=AIModelNodeSettings(
            system_instructions=[
                "You are a professional image generation agent specialized in executing precise image operations.",
            ],
            prompt=Prompt.with_dad_text("$expr{task_spec.prompt}. \nCreate image\n"),
            model_call_config=AIModelCallConfig(
                test_mode=False,
                options={},  # Options will be set by the model handler
            ),
            save_generated_bytes=True,  # Save the images
            bytes_save_path=global_image_directory,
            bytes_save_filename_prefix="$expr{task_spec.filename}",
        ),
    ),
)

# Multi-image flow for batch processing
multi_image_flow = FlowDefinition()
multi_image_flow.vars(
    {
        "image_task": PLACEHOLDER,  # Will be populated by the coordinator
    }
)

# Process each image task specification
multi_image_flow.for_each(
    id="image_gen_loop",
    statement="$expr{image_task.task_specifications}",
    item_var="task_spec",
    max_iterations=20,
    body=single_img_flow,
)

Coordinator Flow

The coordinator flow manages the overall process, including task planning and execution:

from dhenara.agent.dsl import FlowDefinition
from dhenara.agent.dsl.inbuilt.flow_nodes.command import CommandNode, CommandNodeSettings
from dhenara.ai.types import ObjectTemplate

from ...autocoder.flows.planner import planner_flow
from .image_generator import multi_image_flow

# Load task information
task_background = read_background()
task_description = read_description()

# Coordinator flow definition
coordinator_flow = FlowDefinition()

# First, run the planner to generate the image task
coordinator_flow.subflow(
    "planner",
    planner_flow,
    variables={
        "task_background": task_background,
        "task_description": task_description,
    },
)

# Based on planning results, generate images
coordinator_flow.conditional(
    id="plan_executor",
    statement=ObjectTemplate(
        expression="$expr{py: $hier{planner.plan_generator}.outcome.structured is not None}",
    ),
    true_branch=multi_image_flow,
    true_branch_variables={
        "image_task": "$expr{$hier{planner.plan_generator}.outcome.structured}",
    },
    false_branch=FlowDefinition().node(
        "no_plan_generated",
        CommandNode(
            settings=CommandNodeSettings(
                commands=["echo 'Planner is unsuccessful.'"],
            )
        ),
    ),
)

Event Handler

The event handler manages input for the AI model nodes, including model selection and options:

from dhenara.agent.dsl import FlowNodeTypeEnum, NodeInputRequiredEvent
from dhenara.agent.utils.helpers.terminal import get_ai_model_node_input

from .flows.image_generator import ImageTask, models_with_options

async def image_gen_input_handler(event: NodeInputRequiredEvent):
    if event.node_type == FlowNodeTypeEnum.ai_model_call:
        node_input = None

        if event.node_id == "plan_generator":
            # For the planner, select from text generation models
            node_input = await get_ai_model_node_input(
                node_def_settings=event.node_def_settings,
                models=plan_gen_models,
                models_with_options=None,
                enable_option_update=False,
                structured_output=ImageTask,
            )

        elif event.node_id == "image_generator":
            # For image generation, select from image models with options
            node_input = await get_ai_model_node_input(
                node_def_settings=event.node_def_settings,
                models=None,
                models_with_options=models_with_options,
                enable_option_update=True,  # Allow updating model options
                structured_output=None,
            )

        else:
            print(f"WARNING: Unhandled ai_model_call input event for node {event.node_id}")

        event.input = node_input
        event.handled = True

Image Task JSON

You can define image tasks in a JSON file for direct loading:

{
  "task_specifications": [
    {
      "filename": "sunset_beach",
      "prompt": "A breathtaking sunset over a tropical beach with silhouettes of palm trees"
    },
    {
      "filename": "mountain_cabin",
      "prompt": "A cozy cabin in snow-covered mountains with smoke coming from the chimney"
    },
    {
      "filename": "futuristic_city",
      "prompt": "A futuristic city skyline with flying cars and holographic advertisements"
    }
  ]
}

Running the Agent

To run the Image Generation Agent:

from dhenara.agent.dsl.events import EventType
from dhenara.agent.run import RunContext
from dhenara.agent.runner import AgentRunner

from src.agents.image_gen.agent import agent
from src.agents.image_gen.handler import image_gen_input_handler
from src.runners.defs import observability_settings, project_root

root_component_id = "image_gen_root"
agent.root_id = root_component_id

run_context = RunContext(
    root_component_id=root_component_id,
    observability_settings=observability_settings,
    project_root=project_root,
    run_root_subpath="agent_image_gen",
)

run_context.register_event_handlers(
    handlers_map={
        EventType.node_input_required: image_gen_input_handler,
        # Additional event handlers...
    }
)

runner = AgentRunner(agent, run_context)

Output and Artifacts

After running the agent, you'll find:

Generated images saved in the global_images directory with the specified filenames
Run artifacts showing the processing steps and model outputs
Context information in the run directory for debugging or analysis

Customization Options

You can customize this agent in several ways:

Model Selection: Add or modify the available image generation models
Model Parameters: Adjust quality, size, and other model-specific parameters
Prompt Engineering: Enhance prompt templates for better image generation
Post-Processing: Add nodes for image processing after generation
Batching Logic: Modify how image generation tasks are batched and processed

Conclusion

The Image Generation Agent demonstrates how DAD can integrate with image generation models to create a flexible and powerful workflow. This example showcases key DAD features like:

Multi-modal AI Integration: Working with both text and image models
Structured Data Handling: Using Pydantic models for task specifications
Parameter Customization: Configuring model-specific parameters
Artifact Management: Saving generated binary data (images)
Batch Processing: Processing multiple image generation tasks

By understanding this example, you can create your own specialized agents that leverage visual AI capabilities within the DAD framework.

Agent Overview​

Agent Structure​

Agent Definition​

Data Models​

Image Generation Flow​

Coordinator Flow​

Event Handler​

Image Task JSON​

Running the Agent​

Output and Artifacts​

Customization Options​

Conclusion​