
Desktop Automation (Windows)

Windows Window

List, focus, minimize, maximize, restore, or close top-level windows.

windows_window

Overview

Gives the agent the same control over any visible desktop window that Alt-Tab and the title-bar buttons give a user. The agent picks a window by a fragment of its title (the 'list' operation helps disambiguate when several windows match), then asks to focus, minimize, maximize, restore, or close it. You approve each step before it runs.

How it works

Implements six operations: list (process name, PID, and visible title), focus (bring to foreground via SetForegroundWindow), minimize, maximize, and restore (ShowWindow), and close (WM_CLOSE). User32 is invoked through PowerShell P/Invoke, so the only runtime dependency is PowerShell itself, with no native libraries to install. Title matching is a case-insensitive substring match, and the tool acts on the first match, so be specific or use list to disambiguate. Pass apply=true to execute; otherwise the tool returns a dry-run plan.
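The matching and dry-run rules above can be sketched in a few lines of Java. This is an illustration only, not the tool's implementation: WindowMatchSketch, WindowInfo, firstMatch, and plan are hypothetical names, and the plan strings are made up for the example.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the SwarmAI code base.
final class WindowMatchSketch {

    /** One row of a 'list' result: process name, PID, and visible title. */
    record WindowInfo(String processName, long pid, String title) {}

    /** Returns the first window whose title contains the fragment, ignoring case. */
    static Optional<WindowInfo> firstMatch(List<WindowInfo> windows, String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        return windows.stream()
                .filter(w -> w.title().toLowerCase(Locale.ROOT).contains(needle))
                .findFirst();
    }

    /** Builds a description of the call; this sketch never touches a real window. */
    static String plan(List<WindowInfo> windows, String operation, String fragment, boolean apply) {
        Optional<WindowInfo> match = firstMatch(windows, fragment);
        if (match.isEmpty()) {
            return "no window matches '" + fragment + "'";
        }
        WindowInfo w = match.get();
        // Assumed wording: apply=false only reports the plan, apply=true would go on to execute.
        String prefix = apply ? "execute: " : "dry run: ";
        return prefix + operation + " '" + w.title() + "' (pid " + w.pid() + ")";
    }
}
```

With two Excel windows titled "Budget 2023.xlsx - Excel" and "Budget 2024.xlsx - Excel", the fragment "budget" selects whichever is enumerated first. A longer fragment such as "Budget 2024" removes the ambiguity.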

Example

When a user asks:

Bring the Excel budget workbook to the front and maximize it.

the agent calls the tool twice, in order:

windows_window(operation="focus", title="Budget", apply=true)
windows_window(operation="maximize", title="Budget", apply=true)

and gets back: two approval prompts, then the matching window is brought forward and maximized.

Configuration

Set these before calling the tool. Values marked required must be present or the tool call will fail.

swarmai.tools.windows.enabled required

Master switch for the Windows tool category.

swarmai.tools.windows.window.timeout optional

Timeout applied to each PowerShell invocation made for a window operation. Default: 10s.
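A per-invocation timeout of this kind is typically enforced by waiting on the child process with a deadline and killing it if the deadline passes. The sketch below shows that pattern with the standard Process API. It is not taken from the tool's source: TimedCommandSketch, Result, and run are hypothetical names, and the command is whatever the caller passes in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a per-invocation timeout; not the tool's actual code.
final class TimedCommandSketch {

    /** Outcome of one invocation; exitCode is -1 when the process was killed. */
    record Result(boolean timedOut, int exitCode) {}

    static Result run(List<String> command, Duration timeout) {
        Process process;
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard output so a chatty child cannot block on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException("could not start " + command, e);
        }
        try {
            // waitFor returns true only if the process exited before the deadline.
            if (process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                return new Result(false, process.exitValue());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        process.destroyForcibly(); // deadline passed or wait interrupted: kill the child
        return new Result(true, -1);
    }
}
```

A call mirroring the documented default might look like TimedCommandSketch.run(List.of("powershell", "-NoProfile", "-Command", "Get-Date"), Duration.ofSeconds(10)), where the PowerShell command is only a placeholder.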

Use it in a workflow

Wire this tool into a SwarmAI crew. Use the YAML DSL for declarative workflows, or the Java builder API when you want full programmatic control.

YAML DSL

# focus-and-tile.yaml
name: focus-and-tile-crew
process: SEQUENTIAL

agents:
  - id: operator
    role: Window Operator
    goal: Find a target window and bring it forward
    tools:
      - windows_window

tasks:
  - id: focus-task
    agent: operator
    description: Find a window matching 'Budget', bring it to the foreground, and maximize it.

Java

import ai.intelliswarm.swarmai.agent.Agent;
import ai.intelliswarm.swarmai.task.Task;
import ai.intelliswarm.swarmai.swarm.Swarm;
import ai.intelliswarm.swarmai.swarm.SwarmOutput;
import ai.intelliswarm.swarmai.process.ProcessType;
import ai.intelliswarm.swarmai.tool.windows.WindowsWindowTool;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;

// Injected by Spring; this snippet is assumed to live inside a Spring-managed bean.
@Autowired ChatClient chatClient;
@Autowired WindowsWindowTool windowsWindowTool;

Agent operator = Agent.builder()
    .role("Window Operator")
    .goal("Find a target window and bring it forward")
    .chatClient(chatClient)
    .tool(windowsWindowTool)
    .build();

Task operatorTask = Task.builder()
    .description("Find a window matching 'Budget', focus it, then maximize.")
    .agent(operator)
    .build();

SwarmOutput result = Swarm.builder()
    .agent(operator)
    .task(operatorTask)
    .process(ProcessType.SEQUENTIAL)
    .build()
    .kickoff();

What it's good for

Real scenarios where agents put this tool to work.

Bring a target app forward as a setup step before screenshot or input automation
Tile or stack windows during a 'focus mode' workflow
Close a stack of confirmation dialogs an upstream agent left behind (with approval)
Combine with windows_process to launch + focus an app the next agent needs

Source

Implementation lives at swarmai-tools/src/main/java/ai/intelliswarm/swarmai/tool/windows/WindowsWindowTool.java in the swarm-ai repository.
