Desktop Automation (Windows)
Capture the screen, a monitor, or a named window to a PNG.
windows_screenshot
The visual equivalent of windows_filesystem's read ops: it lets an agent take a snapshot of the user's screen, a particular monitor, or a specific window so that a downstream step (a vision LLM, bug-report-generator, or visual diff) has something to work with. The agent never chooses where the file lands; the output directory is fixed by configuration.
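The following is a minimal sketch of how a configuration-owned output directory with timestamped filenames could work. The class name `CaptureOutputPaths`, the `screenshot-<mode>-yyyyMMdd-HHmmss-SSS.png` pattern, and the containment check are assumptions for illustration, not the tool's actual naming scheme.

```java
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: the real tool's naming scheme may differ.
final class CaptureOutputPaths {

    // Assumed filename pattern; millisecond precision reduces collisions between rapid captures.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss-SSS");

    private final Path outputDir;

    CaptureOutputPaths(Path configuredDir) {
        // Normalize once so the containment check below compares like with like.
        this.outputDir = configuredDir.toAbsolutePath().normalize();
    }

    /** Builds the target file from configuration and the clock only; the agent supplies no path. */
    Path nextCapturePath(String mode, LocalDateTime now) {
        String fileName = "screenshot-" + mode + "-" + STAMP.format(now) + ".png";
        Path target = outputDir.resolve(fileName).normalize();
        // Defensive check: the resolved file must sit directly inside the allowlisted directory.
        if (!outputDir.equals(target.getParent())) {
            throw new IllegalStateException("Capture path escaped output dir: " + target);
        }
        return target;
    }
}
```

Because the agent only ever names a mode, there is no agent-controlled string that can steer the file elsewhere. The parent-directory check is a second line of defence in case a mode value ever contains a path separator.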
The tool implements three modes: screen (the primary monitor, captured via java.awt.Robot), monitor (any 0-based monitor index, for multi-display setups), and window (find the bounds of a window whose title contains a given substring via PowerShell + User32, then capture that rectangle). Output goes to a single configured directory under a timestamped filename. Each capture is a MutationPlan routed through the approval gate, because a screenshot is both a write to disk and a privacy event, not a free read. Pass apply=true to execute; otherwise the tool returns a dry-run plan.
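For the screen and monitor modes, the capture itself can be sketched with standard AWT calls. This is a minimal illustration, not the tool's implementation. The class and method names are hypothetical, and the sketch assumes the tool's 0-based monitor index follows the order of `GraphicsEnvironment.getScreenDevices()`. Window mode would pass `capture()` the rectangle found by the PowerShell + User32 lookup instead.

```java
import java.awt.AWTException;
import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical sketch of the screen/monitor capture path.
final class ScreenCaptureSketch {

    /** "screen" mode: bounds of the primary (default) display. */
    static Rectangle primaryBounds() {
        return GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getDefaultScreenDevice().getDefaultConfiguration().getBounds();
    }

    /** Bounds of every attached display, in JVM order (assumed to match the tool's index). */
    static List<Rectangle> monitorBounds() {
        List<Rectangle> bounds = new ArrayList<>();
        for (GraphicsDevice device : GraphicsEnvironment.getLocalGraphicsEnvironment().getScreenDevices()) {
            bounds.add(device.getDefaultConfiguration().getBounds());
        }
        return bounds;
    }

    /** "monitor" mode: picks a 0-based display, rejecting out-of-range indexes instead of clamping. */
    static Rectangle selectMonitor(List<Rectangle> monitors, int index) {
        if (index < 0 || index >= monitors.size()) {
            throw new IllegalArgumentException(
                    "monitor index " + index + " out of range; " + monitors.size() + " monitor(s) attached");
        }
        return monitors.get(index);
    }

    /** Grabs the rectangle with java.awt.Robot and writes it as PNG. Needs a real (non-headless) desktop. */
    static Path capture(Rectangle area, Path target) throws AWTException, IOException {
        BufferedImage image = new Robot().createScreenCapture(area);
        if (!ImageIO.write(image, "png", target.toFile())) {
            throw new IOException("No PNG writer available");
        }
        return target.toAbsolutePath();
    }
}
```

Failing fast on a bad monitor index, rather than falling back to the primary display, keeps an agent from silently capturing the wrong screen.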
When a user asks:
Snapshot the Excel budget window so the next agent can OCR it.
the agent calls the tool:
windows_screenshot(operation="window", title="Budget", apply=true)
and gets back an approval prompt, then a PNG in the configured output directory plus its absolute path, so a vision agent can pick it up.
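Once the PowerShell + User32 enumeration has produced a list of window titles and bounds, window mode's matching step might look like the sketch below. `WindowInfo` and `WindowMatcher` are hypothetical names. Case-insensitive matching and first-match-wins are assumptions, not documented behaviour. Under those assumptions, `title="Budget"` would match a window titled "Q3 budget.xlsx - Excel".

```java
import java.awt.Rectangle;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical model of one top-level window as the enumeration step might report it.
final class WindowInfo {
    final String title;
    final Rectangle bounds;

    WindowInfo(String title, Rectangle bounds) {
        this.title = title;
        this.bounds = bounds;
    }
}

final class WindowMatcher {

    /** Returns the bounds of the first non-empty window whose title contains the substring (case-insensitive). */
    static Optional<Rectangle> findBounds(List<WindowInfo> windows, String titleSubstring) {
        String needle = titleSubstring.toLowerCase(Locale.ROOT);
        for (WindowInfo w : windows) {
            // Skip zero-size windows: there is nothing to capture.
            boolean hasArea = w.bounds.width > 0 && w.bounds.height > 0;
            if (hasArea && w.title.toLowerCase(Locale.ROOT).contains(needle)) {
                return Optional.of(w.bounds);
            }
        }
        return Optional.empty();
    }
}
```

An empty result lets the caller report "no matching window" instead of capturing an arbitrary rectangle.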
Set these before calling the tool. Values marked required must be present or the tool call will fail.
swarmai.tools.windows.enabled (required): Master switch for the Windows tool category.
swarmai.tools.windows.screenshot.output-dir (optional): Allowlisted directory where captured PNGs are written.
swarmai.tools.windows.auto-approve (optional): Skip the y/N prompt for capture writes. Intended for non-interactive runs only.
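The sketch below shows one way the master switch, `apply=true`, and auto-approve might interact. `CapturePlan` and `ApprovalGate` are hypothetical stand-ins for the tool's MutationPlan and approval gate, whose real types are not shown on this page. The y/N prompt is modelled as a `Predicate` so it can be replaced in tests.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for a MutationPlan: a description plus the deferred capture.
final class CapturePlan {
    final String description;       // text shown in the y/N prompt
    final Supplier<String> action;  // performs the capture, returns the PNG path

    CapturePlan(String description, Supplier<String> action) {
        this.description = description;
        this.action = action;
    }
}

final class ApprovalGate {
    private final boolean enabled;          // swarmai.tools.windows.enabled
    private final boolean autoApprove;      // swarmai.tools.windows.auto-approve
    private final Predicate<String> prompt; // asks the user y/N; true means approved

    ApprovalGate(boolean enabled, boolean autoApprove, Predicate<String> prompt) {
        this.enabled = enabled;
        this.autoApprove = autoApprove;
        this.prompt = prompt;
    }

    String submit(CapturePlan plan, boolean apply) {
        if (!enabled) {
            throw new IllegalStateException("Windows tools are disabled (swarmai.tools.windows.enabled)");
        }
        if (!apply) {
            return "DRY RUN: " + plan.description; // nothing is captured or written
        }
        if (!autoApprove && !prompt.test(plan.description)) {
            return "DENIED: " + plan.description;
        }
        return plan.action.get();
    }
}
```

In this sketch auto-approve only removes the prompt. A call without `apply=true` still returns a dry-run plan.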
Wire this tool into a SwarmAI crew. Use the YAML DSL for declarative workflows, or the Java builder API when you want full programmatic control.
YAML DSL
# bug-report.yaml
name: bug-report-crew
process: SEQUENTIAL
agents:
  - id: capturer
    role: Bug Capturer
    goal: Snapshot the failing app window and attach it to a bug report
    tools:
      - windows_screenshot
tasks:
  - id: capture-task
    agent: capturer
    description: Find the window with 'Error' in the title and capture it to a PNG.
Java
import ai.intelliswarm.swarmai.agent.Agent;
import ai.intelliswarm.swarmai.task.Task;
import ai.intelliswarm.swarmai.swarm.Swarm;
import ai.intelliswarm.swarmai.swarm.SwarmOutput;
import ai.intelliswarm.swarmai.process.ProcessType;
import ai.intelliswarm.swarmai.tool.windows.WindowsScreenshotTool;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// Field injection only works inside a Spring-managed bean, so the wiring lives in a @Component.
@Component
public class BugReportCrew {

    @Autowired
    private ChatClient chatClient;

    @Autowired
    private WindowsScreenshotTool windowsScreenshotTool;

    public SwarmOutput captureError() {
        // One agent that owns the screenshot tool.
        Agent capturer = Agent.builder()
                .role("Bug Capturer")
                .goal("Snapshot the failing app window and attach it to a bug report")
                .chatClient(chatClient)
                .tool(windowsScreenshotTool)
                .build();

        // Same intent as the YAML task: locate the 'Error' window and capture it.
        Task captureTask = Task.builder()
                .description("Find the first window matching 'Error' and capture it to a PNG.")
                .agent(capturer)
                .build();

        // Run the single task sequentially; the capture still goes through the approval gate.
        return Swarm.builder()
                .agent(capturer)
                .task(captureTask)
                .process(ProcessType.SEQUENTIAL)
                .build()
                .kickoff();
    }
}
Implementation lives at swarmai-tools/src/main/java/ai/intelliswarm/swarmai/tool/windows/WindowsScreenshotTool.java in the swarm-ai repository.