Architecture
DojOps is a modular DevOps agent system with 38 built-in skills, a custom skill system, 32 specialist agents, sandboxed execution, approval workflows, and hash-chained audit trails. Every LLM output is schema-validated and policy-checked before anything gets written.
High-Level Data Flow
User
|
v
CLI (@clack/prompts TUI) / REST API (Express)
|
v
Agent Router (32 specialist agents, keyword confidence scoring)
|
v
Planner Engine (LLM -> TaskGraph -> topological execution)
|
v
Skill Registry (38 built-in skills + custom skills, unified discovery)
|
v
Skill SDK Layer (BaseSkill<T>, Zod validation)
|
v
Execution Engine (Sandboxed, policy-enforced, approval-gated, audit-logged)
Package Architecture
DojOps is a pnpm monorepo with Turbo build orchestration, written in TypeScript (ES2022 target, CommonJS modules). All packages use the @dojops/* scope.
12 Packages
| Package | Purpose |
|---|---|
| @dojops/cli | CLI entry point + rich TUI (@clack/prompts) |
| @dojops/api | REST API (Express) + web dashboard + factory functions |
| @dojops/skill-registry | Skill registry + custom skill system (discovers built-in + custom skills) |
| @dojops/planner | TaskGraph decomposition + topological executor |
| @dojops/executor | SafeExecutor: sandbox + policy engine + approval + audit log |
| @dojops/mcp | MCP (Model Context Protocol) client: server lifecycle, tool discovery, dispatcher |
| @dojops/runtime | 38 built-in DevOps skills as .dops files (DopsRuntime) |
| @dojops/scanner | 10 security scanners + remediation engine |
| @dojops/session | Chat session management + autonomous agent loop (AgentLoop) + memory + context injection |
| @dojops/context | Context7 documentation augmentation for skills |
| @dojops/core | LLM abstraction + 7 providers + 32 specialist agents + CI debugger + infra diff + DevOps checker |
| @dojops/sdk | BaseSkill<T> abstract class with Zod validation + optional verify() + file-reader utilities |

Dependency Flow
@dojops/cli
+-- @dojops/api
|   +-- @dojops/skill-registry
|   |   +-- @dojops/runtime
|   |   |   +-- @dojops/core
|   |   |   +-- @dojops/sdk
|   |   +-- @dojops/core
|   |   +-- @dojops/sdk (zod)
|   +-- @dojops/planner
|   |   +-- @dojops/core
|   |   +-- @dojops/sdk (zod)
|   +-- @dojops/executor
|   |   +-- @dojops/sdk
|   +-- @dojops/scanner
|   +-- @dojops/session
|   +-- @dojops/core
Simplified linear flow:
cli -> api -> skill-registry -> runtime -> core -> sdk
           -> planner -> executor
           -> scanner
           -> session -> executor -> core
cli -> mcp -> core (optional, dynamic import)
Layer Descriptions
1. LLM Layer (@dojops/core)
Abstraction over seven LLM providers with structured JSON output:
| Provider | JSON Mode Mechanism | SDK |
|---|---|---|
| OpenAI | response_format: { type: "json_object" } | openai |
| Anthropic | JSON prefill technique | @anthropic-ai/sdk |
| Ollama | format: "json" | ollama |
| DeepSeek | OpenAI-compatible API with custom baseURL | openai |
| Gemini | responseMimeType: "application/json" | @google/genai |
| Mistral | OpenAI-compatible API with custom baseURL | @mistralai/mistralai |
| GitHub Copilot | OpenAI-compatible API with Copilot baseURL + JWT auth | openai |
Key interface:
interface LLMProvider {
  name: string;
  generate(request: LLMRequest): Promise<LLMResponse>;
  generateWithTools?(request: LLMToolRequest): Promise<LLMToolResponse>;
  listModels?(): Promise<string[]>;
}
The optional generateWithTools() method enables native tool-calling for the autonomous agent loop. OpenAI, Anthropic, and Gemini use provider-native tool-calling APIs; Ollama uses a prompt-based fallback that injects tool descriptions into the system prompt and parses structured JSON output.
All responses pass through parseAndValidate(), which strips markdown fences, runs JSON.parse, and applies Zod safeParse, so every LLM output conforms to the expected schema. All 7 providers support temperature passthrough for reproducibility; the parameter is included in API calls only when explicitly set. A DeterministicProvider wrapper forces temperature: 0 on every call for replay mode (apply --replay). A FallbackProvider wraps multiple providers and automatically falls back to the next on failure (configured via the --fallback-provider flag or the DOJOPS_FALLBACK_PROVIDER env var). The GitHubCopilotProvider creates a new OpenAI client per generate() call so that each request uses the freshest JWT (tokens expire roughly every 30 minutes).
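The validation pipeline described above (strip fences, parse JSON, validate against a schema) can be sketched roughly as follows. This is an illustration under assumptions, not the @dojops/core implementation: the `Validator` type is a hand-written stand-in for a Zod schema's safeParse, and `stripFences`, `Plan`, and `isPlan` are hypothetical names.

```typescript
// Hypothetical stand-in for a Zod schema: returns typed data or an error message.
type ParseResult<T> = { success: true; data: T } | { success: false; error: string };
type Validator<T> = (value: unknown) => ParseResult<T>;

// Remove a surrounding markdown code fence (with or without a language tag), if present.
function stripFences(raw: string): string {
  const trimmed = raw.trim();
  const match = trimmed.match(/^```[a-zA-Z0-9]*\s*\n([\s\S]*?)\n?```$/);
  return match ? match[1] : trimmed;
}

function parseAndValidate<T>(raw: string, validate: Validator<T>): T {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripFences(raw));
  } catch (err) {
    throw new Error(`LLM output is not valid JSON: ${(err as Error).message}`);
  }
  const result = validate(parsed);
  if (result.success === false) {
    throw new Error(`LLM output failed schema validation: ${result.error}`);
  }
  return result.data;
}

// Example validator for an assumed { tasks: string[] } response shape.
interface Plan {
  tasks: string[];
}
const isPlan: Validator<Plan> = (value) => {
  if (typeof value !== "object" || value === null) {
    return { success: false, error: "expected an object" };
  }
  const tasks = (value as { tasks?: unknown }).tasks;
  if (!Array.isArray(tasks) || !tasks.every((t) => typeof t === "string")) {
    return { success: false, error: "tasks must be string[]" };
  }
  return { success: true, data: { tasks: tasks as string[] } };
};
```

Throwing on either failure path means nothing downstream ever sees unvalidated output, which matches the "no blind execution" principle stated later in this page.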
2. Multi-Agent System (@dojops/core)
32 built-in specialist agents with keyword-based routing and confidence scoring, plus support for custom agents. The AgentRouter scores prompts against each agent’s keyword list and routes to the highest-confidence match. If no agent exceeds the threshold, it falls back to the general-purpose DevOpsAgent.
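A minimal sketch of keyword-based routing with a confidence threshold is shown below. The scoring rule (fraction of an agent's keywords found in the prompt), the default threshold of 0.3, and the two agent definitions are assumptions for illustration; the source only states that prompts are scored against keyword lists and that DevOpsAgent is the fallback.

```typescript
interface Agent {
  name: string;
  keywords: string[];
}
interface Route {
  agent: string;
  confidence: number;
}

// Assumed scoring rule: share of the agent's keywords that occur in the prompt.
function scoreAgent(prompt: string, agent: Agent): number {
  if (agent.keywords.length === 0) return 0;
  const text = prompt.toLowerCase();
  const hits = agent.keywords.filter((k) => text.includes(k.toLowerCase())).length;
  return hits / agent.keywords.length;
}

// Pick the highest-scoring agent; fall back when no score exceeds the threshold.
function route(prompt: string, agents: Agent[], threshold = 0.3, fallback = "DevOpsAgent"): Route {
  let best: Route = { agent: fallback, confidence: 0 };
  for (const agent of agents) {
    const confidence = scoreAgent(prompt, agent);
    if (confidence > best.confidence) best = { agent: agent.name, confidence };
  }
  return best.confidence > threshold ? best : { agent: fallback, confidence: best.confidence };
}

// Hypothetical agents, for illustration only.
const exampleAgents: Agent[] = [
  { name: "terraform-specialist", keywords: ["terraform", "hcl", "tfstate"] },
  { name: "docker-specialist", keywords: ["docker", "dockerfile", "container"] },
];
```

Because custom agents take part in the same routing, they would simply be additional entries in the list passed to `route`.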
Custom agents are defined as structured README.md files in .dojops/agents/<name>/ (project) or ~/.dojops/agents/<name>/ (global). They can be created via LLM (dojops agents create "description") or manually (dojops agents create --manual). Custom agents participate in the same keyword-based routing as built-in agents and can override built-in agents by name. Discovery is handled by @dojops/skill-registry.
Three specialized analyzers (not routed via AgentRouter) provide structured analysis:
- CIDebugger: CI log diagnosis producing CIDiagnosis (error type, root cause, fixes)
- InfraDiffAnalyzer: Infrastructure diff analysis producing InfraDiffAnalysis (risk, cost, security)
- DevOpsChecker: DevOps config quality analysis producing CheckReport (score 0-100, findings, missing files)
See Specialist Agents for the full agent list.
3. Task Planner (@dojops/planner)
LLM-powered goal decomposition into structured, dependency-aware task graphs with agent-aware delegation. The decomposer assigns specialist agents to tasks based on domain relevance, and the executor injects each agent’s system prompt as domain context during skill generation. Uses Kahn’s algorithm for topological execution ordering, $ref:<taskId> for inter-task data wiring, and completedTaskIds for resume after partial failures.
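To make the ordering step concrete, here is a small sketch of Kahn's algorithm over a task graph, including a resume path driven by completedTaskIds. The `Task` shape and the `dependsOn` field name are assumptions; the actual TaskGraph types in @dojops/planner may differ.

```typescript
interface Task {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: repeatedly emit tasks whose remaining dependencies are all satisfied.
// Tasks listed in completedTaskIds are skipped and treated as already satisfied.
function topologicalOrder(tasks: Task[], completedTaskIds: string[] = []): string[] {
  const done = new Set(completedTaskIds);
  const pending = tasks.filter((t) => !done.has(t.id));
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();

  for (const task of pending) {
    const open = task.dependsOn.filter((d) => !done.has(d));
    inDegree.set(task.id, open.length);
    for (const dep of open) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), task.id]);
    }
  }

  const queue = pending.filter((t) => inDegree.get(t.id) === 0).map((t) => t.id);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }

  if (order.length !== pending.length) {
    throw new Error("TaskGraph contains a cycle or references an unknown task");
  }
  return order;
}

const graph: Task[] = [
  { id: "a", dependsOn: [] },
  { id: "b", dependsOn: ["a"] },
  { id: "c", dependsOn: ["a"] },
  { id: "d", dependsOn: ["b", "c"] },
];
```

Calling `topologicalOrder(graph, ["a", "b"])` schedules only the tasks that have not finished, which is how a resume after partial failure could avoid re-running completed work.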
See Task Planner for details.
4. Skill SDK (@dojops/sdk)
Abstract BaseSkill<T> class with Zod input schema validation, abstract generate() for LLM generation, optional execute() for file writes, and optional verify() for external tool validation. Also provides readExistingConfig(), backupFile(), atomicWriteFileSync() (temp + rename for crash-safe writes), and restoreBackup() utilities.
See DevOps Skills for the skill pattern.
4b. DOPS Runtime (@dojops/runtime)
The DOPS runtime processes .dops skill files, a declarative format combining YAML frontmatter with markdown prompt sections. Skills generate raw file content, with Context7 documentation augmentation.
Frontmatter sections (all optional except meta, files):
| Section | Purpose |
|---|---|
| meta | Name, version, description, author, license, tags, repository |
| context | Technology context, output guidance, best practices, Context7 library references |
| files | Output file specs with path templates, format, serialization options |
| scope | Write boundary, explicit list of allowed write paths (enforced at file-write time) |
| risk | Self-classification: LOW / MEDIUM / HIGH with rationale string |
| execution | Mutation semantics: mode (generate/update), deterministic, idempotent flags |
| update | Update behavior: strategy (replace/preserve_structure), inputSource, injectAs |
| detection | Existing file detection paths for auto-update mode |
| verification | Structural rules + optional binary verification command |
| permissions | Filesystem, child_process, and network permission declarations |
Markdown sections: ## Prompt (required), ## Keywords (required).
Key runtime features:
- DopsRuntime: Runtime class for .dops skills
- parseDopsFile() / parseDopsString(): Parsers for .dops files
- compilePrompt(): Compiles prompts with {outputGuidance}, {bestPractices}, {context7Docs}, {projectContext} variables
- stripCodeFences(): Strips markdown code fences from raw LLM output before writing
- DocProvider interface: Enables Context7 documentation augmentation for skills
- DopsRuntime.risk: Returns declared risk or defaults to { level: "LOW", rationale: "No risk classification declared" }
- DopsRuntime.metadata: Includes riskLevel, systemPromptHash, toolHash for audit integration
- Scope enforcement: writeFiles() validates resolved paths against scope.write patterns after {var} expansion; out-of-scope writes throw
- Update strategy: preserve_structure injects additional prompt instructions to maintain existing config organization
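The scope enforcement item above can be illustrated with a short sketch. It assumes relative POSIX-style paths and exact matching after {var} expansion; the real pattern syntax of scope.write is not specified here, and `expandVars`, `normalizePath`, and `assertInScope` are hypothetical helper names.

```typescript
// Expand {var} placeholders; unknown variables are left untouched.
function expandVars(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (whole: string, name: string) => vars[name] ?? whole);
}

// Normalize a relative path and reject any ".." segment.
function normalizePath(path: string): string {
  const parts: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") throw new Error(`Path traversal is not allowed: ${path}`);
    parts.push(segment);
  }
  return parts.join("/");
}

// Throws unless the target matches one of the declared scope.write entries.
function assertInScope(target: string, scopeWrite: string[], vars: Record<string, string>): void {
  const allowed = scopeWrite.map((pattern) => normalizePath(expandVars(pattern, vars)));
  if (!allowed.includes(normalizePath(target))) {
    throw new Error(`Write to "${target}" is outside the declared scope`);
  }
}
```

Checking at write time, after variables are expanded, means a skill cannot widen its own boundary through a crafted input value.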
5. DevOps Skills (@dojops/runtime)
38 built-in skills covering CI/CD, IaC, containers, monitoring, and system services. All are .dops skills in packages/runtime/skills/, processed by DopsRuntime, generating raw file content directly via LLM with Context7 documentation augmentation. All skills support updating existing configs via auto-detection, existingContent input, and .bak backup before overwrite. All file writes use atomicWriteFileSync() for crash safety. Every execute() returns filesWritten/filesModified for rollback tracking.
See DevOps Skills for the full skill list.
5b. Skill Registry (@dojops/skill-registry)
Unified registry layer between consumers (Planner, Executor, CLI, API) and skill implementations. Combines all 38 built-in skills with custom skills discovered from disk:
- .dops skill discovery: Discovers .dops skills from .dojops/skills/ (project) and ~/.dojops/skills/ (global)
- Skill validation: Zod schema validates .dops frontmatter
- Skill policy: .dojops/policy.yaml supports allowedSkills and blockedSkills lists
- Audit enrichment: Custom skill executions include toolType, toolSource, toolVersion, toolHash, and systemPromptHash in audit entries
- Skill isolation: Verification commands restricted to a whitelist of 34 allowed binaries, child_process permission must be "required" for execution, path traversal (..) blocked in file paths and detector paths
- Unified interface: SkillRegistry.getAll() returns DevOpsSkill[], so Planner, Executor, and API remain unchanged
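As a rough illustration of the skill policy item, the check below filters skills by the two lists from policy.yaml. The precedence rules are assumptions (blockedSkills always wins, and a non-empty allowedSkills acts as an allowlist); the source does not spell them out, and the skill names in the usage note are illustrative.

```typescript
interface SkillPolicy {
  allowedSkills?: string[];
  blockedSkills?: string[];
}

// Assumed semantics: a blocked skill is never permitted; if an allowlist is
// present and non-empty, only listed skills are permitted; otherwise all are.
function isSkillPermitted(name: string, policy: SkillPolicy): boolean {
  if (policy.blockedSkills?.includes(name)) return false;
  if (policy.allowedSkills && policy.allowedSkills.length > 0) {
    return policy.allowedSkills.includes(name);
  }
  return true;
}

function filterSkills(names: string[], policy: SkillPolicy): string[] {
  return names.filter((name) => isSkillPermitted(name, policy));
}
```

With `{ blockedSkills: ["helm"] }`, `filterSkills(["terraform", "helm"], policy)` would keep only `terraform` under these assumed rules.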
6. Execution Engine (@dojops/executor)
Orchestrates generate -> verify -> approve -> execute with policy enforcement, sandboxed file operations, and audit logging.
See Execution Engine for details.
7. Security Scanner (@dojops/scanner)
10 scanners (npm-audit, pip-audit, trivy, gitleaks, checkov, hadolint, shellcheck, trivy-sbom, trivy-license, semgrep) with LLM-powered remediation, scan comparison (--compare), and license compliance checking.
See Security Scanning for details.
8. Chat Sessions (@dojops/session)
Multi-turn conversation management with memory windowing, LLM-generated summaries, project context injection, and session persistence.
8b. Autonomous Agent Loop (@dojops/session + @dojops/executor)
The AgentLoop implements a ReAct (Reasoning + Acting) pattern, an iterative cycle where the LLM reasons about what to do, calls a tool, observes the result, and repeats until the task is complete. This replaces the one-shot generation model for complex tasks that require project awareness.
7 agent tools: read_file, write_file, edit_file, run_command, run_skill, search_files, done
The ToolExecutor dispatches tool calls to sandboxed operations enforced by ExecutionPolicy. File writes are policy-checked, commands run with timeouts, and outputs are truncated at 32KB. The loop terminates on the done tool, iteration limit (default 20), or token budget exhaustion.
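The loop described above can be sketched in a few lines. This is a synchronous simplification under assumptions: `decide` stands in for the LLM call, tools are plain functions, the 32KB limit is measured in characters rather than bytes, and token budgeting and policy checks are omitted.

```typescript
type ToolCall = { tool: string; args: Record<string, string> };
type Tool = (args: Record<string, string>) => string;
type Step = { call: ToolCall; observation: string };

const MAX_OUTPUT = 32 * 1024; // 32KB per the text above; counted in characters here

function truncateOutput(output: string): string {
  return output.length > MAX_OUTPUT ? output.slice(0, MAX_OUTPUT) + "\n[truncated]" : output;
}

// Reason -> act -> observe, until the model calls `done` or the iteration limit is hit.
function runAgentLoop(
  decide: (history: Step[]) => ToolCall,
  tools: Record<string, Tool>,
  maxIterations = 20,
): { status: "done" | "iteration_limit"; history: Step[] } {
  const history: Step[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const call = decide(history);
    if (call.tool === "done") return { status: "done", history };
    const tool = tools[call.tool];
    const observation = tool ? truncateOutput(tool(call.args)) : `Unknown tool: ${call.tool}`;
    history.push({ call, observation });
  }
  return { status: "iteration_limit", history };
}
```

Feeding an "unknown tool" message back as an observation, rather than throwing, gives the model a chance to correct itself on the next iteration; whether the real ToolExecutor does this is not stated in the source.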
Available via dojops auto <prompt> (CLI), POST /api/auto (API), and /auto <prompt> (chat).
8c. MCP Support (@dojops/mcp)
The MCP (Model Context Protocol) package enables the autonomous agent to call tools from external servers — databases, cloud APIs, GitHub, etc. MCP is a Linux Foundation standard supported by Claude Code, Codex, Gemini CLI, and Copilot CLI.
- McpClientManager: Manages server lifecycle (connect all at agent start, disconnect on completion). Supports stdio (local subprocess) and streamable-http (remote endpoint) transports.
- McpToolDispatcher: Bridges MCP tools into the ToolExecutor dispatch chain. Parses mcp__<server>__<tool> names and routes to the correct server.
- Config: .dojops/mcp.json (project) + ~/.dojops/mcp.json (global). Project config overrides global by server name.
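Two of the behaviors listed above, tool-name parsing and project-over-global config merging, are simple enough to sketch. The `McpServerConfig` shape is an assumption (the mcp.json schema is not shown here), and the helper names are hypothetical.

```typescript
// Assumed config shape; the actual mcp.json schema may differ.
interface McpServerConfig {
  transport: "stdio" | "streamable-http";
  command?: string;
  url?: string;
}
type McpConfig = Record<string, McpServerConfig>;

// Project entries replace global entries that share a server name.
function mergeMcpConfig(globalCfg: McpConfig, projectCfg: McpConfig): McpConfig {
  return { ...globalCfg, ...projectCfg };
}

// Split "mcp__<server>__<tool>" into its parts; returns null for non-MCP tool names.
// The server part is matched lazily, so the first "__" after the prefix ends it.
function parseMcpToolName(name: string): { server: string; tool: string } | null {
  const match = /^mcp__(.+?)__(.+)$/.exec(name);
  return match ? { server: match[1], tool: match[2] } : null;
}
```

Returning null for names without the prefix lets a dispatcher fall through to the built-in agent tools such as read_file.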
See MCP for configuration and usage details.
8d. Streaming Output
All 7 LLM providers support generateStream() for real-time token streaming. The LLMProvider interface includes an optional generateStream?(request, onChunk) method. Streaming is used in dojops "prompt" for agent-routed generation commands. Structured output (schema/skill requests) falls back to spinner mode.
9. REST API & Dashboard (@dojops/api)
Express-based API with dependency injection via createApp(deps). Uses @dojops/skill-registry to load all built-in + custom skills. 23 endpoints exposing all capabilities over HTTP with API v1 versioning (/api/v1/ prefix with backward-compatible /api/ alias, X-API-Version: 1 header on v1 routes). Vanilla web dashboard with 5 tabs (Overview, Security, Audit, Agents, History). Health endpoint reports customSkillCount. Per-route rate limiting and token budget tracking via TokenTracker.
See API Reference and Web Dashboard.
10. CLI (@dojops/cli)
Full-lifecycle CLI with rich TUI powered by @clack/prompts. Interactive prompts, spinners, styled panels, semantic log levels. Includes dojops init (repo scanner covering 11 CI platforms, IaC, scripts, security detection) and dojops check (LLM-powered DevOps config quality analysis).
See CLI Reference.
Design Principles
- No blind execution: Every LLM output is validated before use.
- Structured JSON outputs: Provider-native JSON modes + Zod schemas on all LLM responses.
- Schema validation everywhere: Tool inputs, LLM responses, plan structures, API requests.
- Idempotent operations: Generated configs produce the same result on re-execution. YAML keys are sorted for deterministic output.
- Clear separation of concerns: Orchestration, generation, validation, execution, and auditing are independent layers.
- Extensibility: New skills follow the BaseSkill<T> pattern. New agents are registered in the specialist list.
- Declarative safety: .dops skills declare their own scope boundaries, risk levels, and execution semantics, enabling automated policy enforcement without hardcoded tool-specific rules.
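The idempotency principle relies on key ordering being stable. The sketch below shows the idea with a recursive key sort; JSON.stringify is used here in place of a YAML serializer purely to keep the example dependency-free, and `sortKeys` is a hypothetical helper name.

```typescript
// Recursively sort object keys so serialization does not depend on insertion order.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sortKeys(item));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortKeys(source[key]);
    }
    return sorted;
  }
  return value;
}

// Two logically identical configs built in different key orders.
const firstRun = JSON.stringify(sortKeys({ services: { web: { ports: [80], image: "nginx" } }, version: "3" }));
const secondRun = JSON.stringify(sortKeys({ version: "3", services: { web: { image: "nginx", ports: [80] } } }));
```

After sorting, `firstRun` and `secondRun` are the same string, so re-running a generation that yields the same data produces a byte-identical file. Array order is left alone because it is usually meaningful in configs.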
Data Storage
DojOps stores project state in the .dojops/ directory:
.dojops/
context.json Project context v2 (languages, 11 CI platforms, IaC, containers,
monitoring/web servers, scripts, security configs, devopsFiles[])
session.json Current session state
plans/ Saved TaskGraph plans (*.json)
execution-logs/ Per-execution results (*.json)
scan-history/ Security scan reports (*.json)
sessions/ Chat session persistence (*.json)
skills/ Project-scoped custom skills (.dops files)
agents/ Project-scoped custom agents (<name>/README.md)
memory/
dojops.db SQLite database (WAL mode): tasks_history, notes, error_patterns
mcp.json MCP server configuration (project-scoped)
policy.yaml Skill policy (allowedSkills / blockedSkills)
history/
audit.jsonl Hash-chained audit log (append-only)
lock.json Execution lock (PID-based)
~/.dojops/
config.json User configuration (provider, model, tokens)
vault.json AES-256-GCM encrypted secrets vault
backups/ Config backup snapshots
skills/ Global custom skills (shared across projects)
toolchain/ System binary sandbox (installed verification binaries)
agents/ Global custom agents (shared across projects)
mcp.json Global MCP server configuration
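The hash-chained audit log (audit.jsonl above) can be illustrated with a minimal sketch in which each entry commits to the hash of the previous one. The entry fields, the all-zero genesis value, and the choice of SHA-256 over a JSON encoding are assumptions; the actual DojOps entry format is not documented in this section.

```typescript
import { createHash } from "node:crypto";

// Assumed entry shape, for illustration only.
interface AuditEntry {
  seq: number;
  action: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // assumed sentinel used as prevHash of the first entry

function hashEntry(seq: number, action: string, prevHash: string): string {
  return createHash("sha256").update(JSON.stringify({ seq, action, prevHash })).digest("hex");
}

// Append-only: each new entry commits to the hash of the previous one.
function appendEntry(log: AuditEntry[], action: string): AuditEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const seq = log.length;
  return [...log, { seq, action, prevHash, hash: hashEntry(seq, action, prevHash) }];
}

// Recompute every hash; an edited entry or a gap in the middle breaks the chain.
function verifyChain(log: AuditEntry[]): boolean {
  let prevHash = GENESIS;
  for (const entry of log) {
    if (entry.prevHash !== prevHash) return false;
    if (entry.hash !== hashEntry(entry.seq, entry.action, entry.prevHash)) return false;
    prevHash = entry.hash;
  }
  return true;
}
```

In a JSONL file, each `AuditEntry` would be one line. Note that a chain like this detects edits and middle deletions but not truncation of the tail, unless the latest hash is also recorded somewhere else.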