Software Factory 101

Executive summary

A software factory is not just “AI that helps write code”. It is a repeatable system for turning work requests into tested changes, pull requests, releases, or deployments while capturing the context and lessons that future runs need. The architecture below maps that idea as a continuous AI-native loop: human intent enters, workflow tools and project context guide the work, agents plan and execute inside controlled environments, checks and human review decide what can move forward, deployment paths hand off approved output, and feedback loops retain lessons for the next run. The “factory” part matters because work moves through structured stages, outputs are standardised, quality control exists, knowledge is retained, and throughput can improve over time. ^{Greenfield, XHawk, Alex Op, MCP}

Shipping software reliably is hard. Teams repeat the same slow, error-prone loops: manual reviews, broken deployments, no institutional memory, and agents that need constant hand-holding. A software factory replaces that chaos with a governed, repeatable system where human intent drives everything and automated checks catch mistakes before they reach production.

The software factory: AI-native engineering architecture

Use this as the 101 map of the system: work starts as human intent, context and workflow tools feed the factory core, agents plan and execute in controlled environments, checks and review decide what can move forward, and learning loops back into the next run.

A software factory is easiest to understand as a closed loop. A person or system asks for work, the factory loads the project context, agents plan and make the change in a controlled environment, automated checks and human review decide whether it is safe, approved output moves to a pull request, release, or deployment, and the result is recorded so the next run starts smarter. That loop is the key difference between a one-off coding prompt and a repeatable operating system for software delivery.

Rule of thumb: a coding assistant helps while you are active; a software factory keeps working even when you are not, because the workflow, context, checks, and memory are encoded into the system rather than held only in a person’s head. ^{XHawk, Alex Op, Claude Code, Copilot}

What it solves

Too many handoffs, too much repeated setup, weak reuse of project knowledge, and too much human time spent on mechanical steps instead of product judgement.

What it needs

Context, orchestration, execution tools, isolation, tests, approvals, and feedback capture. Remove any one of those and the system starts looking more like a chat assistant than a factory.

How to start

Build one repeatable lane first: intake a task, load context, create a plan, run an executor in a sandbox, review with tests, produce a PR summary, then record what happened.

How a software factory works

In practice, the factory works like a controlled loop. A task arrives from a human, from a schedule, or from another system. The factory loads the right context, breaks the job into steps, runs the change in an isolated environment, checks the result, asks for approval where needed, and then stores the outcome as new reusable knowledge. Modern agent products increasingly expose the building blocks for this pattern: subagents, hooks, schedules, custom agents, MCP connectors, cloud agents, version control, background tasks, and security policies. ^{XHawk, Alex Op, Claude, Copilot, Replit, Lovable}

Flow of a beginner-friendly software factory

flowchart LR
    A[Human intent<br/>issue, feature, bug, incident] --> B[Intake and spec]
    B --> C[Context layer<br/>docs, decisions, rules, logs]
    C --> D[Orchestrator]
    D --> E[Planner agent]
    E --> F[Executor agent]
    F --> G[Sandbox / devbox]
    G --> H[Reviewer agent]
    H --> I[Tests, lint,<br/>security checks]
    I --> J{Passes guardrails?}
    J -- No --> E
    J -- Yes --> K[PR or release<br/>bundle]
    K --> L[Human approval<br/>where required]
    L --> M[CI/CD deploy]
    M --> N[Telemetry +<br/>production feedback]
    N --> O[Memory /<br/>knowledge capture]
    O --> C

This diagram shows the core loop: intent enters, shared context drives planning, execution stays isolated, and every approved outcome feeds memory.
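The retry edge in the diagram (a failed guardrail check returns to the planner) can be sketched in a few lines of JavaScript. This is a minimal illustration, not XHawk's or any other product's implementation: `runFactoryLoop`, the `roles` object, and the `maxAttempts` limit are hypothetical names, and the planner, executor, and reviewer are plain functions standing in for real agents.

```javascript
// Hypothetical sketch of the diagram's loop: plan -> execute -> review,
// returning to the planner when the guardrail check fails.
function runFactoryLoop(task, context, roles, maxAttempts = 3) {
  const history = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const plan = roles.plan(task, context, history); // Planner agent
    const change = roles.execute(plan);              // Executor agent (would run in a sandbox)
    const review = roles.review(change, task);       // Reviewer agent plus tests, lint, security checks
    history.push({ attempt, passed: review.passed, notes: review.notes });
    if (review.passed) {
      // Passing work becomes a PR bundle that still waits for human approval.
      return { status: 'ready_for_human_approval', change, history };
    }
  }
  // Guardrails never passed: stop and hand the history to a person.
  return { status: 'needs_human_help', change: null, history };
}

// Stub roles for demonstration only: the first attempt fails review, the second passes.
const demoRoles = {
  plan: (task, _context, history) => ({ steps: [task.intent], attempt: history.length + 1 }),
  execute: (plan) => ({ summary: `Applied ${plan.steps.length} step(s)`, attempt: plan.attempt }),
  review: (change) => ({
    passed: change.attempt >= 2,
    notes: change.attempt >= 2 ? 'checks green' : 'tests failed',
  }),
};

const demoResult = runFactoryLoop({ intent: 'Add a watchlist page' }, {}, demoRoles);
console.log(demoResult.status, demoResult.history.length);
```

The sketch stops at the approval gate. In a fuller version, the `history` array is the kind of record that the memory step would store and feed back into the context layer.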

Context layer

The context layer is the foundation. XHawk explicitly places codebase, feature specs, decision logs, and internal conventions under the factory core; Lovable supports persistent workspace and project knowledge; Claude Code uses files such as CLAUDE.md and project memory; MCP exists to connect AI systems to external data sources, tools, and workflows.

Sources: ^{XHawk, XHawk Context, Lovable, Claude, MCP}

Planner, executor, reviewer

The planner breaks work into steps and clarifies acceptance criteria. The executor applies changes and runs commands. The reviewer validates the result with tests, static checks, or code review. XHawk shows that split directly; Claude Code’s subagents and multi-session teams provide similar role separation; Alex Op describes the pattern as a practical path from “AI helps code” to “AI runs the workflow”.

Sources: ^{XHawk, Claude, Alex Op}

Orchestration and triggers

A factory does not wait passively for one prompt. XHawk discusses scheduled and event triggers; Claude Code supports repeated prompts and routines; GitHub Copilot cloud agent can be launched from GitHub, Jira, Linear, Slack, and Teams; Replit supports background tasks and automations.

Sources: ^{XHawk, Claude, Copilot, Replit}

Sandbox and curated capabilities

Safe autonomy requires boundaries. XHawk uses cloud sandboxes; Claude Code offers permission modes and sandboxed Bash with file and network isolation; Docker Compose is a simple way to define an isolated, reproducible environment for a mini factory; Copilot cloud agent runs on GitHub Actions runners and GitHub recommends fresh hosted or ephemeral runners.

Sources: ^{XHawk, Claude, Docker, Copilot}

Guardrails, CI/CD, and human review

The factory must know what it may do automatically and what still needs a person. XHawk’s model includes automated tests and explicit human approval. GitHub Actions gives you CI/CD, and GitHub recommends branch rules, CODEOWNERS, secret handling, and workflow policies for cloud agents. OWASP adds the agent-specific reminder that outputs must be validated before they are executed.

Sources: ^{XHawk, GitHub Actions, Copilot, OWASP Agent}

Memory and feedback

Alex Op stresses that the closest practical model is a loop: collect context, execute, validate, learn, and repeat. Claude Code writes and recalls project memory; XHawk describes a knowledge graph and continuous snapshots; Lovable keeps shared knowledge; production analytics or operations signals can become new context for the next run.

Sources: ^{Alex Op, Claude, XHawk Context, Lovable}

Data model for artifacts a factory should remember

erDiagram
    TASK ||--|| SPEC : becomes
    TASK ||--o{ AGENT_RUN : spawns
    SPEC }o--o{ CONTEXT_DOC : references
    AGENT_RUN }o--o{ CONTEXT_DOC : reads
    AGENT_RUN ||--o{ CHANGESET : creates
    CHANGESET }o--|| BRANCH : committed_to
    BRANCH ||--|| PULL_REQUEST : opens
    PULL_REQUEST ||--o{ TEST_RUN : has
    PULL_REQUEST ||--o{ MEMORY_ITEM : records

Even a tiny factory should treat tasks, context, runs, tests, and memory as first-class data. That is how the system becomes inspectable and improvable.
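To make "first-class data" concrete, here is a small in-memory sketch of part of the diagram. It is an illustration under assumptions: `createFactoryStore`, the table names, the field names, and the example file path are hypothetical, and the spec, context-doc, and branch entities are left out so the pull request links straight to the task.

```javascript
// Hypothetical in-memory store mirroring part of the ER diagram:
// TASK -> AGENT_RUN -> CHANGESET, and PULL_REQUEST -> TEST_RUN / MEMORY_ITEM.
function createFactoryStore() {
  const tables = { task: [], agentRun: [], changeset: [], pullRequest: [], testRun: [], memoryItem: [] };
  let nextId = 1;
  return {
    // Insert a record into a table and return it with a generated id.
    add(table, fields) {
      if (!Object.hasOwn(tables, table)) throw new Error(`Unknown table: ${table}`);
      const record = { id: `${table}-${nextId}`, ...fields };
      nextId += 1;
      tables[table].push(record);
      return record;
    },
    // Find all records in a table whose foreign key points at the given parent id.
    childrenOf(table, foreignKey, parentId) {
      return tables[table].filter((record) => record[foreignKey] === parentId);
    },
  };
}

const store = createFactoryStore();
const task = store.add('task', { intent: 'Add a watchlist page' });
const run = store.add('agentRun', { taskId: task.id, role: 'executor' });
store.add('changeset', { agentRunId: run.id, files: ['src/routes/watchlist.js'] });
const pr = store.add('pullRequest', { taskId: task.id, title: 'Add watchlist page' });
store.add('testRun', { pullRequestId: pr.id, passed: true });
store.add('memoryItem', { pullRequestId: pr.id, lesson: 'Watchlist state lives in local storage' });

// Everything recorded against the pull request can be inspected later.
console.log(store.childrenOf('testRun', 'pullRequestId', pr.id));
```

Because every record carries the id of its parent, a later run can ask which tests and lessons belong to a given pull request instead of relying on someone's recollection.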

Maturity path from manual delivery to continuous factory

timeline
    title Software Factory Maturity
    section Level 1 - Manual
        Ad-hoc scripts, no review gate
    section Level 2 - Assisted
        AI suggestions, human commits all
    section Level 3 - Mini Factory
        Manifest-gated checks, agent co-pilot
    section Level 4 - Continuous
        Event-driven, auto-routed agents

For beginners, the important lesson is that maturity comes in layers. You do not need the final stage on day one.

Factory components

Each box in the architecture diagram maps to a component below. If this is your first pass, start with Human intent and then move through the list in order.

Human intent

Highlighted in the architecture: Human intent.

Human intent is the intake point. In the diagram it includes engineers, PMs, designers, APIs, agents, and incidents. In plain English, intent is the problem statement that starts the run: "Add a watchlist page", "Investigate today's error spike", "Implement the approved design", or "Turn this bug report into a fix". ^{XHawk, Alex Op}

Beginner's test: if you cannot write the request clearly enough for another person to act on it, you are not ready to hand it to a factory either. Good intake is boringly explicit: scope, constraints, protected areas, and what "done" means.

Minimal task input

{
  "task_id": "task-001",
  "intent": "Add a watchlist page",
  "acceptance_criteria": [
    "Users can save favourite stocks locally",
    "Saved stocks remain after page refresh",
    "A PR summary is generated"
  ],
  "constraints": [
    "Do not change authentication",
    "Keep the change reversible"
  ],
  "context_paths": [
    "docs/product/watchlist.md",
    "src/routes",
    "tests"
  ]
}
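An intake step can reject vague requests before any agent runs. The check below is a sketch rather than a fixed schema: `validateTask` is a hypothetical helper, and requiring every list to be non-empty is an assumption taken from the "boringly explicit" advice above.

```javascript
// Hypothetical intake check: returns a list of problems, empty when the task is usable.
function validateTask(task) {
  const problems = [];
  const requiredLists = ['acceptance_criteria', 'constraints', 'context_paths'];
  for (const key of ['task_id', 'intent']) {
    if (typeof task[key] !== 'string' || task[key].trim() === '') {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  for (const key of requiredLists) {
    const value = task[key];
    const isNonEmptyStringList = Array.isArray(value) && value.length > 0
      && value.every((item) => typeof item === 'string' && item.trim() !== '');
    if (!isNonEmptyStringList) problems.push(`${key} must be a non-empty list of strings`);
  }
  return problems;
}

const exampleTask = {
  task_id: 'task-001',
  intent: 'Add a watchlist page',
  acceptance_criteria: ['Users can save favourite stocks locally'],
  constraints: ['Do not change authentication'],
  context_paths: ['docs/product/watchlist.md', 'src/routes', 'tests'],
};

console.log(validateTask(exampleTask)); // → []
console.log(validateTask({ task_id: 'task-002', intent: '' }));
```

Returning a list of problems, rather than throwing on the first one, lets the factory send the requester a complete list of what to fix.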

Telemetry and logs

Highlighted in the architecture: Telemetry & logs.

Telemetry and logs are the factory's stream of operational evidence: logs, metrics, traces, build output, customer complaints, analytics, failed tests, and incident signals. In the diagram, these signals arrive 24x7. XHawk describes live signals such as logs and metrics as context for agents; Alex Op describes support feedback, analytics dashboards, and error logs feeding the work backlog. The public XHawk material does not specify an exact telemetry schema or storage architecture, so those details are intentionally left unspecified here. ^{XHawk, Alex Op}

For a beginner, telemetry answers three simple questions: what is happening, what broke, and did the last change help or hurt? Without telemetry, the factory can still build, but it cannot learn intelligently from real outcomes. With telemetry, production itself becomes an input back into development.

The dotted loop in the diagram is the same idea seen over time: completed work, production signals, and incident lessons feed the next run instead of being forgotten. ^{XHawk Context, Alex Op}

Tiny telemetry record

{
  "timestamp": "2026-05-24T09:15:00Z",
  "service": "web",
  "signal": "error-rate-spike",
  "severity": "warning",
  "details": {
    "route": "/api/watchlist",
    "count_last_5m": 42
  }
}
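One way to turn production into an input is to convert a record like this into a draft task. The mapping below is a sketch under assumptions: `telemetryToTask` is a hypothetical helper, the `critical` severity level is invented alongside the `warning` shown above, and the generated acceptance criteria are placeholders a person would refine.

```javascript
// Hypothetical mapping from a telemetry record to a draft task for the intake queue.
// Only warning and critical signals open a task; other severities are ignored.
function telemetryToTask(record, sequenceNumber) {
  const actionable = ['warning', 'critical'];
  if (!actionable.includes(record.severity)) return null;
  const route = record.details && record.details.route ? ` on ${record.details.route}` : '';
  return {
    task_id: `task-telemetry-${sequenceNumber}`,
    intent: `Investigate ${record.signal} in ${record.service}${route}`,
    acceptance_criteria: ['Root cause is identified', 'A fix or follow-up task is proposed'],
    constraints: ['Do not change production configuration without approval'],
    evidence: [record], // keep the raw signal so the planner can see what triggered the task
  };
}

const spike = {
  timestamp: '2026-05-24T09:15:00Z',
  service: 'web',
  signal: 'error-rate-spike',
  severity: 'warning',
  details: { route: '/api/watchlist', count_last_5m: 42 },
};

const draft = telemetryToTask(spike, 1);
console.log(draft.intent); // → Investigate error-rate-spike in web on /api/watchlist
```

The draft deliberately keeps the original record as evidence, so a reviewer can trace the task back to the signal that caused it.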

Workflow integration

Highlighted in the architecture: Workflow integration.

Workflow integration is the adoption layer. The diagram names Slack, GitHub, Linear, and Jira; the broader pattern includes chat, issue trackers, CI failures, schedules, webhooks, incident systems, and cloud infrastructure events. GitHub Copilot's cloud-agent documentation shows the same pattern in product form: sessions can start from GitHub surfaces and integrated tools, with the issue or thread context passed into the run. ^{XHawk, GitHub Copilot}

In practice, workflow integration matters because it removes "copy this from tool A into tool B" busywork. A factory should meet the team where the work already begins, then move the right context into the factory runtime.
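Moving context from where work begins into the factory usually means normalising different events into one intake shape. The sketch below shows that idea only: the event shapes are deliberately simplified and hypothetical (real GitHub, Slack, or Jira payloads are richer and differ), and `normaliseEvent` and the `adapters` table are illustrative names.

```javascript
// Hypothetical, simplified event shapes. Real webhook payloads from GitHub, Slack,
// or Jira look different; each adapter is where that translation would live.
const adapters = {
  github_issue: (event) => ({ intent: event.title, details: event.body, link: event.url }),
  chat_message: (event) => ({ intent: event.text, details: '', link: event.permalink }),
  schedule: (event) => ({ intent: event.jobName, details: `Scheduled run: ${event.cron}`, link: null }),
};

// Convert any supported event into one common intake shape, tagged with its origin.
function normaliseEvent(source, event) {
  if (!Object.hasOwn(adapters, source)) throw new Error(`Unsupported source: ${source}`);
  return { source, receivedAt: new Date().toISOString(), ...adapters[source](event) };
}

const fromIssue = normaliseEvent('github_issue', {
  title: 'Watchlist loses items after refresh',
  body: 'Steps to reproduce are in the issue thread.',
  url: 'https://example.com/issues/1',
});
console.log(fromIssue.source, fromIssue.intent);
```

Adding a new tool then means adding one adapter, while the planner and the rest of the factory keep seeing the same shape.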

Context layer

Highlighted in the architecture: Context layer.

The context layer is the factory's memory and grounding system. It contains the relatively stable project knowledge that AI agents need before they can work effectively. This includes the codebase structure, feature specifications, architecture decisions, internal conventions, domain terminology, protected areas of the system, and deployment rules.

The key idea is that context provides long-lived understanding about how the project operates. Without this grounding, AI agents behave like generic assistants with no real understanding of the application they are modifying.

Telemetry and logs, covered above, are different from context. Telemetry is live operational evidence about what is happening right now inside the system.

Beginner distinction: context is stable project knowledge; telemetry is live operational signals.

XHawk describes context broadly as specs, past decisions, tickets, and live signals. In practice, specs, decisions, conventions, and architecture documents belong to the actual context layer. Live signals belong to the telemetry and logs layer. After review and validation, important operational learnings from telemetry may later be promoted into long-term context. ^XHawk

Platforms such as XHawk, Lovable, and Claude Code describe variations of the same core idea: workspace knowledge, project memory, codebase awareness, and context management. The shared principle is that context transforms a generic AI model into a project-specific engineering system. ^{XHawk, Lovable, Claude}

Beginner's version: write down what your future self should not have to rediscover. Without context, agents hallucinate more easily, workflows become inconsistent, architecture drifts, and repeated mistakes return. With context, the system behaves more like an experienced engineer who already understands the project.

Starter context file

{
  "project_name": "Watchlist demo",
  "rules": [
    "Prefer small reversible changes",
    "Use feature flags for risky UI work"
  ],
  "domain_terms": {
    "watchlist": "A saved list of favourite stocks",
    "ticker": "Market symbol such as AAPL"
  },
  "protected_areas": [
    "authentication",
    "billing",
    "production secrets"
  ],
  "done_definition": [
    "Tests pass",
    "PR summary exists"
  ]
}
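A context file becomes useful once something reads it. As one possible use, the sketch below flags changes that touch a protected area. The starter file lists only area names, so the `areaPaths` mapping from names to path prefixes is an added assumption, as are the helper name `touchedProtectedAreas` and the example file paths.

```javascript
// Hypothetical mapping from protected area names to path prefixes. The starter
// context file lists only names, so this mapping is an assumption for illustration.
const areaPaths = {
  authentication: ['src/auth/'],
  billing: ['src/billing/'],
  'production secrets': ['.env', 'secrets/'],
};

// Return the protected areas touched by a list of changed file paths.
function touchedProtectedAreas(changedPaths, protectedAreas, pathsByArea) {
  return protectedAreas.filter((area) => {
    const prefixes = pathsByArea[area] || [];
    return changedPaths.some((path) => prefixes.some((prefix) => path.startsWith(prefix)));
  });
}

const context = { protected_areas: ['authentication', 'billing', 'production secrets'] };
const touched = touchedProtectedAreas(
  ['src/routes/watchlist.js', 'src/auth/session.js'],
  context.protected_areas,
  areaPaths,
);
console.log(touched); // → [ 'authentication' ]
```

A non-empty result is a natural signal to stop the run or to require human approval before the change goes further.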

Software factory core

Highlighted in the architecture: The software factory core.

The software factory core is where the system turns intent and context into controlled work. In this simplified map, the core has two jobs: orchestrate specialised agents, and run their work inside a sandboxed execution environment. The orchestrator is the control system of the factory: it decides when tasks start, which agents execute, which tools are available, which approvals are required, and when workflows stop or retry. That is what separates a repeatable factory from disconnected AI tools or a one-off chat prompt. ^{XHawk, Claude, Replit}

A software factory is therefore not just AI, not just coding, not just CI/CD, and not just DevOps. It is the combination of orchestration, context, execution, governance, memory, deployment, and learning into one operating system for repeatable software change.

1. Multi-agent orchestration

Multi-agent orchestration means coordinating specialised roles instead of expecting one agent to do everything well. XHawk names the planner, executor, and reviewer trio directly. Claude Code and Replit use different product language, but the runtime logic is similar: gather context, make a plan, act, verify, and repeat. ^{XHawk, Claude, Replit}

The beginner lesson is important: splitting thinking, doing, and checking often makes a system easier to steer, audit, and improve. Smaller focused roles are easier to govern, failures are easier to diagnose, and automation becomes safer because each role has a clearer job.

1a. Planner

The planner turns intent and context into ordered work. It identifies affected systems, determines the implementation approach, breaks the job into steps, names acceptance criteria, calls out constraints, and marks risky areas before anything changes.

1b. Executor

The executor performs the work inside the allowed environment: writing code, editing files, running commands, updating tests, creating artifacts, recording outputs, and reporting what happened. The executor should not silently widen scope; it should work inside the plan, constraints, and sandbox.

1c. Reviewer

The reviewer validates output against the plan, tests, linting, architecture rules, requirements, and likely failure modes. In a small factory this can be a strict review checklist; in a larger one it can be a specialised agent plus automated checks and human approval.
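The rule that the executor should not silently widen scope can be checked mechanically. The sketch below assumes the plan carries a list of allowed paths; `allowedPaths` and `findOutOfScopeChanges` are hypothetical names, and the file paths are examples only.

```javascript
// Hypothetical scope check: every changed file must sit under a path the plan allowed.
function findOutOfScopeChanges(plan, changedFiles) {
  // Drop trailing slashes so 'tests/' and 'tests' are treated the same.
  const allowedRoots = plan.allowedPaths.map((path) => path.replace(/\/+$/, ''));
  return changedFiles.filter((file) =>
    !allowedRoots.some((root) => file === root || file.startsWith(`${root}/`)));
}

const plan = { allowedPaths: ['src/routes', 'tests/'] };
const outOfScope = findOutOfScopeChanges(plan, [
  'src/routes/watchlist.js',
  'tests/watchlist.test.js',
  'src/auth/login.js',
]);
console.log(outOfScope); // → [ 'src/auth/login.js' ]
```

Matching on `root + '/'` rather than on the bare prefix keeps a directory such as `src/routes-old` from being treated as part of `src/routes`.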

2. Sandboxed execution

Sandboxed execution is the safety boundary for autonomous work. Agents should operate inside isolated environments rather than directly against production systems. Typical implementations include Docker containers, cloud devboxes, ephemeral virtual machines, isolated Git branches, restricted filesystem access, and restricted network access. XHawk describes isolated repo copies and cloud sandboxes or devboxes; GitHub Copilot's cloud agent uses an ephemeral development environment powered by GitHub Actions; Claude Code's sandboxing guidance is explicit that useful isolation needs both filesystem and network boundaries. Docker Compose is the most approachable beginner tool for modelling a repeatable isolated environment because it defines services, networks, and volumes in one YAML file. ^{XHawk, GitHub Copilot, Claude, Docker}

Safe autonomy means giving the system enough room to work without giving it uncontrolled access to production systems, secrets, or the whole machine. Devbox environments are one practical form of that boundary. The key beginner insight is blunt but useful: autonomy without isolation is dangerous, because autonomous systems will eventually make mistakes and the sandbox limits the blast radius.

Crucial beginner warning: Node's vm module is not a security mechanism for running untrusted code. If you need isolation, use real process, VM, or container boundaries instead. ^Node
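As an illustration of a container boundary, the helper below only builds an argument list for `docker run`; it does not start anything itself. The flags are standard `docker run` options, while the helper name `buildSandboxArgs`, the `node:24` image, the `/workspace` mount point, and the example repo path are assumptions made for this sketch.

```javascript
// Build an argument list for `docker run` that isolates a single command.
// Assumptions: the image name and the /workspace mount point are illustrative choices.
function buildSandboxArgs(repoCopyPath, command, image = 'node:24') {
  return [
    'run', '--rm',                            // remove the container when the command exits
    '--network', 'none',                      // no network access from inside the container
    '--read-only',                            // container root filesystem is read-only
    '--tmpfs', '/tmp',                        // scratch space that disappears with the container
    '--volume', `${repoCopyPath}:/workspace`, // mount only the isolated repo copy
    '--workdir', '/workspace',
    image,
    ...command,
  ];
}

const sandboxArgs = buildSandboxArgs('/tmp/factory/task-001', ['node', '--test']);
console.log(['docker', ...sandboxArgs].join(' '));
```

The resulting array can be passed to `child_process.spawnSync('docker', sandboxArgs)` from an orchestrator. Tools that need to write outside `/workspace` and `/tmp` would need additional writable mounts, and this sketch does not replace a review of what the mounted repo copy itself contains.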

Curated capabilities

Highlighted in the architecture: Curated capabilities.

Curated capabilities are the small approved toolbox the factory can use. In the diagram that toolbox includes APIs, tests, MCP, and docs; in practice it can also include approved databases, documentation systems, deployment interfaces, and other tightly scoped services. XHawk frames this as a curated capability set where tool quality matters more than tool quantity. MCP provides a standard way for servers to expose tools, resources, and prompts; GitHub Copilot also uses MCP to extend Copilot with other systems; Claude Code's hooks and subagents show how runtime actions can be attached to lifecycle points. ^{XHawk, MCP, Copilot, Claude}

The beginner translation is simple: do not give the agent a random pile of tools. A good factory minimises unnecessary capabilities, tightly scopes permissions, and standardises interfaces. Start with file read and write inside the sandbox, test execution, repository operations, approved documentation access, and one path for opening a pull request or writing a review summary. Fewer, safer, higher-quality tools beat unlimited access.
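A curated toolbox can be as simple as an allowlist that every tool call passes through. The sketch below is not MCP and not any product's API: `createToolbox` and the tool names are hypothetical, and the two tools are stand-ins that touch neither the disk nor a real test runner.

```javascript
// Hypothetical tool registry: only explicitly registered tools can be called,
// and every attempt, allowed or not, is logged for later audit.
function createToolbox(allowedTools) {
  const tools = new Map(Object.entries(allowedTools));
  const callLog = [];
  return {
    call(name, input) {
      const allowed = tools.has(name);
      callLog.push({ name, allowed });
      if (!allowed) throw new Error(`Tool not in curated set: ${name}`);
      return tools.get(name)(input);
    },
    log: () => [...callLog],
  };
}

const toolbox = createToolbox({
  read_file: ({ path }) => `contents of ${path}`, // stand-in, does not read the disk
  run_tests: () => ({ passed: true }),            // stand-in for a real test runner
});

console.log(toolbox.call('read_file', { path: 'docs/product/watchlist.md' }));
```

Denied calls are logged as well as rejected, which gives reviewers a record of what an agent tried to do outside its approved set.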

Guardrails and human review

Highlighted in the architecture: Guardrails + human review.

Guardrails and human review are the control layer. In the diagram this contains tests and PR generation, with review before production. Guardrails include automated tests, lint checks, security scans, branch protections, approval gates, deployment policies, permission systems, and human review requirements. XHawk describes agents generating PRs and running tests while humans define requirements and approve outputs. GitHub's Copilot guardrails guidance points to policy planning, branch rulesets, permissions, protected configuration ownership, and secure runner choices. OWASP's AI Agent Security guidance adds agent-specific controls such as validating tool use, limiting retries and tokens, and testing for approval bypass, privilege escalation, memory poisoning, and data exfiltration. ^{XHawk, Copilot, GitHub rules, OWASP Agent}

In plain English, guardrails answer three questions: what may happen automatically, what requires approval, and what is forbidden entirely? Good factories automate aggressively, but constrain aggressively too. Humans increasingly focus on intent, product judgement, risk management, approval of sensitive changes, architecture direction, and exception handling rather than only manual coding.
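Those three questions map directly onto a small policy table. The sketch below is one possible encoding: the action names, the `policy` object, and the `decide` function are hypothetical, and defaulting unknown actions to approval is a design assumption rather than a rule from any of the cited sources.

```javascript
// Hypothetical policy: each action is automatic, approval-gated, or forbidden.
const policy = {
  automatic: ['run_tests', 'run_lint', 'open_pull_request'],
  needsApproval: ['merge_pull_request', 'deploy_production'],
  forbidden: ['read_production_secrets', 'force_push_main'],
};

// Forbidden wins over everything; unknown actions fall back to needing approval,
// so a newly added capability starts out gated instead of open.
function decide(action, rules = policy) {
  if (rules.forbidden.includes(action)) return 'forbidden';
  if (rules.automatic.includes(action)) return 'automatic';
  return 'needs_approval';
}

for (const action of ['run_tests', 'deploy_production', 'force_push_main', 'rotate_api_keys']) {
  console.log(action, '->', decide(action));
}
```

Checking the forbidden list first means a misconfigured policy that lists an action twice still fails safe.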

Production systems

Highlighted in the architecture: Production systems.

Production systems are where approved work lands: applications, services, databases, infrastructure, scheduled jobs, and deployments that real users depend on. The diagram labels this simply as deployment. GitHub Actions is a straightforward primary-source example of the last mile: workflows live in .github/workflows, respond to repository events, and can run CI, deployments, and automations. ^{GitHub Actions, XHawk}

A beginner does not need a fancy deployment stack to understand this box. If a change can be tested, approved, and released through a repeatable path, you have the seed of the production end of the factory.

Every run should generate reusable knowledge: successful fixes, failed approaches, architecture decisions, incident resolutions, deployment lessons, and operational learnings. XHawk describes tasks being audited and remembered, with sessions, decisions, and context becoming indexed knowledge; Claude Code, Lovable, and similar tools expose memory-like runtime pieces for continuity. Public XHawk pages describe indexed knowledge and snapshots, but do not specify the exact persistence implementation. Memory is how the factory compounds. ^{XHawk Context, Claude, Lovable}
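A first version of memory can be an append-only list that is queried by tag before the next run. This is a sketch only and says nothing about how XHawk, Claude Code, or Lovable persist memory: `createMemory`, the record fields, and the tag-matching rule are all hypothetical.

```javascript
// Hypothetical memory log: append a lesson after each run, recall lessons by tag
// before the next one. Storage is an in-memory array for illustration.
function createMemory() {
  const items = [];
  return {
    record(taskId, outcome, lesson, tags) {
      items.push({ taskId, outcome, lesson, tags, recordedAt: new Date().toISOString() });
    },
    // Return lessons that share at least one tag with the new task, newest first.
    recall(tags) {
      return items
        .filter((item) => item.tags.some((tag) => tags.includes(tag)))
        .map((item) => item.lesson)
        .reverse();
    },
  };
}

const memory = createMemory();
memory.record('task-001', 'merged', 'Watchlist state lives in local storage', ['watchlist', 'ui']);
memory.record('task-002', 'reverted', 'Put risky UI work behind a feature flag', ['ui']);

console.log(memory.recall(['ui']));
```

Recording failed and reverted runs alongside successful ones matters, because failed approaches are among the lessons the next run should not have to rediscover.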

This site is a working Level 3 factory

The repository you are reading is not just documentation about a software factory - it is one. Every concept described above has a concrete implementation you can read, fork, or adapt. This site currently operates at Level 3 (Mini Factory).

Factory concept	Implementation in this repo	Where to look
Context layer	Agent instructions and context capsules	`AGENTS.md`, `docs/agent-context.md`, `docs/context/`
Automated reviewer	Validation scripts that run before commit and in CI	`scripts/check-*.mjs`
Pre-commit gate	Manifest-driven change router blocks unmapped files	`.git/hooks/pre-commit`, `scripts/run-change-checks.mjs`
CI governance	GitLab governance job runs the same checks on merge	`.gitlab-ci.yml`
Risk-tiered routing	Files are classified into change classes; policy defines risk tier per class	`.factory/change-classes.json`, `.factory/policy.json`
Generated documentation	README, AGENTS, and docs/generated are rebuilt from contract data	`scripts/generate-docs.mjs`, `docs/factory-contract.json`
Telemetry	Validation-run events are written to a managed Postgres table	`supabase/schema.sql` - `factory_events` table
Pattern analysis	Failing validation events can generate ADR draft inputs	`scripts/generate-adr-draft.mjs`, `docs/generated/adr-drafts.json`
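The idea behind risk-tiered routing and the "blocks unmapped files" gate can be sketched as follows. This does not reproduce the repository's scripts or the schema of its `.factory/*.json` files, which are not shown here: the `changeClasses` and `riskPolicy` shapes, the tier names, and `routeChanges` are assumptions for illustration.

```javascript
// Hypothetical manifest shapes. The real .factory/change-classes.json and
// .factory/policy.json may be structured differently.
const changeClasses = [
  { name: 'docs', prefix: 'docs/' },
  { name: 'scripts', prefix: 'scripts/' },
  { name: 'ci', prefix: '.gitlab-ci.yml' },
];
const riskPolicy = { docs: 'low', scripts: 'medium', ci: 'high' };

// Classify each changed file; any file that matches no class blocks the change.
function routeChanges(files, classes = changeClasses, tiers = riskPolicy) {
  const routed = [];
  const unmapped = [];
  for (const file of files) {
    const match = classes.find((changeClass) => file.startsWith(changeClass.prefix));
    if (match) routed.push({ file, changeClass: match.name, risk: tiers[match.name] });
    else unmapped.push(file);
  }
  return { ok: unmapped.length === 0, routed, unmapped };
}

console.log(routeChanges(['docs/context/overview.md', '.gitlab-ci.yml']));
console.log(routeChanges(['src/new-feature.js']).ok); // → false
```

Failing closed on unmapped files forces every new area of the repository to be given a change class and a risk tier before changes to it can pass the gate.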

What Level 4 would add

Event-driven triggers that automatically decompose work from telemetry signals, parallel agent orchestration across multiple concurrent tasks, and production metrics feeding routing decisions without human prompting. The architecture is already designed to accommodate this - the context layer, telemetry table, and manifest routing are the foundations.

Choose your end-to-end software factory path

A path is not one tool. It is a complete loop: task in, context loaded, work produced, checks run, human review, deployment or handoff, and lessons recorded. Pick based on where your work already lives and how much setup you can tolerate.

Choose your starting lane

flowchart TD
    A[Start here] --> B{Can you write basic code?}
    B -- No --> C[Path 1: Builder<br/>Lovable or Replit]
    B -- Yes --> D{Where work lives?}
    D -- GitHub --> E[Path 2: GitHub-native<br/>Issues, PRs, Actions]
    D -- GitLab or edge app --> F[Path 3: GitLab + Workers<br/>MRs, CI, Wrangler]
    D -- Local learning --> G[Path 4: DIY local<br/>Node + Docker]
    E --> H[Control-plane<br/>reference]
    F --> H
    G --> H

Pick the lane here, then follow its exact steps in End-to-end setup steps by path. No guessing or translation required.

Path 1: Builder

Difficulty: easiest | Control: low | First result: live prototype | Main risk: hidden mechanics

Use Lovable or Replit when you want the shortest route from idea to a live prototype and can accept that the platform hides much of the machinery.

Go to Builder steps

Path 2: GitHub-native

Difficulty: easy | Control: medium | First result: guarded PR | Main risk: repo permissions

Use GitHub when your work already lives in issues, branches, pull requests, Actions, branch rules, and CODEOWNERS.

Go to GitHub-native steps

Path 3: GitLab + Workers

Difficulty: medium | Control: medium-high | First result: deployed Worker | Main risk: CI secrets

Use GitLab and Cloudflare Workers when you want a visible issue-to-MR-to-CI-to-edge-deploy lane.

Go to GitLab + Workers steps

Path 4: DIY local factory

Difficulty: technical | Control: high | First result: local loop | Main risk: building too much

Use the DIY path when you want to see every moving part: queue, context, planner, executor, reviewer, tests, sandbox, and memory.

Go to DIY steps

Reference architecture: Control plane. This is where mature teams may go later, not where a beginner should start. An XHawk-style control plane coordinates shared context, sandboxes, workflow integrations, policy gates, telemetry, and continuous feedback across many repos or teams. The tutorial below is one small cell of what a control plane coordinates. ^XHawkContext

Go to control-plane roadmap

Tool shopping list

Start with the smallest stack that proves the loop. You can add more capable agents later; the first win is a repeatable lane with context, isolation, checks, and review.

Layer	Recommended beginner tool	Why this tool	Cost / access reality
Version control	GitHub plus Git CLI or GitHub Desktop	Gives you issues, branches, pull requests, Actions, rulesets, CODEOWNERS, and a common review workflow.	Git is free. GitHub public repos are free; some private, team, or governance features may depend on plan limits.
GitLab path	GitLab project, Issues, Merge Requests, and GitLab CI/CD	Gives you a single place for intake, review, test pipelines, protected deploy jobs, and deployment history.	GitLab has free tiers; CI minutes, approvals, protected environments, and governance features can vary by plan.
Runtime	Node.js 24 LTS with npm	The tutorial uses Node built-ins for files, subprocesses, tests, assertions, and fetch, so there is very little framework setup.	Free and local.
Isolation	Docker Desktop and Docker Compose	Lets the executor run in a repeatable environment instead of directly on your host machine.	Free for many personal and small-business uses; check Docker’s current licence for organisation use.
CI/CD	GitHub Actions	Runs checks on pull requests and can trigger factory runs manually or on a schedule.	Free minutes are available for many repos; private repos and larger runners can consume paid minutes.
Edge deploy target	Cloudflare Workers with Wrangler	Lets the GitLab path deploy a small backend/API without managing servers. Wrangler is the CLI used for local dev, checks, types, and deploys.	Wrangler is free CLI tooling. Workers has a free tier, but usage, paid features, and account limits should be checked before production use.
GitHub agent	GitHub Copilot coding agent	Strong fit when work starts in GitHub and should end as a pull request guarded by Actions and branch rules.	Usually requires a Copilot plan and organisation settings may control access.
Custom coding runtime	Claude Code or OpenAI Agents SDK	Use when you want to design your own planner/executor/reviewer roles, tool permissions, hooks, and memory flow.	May require a paid subscription, API credits, or model-provider account.
App builder	Lovable or Replit	Good for beginners who want a working app quickly while still learning the factory concepts around context, review, and deployment.	Free tiers may be limited; serious use often needs credits, a paid plan, or a card on file.
Connectors	MCP, Jira, Linear, Slack, or GitHub integrations	Add these after the local loop works so the factory can pull real tasks and context from where the team already works.	Depends on the connected service and workspace permissions.

Day 0: install and verify

Before writing factory code, prove the command line can see the runtime, version control, and sandbox tools.

Setup check

node --version    # expect v24.x
git --version
docker --version
docker compose version

Windows note: Docker Desktop on Windows should use the WSL 2 backend. If Docker commands fail, check WSL 2 and Docker Desktop distro integration before debugging the tutorial.

Compare the tool landscape and choose the right layer

The easiest way to get confused is to treat every AI coding product as the same thing. They are not. Some are builders, some are IDE assistants, some are runtime platforms, and some are closer to an actual “factory” control plane. The table below uses each product’s official positioning and features, then adds a practical interpretation of the role it can play.

The path section compares end-to-end setups; this table compares individual tools that may appear inside those paths.

Tool	Official centre of gravity	What it is strong at	Best mental model in a factory	Use it when
Lovable ^Source	Full-stack AI web-app platform with editable code, GitHub sync, built-in knowledge, hosting and security features.	Prompt-to-app building, fast iteration, product/design collaboration, deployment of web apps, shared project knowledge, and practical security scanning.	Builder / executor layer. Inference: powerful for creating and iterating on an application, but not the whole software-factory operating model by itself.	You want a working web app quickly, especially for an MVP, prototype, internal tool, or product-validation loop.
GitHub Copilot ^Source	Contextual assistance across IDE, CLI, GitHub, project tools, chat apps, and cloud-agent workflows.	PR-based work, repo-aware agent sessions, custom agents, and integrations with tools like Jira, Linear, Slack, and Teams.	GitHub-native operator layer. Very strong if your process already centres on GitHub issues, branches, PRs, and Actions.	Your team lives in GitHub and wants background automation that still respects branch rules and review flows.
Replit ^Source	Browser-based idea-to-app platform with Agent, publishing, version control, background tasks, and connected services.	Fast start-up, browser-only development, built-in deployment, parallel agent tasks, connected services, and single-project momentum.	Integrated builder-and-host platform. Stronger than an IDE chat, but still more project-centric than an organisation-wide factory control plane.	You want the shortest path from idea to live app and prefer an all-in-one browser environment.
Claude Code ^Source	Agentic coding runtime for terminal, IDEs, web, and SDK-based integration, with hooks, subagents, memory, routines, and sandbox options.	Custom workflows, codebase-aware local or cloud execution, reusable skills, deterministic hooks, project memory, and configurable permissions.	Factory runtime / toolkit. Excellent for building your own factory because it exposes the primitives instead of hiding them.	Your team is technical and wants to design its own context, automation, and safety model.
OpenAI Agents SDK ^Source	Code-first SDK for building agentic applications with tools, handoffs, orchestration, streaming, and tracing.	Custom backend agent workflows, explicit tool contracts, model-provider integration, and product-specific control flow.	Custom factory engine. Useful when you want the software factory to be part of your own application or service.	You are comfortable writing backend code and want to own the orchestration layer rather than operate mainly inside an IDE or GitHub.
XHawk ^Source	Explicit software-factory architecture with context layer, multi-agent orchestration, cloud sandboxes, workflow integrations, human review, and production feedback.	Continuous 24x7 operation, shared context, background execution, organisation-level workflows, and delivery-system thinking.	Closest match to a software-factory product. It is presented as the control-plane model rather than merely a coding assistant.	You want the operating model itself, not just a faster way to write code in one session.

Important note: the “best mental model” column is an interpretation based on product documentation and the software-factory definitions above. It is deliberately analytical, especially for the earlier question “Is Lovable a software factory?” The short answer remains: Lovable is usually better understood as a powerful builder inside a broader factory, not the whole factory. ^{Lovable, XHawk, Alex Op}

Practical selection advice: for a non-technical beginner building one app, Lovable or Replit may be the easiest starting point. For a GitHub-centric engineering team, Copilot plus Actions is often the shortest factory-adjacent path. For a technical team that wants to build its own system, Claude Code plus Docker plus GitHub is an excellent mini-factory foundation. For a true control-plane view of AI-native engineering, XHawk is the clearest reference architecture in this source set.

End-to-end setup steps by path

Each path below follows the same factory loop: intake, context, execution, checks, review, deployment or handoff, and memory.

Same loop, different tools: Builder platforms hide most machinery; GitHub exposes issue → PR → Actions → merge; GitLab + Workers exposes issue → MR → CI → Cloudflare deploy; DIY local exposes queue → planner → executor → reviewer; a control plane coordinates many loops.

Path navigation

Path	Use when	First concrete result	Where steps live
Path 1: Builder	You want the easiest idea-to-app route.	A live prototype.	Builder steps
Path 2: GitHub-native	Your work already lives in GitHub.	A guarded pull request.	GitHub-native steps
Path 3: GitLab + Workers	You use GitLab or want a small edge-deployed app/API.	A staging Cloudflare Worker deployment.	GitLab + Workers steps
Path 4: DIY local factory	You want to learn the mechanics directly.	A local queue-to-review loop.	DIY local steps
Reference: Control plane	You are planning maturity across many repos or teams.	A roadmap, not a day-one build.	Control-plane roadmap

Path 1: Builder end to end

This is the easiest path because Lovable or Replit hides much of the repository, CI, sandbox, and deployment machinery. You still learn the factory loop by naming the work clearly, adding context, previewing changes, reviewing output, and recording decisions. ^{LovableReplit}

Step 0: What you need

A Lovable or Replit account, one app idea, and a short note describing the user, goal, and first safe change.

Step 1: Create the workspace

Create a new app/project in the builder. Keep the first app tiny: one page, one form, one list, or one API call.

Step 2: Define intake

Write the first task as a clear request with acceptance criteria. Example: “Add a watchlist page. It must show saved items, include an empty state, and be previewable before publish.”

Step 3: Add context

Add project knowledge or notes: audience, style rules, protected areas, data rules, and what “done” means. This replaces the local context file in the DIY path.

Step 4: Add execution

Ask the builder to make one small reversible change. Avoid broad prompts such as “build the whole product” on the first run.

Step 5: Add checks

Use preview, built-in error checks, version history, and manual smoke testing. If the platform supports GitHub sync, push the change to a repo and let CI check it there.

Step 6: Add review

Review the visible change against the original acceptance criteria. If another person is involved, share the preview link instead of shipping immediately.

Step 7: Deploy or hand off

Publish through the builder only after the preview matches the task. For a team workflow, hand off through GitHub sync or exported code.

Step 8: Record memory

Save what changed, what prompt worked, what failed, and any new convention in the project knowledge area or a simple notes file.

First safe upgrade

Connect version control when available, then start requiring every builder change to have a task, preview, review note, and rollback path.

Pros: fastest start and least setup. Cons: lower control and less visibility into the underlying factory machinery.

Path 2: GitHub-native end to end

This path is for beginners whose work already lives in GitHub. The complete loop is: GitHub issue → branch → Copilot or coding agent → pull request → GitHub Actions → branch rules/CODEOWNERS → merge → memory note. ^{CopilotActionsRESTRulesPATs}

Step 0: What you need

A GitHub account, a repository, GitHub Issues enabled, GitHub Actions enabled, and either GitHub Copilot/coding agent access or a local assistant that can work on a branch.

Step 1: Create the workspace

Create a repository, add a README, and create these folders so factory material has a home: factory/context/, factory/runs/, and .github/workflows/. Optionally add a .github/CODEOWNERS file.

Step 2: Define intake

Create a GitHub issue label named factory. Create the first issue with a small task, acceptance criteria, and any files that should or should not be touched.

Starter GitHub issue body

## Task
Add a watchlist page.

## Acceptance criteria
- The page has a clear empty state.
- The change is small and reversible.
- Automated checks pass before merge.

## Context
- Avoid auth and billing files.
- Write a short PR summary explaining the change.
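
An issue body in this shape can be read by a script as well as by a person. The sketch below pulls the bullets out of one section, assuming the body uses "## " headings and "- " bullets exactly as in the starter. extractSection is a hypothetical helper name, not part of any GitHub API.

```javascript
// Sketch: collect the bullet items that sit under one "## Heading" in an issue body.
// Assumes "## " headings and "- " bullets, as in the starter issue body.
function extractSection(body, heading) {
  const items = [];
  let inSection = false;

  for (const line of body.split(/\r?\n/)) {
    const headingMatch = line.match(/^##\s+(.*)$/);
    if (headingMatch) {
      // Enter the section on a matching heading; leave it on any other heading.
      inSection = headingMatch[1].trim().toLowerCase() === heading.toLowerCase();
      continue;
    }
    const bullet = line.match(/^\s*-\s+(.*)$/);
    if (inSection && bullet) {
      items.push(bullet[1].trim());
    }
  }
  return items;
}

const issueBody = [
  "## Task",
  "Add a watchlist page.",
  "",
  "## Acceptance criteria",
  "- The page has a clear empty state.",
  "- The change is small and reversible.",
  "",
  "## Context",
  "- Avoid auth and billing files."
].join("\n");

console.log(extractSection(issueBody, "Acceptance criteria"));
```

The same helper can read the Context section, which is one way to carry "do not touch" notes from the issue into a later automated step.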

Step 3: Add context

Add a small context file that tells Copilot, a coding agent, or a human helper what rules to follow.

factory/context/project-context.md

# Project context

Goal: Build small, reviewable product changes.

Rules:
- Prefer small pull requests.
- Do not edit auth, secrets, billing, or deployment files without explicit approval.
- Every factory task starts from a GitHub issue labelled `factory`.
- Every pull request needs automated checks and a human review.

Definition of done:
- Acceptance criteria are met.
- GitHub Actions pass.
- The PR records what changed and what the next run should remember.

Step 4: Add execution

Use Copilot/coding agent or your local assistant to create a branch from the issue. Ask for one small change only, and include a link to the issue and the context file.

Step 5: Add checks

Add a minimal GitHub Actions workflow. npm ci expects a committed package-lock.json, so commit one first. Replace npm ci and npm test with your own install and test commands if your project uses another stack.

.github/workflows/factory-checks.yml

name: factory-checks

on:
  pull_request:
  workflow_dispatch:

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "24"
      - run: npm ci
      - run: npm test

Step 6: Add review

Open a pull request, link it to the issue, and require the PR summary to mention acceptance criteria, tests run, and any files intentionally avoided.

.github/CODEOWNERS

# Require a human owner for factory and deployment controls.
/factory/ @your-github-user
/.github/ @your-github-user
/infra/ @your-github-user

Step 7: Deploy or hand off

For the beginner GitHub-native path, the first handoff is a reviewed pull request. If your app already has deployment, trigger it only after checks pass and the PR is approved.

Step 8: Record memory

Add a short note in the PR body or factory/runs/<issue-number>.md: what changed, what failed, and what rule the next task should remember.
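
A memory note is easier to keep up if every run writes it in the same shape. The sketch below renders one as Markdown text. The field names (issueNumber, changed, failed, remember) are illustrative assumptions, and writing the result to factory/runs/<issue-number>.md is left to the caller.

```javascript
// Sketch: render a memory note for one factory run.
// Field names are illustrative; adapt them to whatever your runs record.
function renderRunNote(run) {
  const section = (title, items) => [
    `## ${title}`,
    ...(items.length ? items.map(item => `- ${item}`) : ["- Nothing recorded"]),
    ""
  ];

  return [
    `# Run note for issue #${run.issueNumber}`,
    "",
    ...section("What changed", run.changed ?? []),
    ...section("What failed", run.failed ?? []),
    ...section("What the next run should remember", run.remember ?? [])
  ].join("\n");
}

console.log(renderRunNote({
  issueNumber: 12, // example value only
  changed: ["Added the watchlist page with an empty state."],
  failed: [],
  remember: ["Keep watchlist changes out of auth and billing files."]
}));
```

Empty sections still print a "Nothing recorded" bullet, so a reviewer can tell "nothing failed" apart from "nobody filled this in".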

First safe upgrade

After the manual issue-to-PR loop works, use the GitHub REST adapters from the DIY path (scripts/import-issues.mjs and scripts/open-pr.mjs) to import labelled issues into a queue or open PRs automatically. Keep merging manual at this stage.

Reference: Control-plane roadmap end to end

This is not a day-one implementation. It is the maturity map for teams that eventually need shared context, sandboxes, policy gates, telemetry, and many connected workflows. ^XHawkContext

Step 0: What you need

At least one trusted path from above, plus enough team process to know what should be centralised.

Step 1: Create the workspace

Create a shared control area for policies, reusable context, run logs, approval rules, and tool permissions.

Step 2: Define intake

Connect the places work arrives: GitHub, GitLab, Jira, Linear, Slack, incidents, schedules, and manual requests.

Step 3: Add context

Build a shared context layer from architecture rules, code ownership, decision logs, incidents, telemetry, and repo conventions.

Step 4: Add execution

Route tasks to controlled agents or sandboxes based on risk, repo, and required tools.
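
One way to picture this routing step is a small function that fails closed. The sketch below is an assumption-heavy illustration: the risk labels, lane names, and policy shape are hypothetical, and it ignores the repo dimension for brevity.

```javascript
// Sketch of a routing rule. Risk labels, lane names, and the policy shape are
// hypothetical; a real control plane would load them from shared policy files.
function routeTask(task, policy) {
  const deniedTools = (task.tools ?? []).filter(
    tool => !policy.allowedTools.includes(tool)
  );
  if (deniedTools.length > 0) {
    return { lane: "rejected", reason: `Tools not allowed: ${deniedTools.join(", ")}` };
  }

  // Fail closed: anything not explicitly low or medium risk goes to a person.
  if (task.risk !== "low" && task.risk !== "medium") {
    return { lane: "human-led", reason: "High or unknown risk needs a person in the loop." };
  }

  return { lane: "sandbox", reason: "Known risk level and allowed tools." };
}

const policy = { allowedTools: ["git", "node", "npm"] };

console.log(routeTask({ id: "task-001", risk: "low", tools: ["node"] }, policy).lane);    // sandbox
console.log(routeTask({ id: "task-002", risk: "high", tools: ["node"] }, policy).lane);   // human-led
console.log(routeTask({ id: "task-003", risk: "low", tools: ["kubectl"] }, policy).lane); // rejected
```

The design choice worth copying is the default: a task with a missing or misspelt risk label is routed to a person rather than to an agent.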

Step 5: Add checks

Standardise test, security, policy, and compliance gates across the connected systems.

Step 6: Add review

Make approvals explicit: who reviews, which areas need owners, and which tasks may never auto-merge.

Step 7: Deploy or hand off

Coordinate CI/CD rather than bypassing it. The control plane should hand work to the existing delivery systems.

Step 8: Record memory

Feed outcomes, incidents, failed checks, useful prompts, and production signals back into shared context.

First safe upgrade

Start with one of Paths 1-4, then use this as the maturity map.

Path 3: GitLab + Cloudflare Workers end to end

This path uses GitLab as the workflow and review system, and Cloudflare Workers as the deploy target. The complete loop is: GitLab issue → merge request → GitLab CI checks → staging Worker deploy → manual production deploy → memory note. ^{GitLab CIMerge requestsVariablesWorkersWrangler}

Step 0: What you need

A GitLab account, a GitLab project, a Cloudflare account, Node.js, npm, Git, and Wrangler. First prove the CLI tools work.

GitLab + Workers setup check

node --version
git --version
npm --version
npx wrangler --version
npx wrangler whoami

Step 1: Create the workspace

Create a GitLab project, then create a tiny Cloudflare Worker locally and push the project to GitLab. Do not add factory automation until this basic project runs. ^{WorkersWrangler}

Create the project

npm create cloudflare@latest software-factory-worker
cd software-factory-worker
npm run dev

Step 2: Define intake

In GitLab, create a label named factory. Create the first issue with a small task, acceptance criteria, and a note that production deploys require human approval. ^GitLab MRs

Step 3: Add context

Add factory/context/project-context.md or factory/context/project-context.json with project purpose, protected files, deployment rules, and definition of done. This is what keeps the factory from guessing.

Step 4: Add execution

For the first run, make one tiny Worker change on a branch and open a merge request. The Worker handler below is enough to prove the execution target.

Minimal Worker handler

export default {
  async fetch(request, env, ctx) {
    return Response.json({
      ok: true,
      path: new URL(request.url).pathname,
      factoryStage: "staging"
    });
  }
};
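
The handler above hardcodes the stage as "staging". A possible variation, sketched below, reads it from the environment so the same code reports the right stage wherever it is deployed. FACTORY_STAGE is a hypothetical variable name, not a Cloudflare built-in; the sketch assumes each environment in wrangler.jsonc sets it under vars.

```javascript
// Sketch: derive the stage from a per-environment variable instead of hardcoding it.
// FACTORY_STAGE is a hypothetical name that you would define yourself under "vars".
function resolveStage(env) {
  return env && typeof env.FACTORY_STAGE === "string" && env.FACTORY_STAGE !== ""
    ? env.FACTORY_STAGE
    : "unknown";
}

const worker = {
  async fetch(request, env, ctx) {
    const body = {
      ok: true,
      path: new URL(request.url).pathname,
      factoryStage: resolveStage(env)
    };
    return new Response(JSON.stringify(body), {
      headers: { "content-type": "application/json" }
    });
  }
};

// In the Worker entry file this object would be the default export:
//   export default worker;

console.log(resolveStage({ FACTORY_STAGE: "staging" })); // staging
console.log(resolveStage({}));                           // unknown
```

Reporting "unknown" rather than guessing makes a missing variable visible in the staging URL check during review.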

Step 5: Add checks

Keep non-secret Worker configuration in wrangler.jsonc, then use wrangler check, tests, and merge request pipelines as the quality gate. Store secrets with Cloudflare secrets or GitLab CI/CD variables, not in source code. ^{WranglerSecretsGitLab variables}

wrangler.jsonc

{
  "$schema": "./node_modules/wrangler/config-schema.json",
  "name": "software-factory-worker",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-23",
  "compatibility_flags": ["nodejs_compat"],
  "observability": {
    "enabled": true
  },
  "env": {
    "staging": {
      "name": "software-factory-worker-staging"
    },
    "production": {
      "name": "software-factory-worker"
    }
  }
}

Step 6: Add review

Open a GitLab merge request from your branch. The MR should link the issue, list acceptance criteria, show the pipeline result, and name the staging Worker URL once deployed. ^GitLab MRs

Step 7: Deploy or hand off

Deploy once by hand to prove Cloudflare is configured, then let GitLab CI repeat the deploy. Confirm staging before any manual production deploy. ^{Workers CI/CDGitLab CIProtected environments}

Manual Cloudflare check

npx wrangler check
npx wrangler types
npx wrangler deploy --dry-run
npx wrangler deploy --env staging

In GitLab, add masked CI/CD variables for CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID. Protect production variables if production deploys only run from protected branches or tags. ^{GitLab variablesProtected environmentsCloudflare secrets}
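
A deploy job that starts without its variables tends to fail late and with a confusing message. As a hedged sketch, a small pre-deploy check can name what is missing before Wrangler runs. findMissingVariables is a hypothetical helper; only the two variable names come from the setup above.

```javascript
// Sketch: report which required CI/CD variables are absent or blank.
// Only names are returned, so secret values never reach the job log.
function findMissingVariables(requiredNames, env) {
  return requiredNames.filter(name => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

const requiredNames = ["CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"];

// Stand-in environment for the example; a CI script would pass process.env
// and exit non-zero when the returned list is not empty.
const exampleEnv = { CLOUDFLARE_ACCOUNT_ID: "example-account-id" };

console.log(findMissingVariables(requiredNames, exampleEnv)); // [ 'CLOUDFLARE_API_TOKEN' ]
```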

.gitlab-ci.yml

image: node:24

stages:
  - test
  - deploy

cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - .npm/

before_script:
  - npm ci --cache .npm --prefer-offline

test:
  stage: test
  script:
    - npm test
    - npx wrangler check
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == "main"'

deploy_staging:
  stage: deploy
  script:
    - npx wrangler deploy --env staging
  environment:
    name: staging
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

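deploy_production:
  stage: deploy
  # Only offer the manual production job after the staging deploy has succeeded.
  needs: ["deploy_staging"]
  script:
    - npx wrangler deploy --env production
  environment:
    name: production
  when: manual
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'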

Step 8: Record memory

Add a short note to the merge request or factory/runs/<issue-number>.md: issue link, MR link, staging URL, production deploy decision, what failed, and what the next factory run should remember.

First safe upgrade

After the manual issue-to-MR-to-staging loop works, add a tiny issue importer or MR-summary generator. Keep production deploy manual until checks and review feel boring.

Beginner safety rule: first make local dev work, then staging deploy, then manual production deploy. Do not automate production until the test, review, and approval path is boring.

Path 4: DIY local factory end to end

Choose this path if you want to understand the factory mechanics directly before trusting a hosted product. The mini factory below is intentionally small and vendor-neutral. It uses Node.js built-ins for files, subprocesses, tests, and assertions; Docker Compose for isolation; and optional GitHub adapters only after the local loop works. The first version is deliberately conservative: instead of letting an agent rewrite your whole application immediately, it proves the workflow by planning work, producing a proposal artefact, running checks, and preparing a review summary. Once you trust the loop, you can swap in an actual coding model. ^{NodeDockerClaudeOpenAI Agents}

What this starter teaches: intake, context, planning, isolated execution, review, CI, and memory. Those are the bones of a real software factory.

Step 0: What you need

Node.js, Git, Docker Desktop/Compose, a terminal, and a small practice project. Claude Code or another coding agent comes later, after the deterministic loop is test-gated.

Step 1: Create the workspace

Create the folder structure below: factory/context, factory/tasks, factory/lib, factory/agents, factory/runs, tests, and the optional integration folders (scripts and .github/workflows).

Step 2: Define intake

Start with factory/tasks/queue.json. A local queue is beginner-friendly because it avoids API tokens while teaching the same intake pattern.

Step 3: Add context

Add factory/context/project-context.json with project goal, protected files, coding rules, and definition of done.

Step 4: Add execution

Add planner, executor, and reviewer modules. The executor writes a proposal first; direct code edits are a later upgrade.

Step 5: Add checks

Add Node’s built-in test runner and make every factory run execute tests before a task is marked ready.

Step 6: Add review

Generate a review summary or PR body from the run. A human should be able to inspect what happened without reading logs line by line.

Step 7: Deploy or hand off

For the local DIY starter, the handoff is a proposal and review summary. Add GitHub, GitLab, or deployment automation only after this local handoff is reliable.

Step 8: Record memory

Write run artefacts into factory/runs/: proposal, review result, test result, and what the next run should remember.

First safe upgrade

Replace the deterministic planner or proposal-only executor with Claude Code, OpenAI Agents SDK, or another coding agent only after the local loop is observable, sandboxed, and test-gated.

Detailed build: first local automation lane

The recommended beginner lane is deliberately narrow: a local queued task becomes a plan, the factory produces a proposal and review summary, tests run, and a human decides what to do next. That is enough to be a factory because the work is repeatable, inspectable, isolated, and blocked by checks.

Stage	Beginner version	Output
Issue	The starter `queue.json` while learning locally; GitHub or GitLab issues are optional upgrades.	Task title, source link, and acceptance criteria.
Run	The orchestrator loads context, plans the work, writes a proposal, and runs tests.	Proposal file, test result, and updated task status.
Review	The reviewer runs tests and the orchestrator writes a review summary for a human.	Review body, required checks, and a clear continue/stop decision.
Memory	The run records what happened so the next task has better context.	Run artefacts, outcome, and follow-up notes.

Create the folder structure

You are building: the factory floor plan

Project tree

.
├─ package.json
├─ docker-compose.yml
├─ factory/
│  ├─ context/
│  │  └─ project-context.json
│  ├─ tasks/
│  │  └─ queue.json
│  ├─ runs/
│  ├─ lib/
│  │  └─ run-command.mjs
│  ├─ agents/
│  │  ├─ planner.mjs
│  │  ├─ executor.mjs
│  │  └─ reviewer.mjs
│  └─ orchestrator.mjs
├─ scripts/
│  ├─ import-issues.mjs
│  └─ open-pr.mjs
├─ src/
│  └─ README.md
├─ tests/
│  └─ factory.test.mjs
└─ .github/
   └─ workflows/
      └─ mini-factory.yml

This structure mirrors the earlier blueprint: queue and context on the left, planner/executor/reviewer in the middle, and delivery artefacts on the right. Keep those concerns physically separate in the repository; beginners stay safer when the control files are easy to inspect.

Add the package manifest

You are building: the command surface

package.json

{
  "name": "mini-software-factory",
  "private": true,
  "type": "module",
  "scripts": {
    "factory": "node factory/orchestrator.mjs",
    "test": "node --test"
  }
}

Write the context file

You are building: context

The context file is where beginners usually underinvest. Start simple: state the project purpose, protected areas, coding conventions, and what “done” means. Later you can split this into architecture decisions, API conventions, incident notes, and deployment rules.

factory/context/project-context.json

{
  "projectName": "Watchlist demo",
  "goal": "A small web app that lets users save and view favourite stocks.",
  "protectedFiles": [
    "src/auth/",
    "infra/production/"
  ],
  "codingRules": [
    "Prefer small, reversible changes.",
    "Do not edit protected files without human approval.",
    "Record every proposed change in factory/runs/<task-id>/proposal.md."
  ],
  "definitionOfDone": [
    "Task has a clear plan.",
    "Automated checks run.",
    "A PR summary is ready for review."
  ]
}
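
The protectedFiles list only helps if something checks it. A minimal sketch of that check follows, assuming entries ending in "/" are directory prefixes and anything else is an exact file path. findProtectedViolations is a hypothetical helper, and the sketch does not normalise paths such as "./src/auth/x".

```javascript
// Sketch: list the changed files that fall under a protected entry.
// Entries ending in "/" are treated as directory prefixes; others as exact paths.
// Paths are compared as plain strings, so normalise them first in real use.
function findProtectedViolations(changedFiles, protectedFiles) {
  return changedFiles.filter(file =>
    protectedFiles.some(entry =>
      entry.endsWith("/") ? file.startsWith(entry) : file === entry
    )
  );
}

const protectedFiles = ["src/auth/", "infra/production/"];

console.log(findProtectedViolations(
  ["src/README.md", "src/auth/session.js", "src/authentication.js"],
  protectedFiles
)); // [ 'src/auth/session.js' ]
```

Because the prefix includes the trailing slash, src/authentication.js is not mistaken for a file inside src/auth/.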

Add a queue

You are building: intake

A queue makes the system explicit. One tiny JSON file is enough to prove the concept. In a fuller factory the queue may come from GitHub issues, Jira, Linear, or a database, but the operating idea is the same.

factory/tasks/queue.json

[
  {
    "id": "task-001",
    "title": "Add a watchlist page",
    "status": "pending",
    "files": ["src/README.md"],
    "acceptanceCriteria": [
      "The change is described clearly.",
      "The review step runs tests.",
      "A PR body is generated."
    ]
  }
]
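
A hand-edited queue is easy to break with a duplicate id or a mistyped status. The sketch below is one possible validator. The three status values are the ones this starter uses ("pending", "ready-for-pr", "failed-review"); validateQueue itself is a hypothetical helper.

```javascript
// Statuses used by the starter queue and orchestrator.
const KNOWN_STATUSES = ["pending", "ready-for-pr", "failed-review"];

// Sketch: return a list of human-readable problems; an empty list means the queue is usable.
function validateQueue(queue) {
  if (!Array.isArray(queue)) {
    return ["Queue must be a JSON array."];
  }

  const problems = [];
  const seenIds = new Set();

  queue.forEach((task, index) => {
    if (task === null || typeof task !== "object") {
      problems.push(`entry ${index}: not an object`);
      return;
    }
    const hasId = typeof task.id === "string" && task.id !== "";
    const label = hasId ? task.id : `entry ${index}`;

    if (!hasId) {
      problems.push(`${label}: missing id`);
    } else if (seenIds.has(task.id)) {
      problems.push(`${label}: duplicate id`);
    } else {
      seenIds.add(task.id);
    }
    if (typeof task.title !== "string" || task.title.trim() === "") {
      problems.push(`${label}: missing title`);
    }
    if (!KNOWN_STATUSES.includes(task.status)) {
      problems.push(`${label}: unknown status ${JSON.stringify(task.status)}`);
    }
  });

  return problems;
}

console.log(validateQueue([
  { id: "task-001", title: "Add a watchlist page", status: "pending" },
  { id: "task-001", title: "", status: "done" }
]));
```

Running a check like this before the orchestrator picks a task keeps intake errors out of the run artefacts.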

Create the runner utility

You are building: tool execution

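This wrapper uses Node’s child_process module to run commands in a controlled way: spawn is called with shell: false and the arguments are passed as an array, so nothing is interpolated into a shell command string. It is the minimal bridge between the orchestrator and your actual tools. ^Node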

factory/lib/run-command.mjs

import { spawn } from "node:child_process";

export function runCommand(command, args = [], options = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, {
      cwd: options.cwd ?? process.cwd(),
      env: { ...process.env, ...(options.env ?? {}) },
      shell: false
    });

    let stdout = "";
    let stderr = "";

    child.stdout.on("data", chunk => { stdout += chunk.toString(); });
    child.stderr.on("data", chunk => { stderr += chunk.toString(); });
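    child.on("error", reject);
    // "close" fires after stdio has been flushed. exitCode is null when the
    // child was killed by a signal, so report the signal as well.
    child.on("close", (exitCode, signal) => {
      resolve({ command, args, exitCode, signal, stdout, stderr });
    });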
  });
}
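
Running without a shell narrows what a command can do, but it does not decide which commands may run at all. A possible next step, sketched below, is an allow-list check that the orchestrator calls before runCommand. checkCommand and the list contents are illustrative assumptions.

```javascript
// Sketch: decide whether a command may be passed to the runner.
// The command must match an allow-list entry exactly, and every argument
// must be a string, which is what spawn with shell: false expects.
function checkCommand(command, args, allowList) {
  if (!allowList.includes(command)) {
    return { allowed: false, reason: `Command not on allow-list: ${command}` };
  }
  if (!Array.isArray(args) || args.some(arg => typeof arg !== "string")) {
    return { allowed: false, reason: "Arguments must be an array of strings." };
  }
  return { allowed: true, reason: "ok" };
}

// Example list: the current Node binary plus git. Adjust to your own tools.
const allowList = [process.execPath, "git"];

console.log(checkCommand(process.execPath, ["--test"], allowList).allowed);      // true
console.log(checkCommand("curl", ["https://example.com"], allowList).allowed);   // false
```

Exact matching is deliberately strict: a path such as /tmp/node would not pass just because its file name looks familiar.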

Create the planner

You are building: planning

This first planner is deterministic on purpose. It gives you the shape of a real planner without tying the tutorial to one model provider.

factory/agents/planner.mjs

function slugify(value) {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "")
    .slice(0, 40);
}

export async function planTask(task, context) {
  return {
    taskId: task.id,
    title: task.title,
    branch: `factory/${task.id}-${slugify(task.title)}`,
    filesToInspect: task.files ?? [],
    acceptanceCriteria: task.acceptanceCriteria ?? [],
    protectedFiles: context.protectedFiles ?? [],
    steps: [
      "Load project context and task details",
      "Inspect the files listed for the task",
      "Draft a reversible change proposal",
      "Run automated checks",
      "Prepare a pull request summary"
    ]
  };
}

Create the executor

You are building: execution

The executor writes a proposal artefact and runs a harmless smoke command. That makes the workflow runnable before you trust it with code edits. Once you are ready, you replace the proposal-writing part with your preferred coding model or scripted transform.

factory/agents/executor.mjs

import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";
import { runCommand } from "../lib/run-command.mjs";

export async function executePlan(plan) {
  const runDir = join("factory", "runs", plan.taskId);
  await mkdir(runDir, { recursive: true });

  const proposalFile = join(runDir, "proposal.md");
  const proposalMarkdown = [
    `# ${plan.title}`,
    "",
    `**Suggested branch:** \`${plan.branch}\``,
    "",
    "## Files to inspect",
    ...(plan.filesToInspect.length ? plan.filesToInspect.map(file => `- ${file}`) : ["- No files specified"]),
    "",
    "## Acceptance criteria",
    ...(plan.acceptanceCriteria.length ? plan.acceptanceCriteria.map(item => `- ${item}`) : ["- None provided"]),
    "",
    "## Planned steps",
    ...plan.steps.map(step => `- ${step}`)
  ].join("\n");

  await writeFile(proposalFile, proposalMarkdown, "utf8");

  const smoke = await runCommand(process.execPath, ["--version"]);

  return {
    runDir,
    proposalFile,
    smoke
  };
}

Create the reviewer

You are building: review

The reviewer should be boring and strict. It does not need to be clever; it needs to be dependable. Here it runs your test suite and returns a simple pass/fail result.

factory/agents/reviewer.mjs

import { runCommand } from "../lib/run-command.mjs";

export async function reviewRun() {
  const tests = await runCommand(process.execPath, ["--test"]);

  return {
    passed: tests.exitCode === 0,
    summary: tests.exitCode === 0 ? "All automated checks passed." : "Automated checks failed.",
    tests
  };
}
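
When a second check is added, such as a linter, the reviewer needs a rule for combining results. The sketch below shows one strict option: every check must exit with 0, and a run with no checks does not count as a pass. The name field on each result is an assumption added for readable summaries; it is not returned by runCommand.

```javascript
// Sketch: fold several check results into one pass/fail verdict.
// Each result is expected to look like { name, exitCode }.
function summariseChecks(results) {
  // A null exitCode (process killed by a signal) is not 0, so it counts as a failure.
  const failed = results.filter(result => result.exitCode !== 0).map(result => result.name);

  let summary;
  if (results.length === 0) {
    summary = "No checks ran.";
  } else if (failed.length === 0) {
    summary = `All ${results.length} automated checks passed.`;
  } else {
    summary = `Failed checks: ${failed.join(", ")}.`;
  }

  return { passed: results.length > 0 && failed.length === 0, summary, failed };
}

console.log(summariseChecks([
  { name: "tests", exitCode: 0 },
  { name: "lint", exitCode: 1 }
]).summary); // Failed checks: lint.
```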

Create the orchestrator

You are building: orchestration

The orchestrator is the control room. It reads the queue, loads context, asks the planner for a plan, asks the executor to do the work, asks the reviewer to verify it, then records the result and updates the queue.

factory/orchestrator.mjs

import { readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";
import { planTask } from "./agents/planner.mjs";
import { executePlan } from "./agents/executor.mjs";
import { reviewRun } from "./agents/reviewer.mjs";

async function readJson(path) {
  return JSON.parse(await readFile(path, "utf8"));
}

async function writeJson(path, value) {
  await writeFile(path, JSON.stringify(value, null, 2) + "\n", "utf8");
}

async function main() {
  const contextPath = join("factory", "context", "project-context.json");
  const queuePath = join("factory", "tasks", "queue.json");

  const context = await readJson(contextPath);
  const queue = await readJson(queuePath);

  const nextTask = queue.find(task => task.status === "pending");

  if (!nextTask) {
    console.log("No pending tasks.");
    return;
  }

  const plan = await planTask(nextTask, context);
  const execution = await executePlan(plan);
  const review = await reviewRun();

  const prBodyPath = join(execution.runDir, "pr-body.md");
  const prBody = [
    `## ${plan.title}`,
    "",
    `**Branch:** \`${plan.branch}\``,
    "",
    "### Acceptance criteria",
    ...plan.acceptanceCriteria.map(item => `- ${item}`),
    "",
    "### Review result",
    `- ${review.summary}`,
    "",
    "### Proposal file",
    `- ${execution.proposalFile}`
  ].join("\n");

  await writeFile(prBodyPath, prBody, "utf8");

  nextTask.status = review.passed ? "ready-for-pr" : "failed-review";
  nextTask.lastRun = {
    proposalFile: execution.proposalFile,
    prBodyPath,
    smokeExitCode: execution.smoke.exitCode,
    testExitCode: review.tests.exitCode
  };

  await writeJson(queuePath, queue);

  console.log(JSON.stringify({
    task: nextTask.id,
    status: nextTask.status,
    proposalFile: execution.proposalFile,
    prBodyPath
  }, null, 2));
}

main().catch(error => {
  console.error(error);
  process.exit(1);
});

Add a simple test

You are building: a quality gate

Node’s built-in test runner and assertion module are enough for the starter. This is important for beginners: you do not need a giant test stack to start enforcing quality gates. ^Node

tests/factory.test.mjs

import test from "node:test";
import assert from "node:assert/strict";
import { planTask } from "../factory/agents/planner.mjs";

test("planner creates a branch name and keeps acceptance criteria", async () => {
  const task = {
    id: "task-001",
    title: "Add a watchlist page",
    acceptanceCriteria: ["PR summary exists"]
  };

  const context = { protectedFiles: ["src/auth/"] };
  const plan = await planTask(task, context);

  assert.match(plan.branch, /^factory\/task-001-/);
  assert.deepEqual(plan.acceptanceCriteria, ["PR summary exists"]);
  assert.ok(plan.steps.length > 0);
});

Run the mini factory locally

You are building: the first closed loop

Commands

npm test
npm run factory

Add a Docker sandbox

You are building: isolation

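Docker Compose is a practical beginner move because it creates a repeatable execution environment from one YAML file and works in development, testing, CI, and production-like flows. The bind mount below gives the container read-write access to the project folder, so the sandbox narrows the blast radius to that folder rather than removing it. ^Docker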

docker-compose.yml

services:
  factory:
    image: node:24-alpine
    working_dir: /workspace
    volumes:
      - ./:/workspace
    command: ["sh", "-lc", "node --test && node factory/orchestrator.mjs"]

Run inside the sandbox

docker compose run --rm factory

Optional upgrade: connect the DIY factory to GitHub

The local factory does not require GitHub. Add the GitHub pieces below only after the local queue, planner, executor, reviewer, tests, and sandbox are working. ^{GitHub ActionsGitHub REST}

Connect the factory to GitHub Actions

You are building: CI and scheduled review

GitHub Actions is a CI/CD platform that can run tests on every pull request, deploy on merge, or start work on a manual or scheduled trigger. That makes it a natural home for a beginner factory’s review and orchestration steps. ^{GitHub Actions}

.github/workflows/mini-factory.yml

name: mini-factory

on:
  pull_request:
  workflow_dispatch:
  schedule:
    - cron: "0 2 * * 1-5"

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "24"
      - run: node --test

  factory:
    if: github.event_name != 'pull_request'
    needs: checks
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "24"
      - run: node factory/orchestrator.mjs

Add optional pull-request automation

You are building: PR handoff

GitHub’s REST API supports creating and merging pull requests. Once your factory can safely commit or push a branch, this script turns the generated PR body into a real pull request. ^GitHub REST

scripts/open-pr.mjs

import { readFile } from "node:fs/promises";

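// Fail early with a clear message if a required variable is missing.
for (const name of ["GITHUB_REPOSITORY", "GITHUB_TOKEN", "PR_TITLE", "PR_HEAD", "PR_BODY_PATH"]) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}

const [owner, repo] = process.env.GITHUB_REPOSITORY.split("/"); // "owner/repo"
const title = process.env.PR_TITLE;
const head = process.env.PR_HEAD; // branch that contains the change
const base = process.env.PR_BASE ?? "main"; // branch to merge into
const body = await readFile(process.env.PR_BODY_PATH, "utf8");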

const response = await fetch(`https://api.github.com/repos/${owner}/${repo}/pulls`, {
  method: "POST",
  headers: {
    "Accept": "application/vnd.github+json",
    "Authorization": `Bearer ${process.env.GITHUB_TOKEN}`,
    "X-GitHub-Api-Version": "2022-11-28"
  },
  body: JSON.stringify({ title, head, base, body })
});

if (!response.ok) {
  throw new Error(await response.text());
}

const pr = await response.json();
console.log(`Created PR #${pr.number}: ${pr.html_url}`);

Upgrade the queue from GitHub issues

You are building: real intake

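The starter uses factory/tasks/queue.json so you can learn the loop without API permissions. The first real automation lane is GitHub Issue → queue → factory run → PR summary. Once the local loop works, this tiny adapter replaces the hand-written queue with issues labelled factory. It expects GITHUB_REPOSITORY (owner/repo) and GITHUB_TOKEN in the environment, and it overwrites queue.json on each run, so statuses and lastRun data from earlier runs are not preserved. ^{GitHub RESTGitHub PATs}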

scripts/import-issues.mjs

import { writeFile } from "node:fs/promises";

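// Fail early with a clear message if a required variable is missing.
for (const name of ["GITHUB_REPOSITORY", "GITHUB_TOKEN"]) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}

const [owner, repo] = process.env.GITHUB_REPOSITORY.split("/"); // "owner/repo"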
const url = new URL(`https://api.github.com/repos/${owner}/${repo}/issues`);
url.searchParams.set("state", "open");
url.searchParams.set("labels", "factory");

const response = await fetch(url, {
  headers: {
    "Accept": "application/vnd.github+json",
    "Authorization": `Bearer ${process.env.GITHUB_TOKEN}`,
    "X-GitHub-Api-Version": "2022-11-28"
  }
});

if (!response.ok) {
  throw new Error(await response.text());
}

const issues = await response.json();
const queue = issues
  .filter(issue => !issue.pull_request)
  .map(issue => ({
    id: `issue-${issue.number}`,
    title: issue.title,
    status: "pending",
    files: [],
    acceptanceCriteria: [
      "The issue has been reviewed.",
      "A plan or pull request summary is generated.",
      "Automated checks run before merge."
    ],
    source: issue.html_url
  }));

await writeFile("factory/tasks/queue.json", JSON.stringify(queue, null, 2) + "\n", "utf8");
console.log(`Imported ${queue.length} factory issues.`);
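
The adapter above writes a fresh queue.json each time, which resets every imported task to "pending". If you want repeat imports to keep local progress, one option is to merge instead of overwrite. The sketch below is a hypothetical helper, assuming tasks are matched by id and that title and source should follow the tracker while status and lastRun stay local.

```javascript
// Sketch: merge freshly imported tasks into the existing queue.
// Known tasks keep their local status and run history; new tasks are appended.
function mergeQueue(existing, imported) {
  const byId = new Map(existing.map(task => [task.id, task]));

  for (const task of imported) {
    const known = byId.get(task.id);
    if (known) {
      // Refresh the fields owned by the tracker, keep everything else.
      byId.set(task.id, { ...known, title: task.title, source: task.source });
    } else {
      byId.set(task.id, task);
    }
  }

  // A Map keeps insertion order, so existing tasks stay ahead of new ones.
  return [...byId.values()];
}

const merged = mergeQueue(
  [{ id: "issue-1", title: "Old title", status: "ready-for-pr" }],
  [
    { id: "issue-1", title: "New title", status: "pending", source: "https://example.com/1" },
    { id: "issue-2", title: "Second task", status: "pending", source: "https://example.com/2" }
  ]
);

console.log(merged.map(task => `${task.id}:${task.status}`)); // [ 'issue-1:ready-for-pr', 'issue-2:pending' ]
```

Issues that were closed upstream stay in the merged queue; whether to prune them is a separate policy decision.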

Understand what to replace next

You are building: memory and capability upgrades

At this point, you already have a real workflow: task intake, context loading, planning, output generation, automated review, CI integration, and a PR artefact. The next upgrades are straightforward. Replace the deterministic planner with a real model call. Replace the proposal-only executor with safe code-editing inside the sandbox. Pull tasks from issues instead of a JSON file. Add memory capture after each run. Then add more specialised reviewers for tests, security, and release notes.

Limitation of the starter: the executor intentionally writes a proposal file rather than editing your application code directly. That is a safety-first choice for beginners, not a claim that a full factory stops there.

Security, ethics, and guardrails

The most common beginner mistake is not “using too little AI”; it is giving the agent more authority than the controls deserve. OWASP’s LLM Top 10 specifically calls out prompt injection, insecure output handling, training-data poisoning, model denial of service, and supply-chain vulnerabilities. The OWASP AI Agent Security guidance then gets very practical: validate outputs before execution, prefer structured outputs, set scope and rate limits, and filter sensitive outputs. NIST’s AI Risk Management Framework adds the broader governance frame: manage risk to people, organisations, and society, not just to the codebase. ^{OWASP LLMOWASP AgentNIST}
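
Filtering sensitive outputs can start very small. The sketch below redacts a few common secret shapes from text before it is written to a run artefact or PR body. The patterns are illustrative assumptions, not a complete or authoritative list, and a real factory should still keep secrets out of agent-visible output in the first place.

```javascript
// Illustrative redaction rules; extend or replace them for your own secret formats.
const REDACTION_RULES = [
  // Strings shaped like GitHub tokens (prefix such as ghp_ followed by a long run).
  { pattern: /\bgh[pousr]_[A-Za-z0-9]{20,}\b/g, replacement: "[REDACTED_TOKEN]" },
  // Bearer credentials in an Authorization header.
  { pattern: /(Authorization:\s*Bearer\s+)\S+/gi, replacement: "$1[REDACTED]" },
  // NAME=value pairs whose upper-case name mentions TOKEN, SECRET, or PASSWORD.
  { pattern: /\b([A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD)[A-Z0-9_]*=)\S+/g, replacement: "$1[REDACTED]" }
];

// Sketch: apply every rule in turn and return the cleaned text.
function redactSecrets(text) {
  return REDACTION_RULES.reduce(
    (current, rule) => current.replace(rule.pattern, rule.replacement),
    text
  );
}

console.log(redactSecrets("deploy failed: CLOUDFLARE_API_TOKEN=example-value was rejected"));
// deploy failed: CLOUDFLARE_API_TOKEN=[REDACTED] was rejected
```

Applying a filter like this to stdout and stderr before they are stored is a cheap first layer; it complements, rather than replaces, secret stores and least-privilege tokens.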

Guardrail	Why it exists	Smallest useful beginner version
Least-privilege permissions	Agents should not automatically access every tool, credential, or directory.	Require approval for writes outside the working area and deny sensitive folders by default.
Sandboxed execution	If the agent runs a bad command, the blast radius should be small.	Run executor steps in Docker Compose or another isolated runtime before granting host access.
Secret separation	Prompts and repos are not good places for credentials.	Store secrets in GitHub Actions secrets, agent secret stores, or platform secrets, not in source files or chat text.
Protected branches and CODEOWNERS	Autonomy should not bypass organisational review.	Keep production branches protected, require PRs, and require owner review for agent configuration files.
Automated checks	Bad outputs should fail quickly and cheaply.	Run at least tests and one static check before a task can become “ready for PR”.
Human approval for sensitive work	Not every action is safe to automate end to end.	Require a person for production deploys, auth changes, schema changes, and permission changes.
Audit trail and memory	You cannot improve or investigate what you did not record.	Write proposal files, PR bodies, test results, and final task status into versioned artefacts.
Privacy and training controls	AI developer tools differ in how data may be used.	Review each platform’s training and privacy settings before using proprietary code or personal data.

Concrete examples from the official docs include Claude Code permission modes and sandboxing, GitHub’s cloud-agent guardrails, CODEOWNERS, rulesets, secret handling, Lovable’s security scans and training-data controls, and Docker isolation primitives. ^{ClaudeCopilotGitHub rulesLovableDocker}
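The least-privilege row in the table above can be approximated with a small path check. This is a sketch under stated assumptions: the working area `factory/output`, the denied folder names, and the three-way `allow` / `ask` / `deny` result are hypothetical choices, and the check does not resolve symlinks.

```javascript
import path from "node:path";

// Hypothetical policy: one writable working area and a few denied folders.
const WORK_AREA = "factory/output";
const DENIED_SEGMENTS = [".git", ".github", "secrets"];

// True when `child` is `parent` itself or sits somewhere beneath it.
function isInside(parent, child) {
  const relative = path.relative(parent, child);
  if (relative === "") return true;
  return (
    relative !== ".." &&
    !relative.startsWith(".." + path.sep) &&
    !path.isAbsolute(relative)
  );
}

// Decide what happens when the agent wants to write to `target`.
function writeDecision(target, root = process.cwd()) {
  const projectRoot = path.resolve(root);
  const resolved = path.resolve(projectRoot, target);
  // Paths that escape the project root are always denied.
  if (!isInside(projectRoot, resolved)) return "deny";
  // Sensitive folders are denied wherever they appear in the path.
  const segments = path.relative(projectRoot, resolved).split(path.sep);
  if (segments.some((segment) => DENIED_SEGMENTS.includes(segment))) return "deny";
  // Inside the working area the agent may write freely; elsewhere a person decides.
  return isInside(path.resolve(projectRoot, WORK_AREA), resolved) ? "allow" : "ask";
}
```

For example, `writeDecision("src/app.js")` yields `"ask"`, so the executor would pause for approval rather than edit application code on its own.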

Automation ladder: what to automate when

Treat sensitive automation as a ladder, not a wall. Each rung earns trust for the next one.

Rung	Automate this	Wait before automating this
1	Triage, summaries, duplicate detection, and acceptance-criteria drafts.	Any action that writes to production systems.
2	PR descriptions, release-note drafts, and test-result summaries.	Auto-merging without required checks and review.
3	Reversible UI copy, docs, and small content changes behind review.	Auth, billing, permissions, and irreversible data changes.
4	Low-risk code changes with tests, branch rules, CODEOWNERS, and rollback notes.	Broad rewrites and dependency upgrades without reliable regression coverage.
5	Dependency updates with lockfile review, security scanning, and rollback.	Database migrations, production deploys, and secret changes until approvals and recovery are mature.

Ethical bottom line: use agents to compress mechanical work, not to offload accountability. Humans should still decide product intent, user impact, risk appetite, and whether a change deserves release.
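One way to keep the ladder honest is to encode it as a small policy function that the factory consults before a task moves forward. The label names and the `checksPassed` flag below are assumptions for illustration; map them to whatever your issue tracker and CI actually provide.

```javascript
// Work the ladder reserves for a person. The label names are hypothetical.
const SENSITIVE_LABELS = new Set([
  "production-deploy",
  "auth",
  "billing",
  "permissions",
  "schema-change",
  "secrets",
]);

// Decide how far a task may travel without a human decision.
function approvalFor(task) {
  const labels = task.labels ?? [];
  const sensitive = labels.filter((label) => SENSITIVE_LABELS.has(label));
  if (sensitive.length > 0) {
    return { mode: "human-required", reasons: sensitive };
  }
  if (!task.checksPassed) {
    return { mode: "blocked", reasons: ["required checks have not passed"] };
  }
  // Even low-risk work stops at review; nothing here auto-merges.
  return { mode: "ready-for-review", reasons: [] };
}
```

Note that the most permissive outcome is still "ready for review": the sketch deliberately has no path that merges or deploys without a person.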

Checklist, FAQs, and next reads

Implementation checklist

Tick these off as you build.

Done	Item	Why it matters
	Create one context file with goals, rules, protected areas, and definition of done.	This is the smallest possible context layer.
	Represent work as tasks in a queue rather than free-form chat only.	A factory needs explicit intake.
	Add a planner that emits acceptance criteria, steps, and a branch name.	Plans make execution reviewable.
	Run work in a sandboxed executor.	Autonomy without isolation is a bad bargain.
	Add a reviewer that can fail the task when checks fail.	Quality gates are the factory’s brake pedal.
	Run tests in CI on PRs and on factory runs.	This closes the local-versus-remote gap.
	Generate a PR summary or release bundle automatically.	Humans need clear review artefacts.
	Protect important branches and config files.	Guardrails should survive a bad prompt.
	Store outcomes, failures, conventions, and follow-up notes.	Without memory, the factory does not improve.
	Review tool privacy and training settings before using sensitive data.	Data policy is part of engineering design now.

Checklist derived from the factory model, agent-runtime controls, GitHub CI patterns, and OWASP/NIST safety guidance. ^{XHawkClaudeCopilotGitHub ActionsGitHub rulesOWASP AgentNIST}

Visual regression checklist

Run these checks after any layout or content edit. A static page should not ship obvious overlaps.

Check	Acceptance rule
Button placement	Standalone CTA buttons inside callouts or cards live in an `.action-row`, never inline after long prose.
Widths	Check `390px`, `768px`, `1366px`, and `1920px`. No text, button, table, code block, diagram, or card escapes its box.
Zoom	Check browser zoom at `80%`, `100%`, `150%`, and `200%`. No button overlaps text, citations, borders, code, tables, or diagrams.
Horizontal overflow	The page itself has no horizontal scrollbar; wide diagrams, code, and tables scroll inside their own wrappers.
Mermaid labels	Diagram labels are short or explicitly wrapped with `<br/>`. No Mermaid node text is clipped in-page or in large view.
Diagram viewer	Open large view for every Mermaid diagram and confirm all node text is visible at browser zoom `80%`, `100%`, `150%`, and `200%`.
Static sanity checks	Run the inline-script parse check, the duplicate-ID/missing-anchor check, the risky-inline-button check, and the Mermaid long-label check before calling the page done.

Troubleshooting the first run

Docker says it cannot connect to the daemon

Docker Desktop is probably not running, or it has not finished starting. Open Docker Desktop, wait until it reports that the engine is running, then try `docker version` and `docker compose version` again.

Docker works in the app but fails in my Windows terminal

Check that Docker Desktop is using the WSL 2 backend and that your WSL distro is enabled under Docker Desktop integration settings. If this is your first attempt, run the tutorial from a normal project folder and avoid unusual synced-drive paths until Docker is proven. ^{Docker Desktop}

node --test reports no tests found

Confirm the file is named `tests/factory.test.mjs` and that you are running the command from the project root. Node’s test runner discovers common test filenames, but it cannot find a file that was placed in the wrong folder or given a different extension. ^Node

GitHub issue or pull-request automation returns 401 or 403

The token is missing or does not have enough permission. In GitHub Actions, start with the built-in GITHUB_TOKEN and explicit workflow permissions. For local scripts, use a fine-grained PAT with only the repository, issues, and pull-request access required for the script. ^{GitHub PATsGitHub Actions}
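For local scripts, a small wrapper can make the two failure codes easier to diagnose. The sketch below uses the public GitHub REST API with the built-in `fetch` available in recent Node releases; the helper names `githubHeaders`, `explainAuthFailure`, and `listIssues` are hypothetical, and reading the token from `GITHUB_TOKEN` is an assumption about how you export it.

```javascript
// Build headers for the GitHub REST API. The token comes from the environment, never from source.
function githubHeaders(token) {
  if (!token) throw new Error("GITHUB_TOKEN is not set");
  return {
    Accept: "application/vnd.github+json",
    Authorization: `Bearer ${token}`,
    "X-GitHub-Api-Version": "2022-11-28",
  };
}

// Translate the two common auth failures into an actionable hint.
function explainAuthFailure(status) {
  if (status === 401) return "token is missing, expired, or malformed";
  if (status === 403) return "token is valid but lacks permission, or a rate limit was hit";
  return null;
}

// Hypothetical usage: list issues for one repository.
async function listIssues(owner, repo, token = process.env.GITHUB_TOKEN) {
  const url = `https://api.github.com/repos/${owner}/${repo}/issues`;
  const response = await fetch(url, { headers: githubHeaders(token) });
  const hint = explainAuthFailure(response.status);
  if (hint) throw new Error(`GitHub returned ${response.status}: ${hint}`);
  if (!response.ok) throw new Error(`GitHub returned ${response.status}`);
  return response.json();
}
```

Failing early when the token is absent avoids sending an unauthenticated request and then misreading the resulting error as a permissions problem.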

Docker Compose volume paths behave oddly on Windows

Run from the project directory, keep the path simple, and avoid moving the tutorial between cloud-synced folders, network drives, and WSL paths during the first pass. After the command has succeeded once, you can move the project into your preferred workspace and debug path-specific behaviour separately.

Wrangler says I am not authenticated

Run `npx wrangler whoami`. If it fails locally, authenticate with Wrangler before debugging GitLab CI. In CI, use a Cloudflare API token stored as a masked GitLab CI/CD variable, not an interactive login. ^{WranglerGitLab variables}

GitLab CI cannot see my Cloudflare variables

Check that CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID exist in Settings > CI/CD > Variables. If a variable is protected, it is available only to protected branches or tags, so an ordinary merge-request branch may not receive it. ^{GitLab variablesProtected environments}
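A deploy script can surface this problem early by checking for the variables before it calls Wrangler. This is a minimal sketch: the helper name `missingVariables` is hypothetical, and it reports only the names of missing variables so that no secret value is ever printed.

```javascript
// Variables the deploy job needs, as named in the paragraph above.
const REQUIRED_VARIABLES = ["CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"];

// Return the names that are unset or blank in the given environment.
function missingVariables(env = process.env, required = REQUIRED_VARIABLES) {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

Calling `missingVariables()` at the top of the deploy step and exiting with a non-zero code when the list is not empty turns a confusing Wrangler authentication error into a clear message. A protected variable on an unprotected branch shows up here as missing, which points straight at the branch rules.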

npm ci fails in GitLab CI

Commit the lockfile (`package-lock.json` or `npm-shrinkwrap.json`). `npm ci` is designed for repeatable CI installs: it exits with an error if the lockfile is missing or does not match `package.json`.

wrangler deploy works locally but fails in GitLab CI

Run `npx wrangler deploy --dry-run` locally to confirm the project builds without deploying, confirm `wrangler.jsonc` is committed, and confirm the GitLab job has the Cloudflare account ID and API token. If production deploys are manual or protected, also check the job’s environment and branch rules. ^{WranglerGitLab CIProtected environments}

My Worker deployed but I cannot find the URL

Start in the Cloudflare dashboard under Workers & Pages, then open the Worker deployment details. For the first attempt, use the default workers.dev route before adding a custom domain or route. ^Workers

FAQs

Is a software factory just another name for a coding assistant?

No. A coding assistant mainly improves the act of writing code. A software factory encodes the broader delivery workflow: intake, context, decomposition, execution, testing, approvals, deployment, and learning. That is why sources like XHawk and Alex Op emphasise systems, loops, and workflow integration rather than only autocomplete or chat. ^{XHawkAlex OpCopilot}

Do I need multiple agents on day one?

No. Start with one orchestrated lane. A single assistant plus a planner-like step and a strict review step can already behave like a mini factory. Add more specialised roles only when the extra separation makes the workflow safer or clearer. ^{Alex OpClaude}

Is Lovable a software factory?

Usually not in the full architectural sense used here. Lovable is officially positioned as a full-stack AI development platform for building, iterating on, and deploying web apps, with GitHub code sync and enterprise governance features. That makes it excellent as a builder or execution layer. But the broader “software factory” idea includes explicit orchestration, workflow triggers, continuous loops, and organisation-level control of how work moves through the system. So the best practical answer is: Lovable can be part of a software factory, but is not usually the whole factory by itself. ^{LovableXHawkAlex Op}

What if my stack is React, Express, Docker, AWS, or something else?

The factory pattern survives stack changes. What matters is not React versus Express; it is whether your context is captured, execution is isolated, tests are automated, approvals are explicit, and feedback is recorded. The Node/Docker/GitHub examples here are simply a small, runnable teaching stack.

Why is memory so important?

Because factories fail when each run starts from zero. Memory can be simple files, a knowledge directory, a context service, or a richer knowledge graph. The point is to stop relearning the same architecture, the same bug history, and the same conventions every session. ^{Alex OpClaudeXHawk Context}

What should I automate first?

Automate the most repetitive, lowest-risk lane that already has a clear finish line. Examples: triaging small maintenance tickets, drafting release notes, preparing PR descriptions, or proposing reversible UI changes behind tests. That is how you earn trust before moving the factory closer to production changes. ^{Alex OpXHawkCopilot}

Next reads and primary resources

Read the sources below in roughly this order: first the classic paper for the underlying idea, then Alex Op and XHawk for the modern AI-native interpretation, then official docs for the tooling layers you might combine.

Source	Links	Why it is worth your time
Greenfield and Short	Industrializing Software Development	The original conceptual backbone: software factories as structured, repeatable production systems rather than lone-coder craft.
Alex Op	The Software Factory	A clear modern explanation of why team bottlenecks move from “writing code” to “designing and operating the workflow”.
XHawk factory model	Software factory page	One of the clearest public diagrams of the AI-native factory pattern: context, orchestration, sandboxing, integrations, review, and production loop.
XHawk context model	XHawk product overview	Useful for understanding why a context layer is more than a prompt: it can become a system of context or knowledge graph.
Model Context Protocol	Official MCP docs · Anthropic announcement	The connector layer. Helpful for understanding how agents plug into tools, data, and workflows without bespoke one-off integrations.
Claude Code	How it works · Subagents · Hooks · Memory · Schedules · Sandboxing · Agent SDK	A rich set of building blocks for designing your own factory runtime.
GitHub Copilot	Product page · Integrations · Guardrails · Jira · Linear	The best source set here for GitHub-native, PR-driven agentic delivery patterns.
Lovable	Welcome · GitHub sync · Knowledge · Security · Privacy & training · Agent mode	The clearest official picture of Lovable as a full-stack AI builder and why it is best seen as one layer inside a wider factory architecture.
Replit	Agent · Plan mode · Task system · Build and publish · Version control	Helpful for understanding the “idea to live app” end of the spectrum, with background tasks and built-in hosting.
Docker Compose	Official docs	A straightforward way to create an isolated environment for a mini factory.
GitHub Actions	Docs · Quickstart	Your easiest default CI/CD layer if you already use GitHub.
GitHub REST API	REST docs · Pull requests endpoints	The simplest primary-source reference for PR automation.
GitHub CODEOWNERS and rules	CODEOWNERS · Rulesets	Official guidance for making human review and required checks enforceable.
GitHub personal access tokens	PAT docs	The safest starting point when a local script needs GitHub API access beyond the built-in Actions token.
GitLab CI/CD	Pipelines · CI/CD YAML syntax	The workflow layer for the GitLab path: tests, deploy jobs, environments, and repeatable review gates.
GitLab merge requests	Merge requests · Merge request pipelines	The GitLab equivalent of the review handoff: proposed changes, discussion, approval, and CI results in one place.
GitLab CI/CD variables	Variables docs	The official place to learn how to store non-source configuration and sensitive CI values such as Cloudflare deployment tokens.
GitLab protected environments	Protected environments	Useful when production deployments should require the right role, branch, or manual approval.
Cloudflare Workers	Get started guide	The beginner deployment target for the GitLab + Workers path: a small serverless app/API without server management.
Cloudflare Workers CI/CD	CI/CD overview · Git integration	Explains Workers Builds and external CI/CD choices, including GitLab-connected deployments.
Wrangler	Wrangler docs · Configuration	The Cloudflare CLI used to run, validate, type, and deploy Workers from local development and GitLab CI.
Cloudflare Workers secrets	Secrets docs · Create API token	The official reference for Worker secrets and Cloudflare API tokens. Use it before putting any credential into CI.
Node.js core docs	fs · child_process · test runner · assert · vm · release schedule	The built-ins used in the tutorial, with no framework dependency required.
Docker Desktop	Install Docker Desktop · WSL integration	The official install and Windows integration reference for the Day 0 setup check.
OpenAI Agents	Agents guide	A code-first option for building custom agent workflows with tools, orchestration, and tracing.
OWASP LLM Top 10	Project page	The most succinct public list of common LLM application failure modes.
OWASP AI Agent Security	Cheat sheet	Practical controls for output validation, schema checking, scoping, and guardrails.
NIST AI RMF	Framework overview	The governance lens: software factories are not only technical systems but risk-management systems.

If you only take one next step after reading this page, make it this: build one narrow, reviewable automation lane. Do not try to industrialise your whole engineering org in one jump. Start with one loop you can trust, then widen the conveyor.