What it solves
Too many handoffs, too much repeated setup, weak reuse of project knowledge, and too much human time spent on mechanical steps instead of product judgement.
22 May 2026
Software Factory 101 is a practical, beginner-friendly guide to designing a safe AI-native software factory. It explains the core model, the moving parts, the beginner path choices, and the end-to-end setup steps for Builder tools, GitHub-native workflows, GitLab + Cloudflare Workers, and a DIY local factory. The emphasis is not on one vendor or stack, but on the operating model: intent goes in, context guides the work, agents execute inside guardrails, CI/CD checks the output, humans review the risk, and memory improves the next run. Sources: Greenfield, XHawk, Alex Op.
A software factory is not just “AI that helps write code”. It is a repeatable system for turning work requests into tested changes, pull requests, releases, or deployments while capturing the context and lessons that future runs need. The architecture below maps that idea as a continuous AI-native loop: human intent enters, workflow tools and project context guide the work, agents plan and execute inside controlled environments, checks and human review decide what can move forward, deployment paths hand off approved output, and feedback loops retain lessons for the next run. The “factory” part matters because work moves through structured stages, outputs are standardised, quality control exists, knowledge is retained, and throughput can improve over time. Sources: Greenfield, XHawk, Alex Op, MCP.
A software factory is easiest to understand as a closed loop. A person or system asks for work, the factory loads the project context, agents plan and make the change in a controlled environment, automated checks and human review decide whether it is safe, approved output moves to a pull request, release, or deployment, and the result is recorded so the next run starts smarter. That loop is the key difference between a one-off coding prompt and a repeatable operating system for software delivery.
A factory needs context, orchestration, execution tools, isolation, tests, approvals, and feedback capture. Remove any one of those and the system starts looking more like a chat assistant than a factory.
Build one repeatable lane first: intake a task, load context, create a plan, run an executor in a sandbox, review with tests, produce a PR summary, then record what happened.
Human intent is the intake point. In the diagram it includes engineers, PMs, designers, APIs, agents, and incidents. In plain English, intent is the problem statement that starts the run: "Add a watchlist page", "Investigate today's error spike", "Implement the approved design", or "Turn this bug report into a fix". Sources: XHawk, Alex Op.
Beginner's test: if you cannot write the request clearly enough for another person to act on it, you are not ready to hand it to a factory either. Good intake is boringly explicit: scope, constraints, protected areas, and what "done" means.
{
  "task_id": "task-001",
  "intent": "Add a watchlist page",
  "acceptance_criteria": [
    "Users can save favourite stocks locally",
    "Saved stocks remain after page refresh",
    "A PR summary is generated"
  ],
  "constraints": [
    "Do not change authentication",
    "Keep the change reversible"
  ],
  "context_paths": [
    "docs/product/watchlist.md",
    "src/routes",
    "tests"
  ]
}
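A factory can refuse vague work before any agent runs. The following is a minimal intake check for a task record shaped like the JSON above. The field names come from that example. The function name `validateTask` and its specific rules are illustrative assumptions: a non-empty intent, at least one acceptance criterion, and context paths that stay inside the repository.

```javascript
// Minimal intake check for a task record shaped like the example above.
// validateTask is a hypothetical helper, not part of any library.
const path = require("node:path");

function validateTask(task) {
  const problems = [];
  if (typeof task.task_id !== "string" || task.task_id.trim() === "") {
    problems.push("task_id is missing");
  }
  if (typeof task.intent !== "string" || task.intent.trim() === "") {
    problems.push("intent is missing");
  }
  // "Done" must be stated explicitly before any agent is allowed to start.
  if (!Array.isArray(task.acceptance_criteria) || task.acceptance_criteria.length === 0) {
    problems.push("acceptance_criteria must list at least one item");
  }
  if (!Array.isArray(task.constraints)) {
    problems.push("constraints must be an array (it may be empty)");
  }
  // Context paths should be relative and must not escape the repository root.
  for (const p of task.context_paths ?? []) {
    const normalised = path.posix.normalize(p);
    if (path.posix.isAbsolute(normalised) || normalised === ".." || normalised.startsWith("../")) {
      problems.push(`context path escapes the repo: ${p}`);
    }
  }
  return problems; // an empty array means the task is ready for the factory
}

const example = {
  task_id: "task-001",
  intent: "Add a watchlist page",
  acceptance_criteria: ["Users can save favourite stocks locally"],
  constraints: ["Do not change authentication"],
  context_paths: ["docs/product/watchlist.md", "src/routes", "tests"],
};
console.log(validateTask(example)); // []
```

Returning a list of problems rather than throwing on the first one lets the factory send the requester a single, complete "please clarify" message.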
Telemetry and logs are the factory's stream of operational evidence: logs, metrics, traces, build output, customer complaints, analytics, failed tests, and incident signals. In the diagram, these signals arrive 24x7. XHawk describes live signals such as logs and metrics as context for agents; Alex Op describes support feedback, analytics dashboards, and error logs feeding the work backlog. The public XHawk material does not specify an exact telemetry schema or storage architecture, so those details are intentionally left unspecified here. Sources: XHawk, Alex Op.
For a beginner, telemetry answers three simple questions: what is happening, what broke, and did the last change help or hurt? Without telemetry, the factory can still build, but it cannot learn intelligently from real outcomes. With telemetry, production itself becomes an input back into development.
The dotted loop in the diagram is the same idea seen over time: completed work, production signals, and incident lessons feed the next run instead of being forgotten. Sources: XHawk Context, Alex Op.
{
  "timestamp": "2026-05-24T09:15:00Z",
  "service": "web",
  "signal": "error-rate-spike",
  "severity": "warning",
  "details": {
    "route": "/api/watchlist",
    "count_last_5m": 42
  }
}
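Telemetry becomes useful to the factory once a signal can be turned into intake. Here is a minimal sketch that assumes the signal shape above, a hypothetical severity ladder, and a hypothetical threshold. It converts a warning-or-worse spike into a task record with the same fields as the intake example and leaves quieter signals alone. The `docs/runbooks` path is also a placeholder.

```javascript
// Hypothetical mapping from a telemetry signal (shaped like the example above)
// to an intake task. Threshold and severity levels are illustrative assumptions.
const SEVERITY_RANK = { info: 0, warning: 1, error: 2, critical: 3 };

function signalToTask(signal, { minSeverity = "warning", minCount = 25 } = {}) {
  const rank = SEVERITY_RANK[signal.severity] ?? 0;
  const count = signal.details?.count_last_5m ?? 0;
  if (rank < SEVERITY_RANK[minSeverity] || count < minCount) {
    return null; // keep it as evidence, but do not open work for it
  }
  const route = signal.details?.route ?? "unknown route";
  return {
    task_id: `signal-${signal.service}-${signal.timestamp}`,
    intent: `Investigate ${signal.signal} on ${route}`,
    acceptance_criteria: [
      "Root cause is written down in the run notes",
      "Any fix ships as a reviewed PR, not a direct production change",
    ],
    constraints: ["Read-only access to production data"],
    context_paths: ["docs/runbooks"], // placeholder path
    evidence: signal, // keep the raw signal so the reviewer can check it
  };
}

const spike = {
  timestamp: "2026-05-24T09:15:00Z",
  service: "web",
  signal: "error-rate-spike",
  severity: "warning",
  details: { route: "/api/watchlist", count_last_5m: 42 },
};
console.log(signalToTask(spike)?.intent);
// Investigate error-rate-spike on /api/watchlist
```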
Workflow integration is the adoption layer. The diagram names Slack, GitHub, Linear, and Jira; the broader pattern includes chat, issue trackers, CI failures, schedules, webhooks, incident systems, and cloud infrastructure events. GitHub Copilot's cloud-agent documentation shows the same pattern in product form: sessions can start from GitHub surfaces and integrated tools, with the issue or thread context passed into the run. Sources: XHawk, GitHub Copilot.
In practice, workflow integration matters because it removes "copy this from tool A into tool B" busywork. A factory should meet the team where the work already begins, then move the right context into the factory runtime.
The context layer is the factory's memory and grounding system. It contains the relatively stable project knowledge that AI agents need before they can work effectively. This includes the codebase structure, feature specifications, architecture decisions, internal conventions, domain terminology, protected areas of the system, and deployment rules.
The key idea is that context provides long-lived understanding about how the project operates. Without this grounding, AI agents behave like generic assistants with no real understanding of the application they are modifying.
Telemetry and logs, covered above, are different from context. Telemetry is live operational evidence about what is happening right now inside the system.
XHawk describes context broadly as specs, past decisions, tickets, and live signals. In practice, specs, decisions, conventions, and architecture documents belong to the actual context layer. Live signals belong to the telemetry and logs layer. After review and validation, important operational learnings from telemetry may later be promoted into long-term context. Source: XHawk.
Platforms such as XHawk, Lovable, and Claude Code describe variations of the same core idea: workspace knowledge, project memory, codebase awareness, and context management. The shared principle is that context transforms a generic AI model into a project-specific engineering system. Sources: XHawk, Lovable, Claude.
Beginner's version: write down what your future self should not have to rediscover. Without context, agents hallucinate more easily, workflows become inconsistent, architecture drifts, and repeated mistakes return. With context, the system behaves more like an experienced engineer who already understands the project.
{
  "project_name": "Watchlist demo",
  "rules": [
    "Prefer small reversible changes",
    "Use feature flags for risky UI work"
  ],
  "domain_terms": {
    "watchlist": "A saved list of favourite stocks",
    "ticker": "Market symbol such as AAPL"
  },
  "protected_areas": [
    "authentication",
    "billing",
    "production secrets"
  ],
  "done_definition": [
    "Tests pass",
    "PR summary exists"
  ]
}
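Loading context can be as simple as merging the project context with the task before the planner sees it. This minimal sketch assumes both records are shaped like the JSON examples in this guide. `buildBriefing` is a hypothetical name, and a real factory would also attach the files listed in the task's `context_paths`.

```javascript
// Hypothetical context loader: merges stable project context with one task
// into a plain-text briefing an agent (or a human) can read before planning.
function buildBriefing(context, task) {
  const lines = [
    `Project: ${context.project_name}`,
    `Task ${task.task_id}: ${task.intent}`,
    "",
    "Rules:",
    ...context.rules.map((r) => `- ${r}`),
    ...task.constraints.map((c) => `- ${c} (task constraint)`),
    "",
    "Protected areas (never change without explicit approval):",
    ...context.protected_areas.map((a) => `- ${a}`),
    "",
    "Terms:",
    ...Object.entries(context.domain_terms).map(([term, meaning]) => `- ${term}: ${meaning}`),
    "",
    "Done when:",
    ...task.acceptance_criteria.map((c) => `- ${c}`),
    ...context.done_definition.map((d) => `- ${d}`),
  ];
  return lines.join("\n");
}

const context = {
  project_name: "Watchlist demo",
  rules: ["Prefer small reversible changes"],
  domain_terms: { ticker: "Market symbol such as AAPL" },
  protected_areas: ["authentication", "billing"],
  done_definition: ["Tests pass"],
};
const task = {
  task_id: "task-001",
  intent: "Add a watchlist page",
  acceptance_criteria: ["Saved stocks remain after page refresh"],
  constraints: ["Do not change authentication"],
};
console.log(buildBriefing(context, task));
```

Keeping the briefing as plain text means the same artefact can be pasted into any agent or read by a reviewer, which makes it easy to audit what the agent was told.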
The software factory core is where the system turns intent and context into controlled work. In this simplified map, the core has two jobs: orchestrate specialised agents, and run their work inside a sandboxed execution environment. The orchestrator is the control system of the factory: it decides when tasks start, which agents execute, which tools are available, which approvals are required, and when workflows stop or retry. That is what separates a repeatable factory from disconnected AI tools or a one-off chat prompt. Sources: XHawk, Claude, Replit.
A software factory is therefore not just AI, not just coding, not just CI/CD, and not just DevOps. It is the combination of orchestration, context, execution, governance, memory, deployment, and learning into one operating system for repeatable software change.
Multi-agent orchestration means coordinating specialised roles instead of expecting one agent to do everything well. XHawk names the planner, executor, and reviewer trio directly. Claude Code and Replit use different product language, but the runtime logic is similar: gather context, make a plan, act, verify, and repeat. Sources: XHawk, Claude, Replit.
The beginner lesson is important: splitting thinking, doing, and checking often makes a system easier to steer, audit, and improve. Smaller focused roles are easier to govern, failures are easier to diagnose, and automation becomes safer because each role has a clearer job.
The planner turns intent and context into ordered work. It identifies affected systems, determines the implementation approach, breaks the job into steps, names acceptance criteria, calls out constraints, and marks risky areas before anything changes.
The executor performs the work inside the allowed environment: writing code, editing files, running commands, updating tests, creating artifacts, recording outputs, and reporting what happened. The executor should not silently widen scope; it should work inside the plan, constraints, and sandbox.
The reviewer validates output against the plan, tests, linting, architecture rules, requirements, and likely failure modes. In a small factory this can be a strict review checklist; in a larger one it can be a specialised agent plus automated checks and human approval.
Sandboxed execution is the safety boundary for autonomous work. Agents should operate inside isolated environments rather than directly against production systems. Typical implementations include Docker containers, cloud devboxes, ephemeral virtual machines, isolated Git branches, restricted filesystem access, and restricted network access. XHawk describes isolated repo copies and cloud sandboxes or devboxes; GitHub Copilot's cloud agent uses an ephemeral development environment powered by GitHub Actions; Claude Code's sandboxing guidance is explicit that useful isolation needs both filesystem and network boundaries. Docker Compose is an approachable beginner tool for modelling a repeatable isolated environment because it defines services, networks, and volumes in one YAML file. Sources: XHawk, GitHub Copilot, Claude, Docker.
Safe autonomy means giving the system enough room to work without giving it uncontrolled access to production systems, secrets, or the whole machine. Devbox environments are one practical form of that boundary. The key beginner insight is blunt but useful: autonomy without isolation is dangerous, because autonomous systems will eventually make mistakes and the sandbox limits the blast radius.
Crucial beginner warning: Node's vm module is not a security mechanism for running untrusted code. If you need isolation, use real process, VM, or container boundaries instead.
Source: Node.
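One way to get a real process boundary from Node is to run each executor command in a throwaway container. This is a minimal sketch that assumes Docker is installed and the `node:24` image is available. The helper name `sandboxArgs` and the specific resource limits are illustrative choices. The flags themselves (`--rm`, `--network none`, `--read-only`, `--tmpfs`, `--memory`, `--cpus`, `--pids-limit`, `-v`, `-w`) are standard `docker run` options.

```javascript
// Hypothetical helper that builds `docker run` arguments for one sandboxed command.
// The repo is mounted read-only and networking is disabled.
const path = require("node:path");

function sandboxArgs(repoDir, command, { image = "node:24", memory = "512m", cpus = "1" } = {}) {
  return [
    "run",
    "--rm",                 // delete the container when the command exits
    "--network", "none",    // no outbound network access
    "--read-only",          // container root filesystem is read-only
    "--tmpfs", "/tmp",      // scratch space that disappears with the container
    "--memory", memory,
    "--cpus", cpus,
    "--pids-limit", "128",  // blunt guard against runaway process creation
    "-v", `${path.resolve(repoDir)}:/work:ro`,
    "-w", "/work",
    image,
    ...command,
  ];
}

// Usage (requires Docker):
// const { execFileSync } = require("node:child_process");
// execFileSync("docker", sandboxArgs(".", ["node", "--test"]), { stdio: "inherit", timeout: 120_000 });
console.log(["docker", ...sandboxArgs("/tmp/demo-repo", ["node", "--version"])].join(" "));
```

Passing an argument array to `execFileSync` avoids shell interpolation of task text. If the executor must write files, mount a disposable copy of the repo read-write instead of the real working tree.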
Curated capabilities are the small approved toolbox the factory can use. In the diagram that toolbox includes APIs, tests, MCP, and docs; in practice it can also include approved databases, documentation systems, deployment interfaces, and other tightly scoped services. XHawk frames this as a curated capability set where tool quality matters more than tool quantity. MCP provides a standard way for servers to expose tools, resources, and prompts; GitHub Copilot also uses MCP to extend Copilot with other systems; Claude Code's hooks and subagents show how runtime actions can be attached to lifecycle points. Sources: XHawk, MCP, Copilot, Claude.
The beginner translation is simple: do not give the agent a random pile of tools. A good factory minimises unnecessary capabilities, tightly scopes permissions, and standardises interfaces. Start with file read and write inside the sandbox, test execution, repository operations, approved documentation access, and one path for opening a pull request or writing a review summary. Fewer, safer, higher-quality tools beat unlimited access.
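A curated toolbox can be enforced with a plain allowlist that every tool call passes through. This is a minimal sketch. The tool names, the roles, and the `callTool` helper are hypothetical. In a real factory the handlers would wrap sandboxed commands or MCP tools rather than return strings.

```javascript
// Hypothetical capability registry: each tool declares which roles may call it.
// Anything not registered is refused, so the toolbox can only grow on purpose.
const registry = new Map([
  ["read_file", { roles: ["planner", "executor", "reviewer"], run: (args) => `read ${args.path}` }],
  ["write_file", { roles: ["executor"], run: (args) => `wrote ${args.path}` }],
  ["run_tests", { roles: ["executor", "reviewer"], run: () => "tests started" }],
  ["open_pr", { roles: ["reviewer"], run: (args) => `PR opened: ${args.title}` }],
]);

const auditLog = [];

function callTool(role, name, args = {}) {
  const tool = registry.get(name);
  const allowed = Boolean(tool && tool.roles.includes(role));
  auditLog.push({ role, name, allowed }); // every attempt is recorded, allowed or not
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  if (!allowed) throw new Error(`${role} may not call ${name}`);
  return tool.run(args);
}

console.log(callTool("executor", "write_file", { path: "src/routes/watchlist.js" }));
try {
  callTool("planner", "write_file", { path: "src/auth.js" });
} catch (err) {
  console.log(err.message); // planner may not call write_file
}
```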
Guardrails and human review are the control layer. In the diagram this contains tests and PR generation, with review before production. Guardrails include automated tests, lint checks, security scans, branch protections, approval gates, deployment policies, permission systems, and human review requirements. XHawk describes agents generating PRs and running tests while humans define requirements and approve outputs. GitHub's Copilot guardrails guidance points to policy planning, branch rulesets, permissions, protected configuration ownership, and secure runner choices. OWASP's AI Agent Security guidance adds agent-specific controls such as validating tool use, limiting retries and tokens, and testing for approval bypass, privilege escalation, memory poisoning, and data exfiltration. Sources: XHawk, Copilot, GitHub rules, OWASP Agent.
In plain English, guardrails answer three questions: what may happen automatically, what requires approval, and what is forbidden entirely? Good factories automate aggressively, but constrain aggressively too. Humans increasingly focus on intent, product judgement, risk management, approval of sensitive changes, architecture direction, and exception handling rather than only manual coding.
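Those three questions can be written down as a small policy that classifies a proposed change by the files it touches. This is a minimal sketch. The path prefixes and the `classifyChange` name are hypothetical examples in the spirit of the protected areas listed earlier. When a change touches several files, the strictest matching verdict wins.

```javascript
// Hypothetical guardrail policy: classify a change as "auto", "approval", or "forbidden"
// from the paths it touches. "auto" still means tests plus normal PR review, not auto-merge.
const POLICY = [
  { verdict: "forbidden", prefixes: [".env", "secrets/"] },
  { verdict: "approval", prefixes: ["src/auth/", "src/billing/", ".github/", "infra/"] },
];
const ORDER = { auto: 0, approval: 1, forbidden: 2 };

function classifyPath(filePath) {
  // Rules are checked strictest first, so the first match is the strictest verdict.
  for (const rule of POLICY) {
    if (rule.prefixes.some((prefix) => filePath.startsWith(prefix))) return rule.verdict;
  }
  return "auto";
}

function classifyChange(changedFiles) {
  let verdict = "auto";
  const reasons = [];
  for (const file of changedFiles) {
    const v = classifyPath(file);
    if (v !== "auto") reasons.push(`${file}: ${v}`);
    if (ORDER[v] > ORDER[verdict]) verdict = v;
  }
  return { verdict, reasons };
}

console.log(classifyChange(["src/routes/watchlist.js", "tests/watchlist.test.js"]));
// { verdict: 'auto', reasons: [] }
console.log(classifyChange(["src/routes/watchlist.js", ".github/workflows/ci.yml"]).verdict);
// approval
```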
Production systems are where approved work lands: applications, services, databases, infrastructure, scheduled jobs, and deployments that real users depend on. The diagram labels this simply as deployment. GitHub Actions is a straightforward example of that last mile: workflows live in .github/workflows, respond to repository events, and can run CI, deployments, and other automation.
Sources: GitHub Actions, XHawk.
A beginner does not need a fancy deployment stack to understand this box. If a change can be tested, approved, and released through a repeatable path, you have the seed of the production end of the factory.
Every run should generate reusable knowledge: successful fixes, failed approaches, architecture decisions, incident resolutions, deployment lessons, and operational learnings. XHawk describes tasks being audited and remembered, with sessions, decisions, and context becoming indexed knowledge; Claude Code, Lovable, and similar tools expose memory-like runtime pieces for continuity. Public XHawk pages describe indexed knowledge and snapshots, but do not specify the exact persistence implementation. Memory is how the factory compounds. Sources: XHawk Context, Claude, Lovable.
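Memory only compounds if the next run can find the right lessons. This minimal sketch assumes lessons are stored as small tagged records. The keyword-overlap ranking is a deliberately simple stand-in for the indexed knowledge that products such as XHawk describe, not their implementation. The function names are hypothetical.

```javascript
// Hypothetical lesson store: record what a run learned, then recall the most
// relevant lessons for a new task by counting shared words. Deliberately naive.
const lessons = [];

function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

function recordLesson(taskId, text, tags = []) {
  lessons.push({ taskId, text, words: tokenize(`${text} ${tags.join(" ")}`) });
}

function recallLessons(intent, limit = 3) {
  const wanted = tokenize(intent);
  return lessons
    .map((lesson) => ({
      lesson,
      score: [...wanted].filter((w) => lesson.words.has(w)).length,
    }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.lesson.text);
}

recordLesson("task-001", "Watchlist state lives in localStorage; mock it in tests.", ["watchlist"]);
recordLesson("task-002", "Billing webhooks need a human reviewer.", ["billing"]);
console.log(recallLessons("Add sorting to the watchlist page"));
// [ 'Watchlist state lives in localStorage; mock it in tests.' ]
```

Recalled lessons can be prepended to the next run's briefing. Swapping the ranking for embeddings or full-text search later does not change the loop.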
In practice, the factory works like a controlled loop. A task arrives from a human, from a schedule, or from another system. The factory loads the right context, breaks the job into steps, runs the change in an isolated environment, checks the result, asks for approval where needed, and then stores the outcome as new reusable knowledge. Modern agent products increasingly expose the building blocks for this pattern: subagents, hooks, schedules, custom agents, MCP connectors, cloud agents, version control, background tasks, and security policies. Sources: XHawk, Alex Op, Claude, Copilot, Replit, Lovable.
The context layer is the foundation. XHawk explicitly places codebase, feature specs, decision logs, and internal conventions under the factory core; Lovable supports persistent workspace and project knowledge; Claude Code uses files such as CLAUDE.md and project memory; MCP exists to connect AI systems to external data sources, tools, and workflows.
Sources: XHawk, XHawk Context, Lovable, Claude, MCP.
The planner breaks work into steps and clarifies acceptance criteria. The executor applies changes and runs commands. The reviewer validates the result with tests, static checks, or code review. XHawk shows that split directly; Claude Code’s subagents and multi-session teams provide similar role separation; Alex Op describes the pattern as a practical path from “AI helps code” to “AI runs the workflow”.
A factory does not wait passively for one prompt. XHawk discusses scheduled and event triggers; Claude Code supports repeated prompts and routines; GitHub Copilot cloud agent can be launched from GitHub, Jira, Linear, Slack, and Teams; Replit supports background tasks and automations.
Safe autonomy requires boundaries. XHawk uses cloud sandboxes; Claude Code offers permission modes and sandboxed Bash with file and network isolation; Docker Compose is a simple way to define an isolated, reproducible environment for a mini factory; Copilot cloud agent runs on GitHub Actions runners and GitHub recommends fresh hosted or ephemeral runners.
The factory must know what it may do automatically and what still needs a person. XHawk’s model includes automated tests and explicit human approval. GitHub Actions gives you CI/CD, and GitHub recommends branch rules, CODEOWNERS, secret handling, and workflow policies for cloud agents. OWASP adds the agent-specific reminder that outputs must be validated before they are executed.
Alex Op stresses that the closest practical model is a loop: collect context, execute, validate, learn, and repeat. Claude Code writes and recalls project memory; XHawk describes a knowledge graph and continuous snapshots; Lovable keeps shared knowledge; production analytics or operations signals can become new context for the next run.
Sources: Alex Op, Claude, XHawk Context, Lovable.
A path is not one tool. It is a complete loop: task in, context loaded, work produced, checks run, human review, deployment or handoff, and lessons recorded. Pick based on where your work already lives and how much setup you can tolerate.
Path 1, Builder. Difficulty: easiest. Control: low. First result: live prototype. Main risk: hidden mechanics.
Use Lovable or Replit when you want the shortest route from idea to a live prototype and can accept that the platform hides much of the machinery.
Path 2, GitHub-native. Difficulty: easy. Control: medium. First result: guarded PR. Main risk: repo permissions.
Use GitHub when your work already lives in issues, branches, pull requests, Actions, branch rules, and CODEOWNERS.
Path 3, GitLab + Workers. Difficulty: medium. Control: medium-high. First result: deployed Worker. Main risk: CI secrets.
Use GitLab and Cloudflare Workers when you want a visible issue-to-MR-to-CI-to-edge-deploy lane.
Path 4, DIY local factory. Difficulty: technical. Control: high. First result: local loop. Main risk: building too much.
Use the DIY path when you want to see every moving part: queue, context, planner, executor, reviewer, tests, sandbox, and memory.
Start with the smallest stack that proves the loop. You can add more capable agents later; the first win is a repeatable lane with context, isolation, checks, and review.
| Layer | Recommended beginner tool | Why this tool | Cost / access reality |
|---|---|---|---|
| Version control | GitHub plus Git CLI or GitHub Desktop | Gives you issues, branches, pull requests, Actions, rulesets, CODEOWNERS, and a common review workflow. | Git is free. GitHub public repos are free; some private, team, or governance features may depend on plan limits. |
| GitLab path | GitLab project, Issues, Merge Requests, and GitLab CI/CD | Gives you a single place for intake, review, test pipelines, protected deploy jobs, and deployment history. | GitLab has free tiers; CI minutes, approvals, protected environments, and governance features can vary by plan. |
| Runtime | Node.js 24 LTS with npm | The tutorial uses Node built-ins for files, subprocesses, tests, assertions, and fetch, so there is very little framework setup. | Free and local. |
| Isolation | Docker Desktop and Docker Compose | Lets the executor run in a repeatable environment instead of directly on your host machine. | Free for many personal and small-business uses; check Docker’s current licence for organisation use. |
| CI/CD | GitHub Actions | Runs checks on pull requests and can trigger factory runs manually or on a schedule. | Free minutes are available for many repos; private repos and larger runners can consume paid minutes. |
| Edge deploy target | Cloudflare Workers with Wrangler | Lets the GitLab path deploy a small backend/API without managing servers. Wrangler is the CLI used for local dev, checks, types, and deploys. | Wrangler is free CLI tooling. Workers has a free tier, but usage, paid features, and account limits should be checked before production use. |
| GitHub agent | GitHub Copilot coding agent | Strong fit when work starts in GitHub and should end as a pull request guarded by Actions and branch rules. | Usually requires a Copilot plan and organisation settings may control access. |
| Custom coding runtime | Claude Code or OpenAI Agents SDK | Use when you want to design your own planner/executor/reviewer roles, tool permissions, hooks, and memory flow. | May require a paid subscription, API credits, or model-provider account. |
| App builder | Lovable or Replit | Good for beginners who want a working app quickly while still learning the factory concepts around context, review, and deployment. | Free tiers may be limited; serious use often needs credits, a paid plan, or a card on file. |
| Connectors | MCP, Jira, Linear, Slack, or GitHub integrations | Add these after the local loop works so the factory can pull real tasks and context from where the team already works. | Depends on the connected service and workspace permissions. |
Before writing factory code, prove the command line can see the runtime, version control, and sandbox tools.
node --version # expect v24.x
git --version
docker --version
docker compose version
Windows note: Docker Desktop on Windows should use the WSL 2 backend. If Docker commands fail, check WSL 2 and Docker Desktop distro integration before debugging the tutorial.
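If you prefer a repeatable check to eyeballing output, the same preflight can be scripted. This minimal sketch uses Node's `child_process.execFileSync`. The script name and the "major version 24 or later" rule are assumptions that mirror the comment on the version check above.

```javascript
// Hypothetical preflight script (for example factory/preflight.js).
// Runs each command and reports which tools are missing or too old.
const { execFileSync } = require("node:child_process");

function nodeMajor(versionString) {
  const match = /^v(\d+)\./.exec(versionString.trim());
  return match ? Number(match[1]) : NaN;
}

function probe(cmd, args) {
  try {
    const output = execFileSync(cmd, args, { encoding: "utf8", timeout: 10_000 });
    return { ok: true, output: output.trim() };
  } catch (err) {
    return { ok: false, output: err.message.split("\n")[0] }; // e.g. spawnSync docker ENOENT
  }
}

function preflight() {
  const results = {
    node: { ok: nodeMajor(process.version) >= 24, output: process.version },
    git: probe("git", ["--version"]),
    docker: probe("docker", ["--version"]),
    compose: probe("docker", ["compose", "version"]),
  };
  for (const [name, r] of Object.entries(results)) {
    console.log(`${r.ok ? "ok  " : "FAIL"} ${name}: ${r.output}`);
  }
  return results;
}

preflight();
```

The script only reports. If you want CI to stop on a missing tool, set `process.exitCode = 1` when any result has `ok: false`.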
Each path below follows the same factory loop: intake, context, execution, checks, review, deployment or handoff, and memory.
| Path | Use when | First concrete result | Where steps live |
|---|---|---|---|
| Path 1: Builder | You want the easiest idea-to-app route. | A live prototype. | Builder steps |
| Path 2: GitHub-native | Your work already lives in GitHub. | A guarded pull request. | GitHub-native steps |
| Path 3: GitLab + Workers | You use GitLab or want a small edge-deployed app/API. | A staging Cloudflare Worker deployment. | GitLab + Workers steps |
| Path 4: DIY local factory | You want to learn the mechanics directly. | A local queue-to-review loop. | DIY local steps |
| Reference: Control plane | You are planning maturity across many repos or teams. | A roadmap, not a day-one build. | Control-plane roadmap |
Path 1: Builder
This is the easiest path because Lovable or Replit hides much of the repository, CI, sandbox, and deployment machinery. You still learn the factory loop by naming the work clearly, adding context, previewing changes, reviewing output, and recording decisions. Sources: Lovable, Replit.
What you need: a Lovable or Replit account, one app idea, and a short note describing the user, goal, and first safe change.
Create a new app/project in the builder. Keep the first app tiny: one page, one form, one list, or one API call.
Write the first task as a clear request with acceptance criteria. Example: “Add a watchlist page. It must show saved items, include an empty state, and be previewable before publish.”
Add project knowledge or notes: audience, style rules, protected areas, data rules, and what “done” means. This replaces the local context file in the DIY path.
Ask the builder to make one small reversible change. Avoid broad prompts such as “build the whole product” on the first run.
Use preview, built-in error checks, version history, and manual smoke testing. If the platform supports GitHub sync, push the change to a repo and let CI check it there.
Review the visible change against the original acceptance criteria. If another person is involved, share the preview link instead of shipping immediately.
Publish through the builder only after the preview matches the task. For a team workflow, hand off through GitHub sync or exported code.
Save what changed, what prompt worked, what failed, and any new convention in the project knowledge area or a simple notes file.
Connect version control when available, then start requiring every builder change to have a task, preview, review note, and rollback path.
Path 2: GitHub-native
This path is for beginners whose work already lives in GitHub. The complete loop is: GitHub issue → branch → Copilot or coding agent → pull request → GitHub Actions → branch rules/CODEOWNERS → merge → memory note. Sources: Copilot, Actions, REST, Rules, PATs.
What you need: a GitHub account, a repository, GitHub Issues enabled, GitHub Actions enabled, and either GitHub Copilot/coding agent access or a local assistant that can work on a branch.
Create a repository, add a README, and create these folders so factory material has a home: factory/context/, factory/runs/, .github/workflows/, and optionally .github/CODEOWNERS.
Create a GitHub issue label named factory. Create the first issue with a small task, acceptance criteria, and any files that should or should not be touched.
## Task
Add a watchlist page.
## Acceptance criteria
- The page has a clear empty state.
- The change is small and reversible.
- Automated checks pass before merge.
## Context
- Avoid auth and billing files.
- Write a short PR summary explaining the change.
Add a small context file that tells Copilot, a coding agent, or a human helper what rules to follow.
# Project context
Goal: Build small, reviewable product changes.
Rules:
- Prefer small pull requests.
- Do not edit auth, secrets, billing, or deployment files without explicit approval.
- Every factory task starts from a GitHub issue labelled `factory`.
- Every pull request needs automated checks and a human review.
Definition of done:
- Acceptance criteria are met.
- GitHub Actions pass.
- The PR records what changed and what the next run should remember.
Use Copilot/coding agent or your local assistant to create a branch from the issue. Ask for one small change only, and include a link to the issue and the context file.
Add a minimal GitHub Actions workflow. Replace npm test with your actual test command if your project uses another stack.
name: factory-checks
on:
  pull_request:
  workflow_dispatch:
# Least privilege: this workflow only needs to read the repository.
permissions:
  contents: read
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "24"
      - run: npm ci
      - run: npm test
Open a pull request, link it to the issue, and require the PR summary to mention acceptance criteria, tests run, and any files intentionally avoided.
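The PR-summary requirement is easy to check automatically, for example in a workflow step that receives the PR body (available in Actions as `github.event.pull_request.body`). This is a minimal sketch. The section headings it looks for are an assumed team convention, not a GitHub feature. The closing-keyword pattern follows GitHub's documented keywords such as "Closes #12".

```javascript
// Hypothetical check that a PR body covers the three items this guide requires.
// Section headings are an assumed team convention.
const REQUIRED_SECTIONS = ["## Acceptance criteria", "## Tests run", "## Files intentionally avoided"];

function missingSections(prBody) {
  const lines = prBody.split(/\r?\n/).map((line) => line.trim().toLowerCase());
  return REQUIRED_SECTIONS.filter((heading) => !lines.includes(heading.toLowerCase()));
}

function issueLinked(prBody) {
  // close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved followed by #number
  return /\b(close[sd]?|fix(e[sd])?|resolve[sd]?)\s+#\d+/i.test(prBody);
}

const body = [
  "Closes #12",
  "## Acceptance criteria",
  "- Empty state shown when no stocks are saved",
  "## Tests run",
  "- npm test",
  "## Files intentionally avoided",
  "- src/auth/",
].join("\n");

console.log(missingSections(body), issueLinked(body)); // [] true
```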
# Require a human owner for factory and deployment controls.
/factory/ @your-github-user
/.github/ @your-github-user
/infra/ @your-github-user
For the beginner GitHub-native path, the first handoff is a reviewed pull request. If your app already has deployment, trigger it only after checks pass and the PR is approved.
Add a short note in the PR body or factory/runs/<issue-number>.md: what changed, what failed, and what rule the next task should remember.
After the manual issue-to-PR loop works, use the GitHub REST adapter in the DIY tutorial to import labelled issues into a queue or open PRs automatically. Do not automate merge first.
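A first issue importer only needs one REST call. This minimal sketch assumes a token in `GITHUB_TOKEN` with read access to issues. `GET /repos/{owner}/{repo}/issues` with `labels` and `state` query parameters is the documented endpoint. That endpoint also returns pull requests, so items with a `pull_request` field are skipped. The queue-entry fields and the example repository names are hypothetical.

```javascript
// Hypothetical importer: labelled GitHub issues -> local queue entries.
// Read-only: it never comments on, merges, or changes issues.
function issuesToQueue(issues) {
  return issues
    .filter((issue) => !issue.pull_request) // the issues endpoint also lists PRs
    .map((issue) => ({
      task_id: `gh-${issue.number}`,
      intent: issue.title,
      source_url: issue.html_url,
      body: issue.body ?? "",
      status: "queued",
    }));
}

async function fetchFactoryIssues(owner, repo, token = process.env.GITHUB_TOKEN) {
  const url = `https://api.github.com/repos/${owner}/${repo}/issues?labels=factory&state=open&per_page=50`;
  const headers = {
    Accept: "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
  };
  if (token) headers.Authorization = `Bearer ${token}`;
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  return issuesToQueue(await res.json());
}

// Offline example with a trimmed, API-shaped payload:
console.log(
  issuesToQueue([
    { number: 12, title: "Add a watchlist page", html_url: "https://github.com/acme/demo/issues/12", body: null },
    { number: 13, title: "Bump deps", html_url: "https://github.com/acme/demo/pull/13", pull_request: {} },
  ]),
);
// Live usage: fetchFactoryIssues("your-org", "your-repo").then(console.log);
```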
Reference: Control plane
This is not a day-one implementation. It is the maturity map for teams that eventually need shared context, sandboxes, policy gates, telemetry, and many connected workflows. Sources: XHawk, Context.
What you need: at least one trusted path from above, plus enough team process to know what should be centralised.
Create a shared control area for policies, reusable context, run logs, approval rules, and tool permissions.
Connect the places work arrives: GitHub, GitLab, Jira, Linear, Slack, incidents, schedules, and manual requests.
Build a shared context layer from architecture rules, code ownership, decision logs, incidents, telemetry, and repo conventions.
Route tasks to controlled agents or sandboxes based on risk, repo, and required tools.
Standardise test, security, policy, and compliance gates across the connected systems.
Make approvals explicit: who reviews, which areas need owners, and which tasks may never auto-merge.
Coordinate CI/CD rather than bypassing it. The control plane should hand work to the existing delivery systems.
Feed outcomes, incidents, failed checks, useful prompts, and production signals back into shared context.
Start with one of Paths 1-4, then use this as the maturity map.
Path 3: GitLab + Workers
This path uses GitLab as the workflow and review system, and Cloudflare Workers as the deploy target. The complete loop is: GitLab issue → merge request → GitLab CI checks → staging Worker deploy → manual production deploy → memory note. Sources: GitLab CI, Merge requests, Variables, Workers, Wrangler.
What you need: a GitLab account, a GitLab project, a Cloudflare account, Node.js, npm, Git, and Wrangler. First prove the CLI tools work.
node --version
git --version
npm --version
npx wrangler --version
npx wrangler whoami
Create a GitLab project, then create a tiny Cloudflare Worker locally and push the project to GitLab. Do not add factory automation until this basic project runs. Sources: Workers, Wrangler.
npm create cloudflare@latest software-factory-worker
cd software-factory-worker
npm run dev
In GitLab, create a label named factory. Create the first issue with a small task, acceptance criteria, and a note that production deploys require human approval.
Source: GitLab MRs.
Add factory/context/project-context.md or factory/context/project-context.json with project purpose, protected files, deployment rules, and definition of done. This is what keeps the factory from guessing.
For the first run, make one tiny Worker change on a branch and open a merge request. The Worker handler below is enough to prove the execution target.
export default {
  async fetch(request, env, ctx) {
    return Response.json({
      ok: true,
      path: new URL(request.url).pathname,
      factoryStage: "staging"
    });
  }
};
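Before Wrangler is involved, the handler logic can be smoke-tested in plain Node 24, which provides `Request` and `Response` (including `Response.json`) as globals. This is a minimal sketch. The handler is copied inline so the file runs on its own, and the `/health` path is an arbitrary example. `npm run dev` remains the better check of real Workers runtime behaviour.

```javascript
// Hypothetical local smoke test: same handler shape as the Worker above,
// defined inline so this file runs with plain Node 24.
const worker = {
  async fetch(request, env, ctx) {
    return Response.json({
      ok: true,
      path: new URL(request.url).pathname,
      factoryStage: "staging",
    });
  },
};

async function smokeTest() {
  const res = await worker.fetch(new Request("http://localhost/health"), {}, {});
  const body = await res.json();
  if (res.status !== 200 || body.ok !== true || body.path !== "/health") {
    throw new Error(`Unexpected response: ${res.status} ${JSON.stringify(body)}`);
  }
  return body;
}

smokeTest().then((body) => console.log("handler ok:", body));
```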
Keep non-secret Worker configuration in wrangler.jsonc, then use wrangler check, tests, and merge request pipelines as the quality gate. Store secrets with Cloudflare secrets or GitLab CI/CD variables, not in source code.
Sources: Wrangler, Secrets, GitLab variables.
{
  "$schema": "./node_modules/wrangler/config-schema.json",
  "name": "software-factory-worker",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-23",
  "compatibility_flags": ["nodejs_compat"],
  "observability": {
    "enabled": true
  },
  "env": {
    "staging": {
      "name": "software-factory-worker-staging"
    },
    "production": {
      "name": "software-factory-worker"
    }
  }
}
Open a GitLab merge request from your branch. The MR should link the issue, list acceptance criteria, show the pipeline result, and name the staging Worker URL once deployed. Source: GitLab MRs.
Deploy once by hand to prove Cloudflare is configured, then let GitLab CI repeat the deploy. Confirm staging before any manual production deploy. Sources: Workers CI/CD, GitLab CI, Protected environments.
npx wrangler check
npx wrangler types
npx wrangler deploy --dry-run
npx wrangler deploy --env staging
In GitLab, add masked CI/CD variables for CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID. Protect production variables if production deploys only run from protected branches or tags.
Sources: GitLab variables, Protected environments, Cloudflare secrets.
image: node:24
stages:
  - test
  - deploy
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - .npm/
before_script:
  - npm ci --cache .npm --prefer-offline
test:
  stage: test
  script:
    - npm test
    - npx wrangler check
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == "main"'
deploy_staging:
  stage: deploy
  script:
    - npx wrangler deploy --env staging
  environment:
    name: staging
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
deploy_production:
  stage: deploy
  script:
    - npx wrangler deploy --env production
  environment:
    name: production
  rules:
    # Manual gate: a person must start every production deploy.
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
Add a short note to the merge request or factory/runs/<issue-number>.md: issue link, MR link, staging URL, production deploy decision, what failed, and what the next factory run should remember.
After the manual issue-to-MR-to-staging loop works, add a tiny issue importer or MR-summary generator. Keep production deploy manual until checks and review feel boring.
You are here if you want to understand the factory mechanics directly before trusting a hosted product. The mini factory below is intentionally small and vendor-neutral. It uses Node.js built-ins for files, subprocesses, tests, and assertions; Docker Compose for isolation; and optional GitHub adapters only after the local loop works. The first version is deliberately conservative: instead of letting an agent rewrite your whole application immediately, it proves the workflow by planning work, producing a proposal artefact, running checks, and preparing a review summary. Once you trust the loop, you can swap in an actual coding model. (Sources: Node · Docker · Claude · OpenAI Agents)
You need Node.js, Git, Docker Desktop or Docker Compose, a terminal, and a small practice project. Claude Code or another coding agent comes later, after the deterministic loop is test-gated.
Create the folder structure below: factory/context, factory/tasks, factory/lib, factory/agents, factory/runs, tests, and optional integration folders such as scripts and .github.
Start with factory/tasks/queue.json. A local queue is beginner-friendly because it avoids API tokens while teaching the same intake pattern.
Add factory/context/project-context.json with project goal, protected files, coding rules, and definition of done.
Add planner, executor, and reviewer modules. The executor writes a proposal first; direct code edits are a later upgrade.
Add Node’s built-in test runner and make every factory run execute tests before a task is marked ready.
Generate a review summary or PR body from the run. A human should be able to inspect what happened without reading logs line by line.
For the local DIY starter, the handoff is a proposal and review summary. Add GitHub, GitLab, or deployment automation only after this local handoff is reliable.
Write run artefacts into factory/runs/: proposal, review result, test result, and what the next run should remember.
Replace the deterministic planner or proposal-only executor with Claude Code, OpenAI Agents SDK, or another coding agent only after the local loop is observable, sandboxed, and test-gated.
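The step that records what the next run should remember only pays off if later runs read it back. A minimal sketch, assuming a hypothetical run-record shape (`taskId`, an ISO-8601 `finishedAt` timestamp, and a `remember` list of strings); `collectLessons` is an illustrative name, and its output could be appended to the context the planner receives.

```javascript
// Hypothetical run record: { taskId, finishedAt: "2026-05-01T10:00:00Z", remember: string[] }.
// Returns unique lessons, newest run first, capped so the planner's context stays small.
export function collectLessons(records, limit = 10) {
  // ISO-8601 timestamps sort correctly as strings, so a string comparison is enough here.
  const newestFirst = [...records].sort((a, b) =>
    String(b.finishedAt).localeCompare(String(a.finishedAt))
  );
  const seen = new Set();
  const lessons = [];
  for (const record of newestFirst) {
    for (const note of record.remember ?? []) {
      // Case-insensitive de-duplication so the same lesson is not repeated every run.
      const key = note.trim().toLowerCase();
      if (key === "" || seen.has(key)) continue;
      seen.add(key);
      lessons.push(`${note.trim()} (from ${record.taskId})`);
      if (lessons.length >= limit) return lessons;
    }
  }
  return lessons;
}
```

Keeping the source task ID next to each lesson lets a human trace a rule back to the run that produced it.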
The recommended beginner lane is deliberately narrow: a local queued task becomes a plan, the factory produces a proposal and review summary, tests run, and a human decides what to do next. That is enough to be a factory because the work is repeatable, inspectable, isolated, and blocked by checks.
| Stage | Beginner version | Output |
|---|---|---|
| Issue | The starter queue.json while learning locally; GitHub or GitLab issues are optional upgrades. | Task title, source link, and acceptance criteria. |
| Run | The orchestrator loads context, plans the work, writes a proposal, and runs tests. | Proposal file, test result, and updated task status. |
| Review | The reviewer runs tests and the orchestrator writes a review summary for a human. | Review body, required checks, and a clear continue/stop decision. |
| Memory | The run records what happened so the next task has better context. | Run artefacts, outcome, and follow-up notes. |
You are building: the factory floor plan
.
├─ package.json
├─ docker-compose.yml
├─ factory/
│ ├─ context/
│ │ └─ project-context.json
│ ├─ tasks/
│ │ └─ queue.json
│ ├─ runs/
│ ├─ lib/
│ │ └─ run-command.mjs
│ ├─ agents/
│ │ ├─ planner.mjs
│ │ ├─ executor.mjs
│ │ └─ reviewer.mjs
│ └─ orchestrator.mjs
├─ scripts/
│ └─ open-pr.mjs
├─ src/
│ └─ README.md
├─ tests/
│ └─ factory.test.mjs
└─ .github/
└─ workflows/
└─ mini-factory.yml
This structure mirrors the factory blueprint described earlier: queue and context on the left, planner/executor/reviewer in the middle, and delivery artefacts on the right. Keep those concerns physically separate in the repository; beginners stay safer when the control files are easy to inspect.
You are building: the command surface
{
"name": "mini-software-factory",
"private": true,
"type": "module",
"scripts": {
"factory": "node factory/orchestrator.mjs",
"test": "node --test"
}
}
You are building: context
The context file is where beginners usually underinvest. Start simple: state the project purpose, protected areas, coding conventions, and what “done” means. Later you can split this into architecture decisions, API conventions, incident notes, and deployment rules.
{
"projectName": "Watchlist demo",
"goal": "A small web app that lets users save and view favourite stocks.",
"protectedFiles": [
"src/auth/",
"infra/production/"
],
"codingRules": [
"Prefer small, reversible changes.",
"Do not edit protected files without human approval.",
"Record every proposed change in factory/runs/<task-id>/proposal.md."
],
"definitionOfDone": [
"Task has a clear plan.",
"Automated checks run.",
"A PR summary is ready for review."
]
}
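The context file lists `protectedFiles`, but nothing in the starter enforces that list yet. Below is a minimal sketch of a check the executor or reviewer could call before touching a file. The function name is hypothetical, and the sketch assumes that entries ending in `/` are folders and that every other entry is an exact file path.

```javascript
import { posix } from "node:path";

// Converts Windows separators and resolves "./" and ".." segments so that
// "src\\auth\\x.ts", "./src/auth/x.ts" and "src/lib/../auth/x.ts" compare equally.
function normalise(path) {
  return posix.normalize(path.replaceAll("\\", "/")).replace(/^\.\//, "");
}

// True when `file` is a protected entry or sits inside a protected folder.
export function isProtectedPath(file, protectedFiles) {
  const target = normalise(file);
  return protectedFiles.some(entry => {
    const rule = normalise(entry);
    if (rule.endsWith("/")) {
      // Folder rule: match the folder itself or anything beneath it, but not "src/authz/".
      return target.startsWith(rule) || target === rule.slice(0, -1);
    }
    return target === rule;
  });
}
```

Matching on the trailing slash is the important detail. A plain `startsWith("src/auth")` would also flag an unrelated `src/authz/` folder.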
You are building: intake
A queue makes the system explicit. One tiny JSON file is enough to prove the concept. In a fuller factory the queue may come from GitHub issues, Jira, Linear, or a database, but the operating idea is the same.
[
{
"id": "task-001",
"title": "Add a watchlist page",
"status": "pending",
"files": ["src/README.md"],
"acceptanceCriteria": [
"The change is described clearly.",
"The review step runs tests.",
"A PR body is generated."
]
}
]
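The orchestrator trusts `queue.json` as written, so a typo in a hand-edited queue only shows up mid-run. A minimal validation sketch follows. `validateQueue` is a hypothetical name, and the status list assumes only the three values the starter orchestrator reads and writes.

```javascript
// Statuses the starter orchestrator uses; extend this if you add more.
const KNOWN_STATUSES = new Set(["pending", "ready-for-pr", "failed-review"]);

// Returns human-readable problems; an empty array means the queue looks usable.
export function validateQueue(queue) {
  if (!Array.isArray(queue)) return ["Queue must be a JSON array of tasks."];
  const problems = [];
  const seenIds = new Set();
  queue.forEach((task, index) => {
    const where = `task[${index}]`;
    if (typeof task?.id !== "string" || task.id === "") problems.push(`${where}: missing id`);
    else if (seenIds.has(task.id)) problems.push(`${where}: duplicate id ${task.id}`);
    else seenIds.add(task.id);
    if (typeof task?.title !== "string" || task.title.trim() === "") problems.push(`${where}: missing title`);
    if (!KNOWN_STATUSES.has(task?.status)) problems.push(`${where}: unknown status ${JSON.stringify(task?.status)}`);
    if (!Array.isArray(task?.acceptanceCriteria) || task.acceptanceCriteria.length === 0) {
      problems.push(`${where}: needs at least one acceptance criterion`);
    }
  });
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a human fix every mistake in a single pass.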
You are building: tool execution
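This wrapper uses Node’s child_process.spawn to run commands in a controlled way. Arguments are passed as an array with shell: false, so no shell interprets them, and the wrapper captures stdout, stderr, and the exit code for the orchestrator. It is the minimal bridge between the orchestrator and your actual tools. (Source: Node)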
import { spawn } from "node:child_process";
export function runCommand(command, args = [], options = {}) {
return new Promise((resolve, reject) => {
const child = spawn(command, args, {
cwd: options.cwd ?? process.cwd(),
env: { ...process.env, ...(options.env ?? {}) },
shell: false
});
let stdout = "";
let stderr = "";
child.stdout.on("data", chunk => { stdout += chunk.toString(); });
child.stderr.on("data", chunk => { stderr += chunk.toString(); });
child.on("error", reject);
child.on("close", exitCode => {
resolve({ command, args, exitCode, stdout, stderr });
});
});
}
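The starter wrapper waits forever if a command hangs. Below is a minimal variant that relies on spawn()'s built-in `timeout` option, which is available from Node 15.13 (and 14.18). The function name and the 60-second default are assumptions.

```javascript
import { spawn } from "node:child_process";

// Same contract as runCommand, plus a wall-clock limit. When the limit is reached,
// Node sends the child SIGTERM (spawn's default killSignal).
export function runCommandWithTimeout(command, args = [], options = {}) {
  const timeoutMs = options.timeoutMs ?? 60_000;
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, {
      cwd: options.cwd ?? process.cwd(),
      env: { ...process.env, ...(options.env ?? {}) },
      shell: false,
      timeout: timeoutMs
    });
    let stdout = "";
    let stderr = "";
    child.stdout.on("data", chunk => { stdout += chunk.toString(); });
    child.stderr.on("data", chunk => { stderr += chunk.toString(); });
    child.on("error", reject);
    // A killed child closes with exitCode === null and the signal name set.
    child.on("close", (exitCode, signal) => {
      resolve({ command, args, exitCode, signal, stdout, stderr });
    });
  });
}
```

A reviewer that treats any `exitCode` other than 0 as a failure will also fail a timed-out run, because its exit code is `null`.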
You are building: planning
This first planner is deterministic on purpose. It gives you the shape of a real planner without tying the tutorial to one model provider.
function slugify(value) {
return value
.toLowerCase()
.replace(/[^a-z0-9]+/g, "-")
.replace(/^-|-$/g, "")
.slice(0, 40);
}
export async function planTask(task, context) {
return {
taskId: task.id,
title: task.title,
branch: `factory/${task.id}-${slugify(task.title)}`,
filesToInspect: task.files ?? [],
acceptanceCriteria: task.acceptanceCriteria ?? [],
protectedFiles: context.protectedFiles ?? [],
steps: [
"Load project context and task details",
"Inspect the files listed for the task",
"Draft a reversible change proposal",
"Run automated checks",
"Prepare a pull request summary"
]
};
}
You are building: execution
The executor writes a proposal artefact and runs a harmless smoke command. That makes the workflow runnable before you trust it with code edits. Once you are ready, you replace the proposal-writing part with your preferred coding model or scripted transform.
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";
import { runCommand } from "../lib/run-command.mjs";
export async function executePlan(plan) {
const runDir = join("factory", "runs", plan.taskId);
await mkdir(runDir, { recursive: true });
const proposalFile = join(runDir, "proposal.md");
const proposalMarkdown = [
`# ${plan.title}`,
"",
`**Suggested branch:** \`${plan.branch}\``,
"",
"## Files to inspect",
...(plan.filesToInspect.length ? plan.filesToInspect.map(file => `- ${file}`) : ["- No files specified"]),
"",
"## Acceptance criteria",
...(plan.acceptanceCriteria.length ? plan.acceptanceCriteria.map(item => `- ${item}`) : ["- None provided"]),
"",
"## Planned steps",
...plan.steps.map(step => `- ${step}`)
].join("\n");
await writeFile(proposalFile, proposalMarkdown, "utf8");
const smoke = await runCommand(process.execPath, ["--version"]);
return {
runDir,
proposalFile,
smoke
};
}
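The executor builds `factory/runs/<task-id>` directly from the task ID. That is harmless while you write the queue yourself, but IDs may later come from an importer or an agent. Below is a minimal sketch of a guard against path tricks such as `../`. The pattern and the `safeRunDir` name are assumptions; `executePlan` could call it in place of the bare `join`.

```javascript
import { join, resolve, sep } from "node:path";

// Conservative ID format: starts with a letter or digit, then letters, digits, "_" or "-".
const TASK_ID_PATTERN = /^[A-Za-z0-9][A-Za-z0-9_-]{0,63}$/;

export function safeRunDir(taskId, runsRoot = join("factory", "runs")) {
  if (typeof taskId !== "string" || !TASK_ID_PATTERN.test(taskId)) {
    throw new Error(`Refusing unsafe task id: ${JSON.stringify(taskId)}`);
  }
  const root = resolve(runsRoot);
  const runDir = resolve(root, taskId);
  // Defence in depth: the pattern already blocks "..", but confirm the resolved path too.
  if (!runDir.startsWith(root + sep)) {
    throw new Error(`Run directory escapes ${root}: ${runDir}`);
  }
  return runDir;
}
```

IDs such as `task-001` and `issue-42` pass, so the starter queue and the issue importer both keep working.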
You are building: review
The reviewer should be boring and strict. It does not need to be clever; it needs to be dependable. Here it runs your test suite and returns a simple pass/fail result.
import { runCommand } from "../lib/run-command.mjs";
export async function reviewRun() {
const tests = await runCommand(process.execPath, ["--test"]);
return {
passed: tests.exitCode === 0,
summary: tests.exitCode === 0 ? "All automated checks passed." : "Automated checks failed.",
tests
};
}
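When the reviewer grows beyond a single test run, for example to tests plus one static check, it needs a single verdict drawn from several results. A minimal sketch follows. `summariseChecks` and the `{ name, exitCode }` result shape are assumptions; the shape matches what `runCommand` already returns once you add a name.

```javascript
// Each result is { name, exitCode }. The review passes only if at least one
// check ran and every check exited with code 0 (a null exit code counts as failure).
export function summariseChecks(results) {
  const failed = results.filter(result => result.exitCode !== 0);
  const passed = results.length > 0 && failed.length === 0;
  let summary;
  if (passed) summary = "All automated checks passed.";
  else if (results.length === 0) summary = "No checks ran, so the task cannot be marked ready.";
  else summary = `${failed.length} of ${results.length} checks failed.`;
  const details = results
    .map(result => `- ${result.name}: ${result.exitCode === 0 ? "passed" : `failed (exit ${result.exitCode})`}`)
    .join("\n");
  return { passed, summary, details };
}
```

The empty-list case fails on purpose. A reviewer that passes when nothing ran is not a quality gate.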
You are building: orchestration
The orchestrator is the control room. It reads the queue, loads context, asks the planner for a plan, asks the executor to do the work, asks the reviewer to verify it, then records the result and updates the queue.
import { readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";
import { planTask } from "./agents/planner.mjs";
import { executePlan } from "./agents/executor.mjs";
import { reviewRun } from "./agents/reviewer.mjs";
async function readJson(path) {
return JSON.parse(await readFile(path, "utf8"));
}
async function writeJson(path, value) {
await writeFile(path, JSON.stringify(value, null, 2) + "\n", "utf8");
}
async function main() {
const contextPath = join("factory", "context", "project-context.json");
const queuePath = join("factory", "tasks", "queue.json");
const context = await readJson(contextPath);
const queue = await readJson(queuePath);
const nextTask = queue.find(task => task.status === "pending");
if (!nextTask) {
console.log("No pending tasks.");
return;
}
const plan = await planTask(nextTask, context);
const execution = await executePlan(plan);
const review = await reviewRun();
const prBodyPath = join(execution.runDir, "pr-body.md");
const prBody = [
`## ${plan.title}`,
"",
`**Branch:** \`${plan.branch}\``,
"",
"### Acceptance criteria",
...plan.acceptanceCriteria.map(item => `- ${item}`),
"",
"### Review result",
`- ${review.summary}`,
"",
"### Proposal file",
`- ${execution.proposalFile}`
].join("\n");
await writeFile(prBodyPath, prBody, "utf8");
nextTask.status = review.passed ? "ready-for-pr" : "failed-review";
nextTask.lastRun = {
proposalFile: execution.proposalFile,
prBodyPath,
smokeExitCode: execution.smoke.exitCode,
testExitCode: review.tests.exitCode
};
await writeJson(queuePath, queue);
console.log(JSON.stringify({
task: nextTask.id,
status: nextTask.status,
proposalFile: execution.proposalFile,
prBodyPath
}, null, 2));
}
main().catch(error => {
console.error(error);
process.exit(1);
});
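If you start the orchestrator locally or in Docker from more than one terminal or scheduler, two runs could pick the same pending task. Below is a minimal lock-file sketch. The function names and a path such as `factory/runs/.lock` are assumptions. Across GitHub Actions runners a file lock does not help, because each job gets its own checkout; the workflow's `concurrency` setting is the tool for that case.

```javascript
import { open, rm } from "node:fs/promises";

// The "wx" flag makes open() fail with EEXIST when the file already exists,
// so only one process can create the lock at a time.
export async function acquireLock(lockPath) {
  try {
    const handle = await open(lockPath, "wx");
    await handle.writeFile(`${process.pid}\n`, "utf8"); // record the owner for debugging
    await handle.close();
    return true;
  } catch (error) {
    if (error.code === "EEXIST") return false;
    throw error;
  }
}

// Removing a missing lock is not an error, so this is safe in a finally block.
export async function releaseLock(lockPath) {
  await rm(lockPath, { force: true });
}
```

If a run crashes before releasing the lock, the file stays behind. Check the recorded process ID and delete the file by hand.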
You are building: a quality gate
Node’s built-in test runner and assertion module are enough for the starter. This is important for beginners: you do not need a giant test stack to start enforcing quality gates. (Source: Node)
import test from "node:test";
import assert from "node:assert/strict";
import { planTask } from "../factory/agents/planner.mjs";
test("planner creates a branch name and keeps acceptance criteria", async () => {
const task = {
id: "task-001",
title: "Add a watchlist page",
acceptanceCriteria: ["PR summary exists"]
};
const context = { protectedFiles: ["src/auth/"] };
const plan = await planTask(task, context);
assert.match(plan.branch, /^factory\/task-001-/);
assert.deepEqual(plan.acceptanceCriteria, ["PR summary exists"]);
assert.ok(plan.steps.length > 0);
});
You are building: the first closed loop
npm test
npm run factory
You are building: isolation
Docker Compose is a practical beginner move because it creates a repeatable execution environment from one YAML file and works in development, testing, CI, and production-like flows. (Source: Docker)
services:
  factory:
    image: node:24-alpine
    working_dir: /workspace
    volumes:
      - ./:/workspace
    command: ["sh", "-lc", "node --test && node factory/orchestrator.mjs"]
docker compose run --rm factory
The local factory does not require GitHub. Add the GitHub pieces below only after the local queue, planner, executor, reviewer, tests, and sandbox are working. (Sources: GitHub Actions · GitHub REST)
You are building: CI and scheduled review
GitHub Actions is a CI/CD platform that can run tests on every pull request, deploy on merge, or start work on a manual or scheduled trigger. That makes it a natural home for a beginner factory’s review and orchestration steps. (Source: GitHub Actions)
name: mini-factory
on:
  pull_request:
  workflow_dispatch:
  schedule:
    - cron: "0 2 * * 1-5"
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "24"
      - run: node --test
  factory:
    if: github.event_name != 'pull_request'
    needs: checks
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "24"
      - run: node factory/orchestrator.mjs
You are building: PR handoff
GitHub’s REST API supports creating and merging pull requests. Once your factory can safely commit and push a branch, this script turns the generated PR body into a real pull request. (Source: GitHub REST)
import { readFile } from "node:fs/promises";
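// Fail early with a readable message instead of a TypeError or an API error.
const required = ["GITHUB_REPOSITORY", "GITHUB_TOKEN", "PR_TITLE", "PR_HEAD", "PR_BODY_PATH"];
const missing = required.filter(name => !process.env[name]);
if (missing.length > 0) {
throw new Error(`Missing environment variables: ${missing.join(", ")}`);
}
const [owner, repo] = process.env.GITHUB_REPOSITORY.split("/");
const title = process.env.PR_TITLE;
const head = process.env.PR_HEAD;
const base = process.env.PR_BASE ?? "main";
const body = await readFile(process.env.PR_BODY_PATH, "utf8");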
const response = await fetch(`https://api.github.com/repos/${owner}/${repo}/pulls`, {
method: "POST",
headers: {
"Accept": "application/vnd.github+json",
"Authorization": `Bearer ${process.env.GITHUB_TOKEN}`,
"X-GitHub-Api-Version": "2022-11-28"
},
body: JSON.stringify({ title, head, base, body })
});
if (!response.ok) {
throw new Error(await response.text());
}
const pr = await response.json();
console.log(`Created PR #${pr.number}: ${pr.html_url}`);
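Before the factory is allowed to open real pull requests, it helps to log or unit-test the exact request without sending it. Below is a minimal sketch that separates request construction from the network call. `buildPullRequestRequest` is a hypothetical name. The endpoint and headers are the same ones the script above uses.

```javascript
// Builds the URL and fetch() options for POST /repos/{owner}/{repo}/pulls
// without sending anything, so a dry run can print it and tests can inspect it.
export function buildPullRequestRequest({ repository, token, title, head, base = "main", body }) {
  const [owner, repo, ...rest] = String(repository ?? "").split("/");
  if (!owner || !repo || rest.length > 0) {
    throw new Error(`Expected GITHUB_REPOSITORY as "owner/repo", got ${JSON.stringify(repository)}`);
  }
  for (const [name, value] of Object.entries({ token, title, head, body })) {
    if (!value) throw new Error(`Missing ${name}`);
  }
  return {
    url: `https://api.github.com/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}/pulls`,
    init: {
      method: "POST",
      headers: {
        "Accept": "application/vnd.github+json",
        "Authorization": `Bearer ${token}`,
        "X-GitHub-Api-Version": "2022-11-28"
      },
      body: JSON.stringify({ title, head, base, body })
    }
  };
}
```

The script's network step then becomes `const { url, init } = buildPullRequestRequest({ ... }); const response = await fetch(url, init);`. A dry-run flag can print `url` and `init.body` instead of sending them, but never log `init.headers`, because they contain the token.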
You are building: real intake
The starter uses factory/tasks/queue.json so you can learn the loop without API permissions. The first real automation lane is GitHub Issue → queue → factory run → PR summary. Once the local loop works, this tiny adapter replaces the hand-written queue with issues labelled factory. (Sources: GitHub REST · GitHub PATs)
import { writeFile } from "node:fs/promises";
const [owner, repo] = process.env.GITHUB_REPOSITORY.split("/");
const url = new URL(`https://api.github.com/repos/${owner}/${repo}/issues`);
url.searchParams.set("state", "open");
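url.searchParams.set("labels", "factory");
// GitHub returns 30 results per page by default; 100 is the maximum page size.
url.searchParams.set("per_page", "100");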
const response = await fetch(url, {
headers: {
"Accept": "application/vnd.github+json",
"Authorization": `Bearer ${process.env.GITHUB_TOKEN}`,
"X-GitHub-Api-Version": "2022-11-28"
}
});
if (!response.ok) {
throw new Error(await response.text());
}
const issues = await response.json();
const queue = issues
.filter(issue => !issue.pull_request)
.map(issue => ({
id: `issue-${issue.number}`,
title: issue.title,
status: "pending",
files: [],
acceptanceCriteria: [
"The issue has been reviewed.",
"A plan or pull request summary is generated.",
"Automated checks run before merge."
],
source: issue.html_url
}));
await writeFile("factory/tasks/queue.json", JSON.stringify(queue, null, 2) + "\n", "utf8");
console.log(`Imported ${queue.length} factory issues.`);
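As written, the importer overwrites `queue.json` on every run, which discards each task's `status` and `lastRun`. Below is a minimal merge sketch. `mergeQueue` is a hypothetical name, and the policy of refreshing issue fields while keeping factory-owned state is an assumption you may want to change.

```javascript
// Keeps factory-owned state (status, lastRun) for tasks already in the queue,
// refreshes the descriptive fields from the latest import, and appends new tasks.
// Tasks whose issue disappeared from the import are left in place, unchanged.
export function mergeQueue(existing, imported) {
  const existingById = new Map(existing.map(task => [task.id, task]));
  const merged = existing.map(task => ({ ...task }));
  for (const task of imported) {
    const current = existingById.get(task.id);
    if (!current) {
      merged.push(task);
      continue;
    }
    const index = merged.findIndex(item => item.id === task.id);
    merged[index] = {
      ...task,
      status: current.status,
      // Only copy lastRun when it exists, so the JSON stays free of empty keys.
      ...(current.lastRun === undefined ? {} : { lastRun: current.lastRun })
    };
  }
  return merged;
}
```

The importer could read the current `queue.json`, call `mergeQueue(current, queue)`, and write that result instead. Whether closed issues should be dropped or marked is a policy choice this sketch leaves open.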
You are building: memory and capability upgrades
At this point, you already have a real workflow: task intake, context loading, planning, output generation, automated review, CI integration, and a PR artefact. The next upgrades are straightforward. Replace the deterministic planner with a real model call. Replace the proposal-only executor with safe code-editing inside the sandbox. Pull tasks from issues instead of a JSON file. Add memory capture after each run. Then add more specialised reviewers for tests, security, and release notes.
Limitation of the starter: the executor intentionally writes a proposal file rather than editing your application code directly. That is a safety-first choice for beginners, not a claim that a full factory stops there.
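When a model call replaces the deterministic planner, treat the reply as untrusted input, not as a plan. This is a minimal sketch. It assumes the model has been instructed to return only a JSON object in the same shape `planTask` returns, and it shows no specific model SDK. `parsePlanFromModel` is a hypothetical name.

```javascript
// Validates a model's planner reply and returns a plan in the starter's shape,
// or throws so the orchestrator can mark the task as failed instead of executing it.
export function parsePlanFromModel(text, task, context) {
  let candidate;
  try {
    candidate = JSON.parse(text);
  } catch {
    throw new Error("Planner output was not valid JSON.");
  }
  if (candidate === null || typeof candidate !== "object" || Array.isArray(candidate)) {
    throw new Error("Planner output must be a JSON object.");
  }
  if (candidate.taskId !== task.id) {
    throw new Error("Planner output refers to a different task.");
  }
  if (typeof candidate.branch !== "string" || !candidate.branch.startsWith(`factory/${task.id}-`)) {
    throw new Error("Planner branch must start with factory/<task-id>-.");
  }
  const isStringArray = value => Array.isArray(value) && value.every(item => typeof item === "string");
  for (const key of ["filesToInspect", "acceptanceCriteria", "steps"]) {
    if (!isStringArray(candidate[key])) {
      throw new Error(`Planner field ${key} must be an array of strings.`);
    }
  }
  if (candidate.steps.length === 0) {
    throw new Error("Planner returned no steps.");
  }
  // Copy only known fields. The title and protected files come from the task and the
  // context file, never from the model, so a reply cannot loosen the guardrails.
  return {
    taskId: task.id,
    title: task.title,
    branch: candidate.branch,
    filesToInspect: candidate.filesToInspect,
    acceptanceCriteria: candidate.acceptanceCriteria,
    protectedFiles: context.protectedFiles ?? [],
    steps: candidate.steps
  };
}
```

Because the returned object has the same shape as the deterministic plan, `executePlan` and the existing planner test do not need to change.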
The easiest way to get confused is to treat every AI coding product as the same thing. They are not. Some are builders, some are IDE assistants, some are runtime platforms, and some are closer to an actual “factory” control plane. The table below uses each product’s official positioning and features, then adds a practical interpretation of the role it can play.
The path section compares end-to-end setups; this table compares individual tools that may appear inside those paths.
| Tool | Official centre of gravity | What it is strong at | Best mental model in a factory | Use it when |
|---|---|---|---|---|
| Lovable Source | Full-stack AI web-app platform with editable code, GitHub sync, built-in knowledge, hosting and security features. | Prompt-to-app building, fast iteration, product/design collaboration, deployment of web apps, shared project knowledge, and practical security scanning. | Builder / executor layer. Inference: powerful for creating and iterating on an application, but not the whole software-factory operating model by itself. | You want a working web app quickly, especially for an MVP, prototype, internal tool, or product-validation loop. |
| GitHub Copilot Source | Contextual assistance across IDE, CLI, GitHub, project tools, chat apps, and cloud-agent workflows. | PR-based work, repo-aware agent sessions, custom agents, and integrations with tools like Jira, Linear, Slack, and Teams. | GitHub-native operator layer. Very strong if your process already centres on GitHub issues, branches, PRs, and Actions. | Your team lives in GitHub and wants background automation that still respects branch rules and review flows. |
| Replit Source | Browser-based idea-to-app platform with Agent, publishing, version control, background tasks, and connected services. | Fast start-up, browser-only development, built-in deployment, parallel agent tasks, connected services, and single-project momentum. | Integrated builder-and-host platform. Stronger than an IDE chat, but still more project-centric than an organisation-wide factory control plane. | You want the shortest path from idea to live app and prefer an all-in-one browser environment. |
| Claude Code Source | Agentic coding runtime for terminal, IDEs, web, and SDK-based integration, with hooks, subagents, memory, routines, and sandbox options. | Custom workflows, codebase-aware local or cloud execution, reusable skills, deterministic hooks, project memory, and configurable permissions. | Factory runtime / toolkit. Excellent for building your own factory because it exposes the primitives instead of hiding them. | Your team is technical and wants to design its own context, automation, and safety model. |
| OpenAI Agents SDK Source | Code-first SDK for building agentic applications with tools, handoffs, orchestration, streaming, and tracing. | Custom backend agent workflows, explicit tool contracts, model-provider integration, and product-specific control flow. | Custom factory engine. Useful when you want the software factory to be part of your own application or service. | You are comfortable writing backend code and want to own the orchestration layer rather than operate mainly inside an IDE or GitHub. |
| XHawk Source | Explicit software-factory architecture with context layer, multi-agent orchestration, cloud sandboxes, workflow integrations, human review, and production feedback. | Continuous 24x7 operation, shared context, background execution, organisation-level workflows, and delivery-system thinking. | Closest match to a software-factory product. It is presented as the control-plane model rather than merely a coding assistant. | You want the operating model itself, not just a faster way to write code in one session. |
Important note: the “best mental model” column is an interpretation based on product documentation and the software-factory definitions above. It is deliberately analytical, especially on the earlier question “Is Lovable a software factory?” The short answer remains: Lovable is usually better understood as a powerful builder inside a broader factory, not the whole factory. (Sources: Lovable · XHawk · Alex Op)
The most common beginner mistake is not “using too little AI”; it is giving the agent more authority than its controls can safely support. OWASP’s Top 10 for LLM Applications (the 2023 edition) specifically calls out prompt injection, insecure output handling, training-data poisoning, model denial of service, and supply-chain vulnerabilities. The OWASP AI Agent Security guidance then gets very practical: validate outputs before execution, prefer structured outputs, set scope and rate limits, and filter sensitive outputs. NIST’s AI Risk Management Framework adds the broader governance frame: manage risk to people, organisations, and society, not just to the codebase. (Sources: OWASP LLM · OWASP Agent · NIST)
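"Validate outputs before execution" and "set scope" can start as a simple command allowlist that sits in front of `runCommand`. This is a minimal sketch. The allowlist contents and the `isAllowedCommand` name are assumptions, so adapt them to your own tools.

```javascript
// Maps an executable to the argument prefixes an agent may use with it.
// Anything not listed is refused before runCommand() is ever called.
const ALLOWED_COMMANDS = {
  node: [["--test"], ["--version"]],
  npm: [["test"], ["ci"]],
  git: [["status"], ["diff"]]
};

export function isAllowedCommand(command, args = [], allowlist = ALLOWED_COMMANDS) {
  // Object.hasOwn avoids matching inherited keys such as "constructor".
  const prefixes = Object.hasOwn(allowlist, command) ? allowlist[command] : undefined;
  if (!prefixes) return false;
  // Allowed when the leading arguments match one permitted prefix exactly.
  return prefixes.some(prefix => prefix.every((part, index) => args[index] === part));
}
```

Prefix matching is deliberately simple and somewhat permissive: `git diff --stat` passes because it starts with `diff`. A stricter version would list complete argument arrays for each permitted command.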
| Guardrail | Why it exists | Smallest useful beginner version |
|---|---|---|
| Least-privilege permissions | Agents should not automatically access every tool, credential, or directory. | Require approval for writes outside the working area and deny sensitive folders by default. |
| Sandboxed execution | If the agent runs a bad command, the blast radius should be small. | Run executor steps in Docker Compose or another isolated runtime before granting host access. |
| Secret separation | Prompts and repos are not good places for credentials. | Store secrets in GitHub Actions secrets, agent secret stores, or platform secrets, not in source files or chat text. |
| Protected branches and CODEOWNERS | Autonomy should not bypass organisational review. | Keep production branches protected, require PRs, and require owner review for agent configuration files. |
| Automated checks | Bad outputs should fail quickly and cheaply. | Run at least tests and one static check before a task can become “ready for PR”. |
| Human approval for sensitive work | Not every action is safe to automate end to end. | Require a person for production deploys, auth changes, schema changes, and permission changes. |
| Audit trail and memory | You cannot improve or investigate what you did not record. | Write proposal files, PR bodies, test results, and final task status into versioned artefacts. |
| Privacy and training controls | AI developer tools differ in how data may be used. | Review each platform’s training and privacy settings before using proprietary code or personal data. |
Concrete examples from the official docs include Claude Code permission modes and sandboxing, GitHub’s cloud-agent guardrails, CODEOWNERS, rulesets, secret handling, Lovable’s security scans and training-data controls, and Docker isolation primitives. (Sources: Claude · Copilot · GitHub rules · Lovable · Docker)
Treat sensitive automation as a ladder, not a wall. Each rung earns trust for the next one.
| Rung | Automate this | Wait before automating this |
|---|---|---|
| 1 | Triage, summaries, duplicate detection, and acceptance-criteria drafts. | Any action that writes to production systems. |
| 2 | PR descriptions, release-note drafts, and test-result summaries. | Auto-merging without required checks and review. |
| 3 | Reversible UI copy, docs, and small content changes behind review. | Auth, billing, permissions, and irreversible data changes. |
| 4 | Low-risk code changes with tests, branch rules, CODEOWNERS, and rollback notes. | Broad rewrites and dependency upgrades without reliable regression coverage. |
| 5 | Dependency updates with lockfile review, security scanning, and rollback. | Database migrations, production deploys, and secret changes until approvals and recovery are mature. |
Work through these items as you build.
| Item | Why it matters |
|---|---|
| Create one context file with goals, rules, protected areas, and definition of done. | This is the smallest possible context layer. |
| Represent work as tasks in a queue rather than free-form chat only. | A factory needs explicit intake. |
| Add a planner that emits acceptance criteria, steps, and a branch name. | Plans make execution reviewable. |
| Run work in a sandboxed executor. | Autonomy without isolation is a bad bargain. |
| Add a reviewer that can fail the task when checks fail. | Quality gates are the factory’s brake pedal. |
| Run tests in CI on PRs and on factory runs. | This closes the local-versus-remote gap. |
| Generate a PR summary or release bundle automatically. | Humans need clear review artefacts. |
| Protect important branches and config files. | Guardrails should survive a bad prompt. |
| Store outcomes, failures, conventions, and follow-up notes. | Without memory, the factory does not improve. |
| Review tool privacy and training settings before using sensitive data. | Data policy is part of engineering design now. |
Checklist derived from the factory model, agent-runtime controls, GitHub CI patterns, and OWASP/NIST safety guidance. (Sources: XHawk · Claude · Copilot · GitHub Actions · GitHub rules · OWASP Agent · NIST)
docker version or docker compose version fails
Docker Desktop is probably not running, or it has not finished starting. Open Docker Desktop, wait until it reports that the engine is running, then try docker version and docker compose version again.
Docker commands fail inside WSL on Windows
Check that Docker Desktop is using the WSL 2 backend and that your WSL distro is enabled under Docker Desktop integration settings. If this is your first attempt, run the tutorial from a normal project folder and avoid unusual synced-drive paths until Docker is proven. (Source: Docker Desktop)
node --test reports no tests found
Confirm the file is named tests/factory.test.mjs and that you are running the command from the project root. Node’s test runner discovers files whose names match common test patterns, such as *.test.mjs, but it cannot find a file that was placed in the wrong folder or given a name outside those patterns. (Source: Node)
GitHub API calls fail with an authentication or permission error
The token is missing or does not have enough permission. In GitHub Actions, start with the built-in GITHUB_TOKEN and explicit workflow permissions. For local scripts, use a fine-grained PAT with access to only the repository, issues, and pull requests the script needs. (Sources: GitHub PATs · GitHub Actions)
A command works in one folder but fails in another
Run from the project directory, keep the path simple, and avoid moving the tutorial between cloud-synced folders, network drives, and WSL paths during the first pass. Once the command works, you can move it into your preferred workspace and debug path-specific behaviour separately.
Wrangler authentication fails
Run npx wrangler whoami. If it fails locally, authenticate with Wrangler before debugging GitLab CI. In CI, use a Cloudflare API token stored as a masked GitLab CI/CD variable, not an interactive login. (Sources: Wrangler · GitLab variables)
GitLab CI jobs cannot see the Cloudflare variables
Check that CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID exist in Settings > CI/CD > Variables. A protected variable is available only to protected branches or tags, so an ordinary merge-request branch may not receive it. (Sources: GitLab variables · Protected environments)
npm ci fails in GitLab CI
Commit the package-lock.json that npm generates. npm ci is designed for repeatable CI installs: it requires a lockfile and stops with an error if the lockfile does not match package.json.
wrangler deploy works locally but fails in GitLab CI
Run npx wrangler check locally, confirm wrangler.jsonc is committed, and confirm the GitLab job has the Cloudflare account ID and API token. If production deploys are manual or protected, also check the job’s environment and branch rules. (Sources: Wrangler · GitLab CI · Protected environments)
Where is my deployed Worker?
Start in the Cloudflare dashboard under Workers & Pages, then open the Worker’s deployment details. For the first attempt, use the default workers.dev route before adding a custom domain or route. (Source: Workers)
Is a software factory just an AI coding assistant?
No. A coding assistant mainly improves the act of writing code. A software factory encodes the broader delivery workflow: intake, context, decomposition, execution, testing, approvals, deployment, and learning. That is why sources like XHawk and Alex Op emphasise systems, loops, and workflow integration rather than only autocomplete or chat. (Sources: XHawk · Alex Op · Copilot)
Do I need many agents to build a factory?
No. Start with one orchestrated lane. A single assistant plus a planning step and a strict review step can already behave like a mini factory. Add more specialised roles only when the extra separation makes the workflow safer or clearer. (Sources: Alex Op · Claude)
Is Lovable a software factory?
Usually not in the full architectural sense used here. Lovable is officially positioned as a full-stack AI development platform for building, iterating on, and deploying web apps, syncing code to GitHub, and operating inside enterprise governance. That makes it excellent as a builder or execution layer. But the broader “software factory” idea includes explicit orchestration, workflow triggers, continuous loops, and organisation-level control of how work moves through the system. So the best practical answer is: Lovable can be part of a software factory, but it is not usually the whole factory by itself. (Sources: Lovable · XHawk · Alex Op)
Does the factory pattern depend on a particular stack?
No. The factory pattern survives stack changes. What matters is not React versus Express; it is whether your context is captured, execution is isolated, tests are automated, approvals are explicit, and feedback is recorded. The Node/Docker/GitHub examples here are simply a small, runnable teaching stack.
Why does a factory need memory?
Because factories fail when each run starts from zero. Memory can be simple files, a knowledge directory, a context service, or a richer knowledge graph. The point is to stop relearning the same architecture, the same bug history, and the same conventions every session. (Sources: Alex Op · Claude · XHawk Context)
What should I automate first?
Automate the most repetitive, lowest-risk lane that already has a clear finish line. Examples: triaging small maintenance tickets, drafting release notes, preparing PR descriptions, or proposing reversible UI changes behind tests. That is how you earn trust before moving the factory closer to production changes. (Sources: Alex Op · XHawk · Copilot)
Read the sources below in roughly this order: first the classic paper for the underlying idea, then Alex Op and XHawk for the modern AI-native interpretation, then official docs for the tooling layers you might combine.
| Source | Why it is worth your time |
|---|---|
| Greenfield and Short: Industrializing Software Development | The original conceptual backbone: software factories as structured, repeatable production systems rather than lone-coder craft. |
| Alex Op: The Software Factory | A clear modern explanation of why team bottlenecks move from “writing code” to “designing and operating the workflow”. |
| XHawk factory model: Software factory page | One of the clearest public diagrams of the AI-native factory pattern: context, orchestration, sandboxing, integrations, review, and production loop. |
| XHawk context model: XHawk product overview | Useful for understanding why a context layer is more than a prompt: it can become a system of context or knowledge graph. |
| Model Context Protocol: Official MCP docs · Anthropic announcement | The connector layer. Helpful for understanding how agents plug into tools, data, and workflows without bespoke one-off integrations. |
| Claude Code: How it works · Subagents · Hooks · Memory · Schedules · Sandboxing · Agent SDK | A rich set of building blocks for designing your own factory runtime. |
| GitHub Copilot: Product page · Integrations · Guardrails · Jira · Linear | The best source set here for GitHub-native, PR-driven agentic delivery patterns. |
| Lovable: Welcome · GitHub sync · Knowledge · Security · Privacy & training · Agent mode | The clearest official picture of Lovable as a full-stack AI builder and why it is best seen as one layer inside a wider factory architecture. |
| Replit: Agent · Plan mode · Task system · Build and publish · Version control | Helpful for understanding the “idea to live app” end of the spectrum, with background tasks and built-in hosting. |
| Docker Compose: Official docs | A straightforward way to create an isolated environment for a mini factory. |
| GitHub Actions: Docs · Quickstart | Your easiest default CI/CD layer if you already use GitHub. |
| GitHub REST API: REST docs · Pull requests endpoints | The simplest primary-source reference for PR automation. |
| GitHub CODEOWNERS and rules: CODEOWNERS · Rulesets | Official guidance for making human review and required checks enforceable. |
| GitHub personal access tokens: PAT docs | The safest starting point when a local script needs GitHub API access beyond the built-in Actions token. |
| GitLab CI/CD: Pipelines · CI/CD YAML syntax | The workflow layer for the GitLab path: tests, deploy jobs, environments, and repeatable review gates. |
| GitLab merge requests: Merge requests · Merge request pipelines | The GitLab equivalent of the review handoff: proposed changes, discussion, approval, and CI results in one place. |
| GitLab CI/CD variables: Variables docs | The official place to learn how to store non-source configuration and sensitive CI values such as Cloudflare deployment tokens. |
| GitLab protected environments: Protected environments | Useful when production deployments should require the right role, branch, or manual approval. |
| Cloudflare Workers: Get started guide | The beginner deployment target for the GitLab + Workers path: a small serverless app/API without server management. |
| Cloudflare Workers CI/CD: CI/CD overview · Git integration | Explains Workers Builds and external CI/CD choices, including GitLab-connected deployments. |
| Wrangler: Wrangler docs · Configuration | The Cloudflare CLI used to run, validate, and deploy Workers (and generate their TypeScript types) from local development and GitLab CI. |
| Cloudflare Workers secrets: Secrets docs · Create API token | The official reference for Worker secrets and Cloudflare API tokens. Read it before putting any credential into CI. |
| Node.js core docs: fs · child_process · test runner · assert · vm · release schedule | The built-ins used in the tutorial, with no framework dependency required. |
| Docker Desktop: Install Docker Desktop · WSL integration | The official install and Windows integration reference for the Day 0 setup check. |
| OpenAI Agents: Agents guide | A code-first option for building custom agent workflows with tools, orchestration, and tracing. |
| OWASP LLM Top 10: Project page | The most succinct public list of common LLM application failure modes. |
| OWASP AI Agent Security: Cheat sheet | Practical controls for output validation, schema checking, scoping, and guardrails. |
| NIST AI RMF: Framework overview | The governance lens: software factories are not only technical systems but risk-management systems. |
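The GitHub REST API row points to the pull requests endpoints, which is where a factory hands its output to human review. This is a minimal sketch that assumes Node.js 18 or later (for the global `fetch`) and a token in a `GITHUB_TOKEN` environment variable. The helper names, owner, repository, and branch names are placeholders. It opens the PR as a draft so a person still decides when the change is ready.

```javascript
// Hypothetical helper: build the request for GitHub's "create a pull request" endpoint
// (POST /repos/{owner}/{repo}/pulls). Kept separate from sending so it is easy to test.
function buildPullRequest({ owner, repo, head, base, title, body, token }) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/pulls`,
    options: {
      method: "POST",
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${token}`,
        "X-GitHub-Api-Version": "2022-11-28",
        "Content-Type": "application/json",
      },
      // A draft PR stays out of the merge path until a human marks it ready for review.
      body: JSON.stringify({ title, head, base, body, draft: true }),
    },
  };
}

// Hypothetical helper: send the request and return the new PR's web URL.
// Throws on any non-2xx response.
async function openPullRequest(params) {
  const { url, options } = buildPullRequest(params);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`GitHub API returned ${response.status}: ${await response.text()}`);
  }
  const pr = await response.json();
  return pr.html_url;
}

// Usage with placeholder owner, repo, and branch names: this only builds the request.
// Call openPullRequest with the same object to actually send it.
const request = buildPullRequest({
  owner: "your-org",
  repo: "your-repo",
  head: "factory/release-notes",
  base: "main",
  title: "Draft release notes",
  body: "Generated by the factory. Please review before marking ready.",
  token: process.env.GITHUB_TOKEN ?? "",
});
console.log(request.url);
```

The token is read from the environment and never logged. This fits the PAT and CI variable guidance in the table above.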
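The Node.js core docs row lists `child_process` and the built-in test runner. Here is a hedged sketch of the stage where automated checks decide what moves forward. It runs each check command in order and reports success only when every one exits with status 0. The check list and `runChecks` are hypothetical. In a real repository the commands might be `node --test` and a linter.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical gate: run each check in order and stop at the first failure so the log stays short.
function runChecks(checks) {
  const results = [];
  for (const check of checks) {
    const result = spawnSync(check.command, check.args, { encoding: "utf8" });
    // status is null if the process was killed by a signal or failed to start.
    const passed = result.status === 0;
    results.push({ name: check.name, passed, status: result.status });
    if (!passed) break;
  }
  return { passed: results.every((r) => r.passed), results };
}

// Usage: hypothetical stand-in checks that reuse the current Node binary, so the sketch runs anywhere.
const report = runChecks([
  { name: "unit tests", command: process.execPath, args: ["-e", "process.exit(0)"] },
  { name: "lint", command: process.execPath, args: ["-e", "process.exit(1)"] },
  { name: "never reached", command: process.execPath, args: ["-e", "process.exit(0)"] },
]);
console.log(report.passed); // false
console.log(report.results.map((r) => r.name)); // [ 'unit tests', 'lint' ]
```

Stopping at the first failure keeps feedback cheap. A CI layer such as GitHub Actions or GitLab CI/CD would normally run the same commands again as required checks, so the local gate is a fast pre-filter, not a replacement.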
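The OWASP AI Agent Security row mentions output validation, schema checking, and scoping. Here is a minimal sketch of how those controls might look in a factory. Before applying anything an agent proposes, check that the proposal has the expected shape and only touches allowed paths. The proposal format, `ALLOWED_DIRS`, and `validateProposal` are illustrative assumptions, not part of the OWASP guidance itself.

```javascript
const path = require("node:path");

// Directories the agent may edit in this lane (hypothetical scope).
const ALLOWED_DIRS = ["docs/", "src/ui/"];

// Hypothetical validator: check an agent's proposed change before anything is written to disk.
function validateProposal(proposal) {
  if (typeof proposal !== "object" || proposal === null) {
    return { valid: false, errors: ["proposal must be an object"] };
  }
  const errors = [];
  if (typeof proposal.summary !== "string" || proposal.summary.trim() === "") {
    errors.push("summary must be a non-empty string");
  }
  if (!Array.isArray(proposal.files) || proposal.files.length === 0) {
    errors.push("files must be a non-empty array");
  } else {
    for (const file of proposal.files) {
      if (typeof file.path !== "string" || typeof file.content !== "string") {
        errors.push("each file needs a string path and content");
        continue;
      }
      // Normalise first so "docs/../.github/workflows/x.yml" cannot escape the allowed scope.
      const normalised = path.posix.normalize(file.path);
      const inScope = ALLOWED_DIRS.some((dir) => normalised.startsWith(dir));
      if (path.posix.isAbsolute(normalised) || !inScope) {
        errors.push(`path out of scope: ${file.path}`);
      }
    }
  }
  return { valid: errors.length === 0, errors };
}

// Usage: one acceptable proposal and one that tries to edit a CI workflow.
const good = validateProposal({
  summary: "Fix typo in guide",
  files: [{ path: "docs/guide.md", content: "Corrected text" }],
});
const bad = validateProposal({
  summary: "",
  files: [{ path: "docs/../.github/workflows/deploy.yml", content: "on: push" }],
});
console.log(good.valid); // true
console.log(bad.errors.length); // 2
```

The whole proposal is rejected on any error instead of applying the valid parts. That keeps the lane easy to reason about: the agent either retries with a corrected proposal or a human takes over.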
If you only take one next step after reading this page, make it this: build one narrow, reviewable automation lane. Do not try to industrialise your whole engineering org in one jump. Start with one loop you can trust, then widen the conveyor.