🔬 Docxology Template

Production-grade scaffold for reproducible computational research
Pipelines · Manuscripts · Cryptographic Provenance · AI-Agent Collaboration

CI Python 3.10+ uv Ruff License: Apache 2.0 Version DOI

📄 Published: A template/ approach to Reproducible Generative Research: Architecture and Ergonomics from Configuration through Publication (DOI: 10.5281/zenodo.19139090)

⚑ Quick Start · Architecture · 🔄 Pipeline · 🤖 AI Collaboration · 🔒 Provenance · 📚 Docs


About .github/

This folder is the GitHub integration surface: Actions workflows, Dependabot, and issue/PR templates. It is not importable application code.

| Document | Use |
| --- | --- |
| This file | Human overview while browsing the repo on GitHub |
| AGENTS.md | Technical index: job names, triggers, coverage thresholds |
| workflows/README.md | CI job graph and commands to mirror CI locally |

Related (repo root): .cursor/skill_manifest.json lists agent SKILL.md descriptors. After adding or editing infrastructure/**/SKILL.md, run uv run python -m infrastructure.skills write and commit the updated manifest (see infrastructure/skills/).


What Is This?

The Docxology Template solves the structural root of research irreproducibility: fragmentation between code, tests, manuscripts, and provenance. Instead of patching tools together, it enforces integrity at the architectural level.

| You get | How |
| --- | --- |
| Reproducible builds | 8-stage pipeline from env setup → PDF → hashed artifact |
| Real test enforcement | Zero-Mock policy · ≥90% project coverage · ≥60% infra coverage |
| Cryptographic provenance | SHA-256/512 hashing + steganographic watermarking on every PDF |
| Horizontal scaling | N independent projects share one infrastructure layer, with no coupling |
| AI-agent-ready codebase | AGENTS.md + README.md per directory; SKILL.md + .cursor/skill_manifest.json for routing |
| Interactive orchestration | run.sh TUI menu for humans · --pipeline flag for CI |

⚑ Quick Start

# 1. Create your research repo from this template
gh repo create my-research --template docxology/template --private
cd my-research

# 2. Install dependencies
uv sync

# 3. Run the interactive pipeline menu
./run.sh

# 4. Or run non-interactively against the exemplar project
./run.sh --pipeline --project code_project

# Outputs β†’ output/code_project/

Don't have uv? → curl -Ls https://astral.sh/uv/install.sh | sh

See the full walkthrough in docs/RUN_GUIDE.md and docs/guides/getting-started.md.


Architecture

The repository is organized into two strictly separated layers: shared infrastructure that never changes per-project, and self-contained project workspaces that know nothing about each other.

graph TD
    Root["/ (Repository Root)"] --> Infra["infrastructure/ (Layer 1 - Shared)"]
    Root --> Scripts["scripts/ (Pipeline Stage Scripts)"]
    Root --> Projects["projects/ (Layer 2 - Project Workspaces)"]
    Root --> Docs["docs/ (Documentation Hub - 90+ files)"]
    Root --> Output["output/ (Final Deliverables)"]

    subgraph "Layer 1 · 13 infrastructure subpackages · ~150 Python modules"
        Infra --> Core["core/ - logging, config, exceptions"]
        Infra --> Rendering["rendering/ - Pandoc + XeLaTeX"]
        Infra --> Stego["steganography/ - SHA-256 + watermarking"]
        Infra --> Valid["validation/ - PDF + Markdown integrity"]
        Infra --> LLM["llm/ - Ollama review + translation"]
        Infra --> More["+ publishing, reporting, scientific, project, documentation…"]
    end

    subgraph "Layer 2 · Project Workspaces (add as many as needed)"
        Projects --> CP["code_project/ ← Active exemplar project"]
        Projects --> Dots["your_project/ ← Drop in; auto-discovered"]
    end

Directory Reference

| Path | Persistence | Purpose |
| --- | --- | --- |
| infrastructure/ | Permanent | 13 subpackages (+ hub SKILL.md); see infrastructure/SKILL.md |
| projects/ | Permanent | Active projects: discovered and executed by pipeline |
| projects_in_progress/ | Transient | Staging area: scaffold here before promoting |
| projects_archive/ | Permanent | Completed/retired work: preserved, not executed |
| scripts/ | Permanent | 8 generic pipeline stage scripts (Stages 00–07) |
| output/ | Disposable | Final PDFs, dashboards, reports |
| docs/ | Permanent | 90+ documentation files across 13 subdirectories |

Key invariant: All domain logic lives in projects/{name}/src/. Scripts are thin orchestrators: they import and call, never implement. See docs/architecture/thin-orchestrator-summary.md.
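
The split between domain logic and orchestration might look like the following minimal sketch. The `summarize` function and the `main` entry point are hypothetical names chosen for illustration, not code from this template; in a real project the first would live under projects/{name}/src/ and the second under scripts/.

```python
"""Thin Orchestrator sketch. All names here are hypothetical."""
import json
from statistics import mean


# --- would live in projects/{name}/src/: the domain logic -------------------
def summarize(values: list[float]) -> dict[str, float]:
    """Compute summary statistics; this is where the real work belongs."""
    if not values:
        raise ValueError("values must not be empty")
    return {"n": float(len(values)), "mean": mean(values), "max": max(values)}


# --- would live in scripts/: the thin orchestrator --------------------------
def main(argv: list[str]) -> int:
    """Parse arguments, call the domain function, report. No logic of its own."""
    values = [float(arg) for arg in argv]
    print(json.dumps(summarize(values)))
    return 0
```

Because the orchestrator only converts arguments and delegates, tests can target `summarize` directly without going through the command line.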


Active Exemplar Project: code_project

projects/code_project/ is the canonical example of a complete, working project in this template. Use it as the reference when building your own.

It demonstrates:

| Feature | Implementation |
| --- | --- |
| Gradient descent optimization | src/code_project/optimizer.py |
| Scientific benchmarking | uses infrastructure.scientific |
| 39 tests, 100% coverage | tests/: Zero-Mock, real operations only |
| 6 publication-quality figures | generated in scripts/, registered via FigureManager |
| Full pipeline output | PDF rendered, validated, steganographically signed |
| Complete documentation | AGENTS.md + README.md throughout |

projects/code_project/
├── src/code_project/      # All domain logic (optimizer, analysis)
├── tests/                 # 39 real tests, no mocks
├── scripts/               # Thin orchestrators calling src/
├── manuscript/            # Markdown chapters + config.yaml
├── output/                # Pipeline artifacts (generated)
└── AGENTS.md              # AI-agent context for this project

To add your own project, follow docs/guides/new-project-setup.md.
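
To give a feel for the kind of code and test the exemplar pairs together, here is a minimal sketch of one-dimensional gradient descent with a Zero-Mock style test. It is an illustration only: optimizer.py is not reproduced in this README, and the function name, parameters, and defaults below are assumptions.

```python
"""Gradient descent sketch; hypothetical, not the code in optimizer.py."""
from typing import Callable


def gradient_descent(
    grad: Callable[[float], float],
    x0: float,
    learning_rate: float = 0.1,
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> float:
    """Minimize a 1-D function given its gradient; returns the final iterate."""
    x = x0
    for _ in range(max_iter):
        step = learning_rate * grad(x)
        x -= step
        if abs(step) < tol:  # stop once updates become negligible
            break
    return x


def test_converges_to_known_minimum() -> None:
    """Zero-Mock style: call the real function and check a real result."""
    # f(x) = (x - 3)^2 has gradient 2(x - 3) and its minimum at x = 3.
    x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
    assert abs(x_min - 3.0) < 1e-6
```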

Project Lifecycle

stateDiagram-v2
    [*] --> InProgress: Create scaffold
    InProgress --> Active: Add src/ + tests/ + manuscript/config.yaml
    Active --> Archive: Complete / retire
    Archive --> Active: Reactivate

    InProgress : projects_in_progress/ - not executed
    Active : projects/ - auto-discovered, pipeline-executed
    Archive : projects_archive/ - preserved, not executed
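
Auto-discovery of active projects could be sketched as below. The rule used here (a directory under projects/ counts as runnable once it has src/, tests/, and manuscript/config.yaml) is an assumption inferred from the lifecycle diagram; the template's actual discovery logic may check more or less than this, and `discover_projects` is a hypothetical name.

```python
"""Project discovery sketch; the required-paths rule is an assumption."""
import tempfile
from pathlib import Path

# Assumed promotion criteria, inferred from the project lifecycle diagram.
REQUIRED_PATHS = ("src", "tests", "manuscript/config.yaml")


def discover_projects(repo_root: Path) -> list[str]:
    """Return names of directories under projects/ that look runnable."""
    projects_dir = repo_root / "projects"
    if not projects_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in projects_dir.iterdir()
        if child.is_dir() and all((child / rel).exists() for rel in REQUIRED_PATHS)
    )


# Demonstration against a throwaway directory tree (real filesystem, no mocks).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    complete = root / "projects" / "alpha"
    (complete / "src").mkdir(parents=True)
    (complete / "tests").mkdir()
    (complete / "manuscript").mkdir()
    (complete / "manuscript" / "config.yaml").write_text("title: demo\n")
    (root / "projects" / "beta" / "src").mkdir(parents=True)  # incomplete
    (root / "projects_in_progress" / "gamma").mkdir(parents=True)  # never scanned
    found = discover_projects(root)

print(found)  # ['alpha']
```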

🔄 Pipeline

run.sh executes an 8-stage pipeline (Scripts 00–07). secure_run.sh appends steganographic post-processing.

flowchart LR
    Start([run.sh]) --> S0[00 Setup]
    S0 --> S1[01 Tests]
    S1 --> S2[02 Analysis]
    S2 --> S3[03 Render PDF]
    S3 --> S4[04 Validate]
    S4 --> S5[05 Copy Outputs]
    S5 --> S6[06 LLM Review]
    S6 --> S7[07 Exec Report]
    S7 --> End([Deliverables])

    subgraph "Core - always run"
        S1
        S2
        S3
        S4
    end

| Stage | Script | Failure Mode |
| --- | --- | --- |
| 00 Setup | 00_setup_environment.py | Hard fail |
| 01 Tests | 01_run_tests.py | Configurable tolerance |
| 02 Analysis | 02_run_analysis.py | Hard fail |
| 03 Render PDF | 03_render_pdf.py | Hard fail |
| 04 Validate | 04_validate_output.py | Warning + report |
| 05 Copy | 05_copy_outputs.py | Soft fail |
| 06 LLM Review | 06_llm_review.py | Skippable (requires Ollama) |
| 07 Exec Report | 07_generate_executive_report.py | Soft fail |

Full stage details: docs/core/workflow.md · docs/core/how-to-use.md.
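
The failure modes listed for each stage can be pictured with a small runner. This is a sketch under stated assumptions, not how run.sh is implemented: `Policy`, `Stage`, and `run_pipeline` are hypothetical names, "Warning + report" is folded into the soft policy, and the configurable test tolerance of Stage 01 is left out.

```python
"""Stage-runner sketch; Policy, Stage and run_pipeline are hypothetical names."""
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Policy(Enum):
    HARD = "hard"            # failure aborts the pipeline
    SOFT = "soft"            # failure is recorded, pipeline continues
    SKIPPABLE = "skippable"  # stage is skipped when its service is missing


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True on success
    policy: Policy
    available: Callable[[], bool] = lambda: True


def run_pipeline(stages: list[Stage]) -> dict[str, str]:
    """Run stages in order and return a status per stage that was reached."""
    status: dict[str, str] = {}
    for stage in stages:
        if stage.policy is Policy.SKIPPABLE and not stage.available():
            status[stage.name] = "skipped"
            continue
        if stage.run():
            status[stage.name] = "ok"
        elif stage.policy is Policy.HARD:
            status[stage.name] = "failed"
            break  # later stages never run
        else:
            status[stage.name] = "warned"
    return status


# Hypothetical outcomes: validation warns, Ollama is unavailable.
stages = [
    Stage("00 Setup", lambda: True, Policy.HARD),
    Stage("03 Render PDF", lambda: True, Policy.HARD),
    Stage("04 Validate", lambda: False, Policy.SOFT),
    Stage("06 LLM Review", lambda: True, Policy.SKIPPABLE, available=lambda: False),
    Stage("07 Exec Report", lambda: True, Policy.SOFT),
]
result = run_pipeline(stages)
```

In this run the validation failure is downgraded to a warning and the LLM review is skipped, so the executive report still gets generated; a failure in a hard stage would instead stop the loop at that stage.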


🤖 AI Collaboration

Every directory at every level contains two documentation files:

  • README.md: Human-readable overview and quick-start
  • AGENTS.md: Machine-readable spec for AI coding assistants (API tables, dependency graphs, architectural constraints, naming conventions)

Under infrastructure/, each subpackage also has SKILL.md (YAML frontmatter). The aggregated list for editors is .cursor/skill_manifest.json (regenerate with uv run python -m infrastructure.skills write). Cursor project rules live under .cursor/rules/.

CLAUDE.md (root)          ← Global constraints: Zero-Mock, Thin Orchestrator, naming
  └── AGENTS.md (per dir) ← Local API surfaces, file inventories, integration patterns
        └── README.md     ← Human navigation and quick-start

See docs/rules/ for standards and infrastructure/SKILL.md for the infrastructure skill hub.


🔒 Security & Provenance

When the pipeline is run through secure_run.sh, every rendered PDF is automatically processed by the steganographic pipeline:

| Layer | Mechanism | Survives |
| --- | --- | --- |
| PDF Metadata | XMP + Info dictionary (author, DOI, ORCID, build timestamp) | All viewers |
| Hash manifest | SHA-256 + SHA-512 in *.hashes.json | External verification |
| Alpha overlay | Low-opacity text per page (build time + commit hash) | Standard PDF operations, printing |
| QR code | Repository URL injected on final page | Redistribution |

Full specification: docs/security/steganography.md · docs/security/hashing_and_manifests.md · docs/security/secure_execution.md.
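
The hash-manifest layer can be illustrated with the standard-library hashlib module. This is a minimal sketch, not the template's implementation: the function names and the JSON keys ("file", "sha256", "sha512") are assumptions; only the *.hashes.json suffix and the two digest algorithms come from the table.

```python
"""Hash-manifest sketch; the JSON layout is an assumption, not the template's."""
import hashlib
import json
import tempfile
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 65536) -> dict[str, str]:
    """Stream the file once and compute both digests."""
    sha256, sha512 = hashlib.sha256(), hashlib.sha512()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            sha256.update(chunk)
            sha512.update(chunk)
    return {"sha256": sha256.hexdigest(), "sha512": sha512.hexdigest()}


def write_manifest(pdf: Path) -> Path:
    """Write <stem>.hashes.json next to the PDF and return its path."""
    manifest = pdf.with_name(pdf.stem + ".hashes.json")
    manifest.write_text(json.dumps({"file": pdf.name, **hash_file(pdf)}, indent=2))
    return manifest


def verify(pdf: Path, manifest: Path) -> bool:
    """Return True only if both recorded digests match the file on disk."""
    recorded = json.loads(manifest.read_text())
    current = hash_file(pdf)
    return all(recorded[name] == current[name] for name in ("sha256", "sha512"))


with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "paper.pdf"
    pdf.write_bytes(b"%PDF-1.7 placeholder bytes")
    manifest = write_manifest(pdf)
    intact = verify(pdf, manifest)    # True: file unchanged
    pdf.write_bytes(b"%PDF-1.7 tampered")
    tampered = verify(pdf, manifest)  # False: digests no longer match
```

Recording two independent digests means a verifier can check whichever algorithm it supports, and any change to the file bytes after the manifest is written makes verification fail.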


🧪 Testing Standards

| Standard | Requirement |
| --- | --- |
| Zero-Mock policy | No MagicMock, mocker.patch, or unittest.mock anywhere |
| Real operations | Tests use real filesystem, subprocess, and HTTP calls |
| Infrastructure coverage | ≥ 60% (currently achieving 83%+) |
| Project coverage | ≥ 90% (currently achieving 100% in code_project) |
| Optional service skipping | @pytest.mark.requires_ollama for graceful degradation |

# Mirror CI locally
uv run pytest tests/infra_tests/ --cov=infrastructure --cov-fail-under=60 -m "not requires_ollama"
uv run pytest projects/code_project/tests/ --cov-fail-under=90 -m "not requires_ollama"
python scripts/verify_no_mocks.py

See docs/development/testing/ and docs/guides/testing-and-reproducibility.md.
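
One way a Zero-Mock check could work is to parse each test file and look for mock imports or mocker.patch calls. The sketch below uses the standard-library ast module and is not the actual scripts/verify_no_mocks.py, whose implementation is not shown here; `find_mock_usage` and the list of forbidden modules are assumptions.

```python
"""Mock-detection sketch; not the actual scripts/verify_no_mocks.py."""
import ast

FORBIDDEN_MODULES = {"unittest.mock", "mock", "pytest_mock"}


def find_mock_usage(source: str) -> list[tuple[int, str]]:
    """Return (line, description) pairs for mock imports and mocker.patch."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in FORBIDDEN_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            names = {alias.name for alias in node.names}
            if node.module in FORBIDDEN_MODULES or (
                node.module == "unittest" and "mock" in names
            ):
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif (
            isinstance(node, ast.Attribute)
            and node.attr == "patch"
            and isinstance(node.value, ast.Name)
            and node.value.id == "mocker"
        ):
            findings.append((node.lineno, "mocker.patch"))
    return sorted(findings)


clean = "def test_add():\n    assert 1 + 1 == 2\n"
dirty = (
    "from unittest.mock import MagicMock\n"
    "\n"
    "def test_cwd(mocker):\n"
    "    mocker.patch('os.getcwd')\n"
)
print(find_mock_usage(clean))  # []
print(find_mock_usage(dirty))  # [(1, 'from unittest.mock import ...'), (4, 'mocker.patch')]
```

Working on the syntax tree rather than on raw text avoids false positives from comments or strings that merely mention mocking.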


📚 Documentation Hub

The docs/ directory contains 90+ files across 13 subdirectories. Every subdirectory has its own README.md and AGENTS.md. Start at docs/README.md or docs/documentation-index.md.

📂 Core (docs/core/)

Essential start-here docs. Read these first.

| File | Purpose |
| --- | --- |
| how-to-use.md | Step-by-step usage guide for the full system |
| workflow.md | Pipeline workflow: stages, flags, modes |
| architecture.md | Two-Layer Architecture overview |

🏗️ Architecture (docs/architecture/)

Design decisions, patterns, and migration guides.

| File | Purpose |
| --- | --- |
| two-layer-architecture.md | Deep dive into the Layer 1 / Layer 2 separation |
| thin-orchestrator-summary.md | The Thin Orchestrator pattern: why and how |
| testing-strategy.md | Testing architecture and Zero-Mock rationale |
| decision-tree.md | Where does new code go? Decision guide |
| migration-from-flat.md | Migrating a flat repo to the Two-Layer model |

📖 Guides (docs/guides/)

Progressive tutorials from first project to advanced automation.

| File | Purpose |
| --- | --- |
| getting-started.md | First-time setup and first pipeline run |
| new-project-setup.md | Full checklist for adding a new project |
| figures-and-analysis.md | Generating, registering, and embedding figures |
| testing-and-reproducibility.md | Writing real tests, coverage, markers |
| extending-and-automation.md | Customizing the pipeline, adding CI stages |

Rules (docs/rules/)

Authoritative standards enforced across the codebase.

| File | Purpose |
| --- | --- |
| testing_standards.md | Zero-Mock policy, coverage thresholds, markers |
| code_style.md | Ruff config, formatting, naming conventions |
| documentation_standards.md | AGENTS.md / README.md duality requirements |
| manuscript_style.md | Chapter structure, figure captions, citations |
| api_design.md | Module API conventions, dataclass patterns |
| error_handling.md | Exception hierarchy, pipeline flow control |
| security.md | Dependency pinning, secrets management |
| llm_standards.md | Ollama integration, prompt templates, markers |
| python_logging.md | Structured logging via get_logger |
| type_hints_standards.md | mypy-compatible type annotation requirements |
| infrastructure_modules.md | Module API contracts and extension patterns |
| git_workflow.md | Branch strategy, commit conventions, PRs |
| folder_structure.md | Directory layout invariants |

🔧 Operational (docs/operational/)

Build, config, logging, and troubleshooting.

| Directory / File | Purpose |
| --- | --- |
| build/ | Build system internals and stage details |
| config/ | config.yaml reference, environment variables |
| logging/ | Logging configuration, log levels, rotation |
| troubleshooting/ | Common errors, rendering issues, coverage gaps |
| error-handling-guide.md | Pipeline error handling patterns |
| reporting-guide.md | Executive reports, coverage JSON, dashboards |

📦 Modules (docs/modules/)

Infrastructure subpackage documentation.

| File | Purpose |
| --- | --- |
| modules-guide.md | Overview of infrastructure subpackages |
| pdf-validation.md | infrastructure.validation: PDF integrity checking |
| scientific-simulation-guide.md | infrastructure.scientific: stability, benchmarking |
| guides/ | Per-module usage guides |

Usage (docs/usage/)

Manuscript authoring, style, and content guides.

| File | Purpose |
| --- | --- |
| markdown-template-guide.md | Chapter structure, frontmatter, Pandoc quirks |
| style-guide.md | Voice, tense, academic writing conventions |
| manuscript-numbering-system.md | Section/figure/table numbering |
| visualization-guide.md | Figure accessibility standards (16pt floor, colorblind palettes) |
| image-management.md | Figure registration, paths, captions |
| examples.md | Worked example manuscript snippets |
| examples-showcase.md | Gallery of generated figures from exemplar projects |

Best Practices (docs/best-practices/)

Project hygiene, version control, and multi-project management.

| File | Purpose |
| --- | --- |
| best-practices.md | Consolidated best practices across all concerns |
| multi-project-management.md | Managing N projects, discovery rules, isolation |
| version-control.md | Git workflow, tagging, output tracking |
| migration-guide.md | Upgrading the template across major versions |
| backup-recovery.md | Output preservation, disaster recovery |

🛠️ Development (docs/development/)

Contributing, testing internals, roadmap.

| File | Purpose |
| --- | --- |
| contributing.md | How to contribute: branch, test, PR |
| testing/ | Test writing guide, coverage analysis, patterns |
| coverage-gaps.md | Known low-coverage modules and improvement plans |
| roadmap.md | Feature roadmap and planned improvements |
| security.md | Security disclosure policy |
| code-of-conduct.md | Community standards |

🤖 Prompts (docs/prompts/)

Reusable AI agent prompt templates for common tasks.

| File | Purpose |
| --- | --- |
| infrastructure_module.md | Creating a new infrastructure subpackage |
| feature_addition.md | Adding a feature to an existing module |
| test_creation.md | Writing Zero-Mock tests |
| manuscript_creation.md | Authoring a new manuscript chapter |
| refactoring.md | Safe refactoring with test preservation |
| validation_quality.md | Adding validation and quality gates |
| documentation_creation.md | Writing AGENTS.md / README.md |
| code_development.md | General code development patterns |
| comprehensive_assessment.md | Full pipeline + codebase audit |

📖 Reference (docs/reference/)

API reference, glossary, cheatsheets, and FAQ.

| File | Purpose |
| --- | --- |
| api-reference.md | Public API reference for infrastructure modules |
| api-project-modules.md | Project-level module patterns and conventions |
| glossary.md | Definitions for all template-specific terms |
| faq.md | Frequently asked questions |
| quick-start-cheatsheet.md | One-page command reference |
| common-workflows.md | Recipes for common research tasks |
| copypasta.md | Copy-paste code snippets for common patterns |

🔒 Security (docs/security/)

Steganography, hashing, and secure execution.

| File | Purpose |
| --- | --- |
| steganography.md | Watermarking layers, alpha overlay, QR injection |
| hashing_and_manifests.md | SHA-256/512 hash manifests and tamper detection |
| secure_execution.md | secure_run.sh, steganography config, output files |

Audit (docs/audit/)

Documentation review reports and filepath audits.

| File | Purpose |
| --- | --- |
| documentation-review-report.md | Comprehensive documentation audit results |
| filepath-audit-report.md | File path accuracy and broken link report |

🚀 Top-Level Docs

| File | Purpose |
| --- | --- |
| docs/README.md | Documentation hub index and navigation |
| docs/documentation-index.md | Full inventory of all 90+ documentation files |
| docs/RUN_GUIDE.md | Complete run guide: modes, flags, troubleshooting |
| docs/CLOUD_DEPLOY.md | Cloud deployment guide (AWS, GCP, Azure, Docker) |
| docs/PAI.md | Personal AI Infrastructure integration guide |

🔧 CI/CD

Workflows

| Workflow | Trigger | Purpose |
| --- | --- | --- |
| ci.yml | push · PR · weekly · manual | Full 7-job quality gate |
| stale.yml | Daily 01:00 UTC | Close inactive issues/PRs |
| release.yml | v*.*.* tag · manual | GitHub Release with changelog |

CI Job Flow

graph TD
    L[Job 1: Lint & Type Check] --> VNM[Job 2: Verify No Mocks]
    VNM --> TI[Job 3: Infra Tests]
    VNM --> TP[Job 4: Project Tests]
    L --> VM[Job 5: Validate Manuscript]
    L --> SS[Job 6: Security Scan]
    TI --> PC[Job 7: Performance Check]
    TP --> PC

    style L fill:#f9f,stroke:#333,stroke-width:2px
    style PC fill:#bbf,stroke:#333,stroke-width:2px
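
The job graph implies which jobs can run in parallel. As a small illustration, the edges from the diagram can be fed to the standard-library graphlib module to compute execution "waves"; the `needs` mapping and the `execution_waves` helper are names introduced here for the sketch, not part of ci.yml.

```python
"""CI job ordering sketch using the standard-library graphlib module."""
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it waits for (edges copied from the diagram).
needs = {
    "Verify No Mocks": {"Lint & Type Check"},
    "Infra Tests": {"Verify No Mocks"},
    "Project Tests": {"Verify No Mocks"},
    "Validate Manuscript": {"Lint & Type Check"},
    "Security Scan": {"Lint & Type Check"},
    "Performance Check": {"Infra Tests", "Project Tests"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; jobs within a wave have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(needs)
for number, jobs in enumerate(waves, start=1):
    print(number, jobs)
# 1 ['Lint & Type Check']
# 2 ['Security Scan', 'Validate Manuscript', 'Verify No Mocks']
# 3 ['Infra Tests', 'Project Tests']
# 4 ['Performance Check']
```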

Quality Gates

| Gate | Tool | Threshold |
| --- | --- | --- |
| Code style | Ruff | zero violations |
| Formatting | Ruff | zero diffs |
| Type safety | mypy | no errors |
| No mocks | verify_no_mocks.py | zero mock usage |
| Infra coverage | pytest-cov | ≥ 60% |
| Project coverage | pytest-cov | ≥ 90% |
| Security | Bandit | MEDIUM+: zero findings |
| Performance | import timer | ≤ 5 s |
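
The performance gate is described only as an "import timer" with a 5-second threshold. A rough local approximation, assuming the gate measures a cold import in a fresh interpreter (the real CI job may measure something different), could look like this; `cold_import_seconds` is a hypothetical helper and `json` stands in for the package being checked.

```python
"""Import-time check sketch; the real CI performance job may differ."""
import subprocess
import sys
import time

THRESHOLD_SECONDS = 5.0  # threshold from the Quality Gates table


def cold_import_seconds(module_name: str) -> float:
    """Time `import module_name` in a fresh interpreter (includes startup)."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


elapsed = cold_import_seconds("json")  # stdlib stand-in for `infrastructure`
within_budget = elapsed <= THRESHOLD_SECONDS
print(f"import took {elapsed:.3f} s; within budget: {within_budget}")
```

Running the import in a subprocess avoids the module cache of the current interpreter, which would otherwise make a repeated import look nearly instantaneous.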

Simulate CI Locally

# Lint + format check
uv run ruff check infrastructure/ projects/*/src/
uv run ruff format --check infrastructure/ projects/*/src/

# Tests (skip Ollama-requiring tests)
uv run pytest tests/infra_tests/ --cov=infrastructure --cov-fail-under=60 -m "not requires_ollama"
uv run pytest projects/code_project/tests/ --cov-fail-under=90 -m "not requires_ollama"

# Security
uv run pip-audit
uv run bandit -r -ll infrastructure/ scripts/ projects/ \
  --exclude projects_archive,projects_in_progress

Branch Protection

Set in Settings → Branches → main:

Required status checks:
  Lint & Type Check
  Infra Tests (ubuntu-latest, Python 3.10/3.11/3.12)
  Project Tests (ubuntu-latest, Python 3.10/3.11/3.12)
  Validate Manuscripts · Security Scan · Performance Check

Require PR review before merging: 1 approver

| Secret | Required for |
| --- | --- |
| CODECOV_TOKEN | Coverage upload to Codecov (optional) |

📋 Issue & PR Templates

Issues → New Issue

| Template | Labels | Best for |
| --- | --- | --- |
| 🐛 Bug Report | bug · needs-triage | Reproducible errors with log output and pipeline stage |
| ✨ Feature Request | enhancement · needs-triage | New capabilities with priority and alternatives |
| Documentation | documentation · needs-triage | Incorrect, missing, or outdated docs with file paths |

💬 Questions? Use GitHub Discussions; blank issues are disabled.

PR Checklist → PULL_REQUEST_TEMPLATE.md

  • ✅ Linked issue · type-of-change label · pipeline stage(s) affected
  • ✅ Test evidence: local run confirmation with pass rates
  • ✅ Zero-Mock confirmation: no MagicMock / mocker.patch
  • ✅ Thin Orchestrator compliance: no logic in scripts

📦 Dependency Management

dependabot.yml: weekly automated PRs

| Ecosystem | Group | Max PRs |
| --- | --- | --- |
| GitHub Actions | all minor/patch batched | 5 |
| Python (uv) | dev-tools (pytest, mypy, ruff…) | 5 |
| Python (uv) | scientific-core (numpy, scipy…) | 5 |

Troubleshooting

# CI status
gh run list --workflow=CI --limit=5
gh run view <run-id> --log-failed
gh run rerun <run-id> --failed

# Fix lint locally
uvx ruff check infrastructure/ projects/*/src/ --fix
uvx ruff format infrastructure/ projects/*/src/

Common issues: docs/operational/troubleshooting/ · docs/reference/faq.md.


📖 AGENTS.md · 🚀 Run Guide · Architecture · 📋 Rules · 🐛 Issues · 💬 Discussions

Reproducibility as architecture, not afterthought.
