chore: Automate data model generation from upcoming CycloneDX 2.0 modularized specification #955

@jkowalleck

Description

Currently, the data models in this library are largely written and maintained manually. While this approach has worked so far, it is time-consuming and requires significant effort for both implementation and review. This effort could be better invested in feature development and bug fixing.

With the upcoming CycloneDX 2.0 specification, a modularized and machine-readable format will be introduced. This presents an opportunity to rethink how data models are created and maintained in this project.

Reference (work in progress):


Problem

  • Data models are mostly handwritten
  • High maintenance overhead
  • Repetitive work for contributors
  • Slows down development velocity due to review effort

Proposal

Leverage the machine-readable specification planned for CycloneDX 2.0 to introduce static code generation for data models.

This would involve:

  • Parsing the official CycloneDX specification (once available in machine-readable form)
  • Generating Python data models automatically
  • Integrating generation into the build or release process
  • Minimizing manual intervention for future spec updates
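As a rough illustration of the first two steps, the following is a minimal, stdlib-only sketch of turning a JSON Schema object definition into Python dataclass source. It is a toy, not the proposed implementation: the function name `render_dataclass`, the type mapping, and the example schema are hypothetical, and it ignores `$ref`, nested objects, enums, and property names that are not valid Python identifiers.

```python
from typing import Any

# Hypothetical mapping from JSON Schema primitive types to Python type names.
_TYPE_MAP = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}

_HEADER = "from dataclasses import dataclass\nfrom typing import Any, Optional\n\n\n"


def render_dataclass(name: str, schema: dict[str, Any]) -> str:
    """Render Python source for a dataclass from a flat JSON Schema object."""
    required = set(schema.get("required", []))
    props = schema.get("properties", {})
    lines = ["@dataclass", f"class {name}:"]
    # Required fields go first: dataclass fields without defaults
    # must precede fields with defaults.
    for prop in sorted(props, key=lambda p: (p not in required, p)):
        py_type = _TYPE_MAP.get(props[prop].get("type"), "Any")
        if prop in required:
            lines.append(f"    {prop}: {py_type}")
        else:
            lines.append(f"    {prop}: Optional[{py_type}] = None")
    if not props:
        lines.append("    pass")
    return _HEADER + "\n".join(lines) + "\n"


# Example input (an assumption for illustration, not the real CycloneDX schema).
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "version": {"type": "string"},
    },
}

if __name__ == "__main__":
    print(render_dataclass("Component", EXAMPLE_SCHEMA))
```

A real generator would need to cover far more of JSON Schema, which is the main argument for evaluating an existing tool rather than writing one from scratch.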

There have already been proof-of-concept implementations demonstrating that automated generation of data models from the specification is feasible. These approaches should be revisited, consolidated, and applied as part of this effort.

Pipeline:

CycloneDX JSON Schema
        ↓
Preprocessing (if needed)
        ↓
Code Generation (datamodel-code-generator)
        ↓
Post-processing (formatting, adjustments)
        ↓
Generated Python Models
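The pipeline above could be driven by a small script along these lines. This is a sketch under assumptions: it presumes the `datamodel-codegen` command from datamodel-code-generator is installed, uses only its `--input`, `--input-file-type`, and `--output` options, skips the optional preprocessing step, and the file names, the header text, and the helper names (`build_codegen_command`, `postprocess`, `run_pipeline`) are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical marker prepended to every generated module.
GENERATED_HEADER = "# Generated file. Do not edit by hand.\n"


def build_codegen_command(schema: Path, output: Path) -> list[str]:
    """Build the datamodel-codegen invocation for a JSON Schema input."""
    return [
        "datamodel-codegen",
        "--input", str(schema),
        "--input-file-type", "jsonschema",
        "--output", str(output),
    ]


def postprocess(source: str) -> str:
    """Strip trailing whitespace and blank tail lines, then add the header."""
    body = "\n".join(line.rstrip() for line in source.splitlines())
    return GENERATED_HEADER + body.rstrip("\n") + "\n"


def run_pipeline(schema: Path, output: Path) -> None:
    """Generate models from the schema, then post-process the result in place."""
    subprocess.run(build_codegen_command(schema, output), check=True)
    output.write_text(postprocess(output.read_text(encoding="utf-8")), encoding="utf-8")


if __name__ == "__main__":
    # Placeholder paths; the real schema location is not yet known.
    run_pipeline(Path("bom.schema.json"), Path("models.py"))
```

Keeping command construction and post-processing as separate pure functions makes each stage testable without the generator installed.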

Possible Tools / Libraries

The following tools could be evaluated as part of this effort:


Community Input

Community discussions have already suggested evaluating tools such as:

and

These should be considered as primary candidates during evaluation.

see discussions:


Expected Benefits

  • Significant reduction in maintenance effort
  • Improved consistency across models
  • Faster adoption of new specification versions
  • More time available for feature development and bug fixing

Considerations / Open Questions

  • What format will the machine-readable spec be published in (e.g., JSON Schema, OpenAPI, etc.)?
    • Decision: JSON Schema
  • Should generated code be committed or generated at build time?
    • Decision: generated before build time and committed to the repo
  • How to handle custom logic or extensions on top of generated models?
  • Backward compatibility with CycloneDX 1.x
    • Easy path: make a breaking change in the library and support only 2.0 from then on
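Since the generated code is to be committed, one possible safeguard is a check that the committed models still match a fresh generation run, for example in CI. The sketch below assumes the freshly generated source is available as a string; the function name `diff_against_committed` and the file paths are hypothetical.

```python
import difflib
from pathlib import Path


def diff_against_committed(generated: str, committed: Path) -> list[str]:
    """Return unified-diff lines between the committed file and fresh output.

    An empty list means the committed models are up to date.
    """
    current = committed.read_text(encoding="utf-8") if committed.exists() else ""
    return list(
        difflib.unified_diff(
            current.splitlines(),
            generated.splitlines(),
            fromfile=str(committed),
            tofile="<regenerated>",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Placeholder inputs; in practice `fresh` would come from the generator run.
    fresh = Path("build/models.py").read_text(encoding="utf-8")
    drift = diff_against_committed(fresh, Path("src/models.py"))
    if drift:
        print("\n".join(drift))
        raise SystemExit("Generated models are out of date; regenerate and commit.")
```

Failing the build on any drift keeps hand edits from creeping into generated files unnoticed.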

Additional Context

This proposal aligns with the direction of CycloneDX 2.0, which aims to make the specification more modular and tooling-friendly. Taking advantage of this early could significantly improve long-term maintainability of this library.


Note: This issue is intended as a meta-ticket to collect related subtasks and track overall implementation efforts.
