chore: Automate data model generation from upcoming CycloneDX 2.0 modularized specification #955
Description
Currently, the data models in this library are largely written and maintained manually. While this approach has worked so far, it is time-consuming and requires significant effort for both implementation and review. This effort could be better invested in feature development and bug fixing.
With the upcoming CycloneDX 2.0 specification, a modularized and machine-readable format will be introduced. This presents an opportunity to rethink how data models are created and maintained in this project.
Reference (work in progress):
- PR https://github.com/CycloneDX/cyclonedx-python-lib/issues
- modularized schema https://github.com/CycloneDX/specification/tree/2.0-dev/schema/2.0/model
Problem
- Data models are mostly handwritten
- High maintenance overhead
- Repetitive work for contributors
- Slows down development velocity due to review effort
Proposal
Leverage the machine-readable specification planned for CycloneDX 2.0 to introduce static code generation for data models.
This would involve:
- Parsing the official CycloneDX specification (once available in machine-readable form)
- Generating Python data models automatically
- Integrating generation into the build or release process
- Minimizing manual intervention for future spec updates
There have already been proof-of-concept implementations demonstrating that automated generation of data models from the specification is feasible. These approaches should be revisited, consolidated, and applied as part of this effort.
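To make the idea of static code generation concrete, here is a minimal standard-library sketch that renders dataclass source code from an object schema. It is not one of the proof-of-concept implementations mentioned above, and it handles only a tiny subset of JSON Schema. The names `generate_dataclass`, `EXAMPLE_SCHEMA`, and `Component` are hypothetical, and the example schema is invented for illustration; it is not taken from the CycloneDX 2.0 specification.

```python
import dataclasses
import keyword

# Map JSON Schema primitive types to Python annotations (small subset only).
_PRIMITIVES = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}

# Imports emitted at the top of every generated module.
_HEADER = "import dataclasses\nfrom typing import Optional\n\n\n"


def _annotation(prop: dict) -> str:
    """Return a Python type annotation for one JSON Schema property."""
    if prop.get("type") == "array":
        return f"list[{_annotation(prop.get('items', {}))}]"
    return _PRIMITIVES.get(prop.get("type"), "object")


def generate_dataclass(name: str, schema: dict) -> str:
    """Render the source code of one dataclass from an object schema."""
    required = set(schema.get("required", []))
    props = schema.get("properties", {})
    # Required fields go first: dataclasses reject non-default fields after defaults.
    ordered = sorted(props.items(), key=lambda item: item[0] not in required)
    lines = ["@dataclasses.dataclass", f"class {name}:"]
    for prop_name, prop in ordered:
        # Avoid Python keywords as field names.
        field = prop_name + "_" if keyword.iskeyword(prop_name) else prop_name
        if prop_name in required:
            lines.append(f"    {field}: {_annotation(prop)}")
        else:
            lines.append(f"    {field}: Optional[{_annotation(prop)}] = None")
    if not props:
        lines.append("    pass")
    return _HEADER + "\n".join(lines) + "\n"


# Hypothetical schema fragment, invented for this example.
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "version": {"type": "string"},
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

source = generate_dataclass("Component", EXAMPLE_SCHEMA)

# Execute the generated source to confirm it is valid Python.
namespace = {}
exec(source, namespace)
component = namespace["Component"](name="example-lib")
```

A real generator also has to deal with `$ref`, `enum`, `oneOf`, nested objects, and naming rules, which is the main argument for evaluating an existing tool rather than maintaining a custom one.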
Pipeline:
CycloneDX JSON Schema
↓
Preprocessing (if needed)
↓
Code Generation (datamodel-code-generator)
↓
Post-processing (formatting, adjustments)
↓
Generated Python Models
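The pipeline above could be wired together roughly as follows. This is a sketch under several assumptions: the `datamodel-codegen` command-line tool (installed by datamodel-code-generator) is used for the generation step, the schema is a single file (the modularized schema uses multiple files, and copying one file elsewhere would break relative `$ref` links), and the concrete preprocessing and post-processing steps shown here are placeholders. All function names and the header text are hypothetical.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical marker placed at the top of generated files.
GENERATED_HEADER = "# This file is generated. Do not edit by hand.\n"


def preprocess(node):
    """Placeholder preprocessing: recursively drop "$comment" keys.

    Assumption: no schema property is itself named "$comment".
    """
    if isinstance(node, dict):
        return {key: preprocess(value) for key, value in node.items() if key != "$comment"}
    if isinstance(node, list):
        return [preprocess(value) for value in node]
    return node


def codegen_command(schema_path: Path, output_path: Path) -> list[str]:
    """Build the datamodel-codegen invocation for one JSON Schema file."""
    return [
        "datamodel-codegen",
        "--input", str(schema_path),
        "--input-file-type", "jsonschema",
        "--output", str(output_path),
        # The output model type is a choice to be evaluated, not a decision.
        "--output-model-type", "dataclasses.dataclass",
    ]


def postprocess(source: str) -> str:
    """Placeholder post-processing: normalize trailing whitespace, add the header once."""
    body = source.strip() + "\n"
    return body if body.startswith(GENERATED_HEADER) else GENERATED_HEADER + body


def run_pipeline(schema_path: Path, work_dir: Path, output_path: Path) -> None:
    """Run all stages in order; requires datamodel-codegen on PATH."""
    schema = json.loads(schema_path.read_text(encoding="utf-8"))
    cleaned_path = work_dir / schema_path.name
    cleaned_path.write_text(json.dumps(preprocess(schema), indent=2), encoding="utf-8")
    subprocess.run(codegen_command(cleaned_path, output_path), check=True)
    generated = output_path.read_text(encoding="utf-8")
    output_path.write_text(postprocess(generated), encoding="utf-8")
```

Keeping each stage a small, pure function makes the pre- and post-processing testable without invoking the generator itself.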
Possible Tools / Libraries
The following tools could be evaluated as part of this effort:
- datamodel-code-generator, MIT (https://pypi.org/project/datamodel-code-generator/)
- pydantic, MIT (https://pypi.org/project/pydantic/)
- dataclasses-json, MIT (https://pypi.org/project/dataclasses-json/)
- dacite, MIT (https://pypi.org/project/dacite/)
- marshmallow, MIT (https://pypi.org/project/marshmallow/)
- marshmallow-jsonschema, MIT (https://pypi.org/project/marshmallow-jsonschema/)
- jsonschema (validation, not models), MIT (https://pypi.org/project/jsonschema/)
- quicktype, Apache 2.0 (https://pypi.org/project/quicktype/)
- genson (schema generator, reverse direction), MIT (https://pypi.org/project/genson/)
Community Input
Community discussions have already suggested evaluating tools such as:
- datamodel-code-generator
- json-schema-to-pydantic (https://pypi.org/project/json-schema-to-pydantic/)
- jambo (https://pypi.org/project/jambo/)
and
- de/serialization with cattrs (https://pypi.org/project/cattrs/)
- example: feat!: de/serialize with cattrs #934
These should be considered as primary candidates during evaluation.
see discussions:
Expected Benefits
- Significant reduction in maintenance effort
- Improved consistency across models
- Faster adoption of new specification versions
- More time available for feature development and bug fixing
Considerations / Open Questions
- What format will the machine-readable spec be published in (e.g., JSON Schema, OpenAPI)?
  - answer: JSON Schema
- Should generated code be committed or generated at build time?
  - decision: generated before build time, and committed to the repo
- How should custom logic or extensions on top of generated models be handled?
- How should backward compatibility with CycloneDX 1.x be handled?
  - easy path: make a breaking change in the library, and support only 2.0 from then on
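Since the generated code is to be committed, one way to keep it in sync with the specification is a check that regenerates the models into a scratch directory and fails when the result differs from what is in the repo. The following is a minimal standard-library sketch of such a check; the function names, directory layout, and file names are hypothetical, and the regeneration step itself is assumed to have happened before `find_drift` is called.

```python
import difflib
import tempfile
from pathlib import Path


def _read_lines(path: Path) -> list[str]:
    """Read a file as a list of lines; a missing file counts as empty."""
    if not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines(keepends=True)


def find_drift(committed_dir: Path, regenerated_dir: Path) -> list[str]:
    """Return unified-diff lines for every generated module that differs."""
    names = sorted(
        {path.name for directory in (committed_dir, regenerated_dir) for path in directory.glob("*.py")}
    )
    drift = []
    for name in names:
        old = _read_lines(committed_dir / name)
        new = _read_lines(regenerated_dir / name)
        drift.extend(difflib.unified_diff(old, new, f"committed/{name}", f"regenerated/{name}"))
    return drift


# Demonstration with throwaway directories standing in for the repo and a fresh run.
with tempfile.TemporaryDirectory() as tmp:
    committed = Path(tmp, "committed")
    regenerated = Path(tmp, "regenerated")
    committed.mkdir()
    regenerated.mkdir()
    (committed / "models.py").write_text("class Component: ...\n", encoding="utf-8")
    (regenerated / "models.py").write_text("class Component: ...\n", encoding="utf-8")
    in_sync = find_drift(committed, regenerated)

    # Simulate a spec update that changes the generated output.
    (regenerated / "models.py").write_text("class Component2: ...\n", encoding="utf-8")
    out_of_sync = find_drift(committed, regenerated)
```

In CI, a non-empty result would be printed and turned into a failing exit code, so that hand edits to generated files or forgotten regeneration runs are caught during review.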
Additional Context
This proposal aligns with the direction of CycloneDX 2.0, which aims to make the specification more modular and tooling-friendly. Taking advantage of this early could significantly improve long-term maintainability of this library.
Note: This issue is intended as a meta-ticket to collect related subtasks and track overall implementation efforts.