canopy-datahub

Canopy

Canopy is an open-source platform for FAIR-aligned scientific data hubs, supporting data sharing, harmonization, discovery, and reuse across research studies. Canopy is derived from the NIH RADx Data Hub (https://radxdatahub.nih.gov/), a cloud-based platform originally developed for the NIH Rapid Acceleration of Diagnostics (RADx) program. RADx Data Hub is available on GitHub. Rather than presenting a one-size-fits-all data hub, Canopy enables customization of RADx Data Hub technology for the needs of specific scientific domains.

Live demo: A demonstration instance of Canopy is publicly available at canopy.stanford.edu. All studies, datasets, and files on that site are synthetic and intended for demonstration purposes only.


Getting Started

Deploying Canopy to AWS?
Start here → Deployment Guide

Exploring the codebase?
Start here → Repositories — links to every service, tool, and guide

Want to contribute?
Start here → Contributing Guide


Architecture

Canopy runs on AWS as a microservices platform:

  • 7 Spring Boot microservices on ECS Fargate, behind an Application Load Balancer
  • Next.js / React frontend with server-side rendering
  • PostgreSQL (RDS) for relational data persistence
  • OpenSearch for full-text and faceted search
  • AWS Lambda for asynchronous email processing and search reindexing
  • S3 for dataset file storage
  • Keycloak for authentication and authorization
  • CloudFormation (IaC) for repeatable, auditable AWS deployments
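
As a rough illustration of what "full-text and faceted search" means in OpenSearch terms, the sketch below builds a query body that combines a full-text match with terms aggregations (the facet counts). It is not taken from the Canopy codebase: the function name `build_faceted_query` and the field names `title`, `description`, and `study_domain` are hypothetical, and the actual index mapping used by datahub-service-search may look quite different.

```python
import json


def build_faceted_query(text, facet_fields, filters=None, size=10):
    """Build an OpenSearch query DSL body (a plain dict).

    Assumption: the index has text fields "title" and "description"
    and keyword fields for every name in facet_fields. These names are
    hypothetical and are not Canopy's actual mapping.
    """
    filters = filters or {}

    # Full-text part: match the search text, or everything if it is empty.
    if text:
        must = [{"multi_match": {"query": text,
                                 "fields": ["title", "description"]}}]
    else:
        must = [{"match_all": {}}]

    # Facet selections made by the user become exact-match filters.
    filter_clauses = [{"terms": {field: values}}
                      for field, values in sorted(filters.items())]

    # One terms aggregation per facet returns value counts for the sidebar.
    aggs = {f"by_{field}": {"terms": {"field": field, "size": 20}}
            for field in facet_fields}

    return {
        "size": size,
        "query": {"bool": {"must": must, "filter": filter_clauses}},
        "aggs": aggs,
    }


if __name__ == "__main__":
    body = build_faceted_query(
        "covid testing",
        facet_fields=["study_domain"],
        filters={"study_domain": ["Rapid testing"]},
    )
    print(json.dumps(body, indent=2))
```

The resulting dict could be sent as the JSON body of a `_search` request. Note that filters placed inside the query also narrow the facet counts; a real implementation might prefer `post_filter` if counts should stay unfiltered.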

Repository Map

Backend Services (Spring Boot)

| Repository | Description |
|---|---|
| datahub-service-entity | Direct retrieval of database entities |
| datahub-service-search | Search across studies and variables |
| datahub-service-user | User info, profiles, and support requests |
| datahub-service-submission | Data and study ingestion workflows |
| datahub-service-report | Metrics dashboard and reporting |
| datahub-service-download | Controlled dataset file downloads |
| datahub-service-email | Lambda-based email notifications via AWS SES |
| datahub-lib-keycloak-auth | Shared Keycloak authentication library |
| datahub-project | Maven parent POM for all Java services |

Frontend

| Repository | Description |
|---|---|
| datahub-ui-main | Next.js / React web application |

Infrastructure & Deployment

| Repository | Description |
|---|---|
| datahub-cloud-replication | AWS CloudFormation templates |
| datahub-development | PostgreSQL schema scripts, seed data, OpenSearch Lambda, Keycloak Docker Compose |
| datahub-docs | Deployment guide, limitations, and operator documentation |
| datahub-deployment-scripts | Automation scripts supporting deployment and operations |

Developer Tooling

| Repository | Description |
|---|---|
| datahub-cli | CLI for local development and server management |
| datahub-utility-scripts | Automation helpers and publication utilities |
