Database utilities and configuration for the Bishop State ML Pipeline.
Database configuration and credentials for Postgres (Supabase).
Configuration (env vars):
- `DB_HOST` - Database host (default: `127.0.0.1`)
- `DB_PORT` - Port (default: `54332` for local Supabase direct connection)
- `DB_NAME` - Database name (default: `postgres`)
- `DB_USER` / `DB_PASSWORD` - Credentials
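As a rough sketch, the variables above would typically be assembled into a `DB_CONFIG` dict read from the environment. The structure below is an assumption; only the variable names and defaults come from the list above.

```python
import os

# Hypothetical sketch of DB_CONFIG assembly (keys are assumptions);
# variable names and defaults match the documentation above.
DB_CONFIG = {
    "host": os.getenv("DB_HOST", "127.0.0.1"),
    "port": int(os.getenv("DB_PORT", "54332")),
    "dbname": os.getenv("DB_NAME", "postgres"),
    "user": os.getenv("DB_USER", ""),
    "password": os.getenv("DB_PASSWORD", ""),
}
```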
Tables:
- `student_predictions` - Student-level predictions
- `course_predictions` - Course-level predictions
- `ml_model_performance` - Model performance metrics
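The table names are presumably exposed through the `TABLES` mapping imported in the usage example below; a minimal sketch (keys assumed to equal the table names):

```python
# Hypothetical sketch of the TABLES mapping; key/value pairing is assumed.
TABLES = {
    "student_predictions": "student_predictions",
    "course_predictions": "course_predictions",
    "ml_model_performance": "ml_model_performance",
}
```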
Database utility functions for saving data and managing connections.
Functions:
- `get_connection()` - Create a psycopg2 connection
- `get_sqlalchemy_engine()` - Create a SQLAlchemy engine
- `save_dataframe_to_db()` - Save a DataFrame to the database in batches (10,000 records per batch)
- `create_model_performance_table()` - Create the performance tracking table
- `save_model_performance()` - Save model metrics
- `test_connection()` - Test database connectivity
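The batching behavior of `save_dataframe_to_db()` can be sketched roughly as follows. This is a hypothetical reimplementation (the real signature may differ); only the 10,000-record batch size and the progress messages come from this document.

```python
import math

import pandas as pd
from sqlalchemy import create_engine


def save_dataframe_to_db(df, table_name, engine, batch_size=10_000):
    """Sketch: write df to table_name in fixed-size batches with progress output."""
    total = len(df)
    n_batches = math.ceil(total / batch_size)
    print(f"Saving {total:,} records to table '{table_name}'...")
    print(f"Processing in {n_batches} batches of {batch_size:,} records each")
    for i in range(n_batches):
        # Slice the next batch and append it to the target table
        chunk = df.iloc[i * batch_size : (i + 1) * batch_size]
        chunk.to_sql(table_name, engine, if_exists="append", index=False)
        saved = min((i + 1) * batch_size, total)
        print(f"Batch {i + 1}/{n_batches}: Saved {saved:,}/{total:,} records ({saved / total:.1%})")
```

Writing one batch at a time keeps memory bounded for large prediction tables and gives a natural place to report progress.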
Script to test database connection and verify credentials.
Usage:
```shell
python -m operations.test_db_connection
```

Example:

```python
from operations.db_utils import save_dataframe_to_db, test_connection
from operations.db_config import DB_CONFIG, TABLES

# Test connection
if test_connection():
    print("Connected!")

# Save data
save_dataframe_to_db(df, TABLES['student_predictions'])
```

To run the connection test:

```shell
# From project root
python -m operations.test_db_connection
```

Data is saved in batches of 10,000 records with progress tracking:
```
Saving 500,000 records to table 'course_predictions'...
Processing in 50 batches of 10,000 records each
Batch 1/50: Saved 10,000/500,000 records (2.0%)
Batch 2/50: Saved 20,000/500,000 records (4.0%)
...
```
- Database credentials are stored as environment variables (never hardcoded)
- Copy `codebenders-dashboard/env.example` to `.env` and fill in your values
- Use environment variables for all deployments
- All database operations include error handling and fallback to CSV
- Connection pooling is enabled via SQLAlchemy
- Batch size can be adjusted in the `save_dataframe_to_db()` function
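A minimal sketch of what pooled engine creation might look like. The `make_db_url` helper and the specific pool settings are assumptions for illustration; the actual `get_sqlalchemy_engine()` may differ.

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import URL


def make_db_url(cfg):
    """Hypothetical helper: build a Postgres URL from a DB_CONFIG-style dict."""
    return URL.create(
        "postgresql+psycopg2",
        username=cfg["user"],
        password=cfg["password"],
        host=cfg["host"],
        port=cfg["port"],
        database=cfg["dbname"],
    )


def get_sqlalchemy_engine(cfg):
    """Sketch: engine with connection pooling (pool sizes are illustrative)."""
    # pool_pre_ping revalidates pooled connections before each checkout
    return create_engine(make_db_url(cfg), pool_size=5, max_overflow=10, pool_pre_ping=True)
```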