Automated Data Update System
This document describes the automated data update system for the AfCFTA project.
Overview
The system automatically updates data from external sources on a daily schedule and can also be triggered manually. It fetches fresh data from:
- World Bank API (economic indicators)
- Country profiles and statistics
- JSON data files (ports, airports, production data)
Components
1. Data Update Script (backend/update_data_automated.py)
A Python script that:
- Fetches latest data from World Bank API
- Updates country economic profiles
- Refreshes JSON data files with timestamps
- Generates detailed update reports
Usage:
# Run manually
python backend/update_data_automated.py
Features:
- Graceful error handling
- Detailed logging
- Update reports in JSON format
- Rate limiting to respect API limits
2. GitHub Actions Workflow (.github/workflows/auto_update_data.yml)
An automated workflow that:
- Runs daily at 2:00 AM UTC
- Can be triggered manually from the Actions tab
- Commits and pushes data changes automatically
- Uploads update reports as artifacts
Schedule: Daily at 2:00 AM UTC (configurable via cron expression)
Manual Trigger:
- Go to the repository’s Actions tab
- Select “Auto Update Data” workflow
- Click the “Run workflow” dropdown
- Choose the update type (all, worldbank, production, trade)
- Click the “Run workflow” button to start the run
Files Generated
The update process generates/updates the following files:
- worldbank_data_latest.json: Latest World Bank data for all African countries
- data_update_report.json: Detailed log of the update process (gitignored)
- ports_africains.json: Updated with timestamps
- airports_africains.json: Updated with timestamps
- production_africaine.json: Updated with timestamps
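As a rough illustration of what "updated with timestamps" could look like, the sketch below rewrites a JSON file with a refreshed UTC timestamp. The function name stamp_json_file and the "last_updated" key are assumptions made for this example; the actual script may store the timestamp differently.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def stamp_json_file(path, now=None):
    """Refresh the timestamp in a JSON file whose top level is an object.

    Returns the ISO 8601 timestamp written, or None if the file's top-level
    value is not an object (in which case the file is left untouched).
    """
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return None
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    # "last_updated" is a hypothetical key name, not taken from the project.
    data["last_updated"] = stamp
    path.write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return stamp
```

Passing ensure_ascii=False keeps accented place names readable in the output file.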
Data Sources
World Bank API
Fetches the following indicators for all 54 African countries:
- GDP (NY.GDP.MKTP.CD): GDP in current US$
- GDP per capita (NY.GDP.PCAP.CD): GDP per capita in current US$
- Population (SP.POP.TOTL): Total population
- GDP growth (NY.GDP.MKTP.KD.ZG): Annual GDP growth rate
Data is fetched for years 2020-2024 (most recent 5 years).
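The following is a minimal sketch of how a request for one of these indicators could be built and its response parsed, assuming the public World Bank v2 REST endpoint. The helper names build_indicator_url and parse_indicator_payload are hypothetical and are not taken from update_data_automated.py.

```python
from urllib.parse import urlencode

WB_BASE = "https://api.worldbank.org/v2"

# Indicator codes listed in this document.
INDICATORS = {
    "gdp": "NY.GDP.MKTP.CD",
    "gdp_per_capita": "NY.GDP.PCAP.CD",
    "population": "SP.POP.TOTL",
    "gdp_growth": "NY.GDP.MKTP.KD.ZG",
}


def build_indicator_url(country_code, indicator, start=2020, end=2024):
    """Build the URL for one country and one indicator over a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": 100},
        safe=":",  # keep the year range readable as 2020:2024
    )
    return f"{WB_BASE}/country/{country_code}/indicator/{indicator}?{query}"


def parse_indicator_payload(payload):
    """Return {year: value} from a decoded JSON response, skipping nulls.

    A successful response is a two-element list: paging metadata, then rows.
    Anything else (for example an error message) yields an empty dict.
    """
    if not isinstance(payload, list) or len(payload) < 2 or not payload[1]:
        return {}
    return {
        int(row["date"]): row["value"]
        for row in payload[1]
        if row.get("value") is not None
    }
```

Null values are common for the most recent year, so the parser drops them rather than storing None.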
Workflow Details
Automatic Execution
The workflow runs automatically every day at 2:00 AM UTC:
schedule:
  - cron: '0 2 * * *'
Manual Execution
You can manually trigger the workflow with different update types:
- all: Update all data sources (default)
- worldbank: Update only World Bank data
- production: Update only production data
- trade: Update only trade data
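One way the update type could map onto work inside the script is sketched below. This assumes the script receives the chosen type as a string; how the workflow actually passes it along is not described here, and UPDATE_STEPS and select_steps are hypothetical names.

```python
# Hypothetical mapping from update type to the data sources it refreshes.
UPDATE_STEPS = {
    "worldbank": ["worldbank"],
    "production": ["production"],
    "trade": ["trade"],
}
UPDATE_STEPS["all"] = ["worldbank", "production", "trade"]


def select_steps(update_type="all"):
    """Return the list of data sources to refresh for an update type."""
    try:
        return list(UPDATE_STEPS[update_type])
    except KeyError:
        valid = ", ".join(sorted(UPDATE_STEPS))
        raise ValueError(
            f"Unknown update type {update_type!r}; expected one of: {valid}"
        ) from None
```

Rejecting unknown values early gives a clear error instead of a run that silently updates nothing.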
Workflow Steps
- Checkout: Checks out the repository
- Setup Python: Installs Python 3.11 with pip caching
- Install Dependencies: Installs required packages (requests, openpyxl, pandas)
- Run Update Script: Executes the data update script
- Check Changes: Detects if any data was modified
- Commit & Push: Commits changes, pulls latest remote changes (with rebase), and pushes
- Generate Summary: Creates a summary in the workflow output
- Upload Report: Uploads the update report as an artifact (retained for 30 days)
Note on step 6 (Commit & Push): The workflow runs git pull --rebase before pushing, which prevents non-fast-forward errors when another workflow or a manual commit has updated the remote branch. If conflicts occur in data files, the workflow resolves them automatically by keeping the newly generated data from the current run.
Error Handling
The system is designed to be resilient:
- API Errors: Network errors are logged but don’t fail the workflow
- Rate Limiting: Built-in delays between API calls
- Graceful Degradation: If external APIs fail, local updates still proceed
- Detailed Logging: All actions are logged for debugging
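The rate limiting and graceful degradation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: fetch_all is a hypothetical helper, the fetch callable is supplied by the caller, and the 0.5-second delay is an arbitrary example value.

```python
import time


def fetch_all(keys, fetch, delay_seconds=0.5, sleep=time.sleep):
    """Call fetch(key) for each key, pausing between calls.

    Failures are recorded and skipped so one bad request does not abort
    the whole run. Returns (results, errors).
    """
    results, errors = {}, []
    for index, key in enumerate(keys):
        if index:
            sleep(delay_seconds)  # fixed pause between consecutive calls
        try:
            results[key] = fetch(key)
        except Exception as exc:  # record the failure and continue
            errors.append(f"{key}: {exc}")
    return results, errors
```

The sleep function is a parameter only so the delay can be replaced in tests; normal callers can leave it at the default.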
Monitoring
View Update Reports
Update reports are available in two ways:
- Workflow Summary: Each workflow run generates a summary visible in the Actions tab
- Artifacts: Detailed JSON reports are uploaded and retained for 30 days
Check Update Status
# View the latest update report
cat data_update_report.json
The report includes:
- Timestamp
- Status (completed/failed)
- Number of updates performed
- Warnings and errors
- Detailed log of all operations
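For a quicker check than reading the raw file, a report could be condensed into one line along these lines. The key names used here (timestamp, status, updates_count, warnings, errors) are assumptions for illustration; inspect an actual data_update_report.json for the real field names before relying on them.

```python
def summarize_report(report):
    """Condense an update report dict into a single status line.

    All key names are hypothetical; missing keys fall back to defaults.
    """
    return (
        f"{report.get('timestamp', 'unknown time')}: "
        f"status={report.get('status', 'unknown')}, "
        f"updates={report.get('updates_count', 0)}, "
        f"warnings={len(report.get('warnings', []))}, "
        f"errors={len(report.get('errors', []))}"
    )
```

To use it, load the file with json.load and pass the resulting dict to summarize_report.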
Configuration
Change Update Frequency
Edit the cron schedule in .github/workflows/auto_update_data.yml:
schedule:
  - cron: '0 2 * * *' # Daily at 2:00 AM UTC
Examples:
- 0 */6 * * *: Every 6 hours
- 0 0 * * 1: Weekly on Mondays
- 0 0 1 * *: Monthly on the 1st
Add New Data Sources
To add new data sources, edit backend/update_data_automated.py:
- Add a new method to the DataUpdater class
- Call the method in the main() function
- Update the documentation
Example:
def update_trade_data(self):
    """Update trade statistics"""
    self.log("Updating trade data...")
    # Your implementation here
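To show how the steps fit together, here is a self-contained sketch of a class with a log method, one update method, and a main() that calls it. It is a hypothetical skeleton: the real DataUpdater in backend/update_data_automated.py may have a different constructor, logging behavior, and return values.

```python
from datetime import datetime, timezone


class DataUpdater:
    """Hypothetical skeleton; the real class may differ."""

    def __init__(self):
        self.messages = []

    def log(self, message):
        """Record a message prefixed with a UTC timestamp."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.messages.append(f"[{stamp}] {message}")

    def update_trade_data(self):
        """Update trade statistics (placeholder body)."""
        self.log("Updating trade data...")
        return True


def main():
    updater = DataUpdater()
    updater.update_trade_data()  # call each new update method here
    return updater
```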
Integration with Existing Workflows
This workflow complements the existing lyra_plus_ops.yml workflow:
- lyra_plus_ops.yml: Updates AfCFTA-specific datasets (tariffs, rules of origin) weekly
- auto_update_data.yml: Updates general economic data (World Bank, country profiles) daily
Both workflows work independently and can run concurrently.
Troubleshooting
Workflow Not Running
- Check that GitHub Actions is enabled in repository settings
- Verify the cron schedule syntax
- Ensure the workflow file is in .github/workflows/
API Errors
- World Bank API might be temporarily unavailable
- Check API status at https://data.worldbank.org/
- Review the update report for specific error messages
No Data Changes
This is normal if:
- World Bank hasn’t released new data
- The data is the same as the previous update
In both cases, the workflow still runs but does not commit anything.
Permission Errors
Ensure the workflow has write permissions:
permissions:
  contents: write
Git Push Rejection (Non-Fast-Forward)
Problem: The workflow fails with error:
! [rejected] main -> main (non-fast-forward)
error: failed to push some refs
Cause: The remote branch has changed since the workflow started (e.g., another workflow or manual push occurred).
Solution: The workflow now includes automatic conflict resolution:
- Pull and Rebase: Before pushing, the workflow pulls the latest changes and rebases local commits on top
- Fallback to Merge: If rebase fails (e.g., conflicts), it falls back to a merge strategy
- Retry Logic: If push still fails, it retries up to 3 times with a 2-second delay between attempts
This ensures that the automated workflow can handle concurrent changes without manual intervention.
Implementation details:
# Pull and rebase before pushing
git pull --rebase origin main || {
  # If the rebase fails, abort it and fall back to a merge
  git rebase --abort 2>/dev/null || true
  git pull --no-rebase origin main
}
# Retry push up to 3 times
# (with pull between retries to handle new remote changes)
This same fix has been applied to both:
- .github/workflows/auto_update_data.yml (daily data updates)
- .github/workflows/lyra_plus_ops.yml (weekly Lyra+ dataset updates)
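The retry logic itself lives in the workflow's shell step and is not reproduced in this document. Purely as an illustration of the described behavior (up to 3 attempts, a 2-second delay, and a pull between retries), the same control flow could be expressed in Python as below. The names run_git and push_with_retry are hypothetical.

```python
import subprocess
import time


def run_git(*args):
    """Run a git command and report whether it exited successfully."""
    return subprocess.run(["git", *args], check=False).returncode == 0


def push_with_retry(run=run_git, attempts=3, delay_seconds=2,
                    sleep=time.sleep, branch="main"):
    """Push, re-syncing with the remote and retrying after each failure."""
    for attempt in range(1, attempts + 1):
        if run("push", "origin", branch):
            return True
        if attempt < attempts:
            sleep(delay_seconds)
            # Pick up new remote commits before the next attempt.
            run("pull", "--rebase", "origin", branch)
    return False
```

The run and sleep parameters exist so the loop can be exercised without touching a real repository.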
Push Rejected (Non-Fast-Forward)
Errors like “rejected (non-fast-forward)” or “failed to push some refs” should no longer require manual intervention. The workflows now automatically:
- Pull the latest changes from the remote branch before pushing
- Rebase local commits on top of remote changes
- Handle merge conflicts automatically for data files
- Retry the push after synchronizing
The fix ensures that multiple workflows or manual commits don’t cause push failures.
Best Practices
- Monitor the first few runs to ensure everything works as expected
- Review update reports periodically to catch any issues
- Don’t modify data files manually - let the automation handle it
- Keep dependencies updated in the workflow file
- Test changes locally before pushing workflow modifications
Future Enhancements
Potential improvements:
Support
For issues or questions:
- Open an issue on GitHub
- Check the Actions tab for workflow logs
- Review data_update_report.json for detailed error information