
Automated Data Update System

This document describes the automated data update system for the AfCFTA project.

Overview

The system automatically updates data from external sources on a daily schedule and can also be triggered manually. It fetches fresh data from:

Components

1. Data Update Script (backend/update_data_automated.py)

A Python script that:

Usage:

# Run manually
python backend/update_data_automated.py

Features:

2. GitHub Actions Workflow (.github/workflows/auto_update_data.yml)

An automated workflow that:

Schedule: Daily at 2:00 AM UTC (configurable via cron expression)

Manual Trigger:

  1. Go to the repository’s Actions tab
  2. Select “Auto Update Data” workflow
  3. Click the “Run workflow” dropdown
  4. Choose the update type (all, worldbank, production, trade)
  5. Click the “Run workflow” button to start the run
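
The same manual trigger can in principle be issued through the GitHub REST API's workflow dispatch endpoint. The following is a minimal sketch that only builds the request and does not send it. The input name update_type, the default branch main, and the owner, repo, and token parameters are assumptions for illustration; the real input name is whatever the workflow file declares under workflow_dispatch.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, update_type="all", ref="main"):
    """Build (but do not send) a workflow_dispatch request for auto_update_data.yml."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        "/actions/workflows/auto_update_data.yml/dispatches"
    )
    # "update_type" is a hypothetical input name; use the one declared under
    # on.workflow_dispatch.inputs in the workflow file.
    body = {"ref": ref, "inputs": {"update_type": update_type}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; the token needs permission to run Actions workflows on the repository.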

Files Generated

The update process generates/updates the following files:

Data Sources

World Bank API

Fetches the following indicators for all 54 African countries:

Data is fetched for the years 2020-2024 (the five most recent years).
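
As an illustration of what such a request might look like, the sketch below builds a World Bank API v2 URL for a set of countries and one indicator over the 2020-2024 range. It is not taken from update_data_automated.py: the function name, the per_page value, and the example indicator code are assumptions, and the script's actual indicator list should be substituted.

```python
from urllib.parse import urlencode


def worldbank_url(country_codes, indicator, start_year=2020, end_year=2024, per_page=1000):
    """Build a World Bank API v2 URL for several countries and one indicator."""
    # The API accepts multiple ISO country codes separated by semicolons.
    countries = ";".join(country_codes)
    query = urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}", "per_page": per_page},
        safe=":",  # keep the year range readable instead of percent-encoding ":"
    )
    return f"https://api.worldbank.org/v2/country/{countries}/indicator/{indicator}?{query}"


# Example only: NY.GDP.MKTP.CD (GDP, current US$) for Kenya and Nigeria.
print(worldbank_url(["KEN", "NGA"], "NY.GDP.MKTP.CD"))
```

Requesting JSON explicitly and raising per_page keeps the five-year window for a group of countries within a single page in most cases; a real client would still check the pagination metadata in the response.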

Workflow Details

Automatic Execution

The workflow runs automatically every day at 2:00 AM UTC:

schedule:
  - cron: '0 2 * * *'
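
To sanity-check what this cron expression means in practice, the following small sketch computes the next scheduled time for a daily UTC schedule. It is a hand-rolled helper for this one pattern, not a general cron parser, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=2, minute=0):
    """Return the next run strictly after `now` for a daily schedule in UTC."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


print(next_daily_run(datetime.now(timezone.utc)))
```

Scheduled GitHub Actions runs can start later than the nominal time when the service is under load, so this value is a lower bound rather than an exact start time.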

Manual Execution

You can manually trigger the workflow with different update types:

Workflow Steps

  1. Checkout: Checks out the repository
  2. Setup Python: Installs Python 3.11 with pip caching
  3. Install Dependencies: Installs required packages (requests, openpyxl, pandas)
  4. Run Update Script: Executes the data update script
  5. Check Changes: Detects if any data was modified
  6. Commit & Push: Commits changes, pulls latest remote changes (with rebase), and pushes
  7. Generate Summary: Creates a summary in the workflow output
  8. Upload Report: Uploads the update report as an artifact (retained for 30 days)

Note on Step 6: The workflow uses git pull --rebase before pushing to prevent non-fast-forward errors when the remote branch has been updated by another workflow or manual commit. If conflicts occur on data files, the workflow automatically resolves them by preferring the new data (our changes).
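
Step 5 (Check Changes) can be approximated locally. The sketch below assumes the check is equivalent to asking git whether the working tree is dirty; the workflow's actual check may differ, and both function names are hypothetical.

```python
import subprocess


def parse_porcelain(output):
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        # Each line is two status characters, a space, then the path.
        path = line[3:]
        if " -> " in path:
            # Renames are reported as "old -> new"; keep the new path.
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def has_changes(repo_dir="."):
    """Return True if the working tree in repo_dir has modified or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return bool(parse_porcelain(result.stdout))
```

Paths containing unusual characters are quoted by git in this output format; the sketch does not unquote them.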

Error Handling

The system is designed to be resilient:

Monitoring

View Update Reports

Update reports are available in two ways:

  1. Workflow Summary: Each workflow run generates a summary visible in the Actions tab
  2. Artifacts: Detailed JSON reports are uploaded and retained for 30 days

Check Update Status

# View the latest update report
cat data_update_report.json
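
For a quick overview without reading the whole file, a short Python sketch can list the top-level fields of the report. It makes no assumption about which fields exist; the function name is hypothetical.

```python
import json
from pathlib import Path


def summarize_report(path="data_update_report.json"):
    """Map each top-level key of the report to the type name of its value."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(report, dict):
        return {key: type(value).__name__ for key, value in report.items()}
    # The report is not a JSON object; describe the root value instead.
    return {"<root>": type(report).__name__}
```

Calling summarize_report() from the repository root after a run prints nothing by itself; wrap it in print() or inspect the returned dictionary.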

The report includes:

Configuration

Change Update Frequency

Edit the cron schedule in .github/workflows/auto_update_data.yml:

schedule:
  - cron: '0 2 * * *'  # Daily at 2:00 AM UTC

Examples:

Add New Data Sources

To add new data sources, edit backend/update_data_automated.py:

  1. Add a new method to the DataUpdater class
  2. Call the method in the main() function
  3. Update the documentation

Example:

def update_trade_data(self):
    """Update trade statistics"""
    self.log("Updating trade data...")
    # Your implementation here
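
To show how steps 1 and 2 fit together, here is a self-contained sketch. Only the names DataUpdater, log, update_trade_data, and main come from this document; the constructor, the body of log, and the convention of returning True on success are assumptions and may not match the real script.

```python
from datetime import datetime, timezone


class DataUpdater:
    def __init__(self):
        # Assumption: the real class may hold configuration, paths, or a session.
        self.messages = []

    def log(self, message):
        """Record and print a timestamped message (assumed behaviour)."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        line = f"[{stamp}] {message}"
        self.messages.append(line)
        print(line)

    def update_trade_data(self):
        """Update trade statistics"""
        self.log("Updating trade data...")
        # Fetch and write the new data here; return True on success.
        return True


def main():
    updater = DataUpdater()
    # Step 2: call the new method alongside the existing update methods.
    return {"trade": updater.update_trade_data()}


if __name__ == "__main__":
    main()
```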

Integration with Existing Workflows

This workflow complements the existing lyra_plus_ops.yml workflow:

Both workflows work independently and can run concurrently.

Troubleshooting

Workflow Not Running

API Errors

No Data Changes

This is normal if:

Permission Errors

Ensure the workflow has write permissions:

permissions:
  contents: write

Git Push Rejection (Non-Fast-Forward)

Problem: The workflow fails with the error:

! [rejected]        main -> main (non-fast-forward)
error: failed to push some refs

Cause: The remote branch has changed since the workflow started (e.g., another workflow or manual push occurred).

Solution: The workflow now includes automatic conflict resolution:

  1. Pull and Rebase: Before pushing, the workflow pulls the latest changes and rebases local commits on top
  2. Fallback to Merge: If rebase fails (e.g., conflicts), it falls back to a merge strategy
  3. Retry Logic: If push still fails, it retries up to 3 times with a 2-second delay between attempts

This ensures that the automated workflow can handle concurrent changes without manual intervention.

Implementation details:

# Pull and rebase before pushing
git pull --rebase origin main || {
  # If rebase fails, abort and try merge
  git rebase --abort 2>/dev/null || true
  git pull --no-rebase origin main
}

# Retry push up to 3 times
# (with pull between retries to handle new remote changes)
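
The retry step is described above only in prose and comments. As a rough illustration of that logic, the Python sketch below makes up to three push attempts with a 2-second pause and a pull between them. It is not the workflow's actual shell code: the function names are hypothetical, the branch main is assumed, and "up to 3 times" is read here as three attempts in total.

```python
import subprocess
import time


def run_git(*args):
    """Run a git command and report whether it exited successfully."""
    return subprocess.run(["git", *args]).returncode == 0


def push_with_retry(run=run_git, attempts=3, delay=2.0, sleep=time.sleep, branch="main"):
    """Push, and on failure pull (rebase, then merge as fallback) before retrying."""
    for attempt in range(1, attempts + 1):
        if run("push", "origin", branch):
            return True
        if attempt < attempts:
            sleep(delay)
            # Pick up new remote commits before the next attempt.
            if not run("pull", "--rebase", "origin", branch):
                run("rebase", "--abort")
                run("pull", "--no-rebase", "origin", branch)
    return False
```

The run and sleep parameters exist so the loop can be exercised without touching a real repository or waiting between attempts.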

This same fix has been applied to both:

If you see errors like “rejected (non-fast-forward)” or “failed to push some refs”, no manual fix should be needed. The workflows now automatically:

  1. Pull the latest changes from the remote branch before pushing
  2. Rebase local commits on top of remote changes
  3. Handle merge conflicts automatically for data files
  4. Retry the push after synchronizing

The fix ensures that multiple workflows or manual commits don’t cause push failures.

Best Practices

  1. Monitor the first few runs to ensure everything works as expected
  2. Review update reports periodically to catch any issues
  3. Don’t modify data files manually; let the automation handle it
  4. Keep dependencies updated in the workflow file
  5. Test changes locally before pushing workflow modifications

Future Enhancements

Potential improvements:

Support

For issues or questions: