branber.io

MMA Almanac AI

An ML fight-prediction engine with three XGBoost models, probability calibration, and an OpenAI-powered fight-article generator.

Last pushed Nov 2025 · Python, Shell, Makefile, Dockerfile

About this project

What it is

The prediction engine for the MMA Almanac platform. It reads fighter and fight-statistics data from PostgreSQL, engineers a rich feature set, then trains three separate XGBoost gradient-boosted models: one that predicts the fight winner, one that predicts the method of victory (KO/TKO, submission, decision), and one that predicts the finishing round. Each model goes through probability calibration and is validated against an explicit four-phase data-leakage test suite. A separate module uses OpenAI's API to generate readable fight-preview articles keyed to upcoming cards. The whole engine is served as a FastAPI prediction API, deployed on AWS ECS, with scheduled retrain and hyperparameter-tune tasks managed by EventBridge and Lambda.
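As an illustration of what "calibrated" means here, the sketch below computes expected calibration error (ECE): predictions are grouped into confidence bins, and each bin's average predicted probability is compared with the observed win rate. This is an assumption for illustration only; the project description does not say which calibration method or metric the engine uses, and the function name `expected_calibration_error` is hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float],
    outcomes: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Weighted mean gap between predicted probability and observed frequency.

    probs    -- predicted probability of the positive class (e.g. fighter A wins)
    outcomes -- 1 if the positive class occurred, else 0
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width bin; p == 1.0 goes in the last bin.
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_confidence = sum(p for p, _ in bucket) / len(bucket)
        observed_rate = sum(y for _, y in bucket) / len(bucket)
        # Weight each bin's gap by the share of predictions it holds.
        ece += (len(bucket) / total) * abs(mean_confidence - observed_rate)
    return ece


# Five picks at 80% confidence, four of them correct: well calibrated.
well_calibrated = expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0])

# Two picks at 90% confidence, both wrong: badly overconfident.
overconfident = expected_calibration_error([0.9, 0.9], [0, 0])
```

A score near zero means the stated confidence matches how often the prediction comes true; a larger score means the probabilities over- or under-state the model's accuracy.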

Engineering highlights

  • Three specialist XGBoost models (Win, Method, and Round predictors), each with its own feature-engineering and preprocessing pipeline
  • Probability calibration on all three models to produce reliable confidence scores
  • Four-phase data-leakage test suite: feature leakage, temporal leakage, position bias, and prediction bias
  • OpenAI-powered ArticleGenerator that drafts fight-preview articles for upcoming cards
  • FastAPI prediction API deployed on AWS ECS Fargate with ECR container registry
  • Scheduled retrain and hyperparameter-tune tasks via EventBridge and Lambda triggers
  • Prisma schema shared with the scrapers so every service works against the same database schema
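Two of the leakage checks listed above can be sketched in a few lines. The code below is a hedged interpretation, not the project's actual test suite: it assumes "temporal leakage" means a training fight dated on or after the earliest test fight, and "position bias" means the win probability changes depending on which fighter is listed first. The names `has_temporal_leakage`, `max_position_bias`, and the toy `ratings` predictor are all hypothetical.

```python
from datetime import date
from typing import Callable, Dict, Iterable, Sequence, Tuple


def has_temporal_leakage(
    train_dates: Iterable[date], test_dates: Iterable[date]
) -> bool:
    """True if any training fight is dated on or after the earliest test fight.

    Same-day fights count as leakage here, which is a conservative assumption.
    """
    train = list(train_dates)
    test = list(test_dates)
    if not train or not test:
        return False
    return max(train) >= min(test)


def max_position_bias(
    predict_win: Callable[[str, str], float],
    matchups: Sequence[Tuple[str, str]],
) -> float:
    """Largest |P(a beats b) + P(b beats a) - 1| across the matchups.

    An order-independent model scores 0, because swapping the two fighters
    should yield complementary probabilities.
    """
    return max(
        (abs(predict_win(a, b) + predict_win(b, a) - 1.0) for a, b in matchups),
        default=0.0,
    )


# Toy stand-in for a trained model (assumption): win chance from made-up ratings.
ratings: Dict[str, float] = {"fighter_a": 3.0, "fighter_b": 1.0}


def fair_predictor(first: str, second: str) -> float:
    return ratings[first] / (ratings[first] + ratings[second])


def biased_predictor(first: str, second: str) -> float:
    # Gives the first-listed fighter an unearned 5-point boost.
    return min(1.0, fair_predictor(first, second) + 0.05)


matchups = [("fighter_a", "fighter_b")]
fair_bias = max_position_bias(fair_predictor, matchups)
boosted_bias = max_position_bias(biased_predictor, matchups)
clean_split = has_temporal_leakage([date(2024, 1, 1)], [date(2024, 6, 1)])
leaky_split = has_temporal_leakage([date(2024, 7, 1)], [date(2024, 6, 1)])
```

In a real suite, checks like these would typically run as assertions with a small tolerance on the bias value, so that a retrain fails fast if the split or the feature ordering regresses.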

Stack

Python, XGBoost, scikit-learn, FastAPI, OpenAI API, Prisma, PostgreSQL, Docker, AWS ECS Fargate, AWS EventBridge, AWS Lambda

Part of the MMA Almanac system

This repo is one service in the four-part MMA Almanac platform. The system diagram below shows how the scrapers, ML engine, web UI, and AWS infrastructure fit together.