NBA Analytics App

[Screenshot: NBA Analytics App landing page]

A data-driven NBA analytics platform built on a custom backend pipeline that ingests real-time sports data, transforms it into structured datasets, and applies projection models to evaluate player prop opportunities.

Technologies Used:

AWS Lambda, Amazon EventBridge, Terraform, Amazon S3, DuckDB, Parquet, TypeScript

Project Links:

Live link not available

GitHub repository not available

Overview:

The sports betting space is heavily data-driven, but most tools either rely on surface-level stats or require manual analysis. I set out to build a system that could ingest real-time NBA data and turn it into actionable insights for evaluating player prop bets.

Challenges:

The first version of the pipeline relied on web scraping, which created recurring problems:

  • Inconsistent data formats across different websites.
  • Mismatched schemas and missing fields.
  • Unreliable or delayed updates.
  • More time spent cleaning data than building features.

Implementation:

To address these issues, I redesigned the system around a more robust, API-driven architecture.

Data Source and Ingestion: Transitioned from web scraping to a paid API to ensure structured, reliable data. Built automated ingestion pipelines using AWS Lambda triggered by EventBridge schedules. Managed infrastructure and scheduling through Terraform using CLI-based workflows.
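
As a rough illustration, an ingestion function along these lines fetches a payload and drops it, untouched, into the raw layer. This is a minimal sketch assuming the raw layer lives in S3; the API endpoint, environment variable names, and key layout are placeholders rather than the app's actual contracts.

```typescript
// Minimal ingestion Lambda sketch. Endpoint, env vars, and key layout
// are illustrative assumptions, not the app's real configuration.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import type { ScheduledHandler } from "aws-lambda";

const s3 = new S3Client({});

export const handler: ScheduledHandler = async () => {
  // Hypothetical paid sports-data API endpoint and key.
  const res = await fetch("https://api.example-sportsdata.com/v1/nba/games/today", {
    headers: { "x-api-key": process.env.SPORTS_API_KEY! },
  });
  if (!res.ok) throw new Error(`API request failed: ${res.status}`);
  const payload = await res.text();

  // Write the unmodified response into the raw layer, keyed by date,
  // so it can be audited or reprocessed later.
  const date = new Date().toISOString().slice(0, 10);
  await s3.send(
    new PutObjectCommand({
      Bucket: process.env.RAW_BUCKET!, // bucket name assumed
      Key: `raw/games/${date}.json`,
      Body: payload,
      ContentType: "application/json",
    })
  );
};
```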

Data Architecture: Implemented a dual-layer data model where the Raw Layer stores unmodified API responses for auditing and reprocessing, while the Analytics Layer transforms data into clean, query-optimized tables. This separation allowed safe transformations while preserving data integrity and traceability.
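
A small sketch of what that boundary can look like in practice: a typed mapping from an as-received raw record into a normalized analytics row. The field names on both sides are assumptions for illustration, not the app's actual schema.

```typescript
// Sketch of the raw -> analytics transformation step.
interface RawPlayerGame {
  // Unmodified API response fields (names assumed).
  PlayerID: number;
  Name: string;
  GameDate: string;
  Minutes: number | null;
  Points: number | null;
}

interface PlayerGameRow {
  // Clean, query-optimized analytics-layer row.
  playerId: number;
  playerName: string;
  gameDate: string; // normalized ISO date
  minutes: number;  // missing values normalized to 0
  points: number;
}

function toAnalyticsRow(raw: RawPlayerGame): PlayerGameRow {
  return {
    playerId: raw.PlayerID,
    playerName: raw.Name.trim(),
    gameDate: raw.GameDate.slice(0, 10),
    minutes: raw.Minutes ?? 0,
    points: raw.Points ?? 0,
  };
}
```

Because the raw layer is never mutated, a bug in `toAnalyticsRow` can be fixed and the analytics tables rebuilt from the original responses.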

Modeling and Analysis: Developed baseline projection models using historical player performance, then calculated expected value (EV) by comparing model outputs against market odds. Structured the system to support future enhancements such as more advanced models and feature engineering.
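
The EV step itself is a small calculation once a model probability and a market price are in hand. Here is a sketch using the standard American-odds conversion; the function names are illustrative, not the app's API.

```typescript
// Expected value of a 1-unit bet, given a model win probability and
// American market odds. Formulas are standard; names are illustrative.
function americanToDecimal(odds: number): number {
  return odds > 0 ? 1 + odds / 100 : 1 + 100 / Math.abs(odds);
}

function expectedValue(modelWinProb: number, americanOdds: number): number {
  const profitIfWin = americanToDecimal(americanOdds) - 1; // net profit per unit staked
  return modelWinProb * profitIfWin - (1 - modelWinProb);
}

// Example: model says 55% to clear the line, book offers -110.
const ev = expectedValue(0.55, -110);
console.log(ev.toFixed(3)); // "0.050" -> about +0.05 units per unit staked
```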

Research & Backtesting System

As the project matured, I expanded it from a real-time stats and props dashboard into a research platform for testing NBA betting hypotheses against historical data.

I built a backtesting layer that replays strategy ideas against historical player game logs and point-in-time feature datasets. The system supports:

  • Pure TypeScript strategy functions.
  • S3-hosted Parquet feature files.
  • DuckDB-powered research runners.
  • JSONL result artifacts and manifests.
  • Dashboard-ready summary reports.
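
A condensed sketch of how these pieces might fit together, with the feature schema, strategy logic, and file path all assumed for illustration; the real runner's contracts may differ.

```typescript
// Sketch of a pure strategy function plus a DuckDB research run.
// Row shape, column names, and paths are illustrative assumptions.
import duckdb from "duckdb";

interface FeatureRow {
  playerId: number;
  gameDate: string;       // ISO date
  last5PointsAvg: number; // point-in-time feature: scoring form over prior 5 games
  pointsLine: number;     // prop line offered that day
  actualPoints: number;   // outcome, used only to grade the bet afterward
}

type Signal = { playerId: number; gameDate: string; side: "over" };

// Strategies are pure functions over point-in-time features: no I/O,
// no access to anything that wasn't knowable before tip-off.
const hotScorerOver = (row: FeatureRow): Signal | null =>
  row.last5PointsAvg > row.pointsLine + 2
    ? { playerId: row.playerId, gameDate: row.gameDate, side: "over" }
    : null;

const db = new duckdb.Database(":memory:");
// In the real system this would be an s3:// path read via DuckDB's httpfs extension.
db.all("SELECT * FROM read_parquet('player_games.parquet')", (err, rows) => {
  if (err) throw err;
  let signals = 0;
  let wins = 0;
  for (const row of rows as FeatureRow[]) {
    if (!hotScorerOver(row)) continue;
    signals++;
    if (row.actualPoints > row.pointsLine) wins++; // the "over" hit
  }
  // One JSONL line per run: the kind of result artifact a dashboard can summarize.
  console.log(
    JSON.stringify({ strategy: "hotScorerOver", signals, hitRate: signals ? wins / signals : null })
  );
});
```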

This lets me evaluate questions like whether recent scoring form, minutes trends, or role changes would have produced useful signals across a full season before trusting those ideas in a live props workflow.

For a deeper technical breakdown, I wrote about the full progression here: Building a Backtesting System for My NBA Analytics App

Key Decisions:

  • API over scraping: prioritized reliability and consistency over cost savings.
  • Raw to analytics pipeline: enabled safer transformations and easier debugging.
  • Server-side ingestion with Lambda: decoupled data processing from the frontend for scalability.
  • Automated scheduling with EventBridge: ensured continuous, hands-off data updates.

Tradeoffs:

  • Using a paid API increased operational cost but significantly improved data quality.
  • Serverless ingestion with Lambda simplified scaling but introduced complexity in debugging and observability.
  • Separating data layers improved safety and flexibility at the cost of added architectural complexity.

Learnings:

  • Data quality is foundational: unreliable inputs make all downstream analysis questionable.
  • Schema design should come first: investing early in data modeling prevents costly refactors.
  • Separation of concerns is critical: isolating raw and processed data improves both safety and flexibility.
  • Confidence in your pipeline matters more than model complexity.
  • Gained hands-on experience with real-world tooling, including AWS Lambda, EventBridge, and Terraform, in a production-style workflow.

Related Blog Posts:

  • Building a Backtesting System for My NBA Analytics App