PAM Derivatives Legacy Message Platform (Azure Synapse)
Metadata-driven message generation platform in Azure Synapse covering ~800 CUSIPs across options, futures, and swaps, extensible to additional asset classes without major platform changes, and validated as a cross-cloud portability POC.
Highlights
- Built a configuration- and Excel-template-driven message engine supporting nested and repeating financial structures
- Generates SMF and transaction messages dynamically from Spark SQL tables with full audit and diagnostics output
- Integrated with Azure Synapse pipelines for scheduled, production-grade execution
- Handles missing data, partial availability, and schema variability safely
Impact
- Generates SMF and transaction messages for ~800 CUSIPs spanning options, futures, and swaps from a single unified platform
- Designed for asset-class extensibility: fixed income (FI), cash, and legacy assets can be onboarded without major platform changes
- Validated as a proof-of-concept for cross-cloud deployment, demonstrating portability across different cloud providers
- Replaced brittle, hardcoded derivatives logic with a reusable metadata-driven architecture
Context
Derivatives processing required complex, hierarchical legacy messages built from many Spark tables, with frequent structure changes and partial data availability. Hardcoding this logic was brittle, slow to change, and risky.
What I Built
A metadata-driven legacy message generation platform that:
- Uses Excel as the message structure control plane
- Uses config sets to control runs, scopes, and environments
- Scans Spark tables dynamically and builds messages at runtime
- Supports:
  - Direct fields
  - Single-field submessages
  - Nested structures
  - Repeating message groups
- Produces:
  - Final business-ready Excel outputs
  - Full summary, diagnostics, and audit reports
The system runs inside Azure Synapse and is orchestrated by pipelines on a daily schedule.
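The four supported field kinds can be illustrated with a minimal sketch in plain Python. This is not the platform's actual code: the template row layout (`path`, `table`, `column`), the `[]` marker for repeating groups, the `build_message` and `set_path` helpers, and all sample table names and values are hypothetical. Plain dictionaries stand in for Excel template rows and Spark tables.

```python
# Hypothetical template rows, standing in for rows read from the Excel template.
TEMPLATE = [
    {"path": "Cusip", "table": "instrument", "column": "cusip"},            # direct field
    {"path": "Currency.Code", "table": "instrument", "column": "ccy"},      # single-field submessage
    {"path": "Underlying.Identifier.Ticker", "table": "instrument",
     "column": "und_ticker"},                                               # nested structure
    {"path": "Legs[].Side", "table": "legs", "column": "side"},             # repeating group
    {"path": "Legs[].Notional", "table": "legs", "column": "notional"},
]

# Made-up source data for one instrument: table name -> list of rows.
TABLES = {
    "instrument": [{"cusip": "000000AA0", "ccy": "USD", "und_ticker": "SPX"}],
    "legs": [
        {"side": "PAY", "notional": 1000000},
        {"side": "RECEIVE", "notional": 1000000},
    ],
}


def set_path(target, keys, value):
    """Write value into nested dicts, creating intermediate levels as needed."""
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = value


def build_message(template, tables):
    """Build one nested message from template rows and source tables."""
    message = {}
    for row in template:
        source_rows = tables.get(row["table"], [])
        path = row["path"]
        if "[]." in path:
            # Repeating group: one list item per source row.
            group, field_path = path.split("[].", 1)
            group_keys = group.split(".")
            parent = message
            for key in group_keys[:-1]:
                parent = parent.setdefault(key, {})
            items = parent.setdefault(group_keys[-1], [])
            for index, source_row in enumerate(source_rows):
                if index == len(items):
                    items.append({})
                set_path(items[index], field_path.split("."),
                         source_row.get(row["column"]))
        elif source_rows:
            # Direct, single-field, and nested fields read the first source row.
            set_path(message, path.split("."), source_rows[0].get(row["column"]))
    return message


message = build_message(TEMPLATE, TABLES)
```

Under these assumptions, adding a field or a new repeating group means adding a template row, with no change to `build_message`.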
Reliability & Scale
- Missing tables and optional fields are handled gracefully
- Partial messages still generate with full diagnostics
- Spark caching and batching are used for performance
- The system scales with cluster size and data volume
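The graceful handling of missing tables and optional fields described above can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the `Diagnostics` class, the `resolve_fields` function, the field-spec keys (`name`, `table`, `column`, `optional`), and the sample data are all hypothetical, and a dictionary of single-row tables stands in for Spark tables.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnostics:
    """Hypothetical diagnostics record collected while resolving fields."""
    missing_tables: list = field(default_factory=list)
    missing_required: list = field(default_factory=list)
    missing_optional: list = field(default_factory=list)


def resolve_fields(field_specs, tables):
    """Resolve each field against the available tables without raising on gaps.

    Returns the values that could be resolved plus a Diagnostics record,
    so a partial message can still be produced and audited.
    """
    values = {}
    diag = Diagnostics()
    for spec in field_specs:
        table = tables.get(spec["table"])
        if table is None:
            # Record each missing table once, then treat the field as absent.
            if spec["table"] not in diag.missing_tables:
                diag.missing_tables.append(spec["table"])
            value = None
        else:
            value = table.get(spec["column"])
        if value is None:
            bucket = diag.missing_optional if spec.get("optional") else diag.missing_required
            bucket.append(spec["name"])
            continue
        values[spec["name"]] = value
    return values, diag


# Made-up example: the option_terms table is unavailable for this run.
specs = [
    {"name": "Cusip", "table": "instrument", "column": "cusip"},
    {"name": "Strike", "table": "option_terms", "column": "strike"},
    {"name": "Exchange", "table": "instrument", "column": "exchange", "optional": True},
]
available = {"instrument": {"cusip": "000000AA0"}}

values, diag = resolve_fields(specs, available)
```

Here `values` holds only the resolvable `Cusip` field, while `diag` records the missing table, the missing required field, and the missing optional field separately. In a Spark session, the dictionary lookup would likely be replaced by a catalog check such as `spark.catalog.tableExists`, available in recent PySpark versions.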
Outcomes
- Standardized derivatives legacy message generation across the platform
- Reduced change risk when message formats evolve, since structure changes are made in the Excel template rather than in code
- Improved operational transparency and audit readiness
- Established a reusable, template-driven generation pattern for future feeds
Why This Matters
This project demonstrates data-platform engineering that goes beyond building pipelines: metadata-driven systems, dynamic schema handling, Spark-native execution, and production-grade orchestration.