Central Theme
The article introduces Netflix’s Unified Data Architecture (UDA), a system designed to solve the challenges of data management in a complex, growing organization. The core problem is that business concepts like ‘movie’ or ‘actor’ were modeled inconsistently across numerous isolated systems, leading to duplicated effort, data quality issues, and a lack of connectivity. UDA’s goal is to enable teams to “model once, represent everywhere,” creating a single, conceptual source of truth that can be projected into any system.
Key Points & Arguments
- UDA as a Knowledge Graph: UDA is built as a knowledge graph using semantic technologies like RDF and SHACL. It connects abstract business domain models to the concrete data containers (e.g., databases, APIs) where the data actually lives.
- “Upper” Metamodel: At the heart of UDA is “Upper,” a custom language (or metamodel) for formally describing business domains. Domain experts model their concepts in Upper, and this model becomes the central, queryable definition within the knowledge graph.
- Mappings & Projections: UDA uses “mappings” to link the conceptual models to specific data assets (like a GraphQL type or a database table). It can then perform “projections” to automatically generate consistent schemas (e.g., GraphQL, Avro, SQL) and data pipelines for different systems, ensuring they all align with the central domain model.
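- Data Integration & Automation: This architecture moves beyond a simple data catalog. Because the knowledge graph records both what the data means and where it lives, UDA can automate data movement between containers and enable intent-based systems, in which users state what data they need rather than how to retrieve it.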
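The "model once, represent everywhere" idea behind mappings and projections can be illustrated with a minimal Python sketch. This is not UDA's implementation or API: the real system models domains in Upper on top of RDF and SHACL, whereas the `Entity` and `Attribute` classes, the type tables, the `MAPPINGS` dictionary, and the container names below are all hypothetical stand-ins chosen for illustration. The sketch only shows how one conceptual definition could drive several generated schemas.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Attribute:
    """One property of a business concept (hypothetical, not Upper syntax)."""
    name: str
    datatype: str          # one of: "string", "int", "boolean"
    required: bool = True


@dataclass(frozen=True)
class Entity:
    """A business concept such as 'Movie', defined once."""
    name: str
    attributes: Tuple[Attribute, ...]


# Hypothetical type tables: conceptual datatype -> target-system type.
GRAPHQL_TYPES = {"string": "String", "int": "Int", "boolean": "Boolean"}
AVRO_TYPES = {"string": "string", "int": "int", "boolean": "boolean"}


def to_graphql(entity: Entity) -> str:
    """Project the conceptual model into a GraphQL type definition."""
    lines = [f"type {entity.name} {{"]
    for attr in entity.attributes:
        suffix = "!" if attr.required else ""
        lines.append(f"  {attr.name}: {GRAPHQL_TYPES[attr.datatype]}{suffix}")
    lines.append("}")
    return "\n".join(lines)


def to_avro(entity: Entity) -> dict:
    """Project the same model into an Avro record schema."""
    fields: List[dict] = []
    for attr in entity.attributes:
        avro_type = AVRO_TYPES[attr.datatype]
        if attr.required:
            fields.append({"name": attr.name, "type": avro_type})
        else:
            # Optional attributes become a nullable union with a null default.
            fields.append(
                {"name": attr.name, "type": ["null", avro_type], "default": None}
            )
    return {"type": "record", "name": entity.name, "fields": fields}


# Hypothetical mappings: concept name -> data containers that hold it.
MAPPINGS: Dict[str, List[str]] = {
    "Movie": ["graphql:Movie", "warehouse:content.movie"],
}

# The single conceptual definition that both projections share.
movie = Entity(
    "Movie",
    (
        Attribute("title", "string"),
        Attribute("releaseYear", "int", required=False),
    ),
)

if __name__ == "__main__":
    print(to_graphql(movie))
    print(json.dumps(to_avro(movie), indent=2))
    print(MAPPINGS["Movie"])
```

Because both schemas are derived from the same `movie` definition, a change to the concept (for example, adding an attribute) propagates to every projection, which is the consistency property the article attributes to UDA.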
Significant Conclusions & Takeaways
UDA represents a foundational shift in how Netflix’s Content Engineering handles data. By creating a unified, semantic layer over disparate systems, it achieves several key benefits:
- Consistency: Ensures that business concepts have a single, authoritative definition across the organization.
- Discoverability: Allows users to find data using familiar business terms rather than technical table or field names.
- Interoperability: Enables systems to understand and work with each other’s data through a shared semantic framework.
- Automation: Reduces manual effort by automatically generating schemas and data movement pipelines.
Two early adopter systems demonstrate UDA’s value: PDM (Primary Data Management), which manages controlled vocabularies, and Sphere, a self-service operational reporting tool that lets business users generate complex reports without writing SQL.
Mentoring Question
In your organization, how many different systems or teams have their own definition for a core business concept like “Customer” or “Product”? What challenges has this created, and how could a centralized, conceptual model like UDA help resolve them?
Source: https://netflixtechblog.com/uda-unified-data-architecture-6a6aee261d8d