This is part 1 of a three-part blog series. Part 1 explains what a data mesh is and how it addresses current data management challenges, part 2 covers how to implement a data mesh, and part 3 shares critical success factors for a sustainable data mesh solution.
With the ever-growing data deluge, organizations need a flexible data analytics framework to enable tangible business outcomes – be it better customer experiences or even new business models. The need for a flexible analytics framework is driven by four distinct business drivers:
- Business Climate: Globalized, fluid with rapid changes
- Demographic Changes: Multi-generational, virtual, self-service, driven by new motivations
- Organizational Dynamics: Transformational, adaptive/responsive to change
- Technology: Big Data, Data Science, AI/ML
Although virtually all firms acknowledge that data is their most important asset, many fail to realize its potential because data silos proliferate in their legacy environments, and they lose opportunities for speed, scalability, and agility as a result. Unlocking this data presents an immense opportunity for customer-centric innovation in a market where disruption is the rule, maximizing operational efficiency and delighting customers with an enhanced user experience. Per McKinsey, organizations with top customer experience see an increase of three percent in growth rate and fifteen percent in revenue.
Most organizations are data rich but insight poor. Per Gartner, only twenty percent of analytic insights deliver business value. Despite the abundance of data, many one-off systems are built for well-thought-out business use cases but fail to scale across the enterprise. In addition, companies lack a sustainable model for understanding how data is sourced, analyzed, and used to drive business outcomes. Only half of chief data officers (49.5%) have primary responsibility for data within their firms, and only a third characterize the CDO role as “successful and established.”
Data quality and its fitness for purpose are also questionable. Firms lose an average of $12.9 million per year due to poor data quality, and per McKinsey, data quality problems account for up to twenty-six percent of operational costs. A typical analyst spends seventy-five percent of their time simply preparing the data. The primary problems include the lack of a holistic data strategy (both offense- and defense-focused), a myopic focus, a shortage of relevant talent, a heavy systems orientation, data not being treated as an enterprise product, and the lack of a common data language across the enterprise.
Organizations use diverse tools – data warehouses, data lakes, and lakehouses – and each has its own focus and objectives. Refer to our blog The Data Lakehouse – Simple, Flexible, and Cost Efficient.
The drawback of current data management tools is the central, enterprise-level data pipeline. As the needs of producers and consumers multiply, the enterprise-level pipeline delays timely, data-driven decisions and therefore hurts an organization’s agility. The growth of cloud computing and increasingly heterogeneous environments (e.g., operational data stores, data warehouses, data lakes, and/or lakehouses) mean organizations need mechanisms to simplify access to and management of data. Data is not “owned” by anyone; it is merely processed in the pipeline. The activity-oriented team structure forces data engineers, who sit between producers and consumers, to struggle not only to keep data pipelines aligned with the proliferation of data sources but also to get the “right” data quickly enough to meet growing consumer needs. This approach does not scale well: it compromises business agility, data quality, or both.
Data mesh is essentially an organizational approach, not a tool or product you can buy from a vendor, for business domain experts to deliver data as a product to data science and BI teams. That said, technology is a critical enabler: today, cloud service providers offer a range of self-serve data services that can form the foundational data platform for a data mesh. Data mesh is a new approach to sourcing, managing, and accessing data, and principally a different way of handling analytical data so that organizations can innovate at scale; it has been called a “data platform version of microservices.” It relies on data warehouse, data lake, and/or data lakehouse platforms to help standardize data delivery. Data mesh introduces new ideas around “data product thinking” and how it can drive a more cross-functional approach to business domain modeling and the creation of high-value data products. Data mesh can be applied within a single cloud service provider (CSP) or in a multicloud/hybrid environment.
Per Zhamak Dehghani, who coined the term “data mesh,” a data mesh is “a decentralized sociotechnical approach in managing and accessing analytical data at scale” and rests on the four guiding principles listed below:
- Domain-oriented ownership (“owned”) – this calls for data “owners” to understand the data and its management requirements, permissible uses, and limitations.
- Data as a product (“productized”) – treats data as a product whose relevance, currency, and applicability drive the organization’s agility. Data products can take a variety of forms (e.g., analytics, data sets, models, algorithms, APIs).
- Self-serve data platform (“discovered”) – supported by a team of data product and platform owners who collaborate to define common “rules” so that data products can be discovered, addressed, self-described, secured, trusted, and made interoperable.
- Federated computational governance (“governed”) – with a federated governance model, individual business units/domains can leverage a common set of tools, technologies, and processes. Federated governance empowers data product owners to make management and usage decisions about their data, enforces those decisions when data is shared, and provides clear visibility into where data is shared across the enterprise. The sketch after this list shows how these principles might be expressed in a simple data product contract.
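To make these principles a bit more concrete, here is a minimal, hypothetical sketch of a data product descriptor in Python. The field names, the `DataProduct` class, and the toy governance check are illustrative assumptions, not a standard data mesh schema or any vendor’s API; a real contract would be far richer.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, minimal data product descriptor; field names are illustrative only.
@dataclass
class DataProduct:
    name: str                       # addressable, unique product name
    domain: str                     # owning business domain (domain-oriented ownership)
    owner: str                      # accountable data product owner
    description: str                # self-describing documentation
    output_ports: List[str] = field(default_factory=list)  # e.g., table, API, or file endpoints
    classification: str = "internal"                        # governance tag (e.g., public, confidential)
    sla_freshness_hours: int = 24                           # product-level quality/SLA commitment

    def complies_with(self, allowed_classifications: List[str]) -> bool:
        """Toy federated-governance check: is this product's classification permitted?"""
        return self.classification in allowed_classifications


# Example: a customer-domain product published with its contract
orders = DataProduct(
    name="customer-orders",
    domain="customer",
    owner="customer-domain-team@example.com",
    description="Curated, de-duplicated order events for analytics",
    output_ports=["s3://customer-domain/orders/curated/", "orders_curated (warehouse view)"],
    classification="confidential",
    sla_freshness_hours=4,
)

print(orders.complies_with(["internal", "confidential"]))  # True
```

The point of the sketch is simply that ownership, product metadata, discoverability, and governance rules all travel with the product itself rather than living inside a central pipeline team.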
Data discovery is an important capability of a data mesh, and a graph database is well suited for managing the metadata needed to answer complex, ad hoc questions about data and its relationships. The sketch below illustrates the idea.
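As a rough illustration, the following Python snippet uses an in-memory networkx graph to stand in for a graph database behind a data mesh catalog. The node names, attributes, and relationships are hypothetical; the takeaway is that discovery and lineage questions become simple graph traversals.

```python
import networkx as nx

# Tiny, illustrative metadata graph: domains, data products, and a consumer.
g = nx.DiGraph()
g.add_node("customer", kind="domain")
g.add_node("customer-orders", kind="data_product", owner="customer-domain-team")
g.add_node("churn-model-features", kind="data_product", owner="data-science-team")
g.add_node("churn-dashboard", kind="consumer")

# Edges capture ownership and lineage relationships.
g.add_edge("customer", "customer-orders", rel="owns")
g.add_edge("customer-orders", "churn-model-features", rel="feeds")
g.add_edge("churn-model-features", "churn-dashboard", rel="feeds")

# Ad hoc discovery question: what does the dashboard ultimately depend on?
upstream = nx.ancestors(g, "churn-dashboard")
print(upstream)  # {'customer', 'customer-orders', 'churn-model-features'}
```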
Data mesh is best suited for organizations that need to scale their data usage and advance their digital transformation initiatives. With data mesh, data remains decentralized while the metadata (data access, discovery, transformation, integration, security, governance, lineage, and orchestration) is centralized. A decentralized data architecture can be implemented in hybrid/multicloud environments where application workloads reside across multiple clouds. Refer to our blog on Centralized vs. Federated Digital Transformations: Which Approach Will You Choose?
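A minimal sketch of that split, assuming hypothetical product names and storage locations: the central catalog records only where each domain-owned product lives and who stewards it, while the data itself stays in the owning domain’s own cloud account.

```python
# Illustrative only: centralized metadata, decentralized data.
catalog = {
    "customer-orders": {
        "domain": "customer",
        "location": "s3://customer-domain/orders/curated/",  # stays in the domain's AWS account
        "steward": "customer-domain-team@example.com",
    },
    "claims-history": {
        "domain": "claims",
        "location": "abfss://claims@claimsdomain.dfs.core.windows.net/history/",  # another cloud
        "steward": "claims-domain-team@example.com",
    },
}

def discover(product_name: str) -> dict:
    """Consumers query the central catalog, then read data directly from the owning domain."""
    return catalog[product_name]

print(discover("customer-orders")["location"])
```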
The drawbacks include the skills gap (business domain experts often lack data engineering skills) and the potential for misalignment when one domain’s data definitions conflict with another’s.
Data fabric, like data mesh, aims to solve data access, management, and exposure in a heterogeneous data environment. Two key differences: (a) data mesh supports federated governance with data distributed across business domains, whereas data fabric supports centralized governance; and (b) data mesh relies on human experts/SMEs, whereas data fabric supports no-code/low-code integration with AI/ML.
Each of the data management tools and approaches – e.g., data warehouses, data lakes, lakehouses, data mesh/fabric – adds immense value. Selecting a specific tool or approach depends on your organization’s specific business objectives and goals, your organizational culture, and your current and future-state technology strategy.
Per Infinitive’s extensive research and experience, data warehouse investment dominates across enterprises and continues to grow. In addition, we see the potential for new investments in data lakes and data lakehouses to thrive due to the growing need for real-time analytics on big data. Several organizations have already adopted or are implementing data mesh (e.g., Capital One, JPMC, Disney).
If you are a data-driven organization building new AI/ML products in the cloud, or if valuable data held in on-prem systems needs to be holistically integrated with the cloud, we recommend evaluating the best approach moving forward. The next blog will cover the data mesh framework and the high-level steps to implement data mesh.
Infinitive has implemented several of the data mesh technology enablers using AWS, Snowflake, and Databricks.
At Infinitive, we have the know-how to help your organization “get the value out of your data.” For more information on how to implement any of these data management tools and/or approaches in your business, contact us today.