A Recap of Databricks Data & AI World Tour, 2024 – New York City

Introduction

On Sept 17, 2024, Infinitive sent representatives to the Databricks Data & AI World Tour in New York City. This one-day conference is one of several held around the globe where Databricks highlights its accomplishments and its vision for the future. For those who attended the Databricks Data & AI Summit in San Francisco on June 9 – 12, the New York event was largely a repeat, although it also highlighted the latest Databricks innovations and plans.

Key points. The following were the key points from the keynote speeches at the Databricks New York Data & AI World Tour:

  • Databricks continues to succeed in the marketplace. The company now has 12,000 customers and holds the top spot in Forrester’s Wave Report for Data Lakehouses. While Snowflake and Google were also in Forrester’s leader zone, Databricks received the highest score on both axes of the report: strength of current offering and strength of strategy.
  • There is still considerable untapped opportunity in modern data management. Databricks cited an MIT study that claimed 99% of enterprise data remains “hidden”.
  • Security and privacy remain issues. Databricks described its commitment to access, governance, and intelligence as the way it is addressing concerns about security and privacy.
  • The three pillars of Databricks’ architecture are the data lakehouse, Unity Catalog, and the Delta UniForm data format.
    • The data lakehouse is a unified data architecture that combines the scalability, flexibility, and low cost of a data lake with the data management and performance features of a data warehouse.
    • Unity Catalog is an open-source unified data governance solution that provides fine-grained access control, centralized metadata management, and data lineage for data across multiple clouds.
    • Delta UniForm (Universal Format) is a Delta Lake feature that automatically generates metadata for other open table formats, such as Apache Iceberg and Apache Hudi, alongside a Delta table’s own metadata. A single copy of the underlying data files can then be read by clients of any of these formats, which supports interoperability between different engines and cloud providers and allows for more efficient data collaboration, sharing, and access control without copying or converting data.
  • GenAI is an extremely important technology that will democratize access to data, making everyone more productive.
  • AI can now handle language very well, and engineering solutions that use it is where the Databricks platform excels. We saw exciting demos of how the platform supports building solutions that “talk” to your data to produce insights and generate code and metadata about that data.

Getting AI to production

There was considerable discussion about the challenges of moving AI from prototype to full-blown production application.

  • 85 – 90% of AI efforts have not made it to production.
  • Hallucinations in GenAI are an ongoing problem, and guardrails are required. Databricks’ own large language model, DBRX, has a guardrail that prevents guessing, thereby reducing hallucinations.
  • Maintaining security and privacy is hard in AI. People cut and paste proprietary data into chatbots like ChatGPT, exposing that data to loss.
  • Governance across the entire data estate is hard; hence the focus on Unity Catalog and Delta UniForm.
  • Democratized data and AI should make “talking to your data” as easy as driving a car. The process should be operationally intuitive.
  • Much of the Databricks philosophy on AI applications can be found in the Berkeley AI Research (BAIR) blog post “The Shift from Models to Compound AI Systems”, co-written by Matei Zaharia, co-founder and CTO of Databricks.

Summary

Databricks is no longer one of the top data platforms on the market. It is now THE top data platform on the market. Its focus on democratizing data and AI, together with its enterprise-wide governance and industry-leading performance, puts Databricks in the lead.