The Databricks Data + AI Summit, held in San Francisco this week, drew over 16,000 attendees in person and over 60,000 virtually, from 140 countries. Here are some of the key announcements and takeaways:
Announcements from CEO Ali Ghodsi
- The Delta Lake and Apache Spark projects see over 1 billion downloads per year, and MLflow sees 200 million. Databricks employees have contributed over 12 million lines of open-source code.
- Databricks announced their acquisition of Tabular, whose team includes the creators of Apache Iceberg. Their intent is to help unify the Delta Lake and Iceberg table formats. Delta Lake UniForm is now generally available, and Delta Lake and Iceberg will be 100% interoperable (blog).
- Databricks unveiled their goal of offering a serverless option for all of their products by July 2024. This has been a multi-year project involving hundreds of engineers and a re-architecting of all their services.
- Unity Catalog is now public and open source! (blog)
AI Innovations
- Databricks showcased new AI features powered by their acquisition of MosaicML (renamed Mosaic AI), including advanced model training, no-code fine-tuning, and SQL integration for invoking AI models.
- Support for embedding models like GPT, and for vector search, was announced, allowing users to use these capabilities directly from SQL.
- They announced AI/BI, which brings AI to business intelligence (blog).
- A new “Genie” feature allows users to chat with their data using a Google-like search interface backed by a compound AI agent.
- These agents remember and learn your business.
- They introduced a new “Compound AI” agent framework with an SDK for building real-time AI agents and applications, along with an agent evaluation framework for debugging and improving agents.
- All of this comes with built-in security and governance through Unity Catalog.
- AI delivers class-leading performance without tuning. Benchmarks show that Databricks SQL is one of the most efficient and cost-effective warehouses.
Notable Speakers
- In a keynote at the Databricks Data + AI Summit 2024, Jensen Huang, founder and CEO of NVIDIA, discussed the potential of accelerated computing for data processing and analytics. Sharing the stage with Ali Ghodsi, co-founder and CEO of Databricks, Huang described business data as an untapped “gold mine” for companies willing to harness its value. Huang said, “Data is the new currency, but it’s a disservice to compare it to oil, for it is the very lifeblood of modern enterprises.” (NVIDIA partnership blog)
- Stanford AI researcher Fei-Fei Li gave a compelling talk highlighting advancements in AI agents and robotics. She compared these advancements to the development of advanced vision in living organisms 540 million years ago, which led to the Cambrian explosion.
- Databricks co-founders Matei Zaharia, Reynold Xin, and Patrick Wendell gave insights into the company’s AI strategy and product roadmap.
- Matei Zaharia, co-founder and CTO of Databricks, spoke about the company’s AI strategy and product roadmap related to “Compound AI”, which is a concept developed by the Berkeley Artificial Intelligence Research group. He introduced Databricks’ new agent framework and SDK for building real-time AI agents and applications, along with an agent evaluation framework for debugging and improving these agents. He also open-sourced Unity Catalog and made it public live on stage.
- Reynold Xin, co-founder and Chief Architect, highlighted Databricks’ AI-powered performance improvements, called “Predictive I/O 2.0”. He demonstrated how these enhancements, enabled by Databricks’ shift to a serverless infrastructure, deliver much faster query performance than competitors like Snowflake, especially with a high number of concurrent queries.
- Patrick Wendell, co-founder and VP of Engineering, explained Databricks’ new AI features powered by their acquisition of MosaicML. These include advanced model training capabilities, with over 200,000 AI models trained by Databricks customers in the past year. He also showcased no-code fine-tuning of foundation models, SQL integration for invoking AI models, and the Mosaic AI Gateway for governance and permissions around AI models.
The Data + AI Summit demonstrated Databricks’ strong focus on AI integration, open-source contributions, performance optimization, continuous product improvement, and a better user experience through generative AI capabilities.