Every organisation on this planet has amassed data from decades, sometimes centuries, of operations, and that data is its most precious asset: the only asset that grows in value while nearly everything else it owns declines in value. The nitty-gritty of which columns in which tables hold which records, a discipline the database industry invented some 50 years ago, is the crux of a multi-hundred-billion-dollar industry. But this nitty-gritty, this metadata, applies to far more than tables and columns in a database. An organisation's stored data represents the investment a company, or even an individual, has made over time: a record of the things that get bought, sold, hired, fired, praised and denounced. In some circles that data now has a name: the data graph. Recently Databricks, an innovator in open-source data and artificial intelligence (AI) technologies, opened the floodgates by announcing that it had open-sourced Unity Catalog, one of the most ambitious efforts yet to stack the deck in favour of open, interoperable data workloads. The move is, above all, a shot across the bow at competitors such as Snowflake.
Unity Catalog has long been Databricks' answer to making all of an organisation's data intelligible, usable and governed. Out of the box, it manages, audits and shares data from a central place. Until now, however, Unity Catalog was a proprietary product, which limited how widely it could be adopted and integrated. By open-sourcing Unity Catalog under the Apache 2.0 licence and publishing an OpenAPI specification so that others can implement it, Databricks stands to change how enterprises think about data usage, storage and accessibility.
The open-sourced Unity Catalog reflects Databricks' focus on giving users the flexibility and interoperability they need most. Users can plug the catalog into the tools and technologies of their choice, including data lake and data lakehouse query engines such as Databricks, Apache Spark, Amazon Athena and Snowflake, as well as engines that work with Delta Lake and Apache Iceberg tables, such as Synapse Analytics, Dalet and MicroStrategy. Enterprises no longer need to be locked into a particular vendor to manage their register of data and AI assets.
Now that Unity Catalog is open source, it not only addresses the need for more flexible data governance but also sets a new standard for cross-engine interoperability of data workloads. A key piece is UniForm (Delta Lake Universal Format), which Databricks created to abstract away the differences between open table formats. It lets a Delta Lake table also be read as an Apache Iceberg or Apache Hudi table by generating the corresponding metadata, so users can query the same data from any engine that supports one of those formats without the headache of continually managing copies and conversions across heterogeneous technology stacks.
Unity Catalog OSS was released shortly after Snowflake announced Polaris Catalog, its open-source catalog implementation for Apache Iceberg. As we get to know Unity Catalog over time, we'll see whether it truly reflects Databricks' commitment to the data lakehouse philosophy. Databricks stresses independence from any single data management technology, and this is the key difference between Snowflake's Polaris and Databricks' Unity Catalog: Unity Catalog supports multiple data-format standards and can govern a broad range of asset types, spanning structured, semi-structured and unstructured data.
The real innovation that Unity Catalog brings is that it puts data and AI assets under open-source management. This opens up a full ecosystem of compatible engines and tools that can be shared by enterprises, allowing the community to collectively advance the future of data interoperability.
Still, as companies prepare for more turbulence in an increasingly digital economy, open, modular and interoperable data infrastructure is more necessary than ever. By opening up Unity Catalog, Databricks is showing leadership in meeting that need.
Open source is a philosophy and a movement that encourages the free release and distribution of software, together with its source code, to enable collaboration, improve quality and let diverse users build better solutions. Open source allows companies such as Databricks to build a vibrant community in which anyone can contribute, improve and distribute code, extending the software's usefulness to a wider variety of industries and use cases. In turn, this model of development catalyses a self-improving process, grounded in communal effort, that can progress faster and at a larger scale than would otherwise be possible.
© 2024 UC Technology Inc. All Rights Reserved.