Databricks promises cheap cloud data warehousing at an eighth of the cost of rivals

Inertia of embedded BI and analytics a limiting factor, however


Databricks, the company born out of the Apache Spark boom, has let loose a raft of updates at its San Francisco conference, including an elastic compute option for analytics.

Databricks SQL Serverless, available in preview on AWS, has been designed to improve query performance and concurrency of BI and analytics workloads on messy data lake repositories.

The move is part of the company's plan to bring data lakes and data warehouses together on one system: the proverbial “lakehouse”, the coinage du jour achieving currency among vendors and commentators alike.

Databricks is also announcing an update to Photon, its query engine for lakehouse systems, making it available in Databricks Workspaces, the environment where users view their Databricks assets. It is releasing open source connectors for Go, Node.js, and Python to simplify access to a lakehouse from operational applications. Meanwhile, query federation in Databricks SQL is set to let users query data in PostgreSQL, MySQL, and Amazon Redshift from Databricks.

Announced last year, Databricks SQL Serverless is designed to provide instant compute to users for their BI and SQL workloads. The company promised “minimal management” and capacity optimizations to lower overall cost by an average of 40 per cent.

Joel Minnick, Databricks marketing VP, told The Register that SQL Serverless will allow users coming into Databricks to “go from start up to query in three seconds”.

It also makes economic sense in the cloud, he said. “You truly only pay for what you use on a data warehouse and workloads. That is a real game-changer in terms of the cost of getting these workloads done.”

Minnick said the nearest cloud data warehouse rival would be eight times more expensive than Databricks for these types of workloads.

The serverless option would also help address user concurrency on analytics queries, an area where data lakes have attracted criticism.

Minnick said Databricks had already made progress on this issue, and that the vast majority of enterprise data warehousing concurrency needs would be met by Databricks SQL.

Hyoun Park, CEO and chief analyst at Amalgam Insights, says Databricks SQL Serverless makes it easier to support large amounts of distributed data in a cost-effective manner. “It is a response to other vendors providing serverless SQL offerings such as Azure SQL or CockroachDB, but this should also allow Databricks customers to more easily support multi-region and hybrid multi-cloud environments. From a practical perspective, this move makes it easier for potential Databricks customers to use a lakehouse without the significant challenges of manual resource management that can potentially occur as data gets bigger and faster from many different sources to multiple different destinations.”

Park says SQL Serverless could also help users solve the concurrency challenge, but more work will be required to fully address it.

“Realistically, this concurrency issue will probably require a bit of compromise: Databricks customers should ideally partition and structure data across multiple instances to avoid concurrency issues and Databricks will need to continue working on methods to accelerate queries, duplicate resources, cache results, and provide other resource and analytic workarounds as Databricks is used on smaller amounts of data.”

However, he cautions that while Databricks' price-performance claims against rivals could stand up, they depend on a lot of different variables. "It's hard to tell if they're making an apples-to-apples comparison without license and hardware and labor specs. I'd advise users to do their own total cost of ownership analysis to support this claim."

Since it introduced the lakehouse concept in 2020, Databricks has seen some competition spring up.

In April, Google announced a preview of BigLake on Google Cloud, a data lake storage service that it claims can remove data limits by combining data lakes and data warehouses.

In February this year, tiny Californian startup Onehouse won $8m in seed funding in the hope of growing a business worthy of taking on the giants of data engineering. Its other aim is to make data lake projects faster, cheaper and easier than before. And not to be outdone, Snowflake has also announced support for unstructured data in its data warehouse platform.

Park said the challenge for Databricks was scaling quickly in areas where there was already massive competition. “For instance, shifting to build apps on Databricks when embedded BI and analytics have been around for decades is a significant process shift,” he said.

“Although the new generation of analytics is often reduced to a 'Databricks vs. Snowflake' matchup, this simplistic view ignores that the practical use cases for each vendor differ based on their historical approaches to data, open-source, analytic processing, and semi-structured data.

“Although enterprise data demands are pushing Databricks and Snowflake product maps closer together, Databricks is a platform fundamentally better suited to the future of real-time analytic data across multiple varieties of data structures and formats while Snowflake is well structured for fast adoption based on the current era of rapid use of structured data for data marts and warehouses,” he said.

Park said the market for software vendors addressing analytic data problems was in a “Cambrian explosion” phase. “As these shifts occur, it would not be surprising to see new vendors arise to take on both Databricks and Snowflake by solving problems associated with hybrid cloud, multi-modal data, networking, storage, compute, low-code application development, or administration,” he said. ®