MySQL HeatWave dives into object storage data lakes

Oracle joins the analytics anywhere bandwagon, promises future access to AWS S3


Oracle has launched MySQL HeatWave Lakehouse, an extension to its proprietary analytics platform that lets it query data held in object storage outside the database.

The analytics system, built on top of the open source MySQL database, can query data stored in a variety of file formats in the object store and combine it with data held in MySQL. HeatWave queries the files in the object store directly, without copying the data into the MySQL database, Oracle told us.

The data lake technology supports file formats including CSV, Parquet, and export files from other databases. At the same time, MySQL Autopilot promises to improve performance and scalability without requiring database tuning expertise.

On a 500TB TPC-H benchmark, Oracle claims queries took nine times longer on AWS's Redshift data warehouse and 17 times longer on Snowflake and Databricks than on HeatWave Lakehouse. Google's BigQuery would be 36 times slower, Oracle reckons, though it did not publish comparisons with Teradata, the data warehouse vendor founded in 1979.

The system is only available on Oracle Cloud Infrastructure (OCI), but Nipun Agarwal, senior vice president of MySQL HeatWave, told The Register that Oracle planned to extend the system to query data held in object storage in other clouds, including AWS, Azure and GCP.

"One of the important things to note over here is that data in the object store remains in the object store," he said. "We do not copy data from the object store into the MySQL database. Secondly, the processing of this data, whether it's loading or queried, is done by Heatwave not by the MySQL engine. That's what gives it extreme scalability because the Heatwave cluster can scale up to 500 nodes."

Using analytics engines to query data outside their home database is not new. Snowflake, Cloudera and Google's BigQuery have taken this approach through their support for the Apache Iceberg table format. Similarly, Databricks, Microsoft and SAP have endorsed the Delta Lake table format, an open source format created by Databricks and now hosted by the Linux Foundation.

Commentators and vendors have suggested that most vendors will eventually support most of these formats, including Apache Hudi.

Agarwal said Oracle intends HeatWave to support these formats in the future, starting with Iceberg and Delta Lake.

The Autopilot feature offers schema inference, which helps users determine the data types of files in object storage before the data is analyzed by the query engine.

"We can come up with this mapping, even for files which don't have metadata," Agarwal said. "Autopilot can make these predictions in less than one minute. We invented this technique called adaptive data sampling, which very intelligently scans and samples the file without compromising on the accuracy."
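Oracle has not published how its adaptive data sampling works. As a rough illustration of the general idea of inferring column types from a sample rather than from metadata, here is a hypothetical Python sketch for a header-less CSV file. It draws a uniform sample of rows with reservoir sampling and widens each column's type from int to float to str as sampled values demand. The column names `col_0`, `col_1`, … and all function names are assumptions for illustration; this is not Oracle's technique.

```python
import csv
import io
import random
from typing import Dict, Iterable, List

# Hypothetical illustration only; not Oracle's adaptive data sampling.
# Types ordered from narrowest to widest; a column is widened when a
# sampled value does not fit its current type.
_ORDER = ["int", "float", "str"]


def value_type(field: str) -> str:
    """Classify one CSV field as 'int', 'float' or 'str'."""
    for name, parse in (("int", int), ("float", float)):
        try:
            parse(field)
            return name
        except ValueError:
            pass
    return "str"


def reservoir_sample(rows: Iterable[List[str]], k: int, seed: int = 0) -> List[List[str]]:
    """Keep a uniform random sample of k rows in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[List[str]] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive bounds: 0 <= j <= i
            if j < k:
                sample[j] = row
    return sample


def infer_schema(csv_text: str, sample_rows: int = 1000) -> Dict[str, str]:
    """Map hypothetical column names col_0, col_1, ... to inferred types.

    Assumes a header-less CSV file, i.e. one that carries no metadata.
    """
    rows = csv.reader(io.StringIO(csv_text))
    schema: Dict[str, str] = {}
    for row in reservoir_sample(rows, sample_rows):
        for idx, field in enumerate(row):
            if field == "":
                continue  # empty fields give no evidence about the type
            name = f"col_{idx}"
            current = schema.get(name, "int")
            # Keep whichever of the two types is wider.
            schema[name] = max(current, value_type(field), key=_ORDER.index)
    return schema


if __name__ == "__main__":
    data = "1,3.5,alice\n2,4,bob\n3,5.25,carol\n"
    print(infer_schema(data))
    # {'col_0': 'int', 'col_1': 'float', 'col_2': 'str'}
```

Reservoir sampling still reads every row once; a system aiming for the sub-minute inference Agarwal describes on very large files would more plausibly sample byte ranges of the file instead, and a wrong guess from a small sample would need correcting when the full data is loaded.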

Autopilot also predicts the in-memory representation for a specific data source, the optimal size of the cluster that is needed to compute the data and how long it's going to take to load the data, he said.

Holger Mueller, vice president and principal analyst at Constellation Research, said Oracle had added new features to HeatWave at a rapid pace over the last three years. "The HeatWave team has out-innovated all other cloud databases," he claimed.

The move into object storage was "huge," he added, because it "allows users to bring all the data of the enterprise together – into one single query. It is something enterprises have long waited for."

Meanwhile, the ability to query data in AWS, Azure and GCP object storage would appeal to users who want to work across all their enterprise data using Heatwave, he said.

Like any suite product, HeatWave has to compete with specialist players on each of its individual features. "But, at this point, Oracle is more than good enough," Mueller said. ®
