CIO.com

big data as a service (BDaaS)

By Cameron Hashemi-Pour

What is big data as a service (BDaaS)?

Big data as a service (BDaaS) is the delivery of data platforms and tools by a cloud provider to help organizations process, manage and analyze large data sets so they can generate insights to improve business operations and gain a competitive advantage.

Companies generate immense amounts of unstructured, semistructured and structured data on a regular basis. Big data as a service lets them use third-party providers' data management systems and IT skills to free up organizational resources that would otherwise be devoted to on-premises systems. BDaaS can be dedicated systems and software running in the cloud or a contract for a managed service that a cloud vendor hosts and operates.

BDaaS is a form of cloud computing, similar to software as a service, platform as a service and infrastructure as a service. In addition to using the data processing frameworks and associated tools at the core of these cloud services, BDaaS relies on cloud storage to maintain data sets and provide the user organization with access to them.

Benefits of big data as a service

In the past, large enterprises often installed big data systems in on-premises data centers. These systems combined various open source technologies to fit an organization's particular big data applications and use case needs. More recently, deployments have shifted to the cloud because of its potential advantages. The following are some of the benefits of big data as a service:

Challenges of big data as a service

Despite its many benefits for enterprises, BDaaS isn't foolproof, and services that aren't managed correctly can create headaches. Some of the potential drawbacks to be aware of include the following:

Key elements of BDaaS offerings

The big three cloud platform vendors each offer big data technology bundles and services: Amazon EMR from Amazon Web Services (AWS), Google Cloud Dataproc and Microsoft's Azure HDInsight. A sampling of other big data as a service vendors includes Cloudera, Databricks, HPE, IBM, Oracle and Qubole.

The competing BDaaS platforms provide different combinations of open source big data software. Common core technologies include the Hadoop distributed processing framework, Spark processing engine, Hive data warehouse software and Python, R and Scala programming languages. The following are examples of tools that are often included as standard or optional components:

Data can be stored in the Hadoop Distributed File System (HDFS), which is one of Hadoop's core components, or in cloud-based object storage services like Amazon Simple Storage Service, Google Cloud Storage and Microsoft Azure Blob Storage. BDaaS platforms can also connect to data warehouse and data lake environments, such as Azure Data Lake Storage, Delta Lake, Iceberg and Snowflake.
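In practice, the choice between HDFS and a cloud object store often shows up in nothing more than the URI that a job reads from or writes to. The following is a minimal sketch of that idea, not part of any vendor's API: the `describe_location` helper, the bucket and host names, and the scheme-to-service mapping are illustrative assumptions based on the URI schemes commonly used by Hadoop-compatible storage connectors. Check the exact scheme your platform expects before relying on it.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to storage system. These schemes are
# commonly used by Hadoop-compatible connectors; individual platforms
# may support only some of them.
STORAGE_BY_SCHEME = {
    "hdfs": "Hadoop Distributed File System",
    "s3": "Amazon Simple Storage Service",
    "s3a": "Amazon Simple Storage Service",
    "gs": "Google Cloud Storage",
    "wasb": "Microsoft Azure Blob Storage",
    "wasbs": "Microsoft Azure Blob Storage",
    "abfs": "Azure Data Lake Storage",
    "abfss": "Azure Data Lake Storage",
}


def describe_location(uri: str) -> dict:
    """Split a data set URI into storage system, authority and path.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    if scheme not in STORAGE_BY_SCHEME:
        raise ValueError(f"Unrecognized storage scheme: {scheme!r}")
    return {
        "storage": STORAGE_BY_SCHEME[scheme],
        # Bucket name, NameNode host:port, or container@account, depending on scheme.
        "authority": parsed.netloc,
        "path": parsed.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Hypothetical locations; none of these buckets or hosts are real.
    for uri in (
        "hdfs://namenode.example.internal:8020/warehouse/events",
        "s3a://example-bucket/warehouse/events",
        "gs://example-bucket/warehouse/events",
    ):
        print(describe_location(uri))
```

Because the same job code can often point at either kind of location, a cluster that keeps its data in object storage can be shut down and recreated without losing the data sets themselves.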

BDaaS market trends

While the BDaaS market is primarily focused on public cloud deployments, users can install the AWS, Google and Microsoft platforms in their own data centers and other on-premises facilities. Added support is available to run the big data services on each vendor's hybrid cloud platform -- AWS Outposts, Google Anthos and Azure Stack, respectively. Using those technologies, organizations can set up private clouds or mix public cloud and in-house systems in their big data environments.

All three vendors have tied their BDaaS platforms to Kubernetes services. These enable organizations to use the popular container management framework to create containerized big data applications, which can help simplify deployments, streamline infrastructure management and optimize the use of system resources.
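As a rough illustration of what a containerized big data application looks like, the sketch below assembles a `spark-submit` command that targets a Kubernetes cluster, using options from open source Spark's Kubernetes support. It only builds the argument list and does not run anything. The `build_spark_submit` helper, the API server address, the image name and the application path are all hypothetical, and the vendors' managed Kubernetes integrations typically provide their own submission tools, which may differ from this.

```python
def build_spark_submit(api_server, image, app_path,
                       namespace="default", executors=2):
    """Return a spark-submit argument list for a Kubernetes cluster.

    Hypothetical helper: it only assembles arguments and never runs them.
    """
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return [
        "spark-submit",
        # The k8s:// prefix tells Spark to use Kubernetes as the cluster manager.
        "--master", f"k8s://{api_server}",
        # Cluster mode runs the driver in a pod inside the cluster.
        "--deploy-mode", "cluster",
        "--name", "bdaas-example",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        # Container image used for the driver and executor pods.
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.executor.instances={executors}",
        app_path,
    ]


if __name__ == "__main__":
    # All values below are placeholders, not real endpoints or images.
    command = build_spark_submit(
        api_server="https://k8s.example.internal:6443",
        image="registry.example.internal/spark-app:latest",
        app_path="local:///opt/app/job.py",
        executors=4,
    )
    print(" ".join(command))
```

The point of the sketch is that the application and its dependencies travel in a container image, while executor count and namespace become configuration values rather than properties of a fixed cluster.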

AWS, Google and other BDaaS vendors are now emphasizing Spark and other technologies over Hadoop, which was initially at the center of their offerings and the big data ecosystem. That reflects a broader decline in Hadoop's standing vs. Spark as a batch processing engine, although Hadoop's YARN cluster resource management software and HDFS continue to be widely used.
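For readers unfamiliar with the batch processing model involved, the sketch below walks through the map, shuffle and reduce phases of the MapReduce pattern that Hadoop popularized, using a word count. It is a single-process illustration in plain Python under simplifying assumptions, not the Hadoop or Spark API; the function names are made up for this example. In a real cluster, each phase is spread across many machines.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    lines = ["big data", "Big data as a service"]
    print(word_count(lines))
    # {'big': 2, 'data': 2, 'as': 1, 'a': 1, 'service': 1}
```

Spark supports the same style of computation but can keep intermediate results in memory across steps, which is one commonly cited reason for its popularity in batch workloads.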


13 Mar 2024

All Rights Reserved, Copyright 2007 - 2024, TechTarget