Snowflake vs. Databricks: a breakdown

Which data platform fits best with the needs of your organization?
Two of the most dynamic and fastest-growing companies in the big data world, Snowflake and Databricks, were built around innovative concepts.
Platform values
Wavicle insights
Snowflake’s values of speed, scalability, and sharing are built throughout. For a rapidly expanding
organization that needs to handle a significant number of concurrent workloads and share data across
multiple partners efficiently and securely, Snowflake is a strong choice.
For Databricks, the foundation of data science is evident in the platform's value pillars. Databricks is suited for a wide variety of machine learning use cases. Organizations focused on scalable data engineering, collaborative data science, and transforming large volumes of unstructured data should be intrigued by Databricks.
Architecture
Architecture features
Snowflake:
▪ Snowflake is available on AWS, Azure, or GCP
▪ Data is stored in Snowflake storage
▪ Data can be accessed from S3, Azure Blob Storage, or GCS
Databricks:
▪ Databricks is a native component of Azure and is also available on AWS
▪ Delta Lake sits on top of your existing data lake, delivering reliability, security, and performance
Wavicle insights
Both platforms can be spun up on AWS, Azure, and GCP. Snowflake does not require any pre-planning or maintenance to start, eliminating the need for a database administrator in many cases. It automatically runs across three availability zones and allows for replication to an alternate cloud.
Fully elastic autoscaling, a hallmark feature of Snowflake, means increasing or decreasing the size of an instance can be completed easily.
When creating a Databricks cluster, there are three cluster modes: Standard, High Concurrency, and Single Node. Deciding which cluster mode to use can be a challenge for the user, but it is the key to managing cost and performance.
Databricks also features autoscaling, leveraging reporting statistics to add or remove workers in the cluster.
To use and maintain Databricks, users need to have some level of knowledge surrounding cloud infrastructure
components and how they work together.
Snowflake's architecture means a rapid rollout to start, with a high level of automation. This makes it a great choice for an organization that may not have the initial bandwidth or expertise in the platform.
The customizable clustering options of Databricks are very attractive, but they require strong competency in the platform, and users must balance cost against performance during configuration.
Wavicle insights
Both platforms are leading the collision of traditional data warehouses and data lakes, and as their capabilities overlap, the lines between the two categories blur. Organizations that don't have the time or resources for setup, maintenance, and support of servers should consider Snowflake.
If management of a data lake and data warehouse is an issue for an organization, Databricks can help solve
the problem, along with its advanced analytics and AI/ML capabilities.
Access
Snowflake:
Democratized data access and simplified, controllable data governance are hallmark features of Snowflake. Its flexibility and security policies are designed to boost innovation. Governance is designed into the platform, featuring access control for accounts and users, column-level security, row access policies, audit logging of access history, and object tagging of sensitive data for compliance, discovery, protection, and resource usage.
Databricks:
Databricks provides access control down to the storage layer by leveraging AWS security controls within the platform. At the same time, Databricks provides access control for compute resources, API provisioning and permission management, and audit logging with AWS CloudTrail and Amazon CloudWatch.
Wavicle insights
Snowflake's emphasis on democratized access and security is a big plus for the platform. However, that strength comes with a trade-off: operational governance and compute costs can be difficult to manage. With easier control of compute resources, Databricks provides more transparent costs and relies on AWS for its security functions.
If an organization needs day one access to sensitive data across various units at scale, Snowflake is a great
choice. If more efficiently managed spend and familiar AWS features are appealing, Databricks can be quickly
operationalized.
Pipeline
Databricks Auto Loader
Before: ingestion pipelines were stitched together from notification services, message queues, stream and batch jobs, delayed schedules, external triggers, and Airflow file sensors, a setup that gets too complicated for multiple jobs.
After: Auto Loader pipes data from cloud storage into Delta Lake as it arrives, and its "set and forget" model eliminates complex setup.
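As a rough sketch, the "set and forget" model looks like the following; the storage paths, table name, and checkpoint locations are placeholders, and the snippet assumes a Databricks cluster where `spark` is already in scope:

```python
# Sketch: Auto Loader streaming ingestion (runs only on a Databricks cluster).
# "cloudFiles" is the Auto Loader source; all paths below are placeholders.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                  # incoming file format
    .option("cloudFiles.schemaLocation", "/tmp/schema")   # schema tracking dir
    .load("s3://my-bucket/landing/")                      # watched location
)

# Write each newly arriving file into a Delta table as it lands.
(stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints")
    .toTable("bronze.events"))
```

Once started, the stream discovers and loads new files without external triggers or file sensors.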
Pipeline integrations
Snowflake:
Snowflake features Snowpipe integrations with the following cloud storage services. Snowpipe loads data into Snowflake as soon as the data is available in the staging layer.
▪ Amazon Web Services: Amazon S3
▪ Google Cloud Storage
▪ Microsoft Azure Blob Storage
▪ Microsoft Azure Data Lake Storage Gen2
▪ Microsoft Azure General-purpose v2

Snowpipe cloud storage support by Snowflake account host (Amazon Web Services, Google Cloud Platform, Microsoft Azure):

Storage service                    | AWS | GCP | Azure
Amazon S3                          |  ✔  |  —  |  —
Microsoft Azure Blob Storage       |  ✔  |  —  |  ✔
Microsoft Data Lake Storage Gen2   |  ✔  |  —  |  ✔
Microsoft Azure General-purpose v2 |  ✔  |  —  |  ✔

Storage service                    | AWS | GCP | Azure
Amazon S3                          |  ✔  |  ✔  |  ✔
Google Cloud Storage               |  ✔  |  ✔  |  ✔
Microsoft Azure Blob Storage       |  ✔  |  ✔  |  ✔
Microsoft Azure General-purpose v2 |  ✔  |  ✔  |  ✔
Microsoft Data Lake Storage Gen2   |  ✔  |  ✔  |  ✔

Databricks:
Databricks automates streaming data ingestion and transformation with StreamSets. The partnership provides a fast, easy-to-use drag-and-drop interface that allows users to design, test, and monitor batch and streaming ETL pipelines without the need for coding or specialized skills.

Databricks + StreamSets:
▪ StreamSets Control Hub: multi-tenant, CI/CD, provenance
▪ StreamSets Transformer: visual ETL and push-down transformations on Delta Lake
▪ Databricks: unified data analytics engine; reliable data lakes at scale
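As an illustrative sketch of the Snowpipe side, an auto-ingesting pipe over an S3 stage is defined with standard Snowflake DDL; the account, credentials, bucket, and object names below are placeholders, submitted here through the snowflake-connector-python package:

```python
# Sketch: define an auto-ingest Snowpipe (all names/credentials are placeholders).
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="LOAD_WH", database="RAW", schema="PUBLIC",
)
cur = conn.cursor()

# External stage over the S3 landing bucket (bucket name is hypothetical).
cur.execute("""
    CREATE STAGE IF NOT EXISTS landing_stage
      URL = 's3://my-bucket/landing/'
      CREDENTIALS = (AWS_KEY_ID='...' AWS_SECRET_KEY='...')
""")

# The pipe copies files into the target table as soon as S3 event
# notifications announce their arrival in the stage.
cur.execute("""
    CREATE PIPE IF NOT EXISTS landing_pipe AUTO_INGEST = TRUE AS
      COPY INTO raw_events FROM @landing_stage FILE_FORMAT = (TYPE = 'JSON')
""")
```

With AUTO_INGEST enabled, no scheduler is involved; loading is driven entirely by cloud storage event notifications.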
Wavicle insights
Both platforms are designed for fast, easy, multi-source ingestion. ELT/ETL tools like Matillion, Talend, and SnapLogic can be used with both platforms to easily ingest and migrate data.
The multitude of ingestion capabilities on both platforms means excellent flexibility across the major cloud providers. Customers of Amazon, Microsoft, or Google should be comfortable with either platform.
Performance
Snowflake:
In head-to-head comparisons conducted by independent companies, with minimal configuration or tuning, Snowflake outperformed other cloud data warehouses on query time and related costs, operating as an almost serverless solution.
Databricks:
According to the Transaction Processing Performance Council, Databricks SQL is now the record holder for data warehouse performance. For data scientists, Databricks' performance clusters allow large-scale batch data processing and real-time stream processing. Its ML, deep learning, and graph analysis capabilities are exactly what you would expect from the founders of Apache Spark and MLflow.
Wavicle insights
As both platforms continue to improve at a rapid pace, performance will remain a subject of debate. While tests may show contradictory results, they are heavily influenced by use case, system configuration, code, and the structure of the underlying data.
Both platforms are top-of-class performers. For pure query speed, Snowflake's near-serverless solution sets the standard. The large-batch-processing and ML performance of Databricks makes it the pinnacle of data science performance.
Data sharing
Snowflake:
Another one of the pillars for the creation of Snowflake was data sharing. Data can be cloned quickly within one or more data warehouses, which allows for sharing data without copying or moving it.
Secured data sharing across Snowflake objects:
▪ Tables
▪ External tables
▪ Secure views
▪ Secure materialized views
▪ Secure user-defined functions
Databricks:
Databricks in its current form does not allow cloning of data, only copying. With the introduction of Delta Sharing, Databricks users can share secured, real-time, large datasets across products. This allows sharing of any data set in Delta Lake or Apache Parquet formats.
Wavicle insights
The exciting addition of the first version of Delta Sharing is a major upgrade in this category for Databricks, but it's still limited in scope. Future plans call for sharing objects such as streams, SQL views, or arbitrary files.
Snowflake's cloning and widespread data sharing capabilities make it a great choice for organizations that need to share data with a wide variety of partners, vendors, or customers.
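To make the Snowflake side concrete, here is a minimal sketch of a zero-copy clone and a secure share; the table, view, share, and account names are hypothetical, and the statements are submitted through the snowflake-connector-python package:

```python
# Sketch: zero-copy clone plus a secure share (all names/credentials are placeholders).
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...")
cur = conn.cursor()

# Zero-copy clone: a new table that references the same underlying
# storage, so no data is physically copied or moved.
cur.execute("CREATE TABLE customers_dev CLONE customers")

# Share a secure view with a partner account without copying data.
cur.execute("CREATE SHARE partner_share")
cur.execute("GRANT USAGE ON DATABASE sales TO SHARE partner_share")
cur.execute("GRANT USAGE ON SCHEMA sales.public TO SHARE partner_share")
cur.execute("GRANT SELECT ON VIEW sales.public.orders_v TO SHARE partner_share")
cur.execute("ALTER SHARE partner_share ADD ACCOUNTS = partner_org.partner_acct")
```

The consuming account then creates a read-only database from the share; the provider retains full control and can revoke access at any time.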
Data format
Snowflake:
Snowflake handles structured and unstructured data natively. Semi-structured JSON, Avro, ORC, Parquet, or XML can be loaded into a single field. The Query API allows parsing unstructured data at speed and scale.
Databricks:
Databricks' default data format is Parquet, and all data stored in the Delta Lake is in Parquet format. Databricks can read semi-structured data like JSON. The combination of Databricks and Labelbox can handle unstructured data effectively, and with Sparser, Databricks users can rapidly parse unstructured data formats in Apache Spark.
Wavicle insights
Both platforms are able to handle a wide range of data formats. This really comes down to preference and
experience.
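As a minimal sketch of Snowflake's "single field" approach, a whole JSON document can be loaded into one VARIANT column and queried with path notation; the table and column names are hypothetical, and the statements are submitted through the snowflake-connector-python package:

```python
# Sketch: semi-structured JSON in a single VARIANT column (placeholder credentials).
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...")
cur = conn.cursor()

# One VARIANT column holds the entire JSON document.
cur.execute("CREATE TABLE IF NOT EXISTS events (payload VARIANT)")
cur.execute("""
    INSERT INTO events
    SELECT PARSE_JSON('{"user": {"id": 7, "name": "Ada"}, "action": "login"}')
""")

# Path notation plus a cast drills into the document like a column.
cur.execute("SELECT payload:user.name::STRING FROM events")
```

No upfront schema for the document is required; the structure is resolved at query time.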
BI and visualization
Snowflake:
Snowflake is compatible with several BI and visualization tools, such as Tableau, Power BI, and ThoughtSpot.
Databricks:
Databricks comes with built-in BI functionality, but it is not the platform's strongest feature. It is compatible with tools like Tableau and ThoughtSpot for analyzing data lakes at scale.
Wavicle insights
Both platforms integrate well with leading BI and visualization tools. There isn't a distinct advantage for either, unless you need to handle significant numbers of concurrent users, in which case Snowflake would be the better choice.
AI/ML
Snowflake:
Snowflake is designed to support machine learning through tight integrations with Spark, R, Qubole, and Python. Snowflake's performance means scaling up or down, and the platform also takes on data curation responsibilities, reducing data-related burdens on ML tools.
Databricks:
Managed MLflow, built on top of MLflow, Databricks' open-source platform, manages the complete ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry, with enterprise reliability, security, and scale.
Managed MLflow spans the lifecycle: data versioning, logging and experiment tracking, model registry and serving, and metric tracking, supported by the Feature Store, the ML Runtime, ML deployment, and production logging and monitoring.
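A minimal sketch of the experiment-tracking slice of that lifecycle, using the open-source mlflow package; the run name, parameter, and metric values are purely illustrative:

```python
# Sketch: log one training run's parameters and metrics with MLflow.
import mlflow

with mlflow.start_run(run_name="demo-run"):
    mlflow.log_param("learning_rate", 0.01)   # hyperparameter of this run
    mlflow.log_metric("rmse", 0.42)           # evaluation result of this run

# On Databricks the run appears in the workspace's experiment UI;
# run locally, it is recorded in an ./mlruns directory.
```

Runs logged this way can later be compared, reproduced, and promoted through the model registry.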
Wavicle insights
Databricks was designed from its creation to be the most powerful, efficient, and collaborative environment for machine learning, and that remains true. Even with the introduction of a model like Snowpark for additional developer languages, Databricks is still the premier platform for AI/ML. Organizations with a strong need for ML in their workloads should look to Databricks or a combination of the two.
ML integration
Snowflake:
Snowflake can access code directly from Jupyter notebooks or JAR files from within the platform.
Databricks:
For the more hands-on-the-keys crowd, Databricks has built-in ML functionality for Jupyter and notebooks.
Wavicle insights
The built-in ML functionalities of Databricks make it the most efficient and collaborative environment for developers who make heavy use of ML.
UI
Because Snowflake's platform is meant for a wide variety of end users, its UI is easier to navigate. Databricks, by contrast, is designed for function over form.
Scalability
Snowflake:
Storage, compute, and services are independently elastic. Users can spin up separate virtual warehouses instantly to support ETL, ELT, and BI workloads with no resource contention.
Databricks:
Users can enable clusters to auto-scale based on workload, with serverless pools to deal with concurrency.
Wavicle insights
For scalability, each platform has very distinct characteristics. As mentioned, the independent elasticity of
Snowflake creates a top-of-class model for scalability and for organizations where it’s a top priority,
Snowflake is a strong choice.
Additional features
Snowflake:
▪ Time travel to query data from different points in time
▪ Clone and restore data from tables, schemas, or entire databases for a point in time
▪ Restore tables from a point in time or before updates were made
▪ Geo-spatial data for calculating distance is built into Snowflake
Databricks:
▪ Supports Python, Scala, R, and SQL out of the box
▪ Optimized for machine and deep learning
▪ Manage a machine learning pipeline
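The time travel and restore bullets above can be sketched as follows; the table and schema names are hypothetical, and the statements are submitted through the snowflake-connector-python package:

```python
# Sketch: Snowflake Time Travel and object restore (placeholder names/credentials).
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...")
cur = conn.cursor()

# Query the table as it looked one hour ago (offset is in seconds).
cur.execute("SELECT * FROM orders AT(OFFSET => -3600)")

# Clone an entire schema as of a point in time, without copying data.
cur.execute("""
    CREATE SCHEMA sales_backup CLONE sales
      AT(TIMESTAMP => DATEADD(day, -1, CURRENT_TIMESTAMP())::TIMESTAMP_LTZ)
""")

# Recover a dropped table while it is still within the retention period.
cur.execute("UNDROP TABLE orders_archive")
```

How far back these statements can reach depends on the account's configured data retention period.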
Snowflake:
▪ Usage based on a combination of time and compute
▪ Auto-scaling and increasing VM sizing during SQL processing can streamline costs
Databricks:
▪ Minimal users model – lower cost
▪ Enterprise-level users – higher cost
▪ Auto-scaling configurations
Interoperability
Despite the differences, Snowflake and Databricks have a high level of interoperability. Snowflake can read data from Databricks for analysis and visualization, while Databricks fills the role of a connector that can read and process data within the platform and push results to Snowflake. In an ideal world, organizations across the board could utilize both platforms for their respective advantages.
Wavicle insights
Organizations across various industries utilize both platforms for their distinct advantages. This "best of both worlds" stack sets up data engineers and data scientists alike with fast, scalable, and collaborative environments. Wavicle has experience implementing this powerful stack for clients.
The choice
Well, the truth is that it will take much more than a guide to determine which platform, Snowflake or
Databricks, is the best fit for your organization. Many organizations leverage both platforms for their unique
capabilities in a powerful stack.
Each platform is a pathway for storing, ingesting, transforming, and analyzing data. Regardless of which way
you are leaning, Wavicle can help you make the best possible choice based on your business strategy and goals.
With our deep technical expertise and our proprietary accelerators, we migrate data quickly and integrate
Snowflake, Databricks, or both into your technology stack.
Are you looking to add Snowflake, Databricks, or both into your organization? Our expert cloud consultants
bring proven experience with each to ensure you get the most out of the platforms.
It's time to grow with us
Learn how Wavicle can help you choose and implement the data analytics architecture that will meet your business goals.