Databricks Interview Questions & Answers
Databricks Runtime
The Databricks Runtime is the set of core components that run on the clusters managed by
Databricks. It consists of the underlying Ubuntu OS, pre-installed languages and libraries
(Java, Scala, Python, and R), Apache Spark, and various proprietary Databricks
modules (e.g. DBIO, Databricks Serverless, etc.).
Azure Databricks offers several types of runtimes and several versions of those
runtime types in the Databricks Runtime Version drop-down when you create
or edit a cluster.
Databricks Light
Databricks Light provides a runtime option for jobs that don’t need
the advanced performance, reliability, or autoscaling benefits
provided by Databricks Runtime. Databricks Light does not support the following features:
Delta Lake
Autopilot features such as autoscaling
Highly concurrent, all-purpose clusters
Notebooks, dashboards, and collaboration features
Connectors to various data sources and BI tools
1. %run notebook_name
With %run, the called notebook runs inline, so its functions and variables become
available in the calling notebook.
2. dbutils.notebook.run()
With dbutils.notebook.run(), you cannot use the called notebook's functions and variables;
you can only pass inputs through the arguments parameter and receive a return value.
In this example we have two notebooks: the 1st notebook calls dbutils.notebook.exit() to pass
back a value, and the 2nd notebook runs the 1st notebook using the dbutils.notebook.run()
method and stores the returned value in a variable, as sketched below.
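A minimal sketch of the two notebooks (the notebook path and argument name are assumptions for illustration):

# 1st notebook (e.g. /Shared/child_notebook): read an input value and return a result
input_value = dbutils.widgets.get("input_value")    # argument supplied by the caller
dbutils.notebook.exit("received: " + input_value)   # value returned to the caller

# 2nd notebook: run the 1st notebook and store the returned value in a variable
result = dbutils.notebook.run("/Shared/child_notebook", 60, {"input_value": "hello"})
print(result)   # prints: received: hello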
External Table.
The table uses the custom directory specified with LOCATION. Queries on the table access existing
data previously stored in the directory. When an EXTERNAL table is dropped, its data is not deleted
from the file system. This flag is implied if LOCATION is specified.
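A hedged example of defining an external table over existing data (the table name and LOCATION path are hypothetical, and the directory is assumed to already contain Delta files):

# Table metadata points at the custom directory; dropping the table leaves the files intact
spark.sql("CREATE TABLE IF NOT EXISTS sales_ext USING DELTA LOCATION '/mnt/raw/sales'")
spark.sql("DROP TABLE sales_ext")   # removes only the metadata, not the data in /mnt/raw/sales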
Managed Table.
Create a managed table using the definition/metadata of an existing table or view. The created table
always uses its own directory in the default warehouse location.
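A minimal sketch using CREATE TABLE LIKE (table names are assumptions, and support for LIKE may depend on the runtime version):

# Managed table created from the definition/metadata of an existing table
spark.sql("CREATE TABLE sales_managed LIKE sales_ext")
# Its files live in the default warehouse location and are deleted when the table is dropped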
We can mount Azure Blob storage containers to DBFS and access them through the DBFS
mount point.
You can mount a Blob storage container or a folder inside a container to Databricks File
System (DBFS) using dbutils.fs.mount. The mount is a pointer to a Blob storage container, so
the data is never synced locally.
Access files in your container as if they were local files, for example:
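A minimal sketch of mounting a container and reading from it (the container, storage account, secret scope, and mount point names are assumptions):

dbutils.fs.mount(
    source="wasbs://mycontainer@mystorageaccount.blob.core.windows.net",
    mount_point="/mnt/mydata",
    extra_configs={
        "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-account-key")
    })

# Files in the container can now be read through the mount point like local DBFS paths
display(dbutils.fs.ls("/mnt/mydata"))
df = spark.read.text("/mnt/mydata/sample.txt")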
Once an account access key or a SAS is set up in your notebook, you can use standard Spark
and Databricks APIs to read from the storage account, for example:
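A sketch of direct access with an account access key set in the Spark configuration (storage account, container, and file names are again hypothetical):

spark.conf.set(
    "fs.azure.account.key.mystorageaccount.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"))

# Read directly from the container using a wasbs:// URL
df = spark.read.csv(
    "wasbs://mycontainer@mystorageaccount.blob.core.windows.net/data/sales.csv",
    header=True)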
There are two types of clusters in Databricks:
a) Standard clusters
b) High concurrency clusters
10) What are the types of workloads we can use in Standard type Cluster?
There are three types of workloads we can use in a Standard type cluster. Those are
a. DATA ANALYTICS
b. DATA ENGINEERING
c. DATA ENGINEERING LIGHT
11) Can I use both Python 2 and Python 3 notebooks on the same cluster?
No. On a single cluster you can use only one Python version, either Python 2 or Python 3;
you cannot use both Python 2 and Python 3 on the same Databricks cluster.
12) What is pool? Why we use pool? How to create pool in Databricks?
A pool is used to reduce cluster start and autoscaling times. You can attach a cluster to a
predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and
worker nodes from the pool. If the pool does not have sufficient idle resources to
accommodate the cluster's request, the pool expands by allocating new instances from the
instance provider. When an attached cluster is terminated, the instances it used are
returned to the pool and can be reused by a different cluster.
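Pools can be created from the Pools page in the workspace UI or through the Instance Pools REST API. A minimal sketch using the REST API (the workspace URL, access token, and pool settings are assumptions):

import requests

resp = requests.post(
    "https://<databricks-instance>/api/2.0/instance-pools/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "instance_pool_name": "demo-pool",
        "node_type_id": "Standard_DS3_v2",            # VM type used by pool instances
        "min_idle_instances": 2,                       # instances kept warm for fast cluster start
        "idle_instance_autotermination_minutes": 30,
    })
print(resp.json())   # returns the new instance_pool_id on success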
One method is creating a variable, assigning it a value, and calling that notebook from
another notebook using %run.
Another method is using widgets: creating text, dropdown, or combobox widgets and reading
their values with the GET method, as sketched below.
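A small sketch of the widgets approach (widget names and default values are hypothetical):

# In the called notebook: define widgets whose values the caller (or the UI) can set
dbutils.widgets.text("name", "defaultName")
dbutils.widgets.dropdown("env", "dev", ["dev", "test", "prod"])
dbutils.widgets.combobox("region", "westus", ["westus", "eastus"])

# Read the widget values with the GET method
name = dbutils.widgets.get("name")
env = dbutils.widgets.get("env")
print(name, env)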