Azure Databricks For Beginners

Azure Databricks is an easy, fast, and collaborative Apache Spark-based analytics platform. It accelerates innovation by bringing data science, data engineering, and business together, making data analytics more productive, more secure, more scalable, and optimized for Azure.

What Is Azure Databricks?

  • Databricks + Apache Spark + enterprise cloud = Azure Databricks
  • It is a fully managed version of the open-source Apache Spark analytics engine, with optimized connectors to Azure storage platforms for fast data access.
  • It offers a notebook-oriented, Apache Spark-as-a-service workspace that makes it easy to explore data interactively and manage clusters.
  • It is a secure, cloud-based machine learning and big data platform.
  • It supports multiple languages, such as Scala, Python, R, Java, and SQL.

What Is Apache Spark?

  • Spark is a unified processing engine that can analyze big data using SQL, graph processing, machine learning, or real-time stream analysis.
  • Spark ML offers high-quality, finely tuned machine learning algorithms for handling big data.

Microsoft Azure Databricks Architecture & Diagram

  • When we launch a cluster via Databricks, a “Databricks appliance” is deployed as an Azure resource in our subscription.
  • We then specify the types of VMs to use and how many, and Databricks handles all other elements.
  • A managed resource group is deployed into the subscription and populated with a VNet, a storage account, and a security group.
  • Once these services are ready, we control the Databricks cluster through the Databricks UI.
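
To make the division of labor above more concrete, here is a minimal sketch of the part we specify ourselves: the VM type and the number of workers. The field names follow the request body of the Databricks Clusters API "create" call as far as we are aware, but the helper build_cluster_spec, the cluster name, the VM size, and the runtime version string are all assumptions chosen for illustration. Check them against your own workspace before using them. The sketch only builds and prints the payload; it does not call any service.

```python
import json


def build_cluster_spec(name, node_type_id, num_workers, spark_version):
    """Build a dict shaped like a Databricks cluster-create request body.

    Hypothetical helper for illustration only: it covers just the
    choices the article says are ours (VM type and VM count), plus a
    name and a runtime version.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,   # the Azure VM size for each node
        "num_workers": num_workers,     # how many worker VMs to start
    }


# All values below are assumed examples, not recommendations.
spec = build_cluster_spec(
    name="beginner-demo",
    node_type_id="Standard_DS3_v2",
    num_workers=2,
    spark_version="13.3.x-scala2.12",
)

print(json.dumps(spec, indent=2))
```

Everything not in this payload, such as the VNet, the storage account, and the security group, is what Databricks sets up for us in the managed resource group.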