Data Engineering

What is Apache Spark?

Apache Spark is an open-source parallel processing framework for large-scale data processing and analytics.

Where can I use Spark?

Spark is available on multiple platforms, including Azure HDInsight, Azure Databricks, and Microsoft Fabric.

What is the difference between a temporary view and a table in the Spark catalog?

A temporary view is scoped to the current Spark session and is automatically dropped when that session ends, whereas a table is persisted in the catalog and remains available across sessions. Both can be queried using Spark SQL.

How does Spark process large volumes of data quickly?

Spark uses a “divide and conquer” approach: it splits the data into partitions and distributes the work across multiple nodes in a cluster, which process their partitions in parallel.
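
A rough, single-machine analogy of this split, process in parallel, and combine pattern can be sketched in plain Python. This is not how Spark schedules work: threads from `concurrent.futures` stand in for executors, list slices stand in for partitions, and the function names (`split_into_partitions`, `count_words`, `distributed_word_count`) are hypothetical, chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Illustrative analogy only: threads play the role of Spark executors and
# list slices play the role of partitions. All names are hypothetical.


def split_into_partitions(items, num_partitions):
    """Divide: cut the input into roughly equal chunks ("partitions")."""
    size = max(1, -(-len(items) // num_partitions))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Conquer: count words in one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def distributed_word_count(lines, num_workers=4):
    """Split the input, process partitions in parallel, then merge results."""
    partitions = split_into_partitions(lines, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Combine: merge each partition's partial counts into one result.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    data = ["spark splits work", "work runs in parallel", "spark merges results"]
    print(distributed_word_count(data, num_workers=2))
```

In Spark itself, partitioning the data, scheduling tasks, and merging partial results are handled by the engine; the sketch only makes the shape of the computation visible.
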

What languages can be used with Spark?

Java, Scala, R (SparkR), SQL (Spark SQL), and Python (PySpark).

What’s the preferred format for tables in Fabric?

The preferred format is Delta, the table format of Delta Lake, an open-source storage layer that adds relational database semantics to Spark-based data lake processing.

Agnostic Architecture & Concepts