Data Engineering

  1. What is Apache Spark ?

    It’s an open-source parallel processing framework for large-scale data processing and analytics.

  2. Where can I use Spark ?

    Spark is available on multiple platforms, including Azure HDInsight, Azure Databricks and Microsoft Fabric.

  3. What is the difference between a temporary view and a table in the Spark Catalog ?

    A temporary view exists only for the current Spark session and is deleted automatically when the session ends, while a table is persisted in the catalog and remains available across sessions. Both can be queried using Spark SQL.

  4. How does Spark process large volumes of data quickly ?

    Spark uses a “divide and conquer” approach: it splits the data into partitions and distributes the work across multiple machines in a cluster, which process the partitions in parallel.

  5. What languages can be used with Spark ?

    Java, Scala, Python (PySpark), R (SparkR) and SQL (Spark SQL).

  6. What’s the preferred format for tables in Fabric ?

    The preferred format is Delta, the format used by Delta Lake, a relational data technology on Spark.
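
The “divide and conquer” idea from question 4 can be illustrated without a cluster. The sketch below is plain Python, not Spark: threads on one machine stand in for the machines of a cluster, and the function names (`split_into_partitions`, `count_words`, `distributed_word_count`) are made up for this illustration. It only shows the pattern of splitting the data, processing each piece independently, and then combining the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(items, num_partitions):
    """Split items into num_partitions roughly equal, contiguous chunks."""
    size, extra = divmod(len(items), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional item.
        end = start + size + (1 if i < extra else 0)
        partitions.append(items[start:end])
        start = end
    return partitions


def count_words(lines):
    """Work done independently on one partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, num_workers=3):
    """Count words by processing partitions in parallel, then merging."""
    partitions = split_into_partitions(lines, num_workers)
    # Threads stand in for cluster machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # combine the partial results
    return total


lines = [
    "spark splits the data",
    "each worker processes one partition",
    "the driver combines the results",
    "spark scales out",
]
result = distributed_word_count(lines)
print(result["spark"])  # 2
print(result["the"])    # 3
```

In Spark itself the partitioning, scheduling and merging are handled by the framework, so user code usually describes only the per-partition work and the way results are combined.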