  1. Apache Spark™ - Unified Engine for large-scale data analytics

    Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

  2. Documentation | Apache Spark

    Apache Spark™ Documentation. Setup instructions, programming guides, and other documentation are available for each stable version of Spark below.

  3. PySpark Overview — PySpark 4.0.1 documentation - Apache Spark

    Spark Connect is a client-server architecture within Apache Spark that enables remote connectivity to Spark clusters from any application. PySpark provides the client for the Spark … (a minimal Spark Connect client sketch appears after this list).

  4. Spark SQL & DataFrames | Apache Spark

    Spark SQL includes a cost-based optimizer, columnar storage and code generation to make queries fast. At the same time, it scales to thousands of nodes and multi-hour queries using … (a short SQL/DataFrame sketch follows this list).

  5. Overview - Spark 4.0.1 Documentation

    If you’d like to build Spark from source, visit Building Spark. Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a …

  6. Downloads - Apache Spark

    Spark docker images are available from Docker Hub under the accounts of both The Apache Software Foundation and Official Images. Note that these images contain non-ASF software …

  7. Quick Start - Spark 4.0.1 Documentation

    To follow along with this guide, first, download a packaged release of Spark from the Spark website. Since we won’t be using HDFS, you can download a package for any version of …

  8. SparkR (R on Spark) - Spark 4.0.1 Documentation

    To use Arrow when executing these, users need to set the Spark configuration ‘spark.sql.execution.arrow.sparkr.enabled’ to ‘true’ first. This is disabled by default. (A Python analogue of this setting is sketched after this list.)

  9. Data Types - Spark 4.0.1 Documentation

    All data types of Spark SQL are located in the pyspark.sql.types package. You can access them with from pyspark.sql.types import * (a schema-building sketch follows this list).

  10. Structured Streaming Programming Guide - Spark 4.0.1 …

    Types of time windows. Spark supports three types of time windows: tumbling (fixed), sliding and session. Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time … (a sketch of all three follows this list).
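
For result 3 (PySpark Overview): a minimal sketch of connecting to a Spark Connect server from Python. The address sc://localhost:15002 is an assumption; 15002 is only the conventional default port, and a server must already be running there.

```python
# Minimal Spark Connect client sketch. Assumes a Spark Connect server is
# already running; sc://localhost:15002 is an assumed address (15002 is
# only the conventional default port).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .remote("sc://localhost:15002")  # connect to a remote Spark cluster
    .getOrCreate()
)

# The returned session behaves like a regular SparkSession; the work runs
# on the remote cluster rather than in this client process.
df = spark.range(5)
df.show()
```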
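
For result 4 (Spark SQL & DataFrames): a small sketch showing the SQL and DataFrame entry points side by side. The view name, column names, and sample rows are made up for illustration.

```python
# Spark SQL / DataFrame sketch: register a DataFrame as a temporary view
# and query it with SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-example").getOrCreate()

df = spark.createDataFrame([("alice", 34), ("bob", 45)], ["name", "age"])
df.createOrReplaceTempView("people")

# Spark SQL's optimizer plans this query the same way it plans the
# equivalent DataFrame expression.
spark.sql("SELECT name FROM people WHERE age > 40").show()
```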
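
For result 8 (SparkR): the snippet's setting is SparkR-specific, so as a Python stand-in this sketch enables the analogous PySpark option, spark.sql.execution.arrow.pyspark.enabled, which is likewise disabled by default. The substitution is an assumption made to keep the examples in one language; it is not the SparkR API itself.

```python
# Enable Arrow-based columnar transfer for PySpark <-> pandas conversions.
# spark.sql.execution.arrow.pyspark.enabled is the PySpark analogue of the
# SparkR setting quoted in the result above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

# With Arrow enabled, toPandas() (and createDataFrame from pandas) uses
# columnar transfer instead of row-by-row serialization.
pdf = spark.range(10).toPandas()
print(pdf.head())
```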
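
For result 9 (Data Types): a short sketch of importing types from pyspark.sql.types and using them to declare an explicit schema. The column names are invented for the example.

```python
# Build an explicit schema from pyspark.sql.types and apply it when
# creating a DataFrame.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("name", StringType(), nullable=True),
    StructField("age", IntegerType(), nullable=True),
])

df = spark.createDataFrame([("alice", 34), ("bob", 45)], schema)
df.printSchema()
```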
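
For result 10 (Structured Streaming): a sketch of the three window types the snippet names, written against a rate source so it is self-contained. The 10-minute/5-minute durations and the user grouping column are arbitrary choices for illustration.

```python
# Tumbling, sliding, and session windows over a streaming DataFrame.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, session_window, count

spark = SparkSession.builder.getOrCreate()

# A rate source is used only so the example is self-contained; it emits
# `timestamp` and `value` columns, with `value` renamed to `user` here.
events = (
    spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    .withColumnRenamed("value", "user")
    .withWatermark("timestamp", "10 minutes")
)

# Tumbling: fixed-size, non-overlapping 10-minute windows.
tumbling = events.groupBy(window("timestamp", "10 minutes")).agg(count("*"))

# Sliding: 10-minute windows starting every 5 minutes, so they overlap.
sliding = events.groupBy(window("timestamp", "10 minutes", "5 minutes")).agg(count("*"))

# Session: per-user windows that close after a 5-minute gap in activity.
sessions = events.groupBy(session_window("timestamp", "5 minutes"), "user").agg(count("*"))
```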