Hazelcast Jet Features

Performance

Engineered for Performance

To achieve very high throughputs with consistent low latency, Hazelcast Jet uses a combination of distributed DAG computation model, in-memory, data locality, partition mapping affinity, SP/SC Queues and Green Threads.

See Hazelcast Jet Performance for explanation and benchmarks comparing Jet to other popular frameworks.

Low Latency End-to-End

The performance of Jet Core can be boosted by using embedded Hazelcast IMDG. Hazelcast IMDG provides elastic in-memory storage and is a great tool for publishing the results of the computation, or as a cache, for data sets to be used during the computation. Very low end-to-end latencies can be achieved this way.

Stream Processing

Streaming Core

Hazelcast Jet is built on top of a low latency streaming core. This refers to processing the incoming records as soon as possible as opposed to accumulating the records into micro-batches before processing.

Windowing

As data streams are unbounded and there is the possibility of infinite sequences of records, a tool is required to group individual records to finite frames in order to run the computation. Hazelcast Jet windows provides a tool to define the grouping.

Types of windows supported by Jet:

  • Fixed/tumbling – The stream is divided into same-length, non-overlapping chunks. Each event belongs to exactly one window.
  • Sliding – Windows have fixed length, but are separated by a sliding interval, so they can overlap.
  • Session – Windows have various sizes and are defined based on session identifiers contained in the records. Sessions are closed after a period of inactivity (timeout).

Event Time Processing

Hazelcast Jet allows you to classify records in a data stream based on the time stamp embedded in each record — the event time. Event time processing is a natural requirement as users are mostly interested in handling the data based on the time that the event originated (the event time). Event time processing is a first-class citizen in Jet.

For handling late events, there is a set of policies to determine whether the event is “still on time” or “late”, which results in the discarding of the latter.

Handling Back Pressure

In the streaming system, it is necessary to control the flow of messages. The consumer cannot be flooded by more messages than it can process in a fixed amount of time. On the other hand, the processors should not be left idle wasting resources.

Hazelcast Jet comes with a mechanism to handle back pressure. Every consumer keeps signaling to all the upstream producers how much free capacity it has. This information is naturally propagated upstream to keep the system balanced.

Batch Processing and ETL

Batches Disguised as Streams

Although Jet is based on a streaming core, it is often used on top of bounded, finite data sets (often referred to as batch tasks). Jet treats such data sets as if it were a stream that suddenly ends. Batches are processed, just as streams, in the same one-record-per-time streaming core.

Connectors for ETL

Hazelcast Jet includes connectors for Hadoop Distributed File System (HDFS), local data files (e.g., CSV or logs), and Hazelcast IMDG. All of these should be combined in one pipeline to take advantage of the benefits of each, e.g., reading large data sets from HDFS and using distributed in-memory caches of Hazelcast IMDG to enrich processed records.

Processing Hadoop Data

Hadoop Distributed File System (HDFS) is a common tool used for building data warehouses and data lakes. Hazelcast Jet can use HDFS as either data source or destination. If Jet and HDFS clusters are colocated, Jet benefits from the data locality and processes the data from the same node without sending them over the network.

Taking Advantage of Hazelcast In-Memory Data Grid

Hazelcast Jet is the distributed data processing tool that takes advantage of being integrated with the Hazelcast IMDG (In-Memory Data Grid) — an elastic in-memory storage.

Use IMDG for

  • Data ingestion prior to processing.
  • Connecting multiple Jet jobs using IMDG as an intermediate buffer to keep jobs loosely coupled.
  • Enriching processed events, cache remote data, e.g., fact tables from a database on Jet nodes making use of data locality.
  • Distributing Jet processed data with Hazelcast IMDG.
  • Running advanced data processing tasks on top of Hazelcast data structures.

Jet Embeds IMDG

The complete Hazelcast IMDG is embedded in Jet. So, all the services of IMDG are available to your Jet jobs without any additional deployment effort. As Hazelcast IMDG is the embedded, support structure for Jet, IMDG is fully controlled by Jet (start, shutdown, scaling etc.)

To isolate the processing from the storage, you can still make use of Jet reading from or writing to remote Hazelcast IMDG clusters.

Streaming from IMDG

In Jet a connector is included which allows the user to process streams of changes (Event Journal) of an IMap and ICache, enabling developers to stream process IMap/ICache data or to use Hazelcast IMDG as storage for data ingestion.

The Hazelcast Way

Hazelcast Jet was built by the same community as the Hazelcast IMDG. Jet and Hazelcast IMDG share what’s best about the community experience.

Simplicity

You will get the value from Hazelcast Jet in less than 15 minutes.

Add one dependency to your Maven project and start building — Jet is a small JAR with no dependencies. Adding a member (also called node) to a Jet cluster is as easy as running one script. The Pipeline API keeps all the complexity under the hood.

Try it!

Lightweight and Embeddable

Jet is not a server, it’s a library. It’s natural to embed it in your application to build a data processing microservice. Due to its lightweight, each job can be easily launched on its own cluster to maximize service isolation. This is in contrast to heavyweight data processing servers where the cluster, once started, hosts multiple tasks and tenants.

Discovery and Cloud Deployment

Hazelcast Jet cluster members (also called nodes) automatically join together to form a cluster. Discovery finds other Jet instances based on filters and provides their corresponding IP addresses. Cluster setup is simple.

The cloud discovery plugins can be used for easy setup and operations within any cloud environment.

Support

Get professional support from the same people who built the software.

Open Source with Apache 2 License

Hazelcast Jet is open source and available under the Apache 2 license.

Fault Tolerance

Distributed Snapshots

Hazelcast Jet is able to tolerate faults such as network failure, split or node failure.

Jet supports distributed state snapshots and automatic restarts. Snapshots are periodically created and backed up. When there is a failure (network failure or split, node failure), Jet uses the latest state snapshot and automatically restarts all jobs that contain the failed member as a job participant from this snapshot.

Exactly-Once or At-Least-Once

For snapshot creation, you can choose between exactly-once and at-least-once semantics. This is a trade-off between correctness and performance. It is configured per job.

Fault Tolerance Out-of-the-Box

No additional infrastructure such as distributed file system or external snapshot storage is necessary in order to make Jet fault tolerant.

Jet uses IMaps of embedded Hazelcast IMDG to store the snapshots. Remember, IMap is replicated accross the cluster, so the state remains available in the case of failure.

Elasticity

Hazelcast Jet supports the scenario where a new member joins the cluster while a job is running. Currently, the ongoing job will not be replanned to start using the member, although this is on the roadmap for future versions. The new member can also leave the cluster while the job is running and this won’t affect its progress.

APIs

Pipeline API

Pipeline API is the primary API of Hazelcast Jet. The general shape of any data processing pipeline is drawFromSource -> transform -> drainToSink. Pipeline API follows this pattern.

The Jet library contains a set of Transforms covering standard data operations (map, filter, group, join). Also there are Source and Sink adapters availabe. The library can be extended by custom Transforms, Sources or Sinks.

Visit the code sample repository to learn how to use Pipeline API.

* Introduced in Jet 0.5, Pipeline API is still under construction and we plan to add more transforms in 0.6. The major missing feature is windowing of infinite streams (sliding, tumbling, session windows), but we also plan to add more batch transforms (sort and distinct for example). This may be a reason to use a low-level Core API and build the DAG.

Java 8 Stream API

java.util.stream API is a well known and popular API in the Java community. It supports functional style operations on streams of elements.

Hazelcast Jet shifts java.util.stream to a distributed world – the processing is distributed across the Jet cluster and parallelized. If j.u.s is used on top of Hazelcast distributed data structures, the data locality is utilized.

Jet adds support for java.util.stream API to Hazelcast IMDG collections.

Source and Sink Adapters

Hazelcast Jet contains a library of adapters enabling Jet jobs to read from and write to various sources and sinks.

Hazelcast IMDG

Hazelcast Jet comes with readers and writers for Hazelcast distributed data structures: IMap, ICache and IList. For IMap and ICache, Jet can be used to read and process the stream of changes (event journal).

Jet can use structures on the embedded Hazelcast IMDG cluster and make use of data locality — each processor will access the data locally.

File

Batch and streaming file readers and writers can be used for either reading static files or watching a directory for changes.

Socket

Socket readers and writers can read and write to simple text based sockets.

Kafka

Kafka Streamer is used to consume items from one or more Apache Kafka topics. Kafka Writer is used to write output data into Apache Kafka.

HDFS

Hazelcast Jet can use HDFS as either data source or destination. If Jet and HDFS clusters are colocated, then Jet benefits from the data locality and processes the data from the same node without sending them over the network.

Log Writer

Log Writer is a sink which logs all items at the INFO level.

Ready for Cloud Deployment

Run Hazelcast Jet in the Cloud

Cloud developers can easily drop Hazelcast Jet into their applications. Jet can work in many cloud environments and can be easily extended via Cloud Discovery Plugins to more.

Docker for Hazelcast Jet

Hazelcast Jet includes container deployment options for Docker.

Hazelcast Jet for Pivotal Cloud Foundry

Cloud Foundry users are now able to provision and scale Hazelcast Jet clusters — adding and removing nodes seamlessly as required. After downloading the Hazelcast Jet for PCF tile from the Pivotal Network website, the PaaS Operator simply uploads the tile into Pivotal Ops Manager, enters any required configuration into the GUI, and clicks ‘Install’. After installation, the PaaS Operator need have no further interaction with the tile.

Try Jet in 5 Minutes Jet Use Cases

Hazelcast Jet

Main Menu