Hazelcast Jet Features

Engineered for Performance

Distributed DAG Execution

Hazelcast Jet models each data processing task, called a Jet Job, as a directed acyclic graph (DAG). A Jet Job is composed of vertices: units of parallel processing such as data source readers, joiners, sorters, aggregators, filters, mappers or output writers. The vertices are connected by edges that represent the data flow.

Hazelcast Jet provides ultra-low-latency, high-throughput distributed DAG execution.
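To make the vertex-and-edge model concrete, here is a minimal sketch in plain Java. It is not the Jet Core API: the class DagSketch and its methods are hypothetical names used only for illustration. It builds a small graph and orders the vertices so that every edge points from an earlier vertex to a later one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a tiny in-memory DAG, not the Jet Core API.
public class DagSketch {
    // Vertex name -> names of its downstream vertices (outgoing edges).
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public DagSketch vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    public DagSketch edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: orders vertices so that data only flows forward.
    public List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        DagSketch dag = new DagSketch()
                .edge("source", "mapper")
                .edge("mapper", "aggregator")
                .edge("aggregator", "sink");
        System.out.println(dag.topologicalOrder());
        // prints [source, mapper, aggregator, sink]
    }
}
```

In a real Jet Job each vertex would run in parallel across the cluster; this sketch only shows the graph structure and why acyclicity matters.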

Stream Processing

Low Latency End-to-end

Hazelcast Jet is built on top of a one-record-at-a-time streaming core (also known as continuous operators). This means that incoming records are processed as soon as they are ingested, as opposed to being accumulated into micro-batches before submission.

This fast processing can be boosted further by the embedded Hazelcast IMDG. IMDG provides elastic in-memory storage and is well suited both for publishing the results of a computation and for caching datasets used during the computation. Extremely low end-to-end latencies can be achieved this way.

Windowing

As data streams are unbounded and possibly infinite sequences of records, a tool is required to group individual records into finite frames in order to run a computation. Jet windows provide a way to define this grouping.

Types of windows supported by Jet:

  • Fixed/tumbling – the stream is divided into same-length, non-overlapping chunks. Each event belongs to exactly one window.
  • Sliding – windows have a fixed length, but consecutive windows start one sliding interval apart, so they can overlap and an event can belong to more than one window.
  • Session – windows have various sizes and are defined based on session identifiers contained in the records. A session is closed after a period of inactivity (timeout).
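The three window types can be sketched as simple assignment functions. This is plain Java written for illustration, not Jet's windowing API; the class WindowSketch and its method names are hypothetical, timestamps are plain long values, and the session grouping assumes the timestamps of a single session identifier arrive already sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: window assignment logic, not Jet's windowing API.
public class WindowSketch {

    // Tumbling: each timestamp falls into exactly one window [start, start + size).
    public static long tumblingWindowStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    // Sliding: windows of length `size` start every `slide` units, so one
    // timestamp may fall into several overlapping windows.
    public static List<Long> slidingWindowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long latest = ts - Math.floorMod(ts, slide);
        for (long start = latest; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    // Session: a gap longer than `timeout` between consecutive timestamps
    // closes the current session and opens a new one.
    public static List<List<Long>> sessions(List<Long> sortedTimestamps, long timeout) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = null;
        long previous = 0;
        for (long ts : sortedTimestamps) {
            if (current == null || ts - previous > timeout) {
                current = new ArrayList<>();
                result.add(current);
            }
            current.add(ts);
            previous = ts;
        }
        return result;
    }
}
```

For example, with a window size of 10 and a slide of 5, the timestamp 7 belongs to the sliding windows starting at 5 and at 0, while with a tumbling size of 5 it belongs only to the window starting at 5.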

Event-time Processing

Jet allows you to classify the records in a data stream based on the timestamp embedded in each record, known as the event time. Event-time processing is a natural requirement, as users are mostly interested in handling data based on the time an event originated rather than the time it was processed. Event-time processing is a first-class citizen in Jet.

To handle late events, Jet offers a set of policies that determine whether an event is “still on time” or “late”; late events are discarded.
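One possible lateness policy can be sketched as follows. This is an assumption made for illustration, not a description of the policies Jet ships: the hypothetical class LatenessSketch treats an event as late when its timestamp lags behind the highest event time seen so far by more than a fixed allowance.

```java
// Illustrative only: one hypothetical lateness policy, not Jet's implementation.
public class LatenessSketch {
    private final long allowedLag;
    private long maxSeen = Long.MIN_VALUE;

    public LatenessSketch(long allowedLag) {
        this.allowedLag = allowedLag;
    }

    // Returns true if the event is still on time, false if it is late
    // and should be discarded.
    public boolean accept(long eventTime) {
        if (maxSeen != Long.MIN_VALUE && eventTime < maxSeen - allowedLag) {
            return false;
        }
        maxSeen = Math.max(maxSeen, eventTime);
        return true;
    }
}
```

With an allowance of 5, after an event at time 100 an event at 96 is still accepted, while an event at 94 is discarded as late.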

Handling Back Pressure

In a streaming system, it is necessary to control the flow of messages. A consumer must not be flooded with more messages than it can process in a given time. On the other hand, processors should not sit idle and waste resources.

Hazelcast Jet comes with a mechanism to handle back pressure. Every consumer keeps signalling to all its upstream producers how much free capacity it has. This information is naturally propagated upstream to keep the system balanced.
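The idea of a consumer advertising its free capacity can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not Jet's actual mechanism: the class BackPressureSketch, the bounded inbox and the round-based loop are all hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative only: a single-threaded simulation of capacity signalling,
// not Jet's actual back pressure implementation.
public class BackPressureSketch {

    static class Consumer {
        private final int capacity;
        private final Queue<Integer> inbox = new ArrayDeque<>();
        int processed;

        Consumer(int capacity) {
            this.capacity = capacity;
        }

        // The signal sent upstream: how many more items can be accepted.
        int freeCapacity() {
            return capacity - inbox.size();
        }

        void receive(int item) {
            if (inbox.size() >= capacity) {
                throw new IllegalStateException("inbox overflow");
            }
            inbox.add(item);
        }

        void processUpTo(int maxItems) {
            for (int i = 0; i < maxItems && !inbox.isEmpty(); i++) {
                inbox.poll();
                processed++;
            }
        }
    }

    // Sends totalItems through a consumer and returns the number of rounds
    // needed. The producer never sends more than the advertised capacity.
    public static int run(int totalItems, int capacity, int consumerRate) {
        if (capacity <= 0 || consumerRate <= 0) {
            throw new IllegalArgumentException("capacity and consumerRate must be positive");
        }
        Consumer consumer = new Consumer(capacity);
        int sent = 0;
        int rounds = 0;
        while (consumer.processed < totalItems) {
            int credit = consumer.freeCapacity();
            for (int i = 0; i < credit && sent < totalItems; i++) {
                consumer.receive(sent++);
            }
            consumer.processUpTo(consumerRate);
            rounds++;
        }
        return rounds;
    }
}
```

Because the producer is limited by the advertised capacity, the inbox never overflows, and because it refills the inbox every round, the consumer is never idle while items remain.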

Connected to Message Broker

Hazelcast Jet benefits from message brokers for ingesting data streams and is able to work as a data processor connected to a message broker in the data pipeline.

Jet comes with a Kafka connector for reading from and writing to Kafka topics.

Batch Processing and ETL

Batches Disguised as Streams

Although Jet is based on a streaming core, it can also be used on bounded, finite datasets (often referred to as batch tasks). Jet treats such a dataset as a stream that suddenly ends. Therefore, batches are streams in disguise for Jet.

Connectors for ETL

Jet includes connectors for the Hadoop Distributed File System (HDFS), for local data files (e.g. CSV or logs) and for Hazelcast IMDG. These can be combined in one pipeline to take advantage of the benefits of each, e.g. reading large datasets from HDFS and using the distributed in-memory caches of Hazelcast IMDG to enrich processed records.

Processing Hadoop Data

The Hadoop Distributed File System (HDFS) is a common tool for building data warehouses and data lakes. Jet can use HDFS as either a data source or a destination. If the Jet and HDFS clusters are co-located, Jet benefits from data locality and processes the data on the same node without sending it over the wire.

Taking Advantage of Hazelcast IMDG

Hazelcast Jet is a distributed data processing tool that takes full advantage of being integrated with the Hazelcast In-Memory Data Grid.

Hazelcast Jet Embeds Hazelcast IMDG

The complete Hazelcast IMDG is embedded in Jet, so all IMDG services are available to your Jet jobs. Because the embedded IMDG is a supporting structure for Jet, its lifecycle (start, shutdown, scaling etc.) is fully controlled by Jet.

The embedded in-memory data grid is well suited for:

  • Sharing the processing state among Jet Jobs.
  • Caching intermediate processing results.
  • Enriching processed events by caching remote data (e.g. fact tables from a database) on Jet nodes.
  • Running advanced data processing tasks on top of Hazelcast data structures.
  • Development purposes, since starting a Jet cluster is simple and fast.

Distributing Jet Processed Data with Hazelcast IMDG

A Jet Job can use the Hazelcast IMDG connector to read records from and write records to a remote Hazelcast IMDG instance.

Use a remote Hazelcast IMDG cluster for:

  • Sharing state or intermediate results among multiple Jet clusters.
  • Isolating the processing cluster (Jet) from the operational data storage cluster (IMDG).
  • Gaining more control over your Hazelcast IMDG cluster, because the embedded IMDG is managed by Jet.
  • Publishing intermediate results (e.g. to show real-time processing stats on a dashboard).

The Hazelcast Way

Jet was built by the same community as the Hazelcast In-Memory Data Grid. Jet and IMDG share the practices that the Hazelcast community has found to work best.

Simplicity

You can get value from Jet in less than 15 minutes.

Add one dependency to your Maven project and start building: Jet is a small jar with no dependencies. The familiar java.util.stream API keeps the complexity under the hood. Later, as you become more familiar with Jet, you can scale it to perform well on complex, distributed deployments.

Lightweight and Embeddable

Jet is lightweight. It starts fast, scales and handles failures by itself, and communicates with the data processing pipeline using asynchronous messages.

Also, Jet is not a server, it’s a library. It’s natural to embed it in your application to build a data processing microservice. Because Jet is so lightweight, each Job can easily be launched on its own cluster to maximise service isolation. This is in contrast with heavyweight data processing servers where the cluster, once started, hosts multiple tasks and tenants.

Discovery and Cloud Deployment

Jet cluster members (also called nodes) automatically join together to form a cluster. Discovery finds other Jet instances based on filters and provides their corresponding IP addresses. No complicated cluster setup is necessary.

Cloud discovery plugins allow easy setup and operation in many cloud environments.

Support

Get professional support from the same people who built the software.

Open Source with Apache 2 License

Hazelcast Jet is open source and available under the Apache 2 license.

Fault-Tolerant

Jet detects changes in the cluster topology (network failures and splits, node failures, exceptions) and reacts by terminating gracefully. The Job can then be re-initiated or cancelled.

The finer mechanisms of Jet fault tolerance, including snapshots and processing guarantees for stream processing, are under intensive development.

APIs

Java 8 Stream API

The java.util.stream API is a well-known and popular API in the Java community. It supports functional-style operations on streams of elements.

Jet shifts java.util.stream to a distributed world: the processing is distributed across the Jet cluster and parallelized. If java.util.stream is used on top of Hazelcast’s distributed data structures, data locality is utilized.

Jet adds support for the java.util.stream API to Hazelcast IMDG collections.
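For readers less familiar with java.util.stream, here is a short example of the functional style in question. It uses only the standard JDK and runs on a local list; it does not show Jet's own entry points for distributed collections, and the class name StreamWordCount is chosen just for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain JDK example of the java.util.stream style; runs locally, not on a Jet cluster.
public class StreamWordCount {

    // Splits lines into lower-case words and counts the occurrences of each word.
    public static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("Jet is fast", "jet is small")));
        // prints {fast=1, is=2, jet=2, small=1}
    }
}
```

The same chain of flatMap, filter and collect is the kind of pipeline that, according to the text above, Jet distributes and parallelizes across the cluster.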

The Core API

With the Jet Core API, you get all the power of distributed DAGs.

Implementing the processors (DAG nodes) yourself is more verbose than using the high-level API, but it gives you a really powerful tool.

Sources and Sinks

Hazelcast IMDG

Jet comes with readers and writers for Hazelcast distributed data structures: IMap, ICache and IList.

Jet can use structures on the embedded Hazelcast IMDG cluster and make use of data locality: each processor accesses the data locally.

File

Batch and streaming file readers and writers can be used either for reading static files or for watching a directory for changes.

Socket

Socket readers and writers read from and write to simple text-based sockets.

Kafka

Kafka Streamer is used to consume items from one or more Apache Kafka topics. Kafka Writer is used as an Apache Kafka sink.

HDFS

Jet can use HDFS as either a data source or a destination. If the Jet and HDFS clusters are co-located, Jet benefits from data locality and processes the data on the same node without sending it over the wire.

Log Writer

Log Writer is a sink which logs all items at the INFO level.

Ready for Cloud Deployments

Run Jet in the Cloud

Cloud developers can easily drop Hazelcast Jet into their applications. Hazelcast Jet works in many cloud environments and can easily be extended to more via Cloud Discovery Plugins.

Docker for Jet

Hazelcast Jet includes container deployment options for Docker.

Jet for Pivotal Cloud Foundry

Cloud Foundry users are now able to provision and scale new Hazelcast Jet clusters, and add and remove nodes seamlessly as required. After downloading the Hazelcast tile from PivNet, the PaaS Operator simply uploads the tile into Pivotal Ops Manager, enters any required configuration into the GUI, and clicks ‘Install’. After installation, the PaaS Operator need have no further interaction with the Tile.
