Hazelcast Jet Features
Engineered for Performance
The Performance of Jet
Distributed DAG Execution
Hazelcast Jet uses directed acyclic graphs (DAGs) to model data processing tasks, called Jet jobs. A Jet job is composed of processors: units of parallel processing such as data source readers, joiners, sorters, aggregators, filters, mappers, and output writers. These nodes are connected by edges representing the data flow.
Hazelcast Jet provides low-latency, high-throughput distributed DAG execution.
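As a rough illustration, the DAG idea can be sketched in plain Java (a toy model with hypothetical names, not the actual Jet Core API): a source feeds records along edges through a filter processor and a mapper processor into a sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy model of a Jet-style DAG: source -> filter -> mapper -> sink.
// Each step stands for a processor; records flow along the edges between them.
public class DagSketch {
    static List<Integer> runPipeline(List<Integer> source) {
        Predicate<Integer> filter = n -> n % 2 == 0;    // keep even records
        Function<Integer, Integer> mapper = n -> n * n; // square each record
        List<Integer> sink = new ArrayList<>();
        for (Integer record : source) {                 // records flow downstream
            if (filter.test(record)) {
                sink.add(mapper.apply(record));
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(List.of(1, 2, 3, 4))); // prints [4, 16]
    }
}
```

In real Jet, each vertex of the DAG runs as multiple parallel processor instances distributed across the cluster, rather than as a single local loop.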
Built for Stream Processing
Low Latency End-to-end
Hazelcast Jet is built on a one-record-at-a-time streaming core (also known as continuous operators). This means incoming records are processed as soon as they are ingested, as opposed to being accumulated into micro-batches before processing.
Processing can be accelerated further by using the embedded Hazelcast IMDG. IMDG provides elastic in-memory storage and is a great tool for publishing the results of the computation, or as a cache for datasets used during the computation. Extremely low end-to-end latencies can be achieved this way.
Batches Disguised as Streams
Although Jet is based on a streaming core, it can also be used on bounded, finite datasets (often referred to as batch tasks). Jet treats such a dataset as a stream that suddenly ends; to Jet, batches are simply streams in disguise.
Jet includes connectors for the Hadoop Distributed File System (HDFS), for local data files (e.g. CSV or logs), and for Hazelcast IMDG. These can be combined in one pipeline to take advantage of the strengths of each: for example, reading a large dataset from HDFS while using the distributed in-memory caches of Hazelcast IMDG to enrich the processed records.
Handling Back Pressure
In a streaming system, the flow of messages must be controlled: a consumer must not be flooded with more messages than it can process in time, while on the other hand processors should not sit idle and waste resources.
Hazelcast Jet comes with a mechanism to handle back pressure: every consumer keeps signalling to all its upstream producers how much free capacity it has, and this information naturally propagates upstream to keep the system balanced.
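The principle can be modelled in plain Java with a bounded queue (a toy sketch of back pressure, not Jet's actual implementation): when the consumer falls behind, the full queue blocks the producer, so the producer can never flood the consumer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Back pressure in miniature: the bounded "edge" between producer and
// consumer blocks put() when it is full, throttling the producer to the
// consumer's pace.
public class BackPressureSketch {
    static List<Integer> process(int count, int capacity) {
        BlockingQueue<Integer> edge = new ArrayBlockingQueue<>(capacity);
        List<Integer> consumed = Collections.synchronizedList(new ArrayList<>());
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    consumed.add(edge.take()); // take() frees capacity, signalling upstream
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            consumer.start();
            for (int i = 0; i < count; i++) {
                edge.put(i); // blocks while the edge is full: natural back pressure
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(process(100, 4).size()); // prints 100
    }
}
```

Jet's actual mechanism is asynchronous (consumers signal their free capacity rather than blocking threads), but the balancing effect is the same.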
Connected to Message Broker
Hazelcast Jet benefits from message brokers for ingesting data streams and can act as a data processor connected to a message broker in the data pipeline.
Jet comes with a Kafka connector for reading from and writing to Kafka topics.
Taking Advantage of Hazelcast IMDG
Hazelcast Jet is a distributed data processing tool that takes full advantage of being integrated with the Hazelcast In-Memory Data Grid.
Hazelcast Jet Embeds Hazelcast IMDG
The complete Hazelcast IMDG is embedded in Jet, so all the services of IMDG are available to your Jet jobs. As the embedded, supporting structure for Jet, the IMDG is fully controlled by Jet (start, shutdown, scaling, etc.).
The embedded in-memory data grid is well suited for:
- Sharing the processing state among Jet jobs.
- Caching intermediate processing results.
- Enriching processed events by caching remote data (e.g. fact tables from a database) on Jet nodes.
- Running advanced data processing tasks on top of Hazelcast data structures.
- Development purposes, since starting a Jet cluster is simple and fast.
Distributing Jet Processed Data with Hazelcast IMDG
A Jet job can use the Hazelcast IMDG connector to read and write records from and to a remote Hazelcast IMDG instance.
Use a remote Hazelcast IMDG cluster to:
- Share state or intermediate results among multiple Jet clusters.
- Isolate the processing cluster (Jet) from the operational data storage cluster (IMDG).
- Retain full control over your Hazelcast IMDG cluster (the embedded IMDG is managed by Jet).
- Publish intermediate results (e.g. to show real-time processing stats on a dashboard).
The Hazelcast Way
Jet was built by the same community as the Hazelcast In-Memory Data Grid, and Jet and IMDG share what the Hazelcast community has found works best.
You can get value from Jet in less than 15 minutes.
Add one dependency to your Maven project and start building: Jet is a small JAR with no dependencies. Use the familiar java.util.stream API and keep all the complexity under the hood. You can still scale Jet to perform well on complex, distributed deployments later, as you become more familiar with it.
Lightweight and Embeddable
Jet is lightweight: it starts fast, scales and handles failures by itself, and communicates with the data processing pipeline using asynchronous messages.
Also, Jet is not a server; it is a library. It is natural to embed it in your application to build a data processing microservice. Thanks to its light weight, each job can easily be launched on its own cluster to maximize service isolation. This is in contrast with heavyweight data processing servers where the cluster, once started, hosts multiple tasks and tenants.
Discovery and Cloud Deployment
Jet cluster members (also called nodes) automatically join together to form a cluster. Discovery finds other Jet instances based on filters and provides their corresponding IP addresses. No complicated cluster setup is necessary.
Cloud discovery plugins enable easy setup and operation in many cloud environments.
Get professional support from the same people who built the software.
Open Source with Apache 2 License
Hazelcast Jet is open source and available under the Apache 2 license.
Fault Tolerance
Jet detects changes in the cluster topology (network failures and splits, node failures, exceptions) and reacts with graceful termination; the job can then be re-initiated or cancelled.
The finer mechanisms of Jet fault tolerance, including snapshots and processing guarantees for stream processing, are under intensive development.
Java 8 Stream API
The java.util.stream API is a well-known and popular API in the Java community. It supports functional-style operations on streams of elements.
Jet shifts java.util.stream to the distributed world: the processing is distributed across the Jet cluster and parallelized. When java.util.stream is used on top of Hazelcast's distributed data structures, data locality is exploited.
Jet adds support for java.util.stream API to Hazelcast IMDG collections.
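For example, a classic word count in the familiar java.util.stream style. This sketch runs locally on a List; with Jet, the same functional operations would be applied to a Hazelcast distributed collection and executed across the cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Word count with plain java.util.stream; Jet distributes the same style of
// pipeline across the cluster when the source is a Hazelcast collection.
public class WordCountSketch {
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Counts each distinct word: to=2, be=2, or=1, not=1
        System.out.println(wordCount(List.of("To be or not to be")));
    }
}
```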
The Core API
With Jet Core API, you get all the power of distributed DAGs.
Implementing the processors (DAG nodes) yourself is more verbose than using the high-level API, but it gives you a truly powerful tool.
Run on YARN or Mesos
Jet is ready to run on top of popular Hadoop clusters; YARN or Mesos can provide resource management for Jet.
The Hadoop Distributed File System is a common tool for building data warehouses and data lakes. Jet can use HDFS as either a data source or a destination. If the Jet and HDFS clusters are co-located, Jet benefits from data locality and processes the data on the same node without sending it over the wire.
Ready for Cloud Deployments
Run Jet in the Cloud
Cloud developers can easily drop Hazelcast Jet into their applications. Hazelcast Jet works in many cloud environments and can easily be extended to others via cloud discovery plugins.
Docker for Jet
Hazelcast Jet includes container deployment options for Docker.