Hazelcast Jet Features

Performance

Engineered for Performance

To achieve high throughput with consistently low latency, Hazelcast Jet combines a distributed Directed Acyclic Graph (DAG) computation model, in-memory processing, data locality, partition mapping affinity, single-producer/single-consumer (SP/SC) queues and green threads.

See Hazelcast Jet Performance for further detail and benchmarks against other popular frameworks.

Low Latency End-to-End

The performance of Hazelcast Jet Core can be boosted by using the embedded Hazelcast In-Memory Data Grid (IMDG). Hazelcast IMDG provides elastic in-memory storage and is a great tool for publishing the results of a computation or for caching data sets used during the computation. Very low end-to-end latencies at extreme scale can be achieved this way.

Stream Processing

Streaming Core

Hazelcast Jet is built on a low latency streaming core. Rather than accumulating the records into micro-batches and then processing, Hazelcast Jet processes the incoming records as soon as possible to accelerate performance.

Windowing

To process unbounded, possibly infinite sequences of records, a tool is required to group individual records into finite frames on which the computation can run. Hazelcast Jet windows provide a tool to define this grouping.

Types of windows supported by Jet:

  • Fixed/tumbling – The stream is divided into same-length, non-overlapping chunks. Each event belongs to exactly one window.
  • Sliding – Windows have fixed length, but are separated by a sliding interval, so they can overlap.
  • Session – Windows have various sizes and are defined based on session identifiers contained in the records. Sessions are closed after a period of inactivity (timeout).
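The three window types can be illustrated with a little arithmetic. The following is a minimal sketch in plain Java, not the Jet API: the class and method names are hypothetical, timestamps are plain long values, and the session grouping assumes the timestamps already belong to a single session identifier and arrive sorted.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative window arithmetic only; these names are not part of Jet. */
final class WindowSketch {

    /** Start of the tumbling window of the given size that contains the timestamp. */
    static long tumblingWindowStart(long timestamp, long size) {
        return Math.floorDiv(timestamp, size) * size;
    }

    /**
     * Starts of all sliding windows that contain the timestamp. Windows have
     * length `size` and a new window starts every `slide` time units.
     */
    static List<Long> slidingWindowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = Math.floorDiv(timestamp, slide) * slide;
        // A window [start, start + size) contains the timestamp while start > timestamp - size.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    /** Splits sorted timestamps into sessions wherever the gap exceeds the timeout. */
    static List<List<Long>> sessions(List<Long> sortedTimestamps, long timeout) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > timeout) {
                result.add(current);          // inactivity gap: close the session
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With a window size of 1000 and a slide of 500, an event at timestamp 1250 falls into exactly one tumbling window (starting at 1000) but into two sliding windows (starting at 1000 and at 500).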

Event Time Processing

Hazelcast Jet allows you to classify records in a data stream based on the timestamp embedded in each record, known as the event time. Event time processing is a natural requirement because users are mostly interested in handling data based on the time at which the event originated. Event time processing is a first-class citizen in Jet.

For handling late events, there is a set of policies that determine whether an event is “still on time” or “late”. Late events are discarded.
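As an illustration of what such a policy might look like, here is a minimal plain-Java sketch. It assumes one specific policy (an event is late when its timestamp lags the highest timestamp seen so far by more than a fixed allowance); the class name, the allowedLag parameter and the policy itself are assumptions made for this example and do not describe the policies that ship with Jet.

```java
/** Illustrative lateness check; the names and the policy are assumptions, not the Jet API. */
final class LatenessSketch {
    private final long allowedLag;
    private long maxTimestampSeen = Long.MIN_VALUE;

    LatenessSketch(long allowedLag) {
        this.allowedLag = allowedLag;
    }

    /** Returns true if the event is still on time, false if it is late and should be dropped. */
    boolean accept(long eventTimestamp) {
        boolean seenAnyEvent = maxTimestampSeen != Long.MIN_VALUE;
        // The "watermark" trails the highest timestamp seen so far by allowedLag.
        if (seenAnyEvent && eventTimestamp < maxTimestampSeen - allowedLag) {
            return false;
        }
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
        return true;
    }
}
```

A larger allowance tolerates more disorder in the stream but delays the moment at which a window can be considered complete.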

Handling Back Pressure

In a streaming system, it is necessary to control the flow of messages. The consumer cannot be flooded by more messages than it can process in a fixed amount of time. On the other hand, the processors should not sit idle wasting resources.

Hazelcast Jet comes with a mechanism to handle back pressure. Every consumer keeps signaling to all the upstream producers how much free capacity it has. This information is naturally propagated upstream to keep the system balanced.
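The idea of signaling free capacity upstream can be sketched in a few lines of plain Java. This is a single-threaded simulation under simplifying assumptions (one producer, one consumer, fixed-size rounds), with hypothetical class and method names; it is not how Jet implements the mechanism internally.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Illustrative capacity-based flow control; not Jet's internal implementation. */
final class BackPressureSketch {

    /** Consumer with a bounded inbox that reports how much room it has left. */
    static final class Consumer {
        private final Queue<Integer> inbox = new ArrayDeque<>();
        private final int capacity;
        int processed;

        Consumer(int capacity) {
            this.capacity = capacity;
        }

        int freeCapacity() {
            return capacity - inbox.size();
        }

        void receive(int item) {
            if (freeCapacity() == 0) {
                throw new IllegalStateException("inbox overflow");
            }
            inbox.add(item);
        }

        void processUpTo(int n) {
            for (int i = 0; i < n && !inbox.isEmpty(); i++) {
                inbox.remove();
                processed++;
            }
        }
    }

    /** The producer sends no more items than the consumer has signaled room for. */
    static int send(Consumer consumer, int pendingItems) {
        int toSend = Math.min(pendingItems, consumer.freeCapacity());
        for (int i = 0; i < toSend; i++) {
            consumer.receive(i);
        }
        return toSend;
    }

    /** Runs send/process rounds until everything is processed; processedPerRound must be positive. */
    static int run(int totalItems, int capacity, int processedPerRound) {
        Consumer consumer = new Consumer(capacity);
        int pending = totalItems;
        int rounds = 0;
        while (consumer.processed < totalItems) {
            pending -= send(consumer, pending);
            consumer.processUpTo(processedPerRound);
            rounds++;
        }
        return rounds;
    }
}
```

Because the producer never sends more than the advertised free capacity, the inbox cannot overflow, and the consumer always has work queued as long as items remain.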

Batch Processing and ETL

Batches Disguised as Streams

Although Jet is based on a streaming core, it is often used on top of bounded, finite data sets, commonly referred to as batch tasks. Jet treats such a data set as a stream that suddenly ends. Batches are processed just like streams, in the same one-record-at-a-time streaming core.

Connectors for ETL

Hazelcast Jet includes a library of connectors. One can also build a custom connector using the builder. The Jet Demos Repository contains connectors for Twitter, REST JSON service and Web Camera built using the custom connector API.

Multiple connectors can be combined in one pipeline to take advantage of the benefits of each, such as reading large data sets from Hadoop Distributed File System (HDFS) and using the distributed in-memory caches of Hazelcast IMDG to enrich processed records.

Processing Hadoop Data

HDFS is a common tool used for building data warehouses and data lakes. Hazelcast Jet can use HDFS as a data source or destination. If the Jet and HDFS clusters are colocated, Jet benefits from data locality and processes the data on the same node without sending it over the network.

Taking Advantage of Hazelcast In-Memory Data Grid

Hazelcast Jet is a distributed data processing tool that takes advantage of its integration with Hazelcast IMDG (In-Memory Data Grid), an elastic in-memory storage.

Use IMDG for:

  • Data ingestion prior to processing.
  • Messaging: connect multiple Jet jobs using IMDG as an intermediate buffer to keep jobs loosely coupled.
  • Enrichment: cache remote data (e.g., fact tables from a database) on Jet nodes to make use of data locality.
  • Distributing processed data: using the elastic in-memory NoSQL storage of Hazelcast with low-latency access, querying and event listeners.
  • Running advanced data processing tasks on top of Hazelcast data structures.
  • Coordinating Jet Jobs using the distributed coordination toolkit.

See the features of Hazelcast IMDG »

Jet Embeds IMDG

Hazelcast IMDG is embedded in Jet, so all IMDG services are available to your Jet jobs without any additional deployment effort. As the embedded support structure for Jet, IMDG is fully controlled by Jet (start, shutdown, scaling, etc.).

To isolate the processing from the storage, you can still make use of Jet reading from or writing to remote Hazelcast IMDG clusters.

Streaming from IMDG

Jet includes a connector that lets you process the stream of changes (Event Journal) of an IMap or ICache. This enables developers to stream process IMap/ICache data or to use Hazelcast IMDG as storage for data ingestion.

The Hazelcast Way

Hazelcast Jet was built by the same community as Hazelcast IMDG and both share what’s best about the community experience.

Simplicity

You will get the value from Hazelcast Jet in less than 15 minutes.

Add one dependency to your Maven project and start building — Jet is a small JAR with no dependencies. Adding a member (also called node) to a Jet cluster is as easy as running one script. The Pipeline API keeps all the complexity under the hood.

Helm charts for Hazelcast Jet make deployment smooth for Kubernetes users.

Lightweight and Embeddable

Jet is not a server, it’s a library. It’s natural to embed it in your application to build a data processing microservice. Because Jet is lightweight, each job can easily be launched on its own cluster to maximize service isolation. This is in contrast to heavyweight data processing servers where the cluster, once started, hosts multiple tasks and tenants.

Discovery

Hazelcast Jet cluster members (also called nodes) automatically join together to form a cluster. Discovery finds other Jet instances based on filters and provides their corresponding IP addresses. Cluster setup is simple and quick.

The cloud discovery plugins can be used for easy setup and operations within any cloud environment.

Support

Get professional support from the same people who built the software.

Open Source with Apache 2 License

Hazelcast Jet is open source and available under the Apache 2 license.

Elasticity

Elasticity

Hazelcast Jet is elastic: it can dynamically rescale to adapt to workload changes.

When the cluster grows or shrinks, running jobs can be automatically replanned to make use of all available resources.

Fault Tolerance

Hazelcast Jet is able to tolerate faults such as network failures, network splits and node failures.

When a fault occurs, Jet automatically restarts every job in which the failed member participated, resuming it from the latest state snapshot.

Exactly-Once or At-Least-Once

Hazelcast Jet supports distributed state snapshots. Snapshots are created periodically to back up the running state and serve as a consistent point of recovery after a failure. Snapshots are also taken and used when scaling up.

For snapshot creation, exactly-once or at-least-once semantics can be used. This is a trade-off between correctness and performance. It is configured per job.
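To see why restoring from a snapshot can give exactly-once results for the internal state, consider the following minimal plain-Java sketch. It is a deliberately simplified, single-node model with hypothetical names: the snapshot stores the source offset together with the running sum, and after a simulated failure both are restored and the input is replayed from that offset. It does not show Jet's distributed snapshot protocol.

```java
import java.util.List;

/** Illustrative single-node snapshot/restore; not Jet's distributed snapshot protocol. */
final class SnapshotSketch {

    /** A snapshot pairs the source offset with the processor state at that offset. */
    static final class Snapshot {
        final int offset;
        final long sum;

        Snapshot(int offset, long sum) {
            this.offset = offset;
            this.sum = sum;
        }
    }

    /**
     * Sums the input, taking a snapshot every snapshotInterval items. If failAt is a
     * valid index, one failure is simulated just before that item is processed; the
     * job then restores the latest snapshot and replays the input from its offset.
     */
    static long sumWithRecovery(List<Integer> input, int snapshotInterval, int failAt) {
        Snapshot latest = new Snapshot(0, 0);
        int offset = 0;
        long sum = 0;
        boolean failed = false;
        while (offset < input.size()) {
            if (!failed && offset == failAt) {
                failed = true;
                offset = latest.offset;   // rewind the source
                sum = latest.sum;         // restore the state
                continue;
            }
            sum += input.get(offset);
            offset++;
            if (offset % snapshotInterval == 0) {
                latest = new Snapshot(offset, sum);
            }
        }
        return sum;
    }
}
```

Because the state and the offset are restored together, replayed items are not counted twice and the result matches a run without failures. If only the offset were rewound while the state was kept, replayed items would be counted again, which corresponds to at-least-once behavior.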

Resilient Snapshot Storage

Hazelcast Jet uses the distributed in-memory storage to store the snapshots. This storage is an integral component of the Jet cluster and no further infrastructure is necessary. Data is stored in multiple replicas distributed across the cluster to increase the resiliency.

APIs

Pipeline API

Pipeline API is the primary API of Hazelcast Jet. The general shape of any data processing pipeline is drawFromSource -> transform -> drainToSink. Pipeline API follows this pattern.

The Jet library contains a set of Transforms covering standard data operations (map, filter, group, join). Source and Sink adapters are also available. The library can be extended with custom Transforms, Sources or Sinks.
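The source, transform, sink shape can be illustrated without Jet at all. The sketch below uses the JDK's java.util.stream on a single machine as a stand-in, with a hypothetical class name; it is not the Pipeline API and is not distributed, but it follows the same pattern with map-like, filter and group steps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Word count with JDK streams, used only to illustrate the source -> transform -> sink shape. */
final class PipelineShapeSketch {

    static Map<String, Long> wordCount(List<String> sourceLines) {
        return sourceLines.stream()                                   // source: lines of text
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+"))) // transform: split into words
                .filter(word -> !word.isEmpty())                      // transform: drop empty tokens
                .collect(Collectors.groupingBy(                       // transform + sink: group and count
                        word -> word, TreeMap::new, Collectors.counting()));
    }
}
```

For the input lines "to be" and "or not to be", the resulting map counts "to" and "be" twice and "or" and "not" once.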

Visit the code sample repository to learn how to use Pipeline API.

Core API

Core API is Jet’s low-level API that directly exposes the computation engine’s raw features, including DAGs, partitioning schemes, vertex parallelism, distributed vs. local edges, etc.

The purpose of Core API is to serve as the infrastructure on top of which to build high-level DSLs and APIs that describe computation jobs.

Use Core API for low-level control over the data flow, for fine-tuning performance or for building DSLs.

Source and Sink Connectors

Hazelcast Jet contains a library of connectors enabling Hazelcast Jet jobs to read from and write to various sources and sinks.

Hazelcast IMDG

Hazelcast Jet comes with readers and writers for Hazelcast distributed data structures: IMap, ICache and IList. For IMap and ICache, Jet can be used to read and process the stream of changes (event journal).

Jet can use structures on the embedded Hazelcast IMDG cluster and make use of data locality — each processor will access the data locally.

File

Batch and streaming file readers and writers can be used for either reading static files or watching a directory for changes.

Socket

Socket readers and writers can read from and write to simple text-based sockets.

Kafka

Kafka Streamer is used to consume items from one or more Apache Kafka topics. Kafka Writer is used to write output data into Apache Kafka.

HDFS

Hazelcast Jet can use HDFS as either a data source or a sink. If the Jet and HDFS clusters are colocated, Jet benefits from data locality and processes the data on the same node without sending it over the network.

Log Writer

Log Writer is a sink which logs all items at the INFO level.

JDBC

Reads data from or writes data to a relational database or another source that supports the standard JDBC API. Supports parallel reading for partitioned sources.

JMS

Streams messages from or to a JMS queue or a JMS topic using a JMS client on the classpath.

Custom

Jet provides a convenient programming interface allowing you to write your own connectors for both batch and streaming.

The Jet Demos Repository contains connectors for Twitter, REST JSON service and Web Camera built using the custom connector API.

Cloud Native

Cloud developers can easily drop Hazelcast Jet into their applications. Jet works in every major cloud environment and can be extended to others via Cloud Discovery Plugins.

Hazelcast Jet includes container deployment options for Docker.

Hazelcast Jet Docker images are Kubernetes-ready. See the Hazelcast Jet Kubernetes guide. There is a Helm chart available that bootstraps Hazelcast Jet deployments on a Kubernetes cluster using the Helm package manager.

Hazelcast Jet Enterprise version

Hazelcast Jet Enterprise is a proprietary extension of Hazelcast Jet open source.

Get a 30-day free trial »

Management Center

Management Center enables you to monitor and manage your Hazelcast Jet cluster in real-time and gain far more insight into what is occurring “under the hood”. In addition to monitoring the overall health of your cluster, you can also analyze the data flow of the distributed pipelines. Management Center provides visual tools to inspect running jobs and detect potential bottlenecks.

Jet Management Center is free for single node deployments, but requires a license key when used with more than one node.

Lossless Restart

The Lossless Restart feature persists state snapshots regularly. Jobs, job state and job configuration are made persistent using Hazelcast’s Hot Restart capability. This allows full cluster shutdowns: computations restart from where they left off once the cluster is back online.

Job Upgrades

Job Upgrades allow long-running jobs to be upgraded without data loss or interruption. The feature makes use of Jet state snapshots to address a variety of requirements, including modifications in business logic, bug fixes and configuration changes.

Security Suite

The Security Suite provides end-to-end TLS encryption, mutual authentication with X.509 certificates and role-based authorization over data structures and actions, making security a seamlessly integrated component of your Hazelcast Jet application.

Enterprise PaaS Deployment

Certified native images from Hazelcast enable you to run Jet Enterprise in the leading enterprise cloud-container environments: Pivotal Cloud Foundry and Red Hat OpenShift.
