General Questions

Where is Hazelcast Jet Documentation?

See documentation.

How are Hazelcast Jet and Hazelcast IMDG related?

Jet is built on top of the Hazelcast platform, so there is a tight integration between Jet and IMDG.

A JetInstance embeds a Hazelcast Instance. The full storage functionality of Hazelcast IMDG is available inside Jet. Hazelcast operations are used for different actions that can be performed on a job. Jet can also be used with the Hazelcast client, which uses the Hazelcast Open Binary Protocol to communicate different actions to the server instance.

Is Jet an Apache project?

Jet isn’t an Apache project; however, it is released under the Apache 2 license.

Where can I get more help?

Support for Hazelcast Jet is provided through GitHub, the mail group and StackOverflow.

For information on support subscriptions, please visit Hazelcast.com.

Can I interchange Jet with any version of Hazelcast IMDG?

Hazelcast IMDG is an integral part of each Jet release and cannot be swapped out as Jet relies on the services specific to each individual IMDG release.

Nevertheless, a Jet job can use any remote Hazelcast IMDG as a source or sink.

See Integration with Hazelcast IMDG.

Can I use Hazelcast IMDG for data ingestion?

Yes, data can be ingested into Jet using the distributed data structures of Hazelcast IMDG.

Data producers can use the Hazelcast client (available for numerous programming languages) to push data into Hazelcast IMDG.

Using IMDG to ingest data ensures a smoother deployment.

Batch processing

For batch (bounded data) processing, Jet comes with IMap, ICache and IList batch connectors that iterate over the entries in order to process them.

Data Ingestion Batch diagram

Hazelcast client writes the data to a Hazelcast IMap. When the entire data set is written, Jet reads the data from the IMap and processes the batch.

Stream processing

Jet contains a change stream (event journal) reader for IMap and ICache, so all updates of the IMap/ICache will be streamed directly to Jet.

Data Ingestion Event Stream diagram

Values are updated in a Hazelcast IMDG IMap, and each update produces a change event. Jet processes the stream of change events of the IMap.
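The change-stream idea can be illustrated with a small, single-threaded toy model in plain Java. This is only a sketch of the concept, not Hazelcast code: the class `ChangeStreamSketch`, its `Change` type and the `readFrom` method are hypothetical names invented for this illustration, and the real event journal is distributed and has its own API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not the Hazelcast API): a map that records
// every update as a change event in an in-memory journal.
final class ChangeStreamSketch {
    /** One change event: the key that was written and its new value. */
    static final class Change {
        final String key;
        final int value;

        Change(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, Integer> data = new HashMap<>();
    private final List<Change> journal = new ArrayList<>();

    /** Updates the map and appends a matching event to the journal. */
    void put(String key, int value) {
        data.put(key, value);
        journal.add(new Change(key, value));
    }

    /** Number of entries currently stored in the map. */
    int size() {
        return data.size();
    }

    /** Returns a copy of the events from the given offset on, in write order. */
    List<Change> readFrom(int offset) {
        return new ArrayList<>(journal.subList(offset, journal.size()));
    }
}
```

A reader that remembers the last offset it consumed can call `readFrom` repeatedly and see every update exactly in the order it was written, even when several updates hit the same key.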

Data streams are ordered sequences of records, similar to append-only logs. Right now, Hazelcast IMDG does not provide an append-only log structure. Therefore, to ingest data of this type, consider using Apache Kafka; Jet comes with an Apache Kafka connector. We do intend to provide a distributed append-only in-memory log in future versions of Hazelcast IMDG.

Data Ingestion Streaming Diagram

Records are appended to a log in Apache Kafka. Hazelcast Jet streams from Apache Kafka.

Hazelcast IMDG Computing and Jet Questions

What are the differences between Hazelcast Jet and Hazelcast IMDG Fast-Aggregations?

Hazelcast IMDG has native support for aggregation operations on the contents of its distributed data structures – Fast-Aggregations.

When Hazelcast IMDG is a better fit:

Fast-Aggregations are a good fit for simple operations (count, distinct, sum, avg, min, max, etc.).

Hazelcast IMDG Fast-Aggregations may not be sufficient for operations that group data by key and produce results of size O(keyCount). The architecture of Hazelcast aggregations is not well suited to this use case, although it will still work for moderately sized results (up to 100 MB, as a ballpark figure).

When Hazelcast Jet is a better fit:

Beyond the numbers quoted above, and whenever something more than a single aggregation step is needed, Jet becomes the preferred choice.
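To make the difference between a simple aggregation and an O(keyCount) result concrete, here is a minimal sketch using the standard java.util.stream API on a local list. It does not use Hazelcast at all; the `Trade` type and both method names are hypothetical and exist only for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Local illustration only (plain JDK, no Hazelcast): contrasts the shape
// of a simple aggregation result with a grouped one.
final class AggregationShapeSketch {
    /** Hypothetical record type used only for this example. */
    static final class Trade {
        final String ticker;
        final long quantity;

        Trade(String ticker, long quantity) {
            this.ticker = ticker;
            this.quantity = quantity;
        }
    }

    /** Simple aggregation: the result is one value, whatever the number of keys. */
    static long totalQuantity(List<Trade> trades) {
        return trades.stream().mapToLong(t -> t.quantity).sum();
    }

    /** Grouped aggregation: the result has one entry per distinct key, O(keyCount). */
    static Map<String, Long> quantityByTicker(List<Trade> trades) {
        return trades.stream()
                .collect(Collectors.groupingBy(t -> t.ticker,
                        Collectors.summingLong(t -> t.quantity)));
    }
}
```

The first method returns a constant-size result, which is the kind of operation the answer above describes as a good fit for Fast-Aggregations. The second returns a map that grows with the number of distinct tickers, which is the case where result size becomes the concern.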

For more information see the docs: Jet Compared with New Aggregations.

What are the differences between Hazelcast Jet and Hazelcast IMDG EntryProcessor?

An Entry Processor is a function that executes your code on a map entry in an atomic way. Instead of calling get and set, you use it to mutate the map entry by executing logic directly on the JVM where the data resides, which reduces network hops and provides atomicity in a single step. It is intended for fast mutating operations.
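The contrast between a get-then-set update and an in-place atomic update can be sketched with a local analogy from the JDK. This example uses `java.util.concurrent.ConcurrentHashMap.compute`, not the Hazelcast EntryProcessor API, and it has no network hops to save; it only illustrates why running the mutation on the entry itself gives atomicity. The class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Local analogy only (plain JDK, not the Hazelcast EntryProcessor API).
final class AtomicEntryUpdateSketch {
    /** Two-step update: another writer can interleave between get and put. */
    static void incrementWithGetAndPut(ConcurrentHashMap<String, Integer> map, String key) {
        Integer current = map.get(key);
        map.put(key, current == null ? 1 : current + 1);
    }

    /** One-step update: compute() applies the function to the entry atomically. */
    static void incrementInPlace(ConcurrentHashMap<String, Integer> map, String key) {
        map.compute(key, (k, current) -> current == null ? 1 : current + 1);
    }

    /** Runs concurrent in-place increments and returns the final counter value. */
    static int parallelIncrements(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    incrementInPlace(map, "hits");
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return map.getOrDefault("hits", 0);
    }
}
```

With `incrementInPlace`, no update is lost regardless of how the threads interleave. Swapping in `incrementWithGetAndPut` would allow lost updates under contention, which is the problem an atomic per-entry function avoids.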

Consider using Hazelcast IMDG with…

An Entry Processor that performs bulk mutations of an IMap, where the processing function is fast and involves a single map entry per call.

Consider using Hazelcast Jet with…

Processing that involves multiple entries (aggregations, joins, etc.), or involves multiple computing steps to be made parallel, or when the data source and sink are not a single IMap instance.

API Questions

Which API should I use?

The Pipeline API is the primary API of Hazelcast Jet. There is also a java.util.stream API and the Core API, which lets you define the DAG directly.

Pipeline API (high-level API)

  • Use for: general-purpose high-level API for processing both bounded and unbounded data.
  • Style: declarative (what).

java.util.stream

  • Use for: simple transform and reduce operations on top of IMap and IList.
  • Use for: fast adoption, as j.u.stream is a well-known Java 8 API.
  • Style: declarative (what).

Core API (DAG API)

  • Use for: building custom sources and sinks.
  • Use for: integration with other libraries or frameworks.
  • Use for: low-level control over data flow.
  • Use for: fine-tuning performance.
  • Use for: building DSLs.
  • Style: imperative (how).

Works with all sources and sinks *
Transforms (map, flat map, filter)
Aggregations **
Joins and forks
Processing bounded data (batch)
Processing unbounded data (streaming) ***

* Any source can be used with j.u.stream, but only IMap and IList sinks are supported.

** j.u.stream only supports grouping on one input; co-grouping is not supported. Furthermore, aggregation is a terminal operation, and additional transforms can’t be applied to aggregation results.

*** Windowing support available in 0.6

Does Jet support Apache Beam?

No, the performance of Jet is based on optimizations that wouldn’t be available via Beam (for example, Beam assumes a fully general, opaque window assignment policy). Therefore, we chose to build our own Pipelines API instead of Beam as a primary high-level API.

However, depending on user demand, we may implement Beam in the future.

Does Jet support SQL?

No, SQL involves many additional layers beyond the obvious (select from input stream, insert to output stream), and since we are focused on a Java programming audience, we have no plans at this point to add SQL support.

Features and Roadmap Questions

What is the unique value of Jet?

  • Performance – For both stream (unbounded data) and batch (bounded data) processing, Jet is able to process data with an impressive throughput capacity, with very low latency even under increasing load. See the benchmarks.
  • Integration with IMDG – IMDG provides scalable in-memory data storage to be used during processing (as source, sink, operational storage for cached/temporary data). Hazelcast IMDG and Jet are engineered to work together for high performance.
  • Simplicity – Jet is simple to deploy as it shares the lower stack with Hazelcast IMDG. Jet is application embeddable for OEMs and microservices, making it easier for manufacturers to build and maintain next generation systems.

What is the current focus of Jet?

High-level API (Pipelines), elasticity (dynamic scaling) and high performance Hazelcast integrations.

Are Jet nodes highly available?

Yes. The state of the cluster is regularly saved to a snapshot. Snapshots are stored across the cluster in multiple replicas to be highly available. When a node failure is detected by Jet, processing is restarted on the remaining nodes by restoring the state from the last snapshot.
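The snapshot-and-restore idea can be sketched as a single-process toy model. This is not Jet's implementation: the class and method names are hypothetical, the "cluster state" is reduced to a running sum, and the sketch assumes the input can be replayed from the offset stored in the snapshot.

```java
// Toy model (hypothetical, not Jet's implementation): periodic snapshots of
// (offset, state) let processing resume after a failure without losing or
// double-counting items, provided the input can be replayed from the offset.
final class SnapshotReplaySketch {
    /**
     * Sums the items while simulating a single failure just before the item
     * at index failAt is processed. snapshotEvery must be positive.
     */
    static long sumWithOneFailure(long[] items, int snapshotEvery, int failAt) {
        long sum = 0;          // live processing state
        int savedOffset = 0;   // snapshot: index of the next item to read
        long savedSum = 0;     // snapshot: state as of savedOffset
        boolean failed = false;
        int i = 0;
        while (i < items.length) {
            if (i % snapshotEvery == 0) {
                // Take a snapshot of the position and the state together.
                savedOffset = i;
                savedSum = sum;
            }
            if (!failed && i == failAt) {
                // Simulated failure: discard live state, restore the snapshot,
                // and replay the input from the saved offset.
                failed = true;
                sum = savedSum;
                i = savedOffset;
                continue;
            }
            sum += items[i];
            i++;
        }
        return sum;
    }
}
```

Because the state and the read position are saved together, the items between the last snapshot and the failure are processed again on top of the restored state, so the final result matches a run with no failure.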

Is Jet at-least-once or exactly-once?

Jet provides both at-least-once and exactly-once guarantees, depending on the user configuration. Each Jet job can have its own configuration.

Jet can also be switched to a no-guarantee (best effort) mode to achieve the best performance.

Is it possible to rescale a Jet cluster?

A cluster can be dynamically rescaled. The change in the cluster topology affects new jobs and the embedded in-memory storage. Running jobs aren’t affected; however, they won’t make use of the expanded cluster. In Jet 0.6, we plan to allow running jobs to take advantage of an enlarged cluster for computation (elasticity).

When is Jet 1.0 expected?

We plan to introduce API stability from the release of Jet 1.0 onwards. We expect this to happen in early 2018.

Releasing the 0.x versions gives the Jet team more flexibility in changing both the API and the internals.

All the features released to date are of production quality. It’s recommended to use Jet in production if the features available fit your use case.
