FAQ

General Questions

Where is Hazelcast Jet Documentation?

See documentation.

How are Hazelcast Jet and Hazelcast IMDG related?

Hazelcast Jet is built on top of the Hazelcast IMDG platform, so there is a tight integration between the two.

A JetInstance embeds a Hazelcast IMDG instance. The full storage functionality of Hazelcast IMDG is available inside Hazelcast Jet. Hazelcast IMDG operations are used for different actions that can be performed on a job. Hazelcast Jet can also be used with the Hazelcast client, which uses the Hazelcast Open Binary Protocol to communicate different actions to the server instance.

Is Hazelcast Jet an Apache project?

Hazelcast Jet isn’t an Apache project, but it is released under the Apache 2 license. Hazelcast itself has been released under the Apache 2 license since 2008, and Hazelcast the company remains committed to open source and the Apache 2 license.

Where can I get more help?

Support for Hazelcast Jet is available on GitHub, the mail group and Stack Overflow.

For information on support subscriptions, please visit Hazelcast.com.

Can I interchange Hazelcast Jet with any version of Hazelcast IMDG?

Hazelcast IMDG is an integral part of each Jet release and cannot be swapped out as Jet relies on the services specific to each individual IMDG release.

Nevertheless, a Hazelcast Jet job can use any remote Hazelcast IMDG as a source or sink.

See the Documentation.

Can I use Hazelcast IMDG for data ingestion?

Yes, data can be ingested into Hazelcast Jet using the distributed data structures of Hazelcast IMDG.

Data producers can use the Hazelcast IMDG client (available for numerous programming languages) to push data into Hazelcast IMDG.

Using IMDG to ingest data ensures a smoother deployment.

Batch processing

For batch (bounded data) processing, Hazelcast Jet comes with IMap, ICache and IList batch connectors that iterate over the entries in order to process them.

Data Ingestion Batch diagram

Hazelcast IMDG client writes the data to a Hazelcast IMDG IMap. When the entire data set is written, Hazelcast Jet reads the data from the IMap and processes the batch.

Stream processing

Hazelcast Jet contains a change stream (event journal) reader for IMap and ICache, so all updates of the IMap/ICache will be streamed directly to Hazelcast Jet.

Data Ingestion Event Stream diagram

Values are updated in a Hazelcast IMDG IMap, and each update produces a change event. Hazelcast Jet processes the stream of change events of the IMap.
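To make the idea of a change stream concrete, here is a minimal plain-Java sketch of a map that records one change event per update and lets a reader consume events from a given position. It is an analogy only: `JournaledMap`, `ChangeEvent` and `readFrom` are hypothetical names invented for this illustration, not the Hazelcast event journal API, and the sketch ignores distribution and capacity limits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-JVM stand-in for a map with a change journal.
final class JournaledMap {
    // One record per update: the key, the value before and the value after.
    static final class ChangeEvent {
        final String key;
        final Integer oldValue; // null when the key was not present before
        final Integer newValue;

        ChangeEvent(String key, Integer oldValue, Integer newValue) {
            this.key = key;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    private final Map<String, Integer> data = new HashMap<>();
    private final List<ChangeEvent> journal = new ArrayList<>();

    // Every update changes the map and appends exactly one event to the journal.
    void put(String key, int value) {
        Integer old = data.put(key, value);
        journal.add(new ChangeEvent(key, old, value));
    }

    // A reader consumes all events from a position onwards, in update order.
    List<ChangeEvent> readFrom(int position) {
        return new ArrayList<>(journal.subList(position, journal.size()));
    }
}
```

A stream processor reading such a journal sees every intermediate update in order, whereas a batch read of the map would see only the latest value per key.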

Data streams are ordered sequences of records, similar to append-only logs. Hazelcast IMDG does not currently provide an append-only log structure, so to ingest data of this type, consider using Apache Kafka; Hazelcast Jet comes with an Apache Kafka connector. We do intend to provide a distributed append-only in-memory log in future versions of Hazelcast IMDG.

Data Ingestion Streaming Diagram

Records are appended to a log in Apache Kafka. Hazelcast Jet streams from Apache Kafka.

Hazelcast IMDG Computing and Jet Questions

What are the differences between Hazelcast Jet and Hazelcast IMDG Fast-Aggregations?

Hazelcast IMDG has native support for aggregation operations on the contents of its distributed data structures – Fast-Aggregations.

When Hazelcast IMDG is a better fit:

Fast-Aggregations are a good fit for simple operations (count, distinct, sum, avg, min, max, etc.).

Hazelcast IMDG Fast-Aggregations may not be sufficient with operations that group data by key and produce results of size O(keyCount). The architecture of Hazelcast aggregations is not well suited to this use case, although it will still work even for moderately sized results (up to 100 MB, as a ballpark figure).
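The difference between the two kinds of aggregation can be sketched with the JDK's own `java.util.stream`, used here purely as a local illustration and not as the Fast-Aggregations API. The class and method names are made up for this example. A simple aggregation returns a single value, while a grouped aggregation returns one entry per distinct key, which is the O(keyCount) result size mentioned above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: contrasts the result shape of two aggregation styles.
final class AggregationShapes {
    // Simple aggregation: the result is one number, whatever the input size.
    static long totalAmount(List<Map.Entry<String, Long>> sales) {
        return sales.stream().mapToLong(Map.Entry::getValue).sum();
    }

    // Grouped aggregation: the result holds one entry per distinct key,
    // so it grows with the number of keys in the input.
    static Map<String, Long> totalPerKey(List<Map.Entry<String, Long>> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new,
                Collectors.summingLong(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> sales = List.of(
                Map.entry("apples", 3L),
                Map.entry("pears", 5L),
                Map.entry("apples", 4L));
        System.out.println(totalAmount(sales));
        System.out.println(totalPerKey(sales));
    }
}
```

With millions of distinct keys, the grouped result itself becomes a large data set that has to be collected and merged, which is where a distributed processing engine is the better fit.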

When Hazelcast Jet is a better fit:

Beyond the numbers quoted above, and whenever something more than a single aggregation step is needed, Jet becomes the preferred choice.

For more information see the docs: Jet Compared with New Aggregations.

What are the differences between Hazelcast Jet and Hazelcast IMDG EntryProcessor?

An Entry Processor is a function that executes your code on a map entry atomically. Instead of calling get and set, you use it to mutate the map entry by executing logic directly on the JVM where the data resides. The update happens in one step, which reduces network hops and provides atomicity. It is intended for fast mutating operations.
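The contrast between get-then-set and an in-place atomic update can be sketched with the JDK's `ConcurrentHashMap`. This is a single-JVM analogy, not the Hazelcast EntryProcessor API, and the class and method names are invented for the illustration. The two-step version can lose updates under concurrency; the one-step version applies the function to the entry atomically.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative analogy: two-step read-modify-write versus one atomic step.
final class InPlaceUpdate {
    // Two steps: another thread may change the entry between get and put,
    // so concurrent increments can overwrite each other.
    static void incrementWithGetAndPut(ConcurrentMap<String, Integer> map, String key) {
        Integer current = map.get(key);
        map.put(key, current == null ? 1 : current + 1);
    }

    // One step: the update function runs atomically against the entry.
    static void incrementInPlace(ConcurrentMap<String, Integer> map, String key) {
        map.merge(key, 1, Integer::sum);
    }

    // Runs the atomic increment from several threads and returns the final count.
    static int countWithThreads(int threads, int incrementsPerThread) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    incrementInPlace(map, "counter");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return map.getOrDefault("counter", 0);
    }
}
```

Because each increment is a single atomic step, `countWithThreads(4, 1000)` always returns 4000. The same loop built on `incrementWithGetAndPut` gives no such assurance.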

Consider using Hazelcast IMDG with…

An Entry Processor that performs bulk mutations of an IMap, where the processing function is fast and involves a single map entry per call.

Consider using Hazelcast Jet with…

Processing that involves multiple entries (aggregations, joins, etc.), or involves multiple computing steps to be made parallel, or when the data source and sink are not a single IMap instance.

Hazelcast Jet contains an Entry Processor Sink to allow you to update IMDG data as a result of your Hazelcast Jet computation.

API Questions

Which API should I use?

The Pipeline API is the primary API of Hazelcast Jet. There is also a java.util.stream API, and the Core API, which lets you define the DAG directly.

Pipeline API (high-level API)

  • Use for: general purpose high-level API for processing both bounded and unbounded data.
  • Style: declarative (what).

java.util.stream

  • Use for: simple transform and reduce operations on top of IMap and IList; fast adoption, as j.u.stream is a well known Java 8 API.
  • Style: declarative (what).

Core API (DAG API)

  • Use for: low-level control over data flow, fine-tuning performance, building DSLs.
  • Style: imperative (how).

Works with all sources
Works with all sinks *
Transforms (map, flat map, filter)
Aggregations **
Joins and forks
Processing bounded data (batch)
Processing unbounded data (streaming)

* Any source can be used with j.u.stream, but only IMap and IList sinks are supported.

** j.u.stream only supports grouping on one input; co-grouping is not supported. Furthermore, aggregation is a terminal operation in j.u.stream, and additional transforms can’t be applied to aggregation results.

Does Hazelcast Jet support Apache Beam?

No. The performance of Hazelcast Jet is based on optimizations that wouldn’t be available via Beam (for example, Beam assumes a fully general, opaque window assignment policy). Therefore, we chose to build our own Pipeline API as the primary high-level API instead of using Beam.

However, depending on user demand, we may implement Beam in the future.

Does Hazelcast Jet support SQL?

No, SQL involves many additional layers beyond the obvious (select from input stream, insert to output stream), and since we are focused on a Java programming audience, we have no plans at this point to add SQL support.

Features and Roadmap Questions

What is the unique value of Hazelcast Jet?

  • Performance – For both stream (unbounded data) and batch (bounded data) processing, Jet is able to process data with an impressive throughput capacity, with very low latency even under increasing load. See the benchmarks.
  • Integration with IMDG – IMDG provides scalable in-memory data storage to be used during processing (as source, sink, operational storage for cached/temporary data). Hazelcast IMDG and Jet are engineered to work together for high performance.
  • Simplicity – Jet is simple to deploy as it shares the lower stack with Hazelcast IMDG. Jet is application embeddable for OEMs and microservices, making it easier for manufacturers to build and maintain next generation systems.

What is the current development focus of Hazelcast Jet?

Version 0.7 will focus on tooling for monitoring and diagnostics, further elasticity (dynamic scaling) changes, rolling job upgrades and high performance Hazelcast integrations.

Are Hazelcast Jet nodes highly available?

Yes. The state of the cluster is regularly saved to a snapshot. Snapshots are stored across the cluster in multiple replicas to be highly available. When a node failure is detected by Hazelcast Jet, processing is restarted on the remaining nodes by restoring the state from the last snapshot.
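The snapshot-and-restore cycle described above can be sketched in plain Java. This is a simplified single-threaded model, not Jet's implementation: `SnapshotRestoreSketch`, `Snapshot`, the snapshot interval and the simulated failure point are all hypothetical names and parameters chosen for the illustration. The key idea is that a snapshot pairs the processor state with the input position it corresponds to, so processing can resume from that position with matching state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model: a word counter that survives one simulated failure.
final class SnapshotRestoreSketch {
    // A snapshot is the state plus the input offset that state corresponds to.
    static final class Snapshot {
        final int nextOffset;
        final Map<String, Integer> counts;

        Snapshot(int nextOffset, Map<String, Integer> counts) {
            this.nextOffset = nextOffset;
            this.counts = new HashMap<>(counts); // copy, so later updates don't leak in
        }
    }

    // Counts words, taking a snapshot every snapshotInterval items (must be > 0).
    // Fails once at failAtOffset; pass a negative value for no failure.
    static Map<String, Integer> countWithFailure(
            List<String> input, int snapshotInterval, int failAtOffset) {
        Map<String, Integer> counts = new HashMap<>();
        Snapshot last = new Snapshot(0, counts);
        boolean failed = false;
        int offset = 0;
        while (offset < input.size()) {
            if (!failed && offset == failAtOffset) {
                // Simulated node failure: in-memory state is lost, so restore
                // the last snapshot and rewind the input to its offset.
                failed = true;
                counts = new HashMap<>(last.counts);
                offset = last.nextOffset;
                continue;
            }
            counts.merge(input.get(offset), 1, Integer::sum);
            offset++;
            if (offset % snapshotInterval == 0) {
                last = new Snapshot(offset, counts);
            }
        }
        return counts;
    }
}
```

In this model the result is the same with or without the failure, because the restored state and the rewound input position always agree.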

Is Hazelcast Jet at-least-once or exactly-once?

Hazelcast Jet provides both at-least-once and exactly-once guarantees, depending on the user configuration. Each Hazelcast Jet job can have its own configuration.

Hazelcast Jet can also be switched to no guarantee (best effort) mode in order to achieve the best performance.
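As a rough intuition for how the three modes differ after a failure, consider a job that sums a stream of numbers. The sketch below is a deliberately simplified model under stated assumptions, not Jet's actual recovery protocol: the `Guarantee` enum, the `overlap` parameter and the method names are hypothetical. It assumes that with exactly-once the saved state matches the saved input position, that with at-least-once the saved state may already include a few items past that position, and that with no guarantee nothing is saved and processing simply continues from the failure point.

```java
import java.util.List;

// Simplified model of the result of a running sum after one failure.
final class GuaranteeSketch {
    enum Guarantee { NONE, AT_LEAST_ONCE, EXACTLY_ONCE }

    // Assumes snapshotOffset + overlap <= failureOffset <= input.size().
    static long sumAfterOneFailure(List<Integer> input, Guarantee guarantee,
                                   int snapshotOffset, int overlap, int failureOffset) {
        long restoredState;
        int resumeFrom;
        switch (guarantee) {
            case EXACTLY_ONCE:
                // State and input position agree: nothing is lost or repeated.
                restoredState = sum(input, 0, snapshotOffset);
                resumeFrom = snapshotOffset;
                break;
            case AT_LEAST_ONCE:
                // State already reflects `overlap` items past the saved position,
                // and those items are replayed, so they are counted twice.
                restoredState = sum(input, 0, snapshotOffset + overlap);
                resumeFrom = snapshotOffset;
                break;
            default:
                // NONE: no snapshot, so items processed before the failure are lost.
                restoredState = 0;
                resumeFrom = failureOffset;
        }
        return restoredState + sum(input, resumeFrom, input.size());
    }

    private static long sum(List<Integer> input, int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) {
            total += input.get(i);
        }
        return total;
    }
}
```

For the input 1 to 6 with a snapshot at offset 2, an overlap of 1 and a failure at offset 4, the model yields 21 for exactly-once (the true sum), 24 for at-least-once (the item 3 is counted twice) and 11 for no guarantee (the first four items are lost).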

Is it possible to rescale a Hazelcast Jet cluster?

A cluster can be dynamically rescaled. The change in the cluster topology affects both running Hazelcast Jet jobs and the embedded in-memory storage. The Hazelcast Jet job elasticity is based on the state snapshots – you can restart your jobs from the last snapshot.

With future versions of Hazelcast Jet, we plan to add multiple strategies for controlling elasticity.

When is Hazelcast Jet 1.0 expected?

We plan to introduce API stability from the release of Jet 1.0 onwards. We expect this to happen in 2018.

Releasing the 0.x versions gives the Hazelcast Jet team more flexibility in changing both the API and the internals.

All the features released to date are of production quality.
