Where is the Hazelcast Jet documentation?
How are Hazelcast Jet and Hazelcast IMDG related?
Hazelcast Jet is built on top of the Hazelcast IMDG platform, so there is a tight integration between the two.
JetInstance embeds a Hazelcast IMDG instance, so the full storage functionality of Hazelcast IMDG is available inside Hazelcast Jet. Internally, Jet uses Hazelcast IMDG operations to carry out the actions performed on a job. Hazelcast Jet can also be used from the Hazelcast client, which uses the Hazelcast Open Binary Protocol to communicate these actions to the server instance.
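To illustrate the embedding, here is a minimal sketch that assumes the Jet 3.x Java API is on the classpath. The map name `counters` is hypothetical. The sketch stores and reads a value through an IMap that lives in the IMDG instance embedded in the `JetInstance`. It then reaches the underlying `HazelcastInstance` through `getHazelcastInstance()`. A remote application would get a comparable handle from `Jet.newJetClient()`.

```java
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;

import java.util.Map;

class EmbeddedImdgExample {

    // Writes to and reads from an IMap stored in the IMDG instance embedded in Jet.
    // "counters" is a hypothetical map name used only for illustration.
    static int storeAndRead(JetInstance jet) {
        Map<String, Integer> counters = jet.getMap("counters");
        counters.put("visits", 1);
        return counters.get("visits");
    }

    public static void main(String[] args) {
        // Starting a Jet member also starts the Hazelcast IMDG member it embeds.
        JetInstance jet = Jet.newJetInstance();
        try {
            System.out.println("visits = " + storeAndRead(jet));

            // The embedded IMDG instance is also exposed directly.
            HazelcastInstance imdg = jet.getHazelcastInstance();
            System.out.println("cluster size = " + imdg.getCluster().getMembers().size());
        } finally {
            Jet.shutdownAll();
        }
    }
}
```

The map is typed as `java.util.Map` so that the sketch does not depend on the Jet-specific map wrapper type, which differs between Jet releases.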
Is Hazelcast Jet an Apache project?
Hazelcast Jet isn’t an Apache project; however, it is released under the Apache 2 license. Hazelcast IMDG has also been Apache 2 licensed since 2008, and Hazelcast, the company, is strongly committed to open source and the Apache 2 license.
Where can I get more help?
For information on support subscriptions, please visit Hazelcast.com.
Can I interchange Hazelcast Jet with any version of Hazelcast IMDG?
Hazelcast IMDG is an integral part of each Jet release and cannot be swapped out, because Jet relies on services specific to the IMDG release it ships with.
Nevertheless, a Hazelcast Jet job can use any remote Hazelcast IMDG cluster as a source or sink.
See the Documentation.
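The following sketch reads a map from a remote cluster. It assumes the Jet 3.x Pipeline API (`drawFrom`/`drainTo`; later releases rename these to `readFrom`/`writeTo`) and `Sources.remoteMap(mapName, clientConfig)`. The member address and both map names are hypothetical placeholders.

```java
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

class RemoteMapCopyExample {

    // Client configuration pointing at the *remote* IMDG cluster.
    // The address is a hypothetical placeholder.
    static ClientConfig remoteClusterConfig() {
        ClientConfig config = new ClientConfig();
        config.getNetworkConfig().addAddress("10.0.0.1:5701");
        return config;
    }

    // Batch job: read every entry of "remote-source" from the remote cluster
    // and write it into "local-copy" in Jet's embedded IMDG.
    static Pipeline buildPipeline(ClientConfig remoteCluster) {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String, Integer>remoteMap("remote-source", remoteCluster))
         .drainTo(Sinks.map("local-copy"));
        return p;
    }

    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();
        try {
            jet.newJob(buildPipeline(remoteClusterConfig())).join();
        } finally {
            Jet.shutdownAll();
        }
    }
}
```

For the opposite direction, `Sinks.remoteMap(mapName, clientConfig)` writes results into a map on a remote cluster.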
Can I use Hazelcast IMDG for data ingestion?
Yes, data can be ingested into Hazelcast Jet using the distributed data structures of Hazelcast IMDG.
Data producers can use the Hazelcast IMDG client (available for numerous programming languages) to push data into Hazelcast IMDG.
Using IMDG to ingest data simplifies deployment, since IMDG is already embedded in Hazelcast Jet.
For batch (bounded data) processing, Hazelcast Jet comes with IList batch connectors that iterate over the entries in order to process them.
A Hazelcast IMDG client writes the data to a Hazelcast IMDG IMap. When the entire data set has been written, Hazelcast Jet reads the data from the IMap and processes the batch.
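The batch flow described above could look roughly like this. The sketch assumes the Jet 3.x Pipeline API. The map `ingested` and the list `results` are hypothetical names. In `main`, a couple of entries are written directly as a stand-in for an external IMDG client producer.

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import java.util.List;
import java.util.Map;

class BatchIngestExample {

    // Batch job: once producers have finished writing to the IMap,
    // read all of its entries, format them and collect them into an IList.
    static Pipeline buildPipeline() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String, Integer>map("ingested"))
         .map(e -> e.getKey() + "=" + e.getValue())
         .drainTo(Sinks.list("results"));
        return p;
    }

    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();
        try {
            // Stand-in for an IMDG client producer: write the whole data set first.
            Map<String, Integer> ingested = jet.getMap("ingested");
            ingested.put("a", 1);
            ingested.put("b", 2);

            // Then run the batch job over the complete data set.
            jet.newJob(buildPipeline()).join();

            List<String> results = jet.getList("results");
            System.out.println(results);
        } finally {
            Jet.shutdownAll();
        }
    }
}
```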
Hazelcast Jet contains a change stream (event journal) reader for ICache, so all updates to the ICache are streamed directly to Hazelcast Jet.
Values are updated in a Hazelcast IMDG IMap, and each update produces a change event. Hazelcast Jet processes the resulting stream of change events.
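For the streaming case, the sketch below uses `Sources.mapJournal` to stream the change events of an IMap. It assumes Hazelcast Jet 3.x with IMDG 3.x configuration (`getMapEventJournalConfig` is the IMDG 3.x configuration method). The map name is hypothetical. The event journal has to be enabled for the map before the job starts.

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.Job;
import com.hazelcast.jet.config.JetConfig;
import com.hazelcast.jet.pipeline.JournalInitialPosition;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import java.util.Map;

class MapJournalStreamExample {

    static final String MAP_NAME = "sensor-readings"; // hypothetical map name

    // The event journal must be enabled for the map before it can be streamed.
    static JetConfig configWithJournal() {
        JetConfig config = new JetConfig();
        config.getHazelcastConfig()
              .getMapEventJournalConfig(MAP_NAME)
              .setEnabled(true);
        return config;
    }

    // Streaming job: every put/update on the map arrives as a change event.
    static Pipeline buildPipeline() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String, Double>mapJournal(MAP_NAME, JournalInitialPosition.START_FROM_OLDEST))
         .withoutTimestamps()
         .map(e -> e.getKey() + " -> " + e.getValue())
         .drainTo(Sinks.logger());
        return p;
    }

    public static void main(String[] args) throws InterruptedException {
        JetInstance jet = Jet.newJetInstance(configWithJournal());
        try {
            Job job = jet.newJob(buildPipeline());
            Map<String, Double> readings = jet.getMap(MAP_NAME);
            readings.put("sensor-1", 21.5);
            readings.put("sensor-1", 22.0); // an update produces another event
            Thread.sleep(2_000);            // give the streaming job time to log the events
            job.cancel();
        } finally {
            Jet.shutdownAll();
        }
    }
}
```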
Data streams are ordered sequences of records, similar to append-only logs. Right now, Hazelcast IMDG does not provide an append-only log structure. Therefore, to ingest data of this type, consider using Apache Kafka. Note that Hazelcast Jet comes with an Apache Kafka connector. However, we do intend to provide a distributed append-only in-memory log in future versions of Hazelcast IMDG.
Records are appended to a log in Apache Kafka. Hazelcast Jet streams from Apache Kafka.
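The Kafka route could be sketched as follows. The sketch assumes the `hazelcast-jet-kafka` module and the Jet 3.x `KafkaSources.kafka(Properties, String...)` source. The broker address, the topic `events` and the target map are hypothetical. Each Kafka record arrives as a key/value entry, and writing it to an IMap keeps the latest value per key.

```java
import com.hazelcast.jet.kafka.KafkaSources;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;

import java.util.Properties;

class KafkaIngestExample {

    // Standard Kafka consumer properties; the broker address is a placeholder.
    static Properties kafkaProperties() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    // Streams records appended to the "events" topic into an IMap.
    static Pipeline buildPipeline() {
        Pipeline p = Pipeline.create();
        p.drawFrom(KafkaSources.<String, String>kafka(kafkaProperties(), "events"))
         .withoutTimestamps()
         .drainTo(Sinks.map("latest-events"));
        return p;
    }
}
```

The pipeline is submitted like any other, for example with `jet.newJob(KafkaIngestExample.buildPipeline())`, once a Kafka broker is reachable.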
Hazelcast IMDG Computing and Jet Questions
What are the differences between Hazelcast Jet and Hazelcast IMDG Fast-Aggregations?
Hazelcast IMDG has native support for aggregation operations on the contents of its distributed data structures – Fast-Aggregations.
When Hazelcast IMDG is a better fit:
Fast-Aggregations are a good fit for simple operations (count, distinct, sum, avg, min, max, etc.).
Hazelcast IMDG Fast-Aggregations may not be sufficient for operations that group data by key and produce results of size O(keyCount). The architecture of Hazelcast aggregations is not well suited to this use case, although it will still work even for moderately sized results (up to 100 MB, as a ballpark figure).
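For comparison, a Fast-Aggregation runs directly against the map. This sketch assumes the Hazelcast IMDG 3.x API (`com.hazelcast.core.IMap`, `com.hazelcast.aggregation.Aggregators`). The map name is hypothetical. Without an attribute path, the built-in aggregators operate on the entry values.

```java
import com.hazelcast.aggregation.Aggregators;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

class FastAggregationExample {

    // Single-step aggregations executed by the IMDG members that hold the data.
    static long count(IMap<String, Integer> map) {
        return map.aggregate(Aggregators.count());
    }

    static long sum(IMap<String, Integer> map) {
        return map.aggregate(Aggregators.integerSum());
    }

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            IMap<String, Integer> scores = hz.getMap("scores"); // hypothetical map
            scores.put("alice", 10);
            scores.put("bob", 20);
            System.out.println("count=" + count(scores) + ", sum=" + sum(scores));
        } finally {
            Hazelcast.shutdownAll();
        }
    }
}
```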
When Hazelcast Jet is a better fit:
Beyond the numbers quoted above, and whenever something more than a single aggregation step is needed, Jet becomes the preferred choice.
For more information, see the docs: Jet Compared with New Aggregations.
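A grouping by key whose result holds one entry per key is where Jet fits better. The sketch below assumes the Jet 3.x Pipeline API and a hypothetical `orders` map from order ID to customer ID. It counts orders per customer and writes the per-key results to another IMap, rather than returning them to a single caller.

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.aggregate.AggregateOperations;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import java.util.Map;

class GroupByKeyExample {

    // Counts orders per customer. The result has one entry per distinct key
    // (size O(keyCount)); Jet computes it in parallel and writes it to an IMap.
    static Pipeline buildPipeline() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String, String>map("orders"))   // orderId -> customerId
         .groupingKey(order -> order.getValue())             // group by customerId
         .aggregate(AggregateOperations.counting())          // orders per customer
         .drainTo(Sinks.map("orders-per-customer"));
        return p;
    }

    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();
        try {
            Map<String, String> orders = jet.getMap("orders");
            orders.put("o1", "alice");
            orders.put("o2", "bob");
            orders.put("o3", "alice");
            jet.newJob(buildPipeline()).join();
            Map<String, Long> perCustomer = jet.getMap("orders-per-customer");
            System.out.println(perCustomer.entrySet());
        } finally {
            Jet.shutdownAll();
        }
    }
}
```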
What are the differences between Hazelcast Jet and Hazelcast IMDG Entry Processors?
An Entry Processor is a function that executes your code on a map entry atomically. Instead of calling get and then set, you send the logic to the JVM where the data resides, and it mutates the map entry there in a single step, which reduces network hops and provides atomicity. It is intended for fast mutating operations.
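Here is a minimal Entry Processor sketch, assuming the Hazelcast IMDG 3.x `AbstractEntryProcessor` base class. The map name and the increment logic are illustrative. `executeOnKey` sends the processor to the member that owns the key, and the value is updated in place there rather than through a client-side get followed by a put.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.AbstractEntryProcessor;

import java.util.Map;

// Increments an Integer value in place on the member that owns the entry.
// AbstractEntryProcessor (IMDG 3.x) applies the same logic to backups by default.
class IncrementEntryProcessor extends AbstractEntryProcessor<String, Integer> {
    @Override
    public Object process(Map.Entry<String, Integer> entry) {
        Integer current = entry.getValue();
        entry.setValue(current == null ? 1 : current + 1); // creates the entry if absent
        return null; // nothing needs to be sent back to the caller
    }
}

class EntryProcessorExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            IMap<String, Integer> visits = hz.getMap("visits"); // hypothetical map
            visits.put("home", 41);
            // Atomic update executed where the data lives, instead of get + put.
            visits.executeOnKey("home", new IncrementEntryProcessor());
            System.out.println(visits.get("home"));
        } finally {
            Hazelcast.shutdownAll();
        }
    }
}
```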
Consider using Hazelcast IMDG with…
An Entry Processor that performs bulk mutations of an IMap, where the processing function is fast and involves a single map entry per call.
Consider using Hazelcast Jet with…
Processing that involves multiple entries (aggregations, joins, etc.) or multiple computing steps to be made parallel, or when the data source and sink are not a single IMap.
Hazelcast Jet contains an Entry Processor Sink to allow you to update IMDG data as a result of your Hazelcast Jet computation.
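The Entry Processor Sink could be used roughly as shown below. The sketch assumes the Jet 3.x `Sinks.mapWithEntryProcessor(mapName, toKeyFn, toEntryProcessorFn)` sink and the IMDG 3.x `AbstractEntryProcessor`. The list, map, and processor names are hypothetical. Each word produced by the job increments a counter in the target IMap.

```java
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import com.hazelcast.map.AbstractEntryProcessor;

import java.util.Map;

class EntryProcessorSinkExample {

    // Adds 1 to the counter stored under the entry's key, creating it if absent.
    // A static nested class, so it serializes without an outer instance.
    static class Increment extends AbstractEntryProcessor<String, Integer> {
        @Override
        public Object process(Map.Entry<String, Integer> entry) {
            Integer current = entry.getValue();
            entry.setValue(current == null ? 1 : current + 1);
            return null;
        }
    }

    // For every word produced by the job, update the "word-counts" IMap in place
    // by sending an entry processor to the member that owns the word's key.
    static Pipeline buildPipeline() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String>list("words"))
         .drainTo(Sinks.mapWithEntryProcessor(
                 "word-counts",
                 word -> word,                // key of the IMap entry to update
                 word -> new Increment()));   // processor applied to that entry
        return p;
    }
}
```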
Which API should I use?
The Pipeline API is the primary API of Hazelcast Jet. In addition, there is the Core API, which lets you define the DAG directly.
| |Pipeline API (high-level API)|Core API (DAG API)|
|---|---|---|
|Description|General purpose high-level API for processing both bounded and unbounded data.|Low-level API to define the data flow using a DAG.|
|User deals with| | |
|Works with all sources and sinks| | |
|Transforms (map, flat map, filter)| | |
|Joins and forks| | |
|Processing bounded data (batch)| | |
|Processing unbounded data (streaming)| | |
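To make the difference between the two APIs concrete, the sketch below builds the same upper-casing job twice. The first version uses the Pipeline API. The second is a hand-written DAG of vertices and edges built with the Core API. Both assume the Jet 3.x API, and the map names are hypothetical.

```java
import com.hazelcast.jet.core.DAG;
import com.hazelcast.jet.core.Vertex;
import com.hazelcast.jet.core.processor.Processors;
import com.hazelcast.jet.core.processor.SinkProcessors;
import com.hazelcast.jet.core.processor.SourceProcessors;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import java.util.Map;

import static com.hazelcast.jet.Util.entry;
import static com.hazelcast.jet.core.Edge.between;

class PipelineVsDagExample {

    // Pipeline API: declare source, transform and sink; Jet plans the DAG.
    static Pipeline upperCasePipeline() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String, String>map("input"))
         .map(e -> entry(e.getKey(), e.getValue().toUpperCase()))
         .drainTo(Sinks.map("output"));
        return p;
    }

    // Core API: wire the same flow by hand as vertices (processors) and edges.
    static DAG upperCaseDag() {
        DAG dag = new DAG();
        Vertex source = dag.newVertex("source", SourceProcessors.readMapP("input"));
        Vertex upper = dag.newVertex("to-upper", Processors.mapP(
                (Map.Entry<String, String> e) -> entry(e.getKey(), e.getValue().toUpperCase())));
        Vertex sink = dag.newVertex("sink", SinkProcessors.writeMapP("output"));
        dag.edge(between(source, upper))
           .edge(between(upper, sink));
        return dag;
    }
}
```

With the Pipeline API, Jet plans the vertices and edges itself, and `Pipeline.toDag()` shows the result. With the Core API, you choose the processors and edges explicitly.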
Why was java.util.stream removed from Hazelcast Jet?
There was a distributed java.util.stream API implementation in Jet, which was removed in Jet 0.7.
java.util.stream is not designed as a distributed API. It is mainly designed as a convenience for Java developers to work with local collections and several methods reflect this approach. Many of the methods have non-obvious consequences in a distributed system.
The Pipeline API is more powerful and offers abstractions that are correct for working with a distributed system.
Does Hazelcast Jet support Apache Beam?
No. The performance of Hazelcast Jet is based on optimizations that wouldn’t be available via Beam (for example, Beam assumes a fully general, opaque window assignment policy). Therefore, we chose to build our own Pipeline API rather than adopt Beam as the primary high-level API.
However, depending on user demand, we may add Beam support in the future.
Does Hazelcast Jet support SQL?
No, SQL involves many additional layers beyond the obvious (select from input stream, insert to output stream), and since we are focused on a Java programming audience, we have no plans at this point to add SQL support.
Features and Roadmap Questions
What is the unique value of Hazelcast Jet?
- Performance – For both stream (unbounded data) and batch (bounded data) processing, Jet processes data with high throughput and very low latency, even under increasing load. See the benchmarks.
- Integration with IMDG – IMDG provides scalable in-memory data storage to be used during processing (as source, sink, operational storage for cached/temporary data). Hazelcast IMDG and Jet are engineered to work together for high performance.
- Simplicity – Jet is simple to deploy because it shares the lower stack with Hazelcast IMDG. Jet can be embedded in applications, which suits OEMs and microservices and makes it easier for manufacturers to build and maintain next-generation systems.
What is the current development focus of Hazelcast Jet?
We’re focused on releasing Jet 1.0 with backward-compatible APIs, a smooth development and operational experience through rolling job upgrades, lossless recovery, and advanced tooling for monitoring and diagnostics.
Moreover, we want to make Jet cloud native, i.e. as convenient to use in the cloud as possible.
Are Hazelcast Jet nodes highly available?
Yes. The processing state of each job is regularly saved to a snapshot. Snapshots are stored across the cluster in multiple replicas so that they are highly available. When Hazelcast Jet detects a node failure, processing is restarted on the remaining nodes by restoring the state from the last snapshot.
Is Hazelcast Jet at-least-once or exactly-once?
Hazelcast Jet provides both at-least-once and exactly-once guarantees, depending on the user configuration. Each Hazelcast Jet job can have its own configuration.
Hazelcast Jet can also be switched to a no-guarantee (best-effort) mode to achieve the best performance.
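Guarantees are set per job through `JobConfig`. The following sketch assumes the Jet 3.x API. The job name pattern and the 10-second snapshot interval are arbitrary illustrative choices. The resulting config would be passed as `jet.newJob(pipeline, config)`.

```java
import com.hazelcast.jet.config.JobConfig;
import com.hazelcast.jet.config.ProcessingGuarantee;

import java.util.Locale;

class ProcessingGuaranteeExample {

    // Per-job configuration: each job submitted with jet.newJob(pipeline, config)
    // can use its own guarantee. The name pattern is a hypothetical choice.
    static JobConfig configFor(ProcessingGuarantee guarantee) {
        return new JobConfig()
                .setName("orders-" + guarantee.name().toLowerCase(Locale.ROOT))
                .setProcessingGuarantee(guarantee)   // NONE, AT_LEAST_ONCE or EXACTLY_ONCE
                .setSnapshotIntervalMillis(10_000);  // take a state snapshot every 10 seconds
    }

    public static void main(String[] args) {
        for (ProcessingGuarantee g : ProcessingGuarantee.values()) {
            JobConfig config = configFor(g);
            System.out.println(config.getName() + " -> " + config.getProcessingGuarantee());
        }
    }
}
```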
Is it possible to rescale a Hazelcast Jet cluster?
A cluster can be rescaled dynamically. A change in the cluster topology affects both running Hazelcast Jet jobs and the embedded in-memory storage. Job elasticity in Hazelcast Jet is based on state snapshots: a job can be restarted from its last snapshot.
Job behavior during up-scales and down-scales can be configured using multiple elasticity strategies.
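One such setting is shown below: a job can opt out of automatic rescaling and be restarted manually once the topology has changed. The sketch assumes the Jet 3.x `JobConfig.setAutoScaling` and `Job.restart()` methods. It illustrates only this option, not the full set of available strategies.

```java
import com.hazelcast.jet.Job;
import com.hazelcast.jet.config.JobConfig;

class ElasticityExample {

    // Opts this job out of automatic rescaling: Jet does not restart it on its
    // own just to use members that join the cluster.
    static JobConfig fixedTopologyConfig() {
        return new JobConfig().setAutoScaling(false);
    }

    // Manual rescaling: gracefully stop the current execution and start a new
    // one on the current members. With a processing guarantee configured, the
    // new execution resumes from the last snapshot.
    static void rescale(Job job) {
        job.restart();
    }

    public static void main(String[] args) {
        System.out.println("auto-scaling: " + fixedTopologyConfig().isAutoScaling());
    }
}
```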
Why is Jet 3.x instead of 1.x?
The first non-zero major version of Jet is 3.0 rather than 1.0.
Jet is built on top of Hazelcast 3.x, and the reuse is significant: Jet builds on Hazelcast modules for clustering, networking, data partitioning and cloud integration. Hazelcast distributed maps are used to store state snapshots, providing elasticity and simple scaling. Building on stable, battle-hardened modules allows Jet to move forward fast without relying on third-party products.
Therefore, the major versions of Jet and Hazelcast IMDG were aligned to signal that they build on the same platform.