Recently I had to ponder how to do processing for a security system with stringent performance requirements. Like a lot of folks these days, my first thought was to use Kafka. Kafka’s features are well known and don’t need repeating here.
There was, however, a red flag. As I observed the various systems and applications using Kafka, most centered around backend processing with no real-time or pseudo real-time performance requirements. This lack of near real-time applications made me somewhat uneasy. I didn’t want to commit to Kafka only to find that it could not scale or be extremely responsive. In particular, security requests needed to be answered in milliseconds.
So what to do?
Now I came up with an approach…
Caveat: I don’t know if the approach described here is novel or not. It could be novel, or it could be rediscovery…In any case, here is the method, starting with its inspiration.
Now a lot of systems these days use the following pattern. Nodes in a cluster propagate information horizontally, with node X replicating data to node Y using some fast communication channel. Then both node X and node Y persist the data. Sometimes, the data from a node is replicated to a plurality of nodes, and then each node can persist its data to its own persistent store.
This design pattern is responsible for a lot of the performance in some NoSQL systems.
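The pattern can be sketched in a few lines. This is a toy illustration, not code from any real NoSQL system; the `Node` class and `write` function are hypothetical names, and a Python list stands in for the persistent store.

```python
# Hedged sketch of horizontal-then-vertical replication.
# All names here are illustrative, not from a real system.

class Node:
    def __init__(self, name):
        self.name = name
        self.memory = []   # fast in-memory copy, filled over the "fast channel"
        self.disk = []     # stand-in for the node's own persistent store

    def receive(self, record):
        self.memory.append(record)

    def persist(self):
        # each node flushes its in-memory copy to its own store
        self.disk.extend(self.memory)
        self.memory.clear()

def write(record, primary, replicas):
    # step 1: replicate horizontally across nodes first
    primary.receive(record)
    for replica in replicas:
        replica.receive(record)
    # step 2: only afterwards does each node persist vertically
    for node in [primary] + replicas:
        node.persist()

x, y = Node("X"), Node("Y")
write({"key": "k1", "value": "v1"}, primary=x, replicas=[y])
```

The write is considered safe once multiple nodes hold it in memory, so the slow persistence step is taken off the critical path.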
It occurred to me that this design pattern could be expanded upon. That is, suppose a node X has a Kafka log. As messages are appended to the log on node X, the appends are also pushed to related node(s) that have replicated the same Kafka log. Only after this horizontal replication completes is the log persisted.
[Of course, it is possible to not persist the logs at all, but in many preferred cases, the logs would be persisted.]
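To make the ordering concrete, here is a minimal sketch of that append path. `ReplicatedLog` and `Peer` are hypothetical classes of my own invention, not Kafka APIs; the point is only the sequence: replicate horizontally, then (optionally) persist.

```python
# Hedged sketch: append -> replicate to peers -> persist only after acks.
# None of these classes exist in Kafka; they illustrate the proposed ordering.

class Peer:
    """A node holding a replica of the same log."""
    def __init__(self):
        self.entries = []

    def replicate(self, message):
        self.entries.append(message)
        return 1   # acknowledge receipt

class ReplicatedLog:
    def __init__(self, peers, persist=True):
        self.entries = []
        self.peers = peers
        self.persist_enabled = persist   # persistence can be skipped entirely
        self.persisted = []              # stand-in for durable storage

    def append(self, message):
        self.entries.append(message)
        # 1. push the append horizontally to every replica first
        acks = sum(peer.replicate(message) for peer in self.peers)
        # 2. persist only after all replicas acknowledge, and only if enabled
        if acks == len(self.peers) and self.persist_enabled:
            self.persisted.append(message)
        return acks

peer = Peer()
log = ReplicatedLog(peers=[peer])
log.append("m1")
```

Setting `persist=False` models the bracketed case above where the logs are never persisted at all.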
Once the Kafka log has been replicated, processors can then process the messages. Other tools, such as a Hazelcast map or a Bloom filter, may then be used to determine whether a message has already been processed. (This is important for messages whose processing is not idempotent.)
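As one illustration of that deduplication step, here is a tiny Bloom filter guarding a non-idempotent handler. In practice you would reach for a library or a distributed map such as Hazelcast; everything below (including `process_once`) is a simplified sketch of my own.

```python
import hashlib

# Hedged sketch: a minimal Bloom filter used to skip already-seen messages.
# A Bloom filter can report false positives but never false negatives,
# so "unseen" answers are always safe to process.

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, key):
        # derive several independent bit positions from one key
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means definitely unseen; True may be a false positive
        return all(self.bits[pos] for pos in self._positions(key))

def process_once(message_id, seen, handler):
    """Run handler only if the message has (probably) not been seen."""
    if seen.might_contain(message_id):
        return False   # likely already processed; skip
    handler(message_id)
    seen.add(message_id)
    return True
```

Because false positives cause a message to be *skipped*, this particular trade-off suits at-most-once processing; a design needing at-least-once would invert the check or use an exact set.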
Anyway, that’s the core of the idea. Take a well known design pattern and evolve it in a (potentially) novel way so that the Kafka logs can be processed more quickly, and with great resilience, due to the horizontal-then-vertical replication strategy.
Again, I do not know if Kafka does anything like this currently. I am fairly new to Kafka, but as far as I can tell, it does *not* do this.