Low volume = relational database. They remain the gold standard in usability and tooling; backups, high availability, and consistency are well-solved problems.
Also, "low volume" on modern servers can easily mean 10k+ transactions per second, so you have plenty of performance headroom.
Don't bother with Kafka and similar systems unless you really have the scale of 100k+ events per second, or otherwise need to start splitting your app into several layers. That recent NYTimes article about using Kafka to store their entire ~100 GB dataset is exactly the wrong way to use it.
For low volume, my recommendation would be to just use a traditional relational database configured for high availability.
If you want to use Kafka and need disaster-recovery capabilities, we typically recommend using Kafka Connect or a similar tool to replicate the data to another cluster or to a persistent storage system such as S3.
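For the S3 route, the replication is just a connector config. A minimal sketch of a Confluent S3 sink connector (topic, bucket, and region names are placeholders, not from the comment above):

```json
{
  "name": "s3-backup-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "payments",
    "s3.bucket.name": "my-kafka-backup",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000",
    "tasks.max": "1"
  }
}
```

POSTing this to the Kafka Connect REST API continuously drains the topic into S3 objects, giving you an off-cluster copy for disaster recovery.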
+1. For use cases with strict data-durability requirements (whether for business or regulatory reasons), I think it's unwise to use anything fancy like Kafka until you've maxed out the performance of your SQL database.
For example, for credit card receipts, simply by the nature of the transaction type, you're unlikely to be processing enough of them to put pressure on a SQL database. One $1 transaction per second means you're grossing north of $30M a year, which is easy to handle in even an unoptimized schema. Citus reckons you can get thousands to tens of thousands of writes per second[1] on Postgres, which would correspond to grossing tens or hundreds of billions of dollars; this tech stack remains suitable even when "low volume" becomes quite significant.
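The back-of-envelope arithmetic above, spelled out (assuming, as the comment does, that every write is a $1 transaction):

```python
# Revenue implied by credit-card write rates, if each write is a $1 charge.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

def gross_per_year(txns_per_second, dollars_per_txn=1):
    """Annual gross revenue implied by a sustained write rate."""
    return txns_per_second * dollars_per_txn * SECONDS_PER_YEAR

print(gross_per_year(1))        # 1 txn/s  ~ $31.5M/year
print(gross_per_year(10_000))   # 10k txn/s ~ $315B/year
```

So even the tens-of-thousands-of-writes-per-second range that Postgres can reach maps to a business far larger than most payment handlers will ever be.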
Kafka, by contrast, is designed for situations where you need to process millions of writes per second, which is "GDP of the whole world" territory if those writes are credit card receipts, so I'd contend you're unlikely to ever need Kafka-scale throughput for your credit card payment handling components.
1. RabbitMQ saves on some of the operational complexity, especially at low volume. I've used it for similarly durability-critical applications with a combination of:
- Publisher Confirms.
- Durable queue and message metadata.
- Multiple nodes with HA (synchronous replication) in pause-minority mode (sacrifice availability when there's a partition).
- Fanout-duplicating messages into two queues, hooking the main consumers up to one, and running a periodic backup job that drains the other to a backup location (a separate, non-replicated RabbitMQ via federation, or a text/DB file, or whatnot). This deals with RMQ's Achilles heel: its synchronous replication system, which can fail in interesting ways. Restoring a backup taken like this isn't automatic, but I've found that even adding a second, decoupled RMQ instance into the mix significantly mitigates data loss from disasters or RMQ failures/bugs.
All of those things will slow RMQ down a bit relative to its "crazy high throughput" use case, but your volume requirements are low, and the slowdown will not be significant (at worst network time plus tens of milliseconds rather than single-digit milliseconds).
Most of that configuration can be done via idempotent messaging operations from your clients, with a bit of interaction (programmatic or manual) through the very user-friendly management HTTP API/site.
For even more delivery/execution confirmation, you can use more advanced messaging flows with AMQP transactions to get better assurance of whether a particular operation was completed or needs to be restored from backup.
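For reference, the cluster-level pieces of the setup above (pause-minority partition handling and synchronous queue mirroring) look roughly like this. This is a sketch for classic mirrored queues; the policy name and pattern are placeholders, and newer RabbitMQ versions steer you toward quorum queues instead:

```shell
# /etc/rabbitmq/rabbitmq.conf -- one line: the minority side of a
# network partition pauses itself (availability sacrificed for safety):
#
#   cluster_partition_handling = pause_minority

# Mirror all queues across the cluster's nodes, auto-syncing new mirrors:
rabbitmqctl set_policy ha-all "^" \
  '{"ha-mode":"all","ha-sync-mode":"automatic"}'
```

Publisher confirms and durable queues/messages are then enabled per-connection and per-declaration from the client side.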
2. Use a relational database. The reliability/transaction guarantees of something like Postgres are incredibly nice to have, simple to use, and very widely supported even compared to things as popular as Kafka/JMS/NSQ/RMQ. At low volume, a properly tuned, replicated, dedicated database used as a message bus or queue (via polling, triggers, or other Postgres features) tends to be almost as easy to set up as a real queue that prioritizes user-friendliness (like RabbitMQ), much easier to use and reason about in client code, and much more reliable.
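A minimal sketch of the database-as-queue pattern. SQLite stands in for Postgres here so the example is self-contained; on Postgres you would claim rows with `SELECT ... FOR UPDATE SKIP LOCKED` so multiple consumers can poll the same table safely:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE queue (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'ready')""")

def enqueue(payload):
    with db:  # transaction: commit on success, rollback on error
        db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))

def dequeue():
    with db:
        row = db.execute(
            "SELECT id, payload FROM queue WHERE state = 'ready' "
            "ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        # Claiming and reading happen in one transaction: a crash before
        # commit leaves the message in 'ready', so nothing is lost.
        db.execute("UPDATE queue SET state = 'done' WHERE id = ?", (row[0],))
        return row[1]

enqueue("charge card 1234")
print(dequeue())  # -> charge card 1234
print(dequeue())  # -> None (queue drained)
```

The nice property is that enqueueing can share a transaction with your business writes, so a message exists if and only if the state change that produced it was committed.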
Absolutely. We used store-and-forward semantics at a previous company to guarantee zero message loss (for data getting into Kafka). Now that Kafka Streams provides exactly-once semantics, you're in an even better spot to achieve this.
You probably already know this, but obligatory reminder for others: Kafka's exactly-once delivery is not the same as exactly-once execution of an arbitrary workload. Exactly-once mode is incredibly convenient, but it doesn't solve the tricky parts of, for example, hitting an external ACH to process a credit card payment. What if the external request times out? Fails silently? Fails loudly? Exactly-once delivery can help develop solutions in this area, but it doesn't eliminate this problem domain.
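One standard mitigation for the retry/redelivery ambiguity described above (not something the comment itself prescribes) is an idempotency key: the producer attaches a unique key to each logical operation and the downstream dedupes on it, so a redelivered or retried message cannot charge twice. A minimal sketch, with an in-memory dict standing in for the processor's dedupe store:

```python
import uuid

# In-memory stand-in for the payment processor's dedupe store.
_completed = {}

def charge(idempotency_key, amount_cents):
    """Execute the charge at most once per key, even if the caller retries."""
    if idempotency_key in _completed:
        # Replay the recorded result; no second charge happens.
        return _completed[idempotency_key]
    result = {"status": "charged", "amount": amount_cents}  # pretend network call
    _completed[idempotency_key] = result
    return result

key = str(uuid.uuid4())
first = charge(key, 500)
retry = charge(key, 500)  # e.g. consumer redelivery after a timeout
assert first is retry     # second call was a no-op replay
```

This doesn't answer "did the silent failure actually go through?" by itself, but it makes retrying safe, which is what turns at-least-once delivery into effectively-once execution.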
Of course, the key to any successful implementation of end-to-end exactly-once processing is that you actually USE the Kafka Streams APIs, including all the semantics around "consumption is completed up to offset X".