6 Comments
Bruno Taboada

OK, I see. Now I have a different duplicate case to discuss.

Suppose we have a service that reads from a Kafka topic and then forwards each event to another topic.

Now suppose that, right after reading an event and forwarding it to the next topic, the service goes down before it commits/acknowledges that message. In this case, it will read the same message again and repeat the operation, creating a duplicate. How should we handle this scenario? Should we add a DB in between to keep track of the events being processed, so that we can detect duplicates?

Saurabh Dashora

Really important questions. And there are multiple dimensions to the problem.

A separate DB is one useful approach to keep track of processed events. It also makes a lot of sense if the event is a key domain entity such as a bank transaction. You'd definitely want to keep track of all transactions and their states anyway.
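A minimal sketch of the separate-DB approach in Python, using the standard `sqlite3` module. It assumes each event already carries a unique ID and that the service's side effect is a write to the same database, so both can share one transaction. The table names `processed_events` and `results` and the helpers `open_store` and `handle_event` are hypothetical.

```python
import sqlite3

# Hypothetical schema: processed_events records which event IDs were handled;
# results stands in for the service's own side effect, stored in the same DB.
def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS results (event_id TEXT, payload TEXT)")
    conn.commit()
    return conn

def handle_event(conn, event_id, payload):
    """Apply an event's effect at most once. Returns True if new, False if a duplicate."""
    try:
        with conn:  # one transaction: commits both inserts, or rolls both back on error
            conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
            conn.execute("INSERT INTO results (event_id, payload) VALUES (?, ?)", (event_id, payload))
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: this event_id was recorded earlier, so skip it.
        return False

conn = open_store()
print(handle_event(conn, "evt-1", "debit 100"))  # True: first delivery
print(handle_event(conn, "evt-1", "debit 100"))  # False: redelivered duplicate
```

The design relies on the marker and the effect sharing one transaction. If the effect is a write somewhere else, such as another Kafka topic, that guarantee no longer holds on its own.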

Of course, a separate DB means more complexity and failure points.

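If we are using something as powerful as Kafka, we could also get away without managing our own list of processed events by relying on the offset management Kafka provides. Essentially, Kafka records, for each consumer group, the offset up to which the service has committed its progress, and a restarted service resumes from that offset. On its own, this still gives at-least-once delivery: if the service crashes after forwarding an event but before committing the offset, that event is read and forwarded again. Kafka transactions close this gap by committing the forwarded write and the consumer offset atomically.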
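To make the offset idea concrete, here is a toy in-memory model in Python. It is not the Kafka client API: `ToyBroker` and `run_consumer` are hypothetical names, the topic is a plain list, and the "broker" keeps one committed offset per consumer group. It reproduces the scenario from the question, where a crash after forwarding but before committing makes the restarted service replay the uncommitted message.

```python
# Toy model, not the Kafka client API: a topic is a list, and the "broker"
# remembers one committed offset per consumer group.
class ToyBroker:
    def __init__(self, messages):
        self.topic = list(messages)
        self.committed = {}   # consumer group -> next offset to read
        self.forwarded = []   # stands in for the downstream topic

def run_consumer(broker, group, crash_before_commit_at=None):
    offset = broker.committed.get(group, 0)  # resume from the last committed offset
    while offset < len(broker.topic):
        msg = broker.topic[offset]
        broker.forwarded.append(msg)          # forward to the next topic
        if offset == crash_before_commit_at:
            raise RuntimeError("crashed before committing offset")
        offset += 1
        broker.committed[group] = offset      # commit: "processed up to here"

broker = ToyBroker(["e1", "e2", "e3"])
try:
    run_consumer(broker, "forwarder", crash_before_commit_at=1)
except RuntimeError:
    pass  # the service went down after forwarding "e2"
run_consumer(broker, "forwarder")  # restart
print(broker.forwarded)  # ['e1', 'e2', 'e2', 'e3']
```

The extra `'e2'` is exactly the duplicate described in the question.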

Having said that, I think this is a very important topic with multiple solutions. I'm also adding it to my list of topics to write about 🙂

Bruno Taboada

OK, great. I just wanted to illustrate a scenario; it could also be a queue (AWS SQS, IBM MQ, and so on). Hopefully we can discuss this subject further in a future post. Good writing anyway, thank you!

Saurabh Dashora

Glad you liked the post.

And yes, this is an interesting subject that should be explored further. Stay tuned!

Bruno Taboada

I didn't get the idea with the UUID. Can you give an example, please? Thanks.

Saurabh Dashora

Sure, Bruno.

UUIDs are basically identifiers that we can use to uniquely identify information in a distributed computing environment. For example, 550e8400-e29b-41d4-a716-446655440000 is a UUID.

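Randomly generated (version 4) UUIDs, like the one above, contain 122 random bits. Uniqueness is not strictly guaranteed, but the chance of two of them colliding is so small that in practice we can treat them as unique.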
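For the example UUID above, Python's standard `uuid` module can confirm that it is a version 4 (random) UUID and can generate new ones. The `collision_probability` helper is a hypothetical illustration using the usual birthday-bound approximation, p ≈ n^2 / 2^123 for n random 122-bit values; it is an estimate, not an exact figure.

```python
import uuid

# Parse the example UUID and inspect its fields.
example = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(example.version)                   # 4 -> a randomly generated UUID
print(example.variant == uuid.RFC_4122)  # True

# Generate a fresh random (version 4) UUID for a new event.
event_id = uuid.uuid4()
print(event_id)  # different on every run

# Hypothetical helper: birthday-bound estimate of the chance that any two of
# n version-4 UUIDs collide, p ~= n^2 / (2 * 2^122), valid while p is small.
def collision_probability(n, random_bits=122):
    return n * n / (2 * 2 ** random_bits)

print(collision_probability(10**9))
```

For a billion UUIDs the estimate comes out at about 9.4e-20, which is why collisions are usually ignored in practice.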

Due to this, we can generate a UUID once for every new event, when it is first created, and send it along with the event. A consumer can then identify whether an event is a duplicate by checking if it has already processed that UUID.
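Here is a minimal in-memory sketch of that check in Python. `make_event` and `DedupingConsumer` are hypothetical names, and in a real service the set of processed IDs would live in durable storage (such as the separate DB discussed earlier). The key detail is that the producer assigns the UUID once; a UUID generated by the consumer on each delivery would differ every time and could not detect the duplicate.

```python
import uuid

def make_event(payload):
    # The producer assigns the ID once, when the event is created;
    # a redelivery carries the same ID because it travels with the event.
    return {"event_id": str(uuid.uuid4()), "payload": payload}

class DedupingConsumer:
    def __init__(self):
        self.processed_ids = set()  # in-memory for illustration only
        self.handled = []           # stands in for the real work, e.g. forwarding

    def on_message(self, event):
        if event["event_id"] in self.processed_ids:
            return False  # duplicate: this UUID was already processed, skip it
        self.handled.append(event["payload"])
        self.processed_ids.add(event["event_id"])
        return True

event = make_event("transfer 100")
consumer = DedupingConsumer()
print(consumer.on_message(event))  # True: first delivery is processed
print(consumer.on_message(event))  # False: redelivery of the same event is skipped
print(consumer.handled)            # ['transfer 100']
```

Here the work and the ID bookkeeping are two separate steps; a durable version would record them atomically, as in the database approach.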
