What are topics and partitions in Kafka?
Topics are nothing but a particular stream of data. A topic is similar to a table in a database, but without all the constraints. We can have as many topics as we want, and, just as with a table, each topic is identified by its name.
Topics are split into partitions, and each partition is ordered. Each message in a partition gets an incrementing id called an offset.
Example::
Let us suppose we have a topic T with several partitions, and let P0 (Partition0) be one of them:
step 1: Partition0 {initially it is empty}
step 2: Partition0 0 {when we write a message to it, that message gets offset 0}
step 3: Partition0 0 1 {when we write another message, that message gets offset 1}
step 4: Partition0 0 1 2 {as described above, each new message gets the next offset in increasing order}
and so on...
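To make the steps above concrete, here is a minimal Python sketch of a single partition modelled as an append-only list. The `Partition` class and its `append`/`read` methods are hypothetical names chosen for illustration, not Kafka's API; the sketch only shows how offsets are handed out as messages are written.

```python
class Partition:
    """A toy, in-memory model of one Kafka partition: an append-only log.

    Hypothetical illustration only, not Kafka's actual storage code.
    """

    def __init__(self) -> None:
        self._messages: list[str] = []  # step 1: initially empty

    def append(self, message: str) -> int:
        """Write a message and return the offset it was assigned."""
        offset = len(self._messages)  # next offset = number of messages so far
        self._messages.append(message)
        return offset

    def read(self, offset: int) -> str:
        """Read the message stored at a given offset."""
        return self._messages[offset]


p0 = Partition()
print(p0.append("first"))   # step 2: prints 0
print(p0.append("second"))  # step 3: prints 1
print(p0.append("third"))   # step 4: prints 2
print(p0.read(1))           # prints second
```

In real Kafka, offsets keep counting upward even after old messages are deleted by retention; this toy model never deletes anything, so the list index can double as the offset.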
Note:: Offsets increase from 0 to n as we write data.
Each partition has its own offsets:
Partition0: 0 1 2 3 4 5 6 7 {here the offsets go from 0 to 7}
Partition1: 0 1 2 3 4 5 {here the offsets go from 0 to 5}
Partition2: 0 1 2 3 4 5 6 7 8 {here the offsets go from 0 to 8}
Together, these partitions [Partition0, Partition1, Partition2] make up a TOPIC.
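A topic with the three partitions listed above can be modelled the same way, as a name plus several independent logs. `Topic`, `append` and `next_offsets` are hypothetical names for this sketch (it is not a Kafka client); the point is that each partition counts its offsets separately, so offset 2 in Partition1 and offset 2 in Partition2 hold different messages.

```python
class Topic:
    """A toy model of a topic: a name plus a list of independent partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        # Each partition is its own append-only list with its own offsets.
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Write to one partition and return the offset within that partition only."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def next_offsets(self) -> list[int]:
        """Offset the next message in each partition would receive."""
        return [len(log) for log in self.partitions]


topic = Topic("T", 3)
for i in range(8):
    topic.append(0, f"p0-msg{i}")  # Partition0: offsets 0..7
for i in range(6):
    topic.append(1, f"p1-msg{i}")  # Partition1: offsets 0..5
for i in range(9):
    topic.append(2, f"p2-msg{i}")  # Partition2: offsets 0..8

print(topic.next_offsets())                            # [8, 6, 9]
print(topic.partitions[1][2], topic.partitions[2][2])  # p1-msg2 p2-msg2
```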
Things to keep in mind
- Offsets in one partition mean nothing to other partitions. E.g. offset 2 in Partition1 does not hold the same data as offset 2 in Partition2.
- Order is guaranteed only within a partition, not across the partitions of a topic.
- Data is kept in a partition only for a limited time. {default one week, i.e. log.retention.hours=168}
- Once data is written to a partition it cannot be changed, i.e. it is immutable.
- We normally push data to the topic, not to a specific partition; the producer then picks the partition for each message.
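- Unless we provide a key, data is spread across the partitions with no control over which message goes where. With a key, all messages sharing that key go to the same partition (as long as the number of partitions does not change).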
- A topic can have as many partitions as we want.
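The keying rule in the list above can be sketched as follows. This is a simplified model under stated assumptions: keyed messages are mapped with `zlib.crc32(key) % NUM_PARTITIONS`, whereas Kafka's default partitioner uses a murmur2 hash, and keyless messages use plain round-robin, whereas newer Kafka Java clients fill one partition per batch ("sticky" partitioning). `choose_partition` and `produce` are hypothetical helpers, not the Kafka producer API.

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 3

# Assumption: round-robin for keyless messages (newer clients use "sticky" batching).
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

# One append-only list of (key, value) records per partition.
partitions: list[list[tuple[Optional[str], str]]] = [[] for _ in range(NUM_PARTITIONS)]


def choose_partition(key: Optional[str]) -> int:
    """Pick a partition for a message (simplified; not Kafka's real partitioner)."""
    if key is None:
        # No key: just rotate through the partitions.
        return next(_round_robin)
    # Same key -> same hash -> same partition, as long as NUM_PARTITIONS is fixed.
    # Assumption: crc32 stands in for the murmur2 hash Kafka actually uses.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def produce(key: Optional[str], value: str) -> tuple[int, int]:
    """Append a record and return (partition, offset within that partition)."""
    p = choose_partition(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1


events = [
    ("user-42", "login"),
    ("user-7", "login"),
    ("user-42", "add_to_cart"),
    ("user-42", "checkout"),
    (None, "heartbeat"),
]
for key, value in events:
    print(key, value, "->", produce(key, value))

# All user-42 events sit in one partition, in the order they were produced.
p42 = choose_partition("user-42")
print([v for k, v in partitions[p42] if k == "user-42"])  # ['login', 'add_to_cart', 'checkout']
```

Because every `user-42` event lands in the same partition, that user's events are read back in the order they were written, which is the per-partition ordering guarantee; events with different keys may end up in different partitions, with no ordering between them.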
Happy Coding...