TIL about snowflake IDs
The weird numbers on Twitter, Discord, and other platforms, and how they work.
I recently read a blog post on building a fault-resilient, consistent, and efficient distributed system for broadcasting messages, written by a senior of mine from college. Inspired by that, I started attempting the Fly.io Distributed Systems Challenge. I got stuck on the same problem he had written about, so his post was a lifesaver and gave me a couple of ideas to try out.
But this post is not about that. It's about the weird numbers we see on Twitter, Discord, and other platforms. You know, those long numbers like this: https://x.com/CATIAManikin/status/1860383419210817897. I never really thought about them, and assumed they were just some index. I only understood them when I had to implement the second part of the challenge, which required unique IDs that every node can generate in parallel without ever colliding with another node's IDs.
Now that we know our goal, let's think about how to achieve it. The first thing that comes to mind is a centralized database that hands out IDs, but then every node has to wait on a single service, which becomes both a bottleneck and a single point of failure. We need IDs that can be generated in parallel, without coordination.
Twitter introduced a format called Snowflake IDs: a 64-bit integer that is unique across all nodes. The format, counting bits from the most significant:
bit 0: sign bit (always 0)
bits 1 - 41: timestamp in milliseconds since epoch
bits 42 - 51: node ID (unique for each node)
bits 52 - 63: sequence number (increments for each ID generated in the same millisecond)
For each millisecond, we can generate a maximum of $2^{22}$ = 4,194,304 IDs. This is calculated as $2^{12}$ (4,096) sequence numbers $\times$ $2^{10}$ (1,024) nodes. The number of bits allocated to the sequence number or node ID can be adjusted, or the reference epoch modified, to fit specific constraints.
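To make the layout concrete, here is a minimal Python sketch of a generator that packs the three fields into one integer. It is not Twitter's implementation: the class name SnowflakeGenerator, the next_id method, and the injectable clock parameter are hypothetical names chosen for illustration. It assumes the Twitter epoch (1288834974657 ms) and that each node has already been given a unique node ID by some other means.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # Nov 04, 2010 01:42:54.657 UTC
NODE_BITS = 10
SEQUENCE_BITS = 12
MAX_NODE_ID = (1 << NODE_BITS) - 1        # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


class SnowflakeGenerator:
    """Illustrative generator; names and structure are assumptions."""

    def __init__(self, node_id, epoch_ms=TWITTER_EPOCH_MS, clock=None):
        if not 0 <= node_id <= MAX_NODE_ID:
            raise ValueError(f"node_id must be in [0, {MAX_NODE_ID}]")
        self.node_id = node_id
        self.epoch_ms = epoch_ms
        # The clock returns Unix time in milliseconds; injectable for testing.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Reusing an old timestamp could repeat an ID, so refuse.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence numbers used: spin until the next ms.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            # The sign bit stays 0 while the elapsed time fits in 41 bits.
            elapsed = now - self.epoch_ms
            return (
                (elapsed << (NODE_BITS + SEQUENCE_BITS))
                | (self.node_id << SEQUENCE_BITS)
                | self._sequence
            )


if __name__ == "__main__":
    gen = SnowflakeGenerator(node_id=358)
    print(gen.next_id(), gen.next_id())
```

No node ever talks to another one here: uniqueness comes from the node ID being different on every node and the sequence being different within a millisecond. Raising an error when the clock goes backwards is one possible choice; waiting until the clock catches up is another.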
Given a Snowflake ID, we can extract the timestamp, node ID, and sequence number. Working through the example above:
- convert to 64-bit binary
  1860383419210817897 in binary: 0001100111010001011010001101110011110000000101100110000101101001
- split into components
  - sign (1 bit): 0
  - timestamp (41 bits): 00110011101000101101000110111001111000000
  - node ID (10 bits): 0101100110
  - sequence (12 bits): 000101101001
- decode the timestamp
  The timestamp (00110011101000101101000110111001111000000) is milliseconds since the Twitter epoch (Nov 04, 2010 01:42:54.657 UTC). Converting to decimal gives 443549971392. Adding the epoch gives 1732384946049 ms, which is 2024-11-23 18:02:26.049 UTC.
- decode the node ID
  0101100110 in decimal: 358.
- decode the sequence
  000101101001 in decimal: 361.
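The same decoding can be done with shifts and masks instead of string slicing. This is a small Python sketch, assuming the standard 41/10/12 bit split and the Twitter epoch; the function name decode_snowflake and the dictionary keys are made up for this post, and other platforms (Discord, for example) use a different epoch and field layout.

```python
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657  # Nov 04, 2010 01:42:54.657 UTC


def decode_snowflake(snowflake, epoch_ms=TWITTER_EPOCH_MS):
    """Split a Snowflake ID into its fields (hypothetical helper)."""
    sequence = snowflake & 0xFFF          # lowest 12 bits
    node_id = (snowflake >> 12) & 0x3FF   # next 10 bits
    elapsed_ms = snowflake >> 22          # remaining bits: ms since the epoch
    unix_ms = elapsed_ms + epoch_ms
    # Build the datetime from integer parts to avoid float rounding.
    created_at = datetime.fromtimestamp(
        unix_ms // 1000, tz=timezone.utc
    ) + timedelta(milliseconds=unix_ms % 1000)
    return {
        "elapsed_ms": elapsed_ms,
        "unix_ms": unix_ms,
        "created_at": created_at,
        "node_id": node_id,
        "sequence": sequence,
    }


if __name__ == "__main__":
    parts = decode_snowflake(1860383419210817897)
    print(parts["elapsed_ms"])   # 443549971392
    print(parts["unix_ms"])      # 1732384946049
    print(parts["created_at"])   # 2024-11-23 18:02:26.049000+00:00
    print(parts["node_id"])      # 358
    print(parts["sequence"])     # 361
```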