Working with a Redis database we discovered a very strange behavior after a refactor on one of the services.
Before the refactor we have been storing on Redis sets with only integer values, but we decided to add some information as a suffix of a couple of chars, something like: 12312412 -> 12312412:b after this refactor, the memory usage of the instances of the cluster, was increased from ~2GB to ~10GB, this was something unexpected for us, according to the Redis documentation, the sets only store "binary-safe strings" values: "Redis data types"
And after check how Redis handles the strings (Hacking Strings), the relative memory increase should to be directly proportional to the increase of characters of the values. According to the increase of the values lengths after the refactor we expected an increase of the 20% but it was of the 500% :( .
I decided to do a little proof of code:
Well, as you can see, we test two kind of values, the integer sets will contain numbers from 10000 to 10030 with a zero at the end, then the result will be integers: 100000, 100010, 100020, ..., 100300 , and the string sets will contain the same number but with the zero before the number, this will force to Redis to store this values as strings: 010000, 010001, 010002, ..., 010030 .
The expected behavior, if Redis stores the values as strings, is that after create all the sets, the memory usage will be the same for both test cases... but:
This unexpected behavior is caused by the next lines of the Redis source code: t_set.c
Redis checks if the value can be represented as "long long", and in that case stores the value as its integer representation, a clever behavior, but I didn't find it documented.