2021.7.9.1193 - Fixed slow message loading in the cloud, other improvements
Hello everyone!
I've got a new build for you! This is mainly cloud changes, that should greatly improve the responsivness and make the new services more robust! The switch to SignalR ended up a bit bumpier than I hoped for (thanks everyone for understanding despite the inconveniences ^^;), but it should be really worth it in the long run and scale much better as we get more users.
Notably, I found root cause of message history taking really long time to load. In some cases the database queries were very suboptimal, taking several seconds of database time for a single fetch. For example loading message history I used to test consumed around 14000 RUs (Request Units, pretty much database throughput units). After adding extra field to simplify the query and adding composite index (you probably felt that yesterday, as all existing messages had to be re-indexed!), it now takes just 8 RUs!
As a result, the message history should now load super fast across the board and stop hammering the cloud, causing other things to be slow! But please let us know if you still run into some issues.
There are a bunch of improvements as well. The cloud also ran into another issue, where due to Redis server restarting, the rate limiting library wouldn't reinitialize and everything would break! This should be fixed now. The system now also has fallback in case the Redis fail temporarily and will restart itself automatically if it keeps failing.
Anyway, hopefully things should run pretty smoohtly right now. I'll be monitoring them and see if we run into some more issues. There some more things that we need to optimize as well (e.g. world searching/browsing, initial loading of contacts and bunch of others), but this should be a pretty important step forward!
Oh also this build is compatible with the last since most of the changes were in the cloud (but there should be some neat stuff in the build too)!
[h2]Tweaks:[/h2]
- Added optimized index values to messages in the database and optimized message history queries to significantly reduce (orders of magnitude) required database throughput when fetching message history
-- Message history should now load significantly faster and cause less load on the cloud backend (to give example, one fetch would previously consume ~14000 RUs, the same exact fetch now consumes 8 RUs)
-- This should also fix cases of message history not loading at all in somem cases due to the request timing out
-- Additionaly this should help with overall cloud responsiveness, thanks to message history loading not hammering the database like crazy anymore
-- All existing messages have been processed and re-indexed (this was one of the major factors for the slowness yesterday). If you notice any messages not loading or missing, please let us know!
- Improved health check for the cloud API server, to ensure the server gets rebooted in case of CosmosDB or Redis issues
-- This will auto-recover the cloud in case of unexpected connectivity issues that require reinitializing connections
- Tweaked database query batch size when fetching records, to prevent certain operations (e.g. generating record usage report) from resulting in large load on the database, causing cloud slowness
- Record usage report/JSON is now pipelined and rate limited to avoid it from potentially disrupting cloud services and to improve robustness
-- You can only run the command once every 10 minutes and max 8 times a day. This is subject to change, but it's generally not recommended to run this too often
-- The requests are processed in sequence in case of multiple requests to avoid overloading database with too many expensive tasks at once. You will be notified when the generation finishes
- Cloud messaging system now locally tracks sent status of a message
-- This ensures that message is properly colored as sending or failed to sent when the message history is refreshed
- Added in-process rate limiting fallback during Redis connectivity issues
-- This should improve robustness of the cloud services in case of interminnent connectivity issues to the Redis server
- VBLFC badge now shows up in the Session list as well @Delta and @Kulza
- Changed deadzone algorithm for axis movement to be based on magnitude, rather than each individual axis, which results in better diagonal behavior (based on report by @H3BO3)
- If you send a message to user before the history loads, the newly sent message shouldn't disappear anymore
- Merged Russian locale addition by @Shadow Panther [RU/EN, UTC+3]
- Merged Japanese locale additions and tweaks by @Aesc
- Merged Korean locale additions by Holy_Water
- Merged Chinese locale additions and tweaks by Holy_Water
- Merged Czech locale addition by @rampa_3 (UTC +1, DST UTC +2)
- Merged German locale additions and tweaks by @InnocentThief
[h2]Security:[/h2]
- Added extra obfuscation to host user's ID's (based on report by @runtime)
[h2]Bugfixes:[/h2]
- Fixed bug in FireflySoft.RateLimit library that would cause all rate limit checks to fail when the Redis server is unexpectedly rebooted, due to Lua Redis script being cleared
-- This was the primary cause of the cloud issues yesterday
-- Pull request for our fix is available here: https://github.com/bosima/FireflySoft.RateLimit/pull/7
- Fixed full message fetch rate limit not working properly, allowing the full message fetches to be executed too often
- Fixed wrong debug message on SignalR (reported by @Bitman (Neos.js Developer))
- Fixed incorrect URL when fetching records at path with access key (reported by @Bitman (Neos.js Developer))
- Smooth Turn exclusive mode will now also prevent movement when already turning, instead of just preventing turning while moving (reported by @H3BO3 and @Kaptain Krunch)

I've got a new build for you! This is mainly cloud changes, that should greatly improve the responsivness and make the new services more robust! The switch to SignalR ended up a bit bumpier than I hoped for (thanks everyone for understanding despite the inconveniences ^^;), but it should be really worth it in the long run and scale much better as we get more users.
Notably, I found root cause of message history taking really long time to load. In some cases the database queries were very suboptimal, taking several seconds of database time for a single fetch. For example loading message history I used to test consumed around 14000 RUs (Request Units, pretty much database throughput units). After adding extra field to simplify the query and adding composite index (you probably felt that yesterday, as all existing messages had to be re-indexed!), it now takes just 8 RUs!
As a result, the message history should now load super fast across the board and stop hammering the cloud, causing other things to be slow! But please let us know if you still run into some issues.
There are a bunch of improvements as well. The cloud also ran into another issue, where due to Redis server restarting, the rate limiting library wouldn't reinitialize and everything would break! This should be fixed now. The system now also has fallback in case the Redis fail temporarily and will restart itself automatically if it keeps failing.
Anyway, hopefully things should run pretty smoohtly right now. I'll be monitoring them and see if we run into some more issues. There some more things that we need to optimize as well (e.g. world searching/browsing, initial loading of contacts and bunch of others), but this should be a pretty important step forward!
Oh also this build is compatible with the last since most of the changes were in the cloud (but there should be some neat stuff in the build too)!
[h2]Tweaks:[/h2]
- Added optimized index values to messages in the database and optimized message history queries to significantly reduce (orders of magnitude) required database throughput when fetching message history
-- Message history should now load significantly faster and cause less load on the cloud backend (to give example, one fetch would previously consume ~14000 RUs, the same exact fetch now consumes 8 RUs)
-- This should also fix cases of message history not loading at all in somem cases due to the request timing out
-- Additionaly this should help with overall cloud responsiveness, thanks to message history loading not hammering the database like crazy anymore
-- All existing messages have been processed and re-indexed (this was one of the major factors for the slowness yesterday). If you notice any messages not loading or missing, please let us know!
- Improved health check for the cloud API server, to ensure the server gets rebooted in case of CosmosDB or Redis issues
-- This will auto-recover the cloud in case of unexpected connectivity issues that require reinitializing connections
- Tweaked database query batch size when fetching records, to prevent certain operations (e.g. generating record usage report) from resulting in large load on the database, causing cloud slowness
- Record usage report/JSON is now pipelined and rate limited to avoid it from potentially disrupting cloud services and to improve robustness
-- You can only run the command once every 10 minutes and max 8 times a day. This is subject to change, but it's generally not recommended to run this too often
-- The requests are processed in sequence in case of multiple requests to avoid overloading database with too many expensive tasks at once. You will be notified when the generation finishes
- Cloud messaging system now locally tracks sent status of a message
-- This ensures that message is properly colored as sending or failed to sent when the message history is refreshed
- Added in-process rate limiting fallback during Redis connectivity issues
-- This should improve robustness of the cloud services in case of interminnent connectivity issues to the Redis server
- VBLFC badge now shows up in the Session list as well @Delta and @Kulza
- Changed deadzone algorithm for axis movement to be based on magnitude, rather than each individual axis, which results in better diagonal behavior (based on report by @H3BO3)
- If you send a message to user before the history loads, the newly sent message shouldn't disappear anymore
- Merged Russian locale addition by @Shadow Panther [RU/EN, UTC+3]
- Merged Japanese locale additions and tweaks by @Aesc
- Merged Korean locale additions by Holy_Water
- Merged Chinese locale additions and tweaks by Holy_Water
- Merged Czech locale addition by @rampa_3 (UTC +1, DST UTC +2)
- Merged German locale additions and tweaks by @InnocentThief
[h2]Security:[/h2]
- Added extra obfuscation to host user's ID's (based on report by @runtime)
[h2]Bugfixes:[/h2]
- Fixed bug in FireflySoft.RateLimit library that would cause all rate limit checks to fail when the Redis server is unexpectedly rebooted, due to Lua Redis script being cleared
-- This was the primary cause of the cloud issues yesterday
-- Pull request for our fix is available here: https://github.com/bosima/FireflySoft.RateLimit/pull/7
- Fixed full message fetch rate limit not working properly, allowing the full message fetches to be executed too often
- Fixed wrong debug message on SignalR (reported by @Bitman (Neos.js Developer))
- Fixed incorrect URL when fetching records at path with access key (reported by @Bitman (Neos.js Developer))
- Smooth Turn exclusive mode will now also prevent movement when already turning, instead of just preventing turning while moving (reported by @H3BO3 and @Kaptain Krunch)


