Hey there,
I'm coming here because we just had an incident that almost cost us our production environment.
We're working on an Xperience website with this architecture:
One CMS instance
One frontend instance that can scale out to up to 4
All that in Azure.
We're currently running a lot of tests on our scaling, which means a lot of server creation and destruction.
A few minutes ago, we had the quite unpleasant surprise of seeing our SQL Server instance go up to 100% load with nothing unusual running. Let's check the problem.
Proc_CMS_WebFarmTask_DeleteOrphanedTasks => min 15s, avg 15s, max 15s
That's not good. 15s felt a lot like a SQL timeout and indeed it was.
Let's dive.
[Body of DeleteOrphanedTasks: a delete with a NOT IN over a subquery that gets all the task IDs in WebFarmServerTasks]
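As a rough illustration of that query shape (not the actual procedure body), here is a minimal Python/sqlite3 sketch. The `task` and `server_task` tables and their columns are hypothetical stand-ins, not the real Xperience schema. The point is only that the NOT IN subquery has to list every queued task ID for every registered server, so it grows with the number of servers still on the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in tables, not the real Xperience schema.
conn.executescript("""
    CREATE TABLE task (task_id INTEGER PRIMARY KEY);
    CREATE TABLE server_task (server_id INTEGER, task_id INTEGER);
""")
# Tasks 1 and 2 are still queued for some server; task 3 is orphaned.
conn.executemany("INSERT INTO task VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO server_task VALUES (?, ?)", [(10, 1), (11, 1), (11, 2)]
)

# Orphan cleanup in the NOT IN shape: the subquery returns one row per
# (server, queued task) pair, so every extra registered server adds rows.
cur = conn.execute(
    "DELETE FROM task WHERE task_id NOT IN (SELECT task_id FROM server_task)"
)
deleted = cur.rowcount
left = [r[0] for r in conn.execute("SELECT task_id FROM task ORDER BY task_id")]
print(deleted, left)
```

A task that is still queued for a server that no longer exists never counts as orphaned here, so it stays in both tables.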
And at that moment, I got it.
Let's check our servers...
18 servers with ServerEnabled set seems like a lot, as we only had 1 instance running at that moment (+1 CMS + 1 staging slot, I guess). Let's check what's going on...
[Query returning the last ping by ServerId]
So some servers haven't answered for more than 12 hours but are still considered enabled (ServerEnabled)?
Once I deleted all the tasks and servers that no longer existed, the DB load went back to normal.
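The kind of cleanup described above can be sketched roughly like this in Python/sqlite3. Everything here is an assumption for illustration: the `server` and `server_task` tables, the `last_ping` column, and the `remove_stale_servers` helper are hypothetical, and the silence threshold is arbitrary. It is not how Xperience itself manages the web farm server list.

```python
import sqlite3


def remove_stale_servers(conn, now, max_silence):
    """Delete servers silent for longer than max_silence, plus their queued tasks."""
    stale = [
        r[0]
        for r in conn.execute(
            "SELECT server_id FROM server WHERE last_ping < ? ORDER BY server_id",
            (now - max_silence,),
        )
    ]
    for server_id in stale:
        # Remove the queued tasks first, then the server row itself.
        conn.execute("DELETE FROM server_task WHERE server_id = ?", (server_id,))
        conn.execute("DELETE FROM server WHERE server_id = ?", (server_id,))
    conn.commit()
    return stale


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in tables; timestamps are plain integers (seconds).
conn.executescript("""
    CREATE TABLE server (server_id INTEGER PRIMARY KEY, last_ping INTEGER);
    CREATE TABLE server_task (server_id INTEGER, task_id INTEGER);
""")
conn.executemany("INSERT INTO server VALUES (?, ?)", [(1, 1000), (2, 50), (3, 990)])
conn.executemany("INSERT INTO server_task VALUES (?, ?)", [(1, 7), (2, 7), (2, 8)])

removed = remove_stale_servers(conn, now=1000, max_silence=60)
tasks_left = conn.execute("SELECT COUNT(*) FROM server_task").fetchone()[0]
print(removed, tasks_left)
```

Only server 2 has been silent for longer than the threshold, so only that server and its two queued tasks are removed.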
Has anyone else had this problem? What is the expected behavior for our use case? Does Xperience support autoscaling, or do I have to manage the server list on our side? I feel a process should do the cleanup at least hourly.
Furthermore, DELETE TOP is a really bad pattern. We already had a problem in K13 with the same kind of query. You were doing almost the same thing to delete logs, and with a big log intake it could easily break the system. The query lists all the rows and takes the first x. If the listing takes longer than your timeout, the whole cleanup breaks.
You have an identity field. Get the min/max of your query and delete on Id > X. It will be lightning fast in comparison.
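One possible reading of that suggestion, sketched in Python/sqlite3: find the ID bounds of the rows to purge once, then delete in fixed ID windows with a plain range predicate on the identity column. The `event_log` table, its columns, and the `purge_by_id_range` helper are hypothetical, and the sketch assumes IDs grow with time, so every row up to the maximum matching ID is old enough to delete.

```python
import sqlite3


def purge_by_id_range(conn, cutoff, batch=1000):
    """Delete log rows older than cutoff in ID windows instead of DELETE TOP."""
    # One aggregate query finds the ID bounds of the rows to purge.
    # Assumes IDs grow with time, so everything up to hi is old enough.
    lo, hi = conn.execute(
        "SELECT MIN(id), MAX(id) FROM event_log WHERE logged_at < ?", (cutoff,)
    ).fetchone()
    if lo is None:
        return 0  # nothing to purge
    deleted = 0
    start = lo
    while start <= hi:
        end = min(start + batch - 1, hi)
        # Each delete is a range seek on the primary key, in its own transaction.
        cur = conn.execute(
            "DELETE FROM event_log WHERE id BETWEEN ? AND ?", (start, end)
        )
        conn.commit()
        deleted += cur.rowcount
        start = end + 1
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (id INTEGER PRIMARY KEY, logged_at INTEGER)")
conn.executemany(
    "INSERT INTO event_log (logged_at) VALUES (?)", [(t,) for t in range(5000)]
)
conn.commit()

removed = purge_by_id_range(conn, cutoff=4000, batch=500)
remaining = conn.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]
print(removed, remaining)
```

Each batch touches a bounded slice of the key range and commits on its own, so a single slow batch cannot time out the whole cleanup.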
[edit] It seems we can't post SQL here. Not very practical :D
Environment
Xperience by Kentico version: [31.5.0]
.NET version: [10]
Execution environment: [Private cloud (Azure)]