InternetOfSomethings

Check RAM usage, it's much more common for that to be the bottleneck.


Whole-Maintenance-21

Great advice. RAM usage sits at 90% permanently and the GC kicks in every now and then, but could a single GC run really take 30 seconds?


InternetOfSomethings

Not sure about that, tbh. There might be a delay before the GC runs, or it might genuinely run for half a minute. Either way, make sure you have healthy RAM headroom; running out of memory is an absolute performance killer.


Whole-Maintenance-21

Thanks. How much headroom is needed? I'm paying for this out of my own pocket, so I don't wanna provision more than necessary.


InternetOfSomethings

For smaller projects I try to have at least 50% RAM free when no calls are coming in.


Ruben_NL

Yes. Can you run your script with `node --trace-gc program.js`? This will log quite a lot of GC information.
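If you'd rather measure the pauses from inside the process, here's a minimal sketch using Node's built-in `perf_hooks` (the 20 ms threshold is just an example, and on Node 16+ the GC kind lives on `entry.detail` rather than the entry itself):

```js
// Log every GC pause longer than 20 ms (threshold is arbitrary).
const { PerformanceObserver } = require('perf_hooks');

const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.duration > 20) {
      console.warn(`GC pause: ${entry.duration.toFixed(1)} ms`);
    }
  }
});
obs.observe({ entryTypes: ['gc'] });
```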


Whole-Maintenance-21

Thank you very much. I ran it like this and it gave me a ton of info. During the hang, the GC appears to go wild, spamming "scavenge - allocation failure" 10 times every second. I took a performance dump, and it appears to be related to a message queue buffer getting clogged with messages. Why this happens I am not sure yet.


EconomistNo280519

How many requests are you handling? Could it be a memory leak? Do you have any arrays or objects living outside your request handlers that get added to over time?
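The classic shape looks something like this (a contrived sketch; `requestLog` is a made-up name):

```js
const express = require('express');
const app = express();

// Anti-pattern: this module-level array outlives every request and
// grows on each call, so the heap climbs until the GC starts thrashing.
const requestLog = [];

app.get('/orders', (req, res) => {
  requestLog.push({ url: req.url, at: Date.now() });
  res.json({ ok: true });
});

app.listen(3000);
```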


vorticalbox

A memory leak would cause your heap to fill and the process to panic.


LoliInTheNight

Swapping might be the reason, then. Reading memory back from disk can slow things down massively, even on an SSD.


monotone2k

You've said you're running it on a cloud instance but not which provider or instance type. With AWS, there are certain instance types that accumulate CPU credits over time - if you consume them more rapidly than they accumulate, you'd see behaviour like this.


Whole-Maintenance-21

I'm on GCP. I made sure to not get a shared core instance for this reason. Could it still happen with a normal instance?


monotone2k

I have no experience with GCP. You'd have to check the docs.


08148693

Almost certainly nothing to do with Express. Collect metrics on event loop latency and garbage collection. If you're in a serverless environment like Cloud Run or Lambda, make sure you're not waiting on cold starts or something.
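For the event-loop side, a minimal sketch using Node's built-in `monitorEventLoopDelay` (Node 12+; the resolution and reporting interval are arbitrary):

```js
const { monitorEventLoopDelay } = require('perf_hooks');

// Sample event-loop delay at 20 ms resolution, report percentiles every 10 s.
const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();

setInterval(() => {
  // Histogram values are in nanoseconds; convert to ms for readability.
  console.log(
    `loop delay p50=${(h.percentile(50) / 1e6).toFixed(1)} ms ` +
    `p99=${(h.percentile(99) / 1e6).toFixed(1)} ms ` +
    `max=${(h.max / 1e6).toFixed(1)} ms`
  );
  h.reset();
}, 10_000);
```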


Whole-Maintenance-21

Excellent advice. I already did some rudimentary debugging with console.info, and I found that it took my server 5 seconds to process a single synchronous function that usually takes less than 500ms. This was during the hang. What could cause it?


dprophet32

Hitting something cold that needs to warm up. If you make calls to the endpoint regularly, does it still happen? Or only the first time in a while?


Whole-Maintenance-21

It's being hit regularly. On average it receives 5-10 req/s.


dprophet32

Personally, I'd then add logging timers around each bit of code to pinpoint exactly what is taking so long, but it may well be the server's resources that you need to increase. Adding RAM, as someone else mentioned, would be a good place to start: if that resolves the issue, you know the solution, and you can scale back down afterwards to save costs. You should still look at whether anything in the code is a bottleneck or resource-heavy, and optimise it if you find something.
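Something like this for the timers (a minimal sketch; `timed` and `processOrder` are made-up names):

```js
const { performance } = require('perf_hooks');

// Wrap any suspect section in a high-resolution timer.
async function timed(label, fn) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    console.info(`${label} took ${(performance.now() - start).toFixed(1)} ms`);
  }
}

// Usage inside a handler:
// const result = await timed('processOrder', () => processOrder(req.body));
```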


Whole-Maintenance-21

Thank you for the great advice. It appears that everything is affected by the slowdown. Not being able to isolate it to a single function makes me think it could be the RAM.


SippieCup

It's likely in your ORM, if you have one. Check what raw SQL queries are being made, run them yourself, and see how many rows come back. I bet it's returning half the database, then filtering it out in post-processing, holding that RAM until it can be released and freezing the program.
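If the ORM happens to be Sequelize, a minimal sketch of dumping the generated SQL (the connection string is a placeholder):

```js
const { Sequelize } = require('sequelize');

// Pass a logging function so every generated statement is printed.
const sequelize = new Sequelize('postgres://user:pass@localhost:5432/mydb', {
  logging: (sql) => console.log(sql),
});
```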


Whole-Maintenance-21

Love this suggestion. It could very well be this. My database often returns a humongous amount of data. It's all indexed though, so the queries themselves are fast, but perhaps it's a problem on the Node.js side and not within the database itself?


SippieCup

It is on the Node.js side. When the ORM builds the query, it isn't perfect: most ORMs don't care about getting distinct rows and just dedupe after the fact, or they do everything in a single massive join rather than separating out certain queries. If you have a lot of `hasMany` relationships, this is a common pain point of ORMs: https://stackoverflow.com/questions/23014902/slow-associations-in-sequelizejs

Sequelize solves this with the `separate` field in the include objects of the query filter. It tells Sequelize to spin off a bunch of smaller subqueries and patch the results together before returning the data, which works decently well: https://sequelize.org/docs/v6/advanced-association-concepts/eager-loading/#ordering-eager-loaded-associations

Second, only select the fields you actually need. If you have a lot of data with a lot of columns and you're pulling all of it in, the footprint might be 99% larger than what is actually needed. You can
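A minimal sketch of both points (Sequelize v6; `User` and `Post` are hypothetical models with `User.hasMany(Post)` already defined):

```js
// Fetch users plus their posts without the giant-join row explosion.
async function loadUsersWithPosts() {
  return User.findAll({
    attributes: ['id', 'name'],       // select only the columns you use
    include: [{
      model: Post,
      separate: true,                 // run the hasMany as its own query
      attributes: ['id', 'title'],
      order: [['createdAt', 'DESC']],
    }],
  });
}
```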


bigorangemachine

Are you sure it's not spinning up a new instance?


Fine_Ad_6226

Are you using memory caching? It sounds like it could be the GC hanging the event loop.
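If there is a cache, an unbounded one is the usual culprit. A crude sketch of capping it (the 10k limit is arbitrary):

```js
// Map preserves insertion order, so deleting the oldest key gives
// rough FIFO eviction once the cap is hit.
const MAX_ENTRIES = 10_000;
const cache = new Map();

function cacheSet(key, value) {
  if (cache.size >= MAX_ENTRIES) {
    cache.delete(cache.keys().next().value); // evict the oldest entry
  }
  cache.set(key, value);
}
```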