Use the --inspect flag (see Inspector Docs)
At a more detailed level, garbage collection is triggered by memory activity rather than time, and objects are classified by the GC into young and old. "Young" objects are traversed (scavenged) more frequently, while "old" objects stay in memory for longer. So there are actually two GC types: a frequent scavenge of new space (short-lived objects) and a less regular traversal of old space (objects that survived enough new space scavenges).
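If you want to observe this generational split in a running process, Node's stable v8 module exposes per-space statistics (this snippet is illustrative and not part of the original text); running with the V8 flag --trace-gc likewise logs each collection as it happens:

```js
// Inspect V8's heap spaces directly: new_space holds young objects,
// old_space holds objects promoted after surviving enough scavenges.
const v8 = require('v8')

for (const space of v8.getHeapSpaceStatistics()) {
  console.log(`${space.space_name}: ${space.space_used_size} bytes used`)
}
```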
Several heuristics may trigger detection of a GC issue, but they all center around high memory usage.
One possible cause of a detected GC issue is a memory leak, where allocated objects are accidentally kept referenced and so are never collected. However, there are other (more common) cases where there is no leak but the memory strategy needs to be adapted.
One such common case is when large objects (such as may be generated for big JSON payloads) are created during periods of high activity (e.g. under request load). This can cause the objects to be moved into old space – if they survive two (by default) GC scavenges – where they will live for longer due to the less frequent old space collections. Objects can then build up in "old space" and cause intermittent process stalling during Garbage Collection.
Depending on the use case, this may be solved in different ways. For instance, if the goal is to write out serialized objects, the output could be written to the response directly as strings (or buffers) instead of creating the intermediate objects (or a combined strategy where part of the object is written out from available state). It may simply be that a functional approach (which is usually recommended) is leading to the repeated creation of very similar objects, in which case the logical flow between functions in a hot path could be adapted to reuse objects instead of creating new ones, as sketched below.
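As a sketch of the object-reuse idea, assuming a hypothetical hot path that previously allocated a fresh result object per call (all names here are illustrative):

```js
// Reuse one preallocated object instead of allocating a new result
// object on every call through the hot path.
const stats = { count: 0, total: 0 }

function computeStats (samples) {
  stats.count = samples.length
  stats.total = 0
  for (const s of samples) stats.total += s
  return stats // callers must consume this before the next call
}
```

The trade-off is that the returned object is only valid until the next call, so this suits hot paths where the result is serialized or written out immediately.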
Another possibility is that a very high number of short-lived objects is being created, filling up the "young" space and triggering frequent GC scavenges – if this isn't an unintended memory leak, then an object pooling strategy may be necessary.
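A minimal sketch of such a pool, assuming fixed-size reusable buffers (none of these names come from the original text):

```js
// Preallocate reusable objects once, then recycle them per event
// instead of allocating and discarding on every iteration.
class Pool {
  constructor (size, factory) {
    this.factory = factory
    this.free = Array.from({ length: size }, factory)
  }
  acquire () {
    return this.free.pop() || this.factory() // grow if exhausted
  }
  release (obj) {
    this.free.push(obj)
  }
}

const buffers = new Pool(64, () => Buffer.allocUnsafe(4096))
const buf = buffers.acquire()
// ... fill and flush buf ...
buffers.release(buf)
```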
To solve Garbage Collection issues we have to analyse the state of our process in order to track down the root cause behind the high memory consumption.
node --inspect <FILENAME>
In Chrome, open chrome://inspect and click the inspect link for that target – this will connect Chrome DevTools to the Node process's remote debug interface
Advanced: Other DevTools memory profiling functionality, such as Record allocation profile and Record allocation timeline, may also be very helpful
Advanced: An alternative approach is to generate a core dump and use a core dump analysis tool to list all JS objects in the core dump file (this approach isn't viable on macOS)
Use clinic flame to discover CPU-intensive function calls – run clinic flame -h to get started.
At a (very) basic level the following pseudo-code demonstrates the Event Loop:
while (event) handle(event)
The Event Loop paradigm leads to an ergonomic development experience for high concurrency programming (relative to the multi-threaded paradigm).
However, since the Event Loop operates on a single thread this is essentially a shared execution environment for every potentially concurrent action. This means that if the execution time of any line of code exceeds an acceptable threshold it interferes with processing of future events (for instance, an incoming HTTP request); new events cannot be processed because the same thread that would be processing the event is currently blocked by a long-running synchronous operation.
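To make this concrete, here is a small hypothetical server in which one synchronous busy-loop stalls every other request (the route and timings are illustrative):

```js
const http = require('http')

http.createServer((req, res) => {
  if (req.url === '/block') {
    // Synchronous work: while this loop runs, the single thread cannot
    // pick up any other event, so every other request has to wait.
    const end = Date.now() + 5000
    while (Date.now() < end) {}
  }
  res.end('done\n')
}).listen(3000)
```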
Asynchronous operations are those which queue an event for later handling; they tend to be identified by an API that requires a callback, or uses promises (or async/await).
Synchronous operations, by contrast, simply return a value. Long-running synchronous operations are either functions that perform blocking I/O (such as fs.readFileSync) or potentially resource-intensive algorithms (such as sorting or serializing a large data structure in a single pass).
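The contrast is easy to see with the fs module (the file name is just a placeholder):

```js
const fs = require('fs')

// Synchronous: returns the value directly, blocking the event loop
// for the full duration of the disk read.
const config = fs.readFileSync('config.json')

// Asynchronous: queues an event; the callback runs later, while the
// event loop stays free to handle other work in the meantime.
fs.readFile('config.json', (err, data) => {
  if (err) throw err
  console.log(data.length)
})
```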
To solve Event Loop issues, we need to find out where the synchronous bottleneck is. This may (commonly) be a single long-running synchronous function, or the bottleneck may be distributed, which takes rather more detective work.
Use clinic flame to generate a flamegraph – run clinic flame -h to get started.
Node.js provides a platform for non-blocking I/O. Unlike languages that typically block for I/O (e.g. Java, PHP), Node.js passes I/O operations to an accompanying C library (libuv), which delegates these operations to the Operating System.
The profiled process has been observed to be unusually idle under load; typically this means it's waiting for external I/O, since there's nothing else to do until the I/O completes.
To solve I/O issues we have to track down the asynchronous call(s) which are taking an abnormally long time to complete.
I/O root cause analysis is mostly a reasoning exercise, or requires advanced knowledge of and expertise with specialist Node.js logging flags and (very new) asynchronous tracking APIs.
At nearForm we care a lot about this problem and we are developing a new tool to make I/O debugging easier… stay tuned!
Place console.time before an asynchronous call and console.timeEnd at the top of the callback (or in the promise then handler, or after an await, or wherever the asynchronous abstraction in use resumes).
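Applied to a callback-based call, that technique looks like this (the label and file name are illustrative):

```js
const fs = require('fs')

console.time('read-config') // start the timer before the async call
fs.readFile('config.json', (err, data) => {
  console.timeEnd('read-config') // logs e.g. "read-config: 1.234ms"
})
```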
Advanced: For more advanced, lower overhead timing functionality, check out the experimental Performance Timing API (perf_hooks)
Advanced: An alternative to the approach outlined above is to make use of the experimental async_hooks API, in combination with a timer and stack trace generation.
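A minimal sketch of that idea, assuming Node 8.1+ and an arbitrary 100ms threshold; note that inside a hook you must not call console.log (it is itself asynchronous), so fs.writeSync is used for output:

```js
const async_hooks = require('async_hooks')
const fs = require('fs')

const starts = new Map()

async_hooks.createHook({
  init (asyncId, type) {
    // Record when each async resource is created, plus a stack trace
    // so a slow resource can be traced back to its call site.
    starts.set(asyncId, { type, time: Date.now(), stack: new Error().stack })
  },
  destroy (asyncId) {
    const entry = starts.get(asyncId)
    if (!entry) return
    starts.delete(asyncId)
    const ms = Date.now() - entry.time
    if (ms > 100) { // arbitrary "abnormally long" threshold
      fs.writeSync(2, `${entry.type} took ${ms}ms\n${entry.stack}\n`)
    }
  }
}).enable()
```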