Profiling, Refactoring & Design Patterns
Evaluating and enhancing the code
Profiling
One way to evaluate the code is to profile it. Luckily, Node.js already provides us with a profiler, behind the --prof
flag. So let’s just run our app with the profiler.
node --prof build/server.js
You can see that a isolate-0x....-v8.log
file is created in your working directory. Now let’s stress test our app with ApacheBench.
ab -k -c 1000 -n 10000 -T "application/json" -p post.data "[http://localhost:3000/slack/events](http://localhost:3000/slack/events)"
After waiting for about a minute, the stress test is complete. We can terminate our process and start analyzing the logs. We can do that by executing something like this:
node --prof-process isolate-0x102801e00-v8.log > processed.txt
Here’s some interesting sections I got from the processed.txt
file.
[Summary]:
ticks total nonlib name
12231 49.0% 50.0% JavaScript
9162 36.7% 37.4% C++
832 3.3% 3.4% GC
497 2.0% Shared libraries
3081 12.3% Unaccounted</span><span id="4dbe" class="dz jb gy dc it b jc jm jn jo jp jq je s jf">[JavaScript]:
ticks total nonlib name
1920 7.7% 7.8% Builtin: StringEqual
941 3.8% 3.8% Builtin: KeyedLoadIC_Megamorphic
860 3.4% 3.5% Builtin: KeyedStoreIC_Megamorphic_Strict
655 2.6% 2.7% LazyCompile: *matchKnownFields _http_incoming.js:150:26
480 1.9% 2.0% LoadIC: A load IC from the snapshot
373 1.5% 1.5% StoreIC: A store IC from the snapshot
362 1.4% 1.5% LazyCompile: *parseBody /Users/rkkautsar/Kuliah/PPL/PPL2018-C1/scrum-bot/node_modules/koa-bodyparser/index.js:89:27</span> <span id="c211" class="dz jb gy dc it b jc jm jn jo jp jq je s jf">[C++ entry points]:
ticks cpp total name
2242 35.3% 9.0% T v8::internal::Runtime_StringSplit(int, v8::internal::Object**, v8::internal::Isolate*)
1064 16.8% 4.3% T v8::internal::Runtime_GetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
895 14.1% 3.6% T v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*)
748 11.8% 3.0% T v8::internal::Runtime_KeyedGetProperty(int, v8::internal::Object**, v8::internal::Isolate*)</span>
We can see that half of the ticks happens in Javascript Runtime, and the most consuming function is StringEqual. Let’s see if pruning some of our many comparisons of string can optimize our app.
Here’s the before and after refactoring of an unoptimized code:
const messageEvent: MessageEventType = ctx.request.body.event;
let origin = MessageOrigin.unknown;
if (
await isMemberOf(
ctx.request.body.event.channel,
channelTypes.find(type => type.endpoint === 'channels')!
)
) {
origin = MessageOrigin.channels;
} else if (
await isMemberOf(
ctx.request.body.event.channel,
channelTypes.find(type => type.endpoint === 'im')!
)
) {
origin = MessageOrigin.ims;
} else if (
await isMemberOf(
ctx.request.body.event.channel,
channelTypes.find(type => type.endpoint === 'mpim')!
)
) {
origin = MessageOrigin.mpim;
} else if (
await isMemberOf(
ctx.request.body.event.channel,
channelTypes.find(type => type.endpoint === 'groups')!
)
) {
origin = MessageOrigin.groups;
}
messageEvent.origin = origin;
const messageEvent: MessageEventType = ctx.request.body.event;
const isMemberOfs = await Promise.all(channelTypes.map(type => isMemberOf(messageEvent.channel, type)));
const channelType = channelTypes.find((type, index) => isMemberOfs[index]);
if (channelType !== undefined) {
messageEvent.origin = channelType.origin;
} else {
messageEvent.origin = MessageOrigin.unknown;
}
We can see that before, it was calling the String::find function on every channelTypes, making it achieve something like an O(n²) complexity. We can then refactor it to a simple two, easy to understand, linear loops.
Let’s do the profiling and stress test again (but now I turn up the number of connection to 100.000, finished in about 5 minutes):
[Summary]:
ticks total nonlib name
117033 45.6% 46.6% JavaScript
97543 38.0% 38.9% C++
10069 3.9% 4.0% GC
5547 2.2% Shared libraries
36407 14.2% Unaccounted
[JavaScript]:
ticks total nonlib name
10274 4.0% 4.1% Builtin: StringEqual
8733 3.4% 3.5% Builtin: KeyedLoadIC_Megamorphic
6871 2.7% 2.7% LoadIC: A load IC from the snapshot
5107 2.0% 2.0% Builtin: KeyedStoreIC_Megamorphic_Strict
4340 1.7% 1.7% LazyCompile: *parseBody /Users/rkkautsar/Kuliah/PPL/PPL2018-C1/scrum-bot/node_modules/koa-bodyparser/index.js:89:27
4039 1.6% 1.6% StoreIC: A store IC from the snapshot {1}
[C++ entry points]:
ticks cpp total name
12083 25.1% 4.7% T v8::internal::Runtime_StringSplit(int, v8::internal::Object**, v8::internal::Isolate*)
8118 16.9% 3.2% T v8::internal::Runtime_KeyedGetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
7039 14.6% 2.7% T v8::internal::Runtime_GetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
5305 11.0% 2.1% T v8::internal::Runtime_SetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
We can see that the ticks are reduced by about 50% in the StringEqual function, and the total ticks (in Javascript) are reduced by about 5% altogether. That’s some nice optimization! Now we know that string methods is not that cheap and we will have to be more considerate of it (top consuming method in C++ is also string-related).
Refactoring
Martin Fowler has written a great books called “Refactoring: Improving the Design of Existing Code” that is said to be THE book of refactoring. As I don’t own the book, I stumbled upon a repository in GitHub that summarize its content and I found it very helpful I wish I had the book.
Well, as I look at Scrum.ai code, I know that this needs a refactoring. And with the guide in hand, I jump into refactoring the code.
One of the refactoring I do is when I need to implement and NLU handler for messages. That is, previously our bot only handles slash commands, and I want to extend it to handle natural language too.
I also see that the handler function is too bloated and need a refactor. It smells of a long method, and need to be more low coupling and more cohesive. So I do a Big Refactoring, specifically Extracting Hierarchy.
From a single messageHandler
method. I refactored it into:
handlers/
NLUHandler extends MessageHandler
SlashCommandHandler extends MessageHandler
slack/handler/
intentHandler(parsedIntent)
That is, the original handler is now an interface, extended by NLUHandler and SlashCommandHandler that each do their own things and return a parsed intent object, e.g.:
{ intent: 'task_add', task_title: 'My Task Title' }
And the original function-calling handler is now an intentHandler that receives the said parsed intent object and executes the related functions. Now both of the handler is more cohesive and can be easily extended to handle another type of messages.
Design Patterns
Some of the design patterns used in Scrum.ai includings:
- Chain of responsibility. The handlers I refactored above implements chain of responsibility pattern in the way that it tries every handlers it know until one of it actually handles the message.
- Template Method. One of the early feature I’ve coded is the database seeder. It implements template method in the way that it has many seeder that each implements their own
seed()
function that will be run by the main seeder. - Decorators, Singleton, Builder, and many more. As we use libraries like TypeORM that encourage good patterns, we use their structural and creational patterns to the best that we can.
Appendix
Now that we’re at the last sprint, we need to sped up things and finish the requirements. In the last two weeks I’ve integrated Kata NLU to the bot as one of our user story. I also learned a lot about NLP, Dialog Management, and Context in the context of chatbots. Now we need to glue things together to finish the requirements.
By the way, here’s the result of ApacheBench for 1.000.000 connections and 1000 concurrent connections, a bit unstable but still all right.
ab -k -c 1000 -n 1000000 -T "application/json" -p post.data "http://localhost:3000/slack/events"
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100000 requests
Completed 200000 requests
Completed 300000 requests
Completed 400000 requests
Completed 500000 requests
Completed 600000 requests
Completed 700000 requests
Completed 800000 requests
Completed 900000 requests
Completed 1000000 requests
Finished 1000000 requests
Server Software:
Server Hostname: localhost
Server Port: 3000
Document Path: /slack/events
Document Length: 2 bytes
Concurrency Level: 1000
Time taken for tests: 304.648 seconds
Complete requests: 1000000
Failed requests: 0
Keep-Alive requests: 1000000
Total transferred: 142000000 bytes
Total body sent: 275000000
HTML transferred: 2000000 bytes
Requests per second: 3282.47 [#/sec] (mean)
Time per request: 304.648 [ms] (mean)
Time per request: 0.305 [ms] (mean, across all concurrent requests)
Transfer rate: 455.19 [Kbytes/sec] received
881.52 kb/s sent
1336.71 kb/s total
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.9 0 93
Processing: 17 304 126.6 261 1741
Waiting: 17 304 126.6 261 1741
Total: 22 305 126.6 261 1741
Percentage of the requests served within a certain time (ms)
50% 261
66% 276
75% 293
80% 322
90% 431
95% 547
98% 738
99% 855
100% 1741 (longest request)