Profiling, Refactoring & Design Patterns

Featured on Hashnode

Profiling, Refactoring & Design Patterns

Evaluating and enhancing the code

Profiling

One way to evaluate the code is to profile it. Luckily, Node.js already provides us with a profiler, behind the --prof flag. So let’s just run our app with the profiler.

node --prof build/server.js

You can see that a isolate-0x....-v8.log file is created in your working directory. Now let’s stress test our app with ApacheBench.

ab -k -c 1000 -n 10000 -T "application/json" -p post.data "[http://localhost:3000/slack/events](http://localhost:3000/slack/events)"

After waiting for about a minute, the stress test is complete. We can terminate our process and start analyzing the logs. We can do that by executing something like this:

node --prof-process isolate-0x102801e00-v8.log  > processed.txt

Here’s some interesting sections I got from the processed.txtfile.

[Summary]:
   ticks  total  nonlib   name
  12231   49.0%   50.0%  JavaScript
   9162   36.7%   37.4%  C++
    832    3.3%    3.4%  GC
    497    2.0%          Shared libraries
   3081   12.3%          Unaccounted</span><span id="4dbe" class="dz jb gy dc it b jc jm jn jo jp jq je s jf">[JavaScript]:
   ticks  total  nonlib   name
   1920    7.7%    7.8%  Builtin: StringEqual
    941    3.8%    3.8%  Builtin: KeyedLoadIC_Megamorphic
    860    3.4%    3.5%  Builtin: KeyedStoreIC_Megamorphic_Strict
    655    2.6%    2.7%  LazyCompile: *matchKnownFields _http_incoming.js:150:26
    480    1.9%    2.0%  LoadIC: A load IC from the snapshot
    373    1.5%    1.5%  StoreIC: A store IC from the snapshot
    362    1.4%    1.5%  LazyCompile: *parseBody /Users/rkkautsar/Kuliah/PPL/PPL2018-C1/scrum-bot/node_modules/koa-bodyparser/index.js:89:27</span> <span id="c211" class="dz jb gy dc it b jc jm jn jo jp jq je s jf">[C++ entry points]:
   ticks    cpp   total   name
   2242   35.3%    9.0%  T v8::internal::Runtime_StringSplit(int, v8::internal::Object**, v8::internal::Isolate*)
   1064   16.8%    4.3%  T v8::internal::Runtime_GetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
    895   14.1%    3.6%  T v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*)
    748   11.8%    3.0%  T v8::internal::Runtime_KeyedGetProperty(int, v8::internal::Object**, v8::internal::Isolate*)</span>

We can see that half of the ticks happens in Javascript Runtime, and the most consuming function is StringEqual. Let’s see if pruning some of our many comparisons of string can optimize our app.

Here’s the before and after refactoring of an unoptimized code:

const messageEvent: MessageEventType = ctx.request.body.event;
let origin = MessageOrigin.unknown;
if (
    await isMemberOf(
        ctx.request.body.event.channel,
        channelTypes.find(type => type.endpoint === 'channels')!
    )
) {
    origin = MessageOrigin.channels;
} else if (
    await isMemberOf(
        ctx.request.body.event.channel,
        channelTypes.find(type => type.endpoint === 'im')!
    )
) {
    origin = MessageOrigin.ims;
} else if (
    await isMemberOf(
        ctx.request.body.event.channel,
        channelTypes.find(type => type.endpoint === 'mpim')!
    )
) {
    origin = MessageOrigin.mpim;
} else if (
    await isMemberOf(
        ctx.request.body.event.channel,
        channelTypes.find(type => type.endpoint === 'groups')!
    )
) {
    origin = MessageOrigin.groups;
}

messageEvent.origin = origin;
const messageEvent: MessageEventType = ctx.request.body.event;

const isMemberOfs = await Promise.all(channelTypes.map(type => isMemberOf(messageEvent.channel, type)));
const channelType = channelTypes.find((type, index) => isMemberOfs[index]);

if (channelType !== undefined) {
    messageEvent.origin = channelType.origin;
} else {
    messageEvent.origin = MessageOrigin.unknown;
}

We can see that before, it was calling the String::find function on every channelTypes, making it achieve something like an O(n²) complexity. We can then refactor it to a simple two, easy to understand, linear loops.

Let’s do the profiling and stress test again (but now I turn up the number of connection to 100.000, finished in about 5 minutes):

[Summary]:
   ticks  total  nonlib   name
  117033   45.6%   46.6%  JavaScript
  97543   38.0%   38.9%  C++
  10069    3.9%    4.0%  GC
   5547    2.2%          Shared libraries
  36407   14.2%          Unaccounted
[JavaScript]:
   ticks  total  nonlib   name
  10274    4.0%    4.1%  Builtin: StringEqual
   8733    3.4%    3.5%  Builtin: KeyedLoadIC_Megamorphic
   6871    2.7%    2.7%  LoadIC: A load IC from the snapshot
   5107    2.0%    2.0%  Builtin: KeyedStoreIC_Megamorphic_Strict
   4340    1.7%    1.7%  LazyCompile: *parseBody /Users/rkkautsar/Kuliah/PPL/PPL2018-C1/scrum-bot/node_modules/koa-bodyparser/index.js:89:27
   4039    1.6%    1.6%  StoreIC: A store IC from the snapshot {1}
[C++ entry points]:
   ticks    cpp   total   name
  12083   25.1%    4.7%  T v8::internal::Runtime_StringSplit(int, v8::internal::Object**, v8::internal::Isolate*)
   8118   16.9%    3.2%  T v8::internal::Runtime_KeyedGetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
   7039   14.6%    2.7%  T v8::internal::Runtime_GetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
   5305   11.0%    2.1%  T v8::internal::Runtime_SetProperty(int, v8::internal::Object**, v8::internal::Isolate*)

We can see that the ticks are reduced by about 50% in the StringEqual function, and the total ticks (in Javascript) are reduced by about 5% altogether. That’s some nice optimization! Now we know that string methods is not that cheap and we will have to be more considerate of it (top consuming method in C++ is also string-related).

Refactoring

Martin Fowler has written a great books called “Refactoring: Improving the Design of Existing Code” that is said to be THE book of refactoring. As I don’t own the book, I stumbled upon a repository in GitHub that summarize its content and I found it very helpful I wish I had the book.

Well, as I look at Scrum.ai code, I know that this needs a refactoring. And with the guide in hand, I jump into refactoring the code.

One of the refactoring I do is when I need to implement and NLU handler for messages. That is, previously our bot only handles slash commands, and I want to extend it to handle natural language too.

I also see that the handler function is too bloated and need a refactor. It smells of a long method, and need to be more low coupling and more cohesive. So I do a Big Refactoring, specifically Extracting Hierarchy.

From a single messageHandler method. I refactored it into:

handlers/
  NLUHandler extends MessageHandler
  SlashCommandHandler extends MessageHandler
slack/handler/
  intentHandler(parsedIntent)

That is, the original handler is now an interface, extended by NLUHandler and SlashCommandHandler that each do their own things and return a parsed intent object, e.g.:

{ intent: 'task_add', task_title: 'My Task Title' }

And the original function-calling handler is now an intentHandler that receives the said parsed intent object and executes the related functions. Now both of the handler is more cohesive and can be easily extended to handle another type of messages.

Design Patterns

Some of the design patterns used in Scrum.ai includings:

  1. Chain of responsibility. The handlers I refactored above implements chain of responsibility pattern in the way that it tries every handlers it know until one of it actually handles the message.
  2. Template Method. One of the early feature I’ve coded is the database seeder. It implements template method in the way that it has many seeder that each implements their own seed() function that will be run by the main seeder.
  3. Decorators, Singleton, Builder, and many more. As we use libraries like TypeORM that encourage good patterns, we use their structural and creational patterns to the best that we can.

Appendix

Now that we’re at the last sprint, we need to sped up things and finish the requirements. In the last two weeks I’ve integrated Kata NLU to the bot as one of our user story. I also learned a lot about NLP, Dialog Management, and Context in the context of chatbots. Now we need to glue things together to finish the requirements.

By the way, here’s the result of ApacheBench for 1.000.000 connections and 1000 concurrent connections, a bit unstable but still all right.

ab -k -c 1000 -n 1000000 -T "application/json" -p post.data "http://localhost:3000/slack/events"
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100000 requests
Completed 200000 requests
Completed 300000 requests
Completed 400000 requests
Completed 500000 requests
Completed 600000 requests
Completed 700000 requests
Completed 800000 requests
Completed 900000 requests
Completed 1000000 requests
Finished 1000000 requests
Server Software:
Server Hostname:        localhost
Server Port:            3000
Document Path:          /slack/events
Document Length:        2 bytes
Concurrency Level:      1000
Time taken for tests:   304.648 seconds
Complete requests:      1000000
Failed requests:        0
Keep-Alive requests:    1000000
Total transferred:      142000000 bytes
Total body sent:        275000000
HTML transferred:       2000000 bytes
Requests per second:    3282.47 [#/sec] (mean)
Time per request:       304.648 [ms] (mean)
Time per request:       0.305 [ms] (mean, across all concurrent requests)
Transfer rate:          455.19 [Kbytes/sec] received
                        881.52 kb/s sent
                        1336.71 kb/s total
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.9      0      93
Processing:    17  304 126.6    261    1741
Waiting:       17  304 126.6    261    1741
Total:         22  305 126.6    261    1741
Percentage of the requests served within a certain time (ms)
  50%    261
  66%    276
  75%    293
  80%    322
  90%    431
  95%    547
  98%    738
  99%    855
 100%   1741 (longest request)