JSConf

2019

Fedor Indutny

  • Twitter: @indutny
  • GitHub: @indutny
  • Codes at PayPal
  • Ω beams 🔥🔥👀

llhttp

New HTTP v1 Parser

for Node.js

History

Node.js

Used a lot by frontend tooling

Great backend platform!

Node.js

started with http_parser...

Is libuv the oldest dependency?

No!

http_parser is!

http_parser

  • Inspired by Mongrel
  • Parts of nginx's HTTP parser
  • Original code by @ry
  • With us since 2009

10 years!

To celebrate this

I wrote another HTTP parser

to replace http_parser

Why?

Original parser

  • "Good enough" performance
  • Supports spec-violating clients
  • Has lots of tests

Cons

  • Rigidity of codebase
  • Impossible to maintain
  • Vulnerability-prone

Node.js users know JavaScript better than C

How?

Conservative attempts

  • Introduce macros
  • Separate functions for states

Conclusion:

Hard to improve

Existing codebase

No reason to

re-use code

Rewrite

in JavaScript

(or TypeScript 😅)

Plan:

  1. Define parser in TypeScript
  2. Generate C library

C Library:

  • Keep existing users
  • Hopefully performance too!

llhttp

  • Next major version
  • Same principles
  • Similar API

Scan input

One character at a time

Scanning

means

no buffering

Even when request

Arrives byte-by-byte

Parser

Emits partial data

(e.g. header names/values, body)
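
A minimal sketch of the idea above, not llhttp's actual code. The hypothetical `HeaderValueScanner` class, its `feed` method and the `onValue` callback are made up for illustration. It reads one character at a time, keeps only a state flag between calls, and reports each slice of a header value as soon as the chunk ends.

```typescript
// Hypothetical illustration: scan "name:value\n" lines without buffering.
type Phase = "headerName" | "headerValue";

class HeaderValueScanner {
  private phase: Phase = "headerName";

  // onValue may fire several times for a single header value.
  constructor(private readonly onValue: (part: string) => void) {}

  feed(chunk: string): void {
    // Start of the current value slice inside this chunk (-1: no slice open).
    let spanStart = this.phase === "headerValue" ? 0 : -1;

    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk[i];
      if (this.phase === "headerName") {
        if (ch === ":") {
          this.phase = "headerValue";
          spanStart = i + 1;
        }
      } else if (ch === "\n") {
        if (i > spanStart) this.onValue(chunk.slice(spanStart, i));
        this.phase = "headerName";
        spanStart = -1;
      }
    }

    // Chunk ended mid-value: emit what was seen instead of storing it.
    if (this.phase === "headerValue" && spanStart < chunk.length) {
      this.onValue(chunk.slice(spanStart));
    }
  }
}

// The value arrives split across two chunks and is reported in two parts.
const parts: string[] = [];
const scanner = new HeaderValueScanner((part) => parts.push(part));
scanner.feed("host:exam");
scanner.feed("ple.com\n");
```

The caller receives the two parts separately and decides whether to join them, so the scanner itself never holds on to input.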

Original scanning

  • "for" loop over input
  • huge "switch" for states

All in a single function!

New scanning

  • Break "switch" into pieces
  • Each piece has precise action
  • Use "goto" between states
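
A rough sketch of the "pieces" idea, written in TypeScript rather than generated C. Every name here (`Ctx`, `inMethod`, `inPath`, `done`, `run`) is hypothetical. Each state is a small function with a single job, and returning the next state function stands in for the "goto".

```typescript
// Hypothetical sketch: one small function per state instead of one big switch.
interface Ctx {
  method: string;
  path: string;
}

// A state handles one character and returns the state to run next.
type StateFn = (ch: string, ctx: Ctx) => StateFn;

const inMethod: StateFn = (ch, ctx) => {
  if (ch === " ") return inPath; // "goto" the next state
  ctx.method += ch;              // collected here only to keep the sketch short
  return inMethod;
};

const inPath: StateFn = (ch, ctx) => {
  if (ch === " ") return done;
  ctx.path += ch;
  return inPath;
};

// Terminal state: ignore the rest of the input.
const done: StateFn = () => done;

function run(input: string): Ctx {
  const ctx: Ctx = { method: "", path: "" };
  let current: StateFn = inMethod;
  for (let i = 0; i < input.length; i++) {
    current = current(input[i], ctx);
  }
  return ctx;
}

const requestLine = run("GET /index.html HTTP/1.1");
```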

How exactly?

DSL

llparse

(Can be used for other protocols!)

Define common

actions

match


              // Create a named state
              const node = program.node('name');

              // Consume the exact sequence, then move to `next`
              node.match('keep-alive', next);
            

peek


              // Check the next character without consuming it
              node.peek(' ', next);
            

pattern-to-number


              node.select({
                '0': 0, '1': 1, '2': 2,
                '3': 3, '4': 4, '5': 5,
                '6': 6, '7': 7, '8': 8,
                '9': 9,
              }, next);
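
What a pattern-to-number step computes can be sketched in plain TypeScript. This is an illustration only, not llparse code: `selectAt`, `digitTable` and `parseDecimal` are hypothetical names. The helper tries each key at the current position and returns the mapped number together with the position after the match.

```typescript
// Hypothetical helper: find a key of `table` at `pos`, return its number.
function selectAt(
  table: Record<string, number>,
  input: string,
  pos: number,
): { value: number; end: number } | null {
  for (const pattern of Object.keys(table)) {
    if (input.slice(pos, pos + pattern.length) === pattern) {
      return { value: table[pattern], end: pos + pattern.length };
    }
  }
  return null; // no key matched at this position
}

const digitTable: Record<string, number> = {
  "0": 0, "1": 1, "2": 2, "3": 3, "4": 4,
  "5": 5, "6": 6, "7": 7, "8": 8, "9": 9,
};

// Repeating the step turns a run of digits into a number.
function parseDecimal(input: string): number {
  let total = 0;
  let pos = 0;
  for (;;) {
    const hit = selectAt(digitTable, input, pos);
    if (hit === null) break;
    total = total * 10 + hit.value;
    pos = hit.end;
  }
  return total;
}

const contentLength = parseDecimal("1234\r\n");
const putMethod = selectAt({ GET: 0, PUT: 1 }, "PUT /", 0);
```

This sketch returns the first key that matches, so it does not handle keys that are prefixes of one another.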
            

store

(invoke callback)


              const store = program.code.store('method');

              node.select({
                'GET': 0,
                'PUT': 1,
              }, program.invoke(store, next));
            

span


              // Emit the data between start() and end() through the `header` callback
              const cb = program.code.span('header');
              const header = program.span(cb);

              header.start(
                  node.match(' ', header.end(next)));
            

otherwise


              node
                .match('a', childA)
                .match('b', childB)
                .otherwise(somethingElse);
            

skipTo


              node
                .match('a', childA)
                .skipTo(somethingElse);
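
The fallback edges differ in what happens to the unmatched character. The sketch below is a hypothetical model, not llparse code. It assumes that `peek` and `otherwise` leave the input position untouched, while `match` and `skipTo` consume input. `Step`, `wordStep` and `afterWordStep` are illustrative names.

```typescript
// Hypothetical model of the edges. Assumed semantics:
//   match     - consume the matched character, then jump
//   peek      - test the next character, do not consume it
//   skipTo    - no edge matched: consume one character, then jump
//   otherwise - no edge matched: jump without consuming
interface Step {
  target: string; // name of the node to run next
  pos: number;    // input position that node starts at
}

// Models: word.peek(' ', afterWord).skipTo(word)
function wordStep(input: string, pos: number): Step {
  if (input[pos] === " ") return { target: "afterWord", pos }; // peek
  return { target: "word", pos: pos + 1 };                      // skipTo
}

// Models: afterWord.match(' ', afterWord).otherwise(token)
function afterWordStep(input: string, pos: number): Step {
  if (input[pos] === " ") return { target: "afterWord", pos: pos + 1 }; // match
  return { target: "token", pos };                                      // otherwise
}

const sample = "ab c";
const skipped = wordStep(sample, 0);
const peeked = wordStep(sample, 2);
const matched = afterWordStep(sample, 2);
const fellThrough = afterWordStep(sample, 3);
```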
            

llhttp is a TS program

Different sub-parsers in
different files

llparse transpiles TS program
to C

DSL

No syntax checking

JS engine handles it!

llparse

Builds a graph of states

llparse

Can do static analysis:
  • Infinite loop check
  • "peephole" optimizations
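
One way such an infinite loop check could work, as a sketch under assumptions rather than a description of llparse's implementation: treat the parser as a graph whose edges either consume input or do not, and report any cycle made only of non-consuming edges. `Edge`, `StateGraph` and `hasInfiniteLoop` are hypothetical names.

```typescript
// Hypothetical graph model; llparse's real data structures may differ.
interface Edge {
  to: string;
  consumes: boolean; // false for edges that keep the input position
}

type StateGraph = Record<string, Edge[]>;

// True when some cycle uses only non-consuming edges, so the parser
// could spin forever without reading any input.
function hasInfiniteLoop(graph: StateGraph): boolean {
  const visiting = new Set<string>(); // nodes on the current DFS path
  const finished = new Set<string>(); // nodes already proven safe

  const visit = (id: string): boolean => {
    if (visiting.has(id)) return true;
    if (finished.has(id)) return false;
    visiting.add(id);
    for (const edge of graph[id] ?? []) {
      // Consuming edges always make progress, so only follow the others.
      if (!edge.consumes && visit(edge.to)) return true;
    }
    visiting.delete(id);
    finished.add(id);
    return false;
  };

  return Object.keys(graph).some((id) => visit(id));
}

// A self-loop that consumes input is fine.
const safeGraph: StateGraph = {
  start: [{ to: "start", consumes: true }, { to: "end", consumes: false }],
  end: [],
};

// Two nodes that hand control back and forth without consuming anything.
const brokenGraph: StateGraph = {
  a: [{ to: "b", consumes: false }],
  b: [{ to: "a", consumes: false }],
};
```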

llparse

Can generate different outputs:
  • C
  • LLVM bitcode

C has better performance 😳

How fast?

http_parser

has good performance

llhttp

Not hand-written!

Not hand-optimized!

is 2x faster!

Numbers

  • llhttp: 3'020'459 RPS
  • http_parser: 1'406'180 RPS
(Node hardly cares 😅)

Default in Node 12

Please blame me

for all HTTP problems

...and tag me on GitHub

Tests

All original tests

ported to

markdown

Easy to read, easy to contribute

In-test textual description

What's next?

More static checks

More optimizations

vector instructions

Unified docs

Different parsers?

(e.g., SMTP, POP3)

Link incoming...

Thank you!