Skip to content

url: use ada::url_aggregator for parsing urls#47339

Merged
Trott merged 3 commits into
nodejs:mainfrom
anonrig:update-ada-v2
Apr 5, 2023
Merged

url: use ada::url_aggregator for parsing urls#47339
Trott merged 3 commits into
nodejs:mainfrom
anonrig:update-ada-v2

Conversation

@anonrig

@anonrig anonrig commented Mar 31, 2023

Copy link
Copy Markdown
Member

This pull request:

  • updates ada and replaces ada::url with ada::url_aggregator in correct places
  • removes ICU requirement from URL
  • improves the performance of URL parsing significantly

The release notes for Ada v2.0 can be found from https://www.yagiz.co/announcing-ada-url-parser-v2-0/

Added the notable-change label due to removing the requirement for ICU for proper hostname parsing.

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1314/

url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='auth' withBase='false'              ***    106.73 %      ±19.04% ±25.55% ±33.70%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='auth' withBase='true'               ***    111.98 %      ±18.10% ±24.30% ±32.05%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='dot' withBase='false'               ***    112.52 %      ±16.28% ±21.83% ±28.73%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='dot' withBase='true'                ***     91.22 %      ±17.19% ±23.05% ±30.35%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='file' withBase='false'              ***     31.72 %      ±12.53% ±16.68% ±21.72%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='file' withBase='true'               ***     31.05 %      ±12.37% ±16.46% ±21.43%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='idn' withBase='false'               ***     83.54 %      ±15.31% ±20.49% ±26.90%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='idn' withBase='true'                ***     81.76 %      ±16.08% ±21.57% ±28.44%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='javascript' withBase='false'        ***    137.30 %      ±18.20% ±24.43% ±32.23%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='javascript' withBase='true'         ***    111.40 %      ±16.83% ±22.53% ±29.58%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='long' withBase='false'              ***     25.11 %       ±8.80% ±11.73% ±15.28%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='long' withBase='true'               ***     23.16 %       ±8.63% ±11.52% ±15.06%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='percent' withBase='false'           ***     41.12 %      ±14.86% ±19.79% ±25.79%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='percent' withBase='true'            ***     37.21 %      ±12.93% ±17.23% ±22.48%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='short' withBase='false'             ***    123.39 %      ±18.93% ±25.36% ±33.34%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='short' withBase='true'              ***    103.11 %      ±17.88% ±23.97% ±31.54%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='wpt' withBase='false'               ***     35.04 %       ±2.25%  ±3.02%  ±3.98%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='wpt' withBase='true'                ***      9.76 %       ±4.21%  ±5.62%  ±7.36%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='ws' withBase='false'                ***    133.03 %      ±19.31% ±25.91% ±34.16%
url/legacy-vs-whatwg-url-parse.js method='whatwg' e=1 type='ws' withBase='true'                 ***    113.69 %      ±18.69% ±25.08% ±33.08%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 40 comparisons, you can thus
expect the following amount of false-positive results:
  2.00 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.40 false positives, when considering a   1% risk acceptance (**, ***),
  0.04 false positives, when considering a 0.1% risk acceptance (***)

cc @lemire @miguelteixeiraa

@anonrig anonrig added url Issues and PRs related to the legacy built-in url module. notable-change PRs with changes that should be highlighted in changelogs. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Mar 31, 2023
@nodejs-github-bot

Copy link
Copy Markdown
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/startup
  • @nodejs/url

@github-actions

This comment was marked as resolved.

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Mar 31, 2023
@anonrig anonrig force-pushed the update-ada-v2 branch 2 times, most recently from 05408df to d0da46f Compare March 31, 2023 13:12

@tniessen tniessen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this marked as notable because it removes the need for ICU to use URL?

Comment thread doc/api/url.md
@anonrig anonrig added commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. performance Issues and PRs related to the performance of Node.js. request-ci Add this label to start a Jenkins CI on a PR. labels Mar 31, 2023
@github-actions github-actions Bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 31, 2023
@nodejs-github-bot

Copy link
Copy Markdown
Collaborator

@anonrig

anonrig commented Mar 31, 2023

Copy link
Copy Markdown
Member Author

@joyeecheung parallel.test-shadow-realm-gc is failing. Any idea why?

cc @legendecas It seems its related to: #46809

Comment thread lib/internal/url.js Outdated
Comment thread lib/internal/url.js Outdated
Comment thread lib/internal/url.js Outdated
Comment thread src/node_url.cc
Comment thread lib/internal/url.js
@joyeecheung

joyeecheung commented Mar 31, 2023

Copy link
Copy Markdown
Member

parallel.test-shadow-realm-gc is failing. Any idea why?

My guess is by adding a new binding data to the context this tips the memory needed by creating a new shadow realm over what's assumed by the test. You can try logging out process.memoryUsage() to see how much more memory is used after the change. If that's the case you can try bumping the --max-old-space-size in that test a bit (but not too much to let a leak get away)

@anonrig

anonrig commented Mar 31, 2023

Copy link
Copy Markdown
Member Author

My guess is by adding a new binding data to the context this tips the memory needed by creating a new shadow realm over what's assumed by the test. You can try logging out process.memoryUsage() to see how much more memory is used after the change. If that's the case you can try bumping the --max-old-space-size in that test a bit (but not too much to let a leak get away)

If I understood the concept of shadow realm, this particular test is creating 1000 UInt32Array (each realm creates 1 due to binding values in node_url.cc), since every realm is isolated. I'm not quite sure what the correct value is for --max-old-space-size and how to optimize the value for the concerns you've stated. For example even --max-old-space-size=500 does not fix the problem for me.

Updating to --max-old-space-size=1000 fixed, but this particular test takes 14 seconds at the moment.

➜  node git:(update-ada-v2) ✗ time tools/test.py test/parallel/test-shadow-realm-gc.js
[00:08|% 100|+   1|-   0]: Done

All tests passed.
tools/test.py test/parallel/test-shadow-realm-gc.js  14.63s user 0.78s system 177% cpu 8.683 total

@anonrig anonrig force-pushed the update-ada-v2 branch 3 times, most recently from 2cd305b to 2a2ed23 Compare March 31, 2023 19:18
@bricss

bricss commented Mar 31, 2023

Copy link
Copy Markdown
Contributor

Please somebody tell me that the next step 🥾 would be implementation of URLPattern API? 🧐

@RafaelGSS

Copy link
Copy Markdown
Member

The https://api.github.com/repos/nodejs/node/labels/notable-change label has been added by @anonrig.

Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment.

Can you please address this bot request in your PR description?

@anonrig

anonrig commented Apr 9, 2023

Copy link
Copy Markdown
Member Author

@RafaelGSS ICU requirement for URL parsing have been removed while providing up to 100% faster performance for certain operations. Starting from version 20, Node.js supports valid URL hostname parsing for builds without ICU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

author ready PRs that have at least one approval, no pending requests for changes, and a CI started. commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. notable-change PRs with changes that should be highlighted in changelogs. performance Issues and PRs related to the performance of Node.js. url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.