Skip to content

Commit f0fc41f

Browse files
authored
ref(core): Improve URL parsing utilities (#15882)
The bottleneck of many of the tasks written down in our Node SDK performance improvement task #15861 is `parseUrl`, defined here: https://github.com/getsentry/sentry-javascript/blob/3d63621714b31c1ea4c2ab2d90d5684a36805a43/packages/core/src/utils-hoist/url.ts#L17 We created #15767 to track removal of `parseUrl`. See more details in the GH issue. While working on tasks for #15767, I initially PR'd #15768, which introduced a `parseStringToURL` method as a replacement for `parseUrl`. While trying to implement that method though I realized `parseStringToURL` has a lot of downsides. This PR removes `parseStringToURL` in favor of a new `parseStringToURLObject`. Given `parseStringToURL` was never exported by the core SDK, this is not a breaking change. To understand `parseStringToURLObject`, let's first look at it's method signature: ```ts function parseStringToURLObject(url: string, urlBase?: string | URL | undefined): URLObject | undefined ``` `parseStringToURLObject` takes a string URL and turns it into a `URLObject`, that is typed like so: ```ts type RelativeURL = { isRelative: true; pathname: URL['pathname']; search: URL['search']; hash: URL['hash']; }; type URLObject = RelativeURL | URL; ``` the JavaScript URL built-in does not handle relative URLs, so we need to use a separate object to to track relative URLs. Luckily it's pretty small surface area, we only need to worry about `pathname`, `search`, and `hash`. For ease of usage of this function, I also introduced a `isURLObjectRelative` helper. This will make sure that users can handle both relative and absolute URLs with ease. Given `packages/core/src/fetch.ts` was using `parseStringToURL`, I refactored that file to use `parseStringToURLObject`. The change feels way better to me, much easier to read and understand what is going on.
1 parent 147989b commit f0fc41f

File tree

6 files changed

+216
-45
lines changed

6 files changed

+216
-45
lines changed

.size-limit.js

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -219,7 +219,7 @@ module.exports = [
219219
import: createImport('init'),
220220
ignore: ['$app/stores'],
221221
gzip: true,
222-
limit: '38 KB',
222+
limit: '38.5 KB',
223223
},
224224
// Node SDK (ESM)
225225
{

dev-packages/e2e-tests/test-applications/nextjs-13/tests/client/fetch.test.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ test('should correctly instrument `fetch` for performance tracing', async ({ pag
4343
'sentry.op': 'http.client',
4444
'sentry.origin': 'auto.http.browser',
4545
},
46-
description: 'GET https://example.com',
46+
description: 'GET https://example.com/',
4747
op: 'http.client',
4848
parent_span_id: expect.stringMatching(/[a-f0-9]{16}/),
4949
span_id: expect.stringMatching(/[a-f0-9]{16}/),

packages/core/src/fetch.ts

Lines changed: 43 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,11 @@
1-
/* eslint-disable complexity */
21
import { getClient } from './currentScopes';
32
import { SEMANTIC_ATTRIBUTE_SENTRY_OP, SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN } from './semanticAttributes';
43
import { SPAN_STATUS_ERROR, setHttpStatus, startInactiveSpan } from './tracing';
54
import { SentryNonRecordingSpan } from './tracing/sentryNonRecordingSpan';
6-
import type { FetchBreadcrumbHint, HandlerDataFetch, Span, SpanOrigin } from './types-hoist';
5+
import type { FetchBreadcrumbHint, HandlerDataFetch, Span, SpanAttributes, SpanOrigin } from './types-hoist';
76
import { SENTRY_BAGGAGE_KEY_PREFIX } from './utils-hoist/baggage';
87
import { isInstanceOf } from './utils-hoist/is';
9-
import { parseStringToURL, stripUrlQueryAndFragment } from './utils-hoist/url';
8+
import { getSanitizedUrlStringFromUrlObject, isURLObjectRelative, parseStringToURLObject } from './utils-hoist/url';
109
import { hasSpansEnabled } from './utils/hasSpansEnabled';
1110
import { getActiveSpan } from './utils/spanUtils';
1211
import { getTraceData } from './utils/traceData';
@@ -54,40 +53,11 @@ export function instrumentFetchRequest(
5453
return undefined;
5554
}
5655

57-
// Curious about `thismessage:/`? See: https://www.rfc-editor.org/rfc/rfc2557.html
58-
// > When the methods above do not yield an absolute URI, a base URL
59-
// > of "thismessage:/" MUST be employed. This base URL has been
60-
// > defined for the sole purpose of resolving relative references
61-
// > within a multipart/related structure when no other base URI is
62-
// > specified.
63-
//
64-
// We need to provide a base URL to `parseStringToURL` because the fetch API gives us a
65-
// relative URL sometimes.
66-
//
67-
// This is the only case where we need to provide a base URL to `parseStringToURL`
68-
// because the relative URL is not valid on its own.
69-
const parsedUrl = url.startsWith('/') ? parseStringToURL(url, 'thismessage:/') : parseStringToURL(url);
70-
const fullUrl = url.startsWith('/') ? undefined : parsedUrl?.href;
71-
7256
const hasParent = !!getActiveSpan();
7357

7458
const span =
7559
shouldCreateSpanResult && hasParent
76-
? startInactiveSpan({
77-
name: `${method} ${stripUrlQueryAndFragment(url)}`,
78-
attributes: {
79-
url,
80-
type: 'fetch',
81-
'http.method': method,
82-
'http.url': parsedUrl?.href,
83-
[SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN]: spanOrigin,
84-
[SEMANTIC_ATTRIBUTE_SENTRY_OP]: 'http.client',
85-
...(fullUrl && { 'http.url': fullUrl }),
86-
...(fullUrl && parsedUrl?.host && { 'server.address': parsedUrl.host }),
87-
...(parsedUrl?.search && { 'http.query': parsedUrl.search }),
88-
...(parsedUrl?.hash && { 'http.fragment': parsedUrl.hash }),
89-
},
90-
})
60+
? startInactiveSpan(getSpanStartOptions(url, method, spanOrigin))
9161
: new SentryNonRecordingSpan();
9262

9363
handlerData.fetchData.__span = span.spanContext().spanId;
@@ -264,3 +234,43 @@ function isRequest(request: unknown): request is Request {
264234
function isHeaders(headers: unknown): headers is Headers {
265235
return typeof Headers !== 'undefined' && isInstanceOf(headers, Headers);
266236
}
237+
238+
function getSpanStartOptions(
239+
url: string,
240+
method: string,
241+
spanOrigin: SpanOrigin,
242+
): Parameters<typeof startInactiveSpan>[0] {
243+
const parsedUrl = parseStringToURLObject(url);
244+
return {
245+
name: parsedUrl ? `${method} ${getSanitizedUrlStringFromUrlObject(parsedUrl)}` : method,
246+
attributes: getFetchSpanAttributes(url, parsedUrl, method, spanOrigin),
247+
};
248+
}
249+
250+
function getFetchSpanAttributes(
251+
url: string,
252+
parsedUrl: ReturnType<typeof parseStringToURLObject>,
253+
method: string,
254+
spanOrigin: SpanOrigin,
255+
): SpanAttributes {
256+
const attributes: SpanAttributes = {
257+
url,
258+
type: 'fetch',
259+
'http.method': method,
260+
[SEMANTIC_ATTRIBUTE_SENTRY_ORIGIN]: spanOrigin,
261+
[SEMANTIC_ATTRIBUTE_SENTRY_OP]: 'http.client',
262+
};
263+
if (parsedUrl) {
264+
if (!isURLObjectRelative(parsedUrl)) {
265+
attributes['http.url'] = parsedUrl.href;
266+
attributes['server.address'] = parsedUrl.host;
267+
}
268+
if (parsedUrl.search) {
269+
attributes['http.query'] = parsedUrl.search;
270+
}
271+
if (parsedUrl.hash) {
272+
attributes['http.fragment'] = parsedUrl.hash;
273+
}
274+
}
275+
return attributes;
276+
}

packages/core/src/utils-hoist/index.ts

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -122,7 +122,14 @@ export {
122122
objectToBaggageHeader,
123123
} from './baggage';
124124

125-
export { getSanitizedUrlString, parseUrl, stripUrlQueryAndFragment } from './url';
125+
export {
126+
getSanitizedUrlString,
127+
parseUrl,
128+
stripUrlQueryAndFragment,
129+
parseStringToURLObject,
130+
isURLObjectRelative,
131+
getSanitizedUrlStringFromUrlObject,
132+
} from './url';
126133
export { eventFromMessage, eventFromUnknownInput, exceptionFromError, parseStackFrames } from './eventbuilder';
127134
export { callFrameToStackFrame, watchdogTimer } from './anr';
128135
export { LRUMap } from './lru';

packages/core/src/utils-hoist/url.ts

Lines changed: 75 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,13 +11,50 @@ interface URLwithCanParse extends URL {
1111
canParse: (url: string, base?: string | URL | undefined) => boolean;
1212
}
1313

14+
// A subset of the URL object that is valid for relative URLs
15+
// The URL object cannot handle relative URLs, so we need to handle them separately
16+
type RelativeURL = {
17+
isRelative: true;
18+
pathname: URL['pathname'];
19+
search: URL['search'];
20+
hash: URL['hash'];
21+
};
22+
23+
type URLObject = RelativeURL | URL;
24+
25+
// Curious about `thismessage:/`? See: https://www.rfc-editor.org/rfc/rfc2557.html
26+
// > When the methods above do not yield an absolute URI, a base URL
27+
// > of "thismessage:/" MUST be employed. This base URL has been
28+
// > defined for the sole purpose of resolving relative references
29+
// > within a multipart/related structure when no other base URI is
30+
// > specified.
31+
//
32+
// We need to provide a base URL to `parseStringToURLObject` because the fetch API gives us a
33+
// relative URL sometimes.
34+
//
35+
// This is the only case where we need to provide a base URL to `parseStringToURLObject`
36+
// because the relative URL is not valid on its own.
37+
const DEFAULT_BASE_URL = 'thismessage:/';
38+
39+
/**
40+
* Checks if the URL object is relative
41+
*
42+
* @param url - The URL object to check
43+
* @returns True if the URL object is relative, false otherwise
44+
*/
45+
export function isURLObjectRelative(url: URLObject): url is RelativeURL {
46+
return 'isRelative' in url;
47+
}
48+
1449
/**
1550
* Parses string to a URL object
1651
*
1752
* @param url - The URL to parse
1853
* @returns The parsed URL object or undefined if the URL is invalid
1954
*/
20-
export function parseStringToURL(url: string, base?: string | URL | undefined): URL | undefined {
55+
export function parseStringToURLObject(url: string, urlBase?: string | URL | undefined): URLObject | undefined {
56+
const isRelative = url.startsWith('/');
57+
const base = urlBase ?? (isRelative ? DEFAULT_BASE_URL : undefined);
2158
try {
2259
// Use `canParse` to short-circuit the URL constructor if it's not a valid URL
2360
// This is faster than trying to construct the URL and catching the error
@@ -26,14 +63,50 @@ export function parseStringToURL(url: string, base?: string | URL | undefined):
2663
return undefined;
2764
}
2865

29-
return new URL(url, base);
66+
const fullUrlObject = new URL(url, base);
67+
if (isRelative) {
68+
// Because we used a fake base URL, we need to return a relative URL object.
69+
// We cannot return anything about the origin, host, etc. because it will refer to the fake base URL.
70+
return {
71+
isRelative,
72+
pathname: fullUrlObject.pathname,
73+
search: fullUrlObject.search,
74+
hash: fullUrlObject.hash,
75+
};
76+
}
77+
return fullUrlObject;
3078
} catch {
3179
// empty body
3280
}
3381

3482
return undefined;
3583
}
3684

85+
/**
86+
* Takes a URL object and returns a sanitized string which is safe to use as span name
87+
* see: https://develop.sentry.dev/sdk/data-handling/#structuring-data
88+
*/
89+
export function getSanitizedUrlStringFromUrlObject(url: URLObject): string {
90+
if (isURLObjectRelative(url)) {
91+
return url.pathname;
92+
}
93+
94+
const newUrl = new URL(url);
95+
newUrl.search = '';
96+
newUrl.hash = '';
97+
if (['80', '443'].includes(newUrl.port)) {
98+
newUrl.port = '';
99+
}
100+
if (newUrl.password) {
101+
newUrl.password = '%filtered%';
102+
}
103+
if (newUrl.username) {
104+
newUrl.username = '%filtered%';
105+
}
106+
107+
return newUrl.toString();
108+
}
109+
37110
/**
38111
* Parses string form of URL into an object
39112
* // borrowed from https://tools.ietf.org/html/rfc3986#appendix-B

packages/core/test/utils-hoist/url.test.ts

Lines changed: 88 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,12 @@
11
import { describe, expect, it } from 'vitest';
2-
import { getSanitizedUrlString, parseStringToURL, parseUrl, stripUrlQueryAndFragment } from '../../src/utils-hoist/url';
2+
import {
3+
getSanitizedUrlString,
4+
parseUrl,
5+
stripUrlQueryAndFragment,
6+
parseStringToURLObject,
7+
isURLObjectRelative,
8+
getSanitizedUrlStringFromUrlObject,
9+
} from '../../src/utils-hoist/url';
310

411
describe('stripQueryStringAndFragment', () => {
512
const urlString = 'http://dogs.are.great:1231/yay/';
@@ -62,8 +69,6 @@ describe('getSanitizedUrlString', () => {
6269
'https://[filtered]:[filtered]@somedomain.com',
6370
],
6471
['same-origin url', '/api/v4/users?id=123', '/api/v4/users'],
65-
['url without a protocol', 'example.com', 'example.com'],
66-
['url without a protocol with a path', 'example.com/sub/path?id=123', 'example.com/sub/path'],
6772
['url with port 8080', 'http://172.31.12.144:8080/test', 'http://172.31.12.144:8080/test'],
6873
['url with port 4433', 'http://172.31.12.144:4433/test', 'http://172.31.12.144:4433/test'],
6974
['url with port 443', 'http://172.31.12.144:443/test', 'http://172.31.12.144/test'],
@@ -197,19 +202,95 @@ describe('parseUrl', () => {
197202
});
198203
});
199204

200-
describe('parseStringToURL', () => {
205+
describe('parseStringToURLObject', () => {
201206
it('returns undefined for invalid URLs', () => {
202-
expect(parseStringToURL('invalid-url')).toBeUndefined();
207+
expect(parseStringToURLObject('invalid-url')).toBeUndefined();
203208
});
204209

205210
it('returns a URL object for valid URLs', () => {
206-
expect(parseStringToURL('https://somedomain.com')).toBeInstanceOf(URL);
211+
expect(parseStringToURLObject('https://somedomain.com')).toBeInstanceOf(URL);
212+
});
213+
214+
it('returns a URL object for valid URLs with a base URL', () => {
215+
expect(parseStringToURLObject('https://somedomain.com', 'https://base.com')).toBeInstanceOf(URL);
216+
});
217+
218+
it('returns a relative URL object for relative URLs', () => {
219+
expect(parseStringToURLObject('/path/to/happiness')).toEqual({
220+
isRelative: true,
221+
pathname: '/path/to/happiness',
222+
search: '',
223+
hash: '',
224+
});
207225
});
208226

209227
it('does not throw an error if URl.canParse is not defined', () => {
210228
const canParse = (URL as any).canParse;
211229
delete (URL as any).canParse;
212-
expect(parseStringToURL('https://somedomain.com')).toBeInstanceOf(URL);
230+
expect(parseStringToURLObject('https://somedomain.com')).toBeInstanceOf(URL);
213231
(URL as any).canParse = canParse;
214232
});
215233
});
234+
235+
describe('isURLObjectRelative', () => {
236+
it('returns true for relative URLs', () => {
237+
expect(isURLObjectRelative(parseStringToURLObject('/path/to/happiness')!)).toBe(true);
238+
});
239+
240+
it('returns false for absolute URLs', () => {
241+
expect(isURLObjectRelative(parseStringToURLObject('https://somedomain.com')!)).toBe(false);
242+
});
243+
});
244+
245+
describe('getSanitizedUrlStringFromUrlObject', () => {
246+
it.each([
247+
['regular url', 'https://somedomain.com', 'https://somedomain.com/'],
248+
['regular url with a path', 'https://somedomain.com/path/to/happiness', 'https://somedomain.com/path/to/happiness'],
249+
[
250+
'url with standard http port 80',
251+
'http://somedomain.com:80/path/to/happiness',
252+
'http://somedomain.com/path/to/happiness',
253+
],
254+
[
255+
'url with standard https port 443',
256+
'https://somedomain.com:443/path/to/happiness',
257+
'https://somedomain.com/path/to/happiness',
258+
],
259+
[
260+
'url with non-standard port',
261+
'https://somedomain.com:4200/path/to/happiness',
262+
'https://somedomain.com:4200/path/to/happiness',
263+
],
264+
[
265+
'url with query params',
266+
'https://somedomain.com:4200/path/to/happiness?auhtToken=abc123&param2=bar',
267+
'https://somedomain.com:4200/path/to/happiness',
268+
],
269+
[
270+
'url with a fragment',
271+
'https://somedomain.com/path/to/happiness#somewildfragment123',
272+
'https://somedomain.com/path/to/happiness',
273+
],
274+
[
275+
'url with a fragment and query params',
276+
'https://somedomain.com/path/to/happiness#somewildfragment123?auhtToken=abc123&param2=bar',
277+
'https://somedomain.com/path/to/happiness',
278+
],
279+
[
280+
'url with authorization',
281+
'https://username:password@somedomain.com',
282+
'https://%filtered%:%filtered%@somedomain.com/',
283+
],
284+
['same-origin url', '/api/v4/users?id=123', '/api/v4/users'],
285+
['url with port 8080', 'http://172.31.12.144:8080/test', 'http://172.31.12.144:8080/test'],
286+
['url with port 4433', 'http://172.31.12.144:4433/test', 'http://172.31.12.144:4433/test'],
287+
['url with port 443', 'http://172.31.12.144:443/test', 'http://172.31.12.144/test'],
288+
['url with IP and port 80', 'http://172.31.12.144:80/test', 'http://172.31.12.144/test'],
289+
])('returns a sanitized URL for a %s', (_, rawUrl: string, sanitizedURL: string) => {
290+
const urlObject = parseStringToURLObject(rawUrl);
291+
if (!urlObject) {
292+
throw new Error('Invalid URL');
293+
}
294+
expect(getSanitizedUrlStringFromUrlObject(urlObject)).toEqual(sanitizedURL);
295+
});
296+
});

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy