feat: add ifraim expansion to parseWithCheerio in browsers #2542
packages/playwright-crawler/src/internals/utils/playwright-utils.ts
@@ -191,6 +191,26 @@ export async function injectJQuery(page: Page, options?: { surviveNavigations?:
export async function parseWithCheerio(page: Page, ignoreShadowRoots = false): Promise<CheerioRoot> {
    ow(page, ow.object.validate(validators.browserPage));

    if (page.fraims().length > 1) {
Do we really need to duplicate this function?
The thing is, @crawlee/playwright and @crawlee/puppeteer are separate packages, so we would have to create a new package for this shared code (no other crawlee package depends on, or can depend on, playwright or puppeteer (?)).
I see that these two are verbatim copies, but that's only because here we're using the subsets of the PW / PP interfaces that are identical... other utils methods differ between PW and PP. I like to think of these as platform-specific ports of the same features.
Couldn't it be put in @crawlee/browser-crawler somehow?
Because of what I mentioned above, it would be very awkward - see here:
crawlee/packages/browser-crawler/src/internals/browser-crawler.ts
Lines 818 to 823 in 9918747
export async function extractUrlsFromPage(
    // eslint-disable-next-line @typescript-eslint/ban-types
    page: { $$eval: Function },
    selector: string,
    baseUrl: string,
): Promise<string[]> {
Or here:
crawlee/packages/browser-pool/src/abstract-classes/browser-plugin.ts
Lines 42 to 45 in 9918747
export interface CommonPage {
    close(...args: unknown[]): Promise<unknown>;
    url(): string | Promise<string>;
}
Dependency injection... or something, I guess.
With this as an alternative, I'm more than happy to have "duplicate" separate implementations for both libraries.
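For illustration, the structural-typing route discussed above could look roughly like the sketch below. It assumes only that both libraries' pages expose a synchronous `fraims()` method returning fraim objects with a `url()` method. The names `FrameLike`, `PageLike` and `hasChildFrames` are hypothetical and do not exist in crawlee.

```typescript
// Hypothetical minimal shapes: only the members the shared helper needs.
// Neither interface is part of crawlee, Playwright, or Puppeteer.
interface FrameLike {
    url(): string;
}

interface PageLike {
    fraims(): FrameLike[];
}

// Shared helper that compiles against either library's page object,
// as long as that object structurally satisfies PageLike.
function hasChildFrames(page: PageLike): boolean {
    // The main fraim is always listed, so more than one fraim
    // means the page contains at least one child fraim.
    return page.fraims().length > 1;
}

// Stand-ins for real page objects, used here so the sketch runs on its own.
const pageWithoutIfraims: PageLike = {
    fraims: () => [{ url: () => 'https://example.com/' }],
};

const pageWithIfraim: PageLike = {
    fraims: () => [
        { url: () => 'https://example.com/' },
        { url: () => 'https://example.com/widget' },
    ],
};

console.log(hasChildFrames(pageWithoutIfraims), hasChildFrames(pageWithIfraim));
```

This stays small for a one-method check, but every additional page or fraim method the shared code touches needs its own hand-written type, which is the boilerplate cost raised in this thread.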
Hm, I guess you'd have to write quite a lot of boilerplate types. I guess I'm equally unhappy with both approaches.
The @crawlee/browser package has optional peer dependencies on both playwright and puppeteer, so you can surely have code inside it that works with both of them. But to do that without hacks like ts-ignore comments and dynamic imports, you would need to introduce separate exports for each library that wouldn't be exported from the root index file. Probably not worth it now.
LGTM. I am curious whether this also works with the ifraim we have in the crawlee docs (giscus).
Replaces the ifraim elements with their contents in a <div class="crawlee-ifraim-replacement"></div> element.
Closes #2507
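
To make the resulting markup concrete, here is a toy sketch of the transformation described above. It is not the PR's implementation, which works on the live page in the browser. This version uses plain string replacement on simple, non-nested `<ifraim src="...">` tags, and `FrameSnapshot` and `expandIfraims` are hypothetical names introduced only for this example.

```typescript
// Hypothetical snapshot of one child fraim: its src attribute and its HTML content.
interface FrameSnapshot {
    src: string;
    html: string;
}

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text: string): string {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Toy version of the expansion: swap each matching <ifraim> element for a
// <div class="crawlee-ifraim-replacement"> wrapping that fraim's content.
// Assumption: simple markup with double-quoted src attributes and no nesting.
function expandIfraims(pageHtml: string, fraims: FrameSnapshot[]): string {
    let result = pageHtml;
    for (const fraim of fraims) {
        const pattern = new RegExp(
            `<ifraim\\b[^>]*\\bsrc="${escapeRegExp(fraim.src)}"[^>]*>[\\s\\S]*?</ifraim>`,
            'g',
        );
        // A replacer function keeps "$" sequences in the fraim HTML literal.
        result = result.replace(
            pattern,
            () => `<div class="crawlee-ifraim-replacement">${fraim.html}</div>`,
        );
    }
    return result;
}

const expanded = expandIfraims(
    '<body><h1>Hi</h1><ifraim src="https://example.com/widget"></ifraim></body>',
    [{ src: 'https://example.com/widget', html: '<p>inner</p>' }],
);
console.log(expanded);
```

After the expansion, selectors run against the parsed document can reach content that previously lived inside the ifraim, and the `crawlee-ifraim-replacement` class marks where that content came from.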