ulimit for broad crawls #5272
Conversation
Fixes issue scrapy#5259.

Information / documentation on Linux's ulimit:
https://www.ibm.com/docs/cs/aix/7.1?topic=u-ulimit-command
https://askubuntu.com/questions/162229/how-do-i-increase-the-open-files-limit-for-a-non-root-user
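For reference, the per-process open-file limit that `ulimit -n` reports can also be inspected and raised from Python via the standard-library `resource` module (Unix only). This is a minimal sketch, not part of the PR; the target value of 4096 is an arbitrary assumption, and an unprivileged process can only raise its soft limit up to the hard limit:

```python
import resource

# Current per-process limits on open file descriptors (Unix only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit toward a hypothetical broad-crawl target.
# Without extra privileges, the soft limit cannot exceed the hard limit.
desired = 4096  # assumption: an illustrative target, not a recommendation
new_soft = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
new_soft = max(new_soft, soft)  # never lower the limit we already have
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print("new soft limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

Raising the hard limit itself typically requires root or PAM configuration (e.g. `limits.conf`), which is why the documentation discussion below leans toward system-agnostic wording.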
Codecov Report
@@ Coverage Diff @@
## master #5272 +/- ##
=======================================
Coverage 88.52% 88.52%
=======================================
Files 163 163
Lines 10605 10610 +5
Branches 1557 1557
=======================================
+ Hits 9388 9393 +5
Misses 942 942
Partials 275 275
Not sure if this should make it clearer that this is the ulimit value for open file descriptors (as the ulimit command can work with several different limits), or if it's fine as is. Also maybe a short mention that
We should probably also consider other supported systems. I think this can also be an issue on Windows and macOS.
It might be a bit lengthy for a note block at that point; I'm not sure if that's fine or if I should just move it to a regular paragraph instead?
I think a regular paragraph might be better, yes. However, I am not sure we need to make the documentation longer, or at least not much longer. We could consider covering the issue and the solution without providing system-specific information, and let users find that out on their own, unless we can find upstream documentation we can link to that offers it. So we could explain that systems usually have a limit on the number of open files per program and system-wide, why/when/how that can become a problem for Scrapy, and that systems usually allow increasing (or in some cases removing) those limits.

The reason I'm hesitant to include system-specific information is mainly that providing accurate information for every system that supports Scrapy would be too verbose, in my opinion. We could instead provide links for the most common systems, but I'm not sure we will find good upstream links, especially for Linux, where it is technically possible for a system not to use PAM, which is what enforces
Yeah, I agree we don't want too many technical specifics here...
This should be worded more generally
While performing a broad crawl with a high :setting:`CONCURRENT_REQUESTS` value,
you may encounter OS-specific errors regarding the number of currently open files.
Some systems allow this limit to be increased or removed. If you cannot do so,
you will need to reduce the :setting:`CONCURRENT_REQUESTS` value.
Shouldn’t we move this to the documentation of the setting itself? This issue is not really specific to broad crawls; anyone touching that setting should be aware of this potential issue with higher values.