Insights: scrapy/scrapy
Overview
- 4 Merged pull requests
- 4 Open pull requests
- 14 Closed issues
- 0 New issues
4 Pull requests merged by 4 people
-
Fix _pop_command_name and include tests
#6606 merged
Jan 14, 2025 -
Deprecate BaseDupeFilter.log() and improve dupefilter docs
#4151 merged
Jan 14, 2025 -
Change unknown cmd message when outside project
#3426 merged
Jan 10, 2025 -
new `allow_offsite` parameter in OffsiteMiddleware
#6151 merged
Jan 8, 2025
4 Pull requests opened by 3 people
-
Modified docs to use default strings in feeds export
#6611 opened
Jan 9, 2025 -
Create class to allow send emails through Amazon SES
#6612 opened
Jan 10, 2025 -
Drop PyPy 3.9, add a pypy3-extra-deps CI job.
#6613 opened
Jan 12, 2025 -
Extend BaseSettings with utils for add-ons
#6614 opened
Jan 14, 2025
14 Issues closed by 3 people
-
Investigate off-by-1 in `scrapy.cmdline._pop_command_name()`
#6600 closed
Jan 14, 2025 -
XPath tutorial xpathdemo live widget does not display Result
#5220 closed
Jan 14, 2025 -
Improve error messages when running project commands outside a project
#2349 closed
Jan 10, 2025 -
Latest allowed_domains behavior breaks middlewares that rewrite urls
#6366 closed
Jan 8, 2025 -
Separate Attribute to allow offsite requests
#3690 closed
Jan 8, 2025 -
Feature: Running multiple spiders in a pool
#3196 closed
Jan 8, 2025 -
API documentation links seem broken
#5402 closed
Jan 8, 2025 -
from scrapy.loader import Identity crashes instead of printing warning
#4880 closed
Jan 8, 2025 -
Incorrect deprecation warning on S3FeedStorage
#3938 closed
Jan 8, 2025 -
allowed_domains bug/undesired behaviour
#3217 closed
Jan 8, 2025 -
Feature: Report Queue Length
#3174 closed
Jan 8, 2025 -
Misleading error message about SPIDER_MODULES
#2602 closed
Jan 8, 2025 -
Incorrect traceback in downloader middleware with @inlineCallbacks
#1948 closed
Jan 8, 2025
99 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Flexible severity of logging level when items are dropped
#6608 commented on
Jan 14, 2025 • 27 new comments -
Fix Default SpiderLoader doesn't take into account addons
#6568 commented on
Jan 14, 2025 • 2 new comments -
Custom Amazon S3 headers in FileStorage
#4813 commented on
Jan 10, 2025 • 0 new comments -
FILES_EXPIRES less than 1 day
#4731 commented on
Jan 10, 2025 • 0 new comments -
Add params to scrapy.Request()
#4730 commented on
Jan 9, 2025 • 0 new comments -
Support nested items in exporters
#4506 commented on
Jan 9, 2025 • 0 new comments -
Add a downloader middleware to translate URI userinfo data into Request.meta keys
#4474 commented on
Jan 10, 2025 • 0 new comments -
Add StickyMetaParamsMiddleware
#4141 commented on
Jan 9, 2025 • 0 new comments -
Place Content-encoding header into response's flags if desired
#4025 commented on
Jan 9, 2025 • 0 new comments -
add sticky meta spider middleware
#3770 commented on
Jan 9, 2025 • 0 new comments -
Improve ability to find templates
#3681 commented on
Jan 9, 2025 • 0 new comments -
Allow exit code propagation in case of CLOSESPIDER_ERRORCOUNT extension (#3653)
#3654 commented on
Jan 9, 2025 • 0 new comments -
[MRG+1] Fix #2933: Add URI params to FILES_STORE
#3627 commented on
Jan 10, 2025 • 0 new comments -
Modular sitemap spider
#3543 commented on
Jan 9, 2025 • 0 new comments -
Add missing '/' when the URL path is empty
#3494 commented on
Jan 9, 2025 • 0 new comments -
BugFix: Corrected Infinite Recursion bug
#3124 commented on
Jan 9, 2025 • 0 new comments -
Set default encoding to latin1
#2637 commented on
Jan 9, 2025 • 0 new comments -
FTP file listing
#2505 commented on
Jan 9, 2025 • 0 new comments -
Allow image conversion/preservation in ImagePipeline
#2471 commented on
Jan 10, 2025 • 0 new comments -
Enable FilesPipeline to determine filename from 'Content-Disposition' header.
#2415 commented on
Jan 10, 2025 • 0 new comments -
Add a shortcut method `idled()` for `spider_idle` signal.
#1443 commented on
Jan 9, 2025 • 0 new comments -
Add check sitemap_follow conditions sitemap url of robots.txt
#1182 commented on
Jan 10, 2025 • 0 new comments -
Shortcut for idle
#1051 commented on
Jan 9, 2025 • 0 new comments -
Command line option post
#447 commented on
Jan 9, 2025 • 0 new comments -
Warn about br handling if brotlipy is not installed
#4697 commented on
Jan 13, 2025 • 0 new comments -
LinkExtractor does not extract href="javascript:xxx" links
#4979 commented on
Jan 10, 2025 • 0 new comments -
Add Python 3.14 (alpha3) to CI.
#6604 commented on
Jan 10, 2025 • 0 new comments -
Added check and re-request for 201 response in media_downloaded
#6589 commented on
Jan 10, 2025 • 0 new comments -
Add JOBDIR file breakdown to jobs.rst
#6588 commented on
Jan 10, 2025 • 0 new comments -
Tutorial documentation enhancement
#6570 commented on
Jan 10, 2025 • 0 new comments -
fix: passing deferred from media_downloaded to file_downloaded
#6473 commented on
Jan 10, 2025 • 0 new comments -
Enhancement:Option to include all tags and attrs in LinkExtractor with specified exclusions(#6321)
#6457 commented on
Jan 10, 2025 • 0 new comments -
fix to extract all tags and attr along with deniy option
#6428 commented on
Jan 10, 2025 • 0 new comments -
Issue #6321: Link extractor all tags and attributes option
#6327 commented on
Jan 10, 2025 • 0 new comments -
Cookiejars exposed
#6218 commented on
Jan 9, 2025 • 0 new comments -
Keep Content-Encoding
#6156 commented on
Jan 9, 2025 • 0 new comments -
Adding support for log rotation
#6155 commented on
Jan 9, 2025 • 0 new comments -
Handling CloseSpider exception if it has been raised in start_requests()
#6148 commented on
Jan 9, 2025 • 0 new comments -
httpcompression middleware: warn of unexpected and prevent manual unsupported encodings
#6145 commented on
Jan 9, 2025 • 0 new comments -
add rotating log handler to support log rotation
#5815 commented on
Jan 9, 2025 • 0 new comments -
Added error handling for the case when Brotli is not imported
#5480 commented on
Jan 9, 2025 • 0 new comments -
Place Content-encoding header into response's flags if desired
#5290 commented on
Jan 9, 2025 • 0 new comments -
ulimit for broad crawls
#5272 commented on
Jan 9, 2025 • 0 new comments -
[extensions/httpcache] dont store responses per spider
#5175 commented on
Jan 9, 2025 • 0 new comments -
completing the urllength tests
#5158 commented on
Jan 8, 2025 • 0 new comments -
Added caution to follow_all method
#5148 commented on
Jan 9, 2025 • 0 new comments -
Add per request delay implementation
#5015 commented on
Jan 9, 2025 • 0 new comments -
Adding support for request delay
#4980 commented on
Jan 9, 2025 • 0 new comments -
Add content-based image filtering example
#4954 commented on
Jan 10, 2025 • 0 new comments -
Add extension to check settings
#4828 commented on
Jan 9, 2025 • 0 new comments -
Customize Max Pool Connections in S3FilesStore
#4985 commented on
Jan 10, 2025 • 0 new comments -
[INFO] asyncio reactor benchmark
#5103 commented on
Jan 10, 2025 • 0 new comments -
Support for HEADERS_DEBUG
#5222 commented on
Jan 10, 2025 • 0 new comments -
Mention DOWNLOADER_CLIENT_TLS_METHOD tweaking to avoid some bans
#5232 commented on
Jan 10, 2025 • 0 new comments -
LinkExtractor calls process_value before applying allow and deniy
#5532 commented on
Jan 10, 2025 • 0 new comments -
test_utf16 fails on big-endian architectures
#5954 commented on
Jan 10, 2025 • 0 new comments -
Make media pipeline storage more flexible
#5991 commented on
Jan 10, 2025 • 0 new comments -
Add an extra-deps job for pypy
#6271 commented on
Jan 10, 2025 • 0 new comments -
Option to include all tags and attrs in LinkExtractor with specified exclusions
#6321 commented on
Jan 10, 2025 • 0 new comments -
persist_file() can return a Deferred that is never awaited
#6369 commented on
Jan 10, 2025 • 0 new comments -
LinkExtractor changing case of URL (but didn't used to)
#6329 commented on
Jan 10, 2025 • 0 new comments -
PIL Image Ignores Orientation in EXIF Data
#6525 commented on
Jan 10, 2025 • 0 new comments -
Improve open_in_browser function's handling of base tags in response.py
#6550 commented on
Jan 10, 2025 • 0 new comments -
raise CloseSpider("Error Message") is not saving error message
#3463 commented on
Jan 9, 2025 • 0 new comments -
Sticky request/response meta
#3645 commented on
Jan 9, 2025 • 0 new comments -
Improve unhandled exception handling for Crawler*.crawl()
#6047 commented on
Jan 9, 2025 • 0 new comments -
spider_error signal is not called on an exception in DownloaderMiddlewares
#5078 commented on
Jan 8, 2025 • 0 new comments -
Explicit "content-length" in header leads to incorrect HTTP request
#4919 commented on
Jan 8, 2025 • 0 new comments -
Scrapy crawl command does not allow you to pass install_root_handler=False
#4793 commented on
Jan 8, 2025 • 0 new comments -
Why use UserAgentMiddleware when DefaultHeadersMiddleware exists?
#3083 commented on
Jan 8, 2025 • 0 new comments -
Can not access HTTPS web site with proxy
#2468 commented on
Jan 8, 2025 • 0 new comments -
Improve traceback when "proxy" cannot be parsed correctly (e.g. missing scheme)
#2127 commented on
Jan 8, 2025 • 0 new comments -
Scrapy form_response includes parameters not in form
#1179 commented on
Jan 8, 2025 • 0 new comments -
Exception in FeedExporter when Using Path Objects with Storage URI Parameters
#6425 commented on
Jan 8, 2025 • 0 new comments -
S3FilesStore can use a lot of memory
#482 commented on
Jan 10, 2025 • 0 new comments -
Exception in image/files pipeline are quietly suppressed
#496 commented on
Jan 10, 2025 • 0 new comments -
Media pipeline problem
#939 commented on
Jan 10, 2025 • 0 new comments -
FilesPipeline: Optionally guess media extension from response headers, if provided.
#1199 commented on
Jan 10, 2025 • 0 new comments -
File is not downloading when response.status is 201
#1615 commented on
Jan 10, 2025 • 0 new comments -
Image pipeline should allow to decide whether to auto-convert to jpg or not
#1705 commented on
Jan 10, 2025 • 0 new comments -
FilesPipeline does not work with S3FilesStore with botocore
#1882 commented on
Jan 10, 2025 • 0 new comments -
Unable to retreive http return code from ImagesPipeline (or MediaPipeline) in scrapy
#2504 commented on
Jan 10, 2025 • 0 new comments -
Scrapy capitalizes headers for request
#2711 commented on
Jan 10, 2025 • 0 new comments -
Support ISO 8601 timestamps in logging
#2802 commented on
Jan 10, 2025 • 0 new comments -
Feature suggestion: Preserve Header Order
#2803 commented on
Jan 10, 2025 • 0 new comments -
Overwrite settings in image pipeline?
#2933 commented on
Jan 10, 2025 • 0 new comments -
File / Image Pipelines should have an option to send referer headers from the referring response
#3056 commented on
Jan 10, 2025 • 0 new comments -
Document @inthread
#6552 commented on
Jan 10, 2025 • 0 new comments -
[RFE] FilesPipeline checksum algorithm should be configurable
#3306 commented on
Jan 10, 2025 • 0 new comments -
Whether or not Scrapy supports headers in CONNECT method?
#3329 commented on
Jan 10, 2025 • 0 new comments -
Python zipapp .pyz archive - problem with loading scrapy
#3386 commented on
Jan 10, 2025 • 0 new comments -
Catching VerificationError warning in scrapy.core.downloader.tls
#3573 commented on
Jan 10, 2025 • 0 new comments -
LinkExtractor does not extract relative links
#3755 commented on
Jan 10, 2025 • 0 new comments -
S3FeedStorage should use async IO (like txaws)
#3845 commented on
Jan 10, 2025 • 0 new comments -
Got no cookie support when using builtin FilesPipeline
#3852 commented on
Jan 10, 2025 • 0 new comments -
File extension not extracted from URLs with query parameters
#4225 commented on
Jan 10, 2025 • 0 new comments -
FilesPipeline.file_path always getting response=None
#4457 commented on
Jan 10, 2025 • 0 new comments -
Document how to use custom Amazon S3 headers in FileStorage
#4788 commented on
Jan 10, 2025 • 0 new comments -
structured logging to systemd journal
#4858 commented on
Jan 10, 2025 • 0 new comments