This content originally appeared on DEV Community and was authored by CoderHXL
x-crawl
x-crawl is a flexible, multipurpose crawler library for Node.js. Its flexible usage and wide range of features help you crawl pages, APIs and files quickly, safely and stably.
If you like x-crawl, consider giving the x-crawl repository a star to support it. Thank you for your support!
GitHub: https://github.com/coder-hxl/x-crawl
🚨 Breaking Changes
- Fingerprint upgrade:
  - In the advanced configuration, fingerprint is renamed to fingerprints. It is now an array of DetailTargetFingerprintCommon objects, which makes customization easier; internally, these objects are randomly assigned to the targets.
  - crawlPage fingerprint options: in the advanced configuration and the detailed target configuration, the maximum width and maximum height of the fingerprint settings are now optional.
- Proxy upgrade: when creating a crawler instance, the proxy option in the advanced configuration and the detailed target configuration is now an object with three properties: urls, switchByHttpStatus and switchByErrorCount. urls accepts multiple proxy URLs, and the first one is used by default. switchByHttpStatus sets which non-compliant response status codes trigger a proxy switch, and switchByErrorCount sets how many errors, such as timeouts, must occur before the proxy is switched. Proxy rotation only works together with error retries.
- Return value type adjustment: CrawlCommonRes, CrawlPageSingleRes, CrawlDataSingleRes and CrawlFileSingleRes are renamed to CrawlCommonResult, CrawlPageSingleResult, CrawlDataSingleResult and CrawlFileSingleResult, respectively.
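The random assignment of fingerprints to targets can be pictured with a small TypeScript sketch. FingerprintSketch and assignFingerprints are hypothetical names invented for illustration; x-crawl's real DetailTargetFingerprintCommon type and its internal selection logic may differ. The random source is injectable so the choice can be made deterministic.

```typescript
// Hypothetical fingerprint shape for illustration only (not x-crawl's real type).
interface FingerprintSketch {
  userAgent?: string;
  maxWidth?: number; // optional, mirroring the note that max width/height became optional
  maxHeight?: number;
}

// Give each target one fingerprint picked uniformly at random from the pool.
function assignFingerprints(
  targets: string[],
  fingerprints: FingerprintSketch[],
  random: () => number = Math.random, // injectable for deterministic tests
): { target: string; fingerprint: FingerprintSketch }[] {
  if (fingerprints.length === 0) {
    throw new Error("fingerprints must not be empty");
  }
  return targets.map((target) => {
    // random() is in [0, 1), so the index is always within bounds.
    const index = Math.floor(random() * fingerprints.length);
    return { target, fingerprint: fingerprints[index] };
  });
}
```

Because every target draws independently, two targets may receive the same fingerprint; that is a property of this sketch, not a documented guarantee of x-crawl.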
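The proxy rotation rules can be made concrete with a minimal TypeScript sketch that uses the three property names from the release note. It is not x-crawl's implementation: ProxyRotator, onResponse and onError are hypothetical names, and treating a missing switchByErrorCount as 1 is an assumption. Switching proxies only helps if the failed request is actually retried, which is why rotation depends on error retries.

```typescript
// Hypothetical config mirroring the property names in the release note.
interface ProxyConfigSketch {
  urls: string[];
  switchByHttpStatus?: number[];
  switchByErrorCount?: number; // assumption: defaults to 1 when omitted
}

class ProxyRotator {
  private readonly config: ProxyConfigSketch;
  private index = 0; // the first URL is used by default
  private errorCount = 0; // errors counted against the current proxy

  constructor(config: ProxyConfigSketch) {
    if (config.urls.length === 0) {
      throw new Error("at least one proxy URL is required");
    }
    this.config = config;
  }

  get current(): string {
    return this.config.urls[this.index];
  }

  // After a response: switch if its status is listed in switchByHttpStatus.
  onResponse(status: number): void {
    if (this.config.switchByHttpStatus?.includes(status)) {
      this.next();
    }
  }

  // After an error such as a timeout: switch once the error threshold is reached.
  onError(): void {
    this.errorCount += 1;
    const limit = this.config.switchByErrorCount ?? 1;
    if (this.errorCount >= limit) {
      this.next();
    }
  }

  // Move to the next URL (wrapping around) and reset the error count.
  private next(): void {
    this.index = (this.index + 1) % this.config.urls.length;
    this.errorCount = 0;
  }
}
```

A retry loop would read rotator.current before each attempt and call onResponse or onError afterwards, so the next retry goes through the switched proxy.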
🚀 Features
- Setting an option to null now cancels the value inherited from the upper-level unified configuration.
- The userAgent option in DetailTargetFingerprintCommon now also accepts an object form, which lets you customize the minimum and maximum values of the major version, minor version and revision number. Each crawl target gets a newly generated userAgent.
- Crawl results now include a proxyDetails property that records the proxy status.
- The mobile option of the fingerprint configuration accepts a new value, 'random', which lets it be randomized internally.
- Terminal prompts are simplified and their colors adjusted.
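One way to read the null rule is as a three-way merge between the unified (upper-level) options and a target's own options. The sketch below assumes that undefined means "inherit", null means "cancel" and any other value overrides; resolveOptions is a hypothetical helper, not part of x-crawl.

```typescript
// Hypothetical option bag for illustration only.
type Options = Record<string, unknown>;

// Merge unified (upper-level) options with target-level options:
//   undefined -> inherit the unified value
//   null      -> cancel the unified value (key is dropped)
//   otherwise -> override the unified value
function resolveOptions(unified: Options, target: Options): Options {
  const result: Options = { ...unified };
  for (const [key, value] of Object.entries(target)) {
    if (value === null) {
      delete result[key];
    } else if (value !== undefined) {
      result[key] = value;
    }
  }
  return result;
}
```

For example, a target that sets proxy to null would be crawled without the proxy configured at the crawler level, while its other inherited options stay in place.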
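The object form of userAgent can be thought of as a template plus an inclusive range for each version component, with a fresh string generated per target. In this sketch the {version} placeholder, the VersionRange and UserAgentTemplate shapes, and generateUserAgent are all assumptions made for illustration; the real option names are in the x-crawl documentation.

```typescript
// Hypothetical shapes for illustration; not x-crawl's real option names.
interface VersionRange {
  min: number;
  max: number;
}

interface UserAgentTemplate {
  value: string; // assumed to contain a "{version}" placeholder
  major: VersionRange;
  minor: VersionRange;
  patch: VersionRange; // the "revision number"
}

// Random integer in the inclusive range [min, max].
function randomInt({ min, max }: VersionRange, random: () => number = Math.random): number {
  return min + Math.floor(random() * (max - min + 1));
}

// Build a fresh userAgent string, e.g. once per crawl target.
function generateUserAgent(
  template: UserAgentTemplate,
  random: () => number = Math.random,
): string {
  const version = [template.major, template.minor, template.patch]
    .map((range) => randomInt(range, random))
    .join(".");
  return template.value.replace("{version}", version);
}
```

Calling generateUserAgent once per target gives each target its own version string within the configured bounds.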
🐞 Bug fixes
- Fixed: multiple levels of non-existent folders could not be created on Linux systems.
CoderHXL | Sciencx (2023-04-28T03:01:13+00:00) The new version of x-crawl v7 has been released!. Retrieved from https://www.scien.cx/2023/04/28/the-new-version-of-x-crawl-v7-has-been-released/