Wats0ns

joined 1 year ago
Wats0ns@programming.dev 7 points 1 year ago (2 children)

Isn't that on purpose tho? Like "hey, if we're not sure we can brake in time, just disengage so it's not our responsibility anymore"?

 
Wats0ns@programming.dev 2 points 1 year ago

Yep, try Scrapy. It also handles the concurrency of your pipeline items for you, configuration for every part,...
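For a rough idea of what that configuration looks like, here is a minimal `settings.py` sketch. The setting names are standard Scrapy settings, but the values are only illustrative, and the `myproject.pipelines` module path and the pipeline class names are hypothetical:

```python
# settings.py (sketch): values are illustrative, not recommendations.

# How many requests the downloader may have in flight at once.
CONCURRENT_REQUESTS = 16

# How many items (per response) may be processed in parallel
# by the item pipelines.
CONCURRENT_ITEMS = 100

# Seconds to wait between requests to the same site.
DOWNLOAD_DELAY = 0.5

# Which pipelines run, and in what order (lower number runs first).
# "myproject.pipelines" and these class names are hypothetical.
ITEM_PIPELINES = {
    "myproject.pipelines.FilterPipeline": 100,
    "myproject.pipelines.DedupPipeline": 200,
    "myproject.pipelines.DatabasePipeline": 300,
}
```

Most of the other parts (downloader, middlewares, retries, throttling) are tuned the same way, through settings rather than code.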

Wats0ns@programming.dev 2 points 1 year ago (2 children)

The huge feature of Scrapy is its pipeline system: you scrape a page, pass it to the filtering part, then to the deduplication part, then to the DB and so on

Hugely useful when you're scraping and extracting data. I reckon if you're only fetching raw pages then it's less useful, I guess
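To make the filter, then dedup, then DB chain concrete, here is a minimal sketch of three item pipelines. `process_item`, `open_spider`, `close_spider` and `DropItem` are Scrapy's item pipeline interface. The class names, the `url`/`title` item fields, the filtering rule and the `run_pipelines` helper are all made up for illustration. The helper only imitates how Scrapy hands an item from one pipeline to the next, so the sketch can be tried without a full crawl:

```python
import sqlite3

try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch still runs where Scrapy is not installed.
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class FilterPipeline:
    """Filtering step: drop items without a title (hypothetical rule)."""

    def process_item(self, item, spider):
        if not item.get("title"):
            raise DropItem("missing title")
        return item


class DedupPipeline:
    """Deduplication step: drop items whose URL was already seen."""

    def __init__(self):
        self.seen_urls = set()

    def process_item(self, item, spider):
        if item["url"] in self.seen_urls:
            raise DropItem(f"duplicate: {item['url']}")
        self.seen_urls.add(item["url"])
        return item


class DatabasePipeline:
    """Storage step: write surviving items to an in-memory SQLite DB."""

    def open_spider(self, spider):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT)")

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        self.conn.execute(
            "INSERT INTO pages VALUES (?, ?)", (item["url"], item["title"])
        )
        return item


def run_pipelines(items, pipelines, spider=None):
    """Hypothetical helper imitating Scrapy's chaining: each item goes
    through every pipeline in order, and DropItem discards it."""
    kept = []
    for item in items:
        try:
            for pipeline in pipelines:
                item = pipeline.process_item(item, spider)
        except DropItem:
            continue
        kept.append(item)
    return kept
```

In a real project you would not call the pipelines yourself: you list them in the `ITEM_PIPELINES` setting with a priority number each, and Scrapy runs them in that order for every item the spider yields.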