Scraping a few pages with a couple of popular tools is a straightforward process, but scaling to millions of pages moves beyond writing good code into creating a robust distributed system that can ...