This is an important part of an ongoing internal project for sales lead prospecting. The Internet offers an abundant supply of information; the challenge is to transform tons of raw data into manageable, useful information for business development.

Data extraction from the Internet was originally handled by an off-the-shelf product, Atompark Email Hunter. We are now developing our own tool in Python. This is more cost-effective, but what interests us more is that we can tailor the tool to our needs in terms of prospect profile and industry relevance. Last but not least, we automate running the tool on a server, depending on network conditions.

We use Python 3 with well-known modules including re, BeautifulSoup and Requests.
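
To give a feel for how these three modules fit together, here is a minimal sketch, not our production tool. The function names fetch_html and extract_emails, the 10-second timeout, and the email pattern are all assumptions made for illustration; the pattern is deliberately simple and will miss some valid addresses.

```python
import re

import requests
from bs4 import BeautifulSoup

# Simplified email pattern (an assumption for this sketch, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def extract_emails(html):
    """Return the sorted, de-duplicated, lower-cased addresses found in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")

    # Addresses that appear in the visible text of the page.
    found = {match.lower() for match in EMAIL_RE.findall(soup.get_text(" "))}

    # Addresses that only appear inside mailto: links.
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if href.lower().startswith("mailto:"):
            address = href[len("mailto:"):].split("?")[0]  # drop ?subject=... parts
            if EMAIL_RE.fullmatch(address):
                found.add(address.lower())

    return sorted(found)
```

In use, something like extract_emails(fetch_html(url)) would be called for each page of interest. Keeping the download and the parsing in separate functions means the parsing step can be checked against a saved HTML string without touching the network.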

Source: Web scraping to extract contact information— Part 1: Mailing Lists