Collect emails of your LinkedIn network with Python & Selenium (2024)


I was searching for a way to “scrape” all the emails of my network and came across this Medium article: https://medium.com/@codedem/my-first-selenium-project-with-python-to-scrap-linkedin-connections-information-d187e0de1bc3 . Very interesting, but after LinkedIn’s redesign, that article was no longer relevant.

Selenium is, without a doubt, one of the best scraping tools I’ve seen since I started Python! It is designed for automated software testing, simulating a user’s actions on web pages.

First things first, the imports as usual:

import argparse, os, time
import random
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as bs
import re
import csv

I use argparse so that the email and password can be passed after the program name:

$ python myprogram.py [email] [password]

and used to log in to LinkedIn directly:

parser = argparse.ArgumentParser()
parser.add_argument("email", help="linkedin email")
parser.add_argument("password", help="linkedin password")
args = parser.parse_args()
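A quick way to check how those two positional arguments are mapped, without touching a browser, is to hand parse_args() a hypothetical command line instead of the real sys.argv (the email and password below are made up):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("email", help="linkedin email")
parser.add_argument("password", help="linkedin password")

# Parse a hypothetical command line instead of sys.argv, just to
# illustrate how the two positional arguments end up on args.
args = parser.parse_args(["john.doe@example.com", "s3cret"])
print(args.email)     # john.doe@example.com
print(args.password)  # s3cret
```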

Here we start using Selenium. I’m working on Ubuntu 18.04 LTS, and I installed ChromeDriver.

Follow the link below up to step 3; the rest will be done with Python.

So, after installing ChromeDriver, we create an instance, open a browser, and maximize the window:

browser = webdriver.Chrome()
browser.get("https://www.linkedin.com")
browser.maximize_window()

Go to the LinkedIn page, right-click on the page and choose “Inspect”. Right-click on the email field, and you’ll see the <input> tag with id="login-email".

email_element = browser.find_element_by_id("login-email")
email_element.send_keys(args.email)

send_keys(args.email) types the email grabbed via argparse into the field. We do the same with the password.

pass_element = browser.find_element_by_id("login-password")
pass_element.send_keys(args.password)
pass_element.submit()

And then we submit.

print("Success! Logged in, bot starting")
browser.implicitly_wait(3)

We display a message if the connection to LinkedIn went well. browser.implicitly_wait(3) tells Selenium to poll for up to 3 seconds when it looks up an element that is not yet present; this gives the page time to load so that the DOM (the Document Object Model) is complete.

As long as we are logged in, we can go anywhere on LinkedIn without re-entering our credentials, so let’s go directly to our network page.

browser.get("https://www.linkedin.com/mynetwork/invite-connect/connections/")

Afterwards, it gets complicated (a quick look at Stack Overflow was necessary ;-))…

Indeed, if we start collecting data now, we will only get 32 contacts. Why? Because LinkedIn — like a lot of websites — lazy-loads its content, and as you can guess, for a scraper that is one more hurdle to overcome.

We have to tell the program to scroll down for as long as it can. The DOM updates itself, and we can then recover all the necessary information. For that, we’ll need a while loop:

last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to the bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for the page to load
    time.sleep(random.uniform(2.5, 4.9))
    # Calculate the new scroll height and compare it with the previous one
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
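The stopping condition of that loop can be checked without a browser by faking the sequence of page heights: the pre-scripted list below stands in for execute_script("return document.body.scrollHeight") and simulates a page that stops growing after a few loads.

```python
# Simulated page heights: a stand-in for the values LinkedIn's page
# would return from document.body.scrollHeight after each scroll.
heights = iter([1000, 2400, 3100, 3100])  # the page stops growing at 3100

last_height = next(heights)
scrolls = 0
while True:
    # (in the real bot: scroll to the bottom, then sleep a random delay)
    new_height = next(heights)
    scrolls += 1
    if new_height == last_height:
        break  # the page cannot load any more content
    last_height = new_height

print(scrolls)  # 3: two scrolls that load content, one that confirms the end
```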

With execute_script("return document.body.scrollHeight") we retrieve the height of the page; we scroll down to that height, the page loads more content, a new height is generated, and so on, until the height stops growing.
We use time.sleep(random.uniform(2.5, 4.9)) to simulate human behavior between each scroll.

My approach was this: I didn’t want to jump from the “mynetwork” page to a contact’s profile page and back again every time the program found a contact.

The goal is to recover the emails, and they are not accessible directly from the network page; we have to visit each contact’s profile to scrape the email. So I first save, in a list, all the links that redirect to each connection’s profile.

page = bs(browser.page_source, features="html.parser")
content = page.find_all('a', {'class':"mn-connection-card__link ember-view"})

Here I used BeautifulSoup, but we could do without it; my knowledge of Selenium improved after writing this bot.
All the links are recorded in <a> tags with the class “mn-connection-card__link ember-view”.

mynetwork = []
for contact in content:
    mynetwork.append(contact.get('href'))
print(len(mynetwork), " connections")

content is a list of all the <a> tags with class="mn-connection-card__link ember-view". So, with a for loop, I go through it and append each connection’s profile link to my list.
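As I said, BeautifulSoup is optional here. As a rough sketch of the same extraction using only the standard library, the built-in html.parser module can pull the hrefs out of a hypothetical fragment of the connections page (the fragment below is made up for illustration):

```python
from html.parser import HTMLParser

class ConnectionLinkParser(HTMLParser):
    """Collect href attributes of <a> tags carrying the connection-card class."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "mn-connection-card__link" in attrs.get("class", ""):
            self.links.append(attrs.get("href"))

# Hypothetical fragment mimicking one connection card in the page source.
html = '''
<a class="mn-connection-card__link ember-view"
   href="/in/towards-data-science-online-publication-41b94a135/">Profile</a>
<a class="other-link" href="/feed/">Home</a>
'''

parser = ConnectionLinkParser()
parser.feed(html)
print(parser.links)  # only the connection-card link is kept
```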

# example of a link I scrape:
# "/in/towards-data-science-online-publication-41b94a135/"
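Note that the scraped links are relative. Simply concatenating them onto "https://www.linkedin.com" works, and urllib.parse.urljoin from the standard library is another way to build the absolute URL before appending the contact-info sub-page:

```python
from urllib.parse import urljoin

contact = "/in/towards-data-science-online-publication-41b94a135/"

# urljoin resolves the relative profile path against the site root;
# we then append the contact-info sub-page visited later in the bot.
profile_url = urljoin("https://www.linkedin.com", contact) + "detail/contact-info/"
print(profile_url)
```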

And finally, this last step will allow us to collect the emails:

# Connect to the profile of each contact and save the email in a list
my_network_emails = []
for contact in mynetwork:
    browser.get("https://www.linkedin.com" + contact + "detail/contact-info/")
    browser.implicitly_wait(3)
    contact_page = bs(browser.page_source, features="html.parser")
    content_contact_page = contact_page.find_all('a', href=re.compile("mailto"))
    for mail_link in content_contact_page:
        print("[+]", mail_link.get('href')[7:])
        my_network_emails.append(mail_link.get('href')[7:])
    # wait a few seconds before connecting to the next profile
    time.sleep(random.uniform(0.5, 1.9))

So I iterate through my network list, visit each profile, and (indirectly) open the link that gives me their contact info:

browser.get("https://www.linkedin.com" + contact + "detail/contact-info/")

# example of a complete link:
# https://www.linkedin.com/in/towards-data-science-online-publication-41b94a135/detail/contact-info/

I save the DOM in the variable contact_page, and I find all the <a> tags.

With a regular expression, I keep only the <a> tags whose href contains “mailto”:

content_contact_page = contact_page.find_all('a', href=re.compile("mailto"))
for mail_link in content_contact_page:
    print("[+]", mail_link.get('href')[7:])
    my_network_emails.append(mail_link.get('href')[7:])

I go through content_contact_page and retrieve the text starting at index 7, because the href looks like mailto:johndoe@gmail.com and “mailto:” is exactly 7 characters long; I don’t want the “mailto:”, only the email, the grail…
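That [7:] slice can be spelled out more explicitly with str.removeprefix (Python 3.9+); both give the same result on a hypothetical scraped href:

```python
href = "mailto:johndoe@gmail.com"  # hypothetical href scraped from a profile

# len("mailto:") == 7, hence the [7:] slice used in the bot
email_by_slice = href[7:]

# The same idea, spelled out (Python 3.9+)
email_by_prefix = href.removeprefix("mailto:")

print(email_by_slice)   # johndoe@gmail.com
print(email_by_prefix)  # johndoe@gmail.com
```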

I append each email to the list “my_network_emails”, repeat these instructions for the other profiles, and finally save all the emails to a CSV file with the script below:

with open('network_emails.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for email in my_network_emails:
        writer.writerow([email])
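A profile can expose the same address more than once, so you may want to deduplicate before writing. A small sketch, deduplicating while preserving first-seen order and writing into an in-memory io.StringIO buffer here (so it runs without touching the disk; the addresses are made up):

```python
import csv
import io

my_network_emails = ["a@example.com", "b@example.com", "a@example.com"]

# dict.fromkeys drops duplicates while keeping first-seen order
unique_emails = list(dict.fromkeys(my_network_emails))

# io.StringIO stands in for open('network_emails.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.writer(buffer)
for email in unique_emails:
    writer.writerow([email])

print(buffer.getvalue())
```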

I tried it with a LinkedIn account containing 115 connections, to make sure the scroll-down works well, and everything went smoothly (very smoothly ;-)).

Thanks to Dhiraj Kadam, who helped me reach my goal with his article.

Go find out more about Selenium; I promise you, this tool will open up new horizons.

Author: Melvina Ondricka