Collect emails of your LinkedIn network with Python & Selenium (2024)


I was searching for a way to “scrape” all the emails of my network and came across this Medium article: https://medium.com/@codedem/my-first-selenium-project-with-python-to-scrap-linkedin-connections-information-d187e0de1bc3 . Very interesting, but after LinkedIn’s redesign, that article was no longer relevant.

Selenium is, without a doubt, one of the best scraping tools I’ve seen since I started Python! It is designed for automated software testing, simulating a user’s actions on web pages.

First things first, the imports as usual:

import argparse, os, time
import random
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as bs
import re
import csv

I use argparse so that the email and password can be passed after the program name:

$ python myprogram.py [email] [password]

and used to log in to LinkedIn directly:

parser = argparse.ArgumentParser()
parser.add_argument("email", help="linkedin email")
parser.add_argument("password", help="linkedin password")
args = parser.parse_args()
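A quick way to check how those two positional arguments are mapped, without touching a browser, is to hand parse_args() a hypothetical command line instead of the real sys.argv (the email and password below are made up):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("email", help="linkedin email")
parser.add_argument("password", help="linkedin password")

# Parse a hypothetical command line instead of sys.argv, just to
# illustrate how the two positional arguments end up on args.
args = parser.parse_args(["john.doe@example.com", "s3cret"])
print(args.email)     # john.doe@example.com
print(args.password)  # s3cret
```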

Here we start using Selenium. I’m working on Ubuntu 18.04 LTS, and I installed ChromeDriver.

Follow the link below up to step 3; the rest will be done with Python.

So, after installing ChromeDriver, we create an instance, open a browser, and maximize the window:

browser = webdriver.Chrome()
browser.get("https://www.linkedin.com")
browser.maximize_window()

Go to the LinkedIn page, right-click on the page and choose “Inspect”. Right-click on the email field, and you’ll see the <input> tag with id="login-email".

email_element = browser.find_element_by_id("login-email")
email_element.send_keys(args.email)

send_keys(args.email) types the email grabbed via argparse into the field. We do the same with the password.

pass_element = browser.find_element_by_id("login-password")
pass_element.send_keys(args.password)
pass_element.submit()

And then we submit.

print("Success! Logged in, bot starting")
browser.implicitly_wait(3)

We display a message if the connection to LinkedIn went well. browser.implicitly_wait(3) tells Selenium to poll for up to 3 seconds when it looks up an element that is not yet present; this gives the page time to load so that the DOM (the Document Object Model) is complete.

As long as we are logged in, we can go anywhere on LinkedIn without re-entering our credentials, so let’s go directly to our network page.

browser.get("https://www.linkedin.com/mynetwork/invite-connect/connections/")

Afterwards, it gets complicated (a quick look at Stack Overflow was necessary ;-))…

Indeed, if we start collecting data now, we will only get 32 contacts. Why? Because LinkedIn — like a lot of websites — lazy-loads its content, and as you can guess, for a scraper that is one more hurdle to overcome.

We have to tell the program to scroll down for as long as it can. The DOM updates itself, and we can then recover all the necessary information. For that, we’ll need a while loop:

last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to the bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for the page to load
    time.sleep(random.uniform(2.5, 4.9))
    # Calculate the new scroll height and compare it with the previous one
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
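The stopping condition of that loop can be checked without a browser by faking the sequence of page heights: the pre-scripted list below stands in for execute_script("return document.body.scrollHeight") and simulates a page that stops growing after a few loads.

```python
# Simulated page heights: a stand-in for the values LinkedIn's page
# would return from document.body.scrollHeight after each scroll.
heights = iter([1000, 2400, 3100, 3100])  # the page stops growing at 3100

last_height = next(heights)
scrolls = 0
while True:
    # (in the real bot: scroll to the bottom, then sleep a random delay)
    new_height = next(heights)
    scrolls += 1
    if new_height == last_height:
        break  # the page cannot load any more content
    last_height = new_height

print(scrolls)  # 3: two scrolls that load content, one that confirms the end
```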

With execute_script("return document.body.scrollHeight") we retrieve the height of the page; we scroll down to that height, the page loads more content, a new height is generated, and so on, until the height stops growing.
We use time.sleep(random.uniform(2.5, 4.9)) to simulate human behavior between each scroll.

My approach was this: I didn’t want to jump from the “mynetwork” page to a contact’s profile page and back again every time the program found a contact.

The goal is to recover the emails, and they are not accessible directly from the network page; we have to visit each contact’s profile to scrape the email. So I first save, in a list, all the links that redirect to each connection’s profile.

page = bs(browser.page_source, features="html.parser")
content = page.find_all('a', {'class':"mn-connection-card__link ember-view"})

Here I used BeautifulSoup, but we could do without it; my knowledge of Selenium improved after writing this bot.
All the links are recorded in <a> tags with the class “mn-connection-card__link ember-view”.

mynetwork = []
for contact in content:
    mynetwork.append(contact.get('href'))
print(len(mynetwork), " connections")

content is a list of all the <a> tags with class="mn-connection-card__link ember-view". So, with a for loop, I go through it and append each connection’s profile link to my list.
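As I said, BeautifulSoup is optional here. As a rough sketch of the same extraction using only the standard library, the built-in html.parser module can pull the hrefs out of a hypothetical fragment of the connections page (the fragment below is made up for illustration):

```python
from html.parser import HTMLParser

class ConnectionLinkParser(HTMLParser):
    """Collect href attributes of <a> tags carrying the connection-card class."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "mn-connection-card__link" in attrs.get("class", ""):
            self.links.append(attrs.get("href"))

# Hypothetical fragment mimicking one connection card in the page source.
html = '''
<a class="mn-connection-card__link ember-view"
   href="/in/towards-data-science-online-publication-41b94a135/">Profile</a>
<a class="other-link" href="/feed/">Home</a>
'''

parser = ConnectionLinkParser()
parser.feed(html)
print(parser.links)  # only the connection-card link is kept
```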

# example of a link I scrape:
# "/in/towards-data-science-online-publication-41b94a135/"
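Note that the scraped links are relative. Simply concatenating them onto "https://www.linkedin.com" works, and urllib.parse.urljoin from the standard library is another way to build the absolute URL before appending the contact-info sub-page:

```python
from urllib.parse import urljoin

contact = "/in/towards-data-science-online-publication-41b94a135/"

# urljoin resolves the relative profile path against the site root;
# we then append the contact-info sub-page visited later in the bot.
profile_url = urljoin("https://www.linkedin.com", contact) + "detail/contact-info/"
print(profile_url)
```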

And finally, this last step will allow us to collect the emails:

# Connect to the profile of each contact and save the email in a list
my_network_emails = []
for contact in mynetwork:
    browser.get("https://www.linkedin.com" + contact + "detail/contact-info/")
    browser.implicitly_wait(3)
    contact_page = bs(browser.page_source, features="html.parser")
    content_contact_page = contact_page.find_all('a', href=re.compile("mailto"))
    for mail_link in content_contact_page:
        print("[+]", mail_link.get('href')[7:])
        my_network_emails.append(mail_link.get('href')[7:])
    # wait a few seconds before connecting to the next profile
    time.sleep(random.uniform(0.5, 1.9))

So I iterate through my network list, visit each profile, and (indirectly) open the link that gives me their contact info:

browser.get("https://www.linkedin.com" + contact + "detail/contact-info/")

# example of a complete link:
# https://www.linkedin.com/in/towards-data-science-online-publication-41b94a135/detail/contact-info/

I save the DOM in the variable contact_page, and I find all the <a> tags.

With a regular expression, I keep only the <a> tags whose href contains “mailto”:

content_contact_page = contact_page.find_all('a', href=re.compile("mailto"))
for mail_link in content_contact_page:
    print("[+]", mail_link.get('href')[7:])
    my_network_emails.append(mail_link.get('href')[7:])

I go through content_contact_page and retrieve the text starting at index 7, because the href looks like mailto:johndoe@gmail.com and “mailto:” is exactly 7 characters long; I don’t want the “mailto:”, only the email, the grail…
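That [7:] slice can be spelled out more explicitly with str.removeprefix (Python 3.9+); both give the same result on a hypothetical scraped href:

```python
href = "mailto:johndoe@gmail.com"  # hypothetical href scraped from a profile

# len("mailto:") == 7, hence the [7:] slice used in the bot
email_by_slice = href[7:]

# The same idea, spelled out (Python 3.9+)
email_by_prefix = href.removeprefix("mailto:")

print(email_by_slice)   # johndoe@gmail.com
print(email_by_prefix)  # johndoe@gmail.com
```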

I append each email to the list “my_network_emails”, repeat these instructions for the other profiles, and finally save all the emails to a CSV file with the script below:

with open('network_emails.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for email in my_network_emails:
        writer.writerow([email])
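A profile can expose the same address more than once, so you may want to deduplicate before writing. A small sketch, deduplicating while preserving first-seen order and writing into an in-memory io.StringIO buffer here (so it runs without touching the disk; the addresses are made up):

```python
import csv
import io

my_network_emails = ["a@example.com", "b@example.com", "a@example.com"]

# dict.fromkeys drops duplicates while keeping first-seen order
unique_emails = list(dict.fromkeys(my_network_emails))

# io.StringIO stands in for open('network_emails.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.writer(buffer)
for email in unique_emails:
    writer.writerow([email])

print(buffer.getvalue())
```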

I tried it with a LinkedIn account containing 115 connections, to make sure the scroll-down works well, and everything went smoothly (very smoothly ;-)).

Thanks to Dhiraj Kadam, who helped me reach my goal with his article.

Go find out more about Selenium; I promise you, this tool will open up new horizons.

Author: Melvina Ondricka