Abdeladim Fadheli · 8 min read · Updated May 2022 · Web Scraping
Google Trends is a website created by Google that analyzes the popularity of search queries on Google Search across almost every region, language, and category.
In this tutorial, you will learn how to extract almost everything available on the Google Trends website using Pytrends, an unofficial Python library.
Here is the table of contents:
- Getting Started
- Interest over Time
- Interest by Region
- Related Topics and Queries
- Trending Searches
- Conclusion
Getting Started
To get started, let's install the required dependencies:
$ pip install pytrends seaborn
We'll use Seaborn just for beautiful plots, nothing else:
from pytrends.request import TrendReq
import seaborn
# for styling
seaborn.set_style("darkgrid")
To begin with pytrends, you have to create a TrendReq object:
# initialize a new Google Trends Request Object
pt = TrendReq(hl="en-US", tz=360)
The hl parameter is the host language for accessing Google Trends, and tz is the timezone offset.
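The tz value is easy to get backwards. As the pytrends documentation describes it, the offset is expressed in minutes with the sign inverted, so US Central Standard Time (UTC-6) is 360 rather than -360. Here is a minimal sketch of the conversion, where trends_tz_offset is a hypothetical helper and not part of pytrends:

```python
from datetime import timedelta

def trends_tz_offset(utc_offset: timedelta) -> int:
    """Convert a UTC offset into the minute offset Google Trends expects.

    Assumption (taken from the pytrends docs): the sign is inverted,
    so UTC-6 becomes 360 and UTC+1 becomes -60.
    """
    return round(-utc_offset.total_seconds() / 60)

# US Central Standard Time is six hours behind UTC
cst = trends_tz_offset(timedelta(hours=-6))
print(cst)  # 360
```

The result can be passed straight to the constructor, as in TrendReq(hl="en-US", tz=cst).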
There are other parameters, such as retries, which sets the number of retries if a request fails, and proxies, which accepts a list of proxies to route the requests through.
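The retries parameter handles failed requests inside the library. To make the idea concrete, here is a hand-rolled sketch of retrying with exponential backoff. The call_with_retries helper and its parameters are hypothetical and only illustrate the pattern; this is not how pytrends implements it internally:

```python
import time

def call_with_retries(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            # give up once the retry budget is spent
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)

# Demo with a stand-in function that fails twice before succeeding
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

In practice, func would wrap a real request, for example lambda: pt.interest_over_time().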
Interest over Time
To get the relative number of searches for a list of keywords, we can use the interest_over_time() method after building the payload:
# set the keyword & timeframe
pt.build_payload(["Python", "Java"], timeframe="all")
# get the interest over time
iot = pt.interest_over_time()
iot
Output:
            Python  Java  isPartial
date
2004-01-01       8    92      False
2004-02-01       8   100      False
2004-03-01       7    96      False
2004-04-01       7    98      False
2004-05-01       8    85      False
...            ...   ...        ...
2021-10-01      14    11      False
2021-11-01      14    11      False
2021-12-01      13    11      False
2022-01-01      13    10      False
2022-02-01      15    11       True
218 rows × 3 columns
The values range from 0 (few or no searches) to 100 (maximum possible searches).
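To see what a relative scale means, the sketch below rescales made-up search counts so that the largest one becomes 100. The numbers and the scale_to_100 helper are hypothetical. Google does not publish raw counts, and its real normalization also accounts for total search volume, so this only illustrates the idea:

```python
def scale_to_100(counts):
    """Rescale counts so that the largest value maps to 100."""
    peak = max(counts)
    if peak == 0:
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]

# hypothetical monthly search counts for one keyword
raw = [50, 200, 100]
print(scale_to_100(raw))  # [25, 100, 50]
```

Because every keyword in one payload shares the same scale, a value of 100 marks the single highest point across all keywords and dates in the request, which is why Java reaches 100 in the output above while Python never does.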
The build_payload() method accepts several parameters besides the keyword list:
- cat: The category ID. If a search query can have more than one meaning, setting the category removes the ambiguity. You can check this page for a list of category IDs or simply call the categories() method of the TrendReq object to retrieve them.
- geo: The two-letter country abbreviation to get searches for a specific country, such as US, FR, ES, DZ, etc. You can also get data for provinces by specifying additional abbreviations such as 'GB-ENG' or 'US-AL'.
- timeframe: The time range of the data we want to extract. 'all' means all the data available on Google since the beginning. You can pass specific dates or the minus patterns: 'today 3-m' returns the latest three months of data, 'now 7-d' returns the latest seven days, and so on. Google accepts only certain values for these patterns. The default of this parameter is 'today 5-y', meaning the last five years.
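For a custom date range, the pytrends documentation describes the timeframe as two ISO dates separated by a space. Here is a small sketch of building that string, where custom_timeframe is a hypothetical helper:

```python
from datetime import date

def custom_timeframe(start: date, end: date) -> str:
    """Build a 'YYYY-MM-DD YYYY-MM-DD' timeframe string."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{start.isoformat()} {end.isoformat()}"

tf = custom_timeframe(date(2021, 1, 1), date(2021, 12, 31))
print(tf)  # 2021-01-01 2021-12-31
```

The string can then be combined with the other parameters, for example pt.build_payload(["Python"], cat=0, geo="US", timeframe=tf), where cat=0 means all categories.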
Let's plot the relative search difference between Python and Java over time:
# plot it
iot.plot(figsize=(10, 6))
Output:
[Figure: line plot of the relative search interest in Python and Java over time]
Alternatively, we can use the get_historical_interest() method, which grabs hourly data. However, that's not useful if you're seeking long-term trends. It's suitable for short periods:
# get hourly historical interest
data = pt.get_historical_interest(
    ["data science"],
    year_start=2022, month_start=1, day_start=1, hour_start=0,
    year_end=2022, month_end=2, day_end=10, hour_end=23,
)
data
We set the starting and ending date and time and retrieve the results. You can also pass cat and geo as mentioned earlier. Here is the output:
                     data science  isPartial
date
2022-01-01 00:00:00            28      False
2022-01-01 01:00:00            34      False
2022-01-01 02:00:00            42      False
2022-01-01 03:00:00            44      False
2022-01-01 04:00:00            52      False
...                           ...        ...
2022-02-10 19:00:00            69      False
2022-02-10 20:00:00            70      False
2022-02-10 21:00:00            69      False
2022-02-10 22:00:00            73      False
2022-02-10 23:00:00            68      False
989 rows × 2 columns
If there's something quickly emerging, this method will definitely be helpful. Note that this method can cause Google to block your IP, as it grabs a lot of data if you specify an extended timeframe, so keep that in mind.
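To judge how heavy a call will be before making it, you can estimate the number of requests. The sketch below assumes, as the pytrends documentation describes, that hourly data is fetched in one-week windows with one request each. The estimate_requests helper is hypothetical, and the real count may differ slightly:

```python
from datetime import datetime, timedelta
from math import ceil

def estimate_requests(start: datetime, end: datetime) -> int:
    """Estimate the requests needed, assuming one request per 7-day window."""
    if end <= start:
        raise ValueError("end must be after start")
    return ceil((end - start) / timedelta(days=7))

# the date range used in the hourly example above
n = estimate_requests(datetime(2022, 1, 1, 0), datetime(2022, 2, 10, 23))
print(n)  # 6
```

If the estimate is large, the pytrends documentation also mentions a sleep argument (in seconds) on get_historical_interest() to space the requests out.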
Interest by Region
Let's get the interest of a specific keyword by region:
# the keyword to extract data
kw = "python"
pt.build_payload([kw], timeframe="all")
# get the interest by country
ibr = pt.interest_by_region("COUNTRY", inc_low_vol=True, inc_geo_code=True)
We pass "COUNTRY" to the interest_by_region() method to get the interest by country. Other possible values are 'CITY' for city-level data, 'DMA' for metro-level data, and 'REGION' for region-level data.
We set inc_low_vol to True so that countries with low search volume are included, and inc_geo_code to True to include the geocode of each country.
Let's sort the countries by interest in Python:
# sort the countries by interest
ibr[kw].sort_values(ascending=False)
Output:
geoName
British Indian Ocean Territory    100
St. Helena                         38
China                              25
South Korea                        25
Singapore                          22
                                 ...
Pitcairn Islands                    0
Guinea-Bissau                       0
São Tomé & Príncipe                 0
British Virgin Islands              0
Svalbard & Jan Mayen                0
Name: python, Length: 250, dtype: int64
You can also plot the top 10 if you wish, using ibr[kw].sort_values(ascending=False)[:10].plot.bar().
Related Topics and Queries
Another cool feature is to extract related topics of your keyword:
# get related topics of the keyword
rt = pt.related_topics()
rt[kw]["top"]
The related_topics() method returns a Python dictionary keyed by keyword. Each value is itself a dictionary holding two dataframes, one for rising topics and one for overall top topics. Below is the output:
    value  formattedValue  hasData  link                                   topic_mid   topic_title        topic_type
0     100  100             True     /trends/explore?q=/m/05z1_&date=all    /m/05z1_    Python             Programming language
1       7  7               True     /trends/explore?q=/m/01dlmc&date=all   /m/01dlmc   List               Abstract data type
2       6  6               True     /trends/explore?q=/m/06x16&date=all    /m/06x16    String             Computer science
3       6  6               True     /trends/explore?q=/m/020s1&date=all    /m/020s1    Computer file      Topic
4       5  5               True     /trends/explore?q=/m/0cv6_m&date=all   /m/0cv6_m   Pythons            Snake
5       3  3               True     /trends/explore?q=/m/0nk18&date=all    /m/0nk18    Associative array  Topic
6       3  3               True     /trends/explore?q=/m/026sq&date=all    /m/026sq    Data               Topic
...
20      2  2               True     /trends/explore?q=/m/021plb&date=all   /m/021plb   NumPy              Software
21      2  2               True     /trends/explore?q=/m/016r48&date=all   /m/016r48   Object             Computer science
22      2  2               True     /trends/explore?q=/m/0fpzzp&date=all   /m/0fpzzp   Linux              Operating system
23      1  1               True     /trends/explore?q=/m/0b750&date=all    /m/0b750    Subroutine         Topic
24      1  1               True     /trends/explore?q=/m/02640pc&date=all  /m/02640pc  Import             Topic
Or related search queries:
# get related queries to previous keyword
rq = pt.related_queries()
rq[kw]["top"]
Output:
    query               value
0   python for            100
1   python list            97
2   python file            74
3   python string          73
4   monty python           44
5   install python         42
6   python if              41
7   python function        39
8   python download        34
9   python windows         33
10  python array           31
11  dictionary python      30
12  ball python            30
13  pandas                 29
14  pandas python          29
15  python tutorial        26
16  python script          24
17  python class           23
18  python import          23
19  numpy                  22
20  python set             22
21  python programming     21
22  python online          20
23  python time            19
24  python pdf             19
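Both related_topics() and related_queries() return the same nested shape: the keyword first, then "top" or "rising". An entry may be None when Google returns no data for it (an assumption based on how pytrends behaves), so it is worth checking before indexing. The sketch below uses plain lists as stand-ins for the dataframes, and available_tables is a hypothetical helper:

```python
def available_tables(result):
    """Return the (keyword, kind) pairs whose table is present."""
    found = []
    for keyword, tables in result.items():
        for kind in ("top", "rising"):
            if tables.get(kind) is not None:
                found.append((keyword, kind))
    return found

# stand-in for the dictionary returned by pt.related_queries()
sample = {"python": {"top": ["python for", "python list"], "rising": None}}
print(available_tables(sample))  # [('python', 'top')]
```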
Also, there is the suggestions(keyword) method that returns the suggested search queries:
# get suggested searches
pt.suggestions("python")
Output:
[{'mid': '/m/05z1_', 'title': 'Python', 'type': 'Programming language'},
 {'mid': '/m/05tb5', 'title': 'Python family', 'type': 'Snake'},
 {'mid': '/m/0cv6_m', 'title': 'Pythons', 'type': 'Snake'},
 {'mid': '/m/01ny0v', 'title': 'Ball python', 'type': 'Reptiles'},
 {'mid': '/m/02_2hl', 'title': 'Python', 'type': 'Film'}]
Here is another example:
# another example of suggested searches
pt.suggestions("America")
Output:
[{'mid': '/m/09c7w0', 'title': 'United States', 'type': 'Country in North America'},
 {'mid': '/m/01w6dw', 'title': 'American Express', 'type': 'Credit card service company'},
 {'mid': '/m/06n3y', 'title': 'South America', 'type': 'Continent'},
 {'mid': '/m/03lq2', 'title': 'Halloween', 'type': 'Celebration'},
 {'mid': '/m/01yx7f', 'title': 'Bank of America', 'type': 'Financial services company'}]
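Each suggestion carries a mid, the topic identifier that also appears in the topic_mid column of the related topics. According to the pytrends documentation, a mid can be passed in place of a keyword to search for the topic itself rather than the ambiguous word. Here is a sketch of picking one by type, with find_topic_id as a hypothetical helper and the data copied from the suggestions for "python" above:

```python
def find_topic_id(suggestions, topic_type):
    """Return the mid of the first suggestion with the given type, or None."""
    for item in suggestions:
        if item["type"] == topic_type:
            return item["mid"]
    return None

suggestions = [
    {"mid": "/m/05z1_", "title": "Python", "type": "Programming language"},
    {"mid": "/m/05tb5", "title": "Python family", "type": "Snake"},
    {"mid": "/m/02_2hl", "title": "Python", "type": "Film"},
]
topic_id = find_topic_id(suggestions, "Programming language")
print(topic_id)  # /m/05z1_
```

The result could then be used in a payload, as in pt.build_payload([topic_id], timeframe="all"), so that searches about snakes or films are left out.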
Trending Searches
One more feature of Google Trends is the ability to extract the current trending searches in each region:
# trending searches per region
ts = pt.trending_searches(pn="united_kingdom")
ts[:5]
Output:
0     Championship
1       Super Bowl
2 Sheffield United
3      Kodak Black
4  Atletico Madrid
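The pn argument takes the country name in lowercase with underscores, as in "united_kingdom". Assuming that convention holds for the other supported countries, a hypothetical to_pn helper can derive it from a display name:

```python
def to_pn(country_name: str) -> str:
    """Turn a display name such as 'United Kingdom' into 'united_kingdom'."""
    return "_".join(country_name.lower().split())

print(to_pn("United Kingdom"))  # united_kingdom
```

Not every country is necessarily supported, so a name that Google Trends does not recognize may still fail when passed to trending_searches().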
Another alternative is realtime_trending_searches():
# real-time trending searches
pt.realtime_trending_searches()
Output:
    title                                               entityNames
0   Jared Cannonier, Derek Brunson, Mixed martial ...   [Jared Cannonier, Derek Brunson, Mixed martial...
1   Christian Nodal, Belinda                            [Christian Nodal, Belinda]
2   Vladimir Putin, Russia                              [Vladimir Putin, Russia]
3   River Radamus, Slalom skiing, Giant slalom, Wi...   [River Radamus, Slalom skiing, Giant slalom, W...
4   California State University, Fullerton, Cal St...   [California State University, Fullerton, Cal S...
..  ...                                                 ...
81  Javier Bardem, Minority group, Desi Arnaz, Aar...   [Javier Bardem, Minority group, Desi Arnaz, Aa...
82  Marvel Cinematic Universe, Thanos, Avengers: E...   [Marvel Cinematic Universe, Thanos, Avengers: ...
83  Siena Saints, College basketball, Rider Broncs...   [Siena Saints, College basketball, Rider Bronc...
84  Chicago Blackhawks, St. Louis Blues, National ...   [Chicago Blackhawks, St. Louis Blues, National...
85  New York Islanders, Calgary Flames, National H...   [New York Islanders, Calgary Flames, National ...
86 rows × 2 columns
Conclusion
Alright, you now know how to conveniently extract Google Trends data in Python with the help of the pytrends library. You can check the Pytrends GitHub repository for more detailed information on the methods we've used in this tutorial.
You can get the complete code here.
Learn also: How to Extract Wikipedia Data in Python
Happy extracting ♥