How To Get Unlimited Tweets For Your Project Without Using Tweepy
One of the biggest challenges of mining real-time behavioural data is knowing how to get unlimited tweets for your project without using Tweepy. Apps like Twitter, Facebook and Instagram are a great source of data.
These datasets can be used for different projects. Take a Twitter dataset, for example. Some of the projects you can build with Twitter data are:
Comparing the activity around two products with the same functionality from different companies
Examples are BigQuery, Amazon Redshift, Snowflake and Azure Synapse Analytics
Performing sentiment analysis of a product, company or personality
Creating a trend of the most talked-about universities and what the discussion is about
Tweepy API
There are many libraries for scraping Twitter data. The most popular of them is Tweepy. It requires that you create a developer account, which automatically gives you access to tweets from up to the last 7 days. If you want more, you can apply for elevated access or academic access.
In this article, I will talk about how you can use SNScrape to scrape as much data as you want without creating any account or applying for more access to the Tweepy API.
SNScrape
SNScrape is a Python scraper for social networking services. Yes, you read that right: it scrapes data from different social networking web apps, including Facebook, Instagram, Twitter, Weibo, Telegram and others.
We are interested in scraping Twitter data. SNScrape can access the following information from Twitter: users, user profiles, hashtags, searches, tweets, list posts and trends.
Why You Should Use SNScrape
The first reason you should use SNScrape is that there is no limit to the number of tweets you can retrieve.
Secondly, you do not need to pass any authentication before you can access data.
Thirdly, you can easily use it to collect data about the same company across different social networks.
Do You Still Need Tweepy Since You Got SNScrape?
The simple answer is yes, but it depends on how detailed the information you want to access is. Tweepy has many functionalities that SNScrape doesn’t have. If you just want to scrape the data points accessible via SNScrape, then SNScrape is enough.
How To Use SNScrape
First we need to install the library. I use a Windows laptop. To install on Windows:
Code Snippet
pip install snscrape
import snscrape.modules.twitter as snt
import pandas as pd

query = "UK Economy"
tweets = []
limit = 5000

for tweet in snt.TwitterSearchScraper(query).get_items():
    # print(vars(tweet))
    # break
    if len(tweets) == limit:
        break
    else:
        tweets.append([tweet.date, tweet.user.username, tweet.likeCount, tweet.sourceLabel, tweet.content])

tweet_data = pd.DataFrame(tweets, columns=["Date", "User", "Number of likes", "Tweeted from", "Tweets"])
print(tweet_data.head())
tweet_data.to_csv('uk_economy.csv', index = None)
Code Explanation
import snscrape.modules.twitter as snt: this imports the twitter module from the snscrape library. I named it snt because the full name is long. Feel free to name it whatever makes sense to you.
import pandas as pd: this imports pandas under the name pd. I essentially use it to convert the dataset to a DataFrame and then save it as comma-separated values (CSV).
Create a variable named “query” and choose any query you wish; in my case I chose “UK Economy”. Then create an empty list named “tweets”, which will hold all the tweets. Finally, create a variable named “limit” to hold the maximum number of tweets to collect.
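The query does not have to be a bare phrase. As a minimal sketch, assuming the scraper hands the query string to Twitter search unchanged, the helper below combines a phrase with Twitter's search operators from:, lang:, since: and until:. The name build_query is hypothetical and not part of SNScrape; it only builds a string.

```python
from typing import Optional


def build_query(phrase: str,
                since: Optional[str] = None,
                until: Optional[str] = None,
                lang: Optional[str] = None,
                from_user: Optional[str] = None) -> str:
    """Combine a search phrase with optional Twitter search operators.

    Hypothetical helper for illustration only. Dates are expected
    as YYYY-MM-DD strings and are not validated here.
    """
    parts = [phrase]
    if from_user:
        parts.append(f"from:{from_user}")
    if lang:
        parts.append(f"lang:{lang}")
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    return " ".join(parts)


query = build_query("UK Economy", since="2022-01-01", until="2022-12-31", lang="en")
print(query)
```

The resulting string can be used in place of the plain “UK Economy” query.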
Next, I ran a for loop over snt.TwitterSearchScraper(query).get_items(), which yields a tweet object for each result matching the query. The loop stops once the list reaches the limit, which is set here to 5000.
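The same cap can also be written with itertools.islice from the standard library, which stops pulling from a generator after a fixed number of items. The sketch below uses a stand-in generator (fake_tweets, a hypothetical name) in place of snt.TwitterSearchScraper(query).get_items() so that it runs without network access; the attribute names mirror the ones used in the code above.

```python
from itertools import islice
from types import SimpleNamespace


def fake_tweets():
    """Stand-in for the scraper's get_items(): yields tweet-like objects forever."""
    n = 0
    while True:
        n += 1
        yield SimpleNamespace(
            date="2022-01-01",
            user=SimpleNamespace(username=f"user{n}"),
            likeCount=n,
            content=f"tweet number {n}",
        )


def collect(items, limit):
    """Take at most `limit` items and flatten each one into a row."""
    return [
        [t.date, t.user.username, t.likeCount, t.content]
        for t in islice(items, limit)
    ]


rows = collect(fake_tweets(), 5)
print(len(rows))
print(rows[0])
```

With the real scraper, the first argument to collect would be the generator returned by get_items().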
The tweet object has many other attributes to scrape; I chose the ones that interest me. I used vars() inside print() to see the different attributes available on it. You can find everything you need in the SNScrape documentation.
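To make the vars() step concrete, here is a small sketch on a stand-in object. vars() returns an object's attributes as a dictionary, so its keys are the attribute names. The tweet below is built with SimpleNamespace purely for illustration and carries only a few made-up fields; a real scraped tweet exposes many more.

```python
from types import SimpleNamespace

# Stand-in for one scraped tweet (assumption: real objects have many more fields).
tweet = SimpleNamespace(
    date="2022-10-01",
    content="The UK economy grew last quarter",
    likeCount=12,
    sourceLabel="Twitter Web App",
)

# vars() maps attribute names to values; sorting keeps the listing stable.
available = sorted(vars(tweet))
print(available)
```

Running the same two lines on the first tweet from the scraper, then breaking out of the loop, is a quick way to decide which columns to keep.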
The Output Of The Scraper
Now when you run your code, depending on the attributes you want to get and your query parameter, you will get your desired result. Here are the first four rows of my result.
Conclusion
You have learnt how to use SNScrape to scrape data from Twitter and seen the different attributes that come with the tweet objects. In short, you have seen how to get unlimited tweets for your project without using Tweepy.
What Can I Do Next With The Dataset?
Earlier in this article I enumerated different uses of social media data, especially Twitter datasets. When building a project, I always advocate that you start from a problem statement and work towards the solution.
Before you use the dataset, you should take the time to clean up your data and extract the data points you want. I like using regular expressions and the power of NumPy to get the right data. You can read how to easily transform data using regex and pandas to learn more.
Let me know in the comments section what you think of this library.
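As one possible starting point for that clean-up, the sketch below uses only the standard re module to strip URLs and mentions from a tweet's text and to pull out hashtags. clean_tweet and hashtags are hypothetical helpers, and the patterns are deliberately simple assumptions rather than a complete tweet parser.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs and @mentions, then collapse runs of whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return SPACE_RE.sub(" ", text).strip()


def hashtags(text: str) -> list:
    """Return the hashtags in a tweet, without the leading '#'."""
    return re.findall(r"#(\w+)", text)


sample = "@BankOfEngland says the #UK #economy is slowing https://example.com/report"
print(clean_tweet(sample))
print(hashtags(sample))
```

With the DataFrame from earlier, the same function could be applied to a whole column, for example tweet_data["Tweets"].apply(clean_tweet).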
Happy mining.