# Scweet

**Repository Path**: hook-hawk-road/Scweet

## Basic Information

- **Project Name**: Scweet
- **Description**: twitter scweet
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2022-12-07
- **Last Updated**: 2023-04-20

## Categories & Tags

- **Categories**: Uncategorized
- **Tags**: None

## README

# A simple and unlimited Twitter scraper with Python

Recently, Twitter has banned almost every Twitter scraper. This repository presents an alternative tool to scrape Twitter based on three functions:

- [scrape](https://github.com/Altimis/Scweet/blob/master/Scweet/scweet.py): Scrapes all the information about tweets posted between two given dates, for a given language and a list of words or an account name, and saves the retrieved data in a CSV file (more storage methods will be added).
- [get_user_information](https://github.com/Altimis/Scweet/blob/master/Scweet/user.py): Scrapes user information, including the number of following and followers, location and description.
- [get_users_followers and get_users_following](https://github.com/Altimis/Scweet/blob/master/Scweet/user.py): Scrape the followers and following accounts of a given list of users.

It is also possible to download the images shown in tweets by passing the argument `save_images=True`. If you only want to scrape images, it is recommended to set the argument `display_type="image"` so that only tweets containing images are shown.

Authentication is required for scraping followers/following. It is recommended to log in with a new account; otherwise, the account could be banned if the list of followers is very long. To log in to your account, enter your username `SCWEET_USERNAME` and password `SCWEET_PASSWORD` in the [.env](https://github.com/Altimis/Scweet/blob/master/.env) file.
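The README does not show what the `.env` file looks like. A minimal sketch, assuming the common one-`KEY=VALUE`-per-line layout, is a file containing `SCWEET_USERNAME=...` and `SCWEET_PASSWORD=...` with your own values. The helpers `parse_env` and `missing_credentials` below are hypothetical (not part of Scweet); they only check that both keys are present and non-empty before you start a long followers/following run, and they handle simple lines only, not every `.env` feature.

```python
from pathlib import Path

# Keys named in this README; assumed to be the minimum needed to log in.
REQUIRED_KEYS = ("SCWEET_USERNAME", "SCWEET_PASSWORD")


def parse_env(text):
    """Parse simple KEY=VALUE lines (hypothetical helper, not Scweet's loader).

    Blank lines, comment lines starting with '#', and lines without '=' are
    skipped. Only the first '=' splits key from value, and one layer of
    surrounding double or single quotes is stripped from the value.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        # An empty list means both credentials are set.
        print(missing_credentials(env_file.read_text(encoding="utf-8")))
```

Working on the file's text rather than the path keeps the check easy to reuse, for example on a `.env` kept at another location.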
You can adjust the `wait` parameter of the `get_users_followers` and `get_users_following` functions according to your internet speed.

## Requirements

`pip install -r requirements.txt`

Note: you must have Chrome installed on your system.

## Results

### Tweets

The CSV file contains the following features (for each tweet):

- 'UserScreenName':
- 'UserName': UserName
- 'Timestamp': timestamp of the tweet
- 'Text': tweet text
- 'Embedded_text': embedded text written above the tweet. It can be an image, a video or even another tweet if the tweet in question is a reply
- 'Emojis': emojis in the tweet
- 'Comments': number of comments
- 'Likes': number of likes
- 'Retweets': number of retweets
- 'Image link': link of the image in the tweet
- 'Tweet URL': tweet URL

### Following / Followers

The `get_users_following` and `get_users_followers` functions in the [user](https://github.com/Altimis/Scweet/blob/master/Scweet/user.py) file return the following and followers accounts of a given list of users.

## Usage

### Library

The library is now available.
To install the library, run:

`pip install Scweet==1.8`

After installation, you can import and use the functions as follows:

```
from Scweet.scweet import scrape
from Scweet.user import get_user_information, get_users_following, get_users_followers
```

**Scrape top tweets containing the words 'bitcoin' and 'ethereum', geolocated less than 200 km from Alicante (Spain) (Lat=38.3452, Long=-0.481006), without replies:**

**The smaller the interval, the slower the process (choose an interval that evenly divides the period between the start and end dates).**

```
# filter_replies=True excludes replies, matching "without replies" above
data = scrape(words=['bitcoin', 'ethereum'], since="2021-10-01", until="2021-10-05",
              from_account=None, interval=1, headless=False, display_type="Top",
              save_images=False, lang="en", resume=False, filter_replies=True,
              proximity=False, geocode="38.3452,-0.481006,200km")
```

**Scrape top tweets with the hashtag #bitcoin, in proximity and without replies:**

**The smaller the interval, the slower the process (choose an interval that evenly divides the period between the start and end dates).**

```
data = scrape(hashtag="bitcoin", since="2021-08-05", until=None, from_account=None,
              interval=1, headless=True, display_type="Top", save_images=False,
              resume=False, filter_replies=True, proximity=True)
```

**Get the main information of a given list of users:**

**These users follow me on Twitter**

```
users = ['nagouzil', '@yassineaitjeddi', 'TahaAlamIdrissi',
         '@Nabila_Gl', 'geceeekusuu', '@pabu232', '@av_ahmet', '@x_born_to_die_x']
```

**This function returns a list that contains:** `["no. of following", "no. of followers", "join date", "date of birth", "location", "website", "description"]`

```
users_info = get_user_information(users, headless=True)
```

**Get the followers and following of a given list of users**

**Enter your username and password in the .env file. I recommend that you do not use your main account.**

**Increase the `wait` argument to avoid getting your account banned and to give pages time to load if your internet connection is slow. I used 1 and it's safe.**

**Set the `SCWEET_EMAIL`, `SCWEET_USERNAME` and `SCWEET_PASSWORD` variables in your .env file and provide its path:**

```
env_path = ".env"

following = get_users_following(users=users, env=env_path, verbose=0, headless=True,
                                wait=2, limit=50, file_path=None)

followers = get_users_followers(users=users, env=env_path, verbose=0, headless=True,
                                wait=2, limit=50, file_path=None)
```

### Terminal

```
Scrape tweets.

optional arguments:
  -h, --help            show this help message and exit
  --words WORDS         Words to search for. they should be separated by "//" : Cat//Dog.
  --from_account FROM_ACCOUNT
                        Tweets posted by "from_account" account.
  --to_account TO_ACCOUNT
                        Tweets posted in response to "to_account" account.
  --mention_account MENTION_ACCOUNT
                        Tweets that mention "mention_account" account.
  --hashtag HASHTAG     Tweets containing #hashtag
  --until UNTIL         End date for search query. example : %Y-%m-%d.
  --since SINCE         Start date for search query. example : %Y-%m-%d.
  --interval INTERVAL   Interval days between each start date and end date for
                        search queries. example : 5.
  --lang LANG           Tweets language. Example : "en" for english and "fr"
                        for french.
  --headless HEADLESS   Headless webdrives or not. True or False
  --limit LIMIT         Limit tweets to be scraped.
  --display_type DISPLAY_TYPE
                        Display type of Twitter page : Latest or Top tweets
  --resume RESUME       Resume the last scraping. specify the csv file path.
  --proxy PROXY         Proxy server
  --proximity PROXIMITY
                        Proximity
  --geocode GEOCODE     Geographical location coordinates to center the
                        search (), radius. No compatible with proximity
  --minreplies MINREPLIES
                        Min. number of replies to the tweet
  --minlikes MINLIKES   Min. number of likes to the tweet
  --minretweets MINRETWEETS
                        Min. number of retweets to the tweet
```

### To run the script

```
python scweet.py --words "excellente//car" --to_account "tesla" --until 2020-01-05 --since 2020-01-01 --limit 10 --interval 1 --display_type Latest --lang="en" --headless True
```
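The `--interval` flag (like the `interval` argument of `scrape`) sets the number of days between each start date and end date of the search queries, which is why the usage notes recommend an interval that divides the whole period. The sketch below illustrates that idea only; `date_windows` is a hypothetical helper, not Scweet's code, and it assumes `until` is exclusive and that a final window shorter than `interval` is clipped at `until`. Scweet's own query construction may differ.

```python
from datetime import date, timedelta


def date_windows(since, until, interval):
    """Split [since, until) into consecutive windows of `interval` days.

    Hypothetical illustration: dates are ISO strings (YYYY-MM-DD), `until`
    is treated as exclusive, and the last window is clipped at `until`.
    """
    if interval < 1:
        raise ValueError("interval must be a positive number of days")
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    windows = []
    while start < end:
        stop = min(start + timedelta(days=interval), end)
        windows.append((start.isoformat(), stop.isoformat()))
        start = stop
    return windows


if __name__ == "__main__":
    # Dates and interval from the terminal example above:
    # prints four one-day windows, from "2020-01-01 2020-01-02"
    # through "2020-01-04 2020-01-05".
    for window_start, window_end in date_windows("2020-01-01", "2020-01-05", 1):
        print(window_start, window_end)
```

With `interval=3` over 2021-10-01 to 2021-10-05, the sketch yields a three-day window followed by a one-day remainder; this uneven split is presumably what the advice to pick a dividing interval is meant to avoid.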