==================================================================================
----------------------------------------------------------------------------------
--- THIS DIRECTORY SHOULD BE PASSWORD PROTECTED TO PREVENT UNAUTHORIZED ACCESS ---
----------------------------------------------------------------------------------
==================================================================================
__________________________________________________________________________________

METADATA FOR DIRECTORY:

NAME: verified_tweets
TOTAL TWEETS IN THIS ARCHIVE: ~ 5.9 billion
TOTAL SIZE (uncompressed): ~ 61 terabytes

STATS: (Generated by stats.sh file)
__________________________________________________________________________________

This repository contains all publicly available tweets from all verified accounts.
Some of the tweets may show "false" for user.verified, but this appears to be an
issue with the Twitter API. The filter "filter:verified" was used to collect these
tweets. Users with a false verified flag appear to be accounts that were verified
at some point and then delisted from the Twitter @verified account. (The Twitter
@verified account follows all users that have been verified.)

The archives go all the way back to 2006, but due to space constraints on this
file repo, only the most recent months are available here. All Covid-19 related
tweets should be available back to January, 2019.

NOTE: When decompressing the files with zstd, you will need to add the
"--long=31" flag. For example, to decompress to stdout:

    zstd -cd TWV_2021-01.ndjson.zst --long=31

When working with ndjson data in a Linux terminal, I highly recommend installing
jq, which has many helpful features. For example, you can keep only the tweets
that have been favorited more than 10 times with the following command:

    zstd -cd TWV_2021-01.ndjson.zst --long=31 | jq 'select(.favorite_count > 10) | .'

- Jason B (jason@pushshift.io)
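
For anyone who would rather not install jq, the favorite_count filter shown above
can also be sketched in Python using only the standard library. This is a minimal
sketch and not part of the archive tooling: the names filter_tweets, min_favorites
and main are made up for illustration, and treating a missing favorite_count as 0
is an assumption.

```python
import json
import sys
from typing import Iterable, Iterator


def filter_tweets(lines: Iterable[str], min_favorites: int = 10) -> Iterator[dict]:
    """Yield tweets whose favorite_count is strictly greater than min_favorites."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the ndjson stream
        tweet = json.loads(line)
        # Assumption: a missing or null favorite_count counts as 0.
        if (tweet.get("favorite_count") or 0) > min_favorites:
            yield tweet


def main() -> None:
    """Read ndjson from stdin and write the matching tweets to stdout as ndjson."""
    for tweet in filter_tweets(sys.stdin):
        sys.stdout.write(json.dumps(tweet) + "\n")
```

To try it on an archive file, call main() at the bottom of a script (say, a
hypothetical filter_tweets.py) and pipe the decompressed stream into it:

    zstd -cd TWV_2021-01.ndjson.zst --long=31 | python3 filter_tweets.py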