Kyle Conroy

Downloading Your Twitter Data

After the outcry from my last post reached Jack himself, Twitter launched a new export option that includes all your data in a machine readable format. We did it!

Or did we?

It turns out Jack didn't read my post. I didn't get a call from the Twitter executive team. Instead, it was brought to my attention that the account export I've been looking for already exists and has for some time.

Now, I’m not sure why Twitter has two ways to download your data, but they do. I was using the Tweet archive, available in account settings. The other option, called "Your Twitter Data", is found on a separate settings page behind a password confirmation dialog. Scroll down to the bottom of the page and click the "Download data" button.

This archive contains far more information than the Tweet archive. Uncompressed, my Tweet archive is 7MB. My data archive is 45MB. The data inside is both higher quality and more extensive. Ad impressions, block lists, screen name changes, oh my! It even includes image descriptions. The full list of files is included at the bottom of this post.

"Machine" Readable

That said, the archive is not without its issues. The most glaring problem is that the data isn't machine readable out of the box. For some reason, records are contained in JavaScript files.

window.YTD.tweet.part0 = [{
   ...
}]
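
One way to work around this is to drop the JavaScript assignment and parse what remains. This is a minimal sketch in Python, assuming that everything after the first `=` in each file is valid JSON; the archive format isn't documented here, so treat that as an assumption. The `parse_ytd_js` helper name is made up for illustration.

```python
import json


def parse_ytd_js(text: str):
    """Parse the text of a 'window.YTD.<name>.partN = [...]' file.

    Assumes (not confirmed by Twitter) that the part after the first '='
    is plain JSON, optionally followed by a semicolon.
    """
    # Everything before the first '=' is the JavaScript assignment target.
    _, sep, payload = text.partition("=")
    if not sep:
        raise ValueError("no 'window.YTD... =' assignment found")
    # Trim surrounding whitespace and a trailing semicolon, if any.
    return json.loads(payload.strip().rstrip(";"))


sample = 'window.YTD.tweet.part0 = [{"favorite_count" : "0.0"}]'
records = parse_ytd_js(sample)
```

For a real file, read it with `open("tweet.js", encoding="utf-8").read()` and pass the result to the helper.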

Other strange choices abound. Tweets are contained in tweet.js. Here, all numbers are stored as floats, serialized as strings.

{ "favorite_count" : "0.0" }
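
Counts like this can be turned back into integers with a little care. The sketch below uses Python's `decimal` module rather than `float` so the conversion itself doesn't add rounding error, and it refuses values with a fractional part. The `count_from_string` name is hypothetical, and this is only suitable for small counters, not for IDs.

```python
from decimal import Decimal


def count_from_string(value: str) -> int:
    """Convert a float-looking string such as "0.0" into an int."""
    number = Decimal(value)
    # Reject anything that isn't a whole number instead of silently rounding.
    if number != number.to_integral_value():
        raise ValueError(f"not a whole number: {value!r}")
    return int(number)


favorites = count_from_string("0.0")
```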

I wasn't aware you could half-favorite a tweet. This scheme also corrupts IDs: large numbers are stored in E-notation, which truncates digits. Notice that the id field is missing the last digit present in the id_str field.

{ "id" : "1.01456537218219622E18",
  "id_str" : "1014565372182196224" }
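
The loss is easy to confirm using the two values above. This short Python check expands the E-notation string exactly and compares it with `id_str`. The `tweet_id` helper is a hypothetical name, and it simply assumes `id_str` is the field to trust.

```python
from decimal import Decimal

record = {
    "id": "1.01456537218219622E18",
    "id_str": "1014565372182196224",
}

# Expand the E-notation exactly; the truncated last digit comes back as 0.
approx_id = int(Decimal(record["id"]))
exact_id = int(record["id_str"])


def tweet_id(rec: dict) -> int:
    """Read the ID from id_str, ignoring the lossy id field."""
    return int(rec["id_str"])


print(approx_id)             # 1014565372182196220
print(exact_id)              # 1014565372182196224
print(exact_id - approx_id)  # 4
```

No amount of parsing recovers the missing digit from `id` alone, so any tool reading the archive has to rely on `id_str`.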

As I update Grain to parse the entire archive, I'm sure I'll run into more issues. I sincerely hope that talking about these data quality problems publicly encourages Twitter and other companies to take account exports seriously.

Appendix: Archive Layout

.
├── README.txt
├── account-creation-ip.js
├── account-suspension.js
├── account.js
├── ad-engagements.js
├── ad-impressions.js
├── ad-mobile-conversions-attributed.js
├── ad-mobile-conversions-unattributed.js
├── ad-online-conversions-attributed.js
├── ad-online-conversions-unattributed.js
├── ageinfo.js
├── block.js
├── connected-application.js
├── contact.js
├── direct-message-headers.js
├── direct-message.js
├── direct_message_media/
├── email-address-change.js
├── facebook-connection.js
├── follower.js
├── following.js
├── ip-audit.js
├── like.js
├── lists-created.js
├── lists-member.js
├── lists-subscribed.js
├── moment.js
├── mute.js
├── ni-devices.js
├── personalization.js
├── profile.js
├── profile_media/
├── protected-history.js
├── saved-search.js
├── screen-name-change.js
├── tweet.js
├── tweet_media/
└── verified.js