Working with fetched data
Why go to the trouble of writing this package at all? The fundamental reason is that, for analyzing data at least, it’s easier to work with a relational database than with a REST API. Local databases don’t have rate limits, they don’t make you worry about pagination and cursoring, and they put all data in one place for easy cross-referencing. More generally, using a database as this package does separates concerns: the content is separate from the presentation, and how you acquire and load Twitter data is decoupled from how you choose to analyze it.
This series of vignettes thus discusses how to export certain commonly used pieces of data from the database. These aren’t by any means the only kinds of data you may want to export (there’s nothing here about URLs, for instance), but they cover the most common bases.
There are two ways to get data out of the database: the twclient export command, and writing your own SQL.
Built-in exports
You can export certain predefined bulk datasets with the twclient export
command. The built-in exports cover the same datasets we present SQL for below:
the follow graph, tweets, tweet-derived graphs like the mention and reply
graphs, user-level information like location/bio/etc, and counts of mutual
followers or friends for user pairs. For more detail on exactly how these
datasets are defined, see the SQL examples and the included discussion of each
dataset.
Exporting data is simple: twclient export follow-graph will write the follow graph to stdout; to specify an output file, use shell redirection or the -o flag. If you want to restrict the export to only certain users, you can use the full set of user specification flags: -n to include certain screen names, -i to include certain user IDs, -l to include one or more Twitter lists, and -g to include one or more tags you’ve created for users.
Here’s a realistic example:
twclient export mention-graph -o mention-graph.csv -g survey_respondents
For the full list of exportable datasets, run twclient export --help.
Custom SQL
Going the SQL route is more flexible, if also somewhat more work. You’ll need to do this if you want to customize the built-in exports in any substantial way. Of course, allowing this sort of flexibility is the point of using a database backend to store the Twitter data.
We’ll use PostgreSQL’s dialect of SQL for examples here, but it isn’t much work to adapt the queries to SQLite or whatever database you’re using.
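To give a feel for what this looks like, here’s a sketch of a query along the lines of the follow-graph export. The table and column names (follow, source_user_id, target_user_id, valid_end_dt) are illustrative assumptions for this example, not necessarily twclient’s actual schema; consult your database’s tables before adapting it.

```sql
-- Sketch: export the current follow graph as an edge list.
-- Assumes a hypothetical "follow" table with one row per observed
-- follow edge, where a NULL valid_end_dt marks edges still valid
-- as of the most recent fetch.
select
    source_user_id,
    target_user_id
from follow
where valid_end_dt is null
order by source_user_id, target_user_id;
```

In PostgreSQL you might run this with psql’s \copy to produce a CSV; in SQLite, the .mode csv and .output commands serve the same purpose.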