Tweet-Derived Graphs: Mention, Reply, Retweet, Quote
The follow graph isn’t the only useful graph structure in Twitter. There are several other graphs that connect users, defined by their tweeting behavior. For example, the mention graph has a (directed) edge from user A to user B if user A mentions user B in a tweet. We’ll look at four such graphs in total: mention, reply, retweet and quote.
Twclient preprocesses tweets on load into the database to make exporting these graphs easier. User mentions as well as reply, retweet and quote relationships between tweets are extracted into link tables or tweet attributes. For you, trying to get the graphs out, that means simpler SQL.
Mention graph
Without further ado, the mention graph:
select
tw.user_id as source_user_id,
mt.mentioned_user_id as target_user_id,
-- you can also, e.g., group by time
-- extract(year from tw.create_dt) as year,
-- extract(month from tw.create_dt) as month,
count(*) as num_mentions
from tweet tw
inner join user_mention mt using(tweet_id)
group by 1,2;
The user_mention
table represents mentions of users in tweets. There’s a
separate link table, rather than merely a key on the tweet
table, because
one tweet can mention multiple users and indeed can mention the same user more
than once.
There are of course variations on this query: you can leave off the
count(*)
if the number of mentions isn’t relevant, you can group by year,
month or other window of time during which tweets were posted, you can filter
by values of the tweet’s create_dt
in the WHERE clause, and so on.
Retweet graph
The retweet graph has a directed edge from user A to user B if A retweets B’s
tweet. Extracting it from the database relies on the retweeted_status_id
column of the tweet
table:
select
tws.user_id as source_user_id,
twt.user_id as target_user_id,
count(*) as num_retweets
from tweet tws
inner join tweet twt on twt.tweet_id = tws.retweeted_status_id
group by 1,2;
Because the Twitter API does return the full text of the retweeted tweet along
with a retweet, we can join back to the tweet
table to get the retweeted
user’s ID.
Quote graph
The quote graph has a directed edge from user A to user B if A quote-tweets
B’s tweet. As with the use of the retweeted_status_id
column to construct
the retweet graph, getting the quote graph out of the database relies on the
tweet.quoted_status_id
column:
select
tws.user_id as source_user_id,
twt.user_id as target_user_id,
count(*) as num_quote_tweets
from tweet tws
inner join tweet twt on twt.tweet_id = tws.quoted_status_id
group by 1,2;
As with the retweet graph, the Twitter API returns the full text of quoted
tweets with the tweets that QT them, which allows us to join through tweet
in constructing this graph.
Filtering to a particular set of users
Frequently you won’t want the mention graph over all users whose tweets you’ve fetched, but only over some subset. If you’ve used the tagging feature twclient provides for working with groups of users, you can get the list of users tagged (for example) “influencers” as follows:
select
u.user_id
from "user" u -- standard sql reserves this table name, need to quote
inner join user_tag ut using(user_id)
inner join tag ta using(tag_id)
where
ta.name = 'influencers';
Given this set of users, the trick is to join to it on the columns giving both source and target user IDs:
with tmp_universe as
(
select
u.user_id
from "user" u
inner join user_tag ut using(user_id)
inner join tag ta using(tag_id)
where
ta.name = 'influencers'
)
select
uts.user_id as source_user_id,
utt.user_id as target_user_id,
count(*) as num_mentions
from tmp_universe uts
inner join tweet tw using(user_id)
inner join user_mention mt using(tweet_id)
inner join tmp_universe utt on utt.user_id = mt.mentioned_user_id
group by 1,2;
We won’t go through similar code snippets for the reply, retweet and quote graphs, but you can use the same strategy of joining source and target user columns to a list of users you want to restrict the graph to.