Tweet-Derived Graphs: Mention, Reply, Retweet, Quote

The follow graph isn’t the only useful graph structure in Twitter. There are several other graphs that connect users, defined by their tweeting behavior. For example, the mention graph has a (directed) edge from user A to user B if user A mentions user B in a tweet. We’ll look at four such graphs in total: mention, reply, retweet and quote.

Twclient preprocesses tweets on load into the database to make exporting these graphs easier. User mentions as well as reply, retweet and quote relationships between tweets are extracted into link tables or tweet attributes. For you, trying to get the graphs out, that means simpler SQL.

Mention graph

Without further ado, the mention graph:

select
    tw.user_id as source_user_id,
    mt.mentioned_user_id as target_user_id,

    -- you can also, e.g., group by time
    -- extract(year from tw.create_dt) as year,
    -- extract(month from tw.create_dt) as month,

    count(*) as num_mentions
from tweet tw
    inner join user_mention mt using(tweet_id)
group by 1,2;

The user_mention table represents mentions of users in tweets. There’s a separate link table, rather than merely a key on the tweet table, because one tweet can mention multiple users and indeed can mention the same user more than once.

There are of course variations on this query: you can leave off the count(*) if the number of mentions isn’t relevant, you can group by year, month or other window of time during which tweets were posted, you can filter by values of the tweet’s create_dt in the WHERE clause, and so on.

Reply graph

The reply graph has a directed edge from user A to user B if A replies to B’s tweet. It’s even simpler to retrieve than the mention graph:

select
    user_id as source_user_id,
    in_reply_to_user_id as target_user_id,

    count(*) as num_replies
from tweet tw
where
    in_reply_to_user_id is not null
group by 1,2;

As with the mention graph, you can write plenty of variations on this query: grouping by the time the tweet was posted, or filtering on such things as the time the tweet was posted, the client it was posted from, or the language it was written in. To filter by user-level properties like a user’s verified status, you’ll need to join to the user_data table. To restrict to a subgraph over a particular set of users, see below.

Note that the definition of reply used here and in the Twitter API is more restrictive than what you can see in the Twitter.com web interface: you may see “replying to A, B, C” when you post a reply, but your reply-tweet is still in response to one specific tweet posted by one specific user (say, user B). The in_reply_to_user_id field will (in this example) accordingly contain user B’s Twitter user ID.

It’s also worth noting that in_reply_to_user_id may refer to a user not in the "user" table. This is because (unlike with retweets and quote tweets), the Twitter API doesn’t return the full text of the replied-to tweet with a reply.

Retweet graph

The retweet graph has a directed edge from user A to user B if A retweets B’s tweet. Extracting it from the database relies on the retweeted_status_id column of the tweet table:

select
    tws.user_id as source_user_id,
    twt.user_id as target_user_id,

    count(*) as num_retweets
from tweet tws
    inner join tweet twt on twt.tweet_id = tws.retweeted_status_id
group by 1,2;

Because the Twitter API does return the full text of the retweeted tweet along with a retweet, we can join back to the tweet table to get the retweeted user’s ID.

Quote graph

The quote graph has a directed edge from user A to user B if A quote-tweets B’s tweet. As with the use of the retweeted_status_id column to construct the retweet graph, getting the quote graph out of the database relies on the tweet.quoted_status_id column:

select
    tws.user_id as source_user_id,
    twt.user_id as target_user_id,

    count(*) as num_quote_tweets
from tweet tws
    inner join tweet twt on twt.tweet_id = tws.quoted_status_id
group by 1,2;

As with the retweet graph, the Twitter API returns the full text of quoted tweets with the tweets that QT them, which allows us to join through tweet in constructing this graph.

Filtering to a particular set of users

Frequently you won’t want the mention graph over all users whose tweets you’ve fetched, but only over some subset. If you’ve used the tagging feature twclient provides for working with groups of users, you can get the list of users tagged (for example) “influencers” as follows:

select
    u.user_id
from "user" u -- standard sql reserves this table name, need to quote
    inner join user_tag ut using(user_id)
    inner join tag ta using(tag_id)
where
    ta.name = 'influencers';

Given this set of users, the trick is to join to it on the columns giving both source and target user IDs:

with tmp_universe as
(
    select
        u.user_id
    from "user" u
        inner join user_tag ut using(user_id)
        inner join tag ta using(tag_id)
    where
        ta.name = 'influencers'
)
select
    uts.user_id as source_user_id,
    utt.user_id as target_user_id,

    count(*) as num_mentions
from tmp_universe uts
    inner join tweet tw using(user_id)
    inner join user_mention mt using(tweet_id)
    inner join tmp_universe utt on utt.user_id = mt.mentioned_user_id
group by 1,2;

We won’t go through similar code snippets for the reply, retweet and quote graphs, but you can use the same strategy of joining source and target user columns to a list of users you want to restrict the graph to.