API Documentation

A high-level analytics-focused command line client for the Twitter API.

This documentation focuses on the Python API. For an introduction to command-line usage, see the fetching-data vignette; for detailed command-line help, see the --help flag to the twclient (or twitter) command.

twclient.authpool

A version of tweepy.API with support for multiple sets of credentials.

class twclient.authpool.AuthPoolAPI(**kwargs)

Bases: object

A version of tweepy.API with support for multiple sets of credentials.

This class transparently proxies access to multiple sets of Twitter API credentials. It creates as many tweepy.API instances as it gets sets of API credentials, and then dispatches method calls to the appropriate instance, handling rate limit and over-capacity errors. When one instance hits its rate limit, this implementation transparently switches over to the next. User code can treat it as a drop-in replacement for tweepy.API; see the tweepy documentation for methods.

Note that it does, however, handle the rate_limit_status method differently. We do this because it’s not useful to see rate limit info for only one set of credentials when the point of this class is to pool multiple sets. Rather than return only the Twitter API response for the currently active set of credentials, rate_limit_status returns a dictionary whose keys are the consumer keys of the credentials and whose values are the rate_limit_status responses for those keys.

Parameters:
  • auths (list of tweepy.AuthHandler) – The Twitter API credentials to use.

  • capacity_sleep (float) – How long to sleep before retrying after a Twitter capacity error, in seconds.

  • capacity_retries (int) – How many times to retry on capacity error before giving up.

  • **kwargs – Keyword arguments passed through to tweepy.API.

Raises:

error.TwitterServiceError – If Twitter is over capacity or encounters other internal problems for longer than (approximately) capacity_retries * capacity_sleep seconds.
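The dispatching behaviour described above can be illustrated with a minimal, self-contained sketch of the proxy pattern. It is not twclient’s implementation: FakeApi, RateLimited and PooledApi are hypothetical stand-ins (no tweepy calls are made). The sketch rotates to the next credential set whenever the active one reports a rate limit, and simply raises once every set is exhausted; what AuthPoolAPI itself does in that case is not described here.

```python
class RateLimited(Exception):
    """Hypothetical stand-in for a rate-limit error from one API instance."""


class FakeApi:
    """Hypothetical stand-in for a tweepy.API bound to one set of credentials."""

    def __init__(self, consumer_key, calls_allowed):
        self.consumer_key = consumer_key
        self.calls_left = calls_allowed

    def get_user(self, screen_name):
        if self.calls_left == 0:
            raise RateLimited(self.consumer_key)
        self.calls_left -= 1
        return {"screen_name": screen_name, "served_by": self.consumer_key}

    def rate_limit_status(self):
        return {"remaining": self.calls_left}


class PooledApi:
    """Proxy method calls to a pool of API objects, rotating on rate limits."""

    def __init__(self, apis):
        self._apis = list(apis)
        self._active = 0

    def rate_limit_status(self):
        # One entry per credential set, keyed by consumer key.
        return {api.consumer_key: api.rate_limit_status() for api in self._apis}

    def __getattr__(self, name):
        # Only called for names not defined on PooledApi, i.e. proxied methods.
        def call(*args, **kwargs):
            for _ in range(len(self._apis)):
                api = self._apis[self._active]
                try:
                    return getattr(api, name)(*args, **kwargs)
                except RateLimited:
                    # Switch to the next credential set and try again.
                    self._active = (self._active + 1) % len(self._apis)
            raise RateLimited("all credential sets are rate limited")
        return call


pool = PooledApi([FakeApi("key-a", 1), FakeApi("key-b", 5)])
first = pool.get_user("alice")   # served by key-a
second = pool.get_user("bob")    # key-a is exhausted, so key-b serves it
```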

twclient.error

Exceptions that twclient code may raise.

exception twclient.error.TWClientError(**kwargs)

Bases: Exception

The base class for all errors raised by twclient.

Parameters:
  • message (str) – The reason for this error.

  • exit_status (int) – If the exception is caught at the top-level command-line script, this value is passed to sys.exit.

message

The parameter passed to __init__.

Type:

str

exit_status

The parameter passed to __init__.

Type:

int
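A top-level script might honour exit_status roughly as follows. This is a hedged sketch: the stand-in TWClientError only mirrors the documented message and exit_status attributes (its keyword-only constructor and default exit status of 1 are assumptions), and main is a hypothetical command body.

```python
import sys


class TWClientError(Exception):
    """Stand-in mirroring the documented message and exit_status attributes."""

    def __init__(self, *, message="", exit_status=1):  # defaults are assumptions
        super().__init__(message)
        self.message = message
        self.exit_status = exit_status


def main(argv):
    """Hypothetical command body that fails with a twclient-style error."""
    raise TWClientError(message="no such database profile", exit_status=2)


def run_cli(argv):
    """Run main() and translate twclient errors into a process exit status."""
    try:
        main(argv)
    except TWClientError as exc:
        print(f"error: {exc.message}", file=sys.stderr)
        return exc.exit_status
    return 0
```

A real entry point would then call sys.exit(run_cli(sys.argv[1:])).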

exception twclient.error.TwitterAPIError(**kwargs)

Bases: TWClientError

Base class for errors returned by the Twitter API.

Instances of this class correspond to errors returned by the Twitter API, but are higher-level and easier to handle in client code than the underlying instances of tweepy.errors.TweepyException (in tweepy < 4.0.0, it’s instead tweepy.error.TweepError) which gave rise to them.

Parameters:
  • tweepy_exception (instance of tweepy.errors.TweepyException; tweepy.error.TweepError in tweepy < 4.0.0) – The underlying tweepy exception instance which caused this error to be raised.

tweepy_exception

The parameter passed to __init__.

Type:

instance of tweepy.errors.TweepyException (tweepy.error.TweepError in tweepy < 4.0.0)

property response

The requests.Response object resulting from calling the Twitter API.

property http_code

The HTTP status code Twitter returned with this error.

property api_codes

Error codes returned by the Twitter API.

classmethod from_tweepy(exc)

Construct an instance from a tweepy exception object.

Parameters:
  • exc (instance of tweepy.errors.TweepyException; tweepy.error.TweepError in tweepy < 4.0.0) – The exception from which to generate a TwitterAPIError instance.

Return type:

Instance of the appropriate subclass of TwitterAPIError.

static tweepy_is_instance(exc)

Check whether a tweepy exception object can be converted to this class via from_tweepy().

Parameters:
  • exc (instance of tweepy.errors.TweepyException; tweepy.error.TweepError in tweepy < 4.0.0) – The tweepy exception object to check.

Returns:

True if exc is convertible, False otherwise.

Return type:

Boolean
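The from_tweepy() / tweepy_is_instance() pair suggests a dispatch-by-predicate design: each subclass reports whether it recognises a given tweepy exception, and the base class picks a subclass that does. The sketch below shows that pattern with a hypothetical FakeTweepyError carrying only an HTTP status code. The classes are simplified stand-ins (the real NotFoundError and ForbiddenError sit under TwitterLogicError), and falling back to the base class when no subclass matches is an assumption.

```python
class FakeTweepyError(Exception):
    """Hypothetical stand-in for a tweepy exception with an HTTP status code."""

    def __init__(self, http_code):
        super().__init__(f"HTTP {http_code}")
        self.http_code = http_code


class TwitterAPIError(Exception):
    def __init__(self, tweepy_exception):
        super().__init__(str(tweepy_exception))
        self.tweepy_exception = tweepy_exception

    @staticmethod
    def tweepy_is_instance(exc):
        return isinstance(exc, FakeTweepyError)

    @classmethod
    def from_tweepy(cls, exc):
        # Recurse into the first direct subclass whose predicate accepts exc.
        for sub in cls.__subclasses__():
            if sub.tweepy_is_instance(exc):
                return sub.from_tweepy(exc)
        return cls(tweepy_exception=exc)


class NotFoundError(TwitterAPIError):
    @staticmethod
    def tweepy_is_instance(exc):
        return isinstance(exc, FakeTweepyError) and exc.http_code == 404


class ForbiddenError(TwitterAPIError):
    @staticmethod
    def tweepy_is_instance(exc):
        return isinstance(exc, FakeTweepyError) and exc.http_code == 403
```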

exception twclient.error.TwitterServiceError(**kwargs)

Bases: TwitterAPIError

A problem with the Twitter service.

A request to the Twitter service encountered a problem with the service itself rather than with the request. Examples include low-level network problems, over-capacity errors, and internal Twitter server problems. Requests encountering this error should generally be retried.
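Because service errors are usually transient, a caller can wrap a request in a bounded retry loop. In this hedged sketch ServiceError stands in for TwitterServiceError and call_with_retries is a hypothetical helper; its capacity_retries / capacity_sleep names only echo the AuthPoolAPI parameters. With these settings the caller gives up after roughly capacity_retries * capacity_sleep seconds of waiting.

```python
import time


class ServiceError(Exception):
    """Hypothetical stand-in for twclient.error.TwitterServiceError."""


def call_with_retries(func, capacity_retries=3, capacity_sleep=0.0):
    """Call func(), retrying on ServiceError up to capacity_retries times."""
    for attempt in range(capacity_retries + 1):
        try:
            return func()
        except ServiceError:
            if attempt == capacity_retries:
                raise  # out of retries: surface the error to the caller
            time.sleep(capacity_sleep)


attempts = []


def flaky():
    """Fails twice with a service error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ServiceError("over capacity")
    return "ok"


result = call_with_retries(flaky, capacity_retries=3, capacity_sleep=0.0)
```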

static tweepy_is_instance(exc)

Check whether a tweepy exception object can be converted to this class via from_tweepy().

Parameters:
  • exc (instance of tweepy.errors.TweepyException; tweepy.error.TweepError in tweepy < 4.0.0) – The tweepy exception object to check.

Returns:

True if exc is convertible, False otherwise.

Return type:

Boolean

exception twclient.error.TwitterLogicError(**kwargs)

Bases: TwitterAPIError

A request to the Twitter service encountered a logical error condition.

This error is raised when a request to the Twitter service was received and executed successfully but returned a logical error condition. For example, requesting tweets from a user with protected tweets will raise a subclass of this exception class.

exception twclient.error.NotFoundError(**kwargs)

Bases: TwitterLogicError

A requested object was not found.

There are several ways Twitter indicates that a requested object was not found, involving some combination of the API response code, the HTTP status code, and the message. Code in twclient generally can tell from context what object was not found, so we combine these errors into one class.

static tweepy_is_instance(exc)

Check whether a tweepy exception object can be converted to this class via from_tweepy().

Parameters:
  • exc (instance of tweepy.errors.TweepyException; tweepy.error.TweepError in tweepy < 4.0.0) – The tweepy exception object to check.

Returns:

True if exc is convertible, False otherwise.

Return type:

Boolean
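Combining several not-found signals into one class can be sketched as a single predicate over the HTTP status and the API error codes. The specific codes below are illustrative assumptions, not twclient’s list: HTTP 404 plus Twitter API v1.1 error codes 34 (page does not exist), 50 (user not found) and 144 (no status found with that ID), which should be checked against Twitter’s error-code reference. ApiErrorInfo and looks_not_found are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative assumption: signals treated as "not found".
NOT_FOUND_HTTP_CODES = {404}
NOT_FOUND_API_CODES = {34, 50, 144}


@dataclass
class ApiErrorInfo:
    """Hypothetical summary of a failed request."""
    http_code: int
    api_codes: list = field(default_factory=list)


def looks_not_found(err):
    """True if the HTTP status or any API error code signals 'not found'."""
    if err.http_code in NOT_FOUND_HTTP_CODES:
        return True
    return any(code in NOT_FOUND_API_CODES for code in err.api_codes)
```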

exception twclient.error.ForbiddenError(**kwargs)

Bases: TwitterLogicError

A request was forbidden.

This frequently occurs when requesting tweets or friends/followers for users with private accounts / protected tweets. Requesting information about a user with protected tweets is not always an error; certain kinds of information will be returned. Requests for that user’s tweets or friends/followers, however, will fail and raise this error instead.

static tweepy_is_instance(exc)

Check whether a tweepy exception object can be converted to this class via from_tweepy().

Parameters:
  • exc (instance of tweepy.errors.TweepyException; tweepy.error.TweepError in tweepy < 4.0.0) – The tweepy exception object to check.

Returns:

True if exc is convertible, False otherwise.

Return type:

Boolean

twclient.error.dispatch_tweepy_exception(exc)

Take an exception instance and convert it to a TWClientError if applicable.

This function takes an arbitrary exception exc and dispatches it as follows: (a) if exc is a tweepy.errors.TweepyException (in tweepy < 4.0.0, it’s instead tweepy.error.TweepError), convert it to the corresponding TWClientError if possible; (b) otherwise, return exc as-is. It is used in wrappers of the Twitter API to simplify exception handling.

Parameters:

exc (Exception) – The exception instance to dispatch.

Returns:

The dispatched (possibly new) exception instance.

Return type:

Exception
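A wrapper around API calls can apply this dispatch rule (convert tweepy exceptions, pass everything else through) roughly like this. FakeTweepyError, WrappedApiError, dispatch and call_api are hypothetical stand-ins; in twclient the conversion target would be the appropriate TWClientError subclass.

```python
class FakeTweepyError(Exception):
    """Hypothetical stand-in for tweepy.errors.TweepyException."""


class WrappedApiError(Exception):
    """Hypothetical stand-in for a TWClientError subclass."""

    def __init__(self, original):
        super().__init__(str(original))
        self.original = original


def dispatch(exc):
    """Convert tweepy-style exceptions; return anything else unchanged."""
    if isinstance(exc, FakeTweepyError):
        return WrappedApiError(exc)
    return exc


def call_api(func, *args, **kwargs):
    """Run an API call, re-raising any exception in dispatched form."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        new_exc = dispatch(exc)
        if new_exc is exc:
            raise  # not convertible: re-raise the original unchanged
        raise new_exc from exc
```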

exception twclient.error.SemanticError(**kwargs)

Bases: TWClientError

Base class for non-Twitter error conditions.

These errors indicate larger problems with the operation of the program than a specific Twitter or database error (though such an error may have led to this one being raised).

exception twclient.error.BadTargetError(**kwargs)

Bases: SemanticError

A specified target user is protected, suspended or otherwise nonexistent.

This error is raised when a user targeted for fetching is found to be unavailable. There may be several reasons for unavailability: the user may have protected tweets, be suspended, or not exist.

Parameters:

targets (list of str or int) – The Twitter user IDs or screen names causing the error.

targets

The parameter passed to __init__.

Type:

list of str or int

exception twclient.error.BadTagError(**kwargs)

Bases: SemanticError

A requested tag does not exist.

This error is raised when job.ApplyTagJob is given a tag which does not exist in the database.

Parameters:

tag (str) – The name of the nonexistent tag.

tag

The parameter passed to __init__.

Type:

str

exception twclient.error.BadSchemaError(**kwargs)

Bases: SemanticError

The database schema is corrupt or the wrong version.

This error is raised when a Job detects that the schema present in the selected database profile is corrupt, an unsupported version, or not a twclient schema.

exception twclient.error.BadConfigError(**kwargs)

Bases: SemanticError

An operation on the config file encountered an error.

This error is raised when an operation to be performed on the config file is misspecified, impossible, encounters another error, or the config file is malformed.

twclient.job

Job classes providing core functionality.

class twclient.job.Job

Bases: ABC

A job to be run against the database and possibly also the Twitter API.

abstract run()

Run the job.
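Because Job is an abstract base class with an abstract run() method, a concrete job only needs to implement run(), and Job itself cannot be instantiated. The EchoJob subclass below is hypothetical and only shows the pattern with Python’s abc module.

```python
from abc import ABC, abstractmethod


class Job(ABC):
    """Mirror of the documented twclient.job.Job interface."""

    @abstractmethod
    def run(self):
        """Run the job."""


class EchoJob(Job):
    """Hypothetical concrete job that just returns a message."""

    def __init__(self, message):
        self.message = message

    def run(self):
        return self.message


try:
    Job()
except TypeError:
    abstract_ok = True  # abstract classes refuse instantiation
else:
    abstract_ok = False
```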

class twclient.job.DatabaseJob(**kwargs)

Bases: Job

A job to be run against the database.

This class represents a job to be run against the database. Subclasses may or may not also support access to the Twitter API.

Parameters:

engine (sqlalchemy.engine.Engine instance) – The sqlalchemy engine representing the database to connect to.

engine

The parameter passed to __init__.

Type:

sqlalchemy.engine.Engine instance

session

The actual database session to use.

Type:

sqlalchemy.orm.session.Session instance

ensure_schema_version()

Ensure that the database schema is a usable version.

This method checks that the schema present in the database referred to by self.engine is a version the Job class knows how to work with. If the schema is an unsupported version or is missing / corrupt, an instance of error.BadSchemaError will be raised.

Return type:

None

get_or_create(model, **kwargs)

Get a persistent object or create a pending one.

Given a model and a set of keyword arguments, interpreted as values of the model’s attributes that together identify one row in the database, query for that row and (a) return a persistent object if the row exists, or otherwise (b) create and return a pending object with those attribute values.

Parameters:
  • model (subclass of models.Base) – A sqlalchemy model class.

  • **kwargs – Keyword arguments specifying the values of the model’s attributes.

Returns:

The persistent or pending object.

Return type:

instance of models.Base
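The get-or-create pattern can be sketched with plain SQLAlchemy (1.4 or later). This is not twclient’s code: it passes the session explicitly instead of using self.session, and the Tag model is hypothetical. The second call finds the object created by the first because the session autoflushes pending objects before running the query.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Tag(Base):
    """Hypothetical model: one row per tag name."""
    __tablename__ = "tag"
    tag_id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)


def get_or_create(session, model, **kwargs):
    """Return the persistent row matching kwargs, or a new pending object."""
    instance = session.query(model).filter_by(**kwargs).one_or_none()
    if instance is None:
        instance = model(**kwargs)
        session.add(instance)  # pending until the session flushes
    return instance


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    first = get_or_create(session, Tag, name="journalists")
    was_pending = first in session.new
    second = get_or_create(session, Tag, name="journalists")  # autoflush, then found
    same_object = first is second
    n_rows = session.query(Tag).count()
```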

class twclient.job.TargetJob(**kwargs)

Bases: DatabaseJob

A job which requires targets.

A TargetJob is a job which requires a set of target.Target instances to specify users. An instance of this class must specify its resolve mode for the Target classes, and has default logic to resolve them and expose their users.

Parameters:
  • targets (list of target.Target) – The list of targets for the job.

  • allow_missing_targets (bool) – If resolving the targets indicates that some targets should be in the database but are not (i.e., one of the Target instances in self.targets has a non-empty missing_targets attribute), should we raise error.BadTargetError (if False, default) or continue and ignore the missing targets (if True)?

targets

The parameter passed to __init__.

Type:

list of target.Target

allow_missing_targets

The parameter passed to __init__.

Type:

bool

abstract property resolve_mode

The resolve mode attribute to specify behavior of Target instances.

This attribute is consumed by the Target instances in self.targets. Acceptable values include ‘fetch’, ‘skip’, ‘hydrate’. See the documentation for target.Target for more information.

property resolved

Have all targets been resolved to users?

This attribute is False on instantiation, and is normally set to True by calling resolve_targets().

property users

The combined set of users referred to by all targets.

This is the union of all the users referred to by the Target instances in self.targets. If the targets have not been resolved, accessing this attribute will raise AttributeError.

property bad_targets

The combined set of bad raw targets referred to by all targets.

This is the union of all the bad raw targets in the Target instances in self.targets. If the targets have not been resolved, accessing this attribute will raise AttributeError. See the documentation for target.Target for details of what a target and raw target are and its bad_targets attribute for what it means for a raw target to be bad.

property missing_targets

The combined set of missing raw targets referred to by all targets.

This is the union of all the missing raw targets in the Target instances in self.targets. If the targets have not been resolved, accessing this attribute will raise AttributeError. See the documentation for target.Target for details of what a target and raw target are and its missing_targets attribute for what it means for a raw target to be missing.

property good_targets

The combined set of good raw targets referred to by all targets.

This is the union of all the good raw targets in the Target instances in self.targets. If the targets have not been resolved, accessing this attribute will raise AttributeError. See the documentation for target.Target for details of what a target and raw target are and its good_targets attribute for what it means for a raw target to be good.

resolve_targets()

Resolve all of the targets in self.targets to users.

This method resolves all of the targets in self.targets to users (and bad/missing raw targets, if applicable) and validates them using whatever logic the subclass has defined for validate_targets().

validate_targets()

Validate the targets in self.targets.

This method is a hook called by resolve_targets to ensure that the targets in self.targets have resolved into a sane configuration. If any error is detected, error.BadTargetError should be raised. The default implementation here checks whether there are missing targets (i.e., targets which should have been but were not found in the database), and raises error.BadTargetError unless self.allow_missing_targets evaluates to True. Subclasses may override with other configurations.
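The resolve-then-validate flow can be sketched with simplified stand-ins. FakeTarget, with pre-computed users and missing_targets sets, and SketchTargetJob are hypothetical simplifications of target.Target and TargetJob; the validation rule is the documented default (raise unless missing targets are allowed).

```python
from dataclasses import dataclass, field


class BadTargetError(Exception):
    """Stand-in for twclient.error.BadTargetError."""

    def __init__(self, targets):
        super().__init__(f"bad targets: {targets}")
        self.targets = targets


@dataclass
class FakeTarget:
    """Hypothetical resolved target: users found plus raw targets not in the DB."""
    users: set = field(default_factory=set)
    missing_targets: set = field(default_factory=set)


class SketchTargetJob:
    def __init__(self, targets, allow_missing_targets=False):
        self.targets = targets
        self.allow_missing_targets = allow_missing_targets
        self.resolved = False

    @property
    def users(self):
        if not self.resolved:
            raise AttributeError("targets have not been resolved")
        return set().union(*(t.users for t in self.targets))

    @property
    def missing_targets(self):
        if not self.resolved:
            raise AttributeError("targets have not been resolved")
        return set().union(*(t.missing_targets for t in self.targets))

    def resolve_targets(self):
        # Real Target objects would query the database or the API here.
        self.resolved = True
        self.validate_targets()

    def validate_targets(self):
        if self.missing_targets and not self.allow_missing_targets:
            raise BadTargetError(sorted(self.missing_targets, key=str))


targets = [FakeTarget(users={1, 2}), FakeTarget(users={2, 3}, missing_targets={"ghost"})]
lenient = SketchTargetJob(targets, allow_missing_targets=True)
lenient.resolve_targets()
```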

class twclient.job.ApiJob(**kwargs)

Bases: Job

A job requiring access to the Twitter API.

This class represents a job which interacts with the Twitter API. It configures API access, and defers other functionality to subclasses.

Parameters:
  • api (instance of twitter_api.TwitterApi) – The TwitterApi instance to use for API access.

  • allow_api_errors (bool) – If the Twitter API returns an error, should we abort (if False, default), or ignore and continue (if True)?

api

The parameter passed to __init__.

Type:

instance of twitter_api.TwitterApi

allow_api_errors

The parameter passed to __init__.

Type:

bool
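One way a fetch loop could honour allow_api_errors is to catch API errors per user, skipping that user when errors are allowed and re-raising otherwise. In this hedged sketch ApiError, fetch_tweets and fetch_all are hypothetical; fetch_tweets simulates an account with protected tweets.

```python
import logging

logger = logging.getLogger(__name__)


class ApiError(Exception):
    """Hypothetical stand-in for twclient.error.TwitterAPIError."""


def fetch_tweets(user):
    """Hypothetical API call: users named protected_* raise an error."""
    if user.startswith("protected_"):
        raise ApiError(f"{user} has protected tweets")
    return [f"tweet by {user}"]


def fetch_all(users, allow_api_errors=False):
    results = {}
    for user in users:
        try:
            results[user] = fetch_tweets(user)
        except ApiError as exc:
            if not allow_api_errors:
                raise  # abort the whole job (the documented default)
            logger.warning("Skipping %s: %s", user, exc)
    return results
```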

class twclient.job.ConfigJob(**kwargs)

Bases: Job

A job which interacts with the config file.

Instances of subclasses of ConfigJob do various config-related tasks: printing information from the file, adding new DB or API profiles, or removing such profiles, among others.

Parameters:

config (instance of Config) – The Config object to work with.

config

The parameter passed to __init__.

Type:

instance of Config

class twclient.job.ConfigPrintJob(**kwargs)

Bases: ConfigJob

A job which prints information from the config file.

Jobs which are instances of subclasses of this class print various kinds of information, such as lists of API credentials or database URLs.

Parameters:

full (bool) – Whether to print profile names only (if False, default), or all information (if True).

full

The parameter passed to __init__.

Type:

bool

class twclient.job.ConfigWriteJob(**kwargs)

Bases: ConfigJob

A job which modifies the config file.

Jobs which are instances of subclasses of this class modify the config file, for example by adding a new database profile or API credential.

Parameters:

name (str) – The name of the profile to operate on.

name

The parameter passed to __init__.

Type:

str

class twclient.job.ConfigListDbJob(**kwargs)

Bases: ConfigPrintJob

List the database profiles given in the config file.

run()

Run the job.

class twclient.job.ConfigListApiJob(**kwargs)

Bases: ConfigPrintJob

List the API profiles given in the config file.

run()

Run the job.

class twclient.job.ConfigRmDbJob(**kwargs)

Bases: ConfigWriteJob

Remove a database profile given in the config file.

run()

Run the job.

class twclient.job.ConfigRmApiJob(**kwargs)

Bases: ConfigWriteJob

Remove an API profile given in the config file.

run()

Run the job.

class twclient.job.SetDbDefaultJob(**kwargs)

Bases: ConfigWriteJob

Set a database profile given in the config file as the default DB profile.

run()

Run the job.

class twclient.job.ConfigAddDbJob(**kwargs)

Bases: ConfigWriteJob

Add a database profile to the config file.

This job adds a new database to the config file, but does not initialize it for later use (that’s the InitializeJob class or twitter initialize command). Only databases supported by sqlalchemy can be added, and the database must be specified by a sqlalchemy connection URL.

Parameters:

database_url (str) – The sqlalchemy connection URL of the database.

database_url

The parameter passed to __init__.

Type:

str

run()

Run the job.
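A connection URL can be sanity-checked with SQLAlchemy’s make_url, which parses it into its components. The URLs below are made-up examples in the standard SQLAlchemy format, and whether ConfigAddDbJob performs any such check is not stated here.

```python
from sqlalchemy.engine import make_url

# Made-up example URLs in the standard SQLAlchemy format.
sqlite_url = make_url("sqlite:///twitter.db")  # relative file path
pg_url = make_url("postgresql://analyst:secret@localhost:5432/twitter")

summary = {
    "sqlite": (sqlite_url.drivername, sqlite_url.database),
    "postgres": (pg_url.drivername, pg_url.host, pg_url.port, pg_url.database),
}
```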

class twclient.job.ConfigAddApiJob(**kwargs)

Bases: ConfigWriteJob

Add an API profile to the config file.

This job adds a set of Twitter credentials to the config file. The credential set consists either of a consumer key and secret alone (OAuth 2, app-only authentication) or of those plus a token and token secret (OAuth 1a, user-context authentication).

Parameters:
  • consumer_key (str) – The OAuth consumer key.

  • consumer_secret (str) – The OAuth consumer secret.

  • token (str or None) – The OAuth token.

  • token_secret (str or None) – The OAuth token secret.
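The two credential shapes can be told apart before anything is written to the config file. The auth_mode helper below is hypothetical and does not call tweepy; requiring token and token_secret to be given together is an assumption drawn from the description above.

```python
def auth_mode(consumer_key, consumer_secret, token=None, token_secret=None):
    """Classify a credential set as app-only (OAuth 2) or user context (OAuth 1a)."""
    if not consumer_key or not consumer_secret:
        raise ValueError("consumer_key and consumer_secret are required")
    if (token is None) != (token_secret is None):
        raise ValueError("token and token_secret must be given together")
    return "oauth2" if token is None else "oauth1a"
```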

consumer_key

The parameter passed to __init__.

Type:

str

consumer_secret

The parameter passed to __init__.

Type:

str

token

The parameter passed to __init__.

Type:

str or None

token_secret

The parameter passed to __init__.

Type:

str or None

run()

Run the job.

class twclient.job.ExportJob(**kwargs)

Bases: TargetJob, DatabaseJob

A job exporting data from the database.

This class represents a job which pulls an export of collected Twitter data from the database. Several subclasses are defined for particular kinds of commonly used exports. If targets are given, the exports are restricted to only those targets (but note that how to do this is export-specific and subclasses must implement it). The exports produced are CSV files with columns given by the columns property, in the order they appear there.

Parameters:

outfile (str) – The path to the file where we should write the export (default ‘-’ for stdout).

outfile

The parameter passed to __init__.

Type:

str

resolve_mode = 'skip'

abstract query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.

abstract property columns

The list of column names for the resultset returned by the query method.

Subclasses must implement this property along with the query method.

run()

Run the job.
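The contract between query(), columns and run() can be sketched with the standard csv module: write a header row from columns, then one row per tuple produced by query(). ExportSketch and TinyFollowExport are hypothetical; the ‘-’ means stdout convention follows the outfile parameter above, and the separate write(stream) method is an addition for testability.

```python
import csv
import io
import sys
from abc import ABC, abstractmethod


class ExportSketch(ABC):
    def __init__(self, outfile="-"):
        self.outfile = outfile

    @property
    @abstractmethod
    def columns(self):
        """Column names, in the same order as the tuples from query()."""

    @abstractmethod
    def query(self):
        """Return an iterable of row tuples."""

    def write(self, stream):
        writer = csv.writer(stream)
        writer.writerow(self.columns)
        writer.writerows(self.query())

    def run(self):
        if self.outfile == "-":
            self.write(sys.stdout)
        else:
            with open(self.outfile, "w", newline="") as f:
                self.write(f)


class TinyFollowExport(ExportSketch):
    """Hypothetical subclass with a hard-coded edge list."""
    columns = ["source_user_id", "target_user_id"]

    def query(self):
        yield (1, 2)
        yield (2, 3)


buf = io.StringIO()
TinyFollowExport().write(buf)
```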

class twclient.job.ExportFollowGraphJob(**kwargs)

Bases: ExportJob

Export the follow graph.

This export is a graph in edgelist format, with an edge from source_user_id to target_user_id if the source user follows the target user. There is one row per edge (i.e., per pair of users with a following relationship).

If targets are specified, return only the follow graph on the users they describe; otherwise, return the entire graph.

columns = ['source_user_id', 'target_user_id']

query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.
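On our reading, restricting the follow graph to a set of users means keeping only edges whose endpoints are both in that set. A sketch with sqlite3 and a hypothetical follow(source_user_id, target_user_id) table; twclient’s actual schema and query may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follow (source_user_id INTEGER, target_user_id INTEGER)")
conn.executemany(
    "INSERT INTO follow VALUES (?, ?)",
    [(1, 2), (2, 1), (2, 3), (3, 4)],
)


def follow_edges(conn, user_ids=None):
    """Return (source, target) edges, optionally restricted to user_ids on both ends."""
    sql = "SELECT source_user_id, target_user_id FROM follow"
    params = []
    if user_ids is not None:
        placeholders = ", ".join("?" for _ in user_ids)
        sql += (
            f" WHERE source_user_id IN ({placeholders})"
            f" AND target_user_id IN ({placeholders})"
        )
        params = list(user_ids) * 2  # one copy per IN list
    sql += " ORDER BY source_user_id, target_user_id"
    return conn.execute(sql, params).fetchall()
```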

class twclient.job.ExportMentionGraphJob(**kwargs)

Bases: ExportJob

Export the mention graph.

This export is a graph in edgelist format, with an edge from source_user_id to target_user_id if the source user has mentioned the target user. The third column num_mentions gives the number of mentions. There is one row per edge (i.e., per pair of users with a mention relationship).

If targets are specified, return the mention graph only on the users they specify; otherwise, return the entire graph.

columns = ['source_user_id', 'target_user_id', 'num_mentions']

query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.
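The num_mentions column is an aggregate: one row per (source, target) pair with the number of mentions. A sketch with sqlite3 and hypothetical tweet and mention tables (not twclient’s schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tweet (tweet_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE mention (tweet_id INTEGER, mentioned_user_id INTEGER);
    INSERT INTO tweet VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO mention VALUES (100, 2), (101, 2), (101, 3), (102, 1);
    """
)

# Group mentions by (author of the tweet, mentioned user) and count them.
MENTION_GRAPH_SQL = """
    SELECT t.user_id AS source_user_id,
           m.mentioned_user_id AS target_user_id,
           COUNT(*) AS num_mentions
    FROM tweet AS t
    JOIN mention AS m ON m.tweet_id = t.tweet_id
    GROUP BY t.user_id, m.mentioned_user_id
    ORDER BY source_user_id, target_user_id
"""

rows = conn.execute(MENTION_GRAPH_SQL).fetchall()
```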

class twclient.job.ExportReplyGraphJob(**kwargs)

Bases: ExportJob

Export the reply graph.

This export is a graph in edgelist format, with an edge from source_user_id to target_user_id if the source user has replied to the target user. The third column, num_replies, gives the number of replies. There is one row per edge (i.e., per pair of users with a reply relationship).

If targets are specified, return the reply graph only on the users they specify; otherwise, return the entire graph.

columns = ['source_user_id', 'target_user_id', 'num_replies']

query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.

class twclient.job.ExportRetweetGraphJob(**kwargs)

Bases: ExportJob

Export the retweet graph.

This export is a graph in edgelist format, with an edge from source_user_id to target_user_id if the source user has retweeted the target user. The third column, num_retweets, gives the number of retweets. There is one row per edge (i.e., per pair of users with a retweet relationship).

If targets are specified, return the retweet graph only on the users they specify; otherwise, return the entire graph.

columns = ['source_user_id', 'target_user_id', 'num_retweets']

query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.

class twclient.job.ExportQuoteGraphJob(**kwargs)

Bases: ExportJob

Export the quote graph.

This export is a graph in edgelist format, with an edge from source_user_id to target_user_id if the source user has quote-tweeted the target user. The third column, num_quotes, gives the number of quote tweets. There is one row per edge (i.e., per pair of users with a quote-tweet relationship).

If targets are specified, return the quote graph only on the users they specify; otherwise, return the entire graph.

columns = ['source_user_id', 'target_user_id', 'num_quotes']

query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.

class twclient.job.ExportTweetsJob(**kwargs)

Bases: ExportJob

Export the set of user tweets.

This export includes all tweets for either all users or a particular set of targets. Various relevant fields are included, in particular the text of any retweeted, quoted or replied-to status and a recoded version of the client from which the tweet was posted. There is one row per tweet.

If targets are specified, return only tweets by the users they specify; otherwise, return all tweets. Note that because we receive and store full tweet objects for quote tweets and retweets, and users can RT or QT any other user, not just ones whose tweets were fetched, “all tweets” may include some tweets by users whose tweets weren’t explicitly fetched.

columns = ['tweet_id', 'user_id', 'content', 'retweeted_status_content', 'quoted_status_content', 'in_reply_to_status_content', 'is_retweet', 'is_reply', 'is_quote', 'create_dt', 'lang', 'retweet_count', 'favorite_count', 'source_collapsed']

query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.

class twclient.job.ExportUserInfoJob(**kwargs)

Bases: ExportJob

Export user-level information.

This export includes user-level information. Besides the Twitter-assigned user ID, fields include such things as the profile URL and self-reported location, counts of friends, followers and list memberships, verified status, and other such user-specific fields. If a given user’s data has been fetched more than once, only the most recent fetch will be used. There is one row per user.

If targets are specified, only those users will be included in the export. If no targets are specified, the default is to return rows for all users who have been fetched with twitter fetch users (i.e., those with rows in the user_data table).
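Keeping only the most recent fetch per user is a latest-row-per-group query. One way to express it, assuming a hypothetical user_data table with a fetch timestamp column named insert_dt (twclient’s real column names may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_data (
        user_id INTEGER, insert_dt TEXT, screen_name TEXT, followers_count INTEGER
    );
    INSERT INTO user_data VALUES
        (1, '2021-01-01', 'alice_old', 10),
        (1, '2021-06-01', 'alice', 15),
        (2, '2021-03-01', 'bob', 7);
    """
)

# Keep each user's row whose insert_dt is that user's latest (ISO dates sort as text).
LATEST_SQL = """
    SELECT ud.user_id, ud.screen_name, ud.followers_count
    FROM user_data AS ud
    WHERE ud.insert_dt = (
        SELECT MAX(inner_ud.insert_dt)
        FROM user_data AS inner_ud
        WHERE inner_ud.user_id = ud.user_id
    )
    ORDER BY ud.user_id
"""

rows = conn.execute(LATEST_SQL).fetchall()
```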

columns = ['user_id', 'profile_url', 'friends_count', 'followers_count', 'listed_count', 'screen_name', 'location', 'display_name', 'description', 'protected', 'verified', 'account_create_dt', 'recorded_tweets_all_time', 'first_tweet_dt', 'last_tweet_dt', 'android_user', 'ios_user', 'desktop_user', 'business_app_user']

query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.

class twclient.job.ExportMutualsJob(**kwargs)

Bases: ExportJob

Compute mutual friends or followers counts for pairs of some set of users.

property columns

The list of column names for the resultset returned by the query method.

Subclasses must implement this property along with the query method.

abstract property direction

The direction (friends or followers) of mutual counts to compute.

query()

The sqlalchemy query returning rows to export.

This method is the main piece of business logic for subclasses, which must implement it along with the columns property. It should return an iterable (or be a generator) of tuples, with the elements of each tuple assumed to be in the order specified by the columns property.
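A direction-parameterised mutual count can be sketched in SQL: pair up the eligible users, then count third users who are followers (or friends) of both, with a LEFT JOIN so that pairs without data still get a count of 0. The follow table, the mutual_counts helper and the use of unordered pairs (A < B) are assumptions for illustration; only the direction values 'followers' and 'friends' come from the documentation.

```python
import sqlite3

# For each direction: (column holding the pair member, column holding the shared user).
# Column names come from this fixed mapping only, never from user input.
DIRECTION_COLUMNS = {
    "followers": ("target_user_id", "source_user_id"),  # users following both
    "friends": ("source_user_id", "target_user_id"),    # users followed by both
}


def mutual_counts(conn, user_ids, direction):
    """Count shared followers/friends for every unordered pair in user_ids."""
    member_col, shared_col = DIRECTION_COLUMNS[direction]
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS eligible (user_id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM eligible")
    conn.executemany("INSERT INTO eligible VALUES (?)", [(u,) for u in user_ids])
    sql = f"""
        SELECT a.user_id, b.user_id, COUNT(f2.{shared_col})
        FROM eligible AS a
        JOIN eligible AS b ON a.user_id < b.user_id
        LEFT JOIN follow AS f1 ON f1.{member_col} = a.user_id
        LEFT JOIN follow AS f2
               ON f2.{member_col} = b.user_id
              AND f2.{shared_col} = f1.{shared_col}
        GROUP BY a.user_id, b.user_id
        ORDER BY a.user_id, b.user_id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follow (source_user_id INTEGER, target_user_id INTEGER)")
conn.executemany(
    "INSERT INTO follow VALUES (?, ?)",
    [(10, 1), (10, 2), (11, 1), (11, 2), (12, 3), (10, 3),  # followers of users 1-3
     (1, 50), (2, 50), (1, 51), (3, 51)],                    # friends of users 1-3
)
followers = mutual_counts(conn, [1, 2, 3, 4], "followers")
friends = mutual_counts(conn, [1, 2, 3, 4], "friends")
```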

class twclient.job.ExportMutualFollowersJob(**kwargs)

Bases: ExportMutualsJob

Export counts of mutual followers.

This export includes counts of mutual followers between all pairs of users from a certain set of eligible users (exactly which set is discussed below). That is, if user A and user B are both included, there will be one row with a count of the number of users who follow both A and B. Note that you must have fetched followers of both A and B for the counts to be accurate: if either has not had followers fetched, there will be a row for the (A, B) pair but it will record 0 mutual followers. There is one row per pair of users in the set of eligible users (for all pairs).

If targets are specified, the set of eligible users is restricted to only the users they describe. Otherwise, the default set of users is those who have been fetched with twitter fetch users (i.e., those with rows in the user_data table).

direction = 'followers'

class twclient.job.ExportMutualFriendsJob(**kwargs)

Bases: ExportMutualsJob

Export counts of mutual friends.

This export includes counts of mutual friends between all pairs of users from a certain set of eligible users (exactly which set is discussed below). That is, if user A and user B are both included, there will be one row with a count of the number of users who are followed by both A and B. Note that you must have fetched friends of both A and B for the counts to be accurate: if either has not had friends fetched, there will be a row for the (A, B) pair but it will record 0 mutual friends. There is one row per pair of users in the set of eligible users (for all pairs).

If targets are specified, the set of eligible users is restricted to only the users they describe. Otherwise, the default set of users is those who have been fetched with twitter fetch users (i.e., those with rows in the user_data table).

direction = 'friends'

class twclient.job.FetchJob(**kwargs)

Bases: ApiJob, TargetJob

A job fetching data from the Twitter API.

This class represents a job which fetches data from the Twitter API. It configures API access and user validation logic, and defers other functionality to subclasses.

Parameters:
  • load_batch_size (int or None) – Load new rows to the database in batches of this size. The default is None, which loads all retrieved data in one batch. Lower values minimize memory usage at the cost of slower loading, while higher values do the reverse. Target instances in self.targets ignore this value (it applies only to other rows loaded by the FetchJob instance), because there are generally not enough targets to consume a significant amount of memory. Followers and friends lists in particular can be large enough to cause out-of-memory conditions; setting load_batch_size to an appropriate value (e.g., 5000) can address this problem.

  • randomize (bool) – Whether to process raw targets in a randomized order. This may allow loads which are interrupted partway through to retain some useful statistical properties.
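The load_batch_size semantics (None meaning a single batch) can be sketched as a chunking generator, and randomize as a shuffle of the raw targets before processing. Both helpers are hypothetical, not twclient functions; the seed argument exists only to make the shuffle reproducible.

```python
import itertools
import random


def batches(rows, load_batch_size=None):
    """Yield lists of rows; None means everything in one batch."""
    rows = iter(rows)
    if load_batch_size is None:
        batch = list(rows)
        if batch:
            yield batch
        return
    while True:
        batch = list(itertools.islice(rows, load_batch_size))
        if not batch:
            return
        yield batch


def ordered_targets(raw_targets, randomize=False, seed=None):
    """Return raw targets in input order, or shuffled if randomize is True."""
    targets = list(raw_targets)
    if randomize:
        random.Random(seed).shuffle(targets)
    return targets
```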

load_batch_size

The parameter passed to __init__.

Type:

int or None

randomize

The parameter passed to __init__.

Type:

bool

property users

The combined set of users referred to by all targets.

This is the union of all the users referred to by the Target instances in self.targets. If the targets have not been resolved, accessing this attribute will raise AttributeError.

validate_targets()

Validate the targets in self.targets.

This method is a hook called by resolve_targets to ensure that the targets in self.targets have resolved into a sane configuration. If any error is detected, error.BadTargetError should be raised. The default implementation here checks whether there are missing targets (i.e., targets which should have been but were not found in the database), and raises error.BadTargetError unless self.allow_missing_targets evaluates to True. Subclasses may override with other configurations.

class twclient.job.UserInfoJob(**kwargs)[source]

Bases: FetchJob

A job which hydrates users.

This job resolves its targets to users with resolve_mode == 'hydrate'. That is, it fetches data on those users from Twitter’s users/lookup endpoint, and stores the resulting data in the database. No other work is done. The entire job is run in one transaction; if anything goes wrong, no users are loaded.

resolve_mode = 'hydrate'

run()[source]

Run the job.

class twclient.job.TweetsJob(**kwargs)[source]

Bases: FetchJob

Fetch user tweets from the Twitter API.

This job fetches user tweets from Twitter’s statuses/user_timeline endpoint and loads them to the database. Several options are provided to control which of a given user’s tweets are loaded. The loaded tweets are extensively normalized to extract other entities (mentions, mentioned users, hashtags, photos and videos, etc). The job is run in one transaction per user; if anything goes wrong during loading of a user, the user which encountered the error will be rolled back but tweets for previously processed users will remain in the database.
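The per-user transaction behavior can be illustrated with the standard library's sqlite3 module. This is a sketch under stated assumptions. The single tweet table and its three columns are a hypothetical simplification of twclient's schema, load_tweets_per_user is not a twclient function, and twclient's models are SQLAlchemy-based rather than raw sqlite3.

```python
import sqlite3
from typing import Dict, List


def load_tweets_per_user(conn: sqlite3.Connection,
                         tweets_by_user: Dict[int, List[dict]]) -> List[int]:
    """Load each user's tweets in its own transaction; return the failed users."""
    failed = []
    for user_id, tweets in tweets_by_user.items():
        try:
            # "with conn" commits on success and rolls back on an exception,
            # so a failing user leaves no partial rows behind.
            with conn:
                conn.executemany(
                    "INSERT INTO tweet (tweet_id, user_id, content) VALUES (?, ?, ?)",
                    [(t["tweet_id"], user_id, t["content"]) for t in tweets],
                )
        except sqlite3.Error:
            failed.append(user_id)  # earlier users' commits are unaffected
    return failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (tweet_id INTEGER PRIMARY KEY, "
             "user_id INTEGER NOT NULL, content TEXT NOT NULL)")
failed = load_tweets_per_user(conn, {
    1: [{"tweet_id": 10, "content": "a"}, {"tweet_id": 11, "content": "b"}],
    2: [{"tweet_id": 20, "content": "c"}, {"tweet_id": 20, "content": "dup"}],
    3: [{"tweet_id": 30, "content": "d"}],
})
loaded = conn.execute("SELECT tweet_id FROM tweet ORDER BY tweet_id").fetchall()
print(failed)  # [2]
print(loaded)  # [(10,), (11,), (30,)]
```

User 2's duplicate key aborts only that user's transaction, so tweet 20 is not loaded at all, while users 1 and 3 are kept.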

Parameters:
  • since_timestamp (float, or None) – A Unix timestamp. Tweets older than this will not be loaded, and an attempt will be made not to fetch them from the API in order to minimize usage of rate-limited endpoints.

  • max_tweets (int, or None) – Stop loading tweets for each user after this many. If None, load all available tweets. After loading max_tweets tweets, no further calls to the Twitter endpoint will be made (to minimize usage of rate-limited endpoints).

  • old_tweets (bool) – Whether to fetch all of each user's tweets (if True) or only tweets newer than the newest one already in the database for that user (if False, the default). Fetching only newer tweets is efficient thanks to the Twitter endpoint’s since_id parameter and the fact that tweet IDs are assigned in increasing order over time.
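The interaction of the three options can be sketched in plain Python. This is an illustration, not twclient's code. select_tweets is hypothetical, the timeline is simulated as a newest-first list of (tweet_id, timestamp) pairs (the order in which statuses/user_timeline returns tweets), and newest_id_in_db stands in for the newest tweet ID already stored for the user.

```python
from typing import Iterable, List, Optional, Tuple

Tweet = Tuple[int, float]  # (tweet_id, Unix timestamp of creation)


def select_tweets(timeline: Iterable[Tweet],
                  newest_id_in_db: Optional[int] = None,
                  old_tweets: bool = False,
                  since_timestamp: Optional[float] = None,
                  max_tweets: Optional[int] = None) -> List[Tweet]:
    """Apply the three options to a newest-first timeline (hypothetical).

    The timeline is consumed lazily, so stopping early stands in for not
    requesting further pages from a rate-limited endpoint.
    """
    if max_tweets is not None and max_tweets <= 0:
        return []
    # Emulate since_id: only IDs greater than the newest stored one.
    since_id = None if old_tweets else newest_id_in_db
    selected: List[Tweet] = []
    for tweet_id, created in timeline:
        if since_id is not None and tweet_id <= since_id:
            break  # IDs increase over time, so everything after is older
        if since_timestamp is not None and created < since_timestamp:
            break  # newest-first order: the rest is older still
        selected.append((tweet_id, created))
        if max_tweets is not None and len(selected) >= max_tweets:
            break  # stop requesting more once the cap is reached
    return selected


timeline = [(105, 500.0), (104, 400.0), (103, 300.0), (102, 200.0), (101, 100.0)]
print(select_tweets(timeline, newest_id_in_db=102))
# [(105, 500.0), (104, 400.0), (103, 300.0)]
```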

since_timestamp

The parameter passed to __init__.

Type:

float, or None

max_tweets

The parameter passed to __init__.

Type:

int, or None

old_tweets

The parameter passed to __init__.

Type:

bool

resolve_mode = 'skip'

run()[source]

Run the job.

class twclient.job.FollowGraphJob(**kwargs)[source]

Bases: FetchJob

Fetch follow-graph edges from the Twitter API.

This job fetches follow-graph edges from the Twitter API for a given set of users. Subclasses must specify which direction of edges to fetch (users’ friends or followers). The edges are first loaded to a staging table and then stored in the follow table, which uses a type 2 SCD format to allow tracking historical follow-graph state with reduced space requirements. The job is run in one transaction per user; if anything goes wrong during loading of a user, the user which encountered the error will be rolled back but edges for previously processed users will remain in the database.
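The type 2 SCD bookkeeping can be sketched in memory. This is illustrative only. FollowRow and merge_scd2 are hypothetical names, and the sketch skips the staging table (twclient does this work in the database). It also assumes the rows passed in cover exactly the users and direction that were just fetched.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]  # (source_user_id, target_user_id)


@dataclass
class FollowRow:
    source_user_id: int
    target_user_id: int
    valid_start_dt: datetime
    valid_end_dt: Optional[datetime] = None  # None means "currently valid"


def merge_scd2(rows: List[FollowRow], fetched: Set[Edge], now: datetime) -> None:
    """Update rows in place so that they reflect a fresh, complete fetch."""
    current = {(r.source_user_id, r.target_user_id): r
               for r in rows if r.valid_end_dt is None}
    # Edges no longer observed are closed out rather than deleted.
    for edge, row in current.items():
        if edge not in fetched:
            row.valid_end_dt = now
    # Edges with no open row (brand new, or returning after a gap) get a new row.
    for source, target in sorted(fetched - set(current)):
        rows.append(FollowRow(source, target, valid_start_dt=now))


t0, t1, t2 = datetime(2021, 1, 1), datetime(2021, 2, 1), datetime(2021, 3, 1)
rows = [FollowRow(1, 99, t0), FollowRow(2, 99, t0)]
merge_scd2(rows, {(2, 99), (3, 99)}, t1)  # 1 unfollows 99, 3 starts following
merge_scd2(rows, {(1, 99), (2, 99), (3, 99)}, t2)  # 1 follows again: new row
for r in rows:
    print(r.source_user_id, r.valid_start_dt.date(), r.valid_end_dt)
```

Unchanged edges are never rewritten, which is where the space saving over storing a full snapshot per fetch comes from.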

Note that Twitter sometimes returns the same follower/friend ID more than once (probably because of eventual consistency). As a result, there is special loading logic for these jobs. Each batch of follower or friend IDs is deduped before being inserted (the entire set of IDs at once if load_batch_size is None); if an ID in one batch duplicates an ID received in a previous batch, the batch is retried one row at a time (which is quite slow). Consequently, loading these rows is most efficient with a load_batch_size of None; other values should be used only if memory is a constraint.
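The dedupe-then-retry strategy could look like the following sqlite3 sketch. The stg_follow table with a single-column primary key is a hypothetical stand-in for twclient's staging table, and load_id_batch is not a twclient function. Detecting cross-batch duplicates through a constraint violation is also an assumption made for illustration.

```python
import sqlite3
from typing import Iterable, List


def load_id_batch(conn: sqlite3.Connection, batch: Iterable[int]) -> int:
    """Insert one batch of IDs, falling back to row-by-row on conflict.

    Returns the number of rows actually inserted.
    """
    # Dedupe within the batch first (dict.fromkeys keeps first-seen order).
    ids: List[int] = list(dict.fromkeys(batch))
    try:
        with conn:  # fast path: the whole batch in one transaction
            conn.executemany("INSERT INTO stg_follow (user_id) VALUES (?)",
                             [(i,) for i in ids])
        return len(ids)
    except sqlite3.IntegrityError:
        pass  # some ID duplicates one from an earlier batch; nothing was kept
    inserted = 0
    for i in ids:  # slow path: one row at a time, skipping duplicates
        try:
            with conn:
                conn.execute("INSERT INTO stg_follow (user_id) VALUES (?)", (i,))
            inserted += 1
        except sqlite3.IntegrityError:
            continue
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_follow (user_id INTEGER PRIMARY KEY)")
first = load_id_batch(conn, [1, 2, 2, 3])  # repeated 2 is deduped in memory
second = load_id_batch(conn, [3, 4, 5])    # 3 was already loaded: slow path
print(first, second)  # 3 2
```

With a single batch (load_batch_size of None), the in-memory dedupe removes every duplicate and the slow path is never taken.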

resolve_mode = 'skip'

abstract property direction

The “direction” of follow edges to load.

Given a set of users, we might want to fetch the users who follow them (their “followers”) or the users they follow (their “friends”). This attribute, which subclasses must set, should be either “friends” or “followers” to specify which direction of fetch is intended.
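The pattern of an abstract property satisfied by a plain class attribute, as FollowersJob and FriendsJob do with direction = 'followers' and direction = 'friends', can be sketched with the abc module. The *Sketch classes and the describe method are hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod


class FollowGraphJobSketch(ABC):
    """Hypothetical stand-in for FollowGraphJob's direction contract."""

    @property
    @abstractmethod
    def direction(self) -> str:
        """Either 'friends' or 'followers'."""

    def describe(self, user: str) -> str:
        if self.direction == "followers":
            return f"users who follow {user}"
        if self.direction == "friends":
            return f"users {user} follows"
        raise ValueError(f"bad direction: {self.direction!r}")


class FollowersJobSketch(FollowGraphJobSketch):
    direction = "followers"  # a plain class attribute satisfies the abstract property


class FriendsJobSketch(FollowGraphJobSketch):
    direction = "friends"


print(FollowersJobSketch().describe("@cspan"))  # users who follow @cspan
print(FriendsJobSketch().describe("@cspan"))    # users @cspan follows
```

The base class itself cannot be instantiated, so a subclass that forgets to set direction fails immediately with TypeError.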

run()[source]

Run the job.

class twclient.job.FollowersJob(**kwargs)[source]

Bases: FollowGraphJob

A FollowGraphJob which fetches user followers.

direction = 'followers'

class twclient.job.FriendsJob(**kwargs)[source]

Bases: FollowGraphJob

A FollowGraphJob which fetches user friends.

direction = 'friends'

class twclient.job.InitializeJob(**kwargs)[source]

Bases: DatabaseJob

A job which initializes the selected database and sets up the schema.

WARNING! This job will drop all data in the selected database! This job (re-)initializes the selected database and applies the schema to it. The version of the creating package will also be stored to help future versions with migrations and compatibility checks.

run()[source]

Run the job.

class twclient.job.RateLimitStatusJob(**kwargs)[source]

Bases: ApiJob

Check the rate limits for the API keys in the config file.

This job pulls the rate limit status for each key in the config file and prints it to stdout in JSON format. By default, the output is limited to the API endpoints twclient uses, but the job can be told to show all of them.
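The endpoint filtering could be sketched as follows. The sketch makes three assumptions. First, each response follows the Twitter v1.1 rate_limit_status layout, a "resources" object grouping endpoints by family. Second, responses are keyed by consumer key, as AuthPoolAPI.rate_limit_status returns them. Third, USED_ENDPOINTS is a guessed list, not twclient's actual one. The limit values in the sample are made up.

```python
import json
from typing import Dict, Iterable, Optional

# Assumed subset of endpoints; the list twclient actually uses may differ.
USED_ENDPOINTS = {
    "/users/lookup",
    "/statuses/user_timeline",
    "/followers/ids",
    "/friends/ids",
}


def filter_rate_limits(status_by_key: Dict[str, dict],
                       endpoints: Optional[Iterable[str]] = USED_ENDPOINTS
                       ) -> Dict[str, dict]:
    """Keep only selected endpoints from each key's response (None keeps all)."""
    if endpoints is None:
        return status_by_key
    wanted = set(endpoints)
    out = {}
    for key, status in status_by_key.items():
        # Flatten the per-family grouping while filtering.
        out[key] = {
            endpoint: limits
            for family in status.get("resources", {}).values()
            for endpoint, limits in family.items()
            if endpoint in wanted
        }
    return out


sample = {"CONSUMER_KEY_1": {"resources": {
    "statuses": {
        "/statuses/user_timeline": {"limit": 900, "remaining": 899, "reset": 1600000000},
        "/statuses/show/:id": {"limit": 900, "remaining": 900, "reset": 1600000000},
    },
    "friends": {
        "/friends/ids": {"limit": 15, "remaining": 15, "reset": 1600000000},
    },
}}}
filtered = filter_rate_limits(sample)
print(json.dumps(filtered, indent=2, sort_keys=True))
```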

run()[source]

Run the job.

class twclient.job.TagJob(**kwargs)[source]

Bases: DatabaseJob

A job which uses user tags.

A TagJob is a job which requires a user tag. It ensures that the database schema version is correct, and leaves other logic for subclasses.

Parameters:

tag (str) – The name of a user tag.

tag

The parameter passed to __init__.

Type:

str

class twclient.job.CreateTagJob(**kwargs)[source]

Bases: TagJob

Create a user tag.

This job creates a new user tag. If the tag already exists, nothing is done and no error is raised. The tag is not applied to any users. (See ApplyTagJob for that.)

run()[source]

Run the job.

class twclient.job.DeleteTagJob(**kwargs)[source]

Bases: TagJob

Delete a user tag.

This job deletes a user tag. If the tag does not exist, nothing is done and no error is raised. Any existing assignments of the tag to users are also deleted.

run()[source]

Run the job.

class twclient.job.ApplyTagJob(**kwargs)[source]

Bases: TagJob, TargetJob

Apply a user tag to a set of users.

This job applies an existing user tag to a set of users. If the tag does not exist, error.BadTagError is raised. (Use CreateTagJob to create a new tag.) The targets are resolved to users with resolve_mode == 'skip' (i.e., any requested users which do not exist in the database are not looked up from the Twitter API). If any users were not successfully resolved, error.BadTargetError is raised unless the allow_missing_targets parameter is True. Otherwise, any users which were successfully resolved from the targets are given the tag. In particular, if no users were successfully resolved and allow_missing_targets is True, nothing is done and no error is raised. The entire job is run as one transaction; if anything goes wrong, no tags are applied.
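The control flow described above might be sketched with sqlite3 as follows. Note these simplifications:
  • The three tables are drastically simplified, hypothetical versions of user, tag and user_tag.
  • The exception classes are local stand-ins for twclient's error module.
  • apply_tag is not a twclient function.

```python
import sqlite3
from typing import Iterable


class BadTagError(Exception):
    """Stand-in for twclient's error.BadTagError."""


class BadTargetError(Exception):
    """Stand-in for twclient's error.BadTargetError."""


def apply_tag(conn: sqlite3.Connection, tag_name: str,
              requested_user_ids: Iterable[int],
              allow_missing_targets: bool = False) -> int:
    """Tag every requested user that exists locally; return how many were tagged."""
    with conn:  # one transaction: all assignments are committed, or none are
        row = conn.execute("SELECT tag_id FROM tag WHERE name = ?",
                           (tag_name,)).fetchone()
        if row is None:
            raise BadTagError(f"tag {tag_name!r} does not exist")
        tag_id = row[0]
        requested = list(dict.fromkeys(requested_user_ids))
        # 'skip'-style resolution: look only in the database, never the API.
        resolved = [u for u in requested
                    if conn.execute("SELECT 1 FROM user WHERE user_id = ?",
                                    (u,)).fetchone() is not None]
        if len(resolved) < len(requested) and not allow_missing_targets:
            raise BadTargetError("some requested users are not in the database")
        conn.executemany("INSERT INTO user_tag (user_id, tag_id) VALUES (?, ?)",
                         [(u, tag_id) for u in resolved])
    return len(resolved)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE user_tag (user_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
INSERT INTO user (user_id) VALUES (1), (2);
INSERT INTO tag (name) VALUES ('journalists');
""")
n_tagged = apply_tag(conn, "journalists", [1, 2, 3], allow_missing_targets=True)
print(n_tagged)  # 2: user 3 is not in the database and is skipped
```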

resolve_mode = 'skip'

run()[source]

Run the job.

twclient.models

The database schema for storing Twitter data.

class twclient.models.SchemaVersion(**kwargs)[source]

Bases: TimestampsMixin, Base

A stub table to store the schema version.

This table contains one row with the version of twclient which created the database. This version string is stored to support future migrations, though none are supported now.

version

The creating package version.

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.User(**kwargs)[source]

Bases: TimestampsMixin, FromTweepyInterface, Base

A Twitter user.

This class and its associated user table represent attributes of a Twitter user which can’t change over time. In practice, that means only user IDs; even screen names can change. See the UserData class and its user_data table for records of these mutable attributes.

user_id

The Twitter user ID for this user. This ID is assigned at account creation and is stable for the lifetime of the account.

classmethod from_tweepy(obj, session=None)[source]

Instantiate a class instance from a tweepy object.

This method constructs a class instance from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instance.

Return type:

Instance of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.UserData(**kwargs)[source]

Bases: TimestampsMixin, FromTweepyInterface, Base

Mutable attributes of a Twitter user.

This class and its associated user_data table represent records of a Twitter user’s mutable attributes. Many API requests to Twitter send back user entities with various properties of a user, ranging from screen names and verified status to follower counts; we store these entities here.
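Each fetch adds a new user_data row, so analyses usually want the most recent snapshot per user. The following sqlite3 sketch rests on two assumptions. It uses a cut-down user_data table with only a few of the columns listed below, and it assumes user_data_id increases in load order. Selecting on insert_dt would be an alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_data (
    user_data_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    screen_name TEXT,
    followers_count INTEGER,
    insert_dt TEXT NOT NULL
);
INSERT INTO user_data (user_id, screen_name, followers_count, insert_dt) VALUES
    (1, 'old_name', 10, '2021-01-01'),
    (1, 'new_name', 12, '2021-03-01'),
    (2, 'someone', 50, '2021-02-01');
""")

# Latest snapshot per user: the row with the highest user_data_id.
LATEST_SQL = """
SELECT ud.user_id, ud.screen_name, ud.followers_count
FROM user_data AS ud
JOIN (SELECT user_id, MAX(user_data_id) AS max_id
      FROM user_data GROUP BY user_id) AS latest
  ON ud.user_data_id = latest.max_id
ORDER BY ud.user_id
"""
rows = conn.execute(LATEST_SQL).fetchall()
print(rows)  # [(1, 'new_name', 12), (2, 'someone', 50)]
```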

user_data_id

An autoincrement ID for this fetch, not assigned by Twitter.

user_id

Twitter’s user ID.

url_id

The user’s profile URL (see the Url class).

api_response

The raw json text returned by Twitter.

screen_name

The user’s screen name (note that this can change over time).

create_dt

The date and time the account was created.

protected

Does this account have protected tweets?

verified

Is this user verified?

display_name

The user’s display name (not the screen name).

description

The user’s bio text.

location

The user-provided free-form location field.

friends_count

The number of friends the user has (i.e., the number of other users they follow).

followers_count

The number of followers the user has.

listed_count

The number of lists the user appears on.

classmethod from_tweepy(obj, session=None)[source]

Instantiate a class instance from a tweepy object.

This method constructs a class instance from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instance.

Return type:

Instance of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.Tag(**kwargs)[source]

Bases: TimestampsMixin, Base

A tag that can be given to one or more Twitter users.

This class represents a tag that can be applied to a set of Twitter users to help track them. Examples might include “treatment A”, “journalists”, “survey_respondents_2020”, etc.

tag_id

An autoincrement ID for the tag.

name

The name of the tag.

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.List(**kwargs)[source]

Bases: TimestampsMixin, FromTweepyInterface, Base

A Twitter list of users.

Twitter allows users to create lists of other users, and represents these lists as first-class entities in its API. This class represents one of these lists (though not the users in it; see the UserList class for that).

list_id

Twitter’s ID for the list.

user_id

The Twitter user ID of the user who owns the list.

slug

The short name of the list (as it appears in, e.g., URLs).

api_response

The raw json text returned by Twitter.

create_dt

The date and time the list was created.

full_name

The list’s “full name”, which is its owning user’s screen name (without the @ sign) and its slug, separated by a slash. For example, “cspan/members-of-congress”.

display_name

The list’s long-form display name, which may contain spaces and other characters which are not URL-safe.

uri

A resource URI for this list within the domain of Twitter entities.

description

A free-form bio/description text for the list.

mode

Either “public” or “private” depending on the visibility of the list.

member_count

The number of users on the list as of modified_dt.

subscriber_count

The number of users who subscribe to a public list (i.e., who have signed up to be able to view the combined timelines of the list members).

classmethod from_tweepy(obj, session=None)[source]

Instantiate a class instance from a tweepy object.

This method constructs a class instance from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instance.

Return type:

Instance of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.UserList(**kwargs)[source]

Bases: Base

User membership status for Twitter lists.

This class represents a user’s membership in a list, and records a user ID and a list ID. The underlying table is in type-2 SCD format, which means that the (user, list) edge is recorded together with the date it was observed (its validity start date). When it ceases to be observed (i.e., is missing on a fetch of the list members), the row is updated to add this validity end date. If the user subsequently returns to the list, a new row is added.
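The SCD format makes point-in-time questions straightforward: "who was on this list on a given day?" The following sqlite3 sketch uses a simplified user_list table with dates stored as ISO strings. It treats valid_end_dt as an exclusive bound, which is an assumption about how the dates should be interpreted. members_as_of is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_list (
    user_list_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    list_id INTEGER NOT NULL,
    valid_start_dt TEXT NOT NULL,
    valid_end_dt TEXT  -- NULL while the membership is current
);
INSERT INTO user_list (user_id, list_id, valid_start_dt, valid_end_dt) VALUES
    (1, 100, '2021-01-01', NULL),          -- member throughout
    (2, 100, '2021-01-01', '2021-03-01'),  -- left the list, then
    (2, 100, '2021-05-01', NULL),          -- rejoined (a new row)
    (3, 100, '2021-04-01', NULL);          -- joined later
""")

AS_OF_SQL = """
SELECT user_id FROM user_list
WHERE list_id = ?
  AND valid_start_dt <= ?
  AND (valid_end_dt IS NULL OR valid_end_dt > ?)
ORDER BY user_id
"""


def members_as_of(list_id: int, day: str) -> list:
    """User IDs on the list on the given ISO date (hypothetical helper)."""
    return [r[0] for r in conn.execute(AS_OF_SQL, (list_id, day, day))]


print(members_as_of(100, "2021-02-01"))  # [1, 2]
print(members_as_of(100, "2021-04-15"))  # [1, 3]
print(members_as_of(100, "2021-06-01"))  # [1, 2, 3]
```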

user_list_id

An autoincrement row ID, not assigned by Twitter.

user_id

The Twitter user ID of the user who is a list member.

list_id

The Twitter list ID.

valid_start_dt

The SCD validity start date.

valid_end_dt

The SCD validity end date (None / NULL if the row is current).

class twclient.models.UserTag(**kwargs)[source]

Bases: TimestampsMixin, Base

Users who have been assigned tags.

This table records the assignment of user tags (the Tag class) to users.

user_tag_id

An autoincrement row ID.

user_id

The Twitter user ID of the user assigned a tag.

tag_id

The ID of the tag the user is assigned.

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.Follow(**kwargs)[source]

Bases: Base

The Twitter follow graph.

This class records an edge in the Twitter follow graph. The underlying table stores these edges in a type-2 SCD format, which means that the (source_user, target_user) edge is recorded together with the date it was observed (its validity start date). When it ceases to be observed (i.e., is missing on a fetch where it would be present if still valid), the row is updated to add this validity end date. If the follow relationship is subsequently observed again, a new row is added.

follow_id

An autoincrement row ID, not assigned by Twitter.

source_user_id

The origin user for the directed follow edge (i.e., a user who follows the user given by target_user_id).

target_user_id

The destination user for the directed follow edge (i.e., a user who is followed by the user given by source_user_id).

valid_start_dt

The SCD validity start date.

valid_end_dt

The SCD validity end date (None / NULL if the row is current).
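Because each edge is directed, a user's followers and friends come from opposite columns. Here is a sqlite3 sketch over a simplified follow table, where current_followers and current_friends are hypothetical helpers. A NULL valid_end_dt marks edges that are still current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follow (
    follow_id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_user_id INTEGER NOT NULL,  -- the follower
    target_user_id INTEGER NOT NULL,  -- the user being followed
    valid_start_dt TEXT NOT NULL,
    valid_end_dt TEXT                 -- NULL while the edge is current
);
INSERT INTO follow (source_user_id, target_user_id, valid_start_dt, valid_end_dt) VALUES
    (1, 99, '2021-01-01', NULL),
    (2, 99, '2021-01-01', '2021-02-01'),
    (99, 3, '2021-01-01', NULL);
""")


def current_followers(user_id: int) -> list:
    """Users who currently follow user_id: user_id is the edge's target."""
    sql = ("SELECT source_user_id FROM follow "
           "WHERE target_user_id = ? AND valid_end_dt IS NULL ORDER BY source_user_id")
    return [r[0] for r in conn.execute(sql, (user_id,))]


def current_friends(user_id: int) -> list:
    """Users whom user_id currently follows: user_id is the edge's source."""
    sql = ("SELECT target_user_id FROM follow "
           "WHERE source_user_id = ? AND valid_end_dt IS NULL ORDER BY target_user_id")
    return [r[0] for r in conn.execute(sql, (user_id,))]


print(current_followers(99))  # [1]  (user 2's follow has ended)
print(current_friends(99))    # [3]
```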

class twclient.models.Tweet(**kwargs)[source]

Bases: TimestampsMixin, FromTweepyInterface, Base

A tweet.

This model represents a tweet posted on Twitter and ingested from its API. Several tweet properties are normalized from the raw json into database entities and/or foreign keys.

tweet_id

The ID assigned by Twitter to this tweet. These IDs are sequential integers, so that tweets with higher numbers were posted later.

user_id

The Twitter user ID of the user posting the tweet.

retweeted_status_id

If this tweet is a retweet of another tweet, the ID of the retweeted tweet. Because Twitter’s API returns the other tweet as well in such cases, this is stored in the database as a foreign key back to the tweet table, helping with graph analysis.

quoted_status_id

If this tweet is a quote tweet of another tweet, the ID of the quoted tweet. Because Twitter’s API returns the other tweet as well in such cases, this is stored in the database as a foreign key back to the tweet table, helping with graph analysis.

api_response

The raw json text returned by Twitter.

content

The body of the tweet. This field may differ from what is observed in the web interface on Twitter.com. In particular, retweets are prefixed with “RT @ORIGINAL_USER:”, where ORIGINAL_USER is the screen name of the user who posted the retweeted tweet, and links are rendered as t.co URLs rather than display URLs.

create_dt

The date and time the tweet was posted.

in_reply_to_status_id

If this tweet is a reply, the ID of the tweet it was in reply to. The Twitter API does not return an entity for a replied-to tweet, so this field is not a foreign key (and may reference tweets not in the tweet table).

in_reply_to_user_id

If this tweet is a reply, the ID of the user who posted the tweet it was in reply to. The Twitter API does not return an entity for a replied-to tweet, so this field is not a foreign key (and may refer to users not in the user table).

lang

The detected language of the tweet.

source

The platform from which the tweet was posted (iPhone, Android, desktop, etc).

truncated

Whether the tweet text was truncated at 140 characters. This should always be false; it is included for visibility.

retweet_count

The number of times this tweet was retweeted.

favorite_count

The number of times this tweet was favorited / liked.

classmethod from_tweepy(obj, session=None)[source]

Instantiate a class instance from a tweepy object.

This method constructs a class instance from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instance.

Return type:

Instance of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.Hashtag(**kwargs)[source]

Bases: TimestampsMixin, UniqueMixin, Base

A hashtag.

This class represents a hashtag used in a tweet. Hashtags are stored without their ‘#’ prefix.

hashtag_id

An autoincrement row ID, not assigned by Twitter.

name

The text of the hashtag.

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

unique_hash

A hash to implement a unique constraint without length limits.
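The following sketch shows one way a unique_hash column can serve this purpose. Rather than putting a unique index on an unbounded text column (index entries have size limits in many databases), one can index a fixed-length digest of it. SHA-256, the table layout and the get_or_create_hashtag helper are assumptions for illustration. twclient's UniqueMixin may compute and use the hash differently.

```python
import hashlib
import sqlite3


def unique_hash(name: str) -> str:
    """Fixed-length digest of an arbitrarily long value (SHA-256 assumed)."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE hashtag (
    hashtag_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    unique_hash TEXT NOT NULL UNIQUE  -- index covers 64 hex chars, not the full text
)""")


def get_or_create_hashtag(name: str) -> int:
    """Return the row ID for name, inserting a new row only if needed."""
    h = unique_hash(name)
    row = conn.execute("SELECT hashtag_id FROM hashtag WHERE unique_hash = ?",
                       (h,)).fetchone()
    if row is not None:
        return row[0]
    with conn:
        cur = conn.execute("INSERT INTO hashtag (name, unique_hash) VALUES (?, ?)",
                           (name, h))
    return cur.lastrowid


a = get_or_create_hashtag("python")
b = get_or_create_hashtag("python")
c = get_or_create_hashtag("x" * 10_000)  # long values are fine
print(a == b, a != c)  # True True
```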

class twclient.models.Symbol(**kwargs)[source]

Bases: TimestampsMixin, UniqueMixin, Base

A ticker symbol as detected by Twitter.

Twitter attempts to detect stock-ticker symbols when prefixed with a ‘$’ (a “cashtag”) and make them searchable on its service. The detection is crude and based on regular expressions, with no attempt to ensure the detected symbols are real ticker symbols. Such cashtags are represented by this class, stored without their leading ‘$’.

symbol_id

An autoincrement row ID, not assigned by Twitter.

name

The text of the symbol / cashtag.

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

unique_hash

A hash to implement a unique constraint without length limits.

class twclient.models.Url(**kwargs)[source]

Bases: TimestampsMixin, UniqueMixin, Base

A URL.

This class represents a unique URL that appeared somewhere on Twitter. URLs are represented as a separate entity rather than multiple text fields to make it easier to track them across the many places they may appear.

url_id

An autoincrement row ID, not assigned by Twitter.

url

The URL itself.

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

unique_hash

A hash to implement a unique constraint without length limits.

class twclient.models.MediaType(**kwargs)[source]

Bases: TimestampsMixin, UniqueMixin, FromTweepyInterface, Base

The type of a media object in a tweet.

Media objects in tweets can have several types: photo, video, and others. This table tracks the observed types of media.

media_type_id

An autoincrement row ID, not assigned by Twitter.

name

Twitter’s name for this type of media.

classmethod from_tweepy(obj, session=None)[source]

Instantiate a class instance from a tweepy object.

This method constructs a class instance from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instance.

Return type:

Instance of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

unique_hash

A hash to implement a unique constraint without length limits.

class twclient.models.Media(**kwargs)[source]

Bases: TimestampsMixin, FromTweepyInterface, Base

A media object on Twitter.

This class represents media objects like photos and videos which occur in tweets. The media’s type and primary URL are recorded here. Note, though, that for videos the “primary” URL is a thumbnail still; see the MediaVariant class for the video files actually available.

media_id

Twitter’s ID for this media.

media_type_id

The type of media.

media_url_id

The primary URL of the media. For videos, this URL is a thumbnail still and the video files themselves are recorded in the MediaVariant class.

aspect_ratio_width

For videos, the first number of the video’s aspect ratio (e.g., “16” for a “16:9” aspect ratio). None / NULL otherwise.

aspect_ratio_height

For videos, the second number of the video’s aspect ratio (e.g., “9” for a “16:9” aspect ratio). None / NULL otherwise.

duration

For videos, the duration of the video in seconds. None / NULL otherwise.

classmethod from_tweepy(obj, session=None)[source]

Instantiate a class instance from a tweepy object.

This method constructs a class instance from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instance.

Return type:

Instance of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.MediaVariant(**kwargs)[source]

Bases: TimestampsMixin, ListFromTweepyInterface, Base

Specific video files for a Twitter video, which may have more than one.

Twitter video media may have multiple specific video files associated with them, providing different bitrates, file formats or other properties. Each such video file is a MediaVariant.
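Since a video usually has several variants, an analysis often wants just one of them. best_mp4_variant is a hypothetical helper that picks one. It works over variant records represented as plain dicts whose keys mirror the columns listed below, with a url key in place of url_id. The URLs are placeholders. The playlist entry has no bitrate, as streaming-playlist variants typically lack one in Twitter's API payloads.

```python
from typing import List, Optional


def best_mp4_variant(variants: List[dict]) -> Optional[dict]:
    """Return the highest-bitrate MP4 variant, or None if there is none."""
    mp4s = [v for v in variants
            if v.get("content_type") == "video/mp4" and v.get("bitrate") is not None]
    return max(mp4s, key=lambda v: v["bitrate"], default=None)


variants = [
    {"content_type": "application/x-mpegURL", "bitrate": None,
     "url": "https://video.example/playlist.m3u8"},
    {"content_type": "video/mp4", "bitrate": 832000,
     "url": "https://video.example/480.mp4"},
    {"content_type": "video/mp4", "bitrate": 2176000,
     "url": "https://video.example/720.mp4"},
]
print(best_mp4_variant(variants)["url"])  # https://video.example/720.mp4
```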

media_id

The Twitter media entity this video file is associated with.

url_id

The URL of the video file.

bitrate

The bitrate of the video file.

content_type

The MIME type of the video.

classmethod list_from_tweepy(obj, session=None)[source]

Return a list of class instances from a tweepy object.

This method constructs multiple class instances from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique. Subclasses are expected to provide this logic, and the implementation here is an abstract stub that raises NotImplementedError.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instances.

Return type:

List of instances of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.UserMention(**kwargs)[source]

Bases: TimestampsMixin, ListFromTweepyInterface, Base

A mention of a user in a tweet.

This class represents a mention of a user in a tweet. There can be multiple mentions of the same user in the same tweet, so we also record the position in the tweet where the mention occurred.

tweet_id

Twitter’s ID for the tweet.

start_index

The index in the tweet content of the first character of this mention.

end_index

The index in the tweet content one past the last character of this mention (Twitter’s entity indices are end-exclusive).

mentioned_user_id

The Twitter user ID of the user who was mentioned.
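Here is a small sketch of recovering a mention's text from content. It assumes that start_index points at the '@', and that end_index is one past the mention's last character, as in Twitter's v1.1 entity format. Under that assumption a Python slice needs no adjustment. If the stored end index were inclusive instead, 1 would have to be added. Twitter counts these offsets in Unicode code points, which matches Python str indexing. mention_text is a hypothetical helper.

```python
def mention_text(content: str, start_index: int, end_index: int) -> str:
    """Extract the mention text, treating end_index as exclusive (assumed)."""
    return content[start_index:end_index]


content = "Thanks @cspan for the coverage"
start, end = 7, 13  # offsets as Twitter's v1.1 entities would report them
print(mention_text(content, start, end))  # @cspan
```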

classmethod list_from_tweepy(obj, session=None)[source]

Return a list of class instances from a tweepy object.

This method constructs multiple class instances from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique. Subclasses are expected to provide this logic, and the implementation here is an abstract stub that raises NotImplementedError.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instances.

Return type:

List of instances of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.HashtagMention(**kwargs)[source]

Bases: TimestampsMixin, ListFromTweepyInterface, Base

A mention of a hashtag in a tweet.

This class represents a mention of a hashtag in a tweet. There can be multiple mentions of the same hashtag in the same tweet, so we also record the position in the tweet where the mention occurred.

tweet_id

Twitter’s ID for the tweet.

start_index

The index in the tweet content of the first character of this mention.

end_index

The index in the tweet content one past the last character of this mention (Twitter’s entity indices are end-exclusive).

hashtag_id

The (non-Twitter) ID of the hashtag that was mentioned.

classmethod list_from_tweepy(obj, session=None)[source]

Return a list of class instances from a tweepy object.

This method constructs multiple class instances from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique. Subclasses are expected to provide this logic, and the implementation here is an abstract stub that raises NotImplementedError.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instances.

Return type:

List of instances of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.SymbolMention(**kwargs)[source]

Bases: TimestampsMixin, ListFromTweepyInterface, Base

A mention of a ticker symbol (“cashtag”) in a tweet.

This class represents a mention of a ticker symbol in a tweet. There can be multiple mentions of the same symbol in the same tweet, so we also record the position in the tweet where the mention occurred.

tweet_id

Twitter’s ID for the tweet.

start_index

The index in the tweet content of the first character of this mention.

end_index

The index in the tweet content one past the last character of this mention (Twitter’s entity indices are end-exclusive).

symbol_id

The (non-Twitter) ID of the symbol that was mentioned.

classmethod list_from_tweepy(obj, session=None)[source]

Return a list of class instances from a tweepy object.

This method constructs multiple class instances from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique. Subclasses are expected to provide this logic, and the implementation here is an abstract stub that raises NotImplementedError.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instances.

Return type:

List of instances of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.UrlMention(**kwargs)[source]

Bases: TimestampsMixin, ListFromTweepyInterface, Base

A mention of a URL in a tweet.

This class represents a mention of a URL in a tweet. There can be multiple mentions of the same URL in the same tweet, so we also record the position in the tweet where the mention occurred. Some attributes of the URL mention itself, in addition to the URL, are also tracked here.

tweet_id

Twitter’s ID for the tweet.

start_index

The index in the tweet content of the first character of this mention.

end_index

The index in the tweet content one past the last character of this mention (Twitter’s entity indices are end-exclusive).

url_id

The (non-Twitter) ID of the URL that was mentioned.

twitter_short_url

The t.co link that appears in the tweet body.

twitter_display_url

The display URL Twitter shows on its smartphone apps and desktop website.

expanded_short_url

If the user inputs an already shortened URL (e.g., bit.ly), Twitter resolves the URL further to a final page. In this case, we store the original short URL that the user entered here and use the resolved page as the main mentioned URL.

status

If a shortened URL input by a user was resolved further, the HTTP status code of the final GET request in the chain. None / NULL otherwise.

title

If a shortened URL input by a user was resolved further, the title of the fetched page. None / NULL otherwise.

description

If a shortened URL input by a user was resolved further, the value of the description meta tag for the fetched page. None / NULL otherwise.

classmethod list_from_tweepy(obj, session=None)[source]

Return a list of class instances from a tweepy object.

This method constructs multiple class instances from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique. Subclasses are expected to provide this logic, and the implementation here is an abstract stub that raises NotImplementedError.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instances.

Return type:

List of instances of FromTweepyInterface

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

class twclient.models.MediaMention(**kwargs)

Bases: TimestampsMixin, ListFromTweepyInterface, Base

A mention of a media object in a tweet.

This class represents a mention of a media object, like a photo or video, in a tweet. There can be multiple mentions of the same media in the same tweet, so we also record the position in the tweet where the mention occurred. Some attributes of the media mention itself, in addition to the media ID, are also tracked here.

tweet_id

Twitter’s ID for the tweet.

start_index

The index in the tweet content of the first character of this mention.

end_index

The index in the tweet content of the last character of this mention.

media_id

Twitter’s ID for the media that was mentioned.

twitter_short_url

The t.co link that appears in the tweet body.

twitter_display_url

The display URL Twitter shows on its smartphone apps and desktop website.

twitter_expanded_url

The URL for the usual Twitter web viewer for this media, which one would encounter after clicking on it from Twitter.com.

insert_dt

The load time of the row into the database.

modified_dt

The last time the row was modified. Note that this field is updated by application logic rather than a trigger.

classmethod list_from_tweepy(obj, session=None)

Return a list of class instances from a tweepy object.

This method constructs multiple class instances from a tweepy object. A sqlalchemy database session is optional in general but may be required by some subclasses which rely on UniqueMixin.as_unique. Subclasses are expected to provide this logic, and the implementation here is an abstract stub that raises NotImplementedError.

Parameters:
  • obj (instance of tweepy.Model) – The tweepy object to use as data source.

  • session (instance of sqlalchemy.orm.session.Session, or None) – A sqlalchemy database session.

Returns:

The constructed class instances.

Return type:

List of instances of FromTweepyInterface

twclient.target

Classes encapsulating the “targets” of certain jobs.

class twclient.target.Target(**kwargs)

Bases: ABC

Encapsulate the notion of a “target” for certain kinds of jobs.

Some of the operations defined by Job classes operate on users. These users can be specified in several ways (user IDs, screen names, tags stored in the database, Twitter lists), and the classes defined here provide a consistent interface for these various ways of specifying job targets. Each class takes a number of raw targets (user IDs, tags, etc.) and provides a resolve() method that calculates the corresponding models.User objects. After resolve() is called, a list of the User objects is available in the .users attribute.

Parameters:
  • targets (list of str or int) – The raw targets to be resolved to users. These will be deduplicated in a way that preserves order.

  • context (instance of job.Job, or None) – The Job instance’s database and API connections will be used as needed to resolve raw targets to users. If not passed on initialization, a context object must be passed to resolve().

targets

The list of raw targets passed in as the targets parameter, but deduplicated (without changing the relative order of any retained targets).

Type:

list of str or int
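One way to deduplicate while preserving order, assuming targets are hashable, is to rely on dict keys keeping insertion order. This is a sketch, not necessarily how twclient does it. Note that the string '12' and the integer 12 are treated as different targets here.

```python
def dedupe_preserving_order(targets):
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so the first occurrence of each target wins.
    return list(dict.fromkeys(targets))
```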

abstract resolve(context=None)

Resolve this Target object into users.

The resolve() method looks up the raw targets provided at self.targets and populates several attributes of this Target instance according to the resolve_mode set on the self.context object. The .good_targets, .bad_targets and .missing_targets attributes are populated to reflect dispositions of the raw targets, as discussed in their documentation, and the .users attribute contains all users which could be resolved from any of the raw targets. If no context parameter was passed to __init__, one must be given here. If one was passed to __init__, it is replaced with the value passed here so long as bool(context) == True.

Parameters:

context (job.Job object, or None) – The Job instance to use as context for resolving targets to users.

Return type:

None
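The following sketches the context rules described above using a hypothetical ContextHolder class. Here the context is just a dict mapping raw targets to users, whereas a real context is a Job instance. The ValueError raised when no context is available is an assumption, since the documentation does not specify the exception type. Accessing users before resolve() raises AttributeError, matching the behavior documented for the .users property.

```python
class ContextHolder:
    """Hypothetical sketch of the context rules for Target.resolve()."""

    def __init__(self, targets, context=None):
        self.targets = targets
        self.context = context

    def resolve(self, context=None):
        if context:  # only a truthy argument replaces the stored context
            self.context = context
        if not self.context:
            raise ValueError("a context must be given to __init__ or resolve()")
        # Toy lookup: the context here is just a dict of target -> user.
        self._users = [self.context[t] for t in self.targets if t in self.context]

    @property
    def users(self):
        # Raises AttributeError until resolve() has set self._users.
        return self._users
```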

abstract property allowed_resolve_modes

The resolve modes this Target implements.

The Job instance referred to by self.context has a resolve_mode attribute, specifying how it wants Targets to look up their users. The allowed_resolve_modes attribute declares what resolve_mode values this Target instance can handle; if a Job instance with an incompatible resolve_mode is given as context, an error will be raised. This attribute must be defined in subclasses, because different types of targets are compatible or not with different values of this parameter. Consequently the version on Target is abstract.
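The sketch below shows one way to declare allowed_resolve_modes as an abstract property and satisfy it in subclasses with a plain class attribute, which is how UserIdTarget and the other subclasses documented here appear to define it. The class names and the ValueError are hypothetical, and types.SimpleNamespace stands in for a Job instance.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class ModeCheckedTarget(ABC):
    """Hypothetical base class declaring an abstract allowed_resolve_modes."""

    @property
    @abstractmethod
    def allowed_resolve_modes(self):
        """The resolve modes a subclass can handle."""

    def check_mode(self, context):
        # Reject a context whose resolve_mode this target cannot handle.
        mode = context.resolve_mode
        if mode not in self.allowed_resolve_modes:
            raise ValueError(
                f"resolve_mode {mode!r} not in {self.allowed_resolve_modes}"
            )
        return mode


class TagLikeTarget(ModeCheckedTarget):
    # A plain class attribute overrides (and satisfies) the abstract property.
    allowed_resolve_modes = ("hydrate", "skip")


job = SimpleNamespace(resolve_mode="skip")  # stand-in for a Job instance
print(TagLikeTarget().check_mode(job))  # prints: skip
```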

property resolved

Has this Target been set up with a Job instance as context?

A Target instance can only resolve its users and make them available in the .users attribute after being given a Job instance as context. This attribute is True if the Target has a context and False otherwise.

property users

The users resolved from this Target’s raw targets.

The users attribute contains a list of the models.User instances resolved from the raw targets (user IDs, screen names, Twitter lists or tags) passed to this Target instance. If no raw targets could be resolved to a models.User instance, this attribute is an empty list. Raw targets may fail to resolve because they are not found in the database (when context.resolve_mode requires users to be loaded already), or because Twitter’s API returns no users or raises an error. Note also that accessing this attribute before calling resolve() raises AttributeError.

property bad_targets

Raw targets which were supposed to be looked up via the Twitter API and which, on doing so, were found not to exist.

Specifically, these bad targets are users which were not returned by users/lookup (indicating that they are suspended, nonexistent, or otherwise unavailable), or lists which cause lists/show to raise an error. A list which exists but has no members is not an error. Note that a nonexistent list does not establish whether its owning user exists. Accessing this attribute before calling .resolve() raises AttributeError.

property missing_targets

Raw targets which were supposed to be found in the database but were not there.

These missing targets may be users which are not present in the user table, or lists which are not found in the list table. Note that a list’s absence from the database does not establish whether its owning user is present. Accessing this attribute before calling .resolve() raises AttributeError.

property good_targets

Raw targets which were successfully resolved to users, either in the database or via the Twitter API.

These are targets which, depending on the context object’s setting of resolve_mode, may have been looked for in the database or via Twitter’s API, and were found without error. Note that .good_targets and .users are different: if one target is, for example, the Twitter list named “cspan/members-of-congress”, that value will appear in .good_targets and several hundred models.User objects for the Congressional Twitter accounts will appear in .users. Accessing this attribute before calling .resolve() will raise AttributeError.

property context

The Job instance to be used as context by resolve().

The resolve() method can only look up users with a Job instance as context, to support database lookups and Twitter API calls. The context can be provided on initialization or as an argument to resolve().

class twclient.target.UserIdTarget(**kwargs)

Bases: Target

A set of Twitter user IDs to resolve to users.

This class takes targets specified by Twitter’s numeric user IDs. These targets are resolved to models.User objects in one of three ways, determined by the value of context.resolve_mode. If the resolve mode is ‘fetch’, users are first looked up in the database, and any missing from the database are looked up via Twitter’s API. (No users will be in missing_targets in this case, only in good_targets or bad_targets.) If the mode is ‘hydrate’, all users are looked up via Twitter’s API. If the mode is ‘skip’, users not found in the database are not looked up via Twitter’s API and are left in missing_targets. Any other resolve mode set on the context object raises an error.
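To make the three modes concrete, the following is a simplified, hypothetical resolver for user IDs in which two dicts stand in for the database and for users/lookup. Unlike a real implementation, which would batch API calls and write hydrated users to the database session, it handles one ID at a time and stores nothing.

```python
def resolve_user_ids(user_ids, mode, db, api):
    """Hypothetical resolver returning (good, bad, missing, users).

    db maps user IDs to stored users; api maps user IDs to users that a
    users/lookup call would return.
    """
    if mode not in ("fetch", "hydrate", "skip"):
        raise ValueError(f"unsupported resolve_mode: {mode!r}")
    good, bad, missing, users = [], [], [], []
    for uid in user_ids:
        if mode in ("fetch", "skip") and uid in db:
            good.append(uid)
            users.append(db[uid])  # found in the database
        elif mode == "skip":
            missing.append(uid)  # not stored, and never sent to the API
        elif uid in api:
            good.append(uid)
            users.append(api[uid])  # 'hydrate', or a 'fetch' fallback
        else:
            bad.append(uid)  # users/lookup did not return this user
    return good, bad, missing, users
```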

allowed_resolve_modes = ('fetch', 'hydrate', 'skip')

resolve(context=None)

Resolve this Target object into users.

The resolve() method looks up the raw targets provided at self.targets and populates several attributes of this Target instance according to the resolve_mode set on the self.context object. The .good_targets, .bad_targets and .missing_targets attributes are populated to reflect dispositions of the raw targets, as discussed in their documentation, and the .users attribute contains all users which could be resolved from any of the raw targets. If no context parameter was passed to __init__, one must be given here. If one was passed to __init__, it is replaced with the value passed here so long as bool(context) == True.

Parameters:

context (job.Job object, or None) – The Job instance to use as context for resolving targets to users.

Return type:

None

class twclient.target.ScreenNameTarget(**kwargs)

Bases: Target

A set of screen names to resolve to users.

This class takes targets specified by Twitter screen names for users. These targets are resolved to models.User objects in one of three ways, determined by the value of context.resolve_mode. If the resolve mode is ‘fetch’, users are first looked up in the database, and any missing from the database are looked up via Twitter’s API. (No users will be in missing_targets in this case, only in good_targets or bad_targets.) If the mode is ‘hydrate’, all users are looked up via Twitter’s API. If the mode is ‘skip’, users not found in the database are not looked up via Twitter’s API and are left in missing_targets. Any other resolve mode set on the context object raises an error.

allowed_resolve_modes = ('fetch', 'hydrate', 'skip')

resolve(context=None)

Resolve this Target object into users.

The resolve() method looks up the raw targets provided at self.targets and populates several attributes of this Target instance according to the resolve_mode set on the self.context object. The .good_targets, .bad_targets and .missing_targets attributes are populated to reflect dispositions of the raw targets, as discussed in their documentation, and the .users attribute contains all users which could be resolved from any of the raw targets. If no context parameter was passed to __init__, one must be given here. If one was passed to __init__, it is replaced with the value passed here so long as bool(context) == True.

Parameters:

context (job.Job object, or None) – The Job instance to use as context for resolving targets to users.

Return type:

None

class twclient.target.SelectTagTarget(**kwargs)

Bases: Target

A set of user tags to resolve to users.

This class takes targets specified by user tags, as recorded in the user_tag table in the database. These tags are first looked up in the database and resolved to a list of user IDs. Any tags which do not exist in the database are added to the missing_targets attribute. The resulting list of user IDs is then resolved to models.User objects in one of two ways, determined by the value of context.resolve_mode. If the mode is ‘hydrate’, all users will be looked up via Twitter’s API. If the mode is ‘skip’, users will be returned with the data stored for them in the database. Any other resolve mode set on the context object will raise an error.
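The following is a similarly simplified, hypothetical sketch of tag resolution. Dicts stand in for the user_tag table, the stored users, and users/lookup. How users that Twitter fails to return in ‘hydrate’ mode are recorded is not specified here, so the sketch simply drops them.

```python
def resolve_tags(tags, mode, user_tags, db_users, api):
    """Hypothetical resolver returning (good, missing, users).

    user_tags maps tag -> list of user IDs; db_users maps user IDs to
    stored users; api maps user IDs to users that users/lookup returns.
    """
    if mode not in ("hydrate", "skip"):
        raise ValueError(f"unsupported resolve_mode: {mode!r}")
    good, missing, user_ids = [], [], []
    for tag in tags:
        if tag not in user_tags:
            missing.append(tag)  # tag absent from the user_tag table
            continue
        good.append(tag)
        for uid in user_tags[tag]:
            if uid not in user_ids:  # a user may carry several tags
                user_ids.append(uid)
    source = api if mode == "hydrate" else db_users
    users = [source[uid] for uid in user_ids if uid in source]
    return good, missing, users
```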

allowed_resolve_modes = ('hydrate', 'skip')

resolve(context=None)

Resolve this Target object into users.

The resolve() method looks up the raw targets provided at self.targets and populates several attributes of this Target instance according to the resolve_mode set on the self.context object. The .good_targets, .bad_targets and .missing_targets attributes are populated to reflect dispositions of the raw targets, as discussed in their documentation, and the .users attribute contains all users which could be resolved from any of the raw targets. If no context parameter was passed to __init__, one must be given here. If one was passed to __init__, it is replaced with the value passed here so long as bool(context) == True.

Parameters:

context (job.Job object, or None) – The Job instance to use as context for resolving targets to users.

Return type:

None

class twclient.target.TwitterListTarget(**kwargs)

Bases: Target

A set of Twitter lists to resolve to users.

This class takes Twitter lists as targets. These lists can be specified by their “full names” (that is, the owner_screen_name/slug format, like “cspan/members-of-congress”) or by their numeric IDs. The list targets are resolved to models.User objects in one of three ways, determined by the value of context.resolve_mode. If the resolve mode is ‘fetch’, the lists are first looked up in the database, and any missing lists are looked up via Twitter’s API. (No lists will be in missing_targets in this case, only in good_targets or bad_targets.) If the mode is ‘hydrate’, all lists are looked up via Twitter’s API. If the mode is ‘skip’, lists not found in the database are not looked up via Twitter’s API and are left in missing_targets. Any other resolve mode set on the context object raises an error. Note that besides the member users, the list itself and its memberships (the associations between the list and its users) are added to the appropriate tables in the context object’s database session.
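Here is a hypothetical parser showing how a raw list target might be split into keyword arguments matching the get_list and list_members parameters documented below: either list_id, or owner_screen_name plus slug. The function name and the ValueError are assumptions.

```python
def parse_list_target(target):
    """Hypothetical: turn a raw list target into list lookup arguments."""
    text = str(target).strip()
    if text.isdigit():
        return {"list_id": int(text)}  # a numeric list ID
    owner, sep, slug = text.partition("/")
    if not sep or not owner or not slug or "/" in slug:
        raise ValueError(f"not a list ID or owner/slug full name: {target!r}")
    return {"owner_screen_name": owner, "slug": slug}
```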

allowed_resolve_modes = ('fetch', 'hydrate', 'skip')

resolve(context=None)

Resolve this Target object into users.

The resolve() method looks up the raw targets provided at self.targets and populates several attributes of this Target instance according to the resolve_mode set on the self.context object. The .good_targets, .bad_targets and .missing_targets attributes are populated to reflect dispositions of the raw targets, as discussed in their documentation, and the .users attribute contains all users which could be resolved from any of the raw targets. If no context parameter was passed to __init__, one must be given here. If one was passed to __init__, it is replaced with the value passed here so long as bool(context) == True.

Parameters:

context (job.Job object, or None) – The Job instance to use as context for resolving targets to users.

Return type:

None

twclient.twitter_api

A Twitter API wrapper for job classes.

class twclient.twitter_api.TwitterApi(**kwargs)

Bases: object

Wrap calls to the Twitter API with cursoring, common parameters, etc.

This class provides a wrapper around calls to the Twitter API (ultimately through tweepy) to handle cursoring of results, provide common parameters that all twclient requests will want, and do other housekeeping. In particular, it transparently multiplexes access to the passed API credentials (via authpool.AuthPoolAPI), with the effect of combining their rate limits. Not all Twitter API methods are supported.

Parameters:

auths (list of tweepy.AuthHandler) – The Twitter API credentials to use.

auths

The parameter passed to __init__.

Type:

list of tweepy.AuthHandler

pool

The AuthPoolAPI constructed with the API credentials.

Type:

instance of authpool.AuthPoolAPI

make_api_call(method, cursor=False, max_items=None, **kwargs)

Make a call to the Twitter API.

This method wraps calls to the Twitter API, handling cursoring of results, and returns a generator of all the results. When there is a very large number of results (as when fetching follow-graph data), this avoids reading them all into memory at once. The calls are made through authpool.AuthPoolAPI, transparently handling the use of multiple sets of API credentials.

Parameters:
  • method (str) – The name of the tweepy.API method to call.

  • cursor (bool) – Should the results be cursored? Generally should be True if the method may return more than one object.

  • max_items (int, or None) – The maximum number of items to yield; if the method returns more than max_items items, the surplus is discarded. If None, all items are yielded. Must be None if cursor == False.

  • **kwargs – Other arguments to pass through to the tweepy method, or to tweepy.Cursor if cursor == True. (In the latter case, tweepy.Cursor will in turn pass arguments it doesn’t consume through to the method ultimately called.)

Yields:

instances of tweepy.Model, int, or other objects – The results returned by the Twitter API call. The type of object yielded depends on the value of the method argument.
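The cursoring and max_items behavior can be sketched without tweepy. This is not twclient's code, which relies on tweepy.Cursor. Here call is a hypothetical callable. The sketch only models cursor-token pagination of the kind followers/ids and friends/ids use, where Twitter v1.1 starts at cursor -1 and signals the last page with a next cursor of 0.

```python
from itertools import islice


def iterate_results(call, cursor=False, max_items=None):
    """Sketch of cursored iteration with an optional cap on item count.

    call is a hypothetical callable. With cursor=True, call(cursor=token)
    must return (items, next_token), where a next_token of 0 means there
    are no more pages (Twitter's v1.1 convention; the first token is -1).
    """
    if max_items is not None and not cursor:
        raise ValueError("max_items must be None if cursor is False")
    if not cursor:
        yield call()  # a single un-cursored call, yielded as-is
        return

    def all_items():
        token = -1
        while token != 0:
            items, token = call(cursor=token)
            yield from items  # the next page is only requested when needed

    # islice(..., None) imposes no limit; otherwise stop after max_items items.
    yield from islice(all_items(), max_items)
```

Because pages are fetched lazily, stopping at max_items also avoids requesting pages that would only be discarded.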

rate_limit_status(consumer_key=None)

Call Twitter’s application/rate_limit_status method.

This method wraps one or more calls to Twitter’s API method application/rate_limit_status and returns information on rate limits. The default is to request rate limit info for all credentials given in the self.auths attribute. See also the method of the same name on authpool.AuthPoolAPI, which this method calls.

Parameters:

consumer_key (str or None) – The consumer key for a particular set of API credentials whose rate limit should be checked. If None, check all credentials in self.auths. If not None, the value must match one of the consumer keys in self.auths.

Returns:

A dictionary whose keys are the OAuth consumer keys and whose values are the Twitter API’s JSON responses describing rate limit information.

Return type:

dict
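Given the dictionary shape described above, a hypothetical helper like the one below could report how many calls each set of credentials, and the pool as a whole, has left for one endpoint. It assumes the v1.1 rate_limit_status response layout, in which limits are nested under resources, then the resource family (e.g. users), then the endpoint path (e.g. /users/lookup).

```python
def remaining_calls(status_by_key, resource, endpoint):
    """Remaining calls for one endpoint, per consumer key and pooled in total.

    status_by_key maps consumer keys to application/rate_limit_status
    responses, which nest limits as resources -> resource -> endpoint.
    """
    per_key = {
        key: status["resources"][resource][endpoint]["remaining"]
        for key, status in status_by_key.items()
    }
    return per_key, sum(per_key.values())
```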

lookup_users(user_ids=None, screen_names=None)

Call Twitter’s users/lookup API method.

This method wraps a call to Twitter’s users/lookup method (via the make_api_call method, and ultimately via tweepy.API’s lookup_users method), which “hydrates” the requested users. For greater consistency, error handling differs from both the Twitter method and tweepy. Requested users which do not exist, are suspended, or are otherwise unavailable are silently omitted from the results (as in both tweepy and the underlying Twitter API). If none of the requested users exist, no error is raised (unlike tweepy and the Twitter API); the result is simply empty. At least one of user_ids and screen_names must be specified. The most recent tweet for each user is returned in extended mode (i.e., not truncated to 140 characters), and entities are requested.

Parameters:
  • user_ids (list of int, or None) – Twitter user IDs to hydrate. May be passed simultaneously with screen_names.

  • screen_names (list of str, or None) – Twitter screen names to hydrate. May be passed simultaneously with user_ids.

Yields:

list of tweepy.User – The hydrated user objects.
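Twitter’s v1.1 users/lookup endpoint accepts at most 100 users per request, so hydrating a long list of IDs requires batching. The documentation above does not say whether or where twclient batches. The sketch below, with a hypothetical lookup callable standing in for the API call, only illustrates chunking and how unavailable users simply drop out of the results.

```python
def batches(items, size=100):
    """Split user IDs or screen names into users/lookup-sized chunks."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_all(user_ids, lookup, size=100):
    """Hydrate users chunk by chunk; lookup is a hypothetical API stand-in."""
    for chunk in batches(user_ids, size):
        # Unavailable users are simply absent from each chunk's results.
        yield from lookup(chunk)
```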

get_list(list_id=None, slug=None, owner_screen_name=None, owner_id=None)

Call Twitter’s lists/show API method.

This method wraps a call to Twitter’s lists/show method (via the make_api_call method, and ultimately via tweepy.API’s get_list method). The target list must be specified, as in the list_members method, by exactly one of list_id or slug as well as exactly one of owner_screen_name or owner_id.

Parameters:
  • list_id (int, or None) – Twitter’s integer ID for the list.

  • slug (str, or None) – The slug of the list (not its display name).

  • owner_screen_name (str, or None) – The screen name of the user who owns the list (without the @ sign).

  • owner_id (int, or None) – Twitter’s integer user ID for the user who owns the list.

Returns:

The hydrated list object.

Return type:

tweepy.List object

list_members(list_id=None, slug=None, owner_screen_name=None, owner_id=None)

Call Twitter’s lists/members API method.

This method wraps a call to Twitter’s lists/members API method (via the make_api_call method and ultimately via tweepy.API’s list_members method). The target list must be specified, as in the get_list method, by exactly one of list_id or slug as well as exactly one of owner_screen_name or owner_id.

Parameters:
  • list_id (int, or None) – Twitter’s integer ID for the list.

  • slug (str, or None) – The slug of the list (not its display name).

  • owner_screen_name (str, or None) – The screen name of the user who owns the list (without the @ sign).

  • owner_id (int, or None) – Twitter’s integer user ID for the user who owns the list.

Yields:

tweepy.User objects – Hydrated user objects for the members of the list.

user_timeline(user_id=None, screen_name=None, **kwargs)

Call Twitter’s statuses/user_timeline API method.

This method wraps a call to Twitter’s statuses/user_timeline API method (via the make_api_call method and ultimately via tweepy.API’s user_timeline method). Exactly one of user_id and screen_name must be specified. Tweets are requested in extended mode (i.e., not truncated to 140 characters) and both retweets and replies are included. Note that because of a limitation in the underlying Twitter API method, if the user has posted and not deleted more than approximately 3200 tweets, only the most recent approximately 3200 will be retrieved.

Parameters:
  • user_id (int, or None) – Twitter’s integer user ID for the user whose tweets are to be retrieved.

  • screen_name (str, or None) – The screen name of the user whose tweets are to be retrieved.

  • **kwargs – Further arguments to pass through to tweepy.API.user_timeline, possibly via tweepy.Cursor.

Yields:

tweepy.Status objects – The user’s tweets.
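The exactly-one-of rule for user_id and screen_name can be checked as in this hypothetical helper, which also builds the keyword argument to pass on. The function name and the ValueError are assumptions.

```python
def user_kwargs(user_id=None, screen_name=None):
    """Hypothetical check: exactly one of user_id and screen_name is given."""
    if (user_id is None) == (screen_name is None):
        raise ValueError("exactly one of user_id and screen_name must be specified")
    if user_id is not None:
        return {"user_id": user_id}
    return {"screen_name": screen_name}
```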

followers_ids(user_id=None, screen_name=None)

Call Twitter’s followers/ids API method.

This method wraps a call to Twitter’s followers/ids API method (via the make_api_call method and ultimately via tweepy.API’s get_follower_ids method, called followers_ids in tweepy versions before 4.0.0). Exactly one of user_id and screen_name must be specified.

Parameters:
  • user_id (int, or None) – Twitter’s integer user ID for the user whose followers’ IDs are to be retrieved.

  • screen_name (str, or None) – The screen name of the user whose followers’ IDs are to be retrieved.

Yields:

instances of int – The Twitter user_ids of the requested user’s followers.

friends_ids(user_id=None, screen_name=None)

Call Twitter’s friends/ids API method.

This method wraps a call to Twitter’s friends/ids API method (via the make_api_call method and ultimately via tweepy.API’s get_friend_ids method, called friends_ids in tweepy versions before 4.0.0). Exactly one of user_id and screen_name must be specified. Note that “friends” is Twitter’s term for the opposite of followers: user A’s friends are the users that A follows.

Parameters:
  • user_id (int, or None) – Twitter’s integer user ID for the user whose friends’ IDs are to be retrieved.

  • screen_name (str, or None) – The screen name of the user whose friends’ IDs are to be retrieved.

Yields:

instances of int – The Twitter user_ids of the requested user’s friends.
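Because friends and followers point in opposite directions, combining the two ID lists into a follow graph means orienting the edges differently. The following hypothetical helper stores each edge as a (follower, followed) pair.

```python
def follow_edges(user_id, friend_ids=(), follower_ids=()):
    """Hypothetical: combine ID lists into a set of (follower, followed) edges.

    friend_ids are users that user_id follows (friends/ids);
    follower_ids are users who follow user_id (followers/ids).
    """
    edges = {(user_id, friend) for friend in friend_ids}
    edges.update((follower, user_id) for follower in follower_ids)
    return edges
```

A pair present in both orientations, such as (1, 3) and (3, 1), indicates a mutual follow.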