Input formatting

generate_utils.combined_edges(x, y)[source]

Given the edges from retweets and the edges from mentions, the function combines them into a single list

Parameters
  • x – Edges from retweets

  • y – Edges from mentions

Returns

List of lists with the edges combined
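As an illustration only, combining the two edge lists could look like the sketch below. It assumes each edge is a two-element list `[source, target]` and that combining means plain concatenation. The helper name `combine_edge_lists` is hypothetical and not part of generate_utils; the actual function may also deduplicate or reorder edges.

```python
def combine_edge_lists(retweet_edges, mention_edges):
    """Return a new list with the retweet edges followed by the mention edges."""
    # Copy every edge so that later changes to the result do not touch the inputs.
    return [list(edge) for edge in retweet_edges] + [list(edge) for edge in mention_edges]


retweet_edges = [["alice", "bob"], ["carol", "bob"]]
mention_edges = [["bob", "dave"]]
combined = combine_edge_lists(retweet_edges, mention_edges)
print(combined)
```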

generate_utils.direct_subgraphs(subgraphs)[source]

Given a list of undirected networkx subgraphs, the function returns all of them as directed graphs

Parameters

subgraphs – List of undirected networkx subgraphs

Returns

List of directed subgraphs as networkx objects

generate_utils.filter_by_interest(df, interest)[source]

Given an unfiltered DataFrame, the function returns the DataFrame filtered by the interest column

Parameters
  • df – DataFrame with all the tweets

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

DataFrame containing the tweets filtered by the selected interest

generate_utils.filter_by_subtopic(df, keywords2, stopwords2)[source]

Given a previously filtered DataFrame, the function returns the DataFrame filtered according to the new subtopic of interest

Parameters
  • df – Previously filtered DataFrame

  • keywords2 – List of words acting as key to filter the dataframe

  • stopwords2 – List of words destined to filter out the tweets that contain them

Returns

DataFrame with the tweets containing the keywords

generate_utils.filter_by_topic(df, keywords, stopwords)[source]

Given a DataFrame, the function returns the DataFrame filtered according to the given keywords and stopwords

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the dataframe

  • stopwords – List of words destined to filter out the tweets containing them

Returns

DataFrame with the tweets containing the keywords
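The keyword and stopword logic can be sketched on plain strings. This is not the library's implementation: the real function works on a DataFrame, while the hypothetical helper `filter_texts_by_topic` below works on a list of tweet texts. It assumes case-insensitive substring matching, that one matching keyword is enough to keep a tweet, and that one matching stopword is enough to drop it.

```python
def filter_texts_by_topic(texts, keywords, stopwords):
    """Keep the texts that contain at least one keyword and none of the stopwords."""
    kept = []
    for text in texts:
        lowered = text.lower()
        has_keyword = any(word.lower() in lowered for word in keywords)
        has_stopword = any(word.lower() in lowered for word in stopwords)
        if has_keyword and not has_stopword:
            kept.append(text)
    return kept


tweets = ["Citizen science rocks", "Science fiction night", "Gardening tips"]
filtered = filter_texts_by_topic(tweets, keywords=["science"], stopwords=["fiction"])
print(filtered)
```

With these assumptions only the first tweet is kept: the second contains a stopword and the third contains no keyword.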

generate_utils.getDays(df)[source]

generate_utils.get_all(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns every tweet together with the user who wrote or retweeted it, as a nested list

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

Nested lists containing tweet and user

generate_utils.get_cites(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing tweets, the function returns the tweets of the citation type, excluding retweets. The function also applies the filtering process

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

Nested lists containing each regular tweet (not a retweet) and the user who wrote it

generate_utils.get_degrees(G)[source]

Given a Networkx directed graph, the function returns a CSV file containing the centrality measures of the graph (Indegree, Outdegree, Betweenness and Eigenvector) sorted by indegree

Parameters

G – Networkx directed graph

Returns

DataFrame with the users, the centrality measures and the rank based on those measures, sorted by indegree
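To show what sorting by indegree means, the sketch below computes only two of the four measures (indegree and outdegree) from a plain edge list, using the standard library. The helper `degree_table` is hypothetical. It is not how get_degrees is implemented, it does not cover betweenness or eigenvector centrality, and it counts repeated edges separately.

```python
from collections import Counter


def degree_table(edges):
    """Return (user, indegree, outdegree) rows sorted by indegree, highest first."""
    indegree = Counter()
    outdegree = Counter()
    for source, target in edges:
        outdegree[source] += 1  # edge leaves the source user
        indegree[target] += 1   # edge arrives at the target user
    users = set(indegree) | set(outdegree)
    rows = [(user, indegree[user], outdegree[user]) for user in users]
    # Sort by descending indegree; break ties alphabetically for a stable result.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


edges = [["alice", "bob"], ["carol", "bob"], ["bob", "dave"]]
table = degree_table(edges)
for row in table:
    print(row)
```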

generate_utils.get_edges(values)[source]

Given a list of lists containing tweets or retweets and users, the function returns the edges needed to create a network

Parameters

values – List of lists with the tweet and user

Returns

List of lists containing the user and the account mentioned (@) inside the tweet
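A minimal sketch of extracting such edges with a regular expression follows. It assumes each input item is ordered as `[user, text]` and that a mention is an `@` followed by word characters; both are assumptions, since the ordering and the exact pattern used by get_edges are not stated here. The name `mention_edges` is hypothetical.

```python
import re

# Assumed mention pattern: "@" followed by letters, digits or underscores.
MENTION_PATTERN = re.compile(r"@(\w+)")


def mention_edges(values):
    """Return one [user, mentioned_account] edge per mention found in each text."""
    edges = []
    for user, text in values:
        for mentioned in MENTION_PATTERN.findall(text):
            edges.append([user, mentioned])
    return edges


values = [
    ["alice", "RT @bob: great talk by @carol"],
    ["dave", "no mentions here"],
]
edges = mention_edges(values)
print(edges)
```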

generate_utils.get_edgesHashRT(values)[source]

Given a list containing retweets, the function finds all the hashtags inside the text

Parameters

values – list with the retweets

Returns

list with all the hashtags in these retweets

generate_utils.get_edgesHashRT2(values)[source]

Given a list of lists with users and retweets, the function returns the users and the hashtags in their retweets

Parameters

values – List of lists with user and retweet

Returns

List of lists, where each list contains user and hashtags

generate_utils.get_edgesMain(values)[source]

Given a list of tweets, the function returns the hashtags inside those tweets.

Parameters

values – List of tweets

Returns

List with the hashtags inside the tweets
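Hashtag extraction of this kind can be sketched as below. The pattern and the decision to drop the leading `#` are assumptions; whether get_edgesMain keeps the symbol is not stated. The name `extract_hashtags` is hypothetical.

```python
import re

# Assumed hashtag pattern: "#" followed by letters, digits or underscores.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(tweets):
    """Return every hashtag found in the tweets, in order, repetitions included."""
    hashtags = []
    for text in tweets:
        hashtags.extend(HASHTAG_PATTERN.findall(text))
    return hashtags


tweets = ["Join our #CitSci walk #birds", "No tags", "More #birds today"]
found = extract_hashtags(tweets)
print(found)
```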

generate_utils.get_edgesmain2(values)[source]

Given a list of edges, the function returns the hashtags inside the tweet and relates them to the user

Parameters

values – List of lists containing the edges (user, tweet)

Returns

List of lists, each storing a user and a hashtag used by that user

generate_utils.get_hashtagsRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a list with the texts that are retweets, so that the hashtags (#) inside them can be found

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

List with the retweets

generate_utils.get_hashtagsRT2(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a list of lists containing the retweets and the users who retweeted them, in order to find the hashtags inside those retweets

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

List of lists, where each list contains user and retweet

generate_utils.get_hashtagsmain(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a list containing all the mentions from the DataFrame

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

List with all the tweets which are mentions

generate_utils.get_hashtagsmain2(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame with all the tweets, the function returns a list of lists in which each list contains the user and the written tweet

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

List of lists, each list contains user, written tweet

generate_utils.get_retweets(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns all the tweets that are retweets (RT:@). It also applies the filtering process

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

Nested lists containing the retweet and the user who retweeted it

generate_utils.get_subgraphs(graph)[source]

Given a networkx Graph, the function returns its subgraphs stored in a list

Parameters

graph – Networkx undirected graph

Returns

list of subgraphs as networkx objects
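One common way to obtain such a list is to split the graph into its connected components. The sketch below assumes that "subgraphs" means connected components, which is not confirmed here, and uses the hypothetical helper name `connected_subgraphs`. It relies only on `networkx.connected_components` and `Graph.subgraph`.

```python
import networkx as nx


def connected_subgraphs(graph):
    """Return one independent subgraph per connected component of an undirected graph."""
    # .copy() detaches each subgraph view so it can be modified on its own.
    return [graph.subgraph(nodes).copy() for nodes in nx.connected_components(graph)]


graph = nx.Graph()
graph.add_edges_from([("alice", "bob"), ("bob", "carol"), ("dave", "erin")])
parts = connected_subgraphs(graph)
sizes = sorted(part.number_of_nodes() for part in parts)
print(sizes)
```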

generate_utils.get_twomodeHashMain(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, filter_hashtags=None)[source]

Given a DataFrame with all the tweets, the function returns a networkx bipartite graph with the users and hashtags (outside retweets) as nodes

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

  • filter_hashtags – Boolean, to remove the predefined citizen science most common hashtags

Returns

Networkx bipartite graph

generate_utils.get_twomodeHashRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, filter_hashtags=None)[source]

Given a DataFrame with all the tweets, the function returns a networkx bipartite graph with the users and hashtags (inside retweets) as nodes

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

  • filter_hashtags – Boolean, to remove the predefined citizen science most common hashtags

Returns

Networkx bipartite graph

generate_utils.get_twomodeRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a bipartite graph with the users and the retweets as nodes. The retweets are displayed as weighted nodes

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

Networkx bipartite graph

generate_utils.make_weightedDiGraph(ejes)[source]

generate_utils.most_common(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a dictionary with the most used words and the number of appearances, a list of these words and a list with the number of times these words appear

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

Tuple with a dictionary of words and their number of appearances, a list with the words and a list with the number of times these words appear
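The three return values can be illustrated with `collections.Counter`. This sketch works on a list of texts rather than a DataFrame, tokenises with a simple assumed pattern, and takes a hypothetical `top_n` argument that the documented function does not have. The name `most_common_words` is also hypothetical.

```python
import re
from collections import Counter


def most_common_words(texts, top_n=10):
    """Return (word -> count dict, list of words, list of counts) for the top words."""
    words = []
    for text in texts:
        # Assumed tokenisation: lowercase runs of letters and apostrophes.
        words.extend(re.findall(r"[a-z']+", text.lower()))
    ranked = Counter(words).most_common(top_n)
    word_counts = dict(ranked)
    top_words = [word for word, _ in ranked]
    top_counts = [count for _, count in ranked]
    return word_counts, top_words, top_counts


texts = ["Birds sing", "birds fly", "Bees fly", "birds"]
word_counts, top_words, top_counts = most_common_words(texts, top_n=2)
print(word_counts, top_words, top_counts)
```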

generate_utils.most_commonwc(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a wordcloud with the most used words in these tweets

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

Wordcloud with the most used words displayed in it

generate_utils.plotbarchart(numberbars, x, y, title=None, xlabel=None, ylabel=None)[source]

Given the number of elements to plot and the elements for the x and y axes, the function returns a bar chart

Parameters
  • numberbars – Number of elements to plot in the chart

  • x – Elements for x axis

  • y – Elements for y axis, number of appearances of the x elements

  • title – Title for the figure, defaults to None

  • xlabel – Label for the x axis, defaults to None

  • ylabel – Label for the y axis, defaults to None

generate_utils.prepare_hashtags(hashtags, stopwords=None)[source]

Given a list of hashtags, the function returns the number of appearances of each hashtag and a list of unique hashtags

Parameters
  • hashtags – list of hashtags

  • stopwords – Word or list of words destined to be filtered out from the list of hashtags

Returns

Ordered list with the number of appearances of each hashtag and a list of unique hashtags
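Counting and ordering of this kind can be sketched as follows. The hypothetical helper `count_hashtags` assumes hashtags are compared case-insensitively, that `stopwords` may be a single word or a list of words as described above, and that "ordered" means most frequent first; none of this is confirmed for prepare_hashtags itself.

```python
from collections import Counter


def count_hashtags(hashtags, stopwords=None):
    """Return (appearance counts, unique hashtags), both ordered by frequency."""
    if stopwords is None:
        excluded = set()
    elif isinstance(stopwords, str):
        excluded = {stopwords.lower()}  # a single word was passed
    else:
        excluded = {word.lower() for word in stopwords}
    counts = Counter(
        tag.lower() for tag in hashtags if tag.lower() not in excluded
    )
    ranked = counts.most_common()  # most frequent first
    appearances = [count for _, count in ranked]
    unique = [tag for tag, _ in ranked]
    return appearances, unique


appearances, unique = count_hashtags(
    ["birds", "CitSci", "Birds", "bees"], stopwords="bees"
)
print(appearances, unique)
```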

generate_utils.prepare_hashtagsmain(list, stopwords=None)[source]

Given a list of hashtags, the function returns the number of appearances of each hashtag and a unique list of hashtags.

Parameters
  • list – List of hashtags

  • stopwords – List of words destined to filter out the desired hashtags from the list

Returns

Ordered list with the number of appearances of each hashtag and a list of unique hashtags

generate_utils.scatterplot(x, y)[source]

Given the elements for the x axis and their number of appearances for the y axis, the function returns a scatterplot

Parameters
  • x – Elements for the x axis

  • y – Elements for the y axis, number of appearances of the x elements

generate_utils.sentiment_analyser(df_entry, keywords=None, stopwords=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a CSV containing the user and the VADER sentiment score for each tweet

Parameters
  • df_entry – A DataFrame containing all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

CSV with the columns User, Text and Sentiment

generate_utils.transform_format(val)[source]

generate_utils.wordcloudRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a wordcloud with the hashtags (inside retweets) displayed by frequency

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

Wordcloud with the hashtags by frequency

Given a DataFrame containing all the tweets, the function returns a wordcloud with the hashtags (inside retweets) displayed by frequency

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

  • image – Image file to plot the wordcloud inside

Returns

Wordcloud inside desired image with the hashtags by frequency

Given a DataFrame containing all the tweets, the function returns a wordcloud with the hashtags (outside retweets) displayed by frequency

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

  • image – Image file to plot the wordcloud inside

Returns

Wordcloud inside desired image with the hashtags by frequency

generate_utils.wordcloudmain(df, keywords=None, stopwords=None, interest=None)[source]

Given a DataFrame containing all the tweets, the function returns a wordcloud with the hashtags (outside retweets) displayed by frequency

Parameters
  • df – DataFrame with all the tweets

  • keywords – List of words acting as key to filter the DataFrame

  • stopwords – List of words destined to filter out the tweets that contain them

  • keywords2 – List of words acting as key to filter the DataFrame according to a subtopic

  • stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic

  • interest – Active interest from the different categories available from the Lynguo tool

Returns

Wordcloud with the hashtags by frequency