Input formatting

generate_utils.combined_edges(x, y)

Given the edges from retweets and the edges from mentions, the function combines them into a single list.

Parameters

x – Edges from retweets
y – Edges from mentions

Returns

List of lists with the edges combined
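The description does not say how the two edge lists are merged. The following is a minimal sketch that assumes plain concatenation with duplicates kept, so repeated edges can later act as weights. `combined_edges_sketch` is a hypothetical name, not the module's code.

```python
def combined_edges_sketch(x, y):
    """Concatenate retweet edges (x) and mention edges (y) into one list of lists.

    Assumption: duplicates are kept and order is preserved (x first, then y).
    """
    # list(edge) copies each edge and accepts tuples as well as lists.
    return [list(edge) for edge in x] + [list(edge) for edge in y]


rt_edges = [["ana", "nasa"]]
mention_edges = [("dee", "ben"), ("dee", "cai")]
print(combined_edges_sketch(rt_edges, mention_edges))
# [['ana', 'nasa'], ['dee', 'ben'], ['dee', 'cai']]
```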

generate_utils.direct_subgraphs(subgraphs)

Given a list of undirected networkx subgraphs, the function returns all of them as directed graphs.

Parameters: subgraphs – List of undirected networkx subgraphs
Returns: List of directed subgraphs as networkx objects

generate_utils.filter_by_interest(df, interest)

Given an unfiltered DataFrame, the function returns the DataFrame filtered by the interest column.

Parameters

df – DataFrame with all the tweets
interest – Active interest from the different categories available from the Lynguo tool

Returns

DataFrame containing the tweets filtered by the selected interest

generate_utils.filter_by_subtopic(df, keywords2, stopwords2)

Given a DataFrame that has already been filtered, the function returns it further filtered according to the new subtopic of interest.

Parameters

df – Previously filtered DataFrame with the tweets
keywords2 – List of words acting as key to filter the DataFrame
stopwords2 – List of words destined to filter out the tweets that contain them

Returns

DataFrame with the tweets containing the keywords

generate_utils.filter_by_topic(df, keywords, stopwords)

Given a DataFrame, the function returns the DataFrame filtered according to the given keywords and stopwords.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets containing them

Returns

DataFrame with the tweets containing the keywords
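A minimal pandas sketch of keyword/stopword filtering, assuming the tweet text lives in a column named `text` (hypothetical) and that matching is a case-insensitive substring test. The module's actual column names and matching rules may differ.

```python
import re

import pandas as pd


def filter_by_topic_sketch(df, keywords, stopwords=None, column="text"):
    """Keep rows whose text contains any keyword and none of the stopwords.

    Hypothetical sketch: the column name and case-insensitive substring
    matching are assumptions, not the module's confirmed behaviour.
    """
    text = df[column].astype(str).str.lower()
    # One alternation pattern; re.escape keeps characters like '#' or '.' literal.
    keyword_pattern = "|".join(re.escape(k.lower()) for k in keywords)
    mask = text.str.contains(keyword_pattern, regex=True)
    if stopwords:
        stop_pattern = "|".join(re.escape(s.lower()) for s in stopwords)
        mask &= ~text.str.contains(stop_pattern, regex=True)
    return df[mask]


tweets = pd.DataFrame({
    "user": ["ana", "ben", "cai"],
    "text": ["Citizen science rocks", "Science fair spam", "Nothing here"],
})
print(filter_by_topic_sketch(tweets, ["science"], ["spam"])["user"].tolist())
# ['ana']
```

The same pattern, applied a second time with the subtopic lists, mirrors how `filter_by_subtopic` narrows an already filtered DataFrame.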

generate_utils.getDays(df)

generate_utils.get_all(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns every tweet together with the user who wrote or retweeted it, as a nested list.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

Nested lists containing tweet and user

generate_utils.get_cites(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing tweets, the function returns the tweets of the citation type, excluding retweets. The function also applies the filters given by the parameters below.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

Nested lists, each containing an original tweet (not a retweet) and the user who wrote it

generate_utils.get_degrees(G)

Given a networkx directed graph, the function returns a DataFrame with the centrality measures of the graph (indegree, outdegree, betweenness and eigenvector), sorted by indegree.

Parameters: G – Networkx directed graph
Returns: DataFrame with the users, their centrality measures and a rank based on those measures, sorted by indegree
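Betweenness and eigenvector centrality are best left to networkx (`nx.betweenness_centrality` and `nx.eigenvector_centrality`). The indegree/outdegree part of the table can be sketched with the standard library alone. This hypothetical helper works on an edge list rather than a graph, and it counts repeated edges, whereas a plain `nx.DiGraph` would collapse them into one.

```python
from collections import Counter


def degree_table_sketch(edges):
    """Return (user, indegree, outdegree) rows sorted by indegree, highest first.

    Hypothetical helper: takes [source, target] edges, not a networkx graph.
    """
    indegree = Counter(target for _, target in edges)
    outdegree = Counter(source for source, _ in edges)
    users = set(indegree) | set(outdegree)
    # Counter returns 0 for users that never appear on that side of an edge.
    rows = [(user, indegree[user], outdegree[user]) for user in users]
    # Sort by indegree descending, then by name so ties are deterministic.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


edges = [["ana", "nasa"], ["dee", "nasa"], ["dee", "ben"]]
print(degree_table_sketch(edges))
# [('nasa', 2, 0), ('ben', 1, 0), ('ana', 0, 1), ('dee', 0, 2)]
```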

generate_utils.get_edges(values)

Given a list of lists containing tweets or retweets and their users, the function returns the edges needed to build a network.

Parameters: values – List of lists with the tweet and user
Returns: List of lists, each containing the user and the account mentioned (@) in the tweet
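A minimal sketch of turning [tweet, user] pairs into mention edges with a regular expression. The pair order follows the parameter description. The handle pattern (`@` followed by word characters) and dropping the `@` from the output are assumptions; the module may keep it. `get_edges_sketch` is hypothetical.

```python
import re

# Assumption: a handle is '@' followed by letters, digits or underscores.
MENTION_RE = re.compile(r"@(\w+)")


def get_edges_sketch(values):
    """Turn [tweet, user] pairs into [user, mentioned_account] edges."""
    edges = []
    for tweet, user in values:
        # One edge per mention; a tweet with no mentions adds nothing.
        for mentioned in MENTION_RE.findall(tweet):
            edges.append([user, mentioned])
    return edges


pairs = [
    ["RT @nasa: great launch", "ana"],
    ["Thanks @ben and @cai!", "dee"],
    ["no mentions here", "eve"],
]
print(get_edges_sketch(pairs))
# [['ana', 'nasa'], ['dee', 'ben'], ['dee', 'cai']]
```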

generate_utils.get_edgesHashRT(values)

Given a list containing retweets, the function finds all the hashtags inside the text

Parameters: values – list with the retweets
Returns: list with all the hashtags in these retweets
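Hashtag extraction of this kind can be sketched with a single regular expression. This assumes a hashtag is `#` followed by word characters and keeps the tag as written (no case folding, `#` dropped); the module's exact normalisation is not documented here. `get_hashtags_sketch` is hypothetical.

```python
import re

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def get_hashtags_sketch(texts):
    """Return every hashtag found in the texts, in order of appearance."""
    hashtags = []
    for text in texts:
        hashtags.extend(HASHTAG_RE.findall(text))
    return hashtags


retweets = ["RT @a: Join #CitizenScience today", "RT @b: #birds and #CitizenScience"]
print(get_hashtags_sketch(retweets))
# ['CitizenScience', 'birds', 'CitizenScience']
```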

generate_utils.get_edgesHashRT2(values)

Given a list of lists with users and retweets, the function returns the users and the hashtags in their retweets.

Parameters: values – List of lists with user and retweet
Returns: List of lists, where each list contains user and hashtags

generate_utils.get_edgesMain(values)

Given a list of tweets, the function returns the hashtags inside those tweets.

Parameters: values – List of tweets
Returns: List with the hashtags inside the tweets

generate_utils.get_edgesmain2(values)

Given a list of edges, the function extracts the hashtags inside each tweet and pairs them with the user who wrote it.

Parameters: values – List of lists containing the edges (user, tweet)
Returns: List of lists, each containing a user and a hashtag they used
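A minimal sketch of the user-to-hashtag pairing, taking [user, tweet] pairs as the parameter description states and emitting one [user, hashtag] edge per hashtag. The hashtag pattern is an assumption and `user_hashtag_edges_sketch` is a hypothetical name.

```python
import re

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def user_hashtag_edges_sketch(values):
    """Turn [user, tweet] pairs into [user, hashtag] edges, one per hashtag."""
    return [
        [user, tag]
        for user, tweet in values
        for tag in HASHTAG_RE.findall(tweet)
    ]


pairs = [["ana", "Counting #birds for #CitizenScience"], ["ben", "no tags"]]
print(user_hashtag_edges_sketch(pairs))
# [['ana', 'birds'], ['ana', 'CitizenScience']]
```

These [user, hashtag] edges are the natural input for a two-mode (bipartite) graph such as the one `get_twomodeHashMain` builds.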

generate_utils.get_hashtagsRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a list with the texts of the retweets, in which the hashtags (#) can then be found.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

List with the retweets

generate_utils.get_hashtagsRT2(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a list of lists containing the retweets and the users who retweeted them, in order to find the hashtags inside those retweets

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

List of lists, where each list contains user and retweet

generate_utils.get_hashtagsmain(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a list containing all the mentions from the DataFrame

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

List with all the tweets which are mentions

generate_utils.get_hashtagsmain2(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame with all the tweets, the function returns a list of lists, each containing the user and the tweet they wrote.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

List of lists, each containing the user and the tweet they wrote

generate_utils.get_retweets(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns all the tweets that are retweets (RT:@). It also applies the filtering process.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

Nested lists containing the retweet and the user who retweeted it
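Separating retweets from original tweets (the distinction behind `get_retweets` and `get_cites`) can be sketched as a prefix test. The prefixes `RT @` and `RT:@` are assumptions based on the `(RT:@)` note; the module may detect retweets differently, for example from a dedicated column. `split_retweets_sketch` is hypothetical.

```python
# Assumption: retweets start with one of these prefixes.
RETWEET_PREFIXES = ("RT @", "RT:@")


def split_retweets_sketch(values):
    """Split [tweet, user] pairs into (retweets, original_tweets)."""
    retweets, originals = [], []
    for tweet, user in values:
        # str.startswith accepts a tuple and matches any of its prefixes.
        target = retweets if tweet.startswith(RETWEET_PREFIXES) else originals
        target.append([tweet, user])
    return retweets, originals


pairs = [["RT @nasa: launch", "ana"], ["Hello @ben", "dee"], ["RT:@cai hi", "eve"]]
rts, originals = split_retweets_sketch(pairs)
print(rts)        # [['RT @nasa: launch', 'ana'], ['RT:@cai hi', 'eve']]
print(originals)  # [['Hello @ben', 'dee']]
```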

generate_utils.get_subgraphs(graph)

Given a networkx graph, the function returns its subgraphs stored in a list.

Parameters: graph – Networkx undirected graph
Returns: list of subgraphs as networkx objects
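A minimal networkx sketch of `get_subgraphs` together with `direct_subgraphs`, assuming "subgraphs" means connected components (the documentation does not say so explicitly). Both helper names are hypothetical; `nx.connected_components`, `Graph.subgraph` and `Graph.to_directed` are standard networkx calls.

```python
import networkx as nx


def get_subgraphs_sketch(graph):
    """Split an undirected graph into one subgraph per connected component."""
    # .copy() turns each read-only subgraph view into an independent graph.
    return [graph.subgraph(nodes).copy() for nodes in nx.connected_components(graph)]


def direct_subgraphs_sketch(subgraphs):
    """Return directed versions of undirected subgraphs."""
    # to_directed() replaces every undirected edge with two opposite arcs.
    return [g.to_directed() for g in subgraphs]


G = nx.Graph([("ana", "ben"), ("ben", "cai"), ("dee", "eve")])
parts = get_subgraphs_sketch(G)
print(sorted(len(p) for p in parts))                     # [2, 3]
directed = direct_subgraphs_sketch(parts)
print(sorted(d.number_of_edges() for d in directed))     # [2, 4]
```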

generate_utils.get_twomodeHashMain(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, filter_hashtags=None)

Given a DataFrame with all the tweets, the function returns a networkx bipartite graph with the users and hashtags (outside retweets) as nodes

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
filter_hashtags – Boolean; if true, the predefined most common citizen science hashtags are removed

Returns

Networkx bipartite graph

generate_utils.get_twomodeHashRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, filter_hashtags=None)

Given a DataFrame with all the tweets, the function returns a networkx bipartite graph with the users and hashtags (inside retweets) as nodes

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
filter_hashtags – Boolean; if true, the predefined most common citizen science hashtags are removed

Returns

Networkx bipartite graph

generate_utils.get_twomodeRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a bipartite graph with the users and the retweets as nodes. The retweets are displayed as weighted nodes.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

Networkx bipartite graph

generate_utils.make_weightedDiGraph(ejes)

generate_utils.most_common(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a dictionary with the most used words and their number of appearances, a list of these words, and a list with the number of times each of them appears.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

Dictionary of words and their number of appearances, list with the words, and list with the number of times each word appears
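The three return values can be sketched with `collections.Counter`. This hypothetical helper takes a list of texts instead of a DataFrame, tokenises with a simple `\w+` regex after lower-casing, and uses an `ignore` parameter (also hypothetical) for words to skip. The module may tokenise or remove stopwords differently.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")


def most_common_sketch(texts, n=10, ignore=()):
    """Return (dict of word -> count, list of words, list of counts) for the top n words."""
    ignored = {w.lower() for w in ignore}
    counter = Counter()
    for text in texts:
        counter.update(w for w in WORD_RE.findall(text.lower()) if w not in ignored)
    top = counter.most_common(n)  # [(word, count), ...], most frequent first
    return dict(top), [w for w, _ in top], [c for _, c in top]


texts = ["Birds and bees", "birds in the park", "Bees, birds!"]
counts, words, numbers = most_common_sketch(texts, n=2, ignore=("and", "in", "the"))
print(counts)   # {'birds': 3, 'bees': 2}
print(words)    # ['birds', 'bees']
print(numbers)  # [3, 2]
```

Returning the words and counts as two parallel lists matches the `x` and `y` arguments that `plotbarchart` and `scatterplot` expect.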

generate_utils.most_commonwc(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a wordcloud with the most used words in these tweets

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

Wordcloud with the most used words displayed in it

generate_utils.plotbarchart(numberbars, x, y, title=None, xlabel=None, ylabel=None)

Given the number of elements to plot and the values for the x and y axes, the function returns a bar chart.

Parameters

numberbars – Number of elements to plot in the chart
x – Elements for x axis
y – Elements for y axis, number of appearances of the x elements
title – Title for the figure, defaults to None
xlabel – Label for the x axis, defaults to None
ylabel – Label for the y axis, defaults to None

generate_utils.prepare_hashtags(hashtags, stopwords=None)

Given a list of hashtags, the function returns the number of appearances of each hashtag and a list of unique hashtags.

Parameters

hashtags – list of hashtags
stopwords – Word or list of words destined to be filtered out from the list of hashtags

Returns

Ordered list with the number of appearances of each hashtag and a list of unique hashtags
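A minimal sketch of the counting step, assuming stopwords are compared case-insensitively and the output is ordered from most to least frequent. `prepare_hashtags_sketch` is hypothetical; it accepts a single word or a list, as the `stopwords` description allows, and returns (counts, unique hashtags) in the documented order.

```python
from collections import Counter


def prepare_hashtags_sketch(hashtags, stopwords=None):
    """Return (counts, unique_hashtags), both ordered by frequency, most frequent first."""
    if isinstance(stopwords, str):  # a single word is allowed as well as a list
        stopwords = [stopwords]
    excluded = {w.lower() for w in (stopwords or [])}
    counter = Counter(tag for tag in hashtags if tag.lower() not in excluded)
    # most_common() keeps first-seen order among equal counts.
    ranked = counter.most_common()
    return [n for _, n in ranked], [tag for tag, _ in ranked]


tags = ["birds", "CitizenScience", "birds", "ocean", "citizenscience"]
print(prepare_hashtags_sketch(tags, stopwords="CitizenScience"))
# ([2, 1], ['birds', 'ocean'])
```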

generate_utils.prepare_hashtagsmain(list, stopwords=None)

Given a list of hashtags, the function returns the number of appearances of each hashtag and a unique list of hashtags.

Parameters

list – List of hashtags
stopwords – List of words destined to filter out the desired hashtags from the list

Returns

Ordered list with the number of appearances of each hashtag and a list of unique hashtags

generate_utils.scatterplot(x, y)

Given the elements for the x axis and their number of appearances for the y axis, the function returns a scatter plot.

Parameters

x – Elements for the x axis
y – Elements for the y axis, number of appearances of the x elements

generate_utils.sentiment_analyser(df_entry, keywords=None, stopwords=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a CSV containing the user, the text and the VADER sentiment score of each tweet.

Parameters

df_entry – A DataFrame containing all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
interest – Active interest from the different categories available from the Lynguo tool

Returns

CSV with the columns User, Text and Sentiment

generate_utils.transform_format(val)

generate_utils.wordcloudRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a wordcloud with the hashtags (inside retweets) displayed by frequency

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool

Returns

Wordcloud with the hashtags by frequency

generate_utils.wordcloudRT_logo(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, image=None)

Given a DataFrame containing all the tweets, the function returns a wordcloud, drawn inside the given image, with the hashtags (inside retweets) displayed by frequency.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
image – Image file to plot the wordcloud inside

Returns

Wordcloud drawn inside the given image, with the hashtags by frequency

generate_utils.wordcloud_mainhtlogo(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, image=None)

Given a DataFrame containing all the tweets, the function returns a wordcloud, drawn inside the given image, with the hashtags (outside retweets) displayed by frequency.

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
image – Image file to plot the wordcloud inside

Returns

Wordcloud drawn inside the given image, with the hashtags by frequency

generate_utils.wordcloudmain(df, keywords=None, stopwords=None, interest=None)

Given a DataFrame containing all the tweets, the function returns a wordcloud with the hashtags (outside retweets) displayed by frequency

Parameters

df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
interest – Active interest from the different categories available from the Lynguo tool

Returns

Wordcloud with the hashtags by frequency