Input formatting
- generate_utils.combined_edges(x, y)
Given the edges from retweets and the edges from mentions, the function combines them into a single list
- Parameters
x – Edges from retweets
y – Edges from mentions
- Returns
List of lists with the edges combined
- generate_utils.direct_subgraphs(subgraphs)
Given a list of undirected networkx subgraphs, the function returns all of them as directed graphs
- Parameters
subgraphs – List of undirected networkx subgraphs
- Returns
List of directed subgraphs as networkx objects
- generate_utils.filter_by_interest(df, interest)
Given an unfiltered DataFrame, the function returns the DataFrame filtered by the interest column
- Parameters
df – DataFrame with all the tweets
interest – Active interest from the different categories available from the Lynguo tool
- Returns
DataFrame containing the tweets filtered by the selected interest
- generate_utils.filter_by_subtopic(df, keywords2, stopwords2)
Given a previously filtered DataFrame, the function returns the DataFrame filtered according to the new subtopic of interest
- Parameters
df – Previously filtered DataFrame
keywords2 – List of words acting as key to filter the dataframe
stopwords2 – List of words destined to filter out the tweets that contain them
- Returns
DataFrame with the tweets containing the keywords
- generate_utils.filter_by_topic(df, keywords, stopwords)
Given a DataFrame, the function returns the DataFrame filtered according to the given keywords and stopwords
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the dataframe
stopwords – List of words destined to filter out the tweets containing them
- Returns
DataFrame with the tweets containing the keywords
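The keyword and stopword filtering described above can be sketched with pandas as follows. This is only an illustration under stated assumptions, not the code behind `filter_by_topic`: the helper name `filter_by_words`, the `text` column name, and the case-insensitive substring matching are all hypothetical choices.

```python
import re

import pandas as pd


def filter_by_words(df, keywords, stopwords=None, column="text"):
    """Keep rows whose text has a keyword and none of the stopwords.

    `column` is a hypothetical column name chosen for this sketch.
    """
    text = df[column].astype(str)
    # One case-insensitive alternation pattern per word list.
    keep = text.str.contains("|".join(map(re.escape, keywords)), case=False, regex=True)
    if stopwords:
        drop = text.str.contains("|".join(map(re.escape, stopwords)), case=False, regex=True)
        keep &= ~drop
    return df[keep]


tweets = pd.DataFrame(
    {
        "user": ["ana", "ben", "cleo"],
        "text": [
            "Citizen science project on air quality",
            "Citizen science giveaway, click here",
            "Weekend football results",
        ],
    }
)
filtered = filter_by_words(tweets, keywords=["citizen science"], stopwords=["giveaway"])
print(filtered["user"].tolist())  # ['ana']
```

Applying the same helper a second time with a different word list gives the two-stage behaviour that the topic and subtopic filters describe.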
- generate_utils.get_all(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns every tweet and the user who wrote or retweeted it in a nested list
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
Nested lists containing tweet and user
- generate_utils.get_cites(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing tweets, the function returns the tweets of the citation type, with the retweets removed. The function also applies the filtering processes
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
Nested lists containing each regular tweet (not a retweet) and the user who wrote it
- generate_utils.get_degrees(G)
Given a Networkx directed graph, the function returns a CSV file containing the centrality measures of the graph (Indegree, Outdegree, Betweenness and Eigenvector) sorted by indegree
- Parameters
G – Networkx directed graph
- Returns
DataFrame with the users, centrality measures and rank based on those measures, sorted by indegree
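A minimal sketch of this kind of ranking is shown below, assuming networkx's normalized centrality functions. It is not the implementation of `get_degrees`: the helper `centrality_rows` is hypothetical, it returns plain dictionaries instead of a DataFrame, and it leaves out the eigenvector measure, because `nx.eigenvector_centrality` does not converge on every directed graph.

```python
import networkx as nx


def centrality_rows(G):
    """Return one row per node, sorted by in-degree centrality (highest first)."""
    indeg = nx.in_degree_centrality(G)
    outdeg = nx.out_degree_centrality(G)
    betw = nx.betweenness_centrality(G)
    rows = [
        {
            "user": node,
            "indegree": indeg[node],
            "outdegree": outdeg[node],
            "betweenness": betw[node],
        }
        for node in G.nodes
    ]
    rows.sort(key=lambda row: row["indegree"], reverse=True)
    # Rank 1 is the node with the highest in-degree centrality.
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


# Toy directed graph: an edge (a, b) means that a retweeted or mentioned b.
G = nx.DiGraph([("ana", "cleo"), ("ben", "cleo"), ("cleo", "dan")])
rows = centrality_rows(G)
print(rows[0]["user"])  # cleo
```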
- generate_utils.get_edges(values)
Given a list of lists containing tweets or retweets and their users, the function returns the edges needed to create a network
- Parameters
values – List of lists with the tweet and user
- Returns
List of lists containing the user and the account mentioned (@) inside the tweet
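As an illustration of how such edges could be derived, the sketch below pulls every @-mention out of each text with a regular expression. The helper name `mention_edges`, the pattern, and the choice to drop the leading "@" are assumptions made for this example and may differ from what `get_edges` does.

```python
import re

# An "@" followed by letters, digits or underscores.
MENTION = re.compile(r"@(\w+)")


def mention_edges(values):
    """Turn [text, user] pairs into [user, mentioned_account] edges."""
    edges = []
    for text, user in values:
        for mentioned in MENTION.findall(text):
            edges.append([user, mentioned])
    return edges


values = [
    ["RT @lab_a: new dataset out, thanks @lab_b", "ana"],
    ["no mentions here", "ben"],
]
print(mention_edges(values))  # [['ana', 'lab_a'], ['ana', 'lab_b']]
```

A text with several mentions yields one edge per mention, and a text without mentions yields none.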
- generate_utils.get_edgesHashRT(values)
Given a list containing retweets, the function finds all the hashtags inside the text
- Parameters
values – list with the retweets
- Returns
list with all the hashtags in these retweets
- generate_utils.get_edgesHashRT2(values)
Given a list of lists with users and retweets, the function returns the users and the hashtags in their retweets
- Parameters
values – List of lists with user and retweet
- Returns
List of lists, where each list contains user and hashtags
- generate_utils.get_edgesMain(values)
Given a list of tweets, the function returns the hashtags inside those tweets.
- Parameters
values – List of tweets
- Returns
List with the hashtags inside the tweets
- generate_utils.get_edgesmain2(values)
Given a list of edges, the function returns the hashtags inside the tweet and relates them to the user
- Parameters
values – List of lists containing the edges (user, tweet)
- Returns
List of lists, where each list stores a user and a hashtag used by that user
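The user-to-hashtag relation can be sketched in the same spirit. The helper `user_hashtag_edges`, the pattern, and the lowercasing of hashtags are hypothetical choices for this example, not confirmed behaviour of `get_edgesmain2`.

```python
import re

# A "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#(\w+)")


def user_hashtag_edges(values):
    """Turn [user, text] pairs into [user, hashtag] edges."""
    edges = []
    for user, text in values:
        for tag in HASHTAG.findall(text):
            # Lowercase so that "#CitSci" and "#citsci" count as one hashtag.
            edges.append([user, "#" + tag.lower()])
    return edges


values = [
    ["ana", "Join our #CitSci walk #openscience"],
    ["ben", "plain tweet"],
]
print(user_hashtag_edges(values))  # [['ana', '#citsci'], ['ana', '#openscience']]
```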
- generate_utils.get_hashtagsRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a list with the texts that are retweets, in order to find the hashtags (#) inside them
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
List with the retweets
- generate_utils.get_hashtagsRT2(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a list of lists containing the retweets and the users who retweeted them, in order to find the hashtags inside those retweets
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
List of lists, where each list contains user and retweet
- generate_utils.get_hashtagsmain(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a list containing all the mentions from the DataFrame
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
List with all the tweets which are mentions
- generate_utils.get_hashtagsmain2(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame with all the tweets, the function returns a list of lists in which each list contains the user and the written tweet
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
List of lists, each list contains user, written tweet
- generate_utils.get_retweets(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns all those tweets which are retweets (RT:@). It also applies the filtering process
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
Nested lists containing the retweet and the user who retweeted it
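The split between retweets and original tweets can be illustrated with a simple prefix test. The sketch assumes that a retweet is a text starting with "RT @"; the exact marker used by the library, the helper name `split_retweets`, and the `prefix` parameter are assumptions.

```python
def split_retweets(values, prefix="RT @"):
    """Partition [text, user] pairs into (retweets, originals)."""
    retweets, originals = [], []
    for text, user in values:
        if text.startswith(prefix):
            retweets.append([text, user])
        else:
            originals.append([text, user])
    return retweets, originals


values = [
    ["RT @lab_a: new dataset out", "ana"],
    ["We published a new dataset", "lab_a"],
]
retweets, originals = split_retweets(values)
print(retweets)   # [['RT @lab_a: new dataset out', 'ana']]
print(originals)  # [['We published a new dataset', 'lab_a']]
```

The first list has the shape that `get_retweets` describes, and the second has the shape that `get_cites` describes.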
- generate_utils.get_subgraphs(graph)
Given a networkx Graph, the function returns its subgraphs stored in a list
- Parameters
graph – Networkx undirected graph
- Returns
list of subgraphs as networkx objects
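One plausible way to obtain such a list, and then the directed versions that `direct_subgraphs` describes, is to take the connected components of the graph. Whether the library defines its subgraphs this way is an assumption; the sketch only uses standard networkx calls.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("ana", "ben"), ("ben", "cleo"), ("dan", "eve")])

# One independent subgraph per connected component, largest first.
components = sorted(nx.connected_components(G), key=len, reverse=True)
subgraphs = [G.subgraph(nodes).copy() for nodes in components]

# to_directed() replaces every undirected edge with two opposite directed edges.
directed = [sg.to_directed() for sg in subgraphs]

print([sg.number_of_nodes() for sg in subgraphs])  # [3, 2]
print([dg.number_of_edges() for dg in directed])   # [4, 2]
```

Note that `to_directed()` cannot recover who retweeted whom: it creates both directions for every edge.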
- generate_utils.get_twomodeHashMain(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, filter_hashtags=None)
Given a DataFrame with all the tweets, the function returns a networkx bipartite graph with the users and hashtags (outside retweets) as nodes
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
filter_hashtags – Boolean; if true, the predefined list of the most common citizen science hashtags is removed
- Returns
Networkx bipartite graph
- generate_utils.get_twomodeHashRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, filter_hashtags=None)
Given a DataFrame with all the tweets, the function returns a networkx bipartite graph with the users and hashtags (inside retweets) as nodes
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
filter_hashtags – Boolean; if true, the predefined list of the most common citizen science hashtags is removed
- Returns
Networkx bipartite graph
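A two-mode (bipartite) graph of users and hashtags, as returned by `get_twomodeHashMain` and `get_twomodeHashRT`, can be built from user-hashtag edges roughly as follows. The edge data is invented for illustration, and the `bipartite` node attribute is the usual networkx convention rather than something this page confirms.

```python
import networkx as nx
from networkx.algorithms import bipartite

edges = [("ana", "#citsci"), ("ana", "#openscience"), ("ben", "#citsci")]

B = nx.Graph()
users = {user for user, _ in edges}
hashtags = {tag for _, tag in edges}
# Mark each node set so that the two modes can be told apart later.
B.add_nodes_from(users, bipartite=0)
B.add_nodes_from(hashtags, bipartite=1)
B.add_edges_from(edges)

print(bipartite.is_bipartite(B))      # True
print(sorted(B.neighbors("#citsci")))  # ['ana', 'ben']
```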
- generate_utils.get_twomodeRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a bipartite graph with the users and the retweets as nodes. The retweets are displayed as weighted nodes
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
Networkx bipartite graph
- generate_utils.most_common(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a dictionary with the most used words and the number of appearances, a list of these words and a list with the number of times these words appear
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
Dictionary containing each word and its number of appearances, a list with the words, and a list with the number of times these words appear
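The three return values can be illustrated with `collections.Counter`. This is a sketch under assumptions: the helper `top_words`, its tokenisation rule, and the `n` parameter are hypothetical and are not taken from `most_common`.

```python
import re
from collections import Counter


def top_words(texts, n=2, stopwords=()):
    """Return (word -> count dict, list of words, list of counts) for the top n words."""
    stop = {word.lower() for word in stopwords}
    words = []
    for text in texts:
        # Lowercase the text, then keep runs of letters and apostrophes as words.
        words += [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stop]
    pairs = Counter(words).most_common(n)
    return dict(pairs), [w for w, _ in pairs], [c for _, c in pairs]


texts = ["science for all", "open science", "open data for science"]
counts, words, times = top_words(texts, n=2, stopwords=["for"])
print(counts)  # {'science': 3, 'open': 2}
print(words)   # ['science', 'open']
print(times)   # [3, 2]
```

The two lists line up index by index, which matches the x and y inputs that `plotbarchart` documents.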
- generate_utils.most_commonwc(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a wordcloud with the most used words in these tweets
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
Wordcloud with the most used words displayed in it
- generate_utils.plotbarchart(numberbars, x, y, title=None, xlabel=None, ylabel=None)
Given a number of elements to plot and the elements for the x and y axes, the function returns a bar chart
- Parameters
numberbars – Number of elements to plot in the chart
x – Elements for x axis
y – Elements for y axis, number of appearances of the x elements
title – Title for the figure, defaults to None
xlabel – Label for the x axis, defaults to None
ylabel – Label for the y axis, defaults to None
- generate_utils.prepare_hashtags(hashtags, stopwords=None)
Given a list of hashtags, the function returns the number of appearances of each hashtag and a list of unique hashtags
- Parameters
hashtags – list of hashtags
stopwords – Word or list of words destined to be filtered out from the list of hashtags
- Returns
Ordered list with the number of appearances of each hashtag and a list of unique hashtags
- generate_utils.prepare_hashtagsmain(list, stopwords=None)
Given a list of hashtags, the function returns the number of appearances of each hashtag and a unique list of hashtags.
- Parameters
list – List of hashtags
stopwords – List of words destined to filter out the desired hashtags from the list
- Returns
Ordered list with the number of appearances of each hashtag and a list of unique hashtags
- generate_utils.scatterplot(x, y)
Given the elements for the x axis and their number of appearances for the y axis, the function returns a scatter plot
- Parameters
x – Elements for the x axis
y – Elements for the y axis, number of appearances of the x elements
- generate_utils.sentiment_analyser(df_entry, keywords=None, stopwords=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a CSV containing the user and the VADER sentiment score for each tweet
- Parameters
df_entry – A DataFrame containing all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
interest – Active interest from the different categories available from the Lynguo tool
- Returns
CSV with the columns User, Text and Sentiment
- generate_utils.wordcloudRT(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a wordcloud with the hashtags (inside retweets) displayed by frequency
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
Wordcloud with the hashtags by frequency
- generate_utils.wordcloudRT_logo(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, image=None)
Given a DataFrame containing all the tweets, the function returns a wordcloud, drawn inside the given image, with the hashtags (inside retweets) displayed by frequency
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
image – Image file to plot the wordcloud inside
- Returns
Wordcloud inside desired image with the hashtags by frequency
- generate_utils.wordcloud_mainhtlogo(df, keywords=None, stopwords=None, keywords2=None, stopwords2=None, interest=None, image=None)
Given a DataFrame containing all the tweets, the function returns a wordcloud, drawn inside the given image, with the hashtags (outside retweets) displayed by frequency
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
image – Image file to plot the wordcloud inside
- Returns
Wordcloud inside desired image with the hashtags by frequency
- generate_utils.wordcloudmain(df, keywords=None, stopwords=None, interest=None)
Given a DataFrame containing all the tweets, the function returns a wordcloud with the hashtags (outside retweets) displayed by frequency
- Parameters
df – DataFrame with all the tweets
keywords – List of words acting as key to filter the DataFrame
stopwords – List of words destined to filter out the tweets that contain them
keywords2 – List of words acting as key to filter the DataFrame according to a subtopic
stopwords2 – List of words destined to filter out the tweets that contain them according to a subtopic
interest – Active interest from the different categories available from the Lynguo tool
- Returns
Wordcloud with the hashtags by frequency