BERTopic utilities module

This module provides functions to create BERTopic topic models and to build visualizations of them.

berttopic_utils.create_bert_model(documents)

Given a list of sentences, it creates a BERTopic model fitted on them.

Parameters

documents – List of sentences (strings).

Returns

The BERTopic model, the topic assigned to each sentence, and the topic probabilities.
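A minimal sketch of what create_bert_model might look like, assuming it wraps BERTopic's `fit_transform`. The constructor arguments `language="multilingual"` and `calculate_probabilities=True` are assumptions (the tweets appear to be in Spanish, and the function returns probabilities); the module's real settings are not documented here.

```python
def create_bert_model(documents):
    """Fit a BERTopic model on a list of sentences (sketch, not the original code)."""
    # Imported inside the function so the module can be imported without
    # BERTopic installed; only calling this function requires it.
    from bertopic import BERTopic

    # Assumed settings: multilingual embeddings for Spanish tweets and
    # per-topic probabilities for every document.
    model = BERTopic(language="multilingual", calculate_probabilities=True)
    # fit_transform returns one topic id per document plus the probabilities.
    topics, probabilities = model.fit_transform(documents)
    return model, topics, probabilities
```

Typical use: `model, topics, probs = create_bert_model(documents)`, where `documents` is the output of get_cleaned_documents.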

berttopic_utils.get_cleaned_documents(df_original)

Given a DataFrame with a column Texto (tweets), this function preprocesses the texts in that column by removing punctuation, common words, stop words, and URLs.

Parameters

df_original – A DataFrame with the tweets.

Returns

A list with the cleaned tweets (strings).
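The preprocessing steps can be sketched as follows. This is a hedged illustration, not the module's code: the order of the steps, the URL regex, and both word lists (`COMMON_WORDS` for the "common words", a tiny Spanish `STOP_WORDS` sample) are hypothetical. Iterating over `df_original["Texto"]` works for a pandas DataFrame column, so pandas is not imported.

```python
import re
import string

# Hypothetical word lists; the real module's lists are not documented.
COMMON_WORDS = {"rt", "via"}  # assumed Twitter filler words
STOP_WORDS = {"de", "la", "el", "que", "y", "en", "a", "los", "las"}  # tiny Spanish sample
WORDS_TO_REMOVE = STOP_WORDS | COMMON_WORDS

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Deletes ASCII punctuation plus the Spanish inverted marks.
PUNCT_TABLE = str.maketrans("", "", string.punctuation + "¡¿")


def get_cleaned_documents(df_original):
    """Preprocess the 'Texto' column and return a list of strings (sketch)."""
    cleaned = []
    for text in df_original["Texto"]:
        # Remove URLs before punctuation, otherwise the regex no longer matches.
        text = URL_PATTERN.sub(" ", str(text).lower())
        text = text.translate(PUNCT_TABLE)
        words = [w for w in text.split() if w not in WORDS_TO_REMOVE]
        cleaned.append(" ".join(words))
    return cleaned
```

Removing URLs first matters because stripping punctuation would turn `https://t.co/xyz` into an ordinary-looking token.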

berttopic_utils.get_heatmap(model)

Wrapper to create the heatmap visualization.

Parameters

model – The BERTopic model.

Returns

The heatmap visualization.

berttopic_utils.get_hierarchical_clusterin(model)

Wrapper to create the hierarchical clustering visualization.

Parameters

model – The BERTopic model.

Returns

The hierarchical clustering visualization.

berttopic_utils.get_intertopic_distance(model, top_n_topics=20)

Wrapper to create the intertopic distance visualization.

Parameters
  • model – The BERTopic model.

  • top_n_topics – The number of topics to show.

Returns

The intertopic distance visualization.

berttopic_utils.get_topics_bar(model, top_n_topics=9)

Wrapper to create the barchart visualization.

Parameters
  • model – The BERTopic model.

  • top_n_topics – The number of topics to show.

Returns

The barchart visualization.
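The four visualization wrappers above can plausibly be sketched as thin calls to BERTopic's built-in plotting methods. The mapping to `visualize_heatmap`, `visualize_hierarchy`, `visualize_topics` and `visualize_barchart` is an assumption inferred from the function names; the defaults (20 and 9 topics) come from the documented signatures.

```python
def get_heatmap(model):
    # Similarity matrix between topic embeddings.
    return model.visualize_heatmap()


def get_hierarchical_clusterin(model):
    # Dendrogram of how topics merge into broader clusters.
    return model.visualize_hierarchy()


def get_intertopic_distance(model, top_n_topics=20):
    # 2D intertopic distance map limited to the most frequent topics.
    return model.visualize_topics(top_n_topics=top_n_topics)


def get_topics_bar(model, top_n_topics=9):
    # Bar charts of the top-scoring words for the most frequent topics.
    return model.visualize_barchart(top_n_topics=top_n_topics)
```

BERTopic's visualization methods return Plotly figures, so the result can be displayed with `get_heatmap(model).show()` or saved with `.write_html("heatmap.html")`.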

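berttopic_utils.get_topics_over_time(df, model, documents, topics)

Wrapper to create the topics over time visualization.

Parameters
  • df – The DataFrame with the tweets.

  • model – The BERTopic model.

  • documents – List of sentences (strings) used to create the model.

  • topics – The topics generated when creating the model.

Returns

The topics over time visualization.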
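A hedged sketch of how get_topics_over_time might combine its inputs, assuming it uses BERTopic's `topics_over_time` and `visualize_topics_over_time` methods. The date column name `Fecha` is hypothetical; the documentation does not say which column of `df` holds the timestamps.

```python
# Hypothetical name of the column holding each tweet's date.
DATE_COLUMN = "Fecha"


def get_topics_over_time(df, model, documents, topics):
    """Plot how topic frequencies evolve over time (sketch)."""
    # One timestamp per document, in the same order as `documents`.
    timestamps = list(df[DATE_COLUMN])
    # Keyword arguments avoid depending on positional order, which has
    # differed between BERTopic releases.
    topics_over_time = model.topics_over_time(
        docs=documents, topics=topics, timestamps=timestamps
    )
    return model.visualize_topics_over_time(topics_over_time)
```

`documents`, `topics` and the date column must all refer to the same tweets in the same order; check the keyword names against the installed BERTopic version.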

berttopic_utils.load_model(filename)

Given a filename, it loads the BERTopic model saved in that file.

Parameters

filename – A string that represents the filename.

Returns

The BERTopic model.
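A minimal sketch, assuming load_model simply delegates to BERTopic's `BERTopic.load` class method and that the model was previously written with `model.save(filename)`.

```python
def load_model(filename):
    """Load a BERTopic model saved with model.save(filename) (sketch)."""
    # Imported lazily so the module can be imported without BERTopic installed.
    from bertopic import BERTopic

    # BERTopic.load is a class method that restores the fitted model from disk.
    return BERTopic.load(filename)
```

For example, after `model.save("bertopic_model")`, calling `load_model("bertopic_model")` restores the model without refitting it.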

berttopic_utils.load_topics(filename)

Given a filename, it loads the topics generated when creating the BERTopic model.

Parameters

filename – A string that represents the filename.

Returns

The BERTopic topics.
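The storage format of the topics is not documented. The sketch below assumes they are a Python list serialized with `pickle`; `save_topics` is a hypothetical counterpart added only to show the round trip. Since unpickling can execute arbitrary code, only files from a trusted source should be loaded this way.

```python
import pickle


def save_topics(topics, filename):
    """Hypothetical counterpart of load_topics: pickle the topic list."""
    with open(filename, "wb") as f:
        pickle.dump(topics, f)


def load_topics(filename):
    """Load the topics produced by create_bert_model (sketch, assumes pickle)."""
    with open(filename, "rb") as f:
        return pickle.load(f)
```

Saving the topics alongside the model avoids refitting when only the topic assignments are needed, for example for get_topics_over_time.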

berttopic_utils.remove_stopwords(texts, stop_words)

Given a list of sentences, it removes the stop_words from each sentence.

Parameters
  • texts – List of sentences (strings).

  • stop_words – List of words (strings) to remove.

Returns

List of sentences (strings) without the stop words.
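A minimal sketch of remove_stopwords, assuming whitespace tokenization and case-insensitive matching; the module's real tokenization is not documented. Converting `stop_words` to a set makes each membership check constant time instead of a scan of the list.

```python
def remove_stopwords(texts, stop_words):
    """Remove the given stop words from each sentence (sketch)."""
    # Lowercase once so "El" and "el" are treated as the same stop word.
    stop_set = {word.lower() for word in stop_words}
    return [
        " ".join(word for word in text.split() if word.lower() not in stop_set)
        for text in texts
    ]
```

For example, `remove_stopwords(["El gato come pescado", "la casa"], ["el", "la"])` returns `["gato come pescado", "casa"]`; the remaining words keep their original casing.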