I have a dataset that is producing 611 outlier documents, and I am using BERTopic as below:

from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired
from umap import UMAP

representation_model = KeyBERTInspired()
umap = UMAP(n_neighbors=15,
            n_components=5,
            min_dist=0.0,
            metric='cosine',
            low_memory=False,
            random_state=1337)
# now fit the topic model
topic_model = BERTopic(language="english",
                       representation_model=representation_model,
                       calculate_probabilities=True,
                       n_gram_range=(1, 2),
                       verbose=True,
                       min_topic_size=10,
                       umap_model=umap,
                       )
topics, probs = topic_model.fit_transform(docs)

I notice that quite a few topics can be merged (I checked this by visualizing them), so I merge them using:

topics_to_merge = [
[49, 13, 21], # accommodation
[45, 19, 34, 22], # tours, tour talks, guided tours
[29, 10, 37, 44], # location, nice location, beautiful location
]
topic_model.merge_topics(docs=docs, topics_to_merge=topics_to_merge)

And then finally I go on to outlier reduction using:

new_topics = topic_model.reduce_outliers(docs, topics, strategy="distributions")
# update the model
topic_model.update_topics(docs, topics=new_topics)
# to see the results
freq = topic_model.get_topic_info()
freq

But this process re-labels all of the 611 outliers, either assigning them to existing topics or creating new topics. The problem is that it creates 17 additional topics, and most of them are not necessary. I manually viewed the comments in these new topics, and they could easily be placed under pre-existing topic labels. For example, 3 of the newly created topics were directly related to the [...]

And I cannot merge these topics again, as it's not possible to merge after [...]

I know I can set a [...]

Is this how it's supposed to happen, or am I doing something wrong?

Update: One thing I noticed is that no matter which strategy or threshold I use, the [...]

But after the [...]
Replies: 1 comment
I believe I figured it out. It was my own mistake :( and the hint was that it was giving me the same number of topics as before the merge. I just needed to update my topic list as mentioned here: https://maartengr.github.io/BERTopic/getting_started/topicreduction/topicreduction.html#topic-reduction-after-training After merging, I updated the topics, and now I get the correct results with no more unnecessary topics created :)
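
For anyone landing here with the same symptom, this is a minimal sketch of how I read the fix: the `topics` list returned by `fit_transform` still holds the pre-merge topic ids, so passing it to `reduce_outliers` brings the old topics back. The helper name `merge_then_reduce_outliers` is made up for illustration; it assumes `topic_model` is an already fitted BERTopic model and only uses `merge_topics`, the `topics_` attribute, `reduce_outliers`, and `update_topics`.

```python
def merge_then_reduce_outliers(topic_model, docs, topics_to_merge,
                               strategy="distributions"):
    """Hypothetical helper (not part of BERTopic): merge topics, then
    reassign outliers using the post-merge assignments instead of the
    stale list returned earlier by fit_transform."""
    # Merging changes the per-document topic ids stored on the model.
    topic_model.merge_topics(docs, topics_to_merge)

    # Read the refreshed assignments from the model after the merge.
    merged_topics = topic_model.topics_

    # Reassign outlier documents (-1) based on the merged assignments.
    new_topics = topic_model.reduce_outliers(docs, merged_topics,
                                             strategy=strategy)

    # Recompute the topic representations for the new assignments.
    topic_model.update_topics(docs, topics=new_topics)
    return new_topics
```

With the variables from the question, the call would be `new_topics = merge_then_reduce_outliers(topic_model, docs, topics_to_merge)`, followed by `topic_model.get_topic_info()` to check that the topic count matches the merged model.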