---
title: "Counts topic fit"
author: "Alex White"
date: "2023-04-20"
output: html_document
---
```{r knitr-opts, include=FALSE}
knitr::opts_chunk$set(
comment = "#", collapse = TRUE, results = "hold",
fig.align = "center", dpi = 120
)
```
```{r load-pkgs, message=FALSE, warning=FALSE}
library(Matrix)
library(fastTopics)
library(ggplot2)
library(cowplot)
set.seed(1)
```
```{r load-data}
load("data/him_birds_cn.rda")
load("data/him_birds_md.rda")
load("data/him_grids_md.rda")
dim(him_birds_cn)
```
The Himalayan bird counts are sparse: more than 90% of the counts are
zero. The proportion of nonzero counts is:
```{r nonzeros}
mean(him_birds_cn > 0)
```
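We fit a topic model with k = 2 topics. The count matrix is first converted to a sparse matrix to save memory and computation time. Because this chunk is set to `eval=FALSE`, it is not run when the document is knit, so the chunks that follow assume a `fit` object is already available in the session.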
```{r fit-topic-model, eval=FALSE}
fit <- fit_topic_model(
  as(him_birds_cn, "sparseMatrix"),
  k = 2,
  numiter.main = 150,
  numiter.refine = 150
)
```
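To illustrate why the sparse conversion can help, here is a minimal sketch on simulated counts. It does not use the bird data: the matrix `X`, its dimensions, and the Poisson rate are hypothetical stand-ins chosen only to produce a mostly-zero matrix.

```r
library(Matrix)

# Simulated stand-in for a sparse count matrix (NOT the bird data):
# 200 hypothetical sites x 50 hypothetical species, mostly zeros.
set.seed(1)
X <- matrix(rpois(200 * 50, lambda = 0.05), nrow = 200, ncol = 50)
storage.mode(X) <- "double"

# Fraction of nonzero entries and its complement, the fraction of zeros.
frac_nonzero <- mean(X > 0)
frac_zero <- mean(X == 0)

# Convert to a sparse representation, which stores only the nonzero entries.
X_sparse <- as(X, "sparseMatrix")

# Compare the memory footprint of the two representations.
print(object.size(X), units = "Kb")
print(object.size(X_sparse), units = "Kb")
```

The saving grows with the proportion of zeros; for a matrix that is mostly nonzero, the sparse format can be larger than the dense one.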
```{r plot-loglik, fig.height=2, fig.width=4}
plot_progress(fit, x = "iter", add.point.every = 10, colors = "black") +
  theme_cowplot(font_size = 10)
```
```{r}
loglik <- loglik_multinom_topic_model(him_birds_cn, fit)
```
The per-row log-likelihood computed above can be used to assess how well the topic model fits each site (each row of the count matrix).
```{r loglik-2, fig.height=2, fig.width=4.5}
pdat <- data.frame(loglik)
ggplot(pdat, aes(loglik)) +
  geom_histogram(bins = 64, color = "white", fill = "black", linewidth = 0.25) +
  labs(y = "number of sites") +
  theme_cowplot(font_size = 10)
```
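As a rough illustration of what a per-site multinomial log-likelihood measures, the sketch below computes it by hand in base R on simulated data. It assumes the usual multinomial topic model parameterization, in which each row of the loadings sums to 1 and each column of the factors sums to 1. The objects `L`, `Fmat`, `P`, `X`, and the sizes are hypothetical and are not taken from the fitted model above.

```r
set.seed(1)
n <- 5 # hypothetical number of sites
m <- 8 # hypothetical number of species
k <- 2 # number of topics

# Loadings (topic proportions): each row sums to 1.
L <- matrix(runif(n * k), n, k)
L <- L / rowSums(L)

# Factors (relative species frequencies per topic): each column sums to 1.
Fmat <- matrix(runif(m * k), m, k)
Fmat <- t(t(Fmat) / colSums(Fmat))

# Multinomial probabilities for each site; each row of P sums to 1.
P <- tcrossprod(L, Fmat)

# Simulate counts, 30 observations per site.
s <- rep(30, n)
X <- t(sapply(1:n, function(i) rmultinom(1, s[i], P[i, ])))

# Per-site log-likelihood, using the built-in multinomial density.
loglik_sim <- sapply(1:n, function(i) dmultinom(X[i, ], prob = P[i, ], log = TRUE))

# The same quantity written out term by term.
loglik_manual <- lgamma(s + 1) - rowSums(lgamma(X + 1)) + rowSums(X * log(P))
```

Sites with a much lower log-likelihood than the rest are those whose observed counts are poorly explained by their mixture of topics. Values are only directly comparable between sites with similar total counts.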
```{r}
set.seed(1)
de <- de_analysis(
  fit, him_birds_cn,
  pseudocount = 0.1,
  control = list(ns = 1e4, nc = 4)
)
```
```{r}
# colnames() already returns a character vector, so no conversion is needed.
volcano_plot(de, k = 2, labels = colnames(him_birds_cn))
```
```{r}
sessionInfo()
```