add example :semantic
guoyongzhi committed Jul 21, 2021
1 parent 184d9e7 commit c4f6807
Showing 5 changed files with 41 additions and 18 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "WordCloud"
uuid = "6385f0a0-cb03-45b6-9089-4e0acc74b26b"
authors = ["guoyongzhi <[email protected]>"]
version = "0.7.2"
version = "0.7.3"

[deps]
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
7 changes: 3 additions & 4 deletions README.md
@@ -49,10 +49,9 @@ paint(wc, "alice.png", ratio=0.5, background=outline(wc.mask, color="purple", li
## Recolor
[![recolor](res/recolor.png)](./examples/recolor.jl)
*Run the command `runexample(:recolor)` or `showexample(:recolor)` to get the result.*
## Comparison
[![compare](res/compare.png)](./examples/compare.jl)
*Run the command `runexample(:compare)` or `showexample(:compare)` to get the result.*

## Semantic
[![semantic](res/semantic.png)](./examples/semantic.jl)
*Run the command `runexample(:semantic)` or `showexample(:semantic)` to get the result.*
*The variable `WordCloud.examples` holds all available examples.*
You can also [**see more examples**](https://github.com/guo-yong-zhi/WordCloud-Gallery) or [**try it online**](https://mybinder.org/v2/gh/guo-yong-zhi/WordCloud.jl/master?filepath=examples.ipynb).
# Algorithm Description
49 changes: 36 additions & 13 deletions examples/embedding.jl → examples/semantic.jl
@@ -1,10 +1,10 @@
#md# The positions of words can be initialized with pre-trained word vectors so that similar words will appear near each other.
#md# ### Words
using WordCloud
stwords = ["us"];
words_weights = processtext(open(pkgdir(WordCloud)*"/res/Barack Obama's First Inaugural Address.txt"), stopwords=WordCloud.stopwords_en ∪ stwords)
words_weights = Dict(zip(words_weights...))
#md# ### Embeddings
#md# ### Embedding
#md# The positions of words can be initialized with pre-trained word vectors so that similar words will appear near each other.
using Embeddings
using TSne
const embtable = load_embeddings(GloVe{:en})
@@ -25,28 +25,51 @@ for k in keys(words_weights)
println("remove ", k)
end
end
embedded = tsne(hcat(values(wordvec)...)', 2)
#md# ### WordCloud
words = keys(wordvec) |> collect
vectors = hcat(values(wordvec)...)
embedded = tsne(vectors', 2)
#md#
wc = wordcloud(
words_weights,
maskshape = box,
masksize = (1000, 1000, 0),
density=0.3,
run = initwords!
density = 0.3,
colors = 0.3,
backgroundcolor = :maskcolor,
run = initwords!,
# angles = (0,45), font = "Helvetica thin", maskcolor=0.98,
)

pos = embedded
mean = sum(pos, dims=1) / size(pos, 1)
r = maximum(sqrt.(pos[:,1].^2 + pos[:,2].^2 ))
pos = (pos .- mean) ./ 2r
sz = collect(size(wc.mask))'
pos = round.(Int, pos .* sz .+ sz ./ 2)
sz = collect(reverse(size(wc.mask)))'
sz0 = collect(getparameter(wc, :masksize)[1:2])'
pos = round.(Int, pos .* sz0 .+ sz ./ 2)

setpositions!(wc, keys(wordvec)|>collect, eachrow(pos), type=setcenter!)
setpositions!(wc, words, eachrow(pos), type=setcenter!)
setstate!(wc, :placewords!)
generate!(wc, teleporting=false)
println("results are saved to embedding.png")
paint(wc, "embedding.png")
paint(wc, "semantic_embedding.png")
#md# ![](semantic_embedding.png)
#md# ### Clustering
#md# Words can be further colored according to semantic clustering
using Clustering
V = vectors
G = V' * V
H = sum(V .^ 2, dims=1)
D = max.(0, (H .+ H' .- 2G))
D ./= sum(D)/length(D)
D .= .√D #the distance matrix
tree = hclust(D, linkage=:ward)
lb = cutree(tree, h=2, k=10)
println("$(length(lb)) words are divided into $(length(unique(lb))) groups")
#md#
colors = parsecolor(:seaborn_dark)
setcolors!(wc, words, colors[lb.%length(colors).+1])
recolor!(wc, style=:reset)
paint(wc, "semantic_clustering.png")
#md# ![](semantic_clustering.png)
wc
#eval# runexample(:embedding)
#md# ![](embedding.png)
#eval# runexample(:semantic)
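
The clustering step in `examples/semantic.jl` derives its distance matrix from the Gram matrix, using the identity ‖vᵢ − vⱼ‖² = ‖vᵢ‖² + ‖vⱼ‖² − 2 vᵢ·vⱼ with one word vector per column of `V`. The sketch below checks that identity on small random data; it is an illustration under stated assumptions, not part of the commit. The names `V`, `G`, and `H` mirror the example, while the toy dimensions, the random input, and `D_direct` are hypothetical and exist only for this check.

```julia
using LinearAlgebra
using Random

Random.seed!(1)               # toy data only; these are not GloVe vectors
V = randn(5, 8)               # 8 hypothetical "word vectors" of dimension 5, one per column

G = V' * V                    # Gram matrix: G[i, j] = dot(V[:, i], V[:, j])
H = sum(V .^ 2, dims=1)       # 1×8 row of squared norms
D2 = max.(0, H .+ H' .- 2G)   # squared distances; clamp tiny negatives caused by round-off
D = sqrt.(D2)

# Reference: compute every pairwise distance directly from the definition.
D_direct = [norm(V[:, i] - V[:, j]) for i in 1:8, j in 1:8]

maxerr = maximum(abs.(D .- D_direct))
println("max deviation from direct computation: ", maxerr)
```

The Gram-matrix form replaces the explicit double loop with one matrix product. The example additionally divides the squared distances by their mean before taking the square root, which presumably makes the cut height passed to `cutree` independent of the embedding's scale.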
Binary file added res/semantic.png
1 change: 1 addition & 0 deletions src/rendering.jl
@@ -14,6 +14,7 @@ parsecolor(c) = parse(Colorant, c)
parsecolor(tp::Tuple) = ARGB(tp...)
parsecolor(gray::Real) = Gray(gray)
parsecolor(sc::Symbol) = parsecolor.(colorschemes[sc].colors)
parsecolor(sc::AbstractArray) = parsecolor.(sc)

issvg(d) = d isa Drawing && d.surfacetype==:svg
const SVGImageType = Drawing
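
The method added to `src/rendering.jl` lets `parsecolor` accept an array by broadcasting itself over the elements, so each element is routed to whichever existing method matches its type. A minimal sketch of that dispatch pattern follows. To stay self-contained it uses a hypothetical stand-in, `toycolor`, that returns plain tuples rather than `Colorant` values; it is not the package's implementation.

```julia
# Hypothetical stand-in for parsecolor: each method tags its input instead of
# building a real color, so the sketch needs no external packages.
toycolor(c::AbstractString) = (:named, c)
toycolor(gray::Real) = (:gray, Float64(gray))
toycolor(tp::Tuple) = (:argb, tp)
# Mirrors the method added in this commit: apply the function elementwise.
toycolor(cs::AbstractArray) = toycolor.(cs)

# A mixed array: every element is dispatched on its own runtime type.
mixed = toycolor(["red", 0.3, (1.0, 0.0, 0.0, 1.0)])
foreach(println, mixed)
```

Because the array method calls the same function on each element, nested arrays are handled recursively without further methods.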