
Recipes

Eiso Kant edited this page Jul 5, 2017 · 4 revisions

This is a list of useful code snippets that use wmd-relax.

Adding the recipe created by @wbecker in https://github.com/src-d/wmd-relax/issues/14

Recipe 1

I've been playing with this code and have it working with fetch_20newsgroups from sklearn, just to test it out. Along the way I've come up with a few code snippets that might be useful for other people who are using it.

For example, I have enabled the logs to see what is going on inside by adding this:

import logging
import sys
logger = logging.getLogger("WMD")
logger.addHandler(logging.StreamHandler(sys.stdout))

(I found this useful, since I'm passing max_time=1 to nearest_neighbors to see when it jumps out early.)
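A slightly more reusable variant of the logging snippet might look like the sketch below. It only uses the standard `logging` module; the helper name `attach_wmd_handler` and the message format are assumptions made for illustration and are not part of wmd-relax.

```python
import logging
import sys


def attach_wmd_handler(stream=sys.stdout, level=logging.INFO):
    """Send records from the "WMD" logger to `stream` and return the handler.

    Hypothetical helper, not part of wmd-relax.
    """
    logger = logging.getLogger("WMD")
    handler = logging.StreamHandler(stream)
    # Timestamps make it easier to see how long each stage takes.
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

Keeping the returned handler around lets you detach it later with `logging.getLogger("WMD").removeHandler(handler)`, which avoids duplicated lines when the cell is re-run in a notebook.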

I've also hacked this to work with your code, using those embeddings (this replaces step [7]):

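import numpy as np
from sklearn.preprocessing import normalize
from wmd import WMD

# vect, docs_train, docs_test, W_common and trainSize are assumed to come from the earlier steps.
# L1-normalize each row so that the word weights of a document sum to 1.
X_train = normalize(vect.fit_transform(docs_train), norm='l1', copy=False)
X_test = vect.transform(docs_test)

embeddings = np.array(W_common, dtype=np.float32)
# nbow maps a document key to (name, word indices, word weights).
nbow = {}
for i, el in enumerate(X_train[0:trainSize]):
    name = "#" + str(i)
    nbow[name] = (name, el.indices, np.array(X_train[i, el.indices].toarray().ravel(), dtype=np.float32))

calc = WMD(embeddings, nbow, vocabulary_min=2, main_loop_log_interval=1)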
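The recipe above stores each document in `nbow` as a tuple of (name, word indices, float32 weights). As a minimal sketch of that layout, independent of sklearn, the following builds the same kind of entries from raw word counts. The helper `make_nbow_entry` and the toy data are made up for illustration, and the use of int32 indices is an assumption that mirrors what a scipy CSR row provides.

```python
import numpy as np


def make_nbow_entry(name, word_ids, counts):
    """Build one nbow value: (name, word indices, L1-normalized float32 weights).

    Hypothetical helper, not part of wmd-relax.
    """
    word_ids = np.asarray(word_ids, dtype=np.int32)
    weights = np.asarray(counts, dtype=np.float32)
    total = weights.sum()
    if total <= 0:
        raise ValueError("document %r has no words" % name)
    # Dividing by the total makes the weights of one document sum to 1.
    return name, word_ids, weights / total


# Toy data: (word indices, word counts) for two documents over a 3-word vocabulary.
docs = [([0, 2], [3, 1]), ([1, 2], [2, 2])]
nbow = {}
for i, (word_ids, counts) in enumerate(docs):
    name = "#%d" % i
    nbow[name] = make_nbow_entry(name, word_ids, counts)
```

A dictionary built this way has the same shape as the one passed to `WMD(embeddings, nbow, ...)` in the recipe, provided every word index is a valid row of `embeddings`.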