export: entry point for impact graph record export #241
base: prod
Conversation
Awesome! Thanks Gilles!
I deployed it on http://inspireheptest.cern.ch but what is the URL?
date = date[0]
match_obj = re.search(r"\d\d\d\d", date)
if match_obj is not None:
    return int(match_obj.group())
@glouppe you can better retrieve the year using bibrec.earliest_date,
which is maintained by a bibcheck script that looks into these and many more fields.
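A sketch of what that suggestion could look like (earliest_date is assumed here to be a plain date string on the record object; the attribute name comes from the comment above and is not verified against the codebase):

    import re

    def get_year(record):
        """Prefer the curated earliest_date over ad-hoc date parsing."""
        date = getattr(record, "earliest_date", None)  # assumed attribute
        if date:
            match_obj = re.search(r"\d{4}", date)
            if match_obj is not None:
                return int(match_obj.group())
        return None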
Nope @jalavik, unapi is just close by in the patch but is unrelated. The entry point is:
Seems quite heavy at the moment. It takes 17s to fetch the export for a random record, but that seems due to the fact that the JSON is 1MB!
Thanks for the deployment @jalavik! Yes, https://inspireheptest.cern.ch/record/1340769/export/impact is the correct URL. It seems to work as it should :) And yeah, it is slow because 1) the fetchers are not very fast in the first place, 2) it is doing tons of queries at the application level and 3) JSON is awfully verbose :(
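For anyone who wants to reproduce the timing, a minimal fetch of that URL (assumes only the third-party requests package):

    import requests

    resp = requests.get(
        "https://inspireheptest.cern.ch/record/1340769/export/impact")
    resp.raise_for_status()
    impact_graph = resp.json()
    print("fetched %d bytes" % len(resp.content))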
Works well, I’ve just used the URL with the initial publication list with my code and it works nicely! Gilles, I just spoke with Jan; apparently redis is installed on that machine, so we could have the results cached there.
@glouppe Yeah, we store user sessions in redis and maybe even the cite dict, so there already exists an integration. Check out how to use it here: https://github.com/inspirehep/inspire/blob/master/bibtasklets/bst_webcoll_postprocess.py#L26-L39
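For reference, a minimal caching sketch along the lines of the linked bst_webcoll_postprocess.py pattern (the key scheme, the TTL and the way the redis connection is obtained are all illustrative, not taken from the codebase):

    import json

    CACHE_TTL = 3600  # one hour; tune to how often citation data changes

    def get_impact_graph_cached(recid, redis_client, compute_fn):
        """Serve the export from redis when possible, recomputing on a miss."""
        key = "impact_graph:%s" % recid
        cached = redis_client.get(key)
        if cached is not None:
            return json.loads(cached)
        result = compute_fn(recid)
        # setex stores the value with an expiry, which bounds both staleness
        # and how long a rarely-hit 1MB blob lingers in memory.
        redis_client.setex(key, CACHE_TTL, json.dumps(result))
        return result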
Take into consideration that this JSON blob is 1MB, so if someone crawls us and pings all the entries, they will all end up in redis: 1MB × 1M records, gosh. Plus, from within CERN it looks like it's quite fast to serve the given export anyway. So we could probably save some time by serving gzipped JSON (using regular Apache support), which halves the size of the file.
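A quick back-of-the-envelope check of that size claim (the blob below is fabricated but has the repetitive key structure typical of such exports; zlib uses the same DEFLATE algorithm as Apache's mod_deflate):

    import json
    import zlib

    # Fabricated payload with the repetitive keys typical of the export.
    blob = json.dumps([{"recid": i, "year": 2000 + i % 15,
                        "self_citation": False}
                       for i in range(20000)]).encode("utf-8")
    print("raw: %d bytes, deflated: %d bytes"
          % (len(blob), len(zlib.compress(blob))))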
Does it make sense to revive this now that we will include the impact graph in the new design, or has it been superseded by a different technique? @glouppe @eamonnmag @kaplun |
There needs to be an endpoint in the new system to get the data back.
For this to be effective we have to pre-compute a fair amount of information (for every citing and cited paper we need to keep the year of that paper and its self-citation flag with respect to the current paper, and we need to keep the citation count of that paper up to date). Now that I have studied elasticsearch a bit more, I don't think it is feasible for this task: we would end up storing a massive amount of information in every record, and we would need to refresh every record linked to any touched record. It is probably time to revive the idea of a dedicated graph DB for this task: we could register to a signal and, every time a record is touched, propagate the change to the graph DB for that very node only. The only tricky computation is the self-citation flag, since we need to pay attention to whether a given update touches the author list; if it does, we need to recompute all the self-citation flags on all the incoming and outgoing relations.
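A rough sketch of that signal-based propagation (the record-updated signal name, the graph_db client and the helper functions are all hypothetical; only the flow follows the comment above):

    from blinker import signal

    # Hypothetical signal emitted by the ingestion layer on record changes.
    record_updated = signal("record-updated")

    @record_updated.connect
    def propagate_to_graph(sender, recid=None, changed_fields=(), **extra):
        # Touch only the node for the record that changed.
        graph_db.update_node(recid, fetch_node_metadata(recid))
        if "authors" in changed_fields:
            # An author-list change can flip the self-citation flag on every
            # incoming and outgoing citation edge of this node.
            for edge in graph_db.edges_of(recid):
                edge["self_citation"] = authors_overlap(edge["citing"],
                                                        edge["cited"])
                graph_db.save_edge(edge)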
This is meant to be an entry point for exporting the metadata necessary for building the impact graph of a record. Could we try this branch on the test machine before deployment?
CC: @jalavik @eamonnmag
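For readers of this thread, an illustrative Flask-style sketch of the shape of such an entry point (the actual patch targets Invenio's legacy web layer, so the names and plumbing here are assumptions, not the code under review):

    from flask import Flask, jsonify

    app = Flask(__name__)

    def build_impact_graph(recid):
        """Collect citing/cited records with years and self-citation flags."""
        raise NotImplementedError  # provided by the export module in the patch

    @app.route("/record/<int:recid>/export/impact")
    def export_impact(recid):
        # Serialize the pre-assembled metadata as the JSON export.
        return jsonify(build_impact_graph(recid))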