use EZProxy with style
This repository contains our custom UI for OCLC's EZProxy.
Links to proxied resources are presented in an MMenu menu that is injected into every page visited via the proxy server.
- 0.2.0: TypeScript/webpack 5 rewrite, no more jQuery, EZJump (Q4 2021)
- 0.1.3: ReNa backend uses JSON instead of JSONP (Q2 2019)
- 0.1.2: demo server (Q3 2016)
- 0.1.1: dependency updates, some CSS changes (Q3 2016)
- 0.1.0: initial release (Q2 2016)
- none currently
- Refactor grabrena.py until the salient parts can be used stand-alone
- Sanitize network interactions for XSS vectors (somewhat done?)
- Loading fonts from the EZProxy webserver fails due to lack of CORS headers
- Document the build process better
- Better error handling/messages
- LocalStorage for JS modules? (currently we rely on browser caching for our 160 kB blob)
We've built a small demo of the EZJump form for those who just want to use it without the menu; see ezjump/.
EZProxy has a Find/Replace directive system that allows manipulation of proxied content. We use it to insert a single <script> tag into every proxied page, which loads our menu code. The menu structure is generated on the fly from (locally cached) JSON files stored on the EZProxy web server.
This JSON is generated by querying MPG ReNa through the proxy, so that all URLs to proxied resources are rewritten into proxy-by-hostname URLs if they match a host EZProxy is configured for.
Note: The frontend side (the menu-injection part) of this project should be useful for anyone running an EZProxy installation, but we assume a specific source for the menu content: We (only) proxy resources contained in the MPG ReNa database, which provides us with a JSON API for resource collections. In the likely case that you are not a Max Planck Institute, you will have to come up with your own way of getting your menu entries into a usable (JSON-)format or adjust our code accordingly. We're happy to include your pull requests and move the MPG-related stuff to a branch.
To get this running and deployed on your EZProxy setup, quite a bit of (configuration) work is required. We'll get you started on that path with a little demo of the UI, which you can play with right away.
On your (Linux/macOS/Cygwin/WSL) development box:
Later, on your EZProxy box:
The client part of this project is written in TypeScript and Sass. Neither language is (yet) natively supported by web browsers, so they need to be compiled into (ES6) JavaScript and CSS, respectively. We have tooling in place to do this for you, but please keep this in mind while working with the code.
Git clone this repository, and then open a shell in the newly created directory (if in doubt: the 'ezmenu' dir with the package.json file in it) and run:
npm install
This will pull in all project dependencies from npm. We're using dart-sass to compile Sass to CSS.
Once that is complete, you can run:
npm start
in the project directory to start a local web server with a small demo page (navigate to http://localhost:8080 to see it). It will be quite slow, but you should see a blue "GO" button in the lower left of your screen.
While npm start is running, webpack 5 watches the TypeScript and Sass files in frontend/src/ and frontend/sass/, and any changes you make will be reflected in the demo web server. Note that webpack-dev-server does not write anything to disk. The files in the demo/ directory are just scaffolding; best not to edit them (but feel free to look at them, of course).
We'll get you started with an overview of the frontend/menu code next, then explain the configuration options and deployment.
Preface: It is not required to understand the code to get this running on your installation, but in case you're curious, take a look at the files in frontend/src/ while you read this chapter (start with frontend/src/index.ts). If you're familiar with JavaScript but new to Promises, an introduction to JavaScript Promises should have you covered. If you do not care for the code, please skim this chapter regardless, as it contains explanations you will likely need to take into account later.
The interface is based on MMenu, and we populate the menu with content loaded from JSON files. The expected format of these files is documented at the end of this document under Data Types, and codified in the class definitions in SetlistItem.ts and SetlistCollectionItem.ts in frontend/src/common/, which also hold the code that transforms each data item into the DOM elements that form a menu entry.
The JSON files are expected to be placed in the loggedin/ directory of the EZProxy internal web server, and our script queries them as follows:
First, the file setlist.json is fetched. It contains an array of SetlistItems with titles, IDs/filenames and timestamps for each category to be added to the menu. The setlist should be less than 1 kB in size and is fetched each time the menu is generated.
Then the script checks localStorage for entries holding the content of each collection referenced by a SetlistItem and, if present, compares their timestamps with those in the freshly downloaded setlist. If the local data is still up to date, it is used; otherwise the corresponding JSON file is downloaded from the EZProxy web server (and later stored in localStorage).
Either way, once all files are available or the download attempts have timed out, an HTML <nav> element containing a two-level tree of list items representing the menu content is generated, and MMenu is initialized on that structure.
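Per collection, the steps above boil down to one decision: reuse the localStorage copy or download <id>.json. Here is a minimal TypeScript sketch of just that decision, under stated assumptions: the names (SetlistEntry, KeyValueStore, planCollectionLoads), the storage key (the collection id) and the freshness rule (a cached timestamp that is not older than the setlist's counts as up to date) are ours, not taken from frontend/src/. Downloads and timeouts are left out.

```typescript
// Hypothetical shapes; see "Data Types" for the real setlist.json format.
interface SetlistEntry {
  id: string;        // collection id, also the name of <id>.json
  timestamp: string; // Unix time in seconds, serialized as a string
  name: string;
  logo?: string;
}

interface CachedCollection {
  timestamp: string; // timestamp of the setlist entry the data was fetched for
  data: unknown[];
}

// The subset of the Web Storage API (localStorage) this sketch needs.
interface KeyValueStore {
  getItem(key: string): string | null;
}

type LoadPlan = { id: string; source: "cache" | "network" };

// Assumption: cached data is fresh if it is at least as new as the setlist entry.
function isFresh(entry: SetlistEntry, cached: CachedCollection | null): boolean {
  return cached !== null && Number(cached.timestamp) >= Number(entry.timestamp);
}

function readCached(store: KeyValueStore, id: string): CachedCollection | null {
  const raw = store.getItem(id);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as CachedCollection;
  } catch {
    return null; // corrupt entries are treated as missing
  }
}

// Decide, for every menu category, where its content should come from.
function planCollectionLoads(setlist: SetlistEntry[], store: KeyValueStore): LoadPlan[] {
  return setlist.map((entry): LoadPlan => ({
    id: entry.id,
    source: isFresh(entry, readCached(store, entry.id)) ? "cache" : "network",
  }));
}
```

In a browser, window.localStorage already satisfies KeyValueStore and could be passed in directly; outside a browser, a Map-backed object does the job.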
We currently do not support sub-sub-menus, simply because no one has asked for that feature. Presumably, only some type-system work would be needed to allow this.
Since this menu is injected into proxy-by-hostname pages (which reside at subdomains like http://journal.domain.name.YOUR-EZPROXY.TLD) and localStorage is governed by the browser's same-origin policy (which requires scheme, hostname and port to match exactly), the localStorage part of this script resides inside an iframe. This iframe always loads the same URL and thus has access to the same localStorage, no matter which page/subdomain the main script is injected into. frontend/src/implant.ts holds the code that runs inside the iframe. Both scripts communicate via the Channel Messaging API to ferry menu content from the iframe to the UI-building parts of the script.
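On the iframe side, this exchange can be modelled as a small request/response protocol. The TypeScript sketch below shows one possible shape of it; the message types, handleMenuRequest and buildMenu are hypothetical and do not describe the actual messages used by implant.ts. The MessageChannel wiring is only shown in comments, which keeps the handler itself testable outside a browser.

```typescript
// Hypothetical message format exchanged over the MessageChannel.
type MenuRequest = { kind: "getCollections"; ids: string[] };
type MenuResponse =
  | { kind: "collections"; found: Record<string, string>; missing: string[] }
  | { kind: "error"; reason: string };

// Only the read side of localStorage is needed here.
interface KeyValueStore {
  getItem(key: string): string | null;
}

// Runs inside the iframe (same origin as https://YOUR-EZPROXY.TLD), so every
// proxied subdomain reaches the same localStorage through it.
function handleMenuRequest(message: unknown, store: KeyValueStore): MenuResponse {
  const req = message as Partial<MenuRequest> | null;
  if (!req || req.kind !== "getCollections" || !Array.isArray(req.ids)) {
    return { kind: "error", reason: "unsupported message" };
  }
  const found: Record<string, string> = {};
  const missing: string[] = [];
  for (const id of req.ids) {
    const raw = store.getItem(id);
    if (raw === null) missing.push(id);
    else found[id] = raw;
  }
  return { kind: "collections", found, missing };
}

// Browser wiring (sketch only, not executed here):
//   host page: const { port1, port2 } = new MessageChannel();
//              iframe.contentWindow.postMessage("init", "https://YOUR-EZPROXY.TLD", [port2]);
//              port1.onmessage = (ev) => buildMenu(ev.data); // buildMenu is hypothetical
//              port1.postMessage({ kind: "getCollections", ids });
//   iframe:    window.addEventListener("message", (ev) => {
//                // verify ev.origin is one of your proxy subdomains before trusting the port
//                const port = ev.ports[0];
//                port.onmessage = (m) => port.postMessage(handleMenuRequest(m.data, localStorage));
//              });
```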
Please note that privacy-enhancing browser plugins or settings can prevent this setup from working. You should advise users to make sure that all proxy-by-hostname subdomains are allowed to open an iframe to https://YOUR-EZPROXY.TLD and run JavaScript inside it. Also, disallowing localStorage (or using a browser without a localStorage implementation) will result in more network traffic and thus a slower UI.
At the very least, you will have to tell this script the hostname of your EZProxy installation, but you might also want to swap out the search form for your own, and so on. You could also take a moment to browse the mmenu examples to get a feel for what else is possible with the menu. If your list of resources is sorted alphabetically, there is some visual sugar you might want to use (look for the commented-out lines in frontend/src/menucfg.ts).
We do not require you to keep the copyright notice at the bottom of the menu, we mainly put it there to prevent users from triggering bottom-of-screen interactions while using the menu (this does not mean you are not bound by the (A)GPL when using our code).
Open frontend/src/menucfg.ts in your editor and search for 'XXX' to find the three spots you will have to edit to make the script ready for deployment on your EZProxy installation. Please refer to the comments in that file for additional explanations.
Attention:
- This will only give you an empty menu unless you have matching JSON files in place as well
- This will not inject the menu into any proxied pages yet, you might want to read the rest of this document before deploying anything anywhere ;)
Stop the npm start process (CTRL-C in the console), edit the 'XXX'-marked spots in frontend/src/menucfg.ts so they point to your EZProxy installation, and then run:
npm run build
This will run webpack in production mode, which creates a dist/ folder for you, the contents of which you can drop into the 'docs/' directory of your EZProxy server.
To get our menu into a proxied page, we need to add a <script> tag to the HTML of that page. We use the EZProxy Find/Replace directive to that end. EZproxy resources are configured as config.txt Database Stanzas, and we need to append our Find/Replace code to all of them.
During testing, we found some websites that contain '</head>' as part of a string inside a <script> tag. To keep our Find/Replace code from being triggered by that, we use states to ensure that our <script> tag is only injected before the closing </head> tag of the HTML actually being rendered, and not into some JavaScript string. This leads to a rather lengthy addendum to each Stanza:
Find <head
Replace -AddState=inHtml+notInScript <head
Find <script
Replace -RemoveState=notInScript <script
Find </script
Replace -AddState=notInScript </script
Find -State=inHtml+notInScript </head>
Replace <script type="text/javascript" src="https://YOUR-EZPROXY.TLD/loggedin/injectmenu.js" defer="defer"></script></head>
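To see why the states matter, here is a rough TypeScript model of the rules above: it scans the HTML left to right, tracks the inHtml and notInScript flags, and only injects before a </head> seen while both are set. This is an approximation under our own assumptions (plain case-sensitive substring matching, every qualifying </head> is replaced); EZproxy's actual Find/Replace engine may differ in details, and simulateInjection is a hypothetical name.

```typescript
// Rough model of the Find/Replace rules above; EZproxy's real engine is not reproduced.
const INJECTED_TAG =
  '<script type="text/javascript" src="https://YOUR-EZPROXY.TLD/loggedin/injectmenu.js" defer="defer"></script>';

const TOKENS = ["<head", "<script", "</script", "</head>"] as const;
type Token = (typeof TOKENS)[number];

function simulateInjection(html: string): string {
  let inHtml = false;
  let notInScript = false;
  let out = "";
  let pos = 0;
  for (;;) {
    // Find the leftmost token at or after pos (like a plain substring Find,
    // "<head" would also match "<header", which merely sets the states again).
    let next = -1;
    let token: Token | null = null;
    for (const t of TOKENS) {
      const idx = html.indexOf(t, pos);
      if (idx !== -1 && (next === -1 || idx < next)) {
        next = idx;
        token = t;
      }
    }
    if (token === null) return out + html.slice(pos);
    out += html.slice(pos, next);
    pos = next + token.length;
    switch (token) {
      case "<head": // Replace -AddState=inHtml+notInScript <head
        inHtml = true;
        notInScript = true;
        out += token;
        break;
      case "<script": // Replace -RemoveState=notInScript <script
        notInScript = false;
        out += token;
        break;
      case "</script": // Replace -AddState=notInScript </script
        notInScript = true;
        out += token;
        break;
      case "</head>": // Find -State=inHtml+notInScript </head>
        out += inHtml && notInScript ? INJECTED_TAG + token : token;
        break;
    }
  }
}
```

With a page like <head><script>var s = "</head>";</script>...</head>, the </head> inside the string is passed through untouched, because notInScript is cleared at that point, and the tag lands before the real closing </head>.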
You could add this to each Stanza by hand, but backend/grabrena.py contains code to handle this for you. However, since Database Stanzas are rather loosely structured, certain conventions have to be followed to make them readable by the script:
- We require empty lines between Stanzas, and only between Stanzas
- Only a MimeFilter line may be placed between Title and URL lines
- Only comment lines (preceded by '#') or the keywords Option, ProxyHostnameEdit, MimeFilter, NeverProxy, AnonymousUrl, HTTPHeader, and Cookie may precede a Title or URL line
- Everything between the Title/URL pair and the next empty line is considered part of the Stanza
You can find the regular expression we use to identify Stanzas in backend/grabrena.py.
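The basic idea of appending the addendum Stanza by Stanza can be sketched in TypeScript as follows, assuming the blank-line separation required above. It is not a port of grabrena.py or its regular expression: appendFindReplace is a hypothetical name, a block counts as a Stanza simply if it contains a Title or URL line (or the short forms T and U), and the other ordering rules listed above are not checked. Already-patched Stanzas are skipped, so running it twice changes nothing.

```typescript
// Find/Replace addendum from the Frontend Injection section (adjust the URL).
const FIND_REPLACE_ADDENDUM = [
  "Find <head",
  "Replace -AddState=inHtml+notInScript <head",
  "Find <script",
  "Replace -RemoveState=notInScript <script",
  "Find </script",
  "Replace -AddState=notInScript </script",
  "Find -State=inHtml+notInScript </head>",
  'Replace <script type="text/javascript" src="https://YOUR-EZPROXY.TLD/loggedin/injectmenu.js" defer="defer"></script></head>',
].join("\n");

// A block counts as a Stanza if it has a Title or URL line (T and U are the short forms).
const STANZA_LINE = /^(Title|T|URL|U)\s/;

function appendFindReplace(config: string, addendum = FIND_REPLACE_ADDENDUM): string {
  // Stanzas are separated by empty lines (and only by empty lines).
  const blocks = config.replace(/\r\n/g, "\n").split(/\n[ \t]*\n/);
  return blocks
    .map((block) => {
      const isStanza = block.split("\n").some((line) => STANZA_LINE.test(line.trim()));
      // Leave non-Stanzas and already-patched Stanzas alone.
      if (!isStanza || block.includes(addendum)) return block;
      return block.replace(/\n+$/, "") + "\n" + addendum;
    })
    .join("\n\n");
}
```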
To populate the menu with links, JSON files need to be placed in the loggedin/ directory of the EZProxy web server. The file defining the first level of the menu must be named setlist.json; each submenu is defined in a file named after its alphanumeric id plus .json (e.g. 000007897.json). Please see Data Types below for details and the demo/loggedin/ folder for examples.
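The naming rule lends itself to a few small helpers, e.g. for a deployment check that every collection file is in place. This TypeScript sketch is illustrative only: collectionFileName, loggedinUrl and missingCollectionFiles are hypothetical names, ezproxy.example.org stands in for your own hostname, and we assume the files are served under https://<host>/loggedin/.

```typescript
// Hypothetical helpers; the real URLs are configured in frontend/src/menucfg.ts.
const COLLECTION_ID = /^[A-Za-z0-9]+$/;

function collectionFileName(id: string): string {
  // Rejecting anything but alphanumerics keeps ids from turning into paths like "../x".
  if (!COLLECTION_ID.test(id)) throw new Error(`invalid collection id: ${JSON.stringify(id)}`);
  return `${id}.json`;
}

function loggedinUrl(ezproxyHost: string, fileName: string): string {
  return new URL(`/loggedin/${fileName}`, `https://${ezproxyHost}`).toString();
}

// Every id listed in setlist.json needs a matching <id>.json next to it.
function missingCollectionFiles(setlistIds: string[], existingFiles: string[]): string[] {
  const present = new Set(existingFiles);
  return setlistIds.filter((id) => !present.has(collectionFileName(id)));
}
```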
This is probably the part where you will have to put in the most work yourself, unless you are a library of a Max Planck institute and you have configured resource collections on the ReNa VuFind installation ("predefined sets").
If you are one of the lucky few who fit the description, backend/grabrena.py can fetch all "folders"/collections you have configured on ReNa and create the required JSON files from them.
We use ReNa data for the menu structure because the EZproxy configuration "thinks" in web servers, while users are more likely to think in journal collections, databases or search engines, which usually do not map neatly onto each other (some websites host a great many journal collections). A menu based on the EZproxy Stanzas would be accurate, but less user-friendly. ReNa also provides descriptions and indexing fields we can use.
Find backend/grabrena.py and run it once. It will do nothing but create a grabrena.ini in the same directory, which contains all configurable values with more-or-less sensible defaults and explanations.
This script is a horrible monolithic mess, and we would like to apologize in advance to anyone who needs to break it open and salvage it for usable parts. Pull Requests with a modular structure and a proper separation of concerns are very welcome.
With that said, here's what the script does:
- (optionally) Run svn up on a pre-existing svn working copy tracking the eResources.txt repository provided by the MPDL
- (optionally) Add our Find/Replace code to each Stanza in a copy of that file and move it into the EZProxy configuration directory
- (optionally) Restart EZProxy if there were changes (see below)
- Query ReNa through the (freshly restarted) EZProxy for the list of predefined sets for your MPI
- Query ReNa again for the content of each set (and for one collection aptly named "Everything")
- Filter/reformat the received data and place it in JSON files on the EZProxy internal web server
The script is meant to be run via cron in the early morning, after the nightly ReNa update has completed.
Since the JSON(P) data provided by ReNa is polled through the proxy, and you will have added a HostJavascript entry for rena.mpdl.mpg.de to your EZProxy configuration (see below), any URLs inside the JSON that the proxy recognizes as resources-to-be-proxied are rewritten in transit and now point to the appropriate proxy-by-hostname subdomains.
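As an illustration of what "rewritten in transit" means for the data, the following TypeScript function maps a resource URL onto a proxy-by-hostname URL the way the example under Data Types below suggests (http://search.ebscohost.com/... becomes http://search.ebscohost.com.go.coll.mpg.de/...). It is a simplified sketch: toProxyByHostname and the explicit host set are hypothetical, and EZproxy's real rewriting (Host/Domain matching, HttpsHyphens, ports) is not modelled.

```typescript
// Illustration only: mimics the shape of rewritten URLs, not EZproxy's actual rules.
interface RewriteResult {
  url: string;
  proxied: boolean;
}

function toProxyByHostname(url: string, proxyHost: string, proxiedHosts: ReadonlySet<string>): RewriteResult {
  const u = new URL(url);
  // Hosts the proxy is not configured for are left untouched.
  if (!proxiedHosts.has(u.hostname)) return { url, proxied: false };
  u.hostname = `${u.hostname}.${proxyHost}`;
  return { url: u.toString(), proxied: true };
}
```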
In the user.txt file of your EZProxy installation, add something like the following above your other user definitions:
::group=RenaOnly
grabrena:12345
::group=Default
#your user definitions go here, e.g.: ::LDAP ...
This creates a user group named "RenaOnly", adds the user "grabrena" with password "12345" to it, and then starts defining the group "Default", which will contain all the user definitions that follow further down in the file. Obviously, you will want to pick a proper password.
In config.txt, you can now reference the RenaOnly group and allow it to access ReNa, and put the Default group declaration in front of your remaining Stanzas. This limits access to those Stanzas to users in that group (which should be everyone but the grabrena user, effectively locking the latter out of everything but ReNa).
Group RenaOnly
Title ReNa
MimeFilter application/json .* javascript
URL https://rena.mpdl.mpg.de
HJ rena.mpdl.mpg.de
HJ https://rena.mpdl.mpg.de
Group Default
#your resource definitions go here, e.g.: IncludeFile config/eResources.txt
If you have a multi-tiered user setup, you should be able to adapt this to your needs.
To make this work without running grabrena.py as root (which we discourage), EZProxy needs to run on non-privileged ports, so that it can be (re)started by a non-root user. Change your config.txt to something akin to this:
RunAs someuser:someuser
LoginPort 80 -Virtual
LoginPort 8080
LoginPortSSL 443 -Virtual
LoginPortSSL 8443
In whatever you're using to configure the firewall on your machine, do the equivalent of these iptables port redirection rules (repeat for additional interfaces as needed if the machine is multi-homed):
iptables -t nat -A PREROUTING -p tcp -i eth0 --dport 80 \
-j REDIRECT --to-port 8080
iptables -t nat -A PREROUTING -p tcp -i eth0 --dport 443 \
-j REDIRECT --to-port 8443
You will also want to allow access to the ports you just redirected the traffic to with something like this:
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 8443 -j ACCEPT
If you are running backend/grabrena.py via cron on your EZProxy machine, you can set proxy_login_port in the ini file to the port you redirect HTTPS traffic to. The script will use that port for all URLs it queries at/via the proxy, so that connections from localhost, which are not affected by the PREROUTING redirects, do not fail.
Read the above chapter to see how we did it in our case, and then decide how to proceed. The following is a loose work-in-progress collection of hints we hope are helpful (please send PRs to add your own):
Our setup assumes that there is a file that contains only Stanzas and that is imported into the main config.txt file via an IncludeFile directive. If you adhere to this, you can use backend/grabrena.py to append the Find/Replace lines to each Stanza in that file. The script will complain a lot, but it will do the job. Note, however, that it is not equipped to properly deal with anything but Stanzas.
Make sure to read the Frontend Injection chapter again.
You will have to put the following into your grabrena.ini:
svn_up_path: /path/to/your/stanza-file.txt
eres_path: /path/to/the/resulting/stanza-with-findreplace-file.txt
force_eRes_update: Yes
injection_url: https://YOUR-EZPROXY.TLD/loggedin/injectmenu.js
We have attempted to mitigate the most obvious XSS vectors in our JavaScript, but we would welcome some extra sets of eyes on that front. At the time of writing, Snyk did not report any issues in our code or dependencies.
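For reviewers, two of the usual guards when turning remote JSON into menu entries are allowing only http(s) URLs in links and never inserting data as raw HTML. The helpers below are a hedged TypeScript sketch of both; isSafeLinkUrl and escapeHtml are hypothetical names, and this is not a description of what our code currently does.

```typescript
// Only http(s) links may end up in an href; this blocks javascript:, data: and similar schemes.
function isSafeLinkUrl(candidate: string): boolean {
  try {
    const protocol = new URL(candidate).protocol;
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // relative or malformed URLs are rejected outright
  }
}

// Escape text before it is ever concatenated into HTML (prefer textContent where possible).
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```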
The main menu structure is defined by the array elements of a ReNa PredefinedSet JSON response, where each entry/submenu heading is defined like this:
getSets([{
"id":"000007897",
"name":"Collective Goods' Selection",
"fullName":"MBRG:Collective Goods' Selection",
"url":"https://rena.mpdl.mpg.de\/rena\/Search\/Results?filter%5B%5D=inst_txtF_mv%3A%22MBRG%22&filter%5B%5D=predef_txtF_mv%3A%22MBRG%3ACollective+Goods%27+Selection%22"
}, ... ]);
which gets transformed by grabrena.py and stored as an array element in the setlist.json file (and parsed into a SetlistItem by the frontend, see frontend/src/common/SetlistItem.ts):
// filename: setlist.json
[{
"id": "000007897",
"timestamp": "1449151638",
"logo": "one",
"name": "Collective Goods' Selection"
}, ... ]
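If you generate setlist.json with your own tooling, a runtime check on the consuming side can catch format drift early. Here is a hedged TypeScript sketch, assuming the fields shown above with logo being optional, timestamp a string of digits and id alphanumeric (all assumptions on our part). isSetlistEntry and parseSetlist are hypothetical names; the parsing in frontend/src/common/SetlistItem.ts may be stricter or looser.

```typescript
// Hypothetical runtime check for setlist.json, matching the example above.
interface SetlistEntry {
  id: string;
  timestamp: string; // Unix time in seconds, as a string
  logo?: string;
  name: string;
}

function isSetlistEntry(value: unknown): value is SetlistEntry {
  if (typeof value !== "object" || value === null) return false;
  const { id, timestamp, name, logo } = value as Record<string, unknown>;
  return (
    typeof id === "string" && /^[A-Za-z0-9]+$/.test(id) &&
    typeof timestamp === "string" && /^\d+$/.test(timestamp) &&
    typeof name === "string" &&
    (logo === undefined || typeof logo === "string")
  );
}

function parseSetlist(json: string): SetlistEntry[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new Error("setlist.json must contain an array");
  // Drop malformed entries instead of failing the whole menu.
  return parsed.filter(isSetlistEntry);
}
```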
The submenus are created from entries in ReNa collection lists (Dictionary entries):
fromRemote000007897({ "ERS000000002":{
"title":"Academic Search Premier (EBSCO)",
"description":"Academic Search Premier contains full text for nearly 4,500 journals, including more than 3,600 peer-reviewed titles. In addition to the full text, this database offers indexing and abstracts for all 8,144 journals in the collection.",
"genre":["Fulltext Database"],
"topic":["Multidisciplinary"],
"language":["English"],
"title_short":"Academic Search Premier (EBSCO)",
"prov_txt_mv":["EBSCO"],
"subject_txt_mv":["Multidisciplinary"],
"keyword_txt_mv":["Ethnic studies","Medical sciences","Arts and literature","Language and linguistics","Chemistry","Physics","Engineering","Computer sciences"],
"scope_txtF_mv":["MPG"],
"naturl_str_mv":["http://search.ebscohost.com\/login.asp?profile=ehost&defaultdb=aph"],
"access_txtF":"SUBSCRIPTION"
}, ... });
converted by grabrena.py into array elements in a collection JSON file (and parsed into SetlistCollectionItems by the frontend, see frontend/src/common/SetlistCollectionItem.ts):
// filename: 000007897.json
// there needs to be one json file for each id in setlist.json
// note that this is an object that contains an array
{
"name": "Collective Goods' Selection",
"id": "000007897",
"data": [{
"title": "Academic Search Premier (EBSCO)",
"url": "http://search.ebscohost.com.go.coll.mpg.de/login.asp?profile=ehost&defaultdb=aph",
"proxied": true,
"free": false,
"desc": "Academic Search Premier contains full text for nearly 4,500 journals, including more than 3,600 peer-reviewed titles. In addition to the full text, this database offers indexing and abstracts for all 8,144 journals in the collection."
}, ... ]
}
backend/grabrena.py uses OrderedDict instead of dict to represent JSON objects/dictionaries in Python, to preserve the order of items received from ReNa.
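For anyone replacing grabrena.py, here is a TypeScript sketch of the record-to-menu-entry conversion shown in the examples above. Several mappings are guesses on our part rather than documented behaviour: the first naturl_str_mv entry becomes url, proxied is derived from the hostname ending in the proxy's hostname (go.coll.mpg.de in the example), and free is true unless access_txtF is "SUBSCRIPTION". As with the OrderedDict in Python, order matters here: JSON.parse creates properties in source order, and Object.entries returns non-integer keys such as "ERS000000002" in that order (integer-like keys would be sorted to the front).

```typescript
// Subset of a ReNa record as shown in the Data Types example.
interface RenaRecord {
  title: string;
  description?: string;
  naturl_str_mv?: string[];
  access_txtF?: string;
}

interface CollectionItem {
  title: string;
  url: string;
  proxied: boolean;
  free: boolean;
  desc: string;
}

interface CollectionFile {
  name: string;
  id: string;
  data: CollectionItem[];
}

// Assumed mappings (not taken from grabrena.py): first naturl_str_mv entry is the link,
// "proxied" means the host ends with the proxy hostname, anything not SUBSCRIPTION is free.
function toCollectionFile(
  id: string,
  name: string,
  records: Record<string, RenaRecord>,
  proxyHost: string,
): CollectionFile {
  const data: CollectionItem[] = [];
  // Object.entries keeps the insertion order of non-integer keys, i.e. ReNa's order.
  for (const [, rec] of Object.entries(records)) {
    const url = rec.naturl_str_mv?.[0];
    if (!url) continue; // entries without a link cannot become menu items
    data.push({
      title: rec.title,
      url,
      proxied: new URL(url).hostname.endsWith(`.${proxyHost}`),
      free: rec.access_txtF !== "SUBSCRIPTION",
      desc: rec.description ?? "",
    });
  }
  return { name, id, data };
}
```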