TexasDigitalLibrary/dspace-reports

dspace-reports

A tool written in Python to generate and email statistical reports for DSpace 7+ repository administrators.

Requirements

  • Python 3.9+
  • PostgreSQL 13+
  • DSpace 7.x or 8.x repository **

** If your Solr index contains statistics from legacy DSpace 5.x or earlier instances, report quality improves significantly once those old statistics have been migrated to the UUID identifiers introduced in DSpace 6. See the DSpace Documentation for more information.

Python 3 Virtual Environment Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Configuration

cp config/application.yml.sample config/application.yml

Example

dspace_name: 'MyDSpace'
dspace_server: 'http://localhost:8080'
solr_server: 'http://localhost:8080/solr'
oai_server: 'http://localhost:8080/oai'
rest_server: 
    url: 'http://localhost:8080/rest'
    username: '[email protected]'
    password: 'password'
statistics_db:
    host: 'localhost'
    port: '5432'
    name: 'dspace_statistics'
    username: 'dspace_statistics'
    password: 'dspace_statistics'
work_dir: '/tmp'
create_zip_archive: false
log_path: 'logs'
log_file: 'statistics-reports.log'
log_level: 'INFO'
smtp_host: 'localhost'
smtp_auth: 'tls'
smtp_port: 587
smtp_username: 'username'
smtp_password: 'password'
from_email: '[email protected]'
admin_emails:
        - email1
        - email2

Configure application.yml according to your particular environment. The admin_emails list in the configuration refers to the email address(es) that will receive the stats reports if the email flag is set when running run_reports.py or run_cron.py (see below).
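Before running any of the scripts, it can help to confirm that the parsed configuration contains the settings shown in the example. The sketch below is not part of dspace-reports: `REQUIRED_KEYS`, `REQUIRED_NESTED` and `missing_settings` are hypothetical names, and the assumption that every key from the sample is required has not been checked against the tool itself. It works on an already-parsed mapping, for example the result of PyYAML's `yaml.safe_load`.

```python
# Hypothetical helper, not part of dspace-reports: report settings that are
# absent from a parsed application.yml mapping.

# Assumption: these top-level keys, copied from the sample file, are all needed.
REQUIRED_KEYS = (
    "dspace_name", "dspace_server", "solr_server", "oai_server",
    "rest_server", "statistics_db", "work_dir", "log_path", "log_file",
    "log_level", "smtp_host", "smtp_port", "from_email", "admin_emails",
)

# Nested keys expected under the two mapping-valued settings.
REQUIRED_NESTED = {
    "rest_server": ("url", "username", "password"),
    "statistics_db": ("host", "port", "name", "username", "password"),
}


def missing_settings(config):
    """Return a sorted list of dotted setting names missing from config."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    for parent, children in REQUIRED_NESTED.items():
        if parent not in config:
            continue  # the missing parent is already reported above
        # A parent that is present but empty counts as having no children.
        section = config[parent] if isinstance(config[parent], dict) else {}
        missing.extend(
            f"{parent}.{child}" for child in children if child not in section
        )
    return sorted(missing)
```

An empty list means every setting from the sample is present; any other result names the settings to add before running the scripts.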

Usage

NOTE: All of the following commands assume that the user is in the virtual environment.

Database

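First, create a role and database in PostgreSQL. The first command below is an SQL statement to run in psql as a user allowed to create roles; the second is the createdb shell utility.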

create role dspace_statistics with login password 'dspace_statistics';

createdb --username=postgres --owner=dspace_statistics --encoding=UNICODE dspace_statistics
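One way to check that the new role can reach the new database is to connect with a client using a connection URI. The following is a minimal sketch, assuming the `statistics_db` values from the sample configuration; `statistics_db_uri` is a hypothetical helper and not part of the tool.

```python
from urllib.parse import quote


def statistics_db_uri(db):
    """Build a libpq-style connection URI from a statistics_db mapping."""
    # Percent-encode credentials and database name so that characters
    # such as '@' or ':' do not break the URI.
    user = quote(str(db["username"]), safe="")
    password = quote(str(db["password"]), safe="")
    name = quote(str(db["name"]), safe="")
    return f"postgresql://{user}:{password}@{db['host']}:{db['port']}/{name}"


# Values taken from the sample application.yml.
sample_db = {
    "host": "localhost",
    "port": "5432",
    "name": "dspace_statistics",
    "username": "dspace_statistics",
    "password": "dspace_statistics",
}
print(statistics_db_uri(sample_db))
```

The resulting string can be passed to psql as its connection argument to confirm that the login works.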

There are several ways to generate statistical reports with this tool. They all begin with the database manager script that allows the user to create, drop and recreate the database tables to store metadata and statistics.

Usage: database_manager.py [options]

Options:
  -h, --help            show this help message and exit
  -c CONFIG_FILE, --config=CONFIG_FILE
                        Configuration file
  -f FUNCTION, --function=FUNCTION
                        Database function to perform. Options: create, drop,
                        check, recreate

For example, the first time stats are generated the user should run:

python database_manager.py -c config/application.yml -f create

On later runs, the database tables can be recreated before the stats generation process is run again:

python database_manager.py -c config/application.yml -f recreate
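When these calls are scripted, it may be worth validating the function name before invoking the script. The sketch below only assembles the argument list from the options shown in the help text above; `database_manager_command` and `VALID_FUNCTIONS` are hypothetical names, and the default config path is the one used in this README.

```python
import sys

# The four functions listed in the database_manager.py help text.
VALID_FUNCTIONS = ("create", "drop", "check", "recreate")


def database_manager_command(function, config_file="config/application.yml"):
    """Return the argument list for one database_manager.py invocation."""
    if function not in VALID_FUNCTIONS:
        raise ValueError(f"unknown database function: {function!r}")
    # sys.executable is the interpreter running this code, so the call
    # stays inside the virtual environment when that environment is active.
    return [sys.executable, "database_manager.py", "-c", config_file, "-f", function]
```

From the repository root, the list can be handed to `subprocess.run(database_manager_command("create"), check=True)`, which raises an exception if the script exits with a non-zero status.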

Indexing

With a fresh database, the user can generate stats reports for the entire repository with run_indexer.py.

Usage: run_indexer.py [options]

Options:
  -h, --help            show this help message and exit
  -c CONFIG_FILE, --config=CONFIG_FILE
                        Configuration file
  -o OUTPUT_DIR, --output_dir=OUTPUT_DIR
                        Directory for results files.

There is another option to generate statistics separately for communities, collections, and items. Each of these scripts generally takes the form:

python run_community_indexer.py -c config/application.yml -o /tmp/reports

Reports

When all indexing is complete and the metadata and stats are in the database, it's time to generate Excel reports. This can be done with run_reports.py.

Usage: run_reports.py [options]

Options:
  -h, --help            show this help message and exit
  -c CONFIG_FILE, --config=CONFIG_FILE
                        Configuration file
  -o OUTPUT_DIR, --output_dir=OUTPUT_DIR
                        Directory for results files.
  -e, --email           Send email with stats reports to admin(s)?

For example:

python run_reports.py -c config/application.yml -o /tmp/reports -e  

Cron job

To make it easier to generate statistical reports on a regular basis, the indexing and reports processes have been combined into a single script, run_cron.py, which runs in a similar way to the other scripts.

Usage: run_cron.py [options]

Options:
  -h, --help            show this help message and exit
  -c CONFIG_FILE, --config=CONFIG_FILE
                        Configuration file
  -o OUTPUT_DIR, --output_dir=OUTPUT_DIR
                        Directory for results files.
  -e, --email           Send email with stats reports to admin(s)?

For example:

python run_cron.py -c config/application.yml -o /tmp/reports -e  
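To schedule this command, a crontab entry has to change into the repository directory and use the virtual environment's interpreter. The sketch below builds such an entry as a string. `crontab_line` is a hypothetical helper, `/opt/dspace-reports` is an assumed install location, and `venv/bin/python` assumes the virtual environment was created as `venv` per the setup section.

```python
import shlex


def crontab_line(schedule, repo_dir, output_dir, email=True):
    """Return one crontab entry that runs run_cron.py from repo_dir."""
    command = [
        "venv/bin/python", "run_cron.py",
        "-c", "config/application.yml",
        "-o", output_dir,
    ]
    if email:
        command.append("-e")
    # Change directory first so the relative venv and config paths resolve.
    return f"{schedule} cd {shlex.quote(repo_dir)} && {shlex.join(command)}"


# "0 2 * * 0" means 02:00 every Sunday.
print(crontab_line("0 2 * * 0", "/opt/dspace-reports", "/tmp/reports"))
```

The printed line can be pasted into the crontab of the user that owns the repository checkout. Quoting with `shlex` keeps paths containing spaces intact.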

License

This code is licensed under the GNU General Public License (GPL) V3.

Thanks

NOTE: Special thanks to the DSpace Statistics API project, on which the Solr queries for views and downloads in this project are based.

Orth, A. 2018. DSpace statistics API. Nairobi, Kenya: ILRI. https://hdl.handle.net/10568/99143

Contact

For questions, comments or assistance please contact [email protected].