.. toctree::

2024-08-13

- Fix bug in retrieving information on authors in the search group via the Scopus Author Search API.
- Fix bugs related to the progress bars.
- Improve, update and fix documentation.

2023-11-05

- Rename .create_fields_sources_list() to .get_field_source_information().
- Use tqdm to print progress bars.
- Adopt improved format of field-source assignment and source information.
- Update usage of third-party packages.
- Improve documentation, add copy-code button, update code examples.

2023-04-23

- Drop usage of configuration file and recommend project-specific databases.
- In class Scientist(), rename properties .country -> .affiliation_country and .affiliation -> .affiliation_name.
- In class Scientist(), add property .affiliation_type.
- In .create_fields_sources_list(), make use of parameter "verbose".
- Make retrieval of affiliation-related information robust to missing data (404 errors).
- Pass on "refresh" parameter from inform_matches().
- Use pyproject.toml for packaging, drop pbr (PEP 621).
- Use XDG-compliant file storage for support files in ~/.cache/sosia/.
- Improve various methods and functions for stability and speed.

2022-01-20

- In .inform_matches(), remove the computation of abstract similarity and reference list similarity, together with the corresponding keywords "abstract_sim" and "reference_sim"; nltk and scikit-learn are no longer required.
- In .inform_matches(), add "num_cited_refs", the number of jointly cited references up until the provided year.
- Increase robustness to Scopus server problems.
- Allow refreshing downloaded results when using very large stacked source-based searches.
- Fix bug with integer conversion when using pandas > 1.1.5.
- Require pybliometrics >= 3.2.0.

2020-12-08

- Fix bug when creating a new config.ini.
- Require pybliometrics >= 2.7.2.

2020-12-04

- End support for Python 3.5.
- Reorganize config.ini.
- In Original(), add parameters "sql_name" and "first_year_search", and rename parameters "year" -> "treatment_year", "year_margin" -> "first_year_margin", and "search_affiliations" -> "affiliations".
- In Original().find_matches(), remove parameter "ignore_first_id".
- In Original(), change default values for parameters: "year_margin": 0.2, "pub_margin": 0.2, "cits_margin": 0.2, "coauth_margin": 0.2.
- Rename function create_cache() to make_database().
- Rename tables in the SQLite database: author_size becomes author_pubs, author_cits_size becomes author_ncits.
- In Original().find_matches(), remove parameters "information", "stop_words" and "tfidf_kwds"; always create a plain list.
- Add property .matches to Original().
- Create new method Original().inform_matches() to add additional information to matches.
- Use externally provided list of sources and their fields.
- Remove unused property Original().city.
- Raise warning if there are too few publications to determine a field.
- Allow integer values for "refresh" in all instances, require pybliometrics >= 2.7.
- Require numpy.
- Fix bug originating from missing reference EIDs.
- Fix bug originating from missing source IDs.
- Improve documentation, add tutorial.
- Add citation dunder.

2020-03-17

- Update docs w.r.t. the usage of pybliometrics.
- Add support for Python 3.8 and Python 3.9.
- Add missing required package lxml and require sklearn>=0.22.1.
- Correct verbose output of .find_matches() w.r.t. completeness of reference lists and abstracts of matches.
- Check for existence of the search group in .find_matches().
- In .find_matches(), fix bug in computing the cosine similarity when reference lists or abstracts are completely missing.
- In .get_publication_language(), fix bug resulting from bad downloads of abstracts.
- In Original(), fix bug resulting from unclean source ID information in Scopus search results.

2019-11-26

- Introduce internal SQLite database to store results from stacked queries.
- Comply with pybliometrics 2.2 or higher to make use of integrity_fields.
- In Original(), add parameter "num_citations" to filter on the number of citations as well.
- In Original(), add parameter "period" to allow matching on information derived from a user-provided period only.
- In Original(), add parameter "search_affiliations" to enable subsetting on matches from a list of specific affiliations.
- Add "num_citations" to information of matches.
- Introduce internal config file.
- Attempt to download most recent sources list from scopus.com during .create_fields_sources_list().
- In find_country(), add "refresh" parameter and fix bugs related to wrong views and to the search not continuing.
- In get_main_field(), return the most common 2-digit ASJC code and the most common 4-digit ASJC field.
- Attempt to extract the URL of the Scopus source list via web scraping.
- In find_matches(), provide only desired information.
- In find_matches(), fix bug with missing references or abstracts.
- Use decorators for methods.

2019-02-21

- Introduce new class Scientist() to be used as parent class of Original() and others.
- Add property language to Scientist() (and thus Original() and matches).
- Allow setting of properties of Original() and Scientist().
- Make the provision of additional information of matches optional.
- Add source names to .sources and .search_sources.
- Re-download abstracts if language information is missing.
- In Original(), allow list of Author IDs.
- In Original(), add optional parameter to condition retrieved information on a list of EIDs.
- Exclude "multidisciplinary" from list of fields for main field determination.
- Give preference to non-general field during main field determination.
- Use most recent affiliation to identify country.
- Filter focal researcher from .search_group.
- Fix bug in computation of relevant year range.
- Fix bug resulting from missing source IDs.
- Fix bug resulting in redundant counts during .define_search_group().
- Use pair-wise tfidf-vectorization (instead of group-wise vectorization).
- When the focal researcher's list of publications is empty, do not compute similarity measures.
- In clean_abstracts(), remove copyright statement in next-to-last sentence as well.
- Use error messages from Scopus for case-specific error handling.
- Simplify functions and classes and refactor internally.
- Make stacked_query() robust to a group given as a list of int.
- Enable internal function query_journal() to perform stacked queries.

2018-11-25

- Activate method chaining.
- Outsource functions compute_cosine, print_progress, margin_range, raise_non_empty and query to utils.py and add tests for them.
- Fix bugs in .define_search_group(), margin_range() and _get_refs().
- Implement clean_abstracts() to remove copyright statements from abstracts.
- By default, use sklearn's default settings for tfidf-vectorization of abstracts.
- Simplify code.

2018-11-23

- Initial release.