Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up filterReadAccess for free text searches #277

Open
patrick-austin opened this issue Feb 21, 2022 · 0 comments
Open

Speed up filterReadAccess for free text searches #277

patrick-austin opened this issue Feb 21, 2022 · 0 comments
Assignees

Comments

@patrick-austin
Copy link
Contributor

Current behaviour

Currently, when performing a free text search using Lucene, we get results in batches of 1000, then iterate over these one by one:

private void filterReadAccess(List<ScoredEntityBaseBean> results, List<ScoredEntityBaseBean> allResults,
int maxCount, String userId, EntityManager manager, Class<? extends EntityBaseBean> klass)
throws IcatException {
logger.debug("Got " + allResults.size() + " results from Lucene");
for (ScoredEntityBaseBean sr : allResults) {
long entityId = sr.getEntityBaseBeanId();
EntityBaseBean beanManaged = manager.find(klass, entityId);
if (beanManaged != null) {
try {
gateKeeper.performAuthorisation(userId, beanManaged, AccessType.READ, manager);
results.add(new ScoredEntityBaseBean(entityId, sr.getScore()));
if (results.size() > maxEntities) {
throw new IcatException(IcatExceptionType.VALIDATION,
"attempt to return more than " + maxEntities + " entities");
}
if (results.size() == maxCount) {
break;
}
} catch (IcatException e) {
// Nothing to do
}
}
}
}

Performing authorisation first checks for root access, public tables, and failing these, gets and evaluates rules relevant to that single entity's ID:

@NamedQuery(name = "Rule.ReadQuery", query = "SELECT DISTINCT r.crudJPQL FROM Rule r LEFT JOIN r.grouping g LEFT JOIN g.userGroups ug LEFT JOIN ug.user u WHERE (u.name = :member OR g IS NULL) AND r.bean = :bean AND r.r = TRUE"),

crudJPQL = "SELECT COUNT(" + rw.getIdPath() + ") FROM " + rw.getFrom() + " WHERE " + rw.getIdPath() + " = :pkid"
+ (rw.getWhere().isEmpty() ? "" : " AND (" + rw.getWhere() + ")");

Desired behaviour

There is an existing method which instead takes authorises based on several entities at once:

public List<EntityBaseBean> getReadable(String userId, List<EntityBaseBean> beans, EntityManager manager) {
if (beans.size() == 0) {
return beans;
}
EntityBaseBean object = beans.get(0);
Class<? extends EntityBaseBean> objectClass = object.getClass();
String simpleName = objectClass.getSimpleName();
if (rootUserNames.contains(userId)) {
logger.info("\"Root\" user " + userId + " is allowed READ to " + simpleName);
return beans;
}
TypedQuery<String> query = manager.createNamedQuery(Rule.INCLUDE_QUERY, String.class)
.setParameter("member", userId).setParameter("bean", simpleName);
List<String> restrictions = query.getResultList();
logger.debug("Got " + restrictions.size() + " authz queries for READ by " + userId + " to a "
+ objectClass.getSimpleName());
for (String restriction : restrictions) {
logger.debug("Query: " + restriction);
if (restriction == null) {
logger.info("Null restriction => READ permitted to " + simpleName);
return beans;
}
}
/*
* IDs are processed in batches to avoid Oracle error: ORA-01795:
* maximum number of expressions in a list is 1000
*/
List<String> idLists = new ArrayList<>();
StringBuilder sb = null;
int i = 0;
for (EntityBaseBean bean : beans) {
if (i == 0) {
sb = new StringBuilder();
sb.append(bean.getId());
i = 1;
} else {
sb.append("," + bean.getId());
i++;
}
if (i == maxIdsInQuery) {
i = 0;
idLists.add(sb.toString());
sb = null;
}
}
if (sb != null) {
idLists.add(sb.toString());
}
logger.debug("Check readability of " + beans.size() + " beans has been divided into " + idLists.size()
+ " queries.");
Set<Long> ids = new HashSet<>();
for (String idList : idLists) {
for (String qString : restrictions) {
TypedQuery<Long> q = manager.createQuery(qString.replace(":pkids", idList), Long.class);
if (qString.contains(":user")) {
q.setParameter("user", userId);
}
ids.addAll(q.getResultList());
}
}
List<EntityBaseBean> results = new ArrayList<>();
for (EntityBaseBean bean : beans) {
if (ids.contains(bean.getId())) {
results.add(bean);
}
}
return results;
}

@NamedQuery(name = "Rule.IncludeQuery", query = "SELECT DISTINCT r.includeJPQL FROM Rule r LEFT JOIN r.grouping g LEFT JOIN g.userGroups ug LEFT JOIN ug.user u WHERE (u.name = :member OR g IS NULL) AND r.bean = :bean AND r.r = TRUE"),

includeJPQL = "SELECT " + rw.getIdPath() + " FROM " + rw.getFrom() + " WHERE " + rw.getIdPath() + " IN (:pkids)"
+ (rw.getWhere().isEmpty() ? "" : " AND (" + rw.getWhere() + ")");

We should use this for the free text search, as we already get results in batches, and reducing the number of queries to the DB for rules should speed up the free text search component without affecting the results returned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant