Matthew Reidsma

Work Notes

Updates from the GVSU Libraries’ Web Team.

Summon Relevance Ranking Updates Scheduled for 8/18

This just came through on the Summon listserv, regarding changes to the Summon relevance ranking:


Next Tuesday, August 18, as part of our ongoing initiative to improve Summon’s relevance, we will release an update to Summon’s relevance ranking algorithm. Only the ranking of search results will be affected; the number of search results returned for any given query will not change. The new algorithm is expected to mainly improve the relevance of exploratory searches, complementing the known-item search improvements we released earlier this year.

Our approach for improving relevance

We take a data-oriented approach to improving Summon’s relevance. In addition to collecting and analyzing relevance metrics, such as query and session abandonment rates, MRR (mean reciprocal rank), and DCG (discounted cumulative gain), we also maintain a large database of all relevance issues reported by our clients, users, and internal team members. We analyze each issue and identify the factor(s) causing it. The relevance metrics and the relevance issue database play a key role in designing a new ranking algorithm or feature for improving Summon’s relevance.
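For readers unfamiliar with the metrics mentioned above, MRR and DCG are standard information-retrieval measures, not anything Summon-specific. A minimal sketch of how they are typically computed (the function names and inputs here are illustrative, not ProQuest's actual code):

```python
import math

def mrr(first_relevant_ranks):
    """Mean reciprocal rank: average of 1/rank of the first relevant
    result for each query; a rank of 0 means nothing relevant was found."""
    return sum(1.0 / r if r else 0.0 for r in first_relevant_ranks) / len(first_relevant_ranks)

def dcg(relevances):
    """Discounted cumulative gain: graded relevance scores, discounted
    logarithmically by rank position (positions are 1-indexed)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))
```

Both metrics reward putting relevant items near the top: for example, `mrr([1, 2, 0])` averages a perfect hit, a second-place hit, and a miss to 0.5, while `dcg` discounts a relevance score of 2 at position 3 more heavily than the same score at position 1.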

Summary of the improvements

Two primary improvements in the new ranking algorithm are the following:

  • Improved balance between dynamic and static rank: Summon’s relevance ranking algorithm uses two types of relevance factors: the dynamic rank and the static rank. The dynamic rank factors describe how well a given query matches each record. The static rank factors represent the importance or value of each record. One common type of relevance issue we have observed in the past involves cases where the influence of the dynamic rank is too strong, such that records with low static ranks, such as old publications and less important content types, appear among the top search results. The new algorithm better balances the dynamic rank and the static rank, and it should reduce the number of such issues.
  • Improved balance of short and long titles in the top results: previously, Summon’s ranking algorithm tended to emphasize records with short titles, especially those that closely matched the query string. The new algorithm reduces the influence of field length normalization and the exact-match boost, and as a result the top search results include a better mix of short and long titles that are relevant to the query.
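The two improvements above can be pictured with a generic ranking sketch. This is not Summon's actual algorithm; it assumes a hypothetical linear blend of dynamic and static rank, and uses BM25-style length normalization (the `b` parameter) to show why dampening it reduces the short-title bias the announcement describes:

```python
def blended_score(dynamic_rank, static_rank, static_weight=0.4):
    """Hypothetical blend of query-match (dynamic) and record-importance
    (static) signals; raising static_weight lets recency and content type
    counter a strong but low-value text match."""
    return (1 - static_weight) * dynamic_rank + static_weight * static_rank

def length_normalized_tf(tf, field_len, avg_field_len, k1=1.2, b=0.75):
    """BM25-style term-frequency saturation with pivoted length
    normalization. Higher b penalizes long fields (e.g. long titles)
    more; lowering b reduces the advantage of very short titles."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * field_len / avg_field_len))
```

With `b=0.75`, a term match in a title twice the average length scores noticeably lower than the same match with `b=0`, which is the kind of short-title emphasis the new algorithm dials back.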

There are a number of other improvements in the new algorithm. One noteworthy property is that short, general topical queries (e.g., linguistics, global warming) tend to return more books, eBooks, reference works, and journals among the top results, while long, specific topical queries (e.g., linguistics universal grammar, global warming Kyoto protocol) tend to return more journal articles among the top results.

Examples

Here are a few examples that demonstrate the improvements. Please note that the search results depend on the content, so these examples may not apply to your instance of Summon.

dog law => The top results returned by the old algorithm were mostly items titled “Dog Law” and included very old journal and magazine articles. The top results returned by the new algorithm are more balanced and include more recent titles, such as “Dangerous dogs law updated” (Journal Article, 2007) and “Animal law and dog behavior” (Book, 1999).

autism aba therapy for young children => The top results returned by the old algorithm contained many books and other items titled just “Autism”. The top results returned by the new algorithm include longer titles, such as “A step-by-step ABA curriculum for young learners with autism spectrum disorders (age 3-10)” (eBook, 2013).

How to provide feedback or report a relevance issue

Prior to approving the improvements for release, we also do some qualitative analysis by enlisting the help of the Summon Advisory Board and other customers. These institutions provide qualitative feedback for the changes we propose. In that group, 80 percent of the feedback ranked the new algorithm as either “better” or “much better” than the current algorithm, with the remaining 20 percent ranking the quality as at least being comparable to our existing algorithm.

However, we know that improving relevance is an ongoing process, and that while this release will improve a number of use cases, there are more out there for us to address. If you would like to provide feedback, we have a new e-mail address for providing feedback and reporting relevance issues: summon.relevance.feedback@proquest.com.

We would appreciate it if you could follow the template below:

Please use “Relevance feedback from (your institution name)” in the subject line. Please also include:

  • Your name and e-mail address
  • Query strings and other information, such as refinement and facet settings
  • URLs of the problematic search cases (please feel free to also attach screenshots)
  • Explanation of the issue

All reported search cases will be analyzed and added to our relevance issue database, and will be considered in our ongoing and future relevance improvement efforts. Please note that, in general, you will not receive a response for messages sent to this address. If you require a response, please report your issue via the usual customer support route.