Case Western Reserve University

CaseExplorer

 
 




CaseExplorer - The Case Digital Library Project

In a nutshell, this project investigates web querying techniques for accessing web information resources. The term "information resource" refers to large web-accessible resources, such as online literature digital libraries (LDLs).

Goal: Contribute to the design and implementation of the infrastructure and services needed for sharing and managing information in a literature digital library context. More specifically, investigate the new technologies required to overcome two critical barriers to effective digital library access:

  • Topic diversity in keyword-based search queries, which degrades the quality of LDL searches.
  • Lack of powerful ranking and filtering mechanisms that allow users to find relevant publications and rank them efficiently.

Toward eliminating these two barriers, we are developing models and mechanisms for (i) clustering publications, that is, placing publications into proper “contexts”, (ii) arranging contexts into a hierarchical structure that facilitates search within contexts at different levels of granularity, (iii) effectively ranking publications, publication venues, and authors, and, using these rankings, (iv) ranking search results and classifying them by topic, in order to help users customize their search criteria.

Projects and Publications

As part of CaseExplorer, we have designed and built a set of seven prototype tools for searching and querying literature digital libraries (LDLs) with new techniques [CE]. All of the tools are prototypes, developed in conjunction with the research questions investigated in the associated publications, which are discussed below.

Score functions for LDLs. Are score functions for papers, authors, and publication venues useful for LDLs, and what are the alternatives? This remains an ongoing research question. Score functions can be text-, citation-, or pattern-based; toward answering the question, we have defined multiple text- and citation-based score functions for papers, authors, and publication venues, and evaluated their accuracy, separability, and independence. We have also refined paper similarity functions and evaluated them for accuracy, separability, and independence.
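
As a rough illustration of what citation-based score functions for papers, authors, and venues can look like, the sketch below uses assumed definitions that are not necessarily the ones CaseExplorer evaluates: a paper's score is its raw citation count, and an author's or venue's score is the mean score of that author's or venue's papers. The toy data and the function names `paper_scores` and `aggregate_scores` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: paper id -> (author list, venue).
PAPERS = {
    "p1": (["alice", "bob"], "SIGMOD"),
    "p2": (["alice"], "VLDB"),
    "p3": (["carol"], "SIGMOD"),
    "p4": (["bob", "carol"], "ICDE"),
}
# Hypothetical citation edges: (citing paper, cited paper).
CITATIONS = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p4", "p2")]


def paper_scores(papers, citations):
    """Assumed paper score: the number of incoming citations."""
    scores = {pid: 0 for pid in papers}
    for _citing, cited in citations:
        scores[cited] += 1
    return scores


def aggregate_scores(papers, pscores, key):
    """Assumed author/venue score: the mean score of the group's papers.

    `key` maps a paper record to the list of groups it belongs to.
    """
    groups = defaultdict(list)
    for pid, record in papers.items():
        for group in key(record):
            groups[group].append(pscores[pid])
    return {group: mean(values) for group, values in groups.items()}


pscores = paper_scores(PAPERS, CITATIONS)
author_scores = aggregate_scores(PAPERS, pscores, key=lambda r: r[0])
venue_scores = aggregate_scores(PAPERS, pscores, key=lambda r: [r[1]])
```

Replacing the mean with a sum or a maximum yields alternative author and venue score functions built on the same paper scores.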

Research Pyramids for Improved Publication Ranking. Ranking the publications of an LDL is useful for (i) providing a comparative assessment of publications and (ii) listing relevant LDL search results first in search outputs, enabling users to gather pertinent results quickly and easily. Studies show that effective citation-based scoring functions are highly skewed and have accuracy problems, possibly due to topic diffusion. Can publication ranking be made much more effective by identifying “research pyramids”, that is, publications that belong to highly specific research topics? This work was published in ECDL’07.

Advanced Query Interfaces for LDLs. Can LDLs be searched and queried through powerful query interfaces and still scale? In particular, can an LDL with a relational database backend scale? To answer these questions, we have designed the Advanced Query Interface (AQI), built an AQI portal for the ACM Anthology, and evaluated its performance empirically. We have classified queries as simple, intermediate, or complex, and studied questions such as “what is the best way to execute AQI queries?”. The results are reported in John Chmura’s thesis and in multiple publications.
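
The kind of relational backend the question above refers to can be sketched with SQLite. The schema (`paper`, `wrote`, `cites`), the toy rows, and the query are hypothetical and are not the AQI's actual schema or execution strategy; the query simply combines a join, a filter, and a citation-count aggregate to rank one author's papers.

```python
import sqlite3

# Hypothetical minimal schema for an LDL stored in a relational database.
SCHEMA = """
CREATE TABLE paper (id TEXT PRIMARY KEY, title TEXT, venue TEXT, year INTEGER);
CREATE TABLE wrote (paper_id TEXT REFERENCES paper(id), author TEXT);
CREATE TABLE cites (citing TEXT REFERENCES paper(id), cited TEXT REFERENCES paper(id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO paper VALUES (?, ?, ?, ?)", [
    ("p1", "Query optimization", "SIGMOD", 1999),
    ("p2", "Index structures", "VLDB", 2003),
    ("p3", "Web querying", "SIGMOD", 2005),
])
conn.executemany("INSERT INTO wrote VALUES (?, ?)", [
    ("p1", "alice"), ("p2", "alice"), ("p3", "bob"),
])
conn.executemany("INSERT INTO cites VALUES (?, ?)", [
    ("p2", "p1"), ("p3", "p1"), ("p3", "p2"),
])

# Papers by a given author since a given year, ranked by citation count.
# The LEFT JOIN keeps uncited papers, which then get a count of 0.
QUERY = """
SELECT p.id, p.title, COUNT(c.citing) AS n_citations
FROM paper AS p
JOIN wrote AS w ON w.paper_id = p.id
LEFT JOIN cites AS c ON c.cited = p.id
WHERE w.author = ? AND p.year >= ?
GROUP BY p.id, p.title
ORDER BY n_citations DESC, p.id
"""

rows = conn.execute(QUERY, ("alice", 1995)).fetchall()
```

Whether such a query should be evaluated as one SQL statement or decomposed into several steps is exactly the kind of execution choice the AQI work examines empirically.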

Context-Based Publication Search Paradigm. Are there other new and useful LDL search paradigms that allow search results (i.e., papers) to be ranked highly accurately? We are presently studying techniques for searching LDLs within predefined “contexts”, where a context comes from the concept hierarchy terms of a community-defined ontology. Examples of such ontologies for the biomedical domain are MeSH, Gene Ontology, and SNOMED-CT. To study these problems, we are building the first version of the Case Anthology Viewer. As additional tools, we have also built the PubMed Abstracts Fulltext Search tool and the Case Anthology Text Search tool. The results of this work have been submitted for journal publication.
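
A minimal sketch of searching within a context, assuming a context is one term of a concept hierarchy and that a paper falls inside the context when any of its annotation terms is that term or one of its descendants. The hierarchy below is a hypothetical toy tree in the style of MeSH headings, not actual MeSH structure, and the keyword match is a plain substring test standing in for a real full-text engine.

```python
from collections import defaultdict

# Hypothetical toy concept hierarchy: child term -> parent term.
PARENT = {
    "Neoplasms": "Diseases",
    "Breast Neoplasms": "Neoplasms",
    "Lung Neoplasms": "Neoplasms",
    "Virus Diseases": "Diseases",
}

# Hypothetical papers: id -> (title, set of ontology terms annotating it).
PAPERS = {
    "d1": ("Gene expression in breast tumors", {"Breast Neoplasms"}),
    "d2": ("Gene expression during viral infection", {"Virus Diseases"}),
    "d3": ("Smoking and lung tumors", {"Lung Neoplasms"}),
}


def descendants(term, parent):
    """Return `term` together with every term below it in the hierarchy."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    result, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(children[t])
    return result


def search_in_context(keyword, context, papers, parent):
    """Keyword search restricted to papers annotated within `context`."""
    scope = descendants(context, parent)
    return sorted(
        pid
        for pid, (title, terms) in papers.items()
        if terms & scope and keyword.lower() in title.lower()
    )
```

Searching for “gene expression” in the Neoplasms context returns only d1, whereas the broader Diseases context also returns d2; moving up or down the hierarchy changes the granularity of the search.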

Finding Related Papers in a Context-Based Publication Search Environment. Existing approaches for finding publications “related” to a given publication in a literature digital library do not take publication topics into account when computing relatedness, which allows topic diffusion across query outputs. We have recently proposed a new way to measure “relatedness” that incorporates the “contexts” of publications. The results of this work have been published in ECDL’07.
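
One simple way to incorporate contexts into relatedness, sketched here as an assumption rather than as the measure proposed in the ECDL’07 paper, is to compute ordinary text similarity (bag-of-words cosine) but consider only candidate papers that share at least one context with the query paper. The papers, their contexts, and the helper names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical papers: id -> (abstract text, set of contexts it was assigned to).
PAPERS = {
    "a": ("ranking search results with citation scores", {"ranking"}),
    "b": ("citation scores for ranking publications", {"ranking"}),
    "c": ("citation scores of genes in pathways", {"genomics"}),
    "d": ("clustering publications into contexts", {"clustering", "ranking"}),
}


def cosine(text1, text2):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    v1, v2 = Counter(text1.split()), Counter(text2.split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (math.sqrt(sum(x * x for x in v1.values()))
            * math.sqrt(sum(x * x for x in v2.values())))
    return dot / norm if norm else 0.0


def related_papers(pid, papers, k=3):
    """Rank other papers by text similarity, considering only those that
    share at least one context with `pid`."""
    text, contexts = papers[pid]
    candidates = [
        (cosine(text, other_text), other)
        for other, (other_text, other_contexts) in papers.items()
        if other != pid and contexts & other_contexts
    ]
    # Highest similarity first; ties broken by paper id for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    return [other for _score, other in candidates[:k]]
```

Paper c shares the words “citation scores” with a, so a purely textual measure would rank it above d; restricting candidates by context removes it, which is the topic-diffusion effect described above.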

Changing Database Query Engines to Support Score Function Computations. We have studied the approach of assigning scores to objects in a database (e.g., papers represented as tuples) and automatically propagating them to scores of query outputs, so that query outputs can be ranked automatically. The results were published in ACM TODS, Dec. 2004. This approach, which requires changes to database query engines, has not yet been empirically evaluated.
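
The idea of propagating tuple scores through query operators can be sketched in a few lines. The operator semantics here are assumptions for illustration, not the TODS 2004 definitions: selection passes scores through unchanged, an equi-join combines the two input scores with a pluggable function (the product by default), and the output is ranked by the resulting score. The relations, attribute names, and scores are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A scored tuple: (attribute dict, score).
Row = Tuple[Dict[str, object], float]


def select(rows: List[Row], pred: Callable[[Dict[str, object]], bool]) -> List[Row]:
    """Selection keeps qualifying tuples and passes their scores through unchanged."""
    return [(vals, s) for vals, s in rows if pred(vals)]


def join(left: List[Row], right: List[Row], attr_l: str, attr_r: str,
         combine: Callable[[float, float], float] = lambda a, b: a * b) -> List[Row]:
    """Nested-loop equi-join that merges attributes and combines input scores.

    Using the product as the default combining function is an assumption.
    """
    out = []
    for lv, ls in left:
        for rv, rs in right:
            if lv[attr_l] == rv[attr_r]:
                out.append(({**lv, **rv}, combine(ls, rs)))
    return out


def rank(rows: List[Row]) -> List[Row]:
    """Order query output by propagated score, highest first."""
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical scored relations.
papers = [
    ({"pid": "p1", "venue": "V1", "year": 2001}, 0.9),
    ({"pid": "p2", "venue": "V2", "year": 2003}, 0.6),
    ({"pid": "p3", "venue": "V1", "year": 1998}, 0.8),
]
venues = [
    ({"vname": "V1"}, 0.5),
    ({"vname": "V2"}, 1.0),
]

recent = select(papers, lambda v: v["year"] >= 2000)
result = rank(join(recent, venues, "venue", "vname"))
```

Here p2 (0.6 × 1.0 = 0.6) ranks ahead of p1 (0.9 × 0.5 = 0.45) even though p1 has the higher paper score, because the venue score also contributes. Passing `combine=min` would give a different, equally plausible propagation rule.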

To study the questions above, we have downloaded, fully parsed, and extracted metadata from publications in three domains: (1) the ACM SIGMOD Anthology, a digital library for the database systems research community (about 15,000 publications); (2) computer science publications in the ACM Digital Library, IEEE Xplore, and the VLDB organization (about 70,000 publications); and (3) genomics-related publications in PubMed Central (about 80,000 publications).

CaseExplorer is a prototype and is continually being improved. To use the presently available tools, please see the following links.


Database Lab | Department of EECS, Olin-503 | Case Western Reserve University | Cleveland, Ohio 44106