In this research, our goal is to build a framework for a search-keyword suggestion tool. As the success of Google Suggest shows, a search-keyword suggester can help users develop search keywords that are likely to lead to a successful search. Unlike the Google Suggest keyword suggester, which is search-history based, our search-keyword suggester is content-based; that is, it suggests keywords from the ‘most promising’ phrases observed in the digital library repository.
The new search-keyword suggester is based on an a priori analysis of the publication collection of the digital library at hand. We (i) parse publication texts using the Link Grammar Parser, a syntactic parser of English, (ii) group publications based on their most-specific research topics, i.e. research-pyramids, (iii) annotate groups with key-phrases, and (iv) use the parser output to build a hierarchical structure of simple and compound tokens used to suggest search keywords on the fly.
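As a rough illustration of step (iv), the sketch below stores key-phrases in a token-level trie and suggests completions for a typed prefix. It is only a minimal sketch under stated assumptions: the `PhraseTrie` class, its methods, and the frequency-based ordering are hypothetical, and the phrases are supplied by hand rather than extracted with the Link Grammar Parser.

```python
class PhraseTrie:
    """Hypothetical token-level trie: each edge is one word of a phrase."""

    def __init__(self):
        self.children = {}  # token -> PhraseTrie
        self.count = 0      # how many times a phrase ending here was added

    def add(self, phrase):
        """Insert a simple (one-word) or compound (multi-word) phrase."""
        node = self
        for token in phrase.lower().split():
            node = node.children.setdefault(token, PhraseTrie())
        node.count += 1

    def suggest(self, prefix, limit=5):
        """Return stored phrases that start with the given tokens,
        most frequently added first (ties broken alphabetically)."""
        tokens = prefix.lower().split()
        node = self
        for token in tokens:
            node = node.children.get(token)
            if node is None:
                return []
        found = []
        stack = [(node, tokens)]
        while stack:
            current, path = stack.pop()
            if current.count:
                found.append((current.count, " ".join(path)))
            for token, child in current.children.items():
                stack.append((child, path + [token]))
        found.sort(key=lambda item: (-item[0], item[1]))
        return [phrase for _, phrase in found[:limit]]


# Assumed example phrases, standing in for annotated key-phrases.
trie = PhraseTrie()
for phrase in ["citation graph", "citation graph", "citation analysis", "query ranking"]:
    trie.add(phrase)
suggestions = trie.suggest("citation")
```

Ordering by how often a phrase was added is a stand-in for whatever notion of "most promising" the actual tool uses.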
Sulieman Bani Ahmad, Gultekin Ozsoyoglu
The goals of this project are (i) to validate the research-pyramid model of research evolution, and (ii) to propose and empirically evaluate approaches to identify research pyramids in literature citation graphs. Two approaches were proposed. The first, Link-Based Research Pyramid identification, captures research pyramids by identifying pyramid-like structures in the citation graph of the publication set. The second, Proximity-Based Research Pyramid identification, uses a graph-based proximity measure, namely SimRank, to compute similarities between publications, and then restructures the most similar publications into a research pyramid.
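For readers unfamiliar with SimRank, the following is a minimal sketch of the basic iterative form of the measure on a small citation graph: two papers are similar if they are cited by similar papers. The function name, the decay factor `c=0.8`, the iteration count, and the toy graph are assumptions for illustration; this is not the project's implementation, and the restructuring of similar papers into pyramids is not shown.

```python
from itertools import product


def simrank(cites, c=0.8, iterations=10):
    """Basic SimRank on a citation graph.

    cites maps each paper to the set of papers it cites.
    Returns a dict mapping (paper_a, paper_b) to a similarity in [0, 1].
    """
    nodes = set(cites)
    for targets in cites.values():
        nodes |= set(targets)
    nodes = sorted(nodes)

    # In-neighbours of a paper are the papers that cite it.
    in_nbrs = {n: [] for n in nodes}
    for src, targets in cites.items():
        for dst in targets:
            in_nbrs[dst].append(src)

    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
            elif not in_nbrs[a] or not in_nbrs[b]:
                new[(a, b)] = 0.0  # uncited papers share no citing context
            else:
                total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


# Toy graph (assumed): P3 and P4 both cite P1 and P2.
toy = {"P3": {"P1", "P2"}, "P4": {"P1", "P2"}}
scores = simrank(toy)
```

In this toy graph P1 and P2 are cited by exactly the same papers, so they receive a nonzero similarity, while P3 and P4, which nobody cites, score zero against each other.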
Sulieman Bani Ahmad, Gultekin Ozsoyoglu
In this project, our goals are to (a) provide a solution to the ODL search output ranking problem caused by topic diffusion, by grouping search outputs at the most-specific (detailed) topic level without identifying the topics themselves, (b) eliminate the low separability problem of score functions, and (c) improve the accuracy of citation-based publication score functions. Our approach uses the research pyramid model to improve the separability and accuracy of publication scores, and is based on normalizing publication scores within a limited scope, namely, within individual research pyramids. These improvements come from the fact that publications are now compared to their peers within their peer groups, namely, the publications of their own research pyramid, which are on the same topic.
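The idea of normalizing scores within a research pyramid can be sketched as follows. The text does not specify the normalization formula, so dividing by the highest score in each pyramid is an assumption made here for illustration, as are the function name and the sample data.

```python
from collections import defaultdict


def normalize_within_groups(scores, group_of):
    """Rescale each paper's score by the maximum score in its own group.

    scores:   dict paper -> raw (e.g. citation-based) score
    group_of: dict paper -> identifier of its research pyramid
    """
    max_by_group = defaultdict(float)
    for paper, score in scores.items():
        group = group_of[paper]
        max_by_group[group] = max(max_by_group[group], score)

    normalized = {}
    for paper, score in scores.items():
        top = max_by_group[group_of[paper]]
        normalized[paper] = score / top if top > 0 else 0.0
    return normalized


# Assumed data: pyramid "p1" is a heavily cited topic, "p2" a small one.
raw = {"a": 10.0, "b": 5.0, "c": 0.2, "d": 0.1}
pyramid = {"a": "p1", "b": "p1", "c": "p2", "d": "p2"}
normalized = normalize_within_groups(raw, pyramid)
```

With raw scores, paper "c" looks negligible next to "a"; after normalization both are the top papers of their own pyramids, which is the peer-group comparison the paragraph describes.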
Sulieman Bani Ahmad, Gultekin Ozsoyoglu
[SG07], [SG08], [A03], [GA03].
At present, ranking functions of literature digital libraries are either ineffective or do not exist at all. For example, PubMed, the largest literature digital library in the world with more than 14 million publications, does not have a paper-scoring system for ranking papers that satisfy a keyword search. Also, publication topics in PubMed are diverse; the publications PubMed returns for a general keyword-based search routinely fall into multiple topics (i.e., topic diffusion across search results), some of which are not of interest to users. PubMed simply lists the publications returned by a search query in descending order of their PubMed IDs or publication years, thereby forcing users to scan large numbers of publications and potentially miss important ones. Our proposal is to assign publications to pre-specified ontology-based contexts, compute relevancy scores for papers with respect to their assigned context(s), perform search within automatically selected contexts, and rank and return selected papers within their contexts. With this new approach, (a) the output is enhanced by a highly useful paper classification (based on contexts), which also eliminates topic diffusion and reduces output size, and (b) only semantically related papers in contexts of interest, as opposed to all papers, are involved in the ranking.
Nattakarn Ratprasartporn, Ali Cakmak, Sulieman Bani Ahmad, Gultekin Ozsoyoglu
[NG07], [RJ07], [SG07], [SG08].
Context-based literature digital library search is a new searching paradigm that allows for an effective ranking of query outputs and controls the diversity of query output topics. Contexts are defined by pre-specified ontology-based terms, and the paper set of a context is located based on the semantic properties of the ontology (context) term. In order to provide a comparative assessment of papers in a context and to effectively rank papers returned in search outputs, prestige scores are attached to all papers with respect to their assigned contexts. This project explores the effectiveness of different prestige score functions for the context-based environment, namely, citation-based, text-based, and pattern-based score functions. PubMed publications are used as the test bed for the experiments, and Gene Ontology is employed as the context hierarchy.
Nattakarn Ratprasartporn, Sulieman Bani Ahmad, Ali Cakmak, Gultekin Ozsoyoglu
[NGEV07], [SAG05], [SG05].
Publication searching based on keywords provided by users is traditional in digital libraries. While useful in many circumstances, the success of locating related publications via the keyword-based searching paradigm depends on how users choose their keywords. Example-based searching, where the user provides an example publication to locate similar publications, is also becoming commonplace in digital libraries. Existing publication similarity measures, needed for example-based searching, fall into two classes, namely, text-based similarity measures from Information Retrieval, and citation-based similarity measures based on bibliographic coupling and/or co-citation. This project explores alternative publication similarity measures, as well as ranking and scoring mechanisms.
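The two citation-based measures named above can be illustrated with raw counts: bibliographic coupling counts the references two papers share, and co-citation counts the papers that cite both. The sketch below uses hypothetical function names and a made-up citation map; practical systems typically normalize these counts, which is not shown.

```python
def bibliographic_coupling(cites, a, b):
    """Number of references that papers a and b have in common."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def co_citation(cites, a, b):
    """Number of papers whose reference lists contain both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


# Assumed citation map: paper -> set of papers it cites.
cites = {
    "A": {"X", "Y"},
    "B": {"Y", "Z"},
    "C": {"A", "B"},
    "D": {"A", "B", "X"},
}
coupling_ab = bibliographic_coupling(cites, "A", "B")  # A and B share reference Y
cocited_ab = co_citation(cites, "A", "B")              # C and D each cite both A and B
```

Bibliographic coupling is fixed once both papers are published, whereas co-citation can grow over time as new papers cite the pair.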
Sulieman Bani Ahmad, Ali Cakmak, Gultekin Ozsoyoglu
[NGEV07], [SAG05], [SG05].
Scientific papers often cite other papers to discuss related work in their field and to point out differences from, or improvements over, similar papers. Based on the citation information, a literature database can be considered as a graph, called the citation graph, where the nodes are the papers and there is a directed edge from paper A to paper B if A cites B. The same setting also applies to the web environment, where nodes are individual web pages or sites and the edges are the hyperlinks from one page to another. Assigning prestige scores to papers or web pages is a common practice. PageRank is currently the most popular ranking algorithm; variations of it are used by almost all search engines to rank web pages and order them accordingly in a search result. PageRank is also used to assign importance scores to papers, using the underlying citation graph as input. Once a paper is published, it takes time for the paper to be recognized and cited by other papers. On average, it may take from 5 to 20 years for a paper to reach its peak prestige score. Therefore, for newly published high-quality papers, PageRank may provide relatively low scores because the paper does not have enough citations shortly after it is published. To tackle this bias, this project focuses on mechanisms to characterize the nature of the very first citations that a paper receives, and to use them as an indicator of the paper's final score. To this end, temporal citation patterns in multiple dimensions are studied.
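To make the bias concrete, here is a minimal sketch of PageRank computed by power iteration on a small citation graph. The damping factor of 0.85, the iteration count, the handling of papers without references, and the toy graph are all assumptions for illustration; the temporal citation patterns studied in the project are not modeled.

```python
def pagerank(cites, damping=0.85, iterations=50):
    """Power-iteration PageRank on a citation graph.

    cites maps each paper to the set of papers it cites.
    Papers with no outgoing citations spread their rank evenly over all papers.
    """
    nodes = set(cites)
    for targets in cites.values():
        nodes |= set(targets)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for _ in range(iterations):
        # Rank held by papers that cite nothing in the collection.
        dangling = sum(rank[p] for p in nodes if not cites.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        for src, targets in cites.items():
            for dst in targets:
                new[dst] += damping * rank[src] / len(targets)
        rank = new
    return rank


# Toy graph (assumed): "old" has been cited twice, "new" not yet at all.
graph = {"new": {"old"}, "mid": {"old"}, "old": set()}
ranks = pagerank(graph)
```

Here "new" receives only the baseline score because nothing cites it yet, regardless of its quality, which is the situation the project aims to correct using early citation patterns.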
Sulieman Bani Ahmad, Ali Cakmak, Gultekin Ozsoyoglu