New Term Weighting Algorithm for Single Document Summarization
Keywords:
Sentencer, Keyword extraction, Sentence-level features, Contextual semantics, Natural language processing
Abstract
Keyword extraction plays a central role in single-document summarization, where the task is to
identify the most salient terms that capture the meaning of a text. Existing unsupervised keyword
extraction approaches, such as RAKE, TextRank, and YAKE, rely primarily on frequency and
statistical co-occurrence. Although effective, they often overlook the structural and semantic
contributions of sentences, which are essential for preserving context, especially in long or
complex documents. In this work, we propose Sentencer, a novel unsupervised term weighting
algorithm that integrates sentence-level features into keyword scoring. Unlike frequency-only
approaches, Sentencer leverages contextual relevance, sentence length, intra-sentence probability,
and sentence position to refine keyword importance. The algorithm is evaluated against YAKE on
three benchmark datasets: SemEval (scientific papers), Inspec (abstracts), and a collection of news
reports. Results show that Sentencer performs particularly well on long, complex texts such as
scientific papers, where it achieves superior precision and recall compared to YAKE, albeit at the
cost of computational efficiency. Furthermore, Sentencer offers a secondary benefit as a diagnostic
tool for analyzing sentence behavior and word distribution dynamics within documents. On short
scientific abstracts, Sentencer outperformed YAKE by 3.5%, and on full scientific articles it
outperformed YAKE by 1%. On short news articles (100 to 400 words), however, YAKE
outperformed Sentencer.
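
To make the idea of sentence-level term weighting concrete, the following is a minimal illustrative sketch and not the Sentencer algorithm itself, whose exact formula is not given in this abstract. It assumes a hypothetical scoring rule in which each word's weight is the sum, over the sentences containing it, of its intra-sentence probability (its count divided by the sentence length) multiplied by a position weight that decays linearly from the first sentence to the last. The function name, the naive sentence splitter, and the linear decay are all assumptions; contextual relevance and stopword filtering are left out.

```python
import re
from collections import Counter


def sentence_feature_scores(text):
    """Toy sentence-aware term weighting (hypothetical, not the paper's formula)."""
    # Naive sentence split on '.', '!' and '?' (an assumption for illustration).
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = len(sentences)
    scores = Counter()
    for idx, sentence in enumerate(sentences):
        words = re.findall(r"[a-z]+", sentence.lower())
        if not words:
            continue
        counts = Counter(words)
        # Assumed position feature: earlier sentences receive a larger weight.
        position_weight = 1.0 - idx / n_sentences
        for word, count in counts.items():
            # Intra-sentence probability; dividing by the sentence length
            # also means that long sentences dilute each word's contribution.
            intra_prob = count / len(words)
            scores[word] += intra_prob * position_weight
    return dict(scores)


scores = sentence_feature_scores("Cats chase mice. Dogs chase cats.")
top_terms = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, words that appear in both sentences accumulate weight from each, while a word that appears only in the second sentence scores lower than one that appears only in the first, which is the kind of positional effect a frequency-only score would miss.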