The Summary Extraction Algorithm (SEA) was added in version 4.35 of DataparkSearch (December 2005). This automatic summarization algorithm is based on the ideas of Rada Mihalcea described in the paper: Rada Mihalcea and Paul Tarau, An Algorithm for Language Independent Single and Multiple Document Summarization, in Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP), Korea, October 2005.
DataparkSearch's SEA differs from the original algorithm in the following ways:
- Initial weights of graph edges are calculated as a measure of similarity between the 3-gram distributions of the corresponding sentences.
- All graph vertices start with the same initial value (1 / (number of sentences + 1) in the current implementation).
- The Neo PopRank algorithm is used as the ranking algorithm to iterate the values assigned to the vertices.
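The graph construction and ranking steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DataparkSearch's actual code: it assumes character 3-grams, uses cosine similarity as the similarity measure, and substitutes a TextRank-style weighted PageRank iteration for Neo PopRank, whose exact update rule is not described here. The names `rank_sentences`, `summarize`, `damping`, `max_iter` and `tol` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def trigram_distribution(sentence: str) -> Counter:
    """Count the character 3-grams of a lowercased sentence (assumed n-gram unit)."""
    s = sentence.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two 3-gram count vectors (assumed similarity measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences: List[str], damping: float = 0.85,
                   max_iter: int = 50, tol: float = 1e-6) -> List[float]:
    n = len(sentences)
    if n == 0:
        return []
    dists = [trigram_distribution(s) for s in sentences]
    # Initial edge weights: similarity of 3-gram distributions, no self-loops.
    w = [[similarity(dists[i], dists[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    # Every vertex starts with the same value, 1 / (number of sentences + 1).
    scores = [1.0 / (n + 1)] * n
    # Hypothetical stand-in for Neo PopRank: TextRank-style weighted PageRank.
    for _ in range(max_iter):
        new = [
            (1 - damping) + damping * sum(
                w[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(sentences: List[str], k: int = 3) -> List[str]:
    """Return the k highest-ranked sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In this sketch a sentence that shares no 3-grams with any other sentence receives only the base score `1 - damping`, so it ranks below every sentence that is connected to the rest of the graph.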
To enable SEA in DataparkSearch, you only need to define the following section in your sections.conf file:
Section sea 29 1024
After indexing the document collection with this section defined, you can use the $(sea) meta-variable in your search template to show the summary of a search result.
Limitations of the current implementation: a page must contain four or more sentences longer than 32 characters, and only the first 64 sentences of a page (if available) are used to construct the summary.
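These limitations amount to a pre-check on a page's sentences before any graph is built. The following is a minimal sketch, assuming a naive punctuation-based sentence splitter and assuming that the 64-sentence cap is applied before the length filter (the text does not state the order); the function name `select_sentences` is hypothetical.

```python
import re
from typing import List, Optional

MIN_SENTENCES = 4   # a page needs at least this many qualifying sentences
MIN_LENGTH = 32     # a qualifying sentence is longer than this many characters
MAX_SENTENCES = 64  # only the first 64 sentences of a page are considered


def select_sentences(text: str) -> Optional[List[str]]:
    """Return the sentences eligible for SEA, or None if the page is too short."""
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Assumed order: cap at the first 64 sentences, then drop short ones.
    candidates = [s for s in sentences[:MAX_SENTENCES] if len(s) > MIN_LENGTH]
    if len(candidates) < MIN_SENTENCES:
        return None  # not enough long sentences: no summary is built
    return candidates
```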