An article describing the Neo popularity rank has been updated: Neo-en.pdf
Author: Maxime
www/dpsearch in FreeBSD ports collection
The www/dpsearch port in the FreeBSD ports collection has been updated to the latest released version of DataparkSearch.
DataparkSearch 4.45
A new version, DataparkSearch 4.45, has been released. Changes since the previous release:
- The -G switch has been added to indexer. Use it to limit indexer by the total size of indexed documents, in megabytes per thread.
- parser.c has been rewritten to avoid hangs in external parsers of all types.
- Erroneous writing of redundant records into the "server" table has been fixed.
- A bug in flushing unfilled cache mode buffers when cached is not used has been fixed.
- A parser for the Verity Query Language (prefix variant) has been added. Only the following operators are supported at this time: <OR>, <AND>, <WORD>, <PHRASE>, <NEAR>, <NOT>, <ACCRUE>.
- MinSiteWeight and MinServerWeight commands have been added. Use them to specify the minimum weight a site or server must have to be indexed.
- High CPU usage by searchd has been fixed.
- A possible crash has been fixed on systems where the setproctitle function is not defined.
- A new algorithm has been added to detect when East Asian language segmenting is needed.
- It is now possible to show the last 128 bytes of a template variable using a template variable of the form $(xx:128:right).
- Several bugs (including #180 and #181) have been fixed.
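The -G item in the list above describes a per-thread limit on the total size of indexed documents. The following is a minimal sketch of how such a budget could work, assuming each indexing thread keeps its own byte counter, checks it before taking the next document, and treats a megabyte as 2^20 bytes. `ThreadByteBudget` and `index_documents` are hypothetical names for illustration, not DataparkSearch code.

```python
# Assumption: 1 MB = 2**20 bytes; the real indexer may count differently.
MEGABYTE = 1024 * 1024


class ThreadByteBudget:
    """Hypothetical per-thread size budget modelled on the -G description."""

    def __init__(self, limit_mb: int) -> None:
        self.limit_bytes = limit_mb * MEGABYTE
        self.indexed_bytes = 0

    def exhausted(self) -> bool:
        # True once this thread has indexed at least its share of bytes.
        return self.indexed_bytes >= self.limit_bytes

    def record(self, document_size: int) -> None:
        # Add the size of one indexed document to the running total.
        self.indexed_bytes += document_size


def index_documents(sizes, limit_mb):
    """Index documents in order until this thread's budget is used up.

    Returns the number of documents indexed. Assumption: the check happens
    before each document, so the last document may overshoot the limit.
    """
    budget = ThreadByteBudget(limit_mb)
    count = 0
    for size in sizes:
        if budget.exhausted():
            break
        budget.record(size)  # stand-in for actually indexing the document
        count += 1
    return count
```

Checking before each document, rather than refusing one that would overflow the budget, keeps the sketch simple; whether the real indexer stops before or after crossing the limit is not stated here.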
DataparkSearch on Debian GNU/Linux
Ernesto Hernández-Novich (Venezuela) reported a successful installation of DataparkSearch 4.44 on Debian GNU/Linux (i386). He also created unofficial packages that can be installed using aptitude and work with PostgreSQL, MySQL, FreeTDS and UnixODBC.
Thank you, Ernesto!
DataparkSearch vs Google Mini
Feature | Google Mini | DataparkSearch |
---|---|---|
License type | Commercial, no source code available | GPL, open source |
Number of documents indexed and pricing | | up to several million, depending on the hardware used. Free software. |
File formats indexing | 220 different file formats, including HTML, PDF and Microsoft Office documents. | Plain text, HTML, XML, MP3, GIF + any other with external parsers |
Languages | 28 languages | 25 language groups, can segment sentences in Chinese, Japanese, Korean and Thai. |
Accessing files via | HTTP, HTTPS, networked file systems. | HTTP, HTTPS, FTP, NNTP, HTTP Proxy, local file system, htdb:// scheme for SQL databases. |
Accessing content protected by | HTTP Basic, NTLM v1 and v2, LDAP | HTTP Basic |
Collections | Yes | Yes, each collection may be divided into subsections (tags and categories) |
Integrate search results into your site's look and feel | uses XSLT style sheets, exports results in XML | own template language to produce result pages in any text-based format |
Synonyms | Yes | Yes |
Display key attributes of search results | meta tags | meta tags, specified HTML attributes, specified XML tags, regex excerpts from text (all of these are called sections) |
Filter results through meta tags | Yes | Yes, + through any section or combination of sections. |
Assign different weights for meta tags/sections | No | Yes |
Integration with Google Desktop and Google Toolbar for Enterprise | Yes | No |
Excluding pages from the search index | Yes | Yes |
Spell-checker | self-learning | uses aspell |
Cached versions of documents | Yes | Yes |
Number Range Search | Yes | No |
Date Range Search | Yes | Yes |
Sort search results by | Relevance, Date | Relevance, Date, Popularity, Importance, and all of these in reverse order |
Reporting | | No reports. Each query can be tracked along with all search parameters for further processing. |
Automatic sitemap construction | Yes | No |
OneBox for Enterprise | Yes | No |
Customer support | Customer support site; email support; guaranteed replacements in the case of any hardware failure | A forum on the project's site |
Addendum, 15 Mar. 2007 | | |
Automatic document summarization | No | Yes, the Summary Extraction Algorithm |
HTTP Content negotiation for specified languages | No | Yes |
Link analysis algorithm | No | Yes, the Neo PopRank and the Goo PopRank |
Sources: Google Mini features, Google Mini Administrator features, DataparkSearch.
dpsearch-4.45-08032007
A preliminary version of the parser for the Verity Query Language (prefix variant) has been added in this snapshot.
Only the following operators are supported at this time: <OR>, <AND>, <WORD>, <PHRASE>, <NEAR>, <NOT> (with restrictions on the NOT operator when used in the boolean mode of DataparkSearch's query language).
To pass a query in VQL, provide it in the &vq= CGI variable and leave the &q= CGI variable empty.
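To illustrate the prefix syntax and the &vq= / &q= convention described above, here is a minimal Python sketch. It assumes a simplified grammar in which every operator is written as <OP>(arg, arg, ...) and bare words are terms; it does not model the NOT restrictions or Verity's full syntax. `tokenize`, `parse` and `build_query_string` are hypothetical helpers, not part of DataparkSearch.

```python
import re
from urllib.parse import urlencode

# Operators listed as supported in this snapshot.
OPERATORS = {"OR", "AND", "WORD", "PHRASE", "NEAR", "NOT"}

# A token is an operator like <AND>, a punctuation mark, or a bare word.
TOKEN_RE = re.compile(r"\s*(<\w+>|[(),]|[^\s(),<>]+)")


def tokenize(query):
    """Split a prefix-notation query into tokens, rejecting stray characters."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos}: {query[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _parse_expr(tokens, i):
    """Parse one expression starting at tokens[i]; return (node, next index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of query")
    tok = tokens[i]
    if tok.startswith("<"):
        op = tok[1:-1].upper()
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator {tok}")
        if i + 1 >= len(tokens) or tokens[i + 1] != "(":
            raise ValueError(f"expected '(' after {tok}")
        args, i = [], i + 2
        while True:
            arg, i = _parse_expr(tokens, i)
            args.append(arg)
            if i >= len(tokens):
                raise ValueError("missing ')'")
            if tokens[i] == ")":
                return (op, args), i + 1
            if tokens[i] != ",":
                raise ValueError(f"expected ',' or ')', got {tokens[i]!r}")
            i += 1
    if tok in {"(", ")", ","}:
        raise ValueError(f"unexpected {tok!r}")
    return tok, i + 1  # a bare word is a search term


def parse(tokens):
    """Parse a whole query into nested (OPERATOR, [args]) tuples."""
    node, end = _parse_expr(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing tokens after query")
    return node


def build_query_string(vql):
    """Put the VQL query in vq and leave q empty, as the snapshot requires."""
    return urlencode({"q": "", "vq": vql})
```

For example, `parse(tokenize("<AND>(cat, <OR>(dog, mouse))"))` yields a nested tuple with AND at the root, and `build_query_string` produces a query string beginning with `q=&vq=` that could be appended to the search CGI URL.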
Top 256 words of Sochi
Top 256 words of the Sochi Internet: the list of the most frequently used words on web pages related to the city of Sochi.
The Top is updated twice a day.
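A list like the Top 256 is, at its core, a word-frequency count over a set of pages. The sketch below shows that idea under simple assumptions: words are runs of Unicode letters, digits or underscores (so Cyrillic text is handled), case is folded, and no stop words are removed. The function name `top_words` is hypothetical; how the actual Top is computed is not described here.

```python
import re
from collections import Counter

# \w matches Unicode letters, digits and underscore in Python 3 str patterns.
WORD_RE = re.compile(r"\w+")


def top_words(pages, n=256):
    """Return the n most frequent lower-cased words across all page texts."""
    counts = Counter()
    for text in pages:
        counts.update(word.lower() for word in WORD_RE.findall(text))
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(n)
```

Re-running such a count on a schedule, for example twice a day as above, would keep the list current as pages are re-indexed.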
dpsearch-4.45-17022007
- MinSiteWeight and MinServerWeight commands have been added. Use them to specify the minimum weight a site or server must have to be indexed.
- High CPU usage by searchd has been fixed.
- A possible crash has been fixed on systems where the setproctitle function is not defined.
- A new algorithm has been added to detect when East Asian language segmenting is needed.
- It is now possible to show the last 128 bytes of a template variable using a template variable of the form $(xx:128:right).
- Several bugs (including #180 and #181) have been fixed.
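The $(xx:128:right) form in the list above shows the last 128 bytes of a variable named xx. Because the limit is in bytes, a cut can land inside a multi-byte character. The sketch below illustrates one way to handle that, assuming UTF-8 text and choosing to drop any leading continuation bytes so that the result is always valid text. The function name `right_bytes` and this trimming behaviour are assumptions; the engine itself may treat partial characters differently.

```python
def right_bytes(value: str, n: int) -> str:
    """Return the text formed by the last n bytes of value's UTF-8 encoding.

    Leading UTF-8 continuation bytes (bit pattern 10xxxxxx) are dropped so the
    result never starts in the middle of a multi-byte character. This is an
    assumption for illustration, not documented DataparkSearch behaviour.
    """
    if n <= 0:
        return ""  # guard: data[-0:] would return the whole string
    tail = value.encode("utf-8")[-n:]
    start = 0
    while start < len(tail) and (tail[start] & 0xC0) == 0x80:
        start += 1
    return tail[start:].decode("utf-8")
```

With this choice, a value shorter than the limit is returned unchanged, and a cut through a two-byte character such as "é" simply omits that character.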
The Neo popularity rank for web pages
The Neo popularity rank for web pages: this article describes the Neo Popularity Rank, which is used in the DataparkSearch Engine to help order web pages.
See also this article in Russian.
Janus layout
Janus: yet another new SERP layout. The left column contains results for the same query as the main column in the center, but these results are sorted in a different way.
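The Janus idea, one query shown in two columns with different orderings, can be sketched as sorting the same result set twice. The example assumes the main column is ordered by relevance and the left column by date, newest first; the text above only says the left column is sorted "in a different way", so that choice, the `Result` type and `janus_columns` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    url: str
    relevance: float
    modified: date


def janus_columns(results):
    """Return (left, main): the same results under two sort orders."""
    # Main (center) column: most relevant first.
    main = sorted(results, key=lambda r: r.relevance, reverse=True)
    # Left column (assumed order): most recently modified first.
    left = sorted(results, key=lambda r: r.modified, reverse=True)
    return left, main
```

Any of the other available orders, such as popularity or importance, could be substituted for the left column's sort key.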