The article describing the Neo popularity rank has been updated: Neo-en.pdf
Author: Maxime
www/dpsearch in FreeBSD ports collection
The www/dpsearch port in the FreeBSD ports collection has been updated to the latest released version of DataparkSearch.
DataparkSearch 4.45
DataparkSearch 4.45 has been released. Changes since the previous release:
- The -G switch for indexer has been added. Use it to limit the indexer by the total size of indexed documents, in megabytes per thread.
- parser.c has been rewritten to avoid hanging external parsers of all types.
- Erroneous writing of redundant records into the "server" table has been fixed.
- A bug has been fixed in the flushing of unfilled cache mode buffers when no cached is used.
- A parser for the Verity Query Language (prefix variant) has been added. Only the following operators are supported at this time: <OR>, <AND>, <WORD>, <PHRASE>, <NEAR>, <NOT>, <ACCRUE>.
- MinSiteWeight and MinServerWeight commands have been added. Use them to specify the minimum weight a site or server must have to be indexed.
- High CPU usage by searchd has been fixed.
- A possible trap on systems where the setproctitle function is not defined has been fixed.
- A new algorithm detects when East Asian language segmentation is needed.
- It is now possible to show the last 128 bytes of a template variable using the $(xx:128:right) form of template variable.
- Several bugs (including #180 and #181) have been fixed.
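The effect of the $(xx:128:right) form can be pictured with a short Python sketch. This is not DataparkSearch code: the function name `last_bytes` is hypothetical, and the handling of a multi-byte character cut at the boundary (it is dropped here) is an assumption, since the release notes only say that the last 128 bytes are shown.

```python
def last_bytes(value: str, length: int = 128, encoding: str = "utf-8") -> str:
    """Keep only the trailing `length` bytes of `value` (illustrative only)."""
    raw = value.encode(encoding)
    # Guard against length == 0, because raw[-0:] would return everything.
    tail = raw[-length:] if length > 0 else b""
    # Assumption: a character split by the cut is silently discarded.
    return tail.decode(encoding, errors="ignore")


if __name__ == "__main__":
    long_value = "x" * 300
    print(len(last_bytes(long_value)))
```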
DataparkSearch on Debian GNU/Linux
Ernesto Hernández-Novich (Venezuela) reported a successful installation of DataparkSearch 4.44 on Debian GNU/Linux (i386). He also created unofficial packages that can be installed using aptitude and work with PostgreSQL, MySQL, FreeTDS and UnixODBC.
Thank you, Ernesto!
DataparkSearch vs Google Mini
| | Google Mini | DataparkSearch |
|---|---|---|
| License type | Commercial, no source code available | GPL, open source |
| Number of documents indexed and pricing | | Up to several million, depending on the hardware used. Free software |
| File formats indexing | 220 different file formats, including HTML, PDF and Microsoft Office documents. | Plain text, HTML, XML, MP3, GIF + any other with external parsers |
| Languages | 28 languages | 25 language groups, can segment sentences in Chinese, Japanese, Korean and Thai. |
| Accessing files via | HTTP, HTTPS, networked file systems. | HTTP, HTTPS, FTP, NNTP, HTTP Proxy, local file system, htdb:// scheme for SQL databases. |
| Accessing content protected by | HTTP Basic, NTLM v1 and v2, LDAP | HTTP Basic |
| Collections | Yes | Yes, each collection may be divided into subsections (tags and categories) |
| Integrate search results into your site's look and feel | Uses XSLT style sheets, exports results in XML | Own template language to produce result pages in any text-based format. |
| Synonyms | Yes | Yes |
| Display key attributes of search results | meta tags | meta tags, specified HTML attributes, specified XML tags, regex excerpts from text (all of these are called sections) |
| Filter results through meta tags | Yes | Yes, + through any section or combination of sections. |
| Assign different weights for meta tags/sections | No | Yes |
| Integration with Google Desktop and Google Toolbar for Enterprise | Yes | No |
| Excluding pages from the search index | Yes | Yes |
| Spell-checker | Self-learning | Uses aspell |
| Cached versions of documents | Yes | Yes |
| Number Range Search | Yes | No |
| Date Range Search | Yes | Yes |
| Sort search results by | Relevance, Date | Relevance, Date, Popularity, Importance, and all of these in reverse order |
| Reporting | | No reports. Each query can be tracked along with all search parameters for further processing. |
| Automatic sitemap construction | Yes | No |
| OneBox for Enterprise | Yes | No |
| Customer support | Customer support site; email support; guaranteed replacements in the case of any hardware failure | A forum on the project's site |
| Addendum, 15 Mar. 2007 | | |
| Automatic document summarization | No | Yes, the Summary Extraction Algorithm |
| HTTP Content negotiation for specified languages | No | Yes |
| Link analysis algorithm | No | Yes, the Neo PopRank and the Goo PopRank |
//Google Mini features, Google Mini Administrator features, DataparkSearch.
dpsearch-4.45-08032007
A preliminary version of the parser for the Verity Query Language (prefix variant) has been added in this snapshot.
Only the following operators are supported at this time: <OR>, <AND>, <WORD>, <PHRASE>, <NEAR>, <NOT> (with the same restrictions on the NOT operator as in the boolean mode of DataparkSearch's query language).
To pass a query in VQL, provide it in the &vq= CGI variable and leave the &q= CGI variable empty.
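As a rough illustration of the &vq= convention, the Python sketch below builds a request URL with the VQL query in vq and an empty q. The endpoint URL, the helper name `build_vql_url`, and the sample query string are assumptions made for the example; check the exact VQL syntax accepted by the parser against the DataparkSearch documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the URL of your own search front end.
SEARCH_URL = "http://localhost/cgi-bin/search.cgi"


def build_vql_url(vql_query: str, base_url: str = SEARCH_URL) -> str:
    """Return a URL that carries the query in vq= and leaves q= empty."""
    params = [("q", ""), ("vq", vql_query)]
    # urlencode percent-escapes the angle brackets used by VQL operators.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder query; the operator spelling comes from the list above,
    # the argument syntax is assumed.
    print(build_vql_url("<AND>(datapark, search)"))
```

Keeping q present but empty mirrors the instruction above, so the front end receives only the VQL form of the query.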
Top 256 words of Sochi
Top 256 words of Sochi's Internet -- the list of the most frequently used words on web pages related to the city of Sochi.
The list is updated twice a day.
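A list of this kind can be produced by a simple frequency count. The sketch below is only an illustration in Python, not the method used for the Sochi list: the function name `top_words`, the lower-casing, and the `\w+` tokenization are all assumptions.

```python
import re
from collections import Counter


def top_words(pages, limit=256):
    """Return the `limit` most frequent words across the given page texts."""
    counts = Counter()
    for text in pages:
        # Assumed tokenization: runs of word characters, case-insensitive.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts.most_common(limit)


if __name__ == "__main__":
    sample_pages = ["Sochi sea sea", "sea Sochi beach"]
    for word, count in top_words(sample_pages, limit=2):
        print(word, count)
```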
dpsearch-4.45-17022007
- MinSiteWeight and MinServerWeight commands have been added. Use them to specify the minimum weight a site or server must have to be indexed.
- High CPU usage by searchd has been fixed.
- A possible trap on systems where the setproctitle function is not defined has been fixed.
- A new algorithm detects when East Asian language segmentation is needed.
- It is now possible to show the last 128 bytes of a template variable using the $(xx:128:right) form of template variable.
- Several bugs (including #180 and #181) have been fixed.
The Neo popularity rank for web-pages
The Neo popularity rank for web-pages -- this article describes the Neo Popularity Rank, which is used in the DataparkSearch Engine to help order web pages.
See also this article in Russian.
Janus layout
Janus -- yet another new SERP layout. The left column contains results for the same query as the main column in the center, but sorted in a different way.
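The idea behind the two columns can be sketched in Python as one result set ordered by two different keys. The `Result` fields and the choice of relevance for the main column and popularity for the left column are assumptions for illustration; the announcement does not say which orderings Janus uses.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float   # hypothetical score used for the main column
    popularity: float  # hypothetical score used for the left column


def janus_columns(results):
    """Return (left, main): the same results under two sort orders."""
    main = sorted(results, key=lambda r: r.relevance, reverse=True)
    left = sorted(results, key=lambda r: r.popularity, reverse=True)
    return left, main


if __name__ == "__main__":
    found = [
        Result("http://example.org/a", relevance=0.9, popularity=0.1),
        Result("http://example.org/b", relevance=0.4, popularity=0.8),
    ]
    left, main = janus_columns(found)
    print([r.url for r in left])
    print([r.url for r in main])
```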