
A new version of DataparkSearch, 4.45, has been released. Changes since the previous release:

  • A -G switch has been added to indexer. Use it to limit indexing by the total size of indexed documents, in megabytes per thread.
  • parser.c has been rewritten to prevent external parsers of all types from hanging.
  • A bug that wrote redundant records into the "server" table has been fixed.
  • A bug has been fixed in the flushing of unfilled cache mode buffers when cached is not used.
  • A parser for the Verity Query Language (prefix variant) has been added.
    Only the following operators are supported at this time: <OR>, <AND>,
    <WORD>, <PHRASE>, <NEAR>, <NOT>, <ACCRUE>.
  • MinSiteWeight and MinServerWeight commands have been added.
    Use them to specify the minimum weight a site or server must have to be indexed.
  • High CPU usage by searchd has been fixed.
  • A possible trap has been fixed on systems where the setproctitle function is not defined.
  • A new algorithm detects the need for East Asian language segmenting.
  • It is now possible to show the last 128 bytes of a template variable using the $(xx:128:right) form of a template variable.
  • Several bugs (including #180 and #181) have been fixed.
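
The $(xx:128:right) form mentioned in the list can be read as "keep only the trailing 128 bytes of variable xx". The Python sketch below illustrates that reading only. The function name right_bytes is hypothetical, and dropping a multibyte character that is cut at the boundary is an assumption, not documented DataparkSearch behavior.

```python
def right_bytes(value: str, limit: int, encoding: str = "utf-8") -> str:
    """Return the text formed by the last `limit` bytes of `value`."""
    if limit <= 0:
        return ""
    raw = value.encode(encoding)
    # Slicing with a negative index keeps the tail; shorter values pass through whole.
    tail = raw[-limit:]
    # Assumption: a multibyte character split by the cut is silently dropped.
    return tail.decode(encoding, errors="ignore")


if __name__ == "__main__":
    title = "DataparkSearch " * 20
    print(right_bytes(title, 128))
```

Working in bytes rather than characters matches the wording of the release note, which states the limit in bytes.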

Ernesto Hernández-Novich (Venezuela) reported a successful installation of DataparkSearch 4.44 on Debian GNU/Linux (i386). He also created unofficial packages that can be installed using aptitude and can work with PostgreSQL, MySQL, FreeTDS and UnixODBC.

Thank you, Ernesto!

| Feature | Google Mini | DataparkSearch |
|---|---|---|
| License type | Commercial, no source code available | GPL, open source |
| Number of documents indexed and pricing | up to 50,000 for $1,995; up to 100,000 for $2,995; up to 200,000 for $5,995; up to 300,000 for $8,995 | Up to several million, depending on the hardware used. Free software |
| File formats indexing | 220 different file formats, including HTML, PDF and Microsoft Office documents | Plain text, HTML, XML, MP3, GIF + any other with external parsers |
| Languages | 28 languages | 25 language groups; can segment sentences in Chinese, Japanese, Korean and Thai |
| Accessing files via | HTTP, HTTPS, networked file systems | HTTP, HTTPS, FTP, NNTP, HTTP Proxy, local file system, htdb:// scheme for SQL databases |
| Accessing content protected by | HTTP Basic, NTLM v1 and v2, LDAP | HTTP Basic |
| Collections | Yes | Yes, each collection may be divided into subsections (tags and categories) |
| Integrate search results into your site's look and feel | Uses XSLT style sheets, exports results in XML | Own template language to produce result pages in any text-based format |
| Synonyms | Yes | Yes |
| Display key attributes of search results | Meta tags | Meta tags, specified HTML attributes, specified XML tags, regex excerpts from text (all of these are called sections) |
| Filter results through meta tags | Yes | Yes, + through any section or combination of sections |
| Assign different weights for meta tags/sections | No | Yes |
| Integration with Google Desktop and Google Toolbar for Enterprise | Yes | No |
| Excluding pages from the search index | Yes | Yes |
| Spell-checker | Self-learning | Uses aspell |
| Cached versions of documents | Yes | Yes |
| Number Range Search | Yes | No |
| Date Range Search | Yes | Yes |
| Sort search results by | Relevance, Date | Relevance, Date, Popularity, Importance, and by all of those in reverse order |
| Reporting | Total number of searches and unique queries; number of searches on a particular day; average number of searches at different hours of the day; top 100 keywords and queries | No reports. Each query can be tracked along with all search parameters for further processing |
| Automatic sitemap construction | Yes | No |
| OneBox for Enterprise | Yes | No |
| Customer support | Customer support site; email support; guaranteed replacements in the case of any hardware failure | A phorum on the project's site |
| Addendum, 15 Mar. 2007 | | |
| Automatic document summarization | No | Yes, the Summary Extraction Algorithm |
| HTTP Content negotiation for specified languages | No | Yes |
| Link analysis algorithm | No | Yes, the Neo PopRank and the Goo PopRank |

//Google Mini features, Google Mini Administrator features, DataparkSearch.

A preliminary version of a parser for the Verity Query Language (prefix variant) has been added in this snapshot.

Only the following operators are supported at this time: <OR>, <AND>, <WORD>, <PHRASE>, <NEAR>, <NOT> (the restrictions that apply to the NOT operator in the boolean mode of DataparkSearch's query language apply here as well).
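
To make the "prefix variant" concrete, here is a small Python sketch of how a prefix-notation query could be turned into a tree. It is an illustration only, not DataparkSearch's parser (which is written in C). It assumes the operator comes first and its arguments follow in parentheses separated by commas, as in <AND>(search, <OR>(engine, indexer)). The names tokenize, parse_expr and parse_vql are hypothetical.

```python
import re

# Operators listed as supported in this snapshot.
SUPPORTED = {"OR", "AND", "WORD", "PHRASE", "NEAR", "NOT"}

# One token: an <OPERATOR>, a punctuation mark, or a bare word.
TOKEN = re.compile(r"\s*(<[A-Za-z]+>|[(),]|[^\s<>(),]+)")


def tokenize(query):
    """Split a prefix-notation query into operator, punctuation and word tokens."""
    query = query.rstrip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens, i=0):
    """Parse one expression at tokens[i]; return (tree, index of next token)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of query")
    tok = tokens[i]
    if tok.startswith("<"):
        op = tok[1:-1].upper()
        if op not in SUPPORTED:
            raise ValueError(f"unsupported operator: {tok}")
        if i + 1 >= len(tokens) or tokens[i + 1] != "(":
            raise ValueError(f"expected '(' after {tok}")
        args, i = [], i + 2
        while True:
            arg, i = parse_expr(tokens, i)
            args.append(arg)
            if i >= len(tokens):
                raise ValueError("missing ')'")
            if tokens[i] == ")":
                return (op, args), i + 1
            if tokens[i] != ",":
                raise ValueError(f"expected ',' or ')', got {tokens[i]!r}")
            i += 1
    if tok in {"(", ")", ","}:
        raise ValueError(f"unexpected {tok!r}")
    return tok, i + 1  # a bare word


def parse_vql(query):
    """Parse a whole query and reject trailing tokens."""
    tokens = tokenize(query)
    tree, end = parse_expr(tokens)
    if end != len(tokens):
        raise ValueError("trailing tokens after query")
    return tree


if __name__ == "__main__":
    print(parse_vql("<AND>(search, <OR>(engine, indexer))"))
```

Each operator node is an (operator, arguments) pair, so nesting in the query maps directly onto nesting in the tree.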

To pass a query in VQL, supply it in the &vq= CGI variable and leave the &q= CGI variable empty.
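
As a sketch of the request described above, the following Python snippet builds a search URL with an empty q parameter and the VQL query in vq. The host and CGI path are placeholders for your own installation, the function name build_vql_url is hypothetical, and the query text assumes the operator-first, parenthesised form of the prefix variant.

```python
from urllib.parse import urlencode


def build_vql_url(base_url: str, vql: str) -> str:
    """Return a search URL that carries the query in vq and leaves q empty."""
    # q is sent explicitly but empty, so only the VQL query is evaluated.
    params = {"q": "", "vq": vql}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint; replace with the search front-end of your site.
    print(build_vql_url("http://localhost/cgi-bin/search.cgi",
                        "<AND>(search, engine)"))
```

urlencode percent-encodes the angle brackets, parentheses and comma, which keeps the operators intact when the query is placed in a URL.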

//DataparkSearch Engine Tool

  • MinSiteWeight and MinServerWeight commands have been added. Use them to specify the minimum weight a site or server must have to be indexed.
  • High CPU usage by searchd has been fixed.
  • A possible trap has been fixed on systems where the setproctitle function is not defined.
  • A new algorithm detects the need for East Asian language segmenting.
  • It is now possible to show the last 128 bytes of a template variable using the $(xx:128:right) form of a template variable.
  • Several bugs (including #180 and #181) have been fixed.

//DataparkSearch Engine tool