Maxime – Page 17

An article updated

Maxime, 11 April 2007

The article describing the Neo popularity rank has been updated: Neo-en.pdf

www/dpsearch in FreeBSD ports collection

Maxime, 26 March 2007

The www/dpsearch port in the FreeBSD ports collection has been updated to the latest released version of DataparkSearch.

DataparkSearch 4.45

Maxime, 23 March 2007

A new version, DataparkSearch 4.45, has been released. Changes since the previous release are:

The -G switch for indexer has been added. Use it to limit the indexer by the total size of indexed documents, in megabytes per thread.
parser.c has been rewritten to avoid hangs when running external parsers of all types.
Erroneous writing of redundant records into the "server" table has been fixed.
A bug in the flushing of unfilled cache mode buffers when cached is not used has been fixed.
A parser for the Verity Query Language (prefix variant) has been added.
Only the following operators are supported at this time: <OR>, <AND>,
<WORD>, <PHRASE>, <NEAR>, <NOT>, <ACCRUE>.
The MinSiteWeight and MinServerWeight commands have been added.
Use them to specify the minimum weight of a site or server to be indexed.
High CPU usage by searchd has been fixed.
A possible trap on systems without the setproctitle function defined has been fixed.
A new algorithm to detect the need for segmenting East Asian languages has been added.
It is now possible to show the last 128 bytes of a template variable using the $(xx:128:right) form of template variable.
Several bugs (including #180 and #181) were fixed.

DataparkSearch on Debian GNU/Linux

Maxime, 23 March 2007

Ernesto Hernández-Novich (Venezuela) reported a successful installation of DataparkSearch 4.44 on Debian GNU/Linux (i386). He also created unofficial packages that can be installed using aptitude and work with PostgreSQL, MySQL, FreeTDS and UnixODBC.

Thank you, Ernesto!

DataparkSearch vs Google Mini

Maxime, 15 March 2007 (updated 18 March 2007)

	Google Mini	DataparkSearch
License type	Commercial, no source code available	GPL, open source
Number of documents indexed and pricing	up to 50,000 for $1,995; up to 100,000 for $2,995; up to 200,000 for $5,995; up to 300,000 for $8,995	up to several million, depending on the hardware used. Free software
File formats indexing	220 different file formats, including HTML, PDF and Microsoft Office documents.	Plain text, HTML, XML, MP3, GIF + any other with external parsers
Languages	28 languages	25 language groups, can segment sentences in Chinese, Japanese, Korean and Thai.
Accessing files via	HTTP, HTTPS, networked file systems.	HTTP, HTTPS, FTP, NNTP, HTTP Proxy, local file system, htdb:// scheme for SQL databases.
Accessing content protected by	HTTP Basic, NTLM v1 and v2, LDAP	HTTP Basic
Collections	Yes	Yes, each collection may be divided into subsections (tags and categories)
Integrate search results into your site's look and feel	user-supplied XSLT style sheet, results exported as XML	own template language to produce result pages in any text-based format.
Synonyms	Yes	Yes
Display key attributes of search results	meta tags	meta tags, specified HTML attributes, specified XML tags, regex excerpts from text (all of these are called sections)
Filter results through meta tags	Yes	Yes, + through any section or combination of sections.
Assign different weights for meta tags/sections	No	Yes
Integration with Google Desktop and Google Toolbar for Enterprise	Yes	No
Excluding pages from the search index	Yes	Yes
Spell-checker	self-learning	uses aspell
Cached versions of documents	Yes	Yes
Number Range Search	Yes	No
Date Range Search	Yes	Yes
Sort search results by	Relevance, Date	Relevance, Date, Popularity, Importance, and all of these in reverse order
Reporting	Total number of searches and unique queries; number of searches on a particular day; average number of searches at different hours of the day; top 100 keywords and queries	No reports. Each query can be tracked along with all search parameters for further processing.
Automatic sitemap construction	Yes	No
OneBox for Enterprise	Yes	No
Customer support	Customer support site; email support; guaranteed replacements in the case of any hardware failure	A forum on the project's site
Addendum, 15 Mar. 2007
Automatic document summarization	No	Yes, the Summary Extraction Algorithm
HTTP Content negotiation for specified languages	No	Yes
Link analysis algorithm	No	Yes, the Neo PopRank and the Goo PopRank

//Google Mini features, Google Mini Administrator features, DataparkSearch.

dpsearch-4.45-08032007

Maxime, 8 March 2007 (updated 12 March 2007)

A preliminary version of the parser for the Verity Query Language (prefix variant) has been added in this snapshot.

Only the following operators are supported at this time: <OR>, <AND>, <WORD>, <PHRASE>, <NEAR>, <NOT> (with the same restrictions that apply to the NOT operator in the boolean mode of DataparkSearch's query language).

To pass a query in VQL, provide it in the &vq= CGI variable and leave the &q= CGI variable empty.
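As a rough illustration, such a request could be built as follows. This is only a sketch: the search CGI address, the helper name build_vql_url and the sample prefix-style query string are assumptions made for the example and are not taken from the DataparkSearch documentation.

```python
from urllib.parse import urlencode

# Hypothetical address of the search CGI; replace it with your own installation's URL.
SEARCH_CGI = "http://localhost/cgi-bin/search.cgi"


def build_vql_url(vql_query: str, base: str = SEARCH_CGI) -> str:
    """Return a search URL that passes the query in vq and leaves q empty."""
    # q is sent explicitly as an empty value, vq carries the VQL query.
    params = {"q": "", "vq": vql_query}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # The query below is an assumed example of prefix-style VQL syntax.
    print(build_vql_url("<AND>(datapark, search)"))
```

urlencode percent-encodes the angle brackets, parentheses and comma, so the operators survive the trip through the query string.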

//DataparkSearch Engine Tool

Top 256 words of Sochi

Maxime, 8 March 2007 (updated 2 May 2015)

Top 256 words of Sochi's Internet -- the list of the most frequently used words on web pages related to the city of Sochi.

The list is updated twice a day.

dpsearch-4.45-17022007

Maxime, 17 February 2007

The MinSiteWeight and MinServerWeight commands have been added. Use them to specify the minimum weight of a site or server to be indexed.
High CPU usage by searchd has been fixed.
A possible trap on systems without the setproctitle function defined has been fixed.
A new algorithm to detect the need for segmenting East Asian languages has been added.
It is now possible to show the last 128 bytes of a template variable using the $(xx:128:right) form of template variable.
Several bugs (including #180 and #181) were fixed.
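
To make the effect of the $(xx:128:right) form concrete, here is a minimal Python sketch of "keep only the last N bytes of a value". It is not DataparkSearch code: the function name right_bytes is hypothetical, and the handling of a multibyte character cut in half (the partial character is dropped) is an assumption of this sketch, since the release notes do not say how that case is treated.

```python
def right_bytes(value: str, limit: int = 128, encoding: str = "utf-8") -> str:
    """Return at most the last `limit` bytes of `value`, decoded back to text."""
    raw = value.encode(encoding)
    # raw[-0:] would return the whole string, so a zero limit is handled explicitly.
    tail = raw[-limit:] if limit > 0 else b""
    # Assumption: bytes of a character split by the cut are silently dropped.
    return tail.decode(encoding, errors="ignore")


if __name__ == "__main__":
    long_value = "x" * 200
    print(len(right_bytes(long_value)))   # length of the truncated value
    print(right_bytes("short value"))     # values under the limit are unchanged
```

Because the limit is counted in bytes rather than characters, text in multibyte encodings may yield fewer than 128 visible characters.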

//DataparkSearch Engine tool

The Neo popularity rank for web-pages

Maxime, 29 January 2007 (updated 2 May 2015)

The Neo popularity rank for web-pages -- this article describes the Neo Popularity Rank, which is used in the DataparkSearch Engine to help order web pages.

Janus layout

Maxime, 27 January 2007 (updated 2 May 2015)

Janus -- yet another new SERP layout. The left column contains results for the same query as the main column in the center, but sorted in a different way.