
A new version, 4.50, of DataparkSearch Engine has been released. Changes since version 4.49 are:

  • The default value of the PopRankSkipSameSite command has been changed to "yes".
  • A possible memory leak has been fixed for sub-documents indexed from the stored database.
  • The strict option has been added to the Section command.
  • A word break has been added for French-style contractions.
  • Large lists of Russian and English synonyms have been added.
  • The MaxSiteLevel command now accepts a negative argument to group URLs by subdirectory.
  • The SkipUnreferred command has been extended to delete unreferred documents if necessary.
  • Processing of the del log has been fixed in splitter for the case when the cache log is empty.
  • Some German letters are now automatically replaced by two-letter combinations in accent-free search mode:
    ß -> ss, ä -> ae, ö -> oe, ü -> ue.
  • SQLite3 support has been added. Use the --with-sqlite3 configure option to enable it.
  • Indexing has been fixed for documents with several versions in different languages. You need to run the "indexer -Erehashstored" command when upgrading.
  • The HTML parser now understands the <!-- google_ad_section_start -->, <!-- google_ad_section_start(weight=ignore) --> and
    <!-- google_ad_section_end --> comments as tags that include or exclude content from indexing.
  • Relevance calculation has been improved for the case when acronyms and abbreviations are used.
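The German letter folding listed above can be sketched in C as follows. This is a minimal illustration, not DataparkSearch's code: the function name de_fold() and the de_map table are hypothetical, the input is assumed to be UTF-8, and only the four lowercase letters named in the change list are handled (uppercase Ä, Ö, Ü would need their own entries, or the text would have to be lowercased first).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Second bytes of the two-byte UTF-8 sequences (lead byte 0xC3) for the
 * letters named in the release note, with their two-letter replacements. */
static const struct { unsigned char second; const char *repl; } de_map[] = {
    { 0x9F, "ss" }, /* ß U+00DF */
    { 0xA4, "ae" }, /* ä U+00E4 */
    { 0xB6, "oe" }, /* ö U+00F6 */
    { 0xBC, "ue" }, /* ü U+00FC */
};

/* Copy the UTF-8 string `in` into `out` (capacity `cap`, including the NUL),
 * replacing the letters above. Like snprintf, it returns the length the full
 * result needs (excluding the NUL), so a return value >= cap means the
 * output was truncated. */
static size_t de_fold(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    assert(in != NULL && (out != NULL || cap == 0));
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        const char *repl = NULL;
        if (p[0] == 0xC3) {
            for (size_t k = 0; k < sizeof de_map / sizeof de_map[0]; k++)
                if (p[1] == de_map[k].second) { repl = de_map[k].repl; break; }
        }
        if (repl) {
            for (const char *r = repl; *r; r++) {
                if (n + 1 < cap) out[n] = *r;
                n++;
            }
            p++; /* skip the second byte of the matched sequence */
        } else {
            if (n + 1 < cap) out[n] = (char)*p;
            n++;
        }
    }
    if (cap > 0) out[n < cap ? n : cap - 1] = '\0';
    return n;
}
```

For example, folding "Straße" yields "Strasse" and "München" yields "Muenchen". Applying the same folding to both indexed words and query words is what lets either spelling match in accent-free mode.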


A comparison of public-domain search engines (PDF, 20 pages) as of March 2006. It compares Nutch, MnoGoSearch, DataparkSearch and ht://Dig.

UPDATE: There is another comparison of DataparkSearch, made in October 2007, on Yury Vetrov's blog (in Russian). It compares Sphinx, Tsearch2, DataparkSearch, MnoGoSearch and Yandex.Server (free and commercial versions). The overall weighted ratings for these search engines are:

  1. Sphinx (4.48);
  2. DataparkSearch (4.33);
  3. Yandex.Server, free version (4.24);
  4. Yandex.Server, commercial version (4.09);
  5. MnoGoSearch (3.82);
  6. Tsearch2 (3.45).


These graphs show the performance of DataparkSearch for the 43N 39E search engine.

The 43N 39E installation consists of an SQL server (Intel PIII 670 MHz, 512M RAM, IDE SATA) and a search server running searchd (Intel Celeron 2.25GHz, 1G RAM, IDE UDMA100). The search base contains about 1.2 million indexed documents with a total size of 27.2 GB.

NB: these graphs are updated daily at 9:00 AM, 1:00 PM and 9:00 PM MSK (GMT+4).

The "strict" option has been added to the Section command. With this option you can enable strict string tokenization for a section, which means a word break at every non-word character regardless of context. This is useful, for example, when indexing URLs, where the hyphen is used as a delimiter between words.


Section url  3 0 strict
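To illustrate the difference, here is a small C sketch contrasting strict tokenization with a context-sensitive one. The tokenize() function, the MAX_WORD limit, and the non-strict rule (keep a '-' or '.' joined when it sits between two letters or digits) are hypothetical simplifications, not DataparkSearch's actual tokenizer; only ASCII letters and digits count as word characters here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64 /* hypothetical per-word buffer size, including NUL */

/* ASCII letters and digits only; a real indexer handles Unicode. */
static int is_word_char(unsigned char c) { return isalnum(c); }

/* Split `text` into words, storing at most `max_words` of them in `words`.
 * strict != 0: break at every non-word character (the "strict" behaviour).
 * strict == 0: hypothetical context rule that keeps '-' or '.' inside a
 *              word when it is followed by another word character.
 * Returns the total number of words found (may exceed max_words). */
static size_t tokenize(const char *text, int strict,
                       char words[][MAX_WORD], size_t max_words)
{
    size_t n = 0, len = 0;
    assert(text != NULL);
    for (size_t i = 0;; i++) {
        unsigned char c = (unsigned char)text[i];
        int keep = is_word_char(c);
        if (!keep && !strict && (c == '-' || c == '.') && len > 0 &&
            is_word_char((unsigned char)text[i + 1]))
            keep = 1;
        if (keep) {
            if (n < max_words && len < MAX_WORD - 1)
                words[n][len] = (char)c; /* longer words are truncated */
            len++;
        } else {
            if (len > 0) { /* close the current word */
                if (n < max_words)
                    words[n][len < MAX_WORD - 1 ? len : MAX_WORD - 1] = '\0';
                n++;
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return n;
}
```

With strict tokenization, "http://www.example.com/search-engine-news.html" yields eight words (http, www, example, com, search, engine, news, html), so a query for "engine" can match the URL. Under the non-strict rule sketched here, "search-engine-news.html" would stay a single word.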

There is a mistake in this snapshot that prevents compilation:


socket.c: In function 'socket_connect':
socket.c:93: error: 'struct sockaddr_in' has no member named 'sin_len'

To solve the problem, comment out line 93 of socket.c. After this correction, the snapshot compiles and works smoothly.
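Commenting the line out is the quickest fix. A more portable alternative keeps the sin_len assignment on platforms whose struct sockaddr_in has that field (BSD-derived systems) and skips it elsewhere (e.g. Linux). The sketch below assumes that line 93 assigns sin_len and guards it with HAVE_STRUCT_SOCKADDR_IN_SIN_LEN, the macro autoconf's AC_CHECK_MEMBERS([struct sockaddr_in.sin_len]) defines; whether DataparkSearch's configure script performs this check is an assumption, and fill_sockaddr() is a hypothetical helper, not the code in socket.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill an IPv4 socket address from a host-order address and port.
 * The sin_len member exists only on some platforms, so it is set only
 * when the (assumed) configure-generated feature macro says it exists. */
static void fill_sockaddr(struct sockaddr_in *sa, uint32_t addr_host,
                          uint16_t port_host)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof *sa);
#ifdef HAVE_STRUCT_SOCKADDR_IN_SIN_LEN
    sa->sin_len = sizeof *sa; /* BSD-style length byte */
#endif
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port_host);        /* network byte order */
    sa->sin_addr.s_addr = htonl(addr_host); /* network byte order */
}
```

The resulting structure can be passed to connect() as usual; on systems without sin_len the guarded line simply disappears at compile time instead of breaking the build.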

Sometime, somewhere... I hope.

To A.P. Kern
by: Alexander Pushkin (1799 - 1837)

I remember a wonderful moment
As before my eyes you appeared,
Like a vision, fleeting, momentary,
Like a spirit of the purest beauty.

In the torture of hopeless melancholy,
In the bustle of the world's noisy hours,
That voice rang out so tenderly,
I dreamed of that lovely face of yours.

The years flew quickly. The storm's blast
Scattered the dreams of former times,
And I forgot your tender voice,
And the features of your heavenly face.

In remoteness, in gloomy isolation,
My days dragged quietly, nothing was new,
No godlike face, no inspiration,
No tears, no life, no love, no you.

Then to my soul an awakening came,
And there again your face appeared,
Like a vision, fleeting, momentary,
Like a spirit of the purest beauty.

And my heart beat with a rapture new,
And for its sake arose again
A godlike face, an inspiration,
And life, and tears, and love, and you.