Happy New Year to everyone with best wishes from Sydney, Australia!
The finale of the Family fireworks at 9pm on December 31st, 2013, as seen from the Dudley Page Reserve in Dover Heights:
Midnight fireworks from the same place:
Just DataparkSearch weblog
Unusually empty Bondi beach:
...continue reading "A rainy Christmas day in Sydney"
Yesterday I made my 1000th commit to the DataparkSearch Engine on Google Code!
But if you prefer GitHub, it is here: github.com/Maxime2/dataparksearch 🙂
Below is an example of accessing DataparkSearch Engine results from its searchd daemon in Python, using a RESTful client and JSON.
No additional packages need to be installed on Ubuntu Linux if you already have the Python interpreter installed.
This example uses the RESTful API provided by the searchd daemon of DataparkSearch Engine and a search result template that produces JSON output; you can find the template in doc/samples/json.htm inside the DataparkSearch distribution package.
When run, the script prints a list of page titles with their URLs, followed by the total number of documents found in the database for the given query, the time the query took to execute, and the range of document numbers shown in the list.
#!/usr/bin/python
# Note: this script targets Python 2 (urllib2 and the print statement).
import json
import urllib
import urllib2

url = 'http://inet-sochi.ru:7003/'
params = {
    # The category of the results, 09 - for Australian sites
    'c' : '09',
    # number of results per page, i.e. how many results will be returned
    'ps': 10,
    # result page number, starting with 0
    'np' : 0,
    # synonyms use flag, 1 - to use, 0 - don't
    'sy' : 0,
    # word forms use flag, 1 - to use, 0 - don't (search for words in query exactly)
    'sp' : 1,
    # search mode, can be 'near', 'all', 'any'
    'm' : 'near',
    # results grouping by site flag, 'yes' - to group, 'no' - don't
    'GroupBySite' : 'no',
    # search result template
    'tmplt' : 'json2.htm',
    # search result ordering, 'I' - importance, 'R' - relevance, 'P' - PopRank, 'D' - date; use lower case letters for descending order
    's' : 'IRPD',
    # search query; urllib.urlencode() below URL-escapes it
    'q' : 'careers'
}

# Build the query string and fetch the JSON response from searchd
data = urllib.urlencode(params)
full_url = url + '?' + data
result = json.load(urllib2.urlopen(full_url))

rD = result['responseData']
for res in rD['results']:
    print res['title']
    print ' => ' + res['url']
    print

print ' ** Total ' + rD['found'] + ' documents found in ' + rD['time'] + ' sec.'
print ' Displaying documents ' + rD['first'] + '-' + rD['last'] + '.'
The source code of this example is available on GitHub: github.com/Maxime2/dpsearch-python. Feel free to make pull-requests with your samples of using DataparkSearch Engine in Python.
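For readers on Python 3, where urllib2 no longer exists, the same request can be sketched roughly as follows. This is only an illustration under a few assumptions: the function names (build_search_url, fetch_results, summarize) are made up for this sketch, the sample response is invented data, and the response fields (responseData, results, found, time, first, last) are taken from the script above rather than from any formal specification.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, query, page=0, page_size=10, **extra):
    """Build a searchd query URL; urlencode() URL-escapes every value."""
    params = {'q': query, 'np': page, 'ps': page_size, 'tmplt': 'json2.htm'}
    params.update(extra)  # e.g. c='09', m='near', s='IRPD'
    return base_url + '?' + urlencode(params)


def fetch_results(base_url, query, **kwargs):
    """Send the request and decode the JSON body (needs a reachable searchd)."""
    with urlopen(build_search_url(base_url, query, **kwargs)) as resp:
        return json.loads(resp.read().decode('utf-8'))


def summarize(response):
    """Return the lines that the original script prints for a response."""
    rd = response['responseData']
    lines = []
    for res in rd['results']:
        lines.append(res['title'])
        lines.append(' => ' + res['url'])
        lines.append('')
    lines.append(' ** Total {} documents found in {} sec.'.format(rd['found'], rd['time']))
    lines.append(' Displaying documents {}-{}.'.format(rd['first'], rd['last']))
    return lines


# Invented sample data with the same field names as the script above,
# so the formatting can be tried without a running searchd.
SAMPLE_RESPONSE = {
    'responseData': {
        'results': [
            {'title': 'Example page one', 'url': 'http://example.com/one'},
            {'title': 'Example page two', 'url': 'http://example.com/two'},
        ],
        'found': '2', 'time': '0.01', 'first': '1', 'last': '2',
    }
}
```

Calling summarize(fetch_results('http://inet-sochi.ru:7003/', 'careers', c='09')) should then give the same lines as the Python 2 script, provided the server is reachable; summarize(SAMPLE_RESPONSE) exercises the formatting offline.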
Below is an example of accessing DataparkSearch Engine results from its searchd daemon in PHP, using a RESTful client and JSON.
The httpful PHP library is used as the REST client library. Simply download the httpful.phar file into the directory where you run this sample. On Linux you can do this with the command:
wget -c http://phphttpclient.com/httpful.phar
You also need the curl and json PHP extensions installed on your system. On Ubuntu Linux you can install them using the following command:
sudo apt-get install php5-curl php5-json
This example uses the RESTful API provided by the searchd daemon of DataparkSearch Engine and a search result template that produces JSON output; you can find the template in doc/samples/json.htm inside the DataparkSearch distribution package.
When run, the script prints a list of page titles with their URLs, followed by the total number of documents found in the database for the given query, the time the query took to execute, and the range of document numbers shown in the list.
<?php
include('./httpful.phar');
// The host with searchd running
$host = 'http://inet-sochi.ru:7003/';
// The category of the results, 09 - for australian sites; this is specific for inet-sochi.ru installation
$_c = '09';
// number of results per page, i.e. how many results will be returned
$_ps = 10;
// result page number, starting with 0
$_np = 0;
// synonyms use flag, 1 - to use, 0 - don't
$_sy = 0;
// word forms use flag, 1 - to use, 0 - don't (search for words in query exactly)
$_sp = 1;
// search mode, can be 'near', 'all', 'any'
$_m = 'near';
// results grouping by site flag, 'yes' - to group, 'no' - don't
$_GroupBySite = 'no';
// search result template
$_tmplt = 'json2.htm';
// search result ordering, 'I' - importance, 'R' - relevance, 'P' - PopRank, 'D' - date; use lower case letters for descending order
$_s = 'IRPD';
// search query, should be URL-escaped
$_q = urlencode('careers');
$url = $host . '?c=' . $_c
    . '&ps=' . $_ps
    . '&np=' . $_np
    . '&sy=' . $_sy
    . '&sp=' . $_sp
    . '&m=' . $_m
    . '&GroupBySite=' . $_GroupBySite
    . '&tmplt=' . $_tmplt
    . '&s=' . $_s
    . '&q=' . $_q
    ;

$response = \Httpful\Request::get($url)
    ->send();

$result = $response->body->responseData;
foreach ($result->results as $res) {
    echo "{$res->title} => {$res->url}\n";
}
echo " ** Total {$result->found} documents found in {$result->time} sec.\n";
echo " Displaying documents {$result->first}-{$result->last}.\n";
Below is an example of accessing DataparkSearch Engine results from its searchd daemon in Ruby, using a RESTful client and JSON.
First of all, you need to have the Ruby interpreter installed on your system. On Ubuntu 13.10 you can install it with the following command:
sudo apt-get install ruby1.9.1-full
Then you need to install the rest-client and json packages with the following command:
sudo gem install rest-client json
This example uses the RESTful API provided by the searchd daemon of DataparkSearch Engine and a search result template that produces JSON output; you can find the template in doc/samples/json.htm inside the DataparkSearch distribution package.
When run, the script prints a list of page titles with their URLs, followed by the total number of documents found in the database for the given query, the time the query took to execute, and the range of document numbers shown in the list.
#!/usr/bin/ruby
require 'cgi'
require 'rest_client'
require 'json'
# The category of the results, 09 - for australian sites; this is specific for inet-sochi.ru installation
_c = '09'
# number of results per page, i.e. how many results will be returned
_ps = 10
# result page number, starting with 0
_np = 0
# synonyms use flag, 1 - to use, 0 - don't
_sy = 0
# word forms use flag, 1 - to use, 0 - don't (search for words in query exactly)
_sp = 1
# search mode, can be 'near', 'all', 'any'
_m = 'near'
# results grouping by site flag, 'yes' - to group, 'no' - don't
_GroupBySite = 'no'
# search result template
_tmplt = 'json2.htm'
# search result ordering, 'I' - importance, 'R' - relevance, 'P' - PopRank, 'D' - date; use lower case letters for descending order
_s = 'IRPD'
# search query; rest-client URL-escapes parameter values itself, so pass it unescaped
_q = 'careers'
response = RestClient.get('http://inet-sochi.ru:7003/', {:params => {
    :c => _c,
    :ps => _ps,
    :np => _np,
    :sy => _sy,
    :sp => _sp,
    :m => _m,
    # same spelling as in the Python and PHP examples
    'GroupBySite' => _GroupBySite,
    :tmplt => _tmplt,
    :s => _s,
    :q => _q
}}){ |response, request, result, &block|
  case response.code
  when 200
    # p "It worked !"
    response
  when 423
    raise SomeCustomExceptionIfYouWant
  else
    response.return!(request, result, &block)
  end
}
result = JSON.parse(response)
result['responseData']['results'].each { |pos|
  print "#{pos['title']}\n => #{pos['url']}\n\n"
}
print " ** Total #{result['responseData']['found']} documents found in #{result['responseData']['time']} sec.\n"
print " Displaying documents #{result['responseData']['first']}-#{result['responseData']['last']}.\n"
A new snapshot of DataparkSearch Engine 4.54 is available: dpsearch-4.54-2013-11-07.tar.bz2 on Google Code.
The changes are:
Ubuntu/Debian and RPM packages are available in the Download section on Google Code.
I've just discovered that DataparkSearch Engine has been compared with other open source search tools for crawling and indexing free music in Volume 18, Issue 1 of the Journal of Telecommunications (2013); see it on Scribd.
Some points missed in the article:
With these features, it's quite easy to integrate DataparkSearch with any other application or framework.
A new snapshot of DataparkSearch Engine 4.54 is available: dpsearch-4.54-2013-09-15.tar.bz2
The changes are:
I was unable to find a general-purpose implementation of bottom-up heapsort, so I wrote one myself, with a small modification.
You can find the source code on GitHub: github.com/Maxime2/heapsort
Bottom-up heapsort (bottom-up-heapsort) is a variant of heapsort with a new reheap procedure. This sequential sorting algorithm beats, on average, quicksort if n > 2400 and a clever version of quicksort (the median-of-3 modification) if n > 16000.
The algorithm of bottom-up heapsort is described in Ingo Wegener, BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small), Theoretical Computer Science 118 (1993), pp. 81-98, Elsevier.
The modification I've made saves (n-2)/2 swaps and (n-2)/2 comparisons for n > 3. It is based on the idea of a delayed reheap after moving the root to its place, taken from D. Levendeas, C. Zaroliagis, Heapsort using Multiple Heaps, in Proc. 2nd Panhellenic Student Conference on Informatics (EUREKA), 2008, pp. 93-104.
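To illustrate the reheap procedure mentioned above, here is a minimal Python sketch of plain bottom-up heapsort. It is not the code from the repository and does not include the delayed-reheap modification; the function names are made up for this illustration. The idea is that reheap first walks down along the larger children to a leaf (one comparison per level) and only then climbs back up to find where the displaced element belongs.

```python
def leaf_search(a, i, end):
    """Follow the path of larger children from index i down to a leaf of a[:end]."""
    j = i
    while 2 * j + 2 < end:              # both children exist
        if a[2 * j + 2] > a[2 * j + 1]:
            j = 2 * j + 2
        else:
            j = 2 * j + 1
    if 2 * j + 1 < end:                 # a single left child at the last level
        j = 2 * j + 1
    return j


def bottom_up_reheap(a, i, end):
    """Restore the max-heap property for the subtree rooted at i within a[:end]."""
    x = a[i]
    j = leaf_search(a, i, end)
    # Climb from the leaf until an element not smaller than x is found;
    # this stops at i at the latest, because a[i] == x.
    while a[j] < x:
        j = (j - 1) // 2
    # Put x at position j and move every element on the path one level up.
    while j > i:
        a[j], x = x, a[j]
        j = (j - 1) // 2
    a[i] = x


def bottom_up_heapsort(a):
    """Sort the list a in place in ascending order and return it."""
    n = len(a)
    # Build the max-heap from the last internal node up to the root.
    for i in range(n // 2 - 1, -1, -1):
        bottom_up_reheap(a, i, n)
    # Repeatedly move the maximum to the end and reheap the shrunken prefix.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        bottom_up_reheap(a, 0, end)
    return a
```

For example, bottom_up_heapsort([5, 1, 4, 2, 3]) sorts the list in place. The saving over classic heapsort comes from the downward pass: it needs one comparison per level instead of two, and the upward pass is usually short because the element moved to the root tends to belong near the bottom of the heap.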