Skip to content

Getting DataparkSearch results in Python

Below is an example of accessing DataparkSearch Engine results from its searchd daemon in Python language using RESTfull client and JSON.

Not additional package installation is required on Ubuntu Linux, if you already have Python interpreter installed.

This example uses RESTful API provided by searchd daemon of DataparkSearch Engine and a search result template producing JSON file, you can find it in the doc/samples/json.htm inside DataparkSearch distribution package.

As a result of execution of this script a list of page titles along with its URL is printed out followed by the total number of documents in the database for the query given, the time took for the query execution and the range of document numbers shown in the list.


#!/usr/bin/python

import json
import urllib
import urllib2

url = 'http://inet-sochi.ru:7003/'

params = {
# The category of the results, 09 - for australian sites
    'c' : '09',
# number of results per page, i.e. how many results will be returned
    'ps': 10,
# result page number, starting with 0
    'np' : 0,
# synonyms use flag, 1 - to use, 0 - don't
    'sy' : 0,
# word forms use flag, 1 - to use, 0 - don't (search for words in query exactly)
    'sp' : 1,
# search mode, can be 'near', 'all', 'any'
    'm' : 'near',
# results groupping by site flag, 'yes' - to group, 'no' - don't
    'GroupBySite' : 'no',
# search result template 
    'tmplt' : 'json2.htm',
# search result ordering, 'I' - importance, 'R' - relevance, 'P' - PopRank, 'D' - date; use lower case letters for descending order
    's' : 'IRPD',
# search query, should be URL-escaped
    'q' : 'careers'
}

data = urllib.urlencode(params)

full_url = url + '?' + data

result = json.load(urllib2.urlopen(full_url))

rD = result['responseData']

for res in rD['results']:
 print res['title']
 print ' => ' + res['url']
 print

print ' ** Total ' + rD['found'] + ' documents found in ' + rD['time'] + ' sec.'
print ' Displaying documents ' + rD['first'] + '-' + rD['last'] + '.'

The source code of this example is available on GitHub: github.com/Maxime2/dpsearch-python. Feel free to make pull-requests with your samples of using DataparkSearch Engine in Python.

Leave a Reply

Your email address will not be published.