A new snapshot version of DataparkSearch Engine has been released. You can get it on Google Drive.
Here is the list of changes since the previous snapshot:
- Crossword section now includes the value of the TITLE attribute of IMG tags and the values of the ALT and TITLE attributes of A and LINK tags found in documents that point to the document being indexed
- Meta PROPERTY is now indexed
- URL info data is now stored for all documents with HTTP status code < 400
- configure now understands the --without-libextractor switch, which builds dpsearch without libextractor support even if libextractor is installed
- robots.txt support is now enabled for sites crawled over the HTTPS scheme
- AuthPing command has been added to send an authorisation request before getting documents from a web-site. See details below.
- Cookie command has been added.
- Support has been added for SOCKS5 proxies without authorisation and with username authorisation. See details below.
- A number of minor fixes
Some web-sites serve different content to a logged-in user. In most cases the login process consists of sending a POST or GET HTTP request to a specific URL before you start to receive the targeted content. You can use the AuthPing command to send such an authentication request before requesting any document from the web-site.
AuthPing "POST https://commercial-site.ext.au/user/login.php u=bot%40user.ext.au&p=super%40pass"
This command specifies a POST request to be sent to the URL https://commercial-site.ext.au/user/login.php with the following CGI payload: u=bot%40user.ext.au&p=super%40pass
The AuthPing command should be specified before each Server/Realm/Subnet command it affects. The specified request is sent each time an indexing thread accesses a web-server for the first time in a run session.
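The payload in the example above is percent-encoded, so the "@" in each value appears as %40. As a minimal sketch of preparing such a line, the Python snippet below encodes the form fields and assembles the AuthPing command. The helper name build_authping is hypothetical and not part of DataparkSearch; the field names u and p are taken from the example and depend on the login form of the target site.

```python
from urllib.parse import urlencode


def build_authping(method, login_url, fields):
    """Return an AuthPing line with percent-encoded form fields.

    Hypothetical helper for illustration; not part of DataparkSearch.
    """
    # urlencode percent-encodes reserved characters ('@' becomes %40)
    # and joins the key=value pairs with '&'.
    payload = urlencode(fields)
    return f'AuthPing "{method} {login_url} {payload}"'


line = build_authping(
    "POST",
    "https://commercial-site.ext.au/user/login.php",
    {"u": "bot@user.ext.au", "p": "super@pass"},
)
print(line)
```

The resulting line can then be placed in the indexer configuration before the Server/Realm/Subnet command it should affect.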
Using SOCKS5 proxy
The Proxy command now accepts a proxy type option whose value is either http or socks5. If you need username authentication with a SOCKS5 proxy, use the ProxyAuthBasic command to specify the username and password.
Proxy socks5 localhost:9050
This example specifies a SOCKS5 proxy connection to a local Tor instance, which requires no authentication.
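For readers unfamiliar with the two SOCKS5 modes mentioned above, the following Python sketch builds the client-side messages defined by RFC 1928 (method negotiation) and RFC 1929 (username/password sub-negotiation). It only illustrates the wire format and is not DataparkSearch's implementation; the function names are hypothetical, and whether dpsearch offers both methods in a single greeting is an assumption.

```python
def socks5_greeting(with_username_auth):
    """Client greeting per RFC 1928: version, method count, methods.

    Method 0x00 means "no authentication"; 0x02 means "username/password".
    """
    methods = b"\x00\x02" if with_username_auth else b"\x00"
    return bytes([0x05, len(methods)]) + methods


def socks5_userpass_request(username, password):
    """Username/password request per RFC 1929.

    Layout: sub-negotiation version 0x01, then length-prefixed username
    and length-prefixed password.
    """
    user = username.encode("utf-8")
    pwd = password.encode("utf-8")
    # Each length is carried in a single byte, so both are limited to 1..255.
    if not (1 <= len(user) <= 255 and 1 <= len(pwd) <= 255):
        raise ValueError("username and password must be 1..255 bytes each")
    return bytes([0x01, len(user)]) + user + bytes([len(pwd)]) + pwd


# Greeting a proxy that needs no authentication, such as a local Tor port.
print(socks5_greeting(False).hex())
# Greeting and credentials for a proxy that uses username authentication.
print(socks5_greeting(True).hex())
print(socks5_userpass_request("bot", "pw").hex())
```

In the no-authentication case only the greeting is sent before the connection request; with username authentication the credentials message follows once the proxy selects method 0x02.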