With the recent AMP update at Google, you may notice an increased number of AMP parsing errors in your Search Console. They look like this:

The mandatory tag 'html ⚡ for top-level html' is missing or incorrect.

Some plugins, e.g. Add Meta Tags, alter the output of language_attributes() via the 'language_attributes' filter and add XML-related attributes that AMP disallows (see www.ampproject.org/docs/reference/spec#required-markup). This causes the error mentioned above.

I have made a fix for this problem and submitted a pull request to the WordPress AMP plugin; you can see it here:

While trying to calculate word co-occurrence relative frequencies quickly, I created an interesting data structure. It also calculates the counts of the first word of each pair being checked, and it builds a word prefix tree while processing the text, which can be reused for further text analysis.
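To illustrate the prefix-tree part, here is a minimal C sketch of a word trie that counts how many times each word was inserted. It is a hypothetical illustration, not the structure actually used in cooccur.c: the names TrieNode, trie_insert, trie_find and trie_count are made up here, words are assumed to be NUL-terminated byte strings, and each node has one child slot per byte value, which is simple but memory-hungry.

```c
#include <stdlib.h>

/* Hypothetical word prefix tree: one child pointer per byte value and a
   counter for the word that ends at each node. */
typedef struct TrieNode {
    struct TrieNode *child[256];
    unsigned long count; /* times this exact word was inserted */
} TrieNode;

/* Allocate an empty node (all children NULL, count 0). */
TrieNode *trie_new(void) {
    return calloc(1, sizeof(TrieNode));
}

/* Insert a word, creating missing nodes along the path, and return the
   node for the word (usable as a stable word id), or NULL if out of memory. */
TrieNode *trie_insert(TrieNode *root, const char *word) {
    TrieNode *n = root;
    for (const unsigned char *p = (const unsigned char *)word; *p; p++) {
        if (n->child[*p] == NULL && (n->child[*p] = trie_new()) == NULL)
            return NULL;
        n = n->child[*p];
    }
    n->count++;
    return n;
}

/* Walk the tree without modifying it; returns the node reached by the
   whole word, or NULL if that path does not exist. */
const TrieNode *trie_find(const TrieNode *root, const char *word) {
    const TrieNode *n = root;
    for (const unsigned char *p = (const unsigned char *)word; n != NULL && *p; p++)
        n = n->child[*p];
    return n;
}

/* Number of times `word` was inserted (0 if it is only a prefix or absent). */
unsigned long trie_count(const TrieNode *root, const char *word) {
    const TrieNode *n = trie_find(root, word);
    return n ? n->count : 0;
}

/* Free a whole subtree. */
void trie_free(TrieNode *n) {
    if (n == NULL)
        return;
    for (int i = 0; i < 256; i++)
        trie_free(n->child[i]);
    free(n);
}
```

Because trie_insert returns the word's node, a caller can keep that pointer as a compact word identifier when recording pairs, instead of comparing strings again.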

The source code is available on GitHub: github.com/Maxime2/cooccurrences

When you run the make command, you should see the following output:

cc -O3 -funsigned-char cooccur.c -o cooccur -lm

Example 1
./cooccur a.txt 2 < a.in | tee a.out

Checking pair d e
Count:3  cocount:3
Relative frequency: 1.00

Checking pair a b
Count:3  cocount:1
Relative frequency: 0.33

Example 2
./cooccur b.txt 3 < b.in | tee b.out

Checking pair a penny
Count:3  cocount:3
Relative frequency: 1.00

Checking pair penny earned
Count:4  cocount:1
Relative frequency: 0.25

The cooccur program takes two arguments: the name of the text file to process and the window size, in words, within which relative frequencies are calculated. It then reads pairs of words from its standard input, one pair per line, and for each pair calculates the number of occurrences of the first word in the processed text and the co-occurrence count of the pair in that text. If the second word appears more than once within a window, it is counted only once. The relative frequency is the co-occurrence count divided by the count of the first word (e.g. 1/3 = 0.33 in Example 1).
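The counting rules can be written as a brute-force reference in C, which may help when checking the output of a faster implementation. This is a sketch under stated assumptions, not the algorithm in cooccur.c: it works on an already tokenized array of words, the names PairStats, pair_stats and relative_frequency are hypothetical, and it assumes the window consists of the `window` words that follow each occurrence of the first word (the real program may define the window boundaries differently).

```c
#include <stddef.h>
#include <string.h>

/* Result of checking one pair of words. */
typedef struct {
    unsigned long count;   /* occurrences of the first word */
    unsigned long cocount; /* windows in which the second word appears */
} PairStats;

/* Assumption: the window is the `window` words after each occurrence of
   `first`. */
PairStats pair_stats(const char *const *words, size_t n, size_t window,
                     const char *first, const char *second) {
    PairStats s = {0, 0};
    for (size_t i = 0; i < n; i++) {
        if (strcmp(words[i], first) != 0)
            continue;
        s.count++;
        /* Scan this occurrence's window and stop at the first hit, so
           repeated appearances of `second` are counted only once. */
        for (size_t j = i + 1; j < n && j <= i + window; j++) {
            if (strcmp(words[j], second) == 0) {
                s.cocount++;
                break;
            }
        }
    }
    return s;
}

/* Relative frequency = cocount / count (0 if the first word is absent). */
double relative_frequency(PairStats s) {
    return s.count ? (double)s.cocount / (double)s.count : 0.0;
}
```

For example, with the tokens `a penny saved is a penny earned` and a window of 2, the pair `penny earned` gets a count of 2 and a cocount of 1 under this assumption.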

Examples were taken from here:

github.com/Maxime2/stan-challenge: here on GitHub is my answer to the Stan code challenge. It is an example of how one can use a SAX-like streaming parser inside an Apache module to process JSON with minimal delays.

A custom Apache module saves some request processing time by avoiding the invocation of an interpreter or separate runtime to handle the request in another programming language (such as PHP, Python or Go). The streaming parser can start processing JSON as soon as the first buffer is filled with data, while the rest of the request is still being transmitted. And, since it is an Apache module, the response starts being constructed while the request is still being processed (and still being transmitted).
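The streaming idea can be sketched in C as a scanner that keeps its state between calls, so each buffer can be handed over as soon as it arrives. This is a hypothetical illustration, not code from the stan-challenge repository and not tied to the Apache API: names such as JsonScanner, json_scanner_feed and count_events are made up, and the scanner only reports object/array boundaries and the ends of string literals. It does not validate numbers, literals, commas or matching bracket types, so it is not a full JSON parser.

```c
#include <stddef.h>

/* Events reported by the scanner. */
typedef enum { EV_OBJECT_START, EV_OBJECT_END, EV_ARRAY_START,
               EV_ARRAY_END, EV_STRING_END } JsonEvent;

typedef void (*JsonCallback)(JsonEvent ev, int depth, void *user);

/* All state lives here, so input may be split into arbitrary chunks. */
typedef struct {
    int depth;     /* current nesting depth of objects/arrays */
    int in_string; /* inside a string literal */
    int escape;    /* previous byte was a backslash inside a string */
    JsonCallback cb;
    void *user;
} JsonScanner;

void json_scanner_init(JsonScanner *s, JsonCallback cb, void *user) {
    s->depth = 0;
    s->in_string = 0;
    s->escape = 0;
    s->cb = cb;
    s->user = user;
}

/* Process one chunk; returns -1 on a closing bracket at depth 0, else 0. */
int json_scanner_feed(JsonScanner *s, const char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (s->in_string) {
            if (s->escape)
                s->escape = 0;          /* escaped byte, e.g. \" */
            else if (c == '\\')
                s->escape = 1;
            else if (c == '"') {
                s->in_string = 0;
                s->cb(EV_STRING_END, s->depth, s->user);
            }
            continue;
        }
        switch (c) {
        case '"': s->in_string = 1; break;
        case '{': s->cb(EV_OBJECT_START, ++s->depth, s->user); break;
        case '[': s->cb(EV_ARRAY_START, ++s->depth, s->user); break;
        case '}':
        case ']':
            if (s->depth == 0)
                return -1;
            s->cb(c == '}' ? EV_OBJECT_END : EV_ARRAY_END, s->depth--, s->user);
            break;
        default:
            break;
        }
    }
    return 0;
}

/* Example callback: tally events by type; `user` points to 5 counters. */
void count_events(JsonEvent ev, int depth, void *user) {
    unsigned *counts = user;
    (void)depth;
    counts[ev]++;
}
```

Feeding a document one byte at a time produces the same events as feeding it in a single call, because the in-string and escape flags carry over from one chunk to the next.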


PL/sh is a nice PostgreSQL extension that allows you to write stored procedures in an interpreted language, e.g. bash, Python, Perl, PHP, etc.

I found it useful, though it has a major drawback: the amount of data you can pass via the arguments of such procedures may hit command-line limitations, i.e. no more than 254 spaces and no more than 2MB (or even less).

So I have made a change so that the value of the first argument is passed via stdin to the script implementing the stored procedure, while the rest of the arguments are passed as $1, $2, $3, etc. This change allows you to overcome the limitations mentioned above when a large amount of data is passed via a single parameter.
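Below is a minimal POSIX C sketch of the mechanism: the parent writes one value into a pipe connected to the child's standard input and passes the remaining values as ordinary arguments. It assumes a POSIX system; the function name run_with_stdin is hypothetical and this is not the code in github.com/Maxime2/plsh. Because the payload travels through a pipe rather than through execve() arguments, its size is not bounded by the system's argument limits (which can be queried with sysconf(_SC_ARG_MAX)).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up in PATH) with argv[1..] as
   its positional arguments and stream `data` into its standard input.
   Returns the child's exit status, or -1 on error. */
int run_with_stdin(char *const argv[], const char *data, size_t len) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the read end of the pipe becomes stdin, then exec. */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: write the whole payload; the child reads it concurrently,
       so payloads larger than the pipe buffer do not deadlock. */
    close(fds[0]);
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN); /* child may exit early */
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fds[1], data + off, len - off);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            break; /* e.g. EPIPE: the child stopped reading */
        }
        off += (size_t)n;
    }
    close(fds[1]); /* the child now sees EOF */
    if (old_handler != SIG_ERR)
        signal(SIGPIPE, old_handler);

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Mapped onto the perl_concat2 test case, the script would receive 'pe' on its standard input and 'rl' as its first positional argument.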

Here is a tiny example of the new functionality that I have added to the test suite:

CREATE FUNCTION perl_concat2(text, text) RETURNS text LANGUAGE plsh2 AS '
#!/usr/bin/perl
print while (<STDIN>);
print $ARGV[0];
';
-- 'pe' arrives on stdin, 'rl' as $ARGV[0]; the result is 'perl'
SELECT perl_concat2('pe', 'rl');

You can get the modified PL/sh from my repository on GitHub: github.com/Maxime2/plsh. It is implemented as a new procedural language, plsh2, so you do not need to change anything in existing procedures/functions that use plsh (and you can continue to use it as before).

The gitstats tool stopped working on our project after the upgrade to Ubuntu 16.04. I have finally got time to have a look. There were two issues:

  1. There is no need to call Popen.wait(), because Popen.communicate() already waits for the process to terminate, and the last process in a pipeline does not finish until all the processes before it have terminated. Moreover, Popen.wait() may deadlock on pipes with huge output; see the note at https://docs.python.org/2/library/subprocess.html
  2. On Ubuntu 16.04, grep started writing a "Binary file (standard input) matches" notice into the pipe, which breaks parsing.
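The first issue is not specific to Python: a parent that waits for a child before reading its output can block forever once the pipe buffer fills up. Here is a minimal C sketch of the safe order (read everything, then wait) using popen() and pclose(); read_command_output is a hypothetical name and this is not code from gitstats.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run a shell command and return everything it wrote
   to stdout as a malloc'ed, NUL-terminated string (caller frees), or NULL
   on error. The exit status is not checked in this sketch. */
char *read_command_output(const char *cmd) {
    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *out = malloc(cap);
    if (out == NULL) {
        pclose(p);
        return NULL;
    }

    /* Drain the pipe completely first, growing the buffer as needed;
       always keep one byte free for the terminating NUL. */
    size_t n;
    while ((n = fread(out + len, 1, cap - len - 1, p)) > 0) {
        len += n;
        if (len + 1 == cap) {
            char *bigger = realloc(out, cap * 2);
            if (bigger == NULL) {
                free(out);
                pclose(p);
                return NULL;
            }
            out = bigger;
            cap *= 2;
        }
    }
    out[len] = '\0';

    /* Only now wait for the command to finish (pclose waits for it). */
    if (pclose(p) == -1) {
        free(out);
        return NULL;
    }
    return out;
}
```

For the second issue, GNU grep's -a (--text) option makes grep process binary input as if it were text, which is one possible workaround; the description above does not say which approach the pull request takes.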

I have made a pull request which fixes these issues: https://github.com/hoxu/gitstats/pull/65
You can also clone the fixed version from my account: https://github.com/Maxime2/gitstats

do-release-upgrade aborts the upgrade process if you have any of the following packages installed:


It looks like these packages are not available on 16.04 (though they are available for PostgreSQL 9.5), but they match a removal blacklist pattern.

Simply remove these packages manually to proceed with the upgrade. You can reinstall them after upgrading Ubuntu to 16.04 and PostgreSQL to 9.5.

A bug report has been filed; please vote for it if it affects you.

Yesterday was the Big Run in Sydney, City2Surf 2015, with 80,000+ participants and more than $4.1 million raised for various charities.

This year I entered the Blue start:

City2Surf 2015: Blue start

And finished in 1:22:08, 5 minutes 1 second faster than last year! 🙂

After finish 2015

A friend of mine who also participated in City2Surf 2015 is raising donations for Operation Smile Australia, which performs cleft surgeries in developing countries. The goal of funding two new smiles has been reached with the help of many supporters, but we need a little more to make it four! Please consider donating!

You're perhaps aware of Google's translation services, and if you know more than one human language, you can help improve the service via the Google Translate Community (BETA).

You might also be interested to know that Yandex, a Russian Google, runs its own Yandex Translation Service, which in many cases gives better translations for the Russian-English language pair.

A new snapshot version of DataparkSearch Engine has been released. You can get it on Google Drive.

Here is the list of changes since previous snapshot:

  • The Crossword section now includes the value of the TITLE attribute of IMG tags and the values of the ALT and TITLE attributes of A and LINK tags found in documents pointing to the document being indexed
  • Meta PROPERTY is now indexed
  • URL info data is now stored for all documents with an HTTP status code < 400
  • configure now understands the --without-libextractor switch, to build dpsearch without libextractor support even if libextractor is installed
  • robots.txt support is enabled for sites crawled using the HTTPS scheme
  • The AuthPing command has been added to send an authorisation request before fetching documents from a web site. See details below.
  • The Cookie command has been added.
  • Support has been added for SOCKS5 proxies, both without authorisation and with username authorisation. See details below.
  • A number of minor fixes
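For reference, the SOCKS5 messages a client sends can be built as below. This is a generic sketch of the byte layouts from RFC 1928 (greeting and CONNECT request with a domain name) and RFC 1929 (username/password sub-negotiation, assuming that is what "username authorisation" refers to), not DataparkSearch's implementation; the function names are hypothetical, and reading the proxy's replies is left out.

```c
#include <stddef.h>
#include <string.h>

/* Each builder writes one message into buf (capacity cap) and returns its
   length in bytes, or 0 if the input is invalid or buf is too small. */

/* RFC 1928 greeting: VER=5, NMETHODS, METHODS...
   Method 0x00 = no authentication, 0x02 = username/password. */
size_t socks5_greeting(unsigned char *buf, size_t cap, int offer_userpass) {
    size_t need = offer_userpass ? 4 : 3;
    if (cap < need)
        return 0;
    buf[0] = 0x05;
    buf[1] = offer_userpass ? 2 : 1;
    buf[2] = 0x00;
    if (offer_userpass)
        buf[3] = 0x02;
    return need;
}

/* RFC 1929 request: VER=1, ULEN, UNAME, PLEN, PASSWD (1..255 bytes each). */
size_t socks5_userpass(unsigned char *buf, size_t cap,
                       const char *user, const char *pass) {
    size_t ul = strlen(user), pl = strlen(pass);
    if (ul < 1 || ul > 255 || pl < 1 || pl > 255 || cap < 3 + ul + pl)
        return 0;
    size_t n = 0;
    buf[n++] = 0x01;
    buf[n++] = (unsigned char)ul;
    memcpy(buf + n, user, ul);
    n += ul;
    buf[n++] = (unsigned char)pl;
    memcpy(buf + n, pass, pl);
    n += pl;
    return n;
}

/* RFC 1928 CONNECT request with a domain name:
   VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3, LEN, HOST, PORT (big-endian). */
size_t socks5_connect(unsigned char *buf, size_t cap,
                      const char *host, unsigned short port) {
    size_t hl = strlen(host);
    if (hl < 1 || hl > 255 || cap < 7 + hl)
        return 0;
    size_t n = 0;
    buf[n++] = 0x05;
    buf[n++] = 0x01;
    buf[n++] = 0x00;
    buf[n++] = 0x03;
    buf[n++] = (unsigned char)hl;
    memcpy(buf + n, host, hl);
    n += hl;
    buf[n++] = (unsigned char)(port >> 8);
    buf[n++] = (unsigned char)(port & 0xFF);
    return n;
}
```

The client sends the greeting first; if the proxy selects method 0x02, it sends the username/password request and checks the status reply, and only then sends the CONNECT request.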

...continue reading "dpsearch-4.54-2015-07-06"