While trying to calculate word cooccurrence relative frequencies quickly, I created an interesting data structure. It also gives the count for the first word of each pair being checked, and it builds a word prefix tree of the processed text, which can be used for further text analysis.
When you run the make command, you should see the following output:
cc -O3 -funsigned-char cooccur.c -o cooccur -lm
./cooccur a.txt 2 < a.in | tee a.out
Checking pair d e
Relative frequency: 1.00
Checking pair a b
Relative frequency: 0.33
./cooccur b.txt 3 < b.in | tee b.out
Checking pair a penny
Relative frequency: 1.00
Checking pair penny earned
Relative frequency: 0.25
The cooccur program takes two arguments: the name of a text file to process and the size of the word window within which relative frequencies are calculated. The program then reads pairs of words from its standard input, one pair per line, and for each pair calculates how many times the first word appears in the processed text and the cooccurrence count for the pair in that text. If the second word appears more than once in the window, only one appearance is counted.
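The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not the prefix-tree implementation in cooccur.c. The function name `relative_frequency`, the sample text, and the choice of the window as the `window` words following each occurrence of the first word are all assumptions made for the example.

```python
def relative_frequency(words, first, second, window):
    """Return (count of first word, cooccurrence count, relative frequency)."""
    count_first = 0
    count_pair = 0
    for i, word in enumerate(words):
        if word != first:
            continue
        count_first += 1
        # Assumption: the window is the `window` words after this occurrence.
        following = words[i + 1 : i + 1 + window]
        # A membership test counts the second word at most once per window.
        if second in following:
            count_pair += 1
    freq = count_pair / count_first if count_first else 0.0
    return count_first, count_pair, freq


# Hypothetical sample text, not the contents of a.txt or b.txt.
text = "a b c a d e a c b".split()
count_first, count_pair, freq = relative_frequency(text, "a", "b", 2)
print("Checking pair a b")
print("Relative frequency: %.2f" % freq)
```

A linear scan like this re-reads the text for every pair, which is exactly the cost a precomputed structure avoids.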
https://github.com/Maxime2/stan-challenge is my answer to the Stan code challenge, published on GitHub. It is an example of how one can use a SAX-like streaming parser inside an Apache module to process JSON with minimal delay.
A custom-made Apache module saves some request processing time because no interpreter for a programming language (such as PHP, Python or Go) has to be invoked to handle the request. The stream parser can start processing JSON as soon as the first buffer is filled with data, while the rest of the request is still in transmission. And again, because it is an Apache module, construction of the response starts while the request is still being processed (and still being transmitted).
PL/sh is a nice extension to PostgreSQL that allows writing stored procedures in an interpreted language, e.g. bash, python, perl, php, etc.
I found it useful, though it has a major drawback: the amount of data you can pass via arguments of such procedures may hit command line limitations, i.e. no more than 254 spaces and no more than 2MB (or even less).
So I have made a change: the value of the first argument is passed via stdin to the script implementing the stored procedure, and the rest of the arguments are passed as $1, $2, $3, etc. This change overcomes the above-mentioned limitations when a large amount of data is passed via one parameter.
Here is a tiny example of the new functionality that I have added to the test suite:
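CREATE FUNCTION perl_concat2(text, text) RETURNS text LANGUAGE plsh2 AS '
#!/usr/bin/perl
print while (<STDIN>);
print $ARGV[0];  # assumed completion: the second argument arrives as the first positional one
';
SELECT perl_concat2('pe', 'rl');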
You can get the modified PL/sh from my repository on GitHub: github.com/Maxime2/plsh. It is implemented as a new procedural language, plsh2, so you do not need to change anything in existing procedures/functions that use plsh (and you can continue to use it as before).
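The calling convention described above (first argument on stdin, the rest as positional arguments) can be tried outside PostgreSQL with a small Python sketch. It is not the plsh2 code itself. The helper `call_procedure` and the stand-in child script are hypothetical names for this example, and a Python child process is used in place of a shell or perl script to keep the example self-contained.

```python
import subprocess
import sys

# Stand-in for the stored procedure body: copy stdin, then append argv[1].
CHILD_SCRIPT = "import sys; sys.stdout.write(sys.stdin.read() + sys.argv[1])"


def call_procedure(first_arg, *rest):
    """Run the child with first_arg on stdin and the rest as positional args."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, *rest],
        input=first_arg,       # the first argument travels through the pipe
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(call_procedure("pe", "rl"))
```

Because the first value goes through a pipe rather than the command line, its size is not bounded by the argument-length limits of the operating system.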
The gitstats tool stopped working on our project after the upgrade to Ubuntu 16.04. I have finally found time to have a look. There were two issues with it:
We do not need to call wait() on the processes, because communicate() already waits for process termination, and the last process in the pipeline does not finish until all processes before it in the pipeline have terminated. In addition, wait() may deadlock on pipes with large output; see the note at https://docs.python.org/2/library/subprocess.html
On Ubuntu 16.04, grep started writing a "Binary file (standard input) matches" notice into the pipe, which breaks parsing.
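The first issue can be illustrated with a minimal two-stage pipeline that relies on communicate() for the last process. This is a sketch of the general pattern from the subprocess documentation, not the actual gitstats code. `PRODUCER`, `COUNTER` and `run_pipeline` are hypothetical stand-ins for the real commands.

```python
import subprocess
import sys

# Hypothetical stand-ins for the real commands in the pipeline.
PRODUCER = "import sys; sys.stdout.write('commit\\n' * 100000)"
COUNTER = "import sys; print(sum(1 for _ in sys.stdin))"


def run_pipeline():
    first = subprocess.Popen(
        [sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE
    )
    last = subprocess.Popen(
        [sys.executable, "-c", COUNTER],
        stdin=first.stdout,
        stdout=subprocess.PIPE,
    )
    # Close our copy so only the consumer holds the read end of the pipe.
    first.stdout.close()
    # communicate() drains the output and waits for the last process to exit.
    output = last.communicate()[0]
    # The producer has finished writing by now; this only reaps it.
    first.wait()
    return output.decode().strip()


print(run_pipeline())
```

Calling wait() on a process whose output nobody is reading is what can block forever once the pipe buffer fills up. Here the output is read before any wait() happens.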
Yesterday was a Big Run in Sydney: City2Surf 2015, with 80,000+ participants and more than $4.1 million raised for various charities.
This year I entered the Blue start and finished in 1:22:08, 5 minutes 1 second faster than last year! 🙂
A friend of mine who also participated in City2Surf 2015 is raising donations for Operation Smile Australia, which provides cleft surgeries in developing countries. The goal of funding two new smiles has been reached with the help of many supporters, though we need a little bit more to make it four! Please consider donating!