Data structure for word relative cooccurrence frequencies, counts and prefix tree


While trying to calculate relative frequencies of word cooccurrence quickly, I created a data structure that also gives the count of the first word in each pair being checked. While processing the text it builds a word prefix tree, which can be used for further text analysis.
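To picture what a word prefix tree with counts can look like, here is a minimal sketch in C. It is not the code from the repository: the names TrieNode, trie_new, trie_add, trie_count and trie_free, and the choice of one child slot per byte value, are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout: one child slot per possible byte value. */
typedef struct TrieNode {
    struct TrieNode *child[256];
    size_t count;               /* times a word ending at this node was added */
} TrieNode;

/* Allocate an empty node (calloc zeroes the count and the child slots). */
TrieNode *trie_new(void) {
    return calloc(1, sizeof(TrieNode));
}

/* Add one occurrence of word; returns its new count, or 0 if allocation fails. */
size_t trie_add(TrieNode *root, const char *word) {
    assert(root != NULL && word != NULL);
    TrieNode *node = root;
    for (const unsigned char *p = (const unsigned char *)word; *p; ++p) {
        if (!node->child[*p]) {
            node->child[*p] = trie_new();
            if (!node->child[*p]) return 0;
        }
        node = node->child[*p];
    }
    return ++node->count;
}

/* Return how many times word was added (0 if it was never added). */
size_t trie_count(const TrieNode *root, const char *word) {
    const TrieNode *node = root;
    for (const unsigned char *p = (const unsigned char *)word; *p && node; ++p)
        node = node->child[*p];
    return node ? node->count : 0;
}

/* Release the whole tree. */
void trie_free(TrieNode *node) {
    if (!node) return;
    for (int i = 0; i < 256; ++i) trie_free(node->child[i]);
    free(node);
}
```

In this sketch, looking up a word costs time proportional to its length, regardless of how many distinct words the text contains. The 256-slot array trades memory for simplicity; a real implementation may well use a more compact layout.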

The source code is available on GitHub: github.com/Maxime2/cooccurrences

When you execute the make command, you should see the following output:


cc -O3 -funsigned-char cooccur.c -o cooccur -lm

Example 1
./cooccur a.txt 2 < a.in | tee a.out

Checking pair d e
Count:3  cocount:3
Relative frequency: 1.00

Checking pair a b
Count:3  cocount:1
Relative frequency: 0.33


Example 2
./cooccur b.txt 3 < b.in | tee b.out

Checking pair a penny
Count:3  cocount:3
Relative frequency: 1.00

Checking pair penny earned
Count:4  cocount:1
Relative frequency: 0.25

The cooccur program takes two arguments: the name of the text file to process and the size of the window, in words, within which relative frequencies are calculated. It then reads pairs of words from its standard input, one pair per line. For each pair it reports how many times the first word appears in the processed text (Count) and the cooccurrence count for the pair in that text (cocount). The relative frequency is cocount divided by Count; for the pair a b in Example 1 that is 1/3 = 0.33. If the second word appears more than once in the window, only one appearance is counted.
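The counting rule can be illustrated with a naive linear scan over an array of tokens. This is only a sketch and not how cooccur itself works: the function name relative_frequency and its parameters are hypothetical, and it assumes that the window covers the `window` words that follow each occurrence of the first word, which the description above does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive sketch (assumed window semantics: the `window` tokens that follow
   each occurrence of `first`). Stores the number of occurrences of `first`
   in *count and the number of those occurrences whose window contains
   `second` in *cocount; returns cocount / count, or 0.0 if count is zero. */
double relative_frequency(const char *const *tokens, size_t n,
                          const char *first, const char *second,
                          size_t window, size_t *count, size_t *cocount)
{
    assert(tokens && first && second && count && cocount);
    *count = 0;
    *cocount = 0;
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(tokens[i], first) != 0)
            continue;
        ++*count;
        /* Clamp the end of the window to the end of the text. */
        size_t end = (n - i - 1 < window) ? n : i + 1 + window;
        for (size_t j = i + 1; j < end; ++j) {
            if (strcmp(tokens[j], second) == 0) {
                ++*cocount;
                break;  /* repeated matches in one window count once */
            }
        }
    }
    return *count ? (double)*cocount / (double)*count : 0.0;
}
```

For the token sequence a b c a c b a d with a window of 1, the pair a b gives a count of 3 and a cocount of 1 under these assumptions. The early break is what implements the rule that a second word repeated inside one window is counted only once.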

Examples were taken here:

