Regex query expansion

Regex-based automatic query expansion has been added to the latest snapshot of DataparkSearch Engine. It is primarily useful for expanding search requests that contain phone numbers, which are frequently written in different notations.

For example, the phone number stored in canonical notation as +78622642424 is found by the request 8622-64-24-24 Сочи (Sochi).

At the moment, neither Google nor Yandex (the leading search site in Russia) provides such a feature in its search engine.

Regex-based patterns are specified using special comments in a file of acronyms and abbreviations. Each such comment starts with the pair of characters #*, followed by arguments and options that are the same as for the ReverseAlias command. A special option, "last", has also been added; it forces the pattern matching process to stop right after the rule is executed (this option is also available for the Alias and ReverseAlias commands).

An example of regex patterns that implement phone number expansion:

#* regex last "(\+7|8)[- \.\(]*(862)[- \.\)]*([0-9])[- \.\)]*([0-9]{2})[- \.]*([0-9])[- \.]*([0-9])[- \.]*([0-9]{2})" "+7$2$3$4$5$6$7"
#* regex last "(\+7|8)[- \.\(]*(9[0-9]{2})[- \.\)]*([0-9])[- \.\)]*([0-9]{2})[- \.]*([0-9])[- \.]*([0-9])[- \.]*([0-9]{2})" "+7$2$3$4$5$6$7"
#* regex last "\(862[- \.\)]*([0-9]?)[- \.\)]*([0-9]{2,3})[- \.]*([0-9]{2,3})[- \.]*([0-9]{2,3})" "+7862$1$2$3$4$5"
#* regex last "([0-9]{2})[- \.]?([0-9]{2})[- \.]?([0-9]{2})" "+78622$1$2$3"
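
To see how such rules collapse different notations into one canonical number, the first two patterns can be tried outside the engine. The sketch below is not DataparkSearch code: it uses Python's re module, and the helper names (RULES, expand_template, canonical_phone) are made up for illustration. It assumes that rules are tried in order, that $N in the replacement refers to capture group N, and that "last" means no further rules are tried once one matches.

```python
import re

# Patterns and replacement templates copied from the first two rules above.
RULES = [
    (r"(\+7|8)[- \.\(]*(862)[- \.\)]*([0-9])[- \.\)]*([0-9]{2})"
     r"[- \.]*([0-9])[- \.]*([0-9])[- \.]*([0-9]{2})",
     "+7$2$3$4$5$6$7"),
    (r"(\+7|8)[- \.\(]*(9[0-9]{2})[- \.\)]*([0-9])[- \.\)]*([0-9]{2})"
     r"[- \.]*([0-9])[- \.]*([0-9])[- \.]*([0-9]{2})",
     "+7$2$3$4$5$6$7"),
]


def expand_template(match, template):
    """Replace each $N in the template with capture group N of the match."""
    return re.sub(
        r"\$(\d)",
        lambda ref: match.group(int(ref.group(1))) or "",
        template,
    )


def canonical_phone(query):
    """Return the canonical number from the first matching rule, or None."""
    for pattern, template in RULES:
        match = re.search(pattern, query)
        if match:
            # Assumed meaning of "last": stop at the first rule that fires.
            return expand_template(match, template)
    return None


if __name__ == "__main__":
    samples = [
        "8 (862) 264-24-24",
        "+7 862 264 24 24",
        "+7-862-2-64-24-24",
        "8 (918) 123-45-67",
    ]
    for sample in samples:
        print(sample, "->", canonical_phone(sample))
```

Under these assumptions, the three Sochi notations all map to +78622642424, and the mobile number maps to +79181234567 through the second rule.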
