A hack for antiword 0.37

The following patch fixes krakozyabry (garbled characters caused by incorrect charset detection) in the document title when antiword converts a document to DocBook XML. It makes CP1251 the default character set in place of CP1252 and extends the table lookup for characters 0xa0 to 0xff to Word version 8 documents. Unfortunately, this is a hack: after applying the patch, the conversion problem moves to documents whose titles are in a Western European encoding (instead of those in Cyrillic).


--- chartrans.c.orig	2005-01-11 22:44:46.000000000 +0300
+++ ../chartrans.c	2010-10-05 22:51:30.000000000 +0400
@@ -359,7 +359,7 @@
 			break;
 		case encoding_latin_1:
 		default:
-			usCharSet = usCp1252;
+			usCharSet = usCp1251;
 			break;
 		}
 	}
@@ -367,7 +367,7 @@
 	if (usChar >= 0x80 && usChar <= 0x9f) {
 		/* Translate implementation defined characters */
 		usChar = usCharSet[usChar - 0x80];
-	} else if (iWordVersion < 8 && usChar >= 0xa0 && usChar <= 0xff) {
+	} else if (iWordVersion <= 8 && usChar >= 0xa0 && usChar <= 0xff) {
 		/* Translate old character set to Unixcode */
 		usChar = usCharSet[usChar - 0x80];
 	}
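
To see why swapping the default table only moves the problem, here is a minimal sketch of how the same byte decodes under the two code pages. The function names are hypothetical and are not part of antiword, which uses lookup tables indexed by usChar - 0x80. The sketch covers only the ranges noted in the comments.

```c
#include <stdint.h>

/* Hypothetical helper, not antiword code.
 * For bytes 0xA0..0xFF, CP1252 (like Latin-1) maps each byte to the
 * Unicode code point with the same value. Other bytes are outside the
 * scope of this sketch and yield 0. */
static uint16_t western_to_unicode(uint8_t b)
{
	if (b < 0xA0) {
		return 0; /* not covered by this sketch */
	}
	return (uint16_t)b;
}

/* Hypothetical helper, not antiword code.
 * In CP1251, bytes 0xC0..0xFF are the Cyrillic letters U+0410..U+044F;
 * 0xA8 and 0xB8 are the capital and small letter IO. Other bytes are
 * outside the scope of this sketch and yield 0. */
static uint16_t cp1251_to_unicode(uint8_t b)
{
	if (b >= 0xC0) {
		return (uint16_t)(0x0410 + (b - 0xC0));
	}
	if (b == 0xA8) {
		return 0x0401;
	}
	if (b == 0xB8) {
		return 0x0451;
	}
	return 0; /* not covered by this sketch */
}
```

Byte 0xC0 decodes to U+00C0 (Latin capital A with grave) with the Western function and to U+0410 (Cyrillic capital A) with the CP1251 one, so a single hard-coded default can be right for only one family of documents.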
