Support for the libextractor library has been added in the latest snapshot of DataparkSearch Engine.
Using this library, DataparkSearch can now index keywords from files of the following formats: PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF.
The table below shows how libextractor keyword types map to DataparkSearch section names:
| KeywordType | Section name |
|---|---|
| EXTRACTOR_FILENAME | Filename |
| EXTRACTOR_MIMETYPE | Mimetype |
| EXTRACTOR_TITLE | Title |
| EXTRACTOR_AUTHOR | Author |
| EXTRACTOR_ARTIST | Artist |
| EXTRACTOR_DESCRIPTION | Description |
| EXTRACTOR_COMMENT | Comment |
| EXTRACTOR_DATE | Date |
| EXTRACTOR_PUBLISHER | Publisher |
| EXTRACTOR_LANGUAGE | Content-Language |
| EXTRACTOR_ALBUM | Album |
| EXTRACTOR_GENRE | Genre |
| EXTRACTOR_LOCATION | Location |
| EXTRACTOR_VERSIONNUMBER | VersionNumber |
| EXTRACTOR_ORGANIZATION | Organization |
| EXTRACTOR_COPYRIGHT | Copyright |
| EXTRACTOR_SUBJECT | Subject |
| EXTRACTOR_KEYWORDS | Meta.Keywords |
| EXTRACTOR_CONTRIBUTOR | Contributor |
| EXTRACTOR_RESOURCE_TYPE | Resource-Type |
| EXTRACTOR_FORMAT | Format |
| EXTRACTOR_RESOURCE_IDENTIFIER | Resource-Idendifier |
| EXTRACTOR_SOURCE | Source |
| EXTRACTOR_RELATION | Relation |
| EXTRACTOR_COVERAGE | Coverage |
| EXTRACTOR_SOFTWARE | Software |
| EXTRACTOR_DISCLAIMER | Disclaimer |
| EXTRACTOR_WARNING | Warning |
| EXTRACTOR_TRANSLATED | Translated |
| EXTRACTOR_CREATION_DATE | Creation-Date |
| EXTRACTOR_MODIFICATION_DATE | Modification-Date |
| EXTRACTOR_CREATOR | Creator |
| EXTRACTOR_PRODUCER | Producer |
| EXTRACTOR_PAGE_COUNT | Page-Count |
| EXTRACTOR_PAGE_ORIENTATION | Page-Orientation |
| EXTRACTOR_PAPER_SIZE | Paper-Size |
| EXTRACTOR_USED_FONTS | Used-Fonts |
| EXTRACTOR_PAGE_ORDER | Page-Order |
| EXTRACTOR_CREATED_FOR | Created-For |
| EXTRACTOR_MAGNIFICATION | Magnification |
| EXTRACTOR_RELEASE | Release |
| EXTRACTOR_GROUP | Group |
| EXTRACTOR_SIZE | Size |
| EXTRACTOR_SUMMARY | Summary |
| EXTRACTOR_PACKAGER | Packager |
| EXTRACTOR_VENDOR | Vendor |
| EXTRACTOR_LICENSE | License |
| EXTRACTOR_DISTRIBUTION | Distribution |
| EXTRACTOR_BUILDHOST | BuildHost |
| EXTRACTOR_OS | OS |
| EXTRACTOR_DEPENDENCY | Dependency |
| EXTRACTOR_HASH_MD4 | Hash-MD4 |
| EXTRACTOR_HASH_MD5 | Hash-MD5 |
| EXTRACTOR_HASH_SHA0 | Hash-SHA0 |
| EXTRACTOR_HASH_SHA1 | Hash-SHA1 |
| EXTRACTOR_HASH_RMD160 | Hash-RMD160 |
| EXTRACTOR_RESOLUTION | Resolution |
| EXTRACTOR_CATEGORY | Ext.Category |
| EXTRACTOR_BOOKTITLE | BookTitle |
| EXTRACTOR_PRIORITY | Priority |
| EXTRACTOR_CONFLICTS | Conflicts |
| EXTRACTOR_REPLACES | Replaces |
| EXTRACTOR_PROVIDES | Provides |
| EXTRACTOR_CONDUCTOR | Conductor |
| EXTRACTOR_INTERPRET | Interpret |
| EXTRACTOR_OWNER | Owner |
| EXTRACTOR_LYRICS | Lyrics |
| EXTRACTOR_MEDIA_TYPE | Media-Type |
| EXTRACTOR_CONTACT | Contact |
| EXTRACTOR_THUMBNAIL_DATA | Thumbnail-Data |
| EXTRACTOR_PUBLICATION_DATE | Publication-Date |
| EXTRACTOR_CAMERA_MAKE | Camera-Make |
| EXTRACTOR_CAMERA_MODEL | Camera-Model |
| EXTRACTOR_EXPOSURE | Exposure |
| EXTRACTOR_APERTURE | Aperture |
| EXTRACTOR_EXPOSURE_BIAS | Exposure-Bias |
| EXTRACTOR_FLASH | Flash |
| EXTRACTOR_FLASH_BIAS | Flash-Bias |
| EXTRACTOR_FOCAL_LENGTH | Focal-Length |
| EXTRACTOR_FOCAL_LENGTH_35MM | Focal-Length-35MM |
| EXTRACTOR_ISO_SPEED | ISO-Speed |
| EXTRACTOR_EXPOSURE_MODE | Exposure-Mode |
| EXTRACTOR_METERING_MODE | Metering-Mode |
| EXTRACTOR_MACRO_MODE | Macro-Mode |
| EXTRACTOR_IMAGE_QUALITY | Image-Quality |
| EXTRACTOR_WHITE_BALANCE | White-Balance |
| EXTRACTOR_ORIENTATION | Orientation |
| EXTRACTOR_TEMPLATE | Template |
| EXTRACTOR_SPLIT | Split |
| EXTRACTOR_PRODUCTVERSION | ProductVersion |
| EXTRACTOR_LAST_SAVED_BY | Last-Saved-By |
| EXTRACTOR_LAST_PRINTED | Last-Printed |
| EXTRACTOR_WORD_COUNT | Word-Count |
| EXTRACTOR_CHARACTER_COUNT | Character-Count |
| EXTRACTOR_TOTAL_EDITING_TIME | Total-Editing-Time |
| EXTRACTOR_THUMBNAILS | Thumbnails |
| EXTRACTOR_SECURITY | Security |
| EXTRACTOR_CREATED_BY_SOFTWARE | Created-By-Software |
| EXTRACTOR_MODIFIED_BY_SOFTWARE | Modified-By-Software |
| EXTRACTOR_REVISION_HISTORY | Revision-History |
| EXTRACTOR_LOWERCASE | Lowercase |
| EXTRACTOR_COMPANY | Company |
| EXTRACTOR_GENERATOR | Generator |
| EXTRACTOR_CHARACTER_SET | Meta-Charset |
| EXTRACTOR_LINE_COUNT | Line-Count |
| EXTRACTOR_PARAGRAPH_COUNT | Paragraph-Count |
| EXTRACTOR_EDITING_CYCLES | Editing-Cycles |
| EXTRACTOR_SCALE | Scale |
| EXTRACTOR_MANAGER | Manager |
| EXTRACTOR_MOVIE_DIRECTOR | Movie-Director |
| EXTRACTOR_DURATION | Duration |
| EXTRACTOR_INFORMATION | Information |
| EXTRACTOR_FULL_NAME | Full-Name |
| EXTRACTOR_CHAPTER | Chapter |
| EXTRACTOR_YEAR | Year |
| EXTRACTOR_LINK | Link |
| EXTRACTOR_MUSIC_CD_IDENTIFIER | Music-CD-Identifier |
| EXTRACTOR_PLAY_COUNTER | Play-Counter |
| EXTRACTOR_POPULARITY_METER | Popularity-Meter |
| EXTRACTOR_CONTENT_TYPE | Ext.Content-Type |
| EXTRACTOR_ENCODED_BY | Encoded-By |
| EXTRACTOR_TIME | Time |
| EXTRACTOR_MUSICIAN_CREDITS_LIST | Musician-Credits-List |
| EXTRACTOR_MOOD | Mood |
| EXTRACTOR_FORMAT_VERSION | Format-Version |
| EXTRACTOR_TELEVISION_SYSTEM | Television-System |
| EXTRACTOR_SONG_COUNT | Song-Count |
| EXTRACTOR_STARTING_SONG | Strting-Song |
| EXTRACTOR_HARDWARE_DEPENDENCY | Hardware-Dependency |
| EXTRACTOR_RIPPER | Ripper |
| EXTRACTOR_FILE_SIZE | File-Size |
| EXTRACTOR_TRACK_NUMBER | Track-Number |
| EXTRACTOR_ISRC | ISRC |
| EXTRACTOR_DISC_NUMBER | Disc-Number |
If a section name from the list above is not specified in sections.conf, the value of the corresponding keyword is written to the "body" section. Keywords of unknown type are written to the "body" section as well.
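
The fallback rule can be illustrated with a minimal Python sketch. This is not DataparkSearch code: the function `resolve_section`, the dictionary `KEYWORD_TO_SECTION` (a small subset of the table above) and the set of configured section names are hypothetical names introduced for illustration only. The sketch also assumes that section names are compared exactly as written, which may differ from how the indexer actually matches them.

```python
# Hypothetical subset of the keyword-type table above (illustration only).
KEYWORD_TO_SECTION = {
    "EXTRACTOR_TITLE": "Title",
    "EXTRACTOR_AUTHOR": "Author",
    "EXTRACTOR_LANGUAGE": "Content-Language",
    "EXTRACTOR_KEYWORDS": "Meta.Keywords",
}

FALLBACK_SECTION = "body"


def resolve_section(keyword_type, configured_sections):
    """Return the section a keyword value would be written to.

    keyword_type        -- libextractor keyword type name, e.g. "EXTRACTOR_TITLE"
    configured_sections -- section names assumed to be declared in sections.conf
    """
    section = KEYWORD_TO_SECTION.get(keyword_type)
    if section is None:
        # Keyword of unknown type: goes to "body".
        return FALLBACK_SECTION
    if section not in configured_sections:
        # Known type, but its section is not declared: goes to "body".
        # Assumption: exact, case-sensitive comparison of names.
        return FALLBACK_SECTION
    return section


# Example: only "Title" and "body" are assumed to be configured.
configured = {"Title", "body"}
for kw in ("EXTRACTOR_TITLE", "EXTRACTOR_AUTHOR", "EXTRACTOR_SOMETHING_NEW"):
    print(kw, "->", resolve_section(kw, configured))
```

With this assumed configuration, only the title keyword keeps its own section; the author keyword and the unknown keyword type both fall back to "body".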