Support for the libextractor library has been added in the latest snapshot of DataparkSearch Engine.
Using this library, DataparkSearch can now index keywords from files of the following formats: PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF.
The table below shows how libextractor keyword types map to DataparkSearch section names:
KeywordType | Section name |
---|---|
EXTRACTOR_FILENAME | Filename |
EXTRACTOR_MIMETYPE | Mimetype |
EXTRACTOR_TITLE | Title |
EXTRACTOR_AUTHOR | Author |
EXTRACTOR_ARTIST | Artist |
EXTRACTOR_DESCRIPTION | Description |
EXTRACTOR_COMMENT | Comment |
EXTRACTOR_DATE | Date |
EXTRACTOR_PUBLISHER | Publisher |
EXTRACTOR_LANGUAGE | Content-Language |
EXTRACTOR_ALBUM | Album |
EXTRACTOR_GENRE | Genre |
EXTRACTOR_LOCATION | Location |
EXTRACTOR_VERSIONNUMBER | VersionNumber |
EXTRACTOR_ORGANIZATION | Organization |
EXTRACTOR_COPYRIGHT | Copyright |
EXTRACTOR_SUBJECT | Subject |
EXTRACTOR_KEYWORDS | Meta.Keywords |
EXTRACTOR_CONTRIBUTOR | Contributor |
EXTRACTOR_RESOURCE_TYPE | Resource-Type |
EXTRACTOR_FORMAT | Format |
EXTRACTOR_RESOURCE_IDENTIFIER | Resource-Idendifier |
EXTRACTOR_SOURCE | Source |
EXTRACTOR_RELATION | Relation |
EXTRACTOR_COVERAGE | Coverage |
EXTRACTOR_SOFTWARE | Software |
EXTRACTOR_DISCLAIMER | Disclaimer |
EXTRACTOR_WARNING | Warning |
EXTRACTOR_TRANSLATED | Translated |
EXTRACTOR_CREATION_DATE | Creation-Date |
EXTRACTOR_MODIFICATION_DATE | Modification-Date |
EXTRACTOR_CREATOR | Creator |
EXTRACTOR_PRODUCER | Producer |
EXTRACTOR_PAGE_COUNT | Page-Count |
EXTRACTOR_PAGE_ORIENTATION | Page-Orientation |
EXTRACTOR_PAPER_SIZE | Paper-Size |
EXTRACTOR_USED_FONTS | Used-Fonts |
EXTRACTOR_PAGE_ORDER | Page-Order |
EXTRACTOR_CREATED_FOR | Created-For |
EXTRACTOR_MAGNIFICATION | Magnification |
EXTRACTOR_RELEASE | Release |
EXTRACTOR_GROUP | Group |
EXTRACTOR_SIZE | Size |
EXTRACTOR_SUMMARY | Summary |
EXTRACTOR_PACKAGER | Packager |
EXTRACTOR_VENDOR | Vendor |
EXTRACTOR_LICENSE | License |
EXTRACTOR_DISTRIBUTION | Distribution |
EXTRACTOR_BUILDHOST | BuildHost |
EXTRACTOR_OS | OS |
EXTRACTOR_DEPENDENCY | Dependency |
EXTRACTOR_HASH_MD4 | Hash-MD4 |
EXTRACTOR_HASH_MD5 | Hash-MD5 |
EXTRACTOR_HASH_SHA0 | Hash-SHA0 |
EXTRACTOR_HASH_SHA1 | Hash-SHA1 |
EXTRACTOR_HASH_RMD160 | Hash-RMD160 |
EXTRACTOR_RESOLUTION | Resolution |
EXTRACTOR_CATEGORY | Ext.Category |
EXTRACTOR_BOOKTITLE | BookTitle |
EXTRACTOR_PRIORITY | Priority |
EXTRACTOR_CONFLICTS | Conflicts |
EXTRACTOR_REPLACES | Replaces |
EXTRACTOR_PROVIDES | Provides |
EXTRACTOR_CONDUCTOR | Conductor |
EXTRACTOR_INTERPRET | Interpret |
EXTRACTOR_OWNER | Owner |
EXTRACTOR_LYRICS | Lyrics |
EXTRACTOR_MEDIA_TYPE | Media-Type |
EXTRACTOR_CONTACT | Contact |
EXTRACTOR_THUMBNAIL_DATA | Thumbnail-Data |
EXTRACTOR_PUBLICATION_DATE | Publication-Date |
EXTRACTOR_CAMERA_MAKE | Camera-Make |
EXTRACTOR_CAMERA_MODEL | Camera-Model |
EXTRACTOR_EXPOSURE | Exposure |
EXTRACTOR_APERTURE | Aperture |
EXTRACTOR_EXPOSURE_BIAS | Exposure-Bias |
EXTRACTOR_FLASH | Flash |
EXTRACTOR_FLASH_BIAS | Flash-Bias |
EXTRACTOR_FOCAL_LENGTH | Focal-Length |
EXTRACTOR_FOCAL_LENGTH_35MM | Focal-Length-35MM |
EXTRACTOR_ISO_SPEED | ISO-Speed |
EXTRACTOR_EXPOSURE_MODE | Exposure-Mode |
EXTRACTOR_METERING_MODE | Metering-Mode |
EXTRACTOR_MACRO_MODE | Macro-Mode |
EXTRACTOR_IMAGE_QUALITY | Image-Quality |
EXTRACTOR_WHITE_BALANCE | White-Balance |
EXTRACTOR_ORIENTATION | Orientation |
EXTRACTOR_TEMPLATE | Template |
EXTRACTOR_SPLIT | Split |
EXTRACTOR_PRODUCTVERSION | ProductVersion |
EXTRACTOR_LAST_SAVED_BY | Last-Saved-By |
EXTRACTOR_LAST_PRINTED | Last-Printed |
EXTRACTOR_WORD_COUNT | Word-Count |
EXTRACTOR_CHARACTER_COUNT | Character-Count |
EXTRACTOR_TOTAL_EDITING_TIME | Total-Editing-Time |
EXTRACTOR_THUMBNAILS | Thumbnails |
EXTRACTOR_SECURITY | Security |
EXTRACTOR_CREATED_BY_SOFTWARE | Created-By-Software |
EXTRACTOR_MODIFIED_BY_SOFTWARE | Modified-By-Software |
EXTRACTOR_REVISION_HISTORY | Revision-History |
EXTRACTOR_LOWERCASE | Lowercase |
EXTRACTOR_COMPANY | Company |
EXTRACTOR_GENERATOR | Generator |
EXTRACTOR_CHARACTER_SET | Meta-Charset |
EXTRACTOR_LINE_COUNT | Line-Count |
EXTRACTOR_PARAGRAPH_COUNT | Paragraph-Count |
EXTRACTOR_EDITING_CYCLES | Editing-Cycles |
EXTRACTOR_SCALE | Scale |
EXTRACTOR_MANAGER | Manager |
EXTRACTOR_MOVIE_DIRECTOR | Movie-Director |
EXTRACTOR_DURATION | Duration |
EXTRACTOR_INFORMATION | Information |
EXTRACTOR_FULL_NAME | Full-Name |
EXTRACTOR_CHAPTER | Chapter |
EXTRACTOR_YEAR | Year |
EXTRACTOR_LINK | Link |
EXTRACTOR_MUSIC_CD_IDENTIFIER | Music-CD-Identifier |
EXTRACTOR_PLAY_COUNTER | Play-Counter |
EXTRACTOR_POPULARITY_METER | Popularity-Meter |
EXTRACTOR_CONTENT_TYPE | Ext.Content-Type |
EXTRACTOR_ENCODED_BY | Encoded-By |
EXTRACTOR_TIME | Time |
EXTRACTOR_MUSICIAN_CREDITS_LIST | Musician-Credits-List |
EXTRACTOR_MOOD | Mood |
EXTRACTOR_FORMAT_VERSION | Format-Version |
EXTRACTOR_TELEVISION_SYSTEM | Television-System |
EXTRACTOR_SONG_COUNT | Song-Count |
EXTRACTOR_STARTING_SONG | Strting-Song |
EXTRACTOR_HARDWARE_DEPENDENCY | Hardware-Dependency |
EXTRACTOR_RIPPER | Ripper |
EXTRACTOR_FILE_SIZE | File-Size |
EXTRACTOR_TRACK_NUMBER | Track-Number |
EXTRACTOR_ISRC | ISRC |
EXTRACTOR_DISC_NUMBER | Disc-Number |
If a section name from the table above is not specified in sections.conf, the value of the corresponding keyword is written to the "body" section. Keywords of unknown type are written to the "body" section as well.
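
The fallback rule can be illustrated with a minimal Python sketch. It is not DataparkSearch's actual code: the function name `target_section`, the dictionary `KEYWORD_SECTIONS` (a small excerpt of the table) and the list of configured sections are hypothetical, and section names are compared exactly, so case handling is not modeled.

```python
# Illustrative excerpt of the table above: libextractor keyword type -> section name.
KEYWORD_SECTIONS = {
    "EXTRACTOR_TITLE": "Title",
    "EXTRACTOR_AUTHOR": "Author",
    "EXTRACTOR_LANGUAGE": "Content-Language",
    "EXTRACTOR_KEYWORDS": "Meta.Keywords",
}


def target_section(keyword_type, configured_sections):
    """Return the name of the section a keyword value would be written to.

    keyword_type        -- keyword type name as listed in the table
    configured_sections -- section names assumed to be declared in sections.conf
    """
    section = KEYWORD_SECTIONS.get(keyword_type)
    if section is None:
        # Keyword of unknown type: falls back to "body".
        return "body"
    if section not in configured_sections:
        # Known type, but its section is not declared: falls back to "body".
        return "body"
    return section


if __name__ == "__main__":
    # Hypothetical set of sections declared in sections.conf.
    configured = ["body", "Title", "Author"]
    for kw in ("EXTRACTOR_TITLE", "EXTRACTOR_LANGUAGE", "EXTRACTOR_SOMETHING_NEW"):
        print(kw, "->", target_section(kw, configured))
```

With this sample configuration only the title keeps its own section. The language value and the keyword of unknown type both end up in "body".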