Automatic medical image classification for content-based image retrieval systems
In hospitals today, medical images are normally processed and stored digitally in Picture Archiving and Communication Systems (PACS), along with text descriptions that follow the Digital Imaging and Communications in Medicine (DICOM) standard. Additional information stored with the image can include the doctor's name, patient identification, etc. This information is used to retrieve medical images, but text queries frequently ask for information that is not part of these text descriptors or labels, which degrades the results of such queries. Low-level image features should help avoid this problem. Low-level features are those that are measurable and can be extracted automatically from an image, such as color, shape, and texture. This research project investigated a method to link low-level features that can be extracted automatically from an image to the high-level features represented in the textual Image Retrieval in Medical Applications (IRMA) code included in the test collection of images provided for this project. The second project goal was to use the semantic types included in the IRMA codes (e.g., plain radiography from the image modality facet, reproductive system from the biological system facet) to expand text queries so that a content-based image retrieval system can respond more effectively to specific queries. We used a machine learning approach to identify the link between low-level features and text descriptions and to automatically assign the semantic types from IRMA. We used a standard dataset of images released by the ImageCLEF2005 campaign to participating groups. We indexed the whole dataset of 9,000 images using the GNU Image Finding Tool (GIFT) and extracted image features using the same application. We used the image features, together with the manually assigned IRMA classification codes, to train a multi-class support vector machine (SVM Multi-class).
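As a rough illustration of the query-expansion idea, the sketch below splits an IRMA code into its four hyphen-separated facets (technical, directional, anatomical, biological) and appends any matching semantic types to a text query. The label table, its code prefixes, the sample code value, and all function names are assumptions made for this example; they are not taken from the project's implementation or from the official IRMA label tables.

```python
from typing import Dict, List, Tuple

# The four IRMA facets, in the order they appear in a code string.
FACETS = ("technical", "directional", "anatomical", "biological")

# Hypothetical label table: (facet, code prefix) -> semantic type.
# The prefixes here are placeholders, not verified IRMA assignments.
EXAMPLE_LABELS: Dict[Tuple[str, str], str] = {
    ("technical", "11"): "plain radiography",
    ("biological", "6"): "reproductive system",
}


def split_irma_code(code: str) -> Dict[str, str]:
    """Split a hyphen-separated IRMA code into its four facets."""
    parts = code.strip().split("-")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} facets, got {len(parts)}")
    return dict(zip(FACETS, parts))


def semantic_types(code: str,
                   labels: Dict[Tuple[str, str], str] = EXAMPLE_LABELS) -> List[str]:
    """Return the label of the longest matching prefix for each facet."""
    facets = split_irma_code(code)
    found = []
    for facet in FACETS:
        value = facets[facet]
        # Try the most specific (longest) prefix first.
        for length in range(len(value), 0, -1):
            label = labels.get((facet, value[:length]))
            if label is not None:
                found.append(label)
                break
    return found


def expand_query(query: str, code: str,
                 labels: Dict[Tuple[str, str], str] = EXAMPLE_LABELS) -> str:
    """Append semantic types that the query does not already mention."""
    extra = [t for t in semantic_types(code, labels)
             if t.lower() not in query.lower()]
    return " ".join([query] + extra)


# Made-up code value, used only to exercise the functions.
expanded = expand_query("pelvis image", "1121-120-700-600")
```

In a pipeline like the one described, the code passed to `expand_query` would presumably come from the classifier's prediction for a query image rather than from a manual annotation.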
Our results showed that some medical images are easily classified using low-level features. They also showed that classifier performance was affected by the uneven distribution of images across the classes of the ImageCLEF2005 campaign dataset. Where images were unique in any one of the four main facets of the IRMA code, the classifier identified them correctly.
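One way to make the effect of an uneven class distribution visible is to report recall per class instead of a single overall accuracy. The sketch below is a minimal illustration only; the function name and the toy labels are invented for the example and are not results from this project.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_recall(y_true: Sequence[Hashable],
                     y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of each true class that was predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy data: a large class and a small class (illustrative, not real results).
y_true = ["chest"] * 8 + ["hand"] * 2
y_pred = ["chest"] * 8 + ["chest", "hand"]

recall = per_class_recall(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With these toy labels the overall accuracy is high even though half of the small class is misclassified, which is the kind of gap a per-class report exposes.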