Latent diffusion models (LDMs), a subclass of denoising diffusion models, have recently gained prominence because they can generate images with high fidelity, diversity, and resolution. When combined with a conditioning mechanism, these models allow fine-grained control of the image generation process at inference time (e.g., via text prompts). Large multimodal datasets like LAION-5B, which contain billions of real image-text pairs, are frequently used to train such models. With appropriate pre-training, LDMs can be used for many downstream tasks and are sometimes referred to as foundation models (FMs).
LDMs can be more easily deployed to end users because their denoising process operates in a relatively low-dimensional latent space and requires only modest hardware resources. Thanks to the strong generative capabilities of these models, high-fidelity synthetic datasets can be produced and added to conventional supervised machine learning pipelines in settings where training data is scarce. This offers a potential solution to the shortage of carefully curated, richly annotated medical imaging datasets, which require disciplined preparation and extensive work by skilled medical experts who can interpret subtle but semantically significant visual features.
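As a minimal sketch of this idea, the snippet below uses the Hugging Face diffusers library to sample synthetic images from a pretrained Stable Diffusion checkpoint given a text prompt; the model ID, prompt, and sampling settings are illustrative placeholders rather than the exact configuration used in the paper.

```python
# Illustrative sketch: generating synthetic images from text prompts with a
# pretrained latent diffusion model (Stable Diffusion) via the `diffusers` API.
# Assumes a CUDA GPU; model ID and prompt are examples, not the paper's setup.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Denoising happens in a low-dimensional latent space (e.g., 64x64x4 latents
# for a 512x512 image), which keeps memory and compute requirements modest.
prompt = "chest X-ray showing a small right-sided pleural effusion"
images = pipe(prompt, num_images_per_prompt=4, guidance_scale=4.0).images

# The generated images could then be added to a scarce training set.
for i, img in enumerate(images):
    img.save(f"synthetic_cxr_{i}.png")
```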
Despite the scarcity of large, carefully maintained, and publicly available medical imaging datasets, a textual radiology report often describes in detail the relevant medical findings contained in the imaging study. This "by-product" of medical decision-making can be used to automatically extract labels for downstream tasks. However, this still imposes a more constrained problem formulation than what can be expressed in natural language. Pre-trained text-conditional LDMs could instead be used to intuitively synthesize medical imaging data by prompting with relevant medical terms or concepts of interest.
This study examines how to adapt a large vision-language LDM (Stable Diffusion, SD) to medical imaging concepts without training it on these concepts from scratch. The authors study its application to generating chest X-rays (CXRs) conditioned on short in-domain text prompts, taking advantage of the extensive image-text pre-training underlying the components of the SD pipeline. CXRs are one of the most frequently used imaging modalities in the world because they are simple to obtain, affordable, and able to provide information on a wide range of important medical conditions. To the authors' knowledge, this study is the first to systematically explore the domain adaptation of a pretrained out-of-domain LDM for language-conditioned medical image generation beyond the few- or zero-shot setting.
To do this, the representational capacity of the SD pipeline was assessed, quantified, and then augmented while investigating various methods of adapting this general-domain pretrained foundation model to represent CXR-specific medical concepts. The result is RoentGen, a generative model for high-fidelity CXR synthesis that can insert, combine, and modify the imaging appearance of different CXR findings using free-form medical-language text prompts, rendering the image correlates of relevant medical concepts with high accuracy.
The paper also highlights the following contributions:
1. The authors present a comprehensive framework to assess the factual correctness of text-to-image models in the medical domain, using the domain-specific tasks of i) classification with a pretrained classifier, ii) radiology report generation, and iii) image-to-image and text-to-image retrieval.
2. The highest image fidelity and conceptual correctness is achieved by fine-tuning both the U-Net and the CLIP (Contrastive Language-Image Pre-training) text encoder, which the authors compare and contrast with other methods of adapting SD to a new CXR data distribution (see the sketch after this list).
3. When the text encoder is kept frozen and only the U-Net is trained, the original CLIP text encoder can be replaced with a domain-specific text encoder, resulting in improved performance of the resulting Stable Diffusion model after fine-tuning.
4. The text encoder's ability to represent medical concepts such as rare abnormalities is enhanced when the SD fine-tuning task is used to distill domain knowledge and the text encoder is trained jointly with the U-Net.
5. RoentGen can be fine-tuned on a small subset of images (1.1-5.5k) and can be used to augment data for downstream image classification tasks. In the authors' setup, training on real and synthetic data together improved classification performance by 5%, while training on synthetic data alone performed comparably to training on real data.
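The sketch below illustrates, at a high level, what jointly fine-tuning the Stable Diffusion U-Net and CLIP text encoder on image/report pairs could look like with the diffusers and transformers libraries (points 2-4 above). It is not the authors' code: the model ID, data handling, and hyperparameters are placeholders, and the VAE is kept frozen as in standard LDM fine-tuning.

```python
# Simplified, hypothetical fine-tuning step for Stable Diffusion on paired
# chest X-ray images and report text. Both the U-Net and the CLIP text encoder
# receive gradients; the VAE stays frozen. Not the authors' exact recipe.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "stabilityai/stable-diffusion-2-1"  # placeholder base checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").eval()
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

optimizer = torch.optim.AdamW(
    list(unet.parameters()) + list(text_encoder.parameters()), lr=1e-5
)

def training_step(pixel_values, report_texts):
    # Encode images into the latent space and add noise at a random timestep.
    latents = vae.encode(pixel_values).latent_dist.sample()
    latents = latents * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # Condition the denoiser on the tokenized radiology text.
    tokens = tokenizer(
        report_texts, padding="max_length", truncation=True, return_tensors="pt"
    ).input_ids.to(latents.device)
    encoder_hidden_states = text_encoder(tokens)[0]

    # Standard denoising objective: predict the noise that was added.
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred, noise)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Freezing the text encoder and training only the U-Net (point 3) would amount to leaving `text_encoder.parameters()` out of the optimizer, optionally after swapping in a domain-specific text encoder.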
Check out the Paper and Project page. All credit for this research goes to the researchers of this project. Also, don't forget to join our Reddit page and Discord channel, where we share the latest AI research news, cool AI projects, and more.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He enjoys connecting with people and collaborating on interesting projects.