ProFusion: A Regularization-Free AI Framework for Text-to-Image Synthesis

Summary: ProFusion is an AI framework that preserves fine-grained details in text-to-image synthesis without requiring regularization during training. The framework uses a pre-trained encoder called PromptNet that infers the conditioning word embedding from an input image and random noise. It also includes a novel sampling method called Fusion Sampling that encodes information…

Microsoft Introduces KOSMOS-2: A Multimodal Language Model with Visual Grounding

Summary: Microsoft researchers have introduced KOSMOS-2, a multimodal large language model with grounding capabilities that allow it to link text to the visual world. KOSMOS-2 can produce answers in free-form text and perceive generic modalities such as text, pictures, and audio under zero-shot and few-shot conditions. The model’s grounding feature provides visual responses, including bounding…
