Assessing multimodal AI applications for industries

Multimodal AI is a relatively new development that combines AI techniques such as natural language processing, computer vision and machine learning to build a richer understanding of a problem. It does this by simultaneously analyzing different types of data to make predictions, take action or interact more appropriately in context.

More fundamentally, humans want AI to behave in a human-like way, as this would simplify communication and enable better mutual understanding. To do this, the AI must use multiple modalities (e.g., video, text, audio or images), much as humans use multiple senses.

“What happens with multimodal AI is that different types of data are mixed into the inputs of multimodal AI models to generate more nuance and the ability to answer complicated questions with AI,” said Bob Rogers, CEO of Oii, a data science company specializing in supply chain modeling.

Automakers and autonomous vehicles use multimodal AI

Multimodal AI applications already have practical uses in various industries. In the automotive industry, multimodal AI is used in three main ways: internal operations, customer-facing use cases and manufacturing.

For example, car manufacturers are automating supply chain operations, such as sending replacement car parts directly from suppliers to consumers without human intervention. Multimodal AI is also used to automate various tasks, such as the following:

  • process customer requests and respond by SMS or voice;
  • collect and verify customer identifiers;
  • automate a callback process; and
  • collect text and fill out forms that customers can sign remotely.

Multimodal AI also helps shorten production cycles by automating traditionally manual tasks. Finally, automakers use it to make cars safer, such as in driver assistance systems that detect drowsiness, fatigue, distraction or loss of attention.

“The main benefit of multimodal AI is that it enables organizations to become self-sustaining enterprises capable of automating much of the work process and communications while keeping humans in the loop,” said Yaniv Hakim, founder and CEO of CommBox, an AI-powered omnichannel communications platform.

Healthcare is becoming more personalized

Stanford University and UST, a global digital transformation solutions provider, have partnered on multimodal AI to understand how people react when they are subjected to trauma or have suffered an adverse health event, such as a heart attack, using a combination of IoT, audio, image and video sensors.


“It’s called ‘a weighted combination of networks,’” said Adnan Masood, chief AI and machine learning architect at UST. “It helps us do correlation analysis, called ‘collusion analysis,’ which is a very important thing in multimodal AI where you take these combined weighted networks. A neural network understands what’s most important in different terms and then co-learns based on that information.”
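
Masood did not describe UST's architecture in detail, but a weighted combination of networks is commonly implemented as a form of late fusion: each modality gets its own encoder, and learned weights decide how much each modality contributes to the joint prediction. The sketch below is a minimal, hypothetical illustration in PyTorch; the modality names, feature dimensions and softmax weighting scheme are assumptions for illustration, not UST's actual system.

```python
import torch
import torch.nn as nn

class WeightedFusionModel(nn.Module):
    """Hypothetical 'weighted combination of networks' over three modalities.

    Each modality (audio, image, sensor) has its own small encoder. A learnable
    weight per modality, normalized with softmax, controls how much each one
    contributes to the fused representation used for the final prediction.
    """
    def __init__(self, audio_dim=40, image_dim=512, sensor_dim=16, hidden=64, n_classes=2):
        super().__init__()
        self.audio_net = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.image_net = nn.Sequential(nn.Linear(image_dim, hidden), nn.ReLU())
        self.sensor_net = nn.Sequential(nn.Linear(sensor_dim, hidden), nn.ReLU())
        self.modality_weights = nn.Parameter(torch.zeros(3))  # one weight per modality
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, audio, image, sensor):
        feats = torch.stack([
            self.audio_net(audio),
            self.image_net(image),
            self.sensor_net(sensor),
        ])                                                # (3, batch, hidden)
        w = torch.softmax(self.modality_weights, dim=0)   # normalized modality weights
        fused = (w.view(3, 1, 1) * feats).sum(dim=0)      # weighted combination
        return self.classifier(fused)

# Toy usage: a batch of 4 examples with random features standing in for real encodings.
model = WeightedFusionModel()
logits = model(torch.randn(4, 40), torch.randn(4, 512), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```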

If a person experiences an adverse health event, emergency room staff can determine whether the patient needs immediate care or whether the patient's behavior is atypical for, say, a COVID-19 patient. Oii's Rogers said multimodal AI is routinely used in patient diagnosis, especially patient imaging.

“You can do an ultrasound to figure out if there’s internal bleeding, but that’s very noisy information,” Rogers said. “[Multimodal] AI reads the imagery, but it also pulls in the patient history via text and maybe even details about the type of impact the patient suffered to interpret the ultrasound. The AI combines this knowledge to build a decision path on how to treat this patient.”

Multimodal AI in media and telecoms

UST worked with a large telecommunications company to implement multimodal AI to determine the next best action, such as automatically notifying customers of a service outage.

Telecom companies are also using multimodal AI for fraud detection. In this case, the AI identifies the people using the most bandwidth through multimodal sensors in cell towers, customer behavior on the internet, and data usage patterns. From there, it identifies new users who are likely to exhibit the same type of behavior. Then, based on all of this, the AI applies predetermined targeting and thresholds.
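
The article does not spell out how such a pipeline is built, but the pattern it describes (merge signals about each user from several sources, score the user, then apply a predetermined threshold) can be sketched in a few lines. The following Python example is a simplified, hypothetical illustration; the feature names, synthetic data, logistic-regression scorer and 0.8 threshold are all assumptions, not the telecom's actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_users = 1000

# Hypothetical per-user features merged from separate sources:
# cell-tower readings, web behavior and data-usage patterns.
X = np.column_stack([
    rng.exponential(scale=2.0, size=n_users),      # avg bandwidth (GB/day) from tower sensors
    rng.poisson(lam=30, size=n_users),             # distinct sites visited per day
    rng.normal(loc=1.0, scale=0.3, size=n_users),  # ratio of night-time to daytime usage
])
y = (X[:, 0] > 5).astype(int)  # stand-in labels: users previously flagged for heavy usage

# Score every user and apply a predetermined threshold to pick candidates for review.
model = LogisticRegression(max_iter=1000).fit(X, y)
scores = model.predict_proba(X)[:, 1]
THRESHOLD = 0.8  # illustrative cut-off
flagged = np.flatnonzero(scores >= THRESHOLD)
print(f"{len(flagged)} of {n_users} users flagged for review")
```

In practice the inputs would be learned representations from each modality rather than hand-built tabular features, but the final score-and-threshold step mirrors the targeting-and-threshold idea the article describes.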

“There are millions of users all over the country, so applying this model [across] a big [number] of users was quite a daunting challenge and we were able to do that using multimodal AI,” Masood said.

Media and entertainment companies analyze different media streams using multimodal AI. The models used learn from various data sets and attempt to understand what an image or set of images contains.

“[Multimodal AI] is heavily used to combine different media streams and perform analysis on them,” Masood said. “So if you see a visual image, you can ask whether what’s happening in a clip is appropriate for a certain audience, whether that clip has a certain kind of image that’s not appropriate for a certain audience, or whether the sequence contains a certain celebrity.”

Common challenges with multimodal AI applications

For starters, processing power is an issue. Multimodal AI must process terabytes of real-time data from multiple systems and databases, which requires upgraded infrastructure and substantial compute. Another major challenge of multimodal AI is successfully transferring knowledge between modalities, also known as co-learning.

“Due to the wide variety of questions and the lack of high-quality data, some AI models might make educated guesses based on statistics, changing the end result,” said CommBox’s Hakim.

Since multimodal AI is relatively new, it is not yet fully understood, nor are the use cases and potential benefits. Data professionals are so used to working on models that focus on a single modality that they don’t understand the importance of performing multimodal causality and correlation analysis.

“We know an event happened but we don’t know why. If you work with multimodal datasets, causality and inference becomes much easier,” Masood said. “We create a temporal timeline of events reconstructed by multiple models – video, audio and sensors. A lot of algorithmic work needs to be done.”

Industries are optimistic about the future of their multimodal AI applications, given that those applications are already assisting their operations, and many have concluded that the long-term benefits outweigh the short-term challenges. AI enthusiasts will be watching this nascent branch of AI and the value it adds to industries.
