This algorithm can learn a language just by watching videos

The Massachusetts Institute of Technology (MIT) has introduced an innovative algorithm that can learn a language solely by watching videos. The model, called DenseAV, has the potential to decode interspecies communication.

Developed at the Massachusetts Institute of Technology (MIT), DenseAV learns to understand language by watching videos of people speaking. Its potential applications include multimedia search, language learning, and robotics.

Mark Hamilton, a PhD student in electrical engineering and computer science, leads this project alongside his colleagues at MIT’s Computer Science and Artificial Intelligence Laboratory. They aim to leverage machines to decode animal communication, starting with human language acquisition.

The inspiration for this algorithm came from the film "March of the Penguins." In one scene, a penguin falls and groans as it tries to get up. To Hamilton, the groan seemed to stand in for a word, which led to the idea that sound and video could be used together to teach an algorithm a language.

This idea led to the development of DenseAV, a model designed to learn language by predicting visual content from sound. For example, hearing the phrase “bake the cake at 350” would lead the model to expect visuals of a cake or an oven.

To match sound and visual content across millions of videos, DenseAV had to learn what people were talking about. After training DenseAV on this matching task, the research team examined which pixels the model focused on when it heard a given sound.

When the word “dog” was spoken, the algorithm focused on the dog in the video frames, indicating that it had learned what the word means. Similarly, when it heard a dog bark, it looked for dogs in the video.
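The matching task and the pixel inspection described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two-dimensional feature vectors are written by hand rather than learned, the function names `similarity_map` and `clip_score` are hypothetical, and the aggregation (best-matching pixel per audio frame, averaged over frames) is one plausible choice, not a confirmed description of DenseAV's internals.

```python
# Toy sketch of audio-visual matching with hand-made features.
# Assumption: axis 0 of each vector loosely means "dog", axis 1 "cake".
# None of these values or names come from the DenseAV code.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def similarity_map(audio_frame, pixel_features):
    """Similarity between one audio frame and every pixel feature."""
    return [dot(audio_frame, p) for p in pixel_features]

def clip_score(audio_frames, pixel_features):
    """Score an audio clip against a video frame.

    Takes the best-matching pixel for each audio frame, then averages
    over audio frames (an assumed aggregation for this sketch).
    """
    best = [max(similarity_map(a, pixel_features)) for a in audio_frames]
    return sum(best) / len(best)

# Two audio frames from a clip in which someone says "dog".
dog_audio = [[1.0, 0.0], [0.9, 0.1]]

# Three "pixels" per video frame; pixel 1 of dog_video shows the dog.
dog_video = [[0.0, 0.1], [1.0, 0.0], [0.1, 0.0]]
cake_video = [[0.0, 1.0], [0.1, 0.9], [0.0, 0.2]]

# Training would push matched pairs to score higher than mismatched ones.
matched = clip_score(dog_audio, dog_video)
mismatched = clip_score(dog_audio, cake_video)

# Inspecting the similarity map shows which pixel the sound "points at".
heat = similarity_map(dog_audio[0], dog_video)
focus_pixel = heat.index(max(heat))

print(matched > mismatched, focus_pixel)
```

In this toy setup the matched pair scores higher than the mismatched one, and the similarity map peaks on the pixel that shows the dog, which mirrors the kind of inspection the researchers describe.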

The team wondered whether DenseAV could distinguish between the word “dog” and the sound of a dog barking. By giving DenseAV a two-sided “brain,” they discovered that one side naturally focused on language, such as the word “dog,” while the other side focused on sounds, such as barking.
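A rough way to picture the two-sided design is to give the audio signal and each pixel two separate feature vectors, one per side, and to score each side independently. The sketch below assumes exactly that. The feature values are hand-set so that side 0 responds to speech and side 1 to other sounds, whereas in the real model this split is reported to emerge on its own during training. The names `head_scores` and `dominant_head` are hypothetical.

```python
# Toy sketch of a two-sided ("two-head") similarity.
# Assumption: head 0 ~ spoken language, head 1 ~ non-speech sound.
# The values are hand-set for illustration, not learned.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def head_scores(audio_heads, pixel_heads):
    """One similarity score per head for an audio frame and a pixel."""
    return [dot(a, p) for a, p in zip(audio_heads, pixel_heads)]

def dominant_head(audio_heads, pixel_heads):
    """Index of the head that contributes the largest similarity."""
    scores = head_scores(audio_heads, pixel_heads)
    return scores.index(max(scores))

# Per-head features for a pixel that lies on a dog.
dog_pixel = [[1.0, 0.0], [0.0, 1.0]]

# Per-head features for two different sounds.
spoken_dog = [[0.9, 0.0], [0.0, 0.1]]  # someone saying "dog"
bark = [[0.1, 0.0], [0.0, 0.8]]        # a dog barking

print(dominant_head(spoken_dog, dog_pixel), dominant_head(bark, dog_pixel))
```

Both sounds match the same pixel, but a different side carries most of the match in each case, which is how the word and the bark can be told apart.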

Because the team wanted to rediscover the meaning of language from scratch, they did not use pre-trained language models, which left them with the challenging task of learning language without any text input. The approach is inspired by how children learn language by observing and listening to their surroundings.

Sources:

https://www.techtimes.com/articles/305612/20240612/mit-unveils-new-algorithm-learns-language-watching-videos.htm
