
A.I. Is Powering Google’s Multimodal Search Experience

The Five Big Key Takeaways in This Article

  1. Google's advances in A.I. are allowing it to upgrade the Google search experience. 
  2. One of the big projects is to create a “multimodal” search experience that goes beyond using text and/or speech in search.
  3. Being able to search with images, thanks to advancements in computer vision and generative A.I., is one of Google’s latest innovations in this area. 
  4. This is a preview of the wider and upcoming implementation of Google’s Search Generative Experience (SGE), purported to be the future of search. 
  5. These latest innovations in A.I. multimodality show that A.I. is quickly coming closer to resembling human intelligence in certain respects. 

The Multimodal Experience

For most searchers, a search is done in one of two ways. 

You can type a text prompt, oftentimes a sentence fragment or quasi-literate query, into the Google search bar. 

Other times, you may use your voice in a one-on-one convo with Alexa, Siri, Cortana, Google voice search, and the like. These virtual servants will fetch the search results for you, displaying your choices. 

But people who are hip to Google’s Search Generative Experience (SGE) know that these traditional modes of Internet searches are soon to be but two among multiple modes for searches. 

The key term to learn here is multimodal, which in the fields of A.I. and machine learning refers to systems that can handle multiple kinds of inputs and outputs. 

A search engine that can take text inputs and return text outputs, and likewise accept voice inputs and return voice outputs, is by definition multimodal. 

So since Google Search already has text and voice capabilities, it is really just expanding this already multimodal A.I. system to become more…well, multimodal. 
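The idea of a multimodal system can be sketched in a few lines of code. The sketch below is purely illustrative, with hypothetical class and field names that do not correspond to any real Google API: a single query object that can carry text, voice audio, and image inputs at once.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: a multimodal query bundles several kinds of input
# (typed text, spoken audio, a photo) into one request. The names here
# are hypothetical, not any real search API.
@dataclass
class MultimodalQuery:
    text: Optional[str] = None           # typed prompt
    voice_audio: Optional[bytes] = None  # spoken query, e.g. WAV bytes
    image: Optional[bytes] = None        # photo, e.g. JPEG bytes

    def modes(self) -> list[str]:
        """Return which input modes this query actually uses."""
        used = []
        if self.text is not None:
            used.append("text")
        if self.voice_audio is not None:
            used.append("voice")
        if self.image is not None:
            used.append("image")
        return used

# A photo paired with a typed question uses two modes at once:
q = MultimodalQuery(text="What animal is this?", image=b"<jpeg bytes>")
print(q.modes())  # ['text', 'image']
```

The point of the structure is that any combination of modes is a valid query; a system that accepts only one field at a time would be unimodal in each interaction.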

More Ways to Search

The most significant example of the newest multimodal search experience is being able to submit a photo to Google’s search engine and get information about the image’s content. 

Let us suppose that you are spending all of your vacation time this year on a trip to a region of the world that is so remote it does not even have a formal name, nor is it accounted for on any maps. 

While there, you see a very exotic creature that you are sure no one has ever seen before. Because of sophisticated satellite systems far above your head, you have cellular coverage even in this nameless spot of the world. So, you snap a picture and send it to Google search, asking “What animal is this?” 

As it turns out, it is just a regular old pangolin, of which Google gladly offers you oodles of reference photos. You also learn that the pangolin is not only critically endangered, but critically dangerous as well, for it could do bone-breaking harm to you were you to encroach upon its territory.

That, in essence, demonstrates how computer-vision A.I., which allows A.I. systems to “see” and interpret images and act accordingly, has changed the search experience for Google. 

Though still images are largely what Google is advertising here, you can expect that even videos and audio beyond voice search will soon become acceptable fodder that you can feed to the search engine. 

Multimodal Search’s Impact on Business Owners

This technology will likely be quite helpful to business owners in the long run, as we ought to expect from Google. Since advertising accounts for so much of Google’s revenue, you can expect that many of the search results returned for relevant input photos will be consumer products. 

So, if someone takes a picture of a boot, the links offered will likely be to pages where the user can purchase that particular boot, or a similar one, rather than some purely information-laden page for the boot. 

For this reason, the multimodal search experience ought to be of high interest to business owners looking to retain or expand their online reach. The way to do so will be to factor in how image search changes how the search engine operates. 

Much of this will translate to SEO. Instead of mere keywords, business owners and their marketing partners will now need to be cognizant of key images, key sounds, key frames, and the like. Including visual content on a website will be “key” to succeeding online. 
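One concrete, long-established piece of that puzzle is descriptive image metadata. The sketch below, using only Python’s standard library, flags `<img>` tags that lack alt text, one basic signal search engines use to understand a page’s images. The HTML snippet is a made-up example, not taken from any real site.

```python
from html.parser import HTMLParser

# Minimal sketch: collect the src of every <img> tag that has no
# descriptive alt text, a basic image-SEO check.
class AltTextChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = []  # src values of images without useful alt text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        if not (attr_map.get("alt") or "").strip():
            self.missing.append(attr_map.get("src", "<no src>"))

html = """
<img src="boot.jpg" alt="brown leather hiking boot">
<img src="banner.png" alt="">
<img src="logo.svg">
"""
checker = AltTextChecker()
checker.feed(html)
print(checker.missing)  # ['banner.png', 'logo.svg']
```

A check like this is only a starting point; descriptive filenames, captions, and surrounding text all contribute to how well visual content can be matched to image-based queries.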

How Multimodal Search Is Broadening Artificial Intelligence Systems

In an interview with tech outlet Wired, Google CEO Sundar Pichai raises the point that this sort of A.I. simply gets closer to how the mind actually works. 

Think of it: our minds can understand more than just text or speech. They can deal with images, and even deeply abstract forms of communication and expression. 

What this whole project of Google’s is meant to accomplish, then, is to make artificial intelligence seem more humanly intelligent by widening its capabilities. 

We can only wonder, then, just what other realms of human intelligence the artificial sort can mimic. 
