Monday, July 27, 2009

In Search of Better 'Search'

The new breed of 'search' engines wants to organize data so that it is immediately computable and can be turned into visual representations. Here, we look at some of the latest additions to search technology.


If you had to find twenty facts about, say, Iceland, and were given an Internet connection, how would you go about it? Most probably, you would fire up a well-known search engine like Google, enter Iceland as the keyword and hit the Search button. This would bring up lots of websites related to Iceland. You would then visit some of those listed on the first results page to gather your facts.

Sure, this modern-day search technology works, but it has several limitations. One, the search results simply point you to websites where you 'might' find the information you're looking for. Two, you have to judge the accuracy of the facts yourself. Three, finding the right information is time consuming because you have to go through so many links. With the amount of information on the Internet growing by leaps and bounds, and other content like audio, video and images growing as well, this method of searching will soon lose its effectiveness. For instance, what if you have a photograph and want to find similar ones on the Internet? Or what if you want to download a particular song, but remember only the tune and not the lyrics? How do you find it? These are some of the problems that new Internet search technologies are trying to solve.

Wolfram|Alpha, the Computational engine
Computer scientist Stephen Wolfram is the inventor of Mathematica, a multi-faceted program created in 1988 to provide a uniform system for all forms of algorithmic computation. He has now come up with yet another approach, called a computational engine.

A computational engine gives you direct information in visual form, unlike regular search engines like Google, which simply return links to Web pages.

Instead of searching the Web and returning links, the computational engine called Wolfram|Alpha generates output by doing computations on its own internal knowledge base. It brings you systematic factual knowledge: things that are known and are in some way public. It deals only with facts, not opinions. Interestingly, the data in Wolfram|Alpha is derived by computation, often from multiple sources, and the engine deploys formulas and algorithms to compute answers for searchers. You can ask Wolfram|Alpha many things. For example, you can ask about the molecular weight of cholesterol, the location of a gene in the human genome, the number of people named John born in a particular year, the life expectancy of 50-year-olds in a country, the performance of Google stock, the height of Mt. Everest, etc.
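To give a feel for what 'computing an answer' from curated data means, here is a minimal Python sketch that works out the molecular weight of cholesterol from its formula, C27H46O. The small table of rounded atomic weights and the function name are made up for this illustration; this is not how Wolfram|Alpha's own Mathematica code is written.

```python
import re

# A tiny hand-curated "knowledge base": rounded standard atomic weights in g/mol.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C27H46O' (no brackets)."""
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total

cholesterol = molecular_weight("C27H46O")
print(f"Cholesterol: {cholesterol:.2f} g/mol")
```

The point is that the answer is not looked up on a web page: it is calculated on demand from stored facts, which is the idea behind a computational engine.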

Components of Alpha
The main technologies behind this engine include:

  • Data curation: Wolfram|Alpha uses public and licensed proprietary data sources, and the company uses automated processes and human choices to prepare the data.
  • Algorithms: Alpha must pick the right computational processes to present its results. Inside Wolfram Alpha are 5 million to 6 million lines of Mathematica code that implement all those methods and models.
  • Linguistic analysis to understand what a person typed.
  • Presentation: Inside Alpha, there are tens of thousands of possible graphs.

Wolfram|Alpha can work through complex math problems in algebra, matrices, calculus, trigonometry, etc.

Picture Based Search
So, you can't remember who that person was in your wedding album or at the conference. Not a problem: scan the photo, upload it to an image-based search site and ask it to find the person for you. This is what the next-generation search engines are working on. Today, a picture search works best for a celebrity like Katrina Kaif, but a site called Picollator.com is trying to find matches for any picture you upload. The site uses pattern recognition technologies to identify similar-looking images on the Internet. Because it has to run complex algorithms on so many images, it is very slow as of today. The search is also not very accurate, but it works. Google is doing something similar with the web-based version of Picasa, its image management portal. With this application, you can find similar faces across your complete photo library and tag them with a name. It cannot find every picture of a person, but it recognizes a face very well when the clothing and ambient conditions are similar.

But don't think that this kind of technology is useful only for leisure. It makes great business sense as well, which is why the makers of VizSeek (http://www.vizseek.com) came up with the idea of a search engine that can find any tool from just a photograph or a doodled sketch of it. The site was devised by engineers who knew how difficult it can be to remember the name of a tool or a part in the middle of repair work.

Search enhancements from Google
Google has rolled out a comparable service that makes it possible to search and compare public data in an interactive graph. Its other new search features include Google Squared, Google Options and a tool for Android.

Google Squared: This extracts information from the Web and displays it in a table. For instance, if you type “fantasy television shows,” it may return a table of shows with information like their release dates, directors, actors, etc. The extracted values are not always right, but users can click on individual entries to check the source and, if a value is incorrect, correct it through new searches. You can also save the customized table for future reference.
Google Squared is different from Wolfram|Alpha, which, rather than searching the Web for data, taps databases licensed by Wolfram Research. In Alpha, the emphasis is on computing and visualizing a range of data on subjects like astronomy, computer science and weather from its own sources. Google will be opening Squared up to users later this month on Google Labs.

Google Options: After doing a search, you will see a new link saying 'Show options.' For a search such as 'Switches', clicking on 'Show options' lets you choose what sort of results you want: 'videos,' 'forums,' 'reviews,' results sorted by time frame (past 24 hours, past week, past year), or the most recently created pages or images. This option is available now.

Google Options enables you to view your search results by type ('videos', 'forums', 'reviews') and by time frame, with the choices listed on the left side of the page.

Sound Based Search
This is something very interesting for people who like music. Remember how difficult and frustrating it is when you forget the name of a song you want to search for online? To top it all, sometimes you even forget the lyrics. All you remember is the tune. But being a diehard music fan, you just can't give up on hearing that song again.

That's the type of customer Musipedia.com is trying to win over. On this website, you can find a piece of music and purchase or download it. You can search by typing the name of the song, by playing the melody on a virtual keyboard, by whistling the melody into the computer's microphone, or even by tapping the rhythm on the keyboard. The website recognizes the timing and notes of the song and searches for the matching song accordingly. You can then either play or purchase the song. I am not sure what other uses such technology might have, but I whistled some five songs to it and it was able to find only two of them. So either my whistling is bad (which is quite possible) or this technology has a long way to go before it is accepted by actual netizens.
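One simple way to match a whistled tune regardless of the key it is whistled in is to keep only the melody's contour: whether each note goes up, goes down or repeats the previous one (this encoding is known as Parsons code). The Python sketch below is only an illustration of that idea; the two-song catalogue, the MIDI note numbers and the function names are hypothetical, and Musipedia's real matching may well work differently.

```python
def contour(pitches):
    """Encode a melody as Up/Down/Repeat steps between consecutive notes."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)

def edit_distance(a, b):
    """Levenshtein distance, so a few wrong notes still give a close match."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev_diag, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev_diag, row[j] = row[j], min(
                row[j] + 1,              # drop a step from a
                row[j - 1] + 1,          # insert a step into a
                prev_diag + (ca != cb),  # substitute (free if equal)
            )
    return row[len(b)]

# Hypothetical catalogue: opening notes as MIDI note numbers.
CATALOGUE = {
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Happy Birthday": [67, 67, 69, 67, 72, 71],
}

def best_match(query_pitches):
    """Return the catalogue title whose contour is closest to the query."""
    q = contour(query_pitches)
    return min(CATALOGUE, key=lambda t: edit_distance(q, contour(CATALOGUE[t])))

# The same tune whistled two semitones higher still has the same contour.
whistled = [62, 62, 69, 69, 71, 71, 69]
print(best_match(whistled))
```

Because the contour throws away exact pitch and timing, it tolerates off-key whistling, but it also makes many different songs look alike, which may be one reason such searches miss.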

Plagiarism Search
This is something very useful for media companies like ours. Checking the authenticity of a guest article, or checking whether an article of ours is being used by someone else on the Internet, was never easy before plagiarism search came along. These websites use the APIs of Google or similar search engines. They take each sentence of a web page, look for the same or similar sentences and word sequences in other articles, and then give the article a plagiarism score. They also highlight the copied or similar text in all the articles. One example of such a site is http://copyscape.com.
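The scoring idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a score defined as the fraction of a suspect text's three-word sequences that also appear in a source text; it is not Copyscape's actual method, and the function names and sample sentences are invented for the example.

```python
import re

def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def plagiarism_score(suspect, source, n=3):
    """Fraction of the suspect's shingles that also occur in the source (0.0 to 1.0)."""
    suspect_set = shingles(suspect, n)
    if not suspect_set:
        return 0.0  # too short to compare
    return len(suspect_set & shingles(source, n)) / len(suspect_set)

source = "Semantic search relies on structured sets of information and inference rules."
suspect = "Our engine relies on structured sets of information and clever tricks."
score = plagiarism_score(suspect, source)
print(f"Plagiarism score: {score:.2f}")
```

A real service would first have to find candidate source pages, which is where the search engine APIs come in; the comparison above only covers the final scoring step.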

Another very intuitive use of such a service is hunting down phishing websites. A bank can pass its site's content to copyscape.com or a similar website to check whether someone is imitating its site. As a phishing site must have the same text and a similar layout, it would be easily caught.

Musipedia's virtual keyboard lets you play the tune of the music you are searching for.

Semantic Search
This is something that might change the complete search paradigm in the near future. Semantic search is practical, easier to use than traditional search, faster and more accurate. The term refers to the technology of precise vocabulary-based search. Though this kind of natural language processing has been in progress for years, it was only recently that it started to take off. Start-ups like Powerset, Textdigger and Hakia are working on semantic search engines.

A semantic web agent does not necessarily include artificial intelligence. Instead, it relies on structured sets of information and inference rules that allow it to understand the relationship between data sources. A computer may not understand information the way humans can, but it has enough information to create logical connections and take decisions accordingly.

In the semantic web, the data itself becomes a part of the web, unlike in the World Wide Web, which holds endless information in the form of documents, and it is processed irrespective of platform, application or domain. We can search for documents on the World Wide Web, but their interpretation is left to us. The semantic web, on the other hand, is about data as well as documents, so that machines can process and even act on the data in practical ways. So while the non-semantic web treats the word 'snake' as just the string 'snake', the semantic web treats it as an animal.

Let's take another example. A semantic search engine can answer questions like 'Which Indian author won the Booker Prize in the year 1997?' It applies reasoning based on the fact that the Web knows how the names of Indian Booker winners, the respective years and even the names of the books relate to one another.
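Here is a minimal Python sketch of how structured facts plus one inference rule could answer that question. The facts themselves are real, but the triple layout, the predicate names and the functions are assumptions made for this illustration; actual semantic web systems use standards such as RDF and far richer rule sets.

```python
# Each fact is a (subject, predicate, object) triple.
FACTS = {
    ("Arundhati Roy", "nationality", "Indian"),
    ("Arundhati Roy", "wrote", "The God of Small Things"),
    ("The God of Small Things", "won", ("Booker Prize", 1997)),
    ("Kiran Desai", "nationality", "Indian"),
    ("Kiran Desai", "wrote", "The Inheritance of Loss"),
    ("The Inheritance of Loss", "won", ("Booker Prize", 2006)),
}

def infer(facts):
    """Apply one rule: if A wrote B and B won prize P, then A won P."""
    derived = set(facts)
    for author, p1, book in facts:
        if p1 != "wrote":
            continue
        for subject, p2, prize in facts:
            if p2 == "won" and subject == book:
                derived.add((author, "won", prize))
    return derived

def winners(facts, nationality, prize, year):
    """Find subjects of the given nationality that won the prize in that year."""
    kb = infer(facts)
    return sorted(
        s for s, p, o in kb
        if p == "won" and o == (prize, year) and (s, "nationality", nationality) in kb
    )

print(winners(FACTS, "Indian", "Booker Prize", 1997))
```

Note that no stored fact says an author won anything; that link is derived by the rule, which is the kind of logical connection a semantic web agent is expected to make.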

If we search for the keywords Semantic Web in Google, it shows all sites containing information about it. However, in a semantic search such as the one provided by Powerset, you get the definition of 'Semantic Web' along with relevant links.

So the emphasis in Semantic Web goes to the back end. There is a rich set of links from the Semantic Web to HTML documents. These relations characteristically unite a concept in the semantic Web with the pages that are most relevant.

The Backend for the Bots
We talked about semantic search, which does not necessarily include artificial intelligence; instead, it relies on structured sets of information and inference rules that allow it to understand the relationship between data sources. Now imagine that we could also rely on artificial intelligence and NLP (Natural Language Processing). We could connect a robot to the Internet that listens to your voice and responds to your questions in a very friendly manner. And as it is connected to the Internet, no question would go unanswered. We would effectively turn the Web into a brain for our robots. This reminds me of VIKI (Virtual Interactive Kinetic Intelligence) from the Hollywood blockbuster I, Robot. I just hope that in the real world it doesn't go bad as VIKI did.

Today, in the real world, we don't have VIKI, but we do have ALICE, an AI chat bot that works on AIML (Artificial Intelligence Markup Language). Before I go on, just read the following interaction of mine with Alicebot. You can visit her at http://alicebot.blogspot.com/

In this interaction, I was able to talk to ALICE in normal language and ask her for some information. She was able to understand the meaning and intent of my question and respond with an appropriate answer. Just imagine if you could have a similar interface for Google or Wikipedia. What would the level of user interaction be? And by coupling it with voice recognition and text-to-speech, we could actually have a VIKI in place. Let me leave you with these thoughts on the future of search.
