Tuesday, June 18, 2013

Big Data Search

Search is the most common method used today to locate information, present it, and interact with complex data sets. Search has brought access to large, complex data sets to the masses, enabling people to locate information that previously would have been difficult or impossible to find. In the context of Big Data, we hear search mentioned a lot. It is usually referenced in one of two ways:
  • Search as an interface – Search as an interface gives people a single entry point to complex data sets, the components that make them up, and the relationships between pieces of data. Search provides the front-end user interface to a data set that is analyzed and manipulated by other tools.
  • Search as a product – Search as a product is the set of packaged components needed to index data from a variety of sources and present the results through a search interface. Search as a product commonly allows for keyword searches as well as phrase searches.

Search can provide a powerful interface to locate data and present it, but the search engine still needs the data presented to it in an organized fashion. This is where additional analytical tools come in, enabling richer, more powerful search results.

Companies like IBM, HP, and Microsoft have long provided search as a product. These tools bring in data sets, index them, and enable folks to search for keywords or phrases. In June 2013, Cloudera also announced this capability on top of Hadoop: the announcement was the inclusion of SolrCloud on top of Hadoop to create highly scalable search indexes. This search capability allows customers to locate information quickly, but the customer must know what they are looking for.
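To make the idea of indexing for keyword and phrase search concrete, here is a minimal sketch in Python. It is a toy positional inverted index, not how SolrCloud or any of the products above work internally; the sample documents and the function names (build_index, keyword_search, phrase_search) are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample documents, keyed by a document id.
SAMPLE_DOCS = {
    1: "Hadoop stores large data sets",
    2: "Search indexes make large data easy to find",
    3: "Data sets can be large",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def keyword_search(index, word):
    """Return the ids of documents containing a single keyword."""
    terms = tokenize(word)
    if not terms:
        return set()
    return set(index.get(terms[0], {}))

def phrase_search(index, phrase):
    """Return the ids of documents containing the terms consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            # Every following term must appear at the next position.
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

index = build_index(SAMPLE_DOCS)
keyword_hits = keyword_search(index, "large")
phrase_hits = phrase_search(index, "large data")
```

Note that both searches only find what the user already typed: the index knows where words occur, not what they mean or how they relate to each other.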


Kitenga enables a richer search experience through its ability to extract entities from unstructured data, analyze the results of that extraction, and present not only keyword results but also the relationships and contextual meaning found in the data. Kitenga leverages search as an interface to access these complex data sets and the relationships derived from them. Kitenga enables an end-to-end analytical pipeline of data analysis, relationship identification, and presentation of the results through search.
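The general shape of such a pipeline (extract entities, derive relationships, expose both through search) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Kitenga's implementation: entities are matched against a small hypothetical list (KNOWN_ENTITIES), and two entities are treated as related simply because they appear in the same document.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entity list; a real system would use far richer extraction.
KNOWN_ENTITIES = {"Cloudera", "Hadoop", "SolrCloud", "IBM"}

# Hypothetical sample documents, keyed by a document id.
SAMPLE_DOCS = {
    1: "Cloudera announced SolrCloud on top of Hadoop.",
    2: "IBM sells search products.",
    3: "Hadoop stores large data sets.",
}

def extract_entities(text):
    """Return the known entities mentioned in a piece of text."""
    words = {w.strip(".,;:") for w in text.split()}
    return words & KNOWN_ENTITIES

def analyze(docs):
    """Build entity -> documents and entity -> co-occurring entities maps."""
    mentions = defaultdict(set)
    related = defaultdict(set)
    for doc_id, text in docs.items():
        found = extract_entities(text)
        for entity in found:
            mentions[entity].add(doc_id)
        # Assumption: sharing a document counts as a relationship.
        for a, b in combinations(sorted(found), 2):
            related[a].add(b)
            related[b].add(a)
    return mentions, related

def search(entity, mentions, related):
    """Return matching documents together with related entities."""
    return {
        "documents": sorted(mentions.get(entity, set())),
        "related": sorted(related.get(entity, set())),
    }

mentions, related = analyze(SAMPLE_DOCS)
result = search("Hadoop", mentions, related)
```

Here a search for "Hadoop" returns not just the matching documents but also the entities that appear alongside it, which is the difference between a plain keyword hit and a result that carries relationships.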