Tuesday, June 18, 2013

Big Data Search

Search is the most common method used today to locate information, present it and interact with complex data sets.  Search has brought access to large, complex data sets to the masses, enabling people to locate information that previously would have been difficult or impossible to find.  In the context of Big Data, search is mentioned often, usually in one of two ways:
  • Search as an interface – Search as an interface allows people to use a single entry point to reference complex data sets, the components that make them up and the relationships between pieces of data.  Search provides the front user interface to a dataset that is analyzed and manipulated by other tools.
  • Search as a product – Search as a product is the set of packaged components needed to index data from a variety of sources and present the results through a search interface.  Search as a product commonly supports keyword searches as well as phrase searches.

Search can provide a powerful interface to locate data and present it, but the search engine still needs the data presented to it in an organized fashion.  This is where additional analytical tools come in, enabling richer, more powerful search results.

Companies like IBM, HP, and Microsoft have long provided search as a product.  These tools bring in data sets, index them and enable people to search for keywords or phrases.  In June 2013, Cloudera also announced this capability on top of Hadoop, integrating SolrCloud with Hadoop to create highly scalable search indexes.  This search capability allows customers to locate information quickly, but the customer must already know what they are looking for.
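The index-then-search flow described above can be illustrated with a small sketch.  This is not how any of the products named here are implemented; it is a minimal in-memory inverted index, and the function names (tokenize, build_index, keyword_search, phrase_search) are hypothetical names chosen for illustration.  It records the position of every term so that both keyword and phrase queries can be answered.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a positional inverted index: term -> doc_id -> [positions]."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def keyword_search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


def phrase_search(index, phrase):
    """Return the ids of documents that contain the terms consecutively."""
    terms = tokenize(phrase)
    hits = set()
    # Only documents containing all terms can contain the phrase.
    for doc_id in keyword_search(index, phrase):
        for start in index[terms[0]][doc_id]:
            # The i-th term must appear exactly i positions after the first.
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits
```

Note that both queries only find documents containing the exact terms supplied, which is the limitation described above: the customer must already know what to look for.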


Kitenga enables a richer search experience through its ability to extract entities from unstructured data, analyze the results of that extraction, and present not only keyword results but also the relationships and contextual meaning found in the data.  Kitenga leverages search as an interface to access these complex data sets and the relationships derived from them.  Kitenga enables an end-to-end analytical pipeline of data analysis, relationship identification, and presentation of the results through search.

Monday, February 4, 2013

How do we determine the ‘value’ of data?


Data is rapidly becoming a new form of currency: it is bought, sold and protected in ways similar to currency and the commodities that back it.  Many companies have been built around the idea of collecting and using data in creative ways, enabling business models not previously possible.  Facebook, Google, Yelp, Groupon and a host of other companies across many industries have embraced this approach of using data in new ways to create new business models.

But how do we value data?  How do we assign an exchange rate to something that is very much valued in the eye of the beholder?

Value takes many forms.  A financial metric, such as an assigned price or the impact on the business, is one way to express it.  But ultimately value is set by the influence an item has on the world around it.  When it comes to data, the value has three components:
  1. The cost of gathering the data
  2. The cost to replace any lost data
  3. The potential for future gains because of the insight gained from the data


Data is rapidly becoming the most important commodity of all.  Companies buy it, trade it, sell it, insure it and use it as leverage in negotiations.  Data has rapidly eclipsed traditional, tangible business assets as the basis on which a company is valued by the market and by its customers.

Many successful companies today work to ensure that they have a large customer base to draw from.  They leverage that base to collect as many details as possible on customers’ habits, motives and behaviors, and in turn use that information to grow their markets, their attach rates and their business.

As the world’s economy continues to shift towards services and intellectual offerings, and away from traditional manufacturing, data will only become more valuable.  Over time, more and more companies will need proper corporate policies in place to track and protect the data that drives the organization, so that they can maintain a competitive advantage.