What Do You Do With Your Gold Mine of Insights?


The international enterprise ecosystem has been facing the pros and cons of an ever-expanding marketplace, shortening product life cycles, evolving customer behavior and an economy that moves at the speed of light. Ask anyone, from a sole proprietor to a government agency to a multinational corporation, what the single most valuable asset is in the midst of all this madness, and the answer will almost certainly be ‘information’. Information, or in other words ‘data’, has over the years become one of the most powerful possessions for enterprises worldwide: it helps them forecast market trends, analyze customer demographics and understand customer behavior, among many other uses. As organizations scale up and expand, they generate ever more data, and it becomes imperative to structure and channel that data so it yields actionable insights across all operations.

Not surprisingly, the Big Data market is growing very quickly in response to rising demand from enterprises. According to IDC, the market for big data products and services was worth $3.2 billion in 2010, and IDC predicts it will grow to $16.9 billion by 2015. That’s a 39.4 percent compound annual growth rate, about seven times higher than the growth rate IDC expects for the IT market as a whole.
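As a quick sanity check on those figures, the annual growth rate can be recomputed from the two endpoints. This is only a sketch: the function name `cagr` is made up for illustration, and it assumes the period from 2010 to 2015 counts as five years of compounding.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1

# IDC figures quoted above, in billions of dollars.
# Assumption: 2010 -> 2015 is treated as five compounding periods.
rate = cagr(3.2, 16.9, 5)
print(f"Implied annual growth rate: {rate:.1%}")
```

The result lands within a fraction of a percentage point of the quoted 39.4 percent; the small difference presumably comes from rounding in the published dollar amounts.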

When data sets grow large and complex, it becomes increasingly difficult for traditional data processing applications to capture, curate, store, search, transfer, analyze and visualize them. To overcome the challenges brought about by the growing volume, velocity and variety of data, enterprises have adopted modern, holistic data processing tools. Interestingly, open source projects have produced many of the best-known big data tools that some of the world’s largest organizations count on today. Listed below are some of them:

1. Hadoop

You simply can’t talk about big data without mentioning Hadoop. The Apache distributed data processing software is so pervasive that often the terms “Hadoop” and “big data” are used synonymously. The Apache Foundation also sponsors a number of related projects that extend the capabilities of Hadoop.

2. MapReduce

Originally developed by Google, MapReduce is described on its website as “a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.” It’s used by Hadoop, as well as many other data processing applications.
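To make the programming model a little more concrete, here is a minimal single-machine sketch of the classic word-count example in plain Python. It does not use Hadoop or any real MapReduce API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for illustration, and a real framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different node; here we loop locally.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data tools", "big data big insights"])
print(counts)
```

The point of the split is that the map and reduce functions never share state, which is what would let a framework distribute them over a cluster.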

3. GridGain

GridGain offers an alternative to Hadoop’s MapReduce that is compatible with the Hadoop Distributed File System. It offers in-memory processing for faster analysis of real-time data.


4. HPCC

Developed by LexisNexis Risk Solutions, HPCC is short for “high performance computing cluster.” It claims to offer superior performance to Hadoop. Both free community versions and paid enterprise versions are available.

5. Storm

Now owned by Twitter, Storm offers distributed real-time computation capabilities and is often described as the “Hadoop of realtime.” It is highly scalable, robust, fault-tolerant and works with nearly all programming languages.

6. Cassandra

Originally developed by Facebook, this NoSQL database is now managed by the Apache Foundation. It’s used by many organizations with large, active datasets, including Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco and Digg. Commercial support and services are available through third-party vendors.

7. HBase

Another Apache project, HBase is the non-relational data store for Hadoop. Features include linear and modular scalability, strictly consistent reads and writes, automatic failover support and much more.

8. MongoDB

MongoDB was designed to support humongous databases. It’s a NoSQL database with document-oriented storage, full index support, replication and high availability, and more. Commercial support is available through 10gen.

9. Neo4j

The “world’s leading graph database,” Neo4j boasts performance improvements up to 1000x or more versus relational databases. Organizations can purchase advanced and enterprise versions from Neo Technology.

10. CouchDB

Designed for the Web, CouchDB stores data in JSON documents that you can access via the Web or query using JavaScript. It offers distributed scaling with fault-tolerant storage.

Traditionally, data volume was mainly a concern of storage and the cost and risk it involved. Over the years, the focus has shifted to analytics and relevance. The right big data processing applications can be of indispensable value to any organization, whether for setting prices to maximize profit, keeping a check on inventory, sending tailored recommendations to mobile devices, calculating risk portfolios or identifying customers. Most importantly, they help generate greater operational efficiencies, lower costs and reduce risks.
