Large companies have long used big data analytics to comb through vast amounts of data in a short amount of time. Companies with deep pockets and ever-expanding amounts of data have built large server clusters dedicated to mining data. Hadoop clusters can help small and medium-sized businesses lacking big budgets benefit from big data.
Have you ever wondered how search engines guess what you want to type in the search field before you finish typing it, or offer suggestions of related queries? Maybe you've noticed that Facebook or LinkedIn recommend people you may know based on your connections? These are two examples of Hadoop clusters processing large amounts of data in fractions of a second.
Created by the Apache Foundation, Hadoop is an open source product built to run on commodity hardware. The most expensive part of building a Hadoop cluster involves the compute and storage resources. Server virtualization can help reduce that cost, bringing big data analytics to budget-constrained organizations.
More on big data analytics
Examining the
Requires Free Membership to View
VMware buys Cetas to bring big data analytics to the masses
Advisory Board roundtable: Big data and its impact on data centers
Virtualization vendors, such as VMware, have spun up initiatives specifically focused on better utilizing virtual workloads for Hadoop environments. VMware's project Serengeti is one such product. Prior to launching Serengeti, VMware published some impressive benchmark results that showed scenarios where a virtual Hadoop environment outperformed a physical one.
It's easy to find information on how to virtualize Hadoop with any of the popular hypervisors. In fact, Apache has even published its own articles about the pros and cons of virtualizing Hadoop environments. One big advantage is that virtualizing Hadoop makes big data analytics more accessible for SMBs because it reduces the compute and storage hardware needed.
By using Hadoop-based data analytics, SMBs can add value to their services. For example, a company could offer a service that matches errors in log files to knowledge base articles from multiple vendors to quickly identify and resolve system errors. As another example, a marketing department using Hadoop-based data analytics could monitor references to a company's name in trade magazines, online blogs, social media and other outlets and then correlate those instances to events or organizational changes.
Hadoop has proven it's a valuable and powerful tool in data analytics. Large organizations have seen its benefits for years. Add in the equalizing factor of virtualization and data analytics tools like Hadoop are a possibility for smaller organizations as well.
The growth of data within organizations is staggering, and the growth of data available online is beyond comprehension. The challenge in the next five years will be harnessing data to find the value in it.
This was first published in September 2012
Virtualization Strategies for the CIO

Join the conversationComment
Share
Comments
Results
Contribute to the conversation