Microsoft’s Big Data Strategy: HDInsight
Talk, confusion, anxiety and excitement are some of the words I associate with Big Data. Read about almost any organization, irrespective of its size and capital, and you will find it is either moving to Big Data or thinking about it. In short, big data equals big buzz!
In this article, we present Microsoft’s strategy for Big Data and Hadoop in a nutshell. Microsoft has been actively building solutions across verticals that let enterprises capitalize on big data analytics using tools they already know.
Hortonworks was formed in 2011 as a joint venture between Yahoo and Benchmark Capital. Its Hortonworks Data Platform is built on Apache Hadoop, an open-source platform for storing and processing data and drawing insights from it. Hortonworks aims to offer a robust, stable platform for integrating Hadoop with existing data architectures so that organizations can infer trends from their data.
The major components of Hortonworks’ Apache Hadoop distribution are HDFS (Hadoop Distributed File System), MapReduce, HBase, Pig, Hive and ZooKeeper.
Hortonworks partners with Microsoft, Informatica and Teradata to extend the Hadoop ecosystem.
Behind the Scenes of Hortonworks
Hadoop is now around 10 years old. Back then it was not a widely discussed phenomenon; today Hadoop and Big Data are known to all and sundry. In the early days, Yahoo was looking to improve its search engine capabilities, and it backed work growing out of Nutch, an open-source web crawling and indexing project. Early versions of Nutch struggled to index at heavy workloads. The Nutch developers then implemented the ideas described in Google’s MapReduce and Google File System papers on top of their framework, which dramatically improved indexing. Yahoo then invested strategically in Hadoop, setting up a “Research Grid” staffed with top data scientists and engineers. By 2008 it had built a strong Hadoop capability, scaling from a mere 10-node setup to 10,000 nodes. In 2011, with Hadoop running on about 42,000 nodes and with much stronger reliability and processing capability, Yahoo spun off Hortonworks.
The major players in the Hadoop space are Cloudera, Hortonworks and MapR.
Organizations today face shifting priorities in a fast-paced world, and their foremost challenges are to:
- Deal with the massive explosion of data
- Analyze and interpret that data
- Manage their own data alongside public data
Microsoft’s Big Data Solution- HDInsight
Microsoft currently offers Hadoop services in two forms under the name “HDInsight”, which is 100% compatible with Apache Hadoop:
- A cloud-based service on Microsoft’s Windows Azure platform (Windows Azure HDInsight) that offers elasticity in the cloud. It is available as a preview on the Windows Azure portal, where you pay only for the compute and storage you use.
- An on-premise offering on Windows Server (Microsoft HDInsight) that integrates seamlessly with System Center.
Microsoft presents itself as the only provider offering its Hadoop service both on and off premise. This is a big advantage because it gives an organization the flexibility to move workloads back and forth, or to use a hybrid of the two. The two versions are 100% code compatible.
HDInsight provides end-to-end management of massive data in three layers: a data management layer, an enrichment layer and an insights layer. The data management layer supports data of all types: structured, semi-structured and unstructured. The enrichment layer applies analytics to discover patterns in the data and produce refined output. The insights layer delivers those insights through familiar tools such as Office, Excel, PowerPivot and SharePoint, giving the business differentiated value.
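The following minimal Python sketch makes the three layers more concrete by passing a few records through them. It is purely illustrative and makes no assumptions about how HDInsight actually implements the layers. The function names `manage`, `enrich` and `insights`, the record shape and the sample data are all hypothetical. Simple keyword tagging stands in for real analytics.

```python
import json
from collections import Counter

# Hypothetical record shape shared by all three layers: a dict with
# "source" and "text" keys. This is an illustration, not an HDInsight API.


def manage(raw_items):
    """Data management layer: accept structured, semi-structured and
    unstructured inputs and normalize them into one record shape."""
    records = []
    for kind, payload in raw_items:
        if kind == "structured":      # e.g. a CSV row: id,comment
            _, text = payload.split(",", 1)
        elif kind == "semi":          # e.g. a JSON document
            text = json.loads(payload)["comment"]
        else:                         # free text, taken as-is
            text = payload
        records.append({"source": kind, "text": text.lower()})
    return records


def enrich(records, keywords):
    """Enrichment layer: apply a simple analytic (keyword tagging)."""
    for r in records:
        r["tags"] = sorted(k for k in keywords if k in r["text"])
    return records


def insights(records):
    """Insights layer: summarize into a table a BI tool could consume."""
    return Counter(tag for r in records for tag in r["tags"])


raw = [
    ("structured", "1,Late delivery again"),
    ("semi", '{"comment": "Great price, late delivery"}'),
    ("unstructured", "The price was great"),
]
summary = insights(enrich(manage(raw), {"price", "late"}))
print(dict(summary))  # {'late': 2, 'price': 2}
```

In a real deployment, each stage would run as distributed jobs over data in HDFS rather than over an in-memory list. The sketch only shows how responsibilities divide across the layers.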
Microsoft’s vision is to bring simplicity, user friendliness and extensibility to Hadoop.
Let’s now see how Microsoft’s partnership with Hortonworks is enabling its various platforms.
- First, a quick look at PowerPivot and Power View.
- PowerPivot is a data modeling add-in that greatly extends Excel, letting it work with large data sets drawn from multiple sources. It offers a more cost-effective and agile approach to Business Intelligence, turning workbooks into analytical applications that adapt easily to your changes.
- Power View is an interactive data visualization tool.
- Microsoft has built PowerPivot and Power View into Excel 2013 to enable self-service BI, and these tools can be pointed at Big Data sources to surface insights.
- You can run queries from Excel against Big Data stored in HDInsight.
- Developers can continue to use Visual Studio to develop and test MapReduce programs written in .NET.
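For readers new to MapReduce, the following minimal sketch shows the data flow of the classic word-count job. A map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. The sketch is written in Python rather than .NET and runs in a single process. It does not use Hadoop's API or Microsoft's .NET SDK, and the function names are hypothetical. On a real cluster, Hadoop runs many map and reduce tasks in parallel across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine the grouped values for one key."""
    return key, sum(values)


def run_job(lines):
    """Chain the three phases over every input line."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["big data big buzz", "Big data on Hadoop"])
print(counts)  # {'big': 3, 'buzz': 1, 'data': 2, 'hadoop': 1, 'on': 1}
```

Because each reducer sees only one key's values, the reduce step can be spread across machines without coordination. This property is what lets MapReduce scale out.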
SQL Server 2012
This is the most interesting facet. Microsoft now lets you bring non-relational (unstructured) data into the SQL environment by offering an integrated platform built on Active Directory, Windows Server and System Center.
How does Microsoft bring unstructured data into SQL Server, and how does it keep transactional workloads fast? Enter PolyBase and Hekaton.
PolyBase is the technology that integrates SQL Server Parallel Data Warehouse (PDW) with Hadoop by connecting the Hadoop Distributed File System to the relational database engine. In practice, this means you can query data stored in Hadoop with an ordinary SQL query. You can also query SQL Server and Hadoop together, combining data sets from each source in a single query through the “HDFS Bridge”, which reads from multiple nodes of the Hadoop cluster.
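Python's built-in `sqlite3` module offers a small analogy for querying relational and Hadoop data together. This is not PolyBase and uses none of its syntax. The delimited text lines stand in for files in HDFS, and the table names and data are hypothetical. The sketch also copies the rows into a temporary table, whereas PolyBase reads HDFS data through the HDFS Bridge without that manual step.

```python
import sqlite3

# Stand-in for files stored in HDFS: delimited text lines (hypothetical data).
hdfs_lines = [
    "1001|2013-03-01|click",
    "1002|2013-03-01|view",
    "1001|2013-03-02|purchase",
]

conn = sqlite3.connect(":memory:")

# Relational side: an ordinary table, as in the SQL Server warehouse.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1001, "Contoso"), (1002, "Fabrikam")])

# "Bridge" step: parse the raw text into rows and expose it as a table,
# loosely analogous to PolyBase exposing HDFS data to the engine.
conn.execute(
    "CREATE TEMP TABLE clicks (customer_id INTEGER, day TEXT, action TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)",
                 (line.split("|") for line in hdfs_lines))

# One SQL query now spans both data sets.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS events
    FROM customers AS c JOIN clicks AS k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Contoso', 2), ('Fabrikam', 1)]
```

The user-facing benefit is the final query. Once both sources are visible to the engine, analysts write one SQL statement and never handle the raw files themselves.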
Hekaton is the new in-memory technology for optimizing transaction processing workloads. Microsoft claims it runs existing applications up to 10x faster and new applications up to 50x faster. It achieves this with a new form of multiversion concurrency control (MVCC) built on lock-free data structures, so that transactions running on many cores do not block one another.
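A toy store can illustrate the core idea of multiversion concurrency control. Every write adds a new timestamped version instead of overwriting data, so a reader holding an older snapshot keeps seeing a consistent value. This is a simplified, single-threaded Python sketch under stated assumptions. It uses neither locks nor lock-free structures and assumes a single writer. It also omits the validation and garbage collection a real engine such as Hekaton needs. The class and method names are hypothetical.

```python
import itertools


class MVCCStore:
    """Toy multiversion store: each write adds a new version stamped with a
    commit timestamp; existing versions are never overwritten in place.
    Assumes a single writer and a single thread (simplification)."""

    def __init__(self):
        self._clock = itertools.count(1)
        self._versions = {}  # key -> list of (begin_ts, end_ts, value)

    def begin(self):
        """Start a reader: return a snapshot timestamp."""
        return next(self._clock)

    def write(self, key, value):
        """Commit a new version of key and close the previous one."""
        ts = next(self._clock)
        versions = self._versions.setdefault(key, [])
        if versions:
            begin_ts, _, old_value = versions[-1]
            versions[-1] = (begin_ts, ts, old_value)  # old version ends at ts
        versions.append((ts, float("inf"), value))
        return ts

    def read(self, key, snapshot_ts):
        """Return the version whose validity interval covers snapshot_ts."""
        for begin_ts, end_ts, value in self._versions.get(key, []):
            if begin_ts <= snapshot_ts < end_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
snapshot = store.begin()          # reader starts here
store.write("balance", 250)       # a later write does not disturb the reader
old_view = store.read("balance", snapshot)
new_view = store.read("balance", store.begin())
print(old_view, new_view)  # 100 250
```

Readers only consult versions whose validity interval covers their snapshot timestamp, so they never have to wait for a writer. This property lets MVCC engines scale across cores.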
Microsoft is extending the familiar tools it already provides for virtualized environments, which act as a single pane of glass across all environments, to SQL Server through its BI tools. SQL is also available as a service.
Additionally, HDInsight works with SQL Server through the Hive ODBC connector to analyze data of all types.
- BI teams can keep using BI Studio to design and work with OLAP cubes in SQL Server.
Highlights of the Benefits
- Immersive insights
- Any data, anywhere, any size
- Connectivity across data sources
- Works with popular Microsoft Office and BI tools
- No additional appliance required
One aspect that will appeal to many organizations is that the service is economical. An organization need not invest in racks of servers, hire specialist staff or carry the administration overhead. Instead, it can simply pay for the service and derive insights through its business analysts and statisticians. It can also focus on building business cases around the derived data. In short, the service takes care of much of the technical complexity of Big Data.
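A back-of-the-envelope calculation makes it easy to check the argument that you pay only for what you use. The sketch below compares a pay-per-use model with an always-on owned cluster. The function names are hypothetical, and every figure is an assumed example, not an actual Windows Azure or hardware price.

```python
def pay_per_use_cost(node_hours, node_hour_rate, storage_gb_months, gb_month_rate):
    """Pay-per-use model: billed for compute actually used plus storage."""
    return node_hours * node_hour_rate + storage_gb_months * gb_month_rate


def owned_cluster_cost(nodes, monthly_cost_per_node, months):
    """Owned model: every node costs money each month, busy or idle."""
    return nodes * monthly_cost_per_node * months


# Assumed example: 16 nodes used 8 hours a day for 20 days in one month,
# with 2,000 GB stored for that month. All rates are hypothetical.
cloud = pay_per_use_cost(node_hours=16 * 8 * 20, node_hour_rate=0.5,
                         storage_gb_months=2000, gb_month_rate=0.25)
owned = owned_cluster_cost(nodes=16, monthly_cost_per_node=300, months=1)
print(cloud, owned)  # 1780.0 4800
```

With these assumed inputs, the cluster is busy only 8 hours a day for 20 days, so pay-per-use wins. A cluster kept busy around the clock would shift the balance, so utilization is the first number to estimate.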
Microsoft’s vision is evident: unlock actionable insights from data of all types, from every angle. Having the right technology and tools to manage, analyze and scale burgeoning data can be a great asset. HDInsight can help you quantify the value of the data you manage and store, and it offers the scalability, flexibility, agility and elasticity you are likely to need.