Among the many new additions coming to SQL Server 2016 announced earlier this week is the upcoming integration with R. As a specialized, open source statistical environment, R represents the primary analysis language for the majority of data scientists and statisticians. In recent years, R has also undergone a significant shift in user base by gaining wider adoption in the business world. By marrying R with its core data platform, Microsoft is signaling that it is serious about advancing its analytic offerings not only in the cloud, but on-premise as well. This vision will help accelerate the current trend and lead the R language into mainstream use.
SQL Server integration represents a logical progression for Microsoft, whose recent steps to enhance its data science profile include R and Python scripting support in Azure Machine Learning, as well as the acquisition of Revolution Analytics. Revolution offers a production-ready, enhanced version of the R platform with greater scalability and increased performance—a significant and welcome supplement to the base R project. Although it is currently possible to analyze SQL Server data in R (and even query R data types using SQL with the help of add-on packages), there have been many limitations with SQL Server when it has simply served as a data source. With tight integration between SQL Server and Revolution R, it will be intriguing to see exactly what is possible when more details become available about the advanced analytics built into 2016.
Here are five reasons why the upcoming R integration in SQL Server may be revolutionary :
1. On premise support for a wider variety of data manipulation, visualization, and machine learning techniques
The base R language has a full range of support for standard ETL tasks, data visualization, and statistical functions. Extending R with additional packages makes it more straightforward to transform data, create advanced graphics, and take advantage of machine learning algorithms. For anyone who is not in a position to leverage R in the cloud with Azure ML, some of the same analysis and modeling could be done with R in the on-premise SQL Server environment.
2. R is a natural fit since you already think in rows and columns
The R “data frame” is the two-dimensional data type upon which most corporate-level analysis is based. Whether your primary workspace involves SQL or Excel, it is not too much of a stretch to wrangle data frames. R supports a number of advanced data types such as vectors, matrices, lists, and n-dimensional arrays; but the data frame would suit most needs for a mass audience of SQL developers and analysts.
3. Like SQL, R excels at set-based operations
While it is certainly possible to write functional R code with FOR loops and IF/ELSE statements, R has built-in support to apply functions across a set or subsets. This can often be accomplished efficiently in one line of code using the BY function or the various functions in the APPLY family.
4. The ability to leverage the R community
Anyone who has worked with the data mining toolset in SSAS or even the Excel Data Mining Add-ins (early “self-service data science”?) has had exposure to a number of existing algorithms as well as experience with testing and training models. While the algorithms implemented in SSAS covered a lot of cases, one downside was the user was tied to Microsoft’s deployment choices and release schedule. On the contrary, R relies on its community for package development, and SQL pros would have access to anything new or trending in the R realm without having to wait for Microsoft to implement it—if they ever did. Azure ML pioneered this integration by offering its core machine learning functionality with the option to extend as needed.
5. Advanced analytics using T-SQL
While the following is only speculation based on limited information at this time, it may also be possible to access some R functionality implemented with T-SQL in the “Analytic Library”. Users could leverage the power of R without having to know any R, but they would be able to extend as needed.
If you are unfamiliar with R, it may be beneficial to put it on your learning plan soon due to Microsoft’s current and evolving integrations. There are many resources available online or in print to learn the base language and how to extend it with the richness of its community packages as well as the RStudio IDE. Especially coming from T-SQL, the language may seem intimidating due to R’s beginnings as a language for advanced statistics, but it is now being adapted to broader use in the enterprise. Many established resources start off in R assuming a background in linear algebra and statistics, but this is changing as more packages (i.e. tidyr, dplyr) and training materials are becoming available that would appeal to analysts, business intelligence professionals, and database developers.
We can help you get the most out of your Microsoft investment. Get in touch to learn more.