USQL and Azure Data Lake

What is USQL?

USQL is a combination of C# and SQL used to work with data in Azure’s Data Lake Analytics service. You can also run Hive, Pig, and Storm jobs against your Azure Hadoop resources (HDInsight and Data Lake storage), but USQL shows a lot of potential and is worth a try. If you’ve ever worked with LINQ to SQL, the basic code will look very familiar. The overall shape of a statement is SQL, but the expressions inside it use C# syntax, types, and inline functions.

@myFirstUSQL = SELECT 
  Int32.Parse(Column3) AS Column3AsInt
FROM @data
WHERE Column1 == "First Column";

Why is USQL so great?

USQL is a new language, and as such it definitely has some growing to do. That said, because it draws on the existing C# libraries, it starts out with a strong base. In particular, you can extend the language with custom code written in C#, Python, or R. Just about any part of a statement can be extended this way, which lets you work with nearly any type of data and transform it however you need.
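As a small illustration of leaning on C#, the sketch below calls ordinary .NET string and integer members directly inside a statement. It only shows inline C# expressions, not custom extractors or Python/R extensions. The rowset values, the column names, and the output path are made up for the example.

```usql
// Build a small in-memory rowset so the script does not depend on an input file.
// All values and column names here are hypothetical.
@data =
    SELECT * FROM
        (VALUES
            ("First Column", "a@example.com", "42"),
            ("Second Column", "b@example.org", "7")
        ) AS T(Column1, Email, Column3);

// Standard .NET members can be used inside SELECT and WHERE expressions.
@result =
    SELECT Column1.ToUpperInvariant() AS Column1Upper,
           Email.Substring(Email.IndexOf("@") + 1) AS EmailDomain,
           Int32.Parse(Column3) * 2 AS Column3Doubled
    FROM @data
    WHERE Column3.Length > 0;

// Hypothetical output path; the script needs an OUTPUT to produce a result.
OUTPUT @result
TO "/output/inline-csharp-demo.csv"
USING Outputters.Csv(outputHeader: true);
```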

So How Do I Get Started?

MSDN has a great tutorial, and I won’t try to outdo it. Even better, the tutorial introduces a concept you will get very comfortable with as you work with USQL: reading from and writing to files instead of tables. While USQL does have databases, and they are extremely useful in their own right, a lot of your work will start by reading a file into USQL and end by writing out a new file, which can then be picked up by Azure Data Factory, SSIS, or your own custom script.
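A minimal sketch of that file-in, file-out pattern might look like the script below. It is not taken from the tutorial. The input and output paths and the three-column schema are assumptions, and it presumes the input is a CSV file with a single header row.

```usql
// Read a CSV file into a rowset. The path and schema are hypothetical;
// change them to match your own file.
@data =
    EXTRACT Column1 string,
            Column2 string,
            Column3 string
    FROM "/input/sample.csv"
    USING Extractors.Csv(skipFirstNRows: 1); // skip the assumed header row

// Filter and reshape the rows, using a C# call to convert the text column.
@filtered =
    SELECT Column1,
           Column2,
           Int32.Parse(Column3) AS Column3AsInt
    FROM @data
    WHERE Column1 == "First Column";

// Write the result to a new file that a downstream tool can pick up.
OUTPUT @filtered
TO "/output/filtered.csv"
USING Outputters.Csv(outputHeader: true);
```

Every column is extracted as a string and converted explicitly so that the conversion step stays visible. You could instead declare Column3 as int in the EXTRACT schema and let the extractor do the conversion.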