Step-by-step tutorial to deploy Optasia_demo on Azure Data Lake
Yao Lu, Sep 27 2016

In this tutorial, you will walk through an example that extracts RGB histograms from images and then
classifies them into color labels.
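
To make the idea concrete before touching Azure, here is a minimal sketch in Python of the two steps named above:
building a per-channel RGB histogram and assigning a color label. It is not the code in Optasia_demo (which is
C++ with OpenCV), and the classifier is swapped for a simple nearest-reference-histogram rule, not "color.model".
The function names, the bin count, and the sample pixels are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each value in 0..255


def rgb_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalized histogram: `bins` buckets for R, then G, then B."""
    if not pixels:
        raise ValueError("image has no pixels")
    width = 256 // bins  # number of intensity values per bucket
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            bucket = min(value // width, bins - 1)
            hist[channel * bins + bucket] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def classify(hist: Sequence[float], references: Dict[str, Sequence[float]]) -> str:
    """Pick the label whose reference histogram is nearest (squared Euclidean)."""
    def distance(ref: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(hist, ref))
    return min(references, key=lambda label: distance(references[label]))


# Hypothetical reference histograms, built from solid-color one-pixel "images".
references = {
    "red": rgb_histogram([(255, 0, 0)]),
    "green": rgb_histogram([(0, 255, 0)]),
    "blue": rgb_histogram([(0, 0, 255)]),
}

# Nine reddish pixels and one bluish pixel.
mostly_red = [(250, 10, 5)] * 9 + [(20, 20, 200)]
label = classify(rgb_histogram(mostly_red), references)
```

In the demo itself the same two stages run inside the ADL job, one image per row, which is what lets the work
spread across the parallelism you choose in step 7.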

1. You need a Microsoft Azure account. At the time of writing, you can register a free account with a $200 credit.


2. Once you have a working account, log in to the Azure portal.


3. Select New -> Intelligence + Analytics -> Data Lake Analytics (in preview when this tutorial was written).
Enter your service name. In the Data Lake Store section, create a new store.


Check "Pin to dashboard" and then press Create. A Data Lake Analytics icon will appear on your dashboard.
You are now done with the portal.

4. The example solution is Optasia_demo.zip.
Open the provided .sln solution file in Visual Studio. (You need to install the Azure Visual Studio SDK first.)
I'm using VS2015 in this tutorial. Open Server Explorer (Ctrl+Alt+S) and go to Azure -> Data Lake Analytics.
You might need to sign in to your Azure account at this point. Open the ADL service you created and go to
Storage Accounts. Double click on the Data Lake Store you created.


You will see the storage folder for the ADL service. Create a "cars" folder and upload some
example data into it using the "As Binary" mode. You also need to upload the "color.model" and "color.label"
classifier files (they are in Optasia_demo.zip) to the root folder, also As Binary.

5. Now you need to build the libOptasia_demo C++.Net library. It's a little complicated and I won't go
into the details. In summary, build it as an x64 Release DLL, with VC12 and CLR support enabled. You also
need the OpenCV library.

6. Register the assemblies. You can do either of the following.
(1) Go to Server Explorer again, Azure -> Data Lake Analytics -> [your service name] -> Databases -> master
-> Assemblies -> right click "Register assemblies". Enter "libOptasia_demo.dll" as the assembly, add the OpenCV
dependency DLLs as "Additional Files", and submit. It will ask whether to create a temp folder to hold those DLL
files. Say no and choose your storage root folder.

(2) Or, you can manually upload the binaries (libOptasia_demo.dll and the OpenCV dependencies) to the storage
and run the "register_binary.usql" script provided in the example solution. You might need to change the file
names and paths in the script. Right click on the script and select "submit script". Select your ADL service name
in the Analytics Account field and submit.


Make sure your storage now contains the "cars" folder, "color.model", "color.label", libOptasia_demo.dll, and the
OpenCV dependency DLLs.


7. You can now submit the demo script "Script.usql". Right click -> submit script -> choose the ADL service you
created -> select the maximum parallelism you want -> Submit. A job view window will pop up, and after a while it
will show the result.


8. Double click "output.txt" in the job view or in the storage to see the results.


9. Bingo! The demo is over.

Finally, several points are worth mentioning.
1. ADL uses U-SQL, which is a variant and subset of SCOPE. The two differ slightly but share many common features.
2. Don't rely too much on the provided classification model. It's just a toy model.
3. If you look at the code base, you will find it is written in C++.Net. A CLR wrapper surrounds the native
OpenCV C++ code. You can develop your own program in a similar way.
4. "Script.usql.cs" maps vision and machine learning modules to the dataflow system. You can define your own Processor,
Reducer, Combiner, and Extractor.
5. It's really convenient and efficient!
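
To illustrate point 4, here is a conceptual sketch in Python of how an extractor, a processor, and a reducer chain
together in a dataflow. It is only an analogy: it is not U-SQL, it is not the code in "Script.usql.cs", and every
function name, row field, and sample file below is hypothetical. A combiner, which works on two row sets, is left
out for brevity.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def extractor(files: Dict[str, bytes]) -> Iterator[Row]:
    """Extractor: turn each stored file into one input row."""
    for name, data in sorted(files.items()):
        yield {"name": name, "data": data}


def processor(rows: Iterable[Row]) -> Iterator[Row]:
    """Processor: transform each row independently of the others."""
    for row in rows:
        # Stand-in for the vision module: a made-up rule where the
        # first byte of the file decides the label.
        label = "red" if row["data"][:1] == b"r" else "other"
        yield {"name": row["name"], "label": label}


def reducer(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Reducer: group rows by a key and emit one aggregate per group."""
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        counts[row["label"]] += 1
    return sorted(counts.items())


# Hypothetical stand-ins for images uploaded to the "cars" folder.
files = {"cars/a.jpg": b"r...", "cars/b.jpg": b"g...", "cars/c.jpg": b"r..."}
summary = reducer(processor(extractor(files)))
```

Because the processor looks at one row at a time, the system is free to run it on many rows in parallel; only the
reducer needs to see all rows that share a key.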

Some useful resources:
U-SQL.
Azure Data Lake forum.
U-SQL language reference.


Let me know if the code does not work or you run into any problems!