These instructions refer to this workshop event.
In this track we will use the R statistical software to perform simple data modelling.
In this track we will use TensorFlow and Python to build advanced machine learning and deep learning models.
Installing TensorFlow & Python on your laptop is optional: during the workshop, I will give you remote SSH access to a server that has TensorFlow installed on it.
If you want to install TensorFlow on your laptop:
Summary of requirements if you install TF:
If you don't want to install TensorFlow on your laptop:
I will make my TensorFlow-ready home server available to you during the workshop sessions. You'll just need an SSH client (I recommend PuTTY for Windows users, and the default ssh client for Ubuntu/Linux users) to access the server and run the jobs.
You should also install an SCP client (e.g. WinSCP for Windows) and an FTP client (I recommend the FileZilla FTP client for Windows). I haven't yet fully determined how we'll transfer code and results, but it will almost certainly be SCP and/or FTP, so make sure you can use both.
Here's how it will work: you will write a Python script on your laptop (or, if you're comfortable with command-line text editors like vim or nano, write the exercises directly in your own folder on the server), and then you will upload it to your folder on the server via either SCP or FTP (to be determined).
Then, you will use a script I made that queues up your job for execution as soon as the GPU is available. The output of your Python script will be written to a .log file once the queuing system has executed it. More specific instructions will be provided during the workshop itself.
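To make the workflow concrete, here is a rough sketch of what an upload-and-queue session might look like from a terminal. The hostname, folder name, and the queue script's name below are placeholders I've invented for illustration; the real values (and the exact queue command) will be given at the workshop.

```shell
# Placeholder address -- the real host and username will be provided.
SERVER=you@workshop-server.example.com

# 1. Upload your exercise script to your folder on the server (SCP variant):
scp exercise1.py "$SERVER:~/yourname/"

# 2. Log in and submit the job to the GPU queue
#    (queue_job.sh is a hypothetical name for the queuing script):
ssh "$SERVER" "cd ~/yourname && ./queue_job.sh exercise1.py"

# 3. Once the queue has run your job, download the resulting .log file:
scp "$SERVER:~/yourname/exercise1.log" .
```

PuTTY users would do the same thing with PSCP/PSFTP or WinSCP's GUI instead of the `scp` command line.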
Summary of requirements if you don't install TF:
I gave a presentation at the Montreal Quantitative Trading meetup yesterday, on the topic of simulating the order book using Deep Learning (in order to improve backtesting accuracy). Here are the slides (in PDF format):
The video and slides for my presentation at the Spark Summit in Boston on February 8th are now available online:
The slides can be obtained here:
I was invited to present about the spark-timeseries library last week at the Montreal Apache Spark meetup.
The slides of the presentation are available here.
Coming up next: I will be giving a talk on the same topic at the upcoming Spark Summit in Boston, on February 8th!
I presented at the Montreal Apache Spark meetup on Wednesday. The talk was about automatic feature generation, and I went a little deeper into Amdahl's Law and parallelism issues (related to a previous post about Amdahl's Law). In particular, we looked at MLlib's LASSO regression and its degree of parallelism.
The slides are available here.
I attended the highly successful Montreal FinTech 2015 conference (over 500 participants) as an exhibitor, in order to promote Nectarine, the automated statistical induction tool I'm working on.
I was surprised by how often the words Apache, Spark, Hadoop, and even Flink came up. There were about 21 exhibitors, of which only a handful were software vendors, and yet my software wasn't the only one built on top of Hadoop and Spark (in fact, I'm not sure any of the software products being presented weren't built on these technologies!).
It's encouraging to see these technologies taking off even outside of the Big Data & Data Science communities.
In the picture, I'm presenting a demo of Nectarine to the TickSmith dev team, who have built very successful Big Data FinTech software on top of Hadoop & Spark.