Python Setup

IPython Notebook

Me?

  • Ben Zaitlen
  • @quasiben
  • Developer and Data Scientist at Continuum Analytics

Anaconda

Completely free Python distribution for large-scale data processing, predictive analytics, and scientific computing:

  • 130+ of the most popular Python packages for science, math, engineering, data analysis
  • Completely free - including for commercial use and even redistribution
  • Cross platform on Linux, Windows, Mac
  • Zipped Windows executable files for those behind firewalls

Miniconda is available for small-footprint installs; it contains conda and Python

Download at http://continuum.io

Important Pkgs:

  • NumPy
  • SciPy
  • Pandas
  • Numba
  • PyODBC
  • SQL
  • Matplotlib
  • Bokeh
  • llvm
  • curl
  • Spyder
  • ipython
  • ipython notebook
  • ...

The official conda repository contains 200+ pkgs

Installing Anaconda

$ bash Anaconda-1.x.x-Linux-x86[_64].sh

On Mac or Windows, click on the downloaded dmg/exe and follow the instructions

Anaconda installs in a single directory and does not overwrite pre-existing Python environments.

With Anaconda installed you are now ready to tackle the problem at hand...

Launch an ipython notebook

$ ipython notebook
# or, on Windows
C:\> ipython notebook

Advanced Features

  • Environment management
  • Pkg management
  • Pkg building

Important distinction:

  • Anaconda is really just a meta-package of pre-defined libraries, modules, and binaries

    • A flavor of Python (like a flavor of Linux)
    • Anaconda 1.9.1 is the current release and includes the latest updates to NumPy, SciPy, etc.
  • Conda is a cross-platform, Python-agnostic binary package manager.

The Problem

  • Packaging is extremely important!
    • Developers know the pain
    • Users can and should be oblivious to the problem
  • Binaries are hard to produce
  • Binaries are hard to produce for all platforms
  • Binaries are hard to produce for all platforms for multiple versions
NumPy Dependency Stack

System Packaging

  • Windows
    • ?
  • OSX
    • macports
    • homebrew
  • Linux
    • yum
    • apt-get/aptitude

Again, Conda is cross-platform.

The Python Problem

  • Python 3.x/Py3k breaks backwards compatibility with Python 2.7
  • PEP 404: "There never will be an official Python 2.8 release. It is an ex-release. Python 2.7 is the end of the Python 2 line of development."
  • How to install multiple Python versions?

CONDA!

Conda Create

  • Conda solves dependency issues for you automatically
  • Creates a new environment with packages or metapackages defined
  • Environments are entirely self-contained Python runtime layouts

Anaconda with Python 3.3

    $ conda create -n py3k python=3.3 anaconda=1.9
    $ source activate py3k
    #on windows
    C:> activate py3k

Minimalist Environment

    $ conda create -n tinyp27 python=2.7
    $ source activate tinyp27
    #on windows
    C:> activate tinyp27
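
Once an environment is activated, a quick check from inside Python can confirm which interpreter is actually running. The sketch below uses only the standard library; the helper name `describe_interpreter` is made up for illustration, and the paths it prints depend on where Anaconda was installed.

```python
import sys


def describe_interpreter():
    """Return (version string, interpreter path, environment prefix)."""
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return version, sys.executable, sys.prefix


version, executable, prefix = describe_interpreter()
print("Python %s" % version)
print("interpreter: %s" % executable)
print("prefix:      %s" % prefix)
```

Inside an activated conda environment, the prefix should point at that environment's directory rather than at the system Python.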

Conda Build/Binstar

  • Nice recipe definitions
  • Designed for cross-platform build
  • Hosted Binaries and CI with Binstar
  • For another time...

IPython Notebook: An Overview

  • What is it?
  • Basic Usage
  • Magics and Bangs
  • Multi-lingual Support
  • Hosted IPython Notebooks
  • IPCluster
  • Diabetes and Data

"The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document"

  • More than an IDE
  • Programmers and people who program
  • Integrated visualization and processing
  • Storytelling with data

Starting IPython/IPython Notebook

Extremely Easy Method:

  • Download Anaconda
  • $ ipython or $ ipython notebook

Can also be installed via pip and building from master

The Cell

In []:

Cell Types

  • Code
  • Markdown
  • Raw
  • Heading
In [1]:
1+1
Out[1]:
2
In [2]:
%pylab inline 
#(import numpy as np and matplotlib)
plt.xkcd()
x = np.arange(0,2*np.pi,.01)
plt.plot(x,np.sin(x))
Populating the interactive namespace from numpy and matplotlib

Out[2]:
[<matplotlib.lines.Line2D at 0x1092dd410>]

Execute a cell with:

  • shift+enter
  • ctrl+enter

Cells Run arbitrary Python Code...

And HTML

In [3]:
from IPython.display import HTML
s = """<marquee>PyData EMC</marquee>"""
h = HTML(s); h
Out[3]:
PyData EMC
In [4]:
%%HTML
<button type="button" id="loading-example-btn" data-loading-text="Loading..." class="btn btn-primary">
  Loading state
</button>
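
Rich output is not limited to the `HTML` helper: the notebook also looks for a `_repr_html_` method on the object a cell returns. A minimal sketch, assuming a made-up `Banner` class that is not part of IPython:

```python
class Banner(object):
    """Hypothetical example class; not part of IPython."""

    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        # The notebook calls this method, when it exists, to obtain an
        # HTML representation of the object for the output area.
        return "<h3 style='color: steelblue'>%s</h3>" % self.text


banner = Banner("PyData EMC")
print(banner._repr_html_())
```

Leaving `banner` as the last expression of a notebook cell should render the heading instead of the default text repr.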

And JS

In [5]:
%%javascript
console.log('hello world');
<IPython.core.display.Javascript at 0x105f51810>
  • bootstrap
  • jquery
  • codemirror
  • and a few other goodies

Bokeh Plotting Library

  • New plotting library for interactive visualization
  • Plots embed Javascript (BokehJS) and some CSS
  • Callable through Python!
In [6]:
import bokeh.plotting as bplt
bplt.output_notebook()
bplt.figure()
bplt.line(x, sin(x), color="red")
bplt.show()
Configuring embedded BokehJS mode.

[Bokeh plot: sin(x) drawn as a red line]

Or A full page on an external site!

In [7]:
from IPython.display import HTML
HTML('<iframe src=http://fiddle.jshell.net/bokeh/K8P4P/show/ width=600 height=700></iframe>')
Out[7]:

Inline Audio and Video

In [8]:
import numpy as np
from IPython.display import Audio
framerate = 44100
t = np.linspace(0,5,framerate*5)
data = np.sin(2*np.pi*220*t**2)
Audio(data,rate=framerate)
Out[8]:
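
For a sense of what has to happen before samples like these become playable, here is a standard-library sketch that builds the same chirp and encodes it as 16-bit mono WAV bytes. It is one possible encoding, not a description of what `Audio` does internally; the function name `chirp_wav_bytes` and its parameters are assumptions for illustration.

```python
import io
import math
import struct
import wave


def chirp_wav_bytes(framerate=44100, seconds=5, base_hz=220):
    """Encode sin(2*pi*base_hz*t**2) as 16-bit mono WAV and return the bytes."""
    n_samples = framerate * seconds
    frames = bytearray()
    for i in range(n_samples):
        # Same time grid as np.linspace(0, seconds, n_samples).
        t = seconds * i / float(n_samples - 1)
        sample = math.sin(2 * math.pi * base_hz * t ** 2)
        # Scale the [-1, 1] float to a signed 16-bit integer.
        frames += struct.pack("<h", int(sample * 32767))

    buffer = io.BytesIO()
    writer = wave.open(buffer, "wb")
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(framerate)
    writer.writeframes(bytes(frames))
    writer.close()
    return buffer.getvalue()


# One second keeps the demo quick; the cell above uses five.
wav_bytes = chirp_wav_bytes(seconds=1)
print(len(wav_bytes))
```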
In [9]:
from IPython.display import YouTubeVideo
YouTubeVideo('xe_ATRmw0KM')
Out[9]:

Latex for Scientific expression

%%latex
\begin{align}
\frac{\partial u}{\partial t} + \nabla \cdot \left( \boldsymbol{v} u - D\nabla u \right) = f
\end{align}

  • from IPython.display import Latex
  • from IPython.display import Math

Line and Cell Magics

  • Only found in IPython
  • Arbitrary manipulation of input
  • Line Magic % extends to the end of the line (ipython notebook/prompt)
  • Cell Magic %% extends to the end of the cell (ipython notebook)

Favorites:

- %lsmagic
- %timeit/%%timeit
- %%file
- %%bash
- %%cython (loaded with %load_ext cythonmagic)
In [10]:
%timeit np.linalg.eigvals(np.random.rand(100,100))
100 loops, best of 3: 11.2 ms per loop

In [11]:
%%timeit
a = np.random.rand(100, 100)
np.linalg.eigvals(a)
100 loops, best of 3: 11 ms per loop
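
Outside IPython, a similar "best of 3" measurement can be approximated with the standard-library `timeit` module. This is a rough sketch: `%timeit` chooses the loop count automatically, while the hypothetical helper `best_time` below fixes it at 100, and the sorted list is just a stand-in workload that needs no third-party packages.

```python
import timeit


def best_time(stmt, setup="pass", number=100, repeat=3):
    """Run `stmt` `number` times per trial; return the best per-loop time in seconds."""
    trials = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(trials) / number


seconds = best_time("sorted(data)", setup="data = list(range(1000, 0, -1))")
print("%d loops, best of %d: %.3g ms per loop" % (100, 3, seconds * 1e3))
```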

In [12]:
%%bash
echo "HELLO WORLD!"
HELLO WORLD!

In [13]:
%%file zenofpython.py
'''this is a brand new file'''
import this 
Overwriting zenofpython.py

In [14]:
import zenofpython
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

In [15]:
%load_ext cythonmagic
In [16]:
%%cython --annotate
def slow_f(n):
    x = 100.
    for i in range(n):
        x+=n
    return x

def fast_f(int n):
    cdef double x=100.
    cdef int i
    for i in range(n):
        x+=n
    return x
Out[16]:

Generated by Cython 0.19.1 on Fri Feb 7 22:22:24 2014

 1: def slow_f(n):
 2:     x = 100.
 3:     for i in range(n):
 4:         x+=n
 5:     return x
 6: 
 7: def fast_f(int n):
 8:     cdef double x=100.
 9:     cdef int i
 10:     for i in range(n):
 11:         x+=n
 12:     return x
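
Both functions add n to x exactly n times, so each should return 100 + n*n. A plain-Python sketch makes that easy to verify and gives a baseline to time the compiled versions against; `closed_form` is a name introduced here for illustration.

```python
def slow_f(n):
    # Same body as the untyped Cython version above.
    x = 100.
    for i in range(n):
        x += n
    return x


def closed_form(n):
    # Adding n to x exactly n times is the same as adding n*n once.
    return 100. + n * n


for n in (0, 1, 10, 1000):
    assert slow_f(n) == closed_form(n)

print(slow_f(10))
```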

Bang Bang !!

I've heard Bang or Shriek --> !

  • ! runs arbitrary unix commands
    • On Windows this should work if Cygwin is installed
In [17]:
!ls
CGM_DEMO.ipynb              LICENSE                     myfile.txt
IPythonOverview.ipynb       README.md                   zenofpython.py
IPythonOverview.slides.html ca_website.png              zenofpython.pyc

In [18]:
!cat CGM_DEMO.ipynb | sort | uniq -c | sort -r | head 
  41     {
  41      "metadata": {},
  40     },
  21      ]
  21      "source": [
  21      "cell_type": "markdown",
  20      ],
  20      "outputs": []
  20      "language": "python",
  20      "input": [
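
The same line-frequency count can be done without leaving Python. The sketch below is an assumption-laden stand-in for the `sort | uniq -c | sort -r | head` pipeline: `top_lines` is a hypothetical helper, the demo runs on a small temporary file rather than the notebook itself, and lines with equal counts may come out in a different order than the shell produces.

```python
import os
import tempfile
from collections import Counter


def top_lines(path, limit=10):
    """Return the `limit` most frequent lines of a file as (line, count) pairs."""
    with open(path) as handle:
        counts = Counter(line.rstrip("\n") for line in handle)
    return counts.most_common(limit)


# Build a tiny sample file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("{\n}\n{\n},\n{\n")
sample_path = tmp.name

result = top_lines(sample_path, limit=2)
os.remove(sample_path)

for line, count in result:
    print("%4d %s" % (count, line))
```

Calling `top_lines("CGM_DEMO.ipynb")` from the notebook's directory would give the Python counterpart of the listing above.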

Under the covers is where the magic is

In [19]:
%%ruby
puts "Hello from Ruby #{RUBY_VERSION}"
Hello from Ruby 1.8.7

In [20]:
%load_ext rmagic
##requires rpy2 and rtools and R 
X = np.array([0,1,2,3,4])
Y = np.array([3,5,4,6,7])
In [22]:
%%R -i X,Y -o XYcoef
XYlm = lm(Y~X);
XYcoef = coef(XYlm);
#print(summary(XYlm))
par(mfrow=c(2,2))
plot(XYlm)
In [23]:
HTML('<iframe src=http://nbviewer.ipython.org/github/creswick/ihaskell-notebook/blob/master/examples/iHaskell%20Examples.ipynb  width=1024 height = 500></iframe>')
Out[23]:

Hosted Notebooks

  • Great for collaboration
  • Move code to data
  • Options:
    • Any cloud provider's linux vm + Anaconda -> your own notebook
    • Hassle-free: Wakari.io

IPCluster

  • Built-in parallel and distributed computing framework
  • Enables all types of parallel applications to be developed
    • Task Queues
    • Data Parallelism
    • MPI Message Passing
    • ...
  • Not MapReduce (Data should remain static)

  • Start with: $ ipcluster start -n 4

In [25]:
from IPython.parallel import Client
c = Client()
c.ids
Out[25]:
[0, 1, 2, 3]
In [26]:
c[:].apply_sync(lambda : "Hello, World")
Out[26]:
['Hello, World', 'Hello, World', 'Hello, World', 'Hello, World']
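
The "run this function on every engine" pattern can be tried without a running cluster. The sketch below swaps in the standard-library `concurrent.futures` thread pool, so it is not `IPython.parallel`: the workers are threads in one process, whereas engines are separate processes that may live on other machines. `apply_on_all` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_on_all(func, n_workers=4):
    """Call `func` once per worker and collect the results in submission order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(func) for _ in range(n_workers)]
        return [future.result() for future in futures]


results = apply_on_all(lambda: "Hello, World")
print(results)
```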

Great For Academics and Enterprise

Credits

  • IPython Team:

    • Fernando Perez, Brian Granger and Min Ragan-Kelley, and many many others...
    • IPython
    • IPython Wiki
  • Damián Avila authored the Reveal.js nbconvert utility

    • $ ipython nbconvert IPythonOverview.ipynb --to slides
    • $ ipython nbconvert IPythonOverview.ipynb --to slides --post serve

Continuum Analytics:

Rethinking how data is stored, computed, and visualized.

continuum.io

In [27]:
from IPython.display import Image
Image(filename='ca_website.png',width=800,height=700)
Out[27]:
In []: