Set up a python environment

※ This post is mainly just a summary/translation of the Japanese blog,

TL;DR: I recommend installing “anaconda” instead of using the official python package.

If you just want to proceed with the environment setup, jump to “Environment setup for each OS”. First, I will explain a little background on python and anaconda.

Python version

Both python 2 and python 3 are distributed; the latest versions at the time of writing are python 2.7 and python 3.6.

Several years ago, it was often said that “some libraries are still not compatible with python 3.x, so python 2.x is recommended”. However, most popular libraries now work well with python 3.x.

Here, I recommend installing python 3.x as the default environment, and switching to python 2.7 when necessary using conda’s virtual environment functionality.

Problems with python development setup

When you use only the system python, you will face the following problems. These problems can be solved with anaconda!

  • Version control: change the python version depending on the project.
    – Depending on the library, you may need to switch between python 2 and python 3 environments.
    – When you want to run someone else’s code, it is sometimes written in python 2 and sometimes in python 3.
    → Use the conda create command to create another environment with a specific python version.
  • Development environment management
    – You might want to use a development branch or a specific version of a library only for a specific project. To control library versions, you need to prepare multiple development environments.
    → Use the conda create command to create another environment.
  • pip install fails for some libraries because compilation depends on the OS
    – This happens especially for Windows users. Some libraries are only distributed for Linux users, and compilation fails when installing with the pip command.
    → Try ‘conda install library-name’ to install the library.
  • python 2.x is pre-installed on the system on Linux/Mac
    – How can you use python 3.x without conflicting with the system python 2.x?
    → Use pyenv to avoid conflicts with the system python.
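
As a small illustration of the python 2 / python 3 problem above, a script can check the interpreter version at startup and fail early with a clear message. This is only a sketch; the function name `require_major` is made up for this example and is not part of python or conda.

```python
import sys


def require_major(required, version_info=None):
    """Raise RuntimeError unless the interpreter's major version equals `required`.

    `version_info` defaults to the running interpreter; passing a tuple such as
    (2, 7, 13) makes the check easy to try out without switching environments.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major != required:
        raise RuntimeError(
            "this script needs python {}.x, but is running on python {}.{}".format(
                required, major, minor
            )
        )
    return "python {}.{}".format(major, minor)


if __name__ == "__main__":
    # Checks the interpreter that is actually running this file.
    print(require_major(sys.version_info[0]))
```

If the check fails, that is the moment to create (or activate) a conda environment with the other python version.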

What is Anaconda?

Anaconda is a python distribution that includes popular libraries by default (numpy, scipy, pandas, ipython, jupyter, scikit-learn, etc.).

  • There is also miniconda, which includes only a minimal set of packages.

Python version: both python 2.x and python 3.x versions are distributed.

OS support: Linux, Mac, and Windows versions are available, in both 32-bit and 64-bit.

License: 

Anaconda is BSD licensed which gives you permission to use Anaconda commercially and for redistribution.
from https://www.continuum.io/downloads#_windows

What is conda?

When you install the anaconda package, you can use conda.

Package management: conda is a package management tool, which can be considered an alternative to pip.

  • It supports over 400 packages.
  • When no pre-built package is available, pip tries to compile the package in the client environment, and compilation sometimes fails depending on your environment (OS, libraries, etc.).
  • conda provides pre-compiled packages, which reduces install failures.
  • It does not interfere with the pip command; you can still use pip if the package is not included in conda.
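
The "conda first, pip when the package is not in conda" rule can be sketched as a small helper. This is an illustration under assumptions: the function name `install` and the `run` parameter are made up here, and the sketch assumes that both `conda` and `pip` are on PATH and that a non-zero exit code means the install failed.

```python
import subprocess


def install(package, run=subprocess.call):
    """Try `conda install` first and fall back to `pip install`.

    `run` takes a command list and returns its exit code; it defaults to
    subprocess.call and can be replaced with a fake for a dry run.
    Returns the name of the tool that succeeded, or None if both failed.
    """
    if run(["conda", "install", "--yes", package]) == 0:
        return "conda"
    if run(["pip", "install", package]) == 0:
        return "pip"
    return None


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def show(cmd):
        print(" ".join(cmd))
        return 0

    install("numpy", run=show)
```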

Version control: conda supports python version control, as an alternative to pyenv.

For example, you can create a python 2.7 virtual environment named ‘py27’ with

conda create -n py27 python=2.7

To enter this virtual environment,

source activate py27

Virtual environment management: conda also works as an alternative to virtualenv/venv.
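
To see from inside python which conda environment is active, one option is to read the `CONDA_DEFAULT_ENV` environment variable, which conda's activate script sets. The sketch below assumes that variable is present in your conda version; the helper name `active_conda_env` is made up for this example.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if there is none.

    Assumption: conda's activate script exports CONDA_DEFAULT_ENV.
    `environ` defaults to os.environ; a plain dict can be passed instead.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print(active_conda_env())
```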

Environment setup for each OS

Windows

Python is not pre-installed on Windows, so you can just install anaconda.

Setup

1. Install anaconda

You can download the installer from the official anaconda download site. Follow the instructions of the exe installer; it also adds anaconda to the system PATH for convenience.

I recommend installing the latest version (anaconda 4.4.0, python 3.6 at the time of writing, 2017/6/28). You can easily create a virtual environment with another python version (e.g. python 2.7) after the installation.

2. Check installation (you may skip this)

Launch the command line (press the Windows key, type ‘cmd’ and press Enter).

C:\Users\corochann>python --version
Python 3.5.2 :: Anaconda custom (64-bit)

C:\Users\corochann>pip --version
pip 9.0.1 from c:\program files\anaconda3\lib\site-packages (python 3.5)

C:\Users\corochann>conda --version
conda 4.3.21

Linux

Python is pre-installed on the system, and it is usually python 2.7. You need to configure your environment to use the installed anaconda.

However, if you install only anaconda, it also installs curl, sqlite, and openssl and overrides other commands, which might cause conflicts with the existing environment.

The recommended way is to install anaconda on top of pyenv.

Figure: python environment architecture. After the environment setup, user1 can use anaconda3 (python 3) or the virtual env py27 (python 2.7), both of which are independent of the pre-installed system python (/usr/bin/python).

See this figure, assuming you are user1.

  1. By default, you can use anaconda3, which is python 3.x.
  2. If you create a virtual env (e.g. ‘py27’), you can use python 2.7 as well.
  3. The configuration is per user and does not affect other users.
    If other users (user2, user3) have not set it up, they will use the system python.

This configuration has another advantage: because it does not affect other users, it is well suited to building a work environment on a shared server.

Setup

1. Install pyenv

Execute the commands below in a terminal:

$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
$ echo 'eval "$(pyenv init -)"' >> ~/.bashrc
$ source ~/.bashrc

The first line clones the package.
The 2nd to 4th lines add the necessary environment setup commands to .bashrc.
The last line reloads the modified .bashrc in the current shell.
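
The PATH lines work because a shell picks the first matching executable in PATH order, so directories prepended to PATH win over the system ones. The sketch below demonstrates that rule with `shutil.which` and two temporary directories standing in for the system and pyenv locations. It assumes a POSIX system, and the helper names and directories are made up for the demonstration.

```python
import os
import shutil
import stat
import tempfile


def make_fake_command(directory, name):
    """Create a tiny executable file called `name` inside `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    # Add the owner-executable bit so the file counts as a command.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def first_match(command, directories):
    """Return the path that a PATH made of `directories` resolves `command` to."""
    return shutil.which(command, path=os.pathsep.join(directories))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as system_bin, \
            tempfile.TemporaryDirectory() as pyenv_bin:
        make_fake_command(system_bin, "python")
        make_fake_command(pyenv_bin, "python")
        # The pyenv directory comes first, so its python is found first.
        print(first_match("python", [pyenv_bin, system_bin]))
        # With the order reversed, the "system" python is found instead.
        print(first_match("python", [system_bin, pyenv_bin]))
```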

2. Install anaconda: you can install either the python 3.x package or the python 2.x package.

I think it is fine to install the python 3.x version as the default. You don’t need to install both, because python 2 and 3 can be switched with conda.

# Check latest version, anaconda3-4.2.0 (anaconda2-4.2.0 for python 2.7)
$ pyenv install -l | grep ana

# Install anaconda, and configure to set anaconda as default python
$ pyenv install anaconda3-4.2.0
$ pyenv rehash
$ pyenv global anaconda3-4.2.0

# set PATH to avoid `activate` command conflict between pyenv and anaconda (use anaconda's activate)
$ echo 'export PATH="$PYENV_ROOT/versions/anaconda3-4.2.0/bin/:$PATH"' >> ~/.bashrc
$ source ~/.bashrc

# update conda itself
$ conda update conda

[Note] If you prefer, you may install miniconda instead of anaconda with a similar procedure.

3. Check installation (you may skip this)

$ python --version
Python 3.5.2 :: Anaconda custom (64-bit)
$ pip --version
pip 9.0.1 from /home/corochann/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages (python 3.5)
$ conda --version
conda 4.3.22


If python and pip point to anaconda’s commands under the user’s directory, the installation is OK. If they point to the system python (python 2.7), the installation was not successful.
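
The same check can be done from inside python by looking at where the running interpreter lives. The sketch below treats "located under the user's home directory" as the sign of a successful setup, following the rule above; the helper name `is_under` is made up for this example.

```python
import os
import sys


def is_under(path, directory):
    """Return True if `path` is located inside `directory`."""
    path = os.path.abspath(path)
    directory = os.path.abspath(directory)
    return os.path.commonpath([path, directory]) == directory


if __name__ == "__main__":
    home = os.path.expanduser("~")
    print("interpreter:", sys.executable)
    print("under home directory:", is_under(sys.executable, home))
```

With the pyenv setup described here, the interpreter path is expected to be under `~/.pyenv/versions/`, while a system python such as /usr/bin/python is not under the home directory.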

Mac

I don’t have a Mac, sorry. But the basic procedure is the same as for Linux, except that pyenv is installed via homebrew.

conda basic usage

virtual environment

  • Create virtual env

conda create -n <environment-name> python=<version> <install libraries with space separated>

$ conda create -n py27 python=2.7 numpy scipy pandas jupyter

# specifying "anaconda" installs the popular modules bundled in the anaconda package
$ conda create -n anaconda2 python=2.7 anaconda
  • Check virtual env
conda env list

# or
conda info -e
  • Switch virtual env
# Enter virtual env
# `activate py27` for Windows
source activate py27

# Exit virtual env
# `deactivate` for windows
source deactivate
  • Delete virtual env
conda remove -n py27 --all
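
If you want to work with the environment list from a script, `conda env list` also accepts a `--json` flag. The sketch below parses that JSON and returns the last path component of each environment. The sample paths are made up, and the exact content of the "envs" list may differ between conda versions, so treat this as an illustration only.

```python
import json
import os


def env_names(json_text):
    """Extract environment names from the output of `conda env list --json`.

    Assumption: the JSON has an "envs" key holding a list of environment paths.
    """
    data = json.loads(json_text)
    return [os.path.basename(path.rstrip("/\\")) for path in data.get("envs", [])]


# Hypothetical sample output, not captured from a real machine.
SAMPLE_ENV_JSON = """
{
  "envs": [
    "/home/user1/.pyenv/versions/anaconda3-4.2.0",
    "/home/user1/.pyenv/versions/anaconda3-4.2.0/envs/py27"
  ]
}
"""

if __name__ == "__main__":
    print(env_names(SAMPLE_ENV_JSON))
```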

package management

  • Install/uninstall package
# install
conda install numpy scipy  # specify multiple libraries, like pip
conda install numpy=1.10.4 # specify version
conda install -n py27 numpy scipy # -n option to specify virtual env

conda update numpy # update

pip install numpy  # pip can be used as well; use it when the library is not in conda
source activate py27; pip install numpy # to install a library in a virtual env with pip, run pip after `activate`

# uninstall
conda uninstall -n py27 numpy
  • Check package
# Show current installed package list
conda list

# -n option to specify virtual env
conda list -n py27

# Export the list and use it in another environment
# However, packages installed with `pip` cannot be exported.
# Use `pip freeze` to output the list of packages installed with pip.
conda list --export > env.txt
conda create -n py27_copy --file env.txt
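
The exported file is plain text with one `name=version=build` entry per line and `#` comment lines at the top, so it is easy to inspect from python. The sketch below reads such text into a dictionary. The sample content is illustrative (the build strings are made up), and the function name `parse_conda_export` is hypothetical.

```python
def parse_conda_export(text):
    """Parse `conda list --export` output into a {package name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the comment header.
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        name = parts[0]
        version = parts[1] if len(parts) > 1 else None
        packages[name] = version
    return packages


# Illustrative sample; the build strings are made up.
SAMPLE_EXPORT = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
numpy=1.11.1=py27_0
scipy=0.18.1=np111py27_0
"""

if __name__ == "__main__":
    for name, version in sorted(parse_conda_export(SAMPLE_EXPORT).items()):
        print(name, version)
```
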
  • Search package in anaconda cloud

Even if a package is not distributed by official anaconda, a third party may have uploaded it to anaconda cloud (anaconda.org).

It is useful to check whether the package is distributed on anaconda cloud, and install it from there.

To search,

anaconda search -t conda <package-name-to-search>

And once found, to install the third-party library,

conda install -c <USER> <PACKAGE>

Here <USER> means the third party’s name, and <PACKAGE> means the name of the package to install.

The example below shows how to install the rdkit package, which is not distributed by anaconda but by rdkit.

anaconda search -t conda rdkit
Using Anaconda API: https://api.anaconda.org
Run 'anaconda show <USER/PACKAGE>' to get more details:
Packages:
     Name                      |  Version | Package Types   | Platforms
     ------------------------- |   ------ | --------------- | ---------------
     Clyde_Fare/rdkit          | 2015.09.2 | conda           | win-64
     Guillopflaume/rdkit       | 2014.09.1 | conda           | linux-64
     Guillopflaume/rdkit-postgresql | 2014.09.1 | conda           | linux-64
     RMG/rdkit                 | 2016.03.4 | conda           | linux-64, win-32, win-64, linux-32, osx-64
                                          : Open-Source Cheminformatics
     aschreyer/rdkit           | 2015.03.1 | conda           | osx-64
                                          : RDKit is an open source toolkit for cheminformatics.
     bioconda/rdkit            | 2016.03.3 | conda           | linux-64
                                          : Open-Source Cheminformatics Software
     connie/rdkit              | 2015.09.2 | conda           | linux-32
     eleonora1990/rdkit        | 2014.09.2 | conda           | linux-64
     eleonora1990/rdkit-postgresql | 2014.09.2 | conda           | linux-64
     greglandrum/rdkit         | 2017.03.1 | conda           | linux-64, win-32, win-64, osx-64
     greglandrum/rdkit-postgresql95 | 2016.03.4 | conda           | osx-64
     grizzly41/rdkit           | 2016.09.1.dev20160806 | conda           | osx-64
     jeprescottroy/rdkit       | 2016.03.3 | conda           | osx-64
     jochym/rdkit              | 2015_03_1 | conda           | linux-64
     karlleswing/rdkit         | 2016.09.2 | conda           | linux-64
     mforsythe/rdkit           | 2014.03.1 | conda           | osx-64
     mgbarnes/rdkit            | 2016.03.1 | conda           | osx-64
     mobleylab/rdkit           | 2016.03.1 | conda           | linux-64, osx-64
     mpharrigan/rdkit          |          | conda           | linux-64
     mwojcikowski/rdkit        | 2016.03.1 | conda           | linux-64
     nickvandewiele/rdkit      | 2015.09.2 | conda           | linux-64, win-32, osx-64, linux-32, win-64
                                          : Open-Source Cheminformatics
     nividic/rdkit             | 2016.03.1 | conda           | linux-64, osx-64
                                          : Cheminformatics Molecule Framework
     olexandr/rdkit            | 2016.03.1 | conda           | linux-64
     omnia/rdkit               | 2015.09.1 | conda           | linux-64, osx-64
                                          : Open-Source Cheminformatics Software
     rdkit/rdkit               | 2017.03.2 | conda           | linux-64, win-32, win-64, linux-32, osx-64
     rdkit/rdkit-postgresql    | 2016.03.4 | conda           | linux-64
                                          : RDKit cartridge for PostgreSQL
     rdkit/rdkit-postgresql95  | 2016.09.4 | conda           | linux-64
                                          : RDKit cartridge for PostgreSQL v9.5
     richlewis/rdkit           | 2016.03.1 | conda           | linux-64, win-64, linux-32, osx-64
     rmcgibbo/rdkit            | 2014.03.1 | conda           | linux-64
                                          : Open source toolkit for cheminformatics
     rmcgibbo/rdkit-utils      |      0.1 | conda           | linux-64
                                          : Utilities for working with the RDKit
     skearnes/rdkit            | 2014.03.1 | conda           | linux-64
                                          : Vanilla RDKit build without Avalon or InChI support.
     thibaudfreyd/rdkit        | 2016.03.1 | conda           | osx-64
     twz915/rdkit              | 2016.03.1 | conda           | osx-64
     zero323/rdkit             | 2015.09.2 | conda           | linux-64
Found 34 packages

rdkit is distributed by many third parties. Here, let’s install rdkit/rdkit.

conda install -c rdkit rdkit
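
With a long search result like the one above, it can help to filter the rows by platform before choosing a `<USER>`. The sketch below parses the table text printed by `anaconda search` and lists the owners that offer a package for a given platform. It assumes the column layout shown above (name | version | package types | platforms); the function names are made up, and the sample rows are copied from the output above.

```python
def parse_search_rows(output):
    """Parse `anaconda search` table text into a list of row dictionaries."""
    rows = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have 4 cells and a USER/PACKAGE name; header, separator
        # and description lines do not, so they are skipped.
        if len(cells) != 4 or "/" not in cells[0]:
            continue
        owner, package = cells[0].split("/", 1)
        platforms = [p.strip() for p in cells[3].split(",") if p.strip()]
        rows.append({
            "owner": owner,
            "package": package,
            "version": cells[1],
            "platforms": platforms,
        })
    return rows


def owners_for(rows, package, platform):
    """Return the owners that distribute `package` for `platform`."""
    return [r["owner"] for r in rows
            if r["package"] == package and platform in r["platforms"]]


SAMPLE_SEARCH = """\
     Name                      |  Version | Package Types   | Platforms
     ------------------------- |   ------ | --------------- | ---------------
     Clyde_Fare/rdkit          | 2015.09.2 | conda           | win-64
     bioconda/rdkit            | 2016.03.3 | conda           | linux-64
                                           : Open-Source Cheminformatics Software
     rdkit/rdkit               | 2017.03.2 | conda           | linux-64, win-32, win-64, linux-32, osx-64
"""

if __name__ == "__main__":
    rows = parse_search_rows(SAMPLE_SEARCH)
    print(owners_for(rows, "rdkit", "win-64"))
```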

Appendix: install R with conda

R is also a popular language in the data science community. Not only python but also R can be installed with conda.

conda create -n r -c r r-irkernel

Next: Install Chainer
