Overview of the Julia-Python-R Universe

A side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python, R, sometimes abbreviated as Jupyter.

Motivation

A large component of Quantitative Risk Management relies on data processing and quantitative tools (aka Data Science). In recent years open source software targeting Data Science has found increasing adoption in diverse applications. The Overview of the Julia-Python-R Universe project is a side-by-side comparison of a wide range of aspects of the Python, Julia and R language ecosystems.

The comparison of the three ecosystems aims:

  • To be useful for people who are somewhat familiar with programming and want to inspect the options and use the most appropriate tool
  • To promote interoperability, cross-validation and overall best practices
  • To be as factual as possible without drifting into judgement / opinion
  • To cover use cases relevant for the implementation of quantitative risk models

The comparison does not aim:

  • To be a detailed / comprehensive catalog of all available libraries (which number in the many thousands!)
  • To cover use cases very removed from quantitative risk models
  • To be totally exhaustive (e.g., to identify all the possible computer systems one can run a Python interpreter on, or count all the possible ways one can perform linear regression in R)

Disclaimers

The comparison absolutely does not provide an assessment of which system is "better". The proper way to use the comparison is to start with one's objectives, knowledge level and use case.

The comparison attempted here is not entirely appropriate, as the three systems have quite different origins and architectural design choices. For example, strictly speaking R is not a general programming language. R is a system for statistical computation and graphics. It consists of a sufficiently general language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. Yet despite the disclaimer a comparison is justified, because in a very large domain of applications and use cases the three frameworks can be used interchangeably (or nearly so).

Structure

The comparison data are provided in tabular format in several distinct tables. Each table documents a relevant language or ecosystem subdomain. The number and focus areas of the different tables are somewhat arbitrary and may expand in the future. The order runs roughly from more generic aspects towards more specialized / advanced areas, concluding with interoperability.

Each table entry (row) highlights key functionality within the subdomain. The language columns point to information or packages and (where applicable) there is commentary. Reference links are included when useful.

At the bottom of some tables there is a row labelled Package Review. This row has a collection of links to the CRAN Task Views that aim to summarize the large number of R packages available for some data science tasks. There are also links to a mirror effort to create Python Task Views (this content is still WIP; contributors welcome, see below).

Getting Involved

You can provide simple and anonymous feedback on the wiki version of the overview using the feedback button at the bottom of the page. Alternatively you can become an Open Risk Manual author and actively edit the page. If you are more comfortable using GitHub / Markdown, there is a mirror page available here. Please note that the tables are in HTML format as they are generated automatically.

People interested in developing the Python Task Views can do so via the GitLab repo.

History and Community

The objective of this section is to provide an overall comparison of the history of the three ecosystems, towards answering the question: who is really behind Python, R and Julia?

Aspect

Python

R

Julia

Comment

First Release

1991

1995

2012 (development began in 2009)

Both the Python and R ecosystems have a long history of development and both have received a lot of attention in the last few years as open source data science became more widespread. Julia is relatively more recent.

Initial Authors

Guido van Rossum

Ross Ihaka and Robert Gentleman

Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman

Current Stable Version

3.7

3.5

1.2

Check here for Python, Check here for R, Check here for Julia

Current Governance

Python Software Foundation (Non Profit)

R Foundation (Non Profit)

Julia Governance Overview

Open Source License

PSF License

GNU General Public License

MIT License

Size of Core Contributors

2-90 depending on definition

20

Python Core Team size is difficult to establish (e.g. full-time / part-time, activity level) and there is no single authoritative source. Similarly for Julia.

Size of Broader Developer Communities

Third most popular in number of repositories and number of contributors

Not in Top 10 of community size

Not in Top 10 of community size

Note: R programmers might not necessarily self-identify as developers (but as data scientists, statisticians etc.)

Developer Associations

UK Python Association, PyLadies

R-Ladies

Formally organized associations promoting Python, R or Julia

Important Non-Profit Sponsors

NumFOCUS

Bioconductor

NumFOCUS

A number of non-profit organizations support these open source ecosystems explicitly or implicitly

Important Corporate Sponsors

Diverse

Diverse

Julia Computing, Inc.

Commercial sponsors may be supporting these ecosystems explicitly or implicitly

Important Conferences

PyCon, EuroPython

useR!

JuliaCon

Important Journals

The R Journal

Journal of Open Source Software, Papers with Code covering all three systems

IRC Channels

#python

#julia

Reddit

Python subreddit, 428k members

R Stats subreddit, 30k members

Julia subreddit, 8k members

Data Science subreddit (discussing Python, R and Julia topics)

Online Forums and Blogs

Too many

Too many

The Python and R ecosystems have an extensive number of blogs, forums etc. (of varying quality)

Devices and Operating Systems

This section aims to answer the question: Where (as in what kind of device and operating system) can I use Python, R or Julia? NB: This is not a how-to for installing Python, R or Julia on your system, just an overview of what is available where.

Aspect

Python

R

Julia

Comment

Linux Desktop

Comes pre-installed

apt-get install r-base

apt-get install julia / Linux installer file

Python is generally pre-installed as it is used by the Linux system itself. Different distributions may include different (potentially very old) versions of the three languages.

Windows

Windows installer

Windows installer

Windows installer

All three languages are available for both Windows 7 and Windows 10 and 32 bit / 64 bit.

MacOS

2.7 version is pre-installed

MacOS installer

MacOS installer file

Raspbian

Pre-installed

apt-get install r-base

apt-get install julia

Linux is the operating system of choice for IoT devices, which means a basic Python installation is generally available

Android

Via python-for-android

No

No

Python, R or Julia are not readily integrated on mobile devices (see also Deployment entry). Check Termux for an alternative option

iOS

No

No

No

Cloud Servers

As per Linux Desktop above

As per Linux Desktop above

As per Linux Desktop above

Cloud servers typically run the Linux operating system and have Python installations available

Package Management

This section aims to answer the question: How can I extend the Python, R or Julia functionality with existing libraries? The ease of finding and installing packages is a very important aspect of the popularity of these languages, in marked contrast to e.g. languages like C++.

Aspect

Python

R

Julia

Comment

Discovery of Packages

Online Search, Built-in PyCharm access to PyPI

RStudio built-in access to CRAN

Julia Docs, Julia Observer

Python packages are released on PyPI, R packages are released on CRAN

Number of Packages (Oct 2019)

199,816

15,102

~2496

Check here for the latest count: Python, R, Julia

Online Repositories

PyPI, via linux distributions

CRAN

GitHub, GitLab, Bitbucket etc. are used for releasing open source Python, R and Julia packages online, for coordinating development and for other community support

Package Installation

Done at OS level (PyPI, setup, conda, pip, easy_install, apt)

Built-in install.packages

Built-in Pkg package manager

Python installation methods are quite varied (and have evolved over time) and can be either system wide (e.g. a linux distro package) or user specific

Dependency Management

pip, virtualenv

packrat

Federated package management

virtualenv enables using isolated Python distributions and package collections within the same system. Julia uses project environments

Loading Packages

import statement

library statement

import / using statements
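As a small illustration of the Python column above (loading packages with the import statement), the following sketch checks whether a package is installed before importing it. The helper name `load_if_available` and the non-existent package name are hypothetical and only serve the example. Only standard library modules are used.

```python
import importlib
import importlib.util


def load_if_available(name):
    """Return the imported top-level module, or None if it is not installed.

    `load_if_available` is an illustrative helper, not part of any library.
    """
    # find_spec returns None when a top-level module cannot be located
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


# json ships with Python, so it is always found
json_mod = load_if_available("json")

# A made-up name that is assumed not to be installed anywhere
missing = load_if_available("surely_not_installed_pkg_xyz")

print(json_mod.__name__)
print(missing)
```

The same pattern is sometimes used to make an optional dependency (for example pandas) degrade gracefully when it is absent.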

Package Documentation

This section aims to answer the question: How can I document a Python, R or Julia module? The ease and quality of documentation is an important factor in adoption and efficient use of a language as it both helps beginners learn new functionality and experienced users ensure better quality work

Aspect

Python

R

Julia

Comment

Source level documentation

Built-in docstrings

Docstrings

docstrings

Formats

markdown, reStructuredText

markdown, latex

Markdown

R packages on CRAN include Reference Manuals (PDF, typically generated from LaTeX)

Documentation Generator

sphinx

roxygen2

Documenter

Online documentation

readthedocs

CRAN, bookdown

Julia Docs
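To make the "Built-in docstrings" entry for Python concrete, here is a minimal sketch of a documented function whose docstring is read back at run time. The function `weighted_mean` is a hypothetical example, not taken from any package mentioned above.

```python
import inspect


def weighted_mean(values, weights):
    """Return the weighted mean of `values`.

    Both arguments are sequences of numbers of equal length.
    """
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# The docstring is stored on the function object and can be read at run time;
# inspect.getdoc also removes the common indentation
doc = inspect.getdoc(weighted_mean)
print(doc.splitlines()[0])
```

Documentation generators such as sphinx read these same docstrings to build reference pages.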

Language Characteristics

This section aims to answer the question: What does code in Python, R or Julia look like from a programming perspective? Many standard aspects of programming languages are available in all three systems so are not included.

Aspect

Python

R

Julia

Comment

Compiled / Interpreted

Interpreted

Interpreted

Compiled Just-in-time (JIT)

Julia code can be executed interactively

Main Implementation Language

C (CPython)

C and Fortran

Julia

This is the language used for the interpretation of a Python or R script. Julia is written in Julia

Other Implementation Languages

Java (Jython), RustPython etc

pqR, Renjin, FastR etc

Many alternative implementations of the underlying interpreter exist for both Python and R. A new approach available for Python and Julia is to compile to WebAssembly for native execution in the browser: Python/Pyodide, Julia/Charlotte

Type System

Dynamic (Duck) Typing

Dynamic

Dynamic (Duck) Typing

All three systems have essentially dynamic type systems (in contrast with languages such as C++, Java or Rust)

Primitive Data Types

Numbers (Integers, Float), Strings, Boolean

Numeric, Int, Character, Logical (and the pairlist)

Numbers, Char, Bool

Double precision is standard in all systems. Higher precision is only via libraries. Julia has a native 128 bit integer type.

Native Data Structures

List, Tuple, Dict

List, Vector, Data Frame, Factor

Tuple, Dict, Set, Array, Vector, Matrix and more

Object Oriented

Yes

Yes

Selective

R has a variety of Object Oriented implementations with different designs and functionality, denoted S3, S4, R5 and R6. Julia implements select OO aspects via the struct composite type

Code Structure

Based on Indentation

Free Style

Free Style

Standard Libraries

Extensive

Built-in Functions

Base

Python has an extensive standard library as it covers a larger CS domain. In contrast, R and Julia include a more extensive set of data science oriented features by default

Building Packages / Extensions

Modules, Via bindings to C/C++

Creating R packages

Julia Packages

See below under HPC for more specific options
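The "Dynamic (Duck) Typing" entry can be illustrated with a short Python sketch. The classes `Circle` and `Square` and the function `total_area` are hypothetical names chosen for the example.

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    # No type declarations: any object with an area() method is accepted
    return sum(shape.area() for shape in shapes)


# Types are attached to values, not to names, so a name can be rebound freely
x = 1
x = "one"

# Python integers are arbitrary precision, so this does not overflow
big = 2 ** 200

print(total_area([Square(2), Square(3)]))
print(type(x).__name__)
```

Here `total_area` never checks the class of its inputs. It only relies on each object responding to `area()`, which is the essence of duck typing.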

Development Environment

This section aims to answer the question: How can I develop and test code / applications written in Python, R or Julia?

Aspect

Python

R

Julia

Comment

Open Source IDEs

Spyder, NetBeans, Eclipse, Visual Studio Code

RStudio, RTVS

Juno

There are many other IDEs or advanced editors (Vim, Emacs etc.) that support programming languages via plugins. The degree of support varies (from syntax highlighting to supporting complete workflows within the IDE/editor)

Commercial IDEs with Community Version

PyCharm Community / Pro, Komodo

RStudio

IntelliJ + Julia Plugin

Here we list closed source IDEs with free or commercial versions

Notebooks / Literate Programming

Jupyter, Pweave

Jupyter, R Markdown, Sweave, knitr

Jupyter, Weave.jl, Literate.jl

Jupyter stands for Julia-Python-R Language!

Debugger

pdb

various built-in functions (browser, traceback, debug)

Debugger.jl

Testing

tox, pytest, unittest

RUnit, testthat, assertthat

Test (standard library)

(R testthat is for typical unit tests, R assertthat is to declare the pre and post conditions that code should satisfy)

Package Reviews

Reproducibility Task Views

Reproducible Research

Jupyter is available for all three systems
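As an illustration of the Testing row for Python, the sketch below uses the built-in unittest module. The function under test, `clip_probability`, is a hypothetical helper invented for the example.

```python
import unittest


def clip_probability(p):
    """Clamp a number into the interval [0, 1] (illustrative helper)."""
    return max(0.0, min(1.0, p))


class ClipProbabilityTest(unittest.TestCase):
    def test_inside_range_is_unchanged(self):
        self.assertEqual(clip_probability(0.25), 0.25)

    def test_outside_range_is_clamped(self):
        self.assertEqual(clip_probability(-0.5), 0.0)
        self.assertEqual(clip_probability(1.5), 1.0)


# Run the tests programmatically instead of via `python -m unittest`
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClipProbabilityTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

pytest can collect and run the same test class unchanged, so a project can start with unittest and adopt pytest later.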

Files, Databases and Data Manipulation

This section aims to answer the following questions: What direct connectors to files stored on disk or data stored in databases are available for Python, R and Julia? Further, once we have connected to a data source, how can we fetch, store in memory and do preliminary work with the imported data?

Aspect

Python

R

Julia

Comment

Loading Local Files

Built-in, Pandas

Built-in

Built-in

General file input from local directories is built-in in all systems

CSV Loading

Pandas

Built-in (read.csv), data.table, readr

CSV.jl

XLS/ODF Loading

xlrd, openpyxl

XLConnect, xlsx

OdsIO.jl

Hierarchical Data Formats (HDF)

h5py, pandas.read_hdf

rhdf5

HDF5.jl

URL Requests

requests, PycURL

data.table, RCurl

HTTP.jl

The Julia package is still new and not tested in production systems

Relational Database Connectors

MySQLdb, psycopg2, sqlite3

RODBC / RODBCExt, RMySQL, RPostgreSQL, RSQLite

MySQL.jl, PostgreSQL.jl, SQLite.jl

Graph Databases Connectors

neo4j, pyArango

neo4R

Neo4j.jl

Object Relational Mapping

SQLAlchemy, Django ORM

General Data Wrangling

pandas

Built-in (data.frame), data.table, tidyverse (dplyr, tidyr, stringr)

DataFrames.jl

The data frame concept has long been a core aspect of R; pandas emulates it in Python and DataFrames.jl in Julia

Missing Data

Pandas functionality, sklearn.impute

Amelia and many others

Impute.jl

Advanced datetime handling

dateutil

lubridate

These packages provide datetime specific extensions to built-in functionality

Package Reviews

Databases Task Views

Databases, Missing Data
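The typical workflow described in this section (read a file, store the data, run a first query) can be sketched with Python's built-in csv and sqlite3 modules. The column names and values below are made up for illustration, and an in-memory string stands in for a file on disk.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file on disk; the contents are invented for the example
raw = io.StringIO("id,rating,exposure\n1,A,100.0\n2,B,250.5\n3,A,75.0\n")

# csv yields strings, so convert each field to the intended type explicitly
rows = [
    (int(r["id"]), r["rating"], float(r["exposure"]))
    for r in csv.DictReader(raw)
]

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE loans (id INTEGER, rating TEXT, exposure REAL)")
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", rows)

# Aggregate exposure per rating as a first piece of preliminary analysis
totals = dict(
    conn.execute(
        "SELECT rating, SUM(exposure) FROM loans GROUP BY rating ORDER BY rating"
    ).fetchall()
)
conn.close()
print(totals)
```

With pandas installed, the loading and grouping steps are commonly done with a data frame instead. The standard library version is shown here only because it runs without extra packages.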

General Purpose Mathematical Libraries

This section aims to answer the question: What building blocks are available for undertaking basic quantitative (numerical) work in Python, R and Julia respectively? NB: The division of what is core mathematics and what is a specialized domain is a bit arbitrary.

Aspect

Python

R

Julia

Comment

General Purpose vectors and n-dimensional arrays (as storage)

numpy

Built-in array

The R system comes with many basic array functionalities available built-in

Numerical Linear Algebra (matrix operations)

numpy.linalg

Matrix, RcppArmadillo, RcppEigen

Built-in support (LinearAlgebra.Basic), StaticArrays, BandedMatrices, IterativeSolvers

For specialized operations (large / sparse matrices see below in HPC), eigenpy and pybind11 provide alternative means to use C++ numerical linear algebra in Python

Mathematical (Special) Functions such as Gamma, Beta, Bessel

scipy

Built-in functions

SpecialFunctions.jl

The R system comes with many basic functionalities available built-in

Random Number Generation

Built-in, numpy.random

Built-in functions

Built-in (Random.Random)

This entry is about generic random numbers. More specialized applications mentioned below

Mathematical Optimisation

JuMP

Symbolic Algebra

sympy

Symata

Curve Fitting

scipy.optimize, numpy.polyfit

Built-in

ApproxFun

Package Reviews

Mathematics Task Views

Numerical Mathematics, Optimization
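A few of the building blocks listed above are available in Python without any third-party package. The sketch below, which assumes nothing beyond the standard library, draws reproducible random numbers and evaluates two special functions.

```python
import math
import random

# A dedicated generator with a fixed seed gives reproducible draws
rng = random.Random(42)
draws = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)

# Special functions available in the standard math module
gamma_5 = math.gamma(5)   # Gamma(5) = 4! = 24
erf_0 = math.erf(0.0)     # the error function vanishes at zero

print(round(gamma_5, 6), erf_0)
```

For array-based work the numpy.random and scipy entries in the table are the usual choice. The standard library version is mainly useful for small scripts and for cross-checking results.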

Core Statistics Libraries

This section aims to answer the question: What libraries are available for undertaking standard statistical studies in Python, R or Julia? There is a large number of packages / modules with significant duplication / overlap, especially for the R system, hence only the major / indicative ones are considered.

Aspect

Python

R

Julia

Comment

Exploratory Data Analysis (descriptive statistics, moments, etc)

pandas.describe, pandas profiling, scipy.stats, statsmodels

Base R (stats), car, caret, dplyr

describe(DataFrame)

EDA is quite broad and loosely defined. Here we take a fairly narrow view that remains as much as possible non-parametric and model-agnostic

Correlation

pandas.corr, numpy.corrcoef

Built-in (cor)

Built-in (cor)

ANOVA

scipy.stats, statsmodels

Built-in (aov, anova), car, caret

ANOVA.jl

Linear Regression Analysis

scikit-learn, statsmodels

Built-in

Regression.jl

Generalized Linear Regression

scikit-learn, statsmodels

Built-in (glm), glmnet

Regression.jl

This category includes logistic regression (which is available in many R packages), multinomial regression etc.

Survival Analysis

lifelines

survival

Survival.jl

Gaussian Processes

GPy

GauPro, GPfit, kergp, mlegp

GaussianProcesses.jl

Package Reviews

Statistics Task Views

Probability Distributions, Multivariate Statistics, Extreme Value Analysis, Robust Statistical Methods, Survival Analysis
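To show what the Correlation and Linear Regression rows compute, here is a minimal pure-Python sketch of the Pearson correlation coefficient and a one-variable least-squares fit. The function names `pearson_correlation` and `simple_ols` are hypothetical, and the data are constructed to lie exactly on a line. In practice one would use the packages listed in the table.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simple_ols(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]  # constructed so that y = 1 + 2x exactly

slope, intercept = simple_ols(x, y)
print(pearson_correlation(x, y), slope, intercept)
```

A hand-written version like this is a convenient cross-validation target when comparing the output of numpy.corrcoef, R's cor or Julia's cor on the same data.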

Econometrics / Timeseries Libraries

This section aims to answer the question: What libraries are available for undertaking econometric / timeseries studies in Python, R or Julia?

Aspect

Python

R

Julia

Comment

Basic Econometric Analysis (stationarity, trends, seasonality)

statsmodels.tsa

Built-in (ts)

TimeSeries.jl, Econometrics.jl

ARMA Processes / Univariate Models

statsmodels.tsa, pmdarima

auto, forecast, tseries

ARCHModels.jl

Heteroskedastic (GARCH) processes

statsmodels, arch

tseries, zoo, vars

ARCHModels.jl

Vector Auto Regressions (VAR)

statsmodels.tsa

mts, vars

VectorAutoregressions.jl (WIP)

General Timeseries

pyflux, prophet

prophet (R API)

TimeSeries.jl

Frequency Domain Analysis

numpy.fft

Built-in (spectrum)

Package Reviews

Econometrics Task Views

Econometrics, Time Series Analysis

Machine Learning Libraries

This section aims to answer the question: What libraries are available for machine learning projects in Python, R or Julia? The term machine learning is not too specific so we use this category to group various advanced / specialized libraries that are relevant for data science (but not e.g. computer vision and other specialized ML applications). NB: Machine learning algorithms are typically compute intensive and are thus implemented in systems languages, with bindings and APIs provided to the Python or R environments.

Aspect

Python

R

Julia

Comment

Network Analysis

networkx

igraph, sna

LightGraphs.jl

Cluster Analysis (Unsupervised Learning)

scikit-learn

cluster

Clustering.jl

K-means and other clustering algorithms

Random Forests

scikit-learn

randomForest, ranger

DecisionTree.jl

Gradient Boosting

scikit-learn

XGBoost Interface

XGBoost.jl Interface

Probabilistic Graphical Models

pgmpy

bnlearn, gRain

PGM.jl

Neural Networks

tensorflow, pytorch, keras, Interface to MXNet

Interface to h2o, Interface to MXNet, Interface to keras

Flux, MLJ, Knet

RStudio offers an interface to tensorflow

Package Review

Machine Learning Task Views

Bayesian Inference, Cluster Analysis & Finite Mixture Models, Machine Learning, Graphical Models

GeoSpatial Libraries

This section aims to answer the question: What libraries are available for working with GIS / geospatial data in Python, R or Julia? The geospatial package space is particularly fragmented, so the selection focuses on some key anchor concepts.

Aspect

Python

R

Julia

Comment

Geo Data Structures

GeoPandas.GeoSeries, GeoPandas.GeoDataFrame

raster, sp, sf, stars

GDAL

gdal

rgdal

GDAL.jl

GeoJSON

geojson

geojson, rgdal

GeoJSON

PostGIS

geojson

rpostgis

GeoJSON

GeoMapping

CartoPy, Descartes

gmt

GMT

OpenStreetMap

openstreetmap

OpenStreetMap

OpenStreetMap.jl

Spatial Statistics

pysal

gstat, geoR, geoRglm

R has a large number of specialized spatial statistics packages (see Task Views)

Spatial Econometrics

pysal.spreg

Package Review

Geospatial Task Views

Spatial Data, Handling and Analyzing Spatio-Temporal Data

Visualization

This section aims to answer the question: What functionality is available to produce data driven visualization in Python, R or Julia?

Aspect

Python

R

Julia

Comment

Low level APIs

matplotlib

grid, gridExtra

Plots.jl

Graph packages

seaborn, plotly, bokeh

ggplot2

Gadfly.jl

Declarative Visualizations

Altair

Vega.jl

XKCD style plots :-)

Available!

Available!

Package Review

Visualization Task Views

Graphic Displays & Visualization

Web, Desktop and Mobile Deployment

This section aims to answer the question: What tools does each language ecosystem provide for the deployment of data based applications, whether via the web, desktop or mobile apps?

Aspect

Python

R

Julia

Comment

Native Webservers

Tornado, Gunicorn, CherryPy, Twisted

OpenCPU, plumber

HTTP.jl

As a general remark these native servers are not exposed directly in production but are fronted by e.g. apache httpd and nginx servers

Classic Web Frameworks

Flask, Pyramid, Django

R Shiny, rApache

Genie.jl

Web frameworks typically used behind a production web server (Apache, Nginx etc.)

Web Formats

xml, json (built-in)

XML, rjson, jsonlite

JSON.jl

Web Sockets

websockets

WebSockets.jl

A WebSocket connection allows full-duplex communication between a client and a server, so that either side can push data to the other through an established connection

Client Side (Browser)

Brython, RustPython, Pyodide

Mobile Apps

Kivy, Beeware

Both Kivy and BeeWare allow cross-platform app development.

Package Review

Web Task Views

Model Deployment, Web Technologies
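The Web Formats row lists json as built into Python. A minimal sketch of a round trip follows. The record contents are invented for the example and do not refer to any real model.

```python
import json

# A made-up record of the kind a web API might exchange
record = {
    "model": "example_scorecard",
    "version": 3,
    "inputs": [0.12, 0.5],
    "approved": True,
}

# Serialize to a JSON string (sorted keys make the output deterministic)
payload = json.dumps(record, sort_keys=True)

# Parse the string back into Python objects
restored = json.loads(payload)

print(payload)
```

Note that Python's `True` becomes `true` in the JSON text and is converted back on parsing, so the round trip preserves the original dictionary.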

Semantic Web / Semantic Data

This section aims to answer the question: What tools and libraries are available for working with semantic data (RDF, OWL, JSON-LD etc) and other relevant domain specific metadata schemas?

Aspect

Python

R

Julia

Comment

RDF Format

rdflib

rrdf

JSON-LD Format

rdflib.jsonld

JSON-LD is an alternative web-friendly serialization format for RDF

OWL Ontologies

ontospy, owlready2

Querying RDF (SPARQL)

rdflib

Rredland

Serving RDF (SPARQL)

rdflib

SDMX Format

pandasdmx

rsdmx

SDMX is the statistical data and metadata exchange format

Package Review

Semantic Data Task View

High Performance Computing

For our purposes high performance computing (HPC) is any use case that requires more than a single CPU (and its own RAM or disk). This section aims to answer the question: what are my options if I have performance bottlenecks in terms of CPU, memory or disk, hence covering topics such as concurrency or GPU computing. NB: Julia aims to address performance issues through compilation and other design choices

Aspect

Python

R

Julia

Comment

Bindings to C/C++

Cython, pybind11

Rcpp

Cxx.jl

Native Python and R are slow compared to lower level / compiled languages. A common approach to make full use of the existing CPU is to extend the language via bindings to a faster language. Bindings might also be useful to re-use existing libraries

Bindings to Java

py4j

renjin

JavaCall.jl

Bindings to other high-performance languages (Rust etc.)

pyO3

Coroutines

Built-in (async/await, since Python 3.5)

Built-in (Tasks/Channels)

Multi-threading

Built-in (threading)

foreach

Built-in (Base.Threads) (Experimental)

Multi-core

multiprocessing

doParallel, future

Built-in (Distributed)

Spark interface

pySpark

SparkR, sparklyr

Spark.jl

GPU Computing

pyCUDA

gpuR

CUDAnative.jl

GPU interfaces are also offered via some ML packages (e.g. pytorch, tensorflow, MXNet.jl)

Distributed Data

dask

multidplyr

JuliaDB.jl

Package Review

HPC Task Views

High-Performance and Parallel Computing
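The Coroutines row mentions Python's built-in async/await. The sketch below shows the basic pattern with asyncio. The coroutine `fetch_value` is hypothetical, and `asyncio.sleep` stands in for a slow I/O operation.

```python
import asyncio


async def fetch_value(name, delay):
    # asyncio.sleep stands in for a slow I/O call (network, disk, database)
    await asyncio.sleep(delay)
    return name, delay


async def main():
    # The three coroutines wait concurrently on a single thread;
    # gather returns the results in the order the coroutines were passed in
    return await asyncio.gather(
        fetch_value("a", 0.03),
        fetch_value("b", 0.01),
        fetch_value("c", 0.02),
    )


results = asyncio.run(main())
print(results)
```

Coroutines help when the bottleneck is waiting on I/O. For CPU-bound work the multiprocessing entry in the Multi-core row is the more relevant option, since the coroutines here all share one thread.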

Using R, Python and Julia together

This section aims to answer the question: How can I use R from Python, Python from Julia, Julia from R and vice versa? :-) The first rows of this table have the From/To format (From X Call Y) for native integration between the three systems, where "Native" means that the integration is done using language bindings within the respective interpreters / REPL (not explicitly using the operating system or a server API).

Aspect

Call Python

Call R

Call Julia

Comment

From Python

rpy2

pyjulia

From R

PythonInR, rPython

XRJulia

From Julia

PyCall.jl

RCall.jl

Python/R Cross-Development and Integration

r4intellij, rpy2

reticulate

Via Server APIs

Rserve

Via OS / Shell Scripts

Built-in (subprocess)

Built-in (system2)

Built-in (Base.run)
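The "Via OS / Shell Scripts" row can be illustrated from the Python side with the built-in subprocess module. In this sketch the helper `run_script` is a hypothetical name, and the current Python interpreter is used as a stand-in for the external program so that the example runs without R or Julia installed.

```python
import subprocess
import sys


def run_script(command):
    """Run an external command and return its standard output as text."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


# The current Python interpreter is a stand-in so the example runs anywhere.
# With R or Julia installed, the command could instead be something like
# ["Rscript", "-e", "cat(1 + 1)"] or ["julia", "-e", "print(1 + 1)"].
output = run_script([sys.executable, "-c", "print(1 + 1)"])
print(output)
```

Exchanging data through standard output works for small results. For larger data sets a shared file format (CSV, HDF) or one of the native bridges listed above (rpy2, pyjulia) is usually more practical.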