JupyterData
History and Community
The objective of this section is to provide an overall comparison of the history of the three ecosystems, towards answering the question: Who is really behind Python, R and Julia?
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
First Release |
1991 |
1995 |
2012 (development began in 2009) |
Both the Python and R ecosystems have a long history of development and both have received a lot of attention in the last few years as open source data science became more widespread. Julia is considerably more recent |
Initial Authors |
Guido van Rossum |
Ross Ihaka and Robert Gentleman |
Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman |
|
Current Stable Version |
3.7 |
3.5 |
1.2 |
Check here for Python, Check here for R, Check here for Julia |
Current Governance |
Python Software Foundation (Non Profit) |
R Foundation (Non Profit) |
||
Open Source License |
||||
Size of Core Contributors |
2-90 depending on definition |
Python core team size is difficult to establish (e.g. full-time / part-time, activity level) and there is no single authoritative source. The same applies to Julia |
||
Size of Broader Developer Communities |
Third most popular in number of repositories and number of contributors |
Not in Top 10 of community size |
Not in Top 10 of community size |
Note: R programmers might not necessarily self-identify as developers (but as data scientists, statisticians etc.) |
Developer Associations |
Formally organized associations promoting Python, R or Julia |
|||
Important Non-Profit Sponsors |
A number of non-profit organizations support these open source ecosystems explicitly or implicitly |
|||
Important Corporate Sponsors |
Diverse |
Diverse |
Commercial sponsors may be supporting these ecosystems explicitly or implicitly |
|
Important Conferences |
||||
Important Journals |
Journal of Open Source Software, Papers with Code covering all three systems |
|||
IRC Channels |
#python |
#julia |
||
Data Science subreddit (discussing Python, R and Julia topics) |
||||
Online Forums and Blogs |
Too many |
Too many |
The Python and R ecosystems have an extensive number of blogs, forums etc. (of varying quality) |
Devices and Operating Systems
This section aims to answer the question: Where (as in what kind of device and operating system) can I use Python, R or Julia? NB: This is not a guide to installing Python, R or Julia on your system, just an overview of what is available where.
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Linux Desktop |
Comes pre-installed |
apt-get install r-base |
apt-get install julia / Linux installer file |
Python is generally pre-installed as it is used by the Linux system itself. Different distributions may include different (potentially very old) versions of the three languages. |
Windows |
All three languages are available for Windows 7 and Windows 10, in both 32-bit and 64-bit versions. |
|||
MacOS |
2.7 version is pre-installed |
MacOS installer |
MacOS installer file |
|
Raspbian |
Pre-installed |
apt-get install r-base |
apt-get install julia |
Linux is the operating system of choice for IoT devices, which means a basic Python installation is generally available |
Android / iOS |
No |
No |
Python, R and Julia are not readily integrated on mobile devices (see also the Deployment entry). Check Termux for an alternative option |
|
iOS |
No |
No |
No |
|
Cloud Servers |
As per Linux Desktop above |
As per Linux Desktop above |
As per Linux Desktop above |
Cloud servers typically run the Linux operating system and have Python installations available |
Package Management
This section aims to answer the question: How can I extend Python, R or Julia functionality with existing libraries? The ease of finding and installing packages is a very important factor in the popularity of all three languages, in marked contrast to languages like C++.
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Discovery of Packages |
Online Search, Built-in PyCharm access to PyPI |
RStudio built-in access to CRAN |
Python packages are released on PyPI, R packages are released on CRAN |
|
Number of Packages (Oct 2019) |
199,816 |
15,102 |
~2,496 |
|
Online Repositories |
PyPI, via linux distributions |
CRAN |
GitHub, GitLab, Bitbucket etc. are used for releasing open source Python, R and Julia packages online, for coordinating development and for other community support |
|
Package Installation |
Done at OS level (PyPI, setup, conda, pip, easy_install, apt) |
Built-in install.packages |
Built-in Pkg package manager |
Python installation methods are quite varied (and have evolved over time) and can be either system wide (e.g. a linux distro package) or user specific |
Dependency Management |
pip, virtualenv |
virtualenv enables using isolated Python distributions and package collections within the same system. Julia uses project environments |
||
Loading Packages |
import statement |
library statement |
import / using statements |
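
To make the "Loading Packages" row concrete for Python, here is a minimal sketch that checks whether a module is available before importing it. It uses only the standard library; the helper name `is_installed` and the deliberately implausible package name are hypothetical, chosen for illustration only.

```python
import importlib
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# `json` ships with the standard library, so it should always be found.
has_json = is_installed("json")

# A made-up name, used here only to show the negative case.
has_missing = is_installed("surely_not_a_real_package_xyz")

# Load a module by name at runtime; this has the same effect as `import json`.
json_module = importlib.import_module("json")
```

A check like this is one way to fail early with a helpful message when an optional dependency has not been installed through pip, conda or the system package manager.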
Package Documentation
This section aims to answer the question: How can I document a Python, R or Julia module? The ease and quality of documentation is an important factor in adoption and efficient use of a language as it both helps beginners learn new functionality and experienced users ensure better quality work
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Source level documentation |
Built-in docstrings |
Docstrings |
||
Formats |
markdown, latex |
Markdown |
R packages on CRAN include Reference Manuals (PDF, typically generated from LaTeX) |
|
Documentation Generator |
||||
Online documentation |
CRAN, bookdown |
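
As an illustration of built-in docstrings in Python, the sketch below attaches documentation to a function and reads it back at runtime. The function `celsius_to_fahrenheit` is a hypothetical example, not taken from any package listed here.

```python
import inspect


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    The docstring is stored on the function object and can be read by
    tools such as `help()` or by documentation generators.
    """
    return celsius * 9 / 5 + 32


# `inspect.getdoc` returns the docstring with its indentation cleaned up.
doc_text = inspect.getdoc(celsius_to_fahrenheit)
first_line = doc_text.splitlines()[0]
```

Because the documentation lives next to the code, it can be queried interactively (for example with `help(celsius_to_fahrenheit)`) without any external tooling.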
Language Characteristics
This section aims to answer the question: What does code in Python, R or Julia look like from a programming perspective? Many standard aspects of programming languages are available in all three systems so are not included.
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Compiled / Interpreted |
Interpreted |
Interpreted |
Compiled Just-in-time (JIT) |
Julia code can be executed interactively |
Main Implementation Language |
C (CPython) |
C and Fortran |
Julia |
This is the language used for the interpretation of a Python or R script. Julia is written in Julia |
Other Implementation Languages |
Java (Jython), RustPython etc |
Many alternative implementations of the underlying interpreter exist for both Python and R. A new approach available for Python and Julia is to compile to WebAssembly for execution in the browser: Python/Pyodide, Julia/Charlotte |
||
Type System |
Dynamic (Duck) Typing |
Dynamic |
Dynamic (Duck) Typing |
All three systems have essentially dynamic type systems (in contrast with languages such as C++, Java or Rust) |
Primitive Data Types |
Numbers (Integers, Float), Strings, Boolean |
Numeric, Int, Character, Logical (and the pairlist) |
Numbers, Char, Bool |
Double precision is standard in all systems. Higher precision is only via libraries. Julia has a native 128 bit integer type. |
Native Data Structures |
List, Tuple, Dict |
List, Vector, Data Frame, Factor |
Tuple, Dict, Set, Array, Vector, Matrix and more |
|
Object Oriented |
Yes |
Yes |
Selective |
R has a variety of object oriented implementations with different designs and functionality, denoted S3, S4, R5 and R6. Julia implements select OO aspects via the struct composite type |
Code Structure |
Based on Indentation |
Free Style |
Free Style |
|
Standard Libraries |
Extensive |
Built-in Functions |
Base |
Python has an extensive standard library as it covers a larger CS domain. In contrast, R and Julia have a more extensive set of data science oriented features included by default |
Building Packages / Extensions |
See below under HPC for more specific options |
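
The following short sketch shows what dynamic (duck) typing and the native data structures from the table look like in Python. The classes `Duck` and `Robot` and the function `make_it_speak` are hypothetical names used only for this illustration.

```python
class Duck:
    def speak(self) -> str:
        return "Quack"


class Robot:
    def speak(self) -> str:
        return "Beep"


def make_it_speak(thing) -> str:
    # No type declaration is required: any object with a `speak` method works.
    return thing.speak()


sounds = [make_it_speak(obj) for obj in (Duck(), Robot())]

# The three native containers listed in the table.
a_list = [1, 2, 3]         # mutable sequence
a_tuple = (1, "a", 2.5)    # immutable sequence, mixed types allowed
a_dict = {"python": 1991}  # key-value mapping
a_list.append(4)           # lists can grow in place
```

Note also that block structure here is expressed purely through indentation, as stated in the "Code Structure" row.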
Development Environment
This section aims to answer the question: How can I develop and test code / applications written in Python, R or Julia?
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Open Source IDEs |
There are many other IDEs or advanced editors (Vim, Emacs etc.) that support programming languages via plugins. The degree of support varies (from syntax highlighting to supporting complete workflows within the IDE/editor) |
|||
Commercial IDEs with Community Version |
PyCharm Community / Professional, Komodo |
RStudio |
IntelliJ + Julia plugin |
Here we list closed source IDEs with free or commercial versions |
Notebooks / Literate Programming |
Jupyter, Pweave |
Jupyter, R Markdown, Sweave, knitr |
The name Jupyter refers to Julia, Python and R |
|
Debugger |
various built-in functions (browser, traceback, debug) |
|||
Testing |
Test (standard library, formerly Base.Test) |
(R testthat is for typical unit tests, R assertthat is to declare the pre and post conditions that code should satisfy) |
||
Package Reviews |
Jupyter is available for all three systems |
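
For the Testing row, one option on the Python side is the `unittest` module from the standard library (third-party test runners also exist). The sketch below is a minimal example under that assumption; the function `mean` and the test class `MeanTest` are hypothetical.

```python
import unittest


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean([])


# Run the tests programmatically instead of via `python -m unittest`.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The first test covers a typical unit-test case, while the second checks a precondition, loosely mirroring the distinction made above between R's testthat and assertthat.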
Files, Databases and Data Manipulation
This section aims to answer the following questions: What direct connectors to files stored on disk or data stored in databases are available for Python, R and Julia? Further, once we have connected to a data source, how can we fetch, store in memory and do preliminary work with the imported data?
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Loading Local Files |
Built-in, Pandas |
Built-in |
Built-in |
General file input from local directories is built-in in all systems |
CSV Loading |
Pandas |
Built-in (read.csv), data.table, readr |
||
XLS/ODF Loading |
||||
Hierarchical Data Formats (HDF) |
h5py, pandas.read_hdf |
|||
URL Requests |
data.table, RCurl |
The Julia package is still new and not tested in production systems |
||
Relational Database Connectors |
||||
Graph Databases Connectors |
||||
Object Relational Mapping |
||||
General Data Wrangling |
Built-in data.table, (dplyr, tidyr, stringr, part of the tidyverse) |
The concept of a data frame has been a core aspect of R. pandas has emulated it in Python, as has DataFrames.jl in Julia |
||
Missing Data |
Pandas functionality, sklearn.impute |
Amelia and many others |
||
Advanced datetime handling |
These packages provide datetime specific extensions to built-in functionality |
|||
Package Reviews |
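
As a dependency-free sketch of CSV loading in Python, the example below uses the standard library `csv` module rather than pandas. The inline data reuses the package counts quoted in the Package Management table; reading from `io.StringIO` instead of a file on disk is an assumption made to keep the example self-contained.

```python
import csv
import io

# Inline sample data standing in for a CSV file on disk.
raw = "language,packages\nPython,199816\nR,15102\nJulia,2496\n"

# DictReader uses the header row as keys; every value is read as a string.
rows = list(csv.DictReader(io.StringIO(raw)))

# Convert the count column to integers and build a simple lookup table.
package_count = {row["language"]: int(row["packages"]) for row in rows}
largest = max(package_count, key=package_count.get)
```

With pandas installed, the same data would typically be loaded into a data frame, which adds type inference and the wrangling functionality discussed in this section.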
General Purpose Mathematical Libraries
This section aims to answer the question: What building blocks are available for undertaking basic quantitative (numerical) work in Python, R and Julia respectively? NB: The division of what is core mathematics and what is a specialized domain is a bit arbitrary.
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
General Purpose vectors and n-dimensional arrays (as storage) |
Built-in array |
The R system comes with many basic array functionalities available built-in |
||
Numerical Linear Algebra (matrix operations) |
numpy.linalg |
Built-in support (LinearAlgebra.Basic), StaticArrays, BandedMatrices, IterativeSolvers |
For specialized operations (large / sparse matrices) see below under HPC. eigenpy and pybind11 provide alternative means to use C++ numerical linear algebra in Python |
|
Mathematical (Special) Functions such as Gamma, Beta, Bessel |
Built-in functions |
SpecialFunctions.jl |
The R system comes with many basic functionalities available built-in |
|
Random Number Generation |
Built-in, numpy.random |
Built-in functions |
Built-in (Random.Random) |
This entry is about generic random numbers. More specialized applications mentioned below |
Mathematical Optimisation |
JuMP |
|||
Symbolic Algebra |
Symata |
|||
Curve Fitting |
scipy.optimize, numpy.polyfit |
Built-in |
ApproxFun |
|
Package Reviews |
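
The built-in Python entries for special functions and random number generation can be illustrated with the standard library alone. This is a minimal sketch; the seed value 42 is arbitrary.

```python
import math
import random

# Special functions: gamma(n) equals (n - 1)! for positive integers.
gamma_5 = math.gamma(5)

# Seeding makes a pseudo-random sequence reproducible: two generators
# created with the same seed produce the same draws.
rng_a = random.Random(42)
rng_b = random.Random(42)
draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]
```

For array-oriented work the same ideas carry over to `numpy.random`, which is listed in the table alongside the built-in generator.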
Core Statistics Libraries
This section aims to answer the question: What libraries are available for undertaking standard statistical studies in Python, R or Julia? There is a large number of packages / modules with significant duplication / overlap, especially for the R system, hence only the major / indicative ones are considered.
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Exploratory Data Analysis (descriptive statistics, moments, etc) |
pandas.describe, pandas profiling, scipy.stats, statsmodels |
describe(DataFrame) |
EDA is quite broad and loosely defined. Here we take a fairly narrow view that remains as much as possible non-parametric and model-agnostic |
|
Correlation |
pandas.corr, numpy.corrcoef |
Built-in (cor) |
Built-in (cor) |
|
ANOVA |
scipy.stats, statsmodels |
Built-in (aov, anova), car, caret |
||
Linear Regression Analysis |
scikit-learn, statsmodels |
Built-in |
||
Generalized Linear Regression |
scikit-learn, statsmodels |
Built-in (glm), glmnet |
Regression.jl |
This category includes logistic regression (which is available in many R packages), multinomial regression etc. |
Survival Analysis |
||||
Gaussian Processes |
GauPro, GPfit, kergp, mlegp |
GaussianProcesses.jl |
||
Package Reviews |
Probability Distributions, Multivariate Statistics, Extreme Value Analysis, Robust Statistical Methods, Survival Analysis |
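
To show what the Correlation row computes, here is a from-scratch sketch of the Pearson correlation coefficient in plain Python. It is meant as an illustration of the formula, not as a replacement for `pandas.corr` or `numpy.corrcoef`; the function name `pearson_correlation` is hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


perfect = pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8])
inverse = pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2])
```

The two sample calls use exactly linear data, so the coefficient sits at the two ends of its range. The sketch does not handle constant input, where the variance is zero.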
Econometrics / Timeseries Libraries
This section aims to answer the question: What libraries are available for undertaking econometric / timeseries studies in Python, R or Julia?
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Basic Econometric Analysis (stationarity, trends, seasonality) |
statsmodels.tsa |
Built-in (ts) |
||
ARMA Processes / Univariate Models |
statsmodels.tsa, pmdarima |
|||
Heteroskedastic (GARCH) processes |
statsmodels, arch |
tseries, zoo, vars |
ARCHModels.jl |
|
Vector Auto Regressions (VAR) |
statsmodels.tsa |
VectorAutoregressions.jl (WIP) |
||
General Timeseries |
prophet (R API) |
TimeSeries.jl |
||
Frequency Domain Analysis |
numpy.fft |
Built-in (spectrum) |
||
Package Reviews |
Machine Learning Libraries
This section aims to answer the question: What libraries are available for machine learning projects in Python, R or Julia? The term machine learning is not very specific, so we use this category to group various advanced / specialized libraries that are relevant for data science (but not e.g. computer vision and other specialized ML applications). NB: Machine learning algorithms are typically compute intensive and are thus implemented in system languages, with bindings and APIs then provided to Python or R environments.
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Network Analysis |
||||
Cluster Analysis (Unsupervised Learning) |
scikit-learn |
K-means and other clustering algorithms |
||
Random Forests |
scikit-learn |
|||
Gradient Boosting |
scikit-learn |
|||
Probabilistic Graphical Models |
||||
Neural Networks |
Flux, MLJ, Knet |
RStudio offers an interface to TensorFlow |
||
Package Review |
Bayesian Inference, Cluster Analysis & Finite Mixture Models, Machine Learning, Graphical Models |
GeoSpatial Libraries
This section aims to answer the question: What libraries are available for working with GIS / geospatial data in Python, R or Julia? The geospatial package space is particularly fragmented, so the selection focuses on some key anchor concepts.
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Geo Data Structures |
GeoPandas.GeoSeries, GeoPandas.GeoDataFrame |
raster, sp, sf, stars |
||
GDAL |
rgdal |
|||
GeoJSON |
geojson, rgdal |
GeoJSON |
||
PostGIS |
rpostgis |
GeoJSON |
||
GeoMapping |
CartoPy, Descartes |
gmt |
GMT |
|
OpenStreetMap |
||||
Spatial Statistics |
pysal |
gstat, geoR, geoRglm |
R has a large number of specialized spatial statistics packages (see Task Views) |
|
Spatial Econometrics |
||||
Package Review |
Visualization
This section aims to answer the question: What functionality is available to produce data driven visualization in Python, R or Julia?
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Low level API's |
grid, gridExtra |
|||
Graph packages |
||||
Declarative Visualizations |
Vega.jl |
|||
XKCD style plots :-) |
||||
Package Review |
Web, Desktop and Mobile Deployment
This section aims to answer the question: What tools does each language ecosystem provide for the deployment of data based applications, whether via the web, desktop or mobile apps?
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Native Webservers |
As a general remark these native servers are not exposed directly in production but are fronted by e.g. apache httpd and nginx servers |
|||
Classic Web Frameworks |
Web frameworks typically used behind a production web server (Apache, Nginx etc.) |
|||
Web Formats |
xml, json (built-in) |
|||
Web Sockets |
WebSocket connection allows full-duplex communication between a client and server so that either side can push data to the other through an established connection |
|||
Client Side (Browser) |
||||
Mobile Apps |
Both kivy and beeware allow cross-platform app development. |
|||
Package Review |
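
The built-in JSON support mentioned in the Web Formats row can be sketched as a simple round trip in Python. The record contents are illustrative values only.

```python
import json

# A small record as a native Python dict.
record = {"language": "Python", "first_release": 1991, "interpreted": True}

# Serialize to a JSON string; sort_keys gives a stable key order.
payload = json.dumps(record, sort_keys=True)

# Parse the string back into Python objects.
restored = json.loads(payload)
```

Python's `True` becomes JSON `true` on the way out and is converted back on the way in, so the restored object compares equal to the original.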
Semantic Web / Semantic Data
This section aims to answer the question: What tools and libraries are available for working with semantic data (RDF, OWL, JSON-LD etc) and other relevant domain specific metadata schemas?
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
RDF Format |
rdflib |
rrdf |
||
JSON-LD Format |
rdflib.jsonld |
JSON-LD is an alternative web-friendly serialization format for RDF |
||
OWL Ontologies |
ontospy, owlready2 |
|||
Querying RDF (SPARQL) |
rdflib |
Rredland |
||
Serving RDF (SPARQL) |
rdflib |
|||
SDMX Format |
SDMX is the statistical data and metadata exchange format |
|||
Package Review |
High Performance Computing
For our purposes high performance computing (HPC) is any use case that requires more than a single CPU (and its own RAM or disk). This section aims to answer the question: what are my options if I have performance bottlenecks in terms of CPU, memory or disk, hence covering topics such as concurrency or GPU computing. NB: Julia aims to address performance issues through compilation and other design choices
Aspect |
Python |
R |
Julia |
Comment |
---|---|---|---|---|
Bindings to C/C++ |
Native Python and R are slow compared to lower level / compiled languages. A common approach to make full use of the available CPU is to extend the language via bindings to a faster language. Bindings are also useful for re-using existing libraries |
|||
Bindings to Java |
renjin |
|||
Bindings to other high-performance languages (Rust etc) |
||||
Coroutines |
Built-in (async/await, since Python 3.5) |
Built-in (Tasks/Channels) |
||
Multi-threading |
Built-in (threading) |
Built-in (Base.Threads) (Experimental) |
||
Multi-core |
Built-in (Distributed) |
|||
Spark interface |
SparkR, sparklyr |
|||
GPU Computing |
GPU interfaces are also offered via some ML packages (e.g. PyTorch, TensorFlow, MXNet.jl) |
|||
Distributed Data |
||||
Package Review |
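
As an illustration of the Coroutines row, the sketch below runs two Python coroutines concurrently with `asyncio`. The coroutine `fetch_value` is a hypothetical stand-in for I/O-bound work such as a network request, and `asyncio.sleep` simulates the waiting time.

```python
import asyncio


async def fetch_value(name: str, delay: float) -> str:
    # `await` yields control to the event loop while this task waits.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # Both coroutines run concurrently; gather preserves argument order
    # even though the second one finishes first.
    return await asyncio.gather(
        fetch_value("first", 0.02),
        fetch_value("second", 0.01),
    )


results = asyncio.run(main())
```

Coroutines help with I/O-bound workloads on a single thread. CPU-bound bottlenecks are better served by the bindings, multi-core and GPU options listed in this section.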
Using R, Python and Julia together
This section aims to answer the question: How can I use R from Python, Python from Julia, Julia from R and vice versa? :-) The first rows of this table have the From/To format (From X Call Y) for native integration between the three systems, where "Native" means that the integration is done using language bindings within the respective interpreters / REPL (not explicitly using the operating system or a server API).
Aspect |
Call Python |
Call R |
Call Julia |
Comment |
---|---|---|---|---|
From Python |
||||
From R |
||||
From Julia |
||||
Python/R Cross-Development and Integration |
r4intellij, rpy2 |
|||
Via Server API's |
||||
Via OS / Shell Scripts |
Built-in (subprocess) |
Built-in (system2) |
Built-in (Base.run) |
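
The "Via OS / Shell Scripts" route can be sketched in Python with `subprocess`. To keep the example runnable without R or Julia installed, it launches a second Python interpreter. Swapping in `Rscript` or `julia` as the command is an assumption about what is available on the PATH.

```python
import subprocess
import sys

# Run a child interpreter and capture its standard output as text.
# `sys.executable` (the current Python) stands in for an external tool here;
# replacing it with e.g. "Rscript" or "julia" assumes those are on the PATH.
completed = subprocess.run(
    [sys.executable, "-c", "print(6 * 7)"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError on a non-zero exit code
)
answer = int(completed.stdout.strip())
```

This approach exchanges data as text (or files) between processes, which is simple and robust but slower than the native bindings listed in the first rows of the table.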