Welcome to Python Data Science

Python Data Science is an open source, collaborative project aiming to document best practice approaches to data science tasks using Python. At present there are two main classes of resources:

  • A collection of Python Task Views
  • The Jupyter Overview that compares Python functionality against the R and Julia data science frameworks

Python Task Views

Task Views are a collection of Markdown documents that provide guidance on which Python packages are relevant to which data science task.

Task views aim to outline which packages could be included (or excluded) in a certain project to achieve certain functionality. They are not meant to endorse the "best" package for any given task.

The initial proposed list of Python Task Views mirrors the corresponding set of CRAN Task Views for the R system. Over time this may develop to reflect more accurately the grouping of tasks in the Python universe.

How Does it Work?

Each Python Task View document lives in a separate file hosted in this repository.

The files are automatically displayed in a GitLab Pages website.


Anybody with good knowledge of the Python data science tools used in a specific domain is welcome to contribute to the knowledge base. As with CRAN, we want a small number of moderators per topic to help organize the content and ensure it is a high-quality resource that adds value for all users.

Python Data Science Task View List

Jupyter (Python versus R versus Julia) overview

While Task Views are dedicated exclusively to Python data science tools, the Jupyter overview project is inspired by the R ecosystem CRAN views. It offers a side-by-side comparison with the R and Julia packages available for data science, which helps identify important sub-domains where Python may currently lag. The overview is available in two formats: