Welcome to Python Data Science

Python Data Science is an open-source, collaborative project that aims to document best-practice approaches to data science tasks using the wonderful Python platform.

At present there are two main classes of resources supporting this mission:

  • The collection of Python Task Views that focus on categorizing tools and libraries available within the Python ecosystem
  • The Jupyter Overview project that compares Python functionality against the R and Julia data science frameworks

Python Task Views

Task Views are the main deliverable of the Python Data Science project. At their core, they are a collection of Markdown documents that provide guidance on which Python packages are relevant to which data science tasks.

Task Views aim to outline, for example, which packages could be included in (or excluded from) a given project to achieve the desired functionality. They are not meant to endorse the "best" package for any given task.

The initial proposed list of Python Task Views:

  • includes select categories from the PyPI taxonomy
  • mirrors the CRAN set of corresponding Task Views for the R system
  • reflects topics and subjects as seen on other code and publication platforms

The overall structure of the classification is outlined in the Taxonomy.

The Jupyter Comparison Project (Python versus R versus Julia)

While Task Views are dedicated exclusively to Python data science tools, the Jupyter Overview project offers a side-by-side comparison with the R and Julia packages available for data science. This helps identify important subdomains where Python may currently lag, which in turn encourages a multilingual approach to data science.

The Jupyter Overview is available in two formats: