Welcome to Python Data Science
Python Data Science is an open-source, collaborative project aiming to document best-practice approaches to data science tasks using Python. At present there are two main classes of resources:
- A collection of Python Task Views
- The Jupyter Overview that compares Python functionality against the R and Julia data science frameworks
Python Task Views
Task Views are a collection of documents in Markdown format that provide guidance on which Python packages are relevant to which data science tasks.
Task views aim to outline which packages could be included (or excluded) in a certain project to achieve certain functionality. They are not meant to endorse the "best" package for any given task.
The initially proposed list of Python Task Views mirrors the corresponding set of CRAN Task Views for the R system. Over time this list may evolve to reflect more accurately how tasks are grouped in the Python universe.
How Does it Work?
Each Python Task View document lives in a separate file hosted in this repository.
The files are automatically displayed on a GitLab Pages website.
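As a rough illustration of the one-file-per-task-view layout, the sketch below builds a Markdown index from a directory of `.md` files. It is not the project's actual publishing pipeline: the function name `build_index`, the directory name `task_views`, and the hyphen/underscore file naming scheme are all assumptions made for the example.

```python
from pathlib import Path


def build_index(views_dir):
    """Return a Markdown bullet list linking to every .md file in views_dir.

    Hypothetical helper: the directory layout and the file naming scheme
    are assumptions, not something this repository prescribes.
    """
    lines = []
    for path in sorted(Path(views_dir).glob("*.md")):
        # Derive a readable title from the file name, e.g.
        # "time-series-analysis.md" becomes "Time Series Analysis".
        title = path.stem.replace("-", " ").replace("_", " ").title()
        lines.append(f"- [{title}]({path.name})")
    return "\n".join(lines)


if __name__ == "__main__":
    # "task_views" is a placeholder directory name.
    print(build_index("task_views"))
```

Deriving the titles from file names keeps the index in sync with the repository contents without maintaining a separate list by hand.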
Moderators
Anybody with good knowledge of the Python data science tools used in a specific domain is welcome to contribute to the knowledge base. As with CRAN, we want a small number of moderators per topic to help organize the content and ensure it is a high-quality resource that adds value for all users.
Python Data Science Task View List
- Analysis of GeoSpatial Data
- Analysis of Pharmacokinetic Data
- Analysis of Ecological and Environmental Data
- Bayesian Inference
- Chemometrics and Computational Physics
- Clinical Trial Design, Monitoring, and Analysis
- Cluster Analysis & Finite Mixture Models
- Databases
- Differential Equations
- Econometrics
- Design of Experiments (DoE) & Analysis of Experimental Data
- Extreme Value Analysis
- Empirical Finance
- Functional Data Analysis
- Graphical Models in Python
- Handling and Analyzing Spatio-Temporal Data
- High-Performance and Parallel Computing with Python
- Hydrological Data and Modeling
- Machine Learning & Statistical Learning
- Medical Image Analysis
- Meta-Analysis
- Missing Data
- Model Deployment with Python
- Multivariate Statistics
- Natural Language Processing
- Numerical Mathematics
- Official Statistics & Survey Methodology
- Optimization and Mathematical Programming
- Phylogenetics, Especially Comparative Methods
- Probability Distributions
- Psychometric Models and Methods
- Regression
- Reproducible Research
- Robust Statistical Methods
- Semantic Data
- Statistics for the Social Sciences
- Statistical Genetics
- Survival Analysis
- Teaching Statistics
- Time Series Analysis
- Visualization
- Web Technologies and Services
Jupyter (Python versus R versus Julia) overview
While Task Views are dedicated exclusively to Python data science tools, the Jupyter overview project, inspired by the CRAN Task Views of the R ecosystem, offers a side-by-side comparison with the R and Julia packages available for data science. This comparison helps identify important sub-domains where Python may currently lag. The overview is available in two formats:
- As a wiki page: Jupyter Overview Wiki
- As a markdown document: Jupyter Overview Markdown