cookiekaker is another cookiecutter deep learning template


Cookiecutter Cookiekaker Data Science Template inspired by @vasinkd and @drivendata

A not quite logical and unreasonably standardized, but flexible project structure for doing and sharing data science work, made for a particular motivation and place.

Cookiecutter Data Science is a real game changer for data science projects. I use it, but I have changed many things because, you know, automation!

I made several tweaks to the drivendata template that improve my working routine.

HOW TO USE:

First of all, install cookiecutter with:

$ pip install cookiecutter

or

$ conda install cookiecutter

or

$ apt install cookiecutter

After that, you can use the template with:

$ cookiecutter https://github.com/metya/cookiekaker
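The same generation step can also be driven from Python, which may be convenient in scripts. The sketch below assumes the `cookiecutter` package is installed and uses its `cookiecutter.main.cookiecutter` function; the helper name `generate_project` and the value `my_project` are hypothetical, and whether this template's hooks run cleanly without prompts has not been verified here.

```python
def generate_project(output_dir=".", repo_name="my_project"):
    """Render the cookiekaker template without interactive prompts.

    `repo_name` is a variable used by the template; "my_project" is only
    a placeholder value chosen for this sketch.
    """
    # Imported lazily so this module loads even if cookiecutter is missing.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        "https://github.com/metya/cookiekaker",
        no_input=True,  # take defaults from cookiecutter.json instead of prompting
        extra_context={"repo_name": repo_name},  # override one template variable
        output_dir=output_dir,  # directory in which the project is created
    )
```

Calling `generate_project()` needs network access to fetch the template and should return the path of the generated project directory.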

Features:

  • You may choose python3.5, python3.6, python3.7, or python3.8

  • Creation of the virtual environment is limited to virtualenv

  • Creating the virtual environment also sets up git and dvc version control and pre-commit hooks

  • Project library renamed from src to project_name, which lets you use the created library on your machine from anywhere

  • Added a pipeline folder to store all dvc pipelines

  • Added data/features folder

  • Added settings.py to illustrate how to use .env file

  • Added an empty notebook "1.0-{{cookiecutter.author_name}}-dvc-pipeline.ipynb" to store all dvc pipeline creation commands and to illustrate that numbering notebooks is a good idea

  • Cleared make_dataset.py since I find it too restrictive and confusing

  • Removed aws sync functions

  • Removed data folder from .gitignore since dvc version control takes care of .gitignore

  • Removed tox.ini since .pre-commit-config.yaml is enough for me
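As a rough illustration of the settings.py idea from the list above, here is a minimal sketch of reading a .env file with only the standard library. The template's actual settings.py is not shown here and may use a package such as python-dotenv instead; the function names `load_env_file` and `get_setting` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_setting(name, env_values, default=None):
    """Look up a setting; real environment variables win over .env entries."""
    return os.environ.get(name, env_values.get(name, default))
```

A settings module could call `load_env_file(".env")` once at import time and expose the values it needs through `get_setting`, which keeps secrets out of version control.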

The resulting directory structure


The directory structure of your new project looks like this:

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   ├── features       <- Features may be stored here
│   ├── inference      <- Inference stages may be stored here
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── .pre-commit-config.yaml <- Stores pre-commit settings
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── __init__.py
│
└── <project_name>     <- Source code for use in this project.
    ├── __init__.py    <- Makes {{cookiecutter.repo_name}} a Python module
    │
    ├── settings.py    <- Illustrates how to use a .env file
    │
    ├── data           <- Scripts to download or generate data
    │   └── make_dataset.py
    │
    ├── features       <- Scripts to turn raw data into features for modeling
    │   └── featurize.py
    │
    └── models         <- Scripts to train models and then use trained models to make
        │                 predictions
        └── train.py
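
The notebook naming convention described in the tree (a number for ordering, the creator's initials, and a short `-` delimited description) can be sketched as a pair of helpers. This is only an illustration: the function names and the regular expression are assumptions and are not part of the template.

```python
import re

# Matches names such as "1.0-jqp-initial-data-exploration" (assumed pattern).
NOTEBOOK_NAME = re.compile(
    r"^(\d+)\.(\d+)-([a-z0-9]+)-([a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def notebook_name(major, minor, initials, description):
    """Build a notebook name from an ordering number, initials, and a description."""
    slug = "-".join(description.lower().split())
    return f"{major}.{minor}-{initials.lower()}-{slug}"


def parse_notebook_name(name):
    """Split a notebook name into ((major, minor), initials, description), or None."""
    stem = name[: -len(".ipynb")] if name.endswith(".ipynb") else name
    match = NOTEBOOK_NAME.match(stem)
    if match is None:
        return None
    major, minor, initials, description = match.groups()
    return (int(major), int(minor)), initials, description
```

Sorting by the parsed number pair keeps `10.0-...` after `2.0-...`, which plain string sorting would not.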