Data Science Project Structure

Sources:

Build a Reproducible and Maintainable Data Science Project by Khuyen Tran

This note assumes that Python is the main language of the data science project and that the project uses a Python venv.
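As a minimal sketch of the venv assumption, the environment can be created with the standard-library `venv` module, which is roughly what `python -m venv .venv` does. The directory name `.venv` and the helper `create_project_venv` are assumptions made for illustration; they do not come from the sources.

```python
import venv
from pathlib import Path


def create_project_venv(project_root, name=".venv", with_pip=True):
    """Create a virtual environment inside project_root and return its path.

    The ".venv" default is an assumed convention, not something the note fixes.
    """
    env_dir = Path(project_root) / name
    # venv.create builds the environment directory, including pyvenv.cfg.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Calling `create_project_venv(".")` from the project root would create `./.venv`. It is usually a good idea to add that directory to `.gitignore`.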

The following structure is adapted from: dvc-pip

.
├── README.md
├── pyproject.toml
├── .gitignore
├── LICENSE
│
├── configs/                # Configuration files
│   ├── main.yaml           # Main configuration file
│   ├── model/              # Configurations for training model
│   │   ├── model1.yaml
│   │   └── model2.yaml
│   └── process/            # Configurations for processing
│       └── process1.yaml
│
├── data/
│   ├── final/              # Data after training
│   ├── processed/          # Data after processing
│   └── raw/                # Raw data
│
├── docs/                   # Documentation
├── models/                 # Store models  
├── notebooks/              # Store notebooks
│
├── src/                    # Source code
│   ├── __init__.py
│   ├── process.py
│   ├── __pycache__/        # Bytecode cache generated by Python
│   └── train_model.py
│
├── scripts/                # Utility scripts
│   ├── download_data.sh
│   └── download_models.sh
│
└── tests/                  # Unit tests
    ├── __init__.py
    ├── test_process.py
    └── test_training.py
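
The layout above can be bootstrapped with a short script. The following is a minimal sketch, not part of the dvc-pip template: the `scaffold` helper and the `DIRECTORIES` and `FILES` lists are hypothetical names, and only a subset of the files in the tree is created, all as empty placeholders.

```python
from pathlib import Path

# Directories taken from the tree above.
DIRECTORIES = [
    "configs/model",
    "configs/process",
    "data/raw",
    "data/processed",
    "data/final",
    "docs",
    "models",
    "notebooks",
    "src",
    "scripts",
    "tests",
]

# A subset of the files from the tree above, created as empty placeholders.
FILES = [
    "README.md",
    "pyproject.toml",
    ".gitignore",
    "LICENSE",
    "configs/main.yaml",
    "src/__init__.py",
    "src/process.py",
    "tests/__init__.py",
    "tests/test_process.py",
    "tests/test_training.py",
]


def scaffold(root):
    """Create the layout under root and return the files that were created.

    Existing directories and files are left untouched, so the function
    can safely be run more than once.
    """
    root = Path(root)
    created = []
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Running `scaffold("my-project")` creates the skeleton under `my-project/`. A second call returns an empty list because nothing is overwritten.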

Explanation of each part:

For pyproject.toml, see pyproject.toml.
