Changes from 2 commits
96 changes: 28 additions & 68 deletions docs/design-doc/appendix/14-open-questions.md
@@ -1,85 +1,45 @@
# Open Questions

### 13.1 Technical Questions
Numbering note: this file is the chapter immediately following the operations section. Anchor links in older docs may reference these sections as "13.x"; those references are stale, and the section numbers below are the canonical ones.

1. **Resolved:** Using OpenTofu with terraform-exec orchestration and standard Terraform state management
2. **Multi-Cluster:** How to manage multiple clusters in one state file? (Options: separate states, or cluster array in state)
3. **Custom Kubernetes Distributions:** Support for k0s, k3d, RKE2? (v1: No, v2: Maybe)
4. **Helm Chart Storage:** Where to store foundational software Helm charts? (OCI registry? Git?)
5. **Operator HA:** Should operator run in HA mode (multiple replicas)? (Recommendation: Yes, with leader election)
## 14.1 Technical Questions

### 13.2 Configuration Questions
1. **Resolved**: OpenTofu via `pkg/tofu` (`terraform-exec` wrapper) is used by the AWS provider. Other providers do not use OpenTofu (Hetzner uses `hetzner-k3s` directly; local uses Kind; existing is a no-op). See [ADR-0004](../../adr/0004-out-of-tree-provider-plugins.md) for the proposed out-of-tree plugin direction that formalizes this.
2. **Multi-Cluster**: How to manage multiple clusters from one NIC invocation? Today: one cluster per `nic deploy` invocation. Still open.
3. **Custom Kubernetes Distributions**: Support for k0s, k3d, RKE2? Today: Kind for local, k3s for Hetzner, EKS for AWS. RKE2/k0s remain open.
4. **Helm Chart Storage**: Foundational charts live in `pkg/argocd/templates/apps/` as ArgoCD `Application` manifests that reference upstream Helm repositories. OCI mirroring for offline installs is still open.
5. **Operator HA**: Should the Nebari Operator run HA with leader election? Owned upstream at [`nebari-dev/nebari-operator`](https://github.com/nebari-dev/nebari-operator).

6. **Config Validation:** Schema validation via JSON Schema or custom Go validation? (Recommendation: Custom Go + JSON Schema for IDE support)
7. **Config Inheritance:** Support for base + overlay configs? (Recommendation: No for MVP, Yes in future versions via `extends` field)
8. **Secrets Management:** How to handle secrets in config (Keycloak admin password, etc.)? (Options: external secrets operator, sealed secrets, cloud secrets manager)
## 14.2 Configuration Questions

### 13.3 Deployment Questions
6. **Config Validation**: Today: custom Go validation in `pkg/config/config.go` (`NebariConfig.Validate`). JSON Schema export for IDE support remains open.
7. **Config Inheritance** (`extends`): Not implemented. See [`15-future-enhancements.md`](15-future-enhancements.md).
8. **Secrets Management**: **Resolved for MVP**: env vars via `.env` (loaded by `godotenv` in `cmd/nic/main.go`). Git auth uses env-var indirection (`ssh_key_env` / `token_env`). External Secrets Operator / Sealed Secrets / cloud secrets managers remain open as longer-term options.

9. **Rollback Strategy:** Should `nic rollback` be a command? (Recommendation: Yes, Phase 2)
10. **Blue/Green Deployments:** Support for blue/green cluster deployments? (Recommendation: Future)
11. **Canary Deployments:** For foundational software updates? (Recommendation: Future)
## 14.3 Deployment Questions

### 13.4 Integration Questions
9. **Rollback Strategy**: Should `nic rollback` exist? Still open. Today: re-apply a previous config.
10. **Blue/Green Cluster Deployments**: Future.
11. **Canary Deployments for foundational software updates**: Future (depends on ArgoCD's own progressive sync features).

12. **CI/CD Integration:** Should NIC provide GitHub Actions / GitLab CI templates? (Recommendation: Yes, Phase 2)
13. **Monitoring Integration:** Should NIC phone home telemetry (opt-in)? (Recommendation: Phase 2, opt-in only)
14. **Marketplace Integration:** Package as AWS Marketplace / GCP Marketplace offering? (Recommendation: Future)
## 14.4 Integration Questions

### 13.5 Platform Automation Questions
12. **CI/CD Templates**: Should NIC ship GitHub Actions / GitLab CI templates? Still open; the `git_repository:` consumption side is shipped, but template generation is not.
13. **Phone-Home Telemetry**: Should NIC emit opt-in usage telemetry? Still open.
14. **Marketplace Integration**: AWS/GCP Marketplace listings? Future.

15. **Git Repository Provisioning:** Should NIC automatically provision Git repositories and set up CI/CD workflows for infrastructure changes?
## 14.5 Platform Automation Questions

- **Use Case:** `nic init` creates GitHub repo, adds config.yaml, sets up GitHub Actions/GitLab CI for automated infrastructure updates
- **Providers:** GitHub, GitLab, Gitea (self-hosted)
- **Features:** Branch protection, PR-based workflow, automated validation, auto-apply on merge
- **Recommendation:** Phase 2, start with GitHub integration
15. **Git Repository Provisioning**: NIC **consumes** an existing GitOps repo today (`pkg/git`, `git_repository:` config). The **provisioning** side (auto-create the repo on GitHub/GitLab/Gitea, configure protections, etc.) is still open. See `15-future-enhancements.md` §2.

16. **CI/CD Workflow Generation:** Should NIC auto-generate and manage CI/CD pipelines for infrastructure automation?
- **Workflows:**
- PR validation: `nic validate` + `nic plan` on every PR
- Auto-deploy: `nic deploy` on merge to main (with approval gates)
- Scheduled drift detection: Daily `nic status` to detect manual changes
- Automated testing: Integration tests before deployment
- **Customization:** Template-based with user overrides
- **Recommendation:** Phase 2, essential for GitOps workflow
16. **CI/CD Workflow Generation**: Auto-generate validation/deploy/drift workflows. Still open.

### 13.6 Application Stack Questions
## 14.6 Application Stack Questions

17. **Software Stack Specification:** Should NIC support declarative specifications for complete software stacks (databases, message queues, caching, etc.) deployable on top of foundational software?
17. **Software Stack Specification**: Declarative specs for full platform stacks (databases, queues, apps). Still open. Today: user packs install themselves via ArgoCD using `NebariApp` CRs from the upstream operator.
18. **Full Stack in One Repo**: Still open. The GitOps repo layout is owned by NIC for the foundational set today; users overlay their own applications.
19. **Stack Templates & Marketplace**: Still open. The "Software Pack" concept exists in the broader Nebari ecosystem; a curated marketplace is future work.

- **Use Case:** Define entire platform + applications in single config.yaml
- **Example Stacks:**
- Data Science: PostgreSQL + Redis + MinIO + JupyterHub + Dask
- ML Platform: MLflow + Kubeflow + Model Registry + Feature Store
- Web Platform: PostgreSQL + Redis + RabbitMQ + Object Storage
- **Integration:** Via Helm chart repositories, ArgoCD ApplicationSets
- **Recommendation:** Phase 2, using Helm chart catalogs and pre-defined stack templates
## 14.7 Provider Plugin Architecture

18. **Full Stack in One Repo:** Should users be able to define foundational software + application stacks + configuration in a single repository?

- **Structure:**

```
nebari-deployment/
├── config.yaml # Platform + stacks
├── stacks/
│ ├── postgresql-values.yaml # DB config
│ ├── jupyterhub-values.yaml # App config
│ └── dask-values.yaml # Compute config
├── policies/ # OPA policies
└── .github/workflows/ # Auto-generated CI/CD
```

- **Benefits:** Single source of truth, version controlled, auditable, reproducible
- **Recommendation:** Phase 2, core feature for platform teams

19. **Stack Templates & Marketplace:** Should NIC provide pre-built stack templates (data science, ML, web app) and a marketplace for community stacks?
- **Built-in Templates:**
- nebari-data-science-stack
- nebari-ml-platform-stack
- nebari-web-platform-stack
- **Community Marketplace:** GitHub-based registry of vetted stack configurations
- **Recommendation:** Phase 2 for templates, Future for marketplace

---
20. **Out-of-Tree Provider Plugins** ([ADR-0004](../../adr/0004-out-of-tree-provider-plugins.md), Proposed): Open questions from the ADR include scope of plugin kinds, relationship to Nebari stages, credential model, validation without install, trust/signing, and migration of existing in-tree providers. These are tracked in the ADR rather than duplicated here.
7 changes: 7 additions & 0 deletions docs/design-doc/appendix/15-future-enhancements.md
@@ -1,5 +1,12 @@
# Future Enhancements

> **Status**: this document describes future/aspirational features. Config snippets here use a hypothetical schema and **do not** match the current NIC config format. See [`16-configuration-reference.md`](16-configuration-reference.md) for the current schema (`cluster.<provider>:` / `dns.<provider>:` discriminator pattern; no top-level `provider:` field; no `version:`, `kubernetes:`, `node_pools:`, `tls:`, `foundational_software:`, `images:`, or `features:` blocks). CLI commands like `nic plan`, `nic status`, `nic state`, `nic unlock`, `nic init`, `nic stack`, `nic marketplace` do not exist today; only `deploy`, `destroy`, `validate`, `kubeconfig`, and `version` are implemented.
>
> Some items below have shipped in part:
>
> - **§2 Git Repository Provisioning & CI/CD**: the **consumption** side is done (`pkg/git`, `git_repository:` config block, env-var auth, `file://` local repos). The **provisioning** side (`nic init` creating a new repo, auto-generated workflows) is still future work.
> - **Secrets management** via `.env` + env-var indirection is shipped for MVP (see [`14-open-questions.md`](14-open-questions.md) §14.2).

This document provides detailed specifications for future enhancements planned for NIC.
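
As a point of contrast with the hypothetical snippets in this document, the `cluster.<provider>:` discriminator pattern mentioned in the status note above could be checked along these lines. This is only a sketch under stated assumptions: the `selectProvider` function and the provider key spellings (`aws`, `hetzner`, `local`, `existing`) are illustrative guesses based on the provider names used in the design doc, not the real `NebariConfig.Validate` logic.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// knownProviders lists the assumed provider keys allowed under `cluster:`.
var knownProviders = map[string]bool{
	"aws":      true,
	"hetzner":  true,
	"local":    true,
	"existing": true,
}

// selectProvider inspects a parsed `cluster:` block and returns the single
// provider key it contains. With the discriminator pattern there is no
// top-level `provider:` field; the key that is present selects the provider.
func selectProvider(block map[string]any) (string, error) {
	var found []string
	for key := range block {
		if !knownProviders[key] {
			return "", fmt.Errorf("unknown provider key %q", key)
		}
		found = append(found, key)
	}
	sort.Strings(found) // stable error messages despite random map order

	switch len(found) {
	case 0:
		return "", errors.New("cluster block names no provider")
	case 1:
		return found[0], nil
	default:
		return "", fmt.Errorf("expected exactly one provider, got %v", found)
	}
}

func main() {
	// Equivalent to a config containing `cluster: { hetzner: { ... } }`.
	block := map[string]any{"hetzner": map[string]any{}}

	provider, err := selectProvider(block)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("selected provider:", provider)
}
```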

## 1. Configuration Overlays for Multi-Environment Support