# k8s-monitoring

**Repository Path**: ceagle/k8s-monitoring

## Basic Information

- **Project Name**: k8s-monitoring
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-13
- **Last Updated**: 2026-02-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

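# Kubernetes Monitoring System Documentation

Welcome to the Kubernetes monitoring system documentation. It covers the system architecture, a deployment guide, a configuration reference, and troubleshooting information.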

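## 📚 Documentation Index

### Core Documents
- [01-🏗️ System Architecture](./docs/01-architecture.md) - System design and component relationships
- [02-🚀 Deployment Guide](./docs/02-deployment.md) - Full deployment procedure
- [03-⚙️ Configuration Guide](./docs/03-configuration.md) - System configuration in detail
- [04-💾 NFS Storage Configuration](./docs/04-nfs-storage-guide.md) - Detailed storage configuration
- [05-🔒 Private Image Registry Configuration](./docs/05-private-registry-configuration.md) - Image registry configuration
- [06-📖 API Reference](./docs/06-api-reference.md) - API documentation
- [07-🔄 Deployment Environments and Script Workflow](./docs/07-deployment-workflow.md) - Script execution order and environment deployment
- [🔧 Troubleshooting](./docs/troubleshooting/README.md) - Problem-solving guide

### Topic Guides
- [01-📥 crictl Image Pull Issues](./docs/troubleshooting/01-crictl-image-pull.md) - Troubleshooting image pull failures
- [02-🖼️ Grafana Dashboards Show No Data](./docs/troubleshooting/02-grafana-dashboard-no-data.md) - Resolving dashboard problems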
## 🔧 System Overview

This monitoring system is built on the Prometheus ecosystem and provides a complete monitoring solution for Kubernetes clusters.

### Core Components

| Component | Role | Deployment type | Storage |
|-----------|------|-----------------|---------|
| **Prometheus** | Time-series database | StatefulSet | 20Gi NFS |
| **Grafana** | Data visualization | StatefulSet | 5Gi NFS |
| **Alertmanager** | Alert management | StatefulSet | 2Gi NFS |
| **Node Exporter** | Node metrics | DaemonSet | - |
| **cAdvisor** | Container metrics | DaemonSet | - |
| **Kube State Metrics** | Kubernetes object state | Deployment | - |
| **Pushgateway** | Pushed metrics | Deployment | 1Gi NFS |

### Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                   Monitoring architecture                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐                 │
│  │Prometheus│   │Alertmgr  │   │ Grafana  │                 │
│  │ (storage)│   │ (alerts) │   │ (visuals)│                 │
│  └────┬─────┘   └──────────┘   └──────────┘                 │
│       │                                                     │
│  ┌────┴────┬──────────┬──────────┬──────────┐               │
│  │         │          │          │          │               │
│  ▼         ▼          ▼          ▼          ▼               │
│ ┌────┐  ┌────┐    ┌────┐    ┌────┐    ┌────┐                │
│ │Node│  │Kube│    │cAdv│    │Push│    │... │                │
│ │Exp │  │State│   │isor│    │GW  │    │    │                │
│ └────┘  └────┘    └────┘    └────┘    └────┘                │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                  NFS storage backend                  │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

## 🚀 Quick Start

### 1. Prerequisites
- Kubernetes cluster v1.20+
- kubectl client
- NFS server (NFS_IP)
- Harbor (HARBOR_IP:8082)
- At least 30Gi of storage
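
The version and storage requirements above can be checked before deploying. The following is a minimal sketch, not part of the repository's scripts: `version_ge` and `sum_gi` are hypothetical helper names, and `version_ge` assumes a `sort` that supports `-V` (GNU coreutils).

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; they are not shipped with this repository.

# version_ge A B: succeed if version A >= version B (a leading "v" is ignored).
# Assumes a sort implementation with -V (version sort), e.g. GNU coreutils.
version_ge() {
  local a="${1#v}" b="${2#v}"
  # If B is the smaller (or equal) of the two, then A >= B.
  [ "$(printf '%s\n%s\n' "$b" "$a" | sort -V | head -n 1)" = "$b" ]
}

# sum_gi SIZE...: add up sizes written as <n>Gi and print the total.
sum_gi() {
  local total=0 size
  for size in "$@"; do
    total=$(( total + ${size%Gi} ))
  done
  printf '%dGi\n' "$total"
}
```

For example, `version_ge v1.22.3 v1.20` succeeds, and `sum_gi 20Gi 5Gi 2Gi 1Gi` adds up the NFS volumes listed in the component table, which fit within the 30Gi requirement. On a live cluster the server version could be read with `kubectl version -o json` and a JSON tool such as `jq` (an extra dependency not mentioned in this README).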

### 2. One-Command Deployment
```bash
# Clone the repository
git clone https://gitee.com/ceagle/k8s-monitoring.git
cd k8s-monitoring

# Configure NFS (if needed)
# Edit base/storage/nfs/nfs-config.yaml

# Deploy the monitoring stack
./scripts/deploy-monitoring.sh deploy

# Verify the deployment
./scripts/deploy-monitoring.sh status
```

### 3. Accessing the Services

| Service | Access method | Address | Credentials |
|---------|---------------|---------|-------------|
| Grafana | NodePort | http://<node-IP>:31064 | admin/admin |
| Prometheus | Port forwarding | `kubectl port-forward svc/prometheus 9090:9090` | - |
| Alertmanager | Port forwarding | `kubectl port-forward svc/alertmanager 9093:9093` | - |
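
Once the services are reachable, a quick HTTP probe can confirm they respond. This is a rough sketch under stated assumptions: `grafana_url` and `check_http` are hypothetical helpers, and `10.0.0.5` below is a placeholder node IP. The NodePort `31064` is taken from the table above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the NodePort value comes from this README.
GRAFANA_NODEPORT=31064

# grafana_url NODE_IP: print the Grafana address for the given node.
grafana_url() {
  printf 'http://%s:%s\n' "$1" "${GRAFANA_NODEPORT}"
}

# check_http URL: succeed if the URL answers with a non-error status
# within 5 seconds (curl --fail turns HTTP errors into a non-zero exit).
check_http() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}
```

A possible usage is `check_http "$(grafana_url 10.0.0.5)/api/health"` for Grafana, and, with the port-forwards from the table running, `check_http http://localhost:9090/-/healthy` for Prometheus or `check_http http://localhost:9093/-/healthy` for Alertmanager. The README does not state which namespace the services live in; if it is not the current one, the `kubectl port-forward` commands need an additional `-n <namespace>`.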

## 📋 Deployment Verification Checklist

- [ ] All Pods are in the Running state
- [ ] All Services have been created
- [ ] PVCs are bound to PVs
- [ ] Grafana is reachable
- [ ] Prometheus can query metrics
- [ ] Dashboards display data
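
The first and third checklist items can be automated by filtering `kubectl` output. The sketch below assumes the default column layout of `kubectl get pods` and `kubectl get pvc` without `-A`; `not_running` and `unbound_pvcs` are hypothetical helper names, and the `monitoring` namespace in the usage note is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical checklist helpers; both read kubectl output from stdin.

# not_running: given `kubectl get pods --no-headers` output, print the names
# of Pods whose STATUS (third column) is not "Running".
not_running() {
  awk '$3 != "Running" { print $1 }'
}

# unbound_pvcs: given `kubectl get pvc --no-headers` output, print the names
# of claims whose STATUS (second column) is not "Bound".
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 }'
}
```

For instance, `kubectl get pods -n monitoring --no-headers | not_running` prints nothing when every Pod is Running. Pods from finished Jobs report `Completed` and would be listed too, so the output is a prompt for inspection rather than a strict pass/fail signal.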

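## 🎯 Use Cases

### Cluster Resource Monitoring
Monitor overall resource usage across the cluster:
- CPU, memory, and disk utilization
- Node health
- Pod resource consumption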
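
These figures can also be queried directly from the Prometheus HTTP API. The following sketch assumes Prometheus is reachable at `http://localhost:9090` (for example through the port-forward from the Quick Start) and that Node Exporter metrics are being scraped; `prom_query` and the `CURL` override are illustrative, and the expressions are common examples rather than the queries used by this project's dashboards.

```bash
#!/usr/bin/env bash
# Assumed endpoint; adjust PROM_URL to match your environment.
PROM_URL="${PROM_URL:-http://localhost:9090}"
# CURL can be overridden (e.g. CURL=echo) to print the request instead.
CURL="${CURL:-curl}"

# prom_query EXPR: run an instant query through /api/v1/query.
prom_query() {
  "$CURL" --silent --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example expressions over Node Exporter metrics (percent used per instance).
CPU_USAGE='100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100'
MEM_USAGE='(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100'
DISK_USAGE='(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100'
```

Calling `prom_query "$CPU_USAGE"` returns a JSON document with one sample per node; the same expressions can be pasted into the Prometheus UI or a Grafana panel.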

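### Application Performance Monitoring
Monitor application performance metrics:
- Request latency and throughput
- Error rates
- Custom business metrics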
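
One way to report a custom business metric from a batch job or script is to push it to the Pushgateway listed among the core components. This is a minimal sketch: the address `http://localhost:9091` (the Pushgateway default port), the helper name `push_metric`, and the job and metric names in the usage note are all assumptions, since this README does not document how the Pushgateway is exposed.

```bash
#!/usr/bin/env bash
# Assumed endpoint; adjust PUSHGATEWAY_URL to match your environment.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://localhost:9091}"
# CURL can be overridden (e.g. CURL=echo) to print the request instead.
CURL="${CURL:-curl}"

# push_metric JOB NAME VALUE: push a single untyped sample for the given job.
# The body is one line in the Prometheus text format; the here-string adds
# the trailing newline that the format requires.
push_metric() {
  local job="$1" name="$2" value="$3"
  "$CURL" --silent --data-binary @- \
    "${PUSHGATEWAY_URL}/metrics/job/${job}" <<< "${name} ${value}"
}
```

For example, `push_metric nightly_backup backup_duration_seconds 42` makes the sample available for Prometheus to scrape from the Pushgateway. Long-running services are usually better served by exposing their own `/metrics` endpoint; pushing suits short-lived jobs.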

### Alert Notifications
Configure alerting rules:
- Resource threshold alerts
- Service availability alerts
- Multi-channel notifications (email, Slack, PagerDuty)
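
A resource threshold alert might look like the rule generated below. This is only a sketch: the helper `write_cpu_rule`, the group and alert names, the 80% default threshold, and the 10-minute duration are illustrative and do not reflect the rules shipped with this project.

```bash
#!/usr/bin/env bash
# write_cpu_rule FILE [THRESHOLD]: write a sample Prometheus alerting rule
# that fires when a node's CPU usage stays above THRESHOLD percent.
write_cpu_rule() {
  local file="$1" threshold="${2:-80}"
  # Unquoted heredoc: ${threshold} is expanded, \$labels stays literal.
  cat > "$file" <<EOF
groups:
  - name: node-resources
    rules:
      - alert: HighNodeCPU
        expr: 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > ${threshold}
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above ${threshold}% on {{ \$labels.instance }}"
EOF
}
```

Running `write_cpu_rule high-cpu.yaml 90` produces a rule file that can be validated with `promtool check rules high-cpu.yaml` before it is loaded into Prometheus. Routing the resulting alert to email, Slack, or PagerDuty is configured separately in Alertmanager.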

## 🤝 Contributing

Issues and Pull Requests are welcome!

### Development Workflow
1. Fork this repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License.

## 📞 Support

- 🐛 [Issues](https://gitee.com/ceagle/k8s-monitoring/issues)
- 💬 Slack: #k8s-monitoring
- 📧 Email: ceagleme@qq.com


---
**Note**: This documentation is updated along with the project, so check back periodically for the latest version.