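How do you implement chunked upload and resumable upload for large files?
🎯 Interview question: How do you upload large files? How does resumable upload work?
I. Pain points
Problems with a plain single-request upload:
❌ The file is too large (several GB), so the HTTP request times out and aborts
❌ After a network glitch the upload restarts from zero, and the part already sent is lost
❌ High memory pressure on the server
❌ No way to speed things up with concurrency
❌ No incremental updates
Solutions:
✅ Chunked upload: split the file into small chunks and upload each one independently
✅ Resumable upload: record which chunks have been uploaded and continue from the break point after an interruption
✅ Instant upload: if the server already has an identical file, return success immediately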
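To make "split the file into small chunks" concrete, here is a minimal sketch of the byte-range arithmetic. The function name `planChunks` and the 2 MB default are assumptions for illustration; the `end` offset is exclusive, matching how `Blob.slice(start, end)` works in browsers.

```javascript
// Hypothetical helper: split a file of `fileSize` bytes into fixed-size byte ranges.
function planChunks(fileSize, chunkSize = 2 * 1024 * 1024) {
  if (!Number.isInteger(fileSize) || fileSize < 0) {
    throw new RangeError('fileSize must be a non-negative integer');
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const totalChunks = Math.ceil(fileSize / chunkSize);
  const chunks = [];
  for (let index = 0; index < totalChunks; index++) {
    const start = index * chunkSize;
    // The last chunk is usually shorter than chunkSize.
    const end = Math.min(start + chunkSize, fileSize);
    chunks.push({ index, start, end });
  }
  return chunks;
}

// A 5 MB file with 2 MB chunks yields three ranges; the last one covers only 1 MB.
console.log(planChunks(5 * 1024 * 1024));
```

Because each range depends only on its index, any chunk can be re-sent on its own, which is what makes both concurrency and resuming possible.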
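II. Overall architecture
┌────────────────────────────────────────────────────────────────┐
│                   Chunked upload: full flow                    │
│                                                                │
│  Frontend: compute the file MD5                                │
│    ↓                                                           │
│  Query the instant-upload API → exists? → instant upload ✅    │
│    ↓ not found                                                 │
│  Get an uploadId (globally unique upload session)              │
│    ↓                                                           │
│  Frontend: 2 MB chunks, uploaded concurrently (e.g. 3 lanes)   │
│    ↓                                                           │
│  Each chunk is its own request: POST /upload/chunk             │
│    ↓                                                           │
│  Server: store the chunk file, record the uploaded-chunk list  │
│    ↓                                                           │
│  All chunks uploaded → request the merge                       │
│    ↓                                                           │
│  Server: merge the chunks in index order into the full file    │
│    ↓                                                           │
│  Verify the MD5; the upload succeeds if it matches             │
└────────────────────────────────────────────────────────────────┘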
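III. Server-side implementation
1. Chunk upload endpoints
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.TimeUnit;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;
// Result, the request/VO classes, and helpers such as validateFile are project code not shown here.
@RestController
@Slf4j
public class FileUploadController {
    @Autowired
    private RedisTemplate<String, String> redisTemplate;
    @Autowired
    private FileStorageService fileStorageService;
    private static final String CHUNKS_DIR = "/data/upload-chunks/";
    private static final long MAX_CHUNK_SIZE = 5 * 1024 * 1024L; // 5 MB
    /**
     * Step 1: initialize an upload session.
     * Returns an uploadId that every later request must carry.
     */
    @PostMapping("/upload/init")
    public Result<UploadInitVO> initUpload(@RequestBody UploadInitRequest req) {
        // 1. Validate file size and type
        validateFile(req.getFileName(), req.getFileSize());
        // 2. Generate a globally unique uploadId
        String uploadId = UUID.randomUUID().toString().replace("-", "");
        // 3. Record the upload session (Redis, expires after 24 hours)
        String sessionKey = "upload:session:" + uploadId;
        Map<String, String> session = new HashMap<>();
        session.put("fileName", req.getFileName());
        session.put("fileSize", String.valueOf(req.getFileSize()));
        session.put("md5", req.getFileMd5());
        session.put("totalChunks", String.valueOf(req.getTotalChunks()));
        session.put("createTime", String.valueOf(System.currentTimeMillis()));
        redisTemplate.opsForHash().putAll(sessionKey, session);
        redisTemplate.expire(sessionKey, 24, TimeUnit.HOURS);
        // 4. Create the chunk directory; fail fast instead of silently ignoring the error
        Path chunkDir = Paths.get(CHUNKS_DIR, uploadId);
        try {
            Files.createDirectories(chunkDir);
        } catch (IOException e) {
            redisTemplate.delete(sessionKey);
            throw new UncheckedIOException("Failed to create chunk directory " + chunkDir, e);
        }
        log.info("[Upload] Init: uploadId={}, file={}, size={}, chunks={}",
                uploadId, req.getFileName(), req.getFileSize(), req.getTotalChunks());
        return Result.success(new UploadInitVO(uploadId, MAX_CHUNK_SIZE));
    }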
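    /**
     * Step 2: instant-upload check (an MD5 hit succeeds immediately)
     */
    @PostMapping("/upload/check")
    public Result<CheckVO> checkUpload(@RequestBody CheckRequest req) {
        String md5 = req.getMd5();
        String filePath = fileStorageService.getFilePathByMd5(md5);
        if (filePath != null) {
            // Instant upload: the file is already stored
            return Result.success(new CheckVO(true, null, "Instant upload succeeded"));
        }
        // Otherwise return the active uploadId and its uploaded chunks (for resuming).
        // findActiveUploadId and getUploadedChunks are helpers not shown here.
        String uploadId = findActiveUploadId(md5, req.getFileName());
        // No active session yet: report an empty set instead of querying with a null id
        Set<Integer> uploadedChunks = uploadId == null
                ? Collections.emptySet()
                : getUploadedChunks(uploadId);
        return Result.success(new CheckVO(false, uploadId, uploadedChunks));
    }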
    /**
     * Step 3: upload a single chunk
     */
    @PostMapping("/upload/chunk")
    public Result uploadChunk(
            @RequestParam("file") MultipartFile chunk,
            @RequestParam("uploadId") String uploadId,
            @RequestParam("chunkIndex") int chunkIndex,
            @RequestParam("totalChunks") int totalChunks // sent by the client, but not trusted below
    ) {
        // 1. Validate the upload session
        String sessionKey = "upload:session:" + uploadId;
        if (!Boolean.TRUE.equals(redisTemplate.hasKey(sessionKey))) {
            return Result.error("Upload session does not exist or has expired");
        }
        // 2. Validate the chunk index and size against the session, not the client-supplied total
        int total = Integer.parseInt(
                (String) redisTemplate.opsForHash().get(sessionKey, "totalChunks"));
        if (chunkIndex < 0 || chunkIndex >= total) {
            return Result.error("Invalid chunk index");
        }
        if (chunk.getSize() > MAX_CHUNK_SIZE) {
            return Result.error("Chunk exceeds the maximum size");
        }
        // 3. Write the chunk file
        Path chunkPath = Paths.get(CHUNKS_DIR, uploadId, "chunk_" + chunkIndex);
        try {
            Files.createDirectories(chunkPath.getParent());
            chunk.transferTo(chunkPath.toFile());
        } catch (IOException e) {
            log.error("[Upload] Failed to save chunk: {}", chunkPath, e);
            return Result.error("Failed to save the chunk");
        }
        // 4. Mark the chunk as uploaded (a Set keeps re-uploads of the same chunk idempotent)
        String chunksKey = "upload:" + uploadId + ":chunks";
        redisTemplate.opsForSet().add(chunksKey, String.valueOf(chunkIndex));
        redisTemplate.expire(chunksKey, 24, TimeUnit.HOURS);
        // 5. Report progress. Merging is left to the explicit /upload/merge call,
        //    so it cannot be triggered twice by concurrent requests for the last chunks.
        Long size = redisTemplate.opsForSet().size(chunksKey);
        long uploadedCount = size == null ? 0L : size;
        log.info("[Upload] Chunk uploaded: uploadId={}, chunk={}/{}",
                uploadId, chunkIndex + 1, total);
        return Result.success(Map.of(
                "uploaded", uploadedCount,
                "total", total,
                "progress", String.format("%.1f%%", uploadedCount * 100.0 / total)
        ));
    }
    /**
     * Step 4: merge the chunks
     */
    @PostMapping("/upload/merge")
    public Result mergeUpload(@RequestParam String uploadId) {
        String sessionKey = "upload:session:" + uploadId;
        Map<Object, Object> session = redisTemplate.opsForHash().entries(sessionKey);
        if (session.isEmpty()) {
            return Result.error("Upload session does not exist");
        }
        int totalChunks = Integer.parseInt((String) session.get("totalChunks"));
        String fileMd5 = (String) session.get("md5");
        String fileName = (String) session.get("fileName");
        // Check that every chunk is present (members() may return null)
        Set<String> chunks = redisTemplate.opsForSet()
                .members("upload:" + uploadId + ":chunks");
        int uploaded = chunks == null ? 0 : chunks.size();
        if (uploaded != totalChunks) {
            return Result.error("Chunks incomplete: uploaded " + uploaded + "/" + totalChunks);
        }
        // Run the merge
        String finalPath = fileStorageService.mergeChunks(
                CHUNKS_DIR + uploadId,
                totalChunks,
                fileName,
                fileMd5
        );
        // Clean up temporary files (cleanup is a helper not shown here)
        cleanup(uploadId);
        return Result.success(Map.of("filePath", finalPath, "md5", fileMd5));
    }
}
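2. Chunk merge logic
import java.io.*;
import java.nio.file.*;
import java.util.Date;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.codec.digest.DigestUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;
@Service
@Slf4j
public class FileStorageService {
    @Autowired
    private RedisTemplate<String, String> redisTemplate;
    @Autowired
    private FileMapper fileMapper; // persistence mapper, assumed to be defined elsewhere
    /**
     * Merge the chunk files in index order
     */
    public String mergeChunks(String chunkDir, int totalChunks,
                              String fileName, String md5) {
        // 1. Target path; keep only the base name so a crafted fileName cannot escape the directory
        String safeName = new File(fileName).getName();
        String filePath = "/data/uploads/" + md5 + "_" + safeName;
        File outFile = new File(filePath);
        outFile.getParentFile().mkdirs();
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream(outFile), 8 * 1024 * 1024)) { // 8 MB buffer
            for (int i = 0; i < totalChunks; i++) {
                Path chunkPath = Paths.get(chunkDir, "chunk_" + i);
                if (!Files.exists(chunkPath)) {
                    throw new IllegalStateException("Missing chunk: " + i);
                }
                // Chunks are kept until the MD5 check passes; the caller cleans them up afterwards
                Files.copy(chunkPath, out);
            }
        } catch (IOException e) {
            throw new RuntimeException("Merge failed: " + e.getMessage(), e);
        }
        // 2. Verify the MD5 (the stream is closed once hashing finishes)
        String computedMd5;
        try (InputStream in = new FileInputStream(outFile)) {
            computedMd5 = DigestUtils.md5Hex(in);
        } catch (IOException e) {
            throw new RuntimeException("MD5 computation failed: " + e.getMessage(), e);
        }
        if (!computedMd5.equalsIgnoreCase(md5)) {
            outFile.delete();
            throw new IllegalStateException(
                    "MD5 mismatch: the file was corrupted or tampered with during upload");
        }
        // 3. Register the file (MD5 → file path mapping)
        String key = "file:md5:" + md5;
        redisTemplate.opsForValue().set(key, filePath);
        // Also write it to the database
        fileMapper.insert(new FileDO(md5, filePath, fileName, new Date()));
        log.info("[FileStorage] Merged: {} -> {}", chunkDir, filePath);
        return filePath;
    }
    public String getFilePathByMd5(String md5) {
        return redisTemplate.opsForValue().get("file:md5:" + md5);
    }
}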
IV. Frontend implementation
// Core chunked-upload logic (frontend).
// SparkMD5 comes from the spark-md5 library; `api` is assumed to be an axios-style HTTP client.
class ChunkUploader {
  constructor(file, options = {}) {
    this.file = file;
    this.chunkSize = options.chunkSize || 2 * 1024 * 1024; // 2 MB
    this.concurrency = options.concurrency || 3;           // number of concurrent requests
    this.uploadedChunks = new Set();                       // indexes of uploaded chunks
    this.uploadId = null;
  }
  // Compute the MD5 of the whole file (used for instant upload)
  async calcFileMD5() {
    const spark = new SparkMD5.ArrayBuffer();
    const reader = new FileReader();
    const step = 10 * 1024 * 1024; // read 10 MB at a time
    return new Promise((resolve, reject) => {
      let offset = 0;
      const loadNext = () => {
        reader.readAsArrayBuffer(this.file.slice(offset, offset + step));
      };
      // Register the handlers once, before the first read starts
      reader.onload = e => {
        spark.append(e.target.result);
        offset += step;
        if (offset < this.file.size) loadNext();
        else resolve(spark.end());
      };
      reader.onerror = reject;
      loadNext();
    });
  }
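  // Instant-upload check
  async checkInstant() {
    const md5 = await this.calcFileMD5();
    this.fileMd5 = md5; // cache it: hashing a multi-GB file again is expensive
    const res = await api.post('/upload/check', {
      md5,
      fileName: this.file.name,
      fileSize: this.file.size
    });
    return { md5, ...res.data };
  }
  // Fetch the chunks the server already has (for resuming).
  // The check endpoint looks the session up by MD5, so uploadId is not sent.
  async getUploadedChunks(uploadId) {
    const res = await api.post('/upload/check', {
      md5: this.fileMd5 || await this.calcFileMD5(),
      fileName: this.file.name,
      fileSize: this.file.size
    });
    return res.data.uploadedChunks || [];
  }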
  // Upload a single chunk
  async uploadChunk(index, uploadId) {
    const start = index * this.chunkSize;
    const end = Math.min(start + this.chunkSize, this.file.size);
    const chunk = this.file.slice(start, end);
    const formData = new FormData();
    formData.append('file', chunk);
    formData.append('uploadId', uploadId);
    formData.append('chunkIndex', index);
    formData.append('totalChunks', this.totalChunks);
    // Do not set Content-Type by hand: the HTTP client must generate it
    // so that the multipart boundary is included.
    await api.post('/upload/chunk', formData);
  }
  // Upload everything with bounded concurrency
  async uploadAll() {
    const md5Result = await this.checkInstant();
    if (md5Result.exists) {
      console.log('Instant upload succeeded');
      return { success: true, instant: true };
    }
    // Initialize the upload session
    this.totalChunks = Math.ceil(this.file.size / this.chunkSize);
    this.uploadId = md5Result.uploadId;
    if (!this.uploadId) {
      const initRes = await api.post('/upload/init', {
        fileName: this.file.name,
        fileSize: this.file.size,
        fileMd5: md5Result.md5,
        totalChunks: this.totalChunks
      });
      this.uploadId = initRes.data.uploadId;
    }
    // Resume: the check response already lists the chunks the server has,
    // so no second request (and no second MD5 pass) is needed
    (md5Result.uploadedChunks || []).forEach(i => this.uploadedChunks.add(Number(i)));
    // Queue every chunk that is still missing
    const tasks = [];
    for (let i = 0; i < this.totalChunks; i++) {
      if (!this.uploadedChunks.has(i)) {
        tasks.push(i);
      }
    }
    // Bounded concurrency with a retry cap per chunk
    const maxRetries = 3;
    const attempts = new Map();
    let running = 0;
    let failed = false;
    return new Promise((resolve, reject) => {
      const run = () => {
        if (failed) return;
        while (running < this.concurrency && tasks.length > 0) {
          const index = tasks.shift();
          running++;
          this.uploadChunk(index, this.uploadId)
            .then(() => {
              this.uploadedChunks.add(index);
              this.onProgress(this.uploadedChunks.size, this.totalChunks);
            })
            .catch(err => {
              const n = (attempts.get(index) || 0) + 1;
              attempts.set(index, n);
              if (n <= maxRetries) {
                tasks.push(index); // retry the failed chunk
              } else {
                failed = true;     // give up instead of retrying forever
                reject(err);
              }
            })
            .finally(() => { running--; run(); });
        }
        if (running === 0 && tasks.length === 0) {
          // Everything is uploaded: ask the server to merge and wait for the result
          this.notifyMerge().then(() => resolve({ success: true }), reject);
        }
      };
      run();
    });
  }
  async notifyMerge() {
    // Sent as a form parameter, because the server reads uploadId with @RequestParam
    await api.post('/upload/merge', new URLSearchParams({ uploadId: this.uploadId }));
  }
  onProgress(uploaded, total) {
    const pct = ((uploaded / total) * 100).toFixed(1);
    document.getElementById('progress').textContent = `${pct}%`;
  }
}
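The concurrency control above is tied to the uploader class. As a rough, standalone sketch of the same idea, the function below runs a list of jobs with a fixed number of lanes and a retry cap. The name `runPool` and its options are hypothetical, and `worker` stands in for something like `index => uploader.uploadChunk(index, uploadId)`.

```javascript
// Hypothetical helper: run `worker(item)` for every item with at most `concurrency`
// calls in flight. A failed item is re-queued up to `maxRetries` times.
// Resolves with the list of items that still failed after all retries.
async function runPool(items, worker, { concurrency = 3, maxRetries = 3 } = {}) {
  const queue = items.map(item => ({ item, attempts: 0 }));
  const failed = [];

  // Each lane pulls jobs from the shared queue until it is empty.
  async function lane() {
    while (queue.length > 0) {
      const job = queue.shift();
      try {
        await worker(job.item);
      } catch (error) {
        job.attempts++;
        if (job.attempts <= maxRetries) queue.push(job); // retry later
        else failed.push({ item: job.item, error });     // give up on this item
      }
    }
  }

  const laneCount = Math.min(concurrency, queue.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return failed;
}
```

A caller would typically request the merge only when the returned list is empty, and otherwise show the failed chunk indexes to the user.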
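V. How resumable upload works
Key points:
1. The server records the list of uploaded chunks for each uploadId
2. Before uploading, the frontend asks the server which chunks it already has
3. Only the missing chunks are uploaded; chunks already on the server are skipped
Pause/resume:
The frontend keeps a local record (localStorage):
{ uploadId, uploadedChunks: [0,1,2,4,5], totalChunks: 10 }
After a page refresh:
1. Use the uploadId from localStorage to query the server for the uploaded chunks
2. Compare the local record with the server's; where they disagree, trust the server's list, because a chunk recorded only locally may never have been stored
3. Continue uploading from the missing chunks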
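The refresh flow can be sketched as below. The storage key prefix `upload:resume:` and the function names are assumptions made for this example. `storage` is anything with the `getItem`/`setItem` interface of `localStorage`. The sketch treats the server's list as the source of truth, which is a design choice of this example.

```javascript
// Hypothetical helpers for persisting and reconciling resume state.
const RESUME_PREFIX = 'upload:resume:'; // assumed key prefix

function saveResumeRecord(storage, fileMd5, record) {
  storage.setItem(RESUME_PREFIX + fileMd5, JSON.stringify(record));
}

function loadResumeRecord(storage, fileMd5) {
  const raw = storage.getItem(RESUME_PREFIX + fileMd5);
  if (!raw) return null;
  try {
    return JSON.parse(raw);
  } catch {
    return null; // a corrupted record is treated as "nothing to resume"
  }
}

// Indexes still to upload, given what the server reports as stored.
// Redis returns set members as strings, so values are normalized to numbers.
function missingChunks(totalChunks, serverUploaded) {
  const stored = new Set(serverUploaded.map(Number));
  const missing = [];
  for (let i = 0; i < totalChunks; i++) {
    if (!stored.has(i)) missing.push(i);
  }
  return missing;
}
```

In a browser, `storage` would be `window.localStorage`. The user still has to re-select the file after a refresh, since a `File` object cannot be stored there.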
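VI. Frequently asked interview questions
Q1: How does instant upload work?
The server keeps a mapping keyed by the MD5 of the file content: MD5 → file path. Before uploading, the client computes the file's MD5 and queries the server. A hit means the same content has been uploaded before, so the server only needs to link the existing file to the user and no data is transferred.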
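As an illustration only, the lookup can be modelled with two in-memory maps. The class and method names are hypothetical. In the server code earlier, the same role is played by the Redis key `file:md5:<md5>` plus a database row.

```javascript
// In-memory stand-in for the server's "MD5 -> file path" table (hypothetical names).
class InstantUploadIndex {
  constructor() {
    this.pathByMd5 = new Map();   // content MD5 -> stored file path
    this.ownersByMd5 = new Map(); // content MD5 -> Set of user ids linked to the file
  }

  // Called once after a real upload has been merged and verified.
  register(md5, path, userId) {
    const key = md5.toLowerCase();
    this.pathByMd5.set(key, path);
    this.link(key, userId);
  }

  link(key, userId) {
    if (!this.ownersByMd5.has(key)) this.ownersByMd5.set(key, new Set());
    this.ownersByMd5.get(key).add(userId);
  }

  // Called before uploading: a hit links the user to the existing file, no bytes are sent.
  check(md5, userId) {
    const key = md5.toLowerCase();
    const path = this.pathByMd5.get(key);
    if (path === undefined) return { exists: false };
    this.link(key, userId);
    return { exists: true, path };
  }
}
```

Note that a hit is decided by the hash alone, so this scheme trusts the MD5 the client sends. A service that worries about forged hashes or collisions would need a stronger check.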
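Q2: How does chunked upload guarantee file integrity?
① The frontend computes the MD5 of the whole file; ② every chunk is uploaded with its chunk index and the total chunk count; ③ once all chunks are uploaded, the server merges them in index order; ④ after merging, the server recomputes the MD5 and compares it with the value the frontend sent. The upload succeeds only if they match.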
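Steps ③ and ④ can be sketched in Node.js as follows, working on in-memory buffers for brevity. The function name `mergeAndVerify` is hypothetical. A real server would stream chunk files from disk instead of holding them in memory.

```javascript
const { createHash } = require('crypto');

// Merge chunks by index (not by arrival order) and verify the whole-file MD5.
// `chunksByIndex` is a Map from chunk index to Buffer.
function mergeAndVerify(chunksByIndex, totalChunks, expectedMd5) {
  const parts = [];
  for (let i = 0; i < totalChunks; i++) {
    const part = chunksByIndex.get(i);
    if (!part) throw new Error(`missing chunk ${i}`);
    parts.push(part);
  }
  const merged = Buffer.concat(parts);
  const actualMd5 = createHash('md5').update(merged).digest('hex');
  if (actualMd5 !== expectedMd5.toLowerCase()) {
    throw new Error(`MD5 mismatch: expected ${expectedMd5}, got ${actualMd5}`);
  }
  return merged;
}

// Chunks may arrive in any order; the index decides where each one goes.
const received = new Map([
  [1, Buffer.from('lo')],
  [0, Buffer.from('hel')],
]);
const file = mergeAndVerify(received, 2, '5d41402abc4b2a76b9719d911017c592');
console.log(file.toString()); // prints "hello"
```

The index-ordered loop is what makes concurrent, out-of-order uploads safe, and the final hash comparison catches a chunk that was truncated or written to the wrong position.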
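Q3: What happens if a chunk is lost while several chunks are uploading concurrently?
Before merging, the server checks that every chunk is present (a Redis Set records the uploaded chunk indexes). If any chunk is missing, the frontend retries just those chunks. The merge runs only once all chunks are present.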
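Q4: How do you rate-limit chunked uploads?
① Server-side rate limiting: a token bucket caps the overall upload bandwidth; ② client-side concurrency control (3 concurrent requests by default); ③ the server limits by userId + uploadId, so a single user cannot hog resources.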
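A token bucket, as mentioned in ①, can be sketched as below. The class name, parameters, and injected clock are assumptions for illustration. For bandwidth limiting, `cost` would be the chunk size in bytes. For ③, one bucket could be kept per `userId + uploadId` key.

```javascript
// Minimal token bucket: holds up to `capacity` tokens, refilled at `refillPerSecond`.
// `now` is injectable so the behavior can be tested without real waiting.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;   // start full, which allows an initial burst
    this.lastRefill = now();
  }

  // Returns true and deducts `cost` tokens if enough are available, else false.
  tryConsume(cost = 1) {
    const t = this.now();
    const elapsedSeconds = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = t;
    if (cost > this.tokens) return false;
    this.tokens -= cost;
    return true;
  }
}

// One bucket per "userId:uploadId" key (hypothetical keying scheme).
const buckets = new Map();
function allowChunk(userId, uploadId, cost, makeBucket) {
  const key = `${userId}:${uploadId}`;
  if (!buckets.has(key)) buckets.set(key, makeBucket());
  return buckets.get(key).tryConsume(cost);
}
```

A rejected request would typically get an HTTP 429 response, and the client's retry logic would re-send the chunk later. In a multi-instance deployment the bucket state would need to live in shared storage such as Redis rather than in process memory.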