Design 3: Designing a Scalable File Upload System

In this design, we’ll explore how to build a robust file upload system that supports large CSV/Excel files, validates records against business rules, recovers gracefully from errors, and scales using event-driven and big data components.

Step 1: Define Functional Requirements

  • Upload large CSV/Excel files through a web UI
  • Validate each record with business rules
  • Store valid records in a database or data lake
  • Track and log failed records with error reasons
  • Allow users to download failed records
  • Support reprocessing or reupload of failed data
  • Provide real-time status updates for uploads

Step 2: Define Non-Functional Requirements

  • Scalability to handle millions of rows
  • Resilient and fault-tolerant processing
  • Secure file handling and access control
  • Low latency for upload acknowledgment
  • Monitoring and retry capabilities

Step 3: Define API Services

  • POST /upload – Accepts file and metadata
  • GET /upload-status/{uploadId} – Checks processing progress
  • GET /failed-records/{uploadId} – Downloads failed records
  • POST /reprocess – Accepts corrected records for retry (see the controller sketch below)
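
To make these contracts concrete, here is a minimal sketch of the endpoints as a Spring Boot controller. UploadService, UploadStatus, and ReprocessRequest are hypothetical placeholders for the real service layer and DTOs, not part of the design above.

// A minimal sketch of the endpoints above as a Spring Boot controller.
// UploadService, UploadStatus, and ReprocessRequest are hypothetical
// placeholders for the real service layer and DTOs.
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import java.util.List;

@RestController
public class UploadController {

    private final UploadService uploadService; // hypothetical service layer

    public UploadController(UploadService uploadService) {
        this.uploadService = uploadService;
    }

    // POST /upload – store the file, enqueue processing, return an uploadId at once
    @PostMapping("/upload")
    public ResponseEntity<String> upload(@RequestParam("file") MultipartFile file,
                                         @RequestParam("source") String source) {
        String uploadId = uploadService.storeAndEnqueue(file, source);
        return ResponseEntity.accepted().body(uploadId); // 202: processing is async
    }

    // GET /upload-status/{uploadId} – current processing progress
    @GetMapping("/upload-status/{uploadId}")
    public ResponseEntity<UploadStatus> status(@PathVariable String uploadId) {
        return ResponseEntity.ok(uploadService.getStatus(uploadId));
    }

    // GET /failed-records/{uploadId} – CSV of failed rows with error reasons
    @GetMapping("/failed-records/{uploadId}")
    public ResponseEntity<byte[]> failedRecords(@PathVariable String uploadId) {
        return ResponseEntity.ok(uploadService.failedRecordsCsv(uploadId));
    }

    // POST /reprocess – corrected records submitted for another validation pass
    @PostMapping("/reprocess")
    public ResponseEntity<Void> reprocess(@RequestBody ReprocessRequest request) {
        uploadService.reprocess(request);
        return ResponseEntity.accepted().build();
    }
}

interface UploadService {
    String storeAndEnqueue(MultipartFile file, String source);
    UploadStatus getStatus(String uploadId);
    byte[] failedRecordsCsv(String uploadId);
    void reprocess(ReprocessRequest request);
}

record UploadStatus(String uploadId, String state, long processedRows, long failedRows) {}

record ReprocessRequest(String uploadId, List<String> correctedRows) {}

Returning 202 Accepted with an uploadId keeps upload acknowledgment fast; all heavy validation happens asynchronously, and the client polls /upload-status/{uploadId} for progress.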

Step 4: High-Level Architecture

  • Frontend App: Uploads file via UI, shows progress
  • Backend API (Java): Accepts the upload and stores the file in S3 (see the ingest sketch after this list)
  • Amazon S3: Stores raw files and failed records
  • Event Bus (Amazon SNS/SQS or EventBridge): Triggers async processing
  • Worker/Processor (AWS Lambda / ECS Fargate): Validates and ingests records
  • Amazon RDS / Redshift: Stores valid data for downstream use
  • Status Tracker (DynamoDB + CloudWatch): Tracks job state and record stats
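
A minimal sketch of the ingest path, assuming the AWS SDK for Java v2: the API durably stores the raw file in S3, emits an event for the workers, and returns an uploadId without waiting for processing. The bucket name and queue URL are illustrative assumptions.

// Sketch of the ingest path: persist the raw file to S3, then emit an event
// so workers pick it up asynchronously. Bucket and queue names are assumptions.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

import java.nio.file.Path;
import java.util.UUID;

public class IngestService {
    private static final String RAW_BUCKET = "uploads-raw"; // assumed bucket name
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/upload-events"; // assumed queue

    private final S3Client s3 = S3Client.create();
    private final SqsClient sqs = SqsClient.create();

    public String ingest(Path file) {
        String uploadId = UUID.randomUUID().toString();
        String key = "raw/" + uploadId + ".csv";

        // 1. Durably store the raw file before acknowledging the upload
        s3.putObject(PutObjectRequest.builder()
                        .bucket(RAW_BUCKET)
                        .key(key)
                        .build(),
                RequestBody.fromFile(file));

        // 2. Publish an event; processing is fully decoupled from the API request
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(QUEUE_URL)
                .messageBody("{\"uploadId\":\"" + uploadId + "\",\"s3Key\":\"" + key + "\"}")
                .build());

        return uploadId; // returned to the client for status polling
    }
}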

Step 5: Key Architectural Decisions

  • Use S3 to decouple file ingestion and processing
  • Adopt event-driven flow using EventBridge or SQS for scalable async processing
  • Chunk large files for parallel processing and fault isolation
  • Store failed records separately in S3 with retry metadata
  • Design workers to be idempotent and support partial reprocessing (a validation worker is sketched below)
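
As a sketch of how a worker might validate one chunk and isolate failures, the snippet below assumes a simplified three-column record format (id,email,amount); real business rules, and an idempotency check such as a processed-chunk marker in DynamoDB, would replace these placeholders.

// Sketch of a worker that validates one chunk of rows and isolates failures.
// The record shape and rules are simplified assumptions; a real worker would
// also check a processed-chunk marker so retried messages stay idempotent.
import java.util.ArrayList;
import java.util.List;

public class ChunkProcessor {

    public record Result(List<String> valid, List<String> failed) {}

    public Result process(List<String> rows) {
        List<String> valid = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (String row : rows) {
            String error = validate(row);
            if (error == null) {
                valid.add(row);
            } else {
                // Keep the original row plus the reason so users can fix and re-upload
                failed.add(row + ",ERROR=" + error);
            }
        }
        return new Result(valid, failed);
    }

    // Assumed business rule: rows are "id,email,amount" and amount must be numeric
    private String validate(String row) {
        String[] cols = row.split(",", -1);
        if (cols.length != 3) return "expected 3 columns";
        if (!cols[1].contains("@")) return "invalid email";
        try {
            Double.parseDouble(cols[2]);
        } catch (NumberFormatException e) {
            return "amount is not numeric";
        }
        return null;
    }
}

Because each chunk produces its own valid/failed partition, one bad chunk never blocks the rest of the file, and reprocessing can target only the failed rows.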

Step 6: Additional Considerations

  • Support for resumable or chunked uploads (see the multipart upload sketch below)
  • Role-based access control for file uploads
  • Client-side validations before file submission
  • Observability: Upload dashboards, error metrics
  • Potential future monetization of the record data processing pipeline
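
One way to support resumable, chunked uploads is S3 multipart upload, sketched below with the AWS SDK for Java v2. Each part can be retried independently, and a crashed client can resume by listing already-uploaded parts; note that every part except the last must be at least 5 MB.

// Sketch of chunked (resumable) uploads via S3 multipart upload.
// If a part fails, only that part is retried; bucket and key are assumptions.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.*;

import java.util.ArrayList;
import java.util.List;

public class MultipartUploader {

    public void upload(S3Client s3, String bucket, String key, List<byte[]> chunks) {
        String uploadId = s3.createMultipartUpload(
                CreateMultipartUploadRequest.builder().bucket(bucket).key(key).build()
        ).uploadId();

        List<CompletedPart> parts = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            int partNumber = i + 1; // S3 part numbers are 1-based
            UploadPartResponse resp = s3.uploadPart(
                    UploadPartRequest.builder()
                            .bucket(bucket).key(key)
                            .uploadId(uploadId)
                            .partNumber(partNumber)
                            .build(),
                    RequestBody.fromBytes(chunks.get(i)));
            parts.add(CompletedPart.builder()
                    .partNumber(partNumber)
                    .eTag(resp.eTag())
                    .build());
        }

        // Complete only after all parts succeed; an interrupted client can
        // resume by listing already-uploaded parts and continuing from there
        s3.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
                .bucket(bucket).key(key)
                .uploadId(uploadId)
                .multipartUpload(CompletedMultipartUpload.builder().parts(parts).build())
                .build());
    }
}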

Conclusion

A scalable file upload system requires asynchronous processing, robust validations, and strong observability. By leveraging event-driven architecture and AWS services like S3 (for file storage), Lambda or ECS (for processing), SNS/SQS (for messaging), and Redshift or RDS (for downstream storage), we can build a system that handles millions of records efficiently while ensuring a seamless user experience and reliable failure recovery.