Design 3: Designing a Scalable File Upload System
In this design, we’ll explore how to build a robust file upload system that accepts large CSV/Excel files, validates every record, recovers from errors, and scales through event-driven and big-data components.
Step 1: Define Functional Requirements
- Upload large CSV/Excel files through a web UI
- Validate each record with business rules
- Store valid records in a database or data lake
- Track and log failed records with error reasons (a sketch of the record shape follows this list)
- Allow users to download failed records
- Support reprocessing or re-upload of failed data
- Provide real-time status updates for uploads
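To make the failed-record requirements concrete, here is a minimal sketch of the record shape, assuming a plain Java record; the field names and CSV layout are illustrative rather than part of the design:

```java
// Hypothetical shape of a failed record; field names are illustrative.
public record FailedRecord(long rowNumber, String rawLine, String errorReason) {

    // Renders the record as one CSV line so users can download,
    // correct, and re-upload their failures.
    public String toCsvLine() {
        return rowNumber + ",\"" + escape(rawLine) + "\",\"" + escape(errorReason) + "\"";
    }

    // Standard CSV escaping: double any embedded quotes.
    private static String escape(String value) {
        return value.replace("\"", "\"\"");
    }
}
```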
Step 2: Define Non-Functional Requirements
- Scalability to handle millions of rows
- Resilient and fault-tolerant processing
- Secure file handling and access control
- Low latency for upload acknowledgment
- Monitoring and retry capabilities
Step 3: Define API Services
- POST /upload – Accepts file and metadata
- GET /upload-status/{uploadId} – Checks processing progress
- GET /failed-records/{uploadId} – Downloads failed records
- POST /reprocess – Accepts corrected records for retry
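To make this contract concrete, here is a minimal sketch of the four endpoints, assuming a Spring Boot backend (the design only specifies Java). UploadService, UploadStatus, and ReprocessRequest are hypothetical types standing in for the real service layer:

```java
import org.springframework.core.io.Resource;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class UploadController {

    private final UploadService uploadService; // hypothetical service layer

    public UploadController(UploadService uploadService) {
        this.uploadService = uploadService;
    }

    // Accepts the file, stores it, and returns an uploadId immediately;
    // validation and ingestion happen asynchronously.
    @PostMapping("/upload")
    public ResponseEntity<String> upload(@RequestParam("file") MultipartFile file,
                                         @RequestParam("metadata") String metadata) {
        String uploadId = uploadService.storeAndEnqueue(file, metadata);
        return ResponseEntity.accepted().body(uploadId);
    }

    @GetMapping("/upload-status/{uploadId}")
    public UploadStatus status(@PathVariable String uploadId) {
        return uploadService.statusOf(uploadId);
    }

    // Streams the failed records back as a downloadable CSV.
    @GetMapping("/failed-records/{uploadId}")
    public ResponseEntity<Resource> failedRecords(@PathVariable String uploadId) {
        return ResponseEntity.ok(uploadService.failedRecordsCsv(uploadId));
    }

    @PostMapping("/reprocess")
    public ResponseEntity<Void> reprocess(@RequestBody ReprocessRequest request) {
        uploadService.reprocess(request);
        return ResponseEntity.accepted().build();
    }
}
```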
Step 4: High-Level Architecture
- Frontend App: Uploads file via UI, shows progress
- Backend API (Java): Accepts the upload and stores the raw file in S3 (see the ingestion sketch after this list)
- Amazon S3: Stores raw files and failed records
- Event Bus (Amazon SNS/SQS or EventBridge): Triggers async processing
- Worker/Processor (AWS Lambda / ECS Fargate): Validates and ingests records
- Amazon RDS / Redshift: Stores valid data for downstream use
- Status Tracker (DynamoDB + CloudWatch): Tracks job state and record stats
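The front half of this flow (the API stores the raw file in S3, then emits an event pointing at it) can be sketched with the AWS SDK for Java v2. The bucket name, queue URL, and message format below are placeholders, and SQS is used purely for illustration since the design also allows SNS or EventBridge:

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

import java.nio.file.Path;
import java.util.UUID;

public class UploadIngestor {

    private static final String BUCKET = "upload-raw-files"; // placeholder bucket
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/upload-jobs"; // placeholder queue

    private final S3Client s3 = S3Client.create();
    private final SqsClient sqs = SqsClient.create();

    // Stores the raw file in S3, then emits an event so a worker can pick it up.
    public String ingest(Path localFile) {
        String uploadId = UUID.randomUUID().toString();
        String key = "raw/" + uploadId + ".csv";

        s3.putObject(PutObjectRequest.builder()
                        .bucket(BUCKET)
                        .key(key)
                        .build(),
                RequestBody.fromFile(localFile));

        // The message carries only a pointer to the file, keeping the event small.
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(QUEUE_URL)
                .messageBody("{\"uploadId\":\"" + uploadId + "\",\"s3Key\":\"" + key + "\"}")
                .build());

        return uploadId;
    }
}
```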
Step 5: Key Architectural Decisions
- Use S3 to decouple file ingestion and processing
- Adopt event-driven flow using EventBridge or SQS for scalable async processing
- Chunk large files for parallel processing and fault isolation
- Store failed records separately in S3 with retry metadata
- Design workers to be idempotent and support partial reprocessing
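The idempotency decision is the subtle one, so here is a minimal sketch of one way to honor it: each worker claims a chunk with a conditional write to the DynamoDB status tracker before processing, so a redelivered or duplicated event is simply skipped. The table and attribute names are hypothetical:

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

import java.util.Map;

public class ChunkClaim {

    private static final String TABLE = "upload-chunk-status"; // hypothetical table

    private final DynamoDbClient dynamo = DynamoDbClient.create();

    // Atomically claims a chunk before processing. The conditional write fails
    // if another worker (or an earlier retry) already claimed it, which makes
    // redelivered events and partial reprocessing safe.
    public boolean tryClaim(String uploadId, int chunkIndex) {
        try {
            dynamo.putItem(PutItemRequest.builder()
                    .tableName(TABLE)
                    .item(Map.of(
                            "chunkId", AttributeValue.builder().s(uploadId + "#" + chunkIndex).build(),
                            "state", AttributeValue.builder().s("PROCESSING").build()))
                    .conditionExpression("attribute_not_exists(chunkId)")
                    .build());
            return true;   // this worker now owns the chunk
        } catch (ConditionalCheckFailedException e) {
            return false;  // already claimed; skip to avoid duplicate ingestion
        }
    }
}
```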
Step 6: Additional Considerations
- Support for resumable or chunked uploads (sketched after this list)
- Role-based access control for file uploads
- Client-side validations before file submission
- Observability: Upload dashboards, error metrics
- Future monetization of the data processing pipeline (e.g., metered, usage-based billing)
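Resumable and chunked uploads map naturally onto S3 multipart upload: the file travels in independent parts, and a failed part can be retried without restarting the whole transfer. A minimal sketch with the AWS SDK for Java v2; in a real system the parts would stream from the client, and every part except the last must be at least 5 MiB:

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.*;

import java.util.ArrayList;
import java.util.List;

public class MultipartUploader {

    private final S3Client s3 = S3Client.create();

    // Uploads parts one by one; a failed part can be retried individually
    // instead of restarting the whole file.
    public void upload(String bucket, String key, List<byte[]> parts) {
        String uploadId = s3.createMultipartUpload(
                CreateMultipartUploadRequest.builder().bucket(bucket).key(key).build()).uploadId();

        List<CompletedPart> completed = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            int partNumber = i + 1; // S3 part numbers start at 1
            UploadPartResponse resp = s3.uploadPart(
                    UploadPartRequest.builder()
                            .bucket(bucket).key(key)
                            .uploadId(uploadId)
                            .partNumber(partNumber)
                            .build(),
                    RequestBody.fromBytes(parts.get(i)));
            completed.add(CompletedPart.builder().partNumber(partNumber).eTag(resp.eTag()).build());
        }

        // S3 assembles the final object only after all parts are confirmed.
        s3.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
                .bucket(bucket).key(key)
                .uploadId(uploadId)
                .multipartUpload(CompletedMultipartUpload.builder().parts(completed).build())
                .build());
    }
}
```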
Conclusion
A scalable file upload system requires asynchronous processing, robust validations, and strong observability. By leveraging event-driven architecture and AWS services like S3 (for file storage), Lambda or ECS (for processing), SNS/SQS (for messaging), and Redshift or RDS (for downstream storage), we can build a system that handles millions of records efficiently while ensuring a seamless user experience and reliable failure recovery.