Tens of thousands of rows, served without memory trouble using StreamingHttpResponse

Use StreamingHttpResponse to avoid the memory problems of large responses and keep the server stable.

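Problems with the conventional approach

When a regular HttpResponse or DRF Response handles a large dataset:

All data is loaded into memory at once
Under heavy concurrent traffic, the server risks going down from memory exhaustion
The entire payload must be ready before the response can start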

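How StreamingHttpResponse works

Data is split into small pieces (chunks) and sent sequentially:

Memory usage stays roughly constant regardless of the total data size
The response starts as soon as the first chunk is ready
A generator produces the data on demand as it is sent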
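
The lazy behavior described above can be seen without Django at all. The following is a minimal plain-Python sketch; produce_rows and the built list are hypothetical names used only for illustration. It shows that a generator builds a row only when the consumer asks for it, which is what lets a streaming response start before the full dataset exists.

```python
from typing import Iterator, List


def produce_rows(n: int, built: List[int]) -> Iterator[str]:
    """Yield CSV lines one at a time, recording which data rows were built."""
    yield "id,value\n"
    for i in range(n):
        built.append(i)  # marks that row i has actually been generated
        yield f"{i},{i * i}\n"


built: List[int] = []
stream = produce_rows(1_000_000, built)

header = next(stream)     # the header is available before any data row is built
first_row = next(stream)  # asking for one more chunk builds exactly one row
```

Even though the generator is asked for a million rows, only the first one has been constructed at this point; the rest are produced as the consumer pulls them.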

Basic usage

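from django.http import StreamingHttpResponse

# Assumes a User model with id, name, and email fields; adjust the import path.
from .models import User

def big_csv_view(request):
    def csv_generator():
        # Header row first, then one line per user.
        yield 'id,name,email\n'
        # Values are written as-is; fields containing commas or quotes need csv.writer.
        for user in User.objects.all().iterator():
            yield f'{user.id},{user.name},{user.email}\n'

    response = StreamingHttpResponse(
        csv_generator(),
        content_type="text/csv",
    )
    response['Content-Disposition'] = 'attachment; filename="users.csv"'
    return response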

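Key optimization: queryset.iterator()

User.objects.all(): iterating the queryset caches every row in memory
User.objects.all().iterator(): reads rows without filling the queryset cache, one at a time or in small groups
iterator(chunk_size=2000): fetches the given number of rows from the DB per batch, reducing network round trips (2000 is Django's default)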
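
To make the batching idea concrete, here is a plain-Python analogy rather than Django's actual implementation. fetch_in_batches is a hypothetical stand-in for a database cursor that returns chunk_size rows per round trip, and iterate_rows flattens those batches back into single rows, so the caller only ever holds one batch in memory.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fetch_in_batches(source: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Hypothetical cursor stand-in: return up to chunk_size rows per round trip."""
    it = iter(source)
    while True:
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        yield batch


def iterate_rows(source: Iterable[int], chunk_size: int = 2000) -> Iterator[int]:
    """Expose the batches as a flat stream of rows, one batch in memory at a time."""
    for batch in fetch_in_batches(source, chunk_size):
        yield from batch


batches = list(fetch_in_batches(range(5), chunk_size=2))
rows = list(iterate_rows(range(5), chunk_size=2))
```

A larger chunk_size means fewer round trips but a bigger batch held in memory at once, which is the trade-off the parameter controls.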

Using it in a DRF ViewSet

import csv

from django.http import StreamingHttpResponse
from rest_framework import viewsets
from rest_framework.decorators import action

class ProductViewSet(viewsets.ReadOnlyModelViewSet):
    # queryset and serializer_class are omitted here for brevity.
    @action(detail=False, methods=['get'])
    def download_csv(self, request):
        class Echo:
            # File-like object whose write() returns the value instead of storing it.
            def write(self, value):
                return value

        def csv_row_generator(data):
            pseudo_buffer = Echo()
            writer = csv.writer(pseudo_buffer)
            # writerow() returns whatever Echo.write() returns: the formatted CSV line.
            yield writer.writerow(['name', 'price', 'stock'])
            for item in data:
                yield writer.writerow([item.name, item.price, item.stock])

        response = StreamingHttpResponse(
            csv_row_generator(self.get_queryset().iterator()),
            content_type="text/csv",
        )
        response['Content-Disposition'] = 'attachment; filename="products.csv"'
        return response
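
The Echo trick in the view above works because csv.writer's writerow() returns whatever the underlying object's write() returns. The short standalone sketch below (the sample values are made up) shows that each call hands back one fully formatted CSV line, with quoting applied to fields that contain commas or quotes.

```python
import csv


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


writer = csv.writer(Echo())

# Each writerow() call returns the formatted line rather than buffering it.
header = writer.writerow(["name", "price", "stock"])
tricky = writer.writerow(['Widget, "Deluxe"', 1200, 3])
```

This is why the generator can yield writer.writerow(...) directly: nothing accumulates in a buffer, and the csv module still takes care of escaping.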

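Key caveats

Middleware conflicts

Middleware that reads response.content can conflict with streaming responses, which expose streaming_content instead; check compression middleware such as GZipMiddleware and any custom middleware
Handle compression at the web server level, or disable the offending middleware for streaming views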
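
One way to make custom middleware safe is to check the response's streaming flag before touching the body. The sketch below is an illustration, not an existing Django component: BodySizeHeaderMiddleware and the X-Body-Size header are hypothetical, and the two stub classes only imitate the streaming attribute that Django sets on its response objects so the example runs without Django.

```python
class BodySizeHeaderMiddleware:
    """Hypothetical middleware that adds a header derived from the response body."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Django sets streaming=True on StreamingHttpResponse, False on HttpResponse.
        if getattr(response, "streaming", False):
            return response  # leave streams alone; they have no usable .content
        response["X-Body-Size"] = str(len(response.content))
        return response


class PlainStub(dict):
    """Stand-in for a regular response: header mapping plus a bytes body."""
    streaming = False
    content = b"hello"


class StreamStub(dict):
    """Stand-in for a streaming response: accessing .content fails."""
    streaming = True

    @property
    def content(self):
        raise AttributeError("streaming responses have no content")


plain = BodySizeHeaderMiddleware(lambda request: PlainStub())(None)
streamed = BodySizeHeaderMiddleware(lambda request: StreamStub())(None)
```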

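Limits of error handling

Once streaming has started, the status code can no longer be changed if an error occurs
Catch and log exceptions with try-except inside the generator
Run validation and other logic that can fail before the first yield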
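
The try-except advice can be sketched in plain Python as follows. safe_rows, render, and the trailing "#ERROR" marker line are assumptions made for this example; whether to emit such a marker, and in what form, depends on what the consumer of the file can tolerate.

```python
import logging
from typing import Callable, Iterable, Iterator

logger = logging.getLogger(__name__)


def safe_rows(items: Iterable[dict], render: Callable[[dict], str]) -> Iterator[str]:
    """Yield CSV lines; on failure, log the error and end with a marker line."""
    yield "id,name\n"
    try:
        for item in items:
            yield render(item)
    except Exception:
        # The status code is already sent, so logging is the only reliable signal.
        logger.exception("CSV stream aborted")
        yield "#ERROR: export incomplete\n"  # hypothetical marker for the client


def render(item: dict) -> str:
    return f"{item['id']},{item['name']}\n"


# The second item lacks "name", so rendering it raises KeyError mid-stream.
output = list(safe_rows([{"id": 1, "name": "a"}, {"id": 2}], render))
```

The client still receives a 200 response with a truncated file, so the log entry and the marker line are the only evidence that the export did not finish.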

No Content-Length header

The total size is not known in advance, so the client cannot show download progress
Account for this during the planning stage

How to test

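from django.test import TestCase

class CsvStreamingViewTests(TestCase):
    def test_csv_streaming_view(self):
        response = self.client.get('/path/to/view/')
        # streaming_content yields bytes chunks; decode them before comparing.
        content_lines = [line.decode('utf-8') for line in response.streaming_content]
        expected_content = "id,name,email\n1,John,[email protected]\n"
        self.assertEqual("".join(content_lines), expected_content)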
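
For CSV produced with csv.writer, comparing raw strings can be brittle because of quoting and line terminators. As an alternative, the chunks from response.streaming_content can be joined and parsed back into rows. The helper below is a hypothetical sketch, and the byte chunks are made-up sample data standing in for a real response.

```python
import csv
import io
from typing import Iterable, List


def parse_streamed_csv(chunks: Iterable[bytes]) -> List[List[str]]:
    """Join streamed byte chunks and parse them back into CSV rows."""
    text = b"".join(chunks).decode("utf-8")
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Sample chunks shaped like what a streaming CSV view might yield.
chunks = [b"name,price\r\n", b'"Widget, Deluxe",1200\r\n']
rows = parse_streamed_csv(chunks)
```

In a test, the assertion then compares lists of fields, so it keeps passing if the quoting style or line terminator changes.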


Last updated 1 month ago
