[Protocol Buffer] gRPC streaming services vs. repeated fields

Programming / Go lang · 2020. 11. 5. 16:33

While studying gRPC and Protocol Buffers, I ran into a dilemma.

When repeated data is sent over gRPC, should I use the `stream` keyword on the gRPC method, or the `repeated` keyword in the message?

Interestingly, Microsoft's official docs have quite a good article on this:

docs.microsoft.com/en-us/dotnet/architecture/grpc-for-wcf-developers/streaming-versus-repeated

In short: if the repeated data is small enough to be generated within about a second, use a `repeated` field; conversely, if the repeated data is very large, use the `stream` keyword.

gRPC services provide two ways of returning datasets, or lists of objects. The Protocol Buffers message specification uses the repeated keyword for declaring lists or arrays of messages within another message. The gRPC service specification uses the stream keyword to declare a long-running persistent connection. Over that connection, multiple messages are sent, and can be processed, individually.
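As a concrete illustration (the message and service names here are hypothetical, not from the article), the same dataset can be declared either way in a `.proto` file:

```protobuf
syntax = "proto3";

package catalog;

message Item {
  string id = 1;
  string name = 2;
}

message ListItemsRequest {}

// Option 1: a single response message carrying a repeated field.
message ItemList {
  repeated Item items = 1;
}

service Catalog {
  // Returns the whole dataset in one message.
  rpc ListItems(ListItemsRequest) returns (ItemList);

  // Option 2: server streaming — each Item is sent as a separate
  // message over a long-running connection.
  rpc StreamItems(ListItemsRequest) returns (stream Item);
}
```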

You can also use the stream feature for long-running temporal data such as notifications or log messages. But this chapter will consider its use for returning a single dataset.

Which you should use depends on factors such as:

  • The overall size of the dataset.
  • The time it took to create the dataset at either the client or server end.
  • Whether the consumer of the dataset can start acting on it as soon as the first item is available, or needs the complete dataset to do anything useful.

When to use repeated fields

For any dataset that's constrained in size and that can be generated in its entirety in a short time (say, under one second), you should use a repeated field in a regular Protobuf message. For example, in an e-commerce system, building the list of items within an order is probably quick, and the list won't be very large. Returning a single message with a repeated field is an order of magnitude faster than using stream and incurs less network overhead.

If the client needs all the data before starting to process it and the dataset is small enough to construct in memory, then consider using a repeated field. Consider it even if the creation of the dataset in memory on the server is slower.

When to use stream methods

When the message objects in your datasets are potentially very large, it's best to transfer them using streaming requests or responses. It's more efficient to construct a large object in memory, write it to the network, and then free up the resources. This approach will improve the scalability of your service.

Similarly, you should send datasets of unconstrained size over streams to avoid running out of memory while constructing them.

For datasets where the consumer can separately process each item, you should consider using a stream if it means that progress can be indicated to the user. Using a stream can improve the responsiveness of an application, but you should balance it against the overall performance of the application.
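This consumer-side pattern can be sketched without gRPC itself; below, a plain Go channel stands in for the server stream (the `Item` type and `streamItems` function are hypothetical stand-ins, not generated gRPC code):

```go
package main

import "fmt"

// Item stands in for a generated Protocol Buffer message.
type Item struct {
	ID   int
	Name string
}

// streamItems simulates a server-streaming RPC: items arrive one
// at a time on a channel instead of as a single repeated field.
func streamItems(n int) <-chan Item {
	ch := make(chan Item)
	go func() {
		defer close(ch)
		for i := 1; i <= n; i++ {
			ch <- Item{ID: i, Name: fmt.Sprintf("item-%d", i)}
		}
	}()
	return ch
}

func main() {
	// Each item can be processed (and progress reported) as soon
	// as it arrives, before the full dataset exists anywhere.
	processed := 0
	for item := range streamItems(3) {
		processed++
		fmt.Printf("received %s (%d of 3)\n", item.Name, processed)
	}
}
```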

Another scenario where streams can be useful is where a message is being processed across multiple services. If each service in a chain returns a stream, then the terminal service (that is, the last one in the chain) can start returning messages. These messages can be processed and passed back along the chain to the original requestor. The requestor can either return a stream or aggregate the results into a single response message. This approach lends itself well to patterns like MapReduce.
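The chained-stream idea can likewise be sketched with channels: each stage consumes an incoming stream and returns its own, so results flow back to the requestor before any stage has finished (the stage names below are illustrative only):

```go
package main

import "fmt"

// source plays the role of the terminal service: it starts
// emitting messages immediately.
func source(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i
		}
	}()
	return out
}

// double is one intermediate service in the chain: it consumes a
// stream and returns a new stream, passing each result back as
// soon as it is ready.
func double(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * 2
		}
	}()
	return out
}

func main() {
	// The original requestor aggregates the chained stream into a
	// single result, as described above.
	sum := 0
	for v := range double(double(source(3))) {
		sum += v
	}
	fmt.Println("sum:", sum) // (1+2+3) doubled twice = 24
}
```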

Posted by 드리머즈