What are some ways to consume large SOW queries?

AMPS has sophisticated and configurable slow consumer mitigation functionality that protects AMPS and other consumers from performance degradation due to a single over-subscribed consumer.  The defaults for the slow consumer configuration may not be enough to handle large SOW queries where the consumer will have large bursts of over-subscription followed by long quiet periods of processing.

Imagine a consumer wants to execute a SOW query that returns 3 million records averaging 1024 bytes each.  AMPS would need more than 3GB of storage to buffer the result while the consumer pulls the data over the network, and with the default settings the query would result in an almost immediate slow consumer disconnect.

Side Note: A SOW Query of 3 million records with an average record size of 1024 bytes takes a bit over 3GB of data to send to the client.  It will take your client about 30 seconds to pull that across a 1Gbps network link (or 3 seconds over a 10Gbps link), and that estimate assumes that your message handling code can operate at the line rate.
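The arithmetic behind this estimate can be sketched as follows. This is only an illustration: the function names are made up for this example, and the calculation ignores protocol overhead and message handling time, which is why the raw line-rate figures come out a little below the rounded 30-second and 3-second estimates.

```python
def payload_bytes(record_count: int, avg_record_bytes: int) -> int:
    """Raw size of the result set, excluding per-message metadata."""
    return record_count * avg_record_bytes


def transfer_seconds(total_bytes: int, link_gbps: float) -> float:
    """Best-case transfer time at full line rate (1 Gbps = 1e9 bits/s)."""
    return (total_bytes * 8) / (link_gbps * 1e9)


total = payload_bytes(3_000_000, 1024)
print(total)                                  # 3072000000
print(round(transfer_seconds(total, 1.0), 1))   # 24.6
print(round(transfer_seconds(total, 10.0), 1))  # 2.5
```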

AMPS 5.2 and later

In AMPS 5.2 and later versions, support for the skip_n option (working with the top_n option offered in AMPS 5.0 and later) makes it possible to retrieve a specific part of a larger result set.  This is particularly useful for applications that use only a part of the overall data at a given time. For example, if the SOW contains 3 million rows, but the application only displays 500 rows at a time, the application may never need to request the full set of records from the SOW. (Notice, however, that these options apply strictly to the SOW query, and not to subsequent updates. If an application needs updates to the query, the methods below can be more useful.)
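As a rough sketch of the paging idea, the helper below computes the top_n and skip_n values for a given page of 500 rows. The helper name and the comma-separated "name=value" option string are assumptions made for this illustration, so check the command reference for your AMPS version and client before relying on the exact syntax.

```python
def page_options(page_index: int, page_size: int = 500) -> str:
    """Build an options string for one page of a larger SOW result.

    ASSUMPTION: options are passed as comma-separated name=value pairs.
    page_index is zero-based, so page 0 skips nothing.
    """
    if page_index < 0 or page_size <= 0:
        raise ValueError("page_index must be >= 0 and page_size must be > 0")
    skip = page_index * page_size
    return f"top_n={page_size},skip_n={skip}"


print(page_options(0))  # top_n=500,skip_n=0
print(page_options(2))  # top_n=500,skip_n=1000
```

An application that displays 500 rows at a time would request only the page it is about to show, so the server never has to buffer the full 3 million rows for that client.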

AMPS 5.0 and later

In AMPS 5.0 and later versions, slow consumer mitigation contains settings for both individual clients and for resource consumption across the instance as a whole. For most applications, these changes make slow client disconnection much less frequent.

MessageMemoryLimit : Sets the maximum amount of memory to use for messages (for this Transport or for the instance as a whole). (units = bytes, default = 10% of total host memory or 10% of the amount of memory AMPS is allowed to consume, whichever is lower)

MessageDiskLimit : Sets the maximum amount of disk space to use for messages (for this Transport or for the instance as a whole). (units = bytes, default = 1GB or the size of the MessageMemoryLimit, whichever is higher)

ClientMaxCapacity : Sets the share of the available limit capacity that a single client can consume. (units = percentage, default = 100%)

ClientMessageAgeLimit : Sets the maximum age of the oldest message held for the client. If the oldest message AMPS has buffered for this client is older than this limit, AMPS disconnects the client. (units = time interval, default = unlimited)

Recommended sizing:

MessageMemoryLimit : 60East recommends leaving this configuration parameter at the default where possible. If more space is required, increase the parameter by 1-2% at a time. Use caution with settings greater than 20%.

MessageDiskLimit : To estimate the message disk limit, start with MaxResultSize * (1.0 + 150/AverageRecordSize) * NumberOfSimultaneousClients, or the MessageMemoryLimit, whichever is greater. If clients are still disconnected, 60East recommends growing this limit in increments of MaxResultSize * (1.0 + 150/AverageRecordSize).
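The disk limit estimate can be worked through with a short calculation. This is a minimal sketch: the function names, the two simultaneous clients, and the 6.4GB MessageMemoryLimit are hypothetical values chosen for illustration, not recommendations.

```python
import math


def per_query_bytes(max_result_bytes: int, avg_record_bytes: int) -> int:
    """MaxResultSize * (1.0 + 150/AverageRecordSize), rounded up."""
    return math.ceil(max_result_bytes * (1.0 + 150 / avg_record_bytes))


def message_disk_limit(max_result_bytes: int, avg_record_bytes: int,
                       simultaneous_clients: int,
                       message_memory_limit: int) -> int:
    """Starting estimate: the per-query size times the number of
    simultaneous clients, or the memory limit, whichever is greater."""
    needed = per_query_bytes(max_result_bytes, avg_record_bytes) * simultaneous_clients
    return max(needed, message_memory_limit)


# Hypothetical inputs: 3 million records of 1024 bytes, 2 clients,
# and a MessageMemoryLimit of 6.4GB.
limit = message_disk_limit(3_000_000 * 1024, 1024, 2, 6_400_000_000)
print(limit)  # 7044000000
```

If clients are still being disconnected at that size, the growth increment is the value returned by per_query_bytes, which is 3522000000 bytes for these inputs.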

 

Versions previous to AMPS 5.0

The slow consumer mitigation logic is controlled by 3 thresholds found in your Transport configuration:

ClientBufferThreshold : The number of message bytes to buffer in memory before storing to an offline store (units = bytes, default = 50MB)

ClientOfflineThreshold : The number of messages to buffer to disk before disconnecting the client (units = number of messages, default = 100,000).

ClientMaxBufferThreshold : The maximum number of bytes to allow in memory and the client offline store before the client is disconnected (units = bytes, default = 1GB) (Notice that this maximum applies even if you set SlowClientDisconnect to disabled -- AMPS enforces this as a hard limit.)

Recommended sizing

To handle large queries, you'll want to increase the configured slow consumer values to notify AMPS that these kinds of queries are to be expected.

ClientBufferThreshold : This controls the maximum memory consumed by each connected client. We recommend keeping this at or below 100MB unless you have special needs, such as plenty of memory and no local storage for offline files. (Recommendation: 50MB)

ClientOfflineThreshold : When the query executes, it's highly likely all of the messages from the query will be stored in the offline store. That means this value needs to be large enough to contain the maximum allowed query size, in this case 3 million records. (Recommendation: 3000000).

ClientMaxBufferThreshold : This value needs to be set to the maximum size of the query in bytes, including both data and metadata for the records such as SOW key, topic, and query identifier. Our 3M record query with an average record size of 1KB needs at least 3GB of space to contain the entire query without forcing a disconnect.  It's good to give a bit of headroom for metadata in the query response. (Recommendation: 3500000000). To approximate this size, use this formula to find an upper bound for the amount of data necessary to buffer:

ClientMaxBufferThreshold = MaxResultSize * (1.0 + 150/AverageRecordSize)
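Applied to the 3 million record example, the formula works out as shown below. The function name and the returned dictionary are invented for this sketch. Because the formula multiplies out to MaxResultSize plus 150 bytes for each record, the upper bound it produces (about 3.52GB) is slightly larger than the rounded 3500000000 figure above.

```python
import math


def pre_5_0_thresholds(record_count: int, avg_record_bytes: int) -> dict:
    """Estimate the offline and maximum buffer thresholds for one query."""
    max_result_size = record_count * avg_record_bytes
    upper_bound = math.ceil(max_result_size * (1.0 + 150 / avg_record_bytes))
    return {
        # Counted in messages: must hold every record in the result.
        "ClientOfflineThreshold": record_count,
        # Counted in bytes: record data plus the metadata allowance.
        "ClientMaxBufferThreshold": upper_bound,
    }


print(pre_5_0_thresholds(3_000_000, 1024))
# {'ClientOfflineThreshold': 3000000, 'ClientMaxBufferThreshold': 3522000000}
```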

These settings are enough to stop AMPS from disconnecting your large consumers. You can additionally set the BatchSize parameter on your SOW query so that results are returned more efficiently, with less metadata overhead per record. We recommend a batch size of 10, which is a good default (and is the default used starting with the 4.0 version of the 60East Client APIs).
