As you develop a custom view, you start with one chart or one table. After a while you've added more and more, and you're dispatching several searches. Often you'll notice that many of those searches are quite similar to each other: you're getting the same events off disk more than once, and you're making Splunk do extra work. If you get the nagging feeling that there's a better way, you're right; it's called "postProcess", and it's part of the core Splunk API.
Post-process searches let you avoid this inefficiency. You dispatch only one "base search", getting the events off disk only once, and then at request time you carve up that base result set in two or more different ways to render different 'slices'.
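To make that concrete, here's a minimal sketch of one base search feeding two post-process slices (the sourcetype and field names are just illustrative). In Simple XML dashboards, the post-process strings go inside `<search base="...">` elements that reference the base `<search id="...">`:

```
Base search:     index=main sourcetype=access_combined | stats count by clientip, status
Post process 1:  stats sum(count) as requests by clientip
Post process 2:  stats sum(count) as requests by status
```

Note that the base search already aggregates with stats; each post-process search only re-aggregates the summarized rows, which is exactly the pattern the pitfalls below argue for.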
The first thing everyone does is very intuitive: they make a "base search" that is a simple search returning raw events, and they make postProcess searches that contain the transforming commands like stats or timechart. Makes perfect sense, and it's a TERRIBLE IDEA. DO NOT DO THIS. Read on.
Splunk behaves a little differently when the 'search results' are actually events. In particular, it does not preserve complete information about the events once you pass 10,000 rows (UPDATE: in more recent versions of Splunk this limit has increased to 500,000). The problem is that you will get no warning: the extra rows are silently discarded from your base search results, and therefore your postProcessed results will be wrong. Conversely, if the base search contains transforming commands like stats, Splunk preserves all the rows in the base search results, to 10,000 rows and beyond.
You have fallen into this pit when the postProcessed results displayed seem wrong or truncated, or WORSE, they don't seem wrong and you don't find out they are wrong until much later.
If a field is not mentioned explicitly somewhere in your base search, splunkd will assume it doesn't need to extract and preserve that field's values when it runs the job. Then, come postProcess time, the field will be absent and you'll be extremely confused. If you always group your desired fields and rows with the stats command, everything is much more explicit and you sidestep this confusion.
You have fallen into this pit when you've spent hours staring at your config wondering why your postProcess search acts like some field isn't there.
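Here's a sketch of the difference (the `spent` field is just an illustrative field name): when the base search is pure events and only the post-process search mentions a field, splunkd may never extract it; naming the field in a stats clause in the base search makes it explicit.

```
Bad base search:    index=_internal source="*web_access.log"
Bad post process:   stats avg(spent) by status

Good base search:   index=_internal source="*web_access.log"
                    | stats avg(spent) as spent, count by status
Good post process:  stats sum(count) as requests by status
```

In the good version, `spent`, `count`, and `status` are all guaranteed to exist in the base results, because the base search's stats command demanded them.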
Finally, every row in the base search results has to be handed off and re-processed at request time, so a huge base result set makes every postProcess request slow. A corollary of these pitfalls is that you should avoid using a "pure events" search as your base search, because such searches return a large number of rows. Throw a "stats count sum(foo) by bar baz" on there and summarize the rows down to the ones you'll actually use.
You have fallen into this pit when your slick postProcess-heavy dashboard actually has terrible performance.
If you've read this far, perhaps you're hunting for specific examples. Here are two!
Below we're using access data from SplunkWeb to show a table of the bytes transferred by filename, and also a table of the number of requests by HTTP status. In the normal approach we'd have to use two different searches:
Search #1: index=_internal source="*web_access.log" | stats sum(bytes) as totalBytes by file
Search #2: index=_internal source="*web_access.log" | stats count by status
Notice that both searches have to get the same events off disk. This makes it a good candidate for post process.
Base search: index=_internal source="*web_access.log"
This is wrong for several reasons, and it won't work anyway. See the pitfalls above to find out why.
Base search: index=_internal source="*web_access.log" | stats count sum(bytes) as totalBytes by file, status
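The two post-process searches then just re-aggregate that summarized base result set (a sketch; the field names follow the base search's stats clause):

```
Post process (bytes by file):      stats sum(totalBytes) as totalBytes by file
Post process (requests by status): stats sum(count) as requests by status
```

Since the base search grouped by both file and status, each slice simply sums away the dimension it doesn't care about.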
When time is involved, you have to use Splunk's "bin" command to bucket all the _time values into some reasonable number of time buckets.
Here’s a similar example to the above, except instead of the ‘request count by status’ on the right, we want the right side to show a ‘count over time by status’:
Base search: index=_internal source="*web_access.log"
This is wrong for several reasons, and it won't work anyway. See the pitfalls above to find out why.
Base search: index=_internal source="*web_access.log" | bin _time span=15min | stats count sum(bytes) as totalBytes by file, _time, status
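Again the post-process searches re-aggregate the binned base results (a sketch, following the field names above); for the chart, `xyseries` pivots the rows into one series per status value:

```
Post process (bytes by file):
    stats sum(totalBytes) as totalBytes by file

Post process (count over time by status):
    stats sum(count) as count by _time, status | xyseries _time status count
```

Because the base search already bucketed _time into 15-minute spans with bin, the post-process side never has to touch raw events to draw the timeline.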