Using Leg Types to make your life easier

First, a quick chat about call legs vs. calls.  We all know what a call is – that period of time from when you pick up a phone until you put it back down again.  In modern systems each call is made up of one or more call legs, with each leg being a single source/destination combination.  So if you try to call Sally, you could see the following legs:

  • Leg 1: You dialed Sally’s extension.  It rings.  No answer.
  • Leg 2: Your call gets forwarded to voicemail.  You leave a message.  You hang up.

That’s a two-leg call.  Simple, right?

Introducing Leg Types

Leg types are a way to add a reusable, human-readable “tag” to individual legs, making it easier to both search for them and to see the call flow.

For instance, you could define a leg type for “Abandoned at voicemail” to catch those legs where the caller went to voicemail and just hung up on it.  Another could be “Left voicemail”, which is just like the first except that they actually left a voicemail.  Maybe even “Jumped out of voicemail” to tag legs that went to voicemail but then transferred themselves out again.  That last one is a little trickier – these should be legs that went to voicemail, but for which there is no “on hook” party – i.e. no one actually hung up at that leg.
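
To make that last one concrete, here is a sketch of what such a search might look like, using the device_type and on_hook_party fields explained in the example later in this post (your own voicemail device naming may differ):

    device_type="unityvm" NOT on_hook_party=*

That reads as “hit the voicemail system, but nobody hung up on this leg” – which is the “jumped out again” case.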

There are two things to be aware of –

  1. Make sure that each call leg is matched to only one leg type.
  2. These are leg types, not call types.  They operate on individual legs, not on what you or I think of as “calls”.

For the former: there should be no overlap on an individual leg; that gets weird, and may very well behave incorrectly or break something.  We do try to test for such overlaps in the Health Check page (under “Setup”), but it’s easier to just make sure they don’t overlap in the first place.  This means you shouldn’t have a generic “went to voicemail” leg type and also a “went to voicemail and left message” leg type – because a leg that went to voicemail and left a message would pick up *both* leg types.  In that case, just redefine the more general one to be “not what that more specific one is”, so “went to voicemail” becomes “went to voicemail and didn’t leave a message”.  (This is in the example below, so if that sounds confusing perhaps continuing on will clear it up!)

For the latter: we hope to implement a call types feature on top of leg types and other functionality in the future, but baby steps… so stay tuned.

Now that we understand what they’re about, let’s jump into a short example!

Voicemail calls example

Let’s build a leg type for calls that ended with a voicemail being left, and another for calls where the caller hung up as soon as they hit voicemail.

Step 1: Find your voicemail calls

Step 1.1: Define what a voicemail call is.

The first task is to define what it means “to end in voicemail”.  This depends on how your calls get routed and thus varies from place to place, but we’ve found several common threads in most people’s environments, which I’ve outlined below:

device_type="unityvm"

If you used the default voicemail naming scheme, you should be able to see voicemail by looking for a device_type of “unityvm”.  This is the most commonly needed search for finding voicemails.

on_hook_party="caller"

A leg which has the on_hook_party set as the caller means this is the leg where the caller hung up and thus terminated the call.  This helps determine if they *ended* at voicemail or if they then hopped elsewhere via some menu option.

deviceName=X

Underneath our derived “device_type” field is an assumption that the default voicemail naming convention was followed, specifically that voicemail devices are named like “CiscoUM-VI*”.  If “device_type” never says “unityvm” even though it should, this is probably why.  The fix is to just use the deviceName(s) that you configured, perhaps with wildcards, like deviceName=MyVoiceMailSystem*
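
If you do end up in that situation, everywhere the example below uses device_type="unityvm" you would substitute your own device name pattern instead.  A sketch, with MyVoiceMailSystem* standing in for whatever your voicemail devices are actually named:

    deviceName=MyVoiceMailSystem* on_hook_party="caller"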

Step 1.2: Construct and validate the search

In our example, let’s assume that when the device type is the default “unityvm”  and the calling party hangs up, the call is one we want tagged as voicemail_left_message or voicemail_abandoned.

To test this, let’s open up Browse Calls, then in the field “other search terms” type in device_type="unityvm" on_hook_party="caller"

Review that list, make sure it looks right!

Now we have to decide on a duration to use – I’m going to pretend 5 seconds is our cutoff.  If the person was in voicemail for more than 5 seconds, they left a message.  If 5 seconds or under, they hung up right away.  Your own threshold may be different, but it seems between 5 and 10 seconds is the most common range used.

This gives us two non-overlapping cases:

  • When they “hung up” on the voicemail – device_type="unityvm" on_hook_party="caller" duration<=5
  • When they left a voicemail – device_type="unityvm" on_hook_party="caller" duration>5

Again, test both of those using search, make sure they look correct.  You might have to leave yourself a few voicemails – and abandon a few – to see how the duration should be set.  I recommend testing with some coworker who happens to not be at their desk.  Most folks feel loved when they come back from a break and find someone left them a few voicemails!

Step 2: Building leg types

Now that we have our searches, let’s build some leg types!

  • Click Settings, then Event Types.
    • We are going to assume you have no leg types already – if you do, search for them by typing “leg_type” in the search box and pressing enter, then review what you have to make sure we aren’t duplicating leg types.
  • Click the green “New Event Type” in the upper right.  Fill it in as such:
    • Name: leg_type_voicemail_abandoned
    • Search: device_type="unityvm" on_hook_party="caller" duration<=5
    • (Note the initial “leg_type_…” in the name is important, it’s how our app knows these are for it!)
  • Click Save

Now, build the “voicemail_left_message” version.

  • Click Settings, then Event Types.
  • Click the green “New Event Type” in the upper right.  Fill it in as such:
    • Name: leg_type_voicemail_left_message
    • Search: device_type="unityvm" on_hook_party="caller" duration>5
    • (Note the initial “leg_type_…” in the name is important, it’s how our app knows these are for it!)
  • Click Save

Step 3: Testing.

Click back in your browser a few times to get back to your search, or reselect the Cisco CDR Reporting and Analytics app then go to Browse Calls.

If you do not see the field “leg_type”, click “Fields” in the upper right, search for “leg_type” on the left and add it into the right pane.  Then click Save in the field selector.

As long as any calls matching the search we built above are in the results, you should see leg_type populated.  Fiddle with your timeframe a bit if you need to.

Step 4: Using Leg Types.

Now that we have leg types defined for a few cases, we can search for those using the ‘other search terms’ field.

Rather than bore you with prose, how about I just make a little table with some examples and see how that looks?

 

To see all calls that have:                 Search this:
Any leg_type defined                        leg_type="*"
A leg_type of “voicemail_left_message”      leg_type="voicemail_left_message"
A leg_type starting with “voicemail…”       leg_type="voicemail_*"

The neat part is that though these are defined on individual call legs, those legs roll up into calls and bring their tags along with them.  Searching for a given leg type means the app returns the *calls* that include legs of that type.
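
Because the leg type is just another field at that point, you can combine it with other terms in that same “other search terms” field.  As a sketch (exactly which fields make sense to combine will depend on your own data), something like this would look for calls where extension 2126 reached someone’s voicemail and gave up:

    leg_type="voicemail_abandoned" callingPartyNumber=2126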

 Additional possibilities:

This is by no means complete, but some random thoughts on leg types:

  • tagging legs that were placed to your call center as perhaps “call_center_received”
    • split that into “call_center_received”, “call_center_abandoned” and “call_center_no_answer”
  • tagging a certain DID block with “incoming_sales_took_call”
    • NOTE that “OR” is OK, but put parentheses around it, like (finalCalledPartyNumber=7344 OR finalCalledPartyNumber=7345) duration>=15 on_hook_party=*, which would match legs that went to either 7344 or 7345, lasted at least 15 seconds, and where someone actually hung up at that leg (so it wasn’t transferred away).  A sketch of turning that into a leg type definition follows this list.
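
For instance, that last search could be turned into a leg type definition using exactly the same steps as in the voicemail example above (the name and numbers here are just from the illustration):

    Name: leg_type_incoming_sales_took_call
    Search: (finalCalledPartyNumber=7344 OR finalCalledPartyNumber=7345) duration>=15 on_hook_party=*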

There are many possibilities for building and using these and we’d love to see the system you come up with!

 


Choropleth Maps!

If you read our last installment on Maps, you’ll know we can put calls on a map.

There are even more cool maps to display calls on!  In addition to Cluster maps, Splunk also bundles Choropleth maps for both Countries and US States.

A refresher

Before starting, you may want to go review our post on building Cluster Maps.  Come on back when you are done there and let’s get our hands dirty.

We assume you can find your data.

So we won’t tell you how to do it beyond Browse > Browse Calls.

Adding Required Fields

  • Way over on the right click the green Edit Fields button.
  • For users with a lot of international calls, search for and add the fields callingPartyCountry and finalCalledPartyCountry
  • Or if your calls are mostly just US, try adding callingPartyState and finalCalledPartyState
  • In either case, when you have your fields selected click Save

Change to showing raw data

Let’s now show this in the core Splunk UI to do the custom visualizations we need.

  • Click the link to >> see full search syntax in the upper right.
  • A New Search window will open with a big long search already populated.

Add the magic commands

This is where things go differently from the previous article.  For one thing, we’re going to go through using “Countries” here; if you are in the US and want to use States, it’s the same process with a slightly different command.  We will do US States as a second example below (but read through this one first – we’ll use an abbreviated version of it, so you need to be familiar with it anyway).

Last time we built a cluster map by adding one command, “geostats”.  To build a Choropleth map we need to add two commands: one (stats) to sum up the counts by country, and another (geom) to tell Splunk how to draw each “place”.

  • To the end of that search, paste in one of the two commands below, depending on whether you want the *calling* parties or the *called* parties to display.  (Calling is inbound, finalCalled is outbound.)
    | stats count BY finalCalledPartyCountry | geom geo_countries featureIdField="finalCalledPartyCountry"
        -- OR --
    | stats count BY callingPartyCountry | geom geo_countries featureIdField="callingPartyCountry"
    
  • Click the Search button (or just press enter while your cursor is in the search text field).
  • Change to the Statistics tab and let’s take a quick look there to confirm.

Switching to the Statistics tab, the stats part is responsible for coming up with the “count” – say, 53 for Australia.  The “geom” command is what came up with that big pile of numbers on the right which, if you squint really hard, is a polygon shaped just like Australia.  I promise.  You might have to squint *really* hard to see that, so let’s have Splunk show us instead!
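
If it helps to picture it, the Statistics tab comes back looking roughly like this (the count is just the example number from above, and the geometry column is heavily abbreviated – in reality it’s a very long blob of coordinates):

    finalCalledPartyCountry    count    geom
    Australia                  53       {"type": "Polygon", "coordinates": [[[ ... ]]]}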

Make it pretty

  • Change to the Visualization tab.

Splunk *should* pre-select the Choropleth map type, because we’ve sent the data through the geom command.  If so, there’s nothing else you need to do except wait a few moments for the data to populate.

If on the other hand you do not have a Choropleth Map showing,

  • Click the Visualization tab, then the Visualization type.
  • Change it to Choropleth. This should be under the Recommended section.  If not, look farther down.

Give that a little while to load…

For U.S. States

As promised, here is how to do U.S. States.  This relies on the process above, so if you have any questions on how to do a particular thing, refer to the Countries sections above.

  • Go to Browse Calls
  • Optionally filter/find certain calls.
  • Click >> see full search syntax
  • After getting your New Search window, paste into the end of it
    | stats count BY finalCalledPartyState | geom geo_us_states featureIdField="finalCalledPartyState"
        -- OR --
    | stats count BY callingPartyState | geom geo_us_states featureIdField="callingPartyState"
    
  • Click the Search button (or just press enter while your cursor is in the search text field).
  • Change to the Visualization tab
  • Change to the Choropleth map (if it doesn’t automatically load it).

Wrapping up

We hope to have given you the tools to create some nice visualizations using your CDR data.  Now maybe those dashboards of incoming calls won’t look so plain!


Maps!

The question

Have you ever wondered where your inbound calls come from?  Do you suspect agents are placing a lot of calls on the company dime to Loja, Ecuador to find out if the high temp there is supposed to be 74F again today?

Well, you are in luck!  Today we’ll show you how to display the call counts in a Cluster Map!

Finding some data

First, let’s find the data you want to display.  This could be a lot of things, but for now let’s use your own main extension, let’s say it’s “2126”.

  • Browse > Browse Calls.
  • In the number/ext field, type in 2126.
  • Change the “scan only the last 1000 records” to “all records”.
  • Click the search icon.

There’s no reason you have to use your main extension – you could leave all these options blank and see all the calls that end up with location information in them. The sky is the limit here.

Adding latitude/longitude fields

  • Once you have calls showing up, way over on the right click the green “Edit Fields” button.
  • Search for keyword “lat” and in the resulting list, click on the green arrow to add the fields “callingPartyLat” and “finalCalledPartyLat” to the right side.
  • Do the same for “long”, adding “callingPartyLong” and “finalCalledPartyLong”.
  • Once you have all four fields added, click the Save button.

Change to showing raw data

Now that you have some useful, specific data, we need to display this data in the core Splunk UI to do some custom visualizations.

  • Click the link to “>> see full search syntax” in the upper right.
  • A “New Search” window will open with a big long search already populated.

Don’t fret if it just looks like a bunch of  gobbledygook – we already did the hard work for you so you just have to add a few small commands to the very end of it.

Add the magic commands

  • To the end of that search, paste in
    | geostats latfield=callingPartyLat longfield=callingPartyLong count
  • Then click the search button (or just press enter while your cursor is in the search text field).

This runs the geostats command, telling it to plot the ‘count’ for each latitude and longitude.  We have to tell the command which fields in our data contain the latitude and longitude, hence the “latfield=<my latitude field name> longfield=<my longitude field name>” in the middle.

Make it pretty

  • Change to the “Visualization” tab.

If Splunk is already displaying a Cluster Map, there’s nothing else you need to do except wait a few moments for the data to populate.

If on the other hand you do not have a Cluster Map showing,

  • Click the Visualization tab, then the Visualization type.
  • Change it to Cluster Map. This should be under the “Recommended” section.  If not, look farther down.

Note there are two “Maps” style visualizations.  The other one (with shaded countries instead of dots) is called a Choropleth Map.  We don’t have the right data in this example for the Choropleth map, so be sure not to pick that one.  We will do a Choropleth map in a future blog, so stay tuned!

And that’s it, you should now have a map populated with the call counts.

Some minor variations

Display outbound call destinations instead of inbound call sources

To change from plotting the incoming calls’ locations to the locations of the outgoing calls, use the fields ‘finalCalledPartyLat’ and ‘finalCalledPartyLong’.

| geostats latfield=finalCalledPartyLat longfield=finalCalledPartyLong count

Counting by the final disposition of the call

If you want your little dots to be something other than one single color, an option may be to count BY something.  One of the more popular ‘by’ clauses is by the field “cause_description”.  The field “cause_description” contains values like “Normal call clearing” (which is a call that ended normally), “Call split” (which is when a call gets transferred), “No answer from user (user notified)” which should be self explanatory, or maybe even the dreaded “No circuit/channel available” which means that you have filled your pipes and couldn’t get a free line to place a call with.

Anyway, enough description – adding the BY clause is easy.  To the end of either one of the above, simply add ‘ BY cause_description’.  So if you were doing the final called party version, it would now be

| geostats latfield=finalCalledPartyLat longfield=finalCalledPartyLong count BY cause_description

Now when you click search, your little blue dots should be divided up into slices for the different cause descriptions.  Hold your mouse over them to see more detail.


Enabling CUBE or vCUBE data

Cisco Contact Center gives you great visibility for Contact Center, and products like ours give you great visibility into CallManager…

…but have you noticed there’s a CUBE-sized blind spot in your picture of overall call flow?

Lucky for you, we can make sense of this data now.  All the H.323 and SIP traffic, media streams (both RTP and RTCP), all the handoffs to DTMF and all the other things that CUBE and vCUBE can do – we shine a flashlight into that darkness and let you start using that data as part of the overall picture you can get from our Cisco CDR Reporting and Analytics app.

Prerequisite information and notes

We are going to assume that:

  • you have set up our product already following our install docs and you have an SFTP server running on a Splunk Universal Forwarder (UF).
  • that this UF is on a Linux box of some sort and that you have some basic comfort with a *nix command line,
  • that your existing UF configuration is indexing the CallManager CDR and CMR using a sinkhole input,
  • that you can install software on that system,
  • that you will use your existing SFTP user account on that system for the new CUBE CDR data
  • and that you have admin access to your CUBE system or can find someone who does to run a half dozen commands for you.

The steps that we will perform to enable ingesting CUBE CDR data are:

  • Install an FTP server on the existing *nix UF
  • Configure vCUBE/CUBE to save CDR data to that FTP server
  • Reconfigure the existing UF to pick up that new data.

Step 1, Setting up an FTP server

CUBE/vCUBE (from now on I’m just going to write CUBE since it covers both products) only supports FTP as far as we can tell. This means that the standard and recommended method we use for collecting CDR data from CallManager – SFTP – can’t be used with CUBE.

There are many FTP packages that you could use and practically any of them should work fine.  If you don’t have one installed already, then follow along below to get some guidance on getting FTP up and running.

Find which distribution you are using:

If you already know the Linux distribution installed on your UF (Red Hat Enterprise Linux, Ubuntu, Slackware, etc…) you can skip this step.

  1. Log into an SSH session on the existing UF.
  2. Run the one line command cat /etc/*-release
  3. In the output, you’ll see either a release name like “Red Hat Enterprise Linux”, or somewhere in the output may be a “Description” field that says “Ubuntu 16.04 LTS”.  Yours may say something completely different, like Debian or Slackware.  Just note what it says.

Install the FTP server software

This step is distribution specific, so if you don’t know which distribution you are using please see the section immediately above this one, then come back here.

For Ubuntu, follow setting up an FTP server on Ubuntu.  You’ll only want the steps vsftpd – FTP Server Installation and User Authenticated FTP Configuration – DO NOT set up Anonymous FTP!  Also be very careful to not accidentally set the anon_upload_enable=YES flag, which for some reason is stuck in the middle of the Authenticated FTP configuration section.

For Red Hat and its various versions you can follow these instructions on setting up an FTP server on Red Hat.

Other Linuxes (Linuxen?  Linuxii?) – just search the internet for “<my distribution> FTP server” and try to find the most “official” looking instructions you can for enabling non-anonymous FTP.  If you check out the two sets of directions linked above you can get a feel for what that might look like.

Also, if you have a preference for an FTP server you are comfortable with, by all means use it instead of our instructions.  It won’t hurt our feelings.

Confirm the FTP server works

You can use any ftp client you have available to confirm this.  Preferably one on a different system so you can confirm there’s no firewall on the local system blocking you.

We recommend creating a temporary file with any content you want and confirming

  1. You can upload that file using the username and password for the existing SFTP user
  2. That the file ends up where you expect it to be

If you have any problems at this point, review the installation steps you followed and also confirm there’s no firewall in the way, either between you and the FTP server or on the FTP server itself.  If there is, adjust the firewall settings to allow FTP traffic.

Step 2, Configuring CUBE to save CDR data to our FTP server

Log into the server used for file accounting (i.e. your CUBE server) with an account that has administrative permissions.  Then run the commands listed below to set gw-accounting to file, change the cdr-format to “detailed”, configure the FTP server information, and tell the system to flush new data to file once per minute.  Finally, we make sure this configuration gets saved.

  1. enable
  2. configure terminal
  3. gw-accounting file
  4. cdr-format detailed
  5. primary ftp 10.0.0.100/cube_ username cdr_user password cdr_user_passwd
  6. maximum cdrflush-timer 1
  7. end
  8. copy running-config startup-config

Step 5 is the one to pay attention to!  Be sure to change the server IP, username and password to match your environment.  Also notice that the cube_ in 10.0.0.100/cube_ is a file prefix.  The FTP software will put the file into the right place in the directory structure; the cube_ piece tells CUBE to prepend “cube_” to the front of the filename it creates.  This is how we’ll later tell the UF to pick up that data specifically.

To confirm, from that same SSH session to your CUBE server run the command file-acct flush with-close.  You should see a new file nearly immediately appear in your FTP folder.  This file might be nearly empty with only a timestamp in it if there were no phone calls in the short period involved, but in any case it should be there.

Step 3, Tell the UF to index this data

The UF needs only a few tiny pieces of configuration.  There should already be a working configuration for indexing the Cisco CDR data via the TA_cisco_cdr app and its inputs.conf file.  We will now edit that to get our new data files sent in as well.

  1. Edit your $SPLUNK_HOME/etc/apps/TA_cisco_cdr/local/inputs.conf file
  2. You’ll see two stanzas already for your existing CDR data, with sourcetypes of cisco_cdr.  (If you do not see those two stanzas, you are in the wrong place.  Check other inputs.conf files on that system. )
  3. Go to the end of the file and add a third entry that looks like:
    [batch:///path/to/files/cube_*] 
    index = cisco_cdr 
    sourcetype = cube_cdr 
    move_policy = sinkhole 
    
  4. Save the file
  5. Restart the UF.
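
Once the UF is back up and has had a minute or two to pick up files, a quick sanity check from the Splunk search bar might look like the following (assuming you kept the index and sourcetype values from the stanza above):

    index=cisco_cdr sourcetype=cube_cdr | head 10

If events come back, the CUBE data is flowing.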

Finalizing.

Now that you have this data in, for all CallManager calls where we recognize the matching CUBE record(s), the fields from those CUBE events will be available in the field picker popup in “Browse Calls”.  To talk about desired functionality in other parts of the app (notably General Report), and about your needs in general, give us a call.  We can help in the short term even if it’s a bit manual for now, and we’ll be very interested to hear all the messy details to help guide our next few releases as we flesh this out.  It’ll be fun!

 

 


Welcome Rich Mahlerwein!

I am extremely excited to announce that Rich Mahlerwein has joined Sideview as of yesterday morning! Rich is a Splunker extraordinaire, an expert on technologies all across the datacenter, a fantastically helpful member of the Splunk community and even a Splunk Trust member. In fact he’s so unique that Splunk literally sent someone out to his office and his house to make a movie about him.

Rich seems to be that person you work with who is involved in almost everything after a while, with a sprawling job title that resists confinement on a mere mortal’s business card. Here at Sideview we’re going to sort of short-circuit that process by making him involved in pretty much everything, immediately. It’s going to be fantastic!

Welcome!


How to build everything backwards

Our minds, for better or worse, are wired a certain way. We hear someone’s problem and we immediately want to solve it, even before they’re done describing it. This doesn’t even sound like a bad thing does it?

A story.

The first version of a Thing is built.  Its purpose is to take a process that sucks and make it suck less.  Its creators think about the Thing a LOT: the process it’s fitting into, what the Thing is going to do, how it’s going to be useful to its intended end-users.  They work hard.  But they don’t think nearly as much about how the Thing’s users are going to tell them whether it worked for them.  Or where it kinda didn’t.  Or what other stuff those users still have to do manually that sucks as much.  There’s probably an email alias.  That’s fine, right?  Users will send emails there when they hit bugs, so… they can totally use the same alias to tell us about any bigger problems too, surely.
What they think about even less than that is how the more empowered users on the ground might themselves answer the questions – “Who is using this thing?  What parts of it are they using?  Who is not using this tool at all?  Can we see any evidence of what else they are using instead?”

The Thing is deployed and soon reality has moved on a bit.  The people who use the Thing are always the first to notice, if only because they start bending how they’re using it – maybe abandoning one part, maybe circumventing the Thing entirely sometimes.  They probably think about emailing the alias – or maybe they start to, but it’s not really a “defect”, so they don’t.  If they know a person on the inside, maybe they’ll ask them about it next time they’re chatting.
At any rate, this doesn’t have an easy answer, and remember, they still have a problem.  They have to solve it, and they can think of a workaround, so they move forward.  Thus in that workaround is born the “Shadow Process”.  As time goes on, the people in the know use more and more of the Shadow Process.

Deprived of its most advanced and powerful users and their feedback, the official process withers further. The group owning the tool despairs! They may not even be aware that most of the workload they think is being done, is actually being borne by the Shadow Process. They seize on the officially reported defects and the feedback, most of which inexplicably has come from the consensus of internal stakeholders (and much of which interestingly concerns adding additional use cases and features) and they announce a grand plan to do a Next Big Version of the Thing.


Change one thing, and the story changes. Don’t release the product first and think about its feedback mechanisms last (never). Instead build the tools and/or processes that the end-users can use to easily send in feedback of all levels – from bugs, to enhancement requests, to full-on “Come To Jesus” moments.

Pretty much as soon as you start doing that, you’ll realize that those hero users on the ground don’t actually have enough data to do this well.
“I don’t know, it works for me. I have some defects reported but I passed those on. I think the other 193 users are probably using it fine.”
So realizing that, you’ll start spending time and energy giving those hero users some kind of little tools and processes of their own. For example so they can see “188 out of 193 agents are using this thing” and ask “Well.. what the hell are the other 5 doing??”. Or so they can see “That page that was supposed to be the workhorse of this whole thing, literally only one person has ever visited it in the last week”.

The entire purpose of these second order tools, is to enable the local heroes to send in and/or summarize these collected facepalms back to the mothership. So because they’ve been enabled to do so, following through and sending the data back in, even if it’s a screenshot, will be a natural next step. And hey that screenshot is more valuable than any three of the feature suggestions from your Product Council.


The solution, I think, starts with admitting that you suck.  That you haven’t been doing this at all.  I suck too, and I haven’t been doing nearly enough of this.  But I’m resolving to fix this in the next releases of all our products (starting, of course, with a way to see whether the fix is working).


Postprocess searches – pitfalls galore

As you develop a custom view you start with one chart or one table. After a while you’ve added and added, and you’re dispatching several searches. Often you’ll see that a lot of searches are pretty similar to each other. You’re getting the same events off disk more than once and you’re making Splunk do extra work. If you get the nagging feeling that there’s a better way, you’re right; it’s called “postProcess” and it’s a part of the core Splunk API.

Post process searches allow you to avoid this inefficiency.  They allow you to dispatch only one “base search”, getting the events off disk only once, and then at request time carve up that base set of results in two or more different ways to render different ‘slices’.

The first thing everyone does is very intuitive – they make a “base search” that’s a simple search that returns raw events, and they make postProcess searches that contain transforming commands like stats or timechart. Makes perfect sense, and it’s a TERRIBLE IDEA. DO NOT DO THIS. Read on.

Skipping to the end – “what could go wrong?”

  1. PITFALL #1: base search is a “pure events” search that matches more than 10,000 events.

    Splunk behaves a little differently when the ‘search results’ are actually events.  In particular, it does not preserve complete information about the events once you pass 10,000 rows.  The problem is that you will not get any warning about this – the rows will be silently discarded from your base search in the postProcess dashboard, and therefore your postProcessed results will be wrong.  Conversely, if the base search contains transforming commands like stats, Splunk will preserve all the rows in the base search results, to 10,000 rows and beyond.

    you have fallen into this pit when the postprocessed results displayed seem wrong or truncated, or WORSE they don’t seem wrong and you don’t find out they are wrong until much later.

  2. PITFALL #2: base search is a “pure events” search and postprocess uses a field not explicitly named in base search.

    If a field is not mentioned explicitly in your base-search somewhere, splunkd will think it doesn’t need to extract and preserve those values when it runs the job. Then come postprocess-time that field will be absent and you’ll be extremely confused. If you always group your desired fields and rows with the stats command, everything is much more explicit and you sidestep this confusion.

    you have fallen into this pit when you’ve spent hours staring at your config wondering why your postProcess search acts like some field isn’t there.

  3. PITFALL #3: avoid using postProcess searches in cases where the number of rows returned by the ‘base search’ is extremely high. You’re setting yourself up for very bad performance in your dashboard.

    you have fallen into this pit when your slick postprocess-heavy dashboard actually has terrible performance.

    Note that a corollary of this pitfall is that you should avoid using a “pure events” search as your base search because such searches will have a large number of rows. Throw a “stats count sum(foo) by bar baz” on there and summarize the rows down to the ones you’ll actually use.

  4. There are other strong reasons to not use a “pure events” search as the base search, when you’re using postProcess, but they’re extremely technical and have to do with map-reduce and distributed search and all kinds of tweaky things that would take too long to explain. Just don’t do it OK?

If you’ve read this far perhaps you’re hunting for specific examples. Here are two!

Example 1) How to use postprocess when _time is not involved

Below we’re using access data from SplunkWeb to show a table of the bytes transferred by filename, and also the number of requests by HTTP status. In the normal approach we’d have to use two different searches:

  1. index=_internal source="*web_access.log" | stats sum(bytes) as totalBytes by file | sort - totalBytes
  2. index=_internal source="*web_access.log" | stats count by status | sort - count

Notice that both searches have to get the same events off disk. This makes it a good candidate for post process.

THE WRONG BUT INTUITIVE WAY

Base search: index=_internal source="*web_access.log"

  1. PostProcess 1: | stats sum(bytes) as totalBytes by file | sort - totalBytes
  2. PostProcess 2: | stats count by status | sort - count

This is wrong for several reasons, and it won’t work anyway.  See the “Pitfalls” section above to find out why.

THE RIGHT WAY

Base search: index=_internal source="*web_access.log" | stats count sum(bytes) as totalBytes by file, status

  1. PostProcess 1: | stats sum(totalBytes) as totalBytes by file | sort - totalBytes
  2. PostProcess 2: | stats sum(count) as count by status | sort - count

Example 2) how to use postProcess when time is involved

When time is involved, you have to use Splunk’s “bin” command to bucket all the values into some reasonable number of time buckets.

Here’s a similar example to the above, except instead of the ‘request count by status’ on the right, we want the right side to show a ‘count over time by status’:

THE WRONG, BUT INTUITIVE WAY

Base search: index=_internal source="*web_access.log"

  1. PostProcess 1: | stats sum(bytes) as totalBytes by file | sort - totalBytes
  2. PostProcess 2: | timechart span=15min count by status

This is wrong for several reasons, and it won’t work anyway.  See the “Pitfalls” section above to find out why.

THE RIGHT WAY

Base search: index=_internal source="*web_access.log" | bin _time span=15min | stats count sum(bytes) as totalBytes by file, _time, status

  1. PostProcess 1: | stats sum(totalBytes) as totalBytes by file | sort - totalBytes
  2. PostProcess 2: | timechart span=15min sum(count) by status


There’s always a worse way

In the Splunk search language there is almost always a better way, and someone on answers.splunk.com willing to teach you about it.  Less commonly advertised, though, is the fact that there is ALWAYS a worse way…

So let’s drive the wrong way down a one way street. Bear with me.

First, a warning. Driving the wrong way down a one way street is not something you should do, and likewise there are some searches here that you should NOT RUN EVER.

Challenge #1 Let’s make 1 empty row with foo=1!

No problem. You’ve probably seen someone do this:

| stats count | fields - count | eval foo="1"

Ooh neat. What if we need 7 empty rows with foo=1?
No problem.

| stats count | fields - count | eval foo=mvrange(0,7) | mvexpand foo | eval foo="1" 

Can we optimize this slightly to make sure it only runs on our search head?
Sure.

| noop | stats count | fields - count | eval foo=mvrange(0,7) | mvexpand foo | eval foo="1" 

or if you prefer

| localop | stats count | fields - count | eval foo=mvrange(0,7) | mvexpand foo | eval foo="1" 

These are getting pretty clunky though. And on 6.4 there’s a much better way!

| makeresults count=7  

GLORIOUS!!!!
So…. we made it better. What if we went the other way and made it…. WORSE.

Worse how?
Well, let’s do some unnecessary things, AND let’s make it break sometimes randomly! And lets force it to talk to every one of the indexers and ask them each to give us one event!!

index=* OR index=_* | head 1 | fields - * | eval foo="1" | table *

Ooh, that’s horrible.  If there’s nothing in the main index, or if a user can only see a subset of indexes that happen to be empty during a given timerange, it’ll produce no row at all.  But it still hits every indexer.

Let’s keep going though. There’s a lot more Horrible down here.

Let’s get all the lookups that we can get from our current app context, and smash them together into one giant result set. then throw it all away and keep only one row.

| rest /services/data/lookup-table-files | fields title | map search="| inputlookup $title$" | head 1 | fields - * | eval foo="1"

oh that’s marvelous. We’re probably generating a ton of errors somewhere since not all lookups can be loaded from the app context we happen to be in. And depending on who we’re running this as, we might get no rows at all. So much fail.

To Be Continued…..

Challenge #2 Get the list of fields.

Well you might find the transpose command first and thus find yourself doing this:

* | table * | transpose | rename column as field | fields field

Which is pretty evil. We’re getting every event off every disk only to throw all the values away. The better way is:

* | fieldsummary | fields field 

neat. Is there a worse way though?
You betcha. This isn’t quite as evil as our starting point of “* | table * | transpose” search but it’s pretty evil.

* | stats dc(*) as * | transpose | rename column as field | fields field

Wildcards in stats/chart and timechart are fantastic. As long as they’re used sparingly which here they are NOT. We’re forcing the pipeline to keep track of every single distinct value of every field. If you have 100 or 200 fields this can get pretty ugly.

More Horrible! OK let’s make it keep track of the actual distinct values themselves and give them back to the search head where…. we just throw them away. Mou ha ha!

* | stats values(*) as * | transpose | rename column as field | fields field

Even More Horrible. let’s make it keep track of ALL values of all fields as huge multivalue fields and send them all back to us so we can throw them away.

* | stats list(*) as * | transpose | rename column as field | fields field

Except wait! That was a trick! list() actually truncates at 100 values, whereas values() just keeps on going…. so that search is slightly less evil than the earlier values(*) one.

OK sorry. Let’s make up for lost ground. Can we make a horrible search that technically uses fieldsummary itself?
Sure!

* | fieldsummary | fields field | map search="search index=* $field$=* | head 1 " | fieldsummary | fields field

And we didn’t even use join or append once!!

If this makes you think of any good “Evil SPL”, please email it to evil_spl@sideviewapps.com.


New Sideview Utils module – NavBar

The latest release of Sideview Utils (3.3.9) includes a new module called NavBar that can be used to replace Splunk’s AppBar module.

Now why do I need to replace AppBar?

Well, you very well might not! It has to do with the “displayview” setting in savedsearches.conf. If you’re a Splunk app developer who’s been around for a while you might have seen that setting. The idea of displayview is that a saved search can remember what view it is supposed to load in, and go to that view when run. If the key is empty, then the savedsearch of course loads in the generic Splunk UI.

Unfortunately in Splunk 6.3 the implementation of the main navigation in AppBar was changed and a (presumably unintended) side effect was that the “displayview” setting in savedsearches.conf broke. Thus in 6.3 and beyond, all saved searches and reports just load in the generic search/report view when run from the main nav. {sad trombone}.

But OK, let’s say I used this key and it worked – how would form elements in my view repopulate correctly?

With just the advanced XML, they wouldn’t!  Which is the main reason why displayview was so seldom used.  (Prepopulation of form elements was part of the old 4.0 “resurrection” system, which was incomplete and unstable, and was turned off by Splunk some time ago anyway.)

But if you’re using Sideview Utils, it replaced this stuff with its own systems long ago, built around a new key it added to savedsearches.conf (did you know that apps can actually extend the core conf space?  Yay Splunk for doing many things properly long ago).

Specifically, when the user saves a search or report, if
a) they use the Sideview SearchControls module and if
b) the view includes “sideview_utils/save_create_patches” in the customJavascript param of the Sideview Utils module and
c) the view uses the URLLoader module (most views with form elements really should anyway)

then it will file away upstream form element state into a key called request.ui_context.  When any user runs the report later, it will load back in that view courtesy of displayview, and the form elements there will get set correctly courtesy of request.ui_context.


Field pickers for everyone!

By popular demand, you can now pick the columns and fields in Browse Calls and also in the “raw call legs” table inside Call Detail.  So when you’re looking into something around, say, call quality fields, with just a couple of clicks you can make Browse Calls show you everything about those fields instead.  And then the next day when you need to cross-check some userids or some device names, you can easily do that too.  Hopefully another side benefit of this is that our users will get more familiar with all the fields themselves, and thus will get more comfortable using those fields in reports.

Send in your feedback please! It could be one sentence or it could be a 68 page pdf, either way it will be read and digested and quite possibly incorporated into a future release.