How to build everything backwards

Our minds, for better or worse, are wired a certain way. We hear someone’s problem and we immediately want to solve it, even before they’re done describing it. This doesn’t even sound like a bad thing, does it?

A story.

The first version of a Thing is built. Its purpose is to take a process that sucks and make it suck less. Its creators think about the Thing a LOT: the process it’s fitting into, what the Thing is going to do, how it’s going to be useful to its intended end-users. They work hard. But they don’t think nearly as much about how the Thing’s users are going to tell them whether it worked for them. Or where it kinda didn’t. Or what other stuff those users still have to do manually, that sucks as much. There’s probably an email alias. That’s fine, right? Users will send emails there when they hit bugs, so… they can totally use the same alias to tell us about any bigger problems too, surely.
What they think about even less than that is how the more empowered users on the ground might themselves answer the questions: “Who is using this thing? What parts of it are they using? Who is not using this tool at all? Can we see any evidence of what else they are using instead?”

The Thing is deployed and soon reality has moved on a bit. The people who use the Thing are always the first to notice, if only because they start bending how they’re using it: maybe abandoning one part, maybe circumventing the Thing entirely sometimes. They probably think about emailing the alias, but it’s not really a “defect” so they don’t. If they know a person on the inside, maybe they’ll ask them about it next time they’re chatting.
At any rate, this doesn’t have an easy answer, and remember, they still have a problem. They have to solve that, and they can think of a workaround, so they move forward. Thus in that workaround is born the “Shadow Process”. As time goes on, the people in the know start using more Shadow Process.

Deprived of its most advanced and powerful users and their feedback, the official process withers further. The group owning the tool despairs! They may not even be aware that most of the workload they think is being done, is actually being borne by the Shadow Process. They seize on the officially reported defects and the feedback, most of which inexplicably has come from the consensus of internal stakeholders (and much of which interestingly concerns adding additional use cases and features) and they announce a grand plan to do a Next Big Version of the Thing.


Change one thing, and the story changes. Don’t release the product first and think about its feedback mechanisms last (or never). Instead, build the tools and/or processes that the end-users can use to easily send in feedback of all levels, from bugs, to enhancement requests, to full-on “Come To Jesus” moments.

Pretty much as soon as you start doing that, you’ll realize that those hero users on the ground don’t actually have enough data to do this well.
“I don’t know, it works for me. I have some defects reported but I passed those on. I think the other 193 users are probably using it fine.”
So realizing that, you’ll start spending time and energy giving those hero users some kind of little tools and processes of their own. For example, so they can see “188 out of 193 agents are using this thing” and ask “Well… what the hell are the other 5 doing??”. Or so they can see “That page that was supposed to be the workhorse of this whole thing? Literally only one person has visited it in the last week”.
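To make that concrete, here is a minimal sketch in Python of the kind of numbers such a little tool might surface. Everything in it is hypothetical: the usage log as a list of (user, page) pairs, the function name and the sample users are made up for illustration and don’t come from any real product.

```python
def adoption_summary(all_users, usage_events):
    """Summarize who used the tool and how many people visited each page.

    all_users: every user who is supposed to be using the tool.
    usage_events: list of (user, page) tuples from a hypothetical usage log.
    """
    all_users = set(all_users)
    # Users with at least one logged event.
    active = {user for user, _page in usage_events} & all_users
    # Distinct visitors per page.
    visitors_by_page = {}
    for user, page in usage_events:
        visitors_by_page.setdefault(page, set()).add(user)
    return {
        "active": len(active),
        "total": len(all_users),
        "missing": sorted(all_users - active),
        "visitors_per_page": {p: len(u) for p, u in visitors_by_page.items()},
    }


users = ["ann", "bob", "cy", "dee"]
events = [("ann", "queue"), ("bob", "queue"), ("ann", "workhorse"), ("cy", "queue")]
summary = adoption_summary(users, events)
print(f"{summary['active']} out of {summary['total']} users are using this thing")
print("not using it:", summary["missing"])
print("distinct visitors per page:", summary["visitors_per_page"])
```

The point is not the code, it’s that the “missing” list and the one-visitor page are exactly the facepalms a local hero can forward on.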

The entire purpose of these second-order tools is to enable the local heroes to send in and/or summarize these collected facepalms back to the mothership. Because they’ve been enabled to do so, following through and sending the data back in, even if it’s a screenshot, will be a natural next step. And hey, that screenshot is more valuable than any three of the feature suggestions from your Product Council.


The solution, I think, starts with admitting that you suck. That you haven’t been doing this at all. I suck too and I haven’t been doing nearly enough of this. But I’m resolving to fix this in the next releases of all our products (though not before first giving us a way to see if the fix is working).

Postprocess searches – pitfalls galore

As you develop a custom view you start with one chart or one table. After a while you’ve added and added, and you’re dispatching several searches. Often you’ll see that a lot of searches are pretty similar to each other. You’re getting the same events off disk more than once and you’re making Splunk do extra work. If you get the nagging feeling that there’s a better way, you’re right; it’s called “postProcess” and it’s a part of the core Splunk API.

Post process searches allow you to avoid this inefficiency. They allow you to dispatch only one “base search” and get the events off disk only once, but then at request-time carve up that base set of results in two or more different ways, to render different ‘slices’.

The first thing everyone does is very intuitive – they make a “base search” that’s a simple search that returns raw events, and they make postProcess searches that contain transforming commands like stats or timechart. Makes perfect sense, and it’s a TERRIBLE IDEA. DO NOT DO THIS. Read on.

Skipping to the end – “what could go wrong?”

  1. PITFALL #1: base search is a “pure events” search that matches more than 10,000 events.

    Splunk behaves a little differently when the ‘search results’ are actually events. In particular, it does not preserve complete information about the events once you pass 10,000 rows. You get no warning about this: the extra rows are silently discarded from your base search in the postProcess dashboard, and therefore your postProcessed results will be wrong. Conversely, if the base search contains transforming commands like stats, Splunk will preserve all the rows in the base search results, to 10,000 rows and beyond.

    You have fallen into this pit when the postprocessed results displayed seem wrong or truncated, or WORSE, they don’t seem wrong and you don’t find out they are wrong until much later.

  2. PITFALL #2: base search is a “pure events” search and postprocess uses a field not explicitly named in base search.

    If a field is not mentioned explicitly in your base search somewhere, splunkd will think it doesn’t need to extract and preserve those values when it runs the job. Then, come postprocess-time, that field will be absent and you’ll be extremely confused. If you always group your desired fields and rows with the stats command, everything is much more explicit and you sidestep this confusion.

    You have fallen into this pit when you’ve spent hours staring at your config wondering why your postProcess search acts like some field isn’t there.

  3. PITFALL #3: base search returns an extremely high number of rows.

    Avoid using postProcess searches in cases where the number of rows returned by the ‘base search’ is extremely high. You’re setting yourself up for very bad performance in your dashboard.

    You have fallen into this pit when your slick postprocess-heavy dashboard actually has terrible performance.

    Note that a corollary of this pitfall is that you should avoid using a “pure events” search as your base search because such searches will have a large number of rows. Throw a “stats count sum(foo) by bar baz” on there and summarize the rows down to the ones you’ll actually use.

  4. There are other strong reasons not to use a “pure events” search as the base search when you’re using postProcess, but they’re extremely technical and have to do with map-reduce and distributed search and all kinds of tweaky things that would take too long to explain. Just don’t do it, OK?
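Pitfall #1 is easier to believe once you see the arithmetic. The following is a small Python simulation, not Splunk itself: it assumes the base job simply keeps the first 10,000 event rows (which rows actually survive is not something this post specifies), and the event data is made up.

```python
from collections import Counter

EVENT_CAP = 10_000  # the cutoff described in Pitfall #1

# 25,000 fake events whose 'status' cycles through four values.
statuses = ["200", "304", "404", "500"]
raw_events = [{"status": statuses[i % 4]} for i in range(25_000)]

# Wrong way: the base search returns raw events, so in this simulation
# only the first EVENT_CAP rows are left for postProcess to count.
truncated_base = raw_events[:EVENT_CAP]
wrong_counts = Counter(e["status"] for e in truncated_base)

# Right way: the base search already did "stats count by status",
# so it holds four summary rows and nothing gets cut off.
summarized_base = Counter(e["status"] for e in raw_events)

print("total counted from truncated events:", sum(wrong_counts.values()))
print("total counted from summarized base: ", sum(summarized_base.values()))
```

Nothing errors out in the first case. The numbers are just quietly too small, which is what makes this pit so deep.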

If you’ve read this far perhaps you’re hunting for specific examples. Here are two!

Example 1) How to use postprocess when _time is not involved

Below we’re using access data from SplunkWeb to show a table of the bytes transferred by filename, and also the number of requests by HTTP status. In the normal approach we’d have to use two different searches:

  1. index=_internal source="*web_access.log" | stats sum(bytes) as totalBytes by file | sort - totalBytes
  2. index=_internal source="*web_access.log" | stats count by status | sort - count

Notice that both searches have to get the same events off disk. This makes it a good candidate for post process.

THE WRONG BUT INTUITIVE WAY

Base search: index=_internal source="*web_access.log"

  1. PostProcess 1: | stats sum(bytes) as totalBytes by file | sort - totalBytes
  2. PostProcess 2: | stats count by status | sort - count

This is wrong for several reasons, and it won’t work anyway. See the “Pitfalls” section above to find out why.

THE RIGHT WAY

Base search: index=_internal source="*web_access.log" | stats count sum(bytes) as totalBytes by file, status

  1. PostProcess 1: | stats sum(totalBytes) as totalBytes by file | sort - totalBytes
  2. PostProcess 2: | stats sum(count) as count by status | sort - count
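Why does re-aggregating the summarized rows give the same answer as the two original searches? Because sums of sums are still sums. Here is a minimal sketch in Python that mimics the three searches on a handful of made-up events; the event values are hypothetical and the code only imitates the stats logic, it does not talk to Splunk.

```python
from collections import defaultdict

# Hypothetical web access events: (file, status, bytes).
events = [
    ("app.js", "200", 500),
    ("app.js", "304", 0),
    ("logo.png", "200", 1200),
    ("app.js", "200", 700),
    ("missing.html", "404", 90),
]

# Base search: stats count sum(bytes) as totalBytes by file, status
base = defaultdict(lambda: {"count": 0, "totalBytes": 0})
for file, status, nbytes in events:
    row = base[(file, status)]
    row["count"] += 1
    row["totalBytes"] += nbytes

# PostProcess 1: stats sum(totalBytes) as totalBytes by file
bytes_by_file = defaultdict(int)
# PostProcess 2: stats sum(count) as count by status
count_by_status = defaultdict(int)
for (file, status), row in base.items():
    bytes_by_file[file] += row["totalBytes"]
    count_by_status[status] += row["count"]

# The "normal approach": aggregate straight from the raw events.
direct_bytes = defaultdict(int)
direct_count = defaultdict(int)
for file, status, nbytes in events:
    direct_bytes[file] += nbytes
    direct_count[status] += 1

print(dict(bytes_by_file) == dict(direct_bytes))
print(dict(count_by_status) == dict(direct_count))
```

Note that the base result has at most one row per (file, status) pair, no matter how many events matched.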

Example 2) How to use postProcess when time is involved

When time is involved, you have to use Splunk’s “bin” command to bucket all the values into some reasonable number of time buckets.

Here’s a similar example to the above, except instead of the ‘request count by status’ on the right, we want the right side to show a ‘count over time by status’:

THE WRONG, BUT INTUITIVE WAY

Base search: index=_internal source="*web_access.log"

  1. PostProcess 1: | stats sum(bytes) as totalBytes by file | sort - totalBytes
  2. PostProcess 2: | timechart span=15min count by status

This is wrong for several reasons, and it won’t work anyway. See the “Pitfalls” section above to find out why.

THE RIGHT WAY

Base search: index=_internal source="*web_access.log" | bin _time span=15min | stats count sum(bytes) as totalBytes by file, _time, status

  1. PostProcess 1: | stats sum(totalBytes) as totalBytes by file | sort - totalBytes
  2. PostProcess 2: | timechart span=15min sum(count) by status
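The same trick works with time because the bucketing happens before the summary. Below is a rough Python imitation of the bin-then-stats base search and the timechart postProcess, checked against counting the raw events directly. The timestamps and events are invented, and the sketch ignores things a real timechart does, such as filling in empty buckets.

```python
from collections import defaultdict

SPAN = 15 * 60  # 15 minutes in seconds, matching span=15min


def bin_time(epoch_seconds, span=SPAN):
    """Snap a timestamp down to the start of its bucket, like bin _time."""
    return epoch_seconds - (epoch_seconds % span)


# Hypothetical events: (epoch seconds, file, status, bytes).
events = [
    (0, "app.js", "200", 500),
    (10, "app.js", "200", 700),
    (899, "logo.png", "404", 0),
    (900, "app.js", "200", 300),
    (1700, "logo.png", "200", 1200),
]

# Base search: bin _time span=15min
#   | stats count sum(bytes) as totalBytes by file, _time, status
base = defaultdict(lambda: {"count": 0, "totalBytes": 0})
for t, file, status, nbytes in events:
    row = base[(file, bin_time(t), status)]
    row["count"] += 1
    row["totalBytes"] += nbytes

# PostProcess 2: timechart span=15min sum(count) by status
chart = defaultdict(int)
for (file, bucket, status), row in base.items():
    chart[(bucket, status)] += row["count"]

# Direct equivalent on raw events: timechart span=15min count by status
direct = defaultdict(int)
for t, file, status, nbytes in events:
    direct[(bin_time(t), status)] += 1

print(dict(chart) == dict(direct))
```

The span in the postProcess has to match (or be a multiple of) the span used in the base search, since the base rows no longer know where inside a bucket each event fell.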

There’s always a worse way

In the Splunk search language there is almost always a better way, and someone on answers.splunk.com to teach you about it. Less commonly advertised though, is the fact that there is ALWAYS a worse way…

So let’s drive the wrong way down a one way street. Bear with me.

First, a warning. Driving the wrong way down a one way street is not something you should do, and likewise there are some searches here that you should NOT RUN EVER.

Challenge #1 Let’s make 1 empty row with foo=1!

No problem. You’ve probably seen someone do this:

| stats count | fields - count | eval foo="1"

Ooh neat. What if we need 7 empty rows with foo=1?
No problem.

| stats count | fields - count | eval foo=mvrange(0,7) | mvexpand foo | eval foo="1" 

Can we optimize this slightly to make sure it only runs on our search head?
Sure.

| noop | stats count | fields - count | eval foo=mvrange(0,7) | mvexpand foo | eval foo="1" 

or if you prefer

| localop | stats count | fields - count | eval foo=mvrange(0,7) | mvexpand foo | eval foo="1" 

These are getting pretty clunky though. And on 6.4 there’s a much better way!

| makeresults count=7  

GLORIOUS!!!!
So… we made it better. What if we went the other way and made it… WORSE?

Worse how?
Well, let’s do some unnecessary things, AND let’s make it break sometimes randomly! And let’s force it to talk to every one of the indexers and ask them each to give us one event!!

index=* OR index=_* | head 1 | fields - * | eval foo="1" | table *

Ooh. That’s horrible. If there’s nothing in the indexes searched, or if a user can only see a subset of indexes that happen to be empty during a given timerange, it’ll produce no row at all. But it still hits every indexer.

Let’s keep going though. There’s a lot more Horrible down here.

Let’s get all the lookups that we can get from our current app context, and smash them together into one giant result set. Then throw it all away and keep only one row.

| rest /services/data/lookup-table-files | fields title | map search="| inputlookup $title$" | head 1 | fields - * | eval foo="1"

Oh, that’s marvelous. We’re probably generating a ton of errors somewhere, since not all lookups can be loaded from the app context we happen to be in. And depending on who we’re running this as, we might get no rows at all. So much fail.

To Be Continued…..

Challenge #2 Get the list of fields.

Well you might find the transpose command first and thus find yourself doing this:

* | table * | transpose | rename column as field | fields field

Which is pretty evil. We’re getting every event off every disk only to throw all the values away. The better way is:

* | fieldsummary | fields field 

Neat. Is there a worse way though?
You betcha. This isn’t quite as evil as our starting “* | table * | transpose” search, but it’s pretty evil.

* | stats dc(*) as * | transpose | rename column as field | fields field

Wildcards in stats, chart and timechart are fantastic, as long as they’re used sparingly, which here they are NOT. We’re forcing the pipeline to keep track of every single distinct value of every field. If you have 100 or 200 fields this can get pretty ugly.

More Horrible! OK let’s make it keep track of the actual distinct values themselves and give them back to the search head where…. we just throw them away. Mou ha ha!

* | stats values(*) as * | transpose | rename column as field | fields field

Even More Horrible. Let’s make it keep track of ALL values of all fields as huge multivalue fields and send them all back to us so we can throw them away.

* | stats list(*) as * | transpose | rename column as field | fields field

Except wait! That was a trick! list() actually truncates at 100 values, whereas values() just keeps on going…. so that search is slightly less evil than the earlier values(*) one.
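If the difference between the two is fuzzy, here is a toy Python model of it. This is only a sketch of the behavior described above: the 100-value cap on list() comes from the text, the sorting inside the values() stand-in is a simplifying assumption, and the function names are made up.

```python
LIST_LIMIT = 100  # the truncation point mentioned above


def stats_list(values, limit=LIST_LIMIT):
    """Keep values in arrival order, duplicates included, up to the limit."""
    return list(values)[:limit]


def stats_values(values):
    """Keep every distinct value, with no cap in this sketch."""
    return sorted(set(values), key=str)


field = [f"user{i}" for i in range(5000)]  # 5,000 distinct values

print(len(stats_list(field)))    # how many values the list() stand-in keeps
print(len(stats_values(field)))  # how many values the values() stand-in keeps
```

With a high-cardinality field, the values() version is the one dragging thousands of values per field back to the search head.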

OK sorry. Let’s make up for lost ground. Can we make a horrible search that technically uses fieldsummary itself?
Sure!

* | fieldsummary | fields field | map search="search index=* $field$=* | head 1 " | fieldsummary | fields field

And we didn’t even use join or append once!!

If this makes you think of any good “Evil SPL”, please email it to evil_spl@sideviewapps.com.

New Sideview Utils module – NavBar

The latest release of Sideview Utils (3.3.9) includes a new module called NavBar that can be used to replace Splunk’s AppBar module.

Now why do I need to replace AppBar?

Well, you very well might not! It has to do with the “displayview” setting in savedsearches.conf. If you’re a Splunk app developer who’s been around for a while you might have seen that setting. The idea of displayview is that a saved search can remember what view it is supposed to load in, and go to that view when run. If the key is empty, then the savedsearch of course loads in the generic Splunk UI.

Unfortunately in Splunk 6.3 the implementation of the main navigation in AppBar was changed and a (presumably unintended) side effect was that the “displayview” setting in savedsearches.conf broke. Thus in 6.3 and beyond, all saved searches and reports just load in the generic search/report view when run from the main nav. {sad trombone}.

But… OK, let’s say I used this key and it worked. How would form elements in my view repopulate correctly?

With just the advanced XML, they wouldn’t, which is the main reason why displayview was so seldom used. (Prepopulation of form elements was part of the old 4.0 “resurrection” system, which was incomplete and unstable, and was turned off by Splunk some time ago anyway.)

But if you’re using Sideview Utils, it replaced this stuff with its own systems long ago, built around a new key it added to savedsearches.conf (did you know that apps can actually extend the core conf space? Yay Splunk for doing many things properly long ago).

Specifically, when the user saves a search or report, if
a) they use the Sideview SearchControls module and
b) the view includes “sideview_utils/save_create_patches” in the customJavascript param of the Sideview Utils module and
c) the view uses the URLLoader module (most views with form elements really should anyway)

then it will file away upstream form element state into a key called request.ui_context. Then when any user runs the report later, it will load back into that view courtesy of displayview, and the form elements there will get set correctly courtesy of request.ui_context.

Field pickers for everyone!

By popular demand, you can now pick the columns and fields in Browse Calls and also in the “raw call legs” table inside Call Detail. So when you’re looking into something around, say, call quality fields, with just a couple of clicks you can make Browse Calls show you everything about those fields instead. And then the next day when you need to cross-check some userids or some device names, you can easily do that too. Hopefully another side benefit of this is that our users will get more familiar with all the fields themselves, and thus will get more comfortable using those fields in reports.

Send in your feedback please! It could be one sentence or it could be a 68-page PDF; either way it will be read and digested and quite possibly incorporated into a future release.

Some words from the HTML module

Since I’ve been negligent in posting any news, I thought I’d let the HTML module take the wheel for a few minutes. What it has chosen to share are the donut-related comments above a few of its defined methods, written in 2010 when Sideview was just a few months old.

    /**
     * clients call this when it may or may not be time to make the donuts.
     */
    worthRequestingNewRows: function(context) {
    ...

    /**
     * time to make the donuts
     */
    renderHTML: function(context) {
    ...

    /**
     * someone on the outside may want to barge in the moment there are donuts
     */
    onHTMLRendered: function() {
    ...

    /**
     * it may or may not be time to make the donuts
     */
    onJobDone: function() {
    ...

    /**
     * it may or may not be time to make the donuts
     */
    onJobProgress: function() {
    ...

    /** 
     * called each time we get new data from the server.
     * once we have the data, we make the donuts.
     * if you are reading these comments then I probably know you already and 
     * you're one of my favorite splunk developers. If one or both statements 
     * are temporarily untrue email me to rectify both.
     */
    renderResults: function(jsonStr) {
    ...

Shotgun Reporting app released

Earlier this year I wrote up an app called “Shotgun Reporting” as an entry for the Splunk Apptitude contest. Here’s its page on the contest site. I like to think it had a shot at winning but it didn’t. In any event it was a fun and pretty intense week prototyping and building it.

In the weeks after, the idea of it got a lot of attention from Splunk, and a few people asked me about it. There was no official way to actually get a copy of it though, which was lame. So I’m happy to write that today that’s been fixed. The app is available for download here as a 90 day trial. Check it out. Give it some of your data and see if it shows you something weird that you didn’t already know. Let me know either way cause I’m curious.

I touch on this in the description as you’ll see, but this really is a milestone in a larger body of work to build a kind of SuperApp. What this comes from is my app development methodology itself. On day one I’ll get a large body of sample data from some new technology or product. Then within a single day I’ll try to both understand that data, figure out what some key use cases around that data might be, and also build a first working version of the app. This first version typically has custom list views and detail views for the key entities represented in the data, and a general report builder with built in drilldown to other reports. I’ve done this a lot and each time it’s maddening because many of the steps I go through could easily be automated, to reduce the time from 1 day to… 4 hours? an hour? 5 minutes? Who knows?

Anyway, hopefully this ends up being the first in a series of apps, on the way to the SuperApp, but until then check out Shotgun Reporting!

New Testimonials page for the Splunk for Cisco CDR app

We just put up a couple of testimonials for our CallManager solution, Splunk for Cisco CDR. Check them out if you get a chance.

 

New screen capture demo of Splunk for Cisco CDR

If you use Cisco CallManager, you will want to check out this 10 minute demo and overview of our reporting solution for CallManager – “Splunk for Cisco CDR”.

 

And afterwards don’t forget that you can set up a trial version on your own live data in about 15 minutes. Download the trial, check out the setup documentation, and read about pricing, all on the app’s homepage.

Sideview Utils 2.7.1 released this morning.

I’m not sure why I haven’t been posting release notes as “news” items on the site. Going forward I think I’m going to post the release notes emails here. Without further ado:

2.7.1 (November 26th, 2013)
> Fixed a bug in the Filters module where certain back-button and
forward-button scenarios would lead to inconsistent UI state.
> Fixed a bug in the Timeline module where it wouldn’t call abandonJob
which made it not filter correctly for “transforming” searches.
> Fixed a bug in the Lookup Updater where if you entered a space or
a comma while editing it would submit the edit. Now the only way to
submit it while typing is to hit Enter.

These are fairly small maintenance fixes but if you use either of these modules or the Lookup Updater then you should upgrade.

Sideview Utils is still alive and well and being downloaded more than ever. There have been a lot of questions about ongoing relevance now that Splunk 6 is out. Here is a short version:

– The Simple XML updates in 6.0 are nice but fairly minimal compared to the set of Simple XML limitations that drive people from Simple XML to Sideview Utils.

– As for Splunk’s new UI framework, if you are a web developer and you like writing and maintaining significant amounts of source code, then you will enjoy diving into the new framework. However over time you may become one of those who return to Sideview Utils due to shorter iteration times and due to the higher number of moving parts in the new framework.

– And if you are not a developer or if you don’t like writing and maintaining significant amounts of source code, Sideview Utils is probably still the best path for you.

Also I know there are some folks who haven’t seen the licensing FAQ yet so here’s that link again.
http://sideviewapps.com/apps/sideview-utils/licensing-faq/

To all the Sideview users in the US: happy Thanksgiving! To everyone else: enjoy the week when Americans don’t answer email very well, or seem to be only half-conscious when they do!

 
