We occasionally get asked questions involving data retention policies or involving disk consumption.

Overview

Our data retention policies are actually just Splunk data retention policies.  On individual indexes (like the one where our data is stored) you can set size and/or time limits on how much data to keep.

Cisco CDR data is pretty small as Splunk data goes.  Most of our customers accumulate ten to perhaps the low hundreds of megabytes per day.  “Megabytes” was not a typo there!  With that little data, keeping several years of history isn’t a hard task.

Disk Space Consumption

The topic of disk consumption worries a lot of people, but for CDR records this is usually not much of an issue.

Around 1,000,000 calls in my test system ends up taking a bit more than 1 GB.  This is affected by whether your data is full of long FQDNs or shorter names, how many legs typically comprise each call (adding up call legs would probably be more accurate, but that seems like too much effort), how you use calling party normalization, and things like that.  From what I’ve seen, a reasonable upper bound with ample cushion is about 2 GB per million calls.  We’ll use that below.

Therefore, however long it takes you to reach 1 million calls is about how long it’ll take to put 2 GB of data in your index.  If you do 1,000 calls per day, that’s maybe 3 years to get to 2 GB.  If you do a million calls per day, you’ll add 2 GB per day or less.  See below for how to check!
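
If you want to check your own daily call volume, a search along these lines should give you a rough count of CDR events per day (each event is really a call record or leg rather than exactly one “call”, so treat it as a ballpark, and adjust the index name if yours isn’t the default cisco_cdr):

    | tstats count where index=cisco_cdr by _time span=1d

Average that over a few weeks and plug it into the rule of thumb above.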

If you are an admin, our recommendation is to check out the following screens:

  • Click Settings, then click on the Monitoring console in the left of that menu.
  • Click Indexing, then Indexes and Volumes, then Index Detail: Instance.
  • Set your Group and Instance correctly, then set the Index to cisco_cdr.

That page is a whole topic on its own!  Hopefully you’ll find what you need there.

If you aren’t an admin you can’t see this information.  And in that case, well, this really isn’t your problem, is it?  Make sure someone’s keeping their eye on it and move on.

One last tip on checking disk space – if you know you have 6 months of data sitting in your index, and your index and all its related files (the folder cisco_cdr under $SPLUNK_HOME/var/lib/splunk/) are 500 MB, a little simple multiplication is all it takes to arrive at a 1 GB/year disk space requirement (plus some cushion, maybe call it 2 GB/year).  You know this number is right because you aren’t guessing; you just measured it.  🙂
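
If you’d rather measure with a search than poke around the filesystem or the Monitoring Console, something like the following should do it, assuming your role is allowed to run dbinspect.  It adds up the on-disk size of the cisco_cdr buckets and works out a rough MB-per-day rate (the field names after the stats command are just ones I made up for this sketch):

    | dbinspect index=cisco_cdr
    | stats sum(sizeOnDiskMB) as totalMB min(startEpoch) as oldest
    | eval daysOfData=round((now()-oldest)/86400,1)
    | eval MBperDay=round(totalMB/daysOfData,1)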

Data Retention

There are several aspects to this answer, depending on your actual needs.

Terminology and how Splunk handles data

Splunk stores data in what it calls “buckets”.  You can think of them as files.  These buckets “roll” (as in “roll over to…”) through a sequence of logical states – Hot, Warm, Cold and Frozen.

Hot buckets are those that Splunk’s actively writing to.  When Splunk restarts, or when these buckets reach a certain size or after a certain amount of time passes, these buckets will be rolled to Warm.  The settings controlling the amount of time or size to trigger a Hot to Warm roll have some knobs I’ll talk about later.

Warm and Cold buckets are searchable but not being written to.  In larger environments, Warm is the data that’s searched often and is sometimes kept on faster storage, while Cold can be on slower disks because it’s older and searched less often.  In smaller environments (and a lot of medium-sized environments, and sometimes even in big ones) there’s no real distinction between Warm and Cold.

There are no Frozen buckets by default, because by default rolling Cold to Frozen deletes the data.  Luckily, if you want to keep the data in a non-searchable “archive” format that’s easy to “make searchable” again, setting a directory to copy the files into with coldToFrozenDir will do the trick.  (There are options – see the docs – for using custom scripts to handle this too, but that’s well beyond the scope of this simple docs page.)
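
As a quick sketch of what that looks like, it’s a single setting in the index’s stanza in indexes.conf – the path below is purely a placeholder, so point it wherever your archive should live:

    [cisco_cdr]
    coldToFrozenDir = /some/archive/path/cisco_cdr_frozen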

The settings we’ll talk about

There are adjustments to the length of time buckets stay in each stage, but most of those settings are for large environments with specific needs.  For the matter of retention, we only care about a couple of settings.

  • Hot-to-Warm settings (these determine bucket “size” and “timespan”)
    • maxHotSpanSecs is the maximum number of seconds that can be contained in a Hot bucket before it’s rolled to Warm.  This is one of two common settings that determine “bucket size” and the one we might change.
      • NOTE special historical bug – don’t set this to exactly 86400!  You can use 86399, or 86401 (or anything else like that), but not 86400 exactly.  The story is too long to go into here…
    • maxDataSize is the second setting determining “bucket size”, and unless you do a million calls per day or more, probably one you won’t want to mess with.  Leaving this at “auto” or “auto_high” is best.
  • Cold-to-Frozen settings (these determine retention, given the constraints of bucket sizes above)
    • frozenTimePeriodInSecs is the number of seconds that controls when a bucket rolls from Cold to Frozen.  Remember that Frozen data is deleted by default.
    • maxTotalDataSizeMB is the maximum total size of the index in MB; when the index grows past it, the oldest buckets roll from Cold to Frozen.  Again, Frozen data is deleted by default.
    • coldToFrozenDir is the simplest setting for making the Cold to Frozen transition save data instead of deleting it.  Data becomes non-searchable when it’s frozen, but it won’t be deleted and it’s easy to restore.

All of these are more fully explained in the Splunk Index Storage docs.
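
By the way, if you want to see what your index’s settings currently resolve to, Splunk’s btool will show you.  Something like this, run from $SPLUNK_HOME/bin, should work (the index name assumes the default cisco_cdr; --debug adds which file each value came from):

    ./splunk btool indexes list cisco_cdr --debug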

So, let’s explore two and a half common scenarios!

I’m required to keep the data searchable for at least X days/months/years

You need two pieces of information:

  • How much data is this?  (To make sure your maximum index size will be big enough).
  • How long do I want to keep it?  (To make sure you set your frozenTimePeriodInSecs to a high enough value).

How much?

Use the “Disk Space Consumption” topic above, add a bit of cushion, and perhaps check that your estimate holds true every now and then.  Let’s say we calculate that 100 GB will be required.

How long?

Calculate the number of seconds in the period involved.  There are 86,400 seconds in a day (well, close enough).  So if you want to keep the data for at least 1 year, 86400*365 gives you 31,536,000.

Now that we have all our information…

  • In the index’s stanza in local/indexes.conf (or in the UI), make sure frozenTimePeriodInSecs is at least the number of seconds you worked out above.
  • Check that maxTotalDataSizeMB is at least the size required (100000).  Maybe cushion it out to 150000 because what’s 50 GB between friends?
    • Double check your math, and that you have the *right number of zeros* in your answers!
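
Put together, the stanza might look something like this – a sketch using the example numbers above (one year of retention, 100 GB cushioned out to 150 GB), so swap in your own figures:

    [cisco_cdr]
    frozenTimePeriodInSecs = 31536000
    maxTotalDataSizeMB = 150000
    # Optional safety net (see below): archive expired buckets instead of deleting them.
    # The path is just a placeholder.
    # coldToFrozenDir = /some/archive/path/cisco_cdr_frozen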

When those settings take effect (after a Splunk restart, or via other methods), your data should remain in your index and stay searchable until it hits either limit.  Since we’ve just set both limits longer and bigger than what we need, you should now be fine.

Review this every now and then (perhaps quarterly).  Make sure retention is where it should be, or is moving older and older if you haven’t gotten that far yet.  Make sure you aren’t running out of disk space, either!

As a safety factor, you could also set coldToFrozenDir so Splunk archives off the buckets it would otherwise have deleted when they expired.  That way, even if the data does expire too early, it’s easily recovered.  The Splunk Index Storage docs talk about this, and they have a link to take you to their page on Archiving Data.

I’m required to *delete* all data after X days/months/years.

This use case covers where you actually want to delete the data after X amount of time and NOT retain information longer than this period.

You need two pieces of information:

  • How much data is this?  (You need to just make sure your maximum index size will be big enough).
  • How long do I want to keep it?  (For obvious reasons).

How much?

Use the “Disk Space Consumption” topic above, add a bit of cushion, and perhaps check that your estimate holds true every now and then.  Let’s say we calculate that 50 GB will be required.

How long?

Let’s suppose retention needs to be no more than 3 months.  (Note, “3 months” is not the same as “90 days”, and in fact “3 months” may vary in length by a couple of days depending on which months it covers.  For this purpose, I’m using 92 days.)  That works out to 92*86400 = 7,948,800 seconds.

Now that we have all our information…

  • In the index’s stanza in local/indexes.conf (or in the UI) set frozenTimePeriodInSecs = 7948800
  • Also in there, set maxHotSpanSecs = 86399 (remember our 86400 bug!).  This keeps each bucket’s events to a span of no more than 1 day, which is how you get 1-day granularity on retention.
  • Check that maxTotalDataSizeMB is at least 50000, maybe cushion it out to 100000 because what’s 50 GB between friends?  It just has to be “big enough.”
  • Leave coldToFrozenDir unset so the cold buckets aren’t archived and instead get the default treatment: deletion.
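
In indexes.conf form that works out to something like this (a sketch – the 100000 is just the cushioned size estimate from above):

    [cisco_cdr]
    frozenTimePeriodInSecs = 7948800
    maxHotSpanSecs = 86399
    maxTotalDataSizeMB = 100000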

When those settings take effect (Splunk restart, or other methods), any data in the index older than 92 days should be deleted within a few minutes.  From then on, each bucket will be deleted as it hits the age limit (remember, *all* events in the bucket have to be more than 92 days old).  So actual on-disk retention will vary between 92 and 93 days, depending on exactly when you check.

The half-example – I have no real requirements here, just somewhat limited disk space.

If you have no real requirements here…

Well, if you don’t change any settings when you create an index, a stock Splunk index allows up to 500 GB of data and about 6 years of retention.

500 GB over 6 years is about 225 MB/day. Using our “rule of thumb” that a million calls is 1-2 GB, that’s 150,000 or so calls per day.

So by default, Splunk should be able to handle up to about 150,000 calls per day in the default sized 500 GB index with the default 6 year retention, and at that call volume should just about hit both limits near the same time.

At 15,000 calls per day you’d need only 50 GB to store 6 years.

At 1,500 calls per day you’d only need 5 GB to store 6 years.

My recommendation, especially for folks with a lower call volume of under perhaps 10,000 calls per day, is to just leave things at their defaults and let data roll off at 6 years – it shouldn’t require more than about 50 GB of disk space.

But if you really want to limit things

I can say that it’s a rare customer who ever needs to go back in time farther than two or three years, so if you really wanted to save a little space, calculate how much disk space you need for either 2 years or 3 years of retention and set your timeframe and disk limits to that.

For instance, if you wanted to limit it to no more than 25 GB, and only keep 2 years around (this would mean at 15,000 calls per day you’d *probably* trim old calls due to age before trimming them due to size):

  • In the index’s stanza in local/indexes.conf (or in the UI) set frozenTimePeriodInSecs = 63072000
    • (That’s 86,400 seconds per day, times 730 days – two years.)
  • Set maxTotalDataSizeMB = 25000 
    • (I’m not worrying about GB-to-MB conversion rates – 1000 vs 1024 – here; feel free to look up which Splunk uses and do the “real math”.  I just don’t think it’s really important for this.)
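
For that 2-year / 25 GB example, the stanza would look something like this:

    [cisco_cdr]
    frozenTimePeriodInSecs = 63072000
    maxTotalDataSizeMB = 25000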

Conclusion

I hope this foray into retention helps!  If you run into more issues or need more help, we recommend the following resources:

Splunk’s page on configuring indexes (sort of a jumping off point)

Splunk’s indexes.conf spec page (all the nitty gritty stuff, always double-check the settings you want to set in here before setting them, in case of warnings!)

And, feel free to jump on Splunk’s community Slack – Ask in #admin about this if you have questions!  (We’re in there, too – @richfez and @madscient – and we even have a couple of #cisco_cdr and #canary channels for questions.  It’s not our *primary* method of support, but we’ll totally be there if you have a question!)

If you have any comments at all about the documentation, please send them to docs@sideviewapps.com.