Our installation and setup docs have you create a “sinkhole” input to index the CDR and CMR data. A side effect is that the raw files are deleted from disk as the records are ingested (that is, indexed) into Splunk. Sometimes this causes a hiccup, though: customers or trial prospects may not want the raw data deleted from disk.
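For context, a “sinkhole” input in Splunk is a batch input with `move_policy = sinkhole`, which indexes each file once and then deletes it. A minimal stanza might look like the sketch below; the directory path, sourcetype, and index are illustrative placeholders, not the exact values from our setup docs:

```ini
# Hypothetical example: index-and-delete ("sinkhole") input.
# The path, sourcetype, and index below are placeholders.
[batch:///opt/cdr_repository]
move_policy = sinkhole
sourcetype = cucm:cdr
index = cdr
```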
Why might you want to keep the files around instead of deleting them?
Some possible reasons to not sinkhole the files:
1. You want to keep the raw data around as an emergency backup or crosscheck.
2. There is a preexisting External Billing Server config in CUCM, and deleting the files would break some other integration.
When it’s reason #1, it may be enough to know that historical CDR and CMR data can be exported from CUCM going back quite a long way, and the exported files come out of UCM in a form that is fairly easy to index (see “Loading Historical Data”). To see how far back you can go, just try exporting; CUCM will only offer the data it still has.
When it’s reason #2, one easy workaround is to not reuse that other External Billing Server config. Instead, create a second, separate External Billing Server entry. This is, after all, what they are for and why UCM allows you to have several of them. It also has a real advantage: if an administrator of the other integration changes its config and forgets you were consuming the same files, your data won’t stop coming in.
What are the risks of keeping the files?
Splunk’s monitor inputs are great, but they constantly re-check every matching file for appended changes. In the case of CDR and CMR, this serves no purpose: once UCM FTPs a file over, it will never FTP the same file again, nor will it ever append data to an existing file. However, there’s no way to configure a Splunk monitor input to stop checking those files.
With only 50 or even 500 files, Splunk and the operating system can work together to check them all without using many resources. With 50,000 files or more, however, the task becomes extremely onerous. The long story involves OS file descriptors and I/O contention; the short version is that your Splunk server will slow to a crawl. Searches will seem to take forever, and indexing will fall behind by hours, then days, then weeks. By the time roughly a million files have accumulated, you may have difficulty even accessing the server at all. What is happening is that the server’s considerable resources are all being spent constantly, and pointlessly, opening and seeking through tens of thousands of files. With one CDR file per CallManager node per minute by default, the destination directories can reach 50,000 files surprisingly quickly.
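To get a feel for the scale, and as one possible mitigation, you can periodically count the files in the destination directory and move already-indexed files out of the monitored path so Splunk stops re-checking them. This is only a sketch: the directory names are hypothetical, and the demonstration below runs against scratch directories rather than a real CDR repository (in production you would use something like `-mtime +1` to leave files Splunk may not have finished indexing alone).

```shell
# Demonstration against scratch directories; real paths would be your
# CDR destination directory and an archive location of your choosing.
CDR_DIR=$(mktemp -d)
ARCHIVE_DIR=$(mktemp -d)

# Simulate a few accumulated CDR files (CUCM writes one file per
# CallManager node per minute by default).
for i in 1 2 3; do touch "$CDR_DIR/cdr_file_$i"; done

# Count the files a monitor input would have to re-check on every pass.
find "$CDR_DIR" -type f | wc -l

# Move already-indexed files out of the monitored directory; in
# production, add -mtime +1 (or similar) to skip recent files.
find "$CDR_DIR" -type f -exec mv {} "$ARCHIVE_DIR/" \;
find "$ARCHIVE_DIR" -type f | wc -l
```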
Now that I understand the risks, how can I set this up?
If you want to go ahead anyway, you can create a monitor input instead of a sinkhole input. There are some critical extra steps to take when you do this. Here’s how, along with some recommendations on dealing with the resulting files.
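For reference, a monitor stanza (the leave-files-on-disk alternative) might look like the following sketch. The path, sourcetype, and index are placeholders for your environment, and this sketch deliberately omits the extra steps covered in the sections below:

```ini
# Hypothetical example: monitor input that leaves files on disk.
# Path, sourcetype, and index are placeholders for your environment.
[monitor:///opt/cdr_repository]
sourcetype = cucm:cdr
index = cdr
# Note: Splunk will keep re-checking every matching file for appended
# data, which is the accumulation risk described above.
```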
Please follow the appropriate section below, whether you are doing a new install or converting an existing installation in either direction.