Cisco CDR Reporting & Analytics | Administration
Our installation and setup docs specify that you create a “sinkhole” input to index the CDR and CMR data, and a side effect of this is that the raw files on disk will be deleted as the records are ingested (aka indexed) into Splunk. Sometimes this causes a hiccup though, when customers or trial prospects don’t want the raw data to be deleted from disk.
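For reference, a “sinkhole” input is just a batch input with move_policy = sinkhole. A typical stanza from the setup docs looks something like the following sketch (the path, index, and sourcetype are illustrative and should match your own environment):

[batch://C:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr
move_policy = sinkhole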
Some possible reasons not to sinkhole the files include:
1. You want to keep the raw CDR and CMR files on disk, for instance as a backup in case the data ever needs to be reloaded.
2. Another system is already consuming the same files from that External Billing Server destination, so they cannot be deleted out from under it.
When it’s reason #1, it may be enough to know that historical CDR and CMR can be exported out of CUCM quite far back, and they come out of UCM in such a way that the historical files are fairly easy to index (see “Loading Historical Data”). To see how far back you can go, just try exporting; CUCM will only let you export data it still has.
When it’s reason #2, one easy workaround is to not reuse that other external billing server config. Instead, create a second, separate External Billing Server entry. This is, after all, what they are for and why UCM allows you to have several of them. It also has the great advantage that if an administrator of that other integration makes a change and forgets you were also using the same files, your data won’t stop coming in.
Splunk’s monitor inputs are great, but by design they constantly recheck every matching file for appended changes. In the case of CDR and CMR, this serves no purpose: once UCM FTPs a file over, it will never FTP the same file again, nor will it ever append data to an existing file. However, there’s no way to configure a Splunk monitor input to stop checking those files.
With only 50 or even 500 files, Splunk and the operating system can check them all without using many resources. With 50,000 files or more, however, this task becomes extremely onerous. The details involve OS file descriptors and I/O contention, but the short version is that your Splunk server will start to run extremely slowly: searches will take seemingly forever, and indexing will fall behind by hours, then days, then weeks. By the time roughly a million files have accumulated, you may have difficulty even accessing the server at all. What is happening is that the server’s considerable resources are being spent constantly and pointlessly opening and seeking through tens of thousands of files. With one CDR file per CallManager node per minute by default, the number of files in the destination directories can reach 50,000 surprisingly quickly.
If you want to go ahead anyway, you can technically create a monitor input instead of a sinkhole input. There are some critical extra steps that need to be taken when you do this. Here’s how, along with some recommendations on dealing with the resulting files.
Please follow the appropriate section below, whether this is a new install or you are converting an existing installation in either direction.
On a *nix Splunk server, the monitor stanzas in inputs.conf look like this:

[monitor:///path/to/files/cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr

[monitor:///path/to/files/cmr_*]
index = cisco_cdr
sourcetype = cucm_cmr

On a Windows Splunk server:

[monitor://D:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr

[monitor://D:\path\to\files\cmr_*]
index = cisco_cdr
sourcetype = cucm_cmr
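After saving inputs.conf, restart Splunk so the new monitor stanzas take effect. As a sketch, on a *nix server that is typically:

$SPLUNK_HOME/bin/splunk restart

On Windows, run splunk.exe restart from the bin folder of your Splunk installation, or restart the Splunk service.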
Important Notes:
Because nothing is deleting the raw files anymore, you must set up a scheduled task (Windows) or cron job (*nix) to purge old files yourself, or they will accumulate until Splunk grinds to a halt as described above. For example, to delete files more than three days old:

On Windows:
forfiles /p "D:\path\to\files" /D -3 /C "cmd /c del @path"

On *nix:
find /path/to/cdr/files* -mtime +3 -exec rm {} \;
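As a sketch, assuming the *nix path above and that the cleanup should run nightly at 02:30, the corresponding crontab entry (for a user with permission to delete the files) might look like this:

# delete raw CDR/CMR files older than 3 days, nightly at 02:30
30 2 * * * find /path/to/cdr/files* -mtime +3 -exec rm {} \;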
If you find you need to keep the raw files around even though you already have a sinkhole input set up for them, the conversion is easy.
In inputs.conf, change the existing sinkhole stanza, which looks something like this:

[batch://C:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr
move_policy = sinkhole

to a monitor stanza, keeping the same index and sourcetype and dropping move_policy:

[monitor://C:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr
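If you also have a cmr_* sinkhole stanza (as in the setup docs), it gets the same treatment. As a sketch, assuming the same path and index:

[batch://C:\path\to\files\cmr_*]
index = cisco_cdr
sourcetype = cucm_cmr
move_policy = sinkhole

becomes:

[monitor://C:\path\to\files\cmr_*]
index = cisco_cdr
sourcetype = cucm_cmr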
As with a new install, you now need a scheduled task or cron job to purge old files, for example deleting anything more than three days old:

On Windows:
forfiles /p "D:\path\to\files" /D -3 /C "cmd /c del @path"

On *nix:
find /path/to/cdr/files* -mtime +3 -exec rm {} \;
There are a variety of reasons you may want to move a monitor input to a sinkhole input. The monitor input could be legacy from long ago. It could be that you need to remove the extra point of failure of having a cron job or scheduled task. Perhaps you just realize that keeping the files around is more hassle than it’s worth since you can export any potentially missing data from CUCM and import it again.
In any of those cases, the crucial issue is that if you simply switch the input config and restart, all of the monitored files remaining on disk will get indexed a second time. Monitor inputs use a Splunk mechanism called the “fishbucket” to make sure they never index any events in any file twice. Unfortunately, sinkhole inputs do NOT consult the monitor input’s fishbucket, so the new batch input will re-read the already-indexed files and you’ll end up with duplicated data.
We recommend building a new Billing Server entry, pointing to a new SFTP folder on the SFTP server. This is the easiest method to use that ensures you don’t duplicate data.
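As a sketch, if the new SFTP destination were D:\path\to\new_files (a hypothetical path; use whatever folder you actually created), the new sinkhole stanzas would look like the standard ones from the setup docs:

[batch://D:\path\to\new_files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr
move_policy = sinkhole

[batch://D:\path\to\new_files\cmr_*]
index = cisco_cdr
sourcetype = cucm_cmr
move_policy = sinkhole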
The files in the new path/location should disappear within a minute or two and “Investigate Calls” should now have current data again. From this point forward you are unlikely to even see files appear in this folder, because the moment the SFTP server puts them there, Splunk will whisk them away to be indexed and delete them from the filesystem.
Once that’s confirmed working, you can delete the old files in the old path. You can also finish removing the old stanzas from inputs.conf if you only commented them out, though it does no real harm to leave them there commented out either.
If you have any comments at all about the documentation, please send them to docs@sideviewapps.com.