Our installation and setup docs specify that you create a “sinkhole” input to index the CDR and CMR data, and a side effect of this is that the raw files on disk are deleted as the records are ingested (aka indexed) into Splunk.  This sometimes causes a hiccup, though, when customers or trial prospects don’t want the raw data to be deleted from disk.

Why might you want to keep the files around instead of deleting them?

Some reasons not to sinkhole the files include:

  1. You might want to keep the raw data around as an emergency backup or crosscheck
  2. There is a preexisting External Billing Server config in CUCM, and deleting the files would break some other integration

When it’s reason #1, it may be enough to know that historical CDR and CMR records can be exported from CUCM going back quite far, and they come out of UCM in a form that is fairly easy to index. (See “Loading Historical Data”.)

When it’s reason #2, one easy workaround is not to reuse that other External Billing Server config. Instead, create a second, separate External Billing Server entry. This is, after all, what they are for and why UCM allows you to have several of them. It also has a great advantage: if an administrator of the other integration makes a change and forgets you were also using the same files, your data won’t stop coming in.

What are the risks of keeping the files?

Splunk’s monitor inputs are great, but they constantly check all the matching files for appended changes. In the case of CDR and CMR this serves no purpose – once UCM FTPs a file over, it will never FTP the same file again, nor will it ever append data to an existing file. However, there is no way to configure a Splunk monitor input so that it stops checking those files.

With only 50 or even 500 files, Splunk and the operating system can work together to check all the files without using many resources. However, when there are 50,000 files or more, this task becomes extremely onerous. It’s a long story about OS file descriptors and I/O contention, but the short version is that your Splunk server will start to run extremely slowly. Searches will seem to take forever, and indexing will fall behind by hours, then days, then weeks. Eventually, once you have about a million files accumulated, you may have difficulty even accessing the server at all. What is happening is that all of the server’s considerable resources are being thrown at constantly and pointlessly opening and seeking through tens of thousands of files.  With one CDR file per CallManager node per minute by default, the number of files in the destination directories can reach 50,000 surprisingly quickly.
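As a rough worked example (assuming a hypothetical four-node cluster and the default of one CDR file per node per minute):

4 files/minute × 60 minutes × 24 hours = 5,760 files per day
50,000 files ÷ 5,760 files/day ≈ 9 days

In other words, a modest cluster can cross the 50,000-file mark in under two weeks if nothing is cleaning out the directory – and CMR files only add to the count.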

Now that I understand the risks, how can I set this up?

If you want to go ahead anyway, you can technically create a monitor input instead of a sinkhole input.  There are some critical extra steps that need to be taken when you do this.  Here’s how, along with some recommendations on dealing with the resulting files.

Please follow the appropriate section below, whether this is a new install or you are converting an existing installation in either direction.

Set up a monitor input on a new install

1) Create the monitor input on the Universal Forwarder by adding this config to an inputs.conf file located at “$SPLUNK_HOME/etc/apps/TA_cisco_cdr/local/inputs.conf”. You may have to create the “local” directory and the “inputs.conf” file the first time.

If your Universal Forwarder is on Windows, the inputs will look like these:

[monitor://D:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr

[monitor://D:\path\to\files\cmr_*]
index = cisco_cdr
sourcetype = cucm_cmr

If your Universal Forwarder is on Linux or Unix, the inputs will look like these:

[monitor:///path/to/files/cdr_*] 
index = cisco_cdr 
sourcetype = cucm_cdr 

[monitor:///path/to/files/cmr_*] 
index = cisco_cdr 
sourcetype = cucm_cmr 

NOTE: It is critical that no mistakes be made here. Edit only the file paths (and the index name, if yours differs); leave everything else exactly as it is written above.

Make sure:

  1. if your forwarder is on a Linux or Unix host, that you use the appropriate slashes, i.e. “/foo/bar/cdr_*” vs. “C:\foo\bar\cdr_*”.
  2. that the index specified in both stanzas exactly matches the single index specified in the “custom_index” macro in the apps on your Search Heads. Index names in Splunk are case-sensitive.
  3. that “cdr_*” and “cmr_*” are present on the end of the respective paths, and that they correspond to the “_cdr” and “_cmr” suffixes of the sourcetype being set in each stanza.
  4. that both stanzas specify the exact same index. “cisco_cdr” is assumed here, although you may have chosen a different index name.

After saving the file, restart the Universal Forwarder so it picks up the new inputs.
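Once the forwarder has been restarted, one quick sanity check (a suggestion, not a required step) is to ask it what it is monitoring:

$SPLUNK_HOME/bin/splunk list monitor

Your cdr_* and cmr_* paths should appear in the output. (On Windows, run splunk.exe list monitor from the forwarder’s bin directory.)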

2) Set up a script to run every day that deletes or moves away any files whose modified dates are more than 3 days old. The key goal is to make sure the files thus moved no longer match the path of the data inputs.
On Windows, an example that deletes all files older than 3 days is:

forfiles /p "E:\CDR" /D -3 /C "cmd /c del @path"

On Linux, an example is:

find /path/to/cdr/files -type f -mtime +3 -exec rm {} \;

NOTE: Change the path in either case to match your situation. You may also need a different time period before deleting – if you want to shorten it go right ahead, but please be very careful if you want to lengthen it.

A lot of other examples can be found on the web.
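For example, to run the Linux find command above once a day, a minimal sketch is a cron entry such as the following (the 2:00 AM schedule and the path are placeholders – adjust both to your situation):

# run the CDR/CMR cleanup every day at 2:00 AM
0 2 * * * find /path/to/cdr/files -type f -mtime +3 -exec rm {} \;

On Windows, a daily Task Scheduler task that runs the forfiles command accomplishes the same thing.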

Convert a sinkhole input into a monitor input

If you find you need to keep the raw files around even though you already have a sinkhole input set up for them, the conversion is easy.

1) Open the inputs.conf file containing the data input. It will typically be at $SPLUNK_HOME/etc/system/local/inputs.conf or perhaps $SPLUNK_HOME/etc/apps/cisco_cdr/local/inputs.conf on a standalone indexer.

2) Find the two stanzas for the batch input (one will match CDR and one CMR). They will look like this:

[batch://C:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr
move_policy = sinkhole

3) Carefully change the word “batch” in those two stanzas to “monitor”.

4) Delete the entire line reading “move_policy = sinkhole”.
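For example (matching the stanzas shown earlier on this page), the CDR stanza above should end up looking like this:

[monitor://C:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr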

5) Save the file.

6) Restart Splunk on this host. It may be a Universal Forwarder, a Heavy Forwarder, or a standalone indexer, but in any event it must be restarted to pick up the data input change.
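On Linux, for example, this is typically:

$SPLUNK_HOME/bin/splunk restart

On Windows you can restart the SplunkForwarder (or Splunkd) service, or run splunk.exe restart from the bin directory.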

7) Set up a script to run every day that deletes or moves away any files whose modified dates are more than 3 days old. The key goal is to make sure the files thus moved no longer match the path of the data inputs.
On Windows, an example that deletes all files older than 3 days is:

forfiles /p "E:\CDR" /D -3 /C "cmd /c del @path"

On Linux, an example is:

find /path/to/cdr/files -type f -mtime +3 -exec rm {} \;

NOTE: Change the path in either case to match your situation. You may also need a different time period before deleting – if you want to shorten it go right ahead, but please be very careful if you want to lengthen it.

A lot of other examples can be found on the web.

Safely convert a monitor input to a sinkhole input

There are a variety of reasons you may want to convert a monitor input to a sinkhole input.  The monitor input could be legacy from long ago.  It could be that you need to remove the extra point of failure of having a cron job or scheduled task.  Or perhaps you just realize that keeping the files around is more hassle than it’s worth, since you can export any potentially missing data from CUCM and import it again.

In any of those cases, the crucial issue is that if you just switch the input config and restart, all of the monitored files remaining on disk will get indexed a second time. Monitor inputs use a Splunk mechanism called the “fishbucket” to make sure they never index any events in any file twice. Unfortunately, sinkhole inputs do NOT pay any attention to the monitor input’s fishbucket, so the new batch input will re-read the already-indexed files and you will end up with duplicated data.

The trick I recommend involves temporarily interrupting the flow of files coming in.

1) SSH or log in to the host to which the files are being SFTP’ed, which is also the host where the Splunk data input is running.

2) Verify that you have fewer than roughly 50,000 files.  If you have more than that, extra care will need to be taken to be sure that the CDR data you have actually is “all caught up”.  Email us if this might be an issue.
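One quick way to count the files (assuming the example paths used earlier in this document) is:

On Linux:

find /path/to/files -type f | wc -l

On Windows:

dir /b /a-d D:\path\to\files | find /c /v ""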

3) Shut down the SFTP server. This will cause UCM to start queueing up CDR and CMR files – they will not be lost, only delayed until you turn the SFTP server back on.  If the files are delivered to an ordinary Linux SSH account rather than a dedicated SFTP server, one option is to temporarily lock that user account or change its password for the time being.
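If it is a local Linux account, one minimal sketch (assuming a hypothetical account name of “cdrsftp”) is to lock the account while you work and unlock it afterwards:

# lock the account so UCM's SFTP logins fail temporarily
sudo passwd -l cdrsftp

# later, when you are ready for files to flow again
sudo passwd -u cdrsftp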

4) Wait a couple of minutes. This will allow the Splunk data input to catch up and index every last event in the files that have been SFTP’ed.  Feel free to check the CDR app’s “Browse Calls” for recent data.  When the date/time of the recent data there matches the newest file reasonably well, you should be OK.  (They won’t match perfectly, but it should be easy to tell whether it’s mostly up to date or not.)
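If you prefer a search over the UI, a quick sketch (assuming the cisco_cdr index name used earlier) that shows the timestamp of the newest indexed event per sourcetype is:

| tstats latest(_time) as latest where index=cisco_cdr by sourcetype | convert ctime(latest)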

5) DELETE OR MOVE THE FILES OUT OF THE DIRECTORY! Don’t forget this step or the new data input will index them all a second time.
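On Linux, one minimal sketch (assuming the files live in /path/to/files and that /path/to/archive is a directory the data input does not match) is:

# create an archive directory outside the input's path, then move the files into it
mkdir -p /path/to/archive
find /path/to/files -maxdepth 1 -type f \( -name 'cdr_*' -o -name 'cmr_*' \) -exec mv {} /path/to/archive/ \;

Using find rather than a plain mv glob avoids “argument list too long” errors when there are tens of thousands of files.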

6) Open the inputs.conf file containing the data input. It will typically be at $SPLUNK_HOME/etc/system/local/inputs.conf or perhaps $SPLUNK_HOME/etc/apps/cisco_cdr/local/inputs.conf on a standalone indexer.

7) Find the two stanzas for the monitor input (one will match CDR and one CMR). They will look like this:

[monitor://C:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr

8) Carefully change the word “monitor” in those two stanzas to “batch”.

9) Carefully add a new line to each stanza:

move_policy = sinkhole

10) Double-check that each stanza now looks like this:

[batch://C:\path\to\files\cdr_*]
index = cisco_cdr
sourcetype = cucm_cdr
move_policy = sinkhole

11) Save the file.

12) Restart Splunk on this host. It may be a Universal Forwarder, a Heavy Forwarder, or a standalone indexer, but in any event it must be restarted to pick up the data input change.

13) Start up your SFTP server again. CUCM should send over the queued CDR and CMR files, and they will now be indexed by the sinkhole input.  (If you temporarily locked the account or changed its password, then as long as there weren’t TOO many failures, delivery should start up again on its own.  If it does not, you probably need to do step 1 in our data troubleshooting document.  Ignore step 0, and also ignore the admonition to start at step 3; start at step 1, because that’s what we need to do here.)

Note that sinkhole inputs are pretty unintuitive – don’t watch the directory waiting for the files to appear, because they never will! The very moment the SFTP server puts them there, Splunk will whisk them away to be indexed and delete them from the filesystem.

If you have any comments at all about the documentation, please send them to docs@sideviewapps.com.