We have found an issue on a universal forwarder (v6.4.1): after a Splunk restart, all event ingestion stops for all queues if our development indexer (the only server in the splunkdev TCP output group) is not configured for receiving. We need to determine whether this behaviour is known (by design) or a fault / bug - we suspect the latter.
As long as our universal forwarder remains running, we can enable & disable the receiving port on the splunkdev indexer without too much problem (other than not getting the data into development). Our other TCP output to production indexers continues along without impact.
The problem comes if we restart the universal forwarder WHILE the splunkdev receiving port is not configured.
We have a universal forwarder configured with three TCP output queues (three separate groups of indexers)
We specifically configure some inputs to route to alternate TCP output queues (for example, a development indexer)
All other inputs use the default output queue (defaultGroup = primary_indexers)
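For quick reference, this is a condensed view of that routing (taken from the full btool output further down); outputs.conf defines the three groups and inputs.conf overrides the route for the development-bound input:

# outputs.conf (condensed)
[tcpout]
defaultGroup = primary_indexers
useACK = true

[tcpout:primary_indexers]
server = index01:9997,index02:9997,index03:9997,index04:9997

[tcpout:splunkdev]
server = splunkdev:9997

[tcpout:splunkuat]
server = indexuat01:9997,indexuat02:9997

# inputs.conf (condensed) - only this input is routed to splunkdev
[monitor:///var/log/hosts/splunk/logsource2]
_TCP_ROUTING = splunkdev
index = index2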
If we perform a full stop / start (splunk stop ......... waiting ....... splunk start), then it appears to be OK - processing kicks off normally and logs are read and forwarded to our production environment.
If we perform a restart (splunk restart), the forwarder continually tries to connect to the splunkdev indexer and will not process any other queues - all processing stops, including log reading and forwarding to our production indexers (a different TCP output group).
After the restart of the forwarder, the TailReader cannot send data to the parsingQueue; we found these entries in splunkd.log:
10-26-2016 04:38:09.030 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
We also see that forwarding to the splunkdev group is blocked for xxxx seconds (expected):
10-26-2016 04:39:44.036 +0000 WARN TcpOutputProc - Forwarding to indexer group splunkdev blocked for 100 seconds.
The universal forwarder did not send any events after this until we brought up the receiving port on our splunkdev host (tcp/9997).
After we configured the receiving port on the development indexer, we see the connection succeed, the TailReader continues, and the forwarder then watches all the files, reads the data, and forwards everything as configured:
10-26-2016 22:21:14.403 +0000 INFO StatusMgr - destHost=splunkdev, destIp=1.2.3.4, destPort=9997, eventType=connect_try, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor
10-26-2016 22:21:14.404 +0000 INFO StatusMgr - destHost=splunkdev, destIp=1.2.3.4, destPort=9997, eventType=connect_fail, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor
10-26-2016 22:21:25.260 +0000 WARN TcpOutputProc - Forwarding to indexer group splunkdev blocked for 63801 seconds.
10-26-2016 22:21:44.403 +0000 INFO StatusMgr - destHost=splunkdev, destIp=1.2.3.4, destPort=9997, eventType=connect_try, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor
10-26-2016 22:21:44.692 +0000 INFO StatusMgr - destHost=splunkdev, destIp=1.2.3.4, destPort=9997, eventType=connect_done, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor
10-26-2016 22:21:45.269 +0000 INFO TailReader - ...continuing.
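For anyone wanting to confirm the same blocked state on their forwarder, checks along these lines should show it (assuming the default /opt/splunkforwarder install path; the grep patterns are illustrative, not exact strings from our environment):

# Show which configured forward-servers are active vs. inactive
/opt/splunkforwarder/bin/splunk list forward-server

# Look for blocked queues and TcpOutputProc warnings in the forwarder's own logs
grep "blocked=true" /opt/splunkforwarder/var/log/splunk/metrics.log
grep "TcpOutputProc" /opt/splunkforwarder/var/log/splunk/splunkd.log | tail -20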
tcpout & inputs configs below:
root@unifwd:~# /opt/splunkforwarder/bin/splunk cmd btool outputs list tcpout
[tcpout]
ackTimeoutOnShutdown = 30
autoLBFrequency = 30
blockOnCloning = true
blockWarnThreshold = 100
compressed = false
connectionTimeout = 20
defaultGroup = primary_indexers
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
forceTimebasedAutoLB = false
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection|_internal)
forwardedindex.filter.disable = false
heartbeatFrequency = 30
indexAndForward = false
maxConnectionsPerIndexer = 2
maxFailuresPerInterval = 2
maxQueueSize = 7MB
readTimeout = 300
secsInFailureInterval = 1
sendCookedData = true
sslQuietShutdown = false
tcpSendBufSz = 0
useACK = true
writeTimeout = 300
[tcpout:primary_indexers]
server = index01:9997,index02:9997,index03:9997,index04:9997
[tcpout:splunkdev]
server = splunkdev:9997
[tcpout:splunkuat]
server = indexuat01:9997,indexuat02:9997
root@unifwd:~# /opt/splunkforwarder/bin/splunk cmd btool inputs list monitor:///var/log/hosts/splunk/logsource1
[monitor:///var/log/hosts/splunk/logsource1]
_rcvbuf = 1572864
blacklist = \.(bz2)$
disabled = false
host = unifwd
host_segment = 6
index = index1
sourcetype = sourcetype1
root@unifwd:~# /opt/splunkforwarder/bin/splunk cmd btool inputs list monitor:///var/log/hosts/splunk/logsource2
[monitor:///var/log/hosts/splunk/logsource2]
_TCP_ROUTING = splunkdev
_rcvbuf = 1572864
blacklist = \.(bz2)$
disabled = false
host = unifwd
host_segment = 6
index = index2
sourcetype = sourcetype2
root@unifwd:~#
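If this behaviour turns out to be by design, one possible mitigation (an untested sketch based on the dropEventsOnQueueFull setting already visible above, currently -1 = block forever) would be to let the splunkdev group drop its events after a timeout rather than block indefinitely, so that only development-bound data is lost while that indexer is down:

# outputs.conf - untested sketch, scoped to the dev group only
[tcpout:splunkdev]
server = splunkdev:9997
# assumption: wait 30 seconds on a full queue, then drop splunkdev-bound
# events instead of blocking (the global default above is -1, block forever)
dropEventsOnQueueFull = 30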