Splunk-7.0.0-Data - Getting Data in
Generated: 11/22/2017 11:10 am
Table of Contents

Get Windows data
    Monitor Active Directory
    Monitor Windows event log data
    Monitor file system changes
    Monitor data through Windows Management Instrumentation (WMI)
    Monitor Windows Registry data
    Monitor Windows performance
    Monitor Windows data with PowerShell scripts
    Monitor Windows host information
    Monitor Windows printer information
    Monitor Windows network information

Configure timestamps
    How timestamp assignment works
    Configure timestamp recognition
    Configure timestamp assignment for events with multiple timestamps
    Specify time zones for timestamps
    Tune timestamp recognition for better indexing performance

Configure host values
    About hosts
    Set a default host for a Splunk instance
    Set a default host for a file or directory input
    Set host values based on event data
    Change host values after indexing
Introduction
To get data into your Splunk deployment, point it at a data source. Tell it a bit
about the source. That source then becomes a data input. Splunk Enterprise
indexes the data stream and transforms it into a series of events. You can view
and search those events right away. If the results aren't exactly what you want,
you can tweak the indexing process until they are.
If you have Splunk Enterprise, the data can be on the same machine as an
indexer (local data) or on another machine (remote data). If you have Splunk
Cloud, the data resides in your corporate network and you send it to your Splunk
Cloud deployment. You can get remote data into your Splunk deployment using
network feeds or by installing Splunk forwarders on the hosts where the data
originates. For more information on local vs. remote data, see Where is my data?
Splunk offers apps and add-ons, with pre-configured inputs for things like
Windows- or Linux-specific data sources, Cisco security data, Blue Coat data,
and so on. Look on Splunkbase for an app or add-on that fits your needs. Splunk
Enterprise also comes with dozens of recipes for data sources like web server
logs, Java 2 Platform, Enterprise Edition (J2EE) logs, or Windows performance
metrics. You can get to these from the Add data page in Splunk Web. If the
recipes and apps don't cover your needs, then you can use the general input
configuration capabilities to specify your particular data source.
For more information on how to configure data inputs, see Configure your inputs.
Splunk provides tools to configure many kinds of data inputs, including those that
are specific to particular application needs. Splunk also provides the tools to
configure any arbitrary data input types. In general, you can categorize Splunk inputs as follows:

Files and directories

A lot of data comes directly from files and directories. You can use the files and
directories monitor input processor to get data from files and directories.
To monitor files and directories, see Get data from files and directories.
Network events
Splunk Enterprise can index data from any network port, for example, remote
data from syslog-ng or any other application that transmits over the TCP
protocol. It can also index UDP data, but you should use TCP instead whenever
possible for enhanced reliability.
Splunk Enterprise can also receive and index SNMP events, alerts fired off by
remote devices.
To get data from network ports, see Get data from TCP and UDP ports in this
manual.
To get SNMP data, see Send SNMP events to your Splunk deployment in this
manual.
Windows sources
Splunk Cloud and the Windows version of Splunk Enterprise accept a wide range
of Windows-specific inputs. Splunk Web lets you configure a number of Windows-specific input types.
To index and search Windows data on a non-Windows instance of Splunk
Enterprise, you must first use a Windows instance to gather the data. See
Considerations for deciding how to monitor remote Windows data.
For a more detailed introduction to using Windows data in Splunk Enterprise, see
Monitoring Windows data in this manual.
Splunk software also supports other kinds of data sources. For example:
Metrics
Get metrics data from your technology infrastructure, security
systems, and business applications.
Scripted inputs
Get data from APIs and other remote data interfaces and message
queues.
Modular inputs
Define a custom input capability to extend the Splunk Enterprise
framework.
Alternatively, you can download and enable an app, such as the Splunk App for
Microsoft Exchange or Splunk IT Service Intelligence.
After you configure the inputs or enable an app, your Splunk deployment stores
and processes the specified data. You can go to either the Search app or the
main app page and begin exploring the data that you collected.
To learn how to configure an input, see Configure your inputs.
To learn how to add data to your Splunk deployment, see How do you
want to add data?.
To learn how to experiment with adding a test index, see Use a test index.
To learn about how to add source types, see "The Set Sourcetype page."
To learn what event processing is and how to configure it, see How
Splunk software handles your data.
To learn how to delete data from your Splunk deployment, see Delete
indexed data and start over.
To learn about how to configure your inputs with a default index, see Point
your inputs at the default index.
You can repeat this task to add other inputs as you familiarize yourself with the
getting data in process.
Index custom data
Splunk software can index any time-series data, usually without additional
configuration. If you have logs from a custom application or device, process them
with the default configuration first. If you do not get the results you want, you can
tweak things to make sure the software indexes your events correctly.
See Overview of event processing and How indexing works so that you can
make decisions about how to make Splunk software work with your data.
Consider the following scenarios for collecting data.
Do the events in your data span more than one line? See Configure event line
breaking.
Is your data in an unusual character set? See Configure character set
encoding.
Is the Splunk software unable to determine the timestamps correctly? See
How timestamp assignment works.
Local Data
A local resource is a fixed resource that your Splunk Enterprise instance has
direct access to. You are able to access a local resource, and whatever it
contains, without having to attach, connect, or perform any other intermediate
action (such as authentication or mapping a network drive). If your data is on
such a resource, the data is considered local.
Some examples of local data include:
Data on a hard disk or solid state drive installed in a desktop, laptop, or
server host.
Data on a resource that has been permanently mounted over a
high-bandwidth physical connection that the host can access at boot time.
Data on a RAM disk.
Remote Data
A remote resource is any resource that does not meet the definition of a "local"
resource. Data that exists on such a resource is remote data.

Exceptions

Some cases where resources might be considered remote are actually not remote.
For example, if you have a number of Apache Web servers that generate data
that you want to search centrally, you can set up forwarders on the Apache
hosts. The forwarders take the Apache data and send it to your Splunk deployment for indexing, which consolidates, stores, and makes the data
available for searching. Because of their reduced resource footprint, forwarders
have minimal performance impact on the Apache servers.
What forwarders do
Forwarders get data from remote machines. They represent a more robust solution than raw network feeds, with capabilities such as metadata tagging, configurable buffering, data compression, and SSL security.
Forwarders usually do not index the data, but rather forward the data to a Splunk
deployment that does the indexing and searching. A Splunk deployment can
process data that comes from many forwarders. For detailed information on
forwarders, see the Forwarding Data or Universal Forwarder manuals.
To use forwarders to collect data, follow these general steps:
1. Determine the data that you want to collect.
2. Decide whether you need to transform the data in any way as it comes into Splunk.
3. Download Splunk Enterprise or the universal forwarder for the platform
and architecture of the host with the data.
4. Install the forwarder onto the host.
5. Enable forwarding on the host and specify a destination.
6. Configure inputs for the data that you want to collect from the host. You
can use Splunk Web if the forwarder is a full Splunk Enterprise instance.
7. Confirm that data from the forwarder arrives at the receiving indexer.
Here are the main ways that you can configure data inputs on a forwarder:
Apps typically target specific data types and handle everything from configuring
the inputs to generating useful views of the data. For example, the Splunk App
for Windows Infrastructure provides data inputs, searches, reports, alerts, and
dashboards for Windows host management. The Splunk App for Unix and Linux offers the same for Unix and Linux environments. There is a wide range of apps
to handle specific types of application data, including the following:
Splunk DB Connect
Splunk Stream
Splunk Add-on for Amazon Web Services
For more information on apps, see What are apps and add-ons? in the Admin
Manual. In particular, Where to get more apps and add-ons tells you how to
download and install apps.
For information on how to create your own apps, see the Developing Views and
Apps for Splunk Web manual.
Apps. Splunk has a variety of apps that offer preconfigured inputs for
various data types. For more information, see Use apps to get data in.
Splunk Web. You can configure most inputs using the Splunk Web data
input pages. You can access the Add Data landing page from Splunk
Home. In addition, when you upload or monitor a file, you can preview and
make adjustments to how the file is to be indexed.
The inputs.conf configuration file. When you specify your inputs with
Splunk Web or the CLI, the details are saved in a configuration file,
inputs.conf. If you have Splunk Enterprise, you can edit that file directly.
Some advanced data input needs might require you to edit it.
In addition, if you configure forwarders to send data from outlying machines to a
central indexer, you can specify some inputs at installation time. See Use
forwarders to get data in.
You can add data inputs from Splunk Home or the Settings > Data Inputs menu.
The Add Data page has options to get data in. Click an icon to go to a page to
define the data you want to upload, monitor, or forward.
Upload
Monitor
Forward
For more help on how to use the "Add Data" page, see How do you want to add
data?
When you add an input through Splunk Web, Splunk Enterprise adds that input
to a copy of inputs.conf. The app context, that is, the Splunk app you are
currently in when you configure the input, determines where Splunk Enterprise
writes the inputs.conf file.
For example, if you navigated to the Settings page directly from the Search page
and then added an input, Splunk Enterprise adds the input to
$SPLUNK_HOME/etc/apps/search/local/inputs.conf.
When you add inputs, confirm that you are in the app context that you want to be
in. For background on how configuration files work, read About configuration files
in the Admin manual.
If you have Splunk Enterprise, you can use the Splunk CLI to configure many inputs. From a shell or command prompt, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command. For example, the following command adds /var/log/ as a data input:
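./splunk add monitor /var/log/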
Edit inputs.conf
You can edit inputs.conf to configure your inputs. You use a text editor to create
or modify the file, where you can add a stanza for each input. You can add the
stanza to the inputs.conf file in $SPLUNK_HOME/etc/system/local/, or in your
custom application directory (in $SPLUNK_HOME/etc/apps/<app name>/local).
You can configure the data input by adding key/value pairs to its stanza. You can
set multiple settings in an input stanza. If you do not specify a value for a setting,
Splunk Enterprise uses the default setting value. Default values for all
inputs.conf attributes are in $SPLUNK_HOME/etc/system/default/inputs.conf.
If you have not worked with configuration files, see About configuration files before starting to add inputs.

The following example stanza configures a network input that listens on TCP port 9995:
[tcp://:9995]
connection_host = dns
sourcetype = log4j
source = tcp:9995
For information on how to configure a specific input, see the topic in this manual
for that input. For example, to learn how to configure file inputs, see Monitor files
and directories with inputs.conf.
The topic for each data input describes the main attributes available for that
input. See the inputs.conf spec file for the complete list of available attributes,
including descriptions of the attributes and several examples.
How Splunk Enterprise handles your data
Splunk Enterprise consumes data and indexes it, transforming it into searchable
knowledge in the form of events. The data pipeline shows the main processes
that act on the data during indexing. These processes constitute event
processing. After the data is processed into events, you can associate the
events with knowledge objects to enhance their usefulness.
Incoming data moves through the data pipeline, which is described in How data
moves through Splunk deployments: The data pipeline in the Distributed
Deployment Manual.
Event processing
Event processing occurs in two stages, parsing and indexing. All data enters
through the parsing pipeline as large chunks. During parsing, Splunk software
breaks these chunks into events. It then hands off the events to the indexing
pipeline, where final processing occurs.
During both parsing and indexing, Splunk software transforms the data. You can
configure most of these processes to adapt them to your needs. These processes include the following actions:
Extracting a set of default fields for each event, including host, source,
and sourcetype.
Configuring character set encoding.
Identifying line termination using line breaking rules. You can also modify
line termination settings interactively, using the "Set Sourcetype" page in
Splunk Web.
Identifying or creating timestamps. At the same time that it processes
timestamps, Splunk software identifies event boundaries. You can modify
timestamp settings interactively, using the "Set Sourcetype" page.
Anonymizing data, based on your configuration. You can mask sensitive
event data (such as credit card or social security numbers) at this stage.
Applying custom metadata to incoming events, based on your
configuration.
Breaking all events into segments that can then be searched. You can
determine the level of segmentation, which affects indexing and searching
speed, search capability, and efficiency of disk compression.
Building the index data structures.
Writing the raw data and index files to disk, where post-indexing
compression occurs.
The distinction between parsing and indexing pipelines matters mainly for
forwarders. Heavy forwarders can parse data locally and then forward the parsed
data on to receiving indexers, where the final indexing occurs. Universal
forwarders offer minimal parsing in specific cases such as handling structured
data files. Additional parsing occurs on the receiving indexer.
For information about events and what happens to them during the indexing
process, see Overview of event processing in this manual.
After the data has been transformed into events, you can make the events more
useful by associating them with knowledge objects, such as event types, field
extractions, and reports. For information about managing Splunk knowledge, see
the Knowledge Manager manual, starting with "What is Splunk knowledge?".
How to get data into your Splunk
deployment
After you log into your Splunk deployment, the Home page appears.
To add data, click the Add Data button (to the right of the list of apps). The Add
Data page appears. (If your Splunk deployment is a self-service Splunk Cloud
deployment, choose Settings and click Add Data.)
There are some conditions where the Add Data page does not appear:
This instance is part of a search head cluster. See About search head
clustering in the Distributed Search manual.
This instance is a managed Splunk Cloud instance.
There are three options for getting data into your Splunk deployment with Splunk
Web: Upload, Monitor, and Forward.
Upload
The Upload option lets you upload a file or archive of files for indexing. When you
click Upload, Splunk Web goes to a page that starts the upload process. See
Upload data.
Monitor
The Monitor option lets you monitor one or more files, directories, network
streams, scripts, Event Logs (on Windows hosts only), performance metrics, or
any other type of machine data that the Splunk Enterprise instance has access
to. When you click Monitor, Splunk Web loads a page that starts the monitoring
process. See Monitor data.
Forward
The Forward option lets you receive data from forwarders into your Splunk
deployment. When you click on the "Forward" button, Splunk Web takes you to a
page that starts the data collection process from forwarders. See Forward data.
Upload data
The Upload page lets you specify a file to upload directly to your Splunk
Enterprise instance from your computer.
Note: Windows Event Log (.evt) and Windows Event Log XML (.evtx) files that
have been exported from another host do not work with the upload feature. This
is because those files contain information that is specific to the host that
generated them. Other hosts won't be able to process the files in their unaltered
form. See Index exported event log (.evt or .evtx) files for more information about
the constraints for working with these kinds of files.
The "Upload" page
Drag the file you want to index from your desktop to the "Drop your data
file here" area on the page.
or
In the upper left of the screen, click Select File and select the file that you
want to index.
Splunk software then loads the file and processes it, depending on what type of file it is. After it has completed loading, you can click the green Next button on the upper right to proceed to the next step in the Add Data process.
Next steps
Set sourcetype
Monitor data
You can use the "Monitor data" page to monitor files and network ports on the
host that runs the Splunk Enterprise instance.
The Monitor page
When you access the "Monitor" page, choose the type of data that you want
Splunk Enterprise to monitor. Default inputs are listed first, followed by forwarded
inputs, and then any modular inputs that are installed on the instance.
The "Monitor" page shows only the types of data sources that you can monitor,
which depends on the type of Splunk deployment you have (Splunk Enterprise or
Splunk Cloud) as well as the platform that the Splunk Enterprise instance runs
on. See Types of data sources for more information.
Some data sources are available only on certain operating systems. For example, Windows data sources are only available on hosts that run Windows.
If you experience problems with this procedure, the logged-in Splunk user
account might not have permissions to add data or see the data source you want
to add.
1. Select a source from the left pane by clicking it once. The page updates
based on the source you selected. For example, if you select "Files &
Directories", the page updates with a field to enter a file or directory name
and specify how Splunk software should monitor the file or directory.
2. Follow the on-screen prompts to complete the selection of the source
object that you want to monitor.
3. Click Next to proceed to the next step in the Add data process.
Next Steps
Set sourcetype
Forward data
The "Forward data" page lets you select forwarders that have connected to the
Splunk Enterprise instance to configure and send data to the instance. Splunk
Web loads this page when you click the Forward button on the Add data page.
If you have multiple machines in your Splunk deployment that perform indexing,
then this page is not useful. See About deployment server and forwarder
management in Updating Splunk Enterprise Instances to learn about the
deployment server and how to use it to manage forwarder configurations to send
to multiple indexers.
If you have a managed Splunk Cloud deployment, then this page is not available.
Instead, you can install a deployment server on-premises to synchronize
forwarder configurations so that you do not have to configure forwarders
manually.
To determine what type of Splunk Cloud deployment you have, follow the
procedures in Types of Splunk Cloud Deployment.
Prerequisites
To use the Forward Data page to configure data inputs, you must configure at
least one forwarder as a deployment client. If you have not configured a
forwarder as a deployment client, the page notifies you that no deployment
clients have been found.
The Select Forwarders page
When you select "Forward Data" from the "Add Data" page, the following page
appears.
You can define server classes and add forwarders to those classes. Server
classes are logical groupings of hosts based on things such as architecture or
host name.
This page only displays forwarders that you configured to forward data and act
as deployment clients to this instance. If you have not configured any forwarders,
the page warns you of this.
5. Click Next. The "Select Source" page shows source types that are valid
for the forwarders that you selected.
6. Select the data sources that you want the forwarders to send to this instance.
7. Click Next to proceed to the Set Sourcetype page.
Next step
The Set Sourcetype page appears after you use the Upload or Monitor pages to
specify a single file as a source of data.
On the Set Sourcetype page, you can make adjustments to how the software
indexes your data. You can adjust and improve the indexing process interactively
so that when Splunk software indexes and stores the incoming data, the event
data ends up in the format that you want.
The Set Sourcetype page helps you apply the correct source type to your
incoming data. The source type is one of the default fields that is assigned to all
incoming data, and determines how Splunk software formats the data during
indexing. By assigning the correct source type to your data, the indexed version
of the data (the event data) will look the way you want it to, with correct
timestamps and event breaks.
Splunk Enterprise comes with many predefined source types, and attempts to
assign the correct source type to your data based on its format. In some cases,
you might need to manually select a different predefined source type for the data.
In other cases, you might need to create a new source type with customized
event processing settings.
The page displays how Splunk Enterprise will index the data based on the
application of a predefined source type. You can modify the settings interactively
and save those modifications as a new source type. On this page, you can:
See what your data will look like without any changes, using the default
event-processing configuration.
Apply a different source type to see whether it offers results more to your
liking.
Modify settings for timestamps and event breaks to improve the quality of
the indexed data and save the modifications as a new source type.
Create a new source type from scratch.
The page saves any new source types to a props.conf file that you can later
distribute across the indexers in your deployment, so that the source types are
available globally. See "Distribute source type configurations".
For information on source types, see "Why source types matter" in this manual.
When the Set Sourcetype page loads, Splunk Enterprise chooses a source type
based on the data you specified. You can accept that recommendation or change
it.
1. Look at the preview pane on the right side of the page to see how Splunk
Enterprise will index the data. Review event breaks and time stamps.
2. (Optional) View the event summary by clicking on the View event
summary link on the right. Splunk Web displays the event summary in a
new window. See "View event summary".
3. If the data appears the way that you want, then proceed to Step 5.
Otherwise, choose from one of the following options:
Choose an existing source type to change the data formatting.
See "Choose an existing source type."
Adjust time stamps, delimiters, and line breaking manually, then save the changes as a new source type. See "Adjust time stamps and event line breaks."
4. After making the changes, return to Step 1 to preview the data again.
5. After you are satisfied with the results, click "Next" to proceed to the Input
Settings page.
If the data does not appear the way that you want, see whether or not an existing
source type fixes the problem.
Note: If Splunk Enterprise can detect a source type, the source type is displayed in the "Sourcetype: <sourcetype>" button. If a source type cannot be determined, "Sourcetype: System Defaults" is displayed.

You might need to scroll to see all source types in a category.
4. Review your data again.
You can see a summary of the events within the data sample by clicking the "View Event Summary" link on the right side of the page. This summary shows information about the events in the sample.
If you choose an existing source type without success, then you can manually
adjust how Splunk Enterprise processes timestamps and event line breaks for
the incoming data.
To manually adjust time stamp and event line breaking parameters, use the
Event Breaks, Timestamp, Delimited Settings, and Advanced drop-down tabs
on the left pane of the "Set Sourcetype" page. The preview pane updates as
you make changes to the settings.
Note: Some tabs appear only if Splunk Enterprise detects that a file is a certain
type, or if you select a specific source type for a file.
The Event breaks tab appears when Splunk Enterprise cannot determine
how to line-break the file, or if you select a source type that does not have
line breaking defined.
The Delimited settings tab appears only when Splunk Enterprise detects
that you want to import a structured data file, or you select a source type
for structured data (such as csv).
For more information about how to adjust time stamps and event breaks, see
Modify event processing.
1. Click the Event breaks tab. The tab displays the Break type buttons,
which control how Splunk software line-breaks the file into events.
Auto: Detect event breaks based on the location of the time stamp.
By Line: Breaks every line into a single event.
Regex: Uses the specified regular expression to determine line
breaking.
2. Click the Timestamps tab. The tab expands to show options for
extraction. Select from one of the following options.
Auto: Extract timestamps automatically by looking in the file for
timestamp events.
Current time: Apply the current time to all events detected.
Advanced: Specify the time zone, timestamp format (in a specific
format known as strptime()), and any fields that comprise the
timestamp.
3. Click the Delimited settings tab to display delimiting options.
After the results look the way you want, save your changes as a new
source type, which you can then apply to the data when it is indexed.
4. Click the Advanced tab to display fields that let you enter attribute/value
pairs that get committed directly to the props.conf configuration file.
Caution: The "Advanced" tab requires advanced knowledge of Splunk
features, and changes made here might negatively affect the indexing of
your data. Consider consulting a member of Splunk Professional Services
for help in configuring these options.
Next step
You can direct some sample network data into a file, which you can then either
upload or add as a file monitoring input. Several external tools can do this. On
*nix, the most popular tool is netcat.
For example, if you listen for network traffic on UDP port 514, you can use
netcat to direct some of that network data into a file.
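A minimal sketch (this assumes an OpenBSD-style netcat where -l listens and -u selects UDP; traditional netcat variants also require -p before the port number):

nc -lu 514 > sample_network_data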
For best results, run the command inside a shell script that has logic to kill netcat
after the file reaches a size of 2MB. By default, Splunk software reads only the
first 2MB of data from a file when you preview it.
After you have created the "sample_network_data" file, you can add it as an
input, preview the data, and assign any new source types to the file.
If all the files in a directory are similar in content, then you can preview a single
file and be confident that the results will be valid for all files in the directory.
However, if you have directories with files of heterogeneous data, preview a set
of files that represents the full range of data in the directory. Preview each type of
file separately, because specifying any wildcard causes Splunk Web to disable the "Set Sourcetype" page.

Splunk Web displays the first 2MB of data from a file in the "Set Sourcetype"
page. In most cases, this amount provides a sufficient sampling of your data. If
you have Splunk Enterprise, you can sample a larger quantity of data by
changing the max_preview_bytes attribute in limits.conf. Alternatively, you can
edit the file to reduce large amounts of similar data, so that the remaining 2MB of
data contains a representation of all the types of data in the original file.
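For example, a limits.conf sketch that raises the preview limit to 4MB might look like the following (the stanza and setting names are from limits.conf; the exact byte value is an illustrative assumption):

[preview]
max_preview_bytes = 4194304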
To create the new source type, use the event-breaking and timestamp
parameters, then save the source type.
On the left side of the "Set Sourcetype" page, there are collapsible tabs and
links for the three types of adjustments that you can perform:
Event Breaks. Adjust the way that Splunk Enterprise breaks the data into
events.
Timestamps. Adjust the way Splunk Enterprise determines event
timestamps.
Advanced mode. If you have Splunk Enterprise, edit props.conf.
Event breaks
To modify event break parameters, click Event Breaks. The bar opens to display
the Break type buttons described earlier: Auto, By Line, and Regex.
For information on line breaking, see Configure event line breaking. For a primer
on regular expression syntax and usage, see Regular-Expressions.info. You can
test your regular expression by using it in a search with the rex search command.
The Splunk software also maintains a list of useful third-party tools for writing and
testing regular expressions.
Timestamps
Timezone. The time zone that you want to use for the events.
Timestamp format. A string that represents the timestamp format for
Splunk Enterprise to use when searching for timestamps in the data.
Timestamp prefix. A regular expression that represents the characters
that appear before a timestamp.
Lookahead. The number of characters that Splunk Enterprise looks into the event (or past the match for the regular expression that you specified in "Timestamp prefix") for the timestamp.
Note: If you specify a timestamp format in the "Timestamp format" field and the
timestamp is not located at the very start of each event, you must also specify a
prefix in the Timestamp prefix field. Otherwise, Splunk Enterprise cannot
process the formatting instructions, and every event will contain a warning about
the inability to use strptime. (It's possible that you still end up with a valid
timestamp, based on how Splunk Enterprise attempts to recover from the
problem.)
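These fields map to settings in props.conf. A hedged sketch of what a saved source type might contain (the time zone, prefix, and format values here are hypothetical examples, not defaults):

TZ = America/New_York
TIME_PREFIX = timestamp=
TIME_FORMAT = %Y-%m-%dT%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 30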
Advanced
To modify advanced parameters, click the Advanced tab. The tab shows options
that let you specify source type properties by editing the underlying props.conf
file.
You can add or change source type properties by specifying attribute/value pairs.
See props.conf for details on how to set these properties.
The "Advanced" box shows the current, complete set of properties for the
selected source type:
30
For information on how to set source type properties, see "props.conf" in the
Configuration file reference. See also "How timestamp assignment works" and
"event linebreaking."
The settings changes you make in Advanced mode take precedence. For
example, if you alter a timestamp setting using the Timestamps tab and also
make a conflicting timestamp change in Advanced mode, the Advanced mode
change takes precedence over the modification that you made in the
"Timestamps" tab.
Splunk Enterprise combines your adjustments with the underlying default settings in order of precedence, with Advanced mode changes taking the highest precedence.
Also, if you return to the Event Breaks or Timestamps tabs after making changes
in Advanced mode, the changes will not be visible from those tabs.
When you are ready to view the effect of your changes, select Apply settings.
Splunk Web refreshes the screen, so you can review the effect of your changes
on the data.
To make further changes, use any of the three adjustment methods again, then select Apply settings to view the effect of the changes on your data.
Save modifications as a new source type
1. Click "Save As" next to the "Sourcetype" button. Splunk Web displays a
dialog box where you can name your new source type, choose the
category in which it should be shown in the "Sourcetype" button dialog,
and the application context it should use.
Next steps
You have several options after you save the source type:
1. (Optional) Click Next to apply the source type to your data and proceed to
the Input settings page.
2. (Optional) Click "<" to go back and choose a new file to upload or monitor.
3. (Optional) Click Add data to return to the beginning of the Add Data
wizard.
You can specify additional parameters for your data input, such as its source
type, its application context, its host value, and the index where data from the
input should be stored.
You can specify the source type to be applied to your data with the "Source type"
setting. This setting appears only when your data source meets certain criteria. If your data source does not meet these criteria, then the "Source type" setting
does not appear.
Choose an existing source type
1. From the "Select Source Type" drop-down, choose the category that best
represents the source type you want.
2. Choose the source type from the pop-up list that appears.
1. In the "Source Type" text field, enter the name of the new source type.
2. Choose a category for the source type in the "Source Type Category" drop
down.
3. In the "Source Type Description" field, enter the description for the source
type.
The Application Context setting determines the context in which the input
should collect data. Application contexts improve manageability of input and
source type definitions. App contexts are loaded based on precedence rules. See
Configuration file precedence in the Admin manual.
Select the application context that you want this input to operate within from the drop-down list.
Host

Splunk Enterprise tags events with a host. You can configure how the software
determines the host value.
IP: Uses the IP address of the host from which the event originates.
DNS: Uses the Domain Name System (DNS). Events are tagged with the host
name that Splunk software determines using DNS name resolution.
Custom: Uses the host value you assign in the "Host field value" text box
that appears when you select this option.
Index
The "Index" setting determines the index where the events for this input should
be stored.
1. To use the default index, leave the drop-down list set to "Default". Otherwise, click the drop-down list and select the index that you want the data to go to.
2. (Optional) If the index you want to send the data to is not in the list, and
you have permissions to create indexes, you can create a new index by
clicking the Create a new index button.
After you make your selections, click Next to proceed to the final step of the "Add
Data" process.
You can use either the "Set source type" or source type management pages in
Splunk Web to create new source types, which you can then assign to inputs
from specific files or directories, or for network inputs. Either of these pages
saves a new source type to a props.conf configuration file on the local Splunk
Enterprise instance. You can then distribute this file to other Splunk Enterprise
instances so that they recognize the new source type.
You can use a new source type in a distributed environment where you have
forwarders consuming data and then sending the data to indexers.
1. Distribute the props.conf file that contains the source type definition to the
$SPLUNK_HOME/etc/system/local directory on indexers that you want to
index data with the source type you created.
2. Use the new source type when you define an input on forwarders that
send data to those indexers.
When a forwarder sends data that has been tagged with the new source type to
an indexer, the indexer can correctly process it into events.
When you create a source type in the "Set Sourcetype" page, the software saves
the source type definition as a stanza in a props.conf file in the app that you selected when you saved the source type. If you later create additional source
types, they are saved to the same props.conf file.
For example, if you selected the "Search and Reporting" app, the file resides in
$SPLUNK_HOME/etc/apps/search/local/props.conf. The only exception is the
"System" app: If you choose that app when saving the source type, the file
resides in $SPLUNK_HOME/etc/system/local.
After you create source types, you can distribute props.conf to another Splunk
Enterprise instance. That instance can then index any incoming data that you tag
with the new source type.
A Splunk best practice is to place the configuration file in its own app directory on
the target Splunk Enterprise instance; for example,
$SPLUNK_HOME/etc/apps/custom_sourcetype/local/.
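For example, a props.conf stanza for a custom source type might look like the following sketch (the settings shown are illustrative assumptions, not required values; the stanza name matches the source type example used later in this topic):

[new_network_type]
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %b %d %H:%M:%S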
Note: Splunk software uses the source type definitions in props.conf to parse
incoming data into events. For this reason, you can only distribute the file to a
Splunk Enterprise instance that performs parsing (either an indexer or a heavy
forwarder).
Forwarders (with the exception of the heavy forwarder) do not have Splunk Web.
This means that you must configure their inputs through the CLI or the
inputs.conf configuration file. When you specify an input in that file, you can
also specify its source type. For information on inputs.conf, read the section on
inputs.conf in the Configuration file reference.
1. To tag a forwarder input with a new source type, add the source type to
the input stanza in inputs.conf. For example:
[tcp://:9995]
sourcetype = new_network_type
2. Confirm that all of the indexers that the forwarder sends data to have
copies of the props.conf file that contains the source type definition for
new_network_type. When the forwarder sends data to the indexers, they
can identify the new source type and correctly format the data.
Get data from files and directories
You can use monitor to add nearly all your data sources from files and
directories. However, you might want to use upload to add one-time inputs, such
as an archive of historical data.
On hosts that run Windows Vista or Windows Server 2008 and later, you can use
MonitorNoHandle to monitor files which the system rotates automatically. The
MonitorNoHandle input works only on Windows hosts.
You can add monitor inputs by using any of the following methods:

Splunk Web
The CLI
inputs.conf

You can add inputs to MonitorNoHandle using either the CLI or inputs.conf.
Use the "Set Sourcetype" page to see how the data from a file will be indexed.
See The "Set Sourcetype" page for details.
Specify a path to a file or directory and the monitor processor consumes any new
data written to that file or directory. This is how you can monitor live application
logs such as those coming from Web access logs, Java 2 Platform Enterprise
Edition (J2EE) or .NET applications, and so on.
Splunk Enterprise monitors and indexes the file or directory as new data
appears. You can also specify a mounted or shared directory, including network
file systems, as long as Splunk Enterprise can read from the directory. If the
specified directory contains subdirectories, the monitor process recursively
examines them for new files, as long as the directories can be read.
You can include or exclude files or directories from being read by using
whitelists and blacklists.
If you disable or delete a monitor input, Splunk Enterprise does not stop indexing
the files that the input references. It only stops checking those files for new data. To
stop all in-process data indexing, the Splunk server must be stopped and
restarted.
When the Splunk server is restarted, it continues processing files where it left off.
It first checks for the file or directory specified in a monitor configuration. If the file
or directory is not present on start, Splunk Enterprise checks for it every 24 hours
from the time of the last restart. The monitor process scans subdirectories of
monitored directories continuously.
Monitor inputs may overlap. So long as the stanza names are different, Splunk
Enterprise treats them as independent stanzas and files matching the most
specific stanza will be treated in accordance with its settings.
Archive files (such as a .tar or .zip file) are decompressed before being
indexed. The following types of archive files are supported:
.tar
.gz
.bz2
.tar.gz and .tgz
.tbz and .tbz2
.zip
.z
If you add new data to an existing archive file, the entire file is reindexed, not just
the new data. This can result in event duplication.
How Splunk Enterprise monitors files that the operating system rotates on
a schedule
The monitoring process detects log file rotation and does not process renamed files that it has already indexed (with the exception of .tar and .gz archives). See How Splunk Enterprise handles log file rotation.
How Splunk Enterprise monitors nonwritable Windows files
Windows can prevent Splunk Enterprise from reading open files. If you need to
read files while they are being written to, you can use the MonitorNoHandle input.
Splunk Enterprise cannot monitor a file whose path exceeds 1024 characters.
Files with a .splunk filename extension are also not monitored, because files
with that extension contain Splunk metadata. If you need to index files with a
.splunk extension, use the add oneshot CLI command.
You can also use the CLI add oneshot or spool commands for the same
purpose. See Use the CLI for details.
If you have Splunk Enterprise, you can use the batch input type in inputs.conf
to load files once and destructively. By default, the Splunk batch processor is
located in $SPLUNK_HOME/var/spool/splunk. If you move a file into this directory,
the file is indexed and then deleted.
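For example, on a *nix host (the source log path is hypothetical; cp preserves the original file, because Splunk Enterprise deletes the copy after indexing):

cp /var/log/archive/old_app.log $SPLUNK_HOME/var/spool/splunk/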
Note: For best practices on loading file archives, see How to index different sized
archives on the Community Wiki.
This Windows-only input lets you read files on Windows systems as Windows writes to them. It does this by using a kernel-mode filter driver to capture raw data as it gets written to the file. Use this input on files that the system locks open for writing, such as the Windows DNS server log file.
MonitorNoHandle has the following caveats:
You can only monitor single files with MonitorNoHandle. To monitor more
than one file, you must create a MonitorNoHandle input stanza for each
file.
You cannot monitor directories with MonitorNoHandle.
If a file you choose to monitor with MonitorNoHandle already exists, Splunk
Enterprise does not index its current contents, only new information that
comes into the file as processes write to it.
When you monitor a file with MonitorNoHandle, the source field for the file
is MonitorNoHandle, not the name of the file. If you want to have the
source field be the name of the file, you must set it explicitly in
inputs.conf. See Monitor files and directories with inputs.conf.
You add an input from the Add Data page in Splunk Web. You can reach the Add Data page from either Splunk Home or Splunk Settings.
Note: Forwarding a file requires additional setup. See the following topics:
Configure the universal forwarder if you work with universal forwarders.
Enable forwarding on a Splunk Enterprise instance if you work with heavy
and light forwarders.
For more information on how to whitelist and blacklist data, see Whitelist or
blacklist specific incoming data.
When you add a new file input, Splunk Enterprise lets you set the source type of
your data and preview how it will look once it has been indexed. This lets you
ensure that the data has been formatted properly and make any necessary
adjustments.
For information about this page, see The Set Sourcetype page.
If you skip previewing the data, the Input Settings page appears.
You can specify application context, default host value, and index in the Input Settings page. All parameters are optional.
After you specify all input settings, review your selections. Splunk Web lists
the options you selected, including but not limited to the type of monitor, the
source, the source type, the application context, and the index.
The CLI has built-in help. Access the main CLI help by typing splunk help.
Individual commands have their own help pages as well. Access that help by
typing splunk help <command>.
The following commands are available for input configuration using the CLI:
edit monitor
    Syntax: edit monitor [-source] <source> [-parameter value] ...
    Edit a previously added monitor input for <source>.

remove monitor
    Syntax: remove monitor [-source] <source>
    Remove a previously added monitor input for <source>.

list monitor
    Syntax: list monitor
    List the currently configured monitor inputs.

add oneshot
    Syntax: add oneshot <source> [-parameter value] ...
    Copy the file <source> directly into Splunk. This uploads the file once, but Splunk Enterprise does not continue to monitor it. You cannot use the oneshot command against a remote Splunk Enterprise instance. You also cannot use the command with either recursive folders or wildcards as a source. Specify the exact source path of the file you want to monitor.

spool
    Syntax: spool <source>
    Copy the file <source> into Splunk Enterprise using the sinkhole directory. Similar to add oneshot, except that the file spools from the sinkhole directory, rather than being added immediately. You cannot use the spool command against a remote Splunk Enterprise instance. You also cannot use the command with either recursive folders or wildcards as a source. Specify the exact source path of the file you want to monitor.
CLI parameters for input configuration
Note: You can only set one -hostname, -hostregex or -hostsegmentnum per
command.
source
    Required. Unlike the other parameters, the syntax for this parameter can be the value itself. It does not have to follow a parameter flag. You can use either "./splunk add monitor <source>" or "./splunk add monitor -source <source>".

sourcetype
    Optional. Specify a sourcetype field value for events from the input source.

index
    Optional. Specify the destination index for events from the input source.

hostname or host
    Optional. Specify a host name to set as the host field value for events from the input source. These parameters are functionally equivalent.

hostregex or host_regex
    Optional. Specify a regular expression to use to extract the host field value from the source key. These parameters are functionally equivalent.

hostsegmentnum or host_segment
    Optional. An integer that determines which "/"-separated segment of the path to set as the host field value. If set to 3, for example, the third segment of the path is used.
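For example, a hypothetical command that combines several of these parameters (the index, source type, and host values are placeholders):

./splunk add monitor /var/log/applog -index main -sourcetype app_log -hostname webserver01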
Example 2: Monitor windowsupdate.log
The following example shows how to monitor the Windows Update log file where
Windows logs automatic updates, sending the data to an index called
"newindex".
This example shows how to monitor the default location for Windows IIS logging.
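A sketch, assuming the default IIS log location on IIS 7 and later (older versions log to %SystemRoot%\System32\LogFiles instead):

.\splunk add monitor C:\inetpub\logs\LogFiles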
This example shows how to upload a file into Splunk. Splunk Enterprise
consumes the file only once. It does not monitor it continuously.
Unix:
./splunk add oneshot /var/log/applog

Windows:
.\splunk add oneshot "C:\Program Files\AppLog\log.txt"
You can also upload a file through the sinkhole directory with the spool command:

Unix:
./splunk spool /var/log/applog

Windows:
.\splunk spool "C:\Program Files\AppLog\log.txt"
The result is the same with either command.
To configure inputs with inputs.conf, edit a copy of the file in $SPLUNK_HOME/etc/system/local/ or in your custom application directory in $SPLUNK_HOME/etc/apps/. (To configure inputs for Splunk Cloud, use Splunk Web.)
You can set multiple attributes in an input stanza. If you do not specify a value for
an attribute, Splunk Enterprise uses the default for that attribute, as defined in
$SPLUNK_HOME/etc/system/default/inputs.conf.
For more information about configuration files, see About configuration files.
Configuration settings
Use the following attributes in both monitor and batch input stanzas.
host_segment = <integer>
    Sets the segment of the path as the host, using <integer> to determine the segment. For example, if host_segment = 2, the host becomes the second segment of the path. Path segments are separated by the '/' character. Defaults to the default "host =" attribute if the value is not an integer or is less than 1.
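For example, with a hypothetical directory layout in which the fourth path segment names the originating host (such as /var/log/hosts/www1/secure.log):

[monitor:///var/log/hosts/]
host_segment = 4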
Monitor syntax and examples
Monitor input stanzas direct Splunk Enterprise to watch all files in the <path> (or
<path> itself if it represents a single file). You must specify the input type and
then the path, so put three slashes in the path if the path includes the root
directory.
You can use wildcards for the path. See Specify input paths with wildcards.
[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...
The following are additional attributes you can use when defining monitor input
stanzas:
crcSalt = <string>
    Forces Splunk Enterprise to consume files that have matching CRCs (cyclic redundancy checks). By default, the software only performs CRC checks against the first few lines of a file. This behavior prevents indexing of the same file twice, even though you might have renamed it, such as with rolling log files. However, because the CRC counts only the first few lines of the file, it is possible for legitimately different files to have matching CRCs, particularly if they have identical headers. Default: N/A.
ignoreOlderThan = <time_window>
    Splunk Enterprise does not index files whose modification time falls outside <time_window> when it first attempts to monitor the file.
time_before_close = <integer>
    The modification time delta required before Splunk Enterprise can close a file on end-of-file (EOF). Tells the system not to close files that have been updated in the past <integer> seconds. Default: 3.
followSymlink = true|false
    If false, Splunk Enterprise ignores symbolic links that it finds within a monitored directory. Default: true.
Example 1. To load anything in /apache/foo/logs or /apache/bar/logs, etc.
[monitor:///apache/.../logs]
Example 2. To load anything in /apache/ that ends in .log.
[monitor:///apache/*.log]
MonitorNoHandle syntax and examples
You must specify a valid path to a file when you use MonitorNoHandle. You
cannot specify a directory. If you specify a file that already exists, Splunk
Enterprise does not index the existing data in the file. It only indexes new data
that the system writes to the file.
MonitorNoHandle sets the source for files you monitor to MonitorNoHandle. If you
want to specify the file name as the source, you must specify it with the source
setting in the stanza for the MonitorNoHandle input for the file.
You can only configure MonitorNoHandle using inputs.conf or the CLI. You cannot configure it in Splunk Web.
[MonitorNoHandle://<path>]
source = <path>
<attribute1> = <val1>
<attribute2> = <val2>
...
Batch syntax and examples
Use batch to set up a one time, destructive input of data from a source. For
continuous, non-destructive inputs, use monitor. Remember, after the batch
input is indexed, Splunk Enterprise deletes the file.
[batch://<path>]
move_policy = sinkhole
<attribute1> = <val1>
<attribute2> = <val2>
...
When you define batch inputs, you must include the attribute, move_policy =
sinkhole. This loads the file destructively. Do not use the batch input type for
files that you do not want to delete after indexing.
Example: This example batch loads all files from the directory
system/flight815/, but does not recurse through any subdirectories under it:
[batch://system/flight815/*]
move_policy = sinkhole
Note: To ensure that new events are indexed when you copy over an existing file
with new contents, set the CHECK_METHOD = modtime attribute in props.conf for the
source. This checks the modification time of the file and re-indexes it when it
changes. Be aware that the entire file will be re-indexed, which can result in
duplicate events.
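For example, a props.conf stanza along these lines (the source path is hypothetical):

[source::/data/drop/current_stats.csv]
CHECK_METHOD = modtime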
Wildcard overview
A wildcard is a character that you can substitute for one or more unspecified
characters when searching text or selecting multiple files or directories. You can
use wildcards to specify the input path for a file or directory monitor input.
Wildcard: ... (ellipsis)
Regular expression equivalent: .*
Description: The ellipsis wildcard recurses through directories and any number of levels of subdirectories to find matches. If you specify a folder separator (for example, //var/log/.../file), it does not match the first folder level, only subfolders.
Examples: /foo/.../bar.log matches the files /foo/1/bar.log, /foo/2/bar.log, /foo/1/2/bar.log, and so on, but does not match /foo/bar.log or /foo/3/notbar.log. Because a single ellipsis recurses through all folders and subfolders, /foo/.../bar.log matches the same as /foo/.../.../bar.log.
When Splunk software determines which files or directories to monitor, it splits the elements of a monitoring stanza into segments, defined as the text between directory separator characters ("/" or "\") in the stanza definition. If you specify a monitor stanza that contains segments with both wildcards and regular expression metacharacters (such as (, ), [, ], and |), those characters behave differently depending on where the wildcard is in the stanza.
[monitor:///var/log/log(a|b).log]
monitors the /var/log/log(a|b).log file.
The (a|b) is not treated as a regular
expression because no wildcards are present.
[monitor:///var/log()/log*.log]
monitors all files in the /var/log()/ directory that begin with log and have the
extension .log. The () is not treated as a regular expression because it is in the
segment before the wildcard.
[monitor:///var/log()/log(a|b)*.log]
monitors all files in the /var/log()/ directory that begin with either loga or logb
and have the extension .log. The first set of () is not treated as a regular
expression because the wildcard is in the following segment. The second set of
() does get treated as a regular expression because it is in the same segment as
the wildcard '*'.
[monitor:///var/.../log(a|b).log]
monitors all files in any subdirectory of the /var/ directory named loga.log and
logb.log. Splunk Enterprise treats (a|b) as a regular expression because of the
wildcard '...' in the previous stanza segment.
[monitor:///var/.../log[A-Z0-9]*.log]
monitors all files in any subdirectory of the /var/ directory that:
begin with log, then
contain a single capital letter (A-Z) or digit (0-9), then
contain any other characters, then
end in .log.
Input examples

To monitor all files in any directory named logs at any depth under /apache/:

[monitor:///apache/.../logs/*]
To monitor /apache/foo/logs, /apache/bar/logs, but not /apache/bar/1/logs or
/apache/bar/2/logs:
[monitor:///apache/*/logs]
To monitor any file directly under /apache/ that ends in .log:
[monitor:///apache/*.log]
To monitor any file under /apache/ under any level of subdirectory that ends in
.log:
[monitor:///apache/.../*.log]
The "..." followed by a folder separator will imply that the wildcard level folder will
be excluded.
[monitor:///var/log/.../*.log]
the tailing logic will become '^\/var\/log/.*/[^/]*\.log$'
[monitor:///var/log/]
whitelist=\.log$
recurse=true
#true by default
Wildcards and whitelisting
When you specify wildcards in a file input path, Splunk Enterprise creates an
implicit whitelist for that stanza. The longest wildcard-free path becomes the
monitor stanza, and Splunk Enterprise translates the wildcards into regular
expressions.
Splunk Enterprise anchors the converted expression to the right end of the file
path, so that the entire path must be matched.
[monitor:///foo/bar*.log]
Splunk Enterprise translates this into
[monitor:///foo/]
whitelist = bar[^/]*\.log$
On Windows, if you specify
[monitor://C:\Windows\foo\bar*.log]
Splunk Enterprise translates it into
[monitor://C:\Windows\foo\]
whitelist = bar[^/]*\.log$
Note: In Windows, whitelist and blacklist rules do not support regular
expressions that include backslashes. Use two backslashes (\\) to escape
wildcards.
Whitelist or blacklist specific incoming data
If you want to apply the same whitelist and blacklist settings across many of your forwarders, you can set up a deployment server.
It is not necessary to define both a whitelist and a blacklist in a stanza. They are independent settings. If you do define both and a file matches both, Splunk Enterprise does not index that file, because blacklist overrides whitelist.
Whitelist and blacklist rules use regular expression syntax to define the match on the file name or path. They must be contained within a configuration stanza, for example [monitor://<path>]; Splunk software ignores whitelist and blacklist entries defined outside of stanzas. When you define whitelist and blacklist entries, you must use exact regular expression syntax.
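A brief sketch of the override behavior, with hypothetical patterns: a file named /mnt/logs/debug.log matches both rules below, so Splunk Enterprise ignores it.
[monitor:///mnt/logs]
# The whitelist matches any .log file; the blacklist matches any path
# containing "debug". Blacklist overrides whitelist, so debug.log is
# not indexed.
whitelist = \.log$
blacklist = debug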
Instead of whitelisting or blacklisting your data inputs, you can filter specific
events and send them to different queues or indexes. You can also use the crawl
feature to predefine files you want to index or to exclude from indexing when they
get added to your file system.
For example, to monitor only files with the .log extension:
[monitor:///mnt/logs]
whitelist = \.log$
You can whitelist multiple files in one line, using the "|" (OR) operator. For
example, to whitelist filenames that contain query.log OR my.log:
whitelist = query\.log$|my\.log$
Or, you can whitelist an exact match.
whitelist = /query\.log$|/my\.log$
Note: The "$" anchors the regular expression to the end of the line. There is no
space before or after the "|" operator.
To define a blacklist, add a line with the following syntax to a monitor stanza:
blacklist = <your_custom_regex>
Define only one blacklist line per stanza. If you create a blacklist line for each file you want to ignore, Splunk Enterprise activates only the last filter.
Example 1: Blacklist files with a .txt extension
To ignore and not monitor only files with the .txt extension:
[monitor:///mnt/logs]
blacklist = \.txt$
Example 2: Blacklist files with either a .txt or .gz extension
To ignore and not monitor all files with either the .txt extension OR the .gz
extension (note that you use the "|" for this):
[monitor:///mnt/logs]
blacklist = \.(?:txt|gz)$
Example 3: Blacklist an entire directory
[monitor:///mnt/logs]
blacklist = archive|historical|\.bak$
This example tells Splunk Enterprise to ignore all files under /mnt/logs/ within the
archive or historical directories and all files ending in *.bak.
Example 4: Blacklist files whose names contain a specific string
To ignore files whose names contain a specific string:
[monitor:///mnt/logs]
blacklist = 2009022[89]file\.txt$
This example ignores the webserver20090228file.txt and
webserver20090229file.txt files under /mnt/logs/.
The monitoring processor picks up new files and reads the first 256 bytes of the
file. The processor then hashes this data into a begin and end cyclic redundancy
check (CRC), which functions as a fingerprint representing the file content.
Splunk Enterprise uses this CRC to look up an entry in a database that contains
all the beginning CRCs of files it has seen before. If successful, the lookup
returns a few values, but the important ones are a seekAddress, meaning the
number of bytes into the known file that Splunk Enterprise has already read, and
a seekCRC which is a fingerprint of the data at that location.
Using the results of this lookup, Splunk Enterprise can categorize the file in one of three ways:
1. There is no matching record for the CRC from the file beginning in the database. This indicates a new file. Splunk Enterprise picks it up and consumes its data from the start of the file. Splunk Enterprise updates the database with the new CRCs and seek addresses as it consumes the file.
2. There is a matching record for the CRC from the file beginning in the database, the content at the seek address location matches the stored CRC for that location in the file, and the size of the file is larger than the seek address that Splunk Enterprise stored. While Splunk Enterprise has seen the file before, data has been added since it was last read. Splunk Enterprise opens the file, seeks to the seek address--the end of the file when Splunk Enterprise last finished with it--and starts reading the new data from that point.
3. There is a matching record for the CRC from the file beginning in the database, but the content at the seek address location does not match the stored CRC at that location in the file. Splunk Enterprise has read some file with the same initial data, but either some of the material that it read has been modified in place, or it is in fact a wholly different file that begins with the same content. Because the database for content tracking is keyed to the beginning CRC, it has no way to track progress independently for the two different data streams, and further configuration is required.
Because the CRC start check runs against only the first 256 bytes of the file by default, it is possible for non-duplicate files to have duplicate start CRCs, particularly if the files have identical headers. To handle such situations, you can use the crcSalt setting. When you set crcSalt = <SOURCE> in a monitor stanza, Splunk Enterprise adds the full source path of each file to the CRC calculation, so that files with identical beginnings are tracked separately.
Do not use crcSalt = <SOURCE> with rolling log files, or any other scenario in which log files get renamed or moved to another monitored location. Doing so prevents Splunk Enterprise from recognizing log files across the roll or rename, which results in the data being reindexed.
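A minimal sketch of a stanza that uses crcSalt; the path /opt/app/logs is a hypothetical example:
[monitor:///opt/app/logs]
# <SOURCE> is a literal string. It adds the full source path of each file
# to the CRC calculation, so files whose first 256 bytes are identical
# are still tracked as distinct files.
crcSalt = <SOURCE>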
Get data from network sources
For security, Splunk Cloud accepts connections only from forwarders with the
correct Secure Sockets Layer (SSL) certificates. If you want to send data from a
TCP or UDP source such as syslog, use the Splunk Universal Forwarder to listen
to the source and forward the data to your Splunk Cloud deployment.
TCP is the network protocol that underlies the Splunk Enterprise data distribution
scheme. It is the recommended protocol for sending data from any remote host
to your Splunk Enterprise server. Splunk Enterprise can index remote data from
syslog-ng or any other application that transmits via TCP.
Splunk Enterprise supports monitoring over UDP, but you should use TCP to
send network data instead whenever possible. UDP is not desirable as a
transport because, among other reasons, it does not guarantee delivery of
network packets.
When you monitor TCP network ports, the user Splunk Enterprise runs as must
have access to the port you want to monitor. On many Unix operating systems,
by default, you must run Splunk Enterprise as the root user to listen directly on a
port below 1024.
See Working with UDP connections on the Splunk Community Wiki for
recommendations if you must send network data with UDP.
Before you begin monitoring the output of a network device with the Splunk
Enterprise network monitor, confirm how the device interacts with external
network monitors.
If you configure TCP logging on some network devices, such as a Cisco Adaptive Security Appliance (ASA), and the device cannot connect to the monitor, the device might experience reduced performance, stop logging, or stop passing traffic entirely. By default, the Cisco ASA stops accepting incoming network connections when it encounters network congestion or connectivity problems.
Go to the Add Data page. You can get there two ways: Splunk Settings or Splunk Home.
By Splunk Settings:
1. Click Settings > Data Inputs.
2. Click TCP or UDP.
3. Click New.
By Splunk Home:
1. Click the Add Data link in Splunk Home.
2. Click Monitor to monitor a network port on the local machine, or Forward to receive network data from another machine.
3. If you selected Forward, choose or create the group of forwarders you want this input to apply to.
4. Click Next.
Select the input source:
2. Click the TCP or UDP button to choose between a TCP or UDP input.
3. In the Port field, enter a port number.
4. In the Source name override field, enter a new source name to override the
default source value, if necessary.
Note: Consult Splunk Support before changing the "Source name override"
value.
5. If this is a TCP input, specify whether this port should accept connections from
all hosts or only one host in the Only accept connections from field. If you only
want the input to accept connections from one host, enter the host name or IP
address of the host. You can use wildcards to specify hosts.
The Input Settings page lets you specify source type, application context,
default host value, and index. All of these parameters are optional.
1. Set the Source type. This is a default field that Splunk Enterprise adds to
events and uses to determine processing characteristics, such as timestamps
and event boundaries.
2. Set the Host value. You have several choices for this setting:
IP. Sets the input processor to rewrite the host with the IP address of the remote server.
DNS. Sets the host to the DNS entry of the remote server.
Custom. Sets the host to a user-defined label.
Note: Host only sets the host field in the resulting events. It does not direct Splunk Enterprise to look on a specific host on your network.
3. Set the Index that Splunk Enterprise should send data to for this input. Leave
the value as "default" unless you have defined multiple indexes to handle
different types of events. In addition to indexes for user data, Splunk Enterprise
has a number of utility indexes, which also appear in this dropdown box.
4. Click Review.
Review your choices
After specifying all your input settings, review your selections. Splunk Enterprise lists the options you selected, including the type of monitor, the source, the source type, the application context, and the index.
1. Review the settings.
2. If they are not what you want, click < to go back to the previous step in the wizard. Otherwise, click Submit.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified network input.
If you get stuck, the CLI has help. Access the main CLI help by typing splunk
help. Individual commands have their own help pages as well and can be
accessed by typing splunk help <command>.
The CLI commands add, edit, remove, and list are available for network input configuration. Use them with an input type of tcp or udp, as in splunk add tcp.
You can modify the configuration of each input by setting any of these additional
parameters:
None of the following parameters is required.
sourcetype: Specify a sourcetype field value for events from the input source.
index: Specify the destination index for events from the input source.
hostname: Specify a host name to set as the host field value for events from the input source.
remotehost: Specify an IP address to exclusively accept data from.
resolvehost: Set to true or false (T | F). The default is false. Set to true to use DNS to set the host field value for events from the input source.
restrictToHost: Specify a host name or IP address that this input should accept connections from only.
Examples
Configure a UDP input to watch port 514 and set the source type to "syslog":
./splunk add udp 514 -sourcetype syslog
Set the UDP input host value via DNS. Use auth with your username and password:
./splunk edit udp 514 -resolvehost true -auth admin:changeme
If you decide to only accept connections from a specific host when you create a
TCP input, once you save that input, you can neither change nor remove that
host later, either from Splunk Web or the CLI.
To change or remove the restricted host of a port, you must first delete the input
that contains the old restricted host. Then, you must add a new input that either
contains the new restricted host or has no restriction.
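A brief sketch of that procedure in the CLI, assuming a hypothetical TCP input on port 9995 and a hypothetical replacement host:
# Delete the input that carries the old restricted host, then re-add it.
./splunk remove tcp 9995
./splunk add tcp 9995 -restrictToHost newhost.example.com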
Add a network input using inputs.conf
You can set any number of attributes and values following an input type. If you
do not specify a value for one or more attributes, Splunk Enterprise uses the
defaults that are preset in $SPLUNK_HOME/etc/system/default/ (noted below).
[tcp://<remote server>:<port>]
<attribute1> = <val1>
<attribute2> = <val2>
...
Tells Splunk Enterprise to listen to <remote server> on <port>. If <remote
server> is blank, Splunk Enterprise listens to all connections on the specified
port.
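A minimal sketch, assuming a hypothetical remote host 10.1.1.100 and port 9995:
# Accept TCP data on port 9995, but only from 10.1.1.100.
[tcp://10.1.1.100:9995]
sourcetype = log4j
connection_host = dns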
These are the main attributes that you can use with TCP input stanzas:
host = <string>: Sets the host field for events from this input. The default is the IP address or fully qualified domain name of the host where the data originated.
index = <string>: Sets the index where Splunk Enterprise stores events from this input. The default is the main index.
sourcetype = <string>: Explicitly declares the source type for this data, as opposed to letting Splunk Enterprise determine it. This is important both for searchability and for applying the relevant formatting for this type of data during parsing and indexing. There is no hard-coded default.
queue = parsingQueue | indexQueue: Specifies where the input processor should deposit the events that it reads. Set to "parsingQueue" to apply props.conf and other parsing rules to your data, or "indexQueue" to send your data directly into the index. The default is parsingQueue.
connection_host = ip | dns | none: "ip" sets the host to the IP address of the remote server. "dns" sets the host to the DNS entry of the remote server. "none" leaves the host as specified.
Configure a TCP input over SSL
[tcp-ssl:<port>]
Use this stanza type if you receive encrypted, unparsed data from a forwarder or
third-party system. Set <port> to the port on which the forwarder or third-party
system is sending unparsed, encrypted data.
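A minimal sketch, assuming port 9998 and a hypothetical certificate path; serverCert and sslPassword belong to the [SSL] stanza in inputs.conf:
[tcp-ssl:9998]
sourcetype = encrypted_feed

[SSL]
# The server certificate that Splunk Enterprise presents to connecting
# clients, and the password that protects it.
serverCert = $SPLUNK_HOME/etc/auth/server.pem
sslPassword = <your_password>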
Configure a UDP input
[udp://<remote server>:<port>]
<attribute1> = <val1>
<attribute2> = <val2>
...
This type of input stanza is similar to the TCP type, except that it listens on a
UDP port.
If you specify <remote server>, the specified port only accepts data from
that host.
If you specify nothing for <remote server> - [udp://<port>] - the port
accepts data sent from any host.
These are the main attributes that you can use with UDP input stanzas:
sourcetype = <string>: Explicitly declares the source type for this data, as opposed to letting Splunk Enterprise determine it. This is important both for searchability and for applying the relevant formatting for this type of data during parsing and indexing. There is no hard-coded default; if you do not set it, Splunk Enterprise declares the source type based on various aspects of the data.
queue = parsingQueue | indexQueue: Specifies where the input processor should deposit the events that it reads. Set to "parsingQueue" to apply props.conf and other parsing rules to your data, or "indexQueue" to send your data directly into the index. The default is parsingQueue.
_rcvbuf = <integer>: Sets the receive buffer for the UDP port, in bytes. If the value is 0 or negative, Splunk Enterprise ignores the value. The default is 1,572,864, unless that value is too large for the OS. In that case, Splunk Enterprise halves the value from this default continuously until the buffer size is at an acceptable level.
no_priority_stripping = true | false: Sets how Splunk Enterprise handles receiving syslog data. Set to true to keep Splunk Enterprise from stripping the priority information from received syslog events. The default is false.
no_appending_timestamp = true | false: Set to true to keep Splunk Enterprise from appending a timestamp and host to received events. The default is false.
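A brief sketch that combines these attributes, assuming syslog devices send to UDP port 514:
[udp://514]
# Treat the incoming data as syslog and keep the original priority
# header and timestamp intact.
sourcetype = syslog
no_priority_stripping = true
no_appending_timestamp = true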
UDP packets and line merging
Splunk Enterprise does not index each UDP packet as an independent event.
Instead, it performs event merging on the data stream and merges events
together if they don't have a clear timestamp.
You can avoid this problem by editing the underlying source type in props.conf
and setting the SHOULD_LINEMERGE attribute to false. This keeps Splunk
Enterprise from merging packets together.
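A minimal sketch in props.conf, assuming a hypothetical source type named my_udp_feed assigned to the UDP input:
[my_udp_feed]
# Index each received line as its own event instead of merging lines.
SHOULD_LINEMERGE = false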
Answers
Have questions? Visit Splunk Answers and see what questions and answers the Splunk community has about UDP inputs, TCP inputs, and inputs in general.
Set up and use HTTP Event Collector
The HTTP Event Collector (HEC) lets you send data and application events to your Splunk deployment over the HTTP and Secure HTTP (HTTPS) protocols. After you enable HEC, as a developer, you can use HEC tokens in your app to send data to HEC. You do not need to include Splunk credentials in your app or its supporting files.
For information about HEC on Splunk Enterprise, see Getting data into HTTP
Event Collector on the Splunk Dev Portal.
HTTP Event Collector runs on Splunk Enterprise, and self-service and managed
Splunk Cloud. How it works depends on the type of Splunk instance you have.
HEC and Splunk Enterprise
HEC offers full configurability and functionality on the Splunk Enterprise platform
on-premises. It offers the following additional benefits on Splunk Enterprise over the other Splunk platform types:
HEC can accept events that you send to it over the HTTP protocol in
addition to the HTTPS protocol.
HEC can forward events to another Splunk indexer with an optional
forwarding output group.
You can use the deployment server to distribute HEC tokens across
indexers in a distributed or clustered indexer deployment.
For instructions on how to enable and manage HEC on Splunk Enterprise, see
Configure HTTP Event Collector on Splunk Enterprise.
HEC and managed Splunk Cloud
HEC on managed Splunk Cloud differs from HEC on Splunk Enterprise in the following ways:
You cannot forward data that HEC receives to another set of Splunk indexers, as Splunk Cloud does not support forwarding output groups.
The index that you choose to store events that HEC receives must already
exist. You cannot create a new index during the setup process.
After you create tokens, you can monitor progress of the token as it is
deployed across your managed Splunk Cloud instance.
For instructions on how to enable and manage HEC on managed Splunk Cloud,
see Configure HTTP Event Collector on managed Splunk Cloud.
About Event Collector tokens
Tokens are entities that let logging agents and HTTP clients connect to the HEC input. Each token has a value, which is a 128-bit number represented as a 32-character globally unique identifier (GUID) that agents and clients use to authenticate their connections to HEC. When the clients connect, they present this token value. If HEC receives a valid token, it accepts the connection and the client can deliver its payload of application events in either text or JavaScript Object Notation (JSON) format.
HEC receives the events and Splunk Enterprise indexes them based on the
configuration of the token. HEC uses the source, source type, and index that was
specified in the token. If a forwarding output group configuration exists on a
Splunk Enterprise instance, HEC forwards the data to indexers in that output
group.
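A hedged sketch of a JSON payload for the /services/collector endpoint that overrides token defaults with per-event metadata keys (time, host, source, sourcetype, and index; the index must already exist and be allowed for the token, and all values here are hypothetical):
{
"time": 1511340000,
"host": "webserver01",
"source": "datasource",
"sourcetype": "my_app_events",
"index": "main",
"event": {"message": "hello world", "severity": "info"}
}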
Before you can use Event Collector to receive events through HTTP, you must
enable it. For Splunk Enterprise, enable HEC through the Global Settings dialog
box.
1. Click Settings > Data Inputs.
2. Click HTTP Event Collector.
3. Click Global Settings.
4. In the All Tokens toggle button, select Enabled.
5. (Optional) Choose a Default Source Type for all HEC tokens. You can
also type in the name of the source type in the text field above the
drop-down before choosing the source type.
6. (Optional) Choose a Default Index for all HEC tokens.
7. (Optional) Choose a Default Output Group for all HEC tokens.
8. (Optional) To use a deployment server to handle configurations for HEC
tokens, click the Use Deployment Server check box.
9. (Optional) To have HEC listen and communicate over HTTPS rather than
HTTP, click the Enable SSL checkbox.
10. (Optional) Enter a number in the HTTP Port Number field for HEC to
listen on. Note: Confirm that no firewall blocks the port number that you
specified in the HTTP Port Number field, either on the clients or the
Splunk instance that hosts HEC.
11. Click Save.
Create an Event Collector token
1. Click Settings > Add Data.
2. Click monitor.
3. Click HTTP Event Collector.
4. In the Name field, enter a name for the token.
5. (Optional) In the Source name override field, enter a source name for events that this input generates.
6. (Optional) In the Description field, enter a description for the input.
7. (Optional) In the Output Group field, select an existing forwarder output group.
8. (Optional) If you want to enable indexer acknowledgment for this token, click the Enable indexer acknowledgment checkbox.
9. Click Next.
10. (Optional) Confirm the source type and the index for HEC events.
11. Click Review.
12. Confirm that all settings for the endpoint are what you want.
13. If all settings are what you want, click Submit. Otherwise, click < to make
changes.
14. (Optional) Copy the token value that Splunk Web displays and paste it into
another document for reference later.
For information about HEC tokens, see About Event Collector tokens.
You can make changes to an HEC token after you have created it.
1. Click Settings > Data Inputs.
2. Click HTTP Event Collector.
3. Locate the token that you want to edit in the list.
4. In the Actions column for that token, click Edit. You can also click the link to the token name.
5. (Optional) Edit the description of the token by entering updated text in the
Description field.
6. (Optional) Update the source value of the token by entering text in the
Source field.
7. (Optional) Choose a different source type by selecting it in the Source
Type drop-down.
1. Choose a category.
2. Select a source type in the pop-up menu that appears.
3. (Optional) You can also type in the name of the source type in the
text box at the top of the drop-down.
You can delete an HEC token. Deleting an HEC token does not affect other HEC
tokens, nor does it disable HEC.
You cannot undo this action. Clients that use this token to send data to your
Splunk deployment can no longer authenticate with the token. You must
generate a new token and change the client configuration to use the new token.
1. Click Settings > Data Inputs.
2. Click HTTP Event Collector.
3. Locate the token that you want to delete in the list.
4. In the Actions column for that token, click Delete.
5. In the Delete Token dialog box, click Delete.
You can enable or disable a single HEC token from within the HEC management
page. Changing the status of one token does not change the status of other
tokens. To enable or disable all tokens, use the Global Settings dialog. See
Enable the HTTP Event Collector.
1. Click Settings > Data Inputs.
2. Click HTTP Event Collector.
3. Click Global Settings.
4. In the All Tokens toggle button, select Enabled.
5. (Optional) Choose a Default Source Type for all HEC tokens. You can
also type in the name of the source type in the text field above the
drop-down before choosing the source type.
6. (Optional) Choose a Default Index for all HEC tokens.
7. Click Save.
For information about HEC tokens, see About Event Collector tokens.
You can make changes to an HEC token after you have created it.
8. (Optional) Choose a different index by selecting it in the Available
Indexes pane of the Select Allowed Indexes control.
9. (Optional) Choose whether you want indexer acknowledgment enabled for
the token.
10. Click Save.
You can delete an HEC token. Deleting an HEC token does not affect other HEC
tokens, nor does it disable the HEC endpoint.
You cannot undo this action. Clients that use this token to send data to your Splunk deployment can no longer authenticate with the token. You must generate a new token and change the client configuration to use the new token.
You can enable or disable an HEC token from within the HEC management
page. Changing the status of one token does not change the status of other
tokens. To enable or disable all tokens, use the Global Settings dialog. See
Enable the HTTP Event Collector.
1. Click Settings > Data Inputs.
2. Click HTTP Event Collector.
3. In the Actions column for that token, click the Enable link, if the token is
not active, or the Disable link, if the token is active. The token status
toggles and the link changes to Enable or Disable based on the changed
token status.
To use HEC, you must configure at least one token. In managed Splunk Cloud
instances, the token is distributed across the deployment. The token is not ready
for use until distribution has completed.
For information about HEC tokens, see About Event Collector tokens.
For information on indexer acknowledgment, see Enable indexer acknowledgment. Indexer acknowledgment in HTTP Event Collector is not the same as the indexer acknowledgment capability in Splunk Enterprise.
You can check the distribution status of an HEC token from the HEC token page.
When a distribution is in progress, the page displays "Operation in progress" and
a progress bar. Otherwise the page displays "Last deployment status."
You can make changes to an HEC token after it has been created.
8. (Optional) Choose a different index by selecting it in the Available Indexes pane of the Select Allowed Indexes control.
9. (Optional) Choose whether you want indexer acknowledgment enabled for
the token.
10. Click Save.
You can delete an HEC token. Deleting an HEC token does not affect other HEC
tokens, nor does it disable the HEC endpoint.
You cannot undo this action. Clients that use this token to send data to your
Splunk deployment can no longer authenticate with the token. You must
generate a new token and change the client configuration to use the new value.
You can enable or disable a token from within the HEC management page.
Changing the active status of one token does not change the status of other
tokens.
You must satisfy all of the following conditions when you send data to HEC:
HEC is enabled on the Splunk deployment.
You have a valid, active HEC token.
You send data to the correct HEC URI and port for your deployment type, and you include the token in an Authorization header.
There are several options for sending data to HTTP Event Collector:
You can make an HTTP request using your favorite HTTP client and send
your JSON-encoded events.
As a developer, you can use the Java, JavaScript (node.js), and .NET
logging libraries in your application to send data to HEC. These libraries
are compatible with popular logging frameworks. See Java, JavaScript
(Node.js), and .NET on the Splunk Dev Portal.
You send data to a specific Uniform Resource Identifier (URI) for HEC.
The standard form for the HEC URI in Splunk Enterprise is as follows:
<protocol>://<host>:<port>/<endpoint>
Where:
<protocol> is either http or https, depending on whether you enabled SSL for HEC
<host> is the Splunk Enterprise instance that runs HEC
<port> is the HEC port number, 8088 by default
<endpoint> is the HEC endpoint you want to use
Depending on the type of Splunk Cloud that you use, you must send data using a
specific URI for HEC.
The standard form for the HEC URI in self-service Splunk Cloud is as follows:
<protocol>://input-<host>:<port>/<endpoint>
The standard form for the HEC URI in managed Splunk Cloud is as follows:
<protocol>://http-inputs-<host>:<port>/<endpoint>
Where:
<host> is the Splunk instance that runs HEC
<port> is the HEC port number
8088 on self-service Splunk Cloud instances
443 on managed Splunk Cloud instances
<endpoint> is the HEC endpoint you want to use. In many cases, you use the /services/collector endpoint for JavaScript Object Notation (JSON)-formatted events or the /services/collector/raw endpoint for raw events
For self-service Splunk Cloud plans, you must pre-pend the hostname
with input-
For managed Splunk Cloud plans, pre-pend the hostname with
http-inputs-
If you do not include these prefixes before your Splunk Cloud hostname when
you send data, the data cannot reach HEC.
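For example, assuming a hypothetical managed Splunk Cloud deployment named example.splunkcloud.com, the JSON endpoint URI would be:
https://http-inputs-example.splunkcloud.com:443/services/collector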
The following example makes a HTTP POST request to the HEC on port 8088
and uses HTTPS for transport. This example uses the curl command to
generate the request, but you can use a command line or other tool that better
suits your needs.
You can configure the network port and HTTP protocol settings independently of
settings for other instances of HEC in your Splunk Enterprise or self-service
Splunk Cloud deployment.
The following curl command uses an example HTTP Event Collector token (B5A79AAD-D822-46CC-80D1-819F80D7BFB0) and an example hostname of hec.example.com. Replace these values with your own before running the command.
When you make a JSON request to send data to HEC, you must specify the
"event" key in the command.
curl -k https://hec.example.com:8088/services/collector/event -H "Authorization: Splunk B5A79AAD-D822-46CC-80D1-819F80D7BFB0" -d '{"event": "hello world"}'
If the request succeeds, HEC responds with:
{"text": "Success", "code": 0}
More information on HEC for developers
For developer content about HEC, including the logging libraries and examples, see the HTTP Event Collector documentation on the Splunk Developer Portal.
How Splunk Enterprise handles syslog data
If you have Splunk Cloud, you cannot configure your deployment as a syslog server or a syslog message sender, but you can configure the Splunk Universal Forwarder to listen on a UDP network port and forward data to your Splunk Cloud deployment.
If you must retain raw syslog data (for example, a data retention policy requires
access to untouched events), then consider using a tool such as syslog-ng to
simultaneously save the raw data to a log file and forward events to your Splunk
deployment. This gives you the added advantage of indexing the log file later if
you want.
See the diagrams later in this topic for a description of how Splunk Enterprise
handles syslog events over UDP.
If incoming UDP data contains a syslog priority header, Splunk Enterprise strips it out unless you set the no_priority_stripping attribute in the stanza.
Splunk Enterprise does not modify TCP packets in this fashion. If you send
syslog data over TCP, Splunk Enterprise does not strip priority information from
the events. It does, however, prepend a host name and time stamp to the event
unless you tell it not to.
Splunk Enterprise can also forward events to another syslog server. When it
does, it prepends the priority information to the event so that the downstream
syslog server can translate the events properly.
When the event reaches the downstream syslog server, that host prepends a
timestamp, priority, and connected host name, which is the Splunk Enterprise
instance.
You can also prepend a timestamp and host name to the event at the time you
forward the event to the syslog server.
For information on configuring routing, filtering, and usage of source types, see
Route and filter data in the Forwarding Data manual and the props.conf spec file
in the Admin manual.
The following diagram shows how Splunk Enterprise moves two syslog
messages from one syslog server to another. In the diagram, Splunk Enterprise
listens on a UDP network port and indexes incoming events. On the other side,
the same instance forwards events to a second, third-party syslog server.
In the diagram, Message A originates as a syslog event and Message B
originates as a similar event that does not have priority information associated
with it. Upon receipt, Splunk Enterprise tags the events with a timestamp and the
host that generated the event.
The initial Messages A and B are identical to the first example. In this example,
Splunk Enterprise prepends the event with an originating host name or IP
address.
How Splunk Enterprise moves syslog events when you
configure it with timestamping
You can also configure Splunk Enterprise to add timestamps to syslog events when you forward those events. You might do this when you don't want the downstream server to add its own timestamp. The following diagram shows the required attribute and depicts how Splunk Enterprise deals with the data.
The initial Messages A and B are identical to the first and second examples.
Splunk Enterprise prepends the events with a timestamp and an originating host
name or IP address.
Send SNMP events to your Splunk deployment
Simple Network Management Protocol (SNMP) traps are alerts that remote
devices send out. This topic describes how to send SNMP traps to a Splunk
deployment.
Note: The procedures shown in this topic (for both *nix and Windows) are
examples only. There are a number of ways to send SNMP traps to a Splunk
deployment. For example, instead of using Net-SNMP, you can use other tools,
such as Snare or SNMPGate, to write SNMP traps to files that you can monitor.
For Splunk Enterprise, the most effective way to index SNMP traps is to write
them to a file on the Splunk Enterprise server and configure Splunk Enterprise to
monitor the file. If you have Splunk Cloud, write the data to a file that is monitored
by the Splunk Universal Forwarder.
1. Configure the remote devices to send their traps directly to the Splunk
Enterprise instance IP address. The default port for SNMP traps is udp:162.
2. Write the SNMP traps to a file on the Splunk Enterprise instance, as described
in "Write SNMP traps to a file on the Splunk Enterprise server."
3. Configure Splunk Enterprise to monitor the file, as described in "Monitor files
and directories".
Note: This topic does not cover SNMP polling, which is a way to query remote
devices.
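A brief sketch of the monitor stanza for step 3 above, assuming traps are written to a hypothetical file /var/log/snmp-traps and a hypothetical source type name:
[monitor:///var/log/snmp-traps]
# Index the trap output that snmptrapd writes to this file.
sourcetype = snmp_traps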
Use your favorite SNMP software to write the SNMP traps to a file. For
information about available SNMP software, visit the SNMP portal
(http://www.snmplink.org) website.
For *nix
On *nix, you can use the Net-SNMP project snmptrapd binary to write SNMP
traps to a file.
Before you install snmptrapd on your system, see the local documentation for the
version of snmptrapd that comes with your distribution of *nix. See also the
manual page for snmptrapd.
UDP port 162 is a privileged network port. If you need to use this port,
then you must run snmptrapd as root.
To write traps to a file, run snmptrapd with the -Lf flag, as in:
snmptrapd -Lf /var/log/snmp-traps
You can use the -f flag to keep snmptrapd in the foreground while testing. You can use the -Lo flags instead of -Lf to log to standard output. You can use the snmptrap command to generate an example trap, as in:
snmptrap -v2c -c public localhost 1 1
For Windows
1. Download and install the latest version of NET-SNMP for Windows from the
NET-SNMP website.
Note: The OpenSSL library must not be installed on the system because it
conflicts with NET-SNMP.
3. Edit C:\usr\etc\snmp\snmptrapd.conf so that snmptrapd accepts and logs incoming traps. For example, the directive authCommunity log public tells snmptrapd to log traps that arrive with the public community string.
The vendor of the device you receive SNMP traps from can provide a specific
MIB. For example, all Cisco device MIBs can be located using the online Cisco
SNMP Object Navigator.
1. Download and copy the MIB file into the MIB search directory. On the *nix
version of Net-SNMP, the default location is /usr/local/share/snmp/mibs. You
can set a different directory by providing the -m argument to snmptrapd.
2. Instruct snmptrapd to load the MIB(s) by passing a colon-separated list to the
-m argument.
Note:
If you add a leading '+' character for the parameters in the -m argument,
snmptrapd loads the MIB in addition to the default list, instead of
overwriting the list.
The special keyword ALL tells snmptrapd to load all MIB modules in the
MIB directory.
snmptrapd -m +ALL
Get Windows data
Splunk software can get Windows data in two broad ways:
Local monitoring. A Splunk Enterprise instance or universal forwarder that runs on the Windows machine collects data from that machine directly.
Remote monitoring over WMI. Splunk Enterprise can use WMI to access event log and performance data on remote machines.
The Splunk App for Windows Infrastructure
The Splunk App for Windows Infrastructure provides data inputs, searches,
reports, alerts, and dashboards for Windows server and desktop management.
You can monitor, manage, and troubleshoot Windows operating systems from
one place. The app includes inputs for CPU, disk I/O, memory, event logs,
configurations, and user data, plus a web-based setup UI for indexing Windows
event logs.
When you install and deploy Splunk Enterprise on Windows, consider the
following:
Shared hosts. Before you install Splunk Enterprise on a host that runs
other services, such as Exchange, SQL Server, or a hypervisor, see
Introduction to capacity planning for Splunk Enterprise in the Capacity
Planning manual.
The most efficient way to gather data from any Windows server is to install universal forwarders on the hosts from which you want to gather data. Universal
forwarders use limited resources. In some cases, such as Registry monitoring,
you must use a forwarder, because you cannot collect Registry data over WMI.
You can collect the following Windows data with Splunk software:
Event logs
Performance metrics
Registry data
Active Directory data
File system changes
Data through WMI
Host, printer, and network information
Output from PowerShell scripts
Since only Windows machines provide this data, only the Windows version of
Splunk Enterprise can get the data. Other operating systems cannot collect
Windows data directly. You can send Windows data from Windows machines to
Splunk Enterprise instances that do not run Windows. If you have Splunk Cloud
and want to monitor these inputs, the Splunk Universal Forwarder is your only
option.
The following table lists the control messages that the splunkd service sends to
modular and scripted Windows inputs during start-up and shutdown.
Start-up: CreateProcess
Shutdown: CTRL_BREAK_EVENT
Use Splunk Web to collect Windows data
Nearly all Windows inputs let you use the Splunk Web interface to get the data.
The exception is the MonitorNoHandle input, which you must set up with a
configuration file.
1. Log into your Splunk deployment.
2. Click Settings in the upper right corner, then click Data inputs. The Data
inputs page appears.
3. Find the Windows input that you want to add in the list of available inputs
by clicking Add new in the Actions column for the input.
4. Follow the instructions in the subsequent pages for the input type you
selected.
5. Click Save. In most cases, data collection begins immediately.
In cases where you cannot use Splunk Web to configure Windows inputs, such
as when you use a universal forwarder to collect the data, you must use
configuration files (the universal forwarder installer on Windows lets you
configure some Windows inputs at installation time.)
Configuration files offer more control than Splunk Web in many cases. Some inputs can only be configured with configuration files.
Splunk Enterprise collects remote Windows data for indexing in one of two ways: with forwarders that you install on the remote machines, or remotely over WMI. For Splunk Cloud deployments, you must use the Splunk Universal Forwarder to monitor remote Windows data.
Use a forwarder to collect remote Windows data
Use a universal forwarder to get remote Windows data whenever possible. The universal forwarder has these advantages: it uses minimal system resources, it can collect more types of data than remote polling over WMI can, and it scales better.
After you install a universal forwarder, it gathers information locally and sends it
to a Splunk deployment. You tell the forwarder what data to gather either during
the installation process or later, by distributing configuration updates manually or
with a deployment server. You can also install add-ons into the universal
forwarder.
There are some drawbacks to using the universal forwarder, depending on your
network configuration and layout. See "Forwarders versus remote collection
through WMI" in this topic.
As an alternative, you can collect data from remote Windows machines over WMI, without installing forwarders on those machines. There are some caveats to this method of collection. See Forwarders versus WMI in this topic.
Also, while Active Directory (AD) monitoring does not use WMI, it has the same
authentication considerations as data inputs that do use it. For information on
how Splunk Enterprise monitors AD, see Monitor Active Directory in this manual.
When collecting remote Windows data over WMI, consider the following:
When you install Splunk Enterprise, you can specify that it run as the Local
System user, or another user. This choice has ramifications for both installation
and data collection.
The user you tell Splunk Enterprise to run as determines the kind of data it can
retrieve from remote machines. To get the data you want, you must provide an
appropriate level of permission to this user.
Confirm that either the Splunk Enterprise user password never expires, or
that you manually change the password before it expires, as defined by
the password policy.
Restart Splunk services that run as that account on all hosts in your
network, once you change the password.
You should also assign the Splunk Enterprise account the "Deny log on locally"
user rights assignment in Local Security Policy to prevent the user from logging
in interactively to workstations. This method gives you more control and is more
secure than handing out domain administrator access.
Individual Getting Data In topics in this manual that deal with remote access to
Windows machines contain additional information and recommendations on how
to configure the user Splunk Enterprise runs as for least-permissive access.
Review the "Security and remote access considerations" section on those pages.
On recent versions of Windows Server, you can use managed service accounts
(MSAs) to address challenges with password expiry. See Managed service
accounts on Windows Server 2008 and Windows 7 in the Installation manual.
Monitor network bandwidth usage closely, especially in networks with slow or thin
WAN links. For this reason alone, universal forwarders are a better choice for
large remote data collection operations.
Disk bandwidth is a concern as well. Anti-virus scanner drivers and drivers that
intermediate between Splunk Enterprise and the operating system should always
be configured to ignore the Splunk Enterprise directory and processes,
regardless of the type of installation.
Use a universal forwarder to get data in from a remote Windows host. A universal
forwarder offers the most types of data sources, provides more detailed data (for
example, in performance monitoring metrics), minimizes network overhead, and
reduces operational risk and complexity. It is also more scalable than WMI in
many cases.
These are the main areas of tradeoff between WMI and forwarders:
Performance
Deployment
Management
Performance
A forwarder is the better choice when:
You collect local event logs or flat files. A forwarder requires less CPU and performs basic precompression of the data in an effort to reduce network overhead.
You want to collect data from a machine without having to worry about
authentication. When you install a forwarder as the Local System user, it
has administrative access to the machine, letting you collect any data from
it.
You want to collect data from busy hosts such as AD domain controllers or
machines that consistently experience periods of high utilization, such as
Exchange, SQL Server/Oracle, VMWare, Hyper-V, or SharePoint servers.
This is because WMI might have problems keeping up with the amount of
data these services generate. WMI polling is best-effort by design, and
Splunk Enterprise also throttles WMI calls to prevent unintentional
denial-of-service attacks.
You are concerned about CPU and network utilization. Forwarders use as
little of these resources as possible, while WMI uses more CPU and
network resources to transfer data.
You are concerned about scalability. Universal forwarders scale very well.
Heavy forwarders do not scale as well as universal forwarders, but both
types of forwarder scale considerably better than WMI.
WMI is a better choice when you have concerns about memory usage on a
system with high memory utilization. Because forwarders have more polling
options available, and reside on the local machine while collecting data, they use
more memory than WMI does.
Deployment
A forwarder is the better choice when:
You have control of the base build of the OS, as is the case when you create system images.
You have many data sources to collect, particularly if the data requires
transformation of any kind.
Note: Except for a few cases, you cannot use a universal forwarder to process
data before it reaches the indexer. If you need to make any changes to your data
before you index it, you must use a heavy forwarder.
WMI is the better choice when:
You don't have control of the base OS build, or you don't have domain administrator access, or local administrator privileges on the machines from which you want to get data.
You want or need only a limited set of data from a large number of hosts
(for example, CPU data for usage billing).
A common deployment scenario is to first test using remote polling, then add
successful or useful data inputs to your forwarder configurations later, or when
you do large scale forwarder installations.
Management
The table shows a list of data sources and indicates which data collection type(s)
are appropriate for each data source.
** Splunk Enterprise supports remote log file collection using the
"\\SERVERNAME\SHARE" syntax; however, you must use CIFS (Common
Internet File System, or Server Message Block) as your application layer file
access protocol, and Splunk Enterprise must have at least read access to both
the share and the underlying file system.
You can index and search your Windows data on a non-Windows Splunk
deployment, but you must first use a Windows instance of Splunk Enterprise to
get the Windows data. You can do this by installing a Splunk forwarder onto the
Windows computer and configuring it to forward Windows data to the
non-Windows instance of Splunk Enterprise.
Set up forwarders locally on each Windows machine from which you want data. These forwarders can send the Windows data to the non-Windows receiving instance.
Set up a forwarder on a separate Windows machine. The forwarder can
use WMI to collect data from all the Windows machines in the
environment and then forward the combined data to a non-Windows
receiving instance of Splunk.
You can configure AD monitoring to watch for changes to your Active Directory
forest and collect user and machine metadata. You can use this feature
combined with dynamic list lookups to decorate or modify events with any
information available in AD.
After you have configured Splunk Enterprise to monitor your Active Directory, it
takes a baseline snapshot of the AD schema. It uses this snapshot to establish a
starting point for monitoring.
If you maintain the integrity, security, and health of your Active Directory, then
what happens with it day to day is a concern. Splunk Enterprise lets you monitor
what and when things changed in your AD, and who changed them.
You can transform this data into reports for corporate security compliance or
forensics, for example. You can also use the data retrieved for intrusion alerts for
immediate response. Additionally, you can create health reports with the data
indexed for future AD infrastructure planning activities, such as assignment of
operations master roles, AD replicas, and global catalogs across DCs.
To monitor an Active Directory schema, the user that Splunk Enterprise runs as must have read access to all parts of the AD tree that you want to monitor.
For best results with monitoring AD, read and understand the following:
The permissions that the user has determine which parts of AD Splunk
can monitor.
For information on deciding which user Splunk should run as at installation time,
see Choose the user Splunk should run as in the Installation Manual.
The AD monitor uses the following logic to interact with Active Directory after you
set it up:
1. If you specify a domain controller when you define the input (either with the targetDc setting in inputs.conf or the "Target domain controller" field in Splunk Web), then the input uses that domain controller for AD operations.
2. If you do not specify a domain controller, then the input does the following:
1. The input attempts to use the local system cache to authenticate or
resolve SIDs.
2. If the monitor cannot authenticate or resolve SIDs that way, it
attempts a connection to the domain controller that the machine
that runs the input used to log on.
3. If that does not work, then the input attempts to use the closest AD
domain controller that has a copy of the Global Catalog.
3. If the domain controller that you specify is not valid, or a domain controller
cannot be found, then the input generates an error message.
If the AD monitor makes an LDAP query and receives a referral, it does not
chase this referral to complete the query. An LDAP referral represents a problem
with your LDAP configuration and you or your designated administrators should
determine and fix the configuration problem within AD.
Configure AD monitoring with Splunk Web
Go to the Add Data page. You can get there two ways: Splunk Home or Splunk Settings.
By Splunk Settings:
1. Click Settings > Data Inputs.
2. Click Active Directory monitoring.
3. Click New.
By Splunk Home:
1. Click the Add Data link in Splunk Home.
2. Click Monitor to monitor Active Directory on the local Windows machine.
The Input Settings page lets you specify application context, default host value, and index. All of these parameters are optional.
Note: Host only sets the host field in the resulting events. It does not tell the
input to look on a specific host on your network.
After specifying all your input settings, review your selections. Splunk Enterprise
lists all options you selected, including the type of monitor, the source, the source
type, the application context, and the index.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified Active Directory node.
inputs.conf settings
inputs.conf contains one stanza for each AD monitoring input, with a header like
the following:
[admon://<name of stanza>]
None of the following settings is required.
targetDc: The fully qualified domain name of the domain controller to use for AD monitoring. If you leave this setting blank, the input uses the domain controller selection logic described earlier in this topic.
startingNode: Where in the AD directory tree to start monitoring. Whether the input descends into subcontainers from this point depends on the monitorSubtree attribute. The value of startingNode must be within the scope of the DC you are targeting for Splunk Enterprise to get AD data. The default is the highest root domain in the directory tree that Splunk Enterprise can access.
monitorSubtree: How much of the target AD container to index. A value of 0 means to index only the target container, and not traverse into subcontainers within that container. A value of 1 means to enumerate all sub-containers and domains that it has access to. The default is 1 (monitor all domains that Splunk Enterprise has access to).
baseline: Whether or not the input enumerates all existing available AD objects when it first runs. A value of 0 means not to set a baseline. A value of 1 means to set a baseline. The default is 1 (set the baseline).
index: The index to route AD monitoring data to. The default is the 'default' index.
disabled: Whether or not Splunk should run the input. A value of 0 means that the input is enabled, and a value of 1 means that the input is disabled. The default is 0 (enabled).
Example AD monitoring configurations
To use the closest domain controller and start monitoring from the highest root domain that Splunk Enterprise can access, leave targetDc and startingNode blank:
[admon://NearestDC]
targetDc =
startingNode =
To use a DC that is at a higher root level than an OU you want to target for
monitoring:
# Use a DC that is at a higher root level than
# the Computers OU in this forest. We want schema data for the entire AD tree,
# not just this node.
# just this node.
[admon://DefaultTargetDc]
targetDc = pri01.eng.ad.splunk.com
startingNode = OU=Computers,DC=eng,DC=ad,DC=splunk,DC=com
To monitor multiple domain controllers:
# Get change data from two domain controllers (pri01 and pri02) in the
same AD tree.
# Index both and compare/contrast to ensure AD replication is occurring
properly.
[admon://DefaultTargetDc]
targetDc = pri01.eng.ad.splunk.com
startingNode = OU=Computers,DC=eng,DC=ad,DC=splunk,DC=com
[admon://SecondTargetDc]
targetDc = pri02.eng.ad.splunk.com
startingNode = OU=Computers,DC=eng,DC=ad,DC=splunk,DC=com
Sample AD monitoring output
When the Splunk AD monitoring utility runs, it gathers AD change events, which
are then indexed by Splunk software. To view these events as they arrive, use
the Search app.
There are several types of AD change events that Splunk software can index.
Examples of these events follow. Some of the content of these events has been
obscured or altered for publication purposes.
Update event
2/1/10
3:17:18.009 PM
02/01/2010 15:17:18.0099
dcName=stuff.splunk.com
admonEventType=Update
Names:
objectCategory=CN=Computer,CN=Schema,CN=Configuration
name=stuff2
displayName=stuff2
distinguishedName=CN=stuff2,CN=Computers
Object Details:
sAMAccountType=805306369
sAMAccountName=stuff2
logonCount=4216
accountExpires=9223372036854775807
objectSid=S-1-5-21-3436176729-1841096389-3700143990-1190
primaryGroupID=515
pwdLastSet=06:30:13 pm, Sat 11/27/2010
lastLogon=06:19:43 am, Sun 11/28/2010
lastLogoff=0
badPasswordTime=0
countryCode=0
codePage=0
badPwdCount=0
userAccountControl=4096
objectGUID=blah
whenChanged=01:02.11 am, Thu 01/28/2010
whenCreated=05:29.50 pm, Tue 11/25/2008
objectClass=top|person|organizationalPerson|user|computer
Event Details:
uSNChanged=2921916
uSNCreated=1679623
instanceType=4
Additional Details:
isCriticalSystemObject=FALSE
servicePrincipalName=TERMSRV/stuff2|TERMSRV blah
dNSHostName=stuff2.splunk.com
operatingSystemServicePack=Service Pack 2
operatingSystemVersion=6.0 (6002)
operatingSystem=Windows Vista Ultimate
localPolicyFlags=0
Delete event
When an AD object has been marked for deletion, Splunk software generates a
delete event. The event type is similar to admonEventType=Update, except that it
contains the isDeleted=True key/value pair at the end of the event.
2/1/10
3:11:16.095 PM
02/01/2010 15:11:16.0954
dcName=stuff.splunk.com
admonEventType=Update
Names:
name=SplunkTest
DEL:blah
distinguishedName=OU=SplunkTest\0ADEL:blah,CN=Deleted Objects
DEL:blah
Object Details:
objectGUID=blah
whenChanged=11:31.13 pm, Thu 01/28/2010
whenCreated=11:27.12 pm, Thu 01/28/2010
objectClass=top|organizationalUnit
Event Details:
uSNChanged=2922895
uSNCreated=2922846
instanceType=4
Additional Details:
dSCorePropagationData=20100128233113.0Z|20100128233113.0Z|20100128233113
lastKnownParent=stuff
isDeleted=TRUE
Sync event
2/1/10
3:11:09.074 PM
02/01/2010 15:11:09.0748
dcName=ftw.ad.splunk.com
admonEventType=Sync
Names:
name=NTDS Settings
distinguishedName=CN=NTDS
Settings,CN=stuff,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration
cn=NTDS Settings
objectCategory=CN=NTDS-DSA,CN=Schema,CN=Configuration,DC=ad,DC=splunk,DC
fullPath=LDAP://stuff.splunk.com/<GUID=bla bla bla>
CN=NTDS Settings
Object Details:
whenCreated=10:15.04 pm, Tue 02/12/2008
whenChanged=10:23.00 pm, Tue 02/12/2008
objectGUID=bla bla bla
objectClass=top|applicationSettings|nTDSDSA
classPath=nTDSDSA
Event Details:
instanceType=4
Additional Details:
systemFlags=33554432
showInAdvancedViewOnly=TRUE
serverReferenceBL=CN=stuff,CN=Domain System Volume
(SYSVOL share),CN=File Replication Service,CN=System
options=1
msDS-hasMasterNCs=DC=ForestDnsZones|DC=DomainDnsZones|CN=Schema,CN=Confi
msDS-HasInstantiatedNCs=
msDS-HasDomainNCs=blah
msDS-Behavior-Version=2
invocationId=bla bla bla
hasMasterNCs=CN=Schema,CN=Configuration|CN=Configuration
dSCorePropagationData=
dMDLocation=CN=Schema,CN=Configuration
nTSecurityDescriptor=NT AUTHORITY\Authenticated Users
SchemaName=LDAP://stuff.splunk.com/schema/nTDSDSA
Schema event
02/01/2010 15:11:16.0518
dcName=LDAP://stuff.splunk.com/
admonEventType=schema
className=msExchProtocolCfgSMTPIPAddress
classCN=ms-Exch-Protocol-Cfg-SMTP-IP-Address
instanceType=MandatoryProperties
nTSecurityDescriptor=MandatoryProperties
objectCategory=MandatoryProperties
objectClass=MandatoryProperties
adminDescription=OptionalProperties
adminDisplayName=OptionalProperties
allowedAttributes=OptionalProperties
allowedAttributesEffective=OptionalProperties
allowedChildClasses=OptionalProperties
allowedChildClassesEffective=OptionalProperties
bridgeheadServerListBL=OptionalProperties
canonicalName=OptionalProperties
cn=OptionalProperties
createTimeStamp=OptionalProperties
description=OptionalProperties
directReports=OptionalProperties
displayName=OptionalProperties
displayNamePrintable=OptionalProperties
distinguishedName=OptionalProperties
dSASignature=OptionalProperties
dSCorePropagationData=OptionalProperties
extensionName=OptionalProperties
flags=OptionalProperties
fromEntry=OptionalProperties
frsComputerReferenceBL=OptionalProperties
fRSMemberReferenceBL=OptionalProperties
fSMORoleOwner=OptionalProperties
heuristics=OptionalProperties
isCriticalSystemObject=OptionalProperties
isDeleted=OptionalProperties
isPrivilegeHolder=OptionalProperties
lastKnownParent=OptionalProperties
legacyExchangeDN=OptionalProperties
managedObjects=OptionalProperties
masteredBy=OptionalProperties
memberOf=OptionalProperties
modifyTimeStamp=OptionalProperties
mS-DS-ConsistencyChildCount=OptionalProperties
mS-DS-ConsistencyGuid=OptionalProperties
msCOM-PartitionSetLink=OptionalProperties
msCOM-UserLink=OptionalProperties
msDFSR-ComputerReferenceBL=OptionalProperties
msDFSR-MemberReferenceBL=OptionalProperties
msDS-Approx-Immed-Subordinates=OptionalProperties
msDs-masteredBy=OptionalProperties
msDS-MembersForAzRoleBL=OptionalProperties
msDS-NCReplCursors=OptionalProperties
msDS-NCReplInboundNeighbors=OptionalProperties
msDS-NCReplOutboundNeighbors=OptionalProperties
msDS-NonMembersBL=OptionalProperties
msDS-ObjectReferenceBL=OptionalProperties
msDS-OperationsForAzRoleBL=OptionalProperties
msDS-OperationsForAzTaskBL=OptionalProperties
msDS-ReplAttributeMetaData=OptionalProperties
msDS-ReplValueMetaData=OptionalProperties
msDS-TasksForAzRoleBL=OptionalProperties
msDS-TasksForAzTaskBL=OptionalProperties
msExchADCGlobalNames=OptionalProperties
msExchALObjectVersion=OptionalProperties
msExchHideFromAddressLists=OptionalProperties
msExchInconsistentState=OptionalProperties
msExchIPAddress=OptionalProperties
msExchTurfList=OptionalProperties
msExchUnmergedAttsPt=OptionalProperties
msExchVersion=OptionalProperties
msSFU30PosixMemberOf=OptionalProperties
name=OptionalProperties
netbootSCPBL=OptionalProperties
nonSecurityMemberBL=OptionalProperties
objectGUID=OptionalProperties
objectVersion=OptionalProperties
otherWellKnownObjects=OptionalProperties
ownerBL=OptionalProperties
partialAttributeDeletionList=OptionalProperties
partialAttributeSet=OptionalProperties
possibleInferiors=OptionalProperties
proxiedObjectName=OptionalProperties
proxyAddresses=OptionalProperties
queryPolicyBL=OptionalProperties
replicatedObjectVersion=OptionalProperties
replicationSignature=OptionalProperties
replPropertyMetaData=OptionalProperties
replUpToDateVector=OptionalProperties
repsFrom=OptionalProperties
repsTo=OptionalProperties
revision=OptionalProperties
sDRightsEffective=OptionalProperties
serverReferenceBL=OptionalProperties
showInAddressBook=OptionalProperties
showInAdvancedViewOnly=OptionalProperties
siteObjectBL=OptionalProperties
structuralObjectClass=OptionalProperties
subRefs=OptionalProperties
subSchemaSubEntry=OptionalProperties
systemFlags=OptionalProperties
unmergedAtts=OptionalProperties
url=OptionalProperties
uSNChanged=OptionalProperties
uSNCreated=OptionalProperties
uSNDSALastObjRemoved=OptionalProperties
USNIntersite=OptionalProperties
uSNLastObjRem=OptionalProperties
uSNSource=OptionalProperties
wbemPath=OptionalProperties
wellKnownObjects=OptionalProperties
whenChanged=OptionalProperties
whenCreated=OptionalProperties
wWWHomePage=OptionalProperties
Monitor Windows event log data
Splunk Enterprise can monitor event log channels and files stored on the local
machine, and it can collect logs from remote machines. The event log monitor
runs as an input processor within the splunkd service. It runs once for every
event log input that you define in Splunk Enterprise. If you have Splunk Cloud
and want to monitor event log channels, use the Splunk Universal Forwarder to
collect the data and forward it to your Splunk Cloud deployment.
New for versions 6.4.5 and later of Splunk Enterprise, the Windows Event Log
monitoring input has improved performance.
Windows event logs are the core metric of Windows machine operations - if there
is a problem with your Windows system, the Event Log service has logged it.
Splunk Enterprise indexing, searching, and reporting capabilities make your logs
accessible.
Splunk Enterprise collects event log data from remote machines using either
WMI or a forwarder. Splunk recommends using a universal forwarder to send
event log data from remote machines to an indexer. See The universal forwarder
in the Universal Forwarder manual for information about how to install, configure
and use the forwarder to collect event log data.
To collect event log data with forwarders on remote machines, install the forwarder as the Local System user on those machines. The Local System user has access to all data on the local machine, but not on remote machines.
To use WMI to get event log data from remote machines, you must ensure that
your network and Splunk instances are properly configured. You cannot install
the Splunk platform as the Local System user, and the user you install with
determines the event logs Splunk software sees. See Security and remote
access considerations in the Monitor WMI-based data topic in this manual for
additional information on the requirements you must satisfy to collect remote data
properly using WMI.
How the Windows Event Log monitor interacts with Active Directory (AD)
When you set up an Event Log monitoring input for WMI, the input connects to
an AD domain controller to authenticate and, if necessary, perform any security
ID (SID) translations before it begins to monitor the data.
The Event Log monitor uses the following logic to interact with AD after you set it
up:
1. If you specify a domain controller when you define the input (with the
evt_dc_name setting in inputs.conf), then the input uses that domain
controller for AD operations.
2. If you do not specify a domain controller, then the input does the following:
1. The input attempts to use the local system cache to authenticate or
resolve SIDs.
2. If the monitor cannot authenticate or resolve SIDs that way, it
attempts a connection to the domain controller that the machine
that runs the input used to log on.
3. If that does not work, then the input attempts to use the closest AD
domain controller that has a copy of the Global Catalog.
3. If the domain controller that you specify is not valid, or a domain controller
cannot be found, then the input generates an error message.
Collect event logs from a remote Windows machine
You have several choices to collect data from a remote Windows machine:
You can install a universal forwarder on the Windows machine and instruct it to
collect event logs. You can do this manually, or use a deployment server to
manage the forwarder configuration.
For specific instructions to install the universal forwarder, see Install a Windows
universal forwarder from an installer in the Universal Forwarder manual.
1. On the Windows machine that you want to collect Windows Event Logs,
download the universal forwarder software from Splunk.
2. Run the universal forwarder installation package to begin the installation
process.
3. When the installer prompts you, configure a receiving indexer.
4. When the installer prompts you to specify inputs, enable the event log
inputs by checking the "Event logs" checkbox.
5. Complete the installation procedure.
6. On the receiving indexer, use Splunk Web to search for the event log
data. An example search string follows:
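A minimal example, assuming a hypothetical forwarder host named
w2k12-host and the default Windows event log source type for the
Application log:

host=w2k12-host sourcetype="WinEventLog:Application"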
Use WMI
If you choose to collect event logs using WMI, you must install Splunk Enterprise
with an Active Directory domain user. If the selected domain user is not a
member of the Administrators or Domain Admins groups, then you must
configure event log security to give the domain user access to the event logs.
To change event log security for access to the event logs from remote machines,
you must:
Have administrator access to the machine from which you are collecting
event logs.
Understand how the Security Descriptor Definition Language (SDDL)
works, and how to assign permissions with it.
See Considerations for deciding how to monitor remote Windows data for
information on collecting data from remote Windows machines.
On Windows Vista and Server 2008 R2 systems, you might see some event logs
with randomly-generated machine names. This is the result of those systems
logging events before the user has named the system, during the OS installation
process.
This anomaly occurs only when you collect logs from the above-mentioned
versions of Windows remotely over WMI.
To get local Windows event log data, point your Splunk instance at your Event
Log service. You can configure the input from either Splunk Home or Splunk
Settings.
The Input Settings page lets you specify application context, default host value,
and index. All of these parameters are optional.
Host only sets the host field in the resulting events. It does not direct Splunk
Enterprise to look on a specific machine on your network.
Review your choices
After you specify all your input settings, you can review your selections. Splunk
Enterprise lists all options you selected, including the type of monitor, the source,
the source type, the application context, and the index.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified Event Log channels.
The process for configuring remote event log monitoring is nearly identical to the
process for monitoring local event logs.
Caution: Selecting all of the Event Log channels can result in the indexing of a
lot of data, possibly more than your Splunk license can support.
To finish, follow the instructions to specify input settings, as described in
"Specify input settings." Then review your choices, as described in "Review
your choices."
Note: You can always review the defaults for a configuration file by looking at the
examples in %SPLUNK_HOME%\etc\system\default or at the spec file in the Admin
Manual.
The next section describes the available configuration values for event log
monitoring.
Windows event log (*.evt) files are in binary format. You cannot monitor them like
you do a normal text file. The splunkd service monitors these binary files by
using the appropriate APIs to read and index the data within the files.
Monitor non-default Windows event logs
You can also configure Splunk Enterprise to monitor non-default Windows event
logs. Before you can do this, you must import them to the Windows Event
Viewer. After you import the logs, you can add them to your local copy of
inputs.conf, as follows:
[WinEventLog://DNS Server]
disabled = 0
[WinEventLog://Directory Service]
disabled = 0
[WinEventLog://File Replication Service]
disabled = 0
Use the "Full Name" log property in Event Viewer to specify complex Event
Log channel names properly
You can use the "Full Name" Event Log property in Event Viewer to ensure that
you specify the correct Event Log channel in an inputs.conf stanza.
[WinEventLog://Microsoft-Windows-TaskScheduler/Operational]
disabled = 0
Disable an event log stanza
To disable indexing for an event log, add disabled = 1 below its listing in the
stanza in %SPLUNK_HOME%\etc\system\local\inputs.conf.
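For example, to stop indexing the System log channel (a minimal sketch using
the stanza form shown above):

[WinEventLog://System]
disabled = 1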
Splunk software uses the following attributes in inputs.conf to monitor Event Log
files:
start_from: How events are to be read. Acceptable values are oldest (read
logs from the oldest to the newest) and newest (read logs from the newest to
the oldest). The default is oldest.

evt_resolve_ad_obj: How Splunk software interacts with Active Directory
while indexing Windows Event Log channels. Valid values are 1 (resolve
Active Directory objects like Globally Unique IDentifier (GUID) and Security
IDentifier (SID) objects to their canonical names for a specific Windows
event log channel) and 0 (do not attempt any resolution). The default is 0.

evt_dc_name: The domain controller to use to resolve AD objects. You can
specify either the NetBIOS name of the domain controller or its
fully-qualified domain name (FQDN). You can precede either format with two
backslash characters. This attribute has no default.

evt_dns_name: The fully-qualified DNS name of the domain to bind to when
resolving AD objects. This attribute has no default.

suppress_text: Whether to include the message text that comes with a
security event. A value of 1 suppresses the message text, and a value of 0
preserves the text. The default is 0.

use_old_eventlog_api: Whether or not to read Event Log events with the Event
Logging API. This is an advanced setting. Contact Splunk Support before you
change it. Several other advanced settings are also available; contact
Splunk Support before you change any of them.

whitelist: Index only events with the specified event codes or IDs. For
multiple codes/IDs, separate the list with commas. For ranges, use hyphens
(for example "0-1000,5000-10000").

blacklist: Exclude events with the specified event codes or IDs from being
indexed. For multiple codes/IDs, separate the list with commas. For ranges,
use hyphens (for example "0-1000,5000-10000").
Use the Security event log to monitor changes to files
You can monitor changes to files on your system by enabling security auditing on
a set of files and/or directories and then monitoring the Security event log
channel for change events. The event log monitoring input includes three
attributes which you can use in inputs.conf. For example:
[WinEventLog://Security]
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
checkpointInterval = 5
# only index events with these event IDs.
whitelist = 0-2000,3001-10000
# exclude these event IDs from being indexed.
blacklist = 2001-3000
To enable security auditing for a set of files or directories, read "Auditing Security
Events How To"
(http://technet.microsoft.com/en-us/library/cc727935%28v=ws.10%29.aspx) on
MS Technet.
You can also use the suppress_text attribute to include or exclude the message
text that comes with a security event.
[WinEventLog://Security]
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
checkpointInterval = 5
# suppress message text, we only want the event number.
suppress_text = 1
# only index events with these event IDs.
whitelist = 0-2000,2001-10000
# exclude these event IDs from being indexed.
blacklist = 2001-3000
To use a specific domain controller, set the evt_dc_name attribute:
[WinEventLog://Security]
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
evt_dc_name = boston-dc1.contoso.com
checkpointInterval = 5
# suppress message text, we only want the event number.
suppress_text = 1
# only index events with these event IDs.
whitelist = 0-2000,2001-10000
# exclude these event IDs from being indexed.
blacklist = 2001-3000
To use the primary domain controller to resolve AD objects, set the
evt_resolve_ad_ds attribute to PDC. Otherwise, the input locates the nearest
domain controller:
[WinEventLog://Security]
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
evt_resolve_ad_ds = PDC
checkpointInterval = 5
# suppress message text, we only want the event number.
suppress_text = 1
# only index events with these event IDs.
whitelist = 0-2000,2001-10000
# exclude these event IDs from being indexed.
blacklist = 2001-3000
Create advanced filters with 'whitelist' and 'blacklist'
You can perform advanced filtering of incoming events with the whitelist and
blacklist attributes in addition to filtering based solely on event codes. To do
this, specify the key/regular expression format in the attribute:
Key: Description

$TimeGenerated: The time that the computer generated the event. Splunk
Enterprise generates only the time string as the event.

$Timestamp: The time that the event was received and recorded by the Event
Log service. Splunk Enterprise generates only the time string as the event.

Category: The category number for a specific event source.

CategoryString: A string translation of the category. The translation
depends on the event source.

ComputerName: The name of the computer that generated the event.

EventCode: The event ID number for an event. Corresponds to "Event ID" in
Event Viewer.

EventType: A numeric value that represents one of the five types of events
that can be logged ("Error", "Warning", "Information", "Success Audit", and
"Failure Audit".) Available only on machines that run Windows Server 2003
and earlier or clients running Windows XP and earlier. See
"Win32_NTLogEvent class (Windows)"
(http://msdn.microsoft.com/en-us/library/aa394226(v=vs.85).aspx) on MSDN.

Keywords: An element used to classify different types of events within an
event log channel. The Security Event Log channel has this element, for
example.

LogName: The name of the Event Log channel that received the event.
Corresponds to "Log Name" in Event Viewer.

Message: The text of the message in the event.

OpCode: The severity level of the event ("OpCode" in Event Viewer.)

RecordNumber: The Windows Event Log record number. Each event on a Windows
machine gets a record number. This number starts at 0 with the first event
generated on the system, and increments with each new event generated, until
it reaches a maximum of 4294967295. It then rolls back over to 0.

Sid: The Security Identifier (SID) of the principal (such as a user, group,
computer, or other security entity) that was associated with or generated
the event. See "Win32_UserAccount class"
(http://msdn.microsoft.com/en-us/library/windows/desktop/aa394507%28v=vs.85%29.aspx)
on MSDN.

SidType: A numeric value that represents the type of SID that was associated
with the event. See "Win32_UserAccount class"
(http://msdn.microsoft.com/en-us/library/windows/desktop/aa394507%28v=vs.85%29.aspx)
on MSDN.

SourceName: The source of the entity that generated the event ("Source" in
Event Viewer).

TaskCategory: The task category of the event. Event sources let you define
categories so that you can filter events with Event Viewer (using the "Task
Category" field). See "Event Categories"
(http://msdn.microsoft.com/en-us/library/aa363649%28VS.85%29.aspx) on MSDN.

Type: A numeric value that represents one of the five types of events that
can be logged ("Error", "Warning", "Information", "Success Audit", and
"Failure Audit".) Only available on machines that run Windows Server 2008 or
later, or Windows Vista or later. See "Win32_NTLogEvent class (Windows)"
(http://msdn.microsoft.com/en-us/library/aa394226(v=vs.85).aspx) on MSDN.

User: The user associated with the event. Correlates to "User" in Event
Viewer.
where <key> is one of the keys in the preceding table and <regular
expression> is any valid regular expression that represents the filters that
you want to include (when used with the whitelist attribute) or exclude
(when used with the blacklist attribute).
You can specify more than one key/regular expression set on a single entry
line. When you do this, Splunk Enterprise joins the sets with a logical AND,
meaning that only events that satisfy all of the sets on the line are valid
for inclusion or exclusion. You can also specify additional entries by
adding a number to the end of the attribute name, for example:
whitelist = EventCode="^1([0-5])$"
whitelist1 = EventCode="^2([0-5])$"
Resolve Active Directory objects in event log files
To specify how Splunk software interacts with Active Directory while it
indexes Windows Event Log channels, use the evt_resolve_ad_obj attribute. A
value of 1 tells the input to resolve Active Directory objects, such as
GUIDs and SIDs, to their canonical names. For example:
[WinEventLog://Security]
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
checkpointInterval = 5
To specify a domain controller for the domain that Splunk should bind to in order
to resolve AD objects, use the evt_dc_name attribute.
The string specified in the evt_dc_name attribute can represent either the domain
controller's NetBIOS name, or its fully-qualified domain name (FQDN). Either
name type can, optionally, be preceded by two backslash characters.
FTW-DC-01
\\FTW-DC-01
FTW-DC-01.splunk.com
\\FTW-DC-01.splunk.com
To specify the FQDN of the domain to bind to, use the evt_dns_name attribute.
For example:
[WinEventLog://Security]
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
evt_dc_name = ftw-dc-01.splunk.com
evt_dns_name = splunk.com
checkpointInterval = 5
Constraints for using the evt_dc_name and evt_resolve_ad_obj attributes
Splunk software first attempts to resolve SIDs and GUIDs using the
domain controller (DC) specified in the evt_dc_name attribute. If it
cannot resolve SIDs using this DC, it attempts to bind to the default DC to
perform the translation.
If Splunk software cannot contact a DC to translate SIDs, it attempts to
use the local machine for translation.
If none of these methods works, then Splunk prints the SID as it was
captured in the event.
Splunk software cannot translate SIDs that are not in the format
S-1-N-NN-NNNNNNNNNN-NNNNNNNNNN-NNNNNNNNNN-NNNN.
If you discover that SIDs are not being translated properly, review splunkd.log
for clues on what the problem might be.
Specify whether to start indexing at the earliest or the most recent event
Use the start_from attribute to specify whether events are indexed starting
at the earliest event or the most recent. By default, indexing starts with
the oldest data and moves forward. Avoid changing this setting: when
indexing starts from the newest data, Splunk software stops indexing after
it has consumed the backlog of events.
Use the current_only attribute to specify whether to index all preexisting events
in a given log channel. When set to 1, only events that appear from the moment
the Splunk deployment was started are indexed. When set to 0, all events are
indexed.
For example:
[WinEventLog://Application]
disabled = 0
start_from = oldest
current_only = 1
Display events in XML
To have Splunk Enterprise generate events in XML, use the renderXml attribute:
[WinEventLog://System]
disabled = 0
renderXml = 1
evt_resolve_ad_obj = 1
evt_dc_name = SV5DC02
This input stanza generates events like the following:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='Service Control Manager'
Guid='{555908d1-a6d7-4695-8e1e-26931d2012f4}' EventSourceName='Service
Control Manager'/>
<EventID Qualifiers='16384'>7036</EventID>
<Version>0</Version>
<Level>4</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8080000000000000</Keywords>
<TimeCreated SystemTime='2014-04-24T18:38:37.868683300Z'/>
<EventRecordID>412598</EventRecordID>
<Correlation/>
<Execution ProcessID='192' ThreadID='210980'/>
<Channel>System</Channel>
<Computer>SplunkDoc.splunk-docs.local</Computer>
<Security/>
</System>
<EventData>
<Data Name='param1'>Application Experience</Data>
<Data Name='param2'>stopped</Data>
<Binary>410065004C006F006F006B00750070005300760063002F0031000000</Binary>
</EventData>
</Event>
When you instruct Splunk Enterprise to render events in XML, event keys within
the XML event render in English regardless of the machine system locale.
Compare the following events generated on a French version of Windows
Server:
Standard event:
04/29/2014 02:50:23 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
EventType=0
Type=Information
ComputerName=sacreblue
TaskCategory=Ouverture de session spéciale
OpCode=Informations
RecordNumber=2746
Keywords=Succès de l'audit
Message=Privilèges spéciaux attribués à la nouvelle ouverture de
session.
Sujet :
ID de sécurité : AUTORITE NT\Système
Nom du compte : Système
Domaine du compte : AUTORITE NT
ID d'ouverture de session :
0x3e7
Privilèges : SeAssignPrimaryTokenPrivilege
SeTcbPrivilege
SeSecurityPrivilege
SeTakeOwnershipPrivilege
SeLoadDriverPrivilege
SeBackupPrivilege
SeRestorePrivilege
SeDebugPrivilege
SeAuditPrivilege
SeSystemEnvironmentPrivilege
SeImpersonatePrivilege
XML event:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System><Provider
Name='Microsoft-Windows-Security-Auditing'
Guid='{54849625-5478-4994-A5BA-3E3B0328C30D}'/>
<EventID>4672</EventID>
<Version>0</Version>
<Level>0</Level>
<Task>12548</Task>
<Opcode>0</Opcode>
<Keywords>0x8020000000000000</Keywords>
<TimeCreated
SystemTime='2014-04-29T22:15:03.280843700Z'/>
<EventRecordID>2756</EventRecordID>
<Correlation/><Execution ProcessID='540'
ThreadID='372'/>
<Channel>Security</Channel>
<Computer>sacreblue</Computer>
<Security/>
</System>
<EventData>
<Data Name='SubjectUserSid'>AUTORITE
NT\Système</Data>
<Data
Name='SubjectUserName'>Système</Data>
<Data Name='SubjectDomainName'>AUTORITE
NT</Data>
<Data Name='SubjectLogonId'>0x3e7</Data>
<Data
Name='PrivilegeList'>SeAssignPrimaryTokenPrivilege
SeTcbPrivilege
SeSecurityPrivilege
SeTakeOwnershipPrivilege
SeLoadDriverPrivilege
SeBackupPrivilege
SeRestorePrivilege
SeDebugPrivilege
SeAuditPrivilege
SeSystemEnvironmentPrivilege
SeImpersonatePrivilege</Data>
</EventData>
</Event>
The Data Name keys in the XML event render in English despite rendering in the
system's native language in the standard event.
You can use the CLI to configure local event log monitoring. Before you use
the CLI, create stanza entries in inputs.conf. See "Use inputs.conf to
configure event log monitoring" in this topic.
Note: The CLI is not available for remote Event Log collections.
To index exported Windows event log files, use the instructions for monitoring
files and directories to monitor the directory that contains the exported files.
Caution: Do not attempt to monitor an .evt or .evtx file that is open for writing.
Windows does not allow read access to these files. Use the event log monitoring
feature instead.
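A minimal sketch of such a monitor stanza, assuming a hypothetical directory
D:\ExportedLogs that holds the exported files:

[monitor://D:\ExportedLogs]
disabled = 0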
Constraints
Splunk Enterprise on Windows XP and Windows Server 2003/2003 R2
cannot index .evtx files exported from systems running Windows Vista and
later or Windows Server 2008/2008 R2 and later.
Splunk Enterprise on Windows Vista and later and Server 2008/2008 R2
and later can index both .evt and .evtx files.
If your .evt or .evtx file is not from a standard event log channel, you must
make sure that any dynamic link library (DLL) files required by that
channel are present on the computer on which you are indexing.
Splunk Enterprise indexes an .evt or .evtx file in the primary
locale/language of the computer that collects the file.
Files that have been exported from another machine do not work with the
Splunk Web Upload feature. This is because those files contain
information that is specific to the machine that generated them. Other
machines won't be able to process the files in their unaltered form.
Note: When you produce .evt or .evtx files on one system and monitor them on
another, not all of the fields in each event might expand as they would on
the system that produced the events. This is caused by variations in DLL
versions, availability, and APIs. Differences in OS version, language,
Service Pack level, and installed third-party DLLs can also have this
effect.
If you have Splunk Cloud and want to monitor Windows file system changes
through the Security Event Log channel, use the Splunk Universal Forwarder.
Monitor file system changes
You must enable security auditing for the files or directories that you want
Splunk Enterprise to monitor for changes.
Use the Security event log to monitor changes to files
You can monitor changes to files on your system by enabling security auditing on
a set of files and/or directories and then monitoring the Security event log
channel for change events. The event log monitoring input includes three
attributes which you can use in inputs.conf.
You can use these attributes outside of the context of the Security event log and
file system changes. Also, this list of attributes is only a subset of the available
attributes for inputs.conf. For additional attributes, read Monitor Windows event
log data in this manual.
whitelist: Index events that match the specified text string or key/regular
expression sets. This attribute is optional and has no default. Use '='
between the key and the regular expression that represents your filter (for
example, whitelist = EventCode=%^1([8-9])$%). You can have multiple
key/regular expression sets in a single advanced filtering entry. Splunk
Enterprise joins the sets logically, which means that the entry is valid
only if all of the sets in the entry are true. You can specify up to 10
whitelists per stanza by adding a number to the end of the whitelist
attribute, for example whitelist1...whitelist9.

blacklist: Do not index events that match the specified text string or
key/regular expression sets. This attribute is optional and has no default.
Use '=' between the key and the regular expression that represents your
filter (for example, blacklist = EventCode=%^1([8-9])$%). You can have
multiple key/regular expression sets in a single advanced filtering entry.
Splunk Enterprise joins the sets logically, which means that the entry is
valid only if all of the sets in the entry are true. You can specify up to
10 blacklists per stanza by adding a number to the end of the blacklist
attribute, for example blacklist1...blacklist9.
You can perform advanced filtering of incoming events with the whitelist and
blacklist attributes in addition to filtering based solely on event codes. To do
this, specify the key/regular expression format in the attribute:
Key: Description

$TimeGenerated: The time that the computer generated the event. Splunk
Enterprise generates only the time string as the event.

$Timestamp: The time that the event was received and recorded by the Event
Log service. Splunk Enterprise generates only the time string as the event.

Category: The category number for a specific event source.

CategoryString: A string translation of the category. The translation
depends on the event source.

ComputerName: The name of the computer that generated the event.

EventCode: The event ID number for an event. Corresponds to "Event ID" in
Event Viewer.

EventType: A numeric value that represents one of the five types of events
that can be logged ("Error", "Warning", "Information", "Success Audit", and
"Failure Audit".) Available only on machines running Windows Server 2003 and
earlier or clients running Windows XP and earlier. See Win32_NTLogEvent
class (Windows)
(http://msdn.microsoft.com/en-us/library/aa394226(v=vs.85).aspx) on MSDN.

Keywords: An element used to classify different types of events within an
event log channel. The Security Event Log channel has this element, for
example.

LogName: The name of the Event Log channel that received the event.
Corresponds to "Log Name" in Event Viewer.

Message: The text of the message in the event.

OpCode: The severity level of the event ("OpCode" in Event Viewer.)

RecordNumber: The Windows Event Log record number. Each event on a Windows
machine gets a record number. This number starts at 0 with the first event
generated on the system, and increments with each new event generated, until
it reaches a maximum of 4294967295. It then rolls back over to 0.

Sid: The Security Identifier (SID) of the principal (such as a user, group,
computer, or other security entity) that was associated with or generated
the event. See Win32_UserAccount class
(http://msdn.microsoft.com/en-us/library/windows/desktop/aa394507%28v=vs.85%29.aspx)
on MSDN.

SidType: A numeric value that represents the type of SID that was associated
with the event. See Win32_UserAccount class
(http://msdn.microsoft.com/en-us/library/windows/desktop/aa394507%28v=vs.85%29.aspx)
on MSDN.

SourceName: The source of the entity that generated the event ("Source" in
Event Viewer).

TaskCategory: The task category of the event. Event sources allow you to
define categories so that you can filter events with Event Viewer (using the
"Task Category" field). See Event Categories (Windows)
(http://msdn.microsoft.com/en-us/library/aa363649%28VS.85%29.aspx) on MSDN.

Type: A numeric value that represents one of the five types of events that
can be logged ("Error", "Warning", "Information", "Success Audit", and
"Failure Audit".) Only available on server machines that run Windows Server
2008 or later, or clients that run Windows Vista or later. See
Win32_NTLogEvent class (Windows)
(http://msdn.microsoft.com/en-us/library/aa394226(v=vs.85).aspx) on MSDN.

User: The user associated with the event. Correlates to "User" in Event
Viewer.
where <key> is one of the keys in the preceding table and <regular
expression> is any valid regular expression that represents the filters that
you want to include (when used with the whitelist attribute) or exclude
(when used with the blacklist attribute).
To learn more about regular expressions and how to use them, visit the
Regularexpressions.info (http://www.regular-expressions.info) website.
You can specify more than one regular expression on a single entry line.
Only events that satisfy all of the entries on the line are included or
excluded. If the same key appears more than once on a line, only the last
expression for that key counts. For example, this entry:

whitelist = EventCode="^1([0-5])$" EventCode="^2([0-5])$"

Splunk software ignores the first expression and only attempts to include
events that match the second expression. In this case, only events that
contain an EventCode between 20 and 25 match. Events that contain an
EventCode between 10 and 15 do not match. Only the last expression in the
entry ever matches.
For instructions on how to configure the Event Log monitor input, see
Monitor Windows event log data in this manual.
Examples of file system change monitoring
Following are inputs.conf stanzas that show examples of how to monitor file
system changes.
This stanza collects security events with event ID codes 0 to 2000 and
3001-10000.
[WinEventLog:Security]
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
checkpointInterval = 5
# only index events with these event IDs.
whitelist = 0-2000,2001-10000
# exclude these event IDs from being indexed.
blacklist = 2001-3000
This stanza collects security events with event ID codes 0 to 2000 and
3001-10000. It also suppresses the message text that comes with each event.
[WinEventLog:Security]
disabled = 0
start_from = oldest
current_only = 0
evt_resolve_ad_obj = 1
checkpointInterval = 5
# suppress message text, we only want the event number.
suppress_text = 1
# only index events with these event IDs.
whitelist = 0-2000,2001-10000
# exclude these event IDs from being indexed.
blacklist = 2001-3000
If possible, use a universal forwarder rather than WMI to collect data from
remote machines. The resource load of WMI can exceed that of a Splunk
universal forwarder in many cases. Use a forwarder if you collect multiple event
logs or performance counters from each host, or from very busy hosts like
domain controllers. See Considerations for deciding how to monitor remote
Windows data in this manual. If you have Splunk Cloud, you must use the
universal forwarder to collect data from WMI providers and forward it to your
Splunk Cloud deployment.
WMI-based data inputs can connect to multiple WMI providers. The input runs as
a separate process called splunk-wmi.exe. It is a scripted input.
Here are the basic minimum requirements to monitor WMI-based data. You
might need additional permissions based on the logs or performance counters
you want to monitor.
Splunk Enterprise and your Windows network must be correctly configured for
WMI data access. Review the following prerequisites before attempting to use
Splunk Enterprise to get WMI data.
The user Splunk Enterprise runs as must be a member of an Active
Directory (AD) domain or forest and must have appropriate privileges to
query WMI providers.
The Splunk user must also be a member of the local Administrators group
on the computer that runs Splunk Enterprise.
The computer that runs Splunk Enterprise must be able to connect to the
remote machine and must have permissions to get the desired data from
the remote machine once it has connected.
Both the Splunk Enterprise instance and the target machines must be part
of the same AD domain or forest.
The user that Splunk Enterprise runs as does not need to be a member of the
Domain Admins group (and for security reasons, should not be). However, you
must have domain administrator privileges to configure access for the user. If you
don't have domain administrator access, find someone who can either configure
Splunk user access or give domain administrator rights to you.
If you install Splunk Enterprise as the Local System user, remote authentication
over WMI does not work. The Local System user has no access to other
machines on the network. It is not possible to grant privileges to a Local System
account for access to another host.
You can give the Splunk user access to WMI providers by doing one of the
following:
Adding it to the local Administrators group on each member host you want
to poll (not recommended for security reasons).
Adding it to the Domain Admins global group (not recommended for
security reasons).
Assigning least-permissive rights as detailed below (recommended).
To maintain security integrity, place Splunk users into a domain global group and
assign permissions on Windows machines and resource ACLs to that group,
instead of assigning permissions directly to the user. Assignment of permissions
directly to users is a security risk, and can cause problems during security audits
or future changes.
You can give the Splunk user least-permissive access to all Windows
resources, including WMI. To grant this type of access, follow this
checklist. For additional information and step-by-step instructions, see
Prepare your Windows network for a Splunk Enterprise installation in the
Installation manual.
You must grant several levels of access to the user Splunk Enterprise runs as for
Splunk Enterprise to collect data over WMI using the least-permissive method:
To deploy these user rights assignments domain-wide, use the Domain Security
Policy (dompol.msc) Microsoft Management Console (MMC) snap-in. After
deployment, member hosts inherit those rights assignments on the network
during the next AD replication cycle. Restart Splunk Enterprise instances on
those machines for the changes to take effect.
To extend this access to domain controllers specifically, assign the rights using
the Domain Controller Security Policy (dcpol.msc) snap-in.
Local Security Policy Permissions. The Splunk user needs the following
Local Security Policy user rights assignments defined on each machine
you poll for WMI-based data:
Access this Computer from the Network
Act as part of the operating system
Log on as a batch job
Log on as a service
Profile System Performance
Replace a process level token
Performance Monitor permissions. The Splunk user must be able to
read performance objects over WMI. The best way to do this is to nest the
"Performance Log Users" domain global group into the "Performance Log Users"
local group on each member host and then assign the user to the global
group.
These rights must be assigned to the Root namespace and all subnamespaces
below it. See "Managing WMI security"
(https://technet.microsoft.com/en-us/library/cc731011.aspx) on Microsoft
TechNet.
Note: There is no standard facility for deploying WMI security settings remotely
to multiple machines at once using Group Policy. However, Set WMI namespace
security via GPO
(http://blogs.msdn.com/spatdsg/archive/2007/11/21/set-wmi-namespace-security-via-gpo-script.asp
on MSDN Blogs offers instructions on how to create a startup script that you can
place inside a Group Policy Object (GPO), which sets the namespace security
once the GPO applies to the desired hosts. You can then deploy this GPO
domain-wide or to one or more Organizational Units (OUs).
Test access to WMI providers
After you configure WMI and set up the Splunk user for access to your domain,
test access to the remote machine.
This procedure includes steps to temporarily change the Splunk Enterprise data
store directory (the location SPLUNK_DB points to). You must do this before testing
access to WMI. Failure to do so can result in missing WMI events. This is
because the splunk-wmi.exe process updates the WMI checkpoint file every time
it runs.
If you attempt to log into a domain controller, you might have to change your
domain controller security policy to assign the "Allow log on locally" policy for the
designated user.
1. Log into the machine Splunk Enterprise runs on as the Splunk user.
2. Open a command prompt (click Start -> Run and type cmd).
3. Run the following command to change where Splunk Enterprise stores its
data temporarily, and then restart Splunk Enterprise.
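The exact command depends on your environment. A minimal sketch, assuming
the splunk CLI is in your path and using a hypothetical temporary directory
C:\wmi_test:

> splunk set datastore-dir C:\wmi_test
> splunk restart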
4. Once Splunk Enterprise has restarted, test access to WMI providers,
replacing <host> with the name of the remote host:
> splunk cmd splunk-wmi -wql "select * from win32_service" -namespace
\\<host>\root\cimv2
If you see data streaming back and no error messages, then Splunk
Enterprise was able to connect to the WMI provider and query
successfully.
If there is an error, a message with a reason on what caused the error
appears. Look for the error="<msg>" string in the output for clues on how
to correct the problem.
After testing WMI access, point Splunk Enterprise back to the correct database
directory by running the following command, and then restarting Splunk
Enterprise:
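For example, assuming a default installation path (substitute your original
SPLUNK_DB location):

> splunk set datastore-dir "C:\Program Files\Splunk\var\lib\splunk"
> splunk restart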
To add WMI-based inputs, use the "Remote event log monitoring" and "Remote
Performance monitoring" data inputs. See Configure remote Windows
performance monitoring with Splunk Web. See also Configure remote Windows
event log monitoring.
wmi.conf handles remote data collection configurations. Review this file to see
the default values for WMI-based inputs. If you want to make changes to the
default values, edit a copy of wmi.conf in %SPLUNK_HOME%\etc\system\local\. Set
values only for the attributes you want to change for a given type of data input.
See About configuration files in the Admin manual.
wmi.conf contains a global [settings] stanza and one or more input-specific
stanzas, which define how to connect to WMI providers to get data from the
remote machine.
Global settings
The [settings] stanza specifies global WMI parameters. The entire stanza and
every parameter within it are optional. If the stanza is not present, Splunk
Enterprise assumes system defaults.
The following attributes control how Splunk Enterprise reconnects to a given
WMI provider when an error occurs:

initial_backoff: How long, in seconds, to wait the first time after an error
before retrying the connection to the WMI provider. If connection errors
continue, the wait time doubles until it reaches max_backoff. The default is
5.

max_backoff: The maximum time, in seconds, to wait between attempts to
reconnect to the WMI provider. The default is 20.

max_retries_at_max_backoff: How many times to retry the connection once the
wait time has reached max_backoff seconds. The default is 2.

checkpoint_sync_interval: How long, in seconds, to wait for state data
(event log checkpoint) to be written to disk. The default is 2.
Input-specific settings
When you configure WMI-based inputs in Splunk Web, Splunk Enterprise uses
the following naming convention for input-specific stanza headers:

[WMI:AppAndSys]

You can specify one of two types of data inputs in an input-specific stanza:
event log (the event_log_file attribute) or Windows Query Language (WQL)
(the wql attribute). Do not define both of these attributes in one stanza.
Use only one or the other. Otherwise, the input defined by the stanza does
not run.
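For example, this hypothetical stanza would never run, because it defines
both attributes:

[WMI:BadInput]
event_log_file = Application
wql = select * from Win32_OperatingSystem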
The event log-specific attributes are:

event_log_file: A comma-separated list of event log channels to monitor.
There is no default.

current_only: Whether or not to collect only events that occur while Splunk
Enterprise is running. If events are generated while Splunk Enterprise is
stopped, it does not attempt to index those events when it starts again. Set
to 1 to collect only events that occur while Splunk Enterprise runs, and 0
to collect all events. The default is 0 (gather all events).

disable_hostname_normalization: Whether to disable normalization of the host
name that is retrieved from a WMI event. By default, Splunk Enterprise
normalizes host names by producing a single name for the host, identifying
the various equivalent host names for the local system. Set this parameter
to 1 to disable host name normalization in events, and 0 to normalize host
names in events. The default is 0 (normalize host names for WMI events).
The WQL-specific parameters are:

wql: A valid Windows Query Language (WQL) statement that specifies the data
to collect from the WMI provider. See "Querying with WQL"
(http://msdn.microsoft.com/en-us/library/aa394084(VS.85).aspx) on MSDN.
There is no default.

current_only: Whether or not an event notification query is expected. See
"WQL query types: event notification versus standard" in this topic for
additional information. Set this attribute to 1 to tell Splunk Enterprise to
expect an event notification query, and 0 to expect a standard query. The
default is 0 (expect a standard query).
WQL query types: event notification versus standard
The current_only attribute in WQL stanzas determines the type of query the
stanza expects to use to collect WMI-based data. When you set the attribute to 1,
the stanza expects event notification data. Event notification data is data that
alerts you of an incoming event. To get event notification data, you must use an
event notification query.
For example, to find out when a remote host spawns processes, you must use an
event notification query. Standard queries have no facilities for notifying you
when an event has occurred, and can only return results on information that
already exists.
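A sketch of an event notification stanza for that process-spawning case,
assuming a hypothetical input name and remote host (the WQL statement uses
the __InstanceCreationEvent form that WMI provides for notification
queries):

[WMI:RemoteProcessCreation]
interval = 1
server = remote-host
wql = select * from __InstanceCreationEvent within 1 where TargetInstance isa 'Win32_Process'
current_only = 1
disabled = 0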
Event notification queries require that the WQL statement defined for the stanza
be structurally and syntactically correct. Improperly formatted WQL will cause the
input defined by the stanza to not run. Review the wmi.conf configuration file
reference for specific details and examples.
When you use a WQL query stanza to gather data through WMI, Splunk
Enterprise does not update the WMI checkpoint file - the file that determines if
WMI data has been indexed. This is by design - a WQL query of any type returns
dynamic data and a context for saving a checkpoint for the data produced cannot
be built. This means that Splunk Enterprise indexes WMI data that it collects
through WQL query stanzas as fresh data each time the stanza runs. This can
result in the indexing of duplicate events and possibly impact license volume.
If you need to index data regularly, such as event logs, use the appropriate
monitor on a universal forwarder. If you must use WMI, use a standard WMI
query type.
Examples of wmi.conf
# Global reconnection and checkpoint settings.
[settings]
initial_backoff = 5
max_backoff = 20
max_retries_at_max_backoff = 2
checkpoint_sync_interval = 2

# Gather three event log channels from the servers "foo" and "bar"
# every 10 seconds.
[WMI:AppAndSys]
server = foo, bar
interval = 10
event_log_file = Application, System, Directory Service
disabled = 0

# Run a standard WQL query against the local machine every 5 seconds to
# track the performance of the splunk-wmi process.
[WMI:LocalSplunkWmiProcess]
interval = 5
wql = select * from Win32_PerfFormattedData_PerfProc_Process where Name
= "splunk-wmi"
disabled = 0
# Listen from three event log channels, capturing log events that occur only
# while Splunk Enterprise runs. Gather data from three machines.
[WMI:TailApplicationLogs]
interval = 10
event_log_file = Application, Security, System
server = srv1, srv2, srv3
disabled = 0
current_only = 1
Fields for WMI data
When Splunk Enterprise indexes data from WMI-based inputs, it sets the
originating host from the data received. It sets the source for received
events to wmi. It sets the source type of the incoming events based on the
following conditions:
For event log data, Splunk Enterprise sets the source type to
WinEventLog:<name of log file>. For example,
WinEventLog:Application.
For WQL data, Splunk Enterprise sets the source type to the name of the
stanza that defines the input. For example, for a stanza named
[WMI:LocalSplunkdProcess], Splunk sets the source type to
WMI:LocalSplunkdProcess.
WMI events are not available for transformation at index time. You cannot modify
or extract WMI events as Splunk Enterprise indexes them. This is because WMI
events arrive as a single source (a scripted input), which means they can be
matched only as a single source.
You can modify and extract WMI events at search time. You can also address
WMI-based inputs at parse time by specifying the sourcetype [wmi].
If you encounter problems receiving events through WMI providers or are not
getting the results you expect, see Common Issues with Splunk and WMI in the
Troubleshooting Manual.
When a program makes a change to a configuration, it writes those changes to
the Registry. Later, when the program runs again, it looks into the Registry to
read those configurations. You can learn when Windows programs and
processes add, update, and delete Registry entries on your system. When a
Registry entry changes, Splunk Enterprise captures the name of the process that
made the change, as well as the entire path to the entry being changed.
If you have Splunk Cloud, you must use the universal forwarder to collect data
from the Windows Registry and forward it to your Splunk Cloud deployment.
The Registry is probably the most used, yet least understood component of
Windows operation. Many programs and processes read from and write to it at all
times. When something is not functioning, Microsoft often instructs administrators
and users alike to make changes to the Registry directly using the RegEdit tool.
The ability to capture those edits, and any other changes, in real time is the first
step in understanding the importance of the Registry.
Registry health is very important. Splunk Enterprise tells you when changes to
the Registry are made and also if those changes were successful. If programs
and processes can't write to or read from the Registry, a system failure can
occur. Splunk Enterprise can alert you to problems interacting with the Registry
so that you can restore it from a backup and keep your system running.
The following table lists the explicit permissions you need to monitor the Registry.
You might need additional permissions based on the Registry keys that you want
to monitor.
Performance considerations
When you enable Registry monitoring, you specify which Registry hives to
monitor: the user hive (represented as HKEY_USERS in RegEdit) and/or the
machine hive (represented as HKEY_LOCAL_MACHINE). The user hive contains
user-specific configurations required by Windows and programs, and the
machine hive contains configuration information specific to the machine, such as
the location of services, drivers, object classes and security descriptors.
Because the Registry plays a central role in the operation of a Windows machine,
enabling both Registry paths results in a lot of data for Splunk Enterprise to
monitor. To achieve the best performance, filter the amount of Registry data that
Splunk Enterprise indexes by configuring inputs.conf.
Similarly, you can capture a baseline snapshot of the current state of your
Windows Registry when you first start Splunk Enterprise, and again every time a
specified amount of time has passed. The snapshot lets you compare what the
Registry looks like at a certain point in time and provides for easier tracking of the
changes to the Registry over time.
The snapshot process can be somewhat CPU-intensive, and might take several
minutes to complete. You can postpone taking a baseline snapshot until you
have narrowed the scope of the Registry entries to those you specifically want
Splunk Enterprise to monitor.
You can configure Registry monitoring from either Splunk Home or Splunk
Settings.
2. In the Collection Name field, enter a unique name for the input that you will
remember.
3. In the Registry hive field, enter the path to the Registry key that you want
Splunk Enterprise to monitor.
4. (Optional) If you are not sure of the path, click the Browse button to select the
Registry key path that you want Splunk Enterprise to monitor.
The Registry hive window opens and displays the Registry in tree view. Hives,
keys and subkeys display as folders, and values display as document icons.
5. In the Registry hive window, choose the desired Registry key by clicking on
the name of the key.
The qualified key name appears in the Qualified name field at the bottom of the
window.
7. (Optional) Select Monitor subnodes if you want to monitor the child nodes
below the starting hive.
Note: The Monitor subnodes setting determines what Splunk Enterprise adds to
the inputs.conf file that it creates when you define a Registry monitor
input in Splunk Web.
If you use the tree view to select a key or hive to monitor and check Monitor
subnodes, then Splunk Enterprise adds a regular expression to the stanza for
the input you are defining. This regular expression (\\\\?.*) filters out events
that do not directly reference the selected key or any of its subkeys.
If you do not check Monitor subnodes, then Splunk Enterprise adds a regular
expression to the input stanza which filters out events that do not directly
reference the selected key (including events that reference subkeys of the
selected key.)
If you do not use the tree view to specify the desired key to monitor, then Splunk
Enterprise adds the regular expression only if you have checked Monitor
subnodes and have not entered your own regular expression in the Registry
hive field.
8. Under Event types, select the Registry event types that you want Splunk
Enterprise to monitor for the chosen Registry hive:

Set: Splunk Enterprise generates a Set event when a program executes a
SetValue method on a Registry subkey, thus setting a value or overwriting an
existing value on an existing Registry entry.

Create: Splunk Enterprise generates a Create event when a program executes a
CreateSubKey method within a Registry hive, thus creating a new subkey
within an existing Registry hive.

Delete: Splunk Enterprise generates a Delete event when a program executes a
DeleteValue or DeleteSubKey method. This method either removes a value for a
specific existing key, or removes a key from an existing hive.

Rename: Splunk Enterprise generates a Rename event when you rename a
Registry key or subkey in RegEdit.

Open: Splunk Enterprise generates an Open event when a program executes an
OpenSubKey method on a Registry subkey, such as what happens when a program
needs configuration information contained in the Registry.

Close: Splunk Enterprise generates a Close event when a program executes a
Close method on a Registry key. This happens when a program is done reading
the contents of a key, or after you make a change to a key's value in
RegEdit and exit the value entry window.

Query: Splunk Enterprise generates a Query event when a program executes the
GetValue method on a Registry subkey.
9. Specify which processes Splunk Enterprise should monitor for changes to the
Registry by entering appropriate values in the Process Path field. Or, leave the
default of C:\.* to monitor all processes.
10. Specify whether or not you want to take a baseline snapshot of the whole
Registry before monitoring Registry changes. To set a baseline, click Yes
under Baseline index.
Note: The baseline snapshot is an index of your entire Registry, at the time the
snapshot is taken. Scanning the Registry to set a baseline index is a
CPU-intensive process and might take some time.
The Input Settings page lets you specify application context, default host value,
and index. All of these parameters are optional.
2. Set the Host name value. You have several choices for this setting. Learn
more about setting the host value in About hosts.
Note: Host only sets the host field in the resulting events. It does not
direct Splunk Enterprise to look on a specific host on your network.
3. Set the Index that Splunk Enterprise should send data to. Leave the value as
"default", unless you have defined multiple indexes to handle different types of
events. In addition to indexes for user data, Splunk Enterprise has a number of
utility indexes, which also appear in this dropdown box.
4. Click Review.
After specifying all your input settings, review your selections. Splunk Enterprise
lists all options you selected, including the type of monitor, the source, the source
type, the application context, and the index.
2. If they do not match what you want, click < to go back to the previous step in
the wizard. Otherwise, click Submit.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified Registry nodes.
To view Registry change data that Splunk Enterprise indexed, go to the Search
app and search for events with a source of WinRegistry. An example event,
which Group Policy generates when a user logs in to a domain, follows:
3:03:28.505 PM
06/19/2011 15:03:28.505
event_status="(0)The operation completed successfully."
pid=340
process_image="c:\WINDOWS\system32\winlogon.exe"
registry_type="SetValue"
key_path="HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Group
Policy\History\DCName"
data_type="REG_SZ"
data="\\ftw.ad.splunk.com"
Each registry monitoring event contains the following attributes.

event_status: The result of the registry change attempt. This should always
be "(0) The operation completed successfully." If it is not, there might be
problems with the Registry that might eventually require a restore from a
backup.

pid: The process ID of the process that attempted to make the Registry
change.

process_image: The name of the process that attempted to make the Registry
change.

registry_type: The type of Registry operation that the process_image
attempted to invoke.

key_path: The Registry key path that the process_image attempted to make a
change to.

data_type: The type of Registry data that the process_image making the
Registry change tried to get or set.

data: The data that the process_image making the Registry change tried to
read or write.
Filter incoming Registry events
inputs.conf contains the specific regular expressions you create to refine and
filter the Registry hive paths you want Splunk to monitor.
proc: A regular expression containing the path to the process or processes
you want to monitor.

hive: A regular expression that contains the hive path to the entry or
entries you want to monitor. Splunk supports the root key value mappings
predefined in Windows. You can also monitor user-specific keys by using
\\REGISTRY\\USER\\<SID>, where <SID> is the SID of the user.

type: The subset of event types to monitor. Can be one or more of delete,
set, create, rename, open, close, or query. The values here must be a subset
of the values for event_types that you set in inputs.conf.

baseline: Whether or not to capture a baseline snapshot for that particular
hive path. Set to 1 for yes, and 0 for no.

baseline_interval: How long Splunk Enterprise has to have been down before
re-taking the snapshot, in seconds. The default value is 86,400 seconds (1
day).

disabled: Whether or not a filter is enabled. Set to 1 to disable the
filter, and 0 to enable it.
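A sketch of a filter stanza that combines these attributes, assuming a
hypothetical input named hklm_run that watches the Run key for value
changes:

[WinRegMon://hklm_run]
hive = \\REGISTRY\\MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\\.*
proc = C:\\.*
type = set|create|delete|rename
baseline = 0
disabled = 0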
Get a baseline snapshot
When you enable Registry monitoring, you can record a baseline snapshot of the
Registry hives the next time Splunk Enterprise starts. By default, the snapshot
covers the HKEY_CURRENT_USER and HKEY_LOCAL_MACHINE hives. It also establishes
a timeline for when to retake the snapshot: by default, if Splunk Enterprise has
been down for more than 24 hours since the last checkpoint, it retakes the
baseline snapshot. You can customize this value for each of the filters in
inputs.conf by setting the value of baseline_interval, in seconds.
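For example, to retake the baseline only after Splunk Enterprise has been
down for two days, you might extend the hypothetical stanza above:

[WinRegMon://hklm_run]
baseline = 1
baseline_interval = 172800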
The Splunk Enterprise performance monitoring utility gives you the abilities of
Performance Monitor in a web interface. Splunk Enterprise uses the Performance
Data Helper (PDH) API for performance counter queries on local machines.
The types of performance objects, counters and instances that are available to
Splunk Enterprise depend on the performance libraries installed on the system.
Both Microsoft and third-party vendors provide libraries that contain performance
counters. For information on performance monitoring, see "Performance
Counters" on MSDN.
Both full instances of Splunk Enterprise and universal forwarders support local
collection of performance metrics. Remote performance monitoring is available
through WMI (Windows Management Instrumentation) and requires that Splunk
Enterprise runs as a user with appropriate Active Directory credentials. If you
have Splunk Cloud and want to monitor Windows performance metrics, you must
use the Splunk universal forwarder to collect the data and forward it to your
Splunk Cloud deployment.
The following table lists the permissions you need to monitor performance
counters in Windows. You might need additional permissions based on the
performance objects or counters that you want to monitor.
Security and remote access considerations
Splunk Enterprise gets data from remote machines with either a forwarder or
WMI. Splunk recommends using a universal forwarder to send performance data
from remote machines to an indexer.
If you want Splunk Enterprise to use WMI to get performance data from remote
machines, then you must configure both Splunk Enterprise and your Windows
network. You cannot install Splunk Enterprise as the Local System user, and the
user that you choose determines what Performance Monitor objects that Splunk
Enterprise can see.
After you install Splunk Enterprise with a valid user, you must add that user to the
following groups before you enable local performance monitor inputs:
To learn more about WMI security, see Security and remote access
considerations in "Monitor WMI Data". To learn about how to use a universal
forwarder, see About the universal forwarder.
You can configure local performance monitoring either in Splunk Web or with
configuration files.
Splunk Web is the preferred way to add performance monitoring data inputs. You
can make typos with configuration files, and it is important to specify performance
monitor objects exactly as the Performance Monitor API defines them. See
"Important information about specifying performance monitor objects in
inputs.conf" later in this topic for a full explanation.
Configure local Windows performance monitoring with Splunk
Web
You can reach the performance monitoring input configuration from either
Splunk Home or Splunk Settings.
You can only add one performance object per data input. This is due to
how Microsoft handles performance monitor objects. Many objects
enumerate classes that describe themselves dynamically upon selection.
This can lead to confusion as to which performance counters and
instances belong to which object, as defined in the input. If you need to
monitor multiple objects, create additional data inputs for each object.
4. In the Select Counters list box, locate the performance counters you want
this input to monitor.
5. Click once on each counter you want to monitor. Splunk Enterprise moves
the counter from the "Available counter(s)" window to the "Selected
counter(s)" window.
6. To unselect a counter, click on its name in the "Selected
counter(s)" window. Splunk Enterprise moves the counter from the "Selected
counter(s)" window to the "Available counter(s)" window.
7. To select or unselect all of the counters, click on the "add all" or "remove
all" links.
Selecting all of the counters can result in the indexing of a lot of data and
possibly lead to license violations.
8. In the Select Instances list box, select the instances that you want this
input to monitor by clicking once on the instance in the "Available
instance(s)" window. Splunk Enterprise moves the instance to the
"Selected instance(s)" window.
The "_Total" instance is a special instance, and appears for many types of
performance counters. This instance is the average of any associated
instances under the same counter. Data collected for this instance can be
significantly different than for individual instances under the same counter.
For example, when you monitor performance data for the "Disk Bytes/Sec"
performance counter under the "PhysicalDisk" object on a system with two
disks installed, the available instances include one for each physical disk -
"0 C:" and "1 D:" - and the "_Total" instance, which is the average of the
two physical disk instances.
9. In the Polling interval field, enter the time, in seconds, between polling
attempts for the input.
10. Click the green Next button.
The Input Settings page lets you specify application context, default host value,
and index. All of these parameters are optional.
Setting the Host on this page only sets the host field in the resulting events. It
does not direct Splunk Enterprise to look on a specific host on your network.
2. Set the Host name value. You have several choices for this setting. Learn
more about setting the host value in About hosts.
3. Set the Index that Splunk Enterprise should send data to. Leave the value
as "default", unless you have defined multiple indexes to handle different
types of events. In addition to indexes for user data, Splunk Enterprise has
a number of utility indexes, which also appear in this dropdown box.
4. Click Review.
After you specify input settings, review your selections. Splunk Enterprise lists all
options you selected, including the type of monitor, the source, the source type,
the application context, and the index.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified performance metrics. To configure performance monitoring with
configuration files instead of Splunk Web, see the inputs.conf settings
described in the rest of this topic.
object (required): The name of an existing Performance Monitor object, or a
regular expression that references multiple objects. If this attribute is
not present and defined, the input does not run, as there is no default.

counters (required): One or more valid performance counters that are
associated with the object specified in object. Separate multiple counters
with semicolons. You can also use an asterisk (*) to specify all available
counters under a given object. If this attribute is not present and defined,
the input does not run; there is no default.

interval (required): How often, in seconds, to poll for new data. If this
attribute is not present and defined, the input does not run, as there is no
default.

instances (optional): One or more valid instances associated with the
performance counter specified in counters. Separate multiple instances with
semicolons. Specify all instances by using an asterisk (*), which is the
default if you do not define the attribute in the stanza.

index (optional): The index to route performance counter data to. If not
present, the default index is used.

disabled (optional): Whether or not to gather the performance data defined
in this input. Set to 1 to disable this stanza, and 0 to enable it. If not
present, it defaults to 0 (enabled).

showZeroValue (optional): Advanced option. Whether or not Splunk Enterprise
should collect events that have values of zero. Set to 1 to collect
zero-value events, and 0 to ignore these events. If not present, it defaults
to 0 (ignore zero-value events).

samplingInterval (optional): Advanced option. How often, in milliseconds,
Splunk Enterprise should sample performance data. The default is no setting
(sampling disabled).

mode (optional): Advanced option. When you enable high-performance sampling,
this attribute controls how Splunk Enterprise outputs events.

formatString (optional): Advanced option. Controls how Splunk Enterprise
formats the output of floating-point values for performance counter events.

You can collect performance metrics in English even if the system that
Splunk Enterprise runs on does not use the English language.
You can collect performance metrics in English even if the system that Splunk
Enterprise runs on does not use the English language.
Following are some example stanzas that show you how to use inputs.conf to
monitor performance monitor objects.
# Query the PhysicalDisk performance object and gather disk access data
# for all physical drives installed in the system. Store this data in the
# "perfmon" index.
# Note: If the interval attribute is set to 0, Splunk Enterprise resets
# the interval to 1.
[perfmon://LocalPhysicalDisk]
interval = 0
object = PhysicalDisk
counters = Disk Bytes/sec; % Disk Read Time; % Disk Write Time; % Disk Time
instances = *
disabled = 0
index = perfmon
# Gather SQL statistics for all database instances on this SQL server.
# 'object' attribute uses a regular expression "\$.*" to specify SQL
# statistics for all available databases.
[perfmon://SQLServer_SQL_Statistics]
object = MSSQL\$.*:SQL Statistics
counters = *
instances = *
# Collect CPU processor usage metrics. Format the output to two decimal
# places only.
[perfmon://Processor]
counters = *
disabled = 0
interval = 30
object = Processor
instances = *
formatString = %.2f
Important information about specifying performance monitor
objects in inputs.conf
When you create a performance monitor input in inputs.conf, you must use all
lower case for the perfmon keyword, for example:
Correct:
[perfmon://CPUTime]
Incorrect:
[Perfmon://CPUTime]
[PERFMON://CPUTime]
If you use capital or mixed-case letters for the keyword, Splunk Enterprise warns
of the problem on start-up, and the specified performance monitor input does not
run.
To specify multiple objects in a single performance monitor stanza, you must use
a valid regular expression to capture those objects. For example, to specify a
wildcard to match a string beyond a certain number of characters, do not use *,
but rather .*. If the object contains a dollar sign or similar special character, you
might need to escape it with a backslash (\).
Values must exactly match what is in the Performance Monitor API if you
do not use regular expressions
When you specify values for the object, counters and instances attributes in
[perfmon://] stanzas, be sure that those values exactly match those defined in
the Performance Monitor API, including case, or the input might return incorrect
data, or no data at all. If the input cannot match a performance object, counter, or
instance value that you've specified, it logs that failure to splunkd.log. For
example:
01-27-2011 21:04:48.681 -0800 ERROR ExecProcessor - message from
""C:\Program Files\Splunk\bin\splunk-perfmon.exe" -noui" splunk-perfmon
- PerfmonHelper::enumObjectByNameEx: PdhEnumObjectItems failed for
object - 'USB' with error (0xc0000bb8): The specified object is not
found on the system.
Use Splunk Web to add performance monitor data inputs to ensure that you add
them correctly.
When you collect performance metrics over WMI, you must configure Splunk
Enterprise to run as an AD user with appropriate access for remote collection of
performance metrics. You must do this before attempting to collect those metrics.
Both the machine that runs Splunk Enterprise and the machine(s) Splunk collects
performance data from must reside in the same AD domain or forest.
When you gather remote performance metrics through WMI, some metrics return
zero values or values that are not in line with values that Performance Monitor
returns. A limitation in the implementation of WMI for performance monitor
counters causes this problem. This is not an issue with Splunk Enterprise or how
it retrieves WMI-based data.
WMI defines the data structures within its performance monitoring classes as
either 32- or 64-bit unsigned integers, depending on the version of Windows you
run. The PDH API defines Performance Monitor objects as floating-point
variables. This means that you might see WMI-based metrics that appear
anomalous, due to rounding factors.
For example, if you collect data on the "Average Disk Queue Length"
Performance Monitor counter at the same time you collect the
Win32_PerfFormattedData_PerfDisk_PhysicalDisk\AvgDiskQueueLength metric
through WMI, the WMI-based metric might return zero values even though the
Performance Monitor metric returns values that are greater than zero (but less
than 0.5). This is because WMI rounds the value down before displaying it.
Go to the Add Data page in Splunk Web. You can get there from either Splunk
Home or Splunk Settings.
Select the input source
1. In the Collection Name field, enter a unique name for this input that you
will remember.
2. In the Select Target Host field, enter the host name or IP address of the
Windows computer you want to collect performance data from.
3. Click "Query" to get a list of the performance objects available on the
Windows machine you specified in the "Select Target Host" field.
4. Choose the object that you want to monitor from the Select Class list.
Splunk Enterprise displays the "Select Counters" and "Select Instances"
list boxes.
You can only add one performance object per data input. This is due to
how Microsoft handles performance monitor objects. Many objects
enumerate classes that describe themselves dynamically upon selection.
This can lead to confusion as to which performance counters and
instances belong to which object, as defined in the input. If you need to
monitor multiple objects, create additional data inputs for each object.
5. In the Select Counters list box, locate the performance counters you want
this input to monitor.
6. Click once on each counter you want to monitor. Splunk Enterprise moves
the counter from the "Available counter(s)" window to the "Selected
counter(s)" window.
7. To unselect a counter, click on its name in the "Selected counter(s)"
window. Splunk Enterprise moves the counter from the "Selected
counter(s)" window to the "Available counter(s)" window.
8. To select or unselect all of the counters, click on the "add all" or "remove
all" links. Important: Selecting all of the counters can result in the
indexing of a lot of data, possibly more than your license allows.
9. In the Select Instances list box, select the instances that you want this
input to monitor by clicking once on the instance in the "Available
instance(s)" window. Splunk Enterprise moves the instance to the
"Selected instance(s)" window.
The "_Total" instance is a special instance, and appears for many types of
performance counters. This instance is the average of any associated
instances under the same counter. Data collected for this instance can be
significantly different than for individual instances under the same counter.
For example, when you monitor performance data for the "Disk Bytes/Sec"
performance counter under the "PhysicalDisk" object on a host with two
disks installed, the available instances include one for each physical disk -
"0 C:" and "1 D:" - and the "_Total" instance, which is the average of the
two physical disk instances.
10. In the Polling interval field, enter the time, in seconds, between polling
attempts for the input.
11. Click Next.
The Input Settings page lets you specify application context, default host value,
and index. All of these parameters are optional.
Setting the Host only sets the host field in the resulting events. It does not direct
Splunk Enterprise to look on a specific host on your network.
After specifying all your input settings, you can review your selections. Splunk
Enterprise lists all options you selected, including the type of monitor, the source,
the source type, the application context, and the index.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified performance metrics.
Configure remote Windows performance monitoring with
configuration files
Use Splunk Web to create remote performance monitor inputs unless you do not
have access to it. The names of performance monitor objects, counters, and
instances must exactly match what the Performance Monitor API defines,
including case. Splunk Web uses WMI to get the properly-formatted names,
eliminating the potential for typos.
wmi.conf contains one stanza for each remote performance monitor object that
you want to monitor. In each stanza, you specify the following content.
Global settings
initial_backoff (optional): How long, in seconds, to wait before retrying a
connection to a WMI provider after a connection error. If errors continue, the
wait time doubles until it reaches max_backoff. The default is 5.
max_backoff (optional): The maximum amount of time, in seconds, to wait
between attempts to reconnect to a WMI provider. The default is 20.
max_retries_at_max_backoff (optional): Once the wait time has reached
max_backoff seconds between reconnection attempts with a WMI provider,
how many times to continue to attempt to reconnect to that provider. The
default is 2.
checkpoint_sync_interval (optional): How long, in seconds, to wait for state
data to be flushed to disk. The default is 2.
Input-specific settings
namespace (optional): The namespace in which the WMI provider you want to
query resides. The value for this attribute can be either relative (Root\CIMV2)
or absolute (\\SERVER\Root\CIMV2), but must be relative if you specify the
server attribute. The default is Root\CIMV2.
disabled (optional): Tells Splunk Enterprise whether or not to gather the
performance data defined in this input. Set this to 1 to disable performance
monitoring for this stanza, and 0 to enable it. The default is 0 (enabled).
Examples of using wmi.conf
The following example of wmi.conf gathers local disk and memory performance
metrics and places them into the 'wmi_perfmon' index:
[settings]
initial_backoff = 5
max_backoff = 20
max_retries_at_max_backoff = 2
checkpoint_sync_interval = 2
# Gather disk and memory performance metrics from the local system every
# second. Store events in the "wmi_perfmon" Splunk index.
[WMI:LocalPhysicalDisk]
interval = 1
wql = select Name, DiskBytesPerSec, PercentDiskReadTime, \
PercentDiskWriteTime, PercentDiskTime from \
Win32_PerfFormattedData_PerfDisk_PhysicalDisk
disabled = 0
index = wmi_perfmon
[WMI:LocalMainMemory]
interval = 10
wql = select CommittedBytes, AvailableBytes, \
PercentCommittedBytesInUse, Caption from \
Win32_PerfFormattedData_PerfOS_Memory
disabled = 0
index = wmi_perfmon
Additional information on WQL query statements
WQL queries must be structurally and syntactically correct. If they are not, you
might get undesirable results or no results at all. In particular, when writing event
notification queries (by specifying current_only=1 in the stanza in which a WQL
query resides), your WQL statement must contain one of the clauses that specify
such a query (WITHIN, GROUP, and/or HAVING). Review this MSDN article on
Querying with WQL for additional information.
Splunk Web eliminates problems with WQL syntax by generating the appropriate
WQL queries when you use it to create performance monitor inputs.
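For example, an event notification stanza might look like the following sketch.
The stanza name and the monitored class are illustrative; the WITHIN clause in
the WQL statement and the current_only = 1 setting are what mark it as an
event notification query:
# Sketch: report USB device attachment as it happens.
[WMI:USBDeviceMonitor]
interval = 10
wql = select * from __InstanceCreationEvent within 60 \
where TargetInstance ISA 'Win32_USBControllerDevice'
current_only = 1
disabled = 0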
Caveats to using the performance monitoring input
When you collect data on some performance objects, such as the "Thread"
object and its associated counters, you might notice increased memory usage in
Splunk. This is normal, as certain performance objects consume more memory
than others during the collection process.
Due to how Microsoft tallies CPU usage with the Processor:% Processor Time
and Process:% Processor Time counters, these counters do not return a value of
more than 100 regardless of the number of CPUs or cores in the system. This is
by design - these counters subtract the amount of time spent on the Idle process
from 100%.
If you set the useEnglishOnly attribute to true (1), you cannot use wildcards or
regular expressions for the object and counters attributes. These attributes
must contain specific entries based on valid English values as defined in the
Performance Data Helper library. You can specify a wildcard for the instances
attribute. Here's an example:
[perfmon://Processor]
object = Processor
instances = _Total
counters = % Processor Time;% User Time
useEnglishOnly = 1
interval = 30
disabled = 0
With this setting, the counters attribute contains values in English even though
the system language is not English.
If you set useEnglishOnly to false (0), you can use wildcards and regular
expressions for these attributes, but you must specify values based on the
operating system's language. An example of a stanza on a system running in
French follows:
[perfmon://FrenchProcs]
counters = *
disabled = 0
useEnglishOnly = 0
interval = 30
object = Processeur
instances = *
Note in this example that the object attribute has been set to Processeur, which
is the French equivalent of Processor. If you specify English values here, Splunk
Enterprise will not find the performance object or instance.
If you have Splunk Cloud and want to monitor script output, use the universal
forwarder to consume the output and forward it to your Splunk Cloud
deployment.
What do you need to monitor data with PowerShell scripts?
disabled (optional): Whether or not to enable the input. Set to 1 to disable the
input and 0 to enable it. The default is 0 (enabled).
Following are some examples of how to configure the input:
Single command example: This example runs the Get-Process cmdlet and
pipes its output to the Select-Object cmdlet, adding a SplunkHost field set to
the name of the host where the Splunk software is installed. It runs the
command every 5 minutes.
[powershell://Processes-EX1]
script = Get-Process | Select-Object Handles, NPM, PM, WS, VM, Id,
ProcessName, @{n="SplunkHost";e={$Env:SPLUNK_SERVER_NAME}}
schedule = 0 */5 * * *
sourcetype = Windows:Process
Script example: This example runs the getprocesses.ps1 script located in
%SPLUNK_HOME%\etc\apps\My-App\bin. It sets the source type for these events to
Windows:Process. The script runs every 20 minutes from 9:00am to 4:40pm on
Mondays to Fridays.
[powershell://Processes-EX2]
script = . "$SplunkHome\etc\apps\My-App\bin\getprocesses.ps1"
schedule = 0 */20 9-16 * 1-5
sourcetype = Windows:Process
For information on writing PowerShell scripts, see Write scripts for the
PowerShell input.
Write scripts for the PowerShell input
Architecture
You can define many PowerShell stanzas and run them simultaneously. You can
schedule each stanza through the cron syntax. Because all scripts run within the
same process, scripts share environment variables such as the current working
directory.
Note: The input does not set a host variable in your PowerShell environment.
When you write a script for the input, do not refer to $host or use the Write-Host
or Out-Host PowerShell cmdlets. Instead, use either the Write-Output or
Write-Error cmdlets.
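For example, a script that emits a status object might look like this minimal
sketch (the object and its properties are illustrative):
# Build a simple object and emit it to the pipeline with Write-Output.
# Avoid Write-Host and Out-Host; the input has no console host to write to.
$status = [PSCustomObject]@{
    CheckName = "DiskFreeSpace"
    Healthy   = $true
}
Write-Output $status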
The input converts all output to key/value pairs based on public properties that
are defined in the schema.
The input also provides the following cmdlets:
Get-LocalStoragePath
Export-LocalStorage
Import-LocalStorage
These cmdlets use the Splunk Enterprise checkpoint directory and let you
maintain key/value pairs of data between scheduled runs of your script. Normally,
data does not persist from one invocation to the next.
Specify paths
The input sets the SplunkHome variable so you can easily address scripts in
add-ons by writing paths like this:
[powershell://MSExchange_Health]
script = . $SplunkHome/etc/apps/TA-Exchange-2010/powershell/health.ps1
Besides $SplunkHome, there are several other read-only constant variables:
SplunkServerName: The name configured for this machine to use in events.
SplunkServerUri: The Splunk Enterprise REST API address.
SplunkSessionKey: The session key (authentication token) needed for
accessing the Splunk Enterprise REST API.
SplunkCheckpointPath: The path for storing persistent state.
SplunkServerHost: The name of the Splunk Enterprise instance that you want
to communicate with.
SplunkStanzaName: The name of the inputs.conf stanza that defined this
script.
Handle output of PowerShell scripts
Splunk Enterprise takes each object that your script produces as an output and
turns it into an event, wrapped in <event> and </event> tags. Splunk Enterprise
converts the properties of each object into key/value pairs. However, the value
can only be a quoted string, converted by calling the .ToString() method. Thus,
the output must be simple, and you should flatten any complex nested objects in
your script before the script outputs them.
There are a few special property names which have significance for Splunk
Enterprise modular inputs and let you override the defaults in the inputs.conf
stanza. They are:
SplunkIndex: Overrides the index that the output will be stored in.
SplunkSource: Overrides the "source" for the output.
SplunkHost: Overrides the "host" name for the output.
SplunkSourceType: Overrides the "sourcetype" for the output.
SplunkTime: Overrides the "time". If you do not specify this, all objects that
your script generates in a single execution will get roughly the same
timestamp. This is because the script holds the objects for output until it has
finished executing, and then marks the objects with the output time. You must
specify this value in epoch or POSIX time, which is a positive integer that
represents the seconds that have elapsed since 0:00 UTC on Thursday,
January 1, 1970.
These properties never appear as objects in the key/value output.
If you want to set these properties and override the defaults, use a calculated
expression with the Select-Object cmdlet or use the Add-Member cmdlet to add a
NoteProperty property.
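For example, the following sketch (the property values are illustrative) uses a
calculated expression with Select-Object to set the source type, and Add-Member
to attach an epoch timestamp:
# Add SplunkSourceType as a calculated property.
$services = Get-Service | Select-Object Name, Status,
    @{ n = "SplunkSourceType"; e = { "Windows:Service" } }

# Attach SplunkTime as a NoteProperty holding epoch seconds.
foreach ($svc in $services) {
    $svc | Add-Member -MemberType NoteProperty -Name SplunkTime `
        -Value ([DateTimeOffset]::UtcNow.ToUnixTimeSeconds())
}
Write-Output $services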
The input currently requires that any PowerShell scripts it executes produce
output objects that do not have any script properties. Pipe output through the
Select-Object cmdlet to ensure proper formatting.
The input currently does not process the output of scripts until your pipeline and
runspace are finished. This means the input does not process ScriptProperty
values. It also means that all of your output essentially has the same timestamp,
unless you override it using the SplunkTime property.
When writing your scripts, avoid long-running scripts. Do not write scripts that
wait for things to happen unless the scripts exit every time there is output.
Monitor Windows host information
The Windows host monitoring input collects the following types of information
about the local Windows machine:
General computer: The make and model of the computer, its host name,
and the Active Directory domain it is in.
Operating system: The version and build number of the operating system
installed on the computer, as well as any service packs; the computer
name; the last time it was started; the amount of installed and free
memory; and the system drive.
Processor: The make and model of the CPU(s) installed in the system,
their speed and version, the number of processor(s) and core(s), and the
processor ID.
Disk: A listing of all drives available to the system and, if available, their
file system type and total and available space.
Network adapter: Information about the installed network adapters in the
system, including manufacturer, product name, and MAC address.
Service: Information about the installed services on the system, including
name, display name, description, path, service type, start mode, state, and
status.
Process: Information on the running processes on the system, including
the name, the command line (with arguments), when they were started,
and the executable's path.
Both full instances of Splunk Enterprise and universal forwarders support local
collection of host information. If you have Splunk Cloud and want to monitor host
information, use the universal forwarder to collect the data and forward it to your
Splunk Cloud deployment.
Windows host monitoring gives you detailed information about your Windows
hosts. You can monitor changes to the system, such as installation and removal
of software, the starting and stopping of services, and uptime. When a system
failure occurs, you can use Windows host monitoring information as a first step
into the forensic process. With the Splunk Enterprise search language, you can
give your team at-a-glance statistics on all machines in your Windows network.
Splunk Enterprise must run as the Local System user to collect Windows host
information by default.
If you choose to install forwarders on your remote machines to collect Windows
host data, then you can install the forwarder as the Local System user on these
machines. The Local System user has access to all data on the local machine,
but not on remote machines.
If you run Splunk Enterprise as a user other than the "Local System" user, then
that user must have local Administrator rights on the machine that you want to
collect host data. It must also have other permissions, as detailed in Choose the
Windows user Splunk Enterprise should run as in the Installation manual.
1. In Splunk Web, go to the Add Data page. You can get there from either
Splunk Home or Splunk Settings.
2. Click Monitor to monitor host information from the local Windows machine.
1. In the left pane, locate and select Local Windows host monitoring.
2. In the Collection Name field, enter a unique name for this input that you will
remember.
3. In the Event Types list box, locate the host monitoring event types you want
this input to monitor.
4. Click once on each type you want to monitor. Splunk Enterprise moves the
type from the "Available type(s)" window to the "Selected type(s)" window.
5. To unselect a type, click on its name in the "Selected type(s)" window. Splunk
Enterprise moves the type from the "Selected type(s)" window to the
"Available type(s)" window.
6. (Optional) To select or unselect all of the types, click on the "add all" or
"remove all" links. Note: Selecting all of the types can result in the indexing of a
lot of data, possibly more than your license allows.
7. In the Interval field, enter the time, in seconds, between polling attempts for
the input.
8. Click Next.
The Input Settings page lets you specify application context, default host value,
and index. All of these parameters are optional.
1. Select the appropriate Application context for this input.
2. Set the Host name value. You have several choices for this setting. Learn
more about setting the host value in About hosts.
Note: Host only sets the host field in the resulting events. It does not
direct Splunk Enterprise to look on a specific host on your network.
3. Set the Index that Splunk Enterprise should send data to. Leave the value as
"default", unless you have defined multiple indexes to handle different types of
events. In addition to indexes for user data, Splunk Enterprise has a number of
utility indexes, which also appear in this dropdown box.
4. Click Review.
After specifying all your input settings, review your selections. Splunk Enterprise
lists all options you selected, including the type of monitor, the source, the source
type, the application context, and the index.
If they do not match what you want, click < to go back to the previous step in
the wizard. Otherwise, click Submit.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified host information.
You can edit inputs.conf to configure host monitoring. For more information on
how to edit configuration files, see About configuration files in the Admin manual.
1. Copy the Windows host monitoring input stanzas that you want to enable
from %SPLUNK_HOME%\etc\system\default\inputs.conf into
%SPLUNK_HOME%\etc\system\local\inputs.conf.
2. Edit the stanzas to collect the Windows host data you want.
interval (required): How often, in seconds, to poll for host information. If
you do not define this attribute, the input does not run, as there is no
default.
type (required): The type of host information to monitor. Can be one of
Computer, operatingSystem, processor, disk, networkAdapter, service,
process, or driver. The input does not run if this attribute is not present.
disabled (optional): Whether or not to run the input. If you set this attribute
to 1, then Splunk Enterprise does not run the input.
Examples of Windows host monitoring configurations
Following are some examples of how to use the Windows host monitoring
configuration attributes in inputs.conf.
# Queries OS information.
# 'interval' set to a negative number tells Splunk Enterprise to
# run the input once only.
[WinHostMon://os]
type = operatingSystem
interval = -1
# Queries information on running processes.
# This example runs the input every 5 minutes.
[WinHostMon://process]
type = process
interval = 300
Fields for Windows host monitoring data
When Splunk Enterprise indexes data from Windows host monitoring inputs, it
sets the source for received events to windows. It sets the source type of the
incoming events to WinHostMon.
Answers
Have questions? Visit Splunk Answers and see what questions and answers the
Splunk community has around Windows host information.
Both full instances of Splunk Enterprise and universal forwarders support local
collection of printer subsystem information. If you have Splunk Cloud and want to
monitor printer subsystem information, use the universal forwarder to consume
the information and forward it to your Splunk Cloud deployment.
Why monitor printer information?
Windows printer monitoring gives you detailed information about your Windows
printer subsystem. You can monitor any changes to the system, such as
installation and removal of printers, print drivers, and ports, the starting and
completion of print jobs, and learn who printed what when. When a printer failure
occurs, you can use print monitoring information as a first step into the forensic
process. With the Splunk Enterprise search language, you can give your team
at-a-glance statistics on all printers in your Windows network.
Splunk Enterprise must run as the Local System user to collect Windows print
subsystem information by default.
If you run Splunk Enterprise as a user other than the "Local System" user, then
that user must have local Administrator rights to the machine, and other
permissions as detailed in Choose the Windows user Splunk Enterprise should
run as in the Installation manual.
1. In Splunk Web, go to the Add Data page. You can get there from either
Splunk Home or Splunk Settings.
2. Click Monitor to monitor print information from the local Windows machine.
3. In the left pane, locate and select Local Windows print monitoring.
1. In the Collection Name field, enter a unique name for this input that you will
remember.
2. In the Event Types list box, locate the print monitoring event types you want
this input to monitor.
3. Click once on each type you want to monitor. Splunk Enterprise moves the
type from the "Available type(s)" window to the "Selected type(s)" window.
4. To unselect a type, click on its name in the "Selected type(s)" window. Splunk
Enterprise moves the type from the "Selected type(s)" window to the
"Available type(s)" window.
5. (Optional) To select or unselect all of the types, click on the "add all" or
"remove all" links. Important: Selecting all of the types can result in the indexing
of a lot of data, possibly more than your license allows.
6. In the Baseline control, click the Yes radio button to run the input once, as
soon as it starts. Click No to run the input at the interval specified in the
Interval (in minutes) field.
Specify input settings
The Input Settings page lets you specify application context, default host value,
and index. All of these parameters are optional.
1. Select the appropriate Application context for this input.
2. Set the Host name value. You have several choices for this setting. Learn
more about setting the host value in About hosts.
Note: Host only sets the host field in the resulting events. It does not direct
Splunk Enterprise to look on a specific host on your network.
3. Set the Index that Splunk Enterprise should send data to. Leave the value as
"default", unless you have defined multiple indexes to handle different types of
events. In addition to indexes for user data, Splunk Enterprise has a number of
utility indexes, which also appear in this dropdown box.
4. Click Review.
After specifying all your input settings, review your selections. Splunk Enterprise
lists all options you selected, including the type of monitor, the source, the source
type, the application context, and the index.
If they do not match what you want, click < to go back to the previous step in
the wizard. Otherwise, click Submit.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified print information.
You can edit inputs.conf to configure print monitoring. For information on how to
edit configuration files, see About configuration files in the Admin manual.
2. Use Explorer or the ATTRIB command to remove the file's "Read Only" flag.
3. Open the file and edit it to enable Windows print monitoring inputs.
4. Restart Splunk.
Following are some examples of how to use the Windows print monitoring
configuration attributes in inputs.conf.
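For example, a minimal sketch might look like the following. The stanza names
are illustrative; the type values follow the WinPrintMon input (printer, job,
driver, or port):
# Capture a baseline of printer objects when the input first runs.
[WinPrintMon://printer]
type = printer
baseline = 1
disabled = 0

# Monitor print jobs.
[WinPrintMon://job]
type = job
disabled = 0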
Fields for Windows print monitoring data
When Splunk Enterprise indexes data from Windows print monitoring inputs, it
sets the source for received events to windows. It sets the source type of the
incoming events to WinPrintMon.
Answers
Have questions? Visit Splunk Answers and see what questions and answers the
Splunk community has around Windows print monitoring.
Both full instances of Splunk Enterprise and universal forwarders support local
collection of network information. If you have Splunk Cloud and want to monitor
network information, use the universal forwarder to collect the data and forward it
to your Splunk Cloud deployment.
The network monitor input runs as a process called splunk-netmon.exe. This
process runs once for every input defined, at the interval specified in the input.
You can configure network monitoring using Splunk Web or inputs.conf.
Windows network monitoring gives you detailed information about your Windows
network activity. You can monitor all transactions on the network, such as the
initiation of a network connection by a user or process or whether or not the
transaction uses the IPv4 or IPv6 address families. The network monitoring
facilities in Splunk Enterprise can help you detect and interrupt an incoming (or
outgoing) denial of service attack by telling you the involved machines. With
the Splunk Enterprise search language, you can give your team at-a-glance
statistics on all Windows network operations.
Security and remote access considerations
Splunk Enterprise must run as the Local System user to collect Windows network
information by default.
If you run Splunk Enterprise as a user other than the "Local System" user, then
that user must have local Administrator rights to the machine and other explicit
permissions, as detailed in Choose the Windows user Splunk Enterprise should
run as in the Installation manual.
1. In Splunk Web, go to the Add Data page. You can get there from either
Splunk Home or Splunk Settings.
2. Click Monitor to monitor network information from the local Windows machine,
or Forward to forward network information from another Windows machine.
Splunk Web displays the "Add Data - Select Source" page.
Note: Forwarding network information requires additional setup.
3. In the left pane, locate and select Local Windows network monitoring.
1. In the Network Monitor Name field, enter a unique name for this input that
you will remember.
2. Under Address family, check the IP address family types that you want
Splunk Enterprise to monitor (either IPv4 or IPv6).
3. Under Packet Type, check the packet types you want the input to monitor
(any of connect, accept, or transport).
4. Under Direction, check the network directions that you want the input to
monitor: inbound (toward the monitoring host), outbound (away from the
monitoring host), or both.
5. Under Protocol, check the network protocol types that you want the input to
monitor: tcp (Transmission Control Protocol), udp (User Datagram Protocol),
or both.
6. In the Remote address text field, enter the host name or IP address of a
remote host whose network communications with the monitoring host that you
want the input to monitor.
Note: If you want to monitor multiple hosts, enter a regular expression in this
field.
7. In the Process text field, enter the partial or full name of a process whose
network communications you want the input to monitor.
Note: As with the remote address, you can monitor multiple processes by
entering a regular expression.
8. In the User text field, enter the partial or full name of a user whose network
communications you want the input to monitor.
Note: As with the remote address and process entries, you can monitor multiple
users by entering a regular expression in this field.
9. Click Next.
Specify input settings
The Input Settings page lets you specify application context, default host value,
and index. All of these parameters are optional.
1. Select the appropriate Application context for this input.
2. Set the Host name value. You have several choices for this setting. Learn
more about setting the host value in About hosts.
Note: Host only sets the host field in the resulting events. It does not direct
Splunk Enterprise to look on a specific host on your network.
3. Set the Index that Splunk Enterprise should send data to. Leave the value as
"default", unless you have defined multiple indexes to handle different types of
events. In addition to indexes for user data, Splunk Enterprise has a number of
utility indexes, which also appear in this dropdown box.
4. Click Review.
After specifying all your input settings, review your selections. Splunk Enterprise
lists all options you selected, including the type of monitor, the source, the source
type, the application context, and the index.
If they do not match what you want, click < to go back to the previous step in
the wizard. Otherwise, click the green Submit button.
Splunk Enterprise then loads the "Success" page and begins indexing the
specified network information.
2. Use Explorer or the ATTRIB command to remove the file's "Read Only" flag.
3. Open the file and edit it to enable Windows network monitoring inputs.
4. Restart Splunk.
The next section describes the specific configuration values for network
monitoring.
process = <regular expression>: Matches against the process or application
name that performed the network access. Filters out events generated by
processes that do not match the regular expression. Passes through events
generated by processes that match the regular expression. The default is an
empty string, which includes access by all processes.
user = <regular expression>: Matches against the user name which performed
network access. Filters out events generated by users that do not match the
regular expression. Passes through events generated by users that match the
regular expression. The default is an empty string, which includes access by
all users.
addressFamily = [ipv4;ipv6]: If set, matches against the address family used
in the network access. Accepts semicolon-separated values, for example
"ipv4;ipv6". The default is an empty string, which includes all IP traffic.
packetType = [connect;accept;transport]: Matches against the packet type
used in the transaction. Accepts semicolon-separated values, for example
"connect;transport". The default is an empty string, which includes all packet
types.
protocol = [tcp;udp]: Matches against the protocol used in the transaction.
"tcp" means Transmission Control Protocol, a connection-oriented protocol.
"udp" means User Datagram Protocol, a stateless, "fire and forget" protocol.
Accepts semicolon-separated values, for example "tcp;udp". The default is an
empty string, which includes all protocols.
mode = [single|multikv]: Advanced option. Use the default value unless there
is a problem with input performance. Specifies whether to output each event
individually (single) or to group events in multikv (key-value pair) mode. The
default is single.
multikvMaxEventCount = <integer>: Advanced option. Use the default value
unless there is a problem with input performance. The maximum number of
events to output when you set mode to multikv. The minimum legal value is 10
and the maximum legal value is 500. The default is 100.
multikvMaxTimeMs = <integer>: Advanced option. Use the default value
unless there is a problem with input performance. The maximum amount of
time, in milliseconds, to output multikv events when you set mode to multikv.
The minimum legal value is 100 and the maximum legal value is 5000. The
default is 1000.
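Putting some of these attributes together, a minimal inputs.conf sketch might
look like the following. The stanza name and filter values are illustrative:
# Monitor inbound TCP connections from hosts on the 10.1.1.x subnet.
[WinNetMon://inbound-tcp]
direction = inbound
protocol = tcp
remoteAddress = 10\.1\.1\..*
disabled = 0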
Fields for Windows network monitoring data
When Splunk Enterprise indexes data from Windows network monitoring inputs,
it sets the source for received events to windows. It sets the source type of the
incoming events to WinNetMon.
If you encounter issues while running the network monitoring input on a Windows
Vista, Windows 7, Windows Server 2008, or Windows Server 2008 R2 machine,
confirm that you have updated the machine with all available patches, including
the Kernel-Mode Driver Framework version 1.11 Update
(http://support.microsoft.com/kb/2685811) that is part of Knowledge Base article
2685811. Network monitoring input might not function if this update is not present
on your system.
Answers
Have questions? Visit Splunk Answers and see what questions and answers the
Splunk community has around Windows network monitoring.
Get other kinds of data in
Note: Data that you send over FIFO queues does not remain in computer
memory and can be an unreliable method for data sources. To ensure data
integrity, use the monitor input instead.
If you have not worked with configuration files before, read About Configuration
Files in the Admin manual before you begin.
This input stanza configures Splunk Enterprise to read from a FIFO queue at the
specified path.
[fifo://<path>]
<attribute1> = <val1>
<attribute2> = <val2>
...
You can use the following attributes with FIFO stanzas:
host = <string>: Sets the host key/field to a static value for this input. Splunk
software uses this key during parsing and indexing to set the host field. It also
uses the host field at search time. The default is the IP address or fully
qualified domain name of the host where the data originated.
index = <string>: The index where events from this input will be stored. The
<string> is prepended with 'index::'. The default is main, or whatever you
have set as your default index.
sourcetype = <string>: Sets the sourcetype key's initial value. This value is
used during parsing and indexing to set the source type field. It is also the
source type field used at search time. Explicitly declares the source type for
this data, as opposed to letting it be determined automatically. This is
important both for searchability and for applying the relevant formatting for
this type of data during parsing and indexing. The <string> is prepended with
'sourcetype::'. By default, Splunk software picks a source type based on
various aspects of the data; there is no hard-coded default. For more
information about source types, see Why source types matter in this manual.
source = <string>: Sets the source key/field for events from this input. The
<string> is prepended with 'source::'. Do not override the source field unless
absolutely necessary. The input layer provides a more accurate string to aid
in problem analysis and investigation, accurately recording the file from which
the data was retrieved. Consider use of source types, tagging, and search
wildcards before overriding this value. The default is the input file path.
queue = [parsingQueue|indexQueue]: Where the input processor should
deposit the events that it reads. The default is parsingQueue.
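For example, a minimal sketch (the pipe path and source type are illustrative):
# Read events from a named pipe and assign them a custom source type.
[fifo:///var/run/myapp/events.pipe]
sourcetype = myapp_events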
For a list of all deprecated features, see the topic Deprecated features in the
Release Notes.
The Splunk Enterprise file system change monitor tracks changes in your file
system. The monitor watches a directory you specify and generates an event
when that directory undergoes a change. It is completely configurable and can
detect when any file on the system is edited, deleted, or added (not just
Splunk-specific files).
For example, you can tell the file system change monitor to watch
/etc/sysconfig/ and alert you any time the system configurations change.
To monitor file system changes on Windows, see Monitor file system changes in
this manual, which describes how to monitor them with Microsoft native auditing
tools.
The file system change monitor detects changes by using the following file
attributes:
modification date/time
group ID
user ID
file mode (read/write attributes, etc.)
optional SHA256 hash of file contents
You can configure several features of the file system change monitor, as
described later in this topic.
By default, the file system change monitor generates audit events whenever the
contents of $SPLUNK_HOME/etc/ are changed, deleted, or added to. When you
start Splunk Enterprise for the first time, it generates an audit event for each file
in the $SPLUNK_HOME/etc/ directory and all subdirectories. Afterward, any change
in configuration (regardless of origin) generates an audit event for the affected
file. If you have configured signedaudit=true, Splunk Enterprise indexes the file
system change into the audit index (index=_audit). If signedaudit is not turned
on, by default, Splunk Enterprise writes the events to the main index unless you
specify another index.
The file system change monitor does not track the user name of the account
executing the change, only that a change has occurred. For user-level
monitoring, consider using native operating system audit tools, which have
access to this information.
Caution: Do not configure the file system change monitor to monitor your root
file system. This can be dangerous and time-consuming if directory recursion is
enabled.
Configure the file system change monitor in inputs.conf. There is no support for
configuring the file system change monitor in Splunk Web. You must restart
Splunk Enterprise any time you make changes to the [fschange] stanza.
If you want to use this feature with forwarding, see Use with a universal
forwarder later in this topic.
To use the file system change monitor to watch any directory, add or edit an
[fschange] stanza in inputs.conf in $SPLUNK_HOME/etc/system/local/ or your
own custom application directory in $SPLUNK_HOME/etc/apps/. For information on
configuration files in general, see About configuration files in the Admin manual.
Syntax
[fschange:<directory or file to monitor>]
Splunk Enterprise monitors all adds, updates, and deletes to the directory and
its subdirectories. Any change generates an event that Splunk Enterprise
indexes. <directory or file to monitor> defaults to $SPLUNK_HOME/etc/.
Attributes
pollPeriod = N: Check this directory for changes every N seconds. The default
is 3600. If you make a change, the file system audit events could take
anywhere between 1 and 3600 seconds to be generated and become
available in audit search.
hashMaxSize = N: Calculate a SHA1 hash for every file that is less than or
equal to N bytes in size. This hash is used as an additional method for
detecting changes. The default is -1 (no hashing used for change detection).
filesPerDelay = <integer>: Injects a delay, as specified in delayInMills, after
processing <integer> files. This throttles file system monitoring so it does not
consume as much CPU.
delayInMills = <integer>: The delay in milliseconds to use after processing
every <integer> files, as specified in filesPerDelay. This is used to throttle
file system monitoring so it does not consume as much CPU.
filters = <filter1>,<filter2>,...<filterN>: Each of these filters applies from
left to right for each file or directory that is found during the monitor's poll
cycle. See the next section for information on defining filters. There is no
default.
Define a filter
Define a filter
To define a filter to use with the filters attribute, add a [filter...] stanza as
follows:
[filter:blacklist:backups]
regex1 = .*bak
regex2 = .*bk
[filter:whitelist:code]
regex1 = .*\.c
regex2 = .*\.h
[fschange:/etc]
filters = backups,code
The following list describes how Splunk Enterprise handles fschange whitelist
and blacklist logic:
The events run down through the list of filters until they reach their first
match.
If the first filter to match an event is a whitelist, then Splunk Enterprise
indexes the event.
If the first filter to match an event is a blacklist, the filter prevents the event
from getting indexed.
If an event reaches the end of the chain with no matches, then Splunk
Enterprise indexes the event. This means that there is an implicit "all
pass" filter built in.
To default to a situation where Splunk Enterprise does not index events if they
don't match a whitelist explicitly, end the chain with a blacklist that matches all
remaining events.
For example:
...
filters = <filter1>, <filter2>, ... terminal-blacklist
[filter:blacklist:terminal-blacklist]
regex1 = .?
If you blacklist a directory including a terminal blacklist at the end of a series of
whitelists, then Splunk Enterprise blacklists all its subfolders and files, as they do
not pass any whitelist. To accommodate this, whitelist all desired folders and
subfolders explicitly ahead of the blacklist items in your filters.
This configuration monitors files in the specified directory with the extensions
.config, .xml, .properties, and .log and ignores all others.
[filter:whitelist:configs]
regex1 = .*\.config
regex2 = .*\.xml
regex3 = .*\.properties
regex4 = .*\.log
[filter:blacklist:terminal-blacklist]
regex1 = .?
[fschange:/var/apache]
index = sample
recurse = true
followLinks = false
signedaudit = false
fullEvent = true
sendEventMaxSize = 1048576
delayInMills = 1000
filters = configs,terminal-blacklist
Use with a universal forwarder
To forward file system change monitor events from a universal forwarder, you
must set signedaudit = false and index=_audit.
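A minimal sketch of such a stanza on the forwarder (the monitored path is
illustrative):
# Forward file system change events. Signed audits must be off and
# the events must be routed to the _audit index.
[fschange:/etc/sysconfig]
signedaudit = false
index = _audit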
This topic describes how to add scripted inputs that you have already written. To
learn how to write scripted inputs, see Build scripted inputs in the Developing
Views and Apps for Splunk Web manual.
You can configure scripted inputs from the Settings menu or by editing
inputs.conf.
When a scripted input launches a script, that script inherits the Splunk Enterprise
environment. Clear any environment variables that can affect the operation of a
script. The environment variable that most commonly causes problems is the
library path (most commonly known as LD_LIBRARY_PATH on Linux, Solaris, and
FreeBSD).
Splunk Enterprise logs any messages that scripted inputs send to the stderr I/O
channel to splunkd.log.
Go to the Add Data page in Splunk Web. You can get there from either Splunk
Home or Splunk Settings. Then:
1. In the Script Path drop down, select the path where the script resides.
Splunk Web updates the page to include a new drop down list, "Script
Name."
2. In the Script Name drop-down, select the script that you want to run.
Splunk Web updates the page to populate the "Command" field with the
script name.
3. In the Command field, add any arguments needed to invoke the script.
4. In the Interval field, enter the amount of time (in seconds) that Splunk
Enterprise should wait before invoking the script.
5. (Optional) In the Source Name Override field, enter a new source name
to override the default source value.
6. Click Next.
The Input Settings page lets you specify application context, default host value,
and index. All of these parameters are optional. Learn more about setting the
host value in "About hosts".
When you set the Host on this page, this only sets the host field in the resulting
events. It does not direct Splunk Enterprise to look on a specific host on your
network.
1. Select the source type for the script. You can choose Select to pick from
the list of available source types on the local machine, or "Manual" to
enter the name of a source type.
2. Select the appropriate Application context for this input.
3. Set the Host name value. You have several choices for this setting.
4. Set the Index that Splunk Enterprise should send data to. Leave the value
as "default", unless you have defined multiple indexes to handle different
types of events. In addition to indexes for user data, Splunk Enterprise has
a number of utility indexes, which also appear in this drop down box.
5. Click Review.
After specifying all your input settings, review your selections. Splunk Web lists
all options you selected, including the type of monitor, the source, the source
type, the application context, and the index.
Syntax
[script://$SCRIPT]
<attribute1> = <val1>
<attribute2> = <val2>
...
The script that you reference in $SCRIPT can only reside in one of the following
places on the host file system:
$SPLUNK_HOME/etc/system/bin
$SPLUNK_HOME/etc/apps/<your_app>/bin
$SPLUNK_HOME/bin/scripts
As a best practice, put your script in the bin/ directory that is nearest the
inputs.conf that calls your script on the host filesystem. For example, if you
configure $SPLUNK_HOME/etc/system/local/inputs.conf, place your script in
$SPLUNK_HOME/etc/system/bin/. If you work on an application in
$SPLUNK_HOME/etc/apps/$APPLICATION/, put your script in
$SPLUNK_HOME/etc/apps/$APPLICATION/bin/.
Attributes
interval = <number>|<cron schedule>: How often to run the script. You can
specify a number of seconds or a valid cron schedule.
sourcetype = <string>: Sets the sourcetype key's initial value. Splunk
Enterprise uses this key during parsing and indexing, in particular to set the
source type field during indexing. It also uses the source type field at search
time.
source = <string>: Sets the source key/field for events from this input.
If you want the script to run continuously, write the script to never exit and set it
on a short interval. This helps to ensure that if there is a problem the script gets
restarted. Splunk Enterprise keeps track of scripts it has spawned and shuts
them down on exit.
It is best practice to write a wrapper script for scripted inputs that use commands
with arguments. In some cases, the command can contain special characters
that the scripted input escapes when it validates text that you have entered in
Splunk Web. This causes updates to a previously configured input to fail to save.
Splunk Enterprise escapes characters that should not be in paths, such as the
equals sign (=) and semicolon (;) when it validates text. For example, the
following scripted input is not correctly saved when you edit it in Splunk Web
because the scripted input escapes the equals (=) sign in the parameter to the
myUtil.py utility:
[script://$SPLUNK_HOME/etc/apps/myApp/bin/myUtil.py file=my_datacsv]
disabled = false
To avoid this problem, write a wrapper script that contains the scripted input, or
use the special .path argument for the scripted input stanza name. For
information on writing wrapper scripts, see Scripted inputs overview in the
Developing Views and Apps for Splunk Web manual.
When you update scripted inputs by editing inputs.conf directly, this validation
does not occur.
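As an illustration, a minimal wrapper for the earlier myUtil.py example might
look like this sketch (the paths and argument carry over from that hypothetical
example):
#!/bin/sh
# wrapper.sh -- invoke the utility with its arguments here, so that the
# inputs.conf stanza name contains no characters that Splunk Web escapes.
exec "$SPLUNK_HOME/etc/apps/myApp/bin/myUtil.py" file=my_datacsv
The stanza then references the wrapper instead of the command with
arguments:
[script://$SPLUNK_HOME/etc/apps/myApp/bin/wrapper.sh]
disabled = false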
As an alternative to writing a wrapper script, you can configure the scripted input
to reference a script or executable that is anywhere on the host file system.
The script that you reference can have a single line that calls the script or
executable that you want. You can use this file to call a runtime environment that
is outside of the Splunk Enterprise environment. For example, if you have both
Splunk Enterprise, which comes with Python, and a second installation of Python
on the same host, you can use the .path method to reference the second Python
installation.
1. Use Splunk Web or edit inputs.conf and specify a scripted input stanza
with a script name that ends in .path.
[script://myfile.path]
disabled = 0
2. Place the file that you reference in the stanza in the appropriate directory,
as described in Where to place the scripts for scripted inputs.
3. Edit the file to specify the script or executable you want.
This example shows the use of the UNIX top command as a data input source:
1. Create a new application directory. This example uses scripts/.
$ mkdir $SPLUNK_HOME/etc/apps/scripts
2. All scripts should be run out of a bin/ directory inside your application
directory.
$ mkdir $SPLUNK_HOME/etc/apps/scripts/bin
3. Create a script, top.sh, in the bin/ directory:
#!/bin/sh
top -bn 1 # linux only - different OSes have different parameters
4. Make the script executable:
$ chmod +x $SPLUNK_HOME/etc/apps/scripts/bin/top.sh
5. Test the script:
$ $SPLUNK_HOME/etc/apps/scripts/bin/top.sh
The script should send one top output.
6. Add the input to inputs.conf:
[script:///opt/splunk/etc/apps/scripts/bin/top.sh]
interval = 5 # run every 5 seconds
sourcetype = top # set sourcetype to top
source = script://./bin/top.sh # set source to name of script
Note: You might need to modify props.conf:
[top]
BREAK_ONLY_BEFORE = <stuff>
Since there is no timestamp in the top output, you must tell Splunk
Enterprise to use the current time. Use props.conf and set the following:
DATETIME_CONFIG = CURRENT
The following example uses the special .path stanza setting to reference an
external build of Python to run a script on your host.
1. Edit inputs.conf.
[script://loglogs.path]
disabled = 0
2. Place or create loglogs.path in $SPLUNK_HOME/etc/system/bin.
3. Edit loglogs.path to reference the external version of Python.
In the above example, you can also set the interval attribute to a "cron"
schedule by specifying strings like the following:
*/15 9-17 * * 1-5: Means run every 15 minutes from 9 am until 5 pm, on
Monday to Friday.
15,35,55 0-6,20-23 1 */2 *: Means run at 15, 35, and 55 minutes after the
hour, between midnight and 7 am and again between 8 pm and midnight, on the
first day of every other month (January, March, May, and so on).
For more information about setting cron schedules, read CRONTAB(5) on the
Crontab website.
Configure event processing
For an overview of the indexing process, see the Indexing overview chapter of
the Managing Indexers and Clusters manual.
You can retrieve a list of the valid character encoding specifications by using the
iconv -l command on most *nix systems. A port for iconv on Windows is
available.
UTF-8
UTF-16LE
Latin-1
BIG5
SHIFT-JIS
See "Comprehensive list of supported character sets" at the end of this topic for
the exhaustive list.
Here is a short list of the main supported character sets and the languages they
correspond to.
Language Code
Arabic CP1256
Arabic ISO-8859-6
Armenian ARMSCII-8
Belarus CP1251
Bulgarian ISO-8859-5
Czech ISO-8859-2
Georgian Georgian-Academy
Greek ISO-8859-7
Hebrew ISO-8859-8
Japanese EUC-JP
Japanese SHIFT-JIS
Korean EUC-KR
Russian CP1251
Russian ISO-8859-5
Russian KOI8-R
Slovak CP1250
Slovenian ISO-8859-2
Thai TIS-620
Ukrainian KOI8-U
Vietnamese VISCII
Manually specify a character set
To manually specify a character set to apply to an input, set the CHARSET key in
props.conf:
[spec]
CHARSET=<string>
For example, if you have a host that generates data in Greek (called
"GreekSource" in this example) and that uses ISO-8859-7 encoding, set
CHARSET=ISO-8859-7 for that host in props.conf:
[host::GreekSource]
CHARSET=ISO-8859-7
Note: Splunk software parses only character encodings that have UTF-8
mappings. Some EUC-JP characters do not have a mapped UTF-8 encoding.
Splunk software can automatically detect languages and proper character sets
using its sophisticated character set encoding algorithm. To use this automatic
detection for an input, set CHARSET=AUTO for that input in props.conf. For
example:
[host::my-foreign-docs]
CHARSET=AUTO
Train Splunk software to recognize a character set
If you want to use a character set encoding that Splunk software does not
recognize, train it to recognize the character set by adding a sample file to the
following path and restarting Splunk Enterprise:
$SPLUNK_HOME/etc/ngram-models/_<language>-<encoding>.txt
For example, if you want to use the "vulcan-ISO-12345" character set, copy the
specification file to the following path:
$SPLUNK_HOME/etc/ngram-models/_vulcan-ISO-12345.txt
After the sample file is added to the specified path, Splunk software recognizes
sources that use the new character set, and automatically converts them to
UTF-8 format at index time.
If you have Splunk Cloud and want to add a character set encoding to your
Splunk deployment, file a Support ticket.
The common character sets described earlier are a small subset of what the
CHARSET attribute can support. Splunk software also supports a long list of
character sets and aliases, identical to the list supported by the *nix iconv utility.
Note: Splunk software ignores punctuation and case when matching CHARSET,
so, for example, "utf-8", "UTF-8", and "utf8" are all considered identical.
latin-3 (aka, ISO-8859-3, ISO-IR-109, ISO_8859-3:1988, L3,
CSISOLATIN3)
latin-4 (aka, ISO-8859-4, ISO-IR-110, ISO_8859-4:1988, L4,
CSISOLATIN4)
latin-5 (aka, ISO-8859-9, ISO-IR-148, ISO_8859-9:1989, L5,
CSISOLATIN5)
latin-6 (aka, ISO-8859-10, ISO-IR-157, ISO_8859-10:1992, L6,
CSISOLATIN6)
latin-7 (aka, ISO-8859-13, ISO-IR-179, L7)
latin-8 (aka, ISO-8859-14, ISO-CELTIC, ISO-IR-199, ISO_8859-14:1998,
L8)
latin-9 (aka, ISO-8859-15, ISO-IR-203, ISO_8859-15:1998)
latin-10 (aka, ISO-8859-16, ISO-IR-226, ISO_8859-16:2001, L10,
LATIN10)
ISO-8859-5 (aka, CYRILLIC, ISO-IR-144, ISO_8859-5:1988,
CSISOLATINCYRILLIC)
ISO-8859-6 (aka, ARABIC, ASMO-708, ECMA-114, ISO-IR-127,
ISO_8859-6:1987, CSISOLATINARABIC, MACARABIC)
ISO-8859-7 (aka, ECMA-118, ELOT_928, GREEK, GREEK8, ISO-IR-126,
ISO_8859-7:1987, ISO_8859-7:2003, CSISOLATINGREEK)
ISO-8859-8 (aka, HEBREW, ISO-8859-8, ISO-IR-138, ISO8859-8,
ISO_8859-8:1988, CSISOLATINHEBREW)
ISO-8859-11
roman-8 (aka, HP-ROMAN8, R8, CSHPROMAN8)
KOI8-R (aka, CSKOI8R)
KOI8-U
KOI8-T
GEORGIAN-ACADEMY
GEORGIAN-PS
ARMSCII-8
MACINTOSH (aka, MAC, MACROMAN, CSMACINTOSH) [Note: these
MAC* charsets are for MacOS 9; OS/X uses unicode]
MACGREEK
MACCYRILLIC
MACUKRAINE
MACCENTRALEUROPE
MACTURKISH
MACCROATIAN
MACICELAND
MACROMANIA
MACHEBREW
MACTHAI
NEXTSTEP
CP850 (aka, 850, IBM850, CSPC850MULTILINGUAL)
CP862 (aka, 862, IBM862, CSPC862LATINHEBREW)
CP866 (aka, 866, IBM866, CSIBM866)
CP874 (aka, WINDOWS-874)
CP932
CP936 (aka, MS936, WINDOWS-936)
CP949 (aka, UHC)
CP950
CP1250 (aka, MS-EE, WINDOWS-1250)
CP1251 (aka, MS-CYRL, WINDOWS-1251)
CP1252 (aka, MS-ANSI, WINDOWS-1252)
CP1253 (aka, MS-GREEK, WINDOWS-1253)
CP1254 (aka, MS-TURK, WINDOWS-1254)
CP1255 (aka, MS-HEBR, WINDOWS-1255)
CP1256 (aka, MS-ARAB, WINDOWS-1256)
CP1257 (aka, WINBALTRIM, WINDOWS-1257)
CP1258 (aka, WINDOWS-1258)
CP1361 (aka, JOHAB)
BIG-5 (aka, BIG-FIVE, CN-BIG5, CSBIG5)
BIG5-HKSCS (aka, BIG5-HKSCS:2001)
CN-GB (aka, EUC-CN, EUCCN, GB2312, CSGB2312)
EUC-JP (aka,
EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE,
CSEUCPKDFMTJAPANESE)
EUC-KR (aka, CSEUCKR)
EUC-TW (aka, CSEUCTW)
GB18030
GBK
GB_1988-80 (aka, ISO-IR-57, ISO646-CN, CSISO57GB1988, CN)
HZ (aka, HZ-GB-2312)
GB_2312-80 (aka, CHINESE, ISO-IR-58, CSISO58GB231280)
SHIFT-JIS (aka, MS_KANJI, SJIS, CSSHIFTJIS)
ISO-IR-87 (aka, JIS0208, JIS_C6226-1983, JIS_X0208, JIS_X0208-1983,
JIS_X0208-1990, X0208, CSISO87JISX0208, ISO-IR-159, JIS_X0212,
JIS_X0212-1990, JIS_X0212.1990-0, X0212, CSISO159JISX02121990)
ISO-IR-14 (aka, ISO646-JP, JIS_C6220-1969-RO, JP,
CSISO14JISC6220RO)
JISX0201-1976 (aka, JIS_X0201, X0201, CSHALFWIDTHKATAKANA)
ISO-IR-149 (aka, KOREAN, KSC_5601, KS_C_5601-1987,
KS_C_5601-1989, CSKSC56011987)
VISCII (aka, VISCII1.1-1, CSVISCII)
ISO-IR-166 (aka, TIS-620, TIS620-0, TIS620.2529-1, TIS620.2533-0,
TIS620.2533-1)
Configure event line breaking
Some events consist of more than one line. Splunk software handles most
multiline events correctly by default. If you have multiline events that Splunk
software doesn't handle properly, you can configure the software to change its
line breaking behavior.
Splunk software determines event boundaries in two steps. First, it breaks the data stream into lines according to the LINE_BREAKER attribute. Second, if the SHOULD_LINEMERGE attribute is set to "true", it merges those lines into events according to the line-merging attributes.
If the second step does not run (because you set the SHOULD_LINEMERGE attribute
to "false"), then the events are the individual lines that the LINE_BREAKER attribute
determines. The first step is relatively efficient, while the second is relatively
slow. Appropriate use of the LINE_BREAKER regular expression can produce the
results you want in the first step. This is valuable if a significant amount of your
data consists of multiline events.
Many event logs have a strict one-line-per-event format, but some do not. Splunk
software can often recognize the event boundaries, but if event boundary
recognition does not work properly, you can set custom rules in props.conf to
establish event boundaries.
6. Restart Splunk Enterprise to commit the changes.
This method usually simplifies the configuration process, as it gives you access
to several attributes that you can use to define line-merging rules.
If your data conforms well to the default LINE_BREAKER setting (any number of
newlines and carriage returns), you don't need to alter the LINE_BREAKER setting.
Instead, set SHOULD_LINEMERGE=true and use the line-merging attributes to
reassemble it.
Break the data stream directly into real events with the LINE_BREAKER
setting
Using the LINE_BREAKER setting to define event boundaries might increase your
indexing speed, but is somewhat more difficult to work with. If you find that
indexing is slow and a significant amount of your data consists of multiline
events, this method can provide significant improvement.
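For example, here is a minimal sketch (the source type name and timestamp pattern are hypothetical) that breaks events directly in the first step:
[my_multiline_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\[\d{2}/\w{3}/\d{4})
The first capturing group identifies the text that is consumed as the event delimiter; the lookahead starts each new event at a line that begins with a bracketed timestamp.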
The following tables list the settings in the props.conf file that affect line
breaking.
TRUNCATE = <non-negative integer>
Change the default maximum line length (in bytes). Although this attribute is a byte measurement, Splunk rounds down line length when this attribute would otherwise land mid-character for multibyte characters.
Default: 10000 bytes
You can realize a significant
boost to processing speed when
you use LINE_BREAKER to delimit
multiline events (as opposed to
using SHOULD_LINEMERGE to
reassemble individual lines into
multiline events). Consider using
this method if a significant portion
of your data consists of multiline
events.
When you set SHOULD_LINEMERGE=true (the default), use these attributes to define
line breaking behavior.
BREAK_ONLY_BEFORE_DATE = [true|false]
When set to true, Splunk software creates a new event if it encounters a new line with a date.
Default: true
Note: If you configure the DATETIME_CONFIG setting to CURRENT or NONE, this attribute is not meaningful, because in those cases, Splunk software does not identify timestamps.

BREAK_ONLY_BEFORE = <regular expression>
When set, Splunk software creates a new event if it encounters a new line that matches the regular expression.
Default: empty string

MUST_BREAK_AFTER = <regular expression>
When set and the regular expression matches the current line, Splunk software always creates a new event for the next input line. Splunk software might still break before the current line if another rule matches.
Default: empty string

MUST_NOT_BREAK_AFTER = <regular expression>
When set and the current line matches the regular expression, Splunk software does not break on any subsequent lines until the MUST_BREAK_AFTER expression matches.
Default: empty string

MUST_NOT_BREAK_BEFORE = <regular expression>
When set and the current line matches the regular expression, Splunk software does not break the last event before the current line.
Default: empty string

MAX_EVENTS = <integer>
Specifies the maximum number of input lines that Splunk software adds to any event. The software breaks the event after it reads the specified number of lines.
Default: 256 lines
Examples of configuring event line breaking
[my_custom_sourcetype]
BREAK_ONLY_BEFORE = ^\d+\s*$
Assume that any line that consists of only digits is the start of a new event for any
data whose source type is set to my_custom_sourcetype.
The following log event contains several lines that are part of the same request.
The differentiator between requests is "Path". For this example, assume that all
these lines need to be shown as a single event entry.
[source::source-to-break]
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = Path=
This code tells Splunk software to merge the lines of the event, and only break
before the term Path=.
Multiline event line breaking and segmentation limitations
Answers
Have questions? Visit Splunk Answers and see what questions and answers the
Splunk community has around line breaking.
[01/Jul/2017:12:05:27 -0700] is a timestamp.
In most cases, Splunk software extracts timestamps correctly, but there are
situations where you might need to configure timestamp handling. For example,
when dealing with some sources or with distributed deployments, you might need
to reconfigure timestamp recognition and formatting.
See the "Configure timestamps" chapter of this manual for specific instructions
on how to configure timestamps.
Default fields
Custom fields
File header fields
Splunk software always extracts a set of default fields for each event. You can
configure it to extract custom fields and, for some data, file header fields.
For more information on indexed field extraction, see the chapter Configure
indexed field extraction in this manual.
Anonymize data
This topic discusses how to anonymize data you are sending to your Splunk
deployment, such as credit card and Social Security numbers.
You might want to mask sensitive personal data when indexing log events. Credit
card numbers and social security numbers are two examples of data that you
might not want to appear in an index. This topic describes how to mask part of
confidential fields to protect privacy while providing enough remaining data for
use in tracking events.
If you're running Splunk Enterprise and want to anonymize data, configure your
indexers or heavy forwarders as described in this topic. If you're forwarding data
to Splunk Cloud and want to anonymize it, use a heavy forwarder, configured as
described in this topic.
This example masks all but the last four characters of the SessionId and Ticket fields in an application server log, so that they appear in the indexed event like this:
SessionId=###########7BEA&Ticket=############96EE
To mask the data, modify the props.conf and transforms.conf files in your
$SPLUNK_HOME/etc/system/local/ directory.
Configure props.conf
[<spec>]
TRANSFORMS-anonymize = session-anonymizer, ticket-anonymizer
In this stanza, <spec> must be one of the following:
<sourcetype>, the source type of an event
host::<host>, where <host> is the host of an event
source::<source>, where <source> is the source of an event
Configure transforms.conf
[session-anonymizer]
REGEX = (?m)^(.*)SessionId=\w+(\w{4}[&"].*)$
FORMAT = $1SessionId=########$2
DEST_KEY = _raw
[ticket-anonymizer]
REGEX = (?m)^(.*)Ticket=\w+(\w{4}&.*)$
FORMAT = $1Ticket=########$2
DEST_KEY = _raw
In this transform:
REGEX specifies the regular expression that identifies the string in the event that you want to anonymize.
FORMAT specifies the masked values. $1 is all the text leading up to the match and $2 is all the text of the event after it.
DEST_KEY = _raw specifies to write the value from FORMAT to the raw value in the log, thus modifying the event.
Note: The regular expression processor does not handle multiline events. As a
workaround, specify that the event is multiline by placing (?m) before the regular
expression in transforms.conf.
You can also anonymize data by using a sed script to replace or substitute
strings in events.
Most UNIX users are familiar with sed, a Unix utility which reads a file and
modifies the input as specified by a list of commands. Splunk Enterprise lets you
use sed-like syntax in props.conf to anonymize your data.
[<spec>]
SEDCMD-<class> = <sed script>
The sed script applies only to the _raw field at index time. Splunk software supports the following subset of sed commands:
replace (s)
character substitution (y)
After making changes to props.conf, restart the Splunk instance to enable the configuration.
SEDCMD-<class> = s/<regex>/<replacement>/flags
In this stanza:
<regex> is a Perl regular expression.
<replacement> is a string to replace the regular expression match. It uses \n for backreferences, where n is a single digit.
flags can be either g, to replace all matches, or a number to replace a specified match.
Example
In the following example, you want to index data containing Social Security and
credit card numbers. At index time, you want to mask these values so that only
the last four digits are present in your events. Your props.conf stanza might look
like this:
[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g
In your accounts events, Social Security numbers appear as ssn=xxxxx6789 and
credit card numbers appear as cc=xxxx-xxxx-xxxx-1234.
Substitute characters
SEDCMD-<class> = y/<string1>/<string2>/
This substitutes each occurrence of the characters in string1 with the characters
in string2.
Example
You have a file you want to index, abc.log, and you want to substitute the capital
letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the
following to your props.conf:
[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/
When you search for source="*/abc.log", you should not find the lowercase
letters "a", "b", and "c" in your data. Splunk Enterprise substituted "A" for each
"a", "B" for each "b", and "C" for each "c'.
Caveats for anonymizing data
When you forward structured data to an indexer, the indexer does not parse it,
even if you have configured props.conf on that indexer with
INDEXED_EXTRACTIONS. Forwarded data skips the following queues on the indexer,
which precludes any parsing of that data on the indexer:
parsing
aggregation
typing
The forwarded data must arrive at the indexer already parsed. To achieve this,
you must also set up props.conf on the forwarder that sends the data. This
includes configuration of INDEXED_EXTRACTIONS and any other parsing, filtering,
anonymizing, and routing rules.
Universal forwarders are capable of performing these tasks solely for structured
data. See Forward data extracted from structured data files.
Configure timestamps
For more information on event processing, see the chapter in this manual called
Configure event processing.
1. It looks for a time or date in the event itself using an explicit TIME_FORMAT,
if provided. You configure the TIME_FORMAT attribute in props.conf.
2. If no TIME_FORMAT was configured for the data, Splunk software attempts to
automatically identify a time or date in the event itself. It uses the source
type of the event (which includes TIME_FORMAT information) to try to find
the timestamp.
3. If an event has a time and date, but not a year, Splunk software
determines the year, as described in How Splunk software determines
timestamps with no year, and builds the timestamp from that.
4. If no events in a source have a date, Splunk software tries to find a date in
the source name or file name. Time of day is not identified in filenames.
(This requires that the events have a time, even though they don't have a
date.)
5. For file sources, if no date can be identified in the file name, Splunk
software uses the file modification time.
6. As a last resort, Splunk software sets the timestamp to the current system
time when indexing each event.
Splunk software can extract only dates from a source, not times. If you need to
extract a time from a source, use a transform. See Create custom fields at index
time.
If Splunk software discovers a timestamp within an event that does not have a
year element, it uses the following logic to determine the year:
1. It identifies the current date by using either the date of the event it last
parsed or the current clock time.
2. It then uses the year from that date as a base and runs the year through
several tests:
1. If the date in the new event is December 31 and the current date is
January 1, it decrements the base year.
2. If the date in the new event is January 1 and the current date is
December 31, it increments the base year.
3. If the date in the new event is February 29, it determines if the
current date year is a leap year.
4. If the current date year is a leap year, it uses that year as the base
year. If it is not, it uses the previous leap year.
3. If none of the previous tests results in a successful base year
determination, the software uses the following procedure to determine the
year:
1. It determines the day of the year of the new event by calculating the
number of days from January 1.
2. If the date information of the previous event is available, and the
day of the year of that event is more than the day of the year of the
new event plus 4, then it increments the base year.
3. If the date information of the previous event is not available, and
the day of the year of the new event is greater than the current day
of the year plus 2, then it decrements the base year.
4. The software then assigns the base year to the timestamp for the event.
The timestamp must still pass the time range check for the timestamp to
be valid.
Example 1
If Splunk software encounters 26 Jun in a new event on May 26, 2017, and it was
not able to determine the year in the previous events:
1. Since it was not able to determine the year in the previous event, it sets a
base year of 2017 as that is the year of the current date.
2. The December 31 and January 1 tests fail, as the date is neither
December 31 nor January 1. The base year remains 2017.
3. The leap year test fails, as the date is not February 29. The base year
remains 2017.
4. Splunk software calculates the day of the year for June 26 as Day 177.
5. Since it could not determine the year in the previous event, it adds two to
this number to arrive at 179.
6. It then compares 179 to the day of the year of the current date, May 26
(2017) which is Day 147.
7. Since 179 is greater than 147, the software decrements the year from
2017 to 2016.
8. The software then builds the new timestamp: 26 Jun 2016.
9. If the new timestamp falls within the time range that has been set, the
software adds the timestamp to the event.
Example 2
If Splunk software encounters 10 Apr in a new event on May 26, 2017, and it
determined the year 2017 in previous events:
1. Since it determined the year in the previous event, it sets that year as the
base year: 2017.
2. The December 31 and January 1 tests fail, as the date is neither
December 31 nor January 1. The base year remains 2017.
3. The leap year test fails, as the date is not February 29. The base year
remains 2017.
4. Splunk software calculates the day of the year for April 10 as Day 100.
5. Since the year information in the previous event was available, it adds four
to this number to arrive at 104.
6. It then compares 104 to the day of the year of the current date, May 26
(2017) which is Day 147.
7. Since 104 is less than 147, the software increments the year from 2017 to
2018.
8. The software then builds the new timestamp: 10 Apr 2018.
9. By default, this new timestamp is not legal, since it falls outside the default
MAX_DAYS_HENCE setting which limits valid timestamps to 2 days into the
future. The software uses the current date of 26 May 2017 as the
timestamp, and applies that timestamp to the event.
Configure timestamps
If you index data from a new input and then discover that you need to adjust the
timestamp extraction process, you must reindex that data after you make the
configuration changes.
Consider previewing your data to prevent the need to reindex. Alternatively, you
can test new data inputs in a test Splunk deployment (or in a separate index on
the production Splunk instance) before adding data to your production instance.
That way, you can delete and reindex until you get the results you want.
Most events do not require special timestamp handling. Splunk software
recognizes and extracts their timestamps correctly. However, with some sources
and distributed deployments, you might need to configure how timestamps are
extracted, to ensure they are formatted properly.
If you have Splunk Enterprise and need to modify timestamp extraction, perform
the configuration on your indexer machines or, if you forward data, use heavy
forwarders and perform the configuration on the machines where the heavy
forwarders run. If you have Splunk Cloud and need to modify timestamp
extraction, use heavy forwarders and perform the configuration on the machines
where the heavy forwarders run.
You can also set other attributes that pertain to timestamps. This includes
specifying where to look in an event for a timestamp, what time zone to use, or
how to deal with timestamps of varying currency.
Syntax overview
[<spec>]
DATETIME_CONFIG = <filename relative to $SPLUNK_HOME>
TIME_PREFIX = <regular expression>
MAX_TIMESTAMP_LOOKAHEAD = <integer>
TIME_FORMAT = <strptime-style format>
TZ = <POSIX time zone string>
MAX_DAYS_AGO = <integer>
MAX_DAYS_HENCE = <integer>
MAX_DIFF_SECS_AGO = <integer>
MAX_DIFF_SECS_HENCE = <integer>
In this syntax, <spec> can be:
<sourcetype>, the source type of an event
host::<host>, where <host> is the host of an event
source::<source>, where <source> is the source of an event
If an event contains data that matches the value of <spec>, then the timestamp
rules specified in the stanza apply to that event. You can have multiple stanzas,
to handle different <spec> values.
By default, all events are indexed, unless you specifically filter out events through
other means.
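For example, a sketch of a stanza that combines several of these attributes (the source path is hypothetical; the format matches a timestamp like [01/Jul/2017:12:05:27 -0700] at the start of each event):
[source::/var/log/myapp/...]
TIME_PREFIX = ^\[
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
TZ = US/Pacific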
If an extracted timestamp falls outside the limits that these attributes define:
The software uses the timestamp of the previous event to assign the timestamp of the current event.
If the timestamp of the previous event can't be determined, then the software uses the current index time to assign a timestamp to the event.
Events are not dropped if they fall outside of the parameters of these attributes.
Timestamp attributes
Set DATETIME_CONFIG = CURRENT to
assign the current system time to
each event as it's indexed.
For example, if TIME_PREFIX
positions a location 11 characters
into the event, and
MAX_TIMESTAMP_LOOKAHEAD
is set to 10, timestamp extraction
will be constrained to characters 11
through 20.
TIME_FORMAT = <strptime-style format>
Specifies a strptime format string to extract the date. (If the format does not match the event text exactly, the event might still be tagged with a valid timestamp, based on how Splunk software attempts to recover from the problem.)
If <strptime-style format> contains an hour component, but no minute component, TIME_FORMAT ignores the hour component. It treats the format as an anomaly and considers the precision to be date-only.
Default: empty string
TZ = <POSIX time zone string>
The time zone of an event is determined as follows:
Use the time zone specified in raw event data (for example, PST, -0800), if present.
Use the TZ attribute set in props.conf, if the event matches the host, source, or source type that the stanza specifies.
If the forwarder and the receiving indexer are version 6.0 or later, use the time zone that the forwarder provides.
Use the time zone of the host that indexes the event.
Default: empty string
TZ_ALIAS = <key=value>[,<key=value>]...
Provides admin-level control over how time zone strings extracted from events are interpreted. For example, EST can mean Eastern (US) Standard Time or Eastern (Australian) Standard Time. There are many other three-letter time zone acronyms with multiple expansions.
Example: TZ_ALIAS = EST=GMT+10:00 (See the props.conf example file in the Configuration File Reference for more examples.)
MAX_DAYS_AGO = <integer>
Specifies the maximum number of days in the past, from the current date, that an extracted date can be valid. For example, if MAX_DAYS_AGO = 10, Splunk software ignores dates older than 10 days from the current date and instead either uses the timestamp of the previous event, or uses the current index time of the event if it cannot determine a timestamp in the previous event.
Default: 2000 days
Note: If you have data that is more than 2000 days old, increase this setting.
MAX_DIFF_SECS_AGO = <integer>
If the event timestamp is more than <integer> seconds before the previous timestamp, Splunk software accepts it only if it has the same time format as the majority of timestamps from the source. If your timestamps are wildly out of order, consider increasing this value.
Default: 3600 seconds (1 hour)
%Q, %q
For milliseconds, microseconds for Apache Tomcat. %Q and %q can format any time resolution if the width is specified.
%I
For hours on a 12-hour clock format. If %I appears after %S or %s (like "%H:%M:%S.%l"), it takes on the log4cpp meaning of milliseconds.
%+
For standard Unix date format timestamps.
%v
For BSD and OSX standard date format.
%Z
The time zone abbreviation (nothing if there is no time zone information).
%z, %:z, %::z
The time zone offset designator in International Organization for Standardization (ISO) 8601 format (for example, -0800 for PST, +0000 for GMT, or nothing if the time zone cannot be determined). Use %:z if the timestamp offset contains hours and minutes (for example, -08:00) and %::z if the timestamp offset contains hours, minutes, and seconds (for example, -08:00:00).
%o
For AIX timestamp support (%o used as an alias for %Y).
%p
The locale's equivalent of AM or PM. (Note: there may be none.)
%s
Epoch (10 digits)
Note: A strptime() expression that ends with a literal dot and subsecond specifier
such as %Q, %q, %N treats the terminal dot and conversion specifier as optional.
If the .subseconds portion is absent from the text, the timestamp is still extracted.
Here are some sample date formats, with the strptime() expressions that
handle them:
1998-12-31 %Y-%m-%d
98-12-31 %y-%m-%d
1998 years, 312 days %Y years, %j days
Jan 24, 2003 %b %d, %Y
January 24, 2003 %B %d, %Y
1397477611.862 %s.%3N
Note: Splunk software does not recognize non-English month names in
timestamps. If you have an app that writes non-English month names to log files,
reconfigure the app to use numerical months, if possible.
Examples
[host::foo]
TIME_PREFIX = FOR:
TIME_FORMAT = %m/%d/%y
Another example that includes time zone information:
[host::bar]
TIME_PREFIX = Valid_Until=
TIME_FORMAT = %a %b %d %H:%M:%S %Z%z %Y
Your data might contain other information that is parsed as timestamps, for
example:
Splunk software extracts the date as Dec 31, 1989, which is not useful. In this
case, configure props.conf to extract the correct timestamp from events from
host::foo:
[host::foo]
TIME_PREFIX = \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+\s
TIME_FORMAT = %b %d %H:%M:%S %Y
This configuration assumes that all timestamps from host::foo are in the same
format. Configure your props.conf stanza to be as granular as possible to avoid
potential timestamping errors.
For more information on extracting the correct timestamp from events containing
multiple timestamps, see Configure timestamp assignment for events with
multiple timestamps.
Configure timestamps for specific needs
You can use the attributes described in this topic to configure the timestamp
extraction processor for specialized purposes, such as applying custom time
zones and extracting the correct timestamp from events that contain multiple
timestamps.
You can use your browser locale setting to configure how Splunk Web displays
timestamps in search results. For information on setting the browser locale, see
User language and locale.
Even though Splunk software uses the browser locale to configure how
timestamps appear in search results, the raw data still remains in its original
format. You might want to change this so that the data format is standardized in
both raw data and search results. Do this with props.conf and transforms.conf.
Here is an example:
Assume the timestamp data in the raw event looks like this:
06/07/2011 10:26:11 PM
but you want it to look like this (to correspond with how it appears in search
results):
07/06/2011 10:26:11 PM
This example shows briefly how you can use props.conf and transforms.conf to
transform the timestamp in the raw event.
In transforms.conf, add this stanza:
[resortdate]
REGEX = ^(\d{2})\/(\d{2})\/(\d{4})\s([^/]+)
FORMAT = $2/$1/$3 $4
DEST_KEY = _raw
In props.conf, add this stanza, where <spec> qualifies your data:
[<spec>]
TRANSFORMS-sortdate = resortdate
Answers
Have questions? Visit Splunk Answers and see what questions and answers the
Splunk community has around timestamp recognition and configuration.
To specify the position of the timestamp you want extracted, you add
TIME_PREFIX and MAX_TIMESTAMP_LOOKAHEAD attributes to a props.conf stanza. By
setting a regular expression for TIME_PREFIX, you specify the pattern of
characters that indicates the point to start looking for the timestamp. Set a value
for MAX_TIMESTAMP_LOOKAHEAD to specify how far into an event (past the
TIME_PREFIX location) to look for the timestamp. By constraining lookahead,
you can improve both accuracy and performance.
When TIME_PREFIX is set, Splunk software scans the event text for a match to its
regular expression before it tries to extract a timestamp. The timestamping
algorithm only looks for a timestamp in the text following the end of the first
regular expression match. So if TIME_PREFIX is set to abc123, only the text
following the first occurrence of abc123 is used for timestamp extraction.
TIME_PREFIX also sets the start point for MAX_TIMESTAMP_LOOKAHEAD; the
lookahead starts after the matched portion of text in the TIME_PREFIX regular
expression. For example, if TIME_PREFIX matches text through the first 11
characters of the event and the timestamp you want to extract is always within
the next 30 characters, you can set MAX_TIMESTAMP_LOOKAHEAD=30. Timestamp
extraction would be limited to text starting with character 12 and ending with
character 41.
Example
[source::/Applications/splunk/var/spool/splunk]
TIME_PREFIX = \d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2} \w+\s
MAX_TIMESTAMP_LOOKAHEAD = 21
This configuration instructs Splunk software to locate events that match the first
timestamp construction, but ignore that timestamp in favor of a timestamp that
occurs within the following 21 characters (a number it gets from the
MAX_TIMESTAMP_LOOKAHEAD attribute). Splunk software will find the second
timestamp because it always occurs within that 21-character limit.
You can specify time zones based on the host, source, or source type of an event.
To determine the time zone to assign to a timestamp, Splunk software uses the
following logic:
Use the time zone specified in raw event data (for example, PST, -0800),
if present.
Use the TZ attribute set in props.conf, if the event matches the host,
source, or source type that the stanza specifies.
If the forwarder and the receiving indexer are version 6.0 or later, use the
time zone that the forwarder provides.
Use the time zone of the host that indexes the event.
Note: If you have Splunk Enterprise and you change the time zone setting of the
host machine, you must restart Splunk Enterprise for the software to detect the
change.
Specify time zones in props.conf, in a stanza that matches the host, source, or
sourcetype of the events.
You do not configure the time zone for the indexer in Splunk Enterprise, but in
the underlying operating system. As long as the time is set correctly on the host
system of the indexer, the offsets to event time zones will be calculated correctly.
In the first example, events come into the indexer from New York City (in the
US/Eastern time zone) and Mountain View, California (US/Pacific). To correctly
handle the timestamps for these two sets of events, the props.conf for the
indexer needs the time zone to be specified as US/Eastern and US/Pacific,
respectively.
The first example sets the time zone to US/Eastern for any events coming
from hosts whose names match the pattern nyc*:
[host::nyc*]
TZ = US/Eastern
The second example sets the time zone to US/Pacific for any events coming
from sources in the path /mnt/ca/...:
[source::/mnt/ca/...]
TZ = US/Pacific
zoneinfo (TZ) database
Refer to the list of tz database time zones for all permissible TZ values.
Map timezone strings extracted from event data
TZ_ALIAS = EST=GMT+10:00
Then, when Splunk software encounters "EST" in event data, it will interpret it as
"GMT+10:00", rather than the default of "GMT- 5:00".
As this example shows, you can map a timezone string to an existing string plus
offset value. You can also just map one TZ string directly to another.
When mapping timezone strings, be sure to handle both summer and winter
versions of the time zones. For example, if you map EST, also map EDT,
depending on what your local pairs are. Test your software to see which
timezone strings it produces.
You can specify multiple mappings. The syntax for TZ_ALIAS is:
TZ_ALIAS = <key=value>[,<key=value>]...
For more information, including examples, see the props.conf specification and
example file in the Configuration File Reference.
When you add or edit users using Splunk authentication, you can set a user time
zone. Search results for that user will appear in the specified time zone. This
setting, however, does not change the actual event data, whose time zone is
determined at index time. For information on setting this value, see Configure
users with Splunk Web in the Securing Splunk manual.
You can improve indexing performance by tuning how far into events the
timestamp processor looks, or by disabling the timestamp processor altogether.
Timestamp lookahead determines how far (how many characters) into an event
the timestamp processor looks for a timestamp. Adjust how far the timestamp
processor looks by setting the MAX_TIMESTAMP_LOOKAHEAD attribute.
The default number of characters that the timestamp processor looks into an
event is 150. You can set MAX_TIMESTAMP_LOOKAHEAD to a lower value to speed up
indexing. You should particularly do this if the timestamps always occur in the
first part of the event.
Example:
Look for timestamps in the first 20 characters of events coming from source foo.
[source::foo]
MAX_TIMESTAMP_LOOKAHEAD = 20
...
Disable timestamp processor
You can turn off the timestamp processor entirely to improve indexing
performance. Turn off timestamp processing for events matching a specified
host, source, or sourcetype by setting the DATETIME_CONFIG attribute to NONE.
When DATETIME_CONFIG=NONE, Splunk software does not look at the text of the
event for the timestamp. Instead, it uses the event "time of receipt"; in other
words, the time the event is received from its input. For file-based inputs (such as
monitor) this means that the timestamp comes from the modification time of the
input file.
Example:
This example turns off timestamp extraction for events that come from the source
foo.
[source::foo]
DATETIME_CONFIG = NONE
...
Note: Both CURRENT and NONE disable timestamp identification, so the default
event boundary detection (BREAK_ONLY_BEFORE_DATE = true) might not work as
you expect. When you use these settings, specify SHOULD_LINEMERGE or the
BREAK_ONLY_* and MUST_BREAK_* settings to control event merging.
Configure indexed field extraction
The process of adding fields to events is known as field extraction. There are
two types of field extraction:
Indexed field extraction, which was described briefly at the start of this
topic and which forms the basis for this chapter. These fields are stored in
the index and become part of the event data.
Search-time field extraction, which takes place when you search through
your data. These fields are not stored in the index.
Note: When working with fields, consider that most machine data either does not
have structure or has structure that changes constantly. For this type of data, use
search-time field extraction for maximum flexibility. Search-time field extraction is
easy to modify after you define it.
Other types of data might exhibit a more fixed structure, or the structure might
already be defined within the data or events in the file. You can configure Splunk
software to read the structure of these kinds of files (such as comma-separated
value files (CSV), tab-separated value files (TSV), pipe-separated value files, and
JavaScript Object Notation (JSON) data sources) and map fields at index time.
To learn how this works, see Extract data from files with headers in this manual.
About default fields (host, source, sourcetype, and
more)
When Splunk software indexes data, it tags each event with a number of fields.
These fields become part of the index event data. The fields that are added
automatically are known as default fields.
The default field index identifies the index in which the event is located.
The default field linecount describes the number of lines the event
contains.
The default field timestamp specifies the time at which the event occurred.
Splunk software uses the values in some of the fields, particularly sourcetype,
when indexing the data, in order to create events properly. Once the data has
been indexed, you can use the default fields in your searches.
Internal fields (_raw, _time, _indextime, _cd): These fields contain information
that Splunk software uses for its internal processes.
time/date at indexing or input time (for
example, by setting the timestamp to be the
time at index or input time), these fields will
not represent that.
For information about default fields from the search perspective, see Use default
fields in the Knowledge Manager Manual.
You can also specify additional, custom fields for inclusion in the index. See
Create custom fields at index-time in this chapter.
host
source
sourcetype
Source vs sourcetype
Source and source type are both default fields, but they are entirely different
otherwise, and can be easily confused.
The source is the name of the file, stream, or other input from which a
particular event originates.
The sourcetype determines how Splunk software processes the incoming
data stream into individual events according to the nature of the data.
Events with the same source type can come from different sources, for example,
if you monitor source=/var/log/messages and receive direct syslog input from
udp:514. If you search sourcetype=linux_syslog, events from both of those
sources are returned.
Much of the time, Splunk software can automatically identify host and sourcetype
values that are both correct and useful. But situations do come up that require
you to intervene in this process and provide override values.
You load archive data in bulk that was originally generated from a different
host and you want those events to have that host value.
You forward data from a different host. (The forwarder assigns its host
name unless you specify otherwise.)
You are working with a centralized log server environment, which means
that all of the data received from that server will have the same host, even
if it originated elsewhere.
For detailed information about hosts, see the chapter Configure host values.
There are also steps you can take to expand the range of source types that
Splunk software automatically recognizes, or to simply rename source types.
Assign default fields dynamically
This feature lets you dynamically assign default fields, also known as "metadata",
to events as they are being consumed by Splunk software. Use this feature to
specify source type, host, or source dynamically for incoming data. This feature
is useful mainly with scripted data -- either a scripted input or an existing file
processed by a script.
Do not use dynamic metadata assignment with file monitoring (tail) inputs. For
more information about file inputs, see Monitor files and directories in this
manual.
Note: The modular inputs feature has superseded this ***SPLUNK*** header
feature. If you need dynamically-generated values for host, source and
sourcetype, consider writing a modular input.
To use this feature, you append a single dynamic input header to your file and
specify the metadata fields you want to assign values to. The available metadata
fields are sourcetype, host, and source.
You can use this method to assign metadata instead of editing inputs.conf,
props.conf, and transforms.conf.
To use this feature for an existing input file, edit the file (either manually or with a
script) to add a single input header, as shown in the sketch below. Add the single
header anywhere in your file. Any data following the header will be appended with
the attributes and values you assign until the end of the file is reached.
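A minimal sketch of such a header (the field values are hypothetical placeholders):
***SPLUNK*** host=webfarm-01 source=/var/log/app/events.log sourcetype=app_events
Any events that follow this line in the file receive these host, source, and sourcetype values.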
Configure with a script
In the more common scenario, you write a script to dynamically add an input
header to your incoming data stream. Your script can also set the header
dynamically based on the contents of the input file.
Conversely, you might want to add an indexed field if the value of a search-time
extracted field exists outside of the field more often than not. For example, if you
commonly search only for foo=1, but 1 occurs in many events that do not have
foo=1, you might want to add foo to the list of fields extracted by Splunk at index
time.
For more information see About fields in the Knowledge Manager manual.
If you have Splunk Cloud and want to define index-time field extractions, open a
support ticket.
Caution: Do not add custom fields to the set of default fields that Splunk
software automatically extracts and indexes at index time unless absolutely
necessary. This includes fields such as timestamp, punct, host, source, and
sourcetype. Adding to this list of fields can negatively impact indexing
performance and search times, because each indexed field increases the size of
the searchable index. Indexed fields are also less flexible--whenever you make
changes to your set of fields, you must re-index your entire dataset. For more
information, see Index time versus search time in the Managing Indexers and
Clusters manual.
Edit these files in $SPLUNK_HOME/etc/system/local/ or in your own custom
application directory in $SPLUNK_HOME/etc/apps/. For more information on
configuration files in general, see About configuration files in the Admin manual.
If you are employing heavy forwarders in front of your search peers, the props
and transforms processing takes place on the forwarders, not the search peers.
Therefore, you must deploy the props and transforms changes to the forwarders,
not the search peers.
For details on where you need to put configuration settings, read Configuration
parameters and the data pipeline in the Admin Manual.
[<unique_transform_stanza_name>]
REGEX = <regular_expression>
FORMAT = <your_custom_field_name>::$1
WRITE_META = [true|false]
DEST_KEY = <KEY>
DEFAULT_VALUE = <string>
SOURCE_KEY = <KEY>
REPEAT_MATCH = [true|false]
LOOKAHEAD = <integer>
_KEY_<string>, _VAL_<string>
Using FORMAT:
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2
Using name-capturing groups instead of FORMAT:
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
FORMAT is optional. Use it to specify the format of the field-value pair(s) that
you are extracting, including any field names or values that you want to
add. You don't need to specify the FORMAT if you have a simple REGEX with
name-capturing groups.
FORMAT behaves differently depending on whether the extraction takes
place at search time or index time.
For index-time transforms, you use $n to specify the output of each
REGEX match (for example, $1, $2, and so on).
If the REGEX does not have n groups, the matching fails.
FORMAT defaults to <unique_transform_stanza_name>::$1.
The special identifier $0 represents what was in the DEST_KEY
before the REGEX was performed (in the case of index-time field
extractions the DEST_KEY is _meta). For more information, see "How
Splunk builds indexed fields," below.
For index-time field extractions, you can set up FORMAT in several
ways. It can be a <field-name>::<field-value> setup like:
FORMAT = <field-name>::$1 (where the REGEX extracts the field value)
or:
FORMAT = $1::$2 (where the REGEX extracts both the field name and
the field value)
However you can also set up index-time field extractions that create
concatenated fields:
FORMAT = ipaddress::$1.$2.$3.$4
When you create concatenated fields with FORMAT, the dollar sign ($) is treated as a prefix for a regex-capturing group only if it is followed by a number that refers to an existing capturing group. So if your regex has only one capturing group and its value is bar, then:
FORMAT = foo$1 yields foobar
FORMAT = foo$bar yields foo$bar
FORMAT = foo$1234 yields foo$1234
FORMAT = foo$1\$2 yields foobar\$2
WRITE_META = true writes the extracted field name and value to _meta,
which is where Splunk stores indexed fields. This attribute setting is
required for all index-time field extractions, except for those where
DEST_KEY = _meta (see the discussion of DEST_KEY, below).
For more information about _meta and its role in indexed field
creation, see "How Splunk builds indexed fields," below.
For index-time extractions, DEST_KEY = _meta, which is where Splunk
stores indexed fields. For other possible KEY values see the
transforms.conf page in this manual.
For more information about _meta and its role in indexed field
creation, see How Splunk builds indexed fields, below.
When you use DEST_KEY = _meta you should also add $0 to the
start of your FORMAT attribute. $0 represents the DEST_KEY value
before Splunk performs the REGEX (in other words, _meta).
Note: The $0 value is in no way derived from the REGEX.
You can use any regular expression testing tool for writing and testing regular expressions.
Note: The capturing groups in your regex must identify field names that follow
field name syntax restrictions. They can only contain ASCII characters (a-z, A-Z,
0-9 or _.). International characters will not work.
In props.conf, add a stanza that references the transform:
[<spec>]
TRANSFORMS-<class> = <unique_stanza_name>
In fields.conf, add an entry for the custom field:
[<your_custom_field_name>]
INDEXED=true
<your_custom_field_name> is the name of the custom field you set in the
unique stanza that you added to transforms.conf.
Set INDEXED=true to indicate that the field is indexed.
Note: If a field of the same name is extracted at search time, you must set
INDEXED=false for the field. In addition, you must also set INDEXED_VALUE=false if
events exist that have values of that field that are not pulled out at index time, but
which are extracted at search time.
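For example, a minimal sketch of the fields.conf entry for such a field (the field name is hypothetical):
[transaction_id]
INDEXED = false
INDEXED_VALUE = false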
Note: Indexed fields with regex-extracted values containing quotation marks will
generally not work, and backslashes might also have problems. Fields extracted
at search time do not have these limitations.
WRITE_META = true
FORMAT = field1::value field2::"value 2" field3::"a field with a \" quotation mark" field4::"a field which ends with a backslash\\"
Remember: When Splunk creates field names, it applies field name syntax
restrictions to them.
1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an
underscore (_).
2. All leading underscores are removed. In Splunk software, leading
underscores are reserved for internal fields.
Here are a set of examples of configuration file setups for index-time field
extractions.
transforms.conf
In transforms.conf add:
[netscreen-error]
REGEX = device_id=\[\w+\](?<err_code>[^:]+)
FORMAT = err_code::"$1"
WRITE_META = true
This stanza matches device_id= followed by a word within brackets, and
extracts the text string that terminates at a colon as the err_code field. The
source type of the events is testlog.
props.conf
[testlog]
TRANSFORMS-netscreen = netscreen-error
fields.conf
[err_code]
INDEXED=true
This example creates two indexed fields called username and login_result.
transforms.conf
In transforms.conf add:
[ftpd-login]
REGEX = Attempt to login by user: (.*): login (.*)\.
FORMAT = username::"$1" login_result::"$2"
WRITE_META = true
This stanza finds the literal text Attempt to login by user:, extracts a
username followed by a colon, and then the result, which is followed by a period.
A line might look like:
2008-10-30 14:15:21 mightyhost awesomeftpd INFO Attempt to login by
user: root: login FAILED.
props.conf
[ftpd-log]
TRANSFORMS-login = ftpd-login
fields.conf
[username]
INDEXED=true
[login_result]
INDEXED=true
This example shows you how an index-time transform can be used to extract
separate segments of an event and combine them to create a single field, using
the FORMAT option.
20100126 08:48:49 781 PACKET 078FCFD0 UDP Rcv 127.0.0.0 8226 R Q [0084 A
NOERROR] A (4)www(8)google(3)com(0)
transforms.conf
[dnsRequest]
REGEX = UDP[^\(]+\(\d\)(\w+)\(\d\)(\w+)\(\d\)(\w+)
FORMAT = dns_requestor::$1.$2.$3
This transform defines a custom field named dns_requestor. It uses its REGEX to
pull out the three segments of the dns_requestor value. Then it uses FORMAT to
order those segments with periods between them, like a proper URL.
Note: This method of concatenating event segments into a complete field value
is something you can only perform with index-time extractions; search-time
extractions have practical restrictions that prevent it. If you find that you must use
FORMAT in this manner, you will have to create a new indexed field to do it.
props.conf
Then, the next step would be to define a field extraction in props.conf that
references the dnsRequest transform and applies it to events coming from the
server1 source type:
[server1]
TRANSFORMS-dnsExtract = dnsRequest
fields.conf
[dns_requestor]
INDEXED = true
host,status,message,"start date"
srv1.splunk.com,error,"No space left on device",2013-06-10T06:35:00
srv2.splunk.com,ok,-,2013-06-11T06:00:00
Input types that the indexed field extraction feature supports
This feature works with file-based inputs only, such as files that you upload or
monitor. It does not work with modular inputs, network inputs, or any other type
of input.
For information on how to set source types when importing structured data
files, see The "Set source type" page.
For information on how to adjust timestamps when previewing indexing
results, see Adjust time stamps and event breaks.
For more general information about configuration files, see About
configuration files in the Admin manual.
When you upload or monitor a structured data file, Splunk Web loads the "Set
Source type" page. This page lets you preview how your data will be indexed.
See The 'Set Source type' page.
1. From the Add Data page in Splunk Web, choose Upload or Monitor as
the method that you want to add data.
2. Specify the structured data file that you want the software to monitor.
Splunk Web loads the "Set Source type" page. It sets the source type of
the data based on its interpretation of that data. For example, if you
upload a CSV file, it sets the source type to csv.
3. Review the events in the preview pane on the right side of the page. The
events are formatted based on the current source type.
4. If the events appear to be formatted correctly, click "Next" to proceed to
the "Modify input settings" page. Otherwise, configure event formatting by
modifying the timestamp, event breaking, and delimited settings until the
previewed events look the way that you want.
5. If you don't want to save the settings as a new source type, return to Step
4. Otherwise, click the Save As button to save the settings as a new
source type.
6. In the dialog that appears, type in a name and description for the new
source type.
7. Select the category for the source type by selecting the category you want
from the "Category" drop-down.
8. Select the application context that the new source type should apply to by
choosing from the entries in the "App" drop-down.
9. Click "Save" to save the source type.
10. Return to Step 4 to proceed to the "Modify input settings" page.
Structured data files with large numbers of columns might not display all
extracted fields in Splunk Search
If you index a structured data file with a large number of columns (for example, a
CSV file with 300 columns), you might experience a problem later where the
Search app does not appear to return or display all of the fields for that file. While
Splunk software has indexed all of the fields correctly, this anomaly occurs
because of a configuration setting for how Splunk software extracts the fields at
search time.
Before Splunk software displays fields in Splunk Web, it must first extract those
fields by performing a search time field extraction. By default, the limit for the
number of fields that can be extracted automatically at search time is 100. You
can set this number higher by editing the limits.conf file in
$SPLUNK_HOME/etc/system/local and changing the limit setting to a number
that is higher than the number of columns in the structured data file.
[kv]
limit = 300
If you work with a lot of large CSV files, you might want to configure the setting to
a number that reflects the largest number of columns you expect your structured
data files to have.
You can also use a combination of inputs.conf and props.conf to extract fields
from structured data files. Edit these files in $SPLUNK_HOME/etc/system/local/ or
in your own custom application directory in
$SPLUNK_HOME/etc/apps/<app_name>/local. Inputs.conf specifies the files you
want to monitor and the source type to be applied to the events they contain, and
props.conf defines the source types themselves. If you have Splunk Enterprise,
you can edit the settings on indexer machines or machines where you are
running the Splunk universal forwarder. You must restart Splunk Enterprise for
any changes that you make to inputs.conf and props.conf to take effect. If you
have Splunk Cloud and want to configure the extraction of fields from structured
data, use the Splunk universal forwarder.
To configure field extraction for files that contain headers, modify the following
attributes in props.conf. For additional attributes in props.conf, review the
props.conf specification file.
FIELD_QUOTE
Specifies the character to use for quotes in the specified file or source. You can specify special characters in this attribute.
Default: n/a

HEADER_FIELD_DELIMITER
Specifies which character delimits or separates field names in the header line. You can specify special characters in this attribute. If HEADER_FIELD_DELIMITER is not specified, FIELD_DELIMITER applies to the header line.
Default: n/a

HEADER_FIELD_QUOTE
Specifies which character is used for quotes around field names in the header line. You can specify special characters in this attribute. If HEADER_FIELD_QUOTE is not specified, FIELD_QUOTE applies to the header line.
Default: n/a

HEADER_FIELD_LINE_NUMBER
Specifies the line number of the line within the file that contains the header fields. If set to 0, Splunk attempts to locate the header fields within the file automatically.
Default: 0

TIMESTAMP_FIELDS = field1,field2,...,fieldn
Some CSV and structured files have their timestamp encompass multiple fields in the event separated by delimiters. This attribute tells Splunk software to specify all such fields which constitute the timestamp in a comma-separated fashion.
Default: Splunk Enterprise tries to automatically extract the timestamp of the event.

FIELD_NAMES
Some CSV and structured files might have missing headers. This attribute specifies the header field names.
Default: n/a

MISSING_VALUE_REGEX
If Splunk software finds data that matches the specified regular expression in the structured data file, it considers the value for the field in the row to be empty.
Default: n/a
Special characters or values are available for some attributes
You can use special characters or values such as spaces, vertical and horizontal
tabs, and form feeds in some attributes. The following table lists these
characters:
Character           Value
whitespace          whitespace
none                none or \0
file separator      fs or \034
group separator     gs or \035
record separator    rs or \036
unit separator      us or \037
You can use these special characters for the following attributes only:
FIELD_DELIMITER
FIELD_HEADER_REGEX
FIELD_QUOTE
To create and reference the new source types to extract files with headers:
1. Using a text editor, open the file props.conf in the appropriate location as
described in Enable automatic header-based field extraction earlier in this
topic. If the props.conf file does not exist, you must create it.
2. Define a new sourcetype by creating a stanza which tells Splunk
Enterprise how to extract the file header and structured file data, using the
attributes described above. You can define as many stanzas - and thus,
as many sourcetypes - as you like in the file. For example:
[HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim]
FIELD_DELIMITER=,
HEADER_FIELD_DELIMITER=\s
FIELD_QUOTE="
3. Save the props.conf file and close it.
4. Create a file inputs.conf in the same directory, if it does not already exist.
5. Open the file for editing.
6. Add a stanza which represents the file or files that you want Splunk
Enterprise to extract file header and structured data from. You can add as
many stanzas as you wish for files or directories from which you want to
extract header and structured data. For example:
[monitor:///opt/test/data/StructuredData/HeaderFieldsWithFewEmptyFieldNamesWithSpa
sourcetype=HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim
7. Save the inputs.conf file and close it.
8. Restart Splunk Enterprise or the universal forwarder for the changes to
take effect.
You can also forward fields extracted from a structured data file to a heavy
forwarder or a universal forwarder.
1. Configure the Splunk instance that monitors the files to forward data to a
heavy forwarder or a universal forwarder.
2. Configure the receiving instance.
3. On the monitoring instance, configure props.conf and inputs.conf to
properly handle event breaking and timestamps for your data. You can do
this in one of two ways.
5. Restart the receiving instance.
6. Restart the monitoring instance.
7. On the receiving instance, use the Search app to confirm that the fields
have been extracted from the structured data files and properly indexed.
Splunk software does not parse structured data that has been forwarded to
an indexer
When you forward structured data to an indexer, it is not parsed when it arrives
at the indexer, even if you have configured props.conf on that indexer with
INDEXED_EXTRACTIONS. Forwarded data skips the following pipelines on the
indexer, which precludes any parsing of that data on the indexer:
parsing
merging
typing
If you want to forward fields that you extract from structured data files to another
Splunk instance, you must configure the props.conf settings that define the field
extractions on the forwarder that sends the data. This includes configuration of
INDEXED_EXTRACTIONS and any other parsing, filtering, anonymizing, and routing
rules. Performing these actions on the instance that indexes the data will have no
effect, as the forwarded data must arrive at the indexer already parsed.
When you use Splunk Web to modify event break and time stamp settings, it
records all of the proposed changes as a stanza for props.conf. You can find
those settings in the "Advanced" tab on the "Set Source type" page.
Use the "Copy to clipboard" link in the "Advanced" tab to copy the proposed
changes to props.conf to the system clipboard. You can then paste this stanza
into props.conf in a text editor on Splunk instances that monitor and forward
similar files.
Only header fields containing data are indexed
When Splunk software extracts header fields from structured data files, it only
extracts those fields where data is present in at least one row. If the header field
has no data in any row, it is skipped (that is, not indexed). Take, for example, the
following csv file:
header1,header2,header3,header4,header5
one,1,won,,111
two,2,too,,222
three,3,thri,,333
four,4,fore,,444
five,5,faiv,,555
When Splunk software reads this file, it notes that the rows in the header4 column
are all empty, and does not index that header field or any of the rows in it. This
means that neither header4 nor any of the data in its row can be searched for in
the index.
If, however, the header4 field contains rows with empty strings (for example, ""),
the field and all the rows underneath it are indexed.
Following are an example inputs.conf and props.conf to give you an idea of how
to use the file header extraction attributes.
To extract the data locally, edit inputs.conf and props.conf to define inputs and
sourcetypes for the structured data files, and use the attributes described above
to specify how to deal with the files. To forward this data to another Splunk
instance, edit inputs.conf and props.conf on the forwarding instance, and
props.conf on the receiving instance.
Inputs.conf
[monitor:///opt/test/data/StructuredData/CSVWithFewHeaderFieldsWithoutAnyValues.csv]
sourcetype=CSVWithFewHeaderFieldsWithoutAnyValues
[monitor:///opt/test/data/StructuredData/VeryLargeCSVFile.csv]
sourcetype=VeryLargeCSVFile
[monitor:///opt/test/data/StructuredData/UselessLongHeaderToBeIgnored.log]
sourcetype=UselessLongHeaderToBeIgnored
[monitor:///opt/test/data/StructuredData/HeaderFieldsWithFewEmptyFieldNamesWithSpaceDeli
sourcetype=HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim
[monitor:///opt/test/data/FieldHeaderRegex.log]
sourcetype=ExtractCorrectHeaders
Props.conf
[CSVWithFewHeaderFieldsWithoutAnyValues]
FIELD_DELIMITER=,
[VeryLargeCSVFile]
FIELD_DELIMITER=,
[UselessLongHeaderToBeIgnored]
HEADER_FIELD_LINE_NUMBER=35
TIMESTAMP_FIELDS=Date,Time,TimeZone
FIELD_DELIMITER=\s
FIELD_QUOTE="
[HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim]
FIELD_DELIMITER=,
HEADER_FIELD_DELIMITER=\s
FIELD_QUOTE="
[ExtractCorrectHeaders]
FIELD_HEADER_REGEX=Ignore_This_Stuff:\s(.*)
FIELD_DELIMITER=,
Sample files
The following are snippets of the files referenced in the above inputs.conf and
props.conf examples, to give you an idea of what the files look like.
You might need to scroll right quite a bit to see all of the content.
CSVWithFewHeaderFieldsWithoutAnyValues.csv
vqmcallhistoryid,serialnumber,vqmavgjbenvdelay,vqmavgjbenvnegdelta,vqmavgjbenvposdelta,v
99152,CFG0730084,-3,-2,356,64000,1,280,14,14.29,36,3499,201000,BW163736844290611-1731707
12:37:37.292,0,4.68,1.43,0.19,0,0,0,0,52,60,15,17,60,10,0,Loopback,0.48,48,46,0,30,1334,
0/1,2,0,54,80,80,18500,6096147089,48,1,0,2011-06-29
12:41:47.303,2011-06-29 12:41:47.303
99154,CFG0730084,-3,-1,251,64000,4,195,9,20.52,28,3494,359000,BW163502270290611594566299
12:35:02.324,0,2.88,1.11,3.44,0,0,0,0,40,40,26,24,50,10,0,Loopback,0.31,54,46,0,31,2455,
0/1,2,0,48,60,70,30400,6096147089,54,1,0,2011-06-29
12:41:47.342,2011-06-29 12:41:47.342
VeryLargeCSVFile.csv
IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Location,X,Y
030203898,FRAUD,"FORGERY, CREDIT
CARD",Tuesday,02/18/2003,16:30,NORTHERN,NONE,2800 Block of VAN NESS
AV,-122.424612993055,37.8014488257836
000038261,WARRANTS,WARRANT
ARREST,Thursday,04/17/2003,22:45,NORTHERN,"ARREST, BOOKED",POLK ST /
SUTTER ST,-122.420120319211,37.7877570602182
030203901,LARCENY/THEFT,GRAND THEFT
PICKPOCKET,Tuesday,02/18/2003,16:05,NORTHERN,NONE,VAN NESS AV /
MCALLISTER ST,-122.42025048261,37.7800745746105
030203923,DRUG/NARCOTIC,SALE OF BASE/ROCK
COCAINE,Tuesday,02/18/2003,17:00,BAYVIEW,"ARREST, BOOKED",1600 Block of
KIRKWOOD AV,-122.390718076188,37.7385560584619
030203923,OTHER
OFFENSES,CONSPIRACY,Tuesday,02/18/2003,17:00,BAYVIEW,"ARREST,
BOOKED",1600 Block of KIRKWOOD AV,-122.390718076188,37.7385560584619
030203923,OTHER OFFENSES,PROBATION
VIOLATION,Tuesday,02/18/2003,17:00,BAYVIEW,"ARREST, BOOKED",1600 Block
of KIRKWOOD AV,-122.390718076188,37.7385560584619
UselessLongHeaderToBeIgnored.log
HeaderFieldsWithFewEmptyFieldNamesWithSpaceDelim.csv
FieldHeaderRegex.log
Garbage
Garbage
Garbage
Ignore_This_Stuff: Actual_Header1 Actual_Header2
Answers
Have questions? Visit Splunk Answers and see what questions and answers the
Splunk community has around extracting fields.
Configure host values
About hosts
The host field value of an event is the name of the physical device from which
the event originates. Because host is a default field, Splunk software assigns a
host value to every event it indexes, which means you can use it to search for
all events that a particular host has generated.
The host value is typically the hostname, IP address, or fully qualified domain
name of the network host on which the event originated.
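For example, a search like the following (the hostname is illustrative) returns all
events that originated on that host:

host=webserver01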
Splunk software assigns a host value to each event by examining settings in the
following order and using the first host setting it encounters:
1. Any event-specific host assignment that you configure based on the event
data.
2. The default host value for the input that created the event, if any.
3. The default host value for the Splunk indexer or forwarder that initially
consumes the data.
If no other host rules are specified for a source, Splunk software assigns the host
field a default value that applies to all data coming into the instance from any
input. The default host value is the hostname or IP address of the Splunk indexer
or forwarder initially consuming the data. When the Splunk instance runs on the
server where the event occurred, this is correct and no manual intervention is
required.
For more information, see Set a default host for a Splunk instance in this manual.
The default host for a file or directory input
If you run Splunk Enterprise on a central log archive, or you are working with files
that are forwarded from other hosts in your environment, you might need to
override the default host assignment for events coming from particular inputs.
There are two methods for assigning a host value to data received through a
particular input. You can define a static host value for all data coming through a
specific input, or you can have Splunk software dynamically assign a host value
to a portion of the path or filename of the source. The latter method can be
helpful when you have a directory structure that segregates each host's log
archive in a different subdirectory.
For more information, see Set a default host for a file or directory input in this
manual.
Event-specific assignments
Some situations require you to assign host values by examining the event data.
For example, if you have a central log host sending events to your Splunk
deployment, you might have several host servers that feed data to that main log
server. To ensure that each event has the host value of its originating server, you
need to use the event's data to determine the host value.
For more information, see Set host values based on event data in this manual.
If your event data gets tagged with the wrong host value, don't worry. There are a
number of ways to fix or work around the problem.
For details, see Change host values after indexing in this manual.
You can tag host values to aid in the execution of robust searches. Tags enable
you to cluster groups of hosts into useful, searchable categories.
For details, see About tags and aliases in the Knowledge Manager manual.
Set a default host for a Splunk instance
An event host value is the IP address, host name, or fully qualified domain name
of the physical device on the network from which the event originates. Because
Splunk software assigns a host value at index time for every event it indexes,
host value searches enable you to easily find data originating from a specific
device.
If you have not specified other host rules for a source (using the information in
subsequent topics in this chapter), the default host value for an event is the
hostname or IP address of the server running the Splunk instance (forwarder or
indexer) consuming the event data. When the event originates on the server on
which the Splunk instance is running, that host assignment is correct and there's
no need to change anything. However, if all your data is being forwarded from a
different host or if you're bulk-loading archive data, you might want to change the
default host value for that data.
To set the default value of the host field, you can use Splunk Web or edit
inputs.conf.
In Splunk Web, go to Settings > Server settings > General settings. On the
General settings page, scroll down to the Index settings section and change the
Default host name.
This sets the default value of the host field for all events coming into that Splunk
instance. You can override the value for individual sources or events, as
described later in this chapter.
The default host assignment is set in inputs.conf during installation. You can
modify the host value by editing that file in $SPLUNK_HOME/etc/system/local/ or
in your own custom application directory in $SPLUNK_HOME/etc/apps/.
The host assignment is specified in the [default] stanza.
[default]
host = <string>
Set <string> to your
chosen default host value. <string> defaults to the IP
address or domain name of the host where the data originated.
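For example, a [default] stanza with a hypothetical hostname might look like
this:

[default]
host = loghost01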
Warning: Do not put quotes around the <string> value: host=foo, not
host="foo".
After editing inputs.conf, you must restart your Splunk instance to put your
changes into effect.
Note: By default, the host attribute is set to the variable $decideOnStartup, which
means that it's set to the hostname of the machine splunkd is running on. The
splunk daemon re-interprets the value each time it starts up.
If you are running Splunk Enterprise on a central log archive, or you are working
with files forwarded from other hosts in your environment, you might need to
override the default host assignment for events coming from particular inputs.
There are two methods for assigning a host value to data received through a
particular input. You can define a static host value for all data coming through a
specific input, or you can dynamically assign a host value to a portion of the path
or filename of the source. The latter method can be helpful when you have a
directory structure that segregates each host's log archive in a different
subdirectory.
For more information, see Set a default host for a file or directory input in this
manual.
Some situations require you to assign host values by examining the event data.
For example, if you have a central log host sending events to your Splunk
deployment, you might have several host servers feeding data to that main log
server. To ensure that each event has the host value of its originating server, you
need to use the event's data to determine the host value.
For more information, see Set host values based on event data in this manual.
If you set the host value statically, the same host is assigned to every event
received from a designated file or directory input.
If you set the host value dynamically, the host name is extracted from the source
input using a regular expression or segment of the full directory path of the
source.
You can also assign host values to events that come through a particular file or
directory input based on their source or source type values (as well as other
kinds of information). See Set host values based on event data.
At this time, you cannot enable the setting of default host values for network
(TCP and UDP) or scripted inputs.
This method applies a single default host value to each event that a specific file
or directory input generates.
A static host value assignment only affects new events that a certain input
generates. You cannot assign a default host value to data that has already been
indexed. Instead, you must tag the host value to the existing events. See Define
and manage tags in the Knowledge Manager Manual.
You can define a host for a file or directory input whenever you add or edit an
input of that type.
To set the default host when creating a new input, see Set a default host for a
new input.
1. Click Settings > Data Inputs.
2. Click Files & Directories.
3. On the Files & directories page, click the name of an existing input to
update it.
4. In the Host section, select the "constant value" option from the Set host
dropdown.
5. Enter the static host value for the input in the Host field value field.
6. Click Save.
The process to set a default host is different when you create a new input.
Note: If you specified a directory, the "Set Sourcetype" page does not
appear.
7. Click Next.
8. On the Input Settings page, in the Host section, click the Constant
Value button.
9. In the Host field value field, enter the host name for the input.
10. Click Review to continue to the Review page.
11. Click Submit to create the input.
Edit inputs.conf
To specify a host value for a monitored file or directory input, edit inputs.conf
and set the host attribute in the stanza that defines the input. If you
have Splunk Cloud, you configure this setting on the machines where you run the
Splunk universal forwarder.
[monitor://<path>]
host = <your_host>
Edit inputs.conf in $SPLUNK_HOME/etc/system/local/ or in your own custom
application directory in $SPLUNK_HOME/etc/apps/. For more information on
configuration files in general, see About configuration files in the Admin manual.
For more information about inputs and input types, see What data can I index? in
this manual.
This example applies to events coming in from /var/log/httpd. Any events
coming from this input receive a host value of webhead-1.
[monitor:///var/log/httpd]
host = webhead-1
Dynamically set the default host value
This method dynamically extracts the host value for a file or directory input, either
from a segment of the source input path or from a regular expression. For
example, if you want to index an archived directory and the name of each file in
the directory contains relevant host information, you can extract this information
and assign it to the host field.
regex on path - Choose this option if you want to extract the host
name with a regular expression. Then enter the regex for the host
you want to extract in the Regular expression field.
segment in path - Choose this option if you want to extract the host
name from a segment in your data source's path. Then enter the
segment number in the Segment number field. For example, if the
path to the source is /var/log/<host server name>/apache.log and you
want the third segment (the host server name) to be the host value,
enter "3".
Then click Save.
The process to set a default host dynamically is different when you create a new
input.
Edit inputs.conf
You can set up dynamic host extraction rules by configuring inputs.conf. For
more information on configuration files in general, see About configuration files in
the Admin manual.
[monitor://<path>]
host_regex = <your_regular_expression>
Save the inputs.conf file, then restart the Splunk instance.
The regular expression extracts the host value from the filename of each input.
The input uses the first capturing group of the regular expression as the host. If
the regular expression fails to match, the input sets the default host attribute as
the host.
The host_segment value overrides the host field with a value that has been
extracted from a segment in the path of your data source.
[monitor:///var/log/]
host_segment = 3
Save the inputs.conf file, then restart the Splunk instance.
In this example, the regular expression assigns all events from /var/log/foo.log
a host value of "foo":
[monitor:///var/log]
host_regex = /var/log/(\w+)
This example assigns the host value to the third segment in the path
apache/logs:
[monitor://apache/logs/]
host_segment = 3
Caveats to setting the host_segment attribute to extract a host name
The host_segment attribute extracts the host from the segments of the path to
the monitored file. If you also override the source attribute, the segment count
applies to that source value only when the source is itself a path; otherwise, the
segment comes from the path of the monitored file. The following examples
assume monitored files under host-named subdirectories such as
/mnt/logs/server01/ and /mnt/logs/server02/:
[monitor:///mnt/logs/]
host_segment = 3
Resulting host value: server01
[monitor:///mnt/logs/server01]
source = /mnt/logs/server01
host_segment = 3
Resulting host value: server01
[monitor:///mnt/logs/server02]
source = serverlogs
host_segment = 3
Resulting host value: server02
Set host values based on event data
You can configure Splunk software to assign host names to your events based
on the data in those events. This topic shows you how to use event data to
override default host assignments with props.conf, transforms.conf, and regular
expressions.
Configuration
transforms.conf
[<unique_stanza_name>]
REGEX = <your_regex>
FORMAT = host::$1
DEST_KEY = MetaData:Host
Note the following:
<unique_stanza_name> is a name that you choose for the transform. You use it
later in the props.conf stanza that references this transform.
<your_regex> is a regular expression with a capturing group that identifies the
part of the event that contains the host value.
FORMAT = host::$1 writes the value of the first capturing group into the host
field.
props.conf
[<spec>]
TRANSFORMS-<class> = <unique_stanza_name>
Note the following:
<spec> can be <sourcetype>, host::<host>, or source::<source>.
<class> is any unique identifier that you want to give to your transform.
<unique_stanza_name> must match the name of the transforms.conf stanza that
you created.
Example
Assume that you're starting with the following set of events from the
houseness.log file. The host is in the third position ("fflanda", etc.). To
override the host value, add the following stanza to transforms.conf:
[houseness]
DEST_KEY = MetaData:Host
REGEX = \s(\w*)$
FORMAT = host::$1
Next, reference your transforms.conf stanza in a props.conf stanza. For
example:
[source::.../houseness.log]
TRANSFORMS-rhallen=houseness
SHOULD_LINEMERGE = false
The above stanza has the additional attribute/value pair SHOULD_LINEMERGE =
false, to break events at each newline.
Change host values after indexing
At some point after indexing, you might discover that the host value for some of
your events is not correct. For example, you might be collecting some Web proxy
logs into a directory directly on your Splunk Enterprise server and you add that
directory as an input without remembering to override the value of the host field,
which results in the host value being the same as your Splunk Enterprise host.
If something like that happens, you have a number of options, from easiest to
hardest.
Of these options, deleting and reindexing gives you the best performance and is
the easiest. If you cannot delete and reindex the data, then the last option
provides the cleanest alternative.
Configure source types
Because the source type controls how Splunk software formats incoming data, it
is important that you assign the correct source type to your data. That way, the
indexed version of the data (the event data) looks the way you want, with
appropriate timestamps and event breaks. This facilitates easier searching of
the data later.
Splunk software comes with a large number of predefined source types. When
consuming data, Splunk software will usually select the correct source type
automatically. If your data is specialized, you might need to manually select a
different predefined source type. If your data is unusual, you might need to create
a new source type with customized event processing settings. And if your data
source contains heterogeneous data, you might need to assign the source type
on a per-event (rather than a per-source) basis.
Like any other field, you can also use the source type field to search event data,
once the data has been indexed. You will use it a lot in your searches since the
source type is a key way to categorize your data.
Any common data input format can be a source type. Most source types are log
formats. For example, some common source types that Splunk software
automatically recognizes include:
access_combined, NCSA combined format http web server logs.
apache_error, the standard Apache web server error log.
cisco_syslog, the standard syslog produced by Cisco network devices.
websphere_core, a core file export from WebSphere.
For a complete list of predefined source types, see List of pretrained source
types in this manual.
There are two basic types of configuration you can do with source types: you
can assign source types explicitly to incoming data, and you can create new
source types.
In most cases, Splunk software determines the best source type for your data
and automatically assigns it to incoming events. In some cases, however, you
might need to explicitly assign a source type to your data. You usually do this
when defining the data input. For details on how to improve source type
assignment, see Override automatic source type assignment in this manual.
Later in this topic, there is a section that explains how Splunk software assigns
source types.
If none of the existing source types fits the needs of your data, create a new one.
Splunk Web lets you adjust source type settings to fit your data. In essence, it is
a visual source type editor. See The Set Sourcetype page.
If you have Splunk Enterprise, you can also create a new source type by directly
editing props.conf and adding a source type stanza. See Create source types. If
you have Splunk Cloud, use Splunk Web to define source types.
Preview data to test and modify source types
Splunk Web lets you review the effects of applying a source type to an input. It
lets you preview the resulting events without actually committing them to an
index. You can also edit timestamp and event breaking settings interactively and
then save the modifications as a new source type. For information on how data
preview functions as a source type editor, see The Set Sourcetype page.
Use the sourcetype field in searches

sourcetype is the name of the source type search field. You can use the
sourcetype field to find similar types of data from any source type. For example,
you could search sourcetype=weblogic_stdout to find all of your WebLogic
server events, even when WebLogic is logging from more than one domain (or
"host," in Splunk terms).
How Splunk software assigns source types

The following list shows how Splunk software goes about determining the source
type for a data input. Splunk software starts with the first method and then
descends through the others as necessary, until it can determine the source
type. The list also provides an overview on how you configure source type
assignment for each level.
If Splunk software finds an explicit source type for the data input, it stops here.
You configure this in inputs.conf or Splunk Web. Here is the inputs.conf syntax
for assigning source types to a file input:
[monitor://<path>]
sourcetype=<sourcetype>
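For example, the following stanza (the path and source type name are
illustrative) assigns an explicit source type to a monitored file:

[monitor:///var/log/myapp/app.log]
sourcetype=my_app_logs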
You can also assign a source type when defining an input in Splunk Web. For
information on doing this for file inputs, see Monitor files and directories with
Splunk Web in this manual. The process is similar for network or other types of
inputs.
If Splunk software finds an explicit source type for the particular source, it stops
here.
[source::<source>]
sourcetype=<sourcetype>
For more information, see Specify source type for a source.
Splunk software looks next for any rules you've created for source types.
[rule::<rule_name>]
sourcetype=<sourcetype>
MORE_THAN_[0-100] = <regex>
LESS_THAN_[0-100] = <regex>
For information about setting up source type recognition rules, see Configure
rule-based source type recognition.
Splunk software next attempts to use automatic source type recognition to match
similar-looking files and assign a source type.
Splunk software calculates signatures for patterns in the first few thousand lines
of any file or network input stream. These signatures identify things like repeating
word patterns, punctuation patterns, line length, and so on. When Splunk
software calculates a signature, it compares it to its set of signatures for known,
"pretrained" source types. If it identifies a match, it assigns that source type to
the data.
See List of pretrained source types in this manual for a list of the source types
that Splunk software can recognize out of the box.
If Splunk software hasn't identified a source type by now, it looks for any delayed
rules.
A good use of delayed rule associations is for generic versions of very specific
source types that were defined earlier with rule:: in step 3, above. For example,
you could use rule:: to catch event data with specific syslog source types, such
as "sendmail syslog" or "cisco syslog" and then have delayedrule:: apply the
generic "syslog" source type to the remaining syslog event data.
[delayedrule::<rule_name>]
sourcetype=<sourcetype>
MORE_THAN_[0-100] = <regex>
LESS_THAN_[0-100] = <regex>
For more information about setting up or removing delayed rules for source type
recognition, see Configure rule-based source type recognition.
If Splunk software is unable to assign a source type for the event using the
preceding methods, it creates a new source type for the event signature (see
step 4, above). Splunk software stores learned pattern information in
sourcetypes.conf.
Override automatic source type assignment

When you define a data input, you can specify what source type to assign to the
data. You can also configure Splunk software so that it assigns a source type
based on either the data input or the data source.
For details on the precedence rules that Splunk software uses to assign source
types to data, read How Splunk software assigns source types.
Overrides only work on file and directory monitoring inputs or files you have
uploaded. You cannot override the source type on network inputs. Additionally,
overrides only affect new data that arrives after you set up the override. To
correct the source types of events that have already been indexed, create a tag
for the source type instead.
This topic describes how to specify a source type for data based on its input
and source.
You can assign the source type for data coming from a specific input, such as
/var/log/. If you have Splunk Enterprise, you do this in Splunk Web or by editing
the inputs.conf configuration file. If you have Splunk Cloud, use Splunk Web to
define source types.
Note: While assigning source type by input seems like a simple way to handle
things, it is not very granular--when you use it, Splunk software assigns the same
source type to all data from an input, even if some of the data comes from
different sources or hosts. To bypass automatic source type assignment in a
more targeted manner, you can assign source types based on the source of the
data, as described later in this topic.
When you define a data input, you can set a source type value to be applied to
all incoming data from that input. You can pick a source type from a list or enter
your own source type value.
To select a source type for an input, change the source type settings for the data
input type you want to add. For example, for file inputs:
4. Click the New button to add an input.
5. In the "Add Data" page, browse or enter the name of the file you want to
monitor, then click "Next".
6. In the "Set Sourcetype" page, click the "Sourcetype" drop-down and choose
from the list of pretrained source types. Splunk Web updates the page to show
how the data looks when it receives the new source type.
7. If you want to make changes to the source type, use the "Event Breaks",
"Timestamp", and "Advanced" tabs to modify settings and refresh the data
preview. See The Set Sourcetype page in this manual.
8. If you want to save the source type under a different name, click Save As to
open a dialog box to save the new source type. Otherwise, proceed to Step 10.
9. If you chose to save the source type, Splunk Web displays the "Save
Sourcetype" dialog. Enter the name, description, category, and app that the
source type should apply to. See Save modifications as a new source type.
10. Click "Next" to set the source type for the data and proceed to the Input
settings" page.
Splunk software now assigns your selected source type to all events it indexes
for that input.
When you configure an input in inputs.conf, you can specify a source type for the
input. Edit inputs.conf in $SPLUNK_HOME/etc/system/local/ or in your own
custom application directory in $SPLUNK_HOME/etc/apps/. For information on
configuration files in general, see About configuration files in the Admin manual.
To specify a source type, include a sourcetype attribute within the stanza for the
input. For example:
[tcp://:9995]
connection_host=dns
sourcetype=log4j
source=tcp:9995
This example sets the source type to "log4j" for any events coming from your
TCP input on port 9995.
Caution: Do not put quotes around the attribute value: sourcetype=log4j, not
sourcetype="log4j".
Use props.conf to override automated source type matching and explicitly assign
a single source type to all data coming from a specific source.
Note: If you forward data, and you want to assign a source type for a source, you
must assign the source type in props.conf on the forwarder. If you do it in
props.conf on the receiver, the override has no effect.
To override source type assignment, add a stanza for your source to props.conf.
In the stanza, identify the source path, using regular expression (regex) syntax
for flexibility if necessary. Then specify the source type by including a sourcetype
attribute. For example:
[source::.../var/log/anaconda.log(.\d+)?]
sourcetype=anaconda
This example sets the source type to "anaconda" for events from any sources
containing the string /var/log/anaconda.log followed by any number of numeric
characters.
For example, never write:
[source::/home/fflanda/...]
sourcetype=mytype
This is dangerous. It tells Splunk software to process any gzip files in
/home/fflanda as "mytype" files rather than gzip files.
Instead, write:
[source::/home/fflanda/....log(.\d+)?]
sourcetype=mytype
Configure rule-based source type recognition
You can use rule-based source type recognition to expand the range of source
types that Splunk software recognizes. In props.conf, you create a rule::
stanza that associates a specific source type with a set of qualifying criteria.
When consuming data, Splunk software assigns the specified source type to file
inputs that meet the rule's qualifications.
You can create two kinds of rules in props.conf: rules and delayed rules. The
only difference between the two is the point at which Splunk software checks
them during the source typing process. As it processes each set of incoming
data, Splunk software uses several methods to determine source types:
After checking for explicit source type definitions based on the data input
or source, Splunk software looks at any rule:: stanzas defined in
props.conf and tries to match source types to the data based on the
classification rules specified in those stanzas.
If Splunk software is unable to find a matching source type using the
available rule:: stanzas, it tries to use automatic source type matching,
where it tries to identify patterns similar to source types it has learned in
the past.
If that method fails, Splunk software then checks any delayedrule::
stanzas in props.conf and tries to match the data to source types using
the rules in those stanzas.
For details on the precedence rules that Splunk software uses to assign source
types to data, read How Splunk software assigns source types.
You can configure your system so that rule:: stanzas contain classification rules
for specialized source types, while delayedrule:: stanzas contain classification
rules for generic source types. That way, Splunk software applies the generic
source types to broad ranges of events that haven't qualified for more specialized
source types. For example, you could use rule:: stanzas to catch data with
specific syslog source types, such as sendmail_syslog or cisco_syslog, and
then configure a delayedrule:: stanza to apply the generic syslog source type
to any remaining syslog data.
Configuration
[rule::<rule_name>] OR [delayedrule::<rule_name>]
sourcetype=<source_type>
MORE_THAN_[0-99] = <regex>
LESS_THAN_[1-100] = <regex>
You set a numerical value in the MORE_THAN and LESS_THAN attributes,
corresponding to the percentage of input lines that must contain the string
specified by the regular expression. For example, MORE_THAN_80 means at least
80% of the lines must contain the associated expression. LESS_THAN_20 means
that less than 20% of the lines can contain the associated expression.
Note: Despite its nomenclature, the MORE_THAN_ attribute actually means "more
than or equal to". Similarly the LESS_THAN_ attribute means "less than or equal
to".
Examples
Postfix syslog files
[rule::postfix_syslog]
sourcetype = postfix_syslog
# If 80% of lines match this regex, then it must be this type
MORE_THAN_80=^\w{3} +\d+ \d\d:\d\d:\d\d .* postfix(/\w+)?\[\d+\]:
Delayed rule for breakable text
# breaks text on ascii art and blank lines if more than 10% of lines have
# ascii art or blank lines, and less than 10% have timestamps
[delayedrule::breakable_text]
sourcetype = breakable_text
MORE_THAN_10 = (^(?:---|===|\*\*\*|___|=+=))|^\s*$
LESS_THAN_10 = [: ][012]?[0-9]:[0-5][0-9]
List of pretrained source types

Automatically recognized source types

access_combined_wcookie: NCSA combined format http web server logs (can be
generated by apache or other web servers), with a cookie field (for example,
"61.3.110.148.1124404439914689") added at the end.
access_common: NCSA common format http web server logs (can be generated by
apache or other web servers).
apache_error: Standard Apache web server error log.
asterisk_cdr: Standard Asterisk IP PBX call detail record.
asterisk_event: Standard Asterisk event log (management events).
asterisk_messages: Standard Asterisk messages log (errors and warnings).
asterisk_queue: Standard Asterisk queue log.
cisco_syslog: Standard Cisco syslog produced by all Cisco network devices
including PIX firewalls, routers, ACS, etc., usually via remote syslog to a
central log host.
db2_diag: Standard IBM DB2 database administrative and error log.
exim_main: Exim MTA mainlog.
exim_reject: Exim reject log.
linux_messages_syslog: Standard linux syslog (/var/log/messages on most
platforms).
linux_secure: Linux secure log.
log4j: Log4j standard output produced by any J2EE server using log4j.
mysqld_error: Standard mysql error log.
mysqld: Standard MySQL query log; also matches the MySQL binary log following
conversion to text.
postfix_syslog: Standard Postfix MTA log reported via the Unix/Linux syslog
facility.
sendmail_syslog: Standard Sendmail MTA log reported via the Unix/Linux syslog
facility.
sugarcrm_log4php: Standard Sugarcrm activity log reported using the log4php
utility.
websphere_trlog_syserr: Standard Websphere system error log in the IBM native
trlog format.
websphere_trlog_sysout: Standard Websphere system out log in the IBM native
trlog format; similar to the log4j server log for Resin and Jboss, same format
as the system error log but containing lower severity and informational events.
windows_snare_syslog: Standard windows event log reported through a 3rd party
Intersect Alliance Snare agent to remote syslog on a Unix or Linux server.

Special source types

known_binary: The filename matches a pattern that is generally known to be a
binary file, not a log file (mp3 files, images, .rdf, .dat, etc.). This is
intended to catch obvious non-text files.
Pretrained source types
These are all the pretrained source types, including both those that are
automatically recognized and those that are not.
Operating systems: osx_crashreporter, osx_crash_log, osx_install, osx_secure,
osx_daily, osx_weekly, osx_monthly, osx_window_server, windows_snare_syslog,
dmesg, ftp, ssl_error, syslog, sar, rpmpkgs
Metrics: collectd_http, metrics_csv, statsd
Network: novell_groupwise, tcp
Printers: cups_access, cups_error, spooler
Routers and firewalls: cisco_cdr, cisco:asa, cisco_syslog, clavister
VoIP: asterisk_cdr, asterisk_event, asterisk_messages, asterisk_queue
Webservers: access_combined, access_combined_wcookie, access_common,
apache_error, iis?
Splunk: splunk_com_php_error, splunkd, splunkd_crash_log, splunkd_misc,
splunkd_stderr, splunk-blocksignature, splunk_directory_monitor,
splunk_directory_monitor_misc, splunk_search_history, splunkd_remote_searches,
splunkd_access, splunkd_ui_access, splunk_web_access, splunk_web_service,
splunkd_conf?, django_access, django_service, django_error, splunk_help, mongod
Non-Log files: csv?, psv?, tsv?, _json?, json_no_timestamp, fs_notification,
exchange?, generic_single_line
Miscellaneous / Other: snort, splunk_disk_objects?, splunk_resource_usage?,
kvstore?
? These source types use the INDEXED_EXTRACTIONS attribute, which sets other
attributes in props.conf to specific defaults, and requires special handling to
forward to another Splunk instance. See Forward data extracted from structured
data files.
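For reference, a sketch of a custom source type stanza in props.conf that uses
this attribute (the stanza name is hypothetical):

[my_csv_data]
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = ,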
To find out what configuration information Splunk software uses to index a given
source type, you can invoke the btool utility to list out the properties. For more
information on using btool, refer to Use btool to troubleshoot configurations in
the Troubleshooting manual.
The following example shows how to list out the configuration for the tcp source
type:

$SPLUNK_HOME/bin/splunk btool props list tcp

Override source types on a per-event basis

You can configure Splunk software to override source types on a per-event basis
during the input process.
For information about configuring basic (not per-event) source type overrides for
event data that comes from specific inputs or that has a particular source, see
Override automatic source type assignment in this manual.
Configuration
transforms.conf
[<unique_stanza_name>]
REGEX = <your_regex>
FORMAT = sourcetype::<your_custom_sourcetype_value>
DEST_KEY = MetaData:Sourcetype
Note the following:
<unique_stanza_name> is a name that you choose for the transform. You use it
later in the props.conf stanza that references this transform.
<your_regex> is a regular expression that identifies the events that should
receive the new source type.
FORMAT = sourcetype::<your_custom_sourcetype_value> writes the new value into
the sourcetype field.
props.conf
[<spec>]
TRANSFORMS-<class> = <unique_stanza_name>
Note the following:
<spec> can be <sourcetype>, host::<host>, or source::<source>.
<class> is any unique identifier that you want to give to your transform.
<unique_stanza_name> must match the name of the transforms.conf stanza that
you created.

Example
Let's say that you have a shared UDP input, "UDP514". Your Splunk deployment
indexes a wide range of data from a number of hosts through this input. You've
found that you need to apply a particular source type called "my_log" to data
originating from three specific hosts (host1, host2, and host3) reaching your
Splunk deployment through UDP514.
To start, you can use the regular expression that Splunk software typically uses
to extract the host field for syslog events. You can find it in
system/default/transforms.conf:
[syslog-host]
REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s
FORMAT = host::$1
DEST_KEY = MetaData:Host
You can easily modify this regular expression to only match events from the
hostnames you want (in this example, host1, host2, and host3):
REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(host1|host2|host3)[\w\.\-]*\]?\s
Now you can use the modified regular expression in a transform that applies the
my_log source type to events that come from those three hosts:
[set_sourcetype_my_log_for_some_hosts]
REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(host1|host2|host3)[\w\.\-]*\]?\s
FORMAT = sourcetype::my_log
DEST_KEY = MetaData:Sourcetype
Then you can specify that transform in a props.conf stanza that identifies the
specific input for the events:
[source::udp:514]
TRANSFORMS-changesourcetype = set_sourcetype_my_log_for_some_hosts
Use the "Set Sourcetype" page in Splunk Web as part of adding the data.
Create a source type in the "Source types" management page, as
described in Add source type.
Edit the props.conf configuration file directly.
The "Set Sourcetype" page in Splunk Web provides an easy way to view the
effects of applying a source type to your data and to make adjustments to the
source type settings as necessary. You can save your changes as a new source
type, which you can then assign to data inputs.
The page lets you make the most common types of adjustments to timestamps
and event breaks. For other modifications, it lets you edit the underlying
props.conf file directly. As you change settings, you can immediately see the
changes to the event data.
The page appears only when you specify or upload a single file. It does not
appear when you specify any other type of source.
To learn more about the page, see The "Set Sourcetype" page in this manual.
You can use the "Source types" management page to create a new source type.
See Add source type in this manual.
Edit props.conf
If you have Splunk Enterprise, you can create a new source type by editing
props.conf and adding a new stanza. For detailed information on props.conf,
read the props.conf specification in the Admin manual. For information on
configuration files in general, see About configuration files in the Admin manual.
[access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
category = Web
description = National Center for Supercomputing Applications (NCSA)
combined format HTTP web server logs (can be generated by apache or other
web servers)
You can also apply an existing source type to a specific source:
[source::/opt/weblogs/apache.log]
sourcetype = iis
To edit props.conf, add a stanza with the name of the new source type and the
attributes that define it:
[my_sourcetype]
attribute1 = value
attribute2 = value
Note: See the props.conf specification for a list of attributes and how they
should be used.
4. (Optional) If you know the name of the file (or files) to which the source
type is to be applied, specify them in the [source::<source>] stanza:
[my_sourcetype]
attribute1 = value
attribute2 = value
[source::.../my/logfile.log]
sourcetype = my_sourcetype
When you create a source type, there are some key attributes that you should
specify, such as the attributes that control event line breaking and timestamp
recognition.
There are also a number of additional settings that you can configure. See the
props.conf specification for more information.
Manage source types

The Source Types page displays all source types that have been configured on
the instance. It shows the default source types provided by your Splunk
deployment and any source types that you have added.
Each header bar (except for "Actions") acts as a toggle. Click once to sort in
ascending order and click again to sort in descending order.
You can filter the number of source types you see on the Source Types
management page.
To see only the most common source types, click the Show only popular
checkbox along the top of the page. Popular source types are the most
common source types that customers use. They have a pulldown_type
source type field value of 1. When the "Show only popular" checkbox is
not selected, the page shows all source types that have been defined on
the instance.
To see only source types that belong to a certain category, click the
Category drop-down and select the category you want. Only source types
that belong to that category will display. To see all source types again,
select "All" from the "Category" drop-down.
To see only source types that belong in a certain application context, click
the App drop-down and select the application context that the source type
applies to. Only source types that apply to that application context will
display. To see all source types again, select "All" from the "App"
drop-down.
To see only source types whose names contain a certain string, type that
string in the Filter text box next to the "App" drop-down, then press Enter.
Only source types whose names or descriptions match what you have
typed in the "Filter" box display. To se all source types again, click the "x"
button on the right side of the "Filter" text box.
To modify a source type, click its name in the list, or click its Edit link in the
Actions column. The "Edit Source Type" page appears.
The Edit Source Type dialog box lets you change the configuration of a source
type. You can change the following:
Description: Type in the description of the source type in the "Description" field.
Destination app: The application context that the source type applies to.
Note: You cannot change the app destination for source types that come with
Splunk software.
Category: The category that the source type is a member of. Click the button to
select from the list of categories and choose the one you want. When you save,
the source type appears in the category you selected.
Indexed Extractions: A format for extracting fields at index time from files with
structured data. Select the type of indexed extraction that best represents the
contents of the file:
Timestamp:
The Timestamp section of the dialog controls how timestamps are determined for
events from the source file.
The following advanced configurations are available when you select Advanced
in the "Timestamp Extraction" section:
For example, a timestamp like 20 Aug 2017 10:45:31 maps to the format string
%d %b %Y %H:%M:%S
Another example: a timestamp like Mon Aug 21 07:13:05 PM 2017 maps to
%a %b %d %I:%M:%S %p %Y
For a list of the strings that you can use to define the time stamp format, see
strptime(3) (http://linux.die.net/man/3/strptime) on the die.net Linux man page
site.
Advanced
The Advanced section of the dialog shows you all of the configurations for the
source type, in key/value format. This represents what is in the props.conf file
that defines the source type. You can edit each setting directly, or add and delete
settings. To delete settings, click the "x" on the right side of each setting. To add
an entry, click the "New setting" link at the bottom of the dialog. This exposes a
key/value pair of fields. Enter the key name in the "Name" field and its value in
the "Value" field.
Caution: Use the "Advanced" section with care. Adding or changing values here
can cause data to be incorrectly indexed.
To create a new source type, click the New Source Type button at the top right
of the screen. The Create Source Type dialog box opens.
This dialog is exactly the same as the "Edit Source Type" dialog. See Manage
source types for information on the controls in the dialog.
When you have finished configuring the source type, click "Save."
To delete a source type, click the Delete link in the "Actions" column for the
source type that you want to delete. You cannot delete built-in source types, only
source types that you create or that come with apps.
Deleting a source type has the following consequences:
Data can be indexed incorrectly after you delete the source type. Making
the data searchable in the way you want later can take a lot of effort. Many
apps and add ons use source types to look for data, and data indexed
under a missing source type is data those apps and add-ons do not see.
Any configurations that the source type uses, such as field extractions,
index time filtering, and time stamp formats, are irretrievably lost.
You cannot undo a source type deletion. The only options available in this
case are to restore the props.conf file that defines the source type from a
backup, or recreate the source type manually.
If you are sure you want to delete the source type, click "Delete". The dialog
closes and Splunk Web returns you to the Source Types management page.
Rename source types

If you have Splunk Enterprise, you can use the rename attribute in props.conf to
assign events to a new source type at search time. In case you ever need to
search on it, the original source type is moved to a separate field, _sourcetype.
Note: The indexed events still contain the original source type name. The
renaming occurs only at search time. Also, renaming the source type does only
that; it does not fix any problems with the indexed format of your event data
caused by assigning the wrong source type in the first place.
To rename the source type, add the rename attribute to your source type stanza:
rename = <string>
Note: A source type name can only contain the letters a through z, the numerals 0
through 9, and the _ (underscore) character.
For example, say you're using the source type "cheese_shop" for your
application server. Then, accidentally, you index a pile of data as source type
"whoops". You can rename "whoops" to "cheese_shop" with this props.conf
stanza:
[whoops]
rename=cheese_shop
Now, a search on "cheese_shop" will bring up all the "whoops" events as well as
any events that had a "cheese_shop" source type from the start:
sourcetype=cheese_shop
If you ever need to single out the "whoops" events, you can use _sourcetype in
your search:
_sourcetype=whoops
Important: Data from a renamed source type will only use the search-time
configuration for the target source type ("cheese_shop" in this example). Any
field extractions for the original source type ("whoops" in the example) will be
ignored.
Manage event segmentation
You can define how detailed the event segmentation should be. This is important
because index-time segmentation affects indexing and search speed, storage
size, and the ability to use typeahead functionality (where Splunk Web provides
items that match text you type into the Search bar). Search-time segmentation,
on the other hand, affects search speed and the ability to create searches by
selecting items from the results displayed in Splunk Web.
For more information about the distinction between "index time" and "search
time," see "Index time versus search time" in the Managing Indexers and
Clusters manual.
Inner segmentation breaks events into the smallest minor segments possible.
For example, under inner segmentation the IP address 192.0.2.223 gets
segmented into 192, 0, 2, and 223. Setting inner segmentation at index time leads to faster
indexing and searching and reduced disk usage. However, it restricts the
typeahead functionality, so that a user can only type ahead at the minor
segment level.
Outer segmentation is the opposite of inner segmentation. Under outer
segmentation, Splunk software only indexes major segments. For
example, the IP address 192.0.2.223 gets indexed as 192.0.2.223, which
means that you cannot search on individual pieces of the phrase. You can
still use wildcards, however, to search for pieces of a phrase. For
example, you can search for 192.0* and you will get any events that have
IP addresses that start with 192.0. Also, outer segmentation disables the
ability to click on different segments of search results, such as the 192.0
segment of the same IP address. Outer segmentation tends to be
marginally more efficient than full segmentation, while inner segmentation
tends to be much more efficient.
Full segmentation is a combination of inner and outer segmentation.
Under full segmentation, the IP address is indexed both as a major
segment and as a variety of minor segments, including minor segment
combinations like 192.0 and 192.0.2. This is the least efficient indexing
option, but it provides the most versatility in terms of searching.
Under no segmentation, Splunk software does not create searchable
segments. This is the most space-efficient option, but it restricts
searches to indexed fields such as time, source, host, and source type.
Important: Do not modify the default file. If you want to make changes to the
existing segmentation stanzas or create new ones altogether, you can copy the
default file to $SPLUNK_HOME/etc/system/local/ or to a custom app directory in
$SPLUNK_HOME/etc/apps/. For information on configuration files and directory
locations, see "About configuration files".
For details about how to apply segmentation types to specific event categories,
see "Set the segmentation for event data".
Splunk software can also segment events at search time. You can set
search-time segmentation in Splunk Web, as described in "Set search-time
segmentation in Splunk Web".
If you know how you want to search for or process events from a specific host,
source, or source type, you can configure index-time segmentation for that
specific type of event. You can also configure search-time segmentation options
for specific types of events.
You configure index-time and search-time segmentation in props.conf. Within props.conf
stanzas, you assign segmentation types, or "rules", that have been defined in
segmenters.conf. These can either be predefined types (such as inner, outer, or
full), or custom types that you've defined. For more information on defining
custom types, read "Configure segmentation types".
The attribute you configure in props.conf to use these types depends on whether
you're configuring index-time or search-time segmentation:
You can define either one of the attributes or both together in the stanza.
Index-time segmentation
The SEGMENTATION attribute determines the segmentation type used at index time.
Here's the syntax:
[<spec>]
SEGMENTATION = <seg_rule>
SEGMENTATION = <seg_rule>
This specifies the type of segmentation to use at index time for [<spec>]
events.
<seg_rule>
A segmentation type, or "rule", defined in segmenters.conf
Common settings are inner, outer, none, and full, but the default
file contains other predefined segmentation rules as well.
Create your own custom rule by editing
$SPLUNK_HOME/etc/system/local/segmenters.conf, as described in
"Configure segmentation types".
Search-time segmentation
The SEGMENTATION-<segment_selection> attribute determines the segmentation
type used at search time. Here's the syntax:
[<spec>]
SEGMENTATION-<segment_selection> = <seg_rule>

SEGMENTATION-<segment_selection> = <seg_rule>
This specifies the type of segmentation to use at search time for [<spec>]
events. <segment_selection> can be one of full, inner, outer, or raw.
These values correspond to the options in the Event segmentation dropdown
box in Splunk Web.
<seg_rule>
A segmentation type, or "rule", defined in segmenters.conf
Common settings are inner, outer, none, and full, but the default
file contains other predefined segmentation rules as well.
Create your own custom rule by editing
$SPLUNK_HOME/etc/system/local/segmenters.conf, as described in
"Configure segmentation types".
Example
This example sets both index-time and search-time segmentation rules for
syslog events.
[syslog]
SEGMENTATION = inner
SEGMENTATION-full = inner
This stanza changes the index-time segmentation for all events with a syslog
source type to inner segmentation. It also causes the full radio button in Splunk
Web to invoke inner segmentation for those same events.
Set search-time segmentation in Splunk Web

In the Event Segmentation dropdown box, choose from the available options:
full, inner, outer, or raw. The default is "full".
You can configure the meaning of these dropdown options, as described in "Set
the segmentation for event data".
Improve the data input process
If you find that the inputs you started with are not the ones you want, or that the
indexed events don't appear the way you need them to, you can keep working
with the test index until you get results you like. When things start looking good,
you can edit the inputs to point to your main index instead.
You can preview how Splunk software will index your data into a test index.
During preview, you can adjust some event processing settings interactively. See
"The "Set Sourcetype" page" for details.
To learn how to create and use custom indexes, read "Create custom indexes" in
the Managing Indexers and Clusters manual. There are a few basic steps,
described in detail in that topic:
1. Create the test index, using Splunk Web, or, if you have Splunk Enterprise,
using the CLI or by editing indexes.conf directly. See "Create custom indexes"
for details.
2. When configuring the data inputs, route events to the test index. You can
usually do this in Splunk Web. For each input:
a. When configuring the input from the Add data page, check the More settings
option. It reveals several new fields, including one called Index.
b. In the Index dropdown box, select your test index. All events for that data
input will now go to that index.
c. Repeat this process for each data input that you want to send to your test
index.
You can also specify an index when configuring an input in inputs.conf, as
described here.
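For example, a monitor stanza that routes events to a hypothetical test index
might look like this:

[monitor:///var/log/myapp.log]
index = test_index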
3. When you search, specify the test index in your search command. (By default,
Splunk software searches the "main" index.) Use the index= specifier:
index=test_index
Note: When searching a test index for events coming in from your newly created
input, use the Real-time > All time (real-time) time range for the fields sidebar.
The resulting real-time search will show all events being written to that index
regardless of the value of their extracted time stamp. This is particularly useful if
you are indexing historical data into your index that a search for "Last hour" or
"Real-time > 30 minute window" would not show.
If you want to clean out your test index and start over again, use the CLI clean
command, described here.
Once you're satisfied with the results and are ready to start indexing for real,
you'll want to edit your data inputs so that they point to the default, "main" index,
instead of the test index. This is a simple process, just the reverse of the steps
you took to use the test index in the first place. For each data input that you've
already set up:
1. Go back to the place where you initially configured the input. For example, if
you configured the input from the Add data page in Splunk Web, return to the
configuration screen for that input:
a. Click Settings > Data inputs.
b. Select the input's data type to see a list of all configured inputs of that type.
c. Select the specific data input that you want to edit. This will take you to a
screen where you can edit it.
d. Select the Display advanced settings option. Go to the field named Index.
e. In the Index dropdown box, select the main index. All events for that data
input will now go to that index.
If you instead used inputs.conf to configure an input, you can change the index
directly in that file, as described here.
2. Now when you search, you no longer need to specify an index in your search
command. By default, Splunk software searches the "main" index.
Use persistent queues to help prevent data loss

Splunk software reads incoming data into an in-memory input queue before
processing it. If data arrives faster than it can be processed, the queue can
fill up and incoming data can be lost. By implementing persistent queues, you
can help prevent this from happening.
With persistent queuing, once the in-memory queue is full, the forwarder or
indexer writes the input stream to files on disk. It then processes data from the
queues (in-memory and disk) until it reaches the point when it can again start
processing directly from the data stream.
Note: While persistent queues help prevent data loss if processing gets backed
up, you can still lose data if Splunk software crashes. For example, Splunk
software holds some input data in the in-memory queue as well as in the
persistent queue files. The in-memory data can get lost if a crash occurs.
Similarly, data that is in the parsing or indexing pipeline but that has not yet been
written to disk can get lost in the event of a crash.
Persistent queuing is available for certain types of inputs, but not all. Generally
speaking, it is available for inputs of an ephemeral nature, such as network
inputs, but not for inputs that have their own form of persistence, such as file
monitoring.

Persistent queues are available for the following input types:
TCP
UDP
FIFO
Scripted inputs
Windows Event Log inputs
Persistent queues are not available for the following input types:
Monitor
Batch
File system change monitor
splunktcp (input from Splunk forwarders)
Inputs do not share queues. You configure a persistent queue in the stanza for
the specific input.
Syntax
To create the persistent queue, specify these two attributes within the particular
input's stanza:
queueSize = <integer>(KB|MB|GB)
* Max size of the in-memory input queue.
* Defaults to 500KB.
persistentQueueSize = <integer>(KB|MB|GB|TB)
* Max size of the persistent queue file on disk.
* Defaults to 0 (no persistent queue).
Example
[tcp://9994]
persistentQueueSize=100MB
Persistent queue location
The persistent queue has a hardcoded location, which varies according to the
input type.
$SPLUNK_HOME/var/run/splunk/[tcpin|udpin]/pq__<port>
Note: There are two underscores in the file name: pq__<port>, not pq_<port>.
For example, the persistent queue for a TCP input on port 9994 resides at
$SPLUNK_HOME/var/run/splunk/tcpin/pq__9994.
Troubleshoot the input process

When you add an input to your Splunk deployment, that input gets added relative
to the app you are in. Some apps write input data to a specific index. If you
cannot find data that you are certain is in your Splunk deployment, confirm that
you are looking at the right index. See Retrieve events from indexes in the
Search Manual. You might want to add indexes to the list of default indexes for
the role you are using.
For more information about roles, refer to the topic about roles in the
Securing Splunk Enterprise manual.
For more information about troubleshooting data input issues, read the
rest of this topic or see I can't find my data! in the Troubleshooting Manual.
Note: If you have Splunk Enterprise and add inputs by editing inputs.conf, the
inputs might not be recognized immediately. Splunk Enterprise looks for inputs
every 24 hours, starting from the time it was last restarted, so if you add a new
stanza to monitor a directory or file, it could take up to 24 hours for Splunk
Enterprise to start indexing the contents of that directory or file. To ensure that
your input is immediately recognized and indexed, add the input through Splunk
Web or the CLI, or restart Splunk services after making edits to inputs.conf.
You can use the FileStatus Representational State Transfer (REST) endpoint to
get the status of your tailed files. For example:
curl https://serverhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus
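In practice, the endpoint requires authentication, and you might need to skip certificate verification if the instance uses the default self-signed certificate. A sketch with placeholder credentials:

curl -k -u admin:changeme https://serverhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus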
You can also monitor the fishbucket, a subdirectory used to keep track of how
much of a file's contents has been indexed. In Splunk Enterprise deployments,
the fishbucket resides at $SPLUNK_DB/fishbucket/splunk_private_db. In Splunk
Cloud deployments you do not have physical access to this subdirectory.
To monitor the fishbucket, use the REST endpoint. Review the REST API
Reference manual for additional information.
Confirm that the forwarder functions properly and is visible to the indexer. You
can use the Distributed Management Console (DMC) to troubleshoot Splunk
topologies and get to the root of any forwarder issues. Read Monitoring Splunk
Enterprise for details.
Line breaking issues
Problem
Indicators that you have line breaking issues include the following:

- You have fewer events than you expect and the events are very large, especially if your events should be single-line events.
- Line breaking issues are present in the Monitoring Console Data Quality dashboard.
- The Splunk Web data input workflow or splunkd.log shows a line breaking error message, such as the truncation message described later in this topic.
Diagnosis
To confirm that your Splunk software has line breaking issues, do one or more of
the following:
- Search for events. Multiple events combined, or a single event broken into many, indicates a line breaking issue.
Solution
1. In Splunk Web, select Settings > Add Data.
2. Select Upload.
3. Select a file with a sample of your data.
4. Click Next.
5. On the Set Source Type page, work with the options on the left until your
sample data is correctly broken into events. To configure LINE_BREAKER
or TRUNCATE, click Advanced.
6. Complete the data input workflow or record the correct settings and use
them to correct your existing input configurations.
While you are working with the options on the Set Source Type page, the
LINE_BREAKER setting might not be properly set. LINE_BREAKER must include a
capturing group, and the pattern must actually match the boundaries between
events in your data.

For example, the value of LINE_BREAKER might never match your data. In that
case, look for messages like "Truncating line because limit of 10000 bytes has
been exceeded" in splunkd.log or in Splunk Web.

If you see this message, make sure that TRUNCATE is set large enough to contain
the entire data fragment delimited by LINE_BREAKER. The default value for
TRUNCATE is 10,000 bytes. If your events are larger than the TRUNCATE value,
increase the value of TRUNCATE. For performance and memory usage reasons,
do not set TRUNCATE to 0 (unlimited).
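Once you have settings that break your sample correctly, you can record them in props.conf. A minimal sketch, assuming a hypothetical sourcetype whose raw events each start with an ISO date (adjust the regular expression and sizes to your data):

[my_custom_sourcetype]
# Break events at the line terminator that precedes a date such as 2017-11-22;
# the capturing group marks the text discarded between events
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# LINE_BREAKER alone defines the event boundaries, so skip line merging
SHOULD_LINEMERGE = false
# Allow lines up to 20,000 bytes before truncation
TRUNCATE = 20000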
See Configure event line breaking.
Event breaking issues
Problem
Multiple events are merged into a single event, or single events are split across several events.
Diagnosis
To confirm that your Splunk software has event breaking issues, do one or more
of the following:
For line and event breaking, determine whether this is happening because either
(1) your events are properly recognized but too large for the limits in place
(MAX_EVENTS, which defines the maximum number of lines in an event), or (2)
your events are not properly recognized.
If the cause is scenario 1, you can increase the limits. But be aware that large
events are not optimal for indexing performance, search performance, or
resource usage, and they can be costly to search. At the default values, the two
limits allow up to 10,000 bytes per line, as defined by TRUNCATE, times 256
lines, as set by MAX_EVENTS, which already permits events of roughly 2.5 MB.
The combination of those two limits is a very large event.
If the cause is scenario 2, which is more likely, your Splunk software is not
breaking events as it should. Check the following:
- Your event breaking strategy. The default is to break before the date, so if Splunk software does not extract a time stamp, it does not break the event. To diagnose and resolve, investigate time stamp extraction. See How timestamp assignment works.
- Your event breaking regex (see the sketch after this list).
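If you use explicit event breaking settings, a minimal props.conf sketch, assuming a hypothetical sourcetype whose events begin with a bracketed date (the pattern and values are illustrative):

[my_custom_sourcetype]
# Merge raw lines into multi-line events
SHOULD_LINEMERGE = true
# Break only before a header such as [2017-11-22
BREAK_ONLY_BEFORE = ^\[\d{4}-\d{2}-\d{2}
# Raise the line limit if legitimate events exceed the default of 256 lines
MAX_EVENTS = 512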
Time stamping issues
Problem
Events show incorrect or missing time stamps, so searches over specific time ranges do not return the events you expect.
Diagnosis
To confirm that you have a time stamping issue, do one or more of the following:
Solution
- Make sure that each event has a complete time stamp, including a year, full date, full time, and a time zone.
- See Configure timestamp recognition for additional possible resolution steps.
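If events carry a time stamp that Splunk software does not recognize on its own, you can describe it explicitly in props.conf. A minimal sketch, assuming a hypothetical sourcetype whose events start with a time stamp like 2017-11-22 11:10:00 +0000:

[my_custom_sourcetype]
# The time stamp starts at the beginning of the event
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S %z
# Look no further than 25 characters into the event for the time stamp
MAX_TIMESTAMP_LOOKAHEAD = 25
# Fall back to UTC for events whose time stamps lack time zone information
TZ = UTC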