Statistical Processing Lab Exercises

Overview
Welcome to the Splunk Education lab environment. These lab exercises will test your knowledge of common
transforming commands, modifying results with the eval command and formatting data.

Scenario
You will use data from the international video game company, Buttercup Games. A list of source types is
provided below.

NOTE: This is a lab environment driven by data generators with obvious limitations. This is not a
production environment. Screenshots approximate what you should see, not the exact output.

Index     Type                         Sourcetype                  Interesting Fields

web       Online sales                 access_combined             action, bytes, categoryId, clientip, itemId,
                                                                   JSESSIONID, price, productId, product_name,
                                                                   referer, referer_domain, sale_price, status,
                                                                   user, useragent

security  Badge reader                 history_access              Address_Description, Department, Device, Email,
                                                                   Event_Description, First_Name, last_Name, Rfid,
                                                                   Username

          Active Directory             winauthentication_security  LogName, SourceName, EventCode, EventType, User

          Web server                   linux_secure                action, app, dest, process, src_ip, src_port,
                                                                   user, vendor_action

sales     Retail sales                 vendor_sales                categoryId, product_name, productId,
                                                                   sale_price, Vendor, VendorCity, VendorCountry,
                                                                   VendorID, VendorStateProvince

network   Web security appliance data  cisco_wsa_squid             action, cs_method, cs_mime_type, cs_url,
                                                                   cs_username, sc_bytes, sc_http_status,
                                                                   sc_result_code, severity, src_ip, status, url,
                                                                   usage, x_mcafee_virus_name, x_wbrs_score,
                                                                   x_webcat_code_abbr

          Firewall data                cisco_firewall              bcg_ip, dept, Duration, fname, IP, lname,
                                                                   location, rfid, splunk_role, splunk_server,
                                                                   Username

games     Game logs                    SimCubeBeta                 date_hour, date_mday, date_minute, date_month,
                                                                   date_second, date_wday, date_year, date_zone,
                                                                   eventtype, index, linecount, punct,
                                                                   splunk_server, timeendpos, timestartpos

© 2023 Splunk Inc. All rights reserved. Statistical Processing 22 September 2023 1
Common Commands and Functions
These commands and statistical functions are commonly used in searches but may not have been explicitly
discussed in the course. Please use this table for quick reference. Click on the hyperlinked SPL to be taken to
the Search Manual for that command or function.
SPL: sort
Type: command
Description: Sorts results in descending or ascending order by a specified field. Can limit results to a specific number.
Example: Sort the first 100 src_ip values in descending order
| sort 100 -src_ip

SPL: where
Type: command
Description: Filters search results using eval-expressions.
Example: Return events with a count value greater than 30
| where count > 30

SPL: rename
Type: command
Description: Renames one or more fields.
Example: Rename SESSIONID to 'The session ID'
| rename SESSIONID as "The session ID"

SPL: fields
Type: command
Description: Keeps (+) or removes (-) fields from search results.
Example: Remove the host field from the results
| fields - host

SPL: stats
Type: command
Description: Calculates aggregate statistics over the results set.
Example: Calculate the total sales, i.e. the sum of price values
| stats sum(price)

SPL: eval
Type: command
Description: Calculates an expression and puts the resulting value into a new or existing field.
Example: Concatenate first_name and last_name values with a space to create a field called "full_name"
| eval full_name=first_name." ".last_name

SPL: table
Type: command
Description: Returns a table.
Example: Output vendorCountry, vendor, and sales values to a table
| table vendorCountry, vendor, sales

SPL: sum()
Type: statistical function
Description: Returns the sum of the values of a field. Can be used with stats, timechart, and chart commands.
Example: Calculate the sum of the bytes field
| stats sum(bytes)

SPL: count or count()
Type: statistical function
Description: Returns the number of occurrences of all events or a specific field. Can be used with stats, timechart, and chart commands.
Example: Count all events as "events" and count all events that contain a value for action as "action"
| stats count as events, count(action) as action

Refer to the Search Reference Manual for a full list of commands and functions.
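
The difference between count and count(action) in the last entry can be easy to miss. The following is a minimal Python sketch of the idea, not SPL and not how Splunk implements it: the function name count_stats and the sample events are hypothetical, made up only for illustration.

```python
def count_stats(events):
    """Return (events, action): all events, and events that carry an action value."""
    total = len(events)  # analogous to: count as events
    # analogous to: count(action) as action (only events where the field is present)
    with_action = sum(1 for e in events if e.get("action") is not None)
    return total, with_action


# Hypothetical sample: the third event has no action field.
sample = [{"action": "purchase"}, {"action": "view"}, {"status": 200}]
counts = count_stats(sample)
```

Here the two numbers differ because one event has no action value, which is why the two counts are given separate names in the SPL example.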
Lab Exercise 1 – Transforming Data
Description
Configure the lab environment user account. Then, transform data using the chart, timechart, top, and
stats commands.

Steps
Task 1: Log into Splunk and change the account name and time zone.

Set up your lab environment to fit your time zone. This also allows the
instructor to track your progress and assist you if necessary.
1. Log into your Splunk lab environment using the username and
password provided to you.
2. You may see a pop-up window welcoming you to the lab environment.
You can click Continue to Tour but this is not required. Click Skip to
dismiss the window.
3. Click on the username you logged in with (at the top of the screen) and
then choose Account Settings from the drop-down menu.
4. In the Full name box, enter your first and last name.
5. Click Save.
6. Reload your browser to reflect the recent changes to the interface.
After you complete step 6, you will see your name in the web interface.
(This area of the web interface will be referred to as user name.)

NOTE: Sometimes there can be delays in executing an action like saving in the UI or returning results
of a search. If you are experiencing a delay, please allow the UI a few minutes to execute
your action.

7. Navigate to user name > Preferences.


8. Choose your local time zone from the Time zone drop-down menu.
9. Click Apply.
10. (Optional) Navigate to user name > Preferences > SPL Editor > Search auto-format and click on the
toggle to activate auto-formatting. Then click Apply. When the pipe character is used in search, the SPL
Editor will begin the pipe on a new line.

Search auto-format disabled (default)

Search auto-format enabled

Scenario: The Network team wants to add a dashboard panel that displays internet usage over the last
24 hours.

Task 2: Complete a search with the timechart command to create a multi-series visualization.

11. In the top left corner of Splunk Web, select Apps > Search & Reporting. This sets our app context to the
search app.
12. Count usage events from the web security appliance data by completing the <missing> portion of the
search with the timechart command. Run the search over the Last 24 hours.

index=network sourcetype=cisco_wsa_squid
| <missing>

13. Visualize results as a Line Chart.

14. Save your search as a report with the name L1S1.


a. Click Save As > Report
b. For Title, enter L1S1.
c. Save.
d. You can View your report or exit out of the Your Report Has Been Created window by clicking
the X in the upper-right corner.
e. You can access your saved reports using the Reports tab in the application bar.

Your recently saved L1S1 report will be visible in the Reports tab.

Scenario: Security wants to add a dashboard panel that displays the top 10 IPs associated with
"Accepted" and "Failed" events on the web server.

Task 3: Complete a search with the chart command to create a multi-series visualization.

15. Re-initialize the search window by clicking Search in the application bar. This step should be done every
time you save a report so that you do not accidentally overwrite a previous report.
16. Complete the <missing> portion of the search with the chart command so that the output displays a
count of events for each vendor_action value by src_ip. Run the search over the Last 24 hours.

index=security sourcetype=linux_secure vendor_action!="session opened"
| <missing>

17. Revise the chart command so that only the top 10 src_ip values are shown and there is no
OTHER column.

18. Navigate to the Visualization tab and view your results as a Column Chart.

19. Save your search as a report with the name L1S2.

Scenario: Sales and Marketing want to know the two most popular referrer domains our website users
are coming from.

Task 4: Use the top command to identify which domains website visitors are using.

20. This search finds all events from online sales data. However, Sales is only interested in external domains.
Edit this search so that it returns only events where the referer_domain (the domain of the website where a
visitor clicked the link that led to the http request for a specific product) is not
http://www.buttercupgames.com. Run the search over the Last 30 days.

index=web sourcetype=access_combined

The values of referer_domain before
removing http://www.buttercupgames.com

The values of referer_domain after
removing http://www.buttercupgames.com

21. Use the top command to generate a table that shows the top 2 referer_domain values, the number of
events associated with each of these values, and a percentage of events where these values occur.

22. Edit your search to remove the percent column.

NOTE: Step 23 is optional and can be skipped. Continue to step 24 to save your search as a report.

23. Visualize your results as a Pie Chart.

24. Save your search as a report with the name L1S3.

Scenario: Facilities needs to know how many people are accessing the Buttercup Games offices daily.

Task 5: Use the stats command to count badge swipes at the Buttercup Games offices in San
Francisco, Boston, and London.

25. Search the badge reader data (index=security sourcetype=history_access) over the Last 24 hours.

26. Investigate the data and the fields in the Interesting Fields list. Find the field that contains the office
location values, e.g. "San Francisco", "Boston", and "London." Use the stats command to count events
by this field.

27. Revise your stats command to display a distinct count of Username values by office location.

28. Rename the count field to "Badged-in Employees."

29. Visualize your results as a Bar Chart.

30. Save your search as a report with the name L1S4.
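
Steps 26 and 27 differ in what they count: every badge swipe versus each person once. A small Python sketch of the distinct-count idea follows. It is an illustration only, not the SPL answer; the key "office" is a hypothetical stand-in for the location field you are asked to find, and the usernames are invented.

```python
from collections import defaultdict


def distinct_users_by_office(events):
    """Count unique Username values per office (hypothetical 'office' key)."""
    users = defaultdict(set)
    for e in events:
        # A set keeps each username once, no matter how many swipes it has.
        users[e["office"]].add(e["Username"])
    return {office: len(names) for office, names in users.items()}


# Invented sample: akim badges in twice in Boston but is counted once.
swipes = [
    {"office": "Boston", "Username": "akim"},
    {"office": "Boston", "Username": "akim"},
    {"office": "Boston", "Username": "bdiaz"},
    {"office": "London", "Username": "cwong"},
]
badged_in = distinct_users_by_office(swipes)
```

A plain event count would report three swipes for Boston, while the distinct count reports two people.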

OPTIONAL TASK 1: Security wants to identify the types of content employees are viewing while on the
network. Specifically, they want to know the rare content types as these can potentially be malicious.
Use the rare command to identify uncommon content types employees are accessing while on the
internal network.

31. Search security web appliance events (index=network sourcetype=cisco_wsa_squid) and find the 3
most uncommon cs_mime_type, i.e. media type, values. Run your search over the Last 24 hours.

32. Save your search as a report with the name L1X1.

OPTIONAL TASK 2: Sales wants to know the 5 best-selling products for North American vendors over
the previous week. Complete a search with the chart command to create a multi-series visualization.

33. Complete the <missing> portion of this search with the chart command so that the output displays a
count of events for each VendorCountry. Run the search over the Previous week. (Note: The basic
search contains VendorID<4000 because in our environment the VendorIDs for North American countries
are 1000 – 2999 for USA and 3000 – 3999 for Canada.)

index=sales sourcetype=vendor_sales VendorID<4000
| <missing>

34. Split your data by product_name to see a count of each product sold in USA and Canada.

35. Finally, edit the chart command so that only the top 5 best-selling products are displayed without an
OTHER category.

36. Visualize your results as a Column Chart. Use the Format tab to add custom X and Y-axis labels:
a. X-Axis > Title: Choose Custom and then enter: North American Countries
b. Y-Axis > Title: Choose Custom and then enter: Volume

37. Save your search as a report with the name L1X2.

Lab Exercise 2 – Manipulating Data with eval Command
Description
Use eval functions to manipulate search results.

Steps
Scenario: Sales wants to know the total events, average price, and total price for each action
performed by visitors to the online store during the previous week.

Task 1: Use the stats command and the eval command to transform and manipulate event data.

1. Create a search that performs the following calculations and modifications on online sales data
(index=web sourcetype=access_combined) over the Previous week. Your search, including the basic
search, should be between 4 and 9 lines long.
a. Calculate the total events by action.
b. Calculate the average price and sum of price by each action.
c. Rename the count, average, and sum fields as "Total Events", "Average Price", and "Total
Amount", respectively.
d. Round Total Amount and Average Price values to two decimal places.
e. Sort Total Amount in descending order.

2. Save your search as a report with the name L2S1.
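
The sequence in step 1 (aggregate by action, rename, round, sort) can be sketched in Python to show the order of operations. This is an illustration under assumptions, not the SPL answer: the function name summarize_by_action and the sample prices are hypothetical, and events without a price are assumed to count toward the total events but not toward the average or sum.

```python
from collections import defaultdict


def summarize_by_action(events):
    """Group events by action, then compute count, average price, and total price."""
    counts = defaultdict(int)
    prices = defaultdict(list)
    for e in events:
        counts[e["action"]] += 1
        if e.get("price") is not None:
            prices[e["action"]].append(e["price"])

    rows = []
    for action, n in counts.items():
        vals = prices[action]
        total = sum(vals)
        avg = total / len(vals) if vals else None
        rows.append({
            "action": action,
            "Total Events": n,
            "Average Price": round(avg, 2) if avg is not None else None,
            "Total Amount": round(total, 2),
        })
    # Sort by Total Amount, largest first.
    return sorted(rows, key=lambda r: r["Total Amount"], reverse=True)


# Invented sample events.
summary = summarize_by_action([
    {"action": "view", "price": 4.0},
    {"action": "purchase", "price": 10.0},
    {"action": "purchase", "price": 15.5},
])
```

Rounding is applied after the aggregation so that the average and total are computed from unrounded values.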

Scenario: Networking wants to know the daily volume (in MB) handled by all Buttercup Games online
sales servers over the previous week.

Task 2: Chart daily volume with timechart and use eval to convert bytes to megabytes.

3. Search online sales data (index=web sourcetype=access_combined) over the Previous week. Find the
numeric field that represents how many bytes were transferred during each http request, i.e. each event.

4. Use the timechart command with the sum function to calculate the total bytes consumed each day. Use
the as clause to name this calculation "bytes."

5. Use the eval command to:
a. Create a new field called "megabytes."
b. Convert bytes to megabytes with the calculation: bytes/(1024*1024).
c. Round the result of this calculation to 2 decimal places.

6. Rewrite your search so that your eval command uses the round and pow functions to convert
bytes to megabytes.

7. Remove the bytes field.

8. Visualize your results as a Line Chart and rename the X-axis to "Day."

9. Save your search as a report with the name L2S2.
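
The arithmetic in steps 5 and 6 is the same calculation written two ways, since 1024*1024 equals 1024 raised to the power of 2. A minimal Python sketch of that arithmetic follows; it uses Python's built-in round and pow, which are only analogues of the eval functions named above, and the function name bytes_to_megabytes is hypothetical.

```python
def bytes_to_megabytes(total_bytes):
    """Convert bytes to megabytes (1 MB = 1024 * 1024 bytes), rounded to 2 decimals."""
    return round(total_bytes / pow(1024, 2), 2)


# pow(1024, 2) and 1024 * 1024 are the same divisor.
same_divisor = pow(1024, 2) == 1024 * 1024

megabytes = bytes_to_megabytes(123456789)
```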

OPTIONAL TASK: Networking wants to know the total number of GET and POST requests and the ratio
of GET to POST requests for each web server over the last 4 hours. Edit the search to round the values
of Ratio.

10. Edit this search so that the values of Ratio are rounded to two decimal places. Run the modified search
over the Last 4 hours.

index=web sourcetype=access_combined
| chart count over host by method
| eval Ratio = GET/POST

Before modifying the search.

After modifying the search.


11. Save your search as a report with the name L2X1.
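
As an illustration of the rounding in step 10, the Python sketch below divides a GET count by a POST count and rounds to two decimal places. It is not the SPL answer. The function name get_post_ratio is hypothetical, and returning None when there are no POST requests is a choice made for this sketch only.

```python
def get_post_ratio(get_count, post_count):
    """Ratio of GET to POST requests, rounded to two decimal places."""
    if not post_count:
        # No POST requests: the ratio is undefined in this sketch.
        return None
    return round(get_count / post_count, 2)


ratio = get_post_ratio(10, 3)
```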

CHALLENGE: The Sim Cubicle Beta team needs help randomly generating phone numbers for
characters in the game. Use the random function to generate fake phone numbers for players.

12. This search looks for all events from the beta phase of the new upcoming game, Sim Cubicle. Then, the
dedup command removes any duplicate values for CharacterName. Complete the <missing> portion of
this search so that random phone numbers are generated for each CharacterName. The phone numbers
should be in the format 555-xxxx where the last 4 digits contain any number from 0 to 9. Run the search
over All Time.

index=games sourcetype=SimCubeBeta
| dedup CharacterName
| eval phoneNumber = <missing>
| table CharacterName phoneNumber

13. Save your search as a report with the name L2X2.
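
The target format in step 12 has one detail worth noting: the last part must always be four digits, so a small random value such as 42 has to appear as 0042. The Python sketch below shows that formatting requirement only. It is not the SPL answer, and the function name fake_phone_number and the seed value are hypothetical.

```python
import random


def fake_phone_number(rng=random):
    """Return a fake number in the form 555-xxxx, where each x is a digit 0-9."""
    last_four = rng.randint(0, 9999)
    # Zero-pad so that small values still produce exactly four digits.
    return f"555-{last_four:04d}"


# A seeded generator makes the sketch repeatable.
number = fake_phone_number(random.Random(7))
```

Whatever approach you use in your eval expression, check that the result cannot produce fewer than four digits after the dash.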

Lab Exercise 3 – Formatting Data
Description
Format search results with sort and rename commands and use your knowledge from the previous 2 lab
exercises to fulfill scenario requests.

Steps
Scenario: Sales wants to know during which one-hour intervals over the last 24 hours Buttercup Games
online sales have been twice as profitable as sales in retail stores.

Task 1: Use timechart and sort commands to create a report that shows the hours where web sales
were twice as much as retail sales, sorted in descending order.

1. The provided search pulls successful purchase events from the online sales data (index=web
sourcetype=access_combined action=purchase status=200) and all recorded sales entries from the
retail sales data (index=sales sourcetype=vendor_sales). Calculate the sum of price values from
these events, grouped into one-hour increments, and split by index. Run your search over the Last
24 hours.

(index=web sourcetype=access_combined action=purchase status=200) OR (index=sales sourcetype=vendor_sales)

2. Use the where command to only keep events where the web sales values are more than twice as much as
retail sales values:
| where web > sales*2

3. Sort results in descending order based on the web sales values.

4. Save your search as a report with the name L3S1.
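
The filter in step 2 and the sort in step 3 can be sketched in Python to show what the comparison keeps. The rows below are invented hourly totals, not lab output, and the function name hours_web_doubles_retail is hypothetical.

```python
def hours_web_doubles_retail(rows):
    """Keep rows where web sales exceed twice the retail sales, largest web first."""
    kept = [r for r in rows if r["web"] > r["sales"] * 2]
    return sorted(kept, key=lambda r: r["web"], reverse=True)


# Invented hourly totals, one row per hour with a column per index.
hourly = [
    {"_time": "09:00", "web": 500.0, "sales": 300.0},   # 500 is not more than 600
    {"_time": "10:00", "web": 900.0, "sales": 400.0},   # 900 is more than 800
    {"_time": "11:00", "web": 1200.0, "sales": 500.0},  # 1200 is more than 1000
    {"_time": "12:00", "web": 800.0, "sales": 400.0},   # exactly twice, so dropped
]
result = hours_web_doubles_retail(hourly)
```

Because the comparison is strictly greater than, an hour where web sales are exactly double the retail sales is not kept.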

Scenario: Sales wants to know which products had online sales of more than $15,000 during the
last 30 days.

Task 2: Fulfill the scenario request using stats, eval, sort, and rename commands.

5. Search for all successful purchase events from the online sales data (index=web
sourcetype=access_combined status=200 action=purchase) over the Last 30 days.
6. Calculate the sum of price values as "sales" by each product_name.

7. Limit results to products with over $15,000 in sales by using the where command. Refer to the Common
Commands and Functions page at the beginning of this document for the where command syntax.

8. Round sales values to the nearest whole number.

9. Sort values in descending order and rename product_name as "Best Sellers" and sales as
"Total Revenue."

10. Save your search as a report with the name L3S2.
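
The order of steps 7 and 8 matters: the threshold is applied before rounding. The Python sketch below illustrates this with invented products and prices. It is not the SPL answer, and the function name best_sellers is hypothetical.

```python
from collections import defaultdict


def best_sellers(events, threshold=15000):
    """Sum price per product, keep totals over the threshold, round, sort, rename."""
    sales = defaultdict(float)
    for e in events:
        sales[e["product_name"]] += e["price"]

    # Filter on the unrounded total first, then round for display.
    rows = [
        {"Best Sellers": name, "Total Revenue": round(total)}
        for name, total in sales.items()
        if total > threshold
    ]
    return sorted(rows, key=lambda r: r["Total Revenue"], reverse=True)


# Invented purchase events for three hypothetical products.
report = best_sellers([
    {"product_name": "Product A", "price": 9000.25},
    {"product_name": "Product A", "price": 8000.5},
    {"product_name": "Product B", "price": 15000.4},
    {"product_name": "Product C", "price": 14999.9},
])
```

In this sample, Product B is kept because 15000.4 is over the threshold even though it displays as 15000, while Product C is dropped even though 14999.9 would round up to 15000.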

OPTIONAL TASK: ITOps wants to see the two most common status codes for each of the web servers.
Use the top command to identify common status codes by web server.

11. Search online sales data (index=web sourcetype=access_combined) and find the top 2 status code
values during the Last 24 hours.

12. Edit your search so that results are split by the web server host values. Your results should display the
number of events and the percentage of events that the top 2 status code values appear in for each
web server.

13. Remove the count field with the fields command.

14. Sort results in ascending order by host and in descending order by percent.

15. Save your search as a report with the name L3X.
