D&T Notes
INDEX:
1. Introduction
2. SSH problems
3. User Issues
4. Being Proactive
5. Troubleshooting Boot Issues
6. Identifying Hardware Issues
7. Troubleshooting Storage Issues
8. Troubleshooting RPM Issues
9. Troubleshooting Network Issues
Introduction
This course provides system administrators with the tools and techniques they
need to successfully diagnose and fix issues.
Troubleshooting:
Troubleshooting is the art of taking a problem, gathering information
about it, analyzing it, and finally solving it.
Using Scientific Method:
1. Clearly define the issue
2. Collect Information
3. Form a hypothesis
4. Test the hypothesis
5. Fix the problem
6. Rinse & Repeat
SSH
Scenario:
Over the weekend, some of your colleagues ran user
maintenance. This morning one of your users, student, calls the help desk to
complain that his login no longer works on servera.
Step: 1
#lab scientificmethod setup
Step 2:
#ssh student@servera
The connection to servera is closed immediately. The login fails because the
lab setup configured the account so that it cannot log in.
Step 3:
#cd /home/student
#vim .bashrc
We see the usual aliases and functions; nothing in .bashrc has been changed
Step 4:
Next, check the entry for the student user in the /etc/passwd file
student:x:1000:1000::/home/student:/bin/false
The login shell is /bin/false, so we have to change the shell for that user
Step 5:
#usermod -s /bin/bash student (or) #chsh -s /bin/bash student
Step 6:
#ssh student@servera
It logged in
Step7:
Come back to the workstation machine and check the grade of the lab setup
$logout
#lab scientificmethod grade
#lab scientificmethod reset
FOR CHECKING:
Run this command on the server machine
#lastlog -u student (shows the most recent login of the student user)
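The /etc/passwd check from Step 4 can also be scripted. The following is a minimal sketch, assuming two helper functions, login_shell and shell_allows_login, which are hypothetical names invented for this example and not part of any package. It reads a passwd-format file and reports whether the recorded shell permits an interactive login.

```shell
# Print the login shell recorded for USER in a passwd-format file.
# Usage: login_shell USER [FILE]   (FILE defaults to /etc/passwd)
# Hypothetical helper for illustration only.
login_shell() {
    local user=$1 file=${2:-/etc/passwd}
    # Field 7 of a passwd entry is the login shell; fail if the user is absent.
    awk -F: -v u="$user" '$1 == u { print $7; found = 1 } END { exit !found }' "$file"
}

# Return 0 if the shell allows logins, 1 if it blocks them, 2 if the
# user does not exist in the file.
shell_allows_login() {
    local shell
    shell=$(login_shell "$@") || return 2
    case "$shell" in
        /bin/false|/sbin/nologin|/usr/sbin/nologin) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, `shell_allows_login student || echo "student cannot log in"` would flag the broken account from this scenario before any ssh attempt is made.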
If we have two machines, servera and serverb, we can normally log in from one
machine to the other over the SSH protocol. A client setting can instead
redirect every login back to the local machine.
That setting lives in the SSH client configuration file,
/etc/ssh/ssh_config
WORKSTATION MACHINE:
#vim /etc/ssh/ssh_config
Go to the last line and add this line
hostname localhost
:wq!
Checking:
#ssh root@servera
It logs in to the local machine instead of the servera machine.
USER ISSUES
The directory /etc/skel contains some important files
such as .bashrc, .bash_profile and .bash_logout.
As the name says, skel is a skeleton directory, which provides the
initial structure for a user's home directory.
When a user is created, these files are copied to the
home directory of the user.
/etc/skel defines the default behavior for all new users, whereas the copies
(.bash files) in the home directory of a user determine the
behavior of that particular user.
These files are hidden bash scripts that the shell reads; they are not
run as separate programs.
.bash_profile is read when a user starts a login shell, for example when
switching to the account with su -.
.bashrc is read by each interactive shell and shapes how the user's shell behaves.
.bash_logout is read when the user logs out
of the account.
Editing .bashrc file:
Adding parameter:-
The user behavior can be structured by editing the
corresponding .bashrc files
Step 1:
#vim /home/student/.bashrc
Step 2:
Go to the end of the file and add the command 'exit'
Step 3:
#source .bashrc
reload the bashrc file
#su - student
The student user cannot log in, because the shell exits as soon as it reads .bashrc
Command Aliasing:
Aliasing gives a command a short name, even a single character. It
reduces the typing needed for long commands that are used
often.
Step 1:
Edit the .bashrc file in the user’s home directory
Step 2:
Go to the last line and add the line
alias l='ls -ltr'
save and quit
Step 3:
#source .bashrc
re-read the .bashrc file
Step 4:
# ls -ltr or # l
both show the same output
#systemctl -t help
Available unit types (cmd to list all unit types controlled by systemd)
1. Service
2. Socket
3. Target
4. Snapshot
5. Device
6. Mount
7. Automount
8. Swap
9. Timer
10.Slice
11.Scope
The important units are service, socket, device, target, swap etc.
To understand more about systemd
#ls -lh /usr/lib/systemd/systemd
Command to take backup the booting process and it saved in the file.
rsyslog                          systemd-journal
Persistent                       Not persistent (by default)
Stored in /var/log directory     Stored in /run/log/journal, on a
                                 temporary filesystem
Summarized information           Detailed information
Daemon is rsyslogd               Daemon is systemd-journald
OWN NOTES:
o First create the directory in the given path
o Check the permissions of the directory and change its group
ownership to systemd-journal
o To allow files to be saved in that directory, give the directory
special permissions
o New files do not inherit the group of the directory by default, which is
why we set the special permission (SGID)
o Afterwards, kill the service and then reboot
Commands:
1. #journalctl -ef
command to view logs dynamically
-e - jump to the end in the pager
-f - follow the journal
2. #journalctl -u sshd.service
-u - unit
command to view logs for the particular service
#journalctl _SYSTEMD_UNIT=sshd.service
3. #journalctl -b -l
Only show the messages since the last system boot; this is
useful when looking for information about a system crash.
-b - show messages from a specific boot (the current one by default)
-l - show full fields without truncating long lines
4. #journalctl -o verbose
Use verbose output mode; this shows all fields stored
in the journal with their field names and contents.
Scenario:
Your servera machine is running a webserver serving the file
http://servera.lab.example.com/test.html. A ticket just came in from the test
manager saying that this file is not accessible from a web browser. Investigate
this issue using the log files on the servera machine and then fix the issue.
For testing from the command line on workstation, if you do not want to
open a graphical browser, you can use the command line browser elinks.
Step 2:
Step 3:
If an error is shown, first check the firewall for the service
#firewall-cmd --list-all
It shows that the http service is allowed
Step 4:
The firewall shows no problem, so check the status of the service
#systemctl status httpd.service
Status: running
Step 5:
Step 6:
#cd /var/www/html
#ls -lZ
It shows that the file has no read permission and that its SELinux context is not the default
Step 7:
First change the permission of the webpage
#chmod 644 /var/www/html/test.html
Second change the context of the file
#semanage fcontext -a -t httpd_sys_content_t "/var/www/html/test.html(/.*)?"
#restorecon -v /var/www/html/test.html
Step 8:
Enable & Restart the service
#systemctl enable httpd.service
#systemctl restart httpd.service
Step 9:
Open the web page now
#elinks http://servera.lab.example.com/test.html
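The two checks made with ls -lZ in this scenario (file mode and SELinux type) can be sketched as small helpers. This is only an illustration: others_can_read and selinux_type are hypothetical function names, and the context string passed to selinux_type is assumed to be in the usual user:role:type:level form.

```shell
# Succeed when "other" users (such as the web server account) may read FILE.
# Hypothetical helper for illustration only.
others_can_read() {
    # Character 8 of the ls -l mode string is the read bit for "other".
    [ "$(ls -ld -- "$1" 2>/dev/null | cut -c8)" = "r" ]
}

# Print the type field (third field) of an SELinux context string.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}
```

A page served by httpd is normally expected to pass `others_can_read` and to carry the type `httpd_sys_content_t`; any other result points at the same two fixes used in Step 7.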
SOSREPORT:
This command collects diagnostic and configuration information
from the Red Hat Enterprise Linux system and installed applications.
An archive containing the collected information is generated in the
/var/tmp directory and may be provided to a Red Hat support
representative.
The archive may be stored locally (or) remotely.
sosreport does not make any changes to the system
configuration.
It only collects information about the system.
By default sosreport uses 'xz' compression.
An MD5 file is generated for checking the integrity of the compressed
archive file.
Commands:
1. Command to create a sosreport non-interactively
#sosreport --batch
2. Command to list the plugins. Plugins are pieces of code that add a
feature to the main application
#sosreport -l
NOTE:
Whenever we generate a sosreport, it collects information from
the active plugins.
Active plugins are the plugins whose software is already installed and
running on the system.
Installing and running the corresponding service activates an inactive plugin.
The next time we generate a sosreport, the information from the newly
activated plugin is added to it.
After installing the aide package, open the aide configuration file,
go to the list of directories to include in the database and add the new
directory. The new directory is then recorded in the database as well.
#vim /etc/aide.conf
Comment out any other directories in the configuration file that you do
not want to check.
The configuration file also sets the database file name, aide.db.gz,
so we rename the newly created database to that name.
#cd /var/lib/aide
#mv aide.db.new.gz aide.db.gz
Command to check the system against the database and display the
differences between the state before and after the changes
#aide --check
*** If we get an error, check the database file name and
change the name of the file to aide.db.gz
Alternatively, edit the configuration file /etc/aide.conf
1. uncomment the 11th line
2. change the database output file name to aide.db.gz
Cockpit:
Cockpit is a free and open source web-based system administration interface
actively developed by Red Hat. Using Cockpit, anyone can easily monitor and
manage multiple servers at the same time from a web browser.
Note:
It is well suited to new system admins performing simple tasks such as
storage administration, user administration, LVM operations, inspecting
journals, and starting and stopping services.
Using Cockpit we can also perform basic network operations; it also has a
journal log viewer for troubleshooting and log analysis.
PCP is available from version 6.6; the daemon responsible for
PCP is pmcd (Performance Metrics Collector Daemon)
Scenario:
On server, restore the boot loader on a BIOS-based machine that is
refusing to boot.
In workstation #lab biosbootbreak setup
Then open the server machine
If for some reason the MBR becomes damaged, the administrator will have to
reinstall grub into the MBR.
The error shown is
Continuing boot loader
Booting from hard disk
Step 5:
Select option 1 to continue
Step 6:
Rescue mount
Your system has been mounted under /mnt/sysimage. If you would like to
make your system the root environment, run the command
#chroot /mnt/sysimage
#grub2-install /dev/vda (the MBR is not inside a partition; it is
in the first sector of the hard disk, so grub is installed to the whole disk)
#exit
#exit
#grub2-mkpasswd-pbkdf2
Enter password:
Re-enter password:
PBKDF2 hash of your password is (the password is shown in hashed form)
It generates a password hash string suitable for use in the grub
configuration file.
Go to the directory grub.d in the /etc directory
#cd /etc/grub.d
#ls
#vim 10_linux
Go to the end of the file
Write this entry
cat << EOF
set superusers="root"
password_pbkdf2 root <paste the hash generated above>
EOF
#grub2-mkconfig -o /boot/grub2/grub.cfg
#reboot
Note:
1. PBKDF2 - Password-Based Key Derivation Function 2
2. It is part of the RSA Laboratories Public-Key Cryptography Standards (PKCS #5)
3. After rebooting your machine, press the E key to edit the grub menu, for
example to change the root user password. GRUB then asks for the username and
password that we created before and wrote into the file in the
grub.d directory
To see what type of hard disks are attached (-v, verbose) and which
version of lsscsi is in use (-V), list the SCSI devices
#lsscsi -v
#lsscsi -V
Rasdaemon:
Rasdaemon is a more modern replacement for mcelog that hooks into the
kernel trace subsystem.
RAS stands for Reliability, Availability, and Serviceability
#yum install -y rasdaemon.x86_64
#systemctl enable rasdaemon
#systemctl restart rasdaemon
#journalctl -u rasdaemon.service
Scenario:
One of your junior admins has provided you with a test dump of the
systemd-journal file from a machine that has been experiencing strange random
errors. He has asked you to inspect this journal file for any possible signs of
hardware failure,
since the machine kept experiencing these issues even after a complete
wipe and reinstall.
The log dump location is /root/logdump
#lab hardwareissues setup
#vim /root/logdump
#grep -n mcelog /root/logdump
#yum install -y memtest*
#memtest-setup
#grub2-mkconfig -o /boot/grub2/grub.cfg
Identifying Storage Issues:
A fellow admin was asked to configure servera to use the IQN
iqn.2017-01.com.example.lab:ISCSI storage
Scenario:
The target is configured on workstation. The admin has reported that he can
discover the target but is unable to log in successfully. Troubleshoot and
identify the issue so that the initiator can successfully log in to the target.
The file system journal is stored in a separate part of the partition, which
users cannot access directly.
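1. Command to check the filesystem (the exit status is the sum of the codes below)
#e2fsck -n /dev/vdb1
0 No errors
1 File system errors corrected
2 File system errors corrected, system should be rebooted
4 File system errors left uncorrected
8 Operational error
16 Usage or syntax error
32 Cancelled by user request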
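Because the e2fsck exit status is a bitmask, a single value can carry several of these conditions at once. The sketch below decodes one; decode_e2fsck_status is a hypothetical helper name, and the code list follows the e2fsck manual page (which also documents 2, 4 and 128).

```shell
# Decode an e2fsck exit status. The status is a bitmask, so several
# conditions can be reported at once (for example 12 = 4 + 8).
# Hypothetical helper for illustration only.
decode_e2fsck_status() {
    local status=$1 entry bit
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:file system errors corrected" \
        "2:file system errors corrected, system should be rebooted" \
        "4:file system errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:cancelled by user request" \
        "128:shared library error"
    do
        bit=${entry%%:*}                 # number before the first colon
        if [ $(( status & bit )) -ne 0 ]; then
            echo "${entry#*:}"           # text after the first colon
        fi
    done
}
```

A typical use would be to run `e2fsck -n /dev/vdb1`, capture `$?` immediately afterwards, and pass it to `decode_e2fsck_status`. With -n nothing is repaired, so a damaged file system usually reports 4.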
Commands:
1. Check the file system
#xfs_repair -n /dev/vdb1
lost+found directory:
During a file system check, it is possible to detect files and directories
that are no longer referenced by their parent directories (for example after
corruption); these are called orphan files.
These files are deposited in the 'lost+found' directory.
If files are missing after the file system check, check this
directory.
Scenario:
/dev/vdb on servera contains an XFS file system, which holds the content
of the etc directory from another system.
Check the integrity of the XFS file system on /dev/vdb1
Repair any file system inconsistency found and mount the file system on
/mnt/etc_restore
The file system repair deposits any orphan files in the lost+found directory
Use the backup file /root/etc.tgz to determine the proper location and
name of each orphan file
Once validated, restore the orphan file to its proper location
Step 1:
First check the mount point
#df -h
Step 2:
Then unmount the file system
#umount /mnt/etc_restore
Step 3:
Check the file system without modifying it (-n) and then repair it
#xfs_repair -n /dev/vdb1
#xfs_repair -f /dev/vdb1
We get an error at 144973 (subscription-manager)
Step 4:
Then remount the file system
#mount /dev/vdb1 /mnt/etc_restore
#df -h
Step 5:
Go to that directory
#cd /mnt/etc_restore
#ls
#cd lost+found
145282 is the file that went missing from the file system
Step 6:
Files in lost+found are named after their inode number, so we have to work
out the original name and location
Step 7:
We have a backup of the etc directory in the root directory as a compressed tar archive
#tar -xf /root/etc.tgz -C /media
(we extract into the media directory)
Step 8:
Now we can compare the files in the two directories to see which files
are lost
#diff -s 145282 /media/etc/security/console.apps/subscription-manager
The files are identical.
Step 9:
Then rename the file to subscription-manager
#mv 145282 subscription-manager
Step 10:
#ls /mnt/etc_restore/lost+found
subscription-manager
Step 11:
#mv subscription-manager /mnt/etc_restore/etc/security/console.apps
It is moved back to the location that reported the error
#ls
Step 12:
#lab xfs grade
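The comparison in Step 8 is done by hand for one file. When lost+found holds several orphans, the same idea can be looped over a whole extracted backup. The sketch below assumes a hypothetical helper, match_orphans, which prints each orphan next to the first backup file with byte-identical content (compared with cmp).

```shell
# For each regular file in LOST_DIR, print "orphan<TAB>backup-path" for the
# first file under BACKUP_DIR whose content is identical.
# Hypothetical helper for illustration only.
match_orphans() {
    local lost_dir=$1 backup_dir=$2 orphan candidate
    for orphan in "$lost_dir"/*; do
        [ -f "$orphan" ] || continue
        find "$backup_dir" -type f | while IFS= read -r candidate; do
            if cmp -s "$orphan" "$candidate"; then
                printf '%s\t%s\n' "$orphan" "$candidate"
                break
            fi
        done
    done
}
```

In this scenario the call would be `match_orphans /mnt/etc_restore/lost+found /media/etc`. Identical content does not prove the name is right when several backup files share the same bytes (empty files, for instance), so the result should still be reviewed before moving anything.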
Scenario:
A request was recently received for an additional 20MB of storage to be
allocated for use by the directory /mnt/lvm on servera. After the request
was completed, the user reported that the directory is no longer accessible.
The user thinks that a critical file was accidentally deleted by the admin
who fulfilled the request. The problem has been escalated and you are asked to
investigate and identify the root cause of the problem. Once the root cause is
identified, attempt to restore the system to proper working condition.
In workstation
#lab lvm setup
An error is shown: the file system was not mounted after the volume was
resized
Step 1:
#df -h
#mount -a
mount: /dev/mapper/vg00-lvolo: can't read superblock
Step 2:
#cat /etc/fstab or #vim /etc/fstab
/dev/vg00/lvolo /mnt/lvm xfs defaults 1 0
Now we can identify that the LVM metadata is missing, so we can
recover it from the metadata backup
Step 3:
Since the metadata is missing, we can recover it from the archive directory
#cd /etc/lvm/archive
#ls
vg00_ -- several similar files are available, but we do not know which one
holds the metadata we need
Step 4:
#vgcfgrestore -l vg00
The above command lists all the volume group metadata files
Step 5:
#vgcfgrestore -f /etc/lvm/archive/vg00_00002-400393107.vg vg00
Restored volume group vg00
Step 7:
#lvextend -L +20M /dev/vg00/lvolo
#xfs_growfs /dev/vg00/lvolo
#df -h
Step 8:
#lab lvm grade
Yum fails to install a package that requires a specific version of
another package when a different, incompatible version of the required
package is already installed on the system.
Yum has a plugin that allows an admin to lock down a package or
group of packages to a specific version.
SCENARIO:
The security team has updated the system requirements for hosts in the
data center. Part of the update requires a custom package "rht-main". It was
tried on the servera machine without success, and you have been called to
resolve the problem and get the package installed.
In workstation
#lab package-dependencies setup
In servera machine
Step 1:
#yum install -y rht-main
Step 3:
#yum list yum-plugin*
Step 4:
#yum install -y yum-plugin-versionlock.noarch
we have to install that plugin package
Step 5:
#yum versionlock add rht-prereq
Now we add the prerequisite package to the version lock; this locks it
at the currently installed version
Step 6:
#yum versionlock list
It will list all the versionlock packages
Step 7:
#yum update rht-prereq
To verify the update package
Step 8:
#yum install -y rht-main
Step 9:
Go to workstation run this command
#lab package-dependencies grade
When a package is verified, only the attributes that have changed are displayed
M Mode (permissions)
5 Digest (file content changed)
c Configuration file
d Documentation file
r Readme file
l License file
An rpm package contains different types of file, marked c, d, r, l
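The attribute string at the start of each rpm -V output line has nine positions (S, M, 5, D, L, U, G, T, P), with a dot where nothing changed. The sketch below spells such a string out; decode_rpm_verify_flags is a hypothetical helper name, and the meanings follow the rpm manual page.

```shell
# Expand the 9-character attribute string from an "rpm -V" output line,
# for example ".M......." or "S.5....T.", into readable text.
# Hypothetical helper for illustration only.
decode_rpm_verify_flags() {
    local flags=$1 pair letter
    for pair in \
        "S:size" \
        "M:mode (permissions and file type)" \
        "5:digest (file content)" \
        "D:device major/minor number" \
        "L:symbolic link path" \
        "U:user ownership" \
        "G:group ownership" \
        "T:modification time" \
        "P:capabilities"
    do
        letter=${pair%%:*}
        case "$flags" in
            *"$letter"*) echo "${pair#*:} differs" ;;
        esac
    done
}
```

Only the first field of a line should be passed in. A result that lists just mode or ownership points to a metadata-only change, while size or digest means the file content itself has changed.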
Packages can be verified not only with rpm but also with 'yum'.
Yum uses the rpm database for performing these tasks.
If the size or the content of a packaged file has changed,
it cannot be restored with the above two commands. In that case
we have to reinstall the package
#yum reinstall -y <package name>
Scenario:
Reports are coming in that some of the commands on servera are broken.
Users cannot list files with the ls command and the sudo command is behaving
strangely. You have been called in to diagnose and correct the problem.
Step 1:
#lab broken-commands setup
Step 2:
Open servera machine
#ls
$: permission denied : ls
Step 3:
#which ls
/usr/bin/ls
Step 4:
#rpm -qf /usr/bin/ls
coreutils
Step 5:
#rpm -V coreutils
Step 6:
#rpm --setperms coreutils
Step 7:
#ls
files are shown
SUDO:
Step 1:
#su - student
Step 2:
#sudo fdisk -l
/bin/sudo: permission denied
Step 3:
#logout
#rpm -qf /bin/sudo
sudo
Step 4:
#rpm -V sudo
two errors
Step 5:
#yum reinstall -y sudo
Step 6:
#su - student
#sudo fdisk -l
Scenario:
Someone on the team installed a webserver on the servera machine,
servera.lab.example.com. Instead of displaying the contents, the following
error message is displayed when it is accessed: failed to connect.
Step 1:
#lab package-issues setup
Step 2:
#ssh root@servera
#yum install -y elinks.x86_64
Step 3:
Now open the webpage using elinks
#elinks http://servera.lab.example.com
Network is unreachable
Step 4:
Since it is showing a network issue, first check the firewall settings.
#firewall-cmd --list-all
Step 5:
Everything in the firewall listing looks correct, so next
check the status of the service
#systemctl status httpd.service
Step 6:
Now check the logs of the service
#journalctl -u httpd.service
It shows permission errors for the service
Step 7:
Now check the binary file and which package is responsible for it
#rpm -qf /usr/sbin/httpd
httpd (package name)
Step 8:
Check for errors using the verify option of the rpm command
#rpm -V httpd
Step 9:
Reinstalling the package recovers the permissions
#yum reinstall -y httpd
Step 10:
Restart and enable the service
#systemctl enable httpd.service
#systemctl restart httpd.service
Step 11:
Now open the webpage using elinks
#elinks http://servera.lab.example.com
ping options:
-b broadcast
-n display host information numerically
-i interval between echo requests, in seconds
-I interface (send echo requests out of this interface)
-c count (number of echo requests)
-w deadline (seconds before ping exits)
-W timeout (seconds to wait for each reply)
Network Mapper(or)nmap:
It is an open source port scanner that is provided with RHEL.
It is a tool that administrators use to rapidly scan large networks; it can
also do more intensive scans on individual hosts.
nmap uses raw IP packets to determine what hosts are available on the
network, what services those hosts are offering,
what OS they are running, and what type of packet filters or firewalls are in use.
#nmap ip/24
(172.25.250.254/24)
command to display an nmap report for a network
Servera Serverb
Iptraf-ng:
IPTraf is an open source network monitor used to monitor
network traffic.
The iptraf-ng command launches the application
Step 1:
#lab network-fix setup
Step 2:
#ifconfig
#cd /etc/sysconfig/network-scripts
#ls
It shows ifcfg-enp2s0 and does not show ifcfg-eth1.
The IP address in that file is the one meant for eth1, so we rename
the file.
Step 3:
#mv ifcfg-enp2s0 ifcfg-eth1
Step 4:
#ls
#nmcli connection reload
#ifconfig
No errors; the output is now correct
Second Error:
Step 1:
Run this command on servera
#ping serverb.lab.example.com (unknown host)
Step 2:
#cat /etc/resolv.conf
It shows 4 name servers
The resolver uses at most 3 name servers from this file
Step 3:
#nmcli connection show "System eth0"
Step 4:
#nmcli connection modify "System eth0" ipv4.dns "172.25.250.254"
Step 5:
#nmcli connection down "System eth0" ; nmcli connection up "System eth0"
Step 6:
#ping serverb.lab.example.com
#ping6 serverb.lab.example.com
#exit
Step 7:
#lab network-fix grade
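The name server limit found in Step 2 is easy to miss by eye. A minimal sketch of an automated check follows; check_nameservers is a hypothetical helper name, and the limit of 3 is the resolver limit noted above.

```shell
# Count the nameserver entries in a resolv.conf-style file and warn when
# there are more than three, since entries beyond the third are not used.
# Usage: check_nameservers [FILE]   (FILE defaults to /etc/resolv.conf)
# Hypothetical helper for illustration only.
check_nameservers() {
    local file=${1:-/etc/resolv.conf} count
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    # "n + 0" makes awk print 0 instead of an empty string when none match.
    count=$(awk '$1 == "nameserver" { n++ } END { print n + 0 }' "$file")
    echo "$count nameserver entries"
    if [ "$count" -gt 3 ]; then
        echo "warning: only the first 3 nameserver entries are used"
        return 1
    fi
    return 0
}
```

Run against the broken configuration from this scenario, it would report 4 entries and return a non-zero status, which makes it usable in a monitoring script.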
SELinux Logging
When SELinux blocks an action, the denial is logged by the
"audit" daemon. The audit log can be viewed in /var/log/audit/audit.log
It is also possible to search for exactly the messages that we are interested
in
#ausearch -m avc -ts recent
-m message type
avc access vector cache
-ts start timestamp
#ausearch -m avc -ts today
#sealert -a /var/log/audit/audit.log
-a analyze
The above command analyzes the file and reports the denials in a
human-readable form.
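When the audit log is growing quickly, a count of denials per command and target context shows where to look first. The sketch below assumes a hypothetical helper, summarize_avc_denials, and assumes the usual AVC record layout in which each record carries comm="..." and tcontext=... fields.

```shell
# Summarize AVC denials in an audit log as "COUNT COMMAND TARGET_CONTEXT",
# most frequent first.
# Hypothetical helper for illustration only.
summarize_avc_denials() {
    awk '/avc:.*denied/ {
        comm = ""; tctx = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^comm=/)     { comm = $i; gsub(/^comm=|"/, "", comm) }
            if ($i ~ /^tcontext=/) { tctx = substr($i, 10) }  # drop "tcontext="
        }
        count[comm " " tctx]++
    }
    END { for (k in count) print count[k], k }' "$1" | sort -rn
}
```

It would be called as `summarize_avc_denials /var/log/audit/audit.log`. A large count against a single target type, such as unlabeled_t, suggests a labeling problem rather than many unrelated policy violations.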
Booleans act as switches that change the behavior of the SELinux policy.
SELinux Booleans are rules that can be enabled (or) disabled
#getsebool -a
It shows all the Booleans and whether each switch is ON/OFF
SELinux controls not only files and directories but also
ports; this is called port labeling.
Scenario:
One of your co-workers recently performed some emergency maintenance and
troubleshooting on your servera machine. While the original problem is
solved, the /var/log/audit/audit.log file on servera is now growing rapidly. This
was spotted because the chronyd daemon now fails to start. Contrary to your
company's policy, the co-worker didn't document any of the steps performed.
Investigate and fix the issues.
Step 1:
#lab selinuxts setup
Step 2:
See the SELinux denials
#vim /var/log/audit/audit.log
Step 3:
#ausearch -m avc -ts recent
It shows the recent SELinux denials from the log file
Step 4:
Install the SELinux troubleshoot package
#yum install -y setroubleshoot-server.x86_64
Step 5:
Show the denials from the log file in readable form. It shows errors
involving the unlabeled_t context
#sealert -a /var/log/audit/audit.log
Step 6:
#touch /.autorelabel
Relabel all the files in the system on the next boot
Step 7:
#reboot
#lab selinuxts grade
Step 1:
Reboot your machine
Step 2:
Press the 'e' key to edit the kernel entry
Step 3:
Go to the end of the line starting with linux16 and add rd.break console=tty1
Step 4:
Press ctrl+x to boot with the changes
Step 5:
#mount -o rw,remount /sysroot
Step 6:
# chroot /sysroot
Step 7:
#echo "redhat" | passwd --stdin root
Step 8:
#load_policy -i
It loads the SELinux policy
Step 9:
#restorecon -RFv /etc/
It restores the default contexts of the files under /etc only.
restorecon resets the /etc/shadow context from unlabeled_t to shadow_t
Step 10:
#exit
#exit
Enable port 514 for tcp traffic coming from the clients
Next, go to the rules section
Step 2:
Write a template for the central log host
$template DynamicFile,"/var/log/loghost/%HOSTNAME%/%syslogfacility-text%.log"
*.* -?DynamicFile
Step 3:
#vim /etc/logrotate.d/syslog
add
/var/log/loghost/*/*.log
It rotates all the logs that are written into that directory
Step 4:
#systemctl restart rsyslog
#systemctl enable rsyslog
Step 5:
#firewall-cmd --permanent --add-port=514/tcp
#firewall-cmd --reload
#cd /var/log/loghost
go to the rules section
*.* @@servera.lab.example.com:514
for checking
#logger -p user.info "text message from server"
To add logs to the system
#logger -p authpriv.crit "message from server"
go to server side:
#cd /var/log/loghost
#ls
servera serverb
#cd serverb
#ls
authpriv.log daemon.log syslog.log user.log
D&T Completed
************************