Troubleshooting Linux Scenarios – Part 1


👋 Hello! I'm passionate about DevOps and have more than a year of experience in the field. I'm proficient in a variety of cutting-edge technologies and always motivated to expand my knowledge and skills. Let's connect and grow together!

SKILLS:

🔹 Languages & Runtimes: Python, Shell Scripting, HCL, YAML
🔹 Cloud Technologies: AWS, Microsoft Azure, GCP
🔹 Infrastructure Tools: Docker, Terraform, AWS CloudFormation
🔹 Other Tools: Linux, Git and GitHub Actions, Jenkins, Jira, GitLab (beginner), Docker, AWS DevOps
🔹 Web Development: HTML, CSS, Bootstrap, Python, SQL

Job & Responsibilities:

🚀 Improved development efficiency by implementing CI/CD pipelines, reducing deployment time on the test server by 30%.
🔒 Strengthened deployment and testing reliability by using Docker containers and optimizing Dockerfiles, reducing development issues on the test server by 20%.
⚙️ Automated S3 bucket log creation with shell scripting, eliminating manual searches entirely and saving 2 hours per week.
📅 Scheduled EC2 instance start/stop using Lambda functions and EventBridge, cutting infrastructure costs by 25%.
🔧 Used AWS, Linux, Python, Docker, shell scripting, Terraform, Jenkins pipelines, and automation to streamline workflows and improve overall system performance.

I'm very detail-oriented and possess strong written and verbal communication skills. As a high performer with a possibility mindset, I strive to solve problems using efficient approaches.

Let's Connect & Grow:

If you find my profile suitable for the role you are searching for, please feel free to reach out to me at sumanprasad9766@gmail.com.

Issue 1: Unable to Start a Service

🛠️ Approach / Solution:

├── Check whether the service is installed
├── Verify the service's configuration file
├── Check the service status with systemctl status (or the service command on older init systems)
├── Inspect the service logs for errors
├── Ensure there are no port conflicts
├── Review firewall rules and SELinux settings
├── Restart the service and check for error messages
├── Inspect system resource usage with tools like top or htop
└── ...

Issue 2: High CPU Usage

🛠️ Approach / Solution:

├── Identify the process causing high CPU usage using top or htop
├── Check if the issue is intermittent or continuous
├── Review logs for any error messages or known issues
├── Inspect running processes and their resource consumption
├── Investigate potential malware or unauthorized processes
├── Consider optimizing or scaling the application
├── Monitor system metrics over time to identify patterns
├── Apply performance tuning based on the specific application
└── ...

Issue 3: Network Connectivity Issues Between Servers

🛠️ Approach / Solution:

├── Check if other servers on the same network are accessible
├── Verify firewall rules on both source and destination servers
├── Inspect routing tables to ensure correct routes are set
├── Use tools like traceroute or mtr to trace the network path
├── Check for any network hardware failures or misconfigurations
├── Review system logs for network-related errors
├── Test connectivity using tools like telnet or nc
├── Investigate potential DNS or hostname resolution problems
├── Consider network segmentation or VLAN configurations
└── ...

Issue 4: Unable to Mount a Filesystem

🛠️ Approach / Solution:

├── Check if the filesystem is specified in /etc/fstab
├── Verify the device path and UUID in /etc/fstab
├── Ensure the filesystem type is correct
├── Check for errors in /var/log/messages or dmesg
├── Confirm that the device is accessible and not failing
├── Use the mount command manually to check for errors
├── Investigate if the filesystem needs repair (fsck)
├── Inspect disk space on the target mount point
├── Check for any SELinux or AppArmor restrictions
└── ...

Issue 5: Corrupted Filesystem

🛠️ Approach / Solution:

├── A common cause of a system failing to boot
├── Check /var/log/messages, dmesg, and other log files
├── If the logs show bad sectors or filesystem errors, run fsck
│   ├── If so:
│   │   ├── Reboot into rescue mode by booting from the installation ISO (CD/DVD)
│   │   ├── Proceed with option 1, which mounts the original root filesystem under /mnt/sysimage
│   │   ├── Fix the /etc/fstab entries (or write a new fstab using the UUIDs reported by blkid), then reboot
└── ...

Issue 6: Can't cd into a Directory Even Though the User Has sudo Privileges

🛠️ Approach / Solution:

├── Causes and resolution
│   ├── The directory does not exist
│   ├── Wrong path: relative vs. absolute path
│   ├── Permissions/ownership of a parent directory
│   ├── No execute (search) permission on the target directory
│   ├── Hidden directory (its name starts with a dot; list it with ls -a)
└── ...

Issue 7: Running Out of Memory

🛠️ Approach / Solution:

├── Types
│   ├── Cache (L1, L2, L3)
│   ├── RAM
│   │   ├── Usage
│   │   │   ├── free -h
│   │   │   │   ├── total (total usable memory)
│   │   │   │   ├── used (memory currently in use)
│   │   │   │   ├── free (completely unused memory)
│   │   │   │   ├── shared (shared memory, mostly tmpfs)
│   │   │   │   ├── buff/cache (kernel buffers and page cache)
│   │   │   │   ├── available (estimate of memory available to new applications without swapping)
│   │   │   ├── /proc/meminfo
│   │   │   │   ├── Active(file)
│   │   │   │   ├── Inactive(file)
│   │   │   │   ├── Active(anon)
│   │   │   │   ├── Inactive(anon)
│   ├── Swap (disk space used to page out memory)
├── Resolution
│   ├── Identify the processes using the most memory with top, htop, ps, etc.
│   ├── Check the logs for OOM-killer events, and check the memory overcommit setting (vm.overcommit_memory) in sysctl.conf
│   ├── Kill or restart the process/service
│   ├── Lower the priority of non-critical processes with nice (this affects CPU scheduling only; oom_score_adj influences which process the OOM killer picks)
│   ├── Add/extend the swap space
│   ├── Add more physical RAM
└── ...

Issue 8: Add/Extend the Swap Space

🛠️ Approach / Solution:

├── When the system runs out of memory, adding swap space can help
│   ├── Create a file with dd, which writes out (reserves) every block of the swap file on disk
│   ├── Set permissions to 600 and give ownership to root
│   ├── Format the file as swap with mkswap
│   ├── Turn swap on with swapon
│   ├── Add an /etc/fstab entry so the swap persists across reboots
└── ...

Issue 9: Unable to Run Certain Commands

🛠️ Approach / Solution:

├── Troubleshooting and Resolution
│   ├── What the command is
│   │   ├── A system command that non-root users are not allowed to run
│   │   ├── A user-defined script or command
│   ├── Troubleshooting
│   │   ├── Permissions/ownership of the command or script
│   │   ├── sudo permissions
│   │   ├── Absolute vs. relative path of the command or script
│   │   ├── The command's directory is not in the user's $PATH
│   │   ├── The command is not installed
│   │   ├── A library the command depends on is missing or deleted
└── ...

Issue 10: System Reboots or Processes Restart Unexpectedly

🛠️ Approach / Solution:

├── Troubleshooting and Resolution
│   ├── Reasons for a system reboot/crash
│   │   ├── CPU stress
│   │   ├── RAM stress
│   │   ├── Kernel fault
│   │   ├── Hardware fault
│   ├── Reasons for a process restart
│   │   ├── The system rebooted
│   │   ├── The process restarted itself
│   │   ├── A watchdog application
│   │   │   ├── Protects system resources from excessive stress
│   │   │   ├── If the application is causing the stress, the watchdog restarts or terminates it
│   ├── Troubleshooting
│   │   ├── After logging in, check the status with commands like uptime, top, dmesg, journalctl, and iostat -xz 1
│   │   ├── Check /var/log/syslog or /var/log/messages, /var/log/boot.log, dmesg output, etc.
│   │   ├── Check the application's custom log path
│   │   ├── If the system is not reachable normally, use a virtual console such as iLO or iDRAC
│   │   ├── Open a case and reach out to the vendor
└── ...

DeployToCloud
