ansible_qna
If an Ansible playbook fails, the first thing I do is check the error message to identify the root
cause. I use the -v, -vv, or -vvv flags for detailed logs. If the issue is with a particular task, I use
--start-at-task to resume execution from that point instead of rerunning everything. If the playbook
modifies critical configurations, I ensure idempotency so it can be safely re-run without causing
issues.
I use Ansible Vault to encrypt sensitive data such as passwords, API keys, and SSH keys. I create
an encrypted file with ansible-vault create secrets.yml and load it in the playbook using:
vars_files:
  - secrets.yml
If multiple users need access, I share the vault password securely or use Ansible AWX/Tower
for centralized credential management.
For AWS, I use the AWS EC2 inventory plugin. I configure aws_ec2.yml and retrieve dynamic
hosts with ansible-inventory -i aws_ec2.yml --list.
For other clouds like Azure or GCP, I use their respective dynamic inventory plugins. This
ensures that new instances are automatically included without manual updates.
Alternatively, I define conditions in the playbook using when statements, so tasks only run when
certain conditions are met.
5. How do you check which hosts are reachable before running a playbook?
Before executing any playbook, I use the ping module to verify connectivity: ansible all -m ping
If a host is unreachable, I check the SSH configuration, firewall settings, and network
connectivity.
For package installations, I ensure the previous version is available and use the state: absent
option to remove faulty installations.
For complex rollbacks, I maintain version-controlled infrastructure using Git and Ansible roles.
inventories/
├── dev/hosts
├── staging/hosts
├── prod/hosts
If a specific module fails, I manually run the corresponding command on the remote machine to
check for syntax or permission issues.
crontab -e
Then, I add:
0 2 * * * ansible-playbook /path/to/playbook.yml
For better control, I prefer AWX/Tower, which provides a web UI for scheduling and
monitoring.
12. How do you test Ansible playbooks before applying them to production?
I run them against a staging environment first, preview changes with ansible-playbook playbook.yml --check --diff, and lint them with ansible-lint before promoting to production.
13. How do you use Ansible to update a fleet of servers without downtime?
I use rolling updates with serial:
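A minimal sketch of such a play (group, package, and service names assumed):

```yaml
- hosts: webservers
  serial: 2            # update only 2 hosts per batch
  become: yes
  tasks:
    - name: Update the application package
      apt:
        name: myapp    # package name assumed
        state: latest
    - name: Restart the service
      service:
        name: myapp
        state: restarted
```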
This ensures only two servers restart at a time, preventing complete downtime.
inventory/
├── host_vars/
│ ├── web1.yml
│ ├── db1.yml
├── group_vars/
│ ├── webservers.yml
│ ├── dbservers.yml
Inside web1.yml:
nginx_port: 8080
Then, in my playbook:
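A sketch of consuming the per-host variable (the variable comes from host_vars/web1.yml above):

```yaml
- hosts: webservers
  tasks:
    - name: Show the effective port for this host
      debug:
        msg: "Nginx will listen on {{ nginx_port }}"
```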
1. Use forking. Increase parallel execution:
ansible-playbook -f 10 myplaybook.yml
2. Enable pipelining. In ansible.cfg:
[ssh_connection]
pipelining = True
3. Use fact caching. Store facts to reduce SSH overhead:
[defaults]
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
4. Use async tasks. For long-running tasks, use:
- name: Run task in the background
  shell: sleep 300
  async: 300
  poll: 0
Here are 15 real-world Ansible work scenario questions, with answers in the first
person to help you handle interviews like a pro.
1. You need to quickly check if all servers in your inventory are reachable. What do
you do?
If a server fails to respond, I check SSH access, ensure the host is online, and verify my inventory
file.
First, I check the Ansible logs and output to identify which task caused the issue. Then, I verify
the affected servers using an ad-hoc command such as ansible <group> -m shell -a "systemctl status <service>".
If needed, I roll back changes by running a corrective playbook or manually restoring the
service.
3. How do you install software on multiple servers using Ansible?
I write a playbook using the apt or yum module depending on the OS. Example:
---
- name: Install Nginx
  hosts: webservers
  become: yes
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
Then, I run:
ansible-playbook install_nginx.yml
4. Your SSH connection to a remote server is failing in Ansible. What do you check?
ssh user@remote_host
If it fails, I check the SSH key and its permissions, the ansible_user set in my inventory, and firewall/network rules on the target.
5. How do you ensure that an Ansible playbook only runs on specific OS types?
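A sketch using gathered facts (package name assumed):

```yaml
- name: Install Apache only on Debian-family hosts
  apt:
    name: apache2
    state: present
  when: ansible_os_family == "Debian"
```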
6. You need to copy a configuration file to multiple servers. How do you do it?
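A sketch with the copy module (paths assumed):

```yaml
- hosts: webservers
  become: yes
  tasks:
    - name: Deploy the app configuration file
      copy:
        src: files/app.conf        # local path assumed
        dest: /etc/app/app.conf
        owner: root
        mode: "0644"
```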
Running the playbook with --check simulates the execution without making actual changes. I also lint it with:
ansible-lint deploy.yml
9. You need to restart a service only if a configuration file changes. How do you do it?
handlers:
  - name: Restart Nginx
    service:
      name: nginx
      state: restarted
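The handler is paired with a task that notifies it, for example a template deployment (paths assumed):

```yaml
tasks:
  - name: Deploy Nginx config
    template:
      src: nginx.conf.j2
      dest: /etc/nginx/nginx.conf
    notify: Restart Nginx
```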
The notify keyword ensures the restart only happens if the file changes.
10. How do you fetch logs from multiple servers using Ansible?
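A sketch with the fetch module, which pulls files from remote hosts to the control node (log path assumed):

```yaml
- hosts: all
  tasks:
    - name: Pull the syslog from every host into a per-host directory
      fetch:
        src: /var/log/syslog
        dest: logs/{{ inventory_hostname }}/
```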
11. How do you execute a command on multiple servers but ignore failures?
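A sketch using ignore_errors (command path assumed):

```yaml
- hosts: all
  tasks:
    - name: Run a command, continuing even if it fails on some hosts
      command: /usr/local/bin/health_check   # command assumed
      ignore_errors: yes
```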
13. You need to execute different tasks based on hostname. How do you do it?
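A sketch keying off inventory_hostname (hostname and command assumed):

```yaml
- name: Run only on the primary web server
  command: /usr/local/bin/primary_task   # command assumed
  when: inventory_hostname == "web1"
```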
15. How do you handle package installation for different Linux distributions?
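A common pattern is to branch on the gathered ansible_os_family fact:

```yaml
- name: Install Nginx on Debian-family hosts
  apt:
    name: nginx
    state: present
  when: ansible_os_family == "Debian"

- name: Install Nginx on RedHat-family hosts
  yum:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"
```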
Here are 15 real-world Ansible Inventory Management scenarios, answered in first person, as
if I were in an interview.
I structure my inventory using a directory-based approach. I create separate inventory files for
each environment:
inventories/
├── dev/
│ ├── hosts.yml
│ ├── group_vars/
│ │ ├── all.yml
│ │ ├── web.yml
│ │ ├── db.yml
├── staging/
│ ├── hosts.yml
│ ├── group_vars/
│ │ ├── all.yml
│ │ ├── web.yml
│ │ ├── db.yml
├── production/
│ ├── hosts.yml
│ ├── group_vars/
│ │ ├── all.yml
│ │ ├── web.yml
│ │ ├── db.yml
Each environment has its own inventory file and group variables to customize configurations. I
can switch environments using -i inventories/dev/hosts.yml.
2. How do you handle different SSH users for different groups in inventory?
[web]
web1 ansible_host=192.168.1.10
web2 ansible_host=192.168.1.11
[database]
db1 ansible_host=192.168.1.20
db2 ansible_host=192.168.1.21
[web:vars]
ansible_user=webadmin
[database:vars]
ansible_user=dbadmin
Now, Ansible will automatically use webadmin for web servers and dbadmin for database servers.
3. How do you filter and run a playbook only on hosts with a specific variable?
I use host patterns and filters. If I need to run a playbook only on servers where
app_role=backend, I use:
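A sketch using a when condition on the variable (task content assumed):

```yaml
- hosts: all
  tasks:
    - name: Deploy backend code only where app_role is backend
      debug:
        msg: "Deploying backend to {{ inventory_hostname }}"
      when: app_role is defined and app_role == "backend"
```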
4. How do you deal with a scenario where some hosts have different package
managers?
[all:vars]
ansible_python_interpreter=/usr/bin/python3
[debian_servers]
server1 ansible_pkg_mgr=apt
server2 ansible_pkg_mgr=apt
[redhat_servers]
server3 ansible_pkg_mgr=yum
server4 ansible_pkg_mgr=yum
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  instance-state-name: running
keyed_groups:
  - key: tags.Environment
    prefix: env_
Then, I run ansible-inventory -i aws_ec2.yml --graph to confirm the dynamic hosts are picked up.
#!/usr/bin/env python
import json

inventory = {
    "web": {
        "hosts": ["web1", "web2"]
    },
    "_meta": {
        "hostvars": {
            "web1": {"ansible_host": "192.168.1.10"},
            "web2": {"ansible_host": "192.168.1.11"}
        }
    }
}

print(json.dumps(inventory))
I use ansible-inventory --list, which outputs the full inventory in JSON format. If I want a specific host's details, I run ansible-inventory --host web1.
9. How do you organize inventory when working with multiple cloud providers?
inventories/
├── aws_ec2.yml
├── azure_rm.yml
├── gcp.yml
- hosts: all
  tasks:
    - group_by:
        key: "os_{{ ansible_distribution }}"
Now, all Debian-based servers are in os_Debian, and Red Hat-based servers are in os_RedHat.
I use:
12. How do you configure an inventory for a HAProxy load balancer setup?
[load_balancers]
haproxy1 ansible_host=192.168.1.100
[web_servers]
web1 ansible_host=192.168.1.101
web2 ansible_host=192.168.1.102
[database]
db1 ansible_host=192.168.1.200
[load_balancers:vars]
ansible_user=haproxy
- hosts: web
  serial: 2
  tasks:
    - name: Update packages
      apt:
        name: "*"
        state: latest
all:
  children:
    masters:
      hosts:
        master1:
        master2:
    workers:
      hosts:
        worker1:
        worker2:
Then, I target a group directly, for example ansible masters -i inventory.yml -m ping.
15 Real-World Ansible Work Scenario Questions & Answers
Here are some practical work scenarios with first-person answers that you can use to prepare
for an interview.
In my playbook, I define variables in the vars section or use external variable files for better
organization. For example, if I need to install a package dynamically, I do this:
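A sketch of the pattern (variable value assumed):

```yaml
- hosts: all
  become: yes
  vars:
    package_name: nginx        # swap this value as needed
  tasks:
    - name: Install the requested package
      apt:
        name: "{{ package_name }}"
        state: present
```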
This ensures flexibility, allowing me to change package_name without modifying the task itself.
I structure my inventory using host_vars/ and group_vars/ directories. If I want all webservers to use
port 8080, I create group_vars/webservers.yaml:
web_port: 8080
Now, every webserver in this group will use this port in my tasks. This method helps me
organize configurations cleanly.
Inside group_vars/prod.yaml:
database_host: db-prod.example.com
I use register to store the output of a command and reference it in another task.
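A sketch of register in action:

```yaml
- name: Capture the current kernel version
  command: uname -r
  register: kernel_version

- name: Use the captured output in a later task
  debug:
    msg: "Running kernel: {{ kernel_version.stdout }}"
```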
This ensures that the service restart only happens on RedHat-based systems.
vars_files:
- secrets.yml
[defaults]
fact_caching = jsonfile
fact_caching_connection = /tmp/facts
I pass variables via the command line or use host_vars/. Command-line variables override
everything: ansible-playbook playbook.yml -e "package_name=nginx"
This ensures variables are set dynamically based on the system’s properties.
[custom]
env=staging
I access it in my playbook as {{ env }}.
Final Thoughts
With these real-world scenarios, I’m confident in handling variables, facts, and troubleshooting
issues in Ansible. By practicing these situations, I ensure smooth automation and deployment
workflows.
Here are 15 real-world Ansible playbook interview scenarios, answered in first person as if I
were responding in an interview.
1. How do you structure an Ansible playbook for managing multiple servers with
different roles?
I structure my playbooks using roles to keep things modular and organized. For example, I
might have roles like webserver, database, and load_balancer. Each role has its own tasks, handlers,
variables, and templates. A typical site-wide playbook looks like this:
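A sketch of such a site.yml (the inventory group names are assumptions):

```yaml
# site.yml: one play per server type, each applying its role
- hosts: webservers
  roles:
    - webserver

- hosts: databases
  roles:
    - database

- hosts: load_balancers
  roles:
    - load_balancer
```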
This ensures better reusability and separation of concerns across different server types.
I use handlers to ensure the service only restarts if the configuration file actually changes.
3. How do you ensure a task runs only if certain conditions are met?
I use the when statement to apply conditionals based on host variables or facts.
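A sketch (service name and variable are assumptions):

```yaml
- name: Restart the app only in production
  service:
    name: myapp        # service name assumed
    state: restarted
  when: app_env == "production"
```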
5. A task needs to retry until a certain condition is met. How do you do it?
I use the until directive to retry tasks until the condition succeeds.
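A sketch of the until/retries pattern (command assumed):

```yaml
- name: Retry until the command succeeds
  command: /usr/local/bin/check_ready    # command assumed
  register: result
  until: result.rc == 0
  retries: 5
  delay: 10
```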
- block:
    - name: Attempt risky command
      command: /bin/false
  rescue:
    - name: Handle failure
      debug:
        msg: "The previous task failed, but we handled it!"
  always:
    - name: Ensure cleanup
      debug:
        msg: "This runs no matter what."
7. How do you run a task on a different machine than the target host?
Example: Copying a file from the Ansible control node to a remote host.
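Another common use of delegate_to is acting on a third machine; a sketch (host and command assumed):

```yaml
- name: Remove the host from the load balancer before updating it
  command: /usr/local/bin/lb_remove {{ inventory_hostname }}   # script assumed
  delegate_to: lb1.example.com                                 # host assumed
```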
I use Ansible Vault to encrypt sensitive files like passwords and keys.
vars_files:
- vars.yml
9. You need to execute a task on localhost but apply changes remotely. How do you do it?
I run the task with delegate_to: localhost (or a play with hosts: localhost and connection: local) while the module itself targets the remote resource, which is common for cloud API calls.
13. How do you fetch and use data from a previous task?
I register the first task's result and reference it in later tasks, e.g. {{ result.stdout }}.
I ensure tasks don’t run unnecessarily by using modules instead of raw commands.
1. Verbose output: ansible-playbook playbook.yml -vvv
2. Debug a variable: - debug: var=my_variable
Here are 15 real-world Ansible work scenarios with answers in the first person, as if I were
responding in an interview.
1. You need to create a new role for setting up a web server. How do you proceed?
First, I scaffold the role with ansible-galaxy init webserver. Then, I define tasks in tasks/main.yml to install and configure the web server, using
handlers/main.yml to restart the service when needed. I store configurations in templates/ and files/,
ensuring variables are parameterized in defaults/main.yml. I also document the role in README.md
for clarity.
dependencies:
  - role: common
  - role: database
    vars:
      db_name: "app_db"
This ensures the common and database roles execute before the current role. I also verify that
dependencies are installed using requirements.yml.
4. How do you install and manage roles from Ansible Galaxy?
I declare them in requirements.yml:
- name: geerlingguy.nginx
  version: 3.0.0
Then I install with ansible-galaxy install -r requirements.yml.
5. A role needs an update but you don’t want to break existing setups. What do
you do?
I create a new branch in Git and update the role there. I test the updated role in a staging
environment before merging it into the main branch. If it's an external role from Ansible Galaxy,
I specify a version in requirements.yml and gradually test updates before upgrading production
systems.
For example, defaults/main.yml might set app_port: 8080, while a group_vars override sets app_port: 80 for a specific environment; the more specific variable wins.
7. How do you debug an issue where a role is failing?
I check logs and use the debug module to print variable values. If the issue is with a role variable,
I check its effective value with an ad-hoc call like ansible <host> -m debug -a "var=app_port".
8. A role modifies a configuration file but doesn’t restart the service. How do you
fix this?
This ensures the service restarts only if the config file changes.
I ensure the role has a meta/main.yml with proper metadata, commit it to a public GitHub repo,
and then run: ansible-galaxy role import <github_user> <repo_name>
10. You need to use a role stored in a private Git repository. How do you install it?
- name: my_private_role
  src: git+ssh://git@github.com/myorg/my_private_role.git
  version: master
I version-control roles and use a CI/CD pipeline to test updates before deployment. In
requirements.yml, I pin role versions:
- name: my_role
  version: "1.2.0"
Before updating, I test in a staging environment. Once validated, I roll out updates gradually
using Ansible’s serial execution:
- hosts: web_servers
  serial: 2
  roles:
    - my_role
12. A role is taking too long to execute. How do you optimize it?
If tasks are redundant, I use conditionals (when:) and check mode (--check) to avoid unnecessary
changes. For loops, I replace with_items with loop for better performance.
13. How do you ensure role security when handling sensitive data?
vars_files:
- secrets.yml
For passwords and API keys, I also consider Ansible lookup plugins like ansible.builtin.env to fetch
secrets from environment variables.
I test roles with Molecule, which creates a test instance, applies the role, and verifies the result: molecule test
ansible/
│── group_vars/
│── host_vars/
│── roles/
│ ├── common/
│ ├── web_server/
│ ├── database/
│── site.yml
│── inventory.ini
Conclusion
These scenarios cover real-world Ansible challenges you might face at work. Mastering them
ensures you can confidently explain, troubleshoot, and optimize Ansible roles in any
environment.
In my Flask route, I use render_template() to pass variables to the template. For example:
@app.route('/profile')
def profile():
    user = {"name": "Alice", "age": 30}
    return render_template("profile.html", user=user)
<ul>
{% for fruit in fruits %}
<li>{{ fruit }}</li>
{% endfor %}
</ul>
3. How do you conditionally display content in Jinja2?
{% if user.is_admin %}
<a href="/admin">Go to Admin Panel</a>
{% else %}
<p>You do not have admin privileges.</p>
{% endif %}
<html>
<head>
<title>{% block title %}Default Title{% endblock %}</title>
</head>
<body>
<header>Site Header</header>
<main>
{% block content %}{% endblock %}
</main>
</body>
</html>
Then, I extend it in home.html:
{% extends "base.html" %}
{% block title %}Home Page{% endblock %}
{% block content %}
<h1>Welcome Home</h1>
{% endblock %}
{% macro greet(name) %}
<p>Hello, {{ name }}!</p>
{% endmacro %}
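The macro can then be called like a function wherever it's needed:

```jinja
{{ greet("Alice") }}
```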
/static
styles.css
/templates
index.html
By default, Jinja2 escapes data. But if I need raw HTML, I use safe:
{{ user.bio|safe }}
Otherwise, if user.bio contains <script>, it’s rendered as plain text, preventing XSS.
10. How do you pass multiple values into a template?
I pass them as extra keyword arguments to render_template(), e.g. render_template("index.html", user=user, fruits=fruits).
{% if users %}
<p>There are users.</p>
{% else %}
<p>No users found.</p>
{% endif %}
{% include "navbar.html" %}
For debugging, I run Flask in debug mode:
app.run(debug=True)
For errors, Flask shows the traceback, which helps me pinpoint issues.
Final Thoughts
If I can answer these types of questions fluently, I’m well-prepared for real-world Flask and
Jinja2 challenges!
Here are 15 real-world Ansible work scenarios, answered in the first person as if I were
responding in an interview.
I use the user module to create the user and the copy module to configure sudo access.
Example Playbook:
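A sketch (username assumed; the sudoers drop-in is validated before install):

```yaml
- hosts: all
  become: yes
  tasks:
    - name: Create the deploy user
      user:
        name: deploy            # username assumed
        shell: /bin/bash
        state: present

    - name: Grant passwordless sudo via a sudoers drop-in
      copy:
        dest: /etc/sudoers.d/deploy
        content: "deploy ALL=(ALL) NOPASSWD: ALL\n"
        mode: "0440"
        validate: "visudo -cf %s"
```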
3. How do you copy a file to multiple servers and verify its integrity?
I use the copy module to distribute the file and stat to verify its checksum.
Example Playbook:
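A sketch (paths assumed) that copies the file and registers file_status for verification:

```yaml
- hosts: all
  tasks:
    - name: Distribute the configuration file
      copy:
        src: files/app.conf          # path assumed
        dest: /etc/app/app.conf

    - name: Check the file on the remote host
      stat:
        path: /etc/app/app.conf
        checksum_algorithm: sha256
      register: file_status
```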
- debug:
    msg: "File exists: {{ file_status.stat.exists }}"
I check the running kernel version against the installed version using shell and reboot only when
necessary.
rescue:
  - name: Rollback to previous version
    ansible.builtin.copy:
      src: backup_version/
      dest: /var/www/app
    notify: Restart service
always:
  - name: Ensure service is running
    ansible.builtin.service:
      name: myapp
      state: started
I use Ansible inventory files and variable precedence with group_vars and host_vars.
# inventory
[dev]
server1 ansible_host=192.168.1.10
[staging]
server2 ansible_host=192.168.1.20
[prod]
server3 ansible_host=192.168.1.30
#!/usr/bin/python
from ansible.module_utils.basic import AnsibleModule

def main():
    module = AnsibleModule(argument_spec={})
    module.exit_json(changed=True, message="Custom module executed")

if __name__ == '__main__':
    main()
- debug:
    msg: "{{ result.message }}"
Final Thoughts
These scenarios cover core Ansible concepts and real-world challenges. If you can confidently
explain these, you'll be well-prepared for any interview.
Here are 15 real-world work scenarios with first-person answers that will help you prepare for
an Ansible Vault-related interview:
I would immediately remove the file from Git history using git filter-repo (or the older git filter-branch) to ensure
the secrets don’t persist. Then, I’d rotate the exposed credentials and re-encrypt them using
Ansible Vault. Going forward, I’d enforce Git hooks or a .gitignore rule to prevent secrets from
being committed.
2. You need to securely store API keys for multiple environments (dev, staging,
production). How would you handle this in Ansible?
I would create separate vault-encrypted files for each environment, such as:
group_vars/
├── dev/vault.yml
├── staging/vault.yml
├── production/vault.yml
Then, I’d reference them dynamically in the playbook using vars_files, ensuring only the relevant
secrets are used based on the environment.
3. How do you ensure that Ansible Vault passwords aren’t stored in plaintext on
disk?
I’d store the vault password in an environment variable instead of a plaintext file. For CI/CD
pipelines, I’d use a secure secret manager like AWS Secrets Manager or HashiCorp Vault to
dynamically inject passwords at runtime.
4. Your manager asks you to share an encrypted Ansible Vault file with another
team without exposing the secret. How do you do it?
I would rekey the vault file to use a new password that I share securely with the other team: ansible-vault rekey secrets.yml
This allows them to access the secrets without knowing our main vault password.
I’d store the vault password in Jenkins credentials and pass it as a variable:
ANSIBLE_VAULT_PASSWORD=$(cat /path/to/jenkins/secret)
ansible-playbook deploy.yml --vault-password-file <(echo "$ANSIBLE_VAULT_PASSWORD")
6. What if a junior engineer forgets the Ansible Vault password and can’t run
playbooks?
If they don’t have access, I’d guide them to request it from a secure password manager like
Bitwarden or 1Password. If the password is completely lost, we’d need to recreate the
encrypted files, as Ansible Vault does not provide recovery options.
7. A playbook fails because it can't decrypt a vault file. What are the possible
reasons?
Common causes: the wrong vault password or --vault-id was supplied, the file was encrypted under a different vault ID, or the file isn't actually vault-encrypted.
9. What security measures would you take to protect Ansible Vault files?
I would: restrict file permissions on vault files, keep the vault password out of version control, store it in a secrets manager, and rotate credentials regularly.
10. You are onboarding a new team member. How do you securely share the
Ansible Vault password?
I’d use a secure password manager (e.g., Bitwarden, 1Password, or HashiCorp Vault) to share
the password. I’d also ensure they understand best practices and policies around handling
secrets.
11. What happens if two engineers encrypt the same file with different passwords?
The file becomes unreadable unless you have the exact password used for encryption. To
prevent this, I’d enforce a shared vault password file or use a vault rekeying process.
12. Your team uses both Ansible Vault and HashiCorp Vault. When would you use
each?
I’d use Ansible Vault for encrypting static data like configuration files. I’d use HashiCorp Vault
for dynamic secrets that need on-demand access (e.g., ephemeral database credentials).
13. How would you migrate existing plaintext secrets into Ansible Vault?
I'd: identify all plaintext secrets, encrypt each file with ansible-vault encrypt, update playbooks to load them via vars_files, and purge the plaintext copies from Git history.
14. A senior engineer suggests using GPG instead of Ansible Vault for secrets. How
would you respond?
I’d acknowledge that GPG is a good alternative, but I’d explain that Ansible Vault is built-in,
integrates well with playbooks, and provides simpler management. If the organization already
uses GPG, we could explore ansible-vault --vault-id gpg@... for integration.
15. How would you handle a situation where a vault-encrypted playbook is running
in production but fails due to decryption errors?
I'd: verify which vault password source the production job uses, confirm the correct --vault-id, re-encrypt the file if it was rekeyed, and roll back to the last working release while investigating.
These scenarios cover a broad range of challenges you might face when handling secrets with
Ansible Vault in the real world. If you can confidently explain these, you’ll be well-prepared for
any interview!
1. A team member installed a collection, but Ansible still says the module is
missing. What do you do?
First, I list installed collections with ansible-galaxy collection list. If it's not listed, I confirm the installation path using ansible-config dump | grep COLLECTIONS_PATHS.
If the module is still not recognized, I check if it requires FQCN (Fully Qualified Collection Name)
in the playbook.
For example, instead of the short module name json_query, I use the FQCN community.general.json_query.
collections:
  - name: community.general
  - name: ansible.utils
    version: ">=1.0.0,<2.0.0"
Then, I install them all at once using ansible-galaxy collection install -r requirements.yml.
3. You are developing a custom module inside a collection. How do you test it?
- hosts: localhost
  tasks:
    - name: Run custom module
      my_namespace.my_collection.my_module:
        param1: "test"
collections:
  - name: my_namespace.my_collection
    source: https://internal-repo.com/my_collection.tar.gz
5. A new collection update breaks your playbooks. How do you fix it?
collections:
  - name: community.general
    version: "5.5.0"
I enforce version control by specifying versions in requirements.yml and locking them in ansible.cfg:
[defaults]
collections_paths = ./collections
- hosts: localhost
  tasks:
    - name: Use custom filter
      debug:
        msg: "{{ 'hello' | my_namespace.my_collection.custom_filter }}"
8. Your collection needs dependencies from another collection. How do you handle
it?
dependencies:
  ansible.utils: ">=1.0.0"
I check meta/runtime.yml inside the collection to see the required version. If upgrading Ansible
isn't an option, I look for an older collection version that supports my Ansible version, e.g. ansible-galaxy collection install community.general:5.5.0.
I add requirements.yml to the repository and install it in the CI/CD pipeline (ansible-galaxy collection install -r requirements.yml) before running playbooks.
If it's a module issue, I insert debug output in the Python code, for example temporary module.log() or module.fail_json(msg=...) calls.
I move roles into roles/ inside the collection structure, update galaxy.yml, and reference them
using FQCN:
roles:
  - my_namespace.my_collection.my_role
13. A module from a collection is not working as expected. How do you fix it?
If the issue persists, I enable debugging (-vvv), check logs, and review module source code.
This covers real-world troubleshooting and best practices for collections.
Here are 15 practical work scenarios where I might need to optimize Ansible’s performance,
with answers in the first person to reflect how I would handle them in a real job.
Q: My Ansible playbook takes too long when running on 500+ hosts. How can I speed it up?
A: The first thing I check is the number of forks in my Ansible configuration. By default, Ansible
only runs 5 hosts in parallel. I increase this by modifying ansible.cfg:
[defaults]
forks = 50
This allows me to execute tasks on 50 hosts at a time, reducing the overall execution time.
Q: Some tasks, like database backups, take a long time and block the playbook. How do I
prevent this?
A: I use asynchronous execution so that Ansible doesn't wait for the task to finish. For example:
This lets the task run in the background while the playbook moves on.
A: I store the job ID when running the task and later check its status using async_status:
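A sketch of the full async pattern (command and timings assumed):

```yaml
- name: Start the backup in the background
  command: /usr/local/bin/db_backup    # command assumed
  async: 3600        # allow up to 1 hour
  poll: 0            # don't wait for it
  register: backup_job

- name: Check on the background job later
  async_status:
    jid: "{{ backup_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 60
```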
A: Ansible's old accelerated transport (transport = accelssh) has been removed from modern releases, so instead I speed up connections with SSH pipelining in ansible.cfg:
[ssh_connection]
pipelining = True
Q: My playbook is slow because it gathers facts every time. How do I optimize this?
[defaults]
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 600
Q: Some of my servers are slower than others and cause playbook timeouts. How do I handle
this?
A: I increase the SSH timeout in ansible.cfg to allow slow hosts more time to respond:
[defaults]
timeout = 60
- hosts: all
  serial: 20
This ensures Ansible processes only 20 hosts at a time instead of overloading them.
Q: Some remote servers are in high-latency environments. How do I optimize for this?
A: I increase timeout and use persistent connections (accelerated mode or persistent SSH).
I also use fact_caching to avoid unnecessary network calls.
Q: How do I prevent one failing host from stopping the entire playbook?
A: By default a failure only removes that host from the play; for task-level tolerance I add ignore_errors: yes, and I use max_fail_percentage to control when the whole play should abort.
Q: How do I prevent Ansible from running on hosts that don’t need updates?
A: I use the changed_when condition and check if changes occurred before proceeding.
A: I use ANSIBLE_DEBUG=True and -vvv verbosity to check where time is being spent.
I also check if fact gathering, SSH connections, or slow tasks are the bottlenecks.
Final Thoughts
By applying these techniques, I ensure my Ansible playbooks run efficiently, scale well, and
avoid unnecessary delays.
Here are 15 real-world Ansible troubleshooting scenarios with answers in first-person to help
you prepare for any interview.
This gives me full JSON output of module execution, helping me pinpoint the issue. If necessary,
I increase verbosity further (-vvvv) to see SSH details.
If it’s missing, I check where it should be set (inventory, vars, group_vars, host_vars). If it's a fact, I
ensure gather_facts: yes is enabled.
I also verify indentation and colons. YAML is strict, so I make sure everything is properly
formatted.
However, I use this cautiously because ignoring failures might hide real issues.
It’s not idempotent. I switch to a proper module instead of using command or shell.
Instead of this:
command: apt-get install -y nginx
I use this:
apt:
  name: nginx
  state: present
ssh user@remote_host
If SSH fails, I check network/firewall settings. If Ansible is using the wrong user, I specify it in the
inventory:
[servers]
192.168.1.100 ansible_user=myuser ansible_ssh_private_key_file=~/.ssh/id_rsa
9. I need to run a playbook as another user but keep getting permission errors. What
do I do?
I add become: yes and become_user: <user> to the play, and confirm the connecting account has the required sudo rights on the target.
[defaults]
forks = 50
11. My playbook works on one server but fails on another. How do I troubleshoot?
I compare the two hosts' facts (ansible <host> -m setup), check OS and package versions, confirm the same group/host variables apply, and rerun with -vvv against the failing server.
12. A playbook fails with "no action detected in task". How do I fix it?
I forgot a colon (:) after apt. Fixing it like this solves the issue:
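A hedged reconstruction of the corrected task (package assumed); the key point is that the module name is followed by a colon:

```yaml
- name: Install Nginx
  apt:
    name: nginx
    state: present
```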
13. How do I retry failed hosts instead of rerunning the playbook from the start?
I rerun with the retry file Ansible generates: ansible-playbook playbook.yml --limit @playbook.retry
14. How do I ensure a task runs even if the previous ones fail?
- block:
    - name: This might fail
      command: /bin/false
  rescue:
    - name: Handle failure
      debug:
        msg: "Task failed, but we handled it."
  always:
    - name: Cleanup
      debug:
        msg: "This always runs."
I run ansible-playbook playbook.yml --syntax-check to test it quickly.
Final Thoughts
These scenarios prepare me for real-world Ansible debugging. I now feel confident answering
interview questions and troubleshooting playbooks effectively.
Here are 15 real-world work scenarios related to Ansible Tower/AWX, along with first-person
responses demonstrating expertise and problem-solving skills.
Answer:
I navigate to Resources > Inventories to verify if the inventory exists. If it’s missing, I recreate it
or check if it was accidentally deleted. If it exists but isn't assigned to the job template, I edit
the template and select the correct inventory. I also ensure that the inventory syncs properly,
especially if it's a dynamic inventory from a cloud provider like AWS.
Answer:
I create a credential in Tower/AWX under Resources > Credentials and select Amazon Web
Services (AWS) as the type. I enter the Access Key ID and Secret Access Key securely. Then, I
ensure that only authorized users and job templates have access to these credentials using
RBAC permissions.
Answer:
I go to Templates > Job Templates, select the required job, and navigate to the Schedules tab. I
create a new schedule, name it "Weekly Maintenance," and set it to run every Sunday at 2:00
AM in the appropriate time zone. I verify that the job is enabled and test it with a manual
execution.
4. A Job is Running Too Slowly. How Do You Optimize It?
Answer:
I check the job output logs to identify slow tasks. Then, I optimize it by:
5. A User Complains They Can’t Run a Job. How Do You Fix It?
Answer:
I first check the user’s role under Users & Teams to ensure they have the right permissions. If
they lack access, I modify the RBAC settings and grant them the appropriate role (e.g., Job
Executor). If they have access but still can’t run it, I review the job template permissions to
ensure it's not restricted.
6. How Do You Integrate Tower Logs with an External System Like Splunk?
Answer:
I go to Settings > System > Logging and configure external logging by enabling "Enable External
Logging". I enter the Splunk HTTP Event Collector (HEC) URL, choose JSON format, and set an
authentication token. I then test it by executing a job and confirming that logs appear in Splunk.
Answer:
I edit the job template and use the Limit field to specify hostnames or groups (e.g.,
"web_servers"). If hosts aren’t pre-grouped, I use an extra variable like -e
target_hosts="server1,server2", and modify the playbook to respect this variable.
Answer:
I check if the correct Vault credential is assigned to the job template. If missing, I add it under
Resources > Credentials and associate it with the job. If it's a one-time execution, I enable Vault
password prompting instead of storing it.
Answer:
I ensure that job templates using sensitive credentials have RBAC restrictions to limit log
access. I also configure Ansible no_log: true in playbooks to hide sensitive outputs from logs.
10. How Do You Handle Dynamic Inventory for AWS EC2 Instances?
Answer:
I create a Dynamic Inventory Source under Resources > Inventories and select Amazon EC2 as
the source. I configure authentication using stored AWS credentials and set a sync schedule.
This ensures the inventory updates automatically when new instances are created.
Answer:
I use the Ansible Tower plugin for Jenkins or trigger jobs via the REST API by making a POST
request:
curl -k -X POST https://awx.example.com/api/v2/job_templates/10/launch/ \
-H "Authorization: Bearer mytoken"
12. A Job Fails with an SSH Permission Denied Error. What’s the Fix?
Answer:
I verify that the SSH key or password is correctly stored in AWX under Credentials. If using an
SSH key, I ensure it’s added to authorized_keys on the target servers. I also test connectivity
manually with ssh -i <key_file> user@target_host.
Answer:
I configure Notifications in AWX under Settings > Notifications and add an email notification
provider with SMTP details. Then, I link it to job templates so the team gets emails when jobs
fail.
14. How Do You Control Which Users Can Use a Specific Credential?
Scenario: Only the DevOps team should use a specific AWS credential.
Answer:
I go to Resources > Credentials, select the AWS credential, and modify the RBAC settings. I
assign it only to the DevOps team and remove access from other users. This ensures only
authorized users can access it.
Enable verbosity (-vvv or -vvvv) in the job template for deeper logs.
Check network connectivity between AWX and target hosts.
Run the playbook manually with --step to identify failing tasks.
Use retry files (ansible-playbook playbook.yml --limit @failed_hosts.retry).
Final Thoughts
These real-world troubleshooting scenarios will prepare you for almost any Ansible
Tower/AWX interview or work challenge.
Here are 15 real-world Ansible network automation scenarios with answers in the first person,
simulating how I'd respond in a work environment or interview.
1. A critical network outage has occurred. How would you use Ansible to diagnose
the issue?
I would first use Ansible to gather real-time data from all affected network devices. I’d run a
show commands playbook to collect interface status, routing tables, and logs.
Once I have the data, I’d analyze it to identify down interfaces, high CPU usage, or routing
issues.
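A sketch of such a diagnostics play (group name and command list assumed):

```yaml
- name: Collect diagnostic output from Cisco IOS devices
  hosts: routers            # group name assumed
  gather_facts: no
  tasks:
    - name: Run show commands
      cisco.ios.ios_command:
        commands:
          - show ip interface brief
          - show ip route
          - show logging
      register: diag

    - name: Display the results
      debug:
        var: diag.stdout_lines
```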
2. You need to configure VLANs on 50 switches. How would you do it efficiently?
I’d write an Ansible playbook to apply VLAN configurations in bulk, ensuring consistency across
all devices.
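A sketch (group name and VLAN IDs assumed):

```yaml
- name: Configure VLANs on all access switches
  hosts: switches           # group name assumed
  gather_facts: no
  tasks:
    - name: Ensure VLANs exist
      cisco.ios.ios_vlans:
        config:
          - vlan_id: 10
            name: users
          - vlan_id: 20
            name: voice
        state: merged
```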
This ensures all switches get the same VLAN setup without manual intervention.
3. A security team asks you to audit all network devices for SSH configurations.
How would you do it?
I’d use Ansible to check SSH settings on all devices and store the output for auditing.
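A sketch of such an audit, assuming Cisco IOS devices (the output path is illustrative):

```yaml
- name: Audit SSH configuration
  hosts: network_devices
  gather_facts: false
  tasks:
    - name: Collect SSH settings
      cisco.ios.ios_command:
        commands:
          - show ip ssh
          - show running-config | include ssh
      register: ssh_audit

    - name: Save output per device for the security team
      ansible.builtin.copy:
        content: "{{ ssh_audit.stdout | join('\n') }}"
        dest: "audits/{{ inventory_hostname }}_ssh.txt"
      delegate_to: localhost
```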
4. A new branch office is opening. How would you provision network devices using
Ansible?
I’d prepare a playbook to configure interfaces, routing, VLANs, and users. Then, I’d deploy it to
the new devices.
- name: Provision New Branch Devices
  hosts: new_branch_routers
  tasks:
    - name: Configure interfaces
      cisco.ios.ios_config:
        lines:
          - interface GigabitEthernet0/1
          - ip address 192.168.10.1 255.255.255.0
          - no shutdown
5. A firmware update is needed on 100 devices. How would you handle it?
I’d use Ansible to automate firmware uploads and reboots in batches to minimize downtime.
I’d schedule updates during maintenance windows and verify each device post-upgrade.
6. You need to back up all device configurations. How would you do it?
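One common approach uses the backup option of cisco.ios.ios_config (Cisco devices assumed, as in the earlier examples), which writes each running config to a local backup/ directory:

```yaml
- name: Back up device configurations
  hosts: network_devices
  gather_facts: false
  tasks:
    - name: Save running config locally
      cisco.ios.ios_config:
        backup: true
```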
9. You need to check interface errors on all devices. How would you do it?
I’d use Ansible to fetch interface statistics and analyze error counts.
10. How would you validate network changes before applying them?
I’d use Ansible’s check mode to simulate changes without applying them:
ansible-playbook playbook.yml --check --diff
11. How would you manage network device credentials securely in Ansible?
I’d store device usernames and passwords in Ansible Vault-encrypted variable files instead of hardcoding them in playbooks or inventory.
12. A device’s configuration is lost or corrupted. How would you restore it?
I’d first retrieve the last backup, then restore it using Ansible.
13. How would you verify if all devices have the same NTP configuration?
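One way to sketch this check, assuming Cisco IOS devices and an illustrative expected server address:

```yaml
- name: Verify NTP configuration is consistent
  hosts: network_devices
  gather_facts: false
  tasks:
    - name: Read NTP config
      cisco.ios.ios_command:
        commands:
          - show running-config | include ntp server
      register: ntp_out

    - name: Fail if the expected server is missing
      ansible.builtin.assert:
        that:
          - "'ntp server 192.0.2.10' in ntp_out.stdout[0]"
        fail_msg: "{{ inventory_hostname }} is missing the standard NTP server"
```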
14. How would you automate user account management on network devices?
15. How would you troubleshoot slow network performance using Ansible?
Here are 15 real-world work scenario questions with detailed first-person answers to help you
master Ansible for Cloud Automation in interviews.
1. You need to provision an AWS EC2 instance and install Apache on it. How would
you do it?
I would use Terraform to provision the EC2 instance and Ansible to configure Apache. First, I’d
write a Terraform script to create the instance. Then, I’d use an Ansible playbook to install
Apache and start the service.
resource "aws_instance" "web" {
  ami           = "ami-..."   # AMI ID elided
  instance_type = "t2.micro"

  tags = {
    Name = "web-server"
  }
}
I would then run Terraform to deploy the server and Ansible to configure it.
2. How would you automate the provisioning of an S3 bucket with Ansible?
I would use the amazon.aws.s3_bucket module to create an S3 bucket. Here’s how I’d do it:
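A minimal sketch of that playbook (bucket name and region are illustrative):

```yaml
- name: Create S3 bucket
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Ensure bucket exists
      amazon.aws.s3_bucket:
        name: my-example-bucket
        state: present
        region: us-east-1
```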
I would then run the playbook, and it would create the bucket automatically.
3. What would you do if your Ansible playbook fails due to missing AWS
credentials?
I would first check that valid credentials exist in ~/.aws/credentials or in environment variables:
[default]
aws_access_key_id=YOUR_ACCESS_KEY
aws_secret_access_key=YOUR_SECRET_KEY
region=us-east-1
If credentials are present, I’d verify that boto3 and botocore are installed:
pip install boto3 botocore
4. You need to deploy a virtual machine on Azure using Ansible. What would you
do?
- name: Create VM
  azure.azcollection.azure_rm_virtualmachine:
    name: myVM
    resource_group: myResourceGroup
    vm_size: Standard_B1s
    admin_username: azureuser
    admin_password: "P@ssword123!"
5. How would you configure an IAM role for an EC2 instance using Ansible?
I would create the role and its instance profile with the community.aws.iam_role module, attach the required policy, and reference the instance profile when launching the EC2 instance.
Terraform is for provisioning infrastructure, while Ansible is for configuring it. I use Terraform
to create resources (EC2, S3, VPC) and Ansible to configure them (installing software, updates).
I would use Auto Scaling Groups with an Ansible playbook to configure instances dynamically.
12. How do you automate Kubernetes deployments in the cloud with Ansible?
I use the kubernetes.core.k8s module to apply Deployment and Service manifests from a playbook, just as I would any other resource.
To schedule a recurring run, I add a cron entry:
0 2 * * * ansible-playbook deploy.yml
To inspect a variable during execution, I use the debug module:
- debug:
    msg: "Variable value is {{ my_var }}"
15. How would you optimize an Ansible playbook for large-scale cloud
environments?
I would:
Here are 15 real-world work scenarios with answers in the first person to help you confidently
answer interview questions about CI/CD with Ansible.
Scenario: A company already has Jenkins pipelines for deployments but wants to introduce
Ansible for automation.
Answer:
I would start by installing the Ansible plugin in Jenkins and ensuring Ansible is installed on the
Jenkins server. Then, I would modify the Jenkins pipeline to include an Ansible playbook
execution step.
For example, I would add a stage like this in the Jenkinsfile:
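A sketch of such a stage (inventory and playbook names are illustrative):

```groovy
stage('Deploy with Ansible') {
    steps {
        sh 'ansible-playbook -i inventory.ini deploy.yml'
    }
}
```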
This ensures that Ansible runs automatically after the code is built and tested.
Answer:
First, I would check the Jenkins logs or CI/CD logs to identify the exact error message. If it's a
syntax issue, I would validate the playbook using:
ansible-playbook playbook.yml --syntax-check
3. How would you securely store SSH keys for Ansible in GitHub Actions?
Answer:
I would use GitHub Secrets to store the SSH private key securely. In the GitHub Actions
workflow, I would modify the steps like this:
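A hedged sketch of those steps (SSH_PRIVATE_KEY is the assumed name of the GitHub Secret):

```yaml
- name: Set up SSH key
  run: |
    mkdir -p ~/.ssh
    echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
    chmod 600 ~/.ssh/id_rsa

- name: Run playbook
  run: ansible-playbook -i inventory.ini playbook.yml
```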
This ensures the key is available only during workflow execution and is securely encrypted.
Scenario: A new Ansible role is added, and it must be tested before deployment.
Answer:
I would use Molecule for testing. First, I would initialize a Molecule scenario:
molecule init scenario
Then I would run molecule test to lint, converge, and verify the role before merging it.
If using version-controlled infrastructure, I would revert to the last known good state.
If state drift occurred, I would restore backups or manually fix affected services.
To prevent this in the future, I would add an environment check in the playbook to prevent
accidental production deployments.
Answer:
I would define a .gitlab-ci.yml file like this:
deploy:
  stage: deploy
  image: python:3.9
  before_script:
    - apt update && apt install -y ansible
  script:
    - ansible-playbook -i inventory.ini playbook.yml
Answer:
I would use Ansible Vault to encrypt sensitive variables:
ansible-vault encrypt secrets.yml
Then I would reference the encrypted file in the playbook:
vars_files:
  - secrets.yml
To automate decryption in CI/CD, I would store the vault password in Jenkins/GitHub Secrets.
Answer:
I would use the AWS Ansible collection and define a playbook like this:
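A sketch of such a playbook (AMI ID, key pair, and region are placeholders):

```yaml
# aws_provision.yml
- name: Provision EC2 instance
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Launch instance
      amazon.aws.ec2_instance:
        name: web-server
        instance_type: t2.micro
        image_id: ami-xxxxxxxx
        key_name: my-keypair
        region: us-east-1
        state: running
```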
ansible-playbook aws_provision.yml
Scenario: Ansible fails with a "timeout" error when connecting to remote servers.
Answer:
I would check SSH reachability and firewall rules for the affected hosts, raise the connection timeout in ansible.cfg, and re-run with -vvv to see where the connection stalls.
Answer:
I would:
Answer:
I would use the Docker module in Ansible:
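For example, a hedged sketch of such a task (image name and ports are illustrative):

```yaml
- name: Run application container
  community.docker.docker_container:
    name: web-app
    image: nginx:latest
    state: started
    restart_policy: always
    published_ports:
      - "80:80"
```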
Answer:
I would use Ansible’s declarative modules, which only modify state if needed.
For example, this ensures a package is installed only if missing:
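A minimal example of that pattern (Debian-family hosts assumed):

```yaml
- name: Ensure nginx is installed
  ansible.builtin.apt:
    name: nginx
    state: present   # no change is made if the package is already installed
```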
Answer:
I would define dependencies in requirements.yml and install them using:
ansible-galaxy install -r requirements.yml (or ansible-galaxy collection install -r requirements.yml for collections)
14. How do you trigger an Ansible playbook after a successful build in Jenkins?
Answer:
I would configure a post-build step in Jenkins to trigger Ansible using a shell script:
ansible-playbook deploy.yml
Answer:
I would:
These scenarios should prepare you for any interview question related to CI/CD with Ansible.
Here are 15 work scenario questions with 1st-person answers, tailored for interviews on
writing custom Ansible modules and plugins.
1. You need to automate a process that no existing Ansible module covers. What
do you do?
I would write a custom Ansible module in Python. First, I'd define the module’s parameters and
use the AnsibleModule class to handle input validation. Then, I'd write the logic to interact with
the system or API. Finally, I'd ensure it returns structured JSON output. I'd test it locally before
integrating it into a playbook.
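A minimal skeleton of that workflow — the file-content logic is purely illustrative, but the AnsibleModule wiring follows the standard module layout, with the core logic factored out so it can be tested without Ansible:

```python
#!/usr/bin/python
"""Hypothetical custom module: ensures a file contains exact content."""
import os


def ensure_content(path, content, check_mode=False):
    """Core logic, kept Ansible-free so it can be unit-tested directly."""
    current = None
    if os.path.exists(path):
        with open(path) as f:
            current = f.read()
    changed = current != content
    if changed and not check_mode:
        with open(path, "w") as f:
            f.write(content)
    return {"changed": changed, "path": path}


def main():
    # Imported here so the pure logic above runs without Ansible installed.
    from ansible.module_utils.basic import AnsibleModule

    module = AnsibleModule(
        argument_spec=dict(
            path=dict(type="str", required=True),
            content=dict(type="str", required=True),
        ),
        supports_check_mode=True,
    )
    result = ensure_content(
        module.params["path"],
        module.params["content"],
        check_mode=module.check_mode,
    )
    module.exit_json(**result)


if __name__ == "__main__":
    main()
```

Separating the pure function from the AnsibleModule boilerplate is what makes local testing (step one above) practical.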
2. You wrote a custom module, but it fails. How do you debug it?
I start by running it independently with Python to check for syntax errors. Then, I use
ansible-playbook -vvv for verbose output. If the error isn’t clear, I add module.fail_json(msg="debug message")
in different parts of the script to pinpoint the issue. If needed, I check logs in /var/log/ansible.log
or use debug tasks in the playbook.
3. A team wants a custom Ansible module to interact with an internal API. How do
you approach it?
First, I gather requirements—what endpoints, authentication, and data formats are needed.
Then, I use the requests library in Python within my custom module to send API requests. I'd
handle authentication securely, parse responses, and format the output for Ansible. After
testing, I'd document its usage for the team.
I would write a custom callback plugin to format the output as needed. I'd extend CallbackBase
and override methods like v2_runner_on_ok() to control success output and v2_runner_on_failed() for
failure messages. After testing, I’d add it to callback_whitelist in ansible.cfg.
I'd write a custom connection plugin by subclassing ConnectionBase. I’d implement exec_command()
for remote execution, and put_file() & fetch_file() for file transfers. Once done, I’d place it in
connection_plugins/ and test it by setting connection: my_custom_connection in a playbook.
6. A colleague asks how to distribute a custom Ansible module. What do you
suggest?
I'd suggest packaging it as a collection using ansible-galaxy collection init my_namespace.my_collection. This makes it easy to
share and install via ansible-galaxy collection install. Alternatively, if it's a single module, I’d place it in a
library/ folder inside playbooks or contribute it upstream if it’s widely useful.
I'd use the Ansible SDK or ansible-runner. With the SDK, I’d use PlaybookExecutor to run playbooks
directly. With ansible-runner, I’d call ansible_runner.run() to execute Ansible asynchronously and
collect structured results.
8. Your module must support dry-run mode (--check). How do you implement it?
I set supports_check_mode=True in the AnsibleModule constructor; when module.check_mode is true, the module computes and reports what would change but skips the actual modification.
9. Your Ansible module runs but doesn’t return expected output. What do you
check?
First, I check if module.exit_json() is returning correctly formatted JSON. Then, I validate the
module’s logic by printing intermediate outputs. If needed, I enable -vvv logging to see what’s
happening.
10. Ansible execution seems slow due to a custom plugin. How do you optimize?
I profile it by adding timing logs. If it’s a callback plugin, I ensure it only processes necessary
events. If it’s a connection plugin, I check if it’s using unnecessary SSH sessions. I’d also avoid
excessive Python loops and optimize API calls with batch requests.
12. Your custom connection plugin must support SSH tunneling. How do you do it?
I'd modify exec_command() to establish an SSH tunnel before executing commands. I could use
paramiko to handle SSH connections manually or wrap ssh commands with subprocess. I’d also
ensure put_file() and fetch_file() support tunneling.
13. Your team wants a logging system for Ansible tasks. What’s your approach?
I’d create a callback plugin that sends logs to a centralized system, like ELK, Splunk, or a
database. I’d override methods like v2_runner_on_ok() to capture task success and
v2_runner_on_failed() for failures. Then, I’d format logs as JSON and send them to the logging
system.
14. You need to make your custom module idempotent. What steps do you take?
I ensure the module first checks the current state before making changes. If the desired state is
already present, it should return changed: false. I’d also implement a diff mode where the module
shows what changes would occur before applying them.
15. Your playbook using a custom module works locally but fails on remote hosts.
How do you troubleshoot?
I check Python dependencies on the remote machine to ensure they match my development
environment. I also inspect the Python interpreter path in ansible_python_interpreter. If the issue
persists, I copy the module manually to /tmp/ and run it with python /tmp/my_module.py to identify
errors.
These scenarios cover real-world issues you might face while writing custom Ansible modules
and plugins. Practicing them will help you confidently tackle interview questions.
Here are 15 real-world Ansible work scenarios in a first-person Q&A format to help you
confidently answer any interview question.
I follow a modular approach using roles to separate concerns. Each role contains tasks,
templates, handlers, and variables for better organization. I also use group_vars and host_vars
to manage configurations per environment (staging, production). My inventory files are stored
separately, and I ensure the ansible.cfg file is properly set to optimize execution.
I tag tasks based on functionality to allow selective execution. For example, I might tag package
installations as install and configuration updates as config. If I need to debug something, I add a
debug tag and run only those tasks with ansible-playbook site.yml --tags debug. I also use the never tag
for tasks that should only run when explicitly called.
I use built-in Ansible modules like file, copy, and template instead of shell or command. I ensure
tasks check for the desired state before applying changes, such as using state: present instead of
latest. For commands, I use creates: or removes: to prevent unnecessary re-execution.
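For instance, a sketch of the creates: pattern (script and marker paths are hypothetical):

```yaml
- name: Initialize the database only once
  ansible.builtin.command: /usr/local/bin/init-db.sh
  args:
    creates: /var/lib/myapp/.initialized   # task is skipped if this file exists
```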
I use include_tasks for dynamic execution, especially when tasks depend on conditions. For
example, when setting up multiple services, I use a loop with include_tasks to call a task file per
service. This approach keeps my main playbook clean and allows better reusability.
5. Have you ever faced an issue where a playbook was not idempotent? How did
you fix it?
Yes, once a playbook kept restarting a service unnecessarily because of a misconfigured
changed_when condition. I fixed it by modifying the task to compare actual file contents using diff
before restarting. I also added a notify handler so the restart only happens when a configuration
change is detected.
I use Ansible Vault to encrypt sensitive data like passwords and API keys. Instead of hardcoding
secrets, I store them in a vault-encrypted file and reference them in playbooks. To manage
vault passwords securely, I use an external vault password file or environment variables.
I use the assert module to validate that required variables are defined and meet expected
conditions. For example, before configuring a database, I check that db_password is set and
meets complexity requirements. If an invalid value is detected, the playbook fails early,
preventing misconfiguration.
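A sketch of such a check (the variable name and length rule are illustrative):

```yaml
- name: Validate database variables before configuring
  ansible.builtin.assert:
    that:
      - db_password is defined
      - db_password | length >= 12
    fail_msg: "db_password must be defined and at least 12 characters long"
```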
I first check the error message to identify the failing task. If it's unclear, I add -vvvv to the
execution command for detailed logs. If a task behaves unexpectedly, I use the debug module to
print variable values. I also use check_mode to simulate execution without making changes,
helping me pinpoint issues safely.
I check the affected hosts for differences in OS, package versions, or connectivity issues. If
necessary, I use run_once for tasks that should only execute on one host and delegate_to for
central execution. I also use failed_when to fine-tune failure conditions if a task isn't truly critical.
I use conditions (when:) to check for existing states before executing tasks. I also leverage the
changed_when parameter to prevent redundant changes. For example, instead of running a
command unconditionally, I check if the required outcome already exists.
For cloud environments like AWS, I use Ansible's dynamic inventory plugins. I configure
inventory scripts or use AWS’s EC2 inventory plugin to fetch host details dynamically. This
eliminates manual inventory updates when instances change.
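A minimal aws_ec2.yml might look like this (region and tag key are illustrative):

```yaml
# aws_ec2.yml — dynamic inventory configuration
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
keyed_groups:
  - key: tags.Environment
    prefix: env
```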
13. How do you make sure an Ansible deployment doesn’t break production?
I always run playbooks in check mode first to see what changes would be made. I also test in a
staging environment before deploying to production. For critical services, I use rolling updates
by applying changes to a subset of hosts first.
I use assert to validate configurations before applying changes. I also integrate Ansible with
tools like Ansible Lint to catch errors early. If a compliance check fails, I use fail: to stop
execution and provide a clear error message.
I first analyze logs to identify the cause of failure. If the failure left an incomplete state, I use
ansible-playbook --limit <affected_hosts> to rerun only on failed nodes. If needed, I roll back
by running a previous version of the playbook or restoring from a backup.
These answers will demonstrate your practical experience, problem-solving skills, and best
practices when working with Ansible.
Here are 15 real-world Ansible work scenario questions, answered in the first person to
reflect how you'd respond in an interview.
I set up Ansible Rulebooks to detect incidents automatically. For example, if a critical service
fails, my syslog event source triggers a playbook to restart the service. I also send notifications
to Slack or ServiceNow for visibility. This reduces response time and ensures quick remediation
without manual intervention.
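A sketch of such a rulebook (the webhook source, port, and payload fields are assumptions):

```yaml
# Illustrative Event-Driven Ansible rulebook
- name: Restart a service when a failure event arrives
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Service down
      condition: event.payload.status == "failed"
      action:
        run_playbook:
          name: restart_service.yml
```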
2. You need to integrate Ansible with ServiceNow for ticket automation. How do
you do it?
I use the servicenow.itsm.incident module to create, update, and close incidents based on playbook
results. If a server crashes, my Ansible playbook automatically logs an incident in ServiceNow,
assigns it, and triggers remediation actions. Once resolved, the playbook updates ServiceNow
and closes the ticket.
I use the k8s module to manage Kubernetes resources. My playbook creates a Deployment
with replica settings and a Service to expose the application. I also integrate Helm charts when
managing complex deployments. If scaling is needed, I adjust the replica count dynamically
based on metrics.
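A hedged sketch of such a deployment task (names, namespace, and image are illustrative):

```yaml
- name: Deploy application to Kubernetes
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Ensure Deployment exists
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: web-app
            namespace: default
          spec:
            replicas: 3
            selector:
              matchLabels:
                app: web-app
            template:
              metadata:
                labels:
                  app: web-app
              spec:
                containers:
                  - name: web-app
                    image: nginx:latest
```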
4. You need to automate Docker container deployment with Ansible. How would
you do it?
I use the docker_container module to pull images, create containers, and manage networking.
I ensure idempotency by defining container state as present. If a container needs updates, I set
up a rolling restart mechanism to avoid downtime.
I adjust forks in ansible.cfg for parallel execution and use fact caching to reduce data gathering
time. I also distribute workloads using Ansible Automation Controller (formerly Tower) and
enable SSH pipelining for faster task execution.
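The relevant ansible.cfg settings might look like this (values are illustrative starting points):

```ini
# ansible.cfg — illustrative tuning values
[defaults]
forks = 50
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts

[ssh_connection]
pipelining = True
```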
First, I run the playbook with -vvv for verbose output. I check for tasks with long execution
times, then optimize by enabling async tasks, reducing loops, and ensuring handlers are used
efficiently. If latency is an issue, I use Mitogen to accelerate execution.
7. Your Ansible playbook fails at a specific task. How do you debug it?
I use the --step and --start-at-task flags to isolate the issue. Checking logs with -vvv, I analyze if it’s a
syntax issue, module failure, or network timeout. If needed, I run the playbook in check mode
to see potential issues before execution.
I never hardcode passwords; instead, I use Ansible Vault for sensitive data. I restrict SSH access
with role-based permissions and limit playbook execution using scoped inventory and
automation controller RBAC policies.
9. You need to update 500 servers without downtime. How do you do it?
I use rolling updates with serial execution in the playbook, updating servers in small batches.
Example:
serial: 10
This ensures minimal impact. I also add health checks before moving to the next batch.
I embed Ansible Playbooks in my Jenkins or GitHub Actions workflow. For example, after a
code commit, the pipeline triggers an Ansible playbook to deploy the application in staging, run
tests, and, if successful, proceed to production.
11. Your playbook needs to run different tasks based on OS type. How do you
handle that?
I branch on gathered facts such as ansible_os_family or ansible_distribution in when: conditions, so each task runs only on the matching platform.
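A common pattern uses the ansible_os_family fact gathered by setup (the package modules shown are the standard ones for each family):

```yaml
- name: Install Apache on Debian-family hosts
  ansible.builtin.apt:
    name: apache2
    state: present
  when: ansible_os_family == "Debian"

- name: Install Apache on RedHat-family hosts
  ansible.builtin.yum:
    name: httpd
    state: present
  when: ansible_os_family == "RedHat"
```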
12. You need to automate AWS EC2 provisioning. How do you do it?
I use the amazon.aws.ec2_instance module to launch instances dynamically. The playbook fetches
inventory from AWS using dynamic inventory scripts to target the new instances for
configuration.
I enforce role-based access (RBAC) in Ansible Automation Controller. I also add safety checks in
playbooks:
- fail:
    msg: "This playbook cannot run on production!"
  when: inventory_hostname in groups['production']
15. You need to set up an event-driven workflow where a security alert triggers
Ansible remediation. How do you do it?
I integrate Ansible Rulebooks with a SIEM tool (e.g., Splunk or ELK). When a security alert (e.g.,
failed SSH attempts) is logged, the Ansible EDA engine triggers a playbook that blocks the
offending IP via firewall rules, preventing further attacks.
Conclusion
These responses cover technical depth while demonstrating real-world problem-solving skills.