This tutorial is based on a real production-style issue where a server had Keycloak and multiple Laravel microservices installed on the same machine.
The symptoms looked unrelated at first:
git pull origin master
ssh: Could not resolve hostname github.com: Temporary failure in name resolution
Laravel was also failing:
GuzzleHttp\Exception\ConnectException
Call to undefined method GuzzleHttp\Exception\ConnectException::getResponse()
DNS tools were failing:
dig github.com
UDP setup with 8.8.8.8#53 failed: address in use
no servers could be reached
Even direct IP ping was failing:
ping -c 4 8.8.8.8
ping: connect: Resource temporarily unavailable
At first, it looked like a DNS, Apache, GitHub, or Laravel API issue. But the real root cause was Keycloak.
One Java process belonging to Keycloak had opened more than 28,000 UDP sockets. Because of that, the server could not create new UDP sockets for DNS resolution. Once DNS failed, GitHub, Laravel Guzzle calls, curl, and other services also started failing.
This tutorial explains how to diagnose the issue safely, how to confirm the real root cause, how to fix Keycloak configuration for a single-server setup, and how to prevent the same issue in the future.
Keycloak production mode enables caching, and the Keycloak documentation explains that distributed caches use a transport stack for node discovery. According to the current documentation, the default cache stack is jdbc-ping, which uses the configured database to track cluster nodes. The configuration reference also lists older stack values such as tcp and udp as deprecated. (Keycloak)
Server setup in this case
The server had this type of architecture:
One Linux server
├── Apache
├── Keycloak
├── Laravel student service
├── Laravel trainer service
├── Laravel course service
├── Other DevOpsSchool microservices
├── MySQL/PostgreSQL database
└── Git repositories
Keycloak and the Laravel projects were on the same server.
That is important because for a single-server Keycloak setup, you normally do not need cluster discovery or multi-node distributed cache behavior.
First symptom: Laravel Guzzle error
The Laravel trainer service was calling the course service API:
https://www.devopsschool.com/course/oauth/token
The logs showed:
GuzzleHttp\Exception\ConnectException
Call to undefined method GuzzleHttp\Exception\ConnectException::getResponse()
The important part is this:
ConnectException
A ConnectException means Guzzle could not connect to the remote URL. It may happen because of DNS failure, network failure, SSL failure, server unreachable, or socket exhaustion.
The second problem was this:
Call to undefined method GuzzleHttp\Exception\ConnectException::getResponse()
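That is an application code issue. In Guzzle 7, ConnectException extends TransferException rather than RequestException, so it has no getResponse() method and never carries an HTTP response object. So this code is wrong:
$response = $e->getResponse();
That works for RequestException, which may carry a response, but not for ConnectException.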
Correct handling should be:
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;

try {
    // Guzzle request here
} catch (ConnectException $e) {
    \Log::error('Course API connection failed', [
        'message' => $e->getMessage(),
    ]);
    return [];
} catch (RequestException $e) {
    \Log::error('Course API request failed', [
        'message' => $e->getMessage(),
        'status' => $e->hasResponse() ? $e->getResponse()->getStatusCode() : null,
        'body' => $e->hasResponse() ? (string) $e->getResponse()->getBody() : null,
    ]);
    return [];
} catch (\Exception $e) {
    \Log::error('Course API unknown error', [
        'message' => $e->getMessage(),
    ]);
    return [];
}
This Laravel fix is important. But it does not solve the server-level network problem. It only prevents the page from crashing badly.
Second symptom: Can we call the service using localhost?
Because the course service and trainer service were on the same server, the question was:
Can I call http://localhost/course/oauth/token instead of https://www.devopsschool.com/course/oauth/token?
In theory, yes. Same-server internal HTTP calls can be faster and avoid external DNS/SSL dependency.
But in this case, Apache was redirecting HTTP to HTTPS.
The test showed:
curl -I http://localhost/course/oauth/token
Output:
HTTP/1.1 301 Moved Permanently
Location: https://localhost/course/oauth/token
Then another test:
curl -I -H "Host: www.devopsschool.com" http://127.0.0.1/course/oauth/token
Output:
HTTP/1.1 301 Moved Permanently
Location: https://www.devopsschool.com/course/oauth/token
This confirmed that the Apache port 80 vhost was forcing HTTP to HTTPS.
So http://localhost/course/oauth/token was not directly usable.
The better long-term internal setup would be a localhost-only Apache vhost on a separate port, for example 127.0.0.1:8081, without HTTPS redirect. But that was not the main issue here.
The main issue was that even public HTTPS URL calls were failing because the server's network sockets were exhausted.
Third symptom: GitHub DNS failure
Git pull failed:
git pull origin master
Output:
ssh: Could not resolve hostname github.com: Temporary failure in name resolution
fatal: Could not read from remote repository.
This usually means DNS is broken. But we needed to test whether the server had general internet connectivity or only DNS failure.
Testing direct IP:
ping -c 4 8.8.8.8
Output:
ping: connect: Resource temporarily unavailable
This was more serious. It meant the problem was not only DNS. The server could not even create the required network socket for ping.
Testing GitHub domain:
ping -c 4 github.com
Output:
ping: github.com: Temporary failure in name resolution
At this point, the issue was likely network stack or socket resource exhaustion.
Safe read-only checks before changing anything
In production, do not restart services blindly. Start with read-only commands.
Check route:
ip route
Output was:
default via 68.178.160.2 dev eth0 proto static onlink
Check IP address:
ip addr show
Output showed:
eth0: UP
inet 68.178.165.3/32
This confirmed the network interface and route existed.
Check firewall rules:
iptables -S
iptables -L -n -v
Output showed:
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
So firewall was not blocking outbound traffic.
Check systemd network services:
systemctl is-active systemd-networkd
systemctl is-active systemd-resolved
systemctl is-active NetworkManager
Output:
active
active
inactive
This meant the server was using systemd-networkd and systemd-resolved. NetworkManager was not active, and that was normal for this server.
Check resolver file:
cat /etc/resolv.conf
It had DNS servers:
nameserver 8.8.8.8
nameserver 1.1.1.1
nameserver 10.255.250.80
nameserver 10.255.251.80
Check systemd-resolved directly:
resolvectl query github.com
resolvectl query www.devopsschool.com
Output showed that resolvectl could resolve domains:
github.com: 20.205.243.166
www.devopsschool.com: 68.178.165.3
But normal resolver calls failed:
getent hosts github.com
getent hosts www.devopsschool.com
No output.
Then dig and nslookup showed the key clue:
dig github.com
Output:
UDP setup with 8.8.8.8#53 for github.com failed: address in use.
no servers could be reached
This was the turning point. DNS servers were reachable by systemd-resolved, but normal tools could not create UDP sockets.
Finding the real root cause: too many UDP sockets
Check socket summary:
ss -s
Output:
Total: 29828
UDP: 28234
TCP: 80
This was abnormal. A normal server may have tens or hundreds of UDP sockets, not more than 28,000.
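The UDP figure can also be pulled out of the `ss -s` output by a script, which is handy for alerts. The following is a minimal sketch and was not part of the original investigation: the function name `udp_count_from_summary` and the 1000-socket warning level are assumptions chosen for illustration. It reads the summary on standard input, so it can be tried on saved output as well as on a live system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the UDP socket total found in `ss -s` output (stdin).
# Accepts both the table row ("UDP  28234  28230  4") and a "UDP: 28234" style line.
# Exits non-zero when no UDP line is present.
udp_count_from_summary() {
  awk '$1 == "UDP" || $1 == "UDP:" { print $2; found = 1; exit }
       END { if (!found) exit 1 }'
}

# Example with saved output (values taken from the incident above).
sample='Total: 29828
UDP: 28234
TCP: 80'
count=$(printf '%s\n' "$sample" | udp_count_from_summary)
if [ "$count" -gt 1000 ]; then   # 1000 is an assumed warning level
  echo "WARNING: $count UDP sockets"
fi
```

On a live server the same function would be fed directly, for example `ss -s | udp_count_from_summary`.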
Check UDP count (the result includes one header line, so it is one higher than the ss -s figure):
ss -u -a | wc -l
Output:
28235
Check top UDP socket owners:
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -30
Output:
28232 java 3215099
2 systemd-r 3915117
This confirmed one Java process had opened almost all UDP sockets.
Check process details:
ps -fp 3215099
Output:
UID PID PPID CMD
keycloak 3215099 3215020 java ...
Check command line:
tr '\0' ' ' < /proc/3215099/cmdline
echo
Output contained:
-Dkc.home.dir=/opt/keycloak
-cp /opt/keycloak/bin/../lib/quarkus-run.jar
io.quarkus.bootstrap.runner.QuarkusEntryPoint start --optimized --cache-stack=tcp
Check current working directory:
ls -l /proc/3215099/cwd
Output:
/proc/3215099/cwd -> /opt/keycloak
Now the root cause was clear.
Keycloak was the Java process creating thousands of UDP sockets.
Was it hacking?
It was reasonable to ask:
Is this a sign of hacking?
Based on the evidence, it did not look like an unknown malware process because:
User: keycloak
Executable: /usr/lib/jvm/java-21-openjdk-amd64/bin/java
Working directory: /opt/keycloak
Command: Keycloak Quarkus runner
So the evidence pointed to a Keycloak configuration, runtime, or socket leak issue, not direct hacking.
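The identity details listed above can be gathered in one pass from /proc. This is a minimal sketch under stated assumptions: `proc_identity` is a hypothetical helper name, and the optional second argument (a replacement for /proc) exists only so the function can be tried against a mock directory. Reading the exe and cwd links of another user's process usually requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print executable, working directory, and command line
# for a PID, using the /proc filesystem.
# $1 = PID, $2 = proc root (defaults to /proc; an assumption added for testing).
proc_identity() {
  local pid=$1 root=${2:-/proc}
  local dir="$root/$pid"
  [ -d "$dir" ] || { echo "no such process: $pid" >&2; return 1; }
  echo "exe: $(readlink "$dir/exe")"
  echo "cwd: $(readlink "$dir/cwd")"
  # cmdline is NUL-separated, so convert the separators to spaces.
  echo "cmdline: $(tr '\0' ' ' < "$dir/cmdline")"
}
```

For the process in this incident the call would be `proc_identity 3215099`. An executable or working directory in an unexpected location such as /tmp would be a reason to investigate further.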
However, any server incident that causes network exhaustion should still be treated seriously. After stabilizing the service, it is wise to check SSH logins and unknown processes.
Safe checks:
last -a | head -30
grep "Accepted" /var/log/auth.log | tail -50
ps aux --sort=-%cpu | head -20
ps aux --sort=-%mem | head -20
These are read-only checks.
Keycloak log confirmed "Too many open files"
Keycloak logs showed:
java.io.IOException: Too many open files
at sun.nio.ch.Net.accept(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept
This confirmed the same root problem.
In Linux, sockets are also file descriptors. So when Keycloak opened thousands of sockets, it moved toward the open-file limit.
Check the open-file limit (use the current Keycloak Java PID, which changes after every restart):
cat /proc/4133456/limits | grep "open files"
systemctl show keycloak -p LimitNOFILE
Output:
Max open files 65535 65535 files
LimitNOFILE=65535
So the open-file limit was already high. The problem was not a low limit. The problem was that Keycloak was continuously creating sockets.
Increasing LimitNOFILE would only delay the failure. It would not fix the leak.
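To see how close a process is to its limit, the number of open descriptors can be compared with the soft limit. The sketch below is illustrative only: `fd_usage` is a hypothetical name, and the optional proc-root argument is an assumption added so the function can be exercised against a mock tree. Listing another user's /proc/PID/fd normally requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report open file descriptors versus the soft
# "Max open files" limit for a PID, using /proc.
# $1 = PID, $2 = proc root (defaults to /proc; an assumption added for testing).
fd_usage() {
  local pid=$1 root=${2:-/proc}
  local dir="$root/$pid" open limit
  [ -d "$dir/fd" ] || { echo "cannot read $dir/fd" >&2; return 1; }
  # One entry per open descriptor; sockets are counted too.
  open=$(( $(ls "$dir/fd" | wc -l) ))
  # Columns in the limits file: name, soft limit, hard limit, units.
  limit=$(awk '/^Max open files/ { print $4 }' "$dir/limits")
  echo "open=$open soft_limit=$limit"
}
```

Running `fd_usage KEYCLOAK_JAVA_PID` a few minutes apart shows whether the open count keeps climbing toward the limit, which is the pattern of a leak rather than of a limit that is simply too low.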
Finding the bad Keycloak startup option
Search where cache-stack was configured:
grep -R "cache-stack" /opt/keycloak/conf /etc/systemd/system /lib/systemd/system 2>/dev/null
Output:
/etc/systemd/system/keycloak.service:ExecStart=/opt/keycloak/bin/kc.sh start --optimized --cache-stack=tcp
/etc/systemd/system/multi-user.target.wants/keycloak.service:ExecStart=/opt/keycloak/bin/kc.sh start --optimized --cache-stack=tcp
So Keycloak was being started with:
--cache-stack=tcp
For this server, that was not needed because there was only one Keycloak node.
The service file was changed from:
ExecStart=/opt/keycloak/bin/kc.sh start --optimized --cache-stack=tcp
to:
ExecStart=/opt/keycloak/bin/kc.sh start --optimized
After reloading systemd and restarting Keycloak, the UDP count dropped from more than 28,000 to around 24 to 32:
ss -s
Output became:
UDP 24
UDP 27
UDP 32
That confirmed the emergency was resolved.
Why UDP started increasing again
After the first fix, UDP later increased:
UDP 125
UDP 134
UDP 266
The command line no longer showed --cache-stack=tcp:
tr '\0' ' ' < /proc/KEYCLOAK_JAVA_PID/cmdline
Output:
io.quarkus.bootstrap.runner.QuarkusEntryPoint start --optimized
The systemd service also showed:
systemctl cat keycloak | grep ExecStart
Output:
ExecStart=/opt/keycloak/bin/kc.sh start --optimized
But because Keycloak was running with:
start --optimized
there was a strong possibility that the optimized build still had old build-time cache behavior or that the default production distributed cache was still creating sockets.
As noted earlier, the Keycloak documentation explains that caching is enabled in production mode and that distributed caching is supported. The default stack is jdbc-ping, and the all-config reference lists tcp and udp among the deprecated stack values. (Keycloak)
cache=local vs cache=ispn in Keycloak
This became the key question:
What is cache=local and cache=ispn?
cache=local
Use this when you have only one Keycloak server.
Example:
cache=local
health-enabled=true
metrics-enabled=true
Meaning:
Keycloak keeps cache locally inside this one process.
No distributed cache cluster transport is needed.
No Keycloak node discovery is needed.
No node-to-node communication is needed.
This is best for a single-server setup where Keycloak and all projects are installed on the same machine.
cache=ispn
ispn means Infinispan. It is Keycloak's embedded cache system for production and clustering.
Example:
cache=ispn
cache-stack=jdbc-ping
health-enabled=true
metrics-enabled=true
Meaning:
Keycloak uses Infinispan cache.
It can support multiple Keycloak nodes.
With jdbc-ping, nodes can use the database to discover cluster members.
Use cache=ispn when you have multiple Keycloak nodes behind a load balancer or when you need high availability.
Which one is right for this server?
Because this server had only one Keycloak instance, the recommended setting was:
cache=local
health-enabled=true
metrics-enabled=true
Avoid:
cache-stack=tcp
cache-stack=udp
Will rebuild and restart delete Keycloak users?
This was another important question:
Will rebuild and restart affect current user data?
No. kc.sh build and systemctl restart keycloak do not delete users, realms, clients, roles, groups, passwords, or client secrets.
Keycloak permanent data is stored in the database. Runtime cache is temporary.
What may be affected:
Current login sessions may be cleared.
Users may need to log in again.
Keycloak may be unavailable during restart.
What will remain safe:
Users
Passwords
Realms
Clients
Roles
Groups
Client secrets
Keycloak database data
Keycloak's configuration guide explains that Keycloak has a build step for optimized startup and configuration application. This build/start behavior is part of Keycloak's normal production usage. (Keycloak)
Correct production fix for one-server Keycloak setup
Step 1: Take backup
cp /opt/keycloak/conf/keycloak.conf /opt/keycloak/conf/keycloak.conf.backup-$(date +%F-%H%M%S)
cp /etc/systemd/system/keycloak.service /etc/systemd/system/keycloak.service.backup-$(date +%F-%H%M%S)
Step 2: Edit Keycloak config
nano /opt/keycloak/conf/keycloak.conf
Use this for a single server:
cache=local
health-enabled=true
metrics-enabled=true
Make sure these are not present:
cache-stack=tcp
cache-stack=udp
cache=ispn
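The two rules above (cache=local present, no cache-stack line) can be checked before building. This is a minimal sketch, assuming the simple key=value layout of keycloak.conf shown in this article; `check_single_node_cache` is a hypothetical helper name. It inspects only the file, so a `--cache-stack` flag passed on the command line still has to be checked separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a keycloak.conf for the single-server cache settings.
# Returns 0 when cache=local is set and no cache-stack line is active,
# 1 when a rule is violated, 2 when the file cannot be read.
check_single_node_cache() {
  local conf=$1 status=0
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
  # Lines starting with '#' are comments; the anchored patterns skip them.
  if grep -Eq '^[[:space:]]*cache-stack[[:space:]]*=' "$conf"; then
    echo "FAIL: cache-stack is set; remove it for a single node"
    status=1
  fi
  if grep -Eq '^[[:space:]]*cache[[:space:]]*=[[:space:]]*local[[:space:]]*$' "$conf"; then
    echo "OK: cache=local"
  else
    echo "FAIL: cache=local not found"
    status=1
  fi
  return "$status"
}
```

Example: `check_single_node_cache /opt/keycloak/conf/keycloak.conf && echo "safe to build"`.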
Step 3: Confirm systemd service does not force tcp
systemctl cat keycloak | grep ExecStart
Expected:
ExecStart=/opt/keycloak/bin/kc.sh start --optimized
Not expected:
ExecStart=/opt/keycloak/bin/kc.sh start --optimized --cache-stack=tcp
Step 4: Build optimized Keycloak config
Because the service uses:
kc.sh start --optimized
run:
cd /opt/keycloak
sudo -u keycloak /opt/keycloak/bin/kc.sh build
Step 5: Restart Keycloak using systemd
Do not run this manually:
./kc.sh start --optimized
Use systemd:
systemctl restart keycloak
Step 6: Verify status
systemctl status keycloak --no-pager
Step 7: Verify UDP sockets
ss -s
Then:
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20
Expected:
UDP should remain low.
It should not increase every second continuously.
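One way to confirm that the count is not creeping up is to take a few samples and compare the first with the last. The sketch below is an illustration only: `sample_growth` and `udp_count` are hypothetical names, and the number of samples and the interval are assumptions to adjust for your server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: current number of UDP sockets on the host.
# -H suppresses the header line so the count is exact.
udp_count() {
  ss -H -u -a | wc -l
}

# sample_growth COUNT INTERVAL CMD...
# Runs CMD COUNT times, INTERVAL seconds apart, prints each numeric sample,
# then prints the difference between the last and the first sample.
sample_growth() {
  local count=$1 interval=$2
  shift 2
  local first="" last="" value i
  for ((i = 1; i <= count; i++)); do
    value=$("$@") || return 1
    if [ -z "$first" ]; then first=$value; fi
    last=$value
    echo "sample $i: $value"
    if (( i < count )); then sleep "$interval"; fi
  done
  echo "delta: $((last - first))"
}
```

For example, `sample_growth 6 10 udp_count` takes six samples over about a minute. A delta near zero suggests a stable process; a delta that keeps growing from run to run matches the leak pattern seen in this incident.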
Why you should not run ./kc.sh start --optimized manually
When this was tried:
./kc.sh start --optimized
Keycloak failed:
Port already bound: 8080: Address already in use
That happened because Keycloak was already running through systemd on port 8080.
Starting kc.sh manually attempted to start a second Keycloak instance on the same port.
Correct command:
systemctl restart keycloak
Wrong command when service is already running:
./kc.sh start --optimized
To check who owns port 8080:
ss -ltnp | grep ':8080'
Safe command classification
In production, it is important to know which commands are read-only and which commands change things.
Read-only commands
These do not modify production:
systemctl status keycloak --no-pager
systemctl show keycloak -p LimitNOFILE
systemctl cat keycloak
ss -s
ss -u -a | wc -l
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20
ip route
ip addr show
cat /etc/resolv.conf
resolvectl query github.com
getent hosts github.com
ps -fp PID
tr '\0' ' ' < /proc/PID/cmdline
cat /proc/PID/limits
Commands that change production
Run these carefully:
systemctl restart keycloak
systemctl stop keycloak
systemctl start keycloak
systemctl daemon-reload
nano /opt/keycloak/conf/keycloak.conf
nano /etc/systemd/system/keycloak.service
/opt/keycloak/bin/kc.sh build
kill PID
reboot
Permanent protection: Keycloak UDP watchdog
Even after fixing the config, it is wise to add a watchdog, because the problem can return at a time when no one is watching the server.
The watchdog should monitor the real Java child process, not only the systemd MainPID.
Create script:
nano /usr/local/bin/keycloak-udp-watchdog.sh
Content:
#!/bin/bash
# Restart Keycloak if its Java process holds more UDP sockets than LIMIT.
SERVICE="keycloak"
LIMIT=5000
LOG_FILE="/var/log/keycloak-udp-watchdog.log"

# Find the real Java child process, not only the systemd MainPID.
JAVA_PID=$(pgrep -u keycloak -f 'io.quarkus.bootstrap.runner.QuarkusEntryPoint|quarkus-run.jar' | head -1)

if [ -z "$JAVA_PID" ]; then
    echo "$(date) - Keycloak Java PID not found. Restarting $SERVICE." >> "$LOG_FILE"
    systemctl restart "$SERVICE"
    exit 0
fi

# Count UDP sockets owned by that PID (column 2 of the lsof output).
UDP_COUNT=$(lsof -nP -iUDP 2>/dev/null | awk -v pid="$JAVA_PID" 'NR>1 && $2==pid {count++} END {print count+0}')

if [ "$UDP_COUNT" -gt "$LIMIT" ]; then
    echo "$(date) - CRITICAL: Keycloak Java PID=$JAVA_PID UDP=$UDP_COUNT exceeded limit=$LIMIT. Restarting $SERVICE." >> "$LOG_FILE"
    systemctl restart "$SERVICE"
else
    echo "$(date) - OK: Keycloak Java PID=$JAVA_PID UDP=$UDP_COUNT" >> "$LOG_FILE"
fi
Make it executable:
chmod +x /usr/local/bin/keycloak-udp-watchdog.sh
Test manually:
/usr/local/bin/keycloak-udp-watchdog.sh
cat /var/log/keycloak-udp-watchdog.log
Create systemd service:
nano /etc/systemd/system/keycloak-udp-watchdog.service
Content:
[Unit]
Description=Keycloak UDP Socket Watchdog
[Service]
Type=oneshot
ExecStart=/usr/local/bin/keycloak-udp-watchdog.sh
Create timer:
nano /etc/systemd/system/keycloak-udp-watchdog.timer
Content:
[Unit]
Description=Run Keycloak UDP Watchdog every 5 minutes
[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
Unit=keycloak-udp-watchdog.service
[Install]
WantedBy=timers.target
Enable timer:
systemctl daemon-reload
systemctl enable --now keycloak-udp-watchdog.timer
Verify:
systemctl list-timers | grep keycloak
This protects the server when no one is present.
Recommended UDP threshold
Use this decision table:
| UDP count | Meaning | Action |
|---|---|---|
| Up to 200 | Normal | Monitor |
| 200 to 1,000 | Suspicious if continuously increasing | Watch closely |
| 1,000 to 5,000 | Problem likely active | Restart Keycloak and review config |
| 5,000 to 20,000 | Critical | Auto-restart Keycloak |
| Above 20,000 | Server outage risk | Emergency fix required |
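The table can be expressed as a small function for use in scripts or alerts. This is only a sketch: `classify_udp_count` is a hypothetical name, and treating each lower bound as inclusive (for example, exactly 5000 counts as critical) is an assumption, since the table does not say which side the boundaries belong to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a UDP socket count to the action from the table.
# Assumption: each band includes its lower bound.
classify_udp_count() {
  [[ $1 =~ ^[0-9]+$ ]] || { echo "not a number: $1" >&2; return 2; }
  local n=$((10#$1))   # force base 10 so leading zeros are harmless
  if   (( n >= 20000 )); then echo "server outage risk: emergency fix required"
  elif (( n >= 5000 ));  then echo "critical: auto-restart Keycloak"
  elif (( n >= 1000 ));  then echo "problem likely active: restart Keycloak and review config"
  elif (( n >= 200 ));   then echo "suspicious if continuously increasing: watch closely"
  else                        echo "normal: monitor"
  fi
}
```

For example, `classify_udp_count 28234` reports the outage-risk band that matched this incident, while `classify_udp_count 27` reports the normal band seen after the fix.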
The watchdog threshold can be:
LIMIT=5000
Do not set the threshold too low, because a healthy Keycloak process may legitimately hold some UDP sockets and the watchdog would then restart it unnecessarily.
Enable Keycloak health checks
Add this to:
/opt/keycloak/conf/keycloak.conf
health-enabled=true
metrics-enabled=true
Keycloak officially supports health checks, and the health endpoints are exposed on management port 9000 by default when enabled. (Keycloak)
After build and restart, test:
curl -s http://127.0.0.1:9000/health/ready
If HTTPS is configured for management interface:
curl -ks https://127.0.0.1:9000/health/ready
Do not expose port 9000 publicly. Keep it local or protected.
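Right after a restart the readiness endpoint may not answer yet, so a single curl call can fail even when nothing is wrong. A small retry loop handles that. The sketch below is an illustration: `retry` is a hypothetical helper, and the attempt count and delay in the usage example are assumptions.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY CMD...
# Hypothetical helper: run CMD until it succeeds, at most ATTEMPTS times,
# waiting DELAY seconds between attempts.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    if (( i < attempts )); then sleep "$delay"; fi
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}
```

Usage with the endpoint above: `retry 30 2 curl -fsS -o /dev/null http://127.0.0.1:9000/health/ready`. The `-f` flag makes curl return a non-zero exit code on HTTP error statuses, so the loop keeps waiting until the endpoint answers successfully.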
Fix Laravel Guzzle so frontend does not crash
Even after Keycloak and DNS are fixed, Laravel should handle API failure gracefully.
Bad pattern:
catch (\Exception $e) {
$response = $e->getResponse();
}
Good pattern:
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;

try {
    $response = $client->post($url, [
        'headers' => [
            'Accept' => 'application/json',
        ],
        'form_params' => [
            'grant_type' => env('COURSE_M_GRANT_TYPE'),
            'client_id' => env('COURSE_M_CLIENT_ID'),
            'client_secret' => env('COURSE_M_CLIENT_SECRET'),
        ],
        'timeout' => 20,
        'connect_timeout' => 10,
    ]);
} catch (ConnectException $e) {
    \Log::error('Course API connection failed', [
        'message' => $e->getMessage(),
    ]);
    return [];
} catch (RequestException $e) {
    \Log::error('Course API request failed', [
        'message' => $e->getMessage(),
        'status' => $e->hasResponse() ? $e->getResponse()->getStatusCode() : null,
        'body' => $e->hasResponse() ? (string) $e->getResponse()->getBody() : null,
    ]);
    return [];
} catch (\Exception $e) {
    \Log::error('Course API unknown error', [
        'message' => $e->getMessage(),
    ]);
    return [];
}
This prevents the Blade page from failing with:
Call to undefined method ConnectException::getResponse()
Fix sensitive logging
The Laravel logs showed:
COURSE_M_CLIENT_SECRET: ...
Do not log OAuth client secrets in production.
Bad:
\Log::info('COURSE_M_CLIENT_SECRET: ' . env('COURSE_M_CLIENT_SECRET'));
Good:
\Log::info('COURSE_M_CLIENT_SECRET is configured', [
'configured' => !empty(env('COURSE_M_CLIENT_SECRET')),
]);
If a real secret was exposed in logs, regenerate it.
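Before rotating, it helps to know how many log lines contain a secret, without printing the secret itself. The following is a rough sketch: `count_secret_lines` is a hypothetical helper, and the pattern (an upper-case variable name containing SECRET, followed by a colon and a value) is an assumption based on the log line shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines that look like "SOME_SECRET: value".
# Only the count is printed, never the matching lines, so the secret is not
# copied to the terminal or to another log.
count_secret_lines() {
  awk '/[A-Z0-9_]*SECRET[A-Z0-9_]*: *[^ ]/ { n++ } END { print n + 0 }' "$1"
}
```

Example: `count_secret_lines storage/logs/laravel.log`. A non-zero result means the log file should be treated as sensitive and the affected secret regenerated.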
Testing after final fix
After Keycloak config change, build, and restart, test in this order.
Test Keycloak status
systemctl status keycloak --no-pager
Test UDP sockets
ss -s
Test top UDP owners
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20
Test DNS
getent hosts github.com
dig github.com
Test direct network
ping -c 4 8.8.8.8
Test Git
git pull origin master
Test Laravel course token API
curl -X POST https://www.devopsschool.com/course/oauth/token \
  -H "Accept: application/json" \
  -d "grant_type=client_credentials" \
  -d "client_id=21" \
  -d "client_secret=YOUR_CLIENT_SECRET"
Test trainer page
Open the trainer page and check Laravel logs:
tail -f storage/logs/laravel.log
Internal localhost API call discussion
Calling internal services with localhost can be useful:
COURSE_M_BASE_URL=http://127.0.0.1:8081/course
But in this server, Apache port 80 was redirecting all HTTP traffic to HTTPS, so:
curl -I http://localhost/course/oauth/token
returned:
301 Moved Permanently
Location: https://localhost/course/oauth/token
If you want internal same-server service calls, create a separate internal-only Apache vhost on a separate port, for example 127.0.0.1:8081, without HTTPS redirect.
Example concept:
# Bind to the loopback address only, so the port is not reachable from outside.
Listen 127.0.0.1:8081
<VirtualHost 127.0.0.1:8081>
    ServerName devopsschool-internal.local
    Alias /course /opt/lampp/htdocs/devopsschool/services/courses/public
    <Directory /opt/lampp/htdocs/devopsschool/services/courses/public>
        # Directory listings are not needed for an API endpoint.
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
Then Laravel can call:
COURSE_M_BASE_URL=http://127.0.0.1:8081/course
But do this only after Keycloak and DNS are stable. The localhost setup was not the root cause of this incident.
Final root cause summary
The real root cause was:
Keycloak Java process created thousands of UDP sockets.
This caused:
UDP socket exhaustion
Then:
DNS tools could not create UDP sockets
Then:
GitHub hostname resolution failed
Then:
Laravel Guzzle calls failed with ConnectException
Then:
Trainer Blade page crashed because ConnectException was handled incorrectly
The Keycloak startup option:
--cache-stack=tcp
was found in:
/etc/systemd/system/keycloak.service
It was removed.
For a one-server setup, the recommended cache config is:
cache=local
health-enabled=true
metrics-enabled=true
Then:
cd /opt/keycloak
sudo -u keycloak /opt/keycloak/bin/kc.sh build
systemctl restart keycloak
Complete final checklist
Use this checklist for production.
Diagnosis checklist
ss -s
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20
systemctl status keycloak --no-pager
systemctl cat keycloak | grep ExecStart
grep -R "cache-stack" /opt/keycloak/conf /etc/systemd/system /lib/systemd/system 2>/dev/null
Fix checklist
cp /opt/keycloak/conf/keycloak.conf /opt/keycloak/conf/keycloak.conf.backup-$(date +%F-%H%M%S)
cp /etc/systemd/system/keycloak.service /etc/systemd/system/keycloak.service.backup-$(date +%F-%H%M%S)
Edit:
nano /opt/keycloak/conf/keycloak.conf
Use:
cache=local
health-enabled=true
metrics-enabled=true
Confirm service:
systemctl cat keycloak | grep ExecStart
Expected:
ExecStart=/opt/keycloak/bin/kc.sh start --optimized
Build and restart:
cd /opt/keycloak
sudo -u keycloak /opt/keycloak/bin/kc.sh build
systemctl restart keycloak
Verify:
systemctl status keycloak --no-pager
ss -s
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20
getent hosts github.com
git pull origin master
Prevention checklist
Add UDP watchdog
Enable health check
Do not expose port 9000 publicly
Do not log OAuth secrets
Handle Guzzle ConnectException separately
Monitor Keycloak UDP socket count
Avoid --cache-stack=tcp on single-server setup
Frequently asked questions
Is this a hacking issue?
Based on the observed process, it looked like Keycloak, not an unknown process. The process user was keycloak, the working directory was /opt/keycloak, and the command used Keycloak's Quarkus runner.
Still, after any server-level resource exhaustion issue, check SSH logs and unknown processes.
Does kc.sh build delete users?
No. It does not delete users, realms, clients, roles, groups, passwords, or client secrets.
It rebuilds optimized runtime configuration. Data remains in the Keycloak database.
Does systemctl restart keycloak delete users?
No. It only restarts the Keycloak service.
Active sessions may be interrupted, and users may need to log in again.
Why did ./kc.sh start --optimized fail with port 8080 already in use?
Because Keycloak was already running through systemd. Manual start tried to launch a second Keycloak instance on the same port.
Use:
systemctl restart keycloak
not:
./kc.sh start --optimized
when Keycloak is managed by systemd.
Should I use cache=local or cache=ispn?
For one server, use:
cache=local
For multiple Keycloak nodes or HA setup, use:
cache=ispn
cache-stack=jdbc-ping
Do not use cache-stack=tcp for this single-server setup.
Why did GitHub fail?
GitHub failed because DNS resolution failed. DNS failed because the server could not create UDP sockets due to Keycloak socket exhaustion.
Why did Laravel fail?
Laravel failed because Guzzle could not connect to the API URL. Then the exception handling code tried to call getResponse() on ConnectException, which does not have a response.
Can I use localhost for service-to-service API calls?
Yes, but only if Apache has an internal vhost that does not redirect HTTP to HTTPS. In this case, /course over localhost was redirected to HTTPS, so it was not immediately usable.
What is the best permanent solution?
Use:
cache=local
health-enabled=true
metrics-enabled=true
Run:
sudo -u keycloak /opt/keycloak/bin/kc.sh build
systemctl restart keycloak
Add a UDP watchdog and fix Laravel exception handling.
Conclusion
This incident looked like multiple unrelated failures: GitHub DNS error, Laravel API failure, Apache localhost redirect, and Keycloak "Too many open files." But all of them were connected.
The real root cause was Keycloak creating too many UDP sockets.
Once UDP sockets were exhausted, DNS could not work. Once DNS failed, GitHub and Laravel external API calls failed. Once Laravel failed, the page crashed because Guzzle ConnectException was handled incorrectly.
For a single-server Keycloak setup, keep the configuration simple:
cache=local
health-enabled=true
metrics-enabled=true
Avoid:
--cache-stack=tcp
Add a watchdog so the server can recover automatically if the issue ever returns.
This approach gives you a stable Keycloak setup, safer Laravel behavior, better production monitoring, and a clear troubleshooting path for future incidents.
