This tutorial is based on a real production-style issue where a server had Keycloak and multiple Laravel microservices installed on the same machine.

The symptoms looked unrelated at first:

git pull origin master
ssh: Could not resolve hostname github.com: Temporary failure in name resolution

Laravel was also failing:

GuzzleHttp\Exception\ConnectException
Call to undefined method GuzzleHttp\Exception\ConnectException::getResponse()

DNS tools were failing:

dig github.com
UDP setup with 8.8.8.8#53 failed: address in use
no servers could be reached

Even direct IP ping was failing:

ping -c 4 8.8.8.8
ping: connect: Resource temporarily unavailable

At first, it looked like a DNS, Apache, GitHub, or Laravel API issue. But the real root cause was Keycloak.

One Java process belonging to Keycloak had opened more than 28,000 UDP sockets. Because of that, the server could not create new UDP sockets for DNS resolution. Once DNS failed, GitHub, Laravel Guzzle calls, curl, and other services also started failing.

This tutorial explains how to diagnose the issue safely, how to confirm the real root cause, how to fix Keycloak configuration for a single-server setup, and how to prevent the same issue in the future.

Keycloak production mode enables caching, and Keycloak documentation explains that distributed caches can use a transport stack for node discovery. The current Keycloak documentation says the default cache stack is jdbc-ping, which uses the configured database to track cluster nodes. It also lists older stack values such as tcp and udp as deprecated in the configuration reference. (Keycloak) (Keycloak)

Server setup in this case

The server had this type of architecture:

One Linux server
├── Apache
├── Keycloak
├── Laravel student service
├── Laravel trainer service
├── Laravel course service
├── Other DevOpsSchool microservices
├── MySQL/PostgreSQL database
└── Git repositories

Keycloak and the Laravel projects were on the same server.

That is important because for a single-server Keycloak setup, you normally do not need cluster discovery or multi-node distributed cache behavior.

First symptom: Laravel Guzzle error

The Laravel trainer service was calling the course service API:

https://www.devopsschool.com/course/oauth/token

The logs showed:

GuzzleHttp\Exception\ConnectException
Call to undefined method GuzzleHttp\Exception\ConnectException::getResponse()

The important part is this:

ConnectException

A ConnectException means Guzzle could not connect to the remote URL. It can be caused by a DNS failure, a network failure, an SSL failure, an unreachable server, or socket exhaustion.

The second problem was this:

Call to undefined method GuzzleHttp\Exception\ConnectException::getResponse()

That is a Laravel code issue. A Guzzle ConnectException does not have an HTTP response object. So this code is wrong:

$response = $e->getResponse();

That may work for RequestException, but not for ConnectException.

Correct handling should be:

use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;

try {
    // Guzzle request here
} catch (ConnectException $e) {
    \Log::error('Course API connection failed', [
        'message' => $e->getMessage(),
    ]);

    return [];
} catch (RequestException $e) {
    \Log::error('Course API request failed', [
        'message' => $e->getMessage(),
        'status' => $e->hasResponse() ? $e->getResponse()->getStatusCode() : null,
        'body' => $e->hasResponse() ? (string) $e->getResponse()->getBody() : null,
    ]);

    return [];
} catch (\Exception $e) {
    \Log::error('Course API unknown error', [
        'message' => $e->getMessage(),
    ]);

    return [];
}

This Laravel fix is important. But it does not solve the server-level network problem. It only prevents the page from crashing badly.

Second symptom: Can we call the service using localhost?

Because the course service and trainer service were on the same server, the question was:

Can I call http://localhost/course/oauth/token instead of https://www.devopsschool.com/course/oauth/token?

In theory, yes. Same-server internal HTTP calls can be faster and avoid external DNS/SSL dependency.

But in this case, Apache was redirecting HTTP to HTTPS.

The test showed:

curl -I http://localhost/course/oauth/token

Output:

HTTP/1.1 301 Moved Permanently
Location: https://localhost/course/oauth/token

Then another test:

curl -I -H "Host: www.devopsschool.com" http://127.0.0.1/course/oauth/token

Output:

HTTP/1.1 301 Moved Permanently
Location: https://www.devopsschool.com/course/oauth/token

This confirmed that the Apache port 80 vhost was forcing HTTP to HTTPS.

So http://localhost/course/oauth/token was not directly usable.
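To make the redirect check repeatable, the curl header output can be reduced to a status code and a Location value. This is a minimal sketch, assuming the header text comes from curl -I. The function names http_status and redirect_target are made up for illustration and are not part of the original setup.

```shell
#!/bin/sh
# Print the HTTP status code from the first status line of header text on stdin.
http_status() {
    awk 'toupper($1) ~ /^HTTP\// {print $2; exit}'
}

# Print the Location header value, if any, with carriage returns removed.
redirect_target() {
    tr -d '\r' | awk 'tolower($1) == "location:" {print $2; exit}'
}

# Example (not executed here):
#   curl -sI http://localhost/course/oauth/token | http_status
#   curl -sI http://localhost/course/oauth/token | redirect_target
```

A 301 status together with an https:// target means the URL cannot be used for plain internal HTTP calls.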

The better long-term internal setup would be a localhost-only Apache vhost on a separate port, for example 127.0.0.1:8081, without HTTPS redirect. But that was not the main issue here.

The main issue was that even public HTTPS URL calls were failing because the server’s network sockets were exhausted.

Third symptom: GitHub DNS failure

Git pull failed:

git pull origin master

Output:

ssh: Could not resolve hostname github.com: Temporary failure in name resolution
fatal: Could not read from remote repository.

This usually means DNS is broken. But we needed to test whether the server had general internet connectivity or only DNS failure.

Testing direct IP:

ping -c 4 8.8.8.8

Output:

ping: connect: Resource temporarily unavailable

This was more serious. It meant the problem was not only DNS. The server could not even create the required network socket for ping.

Testing GitHub domain:

ping -c 4 github.com

Output:

ping: github.com: Temporary failure in name resolution

At this point, the issue was likely a network stack problem or socket resource exhaustion.

Safe read-only checks before changing anything

In production, do not restart services blindly. Start with read-only commands.

Check route:

ip route

Output was:

default via 68.178.160.2 dev eth0 proto static onlink

Check IP address:

ip addr show

Output showed:

eth0: UP
inet 68.178.165.3/32

This confirmed the network interface and route existed.

Check firewall rules:

iptables -S
iptables -L -n -v

Output showed:

-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT

So the firewall was not blocking outbound traffic.

Check systemd network services:

systemctl is-active systemd-networkd
systemctl is-active systemd-resolved
systemctl is-active NetworkManager

Output:

active
active
inactive

This meant the server was using systemd-networkd and systemd-resolved. NetworkManager was not active, and that was normal for this server.

Check resolver file:

cat /etc/resolv.conf

It had DNS servers:

nameserver 8.8.8.8
nameserver 1.1.1.1
nameserver 10.255.250.80
nameserver 10.255.251.80

Check systemd-resolved directly:

resolvectl query github.com
resolvectl query www.devopsschool.com

Output showed that resolvectl could resolve domains:

github.com: 20.205.243.166
www.devopsschool.com: 68.178.165.3

But normal resolver calls failed:

getent hosts github.com
getent hosts www.devopsschool.com

No output.

Then dig and nslookup showed the key clue:

dig github.com

Output:

UDP setup with 8.8.8.8#53 for github.com failed: address in use.
no servers could be reached

This was the turning point. DNS servers were reachable by systemd-resolved, but normal tools could not create UDP sockets.
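The three resolver checks can be wrapped in one read-only helper so the comparison is easy to repeat. This is a rough sketch, not the exact procedure used during the incident. The names run_probe and dns_triage are hypothetical, and the dig options are only there to keep a failing query short.

```shell
#!/bin/sh
# Run one probe command quietly and report PASS or FAIL without aborting.
run_probe() {
    probe_label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $probe_label"
    else
        echo "FAIL $probe_label"
    fi
}

# Read-only triage: each probe exercises a different resolver path.
dns_triage() {
    run_probe "systemd-resolved (resolvectl)" resolvectl query github.com
    run_probe "libc resolver (getent)" getent hosts github.com
    run_probe "direct UDP query (dig)" dig +short +time=2 +tries=1 github.com
}

# Example (not executed here):
#   dns_triage
```

A PASS for resolvectl combined with FAIL for getent and dig would match the pattern seen in this incident.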

Finding the real root cause: too many UDP sockets

Check socket summary:

ss -s

Output:

Total: 29828
UDP:   28234
TCP:   80

This was abnormal. A normal server may have tens or hundreds of UDP sockets, not more than 28,000.
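For scripts and monitoring, it helps to pull only the UDP number out of the ss -s summary. This is a small sketch. The function name udp_total is illustrative, and it assumes the UDP total is the second field on the UDP line, which is how the summary is usually laid out.

```shell
#!/bin/sh
# Extract the UDP socket total from `ss -s` output read on stdin.
# Accepts both a "UDP 28234 ..." table row and a "UDP: 28234" style line.
udp_total() {
    awk '$1 == "UDP" || $1 == "UDP:" {print $2; exit}'
}

# Example (not executed here):
#   ss -s | udp_total
```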

Check UDP count:

ss -u -a | wc -l

Output:

Check top UDP socket owners:

lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -30

Output:

28232 java 3215099
2 systemd-r 3915117

This confirmed one Java process had opened almost all UDP sockets.
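The same owner count can be written as a reusable helper that also flags any process above a chosen threshold. This is a sketch under the assumption that the input is lsof -nP -iUDP output with the command in column 1 and the PID in column 2. The function names are hypothetical.

```shell
#!/bin/sh
# Read `lsof -nP -iUDP` output on stdin and print "COUNT COMMAND PID",
# highest count first. The header line (NR == 1) is skipped.
udp_owners() {
    awk 'NR > 1 {n[$1 " " $2]++} END {for (k in n) print n[k], k}' | sort -k1,1nr
}

# Print only the owners whose UDP socket count exceeds the given threshold.
udp_owners_over() {
    udp_owners | awk -v limit="$1" '$1 + 0 > limit + 0'
}

# Example (not executed here):
#   lsof -nP -iUDP 2>/dev/null | udp_owners_over 1000
```

On a healthy server the second command would normally print nothing.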

Check process details:

ps -fp 3215099

Output:

UID       PID      PPID     CMD
keycloak 3215099  3215020  java ...

Check command line:

tr '\0' ' ' < /proc/3215099/cmdline
echo

Output contained:

-Dkc.home.dir=/opt/keycloak
-cp /opt/keycloak/bin/../lib/quarkus-run.jar
io.quarkus.bootstrap.runner.QuarkusEntryPoint start --optimized --cache-stack=tcp

Check current working directory:

ls -l /proc/3215099/cwd

Output:

/proc/3215099/cwd -> /opt/keycloak

Now the root cause was clear.

Keycloak was the Java process creating thousands of UDP sockets.

Was it hacking?

It was reasonable to ask:

Is this a hacking attempt?

Based on the evidence, it did not look like an unknown malware process because:

User: keycloak
Executable: /usr/lib/jvm/java-21-openjdk-amd64/bin/java
Working directory: /opt/keycloak
Command: Keycloak Quarkus runner

So the evidence pointed to a Keycloak configuration, runtime, or socket leak issue, not direct hacking.

However, any server incident that causes network exhaustion should still be treated seriously. After stabilizing the service, it is wise to check SSH logins and unknown processes.

Safe checks:

last -a | head -30
grep "Accepted" /var/log/auth.log | tail -50
ps aux --sort=-%cpu | head -20
ps aux --sort=-%mem | head -20

These are read-only checks.

Keycloak log confirmed “Too many open files”

Keycloak logs showed:

java.io.IOException: Too many open files
at sun.nio.ch.Net.accept(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept

This confirmed the same root problem.

In Linux, sockets are also file descriptors. So when Keycloak opened thousands of sockets, it moved toward the open-file limit.

Check open-file limit:

cat /proc/4133456/limits | grep "open files"
systemctl show keycloak -p LimitNOFILE

Output:

Max open files 65535 65535 files
LimitNOFILE=65535

So the open-file limit was already high. The problem was not a low limit. The problem was that Keycloak was continuously creating sockets.

Increasing LimitNOFILE would only delay the failure. It would not fix the leak.
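To see how close a process is to its limit, the open descriptor count can be compared with the soft limit from /proc. This is a minimal sketch for Linux only. The function names are illustrative, and it assumes the limit is a number, not "unlimited".

```shell
#!/bin/sh
# Print the soft "Max open files" limit from /proc/PID/limits text on stdin.
soft_nofile_limit() {
    awk '/^Max open files/ {print $4; exit}'
}

# Print the integer percentage of the limit that is in use.
fd_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Report open descriptors for a PID against its soft limit (run as root).
fd_report() {
    fd_used=$(ls "/proc/$1/fd" 2>/dev/null | wc -l)
    fd_limit=$(soft_nofile_limit < "/proc/$1/limits")
    echo "pid=$1 used=$fd_used limit=$fd_limit percent=$(fd_percent "$fd_used" "$fd_limit")"
}

# Example (not executed here):
#   fd_report KEYCLOAK_JAVA_PID
```

A percentage that keeps climbing between runs points to a leak, whatever the limit is set to.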

Finding the bad Keycloak startup option

Search where cache-stack was configured:

grep -R "cache-stack" /opt/keycloak/conf /etc/systemd/system /lib/systemd/system 2>/dev/null

Output:

/etc/systemd/system/keycloak.service:ExecStart=/opt/keycloak/bin/kc.sh start --optimized --cache-stack=tcp
/etc/systemd/system/multi-user.target.wants/keycloak.service:ExecStart=/opt/keycloak/bin/kc.sh start --optimized --cache-stack=tcp

So Keycloak was being started with:

--cache-stack=tcp

For this server, that was not needed because there was only one Keycloak node.

The service file was changed from:

ExecStart=/opt/keycloak/bin/kc.sh start --optimized --cache-stack=tcp

to:

ExecStart=/opt/keycloak/bin/kc.sh start --optimized

After reloading systemd and restarting Keycloak, the UDP count dropped from more than 28,000 to around 24 to 32:

ss -s

Output became:

UDP 24
UDP 27
UDP 32

That confirmed the emergency was resolved.

Why UDP started increasing again

After the first fix, UDP later increased:

UDP 125
UDP 134
UDP 266

The command line no longer showed --cache-stack=tcp:

tr '\0' ' ' < /proc/KEYCLOAK_JAVA_PID/cmdline

Output:

io.quarkus.bootstrap.runner.QuarkusEntryPoint start --optimized

The systemd service also showed:

systemctl cat keycloak | grep ExecStart

Output:

ExecStart=/opt/keycloak/bin/kc.sh start --optimized

But because Keycloak was running with:

start --optimized

there was a strong possibility that the optimized build still had old build-time cache behavior or that the default production distributed cache was still creating sockets.

Keycloak documentation explains that in production mode, caching is enabled, and it supports distributed caching. The docs also state that the default stack is jdbc-ping, while the all-config reference lists tcp and udp among deprecated stack values. (Keycloak) (Keycloak)

cache=local vs cache=ispn in Keycloak

This became the key question:

What is cache=local and cache=ispn?

cache=local

Use this when you have only one Keycloak server.

Example:

cache=local
health-enabled=true
metrics-enabled=true

Meaning:

Keycloak keeps cache locally inside this one process.
No distributed cache cluster transport is needed.
No Keycloak node discovery is needed.
No node-to-node communication is needed.

This is best for a single-server setup where Keycloak and all projects are installed on the same machine.

cache=ispn

ispn means Infinispan. It is Keycloak’s embedded cache system for production and clustering.

Example:

cache=ispn
cache-stack=jdbc-ping
health-enabled=true
metrics-enabled=true

Meaning:

Keycloak uses Infinispan cache.
It can support multiple Keycloak nodes.
With jdbc-ping, nodes can use the database to discover cluster members.

Use cache=ispn when you have multiple Keycloak nodes behind a load balancer or when you need high availability.

Which one is right for this server?

Because this server had only one Keycloak instance, the recommended setting was:

cache=local
health-enabled=true
metrics-enabled=true

Avoid:

cache-stack=tcp
cache-stack=udp

Will rebuild and restart delete Keycloak users?

No.

This was another important question:

Will rebuild and restart affect current user data?

The answer is no. kc.sh build and systemctl restart keycloak do not delete users, realms, clients, roles, groups, passwords, or client secrets.

Keycloak permanent data is stored in the database. Runtime cache is temporary.

What may be affected:

Current login sessions may be cleared.
Users may need to log in again.
Keycloak may be unavailable during restart.

What will remain safe:

Users
Passwords
Realms
Clients
Roles
Groups
Client secrets
Keycloak database data

Keycloak’s configuration guide explains that Keycloak has a build step for optimized startup and configuration application. This build/start behavior is part of Keycloak’s normal production usage. (Keycloak)

Correct production fix for one-server Keycloak setup

Step 1: Take backup

cp /opt/keycloak/conf/keycloak.conf /opt/keycloak/conf/keycloak.conf.backup-$(date +%F-%H%M%S)
cp /etc/systemd/system/keycloak.service /etc/systemd/system/keycloak.service.backup-$(date +%F-%H%M%S)

Step 2: Edit Keycloak config

nano /opt/keycloak/conf/keycloak.conf

Use this for a single server:

cache=local
health-enabled=true
metrics-enabled=true

Make sure these are not present:

cache-stack=tcp
cache-stack=udp
cache=ispn
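The config file can also be checked with a small script before building. This is a sketch, assuming keycloak.conf uses plain key=value lines. The function names are hypothetical, and it flags any cache-stack line, on the assumption that a single node does not need one at all.

```shell
#!/bin/sh
# Read keycloak.conf text on stdin and print each active line that enables
# clustering (cache-stack=..., cache=ispn). Commented lines are ignored.
clustered_cache_lines() {
    grep -E '^[[:space:]]*(cache-stack[[:space:]]*=|cache[[:space:]]*=[[:space:]]*ispn)' || true
}

# Succeed only when cache=local is set and no clustering line is present.
single_node_cache_ok() {
    conf_text=$(cat)
    printf '%s\n' "$conf_text" | grep -Eq '^[[:space:]]*cache[[:space:]]*=[[:space:]]*local[[:space:]]*$' || return 1
    [ -z "$(printf '%s\n' "$conf_text" | clustered_cache_lines)" ]
}

# Example (not executed here):
#   single_node_cache_ok < /opt/keycloak/conf/keycloak.conf && echo "config looks single-node"
```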

Step 3: Confirm systemd service does not force tcp

systemctl cat keycloak | grep ExecStart

Expected:

ExecStart=/opt/keycloak/bin/kc.sh start --optimized

Not expected:

ExecStart=/opt/keycloak/bin/kc.sh start --optimized --cache-stack=tcp

Step 4: Build optimized Keycloak config

Because the service uses:

kc.sh start --optimized

run:

cd /opt/keycloak
sudo -u keycloak /opt/keycloak/bin/kc.sh build

Step 5: Restart Keycloak using systemd

Do not run this manually:

./kc.sh start --optimized

Use systemd:

systemctl restart keycloak

Step 6: Verify status

systemctl status keycloak --no-pager

Step 7: Verify UDP sockets

ss -s

Then:

lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20

Expected:

UDP should remain low.
It should not increase every second continuously.
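A single reading does not show whether the count is climbing, so it helps to compare a few samples over time. This is a rough sketch. The function names, the 60-second interval, and the allowed growth of 50 are assumptions chosen for illustration, not values from the incident.

```shell
#!/bin/sh
# Print the change between the first and last sample passed as arguments.
udp_delta() {
    first=$1
    for last in "$@"; do :; done
    echo $(( last - first ))
}

# Succeed when growth across the samples exceeds the allowed delta.
udp_growing() {
    allowed=$1
    shift
    [ "$(udp_delta "$@")" -gt "$allowed" ]
}

# Example (not executed here): take 5 samples, 60 seconds apart.
#   samples=""
#   for i in 1 2 3 4 5; do
#       samples="$samples $(ss -s | awk '$1 == "UDP" {print $2; exit}')"
#       sleep 60
#   done
#   udp_growing 50 $samples && echo "UDP count is climbing"
```

With the readings from this incident, 24, 27, 32 gives a delta of 8, while 125, 134, 266 gives 141.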

Why you should not run ./kc.sh start --optimized manually

When this was tried:

./kc.sh start --optimized

Keycloak failed:

Port already bound: 8080: Address already in use

That happened because Keycloak was already running through systemd on port 8080.

Starting kc.sh manually attempted to start a second Keycloak instance on the same port.

Correct command:

systemctl restart keycloak

Wrong command when service is already running:

./kc.sh start --optimized

To check who owns port 8080:

ss -ltnp | grep ':8080'
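If only the PID is needed, it can be extracted from the ss output instead of reading the full line. This is a sketch, assuming the usual ss -ltnp layout where the local address is the fourth column and the process details contain pid=NUMBER. The function name listener_pid is illustrative.

```shell
#!/bin/sh
# Read `ss -ltnp` output on stdin and print the PID listening on the given port.
listener_pid() {
    awk -v port=":$1" '
        $1 == "LISTEN" {
            # Match only when the local address ends with ":PORT".
            n = length($4) - length(port)
            if (n >= 0 && substr($4, n + 1) == port && match($0, /pid=[0-9]+/)) {
                print substr($0, RSTART + 4, RLENGTH - 4)
                exit
            }
        }'
}

# Example (not executed here, run as root to see process details):
#   ss -ltnp | listener_pid 8080
```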

Safe command classification

In production, it is important to know which commands are read-only and which commands change things.

Read-only commands

These do not modify production:

systemctl status keycloak --no-pager
systemctl show keycloak -p LimitNOFILE
systemctl cat keycloak
ss -s
ss -u -a | wc -l
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20
ip route
ip addr show
cat /etc/resolv.conf
resolvectl query github.com
getent hosts github.com
ps -fp PID
tr '\0' ' ' < /proc/PID/cmdline
cat /proc/PID/limits

Commands that change production

Run these carefully:

systemctl restart keycloak
systemctl stop keycloak
systemctl start keycloak
systemctl daemon-reload
nano /opt/keycloak/conf/keycloak.conf
nano /etc/systemd/system/keycloak.service
/opt/keycloak/bin/kc.sh build
kill PID
reboot

Permanent protection: Keycloak UDP watchdog

Even after fixing the config, it is wise to add a watchdog, because the problem can return when no one is watching the server.

The watchdog should monitor the real Java child process, not only the systemd MainPID.

Create script:

nano /usr/local/bin/keycloak-udp-watchdog.sh

Content:

#!/bin/bash
# Restart Keycloak when its Java process holds too many UDP sockets.

SERVICE="keycloak"
LIMIT=5000
LOG_FILE="/var/log/keycloak-udp-watchdog.log"

# Find the real Java process instead of relying on the systemd MainPID.
JAVA_PID=$(pgrep -u keycloak -f 'io.quarkus.bootstrap.runner.QuarkusEntryPoint|quarkus-run.jar' | head -1)

# No Java process means Keycloak is down, so bring the service back up.
if [ -z "$JAVA_PID" ]; then
    echo "$(date) - Keycloak Java PID not found. Restarting $SERVICE." >> "$LOG_FILE"
    systemctl restart "$SERVICE"
    exit 0
fi

# Count only the UDP sockets owned by the Keycloak Java PID.
UDP_COUNT=$(lsof -nP -iUDP 2>/dev/null | awk -v pid="$JAVA_PID" 'NR>1 && $2==pid {count++} END {print count+0}')

# Restart above the limit; otherwise log the current count.
if [ "$UDP_COUNT" -gt "$LIMIT" ]; then
    echo "$(date) - CRITICAL: Keycloak Java PID=$JAVA_PID UDP=$UDP_COUNT exceeded limit=$LIMIT. Restarting $SERVICE." >> "$LOG_FILE"
    systemctl restart "$SERVICE"
else
    echo "$(date) - OK: Keycloak Java PID=$JAVA_PID UDP=$UDP_COUNT" >> "$LOG_FILE"
fi

Make it executable:

chmod +x /usr/local/bin/keycloak-udp-watchdog.sh

Test manually:

/usr/local/bin/keycloak-udp-watchdog.sh
cat /var/log/keycloak-udp-watchdog.log

Create systemd service:

nano /etc/systemd/system/keycloak-udp-watchdog.service

Content:

[Unit]
Description=Keycloak UDP Socket Watchdog

[Service]
Type=oneshot
ExecStart=/usr/local/bin/keycloak-udp-watchdog.sh

Create timer:

nano /etc/systemd/system/keycloak-udp-watchdog.timer

Content:

[Unit]
Description=Run Keycloak UDP Watchdog every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
Unit=keycloak-udp-watchdog.service

[Install]
WantedBy=timers.target

Enable timer:

systemctl daemon-reload
systemctl enable --now keycloak-udp-watchdog.timer

Verify:

systemctl list-timers | grep keycloak

This protects the server when no one is present.

Recommended UDP threshold

Use this decision table:

UDP count        Meaning                                  Action
20 to 200        Normal                                   Monitor
200 to 1000      Suspicious if continuously increasing    Watch closely
1000 to 5000     Problem likely active                    Restart Keycloak and review config
5000 plus        Critical                                 Auto-restart Keycloak
20000 plus       Server outage risk                       Emergency fix required

The watchdog threshold can be:

LIMIT=5000

Do not set the threshold too low, because normal services may use some UDP sockets.
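The decision table can be turned into a small function for alerts or log messages. This is a sketch. The function name classify_udp and the band labels are illustrative, and values that sit exactly on a boundary are assigned to the higher band, which is an assumption because the table ranges overlap at their edges.

```shell
#!/bin/sh
# Map a UDP socket count to the action bands from the decision table.
# Boundary values are assigned to the higher band (an assumption).
classify_udp() {
    if   [ "$1" -ge 20000 ]; then echo "emergency"
    elif [ "$1" -ge 5000 ];  then echo "critical"
    elif [ "$1" -ge 1000 ];  then echo "problem"
    elif [ "$1" -ge 200 ];   then echo "suspicious"
    else                          echo "normal"
    fi
}

# Example (not executed here):
#   classify_udp "$(ss -s | awk '$1 == "UDP" {print $2; exit}')"
```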

Enable Keycloak health checks

Add this to:

/opt/keycloak/conf/keycloak.conf

health-enabled=true
metrics-enabled=true

Keycloak officially supports health checks, and the health endpoints are exposed on management port 9000 by default when enabled. (Keycloak)

After build and restart, test:

curl -s http://127.0.0.1:9000/health/ready

If HTTPS is configured for management interface:

curl -ks https://127.0.0.1:9000/health/ready

Do not expose port 9000 publicly. Keep it local or protected.
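For monitoring scripts, the readiness response can be reduced to a simple pass or fail. This is a sketch, assuming the endpoint returns JSON whose first "status" field is the overall result with the value UP when ready. The function name health_is_up is hypothetical, and a real JSON parser such as jq would be more robust if it is installed.

```shell
#!/bin/sh
# Succeed when the health JSON on stdin reports an overall status of UP.
# Only the first "status" field is inspected, assumed to be the overall one.
health_is_up() {
    tr -d ' \n\r\t' | grep -o '"status":"[A-Z]*"' | head -n 1 | grep -q '"UP"'
}

# Example (not executed here):
#   curl -s http://127.0.0.1:9000/health/ready | health_is_up && echo "Keycloak is ready"
```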

Fix Laravel Guzzle so frontend does not crash

Even after Keycloak and DNS are fixed, Laravel should handle API failure gracefully.

Bad pattern:

catch (\Exception $e) {
    $response = $e->getResponse();
}

Good pattern:

use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;

try {
    $response = $client->post($url, [
        'headers' => [
            'Accept' => 'application/json',
        ],
        'form_params' => [
            'grant_type' => env('COURSE_M_GRANT_TYPE'),
            'client_id' => env('COURSE_M_CLIENT_ID'),
            'client_secret' => env('COURSE_M_CLIENT_SECRET'),
        ],
        'timeout' => 20,
        'connect_timeout' => 10,
    ]);
} catch (ConnectException $e) {
    \Log::error('Course API connection failed', [
        'message' => $e->getMessage(),
    ]);

    return [];
} catch (RequestException $e) {
    \Log::error('Course API request failed', [
        'message' => $e->getMessage(),
        'status' => $e->hasResponse() ? $e->getResponse()->getStatusCode() : null,
        'body' => $e->hasResponse() ? (string) $e->getResponse()->getBody() : null,
    ]);

    return [];
} catch (\Exception $e) {
    \Log::error('Course API unknown error', [
        'message' => $e->getMessage(),
    ]);

    return [];
}

This prevents the Blade page from failing with:

Call to undefined method ConnectException::getResponse()

Fix sensitive logging

The Laravel logs showed:

COURSE_M_CLIENT_SECRET: ...

Do not log OAuth client secrets in production.

Bad:

\Log::info('COURSE_M_CLIENT_SECRET: ' . env('COURSE_M_CLIENT_SECRET'));

Good:

\Log::info('COURSE_M_CLIENT_SECRET is configured', [
    'configured' => !empty(env('COURSE_M_CLIENT_SECRET')),
]);

If a real secret was exposed in logs, regenerate it.
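Existing log files can be checked for leaked values without printing the secret to the terminal. This is a minimal sketch, assuming the leaked lines look like KEY_SECRET: value or KEY_SECRET=value. The function names are illustrative, and masking a log file does not replace regenerating the secret.

```shell
#!/bin/sh
# Mask the value that follows any key ending in SECRET, e.g. "X_CLIENT_SECRET: value".
redact_secrets() {
    sed -E 's/([A-Z0-9_]*SECRET[:=][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g'
}

# Print only the line numbers where a secret-style key is followed by a value.
secret_line_numbers() {
    grep -nE 'SECRET[:=][[:space:]]*[^[:space:]]' | cut -d: -f1
}

# Example (not executed here):
#   secret_line_numbers < storage/logs/laravel.log
#   redact_secrets < storage/logs/laravel.log > /tmp/laravel-redacted.log
```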

Testing after final fix

After Keycloak config change, build, and restart, test in this order.

Test Keycloak status

systemctl status keycloak --no-pager

Test UDP sockets

ss -s

Test top UDP owners

lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20

Test DNS

getent hosts github.com
dig github.com

Test direct network

ping -c 4 8.8.8.8

Test Git

git pull origin master

Test Laravel course token API

curl -X POST https://www.devopsschool.com/course/oauth/token \
  -H "Accept: application/json" \
  -d "grant_type=client_credentials" \
  -d "client_id=21" \
  -d "client_secret=YOUR_CLIENT_SECRET"

Test trainer page

Open the trainer page and check Laravel logs:

tail -f storage/logs/laravel.log

Internal localhost API call discussion

Calling internal services with localhost can be useful:

COURSE_M_BASE_URL=http://127.0.0.1:8081/course

But in this server, Apache port 80 was redirecting all HTTP traffic to HTTPS, so:

curl -I http://localhost/course/oauth/token

returned:

301 Moved Permanently
Location: https://localhost/course/oauth/token

If you want internal same-server service calls, create a separate internal-only Apache vhost on a separate port, for example 127.0.0.1:8081, without HTTPS redirect.

Example concept:

Listen 8081

<VirtualHost 127.0.0.1:8081>
    ServerName devopsschool-internal.local

    Alias /course /opt/lampp/htdocs/devopsschool/services/courses/public

    <Directory /opt/lampp/htdocs/devopsschool/services/courses/public>
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>

Then Laravel can call:

COURSE_M_BASE_URL=http://127.0.0.1:8081/course

But do this only after Keycloak and DNS are stable. The localhost setup was not the root cause of this incident.

Final root cause summary

The real root cause was:

Keycloak Java process created thousands of UDP sockets.

This caused:

UDP socket exhaustion

Then:

DNS tools could not create UDP sockets

Then:

GitHub hostname resolution failed

Then:

Laravel Guzzle calls failed with ConnectException

Then:

Trainer Blade page crashed because ConnectException was handled incorrectly

The Keycloak startup option:

--cache-stack=tcp

was found in:

/etc/systemd/system/keycloak.service

It was removed.

For a one-server setup, the recommended cache config is:

cache=local
health-enabled=true
metrics-enabled=true

Then:

cd /opt/keycloak
sudo -u keycloak /opt/keycloak/bin/kc.sh build
systemctl restart keycloak

Complete final checklist

Use this checklist for production.

Diagnosis checklist

ss -s
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20
systemctl status keycloak --no-pager
systemctl cat keycloak | grep ExecStart
grep -R "cache-stack" /opt/keycloak/conf /etc/systemd/system /lib/systemd/system 2>/dev/null

Fix checklist

cp /opt/keycloak/conf/keycloak.conf /opt/keycloak/conf/keycloak.conf.backup-$(date +%F-%H%M%S)
cp /etc/systemd/system/keycloak.service /etc/systemd/system/keycloak.service.backup-$(date +%F-%H%M%S)

Edit:

nano /opt/keycloak/conf/keycloak.conf

Use:

cache=local
health-enabled=true
metrics-enabled=true

Confirm service:

systemctl cat keycloak | grep ExecStart

Expected:

ExecStart=/opt/keycloak/bin/kc.sh start --optimized

Build and restart:

cd /opt/keycloak
sudo -u keycloak /opt/keycloak/bin/kc.sh build
systemctl restart keycloak

Verify:

systemctl status keycloak --no-pager
ss -s
lsof -nP -iUDP 2>/dev/null | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -nr | head -20
getent hosts github.com
git pull origin master

Prevention checklist

Add UDP watchdog
Enable health check
Do not expose port 9000 publicly
Do not log OAuth secrets
Handle Guzzle ConnectException separately
Monitor Keycloak UDP socket count
Avoid --cache-stack=tcp on single-server setup

Frequently asked questions

Is this a hacking issue?

Based on the observed process, it looked like Keycloak, not an unknown process. The process user was keycloak, the working directory was /opt/keycloak, and the command used Keycloak’s Quarkus runner.

Still, after any server-level resource exhaustion issue, check SSH logs and unknown processes.

Does `kc.sh build` delete users?

No. It does not delete users, realms, clients, roles, groups, passwords, or client secrets.

It rebuilds optimized runtime configuration. Data remains in the Keycloak database.

Does `systemctl restart keycloak` delete users?

No. It only restarts the Keycloak service.

Active sessions may be interrupted, and users may need to log in again.

Why did `./kc.sh start --optimized` fail with port 8080 already in use?

Because Keycloak was already running through systemd. Manual start tried to launch a second Keycloak instance on the same port.

Use:

systemctl restart keycloak

not:

./kc.sh start --optimized

when Keycloak is managed by systemd.

Should I use `cache=local` or `cache=ispn`?

For one server, use:

cache=local

For multiple Keycloak nodes or HA setup, use:

cache=ispn
cache-stack=jdbc-ping

Do not use cache-stack=tcp for this single-server setup.

Why did GitHub fail?

GitHub failed because DNS resolution failed. DNS failed because the server could not create UDP sockets due to Keycloak socket exhaustion.

Why did Laravel fail?

Laravel failed because Guzzle could not connect to the API URL. Then the exception handling code tried to call getResponse() on ConnectException, which does not have a response.

Can I use localhost for service-to-service API calls?

Yes, but only if Apache has an internal vhost that does not redirect HTTP to HTTPS. In this case, /course over localhost was redirected to HTTPS, so it was not immediately usable.

What is the best permanent solution?

Use:

cache=local
health-enabled=true
metrics-enabled=true

Run:

kc.sh build
systemctl restart keycloak

Add a UDP watchdog and fix Laravel exception handling.

Conclusion

This incident looked like multiple unrelated failures: GitHub DNS error, Laravel API failure, Apache localhost redirect, and Keycloak “Too many open files.” But all of them were connected.

The real root cause was Keycloak creating too many UDP sockets.

Once UDP sockets were exhausted, DNS could not work. Once DNS failed, GitHub and Laravel external API calls failed. Once Laravel failed, the page crashed because Guzzle ConnectException was handled incorrectly.

For a single-server Keycloak setup, keep the configuration simple:

cache=local
health-enabled=true
metrics-enabled=true

Avoid:

--cache-stack=tcp

Add a watchdog so the server can recover automatically if the issue ever returns.

This approach gives you a stable Keycloak setup, safer Laravel behavior, better production monitoring, and a clear troubleshooting path for future incidents.
