Troubleshoot SIP applications
Overview
SIP container troubleshooting basics
- The average CPU usage of the system should go no higher than 60%-70%.
- The container should use no more than 70% of the allocated VM heap size. Be sure that the system has enough physical memory to accommodate the VM heap size. Call loads and session timeouts will have a big affect on heap usage.
- The maximum garbage collection (GC) time of the VM on which the container is running should not exceed 500 ms and the average should be less than 400 ms. Verbose GC can be used to measure this and the PMI viewer can be used to view GC times, heap usage, active sessions, etc., in graphical form.
Initial troubleshooting checklist:
Procedure
- Check the listening ports in the configuration.
- Use netstat –an to see listening ports.
- Check to see if virtual hosts are defined
- Is an application installed? Is it started?
- For a proxy configuration: Is a default cluster configured? If proxy and server are on the machine, is there a port conflict?
Results
SIP container symptoms and solutions If the problem is not resolved, check for specific symptoms.
- Symptom: Lots of retransmissions, CPU periodically drops to zero.
Solution: This is typically a DNS issue caused by Reverse DNS lookups and can be confirmed using a tool like Ethereal. If you do a network capture and send lots of DNS queries that contain an IP address and get back a host name in the response, this could be the problem. Verify nscd is running if you are on HP or other platforms that require name service caching. (Windows does not require this.) Another solution is to add host names to the \etc\host file.
- Symptom: Lots of retransmissions, CPU periodically spikes to 100%.
Solution: This is typically due to garbage collection and can be verified by turning on verbose GC (accessible on the admin console) and looking at the length of the GC cycles. The solution here is to enable Generational Garbage Collection by setting the JVM optional args to -Xgcpolicy:gencon.
- Symptom: Lots of retransmissions, CPU spikes to 100% for long periods of time and Generational Garbage Collection is enabled.Solution: This is typically due to SIP session objects either not being invalidated or not timing out over a long period of time. One solution is to set the session timeout value in the sip.xml of the application to a smaller value. The most efficient way to handle this is for the application to call invalidate on the session when the dialog completes (i.e. after receiving a BYE). The following entry in the SystemOut.log file will indicate the session timeout value is for each application installed on the container:
SipXMLParser 3 SipXMLParser getAppSessionTTL Setting Expiration time: 7 Minutes, For App: TCK back-to-back user agent”
- Symptom: Lots of "480 Service Not Available" messages received from the container when sending new INVITE messages to the SIP container. You will also likely see the following message show up in the SystemOut.log when the server is in this state:
LoadManager E LoadManager warn.server.overloadedSolution: This is typically due to one of the SIP container configurable metrics being exceeded. This includes the "Maximum Application Sessions" value and the "Maximum messages per averaging period" value. The solution is to adjust these values higher.
- Symptom: Lots of resends and calls are not completing accompanied by OutOfMemory exceptions in the SystemErr.log.
Solution: This usually means that the VM heap size associated with your container is not large enough and should be adjusted upwards. You can adjust this value from the admin console.
- Symptom: You receive a "503 Service Unavailable" when sending a SIP request to a SIP proxy.
Solution: This usually means there is no default cluster (or cluster routing rule that matches the message) set up at the proxy.
- Symptom: You receive a "404 Not Found" when sending a SIP request to a SIP proxy.
Solution: This usually means there is no virtual host set up for the containers that reside in the default cluster. It could also mean that the servers in the proxy’s default cluster do not contain a SIP application or that the message does not match one of the applications installed in the default cluster.
Symptom: An "out of memory" type behavior is occurring.
Solution: This may be due to the maximum heap size being set too low. SIP applications can consume a significant amount of memory because the sessions exist for a long call hold time. The maximum heap size of 512 MB does not provide sufficient memory for the SIP traffic workload. Set the maximum heap size for SIP applications to the minimum recommended value of 768 MB or higher.