The following is a log of a recurring issue with our ColdFusion 8 Standard server.
The JRun server is currently crashing every other day with the active thread error (error 1).
If the ColdFusion service is not restarted, the system then runs out of memory (error 2), despite Windows still having 0.5 – 1GB to spare.
A couple of things worth noting:
- The server normally runs with a thread count of 170 – 190
- When a crash is in progress this jumps to > 400
- There doesn’t appear to be a specific script or schedule to the crashes
- When tweaking the JVM config, any attempt to assign more than 1024MB of heap space results in JRun failing to start
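The thread counts quoted above can also be read from inside a JVM through the standard `java.lang.management` API. The following is only a minimal sketch: the class name `ThreadCountCheck` and the alert level of 400 (taken from the crash figure above) are assumptions for illustration, and the code reports on whichever JVM it runs in, so it would need to run inside the ColdFusion JVM to say anything about JRun's threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ThreadCountCheck {
    // Hypothetical alert level, based on the "> 400 during a crash" observation.
    static final int ALERT_THRESHOLD = 400;

    // True when the live thread count has passed the given threshold.
    static boolean isAboveThreshold(int liveThreads, int threshold) {
        return liveThreads > threshold;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int live = threads.getThreadCount();      // threads alive right now
        int peak = threads.getPeakThreadCount();  // highest count since JVM start
        System.out.println("live=" + live + " peak=" + peak);
        if (isAboveThreshold(live, ALERT_THRESHOLD)) {
            System.out.println("WARNING: thread count above " + ALERT_THRESHOLD);
        }
    }
}
```

Logging these two numbers at a regular interval would at least show how quickly the count climbs before a crash.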
Preventative action
- We’ve switched the server to use Oracle’s JRockit JVM due to the monitoring tools available with this VM
- We’ve tuned the heap to use 1024MB static heap size with a 100M nursery (see VM Config below)
- We’re using an aggressively tuned GC (see VM Config below)
Questions
- Why won’t the JVM address more than 1024MB of RAM? Is this a limitation of 32 bit Windows / Java?
- Why doesn’t the GC handle the thread count before it gets out of hand?
- Is this expected from a 32 bit server once load reaches a certain level?
- Is there any way to identify the CF scripts that are causing the thread count to climb?
- Would switching to 64bit OS address these issues?
Answers
- On a postcard please …
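On the question of finding out what the extra threads are doing, one generic JVM-level approach is to take a thread dump and count threads by the method at the top of each stack. The sketch below uses only the standard `Thread.getAllStackTraces()` call. The class name `ThreadSummary` is made up for illustration, and the code only sees the JVM it runs in, so it would have to be invoked from within the ColdFusion JVM to be useful here.

```java
import java.util.Map;
import java.util.TreeMap;

class ThreadSummary {
    // Counts threads by the class and method at the top of their stack.
    static Map<String, Integer> countByTopFrame(Map<Thread, StackTraceElement[]> dump) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (StackTraceElement[] stack : dump.values()) {
            String key = stack.length == 0
                    ? "(no stack)"
                    : stack[0].getClassName() + "." + stack[0].getMethodName();
            Integer seen = counts.get(key);
            counts.put(key, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Snapshot of every live thread in this JVM and its current stack.
        Map<String, Integer> counts = countByTopFrame(Thread.getAllStackTraces());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

If a large share of threads sit in the same frame during a spike, that frame is a reasonable place to start looking.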
Screen Grabs of JRockit Mission Control
The following are grabs of the server running “normally”
Error Messages from JVM Logs
Error 1 – Threads
java.lang.RuntimeException: Request timed out waiting for an available thread to run. You may want to consider increasing the number of active threads in the thread pool.
at jrunx.scheduler.ThreadPool$Throttle.enter(ThreadPool.java:123)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:425)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
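For anyone unfamiliar with this kind of error, the general pattern behind it is a fixed number of "active" slots with a bounded wait for a free one. The sketch below is not JRun's implementation (we have no access to that source). It is a hypothetical `ThrottleSketch` class built on `java.util.concurrent.Semaphore` that only illustrates how slow requests holding slots can cause later requests to time out with a message like the one above.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class ThrottleSketch {
    private final Semaphore slots;   // one permit per allowed active request
    private final long waitMillis;   // how long a request may wait for a slot

    ThrottleSketch(int maxActive, long waitMillis) {
        this.slots = new Semaphore(maxActive);
        this.waitMillis = waitMillis;
    }

    // Runs the task if a slot frees up within the wait time, otherwise fails.
    void run(Runnable task) {
        boolean acquired;
        try {
            acquired = slots.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            acquired = false;
        }
        if (!acquired) {
            throw new RuntimeException(
                    "Request timed out waiting for an available thread to run.");
        }
        try {
            task.run();
        } finally {
            slots.release(); // free the slot even if the task throws
        }
    }
}
```

In a model like this, raising the slot count only delays the failure if the underlying requests stay slow, since each slot is simply held for longer.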
Error 2 – OOM
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: CG(q0) [cfapplication2ecfc1153384571$funcONERROR.runFunction(Lcoldfusion/runtime/LocalScope;Ljava/lang/Object;Lcoldfusion/runti
at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:418)
at coldfusion.runtime.UDFMethod$ArgumentCollectionFilter.invoke(UDFMethod.java:325)
at coldfusion.filter.FunctionAccessFilter.invoke(FunctionAccessFilter.java:60)
at coldfusion.runtime.UDFMethod.runFilterChain(UDFMethod.java:277)
at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:192)
at coldfusion.runtime.TemplateProxy.invoke(TemplateProxy.java:448)
at coldfusion.runtime.TemplateProxy.invoke(TemplateProxy.java:308)
at coldfusion.runtime.AppEventInvoker.invoke(AppEventInvoker.java:74)
at coldfusion.runtime.AppEventInvoker.onError(AppEventInvoker.java:368)
at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:324)
at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
at coldfusion.filter.PathFilter.invoke(PathFilter.java:86)
at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:47)
at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
at coldfusion.CfmServlet.service(CfmServlet.java:175)
at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:102)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:43)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:544)
at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:70)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
System Information
Windows Server 2003 Service Pack 2, 32bit Standard Edition
ColdFusion Version 8,0,1,195765 Standard
Update Level /C:/ColdFusion8/lib/updates/hf801-71800.jar
Java Version 1.6.0_14 BEA Systems, Inc.
Java VM Specification Version 1.0
Java VM Specification Vendor Sun Microsystems Inc.
Java VM Specification Name Java Virtual Machine Specification
Java VM Version R27.6.5-32_o-121899-1.6.0_14-20091001-2107-windows-ia32
Java VM Vendor BEA Systems, Inc.
Java VM Name BEA JRockit(R)
Java Specification Version 1.6
Java Specification Vendor Sun Microsystems Inc.
Java Specification Name Java Platform API Specification
Java Class Version 50.0
Intel Core 2 CPU, E7400 @ 2.80GHz
2.79GB RAM with PAE
JVM Config
#JRockit JVM
java.home=C:/Program Files/Java/jrmc-3.1.2-1.6.0/jre
# Arguments to VM
java.args=-server -Xmx1024m -Xms1024m -Xns:100m -Xgc:genpar -Dsun.io.useCanonCaches=false -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{application.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application.home}/../wwwroot/WEB-INF/cfform/jars -Xmanagement:ssl=false,authenticate=false -Djrockit.managementserver.port=9010
java.ext.dirs={jre.home}/lib/ext
java.library.path={application.home}/../lib,{application.home}/../jintegra/bin,{application.home}/../jintegra/bin/international
system.path.first=false
java.user.dir={application.home}/../../lib
java.class.path={application.home}/servers/lib,{application.home}/../lib/macromedia_drivers.jar,{application.home}/lib/cfmx_mbean.jar,{application.home}/lib
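To confirm that settings such as `-Xmx1024m` are actually being picked up, the running JVM can be asked what it was started with. The sketch below is an assumption-laden illustration rather than part of our setup: the class name `JvmSettings` is made up, it uses only standard `Runtime` and `java.lang.management` calls, and it reports on the JVM it runs in. The reported maximum may differ slightly from the `-Xmx` value depending on the JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmSettings {
    // Converts a byte count to whole megabytes.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB):   " + toMegabytes(rt.maxMemory()));
        System.out.println("total heap (MB): " + toMegabytes(rt.totalMemory()));
        System.out.println("free heap (MB):  " + toMegabytes(rt.freeMemory()));

        // Arguments the JVM was launched with, e.g. -Xmx and -Xms settings.
        List<String> vmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : vmArgs) {
            System.out.println("vm arg: " + arg);
        }
    }
}
```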
Hi,
In my experience, these types of issues aren’t a problem/limitation with CF per se, but more to do with CF process(es) in your app that are not running efficiently. I’ve spent many fruitless hours in the past chasing “CF memory leaks/errors” only to find out they were a *symptom* of something not running as fast as expected (either long processes or processes that bottleneck under a lot of concurrent load, which it sounds like your app has).
Some architecture guidelines: http://cfzen.instantspot.com/blog/2009/8/8/Top-10-Architecture-Scalability-Mistakes-Made-in-ColdFusion
How do you solve it? Start with SeeFusion or FusionReactor, because they don’t require CF Enterprise. Look at your longest running threads, and see if your code, SQL, or database itself needs tuning to reduce request times. The fewer threads that are running concurrently, the less chance that you’ll get all the memory problems you’re having.
If you can afford it, and in many cases it saves you long-term, hire some experts to fix your issues and explain how. I can personally recommend http://www.coldfusionsupport.net/
After analysis, it may make sense to upgrade to CF Enterprise and run on 64-bit machines. On 32-bit, the JVM is limited to under 1.5GB of RAM, but 64-bit has no such limit in practice. We run our high-availability system on CF 8 Enterprise (64-bit) on 2 Red Hat servers that each have 32GB of RAM, so that each of our 8 clustered CF instances can have 3GB of memory allocated to it. Needless to say, we have 99.9% availability and no memory problems, even when some processes take too long. But of course each situation is unique…
Aaron, thanks for the comprehensive and speedy response!
I didn’t think for a second that ColdFusion was “to blame” and am almost certain that the reason for the issues is a combination of not-so-hot legacy code and naturally increasing load due to increased clients / traffic.
I’m installing FusionReactor now and will see what that reveals, although the crashes are a little random so I don’t know how soon I’ll have results.
In the meantime I’m happy that I understood the 32-bit limitation correctly. 64-bit Enterprise edition is on order and we’ll look to start bumping some clients over just as soon as it’s properly configured.
One aside: would you say that using CFInvoke to call methods on a one-off basis would contribute to a climbing thread count? I only ask as the core of the app does this a lot, on occasion within loops, and I personally believe that this should be refactored to use proper object instantiation, but I would welcome a second opinion!
Thanks
Rob
Rob, did you end up solving this? If so, any thoughts to share for readers who may find this later?
If you did not solve it, then in addition to the recommendation from Aaron above, I will say also that I provide assistance with these sorts of problems (http://www.carehart.org/consulting/), and I’ve become very skilled at finding and resolving them, and also at helping you understand how to do it yourself in the future. I also offer a satisfaction guarantee.
I don’t mean that to be a sales pitch; I’m just sincerely offering help for people who struggle and may feel stuck when CF has such issues.
As Aaron intimated, there is always an explanation and it’s often not the common things people think.
I’ll eventually be creating a resource to serve as a repository of things to consider in such situations, but until then, if I can help let me know.
And I’m certainly happy to engage in conversation here. I don’t know if your blog will send us notifications of new comments. I hope so. It’s hard to remember to come back to look.