Achieving High Throughput and Scalability with JRuby


  1. Achieving High Throughput and Scalability with JRuby
     Fernando Castano (fernando.castano@sun.com)
     Sun Microsystems

  2. Agenda
     ● What is Project Kenai
     ● Early tests and re-architecture
     ● How, where and what we benchmark
     ● Tuning our stack
     ● References
     ● Q&A

  3. Project Kenai (Kenai.com)
     ● Project Kenai is a platform for:
       - Developer Collaboration and Tools as a Service
       - Enables building communities for the “connected developer”
       - Integrated collaboration services stack
       - We develop Project Kenai using Kenai
     ● Features (per project):
       - SCM (SVN, Hg)
       - Bug Tracking
       - Forums
       - Wiki
       - Mailing Lists

  4. First Design: Junction1
     [Architecture diagram. Labels: Apache2, junction, tender, html, api, xml; services: scm (svn, hg), issues (jira, bugzilla), wiki, forum, lists (sympa), search (Solr), auth]

  5. Simple Test: Junction1
     ● why so slow?
     ● mpstat + jstack
     ● too chatty
     ● XML expensive
     ● json slow too
     ● CPU hungry
     ● no CPU scaling

  6. Improved Design: Junction2
     [Architecture diagram. Labels: Apache2, junction2, api/html; services: scm (svn, hg), issues (jira, bugzilla), wiki, forum, lists (sympa), search (Solr), auth]

  7. Simple Test: Junction2
     ● no chatter
     ● better CPU usage
     ● CPU scales
     ● much better

  8. Infrastructure
     ● Sun Fire T2000 (web and app tier)
       - 8 cores x 4 threads @ 1.4 GHz
     ● Sun Fire X4500 (storage)
       - quad AMD core, 9.7 TB mirrored, NFS server
     ● OpenSolaris Nevada 70b
       - containers
       - smf
     ● ZFS (Solaris feature)
       - storage pool with RAIDZ
       - NFS protocol
       - snapshots
     ● coolstack and blastwave packages (~LAMP stack)

  9. Workload Definition
     ● statistics from one of Sun's busiest collaboration sites
       - less than 2,000,000 trans/month (46 trans/min)
       - less than 800 logins/day
       - extracted mix of activity (R/W = 80/20)
     ● Requirements
       - avg response time for 90% in steady state less than 2 sec
       - 500 projects and 1000 concurrent users
       - match 80/20 mix
       - achieve at least 2000 trans/min
     ● randomized activities for each user
     ● don't get static content (images, jsp, etc)
     ● no think time for now
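The workload figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a 30-day month (which reproduces the 46 trans/min figure), and the method names `trans_per_minute` and `pick_activity` are invented for this example rather than taken from the benchmark kit.

```ruby
# Convert a monthly transaction count into an average per-minute rate.
# Assumption: a 30-day month, i.e. 43,200 minutes.
MINUTES_PER_MONTH = 30 * 24 * 60

def trans_per_minute(trans_per_month)
  trans_per_month.to_f / MINUTES_PER_MONTH
end

# Pick a read or write activity so that, over many draws, the mix
# approaches the 80/20 read/write split. The random generator is
# injectable so a run can be made repeatable with a fixed seed.
def pick_activity(rng = Random.new, read_share = 0.8)
  rng.rand < read_share ? :read : :write
end

observed = trans_per_minute(2_000_000)
target = 2000 # required trans/min from the slide

puts format('observed average: %.1f trans/min', observed)
puts format('target is about %.0fx the observed average', target / observed)
```

The point of the comparison is that the 2000 trans/min requirement sits well above the average rate measured on the existing site, so the benchmark is sized for headroom rather than for current load.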

  10. Kenai Benchmark Kit
     ● jmeter chosen (vs Faban and loadrunner)
     ● gnuplot + light scripting for reporting
     ● beanshell vs TCP server (for forking unix commands)
     ● not requesting embedded objects (no cache)
     ● dtrace very helpful (permspace, io, mysql, etc)
     ● collect mpstat, vmstat, trapstat, netsum, iostat, ... (~ nagios)
     ● save everything and document changes
     ● scale 1 dimension at a time
     ● stickshift profiling (or newrelic) very useful
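As an example of the kind of light reporting script mentioned above, the following sketch computes a 90th-percentile response time using the nearest-rank method. It is not the team's actual script: the `percentile` helper and the sample timings are hypothetical, and reading the response-time requirement as a 90th-percentile check is one possible interpretation.

```ruby
# Nearest-rank percentile: the smallest sample such that at least
# `pct` percent of all samples are less than or equal to it.
def percentile(samples, pct)
  raise ArgumentError, 'no samples' if samples.empty?

  sorted = samples.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical response times in seconds for one operation.
times = [0.42, 0.45, 0.51, 0.48, 0.44, 1.90, 0.47, 0.43, 0.46, 2.40]

p90 = percentile(times, 90)
puts "90th percentile: #{p90} sec, under 2 sec: #{p90 < 2.0}"
```

Nearest-rank is used here because it always returns an observed sample, which keeps the report easy to cross-check against the raw results.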

  11. Baselines
     ● single thread
     ● exclusive operation
     ● prstat (-L -m -p)
     ● jstack
     ● stickshift

     Operation          Baseline (sec)  Comment
     Login              0.45            OASIS-1625 (out of memory)
     Logout             0.26
     home               0.16
     people             0.17
     update profile     -               internal error
     project create     -               internal error
     projects           0.43            parameter show=5
     hg_del             5.30
     hg_pull            3.10            recurring proxy error
     hg_push            6.90
     svn_del            5.04
     svn_pull           3.05            recurring proxy error
     svn_push           12.06
     Forum_Edit         1.03
     Forum_Topic_Show   0.64
     Forum_Topics_List  1.90
     Wiki_Post          1.18            short wiki, regex bug, 401 returned & jsession lost
     Wiki_verify        0.68            view + assertion overhead
     Wiki_view          0.42

  12. Response Time vs users

  13. trans/min vs users

  14. CPU vs users

  15. Application server at peak  vmstat and prstat

  16. 2 Application servers

  17. High Availability strategy
     ● Web tier
       - 2 servers with Apache2 (hardware load balancer)
     ● Application tier
       - 2 or more servers (load balanced by Apache2 in the web tier)
       - 1 glassfish with 6 domains (jvms) in each app server
     ● Feature server (sympa, bugzilla, search)
       - active-standby with manual failover (change DNS alias)
     ● mysql 5.0.45 database
       - active-standby with manual failover (change DNS alias)
       - local database (146G), replication coming soon
     ● NFS server
       - active-standby with rsync and manual failover (DNS change)

  18. Low Level Tuning
     ● OpenSolaris (70b)
       - maxusers=4096
       - tcp tuning in web tier (spec.org T2000 publications)
       - use FX scheduler in app tier: priocntl -s -c FX -i all
       - 8k blocksize for zfs pool in NFS server
     ● java 1.6
       - -server, LargePageSizeInBytes=256m
       - parallelGC, AggressiveOpts, MaxPermSize=512m
       - Xmx=Xms=2560m
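The Java settings above are written in shorthand. The sketch below spells them out as a HotSpot (Java 6 era) command-line option list. The mapping is an assumption on my part, in particular reading "parallelGC" as `-XX:+UseParallelGC`, and the constant names are invented for this example.

```ruby
# Heap size from the slide: Xmx = Xms = 2560m.
HEAP = '2560m'

# Assumed full spellings of the shorthand options on the slide.
JVM_OPTIONS = [
  '-server',
  '-XX:LargePageSizeInBytes=256m',
  '-XX:+UseParallelGC',   # assumption: "parallelGC" means this collector flag
  '-XX:+AggressiveOpts',
  '-XX:MaxPermSize=512m',
  "-Xms#{HEAP}",          # initial heap equals maximum heap
  "-Xmx#{HEAP}"
].freeze

puts JVM_OPTIONS.join(' ')
```

Setting the initial and maximum heap to the same value avoids heap resizing at runtime, which matches the Xmx=Xms setting on the slide.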

  19. More Tuning
     ● Apache 2.2.8
       - built our own (studio compiler with -fast)
       - using pre-fork module (mpm not so good for us)
       - MaxClients = ServerLimit = 600
       - 4 virtual hosts to serve static content (jpg, jsp, etc)
       - proxy balancing with sticky sessions
     ● Memcache 1.1.12
       - so far only for SCM permissions
       - adding as needed if SQL becomes heavy

  20. JRuby 1.1.3 (Rails 2.1) Tuning
     ● need many runtimes for T2000
       - First approach: 1 32bit jvm with 20 runtimes
           - runtimes are memory hungry (20MB + objects)
           - expensive and frequent full GCs
           - performance bad
       - Second approach:
           - use 6 to 8 glassfish domains per app server
           - deploy only 5 runtimes per domain (jvm)
           - full GC under control and use more mem (32G available)
     ● compile.mode=JIT
     ● objectspace.enable=false
     ● bugs fixed: permspace, joni, activerecord (dtrace+prstat)
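The second approach trades one large JVM for several smaller ones. A rough budget check, using the heap and permgen sizes from the Low Level Tuning slide, might look like the sketch below. The constant and method names are illustrative, and the estimate ignores native memory and the operating system's own needs.

```ruby
# Figures taken from the slides; names are this sketch's own.
RUNTIMES_PER_DOMAIN = 5     # JRuby runtimes deployed per glassfish domain
HEAP_PER_JVM_MB = 2560      # Xmx = Xms = 2560m
PERMGEN_PER_JVM_MB = 512    # MaxPermSize=512m
SERVER_RAM_MB = 32 * 1024   # 32G available per app server

# Total runtimes and JVM memory (heap + permgen) for a domain count.
def server_budget(domains)
  {
    runtimes: domains * RUNTIMES_PER_DOMAIN,
    jvm_mb: domains * (HEAP_PER_JVM_MB + PERMGEN_PER_JVM_MB)
  }
end

[6, 8].each do |domains|
  budget = server_budget(domains)
  puts "#{domains} domains: #{budget[:runtimes]} runtimes, " \
       "#{budget[:jvm_mb]} MB heap+permgen of #{SERVER_RAM_MB} MB"
end
```

Under these assumptions both ends of the 6 to 8 domain range fit within the available memory while offering more runtimes than the original single-JVM setup with 20.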

  21. Glassfish Tuning
     ● 5 acceptor-threads
     ● 5 request-processing threads (and warbler)
     ● connection-pool validation = table
     ● accepts lots of connections
       - connection-pool queue-size-in-bytes=30000
       - connection-pool max-pending-count=30000
     ● -Dcom.sun.enterprise.server.ss.ASQuickStartup=false

  22. mysql 5.0.45 Tuning
     ● So far Query cache hit 98%
     ● CPU usage < 10%
     ● Planning to move to 64bit mysql
     ● 32GB of RAM available for buffers
     ● ZFS/NFS slow compared to FC storage array

  23. Benchmark constantly or ...

  24. Project Kenai live

  25. References
     ● Nick Sieger (team leader) - http://blog.nicksieger.com
     ● Dtrace toolkit - http://opensolaris.org/os/community/dtrace/dtracetoolkit/
     ● More Kenai performance details - http://jfdo.blogspot.com
     ● Project Kenai - http://kenai.com
     ● Solaris Internals (Richard McDougall) - http://www.solarisinternals.com

  26. Q&A
