---
title: Scalable Web Applications
order: 22
layout: page
---
[[scalable-web-applications]]
Scalable web applications
-------------------------
[[introduction]]
Introduction
^^^^^^^^^^^^
Whether you are creating a new web application or maintaining an
existing one, one thing you will certainly consider is the scalability
of your web application. Scalability is your web application’s ability
to handle a growing number of concurrent users. How many concurrent
users can your system serve at peak usage, for instance? Is it
intended for small-scale intranet usage with tens to hundreds of
concurrent users, or do you aim for a medium-sized global web
application such as 500px.com with about 1000-3000 concurrent users?
Or is your target even higher? You might wonder how many concurrent users
there are on Facebook or LinkedIn. We wondered about that too, and thus
made a small study to estimate it. You can see the results in Figure 1
below. The estimates are derived from the monthly visitor and average
visit time data found on Alexa.com.
image:img/webusers.png[image]
_Figure 1: Popular web applications with estimated numbers of concurrent
users._
The purpose of this article is to show you common pain points that tend
to decrease the scalability of your web application and to help you find
ways to overcome them. We begin by introducing our example
application. We will show you how to test the scalability of this
example application with Gatling, a tool for stress testing your web
application. Then, in the following chapters, we will go through some pain
points of scalability, such as memory, CPU, and the database, and see how
to overcome them.
[[book-store-inventory-application]]
Book store inventory application
--------------------------------
Throughout this article we’ll use a book store inventory application
(see Figure 2) as our example when deep diving into the world
of web application scalability. The application contains a login view
and an inventory view with CRUD operations on a mockup data source. It
also has the following common web application features: responsive
layouts, navigation, data listing, and a master-detail form editor. The
application is publicly available as a Maven archetype
(`vaadin-archetype-application-example`). We will first test how many
concurrent users the application can serve on a single server.
image:img/mockapp-ui.png[image]
_Figure 2: Book store inventory application_
The purpose of scalability testing is to verify whether the
application's server side can cope with a predefined number of
concurrent users or not. We can use a scalability test to find the
limits, the breaking point, or server-side bottlenecks of the application.
There are several options for scalability testing web applications. One
of the most widely used free tools is Apache JMeter. JMeter is well
suited to testing web applications, as long as the client-server
communication does not use websockets. When using asynchronous websocket
communication, one can use the free Gatling tool or the commercial
NeoLoad tool.
You can find a lot of step-by-step tutorials online on how to use
Gatling and JMeter. There is typically a set of tips and tricks that one
should take into account when load testing certain web frameworks,
such as the open source Vaadin Framework. For Vaadin-specific
tutorials, check the wiki pages on
https://vaadin.com/scalability[vaadin.com/scalability].
Gatling and JMeter can be used to record the client-to-server requests of
a web application. After recording, the recorded requests can be played
back by a number of concurrent threads. The more threads (virtual users)
you use, the higher the simulated load generated on the tested
application.
Since we want to test our application in both synchronous and
asynchronous communication modes, we will use Gatling. Another benefit
of Gatling compared to JMeter is that it is lighter on the testing
server, so more virtual users can be simulated on a single testing
machine. Figure 3 shows the Gatling settings used to record the example
scenario of the inventory application. Typically all static resources
are excluded from the recording (see the bottom left corner of the
figure), since these are usually served from a separate http server such
as Nginx or from a CDN (Content Delivery Network) service. In our first
test, however, we still recorded these requests to see the worst case
situation, where all requests are served from a single application
server.
image:img/figure3s2.png[image]
_Figure 3: Gatling recorder._
Gatling gathers the recorded requests as text files and composes a Scala
class which is used to play back the recorded test scenario. We planned a
test scenario for a typical inventory application user: the user logs in
and performs several updates (in our case 11) to the store in a
relatively short time span (3 minutes). We also assumed that they leave
the inventory application open in their browser, which means the
HttpSession does not close before the session timeout (30 min in our case).
Let’s assume that we have an extremely large bookstore with several
persons (say 10000) responsible for updating the inventory. If one
person updates the inventory 5 times a day, and an update session takes
3 minutes, then, with the same logic we used to calculate concurrent
users in the Introduction, we get a continuous load of about 100
concurrent users. This is of course not a realistic assumption unless the
application is a global service or a local application used around the
clock, such as a patient information system. For testing purposes
it is, however, a good assumption.
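The back-of-the-envelope calculation above can be checked with a few
lines of Java; the figures (10000 staff, 5 update sessions a day of
3 minutes each) are the assumptions stated in the text:

[source,java]
....
public class ConcurrentUserEstimate {
    public static void main(String[] args) {
        int persons = 10_000;          // staff updating the inventory
        int sessionsPerDay = 5;        // update sessions per person per day
        double sessionMinutes = 3.0;   // length of one update session
        double minutesPerDay = 24 * 60;

        // Total daily session-minutes, spread evenly over the day, gives
        // the average number of sessions open at any given moment.
        double concurrent = persons * sessionsPerDay * sessionMinutes
                / minutesPerDay;
        System.out.printf("about %.0f concurrent users%n", concurrent);
    }
}
....

This prints about 104 concurrent users, which rounds to the continuous
load of about 100 used in the test scenario.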
A snippet from the end of our test scenario is shown in Figure 4. This
test scenario is configured to be run with 100 concurrent users, all
started within 60 seconds (see the last line of code).
[source,scala]
....
.pause(9)
.exec(http("request_45")
.post("/test/UIDL/?v-uiId=0")
.headers(headers_9)
.body(RawFileBody("RecordedSimulation_0045_request.txt")))
.pause(5)
.exec(http("request_46")
.post("/test/UIDL/?v-uiId=0")
.headers(headers_9)
.body(RawFileBody("RecordedSimulation_0046_request.txt"))
.resources(http("request_47")
.post(uri1 + "/UIDL/?v-uiId=0")
.headers(headers_9)
.body(RawFileBody("RecordedSimulation_0047_request.txt"))))}
setUp(scn.inject(rampUsers(100) over (60 seconds))).protocols(httpProtocol)
....
_Figure 4: Part of the test scenario of the inventory application._
To make the test more realistic, we would like to execute it several
times. Without repetition we do not get a clear picture of how the server
tolerates a continuous high load. In a Gatling test script, this is
achieved by wrapping the test scenario in a repeat loop. We should
also flush the session cookies to ensure that a new session is created for
each repeat. See the second line of code in Figure 5 for an example of
how this can be done.
[source,scala]
....
val scn = scenario("RecordedSimulation")
.repeat(100,"n"){exec(flushSessionCookies).exec(http("request_0")
.get("/test/")
.resources(http("request_1")
.post(uri1 + "/?v-1440411375172")
.headers(headers_4)
.formParam("v-browserDetails", "1")
.formParam("theme", "mytheme")
.formParam("v-appId", "test-3556498")
....
_Figure 5: Repeating 100 times with session cookie flushing (small part
of the whole script)_
We tested how well this simple example application tolerated several
concurrent users. We deployed our application in Apache Tomcat 8.0.22 on
a Windows 10 machine with Java 1.7.0 and an older quad-core mobile Intel
i7 processor. With its default settings (using the default heap size of
2GB), Tomcat was able to handle about 200 concurrent users for an
extended period of time. The CPU usage for that small number of
concurrent users was not a problem (it was lower than 5%), but the
server configuration was a bottleneck. Here we stumbled upon the first
scalability pain point: server configuration issues (see the next
chapter). It might sound surprising that we could only serve such a
small number of concurrent users by default, but do not worry, we are
not stuck here. We will look at the reasons for this and other
scalability pain points in the following chapters.
[[scalability-pain-points]]
Scalability pain points
-----------------------
We will go through the typical pain points that a web application
developer will encounter when developing a web application for
hundreds of concurrent users. Each pain point is introduced in its own
subchapter and followed by typical remedies.
[[server-configuration-issues]]
Server configuration issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~
One typical problem that appears when there are lots of concurrent
users is that the operating system (especially the *nix based ones) runs
out of file descriptors. This happens because most *nix systems have a
pretty low default limit for the maximum number of open files, which
includes network connections. This is usually easy to fix with the
`ulimit` command, though sometimes it might require tuning kernel
parameters with `sysctl` too.
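For example, on Linux the limits can be inspected, and raised, roughly as
follows; the values are illustrative and the system-wide change requires
root privileges:

....
# Show the current per-process limit on open file descriptors
ulimit -n

# Show the system-wide maximum on Linux
cat /proc/sys/fs/file-max 2>/dev/null || true

# To raise the limits (example values, tune for your expected load):
#   ulimit -n 65536                 # per-process limit for this shell
#   sysctl -w fs.file-max=2097152   # system-wide maximum (as root)
....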
Somewhat unexpected issues can also surface with network bandwidth.
Our test laptop was on a wireless connection, and its sending bandwidth
started choking at about 300 concurrent users. (Please note that we used
an oldish laptop throughout this test to showcase the real scalability of
web apps: your own server environment will no doubt be even more
scalable out of the box.) One part of this issue was the wifi, and
another part was that we served the static resources, such as javascript
files, images and stylesheets, from Tomcat. At this point we stripped
the static resource requests out of our test script to simulate the
situation where those are served from a separate http server, such as
Nginx. Please read the blog post
“https://vaadin.com/blog/-/blogs/optimizing-hosting-setup[Optimizing
hosting setup]” on our website for more information about the topic.
Another quite typical configuration issue is that the application server
is not configured for a large number of concurrent users. In our
example, a symptom of this was that the server started rejecting new
connections (“Request timed out”) after a while, even though there
were plenty of free memory and CPU resources available.
After we configured our Apache Tomcat for high concurrency, removed the
static resource requests, and connected the test laptop to a wired
network, we were able to push the number of concurrent users from
200 up to about 500. Our configuration changes to Tomcat’s server.xml
are shown in Figure 6, where we define a maximum thread count
(10240), an accept count (4096), and a maximum number of
concurrent connections (4096).
image:img/figure6a.png[image]
_Figure 6: Configuring Tomcat’s default connector to accept a lot of
concurrent users._
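For reference, the settings shown in Figure 6 correspond roughly to the
following connector fragment in Tomcat’s `conf/server.xml`; the attribute
values are the ones from our test and should be tuned to your own
hardware:

[source,xml]
....
<!-- HTTP connector tuned for a large number of concurrent users -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="10240"
           acceptCount="4096"
           maxConnections="4096"
           redirectPort="8443" />
....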
The next pain point, which appeared with more than 500 users, was that we
ran out of memory. The default heap size of 2GB eventually ran out with
such a high number of concurrent users. On the other hand, there was still
a lot of CPU capacity available, since the average load was less than
5%.
[[out-of-memory]]
Out of memory
~~~~~~~~~~~~~
Insufficient memory is possibly the most common problem that limits the
scalability of a stateful web application. An http session is typically
used to store the state of a web application for its user. In
Vaadin, an http session is wrapped into a `VaadinSession`. A
VaadinSession contains the state (value) of each component (such as
`Grid`, `TextField`, etc.) of the user interface. Thus,
straightforwardly, the more components and views you have in your Vaadin
web application, the bigger the size of your session.
In our inventory application, each session takes about 0.3MB of memory,
which is kept in memory until the session finally closes and the garbage
collector frees the resources. The session size in our example is a
little on the high side. With a constant load of 100 concurrent users, a
session timeout of 30 minutes, and an average usage time of 3 minutes,
the expected memory usage is about 350MB. To see how the session size and
the number of concurrent users affect the needed memory in our case, we
made a simple analysis, the results of which are shown in Figure 7. We
basically calculated how many sessions can exist at most, by calculating
how many users there will be within an average usage time plus the
session timeout.
image:img/figure6s.png[image]
_Figure 7: Memory need for varying size sessions and a different number
of concurrent users._
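As a sketch of the model behind Figure 7, the number of simultaneously
live sessions is the session creation rate multiplied by the lifetime of
a session (active use plus timeout). With the figures from the text, a
few lines of Java reproduce the estimate:

[source,java]
....
public class SessionMemoryEstimate {
    public static void main(String[] args) {
        double concurrentUsers = 100;   // constant load
        double usageMinutes = 3;        // average active usage per session
        double timeoutMinutes = 30;     // HttpSession timeout
        double sessionSizeMb = 0.3;     // measured session size

        // New sessions start at concurrentUsers / usageMinutes per minute,
        // and each one lives for usageMinutes + timeoutMinutes.
        double sessionsPerMinute = concurrentUsers / usageMinutes;
        double liveSessions = sessionsPerMinute
                * (usageMinutes + timeoutMinutes);
        double memoryMb = liveSessions * sessionSizeMb;
        System.out.printf("%.0f live sessions, about %.0f MB of heap%n",
                liveSessions, memoryMb);
    }
}
....

The result, roughly 330 MB for 1100 live sessions, is in the same
ballpark as the 350MB figure quoted above.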
[[remedies]]
Remedies
^^^^^^^^
[[use-more-memory]]
Use more memory
+++++++++++++++
This might sound simplistic, but many times it might be enough to just
add as much memory as possible to the server. Modern servers and server
operating systems support hundreds of gigabytes of physical
memory. For instance, in our example, if the size of a session
were 0.5MB and we had 5000 concurrent users, the memory need would
be about 28GB.
You also have to make sure that your application server is configured to
reserve enough memory. For example, the default heap size for Java is
typically 2GB, and Apache Tomcat will not reserve more memory
unless you ask it to with the **`-Xmx`** JVM argument. You might
need a special JVM for extremely large heap sizes. We used the following
Java virtual machine parameters in our tests:
....
-Xms5g -Xmx5g -Xss512k -server
....
The parameters **`-Xms`** and **`-Xmx`** set the minimum and
the maximum heap size for the server (5 GB in the example), `-Xss`
is used to reduce the stack size of threads to save memory (the default
is typically 1MB for 64-bit Java), and the `-server` option tells the
JVM that the Java process is a server.
[[minimize-the-size-of-a-session]]
Minimize the size of a session
++++++++++++++++++++++++++++++
The biggest culprit for the big session size in the inventory
application is the container (BeanItemContainer), which is filled with
all the items of the database. Containers, and especially the built-in
fully featured BeanItemContainer, are typically the most memory-hungry
parts of Vaadin applications. One can either reduce the number of items
loaded in the container at one time or use one of the lightweight
alternatives available from Vaadin Directory
(https://vaadin.com/directory[vaadin.com/directory]), such as Viritin,
MCont, or the GlazedLists Vaadin Container. Another approach is to release
containers and views to the garbage collector, e.g. every time the user
switches to another view, though that will slightly increase the CPU
load, since the views and containers have to be rebuilt if the
user returns to the view. The feasibility of this option depends on your
application design and user flow; usually it’s a good choice.
[[use-a-shorter-session-time-out]]
Use a shorter session timeout
+++++++++++++++++++++++++++++
Since every session reserves its memory for as long as it stays there,
the shorter the session timeout is, the quicker the memory is
freed. Assuming that the average usage time is much shorter than the
session timeout, we can state that halving the session timeout
approximately halves the memory need, too. Another way to reduce the
sessions’ time in memory is to instruct users to log out once
they are done.
The session of a Vaadin application is kept alive by requests (such as
user interactions) made from the client to the server. Besides user
interactions, the client side of a Vaadin application sends heartbeat
requests to the server side, which keep the session alive as
long as the browser window is open. To override this behaviour and to
allow closing idle sessions, we recommend using the `closeIdleSessions`
parameter in your servlet configuration. For more details, see the
chapter
https://vaadin.com/book/-/page/application.lifecycle.html[Application
Lifecycle] in the Book of Vaadin.
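As a sketch, for a Vaadin 7 application the parameter can be set as a
servlet init parameter in `web.xml`; the UI class name below is
hypothetical:

[source,xml]
....
<servlet>
    <servlet-name>Inventory Application</servlet-name>
    <servlet-class>com.vaadin.server.VaadinServlet</servlet-class>
    <init-param>
        <param-name>UI</param-name>
        <param-value>com.example.InventoryUI</param-value>
    </init-param>
    <!-- Allow idle sessions to expire even though the client keeps
         sending heartbeat requests while the window stays open -->
    <init-param>
        <param-name>closeIdleSessions</param-name>
        <param-value>true</param-value>
    </init-param>
</servlet>
....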
[[use-clustering]]
Use clustering
++++++++++++++
If there is not enough memory, for example if there is no way to reduce
the size of a session and the application needs a very long session
timeout, then there is only one option left: clustering. We will discuss
clustering later in the Out of CPU chapter, since clustering is more
often needed for increasing CPU power.
[[out-of-cpu]]
Out of CPU
~~~~~~~~~~
We got past the previous limit of 500 concurrent users by
increasing the heap size of Tomcat to 5GB and reducing the session
timeout to 10 minutes. Following the memory calculations above, we
should theoretically be able to serve almost 3000 concurrent users with
our single server, if there is enough CPU available.
Although the average CPU load was still rather low (about 10%) with 800
concurrent users, it jumped up to 40% every now and then for several
seconds as the garbage collector cleaned up unused sessions and the
like. That is also the reason why one should not plan to use the full
CPU capacity of a server: it would in the worst case stretch the garbage
collection pauses even to tens of seconds, during which the server is
completely unresponsive. We suggest that if the average load grows to
over 50% of the server’s capacity, other means have to be taken into use
to decrease the load on the single server.
We gradually increased the number of concurrent users to find out the
limits of our test laptop and Tomcat. After some trial and error, we
found that the safe number of concurrent users for our test laptop was
about 1700. Above that, several request timeout events occurred, even
though the CPU usage was about 40-50% of total capacity. We expect that
with a more powerful server, we could have reached 2000-3000 concurrent
users quite easily.
[[remedies-1]]
Remedies
^^^^^^^^
[[analyze-and-optimize-performance-bottlenecks]]
Analyze and optimize performance bottlenecks
++++++++++++++++++++++++++++++++++++++++++++
If you are not absolutely sure about the origin of the high CPU usage,
it is always good to verify it with a performance profiling tool. There
are several options for profiling, such as JProfiler, XRebel, and Java
VisualVM. We will use VisualVM in this case, since it ships for free
with every (Oracle) JDK since version 1.5.
Our typical procedure goes like this:

1. Deploy your webapp and start your server.
2. Start VisualVM and double click your server’s process (e.g. “Tomcat
(pid 1234)”) on the Applications tab (see Figure 8).
3. Start your load test script with, for instance, 100 concurrent users.
4. Open the Sampler tab to see where the CPU time is spent.
5. Use the filter at the bottom to show the CPU usage of your application
(e.g. “`biz.mydomain.projectx`”) and a possible ORM (Object-relational
mapping) framework (e.g. “`org.hibernate`”) separately.
Typically, only a small part (e.g. 0.1-2%) of CPU time is spent in
the classes of your webapp, if your application does not contain heavy
business logic. The CPU time spent in Vaadin’s classes should also be
very small (e.g. 1%). You can relax about performance bottlenecks
in your own code if most of the time (>90%) is spent in the application
server’s classes (e.g. “`org.apache.tomcat`”).
Unfortunately, quite often database functions and ORM frameworks take a
pretty big share of the CPU time. We will discuss how to tackle heavy
database operations in the Database chapter below.
image:img/figure7s.png[image]
_Figure 8: Profiling CPU usage of our inventory application with Java
VisualVM_
[[use-native-application-server-libraries]]
Use native application server libraries
+++++++++++++++++++++++++++++++++++++++
Some application servers (at least Tomcat and Wildfly) allow you to use
native (operating system specific) implementations of certain libraries.
For example, the Apache Tomcat Native Library gives Tomcat access to
certain native resources for performance and compatibility. We didn’t
test the effect of using native libraries instead of the standard ones
here. Based on a little online research, it seems that the performance
benefit of native libraries for Tomcat is visible only when using
secured https connections.
[[fine-tune-java-garbage-collection]]
Fine tune Java garbage collection
+++++++++++++++++++++++++++++++++
We recommended above not to strain a server beyond 50% of its total
CPU capacity. The reason was that above that level, a garbage collection
pause tends to freeze the server for too long a time. That is because
collection typically does not start until almost all of the available
heap is spent, at which point a full collection is done. Fortunately, it
is possible to tune the Java garbage collector so that it does its job
in short bursts. With a little online study, we found the following set
of JVM parameters for web server optimized garbage collection:
....
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
....
The first parameter prevents Java from using its default garbage
collection strategy and makes it use CMS (concurrent mark sweep)
instead. The second parameter tells at which level of “occupancy”
garbage collection should be started. The value of 70% for the second
parameter is typically a good choice, but for optimal performance it
should be chosen carefully for each environment, e.g. by trial and error.
The CMS collector should be good for heap sizes up to about 4GB. For
bigger heaps there is the G1 (Garbage first) collector, introduced
in JDK 7 update 4. The G1 collector divides the heap into regions
and uses multiple background threads to first scan the regions that
contain the most garbage objects. The Garbage first collector is enabled
with the following JVM parameter:
....
-XX:+UseG1GC
....
If you are using Java 8 Update 20 or later, and G1, you can optimize the
heap usage of duplicated Strings (i.e. their internal `char[]` arrays)
with the following parameter:
....
-XX:+UseStringDeduplication
....
[[use-clustering-1]]
Use clustering
++++++++++++++
We have now arrived at the point where a single server cannot fulfill
our scalability needs, whatever tricks we try. If a single server
is not enough to serve all users, we obviously have to distribute
them across two or more servers. This is called clustering.
Clustering has more benefits than simply balancing the load between two
or more servers. An obvious additional benefit is that we do not have to
depend on a single server. If one server dies, the user can continue on
another server. In the worst case, the user loses her session and has to
log in again, but at least she is not left without the service. You have
probably heard the term “session replication” before. It means that the
user’s session is copied to other servers (at least to one other) of
the cluster. Then, if the server currently used by the user goes down,
the load balancer sends subsequent requests to another server, and the
user should not notice anything.
We will not cover session replication in this article, since we are
mostly interested in increasing the ability to serve more and more
concurrent users with our system. We will show two ways to do clustering
below, first with the Apache Web Server and Tomcats, and then with the
Wildfly Undertow server.
[[clustering-with-apache-web-server-and-tomcat-nodes]]
Clustering with Apache Web Server and Tomcat nodes
++++++++++++++++++++++++++++++++++++++++++++++++++
Traditionally, Java web application clustering is implemented with one
Apache Web Server as a load balancer and 2 or more Apache Tomcat servers
as nodes. There are a lot of tutorials online, so we will just give a
short summary below.

1. Install Tomcat on each node
2. Configure a unique node name with the jvmRoute parameter in each
Tomcat’s server.xml
3. Install the Apache Web Server on the load balancer node
4. Edit Apache’s httpd.conf file to include mod_proxy, mod_proxy_ajp,
and mod_proxy_balancer
5. Configure the balancer members with node addresses and load factors
at the end of the httpd.conf file
6. Restart the servers
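Steps 4 and 5 might end up looking roughly like this at the end of
`httpd.conf`; the node addresses and the balancer name are placeholders,
and the route values must match the jvmRoute names from step 2:

....
# Load balancing over AJP to two Tomcat nodes with sticky sessions
<Proxy balancer://inventorycluster>
    BalancerMember ajp://192.168.1.11:8009 route=node1 loadfactor=1
    BalancerMember ajp://192.168.1.12:8009 route=node2 loadfactor=1
    ProxySet stickysession=JSESSIONID|jsessionid
</Proxy>
ProxyPass "/" "balancer://inventorycluster/"
ProxyPassReverse "/" "balancer://inventorycluster/"
....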
There are several other options (free and commercial) for the load
balancer, too. For example, our customers have used at least F5 in
several projects.
[[clustering-with-wildfly-undertow]]
Clustering with Wildfly Undertow
++++++++++++++++++++++++++++++++
Using Wildfly Undertow as a load balancer has several advantages over
the Apache Web Server. First, as Undertow comes with your WildFly
server, there is no need to install yet another piece of software for
the load balancer. Second, you can configure Undertow in Java (see
Figure 9), which minimizes error-prone conf file or xml configuration.
Finally, using the same vendor for the application servers and the load
balancer reduces the risk of compatibility issues. The clustering setup
for Wildfly Undertow is presented below. We use sticky session
management to maximize performance.

1. Install Wildfly 9 on all nodes
2. Configure Wildfly’s standalone.xml:
a. add an `instance-id` parameter to the undertow subsystem, e.g.:
`<subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="node1">`
(this is needed for the sticky sessions)
b. set the http port to something other than 8080 in the
socket-binding-group, e.g.:
`<socket-binding name="http" port="${jboss.http.port:8081}"/>`
3. Start your node servers accepting all ip addresses:
`./standalone.sh -c standalone.xml -b=0.0.0.0`
4. Code your own load balancer (reverse proxy) with Java and the
Undertow libraries (see Figure 9) and start it as a Java application.
[source,java]
....
public static void main(final String[] args) {
    try {
        // Two backend nodes; the route names must match the instance-ids
        // configured in each node's standalone.xml (sticky sessions)
        LoadBalancingProxyClient loadBalancer = new LoadBalancingProxyClient()
                .addHost(new URI("http://192.168.2.86:8081"), "node1")
                .addHost(new URI("http://192.168.2.216:8082"), "node2")
                .setConnectionsPerThread(1000);
        // Reverse proxy listening on port 8080 with a 30 s request timeout
        Undertow reverseProxy = Undertow.builder()
                .addHttpListener(8080, "localhost")
                .setIoThreads(8)
                .setHandler(new ProxyHandler(loadBalancer, 30000,
                        ResponseCodeHandler.HANDLE_404))
                .build();
        reverseProxy.start();
    } catch (URISyntaxException e) {
        throw new RuntimeException(e);
    }
}
....
_Figure 9: Simple load balancer with two nodes and sticky sessions._
[[database]]
Database
~~~~~~~~
In most cases, the database is the most common and also the trickiest
thing to optimize. Typically you’ll have to think about your database
usage before you actually need to start optimizing memory and CPU as
shown above. We assume here that you use an object-relational mapping
framework such as Hibernate or EclipseLink. These frameworks implement
several optimization techniques internally, which are not discussed here,
although you might need those if you are using plain old JDBC.
Typically, profiling tools are needed to investigate how much the
database limits the scalability of your application, but as a rule
of thumb: the more you can avoid accessing the database, the less it
limits scalability. Consequently, you should generally cache static
(or rarely changing) database content.
[[remedies-2]]
Remedies
^^^^^^^^
[[analyze-and-optimize-performance-bottlenecks-1]]
Analyze and optimize performance bottlenecks
++++++++++++++++++++++++++++++++++++++++++++
We already discussed briefly how to use Java VisualVM for finding CPU
bottlenecks. The same instructions also apply for finding out to what
extent the database consumes the performance. Typically you have several
repository classes (e.g. `CustomerRepository`) in your web application,
used for CRUD (create, read, update, delete) operations (e.g.
`createCustomer`). Commonly your repository implementations either
extend Spring’s `JpaRepository` or use `javax.persistence.EntityManager`
or Spring’s `DataSource` for database access. Thus, when profiling, you
will probably see one or more of those database access methods in the
list of methods that use most of your CPU’s capacity.
According to our experience, one of the bottlenecks might be that small
database queries (e.g. `findTaskForTheDay`) are executed repeatedly
instead of doing more work in one query (e.g. `findTasksForTheWeek`). In
some other cases, it might be vice versa: too much information is
fetched and only part of it is used (e.g. `findAllTheTasks`). A real-life
example of the latter happened recently in a customer project, where we
were able to gain a significant performance boost just by using JPA
projections to leave out unnecessary attributes of an entity (e.g.
fetching only a Task’s name and id) in a query.
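The projection idea can be sketched in plain Java as follows. The real
project used JPA/Spring Data projections rather than in-memory mapping;
the class and field names here (`Task`, `TaskSummary`) are hypothetical
stand-ins chosen to match the example above.

```java
import java.util.List;

// Sketch: instead of loading full Task entities (with description,
// attachment, ...), map only the two attributes the view needs. With
// Spring Data JPA the same effect is achieved with an interface- or
// DTO-based projection in the repository query.
public class ProjectionSketch {

    record Task(long id, String name, String description, byte[] attachment) {}

    // The lightweight "projection" carrying only id and name.
    record TaskSummary(long id, String name) {}

    static List<TaskSummary> findTaskSummaries(List<Task> allTasks) {
        return allTasks.stream()
                .map(t -> new TaskSummary(t.id(), t.name()))
                .toList();
    }
}
```

With a projection, the database sends only the selected columns over the
wire, which is where the performance gain comes from.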
[[custom-caching-and-query-optimization]]
Custom caching and Query optimization
+++++++++++++++++++++++++++++++++++++
After performance profiling, you have typically identified a few queries
that take a big part of the total CPU time. Some of those queries might
be relatively fast as single queries but are simply executed hundreds or
thousands of times. Other problematic queries are heavy as such.
Moreover, there is also the __N__+1 query problem, where, for example, a
query for fetching a Task entity results in __N__ more queries for
fetching the one-to-many members (e.g. assignees, subtasks, etc.) of the
Task.
The queries of the first type might benefit from being combined into
bigger queries as discussed in the previous subchapter (use
`findTasksForTheWeek` instead of `findTaskForTheDay`). I call this
approach custom caching. This approach typically requires changes in
your business logic too: you will need to store (cache) entities that
are not yet needed, for example in a `HashMap` or `List`, and then
handle all these entities sequentially.
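The custom-caching approach above can be sketched like this; `Task` and
the week/day split are simplified stand-ins for the real domain classes,
not the project’s actual code.

```java
import java.time.DayOfWeek;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Sketch: instead of querying the database once per day
// (findTaskForTheDay), fetch the whole week once and serve the per-day
// lookups from an in-memory map.
public class WeekTaskCache {

    record Task(String name, DayOfWeek day) {}

    private final Map<DayOfWeek, List<Task>> byDay = new EnumMap<>(DayOfWeek.class);

    // One "big" query result replaces up to seven small queries.
    public WeekTaskCache(List<Task> tasksForTheWeek) {
        for (Task t : tasksForTheWeek) {
            byDay.computeIfAbsent(t.day(), d -> new ArrayList<>()).add(t);
        }
    }

    // Served from memory; no further database round trips.
    public List<Task> tasksFor(DayOfWeek day) {
        return byDay.getOrDefault(day, List.of());
    }
}
```

The trade-off is the business-logic change mentioned above: the caller
must fetch and hold the whole week’s entities up front instead of
querying one day at a time.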
The queries of the second type are typically harder to optimize.
Slow queries can often be optimized by adding a suitable index or by
rephrasing the query logic in a slightly different form. The difficult
part is figuring out what exactly makes the query slow. I recommend
using a logging setting that shows the actual SQL queries in your log
file or console (e.g. in Hibernate, use `show_sql=true`). Then you can
take a query, run it against your database, vary it, and see how it
behaves. You can even use the `EXPLAIN` keyword to ask MySQL or
PostgreSQL (`EXPLAIN PLAN FOR` in Oracle and `SHOWPLAN_XML` in SQL
Server) to explain how the query is executed, which indexes are used, etc.
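In Hibernate, the SQL logging mentioned above is a matter of
configuration; a minimal fragment (the second property is an optional
extra for readability):

```properties
# hibernate.properties (or the equivalent properties in persistence.xml)
hibernate.show_sql=true
# optional: pretty-print the logged SQL
hibernate.format_sql=true
```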
The __N__+1 queries can be detected by analysing the executed SQL
statements in the log file. The first solution for the issue is
redesigning the problematic query to use appropriate join(s) so that it
fetches all the members in a single SQL query. Sometimes it might be
enough to use `FetchType.EAGER` instead of `LAZY` for the problematic
cases. Yet another possibility could be your own custom caching as
discussed above.
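To make the cost of __N__+1 concrete, here is a plain-Java illustration
with a fake "database" that merely counts queries; the classes are
hypothetical stand-ins for JPA entities, not a real persistence setup.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Loading N tasks and then lazily loading each task's subtasks costs
// 1 + N queries; a single joined query fetches everything in one.
public class NPlusOneSketch {

    static final AtomicInteger QUERIES = new AtomicInteger();

    record Task(long id) {
        List<String> loadSubtasksLazily() {   // one query per task
            QUERIES.incrementAndGet();
            return List.of("subtask-of-" + id);
        }
    }

    static List<Task> findAllTasks() {        // one query for the tasks
        QUERIES.incrementAndGet();
        return List.of(new Task(1), new Task(2), new Task(3));
    }

    static int queriesForLazyLoading() {
        QUERIES.set(0);
        for (Task t : findAllTasks()) {
            t.loadSubtasksLazily();           // triggers N extra queries
        }
        return QUERIES.get();                 // 1 + N
    }

    static int queriesForJoinFetch() {
        QUERIES.set(0);
        QUERIES.incrementAndGet();            // one joined query does it all
        return QUERIES.get();
    }
}
```

For three tasks, the lazy variant issues four queries while the joined
variant issues one; with thousands of entities the difference dominates
the database load.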
[[second-level-cache]]
Second-level cache
++++++++++++++++++
According to Oracle’s Java EE Tutorial, a second-level cache is a local
store of entities managed by the persistence provider. It is used to
improve application performance. A second-level cache helps to avoid
expensive database queries by keeping frequently used entities in the
cache. It is especially useful when you update your database only
through your persistence provider (Hibernate or EclipseLink), you read
the cached entities much more often than you update them, and you have
not clustered your database.
There are different second-level cache providers for Hibernate, such as
EHCache, OSCache, and SwarmCache. You can find several tutorials for
these online. One thing to keep in mind is that the configuration of,
for example, EHCache varies depending on whether you use Spring or not.
Our experience of the benefits of second-level caches so far is that in
real-world applications the benefits might be surprisingly low. The gain
depends highly on how much your application uses the kind of data that
is mostly read-only and rarely updated.
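As a rough sketch of what enabling a second-level cache looks like with
Hibernate and EHCache (property keys from Hibernate’s configuration
reference; the exact region factory class name is version-dependent):

```properties
# hibernate.properties (or the equivalent properties in persistence.xml)
hibernate.cache.use_second_level_cache=true
hibernate.cache.region.factory_class=org.hibernate.cache.ehcache.EhCacheRegionFactory
# entities must additionally be marked cacheable, e.g. with @Cacheable
```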
[[use-clustering-2]]
Use clustering
++++++++++++++
There are two common options for clustering or replicating the database:
master-master replication and master-slave replication. In the
master-master scheme, any node in the cluster can update the database,
whereas in the master-slave scheme only the master is updated and the
changes are distributed to the slave nodes right after that. Most
relational database management systems support at least master-slave
replication. For instance, in MySQL and PostgreSQL you can enable it
with a few configuration changes and by granting the appropriate
replication rights. You can find several step-by-step tutorials online
by searching with e.g. the keywords “postgresql master slave replication”.
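To give an idea of the scale of those configuration changes, a
master-side fragment for PostgreSQL streaming replication (parameter
names from the PostgreSQL 9.x documentation; the replication user and
network address are placeholders):

```properties
# postgresql.conf on the master
wal_level = hot_standby
max_wal_senders = 3

# pg_hba.conf on the master: allow the slave to connect for replication
# host  replication  replicator  192.168.2.0/24  md5
```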
[[nosql]]
NoSQL
+++++
When looking back at the first figure (Figure 1) of the article, you
might wonder what kind of database solutions the world’s biggest web
applications use. Most of them use some relational database in part, and
have a NoSQL database (such as Cassandra, MongoDB, or Memcached) for
some of the functionality. The big benefit of many NoSQL solutions is
that they are typically easier to cluster, and thus help one to achieve
extremely scalable web applications. The topic of using NoSQL is so
broad that we do not have the possibility to discuss it in this article.
[[summary]]
Summary
-------
We started the study by looking at typical applications and estimating
their average numbers of concurrent users. We then took a typical Vaadin
web application and looked at what bottlenecks we hit on the way, using
a standard laptop. We discussed different ways of overcoming everything
from file descriptor limits to session size minimization, all the way to
garbage collection tweaking and clustering your entire application. At
the end of the day, there are several issues that could cap your
application’s scalability, but as shown in this study, with a few fairly
simple steps we can scale the app from 200 concurrent users to 3000
concurrent users. As a standard architectural answer, however: the
results in your environment might be different, so use the tools
discussed in this paper to find your bottlenecks and iron them out.