  1. ---
  2. title: Scalable Web Applications
  3. order: 22
  4. layout: page
  5. ---
  6. [[scalable-web-applications]]
  7. = Scalable web applications
  8. [[introduction]]
  9. Introduction
  10. ^^^^^^^^^^^^
  11. Whether you are creating a new web application or maintaining an
  12. existing one, you will certainly have to consider its scalability.
  13. Scalability is your web application’s ability
  14. to handle a growing number of concurrent users. How many concurrent
  15. users can, for instance, your system serve at its peak usage? Is it
  16. intended for small-scale intranet use with tens to hundreds of
  17. concurrent users, or do you plan to reach for a medium-sized global web
  18. application such as 500px.com with about 1000-3000 concurrent users? Or
  19. is your target even higher? You might wonder how many concurrent users
  20. there are on Facebook or LinkedIn. We wondered about that too, and
  21. made a small study to estimate it. You can see the results in Figure 1
  22. below. The estimates are derived from monthly visitor counts and average
  23. visit times found on Alexa.com.
  24. image:img/webusers.png[image]
  25. _Figure 1: Popular web applications with estimated number of concurrent
  26. users._
  27. The purpose of this article is to show you common pain points that tend
  28. to decrease the scalability of your web applications and to help you find
  29. ways to overcome them. We begin by introducing you to our example
  30. application. We will show you how to test the scalability of this
  31. example application with Gatling, a tool for stress testing your web
  32. application. Then, in the next chapters, we will go through some pain
  33. points of scalability, such as memory, CPU, and database, and see how to
  34. overcome these.
  35. [[book-store-inventory-application]]
  36. Book store inventory application
  37. --------------------------------
  38. Throughout this article we’ll use a book store inventory application
  39. (see Figure 2) as our example when diving deep into the world
  40. of web application scalability. The application contains a login view
  41. and an inventory view with CRUD operations on a mock data source. It
  42. also has the following common web application features: responsive
  43. layouts, navigation, data listing and a master-detail form editor. This
  44. application is publicly available as a Maven archetype
  45. (`vaadin-archetype-application-example`). We will first test how many
  46. concurrent users the application can serve on a single server.
  47. image:img/mockapp-ui.png[image]
  48. _Figure 2: Book store inventory application_
  49. The purpose of scalability testing is to verify whether the
  50. application's server side can survive a predefined number of
  51. concurrent users. We can also use a scalability test to find the
  52. limits, a breaking point or server-side bottlenecks of the application.
  53. There are several options for scalability testing web applications. One
  54. of the most used free tools is Apache JMeter. JMeter is well suited for
  55. testing web applications, as long as the client-server communication
  56. does not use WebSockets. When using asynchronous WebSocket
  57. communication, one can use the free Gatling tool or the commercial
  58. NeoLoad tool.
  59. You can find a lot of step by step tutorials online on how to use
  60. Gatling and JMeter. There is typically a set of tips and tricks that one
  61. should take into account when doing load testing on certain web
  62. frameworks, such as the open source Vaadin Framework. For more
  63. information on Vaadin specific tutorials, check the wiki pages on
  64. https://vaadin.com/scalability[vaadin.com/scalability].
  65. Gatling and JMeter can be used to record the client-to-server requests of a
  66. web application. After recording, the recorded requests can be played
  67. back by a number of concurrent threads. The more threads (virtual users)
  68. you use, the higher the simulated load on the tested
  69. application.
  70. Since we want to test our application in both synchronous and
  71. asynchronous communication modes, we will use Gatling. Another benefit
  72. of Gatling compared to JMeter is that it is lighter on the testing
  73. server, so more virtual users can be simulated on a single testing
  74. server. Figure 3 shows the Gatling settings used to record the example
  75. scenario of the inventory application. Typically all static resources
  76. are excluded from the recording (see the bottom left corner of the figure),
  77. since these are typically served from a separate HTTP server such as
  78. Nginx or from a CDN (Content Delivery Network) service. In our first
  79. test, however, we still recorded these requests to see the worst case
  80. situation, where all requests are served from a single application
  81. server.
  82. image:img/figure3s2.png[image]
  83. _Figure 3: Gatling recorder._
  84. Gatling gathers the recorded requests as text files and composes a Scala
  85. class which is used to play back the recorded test scenario. We planned a
  86. test scenario for a typical inventory application user: the user logs in
  87. and performs several updates (in our case 11) to the store in a
  88. relatively short time span (3 minutes). We also assumed that they leave
  89. the inventory application open in their browser, which results in the
  90. HttpSession not closing before the session timeout (30 minutes in our case).
  91. Let’s assume that we have an extremely large bookstore with many
  92. people (say 10000) responsible for updating the inventory. If each
  93. person updates the inventory 5 times a day, and an update session takes
  94. 3 minutes, then with the same logic we used to estimate concurrent users in
  95. the Introduction, we get a continuous load of about 100 concurrent
  96. users. This is of course not a realistic assumption unless the
  97. application is a global service or a local application that is used
  98. around the clock, such as a patient information system. For testing purposes
  99. this is, however, a good assumption.
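The arithmetic behind the "about 100 concurrent users" figure can be sketched as follows. This is a minimal illustration, not part of the original test code: the class and method names are hypothetical, and it assumes that the update sessions are spread evenly over a 24-hour day.

```java
// Minimal sketch of the concurrent-user estimate (hypothetical helper,
// assuming sessions are spread evenly over the whole day).
final class ConcurrentUserEstimate {

    private static final double MINUTES_PER_DAY = 24 * 60;

    /** Average number of users that are active at the same moment. */
    static double concurrentUsers(long users, double sessionsPerUserPerDay, double sessionMinutes) {
        // Total time spent in the application per day, summed over all users.
        double userMinutesPerDay = users * sessionsPerUserPerDay * sessionMinutes;
        // Dividing by the length of the day gives the average concurrency.
        return userMinutesPerDay / MINUTES_PER_DAY;
    }

    public static void main(String[] args) {
        // 10000 people, 5 update sessions a day, 3 minutes per session.
        double estimate = concurrentUsers(10_000, 5, 3);
        System.out.printf("Estimated concurrent users: %.0f%n", estimate);
    }
}
```

With the numbers from the text, 10000 x 5 x 3 = 150000 user-minutes per day, divided by 1440 minutes, gives roughly 104 concurrent users, which the article rounds to 100.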
  100. A snippet from the end of our test scenario is shown in Figure 4. This
  101. test scenario is configured to be run with 100 concurrent users, all
  102. started within 60 seconds (see the last line of code).
  103. [source,scala]
  104. ....
  105. .pause(9)
  106. .exec(http("request_45")
  107. .post("/test/UIDL/?v-uiId=0")
  108. .headers(headers_9)
  109. .body(RawFileBody("RecordedSimulation_0045_request.txt")))
  110. .pause(5)
  111. .exec(http("request_46")
  112. .post("/test/UIDL/?v-uiId=0")
  113. .headers(headers_9)
  114. .body(RawFileBody("RecordedSimulation_0046_request.txt"))
  115. .resources(http("request_47")
  116. .post(uri1 + "/UIDL/?v-uiId=0")
  117. .headers(headers_9)
  118. .body(RawFileBody("RecordedSimulation_0047_request.txt"))))}
  119. setUp(scn.inject(rampUsers(100) over (60 seconds))).protocols(httpProtocol)
  120. ....
  121. _Figure 4: Part of the test scenario of inventory application._
  122. To make the test more realistic, we would like to execute it several
  123. times. Without repetition we do not get a clear picture of how the server
  124. will tolerate a continuous high load. In a Gatling test script, this is
  125. achieved by wrapping the test scenario in a repeat loop. We should
  126. also flush session cookies to ensure that a new session is created for
  127. each repetition. See the second line of code in Figure 5 for an example of
  128. how this could be done.
  129. [source,scala]
  130. ....
  131. val scn = scenario("RecordedSimulation")
  132. .repeat(100,"n"){exec(flushSessionCookies).exec(http("request_0")
  133. .get("/test/")
  134. .resources(http("request_1")
  135. .post(uri1 + "/?v-1440411375172")
  136. .headers(headers_4)
  137. .formParam("v-browserDetails", "1")
  138. .formParam("theme", "mytheme")
  139. .formParam("v-appId", "test-3556498")
  140. ....
  141. _Figure 5: Repeating 100 times with session cookie flushing (small part
  142. of whole script)_
  143. We tested how well this simple example application tolerated several
  144. concurrent users. We deployed our application in Apache Tomcat 8.0.22 on
  145. a Windows 10 machine with Java 1.7.0 and an older quad-core mobile Intel
  146. i7 processor. With its default settings (using the default heap size of
  147. 2GB), Tomcat was able to handle about 200 concurrent users over a longer
  148. period. The CPU usage for that small number of concurrent users was not a
  149. problem (it was lower than 5%), but the server configuration was a
  150. bottleneck. Here we stumbled upon the first scalability pain point:
  151. server configuration issues (see next chapter). It might sound
  152. surprising that we could only run such a small number of concurrent
  153. users by default, but do not worry, we are not stuck here. We will see
  154. the reasons for this and other scalability pain points in the following
  155. chapter.
  156. [[scalability-pain-points]]
  157. Scalability pain points
  158. -----------------------
  159. We will go through typical pain points that a web application developer
  160. will encounter when developing a web application for
  161. hundreds of concurrent users. Each pain point is introduced in its own
  162. subchapter and followed by typical remedies.
  163. [[server-configuration-issues]]
  164. Server configuration issues
  165. ~~~~~~~~~~~~~~~~~~~~~~~~~~~
  166. One typical problem that appears when there are lots of concurrent
  167. users is that the operating system (especially a *nix-based one) runs
  168. out of file descriptors. This happens because most *nix systems have a
  169. pretty low default limit for the maximum number of open files, which
  170. includes network connections. This is usually easy to fix with the `ulimit`
  171. command, though sometimes it might require adjusting kernel parameters with `sysctl` too.
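To see how close a running server is to its file descriptor limit, the JVM can be asked directly. The following is a minimal sketch with a hypothetical class name. It relies on the `com.sun.management.UnixOperatingSystemMXBean` interface, which is available on HotSpot-based JVMs on Unix-like systems, and falls back to a plain message elsewhere (for example on Windows).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Minimal sketch: report open vs. maximum file descriptors of this JVM process.
final class FileDescriptorCheck {

    static String describe() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The Unix-specific bean exposes the descriptor counters.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return "open file descriptors: " + unix.getOpenFileDescriptorCount()
                    + " of max " + unix.getMaxFileDescriptorCount();
        }
        return "file descriptor counts are not available on this platform";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running the same check inside the application server (for example from a diagnostic page or through JMX) during a load test shows whether the open count approaches the limit before connections start failing.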
  172. Somewhat unexpected issues can also surface with network bandwidth.
  173. Our test laptop was on a wireless connection and its sending bandwidth
  174. started choking at about 300 concurrent users. (Please note that we used
  175. an oldish laptop in this entire test to showcase the real scalability of
  176. web apps; your own server environment will no doubt be even more
  177. scalable, even out of the box.) One part of this issue was the Wi-Fi and
  178. another part was that we served the static resources, such as JavaScript
  179. files, images and stylesheets, from Tomcat. At this point we stripped
  180. the static resource requests out of our test script to simulate the
  181. situation where those are served from a separate HTTP server, such as
  182. nginx. Please read the blog post
  183. “https://vaadin.com/blog/-/blogs/optimizing-hosting-setup[Optimizing
  184. hosting setup]” from our website for more information about the topic.
  185. Another quite typical configuration issue is that the application server
  186. is not configured for a large number of concurrent users. In our
  187. example, a symptom of this was that the server started rejecting
  188. new connections (“Request timed out”) after a while, even though there
  189. was plenty of free memory and CPU capacity available.
  190. After we configured our Apache Tomcat for high concurrency,
  191. removed static resource requests, and connected the test laptop to a
  192. wired network, we were able to push the number of concurrent users from
  193. 200 up to about 500 users. Our configuration changes to the server.xml
  194. of Tomcat are shown in Figure 6, where we define a maximum thread count
  195. (10240), an accepted threads count (4096), and a maximum number of
  196. concurrent connections (4096).
  197. image:img/figure6a.png[image]
  198. _Figure 6: Configuring Tomcat’s default connector to accept a lot of
  199. concurrent users._
  200. The next pain point that appeared with more than 500 users was that we
  201. ran out of memory. The default heap size of 2GB was eventually exhausted with
  202. such a high number of concurrent users. On the other hand, there was still
  203. a lot of CPU capacity available, since the average load was less than
  204. 5%.
  205. [[out-of-memory]]
  206. Out of memory
  207. ~~~~~~~~~~~~~
  208. Insufficient memory is possibly the most common problem that limits the
  209. scalability of a stateful web application. An HTTP session is
  210. typically used to store the state of a web application for its user. In
  211. Vaadin an HTTP session is wrapped into a `VaadinSession`. A
  212. `VaadinSession` contains the state (value) of each component (such as
  213. `Grid`, `TextField` etc.) of the user interface. Thus,
  214. the more components and views you have in your Vaadin
  215. web application, the bigger your session is.
  216. In our inventory application, each session takes about 0.3MB of memory,
  217. which is held until the session finally closes and the garbage
  218. collector frees the resources. The session size in our example is a
  219. little bit high. With a constant load of 100 concurrent users, a session
  220. timeout of 30 minutes and an average usage time of 3 minutes, the expected
  221. memory usage is about 350MB. To see how the session size and the number
  222. of concurrent users affect the needed memory in our case, we made a
  223. simple analysis whose results are shown in Figure 7. We basically
  224. calculated how many sessions can exist at most, by calculating how
  225. many users there will be within an average usage time plus the session
  226. timeout.
  227. image:img/figure6s.png[image]
  228. _Figure 7: Memory need for varying size sessions and a different number
  229. of concurrent users._
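The calculation described above can be sketched in a few lines. This is one reading of the described analysis, not the original spreadsheet: the class and method names are hypothetical, and it assumes that new sessions start at a steady rate and that each session stays in memory for the usage time plus the full session timeout.

```java
// Minimal sketch of the session memory estimate (hypothetical helper,
// assuming a steady arrival rate and no early logouts).
final class SessionMemoryEstimate {

    /** Upper bound for the number of sessions held in memory at once. */
    static double maxLiveSessions(double concurrentUsers, double usageMinutes, double timeoutMinutes) {
        // Rate of new sessions needed to keep the given number of users active.
        double sessionsStartedPerMinute = concurrentUsers / usageMinutes;
        // Each session lives for its usage time plus the idle timeout.
        return sessionsStartedPerMinute * (usageMinutes + timeoutMinutes);
    }

    /** Memory needed for all live sessions, in megabytes. */
    static double memoryNeedMb(double concurrentUsers, double usageMinutes,
                               double timeoutMinutes, double sessionSizeMb) {
        return maxLiveSessions(concurrentUsers, usageMinutes, timeoutMinutes) * sessionSizeMb;
    }

    public static void main(String[] args) {
        System.out.printf("100 users, 0.3MB sessions: %.0f MB%n", memoryNeedMb(100, 3, 30, 0.3));
        System.out.printf("5000 users, 0.5MB sessions: %.0f MB%n", memoryNeedMb(5000, 3, 30, 0.5));
    }
}
```

Under these assumptions, 100 concurrent users with 0.3MB sessions need about 330MB (1100 live sessions), which is in line with the estimate of about 350MB given above. The same formula gives about 27.5GB for 5000 concurrent users with 0.5MB sessions.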
  230. [[remedies]]
  231. Remedies
  232. ^^^^^^^^
  233. [[use-more-memory]]
  234. Use more memory
  235. +++++++++++++++
  236. This might sound simplistic, but often it is enough to just
  237. add as much memory as possible to the server. Modern servers and server
  238. operating systems support hundreds of gigabytes of physical
  239. memory. For instance, again in our example, if the size of a session
  240. were 0.5MB and we had 5000 concurrent users, the memory need would
  241. be about 28GB.
  242. You also have to make sure that your application server is configured to
  243. reserve enough memory. For example, the default heap size for Java is
  244. typically 2GB, and Apache Tomcat, for example, will not reserve more memory
  245. unless you ask for it with the **`-Xmx`** JVM argument. You might
  246. need a special JVM for extremely large heap sizes. We used the following
  247. Java virtual machine parameters in our tests:
  248. ....
  249. -Xms5g -Xmx5g -Xss512k -server
  250. ....
  251. The parameters **`-Xms`** and **`-Xmx`** set the minimum and
  252. the maximum heap size for the server (5 GB in the example), `-Xss`
  253. reduces the stack size of threads to save memory (typically
  254. the default is 1MB for 64-bit Java), and the `-server` option tells the JVM
  255. that the Java process is a server.
  256. [[minimize-the-size-of-a-session]]
  257. Minimize the size of a session
  258. ++++++++++++++++++++++++++++++
  259. The biggest culprit for the big session size in the inventory
  260. application is the container (`BeanItemContainer`), which is filled with
  261. all items of the database. Containers, and especially the built-in, fully
  262. featured `BeanItemContainer`, are typically the most memory-hungry parts
  263. of Vaadin applications. One can either reduce the number of items loaded
  264. in the container at one time or use one of the lightweight alternatives
  265. available from Vaadin Directory
  266. (https://vaadin.com/directory[vaadin.com/directory]) such as Viritin,
  267. MCont, or GlazedLists Vaadin Container. Another approach is to release
  268. containers and views for garbage collection, e.g. every time the user
  269. switches to another view, though that will slightly increase the CPU
  270. load since the views and containers have to be rebuilt if the
  271. user returns to the view. The feasibility of this option depends on your
  272. application design and user flow; usually it’s a good choice.
  273. [[use-a-shorter-session-time-out]]
  274. Use a shorter session time out
  275. ++++++++++++++++++++++++++++++
  276. Since every session holds on to its memory for as long as it stays
  277. alive, the shorter the session timeout is, the quicker the memory is
  278. freed. Assuming that the average usage time is much shorter than the
  279. session timeout, we can state that halving the session timeout
  280. approximately halves the memory need, too. Another way to reduce the
  281. time a session spends in memory is to instruct users to log out after
  282. they are done.
  283. The session of a Vaadin application is kept alive by requests (such as
  284. user interactions) made from the client to the server. Besides user
  285. interaction, the client side of a Vaadin application sends a heartbeat
  286. request to the server side, which keeps the session alive as
  287. long as the browser window is open. To override this behaviour and to
  288. allow closing idle sessions, we recommend that you use the `closeIdleSessions`
  289. parameter in your servlet configuration. For more details, see the
  290. chapter
  291. https://vaadin.com/book/-/page/application.lifecycle.html[Application
  292. Lifecycle] in the Book of Vaadin.
  293. [[use-clustering]]
  294. Use clustering
  295. ++++++++++++++
  296. If there is not enough memory, for example if there is no way to reduce
  297. the size of a session and the application needs a very long session
  298. timeout, then there is only one option left: clustering. We will discuss
  299. clustering later in the Out of CPU chapter since clustering is more
  300. often needed for increasing CPU power.
  301. [[out-of-cpu]]
  302. Out of CPU
  303. ~~~~~~~~~~
  304. We were able to get past the previous limit of 500 concurrent users by
  305. increasing the heap size of Tomcat to 5GB and reducing the session
  306. timeout to 10 minutes. Following the memory calculations above, we
  307. should theoretically be able to serve almost 3000 concurrent users with
  308. our single server, if there is enough CPU available.
  309. Although the average CPU load was still rather low (about 10%) with 800
  310. concurrent users, it jumped up to 40% every now and then for several
  311. seconds as the garbage collector cleaned up unused sessions etc. That is
  312. also the reason why one should not plan to use the full CPU capacity of a
  313. server: doing so can stretch garbage collection pauses, in the worst
  314. case to tens of seconds, during which the server is completely
  315. unresponsive. We suggest that if the average load grows to
  316. over 50% of the server’s capacity, other measures should be taken
  317. to decrease the load on the single server.
  318. We gradually increased the number of concurrent users to find out the
  319. limits of our test laptop and Tomcat. After trial and error, we found
  320. that the safe number of concurrent users for our test laptop was about
  321. 1700. Above that, several request timeout events occurred even though
  322. the CPU usage was about 40-50% of total capacity. We expect that using a
  323. more powerful server, we could have reached 2000-3000 concurrent users
  324. quite easily.
  325. [[remedies-1]]
  326. Remedies
  327. ^^^^^^^^
  328. [[analyze-and-optimize-performance-bottlenecks]]
  329. Analyze and optimize performance bottlenecks
  330. ++++++++++++++++++++++++++++++++++++++++++++
  331. If you are not absolutely sure about the origin of the high CPU usage,
  332. it is always good to verify it with a performance profiling tool. There
  333. are several options for profiling, such as JProfiler, XRebel, and Java
  334. VisualVM. We will use VisualVM in this case since it has been bundled
  335. free of charge with Oracle’s JDK since version 6 update 7.
  336. Our typical procedure goes like this: 1. Deploy your webapp and start
  337. your server, 2. Start VisualVM and double click your server’s process
  338. (e.g. “Tomcat (pid 1234)”) on the Applications tab (see Figure 8), 3.
  339. Start your load test script with, for instance, 100 concurrent users, 4.
  340. Open the Sampler tab to see where the CPU time is spent, 5. Use the
  341. filter on the bottom to show the CPU usage of your application (e.g.
  342. “`biz.mydomain.projectx`”) and possible ORM (Object-relational mapping)
  343. framework (e.g. “`org.hibernate`”) separately.
  344. Typically, only a small part (e.g. 0.1 - 2 %) of CPU time is spent on
  345. the classes of your webapp, if your application does not contain heavy
  346. business logic. Also, CPU time spent on the classes of Vaadin should be
  347. very small (e.g. 1%). You need not worry about performance bottlenecks
  348. in your own code if most of the time (>90%) is spent in the application server’s
  349. classes (e.g. “`org.apache.tomcat`”).
  350. Unfortunately, quite often database functions and ORM frameworks take a
  351. pretty big part of CPU time. We will discuss how to tackle heavy
  352. database operations in the Database chapter below.
  353. image:img/figure7s.png[image]
  354. _Figure 8: Profiling CPU usage of our inventory application with Java
  355. VisualVM_
  356. [[use-native-application-server-libraries]]
  357. Use native application server libraries
  358. +++++++++++++++++++++++++++++++++++++++
  359. Some application servers (at least Tomcat and Wildfly) allow you to use
  360. native (operating system specific) implementations of certain libraries.
  361. For example, the Apache Tomcat Native Library gives Tomcat access to
  362. certain native resources for performance and compatibility. We did
  363. not test the effect of using native libraries instead of the standard
  364. ones. Based on a little online research, it seems that the performance benefit
  365. of native libraries for Tomcat is visible only when using secured HTTPS
  366. connections.
  367. [[fine-tune-java-garbage-collection]]
  368. Fine tune Java garbage collection
  369. +++++++++++++++++++++++++++++++++
  370. We recommended above not to strain a server beyond 50% of its total
  371. CPU capacity. The reason was that above that level, a garbage collection
  372. pause tends to freeze the server for too long. That is because collection
  373. typically does not start until almost all of the available heap is already
  374. used, and then it does a full collection. Fortunately, it is possible
  375. to tune the Java garbage collector so that it does its job in short
  376. bursts. After a little online study, we found the following set of JVM
  377. parameters for web server optimized garbage collection:
  378. ....
  379. -XX:+UseCMSInitiatingOccupancyOnly
  380. -XX:CMSInitiatingOccupancyFraction=70
  381. ....
  382. Both parameters tune the CMS (concurrent-mark-sweep) collector, which
  383. itself has to be enabled with `-XX:+UseConcMarkSweepGC`. The first makes
  384. the JVM rely only on the configured threshold instead of its own
  385. heuristics. The second sets the old-generation “occupancy” level at which garbage
  386. collection starts. A value of 70% is typically a good choice, but for optimal
  387. performance it should be chosen for each environment, e.g. by trial and error.
  388. The CMS collector should be good for heap sizes up to about 4GB. For
  389. bigger heaps there is the G1 (Garbage First) collector that was
  390. introduced in JDK 7 update 4. The G1 collector divides the heap into regions
  391. and uses multiple background threads to first scan the regions that contain
  392. the most garbage objects. The Garbage First collector is enabled with the
  393. following JVM parameter:
  394. ....
  395. -XX:+UseG1GC
  396. ....
  397. If you are using Java 8 Update 20 or later, and G1, you can optimize the
  398. heap usage of duplicated Strings (i.e. their internal `char[]` arrays)
  399. with the following parameter.
  400. ....
  401. -XX:+UseStringDeduplication
  402. ....
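Whichever collector and thresholds you try, it helps to measure the effect of each change instead of guessing. The following minimal sketch (hypothetical class name) uses the standard `java.lang.management` API to print how often each collector has run and how much time it has spent. The collector names it prints depend on the JVM and on the GC flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: summarize garbage collection activity of the current JVM.
final class GcStats {

    /** Total accumulated collection time of all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total GC time: " + totalGcMillis() + " ms");
    }
}
```

Comparing these numbers before and after a load test run, with the same test script and different GC parameters, gives a rough basis for the trial and error mentioned above.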
  403. [[use-clustering-1]]
  404. Use clustering
  405. ++++++++++++++
  406. We have now arrived at the point where a single server cannot fulfill
  407. our scalability needs whatever tricks we have tried. If a single server
  408. is not enough for serving all users, obviously we have to distribute
  409. them to two or more servers. This is called clustering.
  410. Clustering has more benefits than simply balancing the load between two
  411. or more servers. An obvious additional benefit is that we do not have to
  412. rely on a single server. If one server dies, the user can continue on
  413. another server. In the worst case, the user loses her session and has to log
  414. in again, but at least she is not left without the service. You have
  415. probably heard the term “session replication” before. It means that the
  416. user’s session is copied to other servers (at least to one other) in
  417. the cluster. Then, if the server currently used by the user goes down,
  418. the load balancer sends subsequent requests to another server and the
  419. user should not notice anything.
  420. We will not cover session replication in this article since we are
  421. mostly interested in increasing the ability to serve more and more
  422. concurrent users with our system. We will show two ways to do clustering
  423. below, first with Apache Web Server and Tomcat nodes and then with the Wildfly
  424. Undertow server.
  425. [[clustering-with-apache-web-server-and-tomcat-nodes]]
  426. Clustering with Apache Web Server and Tomcat nodes
  427. ++++++++++++++++++++++++++++++++++++++++++++++++++
  428. Traditionally Java web application clustering is implemented with one
  429. Apache Web Server as a load balancer and 2 or more Apache Tomcat servers
  430. as nodes. There are a lot of tutorials online, thus we will just give a
  431. short summary below.
  432. 1. Install Tomcat on each node
  433. 2. Configure a unique node name with the jvmRoute parameter in each Tomcat’s
  434. server.xml
  435. 3. Install Apache Web Server on the load balancer node
  436. 4. Edit Apache’s httpd.conf file to include mod_proxy, mod_proxy_ajp,
  437. and mod_proxy_balancer
  438. 5. Configure balancer members with node addresses and load factors at the
  439. end of the httpd.conf file
  440. 6. Restart the servers
  441. There are several other options (free and commercial ones) for the load
  442. balancer, too. For example, our customers have used at least F5 in
  443. several projects.
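To illustrate why each node needs a unique jvmRoute, here is a conceptual sketch of sticky routing. Tomcat appends the jvmRoute to the session ID after a dot (for example `A1B2C3.node1`), and a load balancer can use that suffix to send later requests back to the same node. The class below is a simplified illustration only, not the actual mod_proxy_balancer logic: the class name and node addresses are hypothetical, load factors are ignored, and requests without a known route are simply distributed round-robin.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch of sticky session routing based on the jvmRoute suffix.
final class StickyRouter {

    private final Map<String, String> nodesByRoute = new LinkedHashMap<>();
    private int next = 0;

    synchronized void addNode(String route, String address) {
        nodesByRoute.put(route, address);
    }

    /** Returns the address of the node that should serve the request. */
    synchronized String choose(String sessionId) {
        if (nodesByRoute.isEmpty()) {
            throw new IllegalStateException("no nodes configured");
        }
        String route = routeOf(sessionId);
        if (route != null && nodesByRoute.containsKey(route)) {
            // Existing session: stick to the node that created it.
            return nodesByRoute.get(route);
        }
        // No session yet (or unknown node): pick the next node in turn.
        String[] addresses = nodesByRoute.values().toArray(new String[0]);
        String chosen = addresses[next % addresses.length];
        next = (next + 1) % addresses.length;
        return chosen;
    }

    /** Extracts the route suffix from a session ID such as "A1B2C3.node1". */
    static String routeOf(String sessionId) {
        if (sessionId == null) {
            return null;
        }
        int dot = sessionId.lastIndexOf('.');
        boolean hasSuffix = dot >= 0 && dot < sessionId.length() - 1;
        return hasSuffix ? sessionId.substring(dot + 1) : null;
    }

    public static void main(String[] args) {
        StickyRouter router = new StickyRouter();
        router.addNode("node1", "http://10.0.0.1:8009"); // hypothetical addresses
        router.addNode("node2", "http://10.0.0.2:8009");
        System.out.println(router.choose("A1B2C3.node2"));
        System.out.println(router.choose(null));
    }
}
```

If two nodes shared the same route name, the balancer could not tell which of them owns a session, which is why step 2 asks for unique names.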
  444. [[clustering-with-wildfly-undertow]]
  445. Clustering with Wildfly Undertow
  446. ++++++++++++++++++++++++++++++++
  447. Using Wildfly Undertow as a load balancer has several advantages over
  448. Apache Web Server. First, as Undertow comes with your WildFly server,
  449. there is no need to install yet another piece of software for a load balancer.
  450. Second, you can configure Undertow with Java (see Figure 9), which
  451. minimizes error-prone conf file or XML configuration. Finally,
  452. using the same vendor for application servers and for the load balancer
  453. reduces the risk of compatibility issues. The clustering setup for
  454. Wildfly Undertow is presented below. We are using sticky session
  455. management to maximize performance.
  456. 1. Install Wildfly 9 on all nodes
  457. 2. Configure Wildfly’s standalone.xml
  458. 1. add the `instance-id="node-id"` parameter to the undertow subsystem, e.g:
  459. `<subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="node1">` (this
  460. is needed for the sticky sessions).
  461. 2. set the http port to something other than 8080 in socket-binding-group,
  462. e.g: `<socket-binding name="http" port="${jboss.http.port:8081}"/>`
  463. 3. Start your node servers accepting all IP addresses:
  464. `./standalone.sh -c standalone.xml -b=0.0.0.0`
  465. 4. Code your own load balancer (reverse proxy) with Java and Undertow
  466. libraries (see Figure 9) and start it as a Java application.
  467. [source,java]
  468. ....
  469. public static void main(final String[] args) {
  470. try {
  471. LoadBalancingProxyClient loadBalancer = new LoadBalancingProxyClient()
  472. .addHost(new URI("http://192.168.2.86:8081"),"node1")
  473. .addHost(new URI("http://192.168.2.216:8082"),"node2")
  474. .setConnectionsPerThread(1000);
  475. Undertow reverseProxy = Undertow.builder()
  476. .addHttpListener(8080, "localhost")
  477. .setIoThreads(8)
  478. .setHandler(new ProxyHandler(loadBalancer, 30000, ResponseCodeHandler.HANDLE_404))
  479. .build();
  480. reverseProxy.start();
  481. } catch (URISyntaxException e) {
  482. throw new RuntimeException(e);
  483. }
  484. }
  485. ....
  486. _Figure 9: Simple load balancer with two nodes and sticky sessions._
  487. [[database]]
  488. Database
  489. ~~~~~~~~
  490. In most cases, the database is the most common bottleneck and also the trickiest
  491. to optimize. Typically you’ll have to think about your database usage
  492. before you actually need to start optimizing the memory and CPU as shown
  493. above. We assume here that you use an object-relational mapping
  494. framework such as Hibernate or EclipseLink. These frameworks implement
  495. several optimization techniques internally, which are not discussed here,
  496. although you might need them if you are using plain old JDBC.
  497. Typically profiling tools are needed to investigate how much the
  498. database is limiting the scalability of your application, but as a rule
  499. of thumb: the more you can avoid accessing the database, the less it
  500. limits the scalability. Consequently, you should generally cache static
  501. (or rarely changing) database content.

[[remedies-2]]
Remedies
^^^^^^^^

[[analyze-and-optimize-performance-bottlenecks-1]]
Analyze and optimize performance bottlenecks
++++++++++++++++++++++++++++++++++++++++++++

We already briefly discussed how to use Java VisualVM for finding CPU
bottlenecks. The same instructions also apply for finding out how much
of the performance is consumed by database access. Typically you have
several repository classes (e.g. `CustomerRepository`) in your web
application, used for CRUD (create, read, update, delete) operations
(e.g. `createCustomer`). Commonly your repository implementations either
extend Spring’s `JpaRepository` or use `javax.persistence.EntityManager`
or a `DataSource` managed by Spring for the database access. Thus, when
profiling, you will probably see one or more of those database access
methods in the list of methods that use most of your CPU’s capacity.

In our experience, one of the bottlenecks might be that small database
queries (e.g. `findTaskForTheDay`) are executed repeatedly instead of
doing more in one query (e.g. `findTasksForTheWeek`). In other cases, it
might be the other way around: too much information is fetched and only
part of it is used (e.g. `findAllTheTasks`). A real-life example of the
latter happened recently in a customer project, where we were able to
gain a significant performance boost just by using JPA projections to
leave out unnecessary attributes of an entity in a query (e.g. fetching
only the name and id of a Task).

[[custom-caching-and-query-optimization]]
Custom caching and Query optimization
+++++++++++++++++++++++++++++++++++++

After performance profiling, you have typically identified a few queries
that take up a big part of the total CPU time. Some of those queries
might be relatively fast individually but are executed hundreds or
thousands of times. Others are heavy in themselves. Moreover, there is
also the __N__+1 query problem, where, for example, a query for fetching
__N__ Task entities results in __N__ more queries for fetching the
one-to-many members (e.g. assignees, subtasks, etc.) of each Task.

The queries of the first type might benefit from being combined into
bigger queries as discussed in the previous subsection (use
`findTasksForTheWeek` instead of `findTaskForTheDay`). I call this
approach custom caching. It typically requires changes in your business
logic too: you will need to store (cache) entities that are not needed
yet, for example in a `HashMap` or `List`, and then handle all these
entities sequentially.
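
As a rough sketch of this custom caching idea, the following plain Java example loads the tasks of a whole week with one repository call and groups them by day in a `HashMap`, instead of calling the repository once per day. The `Task` class, the `TaskRepository` interface and the `WeeklyTaskCache` class are made up for illustration; only the method name `findTasksForTheWeek` comes from the text above, and in a real application the repository would be backed by JPA or JDBC.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache that holds one week of tasks in memory.
class WeeklyTaskCache {

    // Minimal stand-in for a Task entity.
    static class Task {
        final String name;
        final LocalDate day;

        Task(String name, LocalDate day) {
            this.name = name;
            this.day = day;
        }
    }

    // Stand-in for a repository; one call is assumed to run one database query.
    interface TaskRepository {
        List<Task> findTasksForTheWeek(LocalDate weekStart);
    }

    private final Map<LocalDate, List<Task>> tasksByDay = new HashMap<>();

    // Runs a single query for the whole week and groups the result by day.
    WeeklyTaskCache(TaskRepository repository, LocalDate weekStart) {
        for (Task task : repository.findTasksForTheWeek(weekStart)) {
            tasksByDay.computeIfAbsent(task.day, day -> new ArrayList<>()).add(task);
        }
    }

    // Served from memory, so no database access happens here.
    List<Task> tasksFor(LocalDate day) {
        return tasksByDay.getOrDefault(day, Collections.emptyList());
    }
}
```

The business logic can then iterate over the seven days and call `tasksFor` for each of them, which costs one database query in total instead of seven.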

The queries of the second type are typically harder to optimize. Slow
queries can usually be optimized by adding a suitable index or by
rewriting the query logic in a slightly different form. The difficult
part is to figure out what exactly makes the query slow. I recommend
using a logging setting that shows the actual SQL query in your log file
or console (e.g. in Hibernate, set `hibernate.show_sql=true`). Then you
can take the query, run it against your database, vary it and see how it
behaves. You can even use the `EXPLAIN` keyword in MySQL or PostgreSQL
(`EXPLAIN PLAN FOR` in Oracle and `SET SHOWPLAN_XML ON` in SQL Server)
to ask the database to explain how the query is executed, which indexes
are used, etc.

The __N__+1 queries can be detected by analysing the executed SQL
statements in the log file. The first solution to the issue is
redesigning the problematic query to use appropriate join(s), so that it
fetches all the members in a single SQL query. Sometimes it might be
enough to use `FetchType.EAGER` instead of `LAZY` for the problematic
cases. Yet another possibility could be your own custom caching as
discussed above.
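
To make the difference concrete, the sketch below simulates the __N__+1 pattern and the join-based alternative against an in-memory stand-in for the database and counts the queries each approach would issue. Everything in it is hypothetical: the class `NPlusOneDemo`, its methods, and the table and column names in the SQL comments are assumptions for illustration only, and no real database or JPA provider is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "database" that counts how many queries are executed.
class NPlusOneDemo {
    // Task id -> names of the assignees of that task.
    private final Map<Integer, List<String>> assigneesByTask = new LinkedHashMap<>();
    int queryCount = 0;

    NPlusOneDemo addTask(int taskId, String... assignees) {
        assigneesByTask.put(taskId, Arrays.asList(assignees));
        return this;
    }

    // Simulates: SELECT id FROM task
    List<Integer> findTaskIds() {
        queryCount++;
        return new ArrayList<>(assigneesByTask.keySet());
    }

    // Simulates: SELECT name FROM assignee WHERE task_id = ?
    List<String> findAssignees(int taskId) {
        queryCount++;
        return assigneesByTask.get(taskId);
    }

    // Simulates: SELECT t.id, a.name FROM task t LEFT JOIN assignee a ON a.task_id = t.id
    Map<Integer, List<String>> findTasksWithAssignees() {
        queryCount++;
        return new LinkedHashMap<>(assigneesByTask);
    }

    // N+1 pattern: one query for the tasks, then one more query per task.
    Map<Integer, List<String>> loadLazily() {
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        for (int taskId : findTaskIds()) {
            result.put(taskId, findAssignees(taskId));
        }
        return result;
    }

    // Join pattern: tasks and assignees are fetched together in one query.
    Map<Integer, List<String>> loadWithJoin() {
        return findTasksWithAssignees();
    }
}
```

With three tasks stored, `loadLazily` increments `queryCount` by four (1 + 3), whereas `loadWithJoin` increments it by one while returning the same data. In a log file with SQL logging enabled, the first pattern shows up as a long run of nearly identical `SELECT` statements that differ only in the id parameter.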

[[second-level-cache]]
Second-level cache
++++++++++++++++++

According to Oracle’s Java EE Tutorial, a second-level cache is a local
store of entities managed by the persistence provider. It is used to
improve the application performance. A second-level cache helps to avoid
expensive database queries by keeping frequently used entities in the
cache. It is especially useful when you update your database only
through your persistence provider (Hibernate or EclipseLink), you read
the cached entities much more often than you update them, and you have
not clustered your database.
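
The principle can be sketched in a few lines of plain Java: an entity is read from the database only on the first lookup and is served from memory afterwards, until an update evicts it. This is a deliberately simplified illustration and not how Hibernate or EclipseLink implement their caches; the class `EntityCache` and its methods are hypothetical, and the database is represented by an arbitrary loader function.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache keyed by entity id.
class EntityCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> databaseLoader;
    private final AtomicInteger databaseReads = new AtomicInteger();

    // The loader stands in for the (expensive) database query.
    EntityCache(Function<K, V> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    // Returns the cached entity, loading it from the database on a cache miss.
    V find(K id) {
        return entries.computeIfAbsent(id, key -> {
            databaseReads.incrementAndGet();
            return databaseLoader.apply(key);
        });
    }

    // Must be called whenever the entity is updated, so that stale data is not served.
    void evict(K id) {
        entries.remove(id);
    }

    int databaseReads() {
        return databaseReads.get();
    }
}
```

The sketch also shows why the conditions listed above matter: if something other than the application changes the database, nothing calls `evict`, and the cache keeps returning outdated entities.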

There are different second-level cache providers for Hibernate, such as
Ehcache, OSCache, and SwarmCache. You can find several tutorials for
these online. One thing to keep in mind is that the configuration of,
for example, Ehcache differs depending on whether you use Spring or not.
Our experience thus far is that in real-world applications the benefits
of second-level caches might be surprisingly low. The gain depends
highly on how much of the data your application reads from the database
is mostly read-only and rarely updated.

[[use-clustering-2]]
Use clustering
++++++++++++++

There are two common options for clustering or replication of the
database: master-master replication and master-slave replication. In the
master-master scheme any node in the cluster can update the database,
whereas in the master-slave scheme only the master is updated and the
change is distributed to the slave nodes right after that. Most
relational database management systems support at least master-slave
replication. For instance, in MySQL and PostgreSQL, you can enable it
with a few configuration changes and by granting the appropriate
replication rights on the master. You can find several step-by-step
tutorials online by searching with e.g. the keywords “postgresql master
slave replication”.

[[nosql]]
NoSQL
+++++

Looking back at the first figure (Figure 1) of the article, you might
wonder what kind of database solutions the world’s biggest web
applications use. Most of them use a relational database for part of
their data and a NoSQL solution (such as Cassandra, MongoDB, or
Memcached) for some of the functionality. The big benefit of many NoSQL
solutions is that they are typically easier to cluster, and thus help
you achieve extremely scalable web applications. The whole topic of
using NoSQL is so big that we cannot discuss it in this article.

[[summary]]
Summary
-------

We started the study by looking at typical applications and estimating
their average number of concurrent users. We then took a typical Vaadin
web application and looked at what bottlenecks we hit on the way, using
a standard laptop. We discussed different ways of overcoming them, from
file descriptor limits and session size minimization all the way to
garbage collection tweaking and clustering your entire application. At
the end of the day, there are several issues that could cap your
application’s scalability, but as shown in this study, with a few fairly
simple steps we can scale the app from 200 concurrent users to 3000
concurrent users. As a standard architectural answer, however: the
results in your environment might be different, so use the tools
discussed in this paper to find your bottlenecks and iron them out.