aboutsummaryrefslogtreecommitdiffstats
path: root/documentation/articles/ScalableWebApplications.asciidoc
diff options
context:
space:
mode:
Diffstat (limited to 'documentation/articles/ScalableWebApplications.asciidoc')
-rw-r--r--documentation/articles/ScalableWebApplications.asciidoc723
1 files changed, 723 insertions, 0 deletions
diff --git a/documentation/articles/ScalableWebApplications.asciidoc b/documentation/articles/ScalableWebApplications.asciidoc
new file mode 100644
index 0000000000..83ae8d6ee3
--- /dev/null
+++ b/documentation/articles/ScalableWebApplications.asciidoc
@@ -0,0 +1,723 @@
+[[scalable-web-applications]]
+Scalable web applications
+-------------------------
+
+[[introduction]]
+Introduction
+^^^^^^^^^^^^
+
+Whether you are creating a new web application or maintaining an
+existing one, one thing you certainly will consider is the scalability
+of your web application. Scalability is your web application’s ability
+to handle a growing number of concurrent users. How many concurrent
+users can, for instance, your system serve at its peak usage? Will it be
+intended for a small scale intranet usage of tens to hundreds of
+concurrent users, or do you plan to reach for a medium size global web
+application such as 500px.com with about 1000-3000 concurrent users. Or
+is your target even higher? You might wonder how many concurrent users
+there are in Facebook or Linkedin. We wonder about that too, and thus
+made a small study to estimate it. You can see the results in Figure 1
+below. The estimates are derived from monthly visitors and the average
+visit time data found from Alexa.com. 
+
+image:img/webusers.png[image]
+
+_Figure 1: Popular web applications with estimated number of concurrent
+users._
+
+The purpose of this article is to show you common pain points which tend
+to decrease the scalability of your web applications and help you find
+out ways to overcome those. We begin by introducing you to our example
+application. We will show you how to test the scalability of this
+example application with Gatling, a tool for stress testing your web
+application. Then, in the next chapters, we will go through some pain
+points of scalability, such as memory, CPU, and database, and see how to
+overcome these.
+
+[[book-store-inventory-application]]
+Book store inventory application
+--------------------------------
+
+Throughout this example we’ll use a book store inventory application
+(see Figure 2) as an example application when deep diving into the world
+of web application scalability. The application contains a login view
+and an inventory view with CRUD operations to a mockup data source. It
+also has the following common web application features: responsive
+layouts, navigation, data listing and master detail form editor. This
+application is publicly available as a Maven archetype
+(`vaadin-archetype-application-example`). We will first test how many
+concurrent users the application can serve on a single server.
+
+image:img/mockapp-ui.png[image]
+
+_Figure 2: Book store inventory application_
+
+The purpose of scalability testing is to verify whether the
+application's server side can survive with a predefined number of
+concurrent users or not. We can utilize a scalability test to find the
+limits, a breaking point or server side bottlenecks of the application.
+There are several options for scalability testing web applications. One
+of the most used free tools is Apache JMeter. JMeter suits well for
+testing web applications, as long as the client-server communication
+does not use websockets. When using asynchronous websocket
+communication, one can use the free Gatling tool or the commercial
+NeoLoad tool.
+
+You can find a lot of step by step tutorials online on how to use
+Gatling and JMeter. There is typically a set of tips and tricks that one
+should take into account when doing load testing on certain web
+frameworks, such as the open source Vaadin Framework. For more
+information on Vaadin specific tutorials, check the wiki pages on
+https://vaadin.com/scalability[vaadin.com/scalability].
+
+Gatling and JMeter can be used to record client to server requests of a
+web application. After recording, the recorded requests can be played
+back by numbers of concurrent threads. The more threads (virtual users)
+you use the higher the simulated load generated on the tested
+application.
+
+Since we want to test our application both in synchronous and
+asynchronous communication modes, we will use Gatling. Another benefit
+of Gatling compared to JMeter is that it is less heavy for a testing
+server, thus more virtual users can be simulated on a single testing
+server. Figure 3 shows the Gatling settings used to record the example
+scenario of the inventory application. Typically all static resources
+are excluded from the recording (see left bottom corner of the figure),
+since these are typically served from a separate http server such as
+Nginx or from a CDN (Content Delivery Network) service. In our first
+test, however, we still recorded these requests to see the worst case
+situation, where all requests are served from a single application
+server.
+
+image:img/figure3s2.png[image]
+
+_Figure 3: Gatling recorder._
+
+Gatling gathers the recorded requests as text files and composes a Scala
+class which is used to playback the recorded test scenario. We planned a
+test scenario for a typical inventory application user: The user logs in
+and performs several updates (in our case 11) to the store in a
+relatively short time span (3 minutes). We also assumed that they leave
+the inventory application open in their browser which will result in the
+HttpSession not closing before a session timeout (30min in our case).
+
+Let’s assume that we have an extremely large bookstore with several
+persons (say 10000) responsible for updating the inventory. If one
+person updates the inventory 5 times a day, and an update session takes
+3 minutes, then with the same logic as we calculated concurrent users in
+the Introduction, we will get a continuous load of about 100 concurrent
+users. This is of course not a realistic assumption unless the
+application is a global service or it is a round-the-clock used local
+application, such as a patient information system. For testing purposes
+this is, however, a good assumption.
+
+A snippet from the end of our test scenario is shown in Figure 4. This
+test scenario is configured to be run with 100 concurrent users, all
+started within 60 seconds (see the last line of code).
+
+[source,scala]
+....
+.pause(9)
+.exec(http(>"request_45")
+ .post(>"/test/UIDL/?v-uiId=0")
+ .headers(headers_9)
+ .body(RawFileBody(>"RecordedSimulation_0045_request.txt")))
+.pause(5)
+.exec(http(>"request_46")
+ .post(>"/test/UIDL/?v-uiId=0")
+ .headers(headers_9)
+ .body(RawFileBody(>"RecordedSimulation_0046_request.txt"))
+ .resources(http(>"request_47")
+ .post(uri1 + >"/UIDL/?v-uiId=0")
+ .headers(headers_9)
+ .body(RawFileBody(>"RecordedSimulation_0047_request.txt"))))}
+setUp(scn.inject(rampUsers(100) over (60 seconds))).protocols(httpProtocol)
+....
+
+_Figure 4: Part of the test scenario of inventory application._
+
+To make the test more realistic, we would like to execute it several
+times. Without repeating we do not get a clear picture of how the server
+will tolerate a continuous high load. In a Gatling test script, this is
+achieved by wrapping the test scenario into a repeat loop. We should
+also flush session cookies to ensure that a new session is created for
+each repeat. See the second line of code in Figure 5, for an example of
+how this could be done.
+
+[source,scala]
+....
+val scn = scenario("RecordedSimulation")
+ .repeat(100,"n"){exec(flushSessionCookies).exec(http("request_0")
+ .get("/test/")
+ .resources(http("request_1")
+ .post(uri1 + "/?v-1440411375172")
+ .headers(headers_4)
+ .formParam("v-browserDetails", "1")
+ .formParam("theme", "mytheme")
+ .formParam("v-appId", "test-3556498")
+....
+
+_Figure 5: Repeating 100 times with session cookie flushing (small part
+of whole script)_
+
+We tested how well this simple example application tolerated several
+concurrent users. We deployed our application in Apache Tomcat 8.0.22 on
+a Windows 10 machine with Java 1.7.0 and an older quad core mobile Intel
+i7 processor. With its default settings (using the default heap size of
+2GB), Tomcat was able to handle about 200 concurrent users for a longer
+time. The CPU usage for that small number of concurrent users was not a
+problem (it was lower than 5%), but the server configurations were a
+bottleneck. Here we stumbled upon the first scalability pain point:
+server configuration issues (see next chapter). It might sound
+surprising that we could only run a such small number of concurrent
+users by default, but do not worry, we are not stuck here. We will see
+the reasons for this and other scalability pain points in the following
+chapter.
+
+[[scalability-pain-points]]
+Scalability pain points
+-----------------------
+
+We will go through typical pain points of a web application developer
+which she (or he) will encounter when developing a web application for
+hundreds of concurrent users. Each pain point is introduced in its own
+subchapter and followed by typical remedies.
+
+[[server-configuration-issues]]
+Server configuration issues
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One typical problem that appears when there are lots of concurrent
+users, is that the operating system (especially the *nix based ones) run
+out of file descriptors. This happens since most *nix systems have a
+pretty low default limit for the maximum number of open files, such as
+network connections. This is usually easy to fix with the `ulimit`
+command though sometimes it might require configuring the `sysctl` too.
+
+A little bit unexpected issues can also surface with network bandwidth.
+Our test laptop was on a wireless connection and its sending bandwidth
+started choking at about 300 concurrent users. (Please note that we use
+an oldish laptop in this entire test to showcase the real scalability of
+web apps –your own server environment will no doubt be even more
+scalable even out of the box.) One part of this issue was the wifi and
+another part was that we served the static resources, such as javascript
+files, images and stylesheets, from Tomcat. At this point we stripped
+the static resources requests out of our test script to simulate the
+situation where those are served from a separate http server, such as
+nginx. Please read the blog post
+“https://vaadin.com/blog/-/blogs/optimizing-hosting-setup[Optimizing
+hosting setup]” from our website for more information about the topic.
+
+Another quite typical configuration issue is that the application server
+is not configured for a large number of concurrent users. In our
+example, a symptom of this was that the server started rejecting
+(“Request timed out”) new connections after a while, even though there
+were lots of free memory and CPU resources available.
+
+After we configured our Apache Tomcat for high concurrent mode and
+removed static resource requests, and connected the test laptop into a
+wired network, we were able to push the number of concurrent users from
+200 up to about 500 users. Our configuration changes into the server.xml
+of Tomcat are shown in Figure 6, where we define a maximum thread count
+(10240), an accepted threads count (4096), and a maximum number of
+concurrent connections (4096).
+
+image:img/figure6a.png[image]
+
+_Figure 6: Configuring Tomcat’s default connector to accept a lot of
+concurrent users._
+
+The next pain point that appeared with more than 500 users was that we
+were out of memory. The default heap size of 2GB eventually ran out with
+such high number of concurrent users. On the other hand, there was still
+a lot of CPU capacity available, since the average load was less than
+5%.
+
+[[out-of-memory]]
+Out of memory
+~~~~~~~~~~~~~
+
+Insufficient memory is possibly the most common problem that limits the
+scalability of a web application with a state. An http session is used
+typically to store the state of a web application for its user. In
+Vaadin an http session is wrapped into a `VaadinSession`. A
+VaadinSession contains the state (value) of each component (such as
+`Grid`, `TextFields` etc.) of the user interface. Thus,
+straightforwardly the more components and views you have in your Vaadin
+web application, the bigger is the size of your session.
+
+In our inventory application, each session takes about 0.3MB of memory
+which is kept in memory until the session finally closes and the garbage
+collectors free the resources. The session size in our example is a
+little bit high. With constant load of 100 concurrent users, a session
+timeout of 30 minutes and an average 3 minutes usage time, the expected
+memory usage is about 350MB. To see how the session size and the number
+of concurrent users affect the needed memory in our case, we made a
+simple analysis which results are shown in Figure 7. We basically
+calculated how many sessions there can exist at most, by calculating how
+many users there will be within an average usage time plus the session
+timeout.
+
+image:img/figure6s.png[image]
+
+_Figure 7: Memory need for varying size sessions and a different number
+of concurrent users._
+
+[[remedies]]
+Remedies
+^^^^^^^^
+
+[[use-more-memory]]
+Use more memory
++++++++++++++++
+
+This might sound simplistic, but many times it might be enough to just
+add as much memory as possible to the server. Modern servers and server
+operating systems have support for hundreds of gigabytes of physical
+memory. For instance, again in our example, if the size of a session
+would be 0.5MB and we had 5000 concurrent users, the memory need would
+be about 28GB.
+
+You also have to take care that your application server is configured to
+reserve enough memory. For example, the default heap size for Java is
+typically 2GB and for example Apache Tomcat will not reserve more memory
+if you do not ask it to do it with **`-Xmx`** JVM argument. You might
+need a special JVM for extremely large heap sizes. We used the following
+Java virtual machine parameters in our tests:
+
+....
+-Xms5g -Xmx5g -Xss512k -server
+....
+
+The parameters **`-Xms`** and **`-Xmx`** are for setting the minimum and
+the maximum heap size for the server (5 GB in the example), the `-Xss`
+is used to reduce the stack size of threads to save memory (typically
+the default is 1MB for 64bit Java) and the `-server` option tells JVM
+that the Java process is a server.
+
+[[minimize-the-size-of-a-session]]
+Minimize the size of a session
+++++++++++++++++++++++++++++++
+
+The biggest culprit for the big session size in the inventory
+application is the container (BeanItemContainer) which is filled with
+all items of the database. Containers, and especially the built in fully
+featured BeanItemContainer, are typically the most memory hungry parts
+of Vaadin applications. One can either reduce the number of items loaded
+in the container at one time or use some lightweight alternatives
+available from Vaadin Directory
+(https://vaadin.com/directory[vaadin.com/directory]) such as Viritin,
+MCont, or GlazedLists Vaadin Container. Another approach is to release
+containers and views to the garbage collection e.g. every time the user
+switches into another view, though that will slightly increase the CPU
+load since the views and containers have to be rebuilt again, if the
+user returns to the view. The feasibility of this option is up to your
+application design and user flow –usually it’s a good choice.
+
+[[use-a-shorter-session-time-out]]
+Use a shorter session time out
+++++++++++++++++++++++++++++++
+
+Since every session in the memory reserves it for as long as it stays
+there, the shorter the session timeout is, the quicker the memory is
+freed. Assuming that the average usage time is much shorter than the
+session timeout, we can state that halving the session timeout
+approximately halves the memory need, too. Another way to reduce the
+session’s time in the memory could be instructing users to logout after
+they are done.
+
+The session of a Vaadin application is kept alive by requests (such as
+user interactions) made from the client to the server. Besides user
+interaction, the client side of Vaadin application sends a heartbeat
+request into the server side, which should keep the session alive as
+long as the browser window is open. To override this behaviour and to
+allow closing idle sessions, we recommend that the `closeIdleSessions`
+parameter is used in your servlet configuration. For more details, see
+chapter
+https://vaadin.com/book/-/page/application.lifecycle.html[Application
+Lifecycle] in the Book of Vaadin.
+
+[[use-clustering]]
+Use clustering
+++++++++++++++
+
+If there is not enough memory, for example if there is no way to reduce
+the size of a session and the application needs a very long session
+timeout, then there is only one option left: clustering. We will discuss
+clustering later in the Out of CPU chapter since clustering is more
+often needed for increasing CPU power.
+
+[[out-of-cpu]]
+Out of CPU
+~~~~~~~~~~
+
+We were able to get past the previous limit of 500 concurrent users by
+increasing the heap size of Tomcat to 5GB and reducing the session
+timeout to 10 minutes. Following the memory calculations above, we
+should theoretically be able to serve almost 3000 concurrent users with
+our single server, if there is enough CPU available.
+
+Although the average CPU load was rather low (about 10%) still with 800
+concurrent users, it jumped up to 40% every now and then for several
+seconds as the garbage collector cleaned up unused sessions etc. That is
+also the reason why one should not plan to use full CPU capacity of a
+server since that will increase the garbage collection time in worst
+case even to tens of seconds, while the server will be completely
+unresponsive for that time. We suggest that if the average load grows to
+over 50% of the server’s capacity, other means have to be taken into use
+to decrease the load of the single server.
+
+We gradually increased the number of concurrent users to find out the
+limits of our test laptop and Tomcat. After trial and error, we found
+that the safe number of concurrent users for our test laptop was about
+1700. Above that, several request timeout events occurred even though
+the CPU usage was about 40-50% of total capacity. We expect that using a
+more powerful server, we could have reached 2000-3000 concurrent users
+quite easily.
+
+[[remedies-1]]
+Remedies
+^^^^^^^^
+
+[[analyze-and-optimize-performance-bottlenecks]]
+Analyze and optimize performance bottlenecks
+++++++++++++++++++++++++++++++++++++++++++++
+
+If you are not absolutely sure about the origin of the high CPU usage,
+it is always good to verify it with a performance profiling tool. There
+are several options for profiling, such as JProfiler, XRebel, and Java
+VisualVM. We will use VisualVM in this case since it comes freely with
+every (Oracle’s) JDK since the version 1.5.
+
+Our typical procedure goes like this: 1. Deploy your webapp and start
+your server, 2. Start VisualVM and double click your server’s process
+(“e.g. Tomcat (pid 1234)”) on the Applications tab (see Figure 8), 3.
+Start your load test script with, for instance, 100 concurrent users, 4.
+Open the Sampler tab to see where the CPU time is spent, 5. Use the
+filter on the bottom to show the CPU usage of your application (e.g.
+“`biz.mydomain.projectx`”) and possible ORM (Object-relational mapping)
+framework (e.g. “`org.hibernate`”) separately.
+
+Typically, only a small part (e.g. 0.1 - 2 %) of CPU time is spent on
+the classes of your webapp, if your application does not contain heavy
+business logic. Also, CPU time spent on the classes of Vaadin should be
+very small (e.g. 1%). You can be relaxed about performance bottlenecks
+of your code if the most time (>90%) is spent on application server’s
+classes (e.g. “`org.apache.tomcat`”).
+
+Unfortunately, quite often database functions and ORM frameworks take a
+pretty big part of CPU time. We will discuss how to tackle heavy
+database operations in the Database chapter below.
+
+image:img/figure7s.png[image]
+
+_Figure 8: Profiling CPU usage of our inventory application with Java
+VisualVM_
+
+[[use-native-application-server-libraries]]
+Use native application server libraries
++++++++++++++++++++++++++++++++++++++++
+
+Some application servers (at least Tomcat and Wildfly) allow you to use
+native (operating system specific) implementation of certain libraries.
+For example, The Apache Tomcat Native Library gives Tomcat access to
+certain native resources for performance and compatibility. Here we
+didn’t test the effect of using native libraries instead of standard
+ones. With little online research, it seems that the performance benefit
+of native libraries for Tomcat is visible only if using secured https
+connections.
+
+[[fine-tune-java-garbage-collection]]
+Fine tune Java garbage collection
++++++++++++++++++++++++++++++++++
+
+We recommended above not to strain a server more than 50% of its total
+CPU capacity. The reason was that above that level, a garbage collection
+pause tends to freeze the server for too long a time. That is because it
+typically starts not before almost all of the available heap is already
+spent and then it does the full collection. Fortunately, it is possible
+to tune the Java garbage collector so that it will do its job in short
+periods. With little online study, we found the following set of JVM
+parameters for web server optimized garbage collection
+
+....
+-XX:+UseCMSInitiatingOccupancyOnly
+-XX:CMSInitiatingOccupancyFraction=70
+....
+
+The first parameter prevents Java from using its default garbage
+collection strategy and makes it use CMS (concurrent-mark-sweep)
+instead. The second parameter tells at which level of “occupancy” the
+garbage collection should be started. The value 70% for the second
+parameter is typically a good choice but for optimal performance it
+should be chosen carefully for each environment e.g. by trial and error.
+
+The CMS collector should be good for heap sizes up to about 4GB. For
+bigger heaps there is the G1 (Garbage first) collector that was
+introduced in JDK 7 update 4. G1 collector divides the heap into regions
+and uses multiple background threads to first scan regions that contain
+the most of garbage objects. Garbage first collector is enabled with the
+following JVM parameter.
+
+....
+-XX:+UseG1GC
+....
+
+If you are using Java 8 Update 20 or later, and G1, you can optimize the
+heap usage of duplicated Strings (i.e. their internal `char[]` arrays)
+with the following parameter.
+
+....
+-XX:+UseStringDeduplication
+....
+
+[[use-clustering-1]]
+Use clustering
+++++++++++++++
+
+We have now arrived at the point where a single server cannot fulfill
+our scalability needs whatever tricks we have tried. If a single server
+is not enough for serving all users, obviously we have to distribute
+them to two or more servers. This is called clustering.
+
+Clustering has more benefits than simply balancing the load between two
+or more servers. An obvious additional benefit is that we do not have to
+trust a single server. If one server dies, the user can continue on the
+other server. In worst case, the user loses her session and has to log
+in again, but at least she is not left without the service. You probably
+have heard the term “session replication” before. It means that the
+user’s session is copied into other servers (at least into one other) of
+the cluster. Then, if the server currently used by the user goes down,
+the load balancer sends subsequent requests to another server and the
+user should not notice anything.
+
+We will not cover session replication in this article since we are
+mostly interested in increasing the ability to serve more and more
+concurrent users with our system. We will show two ways to do clustering
+below, first with Apache WebServer and Tomcats and then with the Wildfly
+Undertow server.
+
+[[clustering-with-apache-web-server-and-tomcat-nodes]]
+Clustering with Apache Web Server and Tomcat nodes
+++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Traditionally Java web application clustering is implemented with one
+Apache Web Server as a load balancer and 2 or more Apache Tomcat servers
+as nodes. There are a lot of tutorials online, thus we will just give a
+short summary below.
+
+1. Install Tomcat for each node
+2. Configure unique node names with jvmRoute parameter to each Tomcat’s
+server.xml
+3. Install Apache Web Server to load balancer node
+4. Edit Apache’s httpd.conf file to include mod_proxy, mod_proxy_ajp,
+and mod_proxy_balancer
+5. Configure balancer members with node addresses and load factors into
+end of httpd.conf file
+6. Restart servers
+
+There are several other options (free and commercial ones) for the load
+balancer, too. For example, our customers have used at least F5 in
+several projects.
+
+[[clustering-with-wildfly-undertow]]
+Clustering with Wildfly Undertow
+++++++++++++++++++++++++++++++++
+
+Using Wildfly Undertow as a load balancer has several advantages over
+Apache Web Server. First, as Undertow comes with your WildFly server,
+there is no need to install yet another software for a load balancer.
+Then, you can configure Undertow with Java (see Figure 8) which
+minimizes the error prone conf file or xml configurations. Finally,
+using the same vendor for application servers and for a load balancer
+reduces the risk of intercompatibility issues. The clustering setup for
+Wildfly Undertow is presented below. We are using sticky session
+management to maximize performance.
+
+1. Install Wildfly 9 to all nodes
+2. Configure Wildfly’s standalone.xml
+1. add `“instance-id=”node-id”` parameter undertow subsystem, e.g:
+`<subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="node1"> `(this
+is needed for the sticky sessions).
+2. set http port to something else than 8080 in socket-binding-group,
+e.g: `<socket-binding name="http" port="${jboss.http.port:8081}"/>`
+3. Start your node servers accepting all ip addresses:
+`./standalone.sh -c standalone.xml -b=0.0.0.0`
+4. Code your own load balancer (reverse proxy) with Java and Undertow
+libraries (see Figure 9) and start it as a Java application.
+
+[source,java]
+....
+public static void main(final String[] args) {
+ try {
+ LoadBalancingProxyClient loadBalancer = new LoadBalancingProxyClient()
+ .addHost(new URI("http://192.168.2.86:8081"),"node1")
+ .addHost(new URI("http://192.168.2.216:8082"),"node2")
+ .setConnectionsPerThread(1000);
+ Undertow reverseProxy = Undertow.builder()
+ .addHttpListener(8080, "localhost")
+ .setIoThreads(8)
+ .setHandler(new ProxyHandler(loadBalancer, 30000, ResponseCodeHandler.HANDLE_404))
+ .build();
+ reverseProxy.start();
+ } catch (URISyntaxException e) {
+ throw new RuntimeException(e);
+ }
+}
+....
+
+_Figure 9: Simple load balancer with two nodes and sticky sessions._
+
+[[database]]
+Database
+~~~~~~~~
+
+In most cases, the database is the most common and also the most tricky
+to optimize. Typically you’ll have to think about your database usage
+before you actually need to start optimizing the memory and CPU as shown
+above. We assume here that you use object to relational mapping
+frameworks such as Hibernate or Eclipselink. These frameworks implement
+several optimization techniques within, which are not discussed here,
+although you might need those if you are using plain old JDBC.
+
+Typically profiling tools are needed to investigate how much the
+database is limiting the scalability of your application, but as a rule
+of thumb: the more you can avoid accessing the database, the less it
+limits the scalability. Consequently, you should generally cache static
+(or rarely changing) database content.
+
+[[remedies-2]]
+Remedies
+^^^^^^^^
+
+[[analyze-and-optimize-performance-bottlenecks-1]]
+Analyze and optimize performance bottlenecks
+++++++++++++++++++++++++++++++++++++++++++++
+
+We already discussed shortly, how to use Java VisualVM for finding CPU
+bottlenecks. These same instructions also apply for finding out at what
+level the database consumes the performance. Typically you have several
+Repository-classes (e.g. `CustomerRepository`) in your web application,
+used for CRUD (create, read, update, delete) operations (e.g.
+`createCustomer`). Commonly your repository implementations either
+extend Spring’s JPARepository or use `javax.persistence.EntityManager`
+or Spring’s `Datasource` for the database access. Thus, when profiling,
+you will probably see one or more of those database access methods in
+the list of methods that are using most of your CPU’s capacity.
+
+According to our experience, one of the bottlenecks might be that small
+database queries (e.g. `findTaskForTheDay`) are executed repeatedly
+instead of doing more in one query (e.g. `findTasksForTheWeek`). In some
+other cases, it might be vice versa: too much information is fetched and
+only part of it is used (e.g. `findAllTheTasks`). A real life example of
+the latter happened recently in a customer project, where we were able
+to a gain significant performance boost just by using JPA Projections to
+leave out unnecessary attributes of an entity (e.g. finding only Task’s
+name and id) in a query.
+
+[[custom-caching-and-query-optimization]]
+Custom caching and Query optimization
++++++++++++++++++++++++++++++++++++++
+
+After performance profiling, you have typically identified a few queries
+that are taking a big part of the total CPU time. A part of those
+queries might be the ones that are relatively fast as a single query but
+they are just done hundreds or thousands of times. Another part of
+problematic queries are those that are heavy as is. Moreover, there is
+also the __N__+1 query problem, when, for example, a query for fetching
+a Task entity results __N__ more queries for fetching one-to-many
+members (e.g. assignees, subtasks, etc.) of the Task.
+
+The queries of the first type might benefit from combining to bigger
+queries as discussed in the previous subchapter (use
+`findTasksForTheWeek` instead of `findTaskForTheDay`). I call this
+approach custom caching. This approach typically requires changes in
+your business logic too: you will need to store (cache) yet unneeded
+entities, for example in a `HashMap` or `List` and then handle all these
+entities sequentially.
+
+The queries of the second type are typically harder to optimize.
+Typically slow queries can be optimized by adding a certain index or
+changing the query logic into a little bit different form. The difficult
+part is to figure out what exactly makes the query slow. I recommend
+using a logging setting that shows the actual sql query made in your log
+file or console (e.g. in Hibernate use `show_sql=true`). Then you can
+take the query and run it against your database and try to vary it and
+see how it behaves. You can even use the `EXPLAIN` keyword to ask MySQL
+or PostgreSql (`EXPLAIN PLAN FOR` in Oracle and `SHOWPLAN_XML` in SQL
+Server) to explain how the query is executed, what indexes are used etc.
+
+The __N__+1 queries can be detected by analysing the executed sqls in
+the log file. The first solution for the issue is redesigning the
+problematic query to use appropriate join(s) to make it fetch all the
+members in a single sql query. Sometimes, it might be enough to use
+`FetchType.EAGER` instead of `LAZY` for the problematic cases. Yet
+another possibility could be your own custom caching as discussed above.
+
+[[second-level-cache]]
+Second-level cache
+++++++++++++++++++
+
+According to Oracle’s Java EE Tutorial: a second-level cache is a local
+store of entities managed by the persistence provider. It is used to
+improve the application performance. A second-level cache helps to avoid
+expensive database queries by keeping frequently used entities in the
+cache. It is especially useful when you update your database only by
+your persistence provider (Hibernate or Eclipselink), you read the
+cached entities much more often than you update them, and you have not
+clustered your database.
+
+There are different second-level cache vendors such as EHCache, OSCache,
+and SwarmCache for Hibernate. You can find several tutorials for these
+online. One thing to keep in mind is that the configuration of, for
+example, EHCache varies whether you use Spring or not. Our experience of
+the benefits of second-level caches this far is that in real world
+applications the benefits might be surprisingly low. The benefit gain
+depends highly on how much your application uses the kind of data from
+the database that is mostly read-only and rarely updated.
+
+[[use-clustering-2]]
+Use clustering
+++++++++++++++
+
+There are two common options for clustering or replication of the
+database: master-master replication and master-slave replication. In the
+master-master scheme any node in the cluster can update the database,
+whereas in the master-slave scheme only the master is updated and the
+change is distributed to the slave nodes right after that. Most
+relational database management systems support at least the master-slave
+replication. For instance, in MySql and PostgreSQL, you can enable it by
+few configuration changes and by granting the appropriate master rights
+for replication. You can find several step-by-step tutorials online by
+searching with e.g. the keywords “postgresql master slave replication”.
+
+[[nosql]]
+NoSQL
++++++
+
+When looking back to the first figure (Figure 1) of the article, you
+might wonder what kind of database solutions the world's biggest web
+application’s use? Most of them use some relation database, partly, and
+have a NoSQL database (such as Cassandra, MongoDB, and Memcached) for
+some of the functionality. The big benefit of many NoSQL solutions is
+that they are typically easier to cluster, and thus help one to achieve
+extremely scalable web applications. The whole topic of using NoSQL is
+so big that we do not have the possibility to discuss it in this
+article.
+
+[[summary]]
+Summary
+-------
+
+We started the study by looking at typical applications and estimated
+their average concurrent user number. We then started with a typical
+Vaadin web application and looked at what bottlenecks we hit on the way,
+by using a standard laptop. We discussed different ways of overcoming
+everything from File Descriptors to Session size minimization, all the
+way to Garbage collection tweaking and clustering your entire
+application. At the end of the day, there are several issues that could
+gap you applications scalability, but as shown in this study, with a few
+fairly simple steps we can scale the app from 200 concurrent users to
+3000 concurrent users. As a standard architectural answer, however: the
+results in your environment might be different, so use tools discussed
+in this paper to find your bottlenecks and iron them out.