WebSphere Application Server topology for high availability

Hello, Habr!

In this article I want to talk about the approaches available for ensuring fault tolerance and scalability of an IBM WebSphere Application Server 7 infrastructure.

First, a little terminology that will be used:

High availability - an approach to system design that allows the system to remain available for a high proportion of any given period of time.

For business systems, high availability means building redundancy into critical components. A failure of a single component, be it a router, a network card, or a software component, will then not bring the application down.

Availability is usually expressed as a percentage, or as a number of "nines".

A = MTBF / (MTBF + MTTR).

90% (“one nine”) - 16.8 hours of downtime per week
99% (“two nines”) - 1.7 hours of downtime per week
99.9% (“three nines”) - 8.8 hours of downtime per year
99.99% (“four nines”) - 53 minutes of downtime per year

MTBF (Mean Time Between Failures) - the average time the system operates between failures; in other words, how much uptime there is, on average, per failure.

MTTR (Mean Time To Restoration) - the average time required to restore normal operation after a failure occurs.
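
To make the arithmetic concrete, here is a small Python sketch (the MTBF and MTTR values are made up for the example) that computes availability and the corresponding downtime per year:

HOURS_PER_YEAR = 24 * 365

def availability(mtbf_hours, mttr_hours):
    # A = MTBF / (MTBF + MTTR)
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_hours_per_year(a):
    return (1.0 - a) * HOURS_PER_YEAR

# Hypothetical numbers: one failure every 1000 hours, 1 hour to restore.
a = availability(1000.0, 1.0)
print("Availability: %.4f%%" % (a * 100))                             # ~99.90%
print("Downtime per year: %.1f hours" % downtime_hours_per_year(a))   # ~8.8 hours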

SPOF (Single Point Of Failure) - a part of the system whose failure makes the whole system unavailable.

WAS - IBM's Java EE application server. It comes in several editions:
1. Community Edition - an open-source offering based on Apache Geronimo;
2. Express - 1 node / 1 application server;
3. Base - 1 node / n application servers;
4. Network Deployment (ND) - includes a set of components for building a scalable, fault-tolerant infrastructure out of a large number of application servers;
5. a few other more specialized options (for z/OS, Hypervisor Edition, Extended Deployment).

In what follows, everything is considered in the context of Network Deployment 7 (WAS ND). Versions 8.0 and 8.5 already exist, but the approaches described in this article apply to them as well.

Key terms related to Network Deployment topologies:

Cell - an organizational unit that includes a deployment manager (Deployment Manager) and one or more nodes (Node). The deployment manager manages the nodes through node agents (Node Agent).

A node consists of a node agent, which, as noted above, is used for administration, and one or more application servers (Application Server).



Such a hierarchy (Cell / Node / Server) helps organize the entire set of servers and group them according to functionality and availability requirements.

Application Server - a JVM that runs applications conforming to the Java EE 5 specification (WAS versions 8 and 8.5 support Java EE 6).

Profile - a set of application server settings applied at startup. When a JVM instance starts, its environment settings are read from the profile, and the functions the application server will perform depend on the profile type. Deployment manager, node agent, and application server are particular kinds of profiles. Later in the article we will look at why and when to use different profile types, how they interact, and what can be achieved with them.
Stand-alone profiles differ from federated ones in that each stand-alone profile is managed from its own administrative console, whereas federated profiles are managed from a single point, which is much more convenient and faster.
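
As an illustration of that single point of management: once nodes are federated, the entire cell can be scripted through the deployment manager's wsadmin interface. A minimal Jython sketch (the host, port, and file name here are assumptions) that lists every server in the cell:

# Run against the deployment manager, for example:
#   wsadmin.sh -lang jython -host dmgr_host -port 8879 -f list_servers.py
# Walks all nodes of the cell and prints their servers from one place.
for node in AdminConfig.list('Node').splitlines():
    nodeName = AdminConfig.showAttribute(node, 'name')
    for server in AdminConfig.list('Server', node).splitlines():
        print '%s / %s' % (nodeName, AdminConfig.showAttribute(server, 'name'))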

Problem statement


So, given the task of ensuring high availability for a business system running on an application server infrastructure, we need to build an infrastructure that meets these requirements.

Level I




A standard three-tier architecture: a single physical or virtual server hosts a stand-alone WAS profile with its administrative console, the DBMS, and the HTTP server.
Let's list the points of failure present in this configuration; level by level, we will try to eliminate them:
1. HTTP server;
2. Application server;
3. Database;
4. All software components that ensure the interaction of our server with other components of the software infrastructure (Firewall, LDAP, etc.)
5. Hardware.

Level II





At this level we eliminate one single point of failure - the application server. To do this, we create a cluster of two application servers, and to manage them we need two more components:
a) a deployment manager;
b) a node agent.

The deployment manager essentially unifies the administrative consoles of all the application servers under its control. When the configuration of one or more servers changes, the settings are pushed down from the deployment manager to the servers through the node agents.

If one of the application servers fails, HAManager will automatically recover its state on the second server.
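
Such a cluster can be created from the deployment manager's administrative console or scripted via wsadmin. A rough Jython sketch, assuming a node named node01 is already federated (cluster and server names are hypothetical, and parameter details may vary between WAS releases):

# Create an empty cluster, then add two members on the federated node.
AdminTask.createCluster('[-clusterConfig [-clusterName AppCluster -preferLocal true]]')
AdminTask.createClusterMember('[-clusterName AppCluster -memberConfig [-memberNode node01 -memberName AppServer1]]')
AdminTask.createClusterMember('[-clusterName AppCluster -memberConfig [-memberNode node01 -memberName AppServer2]]')
AdminConfig.save()
# After saving, synchronize the configuration out to the node agents
# (for example, from the console or via node synchronization).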

Level III





At this level we can close several points of failure at once: the HTTP server and the physical server on which the application servers run. To do this, we move the database off our physical servers. On the two servers we deploy two nodes, and in each of them we create a pair of application servers. All the servers are joined into a single cluster. If one of the physical servers fails, data and application state are restored on the second one. In addition, using a load balancer (another profile type) we can distribute incoming requests between the systems, spreading the load and increasing application throughput. With this topology, however, we acquire a new potential point of failure: the load balancer.
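
In scripting terms, the difference from Level II is only that the cluster members now live on two different nodes, one node per physical server. A hedged Jython sketch (node, cluster, and server names are again made up):

# Spread four cluster members across two federated nodes (two physical machines).
members = [('node01', 'AppServer1'), ('node01', 'AppServer2'),
           ('node02', 'AppServer3'), ('node02', 'AppServer4')]
for nodeName, serverName in members:
    AdminTask.createClusterMember(
        '[-clusterName AppCluster -memberConfig [-memberNode %s -memberName %s]]'
        % (nodeName, serverName))
AdminConfig.save()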

Level IV


We supplement Level III with a backup load balancer and, in addition, ensure the reliability of our database. We will not go into database clustering mechanisms in detail, since they deserve a separate article of their own.



Level V



As a final chord, we duplicate the entire infrastructure and move the copy to a remote site, in case our data center gets flooded.

In addition, it would not hurt to place our front-end servers in a DMZ.



Summary



As you can see, ensuring the continuous operation of critical business systems can be VERY costly, and before you start building such solutions you need to evaluate all the risks and your readiness to implement them.

Thanks for your attention.
