WebSphere Portal v6 Best Practices

 


 

 

Content

  1. Project Breakdown
  2. Topology planning
  3. Security
  4. Content management
  5. Search
  6. Virtual portals
  7. Developing a portal
  8. Caching
  9. Deploying, testing, and maintaining a portal
  10. Staging and preproduction environment
  11. Monitoring
  12. Sample workshop agenda
  13. Sample portal tracking worksheet
  14. Portlet sourcing
  15. Exercise
  16. Portlet Sourcing Worksheet
  17. Solution Assurance Checklists
  18. IBM Skills Assessment Ratings
  19. Online Resources

 

Project Breakdown

Project startup (20 to 30 days)
  • Identify the team players
  • Build the skills for development and infrastructure
  • Set expectations and time lines
  • Create the project plan

Solution definition (20 to 30 days)
  • Translate business requirements into a technical architecture
  • Document designs
  • IBM Solution Assurance Review
  • Portlet sourcing

Project standards (three days)
  • Identify the change management process
  • Develop the documentation methodology and testing procedures

Environment setup (six months elapsed time)
  • IBM Techline sizing
  • Procure hardware and software
  • Install the environments and begin baseline stress testing

Pilot release (one to two months)
  • Select and design portlets
  • Document and run the use cases
  • Perform preliminary stress testing

Production release (length of the pilot plus three months)
  • Select more pilot projects and build onto the pilot
  • Add more functionality, such as Web content management
  • Enable security
  • Perform more stress testing
  • Test fail-over procedures

Project close (three days)
  • Postmortem meeting
  • Ensure ongoing administration is in place
  • Develop any follow-on plans

The first three to four months are considered the pilot stage, where one spends time...

    • connecting to established systems
    • understanding the functionality
    • developing a limited deployment strategy
    • building portlets
    • leveraging click-to-action
    • migrating established applications
    • integrating other products

    1. Pay special attention to the process of creating reasonable requirements.
    2. Select the proper team composition.
    3. Choose the proper components and architecture for the infrastructure.
    4. Avoid unnecessary complexity.
    5. Create an effective testing plan and environment.

     

    Roles and responsibilities

    1. Project management team

    2. Architecture board

      • Lead architect

        Makes the final decisions regarding architectural topics. Works closely with the project management team and fills in the gaps where nobody seems to be responsible. Works with the other architects.

      • WebSphere Portal architect

        Plans the portal and designs the topology and the test and deployment scenarios.

      • Network architect

        Responsible for the topology design, the connections to the various environments, and the deployment of the systems.

      • Database architect

        Designs reasonable database schemas.

      • User registration architect

        Designs or explains the LDAP schemas and LDAP search strings, and designs the login process.

      • Security architect

        Sets and validates the security guidelines.

      • Development architect

        Works with the lead architect and fills skill gaps in the team.

      • Test architect

        Designs the load and stress tests, and designs the test scenarios that ensure the portal provides the functionality as requested. Establishes bug tracking tools.

    3. Developers

      Portlet developers, or a more specialized group, need to be assigned to create the portal elements, such as themes and skins, from the design guidelines.

      Experienced developers are required to create and debug the connectors to the business logic, or the business logic itself.

      The IBM Rational Application Developer tools include features specifically for portlet development. We generally recommend these tools (verify that the developers have adequately sized development workstations, with a minimum of 2 GB RAM).

    4. Code maintainer

      Responsible for code maintenance with a version control system, such as:

      • Concurrent Versions System (CVS)
      • IBM Rational ClearCase

      Also writes scripts for daily or weekly builds and deployment.

    5. Designers

      Often the output will be pictures and PDF documents, which the developers will need to transform into style sheets and JSPs. Verify that the designers are available to the developers for requests that go beyond the graphics. Designers also support usability tests.

    6. Administrators

      Perform all of the portal configuration tasks. You also need administrators who have access to the back-end systems. You might need additional rights for the database back end or for some users on one of the already established systems.

      The portal administrator has to work closely with the code maintainer to get scripts ready for automatic deployments. Good administrators try to replace themselves with scripts.

    7. Testers

      Distinguish between functional and non-functional testers. Do not start testing too late in the project.

    8. Support staff

      Perform maintenance until the next development iteration starts, and support the help desk for technical questions regarding the portal system.

      This task area is often outsourced. The support or maintenance staff gets the “leftover” documentation of the project, which is a bad start and might lead to difficulties.

      Keep in mind that the response time and the quality of the support the portal users receive are significant factors in user satisfaction.

      Ideally, some of the developers on the project team will become part of the support and maintenance team.

    9. Release manager

      Coordinates dates with the other things going on in the organization. For example, say that you want to perform a stress test overnight, but that night half of the back-end databases are shut down for maintenance. Likewise, it is not a good idea to go live with the portal system on the same day that a major upgrade is taking place on one of the systems somewhere in the back-end architecture. Experiencing these things on the day you present the pilot to the CIO might cast a bad light on the project.

      Although these can be obvious issues, they happen over and over again; therefore, we recommend dedicating a person to this role.

    10. Executive sponsor

      A good portal project needs a person in place who has the power to make bold decisions, an important person to help clear things up if the project management team is not able to get things organized in a way that is most effective for the project.

      Portal projects need organizations to communicate and collaborate in ways different from those to which they are accustomed. The executive sponsor, therefore, needs to facilitate and intervene as needed.

      Note: Disagreements can erupt between the disparate teams involved in the project. You must have an executive sponsor who is committed to bringing everyone involved together to achieve a successful outcome.

      An executive sponsor also markets the project to the top management level.

     

    Determine and reduce the complexity

    Start with a base portal installation and choose a few components to deploy at first. For example, configure the portal for the database and LDAP servers. Then, add connectivity to a back-end application.

    Build the portal from the base components first. Only install added functionality when the infrastructure is solid. Use an iterative process and load test along the way. This will simplify the process of problem isolation.

    Do not attempt to create a reference architecture on the first project. Let the reference architecture grow in a planned fashion.

    From a pure Java code point of view, it is often better to throw away current servlet-based code and frameworks instead of trying to reuse it. If you have a model-view-controller (MVC)-based, well-layered application, keep the business logic. Throw away the controller and user interface (UI) logic. Obviously, you might want to keep cascading style sheets (CSSs) and liberally cut and paste some JSP code.

    If you then create new portlets, remember that the best portlets are the simplest ones. Avoid rebuilding something that is already available as a portlet service or within frameworks such as Struts and JavaServer Faces.

    Do not fall into the super portlet trap. Because portlets are so simple to implement, using multiple portlets is not onerous. Several simple portlets are better than a single super portlet. The use of portlets forces you to reduce the complexity of the user interface and controller logic, and pushes complex business logic to where it belongs.

     

    Define non-functional requirements as part of service level agreements

    Non-functional requirements are usually defined in service level agreements (SLAs), which define the agreed-on service levels or measurements (availability and performance objectives) by which the solution will be supported in the organization.

    The non-functional requirements are a part of the iterative process to create proper SLAs. They can change over time. For example, the business need of the portal changes, and as a result, the response time needs to be shorter.

     

    High availability

    Availability is technically defined in terms of the mean time between failures (MTBF) and the mean time to repair (MTTR): availability = MTBF / (MTBF + MTTR).

    A failure or repair situation is not the only factor that influences the availability figure. Assume that the portal environment is not available on the network every night for one hour to perform a backup. This adds up to 365 hours, or 15 days and 5 hours, a year, which leads to a maximum availability of 95.83% (23h / (23h + 1h)).

    It is up to you whether to explicitly exclude scheduled outages from the availability figure; if you do, remember to include a discussion on the handling of scheduled outages within the organization. You often see that the IT department receives a certain number of hours per year to perform maintenance, usually within a certain time frame at night. Beyond this, it is reasonable to designate a person or define a process for requesting an exception to the scheduled outages, for example, to apply a security patch to the system.

    Because a portal is by definition the front end of a number of systems, its availability is influenced by all of the supporting back-end systems. The overall availability of an application is the result of the availability of all components multiplied. A theoretical example is:

    Availability of the user’s Web browser 98%
    Availability of the user’s DSL connection 98%
    Availability of the Internet backbones 99.99%
    Availability of the Web site’s firewall 99.5%
    Availability of the Web server 99.8%
    Availability of the Portal server 99.6%
    Availability of the network in the data center 99.99%
    Availability of the operating system 99.99%
    Availability of the database 99.5%
    Basic data center availability, covering disasters, and so on 99.999%

    This leads to an overall availability of less than 94.5%.
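    As a back-of-the-envelope check, the serial-chain multiplication above can be reproduced in a few lines. The values are the ones from the list; treating the components as independent and strictly serial is the simplifying assumption behind this model.

```python
from functools import reduce

# Availability figures from the theoretical example above, as fractions.
components = [
    0.98,     # user's Web browser
    0.98,     # user's DSL connection
    0.9999,   # Internet backbones
    0.995,    # Web site's firewall
    0.998,    # Web server
    0.996,    # Portal server
    0.9999,   # network in the data center
    0.9999,   # operating system
    0.995,    # database
    0.99999,  # basic data center availability
]

# Serial components: the chain is only up when every link is up,
# so the individual availabilities multiply.
overall = reduce(lambda a, b: a * b, components)
print(f"Overall availability: {overall:.2%}")  # roughly 94.48%
```

    Note how the two 98% components at the user's end dominate the result; improving any single data-center component barely moves the overall figure.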

    From a user’s perspective, all of these components are required to allow the user to leverage the portal application; it does not matter to the user why the portal is not available.

    In the discussions, make clear that you are defining the availability of the portal system itself, and explicitly exclude the other components involved. Often, this argument will not work because the owner of the portal might also not care about which component fails. You should at least agree to a point in the system where it can be measured, for example, available from a certain point in the internal network. Try to take responsibility for the portal piece in the system, and then negotiate proper SLAs with the other system owners.

    Needless to say, costs will determine the high availability factor to which we can commit. Redundancy of hardware and software is not a trivial expense.

    The problem of measurement is an obvious one regarding response time. However, it does apply to all the topics that you define in the service level agreements. Whether they are technical or non-technical, know in advance and describe in the SLAs how you measure the numbers.

    The best advice we can give you is to clearly define the term availability in the project plan to mitigate any risks and allow you to reach the goal. We are fully convinced that no matter what high availability number you target, WebSphere Portal is now mature enough to comply.

     

    Number of registered users

    This number is often useless. There are certain projects where this number does matter. For example, if you have the users in an LDAP database, this number factors into the execution time of LDAP search paths. A good example of this is the use of the People Finder portlet.

    It also plays a role in the estimation of hard disk capacity of the database and might even remotely affect the general non-functional requirements of the database. Generally, the effects of this number are overestimated.

     

    Number of concurrent users

    The number of concurrent users describes the number of users that currently have active sessions on the Portal server. This number can be misleading: concurrency is not the same as activity, and you cannot derive from it the amount of work being done on the portal. Therefore, the number on its own is not very helpful when calculating the number of CPUs required for the portal system.

    The number of concurrent users is important when calculating the Java heap size. Set the Java heap size to facilitate efficient application execution, including garbage collection. Garbage collection begins to become excessive when the Java Virtual Machine (JVM) cannot find sufficient memory to execute a request; therefore, garbage collection can be one of the biggest bottlenecks for an application. Depending on the size of the portal system, a close estimate of the required Java heap size is also important in order to calculate the number of JVMs. See “Where to cache?”.

    To calculate the required heap size from the number of concurrent users, you need to know the amount of memory a single session requires. Without knowing ahead of time what size a single session will be, you are again at a disadvantage. The size of a single user session in a clean portal out of the box is about 5 KB. How much a single session in your portal system will require depends on the portlets deployed and the data they store in the session.

    The number of concurrent users can be drastically altered by simply altering the session timeout. A two-hour timeout might show six times the number of concurrent users in comparison to a 20-minute timeout.

    Tip: Session timeout is closely related to the calculation of concurrent users.
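    To illustrate how the session timeout drives both the concurrency figure and the heap requirement, here is a minimal sketch. The login rate, active time, and per-session size are hypothetical illustration values, not measurements; only the 5 KB out-of-the-box figure comes from the text above.

```python
def concurrent_sessions(logins_per_sec, active_min, timeout_min):
    # Little's law: sessions in memory = arrival rate x residence time.
    # A session occupies heap for its active time plus the idle timeout.
    return logins_per_sec * 60 * (active_min + timeout_min)

def required_heap_mb(sessions, session_kb, base_mb=512):
    # Assumed base footprint of the portal JVM plus per-session data.
    return base_mb + sessions * session_kb / 1024

short = concurrent_sessions(2.0, 10, 20)    # 20-minute timeout -> 3600 sessions
long_ = concurrent_sessions(2.0, 10, 120)   # 2-hour timeout    -> 15600 sessions
print(short, long_, long_ / short)

# 50 KB per session is an assumed application figure; an empty
# out-of-the-box session is only about 5 KB.
print(f"{required_heap_mb(short, 50):.0f} MB heap")
```

    The same login rate yields a concurrency figure more than four times higher just by lengthening the timeout, which is the effect the tip above warns about.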

     

    Think time

    Think time is the amount of time a logged-in user pauses before requesting a new page view. Note that in a Techline sizing exercise, the default is 30 seconds. Sometimes it is impossible to get an accurate number any other way but empirically; more than likely, you will need to have the portal application up and running in order to get a good estimate for this value. Remember, this is an iterative process.

    If providing a best educated guess is not possible, we can enable a pilot with a selected group of users. Selected user-group pilots do occur quite frequently during big portal projects. These pilots, in conjunction with usability pilots, offer solid feedback to the project team. Consider think time as an important component of the stress tests.

    As a general rule, think times are higher in transactional portal sites than in content-based portal sites. Think times in intranet portals appear to be at the low end, low-cost e-commerce sites fall in the middle, and banking sites are at the high end. Another approach is to duplicate the think time from a similar Web site as a starting point.

    Without having the average think time, it is almost impossible to get an accurate picture of the requests per second variable.
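    The relationship between think time and request rate can be sketched directly. The user count and response time below are assumed illustration values; the 30-second think time is the Techline default mentioned above.

```python
def requests_per_second(concurrent_users, think_time_s, avg_response_s):
    # Each logged-in user cycles through: read the page (think time),
    # request the next page, and wait for the response.
    return concurrent_users / (think_time_s + avg_response_s)

# 3000 concurrent users, 30-second think time, 2-second average response:
rps = requests_per_second(3000, 30, 2)
print(f"{rps:.2f} requests/second")  # 93.75
```

    Halving the think time roughly doubles the request rate for the same user population, which is why this variable matters so much for sizing.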

     

    Requests per second

    A request is anything that causes a Web page or information to be regenerated. This number is requested for a Techline sizing. You usually need to calculate the requests per second from other provided information. This is a very important number to have, because it helps to calculate the CPU load that you will experience.

    Be careful if this number is just given to you. It might be purely a best guess and not be based on a solid calculation. Remember that it is the duty of a good IT architect to detect if input from the business side is inaccurate.

     

    Logons per second

    This number is valuable in calculating the CPU load. However, it is more important for accurate CPU sizing of the LDAP and database servers than for Portal server sizing. Since WebSphere Portal V5, there has been much improvement in the performance of the logon task, but it is still quite an intensive operation for the LDAP and Portal database servers.

     

    Percentage of anonymous requests

    An anonymous request is a request coming from a user who is not authenticated and thus has no session within the portal environment. The big advantage of anonymous requests is that they enable you to perform massive caching.

    Tip: Try to avert the requirement to turn on public sessions.

    Having no session, anonymous requests also require less memory in the heap. Generally speaking, anonymous portlets provide less functionality and are usually simpler, but they also consume fewer resources than a portlet that performs transactional requests to a back-end system. Anonymous users do not log on and thus do not generate logon requests, thereby easing the load on both the LDAP and portal databases.

     

    Page response time

    Response time, as perceived by the end user, is the time it takes the Portal server to respond to a request.

    The page response time is a very important number to negotiate. More often than not, the number supplied here is unrealistic.

    Assume for a moment that the application cannot be improved any further. You can decrease the load on a CPU by allowing a longer response time during page execution. A portal CPU can serve up to 10 times the number of requests for a 40-second response time than it can for a 0.5-second response time. Both numbers are too extreme, but they provide an idea of how these two factors relate to each other.

    The more load on the portal, the longer the response time.

    To allow for a more reasonable sizing, you might want to plan for a generous response time during peak hours. This creates a more robust response time in an average load. Accurate, realistic numbers are very helpful in sizing for the portal system.
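    One way to see why the allowed response time and the per-CPU load trade off is a simple open queueing model. This M/M/1 sketch is illustrative only: a real portal is not an M/M/1 queue, the per-page service time is an assumed value, and the model shows the direction of the effect rather than the exact 10x figure quoted above.

```python
def sustainable_utilization(service_time_s, target_response_s):
    # M/M/1 queue: response time R = S / (1 - utilization),
    # so the utilization a CPU can sustain for a target R is 1 - S/R.
    return max(0.0, 1 - service_time_s / target_response_s)

service = 0.4  # assumed CPU-seconds per page
for target in (0.5, 2.0, 40.0):
    rho = sustainable_utilization(service, target)
    throughput = rho / service  # requests/second per CPU
    print(f"target {target:>5}s -> {rho:.0%} busy, {throughput:.2f} req/s")
```

    A tight 0.5-second target forces the CPU to idle most of the time, while a generous target lets it run near saturation, which is why a relaxed peak-hour response time allows a smaller sizing.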

     

    Techline sizing

    We can size required hardware for a portal implementation by employing the Techline Sizing Center.

    Note, vis-a-vis Techline sizings: The map is not the terrain.

     

    Calculating the costs

    We begin this section with a quote attributed to Albert Einstein: “Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.” This is how we approach calculating the costs of owning WebSphere Portal. IBM WebSphere Portal drives sales, employee productivity, and site satisfaction while helping to reduce employee turnover, travel time, and training expenses. WebSphere Portal simultaneously reduces costs and improves service. The challenge is not only finding return on investment (ROI), but choosing the best ROI model.

     

    Return on investment

    At some level, all business decisions will have to look at cost. For example, there is a cost to implementing a search engine and there is a cost associated with people not finding the content that they are looking for on the site. When looking at adding a search engine to the environment, consider the cost of adding the solution from two perspectives: the cost and the time to implement.

    From the cost perspective, the purchase price of the software might be significant, and most products require some form of maintenance for ongoing support and upgrades. Make sure that you understand all of the costs associated with the initial purchase and ongoing maintenance of a given solution. From the time-to-implement perspective, after you purchase a search product, you must still install, configure, and implement the solution to work in your environment. This cost is often difficult to define, but underestimating the effort can lead to flawed ROI numbers.

    When you bring these (and other) factors into the calculation, it is possible that the return on the investment is not high enough to justify investing in a search product. If this is the outcome for a portal decision, consider performing a return on investment study. Refer to the following link for a tool to evaluate cost factors and revenue benefits:

    http://www.ibm.com/software/sw-lotus/lotus/general.nsf/wdocs/roicalc

    Use the term “total ROI” to denote the combination of the benefits (the increased ROI due to increasing benefits, keeping costs the same) together with the ROI impact of cost avoidance.

     

    Real cost reductions

    How do you quantify hard dollar cost savings when considering a WebSphere Portal implementation? At IBM, the intranet site was migrated to WebSphere Portal in April 2004. As part of that implementation, IBM made collaboration tools available to a workforce of more than 320,000 users, 40% of whom do not work in a traditional office setting. IBM employees host more than 10,700 e-meetings per month and log more than 141,000 person hours in e-meetings. IBM conservatively estimates that, based on the number of attendees and the number of Web conferences, it saves $10 per Web conference due to avoided travel costs. This figure takes into account that not all e-meetings would have required travel. IBM averages 10,700 Web conferences per month with 6.8 attendees per Web conference.

    Important: IBM estimates that it saves $7,898,000 in annual travel costs.
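    The arithmetic behind such an estimate can be sketched as follows. Note the assumption: the text does not make fully clear whether the $10 saving applies per conference or per attendee, so both bases are shown; neither exactly reproduces the published annual figure, presumably because of the discount for e-meetings that would not have required travel.

```python
conferences_per_month = 10_700
attendees_per_conference = 6.8
saving_usd = 10  # avoided travel cost per the figure quoted above

per_conference_basis = conferences_per_month * 12 * saving_usd
per_attendee_basis = conferences_per_month * 12 * attendees_per_conference * saving_usd

print(f"per-conference basis: ${per_conference_basis:,.0f}")  # $1,284,000
print(f"per-attendee basis:   ${per_attendee_basis:,.0f}")    # $8,731,200
```

    The per-attendee basis is the one that lands in the neighborhood of the quoted $7,898,000 annual saving.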

    IBM reduces survey administration and analysis costs through the use of Web forums and online survey tools, also part of the portal, and generates an increased response rate. IBM saw a reduction of five surveys per year at an average cost of $2 per survey.

    Important: IBM estimates that it saves $3,192,730 in annual survey administration costs.

    Determine whether you can apply some of these metrics to your business practices and, along the way, move the portal mission from an outward-bound communication tool to a platform where real work gets done.

     

    Initial technical considerations

    This section provides an overview of the technical factors to consider while planning a WebSphere Portal solution. We often see projects where this planning has been done too late or too early.

    It is too early if, for example, there is no business justification for a portal yet; you would have no reasonable basis for the planning. You see organizations declare WebSphere as their middleware and purchase WebSphere Portal for their emerging technology or research and development staff to experiment with. The missing link here is a line of business with a need for a portal application.

    On the other hand, it is too late if the project plan is already established and the time line fixed. There would be no room for the technical considerations to still influence the project; your only course of action is to make the best of it, which is a bad start for an infrastructure project.

     


    Topology planning

    There are many presentations out there describing various topologies. This section provides some typical samples, mentioning their advantages and disadvantages.

     

    Standard configuration

    The configuration below is the general best practice for building up a WebSphere Portal environment. It describes a classical three-tier portal topology, including a demilitarized zone (DMZ). With the exception of the single-server developer installation, this is probably the configuration that is most often tested in the software lab.

    The first tier is behind the outer firewall, sometimes also called the protocol firewall because it restricts the allowed protocols and ports. This should be a fairly bare environment, applying security using the concept of least complexity. Never deploy the Portal server in the DMZ. In addition, do not deploy any other JVM-based systems there. Keep the DMZ as clean as possible.

    For security reasons, we generally recommend a hardened UNIX- or Linux-based system.

    You can usually use low-cost servers for this layer, but do not use a cluster on this layer. The machines are inexpensive, and software such as IBM HTTP Server does not incur license costs per CPU. While this layer needs to be load balanced by a Network Dispatcher, the load balancing of the Portal servers is performed by the WAS plug-in that runs within the HTTP server.

    We do not include the Network Dispatchers in the illustration because they do not work on the upper layers of the network protocol stack (a Network Dispatcher operates below the IP level, equivalent to OSI Layer 2), and therefore a network administrator is usually in charge of selecting and configuring them.

    It is reasonable to include a DMZ. The consequences of not having a DMZ can cost more than the DMZ itself.

    Important: The DMZ is also a must-have concept if you are planning an intranet portal.

    More security threats come from inside a company than from outside. All connections from inside or outside have to be terminated in the DMZ.

    The back side of the DMZ is formed by the inner firewall, sometimes also called the domain firewall because it secures the internal domain. Do not leave the same ports open here as on the outer firewall, which should allow access only on ports 80 and 443.

    The back end shows various databases and other systems. Those systems can, for example, include host systems, custom-developed systems, or even systems with proprietary protocols. What is there depends on the already available topology and on the type and goals of the portal application. We highly recommend using an LDAP system for the user registry. Depending on the amount of data you expect in the LDAP, verify that the implementation you select is able to cope with the requirements. If you use an LDAP system that is already in production, make sure that it can handle the additional load that comes through the portal, and also make sure that you have a mirror of the target LDAP available for the stress tests.

    It is not unusual to also have the back-end database or the other systems, or both, separated by one or more firewalls.

     

    Alternative to the standard configuration

    There is an alternative to the previously described standard configuration that you see more and more often in the field, where the Web server is replaced by a caching proxy. A reverse proxy allows much more data to be cached than, for example, the Web server plug-in can cache on its own. Similar to a Web server, reverse proxies also allow you to host static public content. In addition to the technical features that come with a caching proxy, such as WebSphere Edge Server, there are sometimes purely organizational reasons: network and data center administrators sometimes do not want any software components designed by application architects running in their DMZ, mostly out of concern that such components might be compromised. Therefore, you might have to move the HTTP server with the WAS plug-in out of the DMZ.

    We do not recommend that you leave out the Web server altogether and rely on the load balancing that you get out of the DMZ. The additional hop of a Web server should not influence performance negatively, and the computing resources required should be covered. It is worth the resources to ensure proper balancing in case a WAS instance goes down for maintenance or due to a program failure.

    We did not find any performance comparisons of accessing WAS directly versus having a Web server still in place. Nor did we discover many sites that use this configuration, and again, we do not recommend it.

    There have been internal tests with a co-location scenario to see whether performance can be gained by increasing the priority (priority boosting policies and priority levels) of the JVM. With all components in place and optimally tuned, and for a given fixed client workload, the tests showed a marginal performance improvement. Trying the same configuration on a site's portal showed no effect or a negative effect. The likely reason is that the limiting factor was the raw CPU power available to handle the connections, not the throughput.

    It is not necessarily a problem to have IBM HTTP Server and the Portal server on the same machine, as long as you build up a DMZ, and do not deploy the application server in the DMZ. From a performance perspective, there is no reason not to run a lightweight Web server, such as IBM HTTP Server or the open source Apache, on the same physical machine.

    Note that this is a very general statement and that there might be reasons, such as a high load on static content from the Web server, that might lead to a different configuration. The following section describes an improved standard configuration.

     

    Improved standard configuration

    In the two previous sections, we describe two configurations, each with a drawback: either no caching proxy, or the collocation of the Web server and WAS. If you must choose between them, take a close look at the work they are supposed to do in your environment. Can you leverage the advantages that a caching proxy gives you at all? Are you able to push all work away from the Web server so that it does nothing else but host the WAS plug-in? Sometimes you might not get the preferred configuration due to restrictions from the hosting environment.

    We can improve the topology with a separate Web server layer. The advantages include:

    1. Easier maintenance (because there is then only one product per node)
    2. Greater flexibility in the architecture for later changes, because both components are in one place, and one is fully accessible because it is not in the DMZ

    From a pure technical point of view, it is hard to understand why leaving out these clustered HTTP servers is a worthwhile topic to discuss. Because hardware is so cheap and license costs are not an issue in this layer, adding a couple more nodes should not be a problem. But experience in the field, where budgets are extremely tight, proves that it sometimes is. Just be careful not to run into the trap of saving $2,000 on hardware or hosting and spending an additional $20,000 on consulting services.

     

    Example with calculated single points of failure

    Having no single points of failure in the system is a core element of any reliable system design. Because reliability is a logical and natural requirement for a portal system, the best practice for a portal is to avoid any single points of failure in the system.

    However, because portal systems are built not because of architectural designs but because of rational and clear business needs and decisions, there are cases where the general best practices do not apply.

    Naturally, you want to provide a reliable system. However, it all depends on the service level agreements listed in the contract. It is up to the parties involved to define clearly what they understand by the term reliable system.

    Let us assume that the diagram below describes a public information system, such as the portal of a city, containing information about events and other things going on in the city. Assume further that all access to this portal site is anonymous, with no logins or sessions required. We can agree that you do not require a WebSphere cluster with session failover here, because the WebSphere administrator is the only user who could take advantage of it. The same applies to the databases and the LDAP server. Only the content management system (CMS) needs to deliver its content reliably.

    If you add a free-of-charge service to the scenario, for example, giving users the ability to write and publish their own blogs, would that lead to a different situation? If the LDAP server fails, users will not be able to log on, and if the database or the portal node on which a user's session resides has a problem, that user will lose all their work. Therefore, consider how much you are willing to invest in the reputation of a free-of-charge service.

    Going a step further, assume that representatives of companies can log on to this portal to place advertisements such as banners. Here, you always want those users to be able to submit their advertisements. In this scenario, you are concerned with how probable such a failure is and how much business you lose while it lasts.

     

    Example of a high-end configuration

    It is technically possible to have a single set of HTTP servers act as the front end for all IBM WWCM and WebSphere Portal servers. This requires a manual modification to the plug-in configuration, combining elements from the plug-in files generated by each cluster.
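    As a hedged sketch of such a merged plug-in file (the cluster names, host names, ports, and URI patterns below are invented for illustration, not taken from a real installation), the combined plugin-cfg.xml carries one ServerCluster, UriGroup, and Route per cluster:

```xml
<!-- Illustrative merged plugin-cfg.xml fragment: one routing block per cluster. -->
<Config>
  <VirtualHostGroup Name="default_host">
    <VirtualHost Name="*:80"/>
  </VirtualHostGroup>

  <!-- Portal cluster (example names and ports) -->
  <ServerCluster Name="PortalCluster" LoadBalance="Round Robin">
    <Server Name="PortalNode1_WebSphere_Portal">
      <Transport Hostname="portal1.example.com" Port="9081" Protocol="http"/>
    </Server>
  </ServerCluster>
  <UriGroup Name="PortalCluster_URIs">
    <Uri Name="/wps/*"/>
  </UriGroup>
  <Route ServerCluster="PortalCluster" UriGroup="PortalCluster_URIs"
         VirtualHostGroup="default_host"/>

  <!-- WWCM cluster block would follow the same pattern with its own URIs -->
</Config>
```

Each cluster's generated plug-in file contributes its own ServerCluster/UriGroup/Route triple; the manual step is copying those triples into one file and keeping the URI patterns disjoint.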

     

    Clustering

    There are two main reasons to create a cluster...

    1. To scale horizontally, vertically, or both

      IBM recommends giving each JVM less than 1.5 GB of heap, with one JVM per two to four CPUs.

    2. To reach a higher availability
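    As a rough illustration of the heap guideline above (the values are examples, not recommendations for your workload; tune them with verbose garbage collection data from load tests), the generic JVM arguments for one portal application server might look like:

```
-Xms1024m
-Xmx1280m
-verbose:gc
```

Keeping the minimum and maximum heap reasonably close avoids expensive heap resizing, and verbose GC output is the basis for the garbage collection tuning discussed below.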

    There tend to be more horizontal clustering implementations than vertical ones, for the following reasons...

    1. There are physical limits to scaling hardware in terms of CPU and memory usage.

    2. WebSphere Portal scales linearly.

      Tests have shown that WebSphere Portal on its own scales linearly; however, this is not always true of the applications built on it. Garbage collection can slow a system down, depending on how often it occurs. To work around this, apply reasonable changes to the JVM parameters.

    3. Virtualization technologies.

      IBM eServer pSeries servers offer the ability to create multiple logical partitions (LPARs) on one powerful machine. From a portal perspective, each of them looks like a real physical machine. This allows much better management of the available server resources. For example, instead of running three vertically clustered nodes on an 8-CPU machine, you can create the same number of LPARs and cluster them horizontally. This shifts more of the administration to managing the operating system and reduces the burden of WebSphere administration. Tasks such as portal installation and upgrades are simplified; for example, port conflicts are eliminated. Growth management also becomes easier.

      For the pilot release, for example, you might want to put the development, test, and production environments on a single machine and then, as the system grows, dynamically add more LPARs.

    4. Blade servers.

      Blade servers typically come with four or fewer CPUs. Because WebSphere Portal scales linearly, there is usually no need for a vertical cluster.

     

    Security concepts

    It is beyond the scope of this paper to discuss security concepts in detail. Refer to the Redbooks and Redpapers that focus on security for more information.

     

     

    SSL encryption

    One of the main reasons not to have all data SSL encrypted is the performance decrease that goes with it. The encryption of data is a performance-intensive task, on both the client and the server side, which might lead to a less satisfactory user experience. The main reason not to use SSL encryption at all is cost: SSL certificates and their maintenance are expensive, and SSL termination often requires additional hardware.

     

    Test to ensure that the logon credentials are encrypted while transmitted

    No matter what the browser shows you, test with a network sniffer, such as Ethereal, to verify what really gets transmitted over the network.

     

    Request the SSL certification early

    Assign someone to be responsible for SSL certification. Discuss this issue early if you require a certificate from a certificate authority (CA), such as VeriSign or Thawte, and prepare the proper requests early. If you are running an Internet portal, you always want a certificate from an official certificate authority. For the development and test systems, you might want to start off with a self-signed certificate.

    If you are building an intranet portal, there might already be certificates in the company that you can reuse. You might also use a self-signed certificate, but for better user satisfaction, use a certificate cross-signed by an official CA.
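    For a development or test system, a self-signed certificate can be generated in minutes. A minimal sketch using OpenSSL follows; the host name, organization, key size, and validity period are placeholders, and a certificate produced this way must never be used on a production Internet portal:

```shell
# Create a private key and a self-signed certificate for a test portal.
# -nodes leaves the key unencrypted, which is acceptable only for test systems.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout devportal.key -out devportal.crt \
  -days 365 \
  -subj "/C=US/O=Example Corp/CN=devportal.example.com"
```

The resulting devportal.key and devportal.crt can then be imported into the key database of the terminating HTTP server or proxy.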

    Secure the SSL certificate. Make sure that only authorized people have access to the certificate itself. Securing the certificate while still distributing it to the terminating servers is a challenging task that is frequently underestimated.

     

    SSL ID tracking

    SSL ID tracking is a convenient method supported by WAS for tracking a user session by using the SSL session ID instead of a session cookie. Because the policies of some companies, or the personal policies of some users, prevent the use of cookies, SSL ID tracking is a good way to solve this problem for certain portal systems.

    Microsoft Internet Explorer V6 and later, as well as all newer browsers (such as Mozilla, Firefox, or Opera), appear to allow the use of SSL ID tracking.

    IBM has seen this technique used successfully in one project, where it served as a bypass: normally a user gets a session ID cookie, but if the browser does not accept it, the user is redirected to another proxy in the DMZ that leverages SSL ID tracking.

    If you intend to use this technique, make sure that you involve an experienced portal architect during planning. SSL ID tracking has not been tested with WebSphere Portal and is therefore not supported. Additionally, be aware that IBM GSKit, the SSL library used by the reverse caching proxy of IBM WebSphere Edge Server and by IBM HTTP Server, only supports a fixed number of SSL ID cache entries.

     

    Hardware SSL terminators

    An often asked question regarding SSL usage is whether to use hardware SSL terminators or whether the HTTP server or proxy is able to do the termination. To answer it, evaluate what is possible, and at what cost, in your configuration.

    If a Web hosting company already has SSL terminators in place in the environment, it will likely want to push you toward this configuration. In such a case, there is usually good experience with quality and performance, and the company has worked out good ways to distribute the certificates securely.

    However, SSL termination is also a good opportunity to leverage the CPU power of the machines that you have in the DMZ. Understand that even if just a quarter of the responses are delivered SSL encrypted, CPU usage might double. Do not just go with this rough estimate; make sure that you do proper load tests.

    Note: Use the same topology and equal SSL terminators for the functional and non-functional tests as for the production environment.

    In general, for SSL usage with WebSphere Portal, apply all the rules and best practices that also apply to any other multi-tier J2EE architecture.

     

    Security beyond encryption

    As you have learned in the introduction of this section, security is much more than just the encryption of data. Just to state two examples, part of the security includes:

    1. Ensuring that the portal does not break when a higher than expected load of users accesses the system.
    2. Ensuring that the data in the database survives a hard disk head crash.

    Although these situations appear logical to everybody, there are more delicate ones. Assume that you have an e-mail portlet on the portal and are writing a complex, time-consuming e-mail. When you click the Submit button, WebSphere Portal returns the Login window with the message “Your session has been timed out. Please log in again.” What do you expect WebSphere Portal to do after login? Secure the user’s content and come back with the message “e-mail submitted successfully,” or secure the user’s account by ensuring that no tampered data gets into the account and always redirecting to the Welcome page after login?

    What is the interpretation of security here? WebSphere Portal is able to do both, but you have to ensure that it is configured to address your needs and that the application does not break that definition.
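    Which of the two behaviors you get is a configuration decision. As an illustration only (the property names below are from WebSphere Portal's ConfigService.properties; verify the exact keys, values, and defaults against the Information Center for your release before relying on them), the relevant settings look like:

```properties
# Resume the interrupted session after a successful re-login,
# so the user's in-flight work (the e-mail) is preserved.
timeout.resume.session = true

# Alternatively, always send users to a fixed page after login,
# discarding the interrupted request (the "secure the account" behavior).
redirect.login = true
redirect.login.url = http://portal.example.com/wps/myportal
```

Whichever behavior you configure, test it explicitly with the portlets that accept user input.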

     

    Non-technical parts of security concepts

    Applied Cryptography by Bruce Schneier explains in detail a number of cryptographic algorithms and their implementation. Secrets and Lies covers the reasons for technical insecurities. Throughout that book, the author explains that more fraud results from improper security concepts than from improperly implemented encryption algorithms. In other words, sites should not invest their time in acquiring the latest and most expensive security algorithm for their system; they should instead invest in a clean security concept.

    Assume that you have a portal system where all the best practices regarding password policies are checked and enforced. For example, passwords must consist of at least 10 characters, including at least one number and one special character; must be renewed every three months; and must not be similar to any of the last six passwords chosen. Users who forgot their passwords had to call a support line, following a secure process. Because many more people than calculated forgot their passwords, a heavy load was put on the support line, which raised costs and dramatically lowered user satisfaction. Therefore, the process was changed: users could then simply state their user name to get their password from the operator, which defeated the entire password policy. One hopes never to see this type of security situation again.

    “Simply put, complexity is the worst enemy of security.”

    Bruce Schneier, CTO, Counterpane Internet Security, Inc.

     

     

    Standard operating system level security considerations

    There are a number of standard measures that you should take in order to raise security at the operating system level. The following list mentions the most important ones:

    1. Do not run WebSphere Portal as the root user. Although it is quite reasonable to perform the installation as the root user in order to avoid problems, change the ownership of the installation directory after the installation and run WebSphere Portal under the account of a specially created system user.

    2. Ensure that developers do not have write access to the WebSphere Portal installation directories in any environment. They must not be allowed to add libraries or change configuration files on their own. However, also ensure that they do have read access. This allows them to compare configuration files and, even more importantly, get an instant view of the log files. Not allowing them access often leads to waiting cycles and wasted time.

    3. Remove unused portlets and components. Unused components might unnecessarily extend the startup time, increase memory usage, or negatively influence stability. Be careful and know what you are doing: if in doubt about a portlet or component, do not remove it.

    4. Do not use default passwords, and establish a password expiration policy. Systems have been found on the Internet where you could log in as the user wpsadmin with the password wpsadmin. Even though these were only demonstration systems, make sure that this does not happen to you.

     

    Single Sign-On

    Almost every company runs a single sign-on project, sometimes called global sign-on. The original trigger for most of them is users frustrated by numerous different user IDs and passwords. For the most part, the federation of accounts is the goal.

    See also:

    1. Single Sign-on
    2. IBM Redbook, Develop and Deploy a Secure Portal Solution Using WebSphere Portal V5 and Tivoli Access Manager , SG24-6325
    3. Integrating WebSphere Portal with the security infrastructure

     

     

    Client Web application SSO

    At this layer, WebSphere Portal is often misused as a replacement for security systems such as IBM Tivoli Access Manager. WebSphere Portal does nothing more than leverage the functionality of WAS, which enables a single sign-on user experience with other applications that leverage IBM technologies, such as other WAS instances, Domino-based applications, or applications secured by Tivoli Access Manager. This holds true if all of them use the LTPA token as an authentication token and share the same user directory.

     

     

    Portal back-end application SSO

    This is sometimes also called pseudo-SSO, because you are not really signing on with a single set of credentials. For example, you might have a portlet on the portal page that lets you enter the credentials you have always used for a certain traditional back end. The portlet saves the credentials in a credential vault for you. Each time you access the portal from then on, it transparently logs you in to that configured system with the data from the credential vault, retrieving, for example, Web services data from there and displaying it on the portal page as you are used to. This provides a big advantage, because the often difficult to understand authentication/authorization system of that established back end does not need to be touched. It is also often leveraged because it is an inexpensive way of migrating existing applications to portlets.
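    The flow can be sketched in plain Java. Note that this is a self-contained illustration only: the real implementation uses the WebSphere Portal credential vault portlet service, which is replaced here by a simple map, and the class and method names are invented for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Pseudo-SSO sketch: the portal stores back-end credentials once and
// replays them on every later visit. The WebSphere Portal credential
// vault is modeled as a plain Map to keep the example self-contained.
public class PseudoSsoSketch {
    private final Map<String, String[]> vault = new HashMap<>();

    // First visit: the portlet asks for the back-end credentials and stores them.
    public void storeCredential(String portalUser, String backendUser, String backendPassword) {
        vault.put(portalUser, new String[] { backendUser, backendPassword });
    }

    // Later visits: the portlet logs in to the back end transparently.
    public String renderPortlet(String portalUser) {
        String[] cred = vault.get(portalUser);
        if (cred == null) {
            return "Please enter your back-end credentials";
        }
        return backendLogin(cred[0], cred[1]);
    }

    // Placeholder for the real back-end call (for example, a Web service request).
    private String backendLogin(String user, String password) {
        return "Back-end data for " + user;
    }
}
```

The key point of the pattern is visible in renderPortlet: after the one-time enrollment, the back end never needs to know about the portal's authentication system.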

     

    Separate distinctive portals

    Do not mix systems with different targets. For example, do not create a single portal that serves employees (B2E), customers (B2C), and business partners (B2B). Because of security concerns and possible divergent evolution paths, create separate portals for each of these communities.

     

    Content management

    Content in a portal comes from a variety of sources:

    • streaming video and audio
    • office-type documents
    • output from reports
    • digitalized assets

    These files can reside in an already existing content management application, on a file share, or in a document management system. Integration with the portal can be as easy as downloading and installing a portlet from the catalog; for example, there is a Domino Document Manager portlet for displaying those documents in WebSphere Portal. Much more likely, however, it will be an add-on project that takes a significant amount of planning and implementation.

    IBM has also observed sites discovering a need for a Web content management system. Many sites have a home-grown way to promote Web content to their Web site and now want to leverage IBM WWCM as part of their portal. A good way to analyze the content management requirements is to work through the questions below.

    Discuss these points in the content and portlet sourcing meetings:

    1. What content do you want to display on WebSphere Portal?
    2. Where is the content stored?
    3. Who needs to create and change it, and who views it?
    4. What documents do you want to store?
    5. How many documents will be created each day?
    6. How large is each document?
    7. How many users will be accessing them (concurrently and total)?
    8. What will they be doing with them?
    9. What roles will they be assigned and how will workflow be used?
    10. What are the expectations regarding time to access, view, and store a document?

    Set realistic service level agreements regarding content presentation and manipulation.

    The Content Repository API for Java Technology (JSR-170) specification will soon have a major impact on the content management landscape. It is a specification developed under the Java Community Process (JCP) with more than 60 members representing major content management solutions, including Apache, IBM, Oracle, and BEA. JSR-170 specifies a standard API for accessing content repositories from Java 2, independent of the implementation. The specification continues to evolve and specifically focuses on getting content under control by offering a scalable and reliable infrastructure, something many sites struggle with today as their repositories have spiraled out of control. JSR-170 also promises a query service that can search any compliant repository in a standardized fashion, without the need to learn a proprietary search API or language.

    This standard offers great possibilities in a portal implementation effort. As a developer, it means that you will have one single API to work with, without having to worry about which vendor’s repository is beneath the surface. As a chief information officer (CIO), you are probably managing several content management systems from multiple vendors; having the same API on top of these existing applications allows the organization to write applications without duplicating application logic. As a Web editor, JSR-170 will allow in-place editing of content in the portal. Making changes directly in the portal, instead of dealing with a vendor’s interface, will reduce time to delivery and increase the accuracy of data published through the portal.

     

    IBM WWCM

    IBM WWCM enables users at all levels to manage end-to-end Web content creation and information life cycles, personalization, and publishing for dynamic intranet, extranet, Internet, and portal sites. WWCM allows the right information to get to the right people at the right time, so portals and Web sites are efficient and effective.

    The installation of WebSphere Portal includes a complete WWCM installation; only configuration steps are required to deploy WWCM. WWCM can also be purchased and deployed separately or as part of a larger WebSphere Portal initiative. Refer to the IBM WWCM home page for more information.

    Every hour of time spent on design will save days of rework. There are some critical aspects of design that need to be done up front. For example, it is important to discuss the content model. Determine what the site navigation will involve and what taxonomy is required. Workflow design and security should be part of this discussion. You will need to understand who can see the content, who can edit the content, and who can delete the content.

    To be useful, content must be accurate and up-to-date:

    1. Ensuring that content is accurate and up-to-date is a big challenge.
    2. Content must be created, reviewed, edited, and approved before it can be published by the portal to end users.

    Begin the WWCM/WebSphere Portal initiative with a WWCM Architecture and Design Workshop. This engagement provides you with the key steps to building a good Web site and offers hints and tips for using WWCM and WebSphere Portal.

    There is a gap between the WebSphere Portal content model and the site and site areas in IBM WWCM. This topic is addressed in Chapter 9 of the IBM Redbook, IBM WWCM for Portal 5.1 and IBM WWCM 2.5, SG24-6792. Therefore, we strongly encourage using this workshop to help you bridge that gap.

    We also encourage you to use the Techline sizing exercise from the Sales Productivity Center before you implement WWCM. This is a separate and distinct process and uses a separate checklist.

     

    Content delivery

    One of the most important issues to analyze is where you will deliver the content:

    1. Stand-alone Web site (through WWCM rendering server)
    2. WebSphere Portal (through rendering portlets or custom portlets)
    3. Stand-alone Web site and WebSphere Portal

     

    Roles

    In addition to the project team listed earlier, you would typically see the following roles in a WWCM/WebSphere Portal project. These individuals make up an information architecture team, which should have representatives from all content areas, for example, human resources. It is important that this team be led by a single person who is authorized to make decisions.

    Designers Designers help to create the corporate design. They work with the IT team to technically implement the corporate design. Designers also help to create the authors’ template environment. This is the way content will be added into the environment.
    Subject matter experts (SMEs) and authors SMEs and authors provide the content. They help to determine the taxonomy and content structure. They should also have input to the design of templates and offer support for them in a deployed environment.
    Content owners Content owners are typically content authors as well. They strive to provide quality assurance of the content. Content owners participate in pilots and provide valuable usability feedback.
    Target group (end users) End users will help to outline their business needs and provide you with feedback during a pilot.

     

    Architecture

    There are several ways to architect a WWCM environment. Take careful consideration when designing the implementation: each site’s needs are different, and there is no single best practice guide. We strongly recommend engaging the consulting services of an IT architect before you begin to configure the environment.

    1. Remote rendering in a portal environment using the remote rendering portlet to show published content on a different server.

    2. Non-portal delivery of WWCM content. Note that the WWCM rendering server is a WWCM module that is installed on the portal server, but portal delivery is not required.

    3. Portal delivery within a WWCM environment, using the local rendering portlet to show published content on the same server where the authoring takes place.

    4. Authoring UI in WebSphere Portal

      The authoring portlet is used to create and manage content.

    The authoring portlet does not function properly on a WWCM server that is installed in a cluster; authoring cannot be done in a clustered environment. Furthermore, in order to make configuration changes to production servers (some of which are done through the authoring portlet), you have to remove the server from the cluster before making such changes.

    Restriction: The current WWCM software cannot take advantage of clustering with respect to the content repository. Each server that has WWCM enabled will need to be configured to use a separate database. This is the exact opposite of the WebSphere Portal database, which must be shared by all portal servers in a cluster.

     

    Your WWCM infrastructure

    In a typical IBM WWCM installation, there are multiple physical Web Content Management servers. In our experience, these servers end up performing one of four roles within the IBM WWCM infrastructure:

    1. Development
    2. Authoring
    3. Staging
    4. Production (live)

    Development is where you create and unit test the WWCM technical assets such as presentation templates, HTML components, menus, and navigators. In addition, this server can be the first place you install patches and fix packs to ensure that they do not have a negative impact on the WWCM servers.

    Typically, you roll out changes by syndicating them to the authoring environment. These changes are, in turn, syndicated or replicated out to the staging and live environments after the appropriate testing.

    Restriction: WWCM does not currently support the notion of selective replication. In other words, you cannot tell WWCM to syndicate only design changes from the development server to other servers in the environment. This limitation means that you must be very careful when syndicating content to and from a development server.

    One approach is to set the syndicator to syndicate only live content and delete all content (through the API) prior to syndication.

    Another similar approach is to configure syndication the same way but have no live content in the development server.

    Regardless of the preferred approach, be very careful when syndicating from the development server to other servers.

    Realistically, there are a variety of common infrastructure designs in place. The primary differences in these designs are due to variations in several basic assumptions:

    1. Site size and complexity: For a small Web site, it might not be necessary to have individual servers dedicated to all four types of WWCM environments.
    2. Funding: In many cases, there is a limitation in the funding provided for building out the WWCM infrastructure.
    3. Corporate standards: It is quite common for smaller companies to omit the staging environment because it requires time and resources to perform the content review in this stage.

    Refer to IBM WWCM for Portal 5.1 and IBM WWCM 2.5, SG24-6792, for a sample of a basic, intermediate, and advanced sample architecture. There is no single answer that will work in all scenarios. The IBM Redbook includes key success factors such as designing a user-centric Web site and planning the site framework and site areas. Taxonomy, metadata, and workflow are other topics for consideration and are discussed in great detail.

     

    Cache strategy

    A typical IBM WWCM implementation serves dynamically generated pages, often combining navigation elements with files (such as style sheets or JavaScript scripts), images, and site content.

    Because WWCM is run as part of a WebSphere Portal environment, WWCM performance depends on many settings outside of the WWCM configuration. For example, WWCM performance depends on hardware, Web server caching, WAS settings, and WebSphere Portal server settings.

    For more information about architecture, consult Chapter 8 of IBM WWCM for Portal 5.1 and IBM WWCM 2.5, SG24-6792, for strategies to improve performance for:

    1. Caching

    2. Pre-rendering

    3. Dynamic cache. The dynamic cache (WebSphere Dynamic Cache Service) is a WebSphere service that is enabled on an application server by default. It supports caching of servlet and JSP responses, WebSphere command objects, Web services objects, and Java objects.
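    As a hedged illustration of servlet/JSP caching with the dynamic cache service (the JSP path, parameter name, and timeout below are invented for this example; real entries must match your application's URIs), a cachespec.xml entry looks like:

```xml
<?xml version="1.0"?>
<!DOCTYPE cache SYSTEM "cachespec.dtd">
<cache>
  <cache-entry>
    <class>servlet</class>
    <!-- Illustrative JSP to cache; use your application's actual URI. -->
    <name>/example/news.jsp</name>
    <cache-id>
      <!-- Cache one copy per articleId request parameter. -->
      <component id="articleId" type="parameter">
        <required>true</required>
      </component>
      <!-- Expire each cached copy after five minutes. -->
      <timeout>300</timeout>
    </cache-id>
  </cache-entry>
</cache>
```

The cache-id components determine how many variants are stored, so choose them carefully for personalized content.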

    Important: Before implementing or testing a caching strategy, it is important to be aware of other caching systems that might affect WWCM performance. If WWCM is delivering content through WebSphere Portal, consult the portal administrator and determine whether portal pages delivering WWCM content are configured to be cached by the portal.

    In addition to the portal’s cache, review the cache settings on any Web servers delivering WWCM content configured with the WAS plug-in. Basic caching should not be affected by the Web server’s cache, but if an advanced cache strategy is used, ensure that the Web server’s cache settings are configured to deliver secure and personalized content correctly. Otherwise, performance might degrade because the Web server is caching too much or constantly rebuilding the cache.

    A good paper to read about this topic is Using WebSphere Dynamic Cache Service with IBM WWCM.

     

    Search strategy

    If the site is large or contains a wide variety of content, site visitors will expect to be provided with one or more techniques for searching the site for content. WWCM provides the following techniques for searching the content repository:

    1. WWCM integrated search module
    2. WWCM API
    3. WebSphere Portal search
    4. Third-party search product (for example, OmniFind, Verity, Lucene)

    Each search option has its advantages and disadvantages. In many situations, you will need a combination of these capabilities to meet the specific requirements.

    Review Chapter 10 of IBM WWCM for Portal 5.1 and IBM WWCM 2.5, SG24-6792, for more information regarding searching. You should determine in the planning phase which strategies you will incorporate in the WWCM/WebSphere Portal project. The security settings on content are taken into account during a search. Searching is performed on the content objects, keywords, and categories stored in the repository. A good deal of forethought about content creation and security must be done in order to retrieve reasonable search results.

    Note: The Portal Search Engine has a spider that crawls WebSphere Portal sites and regular Web sites, but the crawler follows the links inside the two types of site differently. A way to overcome this is to build a custom portlet; the Workplace Web Content Management API provides you with the necessary access. Refer to Chapter 10 in IBM WWCM for Portal 5.1 and IBM WWCM 2.5, SG24-6792, for instructions about building this portlet.

     

    Migration strategy

    Undertaking a migration of Web pages or documents into a content management system (CMS) is often approached as a “last minute” exercise. Normally, assurances from CMS vendors imply that there are no issues associated with this part of implementing a successful CMS strategy. However, like many other aspects of IT implementations, it is the correctness and validity of the data entered into a system that determines its success or failure. This is why IBM has teamed with Vamosa to recommend the use of Vamosa's unique toolset to migrate all of an organization’s existing content into WWCM.

    Note that Vamosa’s technology represents one of the strategic options for migrating content into WWCM. There are other technical approaches available. Regardless of which specific technology is used to migrate content, determine the business case for a migration.

    It is important to clearly define the measurement of success for the migration effort.

    Based on the constraints placed on the project, identify and ratify the approach and required toolset necessary to satisfy the objectives. For example, if the volume of data to be migrated is significant, an automated approach is the best solution. For smaller volumes, usually less than 5000 pages, a manual approach should be sufficient.

    The actual undertaking of the migration should be a relatively straightforward task. All of the previous definitions have set out a framework of operation that reflects both source information and the required target system. By defining both the as-is and the to-be requirements, as well as the transformation rules and exception handling, combined with the QA necessary for the migration, all the technical elements are in place to proceed.

    Although 100% data migration is the goal of any migration, in most circumstances data is migrated in iterative migration sets that work toward a 100% success rate. Typically, the business rules that are designed and defined will cover a large subset of the data to be migrated. However, after the first set of runs, there will be a minor subset of data that has not been handled correctly. Additional business rules will then be defined and the migration steps rerun. This process is repeated until a high enough success rate has been achieved.

     

    Search

    This section helps you understand the general process of building a search index or search collection. It also describes typical problem areas that you might encounter.

    The WebSphere Portal Information Center devotes an entire chapter to the topic of WebSphere Portal Search. Nonetheless, there are recurring questions regarding this topic.

    As sites acknowledge the need for and importance of text search capabilities in WebSphere Portal, they also find that creating search collections is not as easy as originally anticipated. Although the documented process holds true in most cases, we still see sites challenged by situations where they do not understand what WebSphere Portal Search is doing and what is being crawled, and need help with how to proceed.

    This section has two parts: the first provides an overview of the WebSphere Portal Search Engine itself; the second covers frequently overlooked topics. IBM development continues to provide enhancements with every release, which might eventually render parts of this discussion obsolete. Until then, these guidelines should help you make the best use of Portal Search.

    This section is based on the ideas in Andreas Prokoph's paper, A Guide to using WebSphere Portal Search - First steps.

    For a clustered environment setup, we recommend reading Setting up Portal Search in a WebSphere Portal clustered environment.

     

    Overview of the Portal Search Engine

    WebSphere Portal provides integrated portal site and Web content indexing and search capabilities. In addition, there are advanced features such as methods for content categorization of indexed content and an optional workflow approvals process to manage the publication of indexed content. There are two search portlets provided for the end user, one that offers a basic Web-style search and another that offers the ability to perform an advanced search.

    To implement the search capability in the portal, it is important to understand the basic mechanisms of how information and data flow from the content sources until they are stored in the full-text indexes. It might also be interesting to note that the way WebSphere Portal Search works applies to most of the Web search engines available today.

    In order to fetch and store content in the search index, the following processing steps apply:

    1. The crawler fetches content from a specified location, which is typically a starting point URL. For Web content, it is assumed that all relevant information can be gathered by following the hyperlinks in the text.

    2. For every page, the crawler consults the rule sets or filters, if available, to determine whether or not this new page is important for further processing or if it is to be bypassed.

    3. If the page passes the filter criteria, it is sent to the text analytic components for processing. Here, the page content is analyzed so that it can be efficiently processed and stored in the full-text index.

    4. All required information is now available to store the page information in the search index. The last processing step is then performed by the indexer whose task is to merge the information about the page into the new or existing search index.
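    The four steps above can be sketched in a few lines of Java. The in-memory "site" map stands in for real HTTP fetches, and the URLs and prefix filter are hypothetical; none of this reflects Portal Search internals:

```java
import java.util.*;
import java.util.regex.*;

// Minimal sketch of the crawler pipeline: fetch a page, consult the filter,
// analyze the text, merge terms into the index, and follow hyperlinks.
public class CrawlerSketch {
    static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    // Inverted index: term -> set of page URLs containing it.
    public static Map<String, Set<String>> crawl(Map<String, String> site,
                                                 String start, String allowPrefix) {
        Map<String, Set<String>> index = new TreeMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        Set<String> visited = new HashSet<>();
        while (!queue.isEmpty()) {
            String url = queue.poll();
            // Step 2: rule set / filter -- bypass pages outside the allowed scope.
            if (!visited.add(url) || !url.startsWith(allowPrefix)) continue;
            String page = site.get(url);                 // Step 1: fetch
            if (page == null) continue;
            // Step 3: text analysis -- here just strip tags and tokenize.
            for (String term : page.replaceAll("<[^>]+>", " ").toLowerCase().split("\\W+"))
                if (!term.isEmpty())
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(url); // Step 4: index
            Matcher m = HREF.matcher(page);              // follow hyperlinks
            while (m.find()) queue.add(m.group(1));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> site = Map.of(
            "/portal/home", "Welcome <a href=\"/portal/news\">news</a> <a href=\"/other/page\">external</a>",
            "/portal/news", "Portal news archive",
            "/other/page",  "Out of scope");
        // /other/page is linked but filtered out by the /portal prefix rule.
        System.out.println(crawl(site, "/portal/home", "/portal").keySet());
    }
}
```

    Real crawlers add politeness delays, robots.txt handling, and document conversion, but the fetch-filter-analyze-index loop is the same.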

     

    Completeness

    Ensure that the crawler can technically reach all content to which users typically have access. If a single entry point for the crawler is not enough, adding more content sources or defining a seed list, for example, a page with a list of URLs, and pointing the crawler to that list will certainly help.

    Ensure that all content that is hyperlinked from the home page or the initial URL provided to the crawler can be reached. Many content management systems integrate content into a Web site and ensure proper linkage among the pages. Many systems will allow you to generate a site map that can be provided to crawlers. A good starting point that finally leads to complete Web site coverage is essential.

    Note if there are too few pages crawled and indexed. The administrator in charge of setting up the search collection should have a good feel for how many pages there are. If the two numbers do not match, the crawler might not have full access.

    There is no perfect checklist about how to build a good site map or a seed list. This again is an iterative process. Include a person within the organization who is responsible for taxonomy as part of the planning and review process.

    There is a non-interactive command line tool called wget that is worth investigating. It is free of charge, provided under the GNU General Public License. If you are in doubt as to the validity of the collection coverage, try using this tool to explore the site structure. You might also include it in a regularly run UNIX cron job, for example, to check for changes in certain site map branches.

     

    Crawlability

    Crawlers, by definition, behave somewhat differently than users with a Web browser. In the past, crawlers did not honor cookie requests. The main reason was that cookies are often used for personalizing a Web site experience, and crawlers assumed anonymous access to Web sites. This was true through WebSphere Portal V5.0.x; in later releases, the WebSphere Portal Search Engine crawler began to offer support for cookies. The biggest remaining hurdle is JavaScript: crawlers do not interpret JavaScript at this time.

     

    JavaScript

    Crawlers do not provide a JavaScript interpreter of their own. One reason is, again, JavaScript's use for personalization and security. Therefore, the crawler simply skips JavaScript clauses found in the HTML sources. One could argue that URLs are often embedded in the JavaScript itself, and that a simple parser could help pull them out and use them as page references. However, in those cases, it is unknown whether the crawler is entitled to do so. An example is an “if-else” clause in JavaScript that provides one link for the anonymous user and another for the authenticated user. Although this is not a good practice in terms of security, JavaScript is often used this way.

     

    robots.txt

    The search engine looks for a special file named robots.txt. The robot directives are usually defined and configured by a Web site owner rather than by a portal site owner. If the crawler is prohibited from crawling the site or parts of the site, this request should be honored, even though the administrative interface allows the crawler to ignore the robots directives. If you really need to use this override, which is mostly only required within a company’s intranet, it is good practice to inform the affected Web master or site administrator and request permission to crawl those areas.

    Here are a couple of general reasons for setting up the robots.txt directive file:

    1. Enforcing copyrights on published materials.

    2. Locking out crawlers from visiting certain areas of the Web site, for example, preventing the submission of an order form or initiating other types of actions.

    3. Load balancing

      There are many crawlers active on the Internet. These can potentially hit a Web site one or more times per day, affecting performance. To protect the user experience of a Web site, many crawlers will be prohibited from visiting the site. Note that crawlers generate a much higher usage rate than typical users do. Consequently, only the very popular ones will be allowed to run.
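    As an illustration, a minimal robots.txt placed at the Web site root might look like this. The paths are examples only, and Crawl-delay is a common but nonstandard extension that not every crawler honors:

```text
User-agent: *
Disallow: /orders/
Disallow: /cgi-bin/
Crawl-delay: 10
```

    A crawler that honors these directives skips the /orders/ and /cgi-bin/ areas entirely and waits between requests, which addresses all three reasons listed above.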

    The robots.txt file applies to Web sites only. Such a mechanism is usually not useful for portal sites, because the robots.txt file relies on URL patterns being descriptive and usable as unique references to pages or subdomains of a site. In WebSphere Portal, however, the URLs are of a more dynamic nature, which renders such patterns unusable.

     

    Site maps

    Take advantage of a content management environment if available. These are typically able to create a site map for you on the first page, making crawling very efficient.

    1. A site map should include all pages that a crawler should fetch.

    2. It should not contain more than 100 to 200 links on a single page. If more links are available, place a “next page” link at the end and point to the continuation page.

    3. If placed on the first page, the crawler can efficiently traverse through the list of links. If it encounters additional links off visited pages, it then determines if it has already visited that page. The efficiency of this really depends on the crawler’s implemented logic.

    4. The site map is one option for determining the list of all pages to be crawled and indexed. This can also be provided by the home page if you are certain that the crawler can reach all subsequent pages. IBM has seen too many examples where this was not the case. An extreme example is the use of an initial Macromedia Flash splash screen without providing a crawler-friendly way to bypass the initial screen.
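    A seed-list or site map page following these guidelines can be plain HTML. The URLs below are placeholders:

```html
<!-- Flat list of links for the crawler; keep it under roughly 200 links
     per page and chain additional pages with a "next page" link. -->
<html><body>
  <a href="/portal/products/overview.html">Products overview</a>
  <a href="/portal/support/faq.html">Support FAQ</a>
  <!-- ... -->
  <a href="/portal/sitemap2.html">Next page</a>
</body></html>
```

    Because every target page is reachable within one or two hops of the starting URL, the crawler spends its time fetching content rather than discovering it.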

    Note if the crawler starts but comes to an end after a few minutes. You might have run into one of the cases where crawlability failed, perhaps due to JavaScript, robot directives, or a home page that does not offer full coverage.

    Note: If the WebSphere Portal Search Engine crawler is not able to crawl the site, the crawlers of Internet search engines such as Yahoo! and Google will not either.

     

    Configuration and administration of search collections

    Search collections are partitions of information logically grouped into independent content sources. The goal is to ease overall search performance, thereby increasing the quality of the search results. Part of the partitioning can be performed by referencing subdomains of the overall Web site, others by applying filters that define what content the crawler fetches and whether or not it is then also indexed.

    A sample partitioning for a search collection might be:

    1. Regular content

      Content that is updated frequently should be defined as a single content source. In many cases, the content is published as HTML. Here, it might be, for example, all standard HTML pages with a decent size of a 10 KB maximum.

    2. Static content

      This might be for archived materials. The update frequency can be adjusted to once every quarter or less often, for example.

    3. Large pieces of content or documents

      The processing of large items such as PDF and zipped files can take a considerable amount of time. It is important to strike a balance about what to index and how often.

    The following steps describe how to create a search collection:

    1. Create the search collection.

      Consider the trade-off between the features offered and the performance impact they might create. By using the non-default, advanced features, the processing time might increase from relatively small, for example, about 30%, to very significant. The factors relate to a number of parameters, such as pages per document, size of text, and size of vocabulary.

      However, users might appreciate certain features of the Portal Search Engine such as the summarizer.

      Therefore, do not enable every feature; instead, run performance tests, or at least do some calculations, before you promise nice-to-have features.

    2. Define one or more content sources.

      For each group within the search collection, define a single content source entry and provide adequate definitions and configuration to ensure that the best possible search collection is built.

      One of the definitions that you might want to adjust is the number of crawler threads. This can save system resources.

    3. Initiate the crawler process.

      A frequently asked question is: “It seems that the crawler always runs for hours even though it does not look like it is doing anything, because the hard disk does not seem to be in use. Is this normal?” You might have run into the problem of not adequately separating the search collections. Proper filters might not have been applied, and the crawler might have run into GB-sized ZIP files, so the document converter is now trying to generate an HTML representation from them.

    In summary, the advantages you gain from partitioning the search collection include:

    1. Throughput of data significantly increases. Your focus should be on content that typically changes often.
    2. Static content, parts of the site that are not updated very often, can be set up to crawl with very long intervals. This saves system resources.
    3. Processing of large documents can be isolated within a separate process that is invoked (scheduled) independent of the more rapidly changing content.

    Tip: It is worth the investment of the time to plan for search collection partitioning.

     

    Search collections in a cluster scenario

    WebSphere Portal Search Engine builds its search indexes on the file system of the server on which it is installed. There is no way of clustering these search indexes.

    We can install Portal Search Engine on one of the portal servers and then from within the cluster, configure WebSphere Portal to use that search service, thereby creating a remote search function. This does mean that you have a single point of failure as far as search is concerned. In the future, Portal Search Engine will build its indexes in the database and they will become part of the portal cluster.

    If you configured a remote search service for a portal cluster, set the default location for search collections to a directory on the remote server to which the search service has write access.

    It is very important to plan the search collections before you start the collection process. The portal site default search collection is created only once. This happens when the portal administrator selects the search administration portlet, Manage Search Collections. If this is done before you configure the portlet for remote search, the default portal site search collection is only available on the primary node of the cluster and is not available on the remote server. If this happens, re-create the portal site collection to make it available for search on all nodes of the cluster.

     

    Virtual portals

    A virtual portal is accessed through a specific URL mapping, which is associated with that virtual portal. By definition, each virtual portal must be accessed with a unique URI.

    Virtual portals can be customized to expose a unique look-and-feel. Different themes and skins can be assigned to different virtual portals. In contrast to true portals, you cannot use screens for login; because of the way screens are implemented, they are no longer a usable concept for virtual portals.

    Each virtual portal can have its own distinct user population.

    For the administration of virtual portals, the delegation model of WebSphere Portal access control is leveraged. Within a virtual portal, subadministrators can apply access control independently.

    Login and Selfcare are implemented as portlets and can be customized for a specific virtual portal. As already described, the implementation concept of screens did not allow a transformation to virtual portals. Therefore, the functionality formerly only available as screens was moved to portlets, which benefits more than just the virtual portals feature; for example, managing different versions is easier with portlets than with screens.

    A new administration portlet exposes a user interface to manage virtual portals.

    See also: Creating portal instances on demand

     

    True multiple portals vs multiple virtual portals

    True multiple portals consist of the portal software installed multiple times in a single environment. In contrast to running multiple virtual portals, you share fewer resources but get full individual configuration options.

    The table below gives an overview of what resources are shared in which configurations.

    This table also includes the column “Virtualization features,” which might refer to the IBM AIX 5L LPAR feature or VMware installations.

    IBM does not recommend VMware-based portals for production usage and will not support a portal implementation on VMware.

     

    Level of sharing

    Resources                                          Virtualization features   True multiple portals   Multiple virtual portals
    Physical hardware resources                        Yes                       Yes                     Yes
    Operating system                                   No                        Yes                     Yes
    WAS and WebSphere Portal libraries and settings    No                        No                      Yes

    Important: The more resources you share, the more dependencies you create.

    Due to the steadily increasing pressure to reduce costs, the pressure to share or reuse components and resources increases as well.

    Assume that an environment receives load only once a month or once a quarter. You want to find ways to leverage this idle hardware the rest of the time, but you will likely need to act strategically and thoughtfully, because it might be of high importance that this environment performs very well on that single day. Some new IBM AIX 5L features might help here.

    Similarly, we can save costs by hosting multiple applications on a single server. Combining just two applications on a single operating system saves you the administration and maintenance of one operating system; if critical OS updates become available, for example, security fixes, you have one less system to upgrade. Assume, however, that a project iteration has just finished and the team intends to upgrade its application. The new version might have different prerequisites, such as requiring a later OS level. If the other applications are not compatible with that upgrade, you must separate them again. Additionally, can you ensure that none of the applications will break the OS and therefore stop all the other applications?

    Note: By sharing resources to save costs, you also buy into risks.

    Admittedly, the technologies discussed here have been in use for many years, and there are operating systems with very good tools that ensure that certain applications get their allocated CPU time and stay within an established memory maximum.

    At the level of virtual portals, there are many technology dependencies between the portals, and we cannot assume that there are proven tools that ensure that the portals cannot damage each other.

    In summary, virtual portals are a great technology if you have the right use for them. For example:

    1. Portals that have the same development iteration cycle and the same development team
    2. Portals that use the same applications, maybe even the same user groups and have the same project owner
    3. Portals that have similar usage targets, thus requiring equal security considerations

    Some of these arguments might fit your portal systems; the question, therefore, is whether you can take the risk. On single portal systems, you often see problems caused by class loading, for example: a new, correct portlet gets deployed, yet problems still occur because of a previous mistake that had not yet shown itself. Projects often must look for memory leaks within one of the portlets or the deployed portal add-ons. If you increase the amount of code and the number of people who own code on the portal, these problems can multiply. In addition, note that logging is not scoped to individual virtual portals, because there is only one WebSphere Portal installation based on one WAS installation.

    From a project owner’s point of view, you can accept that the portal system is not up and running when some of your own code misbehaves. However, you might have a hard time accepting this when the cause is code from some other project.

    If you host more than one portal system and have the requirement to share resources such as hardware and OS, also consider installing WebSphere Portal multiple times in the environment. The cost factor that increases here is not CPU, and often it is also not administration. Today, because port conflicts are no longer an issue, the problems that you might encounter by hosting the virtual portals of separate portal project development teams outweigh the additional administration for an additional application server. However, the RAM requirements will be much higher.

    Tip: WebSphere Portal licensing does not distinguish between true multiple portals and multiple virtual portals. You are allowed to install as many portals on a machine, and create as many virtual portals, as you want.

    The interesting factor for licensing is the number of CPUs you are leveraging based on the pricing model.

     

    Developing a portal

    A well-designed portal can provide a common user interface and content base that can be integrated and leveraged across all portal applications to deliver a unified, collaborative workplace. This chapter discusses design, integration, and performance topics to help you create a first-class portal application.

     

    Customization versus advanced personalization

    Before we talk about developing a portal application, we need to discuss some terms that will help you understand how to create a unique experience for users. We use the term customization to mean the rendering of portlet content based on users’ preferences or the manipulation of the portal layout based on the users’ security attributes. We use personalization to mean delivering portlet content based on a business rule or collaborative filtering. Collaborative filtering uses statistical techniques to identify groups of users with similar interests or behaviors. Inferences can then be made about what a particular user might be interested in, based on the interests of the other members of the group.

    Customization centers around what resources (portlets and pages) you show the users based on their role. This is a core capability of WebSphere Portal. There are tools provided that help administrators provision these resources. Typically, portlets enable users to specify what kind of information should display. For example, a weather portlet might be customized to show weather in the user’s home town.

    Personalization can deliver content to a user based on a profile and business rule, or determine characteristics of a user based on previous purchases or pages viewed. Personalization then selects content that is appropriate for that profile. For example, if a person has a high salary range, personalization can be configured to retrieve information about premium products; the page is assembled with the proper information and the user sees the personalized page.
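    Personalization rules themselves are authored in the Personalization editor rather than written by hand, but the underlying idea can be illustrated with a small, purely hypothetical sketch: a rule inspects a profile attribute and selects a content category, and the page is then assembled from the selected content. The class, threshold, and category names below are invented for illustration:

```java
// Illustrative only: this is NOT the WebSphere Portal Personalization API.
// It shows the rule idea from the text: a profile attribute (salary range)
// drives the selection of a content category for page assembly.
public class RuleSketch {

    public static class Profile {
        public final String name;
        public final int salary;
        public Profile(String name, int salary) { this.name = name; this.salary = salary; }
    }

    // The "business rule": high salary range -> premium product content;
    // everyone else falls back to standard content.
    public static String selectContent(Profile p) {
        return p.salary >= 100_000 ? "premium-products" : "standard-products";
    }

    public static void main(String[] args) {
        System.out.println(selectContent(new Profile("alice", 150_000))); // premium-products
        System.out.println(selectContent(new Profile("bob", 40_000)));    // standard-products
    }
}
```

    In the real product, the rule condition and the content spot are configured declaratively, and the rules engine evaluates them against the user's profile at render time.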

    Include some planning time for deploying personalization so that you can optimize performance. Personalization includes a rules engine, a recommendation engine, and a resource engine. Although a full installation of WebSphere Portal installs and configures personalization functions by default, there are additional databases to consider. For example, personalization uses IBM DB2 Content Manager Runtime Edition for the storage of rules, campaigns, and other objects. There is a logging framework used to record information about Web site usage to a feedback database. The LikeMinds recommendation engine also requires a database to store information gathered through the logging APIs.

    To use personalization, we need content to display to the users. Therefore, personalization and content management go together. You will also need to understand where content is stored and how it is managed to optimize performance.

    The Techline sizing exercise does not yet take into account the additional resources (processor, memory, hard disk, and so on) that personalization might require. The performance lab will hopefully include this in their metrics in the future; for the moment, there are no good quantifiable rules to apply here.

     

    Developing portlets

    IBM Rational Application Developer enables you to design, develop, analyze, test, profile, and deploy Web, Web services, Java, J2EE, and portal applications within its integrated development environment (IDE). Rational Application Developer and the Portal Toolkit are tightly integrated with WebSphere Portal. You receive one license of Rational Application Developer for every WebSphere Portal server you purchase. Rational Application Developer generates model-view-controller (MVC)-compliant code for both the IBM and the JSR 168 portlet APIs. See “Caching” for more information about these two. The IBM Redbook IBM Rational Application Developer V6 Portlet Application Development and Portal Tools, SG24-6681, will help you get started with development:

    http://www.redbooks.ibm.com/abstracts/sg246681.html

    An excellent resource for developers is the IBM developerWorks Web site, provided by experts at IBM to assist software developers. developerWorks is a place to access tools, code, training, forums, standards, and how-to documentation. Refer to the following link:

    http://www.ibm.com/developerworks/

    As you begin to develop your own portlets, you will create many versions of the same portlet. Consider these portlets as assets and collect them as you would tools in a toolbox. Try to keep the portlets simple. Designing portlets to be flexible, providing the end user with many configuration options, is risky. This often confuses the end user and leads to unexpected results on the portal page. Consider creating multiple, single-purpose portlets instead.

    Tip: Avoid creating the super-configurable, multipurpose portlet. Your development goal is to keep portlets small and simple, making them easier for other developers to understand and to maintain.

     

    APIs and frameworks

    There are two portlet specifications available today. Which one should you use? One portlet container is for the new JSR 168 Portlet API and the other is for the IBM Portlet API that WebSphere Portal supported before JSR 168 was available. When creating a portlet, a developer has to declare which type to use. The IBM Portlet API offers more functionality than the Java portlet standard. For example, the IBM Portlet API makes better use of native WebSphere services such as retrieving authentication credentials. However, portlets built to the IBM specification cannot be used in any other portal, only WebSphere. IBM has indicated that it will eventually end support for the IBM Portlet API at some point in the future. It is not imminent, however, and it is possible to migrate from the IBM standard to the Java standard.

    If you need one of the Domino portlets, there is no choice; they support only the IBM API. However, in WebSphere Portal, IBM portlets can share data with JSR 168 portlets, so we can create new portlets to interact with them.

    Tip: Use the JSR 168 Portlet API whenever possible.
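    As an illustration, a minimal JSR 168 portlet is a class that extends GenericPortlet and renders a markup fragment. The class name and output below are examples, and deployment additionally requires a portlet.xml descriptor declaring the portlet:

```java
// Minimal JSR 168 portlet sketch (class name and markup are illustrative).
// GenericPortlet and the request/response types come from the javax.portlet
// API, so this compiles against the portlet container, not the plain JDK.
import java.io.IOException;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

public class HelloPortlet extends GenericPortlet {
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        // A portlet emits a markup fragment, never a complete HTML page;
        // the portal aggregates the fragments into the final page.
        response.setContentType(request.getResponseContentType());
        response.getWriter().println("<p>Hello from a JSR 168 portlet</p>");
    }
}
```

    Because it compiles against the javax.portlet API, this class is built and deployed with the portlet container rather than run standalone.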

    IBM’s strategic direction is based on the JSR 168 standard. For developers, this means that IBM portlets will most likely have to be migrated to JSR 168 at some point. In addition, JSR 168 will probably get better with time as new versions come out with added capabilities. In the meantime, IBM provides extensions to enable JSR 168 portlets to take advantage of some of the capabilities of WebSphere, according to Stefan Hepper, architect for the WebSphere Portal Server Development at IBM. Read his paper, Portlet API comparison white paper: JSR 168 Java Portlet Specification compared to the IBM Portlet API, available at:

    http://www.ibm.com/developerworks/websphere/library/techarticles/0406_hepper/0406_hepper.html

    Struts is a very popular framework for Web applications using a model view controller design pattern. The Struts framework can be used to effectively design Web applications and support development teams of different sizes and organizations. It is important to note that WebSphere Portal provides a Struts portlet framework to be used with JSR 168 portlets. A good example of where to use Struts is for portlets that require a wizard interface.

    JavaServer Faces is a technology defined by the JSR 127 standard that helps you build user interfaces for dynamic Web applications that run on the server. The JSF framework manages UI states across server requests and offers a simple model for the development of server-side events that are activated by the client. WebSphere Portal includes support for JSF portlet applications by providing a JSF portlet run time that makes it possible to run JSF applications as portlets in WebSphere Portal.

    JSF will dramatically change portlet development in the future. It is important to start learning JSF. Refer to the following excellent sources of information:

  • The WebSphere Portal Information Center:

    http://publib.boulder.ibm.com/infocenter/wpdoc/v510/index.jsp

  • An article about developing JSF portlets with Rational Application Developer:

    http://www.ibm.com/developerworks/rational/library/05/genkin/

     

    User interface design

    Successful project implementation begins with setting proper expectations. Be careful what you agree to regarding customizing the portal theme. IBM has often seen an organization hire a marketing firm to create a new image or presence for it on the Web. The engagement results in the delivery of a mock-up Web site in the form of a JPEG file. This file is subsequently handed off to the portal implementation team, sometimes well into the project, with the direction, “Our portal needs to look like this.”

    Understand that customizing a portal theme to match an exotic, elaborate, or complex Web site can be very time-consuming.

    Over and over, IBM has seen teams agree to create a custom theme without performing the due diligence necessary to include a reasonable time estimate in the project plan. IBM has seen project deadlines slip because this step was grossly underestimated. One way to mitigate risk here is to settle on the theme of the portal before the project begins. Another way is to educate the user interface design team about portal capabilities.

    Tip: Designers who understand the core functionality of the portal are more likely to design an interface that leverages those capabilities.

    There are several approaches to creating a custom theme. If you are new to theme building, and the theme is not too complex, try the IBM WebSphere Portal Theme Builder portlet. This might already give you a good starting point. We can download this portlet from the IBM Workplace Solutions Catalog:

    http://catalog.lotus.com/wps/portal/portal

    This is one of the most popular downloads and enables you to create new themes with customized basic branding. The portlet provides a preview window that shows you what the current theme will look like. This approach is good for those who do not have any HTML skills.

    A second approach is to copy an existing theme and modify it. This is a good idea if you are familiar with HTML and cascading style sheets. The basic installation of WebSphere Portal provides you with several themes. We can review the sample themes and select one that is close in design to the one you are working toward. We can use any tool with which you are comfortable, but note that Rational Application Developer has a built-in theme and skin designer. Here is a good article that describes, and provides as a download, several examples of themes and skins:

    http://www.ibm.com/developerworks/websphere/library/techarticles/0502_bartek/0502_bartek.html

    Again, be careful what you agree to design. Carefully manage the amount of content and the complexity of the layout, especially if you are new to WebSphere Portal. For a first theme, use only a single style sheet (styles.css). Using more than three or four navigation levels is not advised, because they become difficult to manage. Another tip is to be realistic about the number of portlets users have access to on one page; too many can result in a performance problem. Also, for better manageability, we suggest that you avoid placing portlets in rows; use the column containers instead. Another best practice is to add lightweight portlets, such as a bookmark portlet, to pages everyone accesses, and to add more complex portlets, for example, the mail portlet, to pages that users select.

    No matter which approach you take to design a custom theme, get end users interacting with it as early as possible. Do not wait until the end of the project to get feedback. Expect many iterative changes in this process.

     

    Markup generation

    No matter how well you design and implement your portal project, it is all worth nothing if the output it delivers is unsatisfactory.

     

    Pervasive access

    WebSphere Portal was originally designed as a pervasive portal by IBM Pervasive Computing Development. Therefore, its major goal was to be a portal that can be accessed from anywhere and from any device. Portlets can deliver different content for different markups, and each portlet has to declare in its portlet descriptor which markups it is able to deliver. Based on this information, WebSphere Portal will exclude certain portlets if they report being unable to deliver a certain markup.
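    As a sketch, a JSR 168 portlet declares the markups it can deliver in its portlet.xml deployment descriptor, with one supports element per MIME type. The portlet name and class below are hypothetical:

```xml
<portlet>
  <portlet-name>SamplePortlet</portlet-name>
  <portlet-class>com.example.SamplePortlet</portlet-class>
  <!-- HTML markup for desktop browsers -->
  <supports>
    <mime-type>text/html</mime-type>
    <portlet-mode>view</portlet-mode>
  </supports>
  <!-- WML markup for WAP clients -->
  <supports>
    <mime-type>text/vnd.wap.wml</mime-type>
    <portlet-mode>view</portlet-mode>
  </supports>
</portlet>
```

    If a device requests WML and a portlet declares only text/html, the portal can leave that portlet off the page rather than render broken markup.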

    This technique allows an easy implementation of pervasive portals, that is, portals that support a broad range of markups and therefore devices. We can design and create a portal for HTML access and have, at the same portal, a user interface optimized for Wireless Markup Language (WML) access, to be leveraged by Wireless Application Protocol (WAP) clients.

    Over the years, it turned out that the usage scenarios of, for example, an HTML-based and a WML-based portal sometimes differ significantly. Therefore, products such as WebSphere Everyplace Access and WebSphere Everyplace Mobile Portal were delivered. These products leverage WebSphere Portal as a core engine but add additional features on top of it. WebSphere Everyplace Mobile Portal, for example, extends the WebSphere Portal server to meet the requirements of mobile and wireless service providers. WebSphere Everyplace Mobile Portal also introduces an additional markup that enables developers to create applications independent of specific devices, targeting only certain broad device categories. For this, developers use a meta-markup called XDIME, which stands for XML Device Independent Markup Extensions. It allows markup to be generated once, without the author having to know which specific device it will be delivered to, and thus addresses the very short life cycles in the mobile device market. Based on a constantly refreshed device database, WebSphere Everyplace Mobile Portal decides how to generate the appropriate WML or XHTML markup that gets delivered to the real device.

    Still, some device characteristics have a big influence on, or are essential for, development. These include screen size, type of input device, connection speed, and computing power. To describe an extreme: it is obviously different to design an application for an IBM WorkPad c3 PDA with a 160x160 pixel black-and-white screen and a Palm Modem that can reach a maximum of 33.6 Kbps than for a Sharp Zaurus SL-6000 PDA that comes with a 480x640 pixel color screen, integrated WLAN, and even a small, integrated keyboard. Therefore, understand that although the portal can deliver the markup for certain devices, you have to ensure that the applications are also usable on those devices.

    Note: We believe that portals with a poorly designed user interface, and applications that do not leverage the mobile device's capabilities, are the main reason why there are not many successful pervasive portal systems in production.

    Being able to address the expectations of users who want to work with their specific devices will be a key factor for a pervasive portal project.

    For more information, consult the following articles about how to use WebSphere Everyplace Mobile Portal:

      http://www.ibm.com/developerworks/websphere/library/techarticles/0507_jadhav/0507_jadhav.html

      http://www.ibm.com/developerworks/websphere/library/techarticles/0411_burke/0411_burke.html

      http://www.ibm.com/developerworks/websphere/library/techarticles/0511_chen/0511_chen.html

    For more information about WebSphere Everyplace Access, consult the appropriate Redbook series WebSphere Everyplace Access Volume I-IV, available at:

    http://www.redbooks.ibm.com

     

    HTML browsers

    The problems of supporting various browser vendors and various versions of a browser seem to have decreased. One reason might be that the pace at which vendors release new browser versions has lessened as well. Still, if you use certain features, the portal application might not be usable in some browsers.

    “The original HTML documents recommended ‘be generous in what you accept,’ and it has bedeviled us ever since because each browser accepts a different superset of the specifications. It is the specification that should be generous, not their interpretation.”

    Doug McIlroy

    We can use tools, such as HTML TIDY, to check the compliance level of the portal's output against the latest standards and specifications.

    You are in a good position if you know what percentage of the users will use which browser.

    From our experience in the field, we have the impression that more technical users have switched to browsers based on the Gecko engine (such as Firefox, Mozilla, Netscape V6 and later, and others), while most others continue to use Microsoft Internet Explorer. Due to problems in the past, and to keep administration and support costs low, we also see companies with a policy defining which browser employees have to use. In general, however, by supporting and cleanly testing both Firefox and Internet Explorer V6, you should have already covered 80-99% of the users. Usually a minority of users have other engines, such as Opera and KHTML. If you support all four engines, you should be fine.

    We do not like to see iFrames on portal pages. Years ago, the main reason was that Netscape 4.x browsers did not support them, but you do not see these browsers frequently any more. Today the concerns are more about security and the portal concept. iFrames are essentially browsers on their own: with an iFrame in the page, WebSphere Portal is not in control of the iFrame's content, which runs counter to the portal concept and might lead to unexpected problems. If you run into a situation where you have to use iFrames, or where you strongly believe that such technologies are an advantage for the specific project, make sure that you understand the implied risks and all the steps necessary to get the iFrame working smoothly with the portal. This includes, for example, proper session refreshing. We can learn more about this topic in the dedicated white paper written by Richard Gornitsky and John Boezeman, The use of iFrames and Web Clipping in WebSphere Portal, which should be available in April at the WebSphere Portal developerWorks pages:

    http://www.ibm.com/developerworks/websphere/zones/portal/

    Additionally, it is a good approach to keep the JavaScript to a minimum. JavaScript becomes especially problematic if it leads you to put logic in the JavaScript. Your HTML and all its embedded elements should represent the View component. The business logic should reside completely on the server side.

    Cross-Site-Scripting restrictions in modern browsers prevent JavaScript inside one iFrame from accessing variables and traversing the DOM within the parent window (or other iFrames) unless they are within the same domain and use the same protocol, for example, HTTP versus HTTPS. This causes many iFrame-based applications to fail (if they rely on JavaScript, and many do). When there are Cross-Site-Scripting issues, variables end up blank, causing the JavaScript to fail or stop executing. Debugging is quite hard.

    For these reasons, many existing iFrame applications simply will not work without rewriting the JavaScript.

     

    Integration

    Integration with other systems is the most complex and time-consuming part of a portal project. At the start of the project, build a single portal node and integrate with the fewest systems necessary to make it functional. You will be tasked with configuration and integration of a Web server, a directory (LDAP) server, and a database server.

    Configure the first system, perform load tests, and get a performance baseline. Then, build another node, cluster the two, and retest to get a second performance baseline. See “Test” for more about test concepts.

    IBM has seen many installation failures due to insufficient privileges of the portal ID in the back-end system. It is most common in Oracle database environments.

    Important: Work closely with the database administrator (DBA). Insist that the portal ID be granted all required permissions during setup.

    Integration with back-end applications is costly because these projects tend to be complex and drag on longer than expected. Scope out these portlet development projects carefully, for example, by using the portlet sourcing method described in Appendix C, “Portlet sourcing”. You must understand the authorization model that is already in place for the back-end system before you develop a portlet. What we have observed is that every integration point tends to add another layer of bureaucracy.

    Additionally, if you are new to WebSphere Portal, try to avoid introducing too many new technologies at one time.

     

    Directory (LDAP) management

    WebSphere Portal and WAS require some form of user registry. There are several possible ways to provide WAS and WebSphere Portal with access to a user registry:

    1. Lightweight Directory Access Protocol (LDAP) user registry
    2. Database user registry (for Portal/Member Manager)
    3. Custom user registry

    What we have typically seen is sites using an LDAP user registry to store user information and to authenticate users. This section discusses the issues to consider if you plan to use an LDAP user registry with WebSphere Portal.

    Because the LDAP directory will typically host site data, employee data, or both, it is one of the most important and critical components in the organization. When building a portal system, it is therefore important to include a person on the infrastructure team who is already an LDAP expert and knows very well how to extend and leverage LDAP schemas. This is not necessarily a full-time person on the team, but a person who should be readily available to you at critical points in the project. For example, this person will tweak portal configuration files to work with the LDAP schema when you are enabling security for the portal. This expert should be comfortable using an LDAP browser tool and should know the LDAP server replication strategy as well.

    You will need to understand if you will install a new LDAP server or use an existing LDAP server (more common). IBM has seen several sites try to integrate WebSphere Portal with an unsupported version of LDAP. Verify that the version of LDAP is supported by WebSphere Portal by checking the Information Center:

    http://publib.boulder.ibm.com/infocenter/wpdoc/v510/index.jsp?topic=/com.ibm.wp.ent.doc/wpf/inst_req.html

    We can install the LDAP server on the same machine as WebSphere Portal or on a remote machine. Installing the LDAP server on a remote machine can improve performance and is therefore recommended. Be sure to have a discussion with the security architect in the organization to determine whether to secure the data flowing between the LDAP server, WebSphere Portal, and WAS. If the answer is yes, set up LDAP over Secure Sockets Layer (SSL).
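    As a minimal sketch of what LDAP over SSL means from a client's perspective, the JNDI environment for an ldaps connection differs from a plain one only in the URL scheme and the security protocol setting. The host names and distinguished names below are placeholders, not values from any particular installation:

```java
import java.util.Hashtable;
import javax.naming.Context;

public class LdapEnv {
    // Build a JNDI environment for an LDAP bind; URLs and DNs are illustrative.
    public static Hashtable<String, String> build(String url, String bindDn,
                                                  String password, boolean ssl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        if (ssl) {
            // LDAP over SSL (ldaps, typically port 636 instead of 389)
            env.put(Context.SECURITY_PROTOCOL, "ssl");
        }
        return env;
    }

    public static void main(String[] args) {
        Hashtable<String, String> env = build("ldaps://ldap.example.com:636",
                "cn=wpsbind,dc=example,dc=com", "secret", true);
        System.out.println(env.get(Context.SECURITY_PROTOCOL)); // prints ssl
    }
}
```

    The same environment would then be passed to new InitialDirContext(env); the server's certificate must also be trusted by the JVM's truststore for the SSL handshake to succeed.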

    A minimum of one group and one user is required for WebSphere Portal. The required group is wpsadmins or an equivalent. Members of this group have administrative authority within WebSphere Portal. It is expected that a WebSphere Portal administrative user will be a member of the wpsadmins group in LDAP, for example, wpsadmin, although Portal does not actually enforce this. If content management functions are configured, we recommend that you also create the groups wpsContentAdministrators, wpsDocReviewer, and wcmadmins.

    Tip: Do not use the same user ID for more than one purpose.

    You will need to configure WAS to access the LDAP server through the Member Manager. Member Manager is the common user repository management instance for WebSphere Portal. In this configuration, one or more user registries and, therefore, one or more realms can be created. A realm is a concept that denotes a specific body of users accessing a specific portal configuration.

    Tip: The recommended configuration is for LDAP with realm support, which will allow you to create virtual portals in the future.

     

    Collaboration components

    Collaboration features help people in the organization work together and share information online to achieve their business goals. A collaborative portal can improve the organization’s responsiveness, innovation, competencies, and efficiency. According to a survey done by META Group, 70% of those implementing a portal desire collaboration within their portal environment.

    Collaborative features within WebSphere Portal include Lotus Collaboration Center, the collaborative portlets, and the Lotus Collaborative Services API. To use these features in the portal, set up one or more of the supported versions of the following products: Lotus Domino, Lotus Sametime, and Lotus QuickPlace.

    The Collaboration Center is a set of pages deployed during the installation of WebSphere Portal, providing multiple customized instances of eight portlets including Lotus Web Conferencing and People Finder. These portlets depend heavily on the correct implementation of the LDAP server with the WebSphere Portal server. LDAP integration can be very tricky, especially if you have a customized LDAP schema. The Collaboration Center appears on the Workplace page of the portal.

    Setting up a collaborative portal should be treated as a separate and distinct subproject of the overall portal project. It requires additional planning and configuration, not only of the portlets but of the back-end servers that support them (Domino, Sametime, and QuickPlace). You must include a full-time person on the infrastructure team who is dedicated to the installation, configuration, and ongoing maintenance of the collaboration servers. Ideally, this person will have expert administration skills in Domino and already be familiar with Sametime and QuickPlace as well. If you do not have this skill on staff, allow time in the project plan for a person to develop these skills or hire a person for the project who does (preferred).

    Collaboration portlets require additional configuration for compatibility with external authorization products such as...

    • IBM Tivoli Access Manager
    • Netegrity SiteMinder

    Refer to the following Technote to understand what is involved for the integration with Tivoli Access Manager:

    http://www.ibm.com/support/docview.wss?uid=swg21191185

    How WebSphere Portal uses Domino and planning the user directory are two important topics to discuss early in the planning phase of collaboration integration. Configuration varies depending on which components you are using. For example, if you plan on using portlets for both Sametime and QuickPlace, the user directory has to be an LDAP directory and they both must share that directory. This is not true if you are only using Sametime and Domino portlets. Refer to the following topic in the Information Center for more information:

    http://publib.boulder.ibm.com/infocenter/wpdoc/v510/index.jsp?topic=/com.ibm.wp.ent.doc/collab/ksa_cfg_setupmgr_collab.html

    You need to consider the performance and availability of Domino servers when configuring WebSphere in a Domino environment. For example, to use a Domino LDAP server as the user repository for the portal, install WebSphere Portal on a separate machine from the Domino LDAP server. A Domino LDAP server for the portal should reside on a machine that is dedicated to serving the portal environment. Note that for i5/OS, we recommend that a specific Domino server be created to run the collaboration components and that the Domino server remain on the same i5/OS server as WebSphere Portal.

    Single sign-on between the Domino environment and the portal environment enables you to log in to the portal and then use collaborative portlets without having to authenticate a second time. A best practice is to install and configure all servers prior to enabling single sign-on. Note that if you complete the required single sign-on configuration between the two environments, there is no procedure to disallow automatic logins for a specific user. For example, if user A logs in to the portal, user A will always be logged in to Domino.

    If there is a non-Domino LDAP directory server in place, for example, IBM Directory Server, we can employ several strategies to integrate the existing directory with Domino and thus achieve single sign-on and awareness across any Lotus collaborative portlets. Refer to the following article from IBM developerWorks:

    http://www.ibm.com/developerworks/lotus/library/sso1/

    IBM has repeatedly seen projects fail at the point where the collaboration components are integrated. Do not underestimate the time and expertise needed to get this complex environment implemented correctly. You need expertise in two distinct areas: Domino and WebSphere Portal. Do not make the mistake, as many sites before you have, of assuming that one person or group can develop both skill sets during the project.

     

    Traditional systems

    IBM has built strong relationships with numerous independent software vendors (ISVs). Many of them have built portlets, even suites of portlets, to integrate with WebSphere Portal. Before you begin a development project integrating with an older software system, visit the IBM Workplace Solutions Catalog and check to see if the vendor has submitted portlets. The catalog is available at:

    http://catalog.lotus.com/wps/portal/workplace

    You will find more than 2,000 portlets here, including many for the most popular traditional back-end systems, such as SAP, PeopleSoft, and Hyperion. When you want to download a portlet, you might be directed to the vendor’s Web site; in that case, the vendor is responsible for any warranties, support, and licensing terms that relate to the portlets. In other cases, we can download the portlet right from the catalog and use it in the project.

     

    Performance analysis

    You must consider performance requirements in the planning phase of the portal system. By choosing the right topology (see also “Topology planning”) and defining reasonable non-functional requirements in the service level agreements (see also “Defining non-functional requirements as part of service level agreements”), you are part of the way there.

    We often see, however, that performance analysis begins after the application design has been implemented or, even worse, after the first stress tests reveal poor results. Most of the performance problems we have seen are the result of poorly designed applications.

    Tip: Tune the portal during stress/load testing, but design the application to perform well before you start developing.

     

    Caching

    Page response time is crucial for a better user experience.

    Caching can...

    1. Prevent a request from coming to WebSphere Portal at all.

      This can be true for referenced parts within the portal page, such as CSS files or images, for HTML fragments, or for complete portal pages if anonymous users access the portal.

    2. Avoid unnecessary rendering of portlets, which speeds up page response time.

    3. Prevent unnecessary back-end calls.

    Caching downsides include complexity and a history of buggy implementations. Buying an additional processor license might be cheaper than paying for a work-month of caching implementation.


     

    Where to cache?

    The general guideline for caching is to hold the cached data as close to the user as possible. This starts at the Web browser. Here, we describe each caching location in detail and point out any concerns.

    1. Web browser caching

      Web browser caches are a great thing because they reduce the amount of traffic to the system by orders of magnitude. There is a major difference between a user requesting a certain static file from the portal on every request and requesting it just once per session. There is nothing special to add here from a WebSphere Portal best practices perspective.

      However, although these are well-known and common best practices, we do not always see them applied. Typically, portals seem to suffer from overloaded pages.

      Due to the nature of a portal, one page might host many portlets and many different applications underneath supporting them. Each portlet or application comes with its own user interface definition files, such as CSS and JavaScript. This can lead to portal pages hundreds of kilobytes in size, or with an unreasonable number of referenced CSS and JavaScript files, which causes notable latency in rendering the portal screen. Robust client-side machines and networks might mask this problem in the development phase. Be certain to test the applications on a machine that a typical end user would have. More importantly, design the applications with these best practices in mind:

      1. Clean the generated HTML output of static style sheet elements and JavaScript, and move them into static .css or .js files that the Web browser can cache. Every byte counts.

      2. Remove heavy HTML comments in the HTML output. Use JSP comments instead of HTML comments in the JSPs.

      3. Check the referenced .css and .js files. Do they include the same style sheet, just differently named, or the same JavaScript algorithm? If so, try to agree on a single version. Note that this might not be possible for all portlets, for example, if you are using a portlet supplied by a vendor.

      4. Reference as few files as possible. Keep in mind that for every referenced file in an HTML page, the browser needs to open a port and make a request to the server, either to get the file or to get the answer that the file did not change and can be used from cache. Fewer files lead to better responsiveness. However, it does not make sense to place big JavaScript functions that only a minority of portlets use into a global JavaScript file.
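      Point 2 above (JSP comments versus HTML comments) is easy to demonstrate: a JSP comment is stripped when the page is translated, while an HTML comment travels to the browser in every response.

```jsp
<%-- JSP comment: removed at translation time, never reaches the browser --%>
<!-- HTML comment: included in the generated output, adds bytes to every page -->
```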

      Remember to consider security when planning what to cache; this often gets overlooked. When developing applications, keep in mind that the user might access them from an Internet cafe or other kiosk situation. Consider the possibility of unauthorized use of a browser cache.

    2. WebSphere Edge Server caching

      At this entry point to the server architecture, you want to cache as many items as possible. We do not have to use WebSphere Edge Server; it can be any reverse caching proxy. IBM has not seen any major issues with Edge Server working with WebSphere Portal, so it seems to be a good choice. On portal projects that do not cover all the server-side software, we often see the hosting company choose a reverse caching proxy product based less on features and more on what their administrators are familiar with. Make sure that it complies with your needs.

      The question becomes how much is a reasonable amount to cache at this layer. The discussion centers around two main topics:

      1. Security

        Because the Edge Server or any other reverse caching proxy that you use is usually located in the DMZ, you do not want to keep any personalized or security-relevant data in the cache. The cache at a proxy is really a big data pool of content, and there are no security borders that prevent unwanted access to cached data on this layer.

      2. Invalidations

        If you update or remove any resource in the back end, propagate that change immediately to all caches that hold the resource. Otherwise, you might experience some odd effects.

        To go any deeper here is out of scope for this paper. However, we recommend the best practice of always choosing the secure path, which means performing an invalidation more often rather than less. Additionally, make sure that you have test scenarios in place that cover any special cases that might be involved.
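        The invalidate-on-update principle can be sketched with a simple in-memory stand-in for the upstream caches. The class and method names here are illustrative, not a WebSphere API: every update to a back-end resource immediately removes the cached copy, so the next read repopulates it with fresh data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InvalidatingStore {
    private final Map<String, String> backend = new ConcurrentHashMap<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String read(String key) {
        // Serve from cache; on a miss, repopulate from the back end.
        return cache.computeIfAbsent(key, backend::get);
    }

    public void update(String key, String value) {
        backend.put(key, value);
        // Choose the secure path: invalidate immediately so no stale copy survives.
        cache.remove(key);
    }
}
```

        Without the cache.remove call, the second read after an update would return the stale value, which is exactly the "odd effect" described above.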

      For more information, refer to the appropriate consultants or to the WebSphere Edge Server documentation.

    3. Web server caching

      In the past, it was common to move any static data from WAS to the Web server. This had, and still has, the advantage that the file serving servlet of WAS does not need to deliver the GIFs and similar files; the HTTP server can do this on its own. Web servers are quicker at this, require fewer resources in terms of computing power, and save at least one hop (and sometimes more, due to firewalls and other servers) from a network perspective.

      Moving static files to the Web server is generally a good idea (we do not want to discourage you from doing this). However, it is often not worth the time, because the process tends to require a lot of time-consuming organizational coordination. IBM has seen projects where a considerable amount of time was invested to get every single graphic file out to the Web server. In trying to avoid a bottleneck where WAS needed to serve up a number of smaller graphic files, even graphics for portlets were placed on the Web server. This method can be very time-intensive because each time you update the WebSphere Portal wps.ear file, or any portlet that comes with static data, you need this extra step. It is usually much easier, and requires less coordination, if portlet developers are allowed to reference their items relative to their portlets and include them there.

      The WAS Web server plug-in does not just do load balancing; it also does a great job of caching. Refer to the Information Center of your WAS version (for example, http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp ) to read more about how to set this up. Depending on the size of the static files, this is often a better choice than going the traditional route of copying the static files to the Web server machine. In addition to the static files, the plug-in, Edge Server, and other caching proxy products support Edge Side Includes (ESI). This feature enables you to also hold HTML fragments in the cache. Depending on the portal project, it might be worthwhile to explore this topic. We usually see the bigger projects leveraging this feature.

      Double caching of static elements, where you cache them both in a reverse proxy in the DMZ and in the Web server plug-in, does not do any harm as long as the invalidations work correctly. You might also leverage ESI only at the plug-in level so that you do not have to worry too much about security issues that might arise from holding too much data in the DMZ (in case the Web server is located behind the DMZ).

    4. HTML fragment caching

      WebSphere Portal, by default, offers you the ability to configure various caching options. To understand these options, including cache expiration, be sure to study the Information Center, available at:

      http://publib.boulder.ibm.com/pvc/wp/510/ent/en/InfoCenter/wpf/tune_cache.html

      The internal mechanisms used for the default HTML fragment caching also rely on the WAS dynamic caching.

    5. Leveraging dynamic caching

      Because caching is considered a J2EE-level concern rather than a portlet API concern, the topic was not addressed during the definition of JSR 168, the Java Portlet API. Unfortunately, there is no publicly defined caching API from the Java community. However, we can leverage the powerful and partly unique capabilities of the WAS caching API, “dynacache.”


      Leveraging dynacache helps reduce bottlenecks in the system. For example, assume that you detect that the traditional systems are too slow to cope with the given non-functional requirements of the system. Assume that the target response time is three seconds. You observe that a call to the traditional system by itself takes two seconds. Then, you are likely to miss the target. If, however, the data in this system is needed by many portlets and retrieved by many users, it is advantageous to cache it. The latency toward the traditional system might then only appear once every 1000 requests.
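      A back-of-the-envelope calculation makes the point, using the numbers assumed above: with a 2-second back-end call and a cache that misses only once every 1000 requests, the average latency added per request drops to about 2 milliseconds.

```java
public class CacheAmortization {
    // Only cache misses pay the back-end latency; the average cost per
    // request is the back-end latency weighted by the miss ratio.
    public static double averageAddedMillis(double backendMillis, double missRatio) {
        return backendMillis * missRatio;
    }

    public static void main(String[] args) {
        // 2000 ms back-end call, 1 miss per 1000 requests
        System.out.println(averageAddedMillis(2000.0, 1.0 / 1000.0)); // prints 2.0
    }
}
```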

      Dynacache is a feature available with the latest versions of WAS and WebSphere Portal. There are already several excellent articles available that describe, by example, how to leverage those features. Search for them, for example, at the developerWorks WebSphere Portal zone, available at:

      http://www.ibm.com/developerworks/websphere/zones/portal/
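      The common usage pattern behind these articles is cache-aside with a time-to-live: check the cache first, and only on a miss call the back end and store the result with an expiry. In WAS this would typically go through the dynacache DistributedMap interface; the sketch below uses a plain map as a stand-in (all names are illustrative) so that the flow is visible and self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class TtlCache {
    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Cache-aside: return a fresh cached value, or call the (expensive) loader and cache it.
    public String get(String key, Supplier<String> loader) {
        Entry e = map.get(key);
        if (e != null && e.expiresAt > System.currentTimeMillis()) {
            return e.value; // cache hit; the back end is not touched
        }
        String v = loader.get(); // e.g., the slow call to the traditional system
        map.put(key, new Entry(v, System.currentTimeMillis() + ttlMillis));
        return v;
    }
}
```

      With a 60-second TTL, only the first request per key within each minute pays the back-end latency; all others are served from memory.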

    The following list describes some of the issues we commonly see, including some frequently asked questions for which we have not yet found any clear answers:

    – How much data should I allow to be cached?

    The answer is, again, it depends. Generally, the answer is as much as possible; but how much is possible depends on several factors. One factor to consider is the amount of memory you have available for the cache, because cached data goes into the heap of the JVM. If the garbage collector already has trouble finding enough free memory for WebSphere Portal, you might need to consider other approaches, because the garbage collector can consume the savings you gain by caching.

    Other obvious factors are the cache replacement rates, which we can derive from the cache hit and cache miss ratios. From these, you should be able to see whether you are trying to cache too many items within a given amount of memory.

    Note: While it is easy to see the number of entries in the cache monitor, it seems to be tricky to obtain the size of the individual objects.

    – Is it better to include many small objects or fewer bigger objects?

    Try to leave out the bigger objects. They consume a relatively large amount of space in memory, so the ratio of saved requests per available MB of cache is far better for smaller objects.

    In addition, from a user’s perspective, waiting 10 or 20 seconds longer for a large PDF is more acceptable than waiting a second longer on every single request to WebSphere Portal.

    – Is it better to leverage cache disk-offload or do less caching?

    This is a question of how expensive (in terms of money and time) the back-end calls are and of how much memory you have still available in heap to be used for cache data. As an option, you might consider increasing the heap size.

    Just be aware that reading from a fast database can be quicker than reading from a very large disk cache. It will be up to you to find the right cache size for the applications.

    – Do I want to use cache synchronization between the nodes in a cluster setup?

    Synchronizing cache data across the nodes of a WebSphere cluster is a very powerful feature, and thus it is important to use it carefully. We see two main issues with it:

    • The amount of data replicated between the node members can be an issue, usually not because of the load that is put on the network (current networks are very fast), but because of the load on the WebSphere Portal JVMs. This is especially noticeable during the startup of a JVM, when it begins to synchronize with all the other members. It is important to note that this process puts a substantial load on the already running JVMs. Be careful when doing this in a production scenario.

    • Depending on the setup, you might find that you gain or lose by having cache data local to each node member. The number of members in the cluster environment is not as significant as how much memory you have available for caching and how expensive the back-end calls are, expensive not only in time but also in money, because you might be in the situation where each call to a certain system costs a fixed amount. Replication might save, at a theoretical maximum, a factor equal to the number of node members. If, however, replication leads to a higher cache replacement rate, you are on the downside again.

    Important: Invalidations are always synchronized in a cluster within the dynacache implementations. To just synchronize the cache invalidations, you do not need to enable cache data synchronization.
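    The replication trade-off above can be put into numbers with a simple back-of-envelope model. All figures below are hypothetical: without replication, every node pays for its own cache misses against the back end; with replication, one node's fetch is pushed to all members.

```java
// Back-of-envelope model of expected back-end calls per key lookup, with and
// without cache-data replication across cluster nodes. The numbers are
// hypothetical; real savings depend on hit rates and on whether replication
// increases cache replacement.
public class ReplicationModel {
    static double backendCalls(int nodes, double missRate, boolean replicated) {
        // Without replication, every node misses (and fetches) independently;
        // with replication, one node's fetch is shared with all others.
        return replicated ? missRate : nodes * missRate;
    }

    public static void main(String[] args) {
        int nodes = 4;
        double missRate = 0.10; // 10% of lookups go to the back end
        double local = backendCalls(nodes, missRate, false);
        double shared = backendCalls(nodes, missRate, true);
        System.out.printf("local: %.2f  shared: %.2f  (saving factor %.0f)%n",
                local, shared, local / shared);
    }
}
```

With four nodes, replication saves up to a factor of four in back-end calls, exactly the "factor of node members" mentioned above.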

    – When would I want to use a cache domain?

    A cache domain can be of help if you require cache replication but also discover that point-to-point replication leads to a large performance drain. Whether to go with one or the other option depends on the setup and the applications. And again, check carefully whether you already have a fast performing database in place: buying into the complexity of a cache domain should pay off in a performance improvement.

    Note: Do not forget that we can configure the caches individually in order to have some synchronized and some not.

    – Can I just use the session to cache some user data instead of using dynacache?

    In some rare cases where you really need to cache data scoped to the user, you might hold the cache keys in the session to allow referencing the cached entries. Reasons not to put the data itself into the session include:

    • By concept, the session data should only be altered in the action/event phase of a portlet. To do otherwise is not just bad design, but might also lead to problems.

    • There is no configuration option to replicate such data within a cluster independently of the decision regarding session sharing.

    • The size of the session grows unnecessarily.

    • You mix data of different value: whereas cached data can be replicated, session data might not be.

    Note: Do not misuse the session for caching.
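    The pattern described above, keeping only a small cache key in the session while the data itself lives in the cache, can be sketched as follows. Plain HashMaps stand in for the HTTP session and for a dynacache instance so that the sketch is self-contained; the attribute and key names are invented for illustration and are not WebSphere APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the session holds only a small cache key; the (potentially large)
// user-scoped data lives in the cache. Plain maps stand in for the real
// HttpSession and dynacache DistributedMap; all names are illustrative.
public class KeyInSession {
    static final Map<String, Object> cache = new HashMap<>();   // stands in for dynacache
    static final Map<String, Object> session = new HashMap<>(); // stands in for HttpSession

    static void cacheForUser(String userId, Object data) {
        String cacheKey = "portfolio/" + userId; // user-scoped key
        cache.put(cacheKey, data);
        session.put("portfolioCacheKey", cacheKey); // only the key goes into the session
    }

    static Object lookupForUser() {
        String cacheKey = (String) session.get("portfolioCacheKey");
        return cacheKey == null ? null : cache.get(cacheKey);
    }

    public static void main(String[] args) {
        cacheForUser("jdoe", "large portfolio data");
        System.out.println(lookupForUser());
    }
}
```

This keeps the session small while still allowing the cache entry to be invalidated or replicated through the cache configuration.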

    – Because dynacache is easy to use, can I use it to share data between my portlets?

    No. Dynacache was designed and created to cache data. Although it is easy to leverage, this is not a reason to misuse its concept and implementation.

    In summary, depending on the portlet applications and portal system, leveraging dynacache is a technique where you have the chance to really gain a large amount of performance. In addition, some portal sites are not able to live without it.

    Note: WebSphere Portal also leverages dynacache for its internal caching strategies. For a cluster scenario, it replicates some cache data between the cluster nodes.

    6. Database/back-end caches

    This is out of scope for WebSphere Portal and therefore also for this best practices paper. Consult the database administrators and database specialists. Being able to tune the back ends in one way or another, however, often proves to be a major factor in the overall performance of the system. Therefore, it is worth checking whether there is the possibility to either add a cache or add, for example, more indexes to the database.

    Tip: Test the performance of the database back end even if you are not in charge of setting up and configuring it.

     

    What to cache?

    Similar to “Where to cache?”, the answer here has just as many possibilities. The question that remains is “What is possible?” or, because you want to cache as much as possible, “What would you not want to cache?”

    As already discussed in “WebSphere Edge Server caching”, there are two main problems with using caching techniques, security and invalidation:

  • Security

    Personalized data can only be cached on a personalized level, for example, in the browser or under cache keys bound to a specific user session. Neither is a good idea, because the data in the browser cache is not really secured, and manipulating cache data in the session is not recommended.

    Therefore, the best way to secure data is not to cache data that needs to be secured.

  • Invalidation

    As already discussed in “WebSphere Edge Server caching”, invalidation can be a tricky topic. Again, you want to make sure, with reasonable test scenarios, that whatever you cache can be securely invalidated again and that the reload cycles (cache timeouts) are not too long. With this safety net, even in the case of a non-working invalidation, you will be back on track after a couple of minutes. However, the time in between might appear to the user as an outage.
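    The timeout safety net described above can be sketched as a minimal cache with a time-to-live. This is an illustration of the principle, not the dynacache implementation; the clock is passed in explicitly so the behavior is easy to verify.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of timeout-based invalidation as a safety net: even if an
// explicit invalidation is missed, an entry expires after its time-to-live,
// so stale data disappears after minutes rather than living forever.
public class TtlCache {
    private static class Entry {
        final Object value;
        final long expiresAt;
        Entry(Object value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(String key, Object value, long now) {
        entries.put(key, new Entry(value, now + ttlMillis));
    }

    /** Explicit invalidation: the preferred path. */
    public void invalidate(String key) { entries.remove(key); }

    /** Returns null for missing or expired entries. */
    public Object get(String key, long now) {
        Entry e = entries.get(key);
        if (e == null || now >= e.expiresAt) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }
}
```

In production code the clock would be the system time; injecting it here makes the expiry behavior testable.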

    WAS comes with a cache monitor tool that we can use to view the static files and HTML fragments that are in the cache. This monitor consists of a couple of JSPs that call the appropriate WAS APIs. IBM has also seen sites extend these JSPs to their needs, which seems to be a good idea.

    Do not substitute a focus on caching strategies for functionality and clear design. A good design might still outweigh any caching ideas, or in the words of Tom Alcott, “You can’t tune or clone your way out of a bad application design.”

     

    Sessions

    This section provides additional information about how to treat sessions within a WebSphere Portal environment. Guidelines need to be recalculated for every specific project, but these should help you to apply best practices.

     

    Portal and portlet sessions

    There is often some confusion regarding portal and portlet sessions. In essence, the portlet session objects are namespace encoded session objects that are embedded within the portal session. There were also some changes taking place when moving to the JSR 168 portlet API. For a good overview of the differences, see:

    http://www.ibm.com/developerworks/websphere/library/techarticles/0312_hepper/hepper.html

     

    The importance of size

    One of the most frequently asked questions about the earlier WebSphere Portal portlet API was how to cast the portlet session down in order to read from and write to the global HTTP session. It was a recurring question because the answer changed over time due to changes in the underlying WAS versions. A common add-on to this question was, “Why is such an essential functionality not included in the standard API?” The answer lies in the original design and in the good chance that such access gets misused. Indeed, it was common to see it used as shared memory to exchange data between portlets. This is not a good use because it can lead to massive amounts of data in the sessions.

    There was the argument that session data can be saved by allowing a global scope, especially if more than one portlet has the intention of saving the same data in its portlet session. This was ultimately addressed in the JSR 168 portlet API, where session data can now be addressed in a global (application) or private (portlet) scope. However, do not use it, for example, to send messages from one portlet to another.
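    As a rough illustration of the namespace encoding mentioned earlier: JSR 168 stores portlet-scoped attributes in the one underlying HTTP session under encoded names of the form javax.portlet.p.&lt;window ID&gt;?&lt;name&gt;, while application-scoped attributes keep their plain names. The sketch below mimics this with a plain map; it is not the portal's actual implementation, and the attribute values are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of how JSR 168 maps the two portlet session scopes onto the single
// underlying HTTP session: APPLICATION_SCOPE attributes are stored under
// their plain names, while PORTLET_SCOPE attributes are namespace encoded
// ("javax.portlet.p.<window ID>?<name>" per the JSR 168 specification).
// This is why every portlet still adds to the one global portal session.
public class ScopeEncoding {
    static final Map<String, Object> httpSession = new HashMap<>(); // the one global session

    static String encodePortletScope(String windowId, String name) {
        return "javax.portlet.p." + windowId + "?" + name;
    }

    public static void main(String[] args) {
        httpSession.put("sharedData", "visible to all portlets");              // APPLICATION_SCOPE
        httpSession.put(encodePortletScope("w17", "draft"), "private to w17"); // PORTLET_SCOPE
        System.out.println(httpSession.keySet());
    }
}
```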

    The size of a portal session will naturally be larger than the best practices numbers that a standard WAS-based application suggests. This is true because a WebSphere Portal server session on its own, excluding portlets that put additional data into the session, uses about 4-5 KB. This number was already reduced in the latest releases. For more information, see:

    http://www.ibm.com/developerworks/websphere/techjournal/0505_col_hines/0505_col_hines.html

    Because portlets sometimes include the complexity that whole Web applications previously had on their own, the portlet contributions can multiply the global session size. This is not a best practice, because portlets are supposed to be far more lightweight. Session sizes that are too large will eventually lead to problems. As a rule, you might want to target a single-digit KB number for the overall session size. This is not a hard number that you have to reach in order to have a successful portal project, but no matter how the portal system is configured, we believe it is a reachable number.
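    A simple way to keep an eye on session weight during development is to serialize the attributes a portlet puts into the session and count the bytes. The sketch below shows the idea with made-up attribute values; the 4-5 KB base overhead of the portal session itself comes on top of whatever this measures.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Measures the serialized size of session attributes. The sample attribute
// map is invented; in a real project you would pass the objects the portlet
// actually stores in its portlet session.
public class SessionSizeCheck {
    static int serializedSize(Serializable o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, Serializable> attrs = new HashMap<>();
        attrs.put("lastSearch", "websphere portal caching");
        attrs.put("resultCount", 42);
        System.out.println("session attributes serialize to "
                + serializedSize(attrs) + " bytes");
    }
}
```

Serialized size is also what matters for session persistence and memory-to-memory replication, so this number is a useful proxy for both heap and replication cost.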

     

    Session sharing in a cluster environment

    Assuming that you want to use session sharing, you can either share the session through the database or use memory-to-memory replication. For a closer description of both techniques, refer to the WAS Information Center, available at:

    http://www.ibm.com/software/webservers/appserv/was/library/

    You might assume from reading this description that you would see a great deal of performance improvement by using memory-to-memory replication, because you do not need to go to the database to persist the session there. In reality, however, you will not notice a real boost, because most of the computing time relevant for performance goes into serializing the session object rather than persisting it.

    Years ago, on portal projects based on early WAS V5 products and earlier, IBM recommended that sites not use memory-to-memory replication because of a large number of problem reports. If a new product feature does not give you much of an improvement, select the well-known and often-used approach. Naturally, you will always suspect such a new feature in case of a problem, even if there is no rational basis to do so. Having too many new features at once might, therefore, cost you valuable time during critical problem determination.

    With V5.1.x and later of WAS, you see more and more sites adopting the technology of memory-to-memory replication. Another way of analyzing which technology to deploy is to evaluate how much control you have over the environment. For example, if you use database persistence, but do not have full control over the database, you might experience delays during development and while troubleshooting. However, a valid reason for choosing database persistence might be that you run into trouble with the heap size of the JVMs. Every session adds to the heap of every single portal JVM. Large sessions and a large number of concurrent sessions might influence the decision.

     

    Deploying, testing, and maintaining a portal

    Poor testing is one of the five key reasons why projects fail. Testing is one of the most important items in a portal project.

     

    Unit tests

    Unit testing is mandatory.

    We can use JUnit to test back-end calls. Unit tests are also a recommended process within Rational Unified Process. Unit tests and the JUnit framework are supported by Rational developer tools.

    Developers will require well-equipped workstation hardware to use Rational tools, including 2 GB minimum physical memory.
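    As a minimal sketch of such a unit test: the back end is replaced by a stub so that the portlet logic can be tested in isolation. QuoteBackend and quoteWithMarkup are invented for illustration, and plain assertions are used so the example is self-contained; in a real project this would be a JUnit test class with @Test methods and assertEquals.

```java
// Sketch of a unit test for a back-end call. The back end is replaced by a
// stub, so the business logic is tested without the real system. All names
// here are hypothetical; in practice this would be a JUnit test class.
public class QuoteServiceTest {
    interface QuoteBackend { double rawQuote(String symbol); }

    /** The unit under test: applies a fixed 1% markup to a back-end quote. */
    static double quoteWithMarkup(QuoteBackend backend, String symbol) {
        return backend.rawQuote(symbol) * 1.01;
    }

    public static void main(String[] args) {
        QuoteBackend stub = symbol -> 100.0; // stub replaces the real back end
        double quote = quoteWithMarkup(stub, "IBM");
        if (Math.abs(quote - 101.0) > 1e-9) {
            throw new AssertionError("expected 101.0, got " + quote);
        }
        System.out.println("quoteWithMarkup test passed");
    }
}
```

Writing against an interface like this also makes it easy to swap the stub for the real back-end client later.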

     

    Daily and weekly builds for integration tests

    Good code is checked into a central version control system such as CVS.

    Rational Software bundles IBM Rational ClearCase.

    The code maintainer would now create scripts that compile the sources that are in the version control system and create deployable packages. Each day, the code maintainer would create a daily build and deploy it to the integration development environment, and weekly, the maintainer would do the same with more mature, marked code to the tester integration environment.

    IBM recommends establishing the role of a code maintainer. This person probably does not need to be on the project from the first day that the developers start, but understand that it will take some time to have scripts ready that automate the process. Depending on the size of the project, it often turns out to be a full-time job.

    In some cases, it might be required to have an additional environment that enables the developers to directly deploy some of their components. This is, for example, true if the developers have an insufficient test environment or no test environment at all on their local workstations.

    IBM has seen time savings by using VMware (http://www.vmware.com ) images for such environments. If developers deploy code on their own, they might not be sure about how to revert cleanly. Having the policy of creating snapshots before using those environments will ease backups and prevent the time-consuming activity of finding problems.

    Developers require such an integration environment to see their code working for the first time, integrated with the code of the other developers in the project. Here, you will often recognize things such as class loader problems (sometimes less experienced portlet developers might assume that they can put libraries in central locations, such as the WAS /lib directory). This helps highlight dependencies, which developers might not think about when deploying code on their local workstation. Therefore, a dedicated team member must oversee the deployment. In such an environment, developers might also see their code working with real back-end test environments for the first time. On their local workstation, they would typically write the code against some stubs or dummy test environments, because it is often not possible to grant every developer direct access to a back-end test environment.

    Testers will expect a more mature and less frequently changing environment for their function tests (versus daily deployments). Because cycling through some test cases might require quite some time, it is not an ideal situation if the code base changes every 24 hours, possibly leaving the system unavailable for two hours a day (due to deployment and possible deployment problems). One week is a frequently chosen time frame. Depending on the project, you might find an even more reasonable time frame.

    Every problem the function testers detect will be recorded in a bug tracking tool. Testers will mark the build version they used when reporting the problem. Verify that the developers know which code version in the version control system the build belongs to. In addition, establish methods that ensure that the right developers get assigned to the reported bugs and that the bugs are solved by priority. There are a number of commercial bug tracking tools available, and lately we have seen the frequent and successful use of the open source tool Bugzilla.

     

    Management and business process tests

    Do not underestimate the importance of keeping the top management and the project sponsor happy. In Figure 4-1, we also added a layer for business tests that you rarely find in general project documentation.

    Because portal projects are usually integration projects that tend to be big and thus cost-intensive, the management might be especially concerned about the progress of the project. In addition to delivering regular progress reports, we learned that acceptance is higher if they are able to see a pilot or test system as soon as possible. Do not worry about disappointed reactions to the current functionality: they will understand that user interaction will be slow and functionality incomplete, but they will honor the progress they can actually see.

    If an additional demonstration environment does not cost too much time and money, it is often worth establishing, even if it is just for the sake of the top management. Be careful about promising demonstrations without such an environment. We can “lend” any of the environments to do some demonstrations, but understand that the team will lose quite some time freeing that environment and ensuring that the demonstration works. Having a demonstration environment enables people to experiment with a certain set of almost stable components at any time.

    There is sometimes a requirement for a business process test environment, which can be similar to the demonstration environment, consisting of a single server portal environment.

    IBM has seen companies that were required to ensure that a certain business process that previously had a different user interface still works correctly through the portal. Unlike the functional tests, these tests are dedicated to overall scenarios. While the function tests ensure that the right output is returned for a certain input, these tests are concerned with the systems involved in the background, for example, ensuring that requests to certain traditional systems do not lead to corrupt or inconsistent data over time, or that a request triggers a workflow that leads to a printed letter for the site. In such cases, the function tests might not evaluate all systems involved in the workflow. Long-running (in terms of days or weeks) transactions are often involved here. You do not want to hold any other environment for such a long time, but these testers do not want anything changed during that time. Therefore, it is good to provide a special environment, even if it is not fully equipped.

    Note: Every additional environment requires additional time from the administrators, and every person on the project who is waiting for an environment causes possible delays in the project.

     

    Non-functional tests

    As a natural prerequisite for non-functional tests, you will require the proper hardware and environments. We discuss what you should do to get started and then continue by elaborating on what you can do if the results are not as expected.

     

    Prepare for non-functional tests

    While functional tests are relatively well prepared in most portal projects, projects suffer and sometimes even run into long delays because of insufficient non-functional tests. Is it difficult to write well-performing portal components? Not very.

    One reason might be that the procedures of functional tests equal those of usual J2EE projects. Portal projects, however, usually involve complex environments, because these projects target the integration of environments. This again leads to a high level of complexity, which adds difficulty for any part of an IT project (including security, as described in “Security concepts”). We can try to reduce complexity (no users, no back-end systems, no cluster, and so on), but this runs counter to a portal project’s goals. After carefully reading “Planning a portal”, you might have already recognized that we push in that direction to save money and reduce project risks.

    Now, with an architecture in place with a certain topology and a certain application, you have to make sure that this is all going to work as you defined in the service level agreements (see “Defining non-functional requirements as part of service level agreements”).

     

    Environments

    One of the most frequently asked questions about non-functional tests is whether it is possible to use the integration environment for some non-functional tests. As we can see from Figure 4-1, these environments have distinct goals. Different people are working on the environments, and the environments are also of different hardware sizes.

    The load test environment must be a mirror of the production environment. Of course, it might have fewer CPUs. In some cases, it might also have less memory. It might even have fewer machines within a cluster.

    Important: It is important that the load test environment is a mirror of the production environment and must, therefore, match the production environment topology.

    You will not get representative results if the load test environment is not equal in topology. To save costs, projects often put all components on one big single box and do their load tests on that environment. We know that budgets are tight everywhere, but doing the load tests in such a way simply wastes money.

    If you build a completely new system, it is sometimes possible to leverage the environment that will later be used as the production environment for the load tests. This provides a big advantage in that we can compare the results of the load tests with the real results just after going into production. You will never be able to simulate the real world in the load tests with 100% accuracy, but it is good if you are able to create test scenarios that exercise the components with only a 20% discrepancy. Some items, such as user behavior and connection line delays, will always be hard to reproduce. If you perform the tests on an absolutely equal environment or the production environment itself, you will have good numbers to compare. Unfortunately, this does not remove the requirement for a load test environment.

    Important: You need a load test environment.

    A portal project will always be an iterative project, and the iterations might happen earlier than you expect them. In addition, you cannot save time by building up the load test environment only after you have gone into production. You would then lose the advantage of comparing figures with load tests on an environment that is equal to the production environment in size.

     

    Stress tests

    Because some sites worry about how the portal system will perform, they understand why load tests are required. Stress tests are done, where the word “stress” is taken literally. While this type of task force gets high management attention, the tests themselves are sometimes designed poorly. For example, we have seen a test generator that was programmed by one of the developers: it hit the portal system with an enormous number of requests, neither analyzing the responses nor caring about realistic user behavior.

    Building reasonable load tests requires experience. Here, we mention a couple of important parameters:

    1. User behavior such as “think time.”
    2. Login process of users and the percentage that explicitly log out or get implicitly logged out by session invalidation.
    3. Application usage, for example, which applications are most frequently used.
    4. Transaction usage, such as how many transactions get submitted to the back end (for example, a user might use an application often, but frequently cancel before submitting).
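    The first parameter, think time, largely determines how many requests per second a given number of logged-in users actually generates. By Little's law, throughput is roughly the number of concurrent users divided by the sum of response time and think time; the numbers in the sketch below are hypothetical.

```java
// Little's law applied to load test planning: throughput (requests/second)
// equals concurrent users divided by (response time + think time).
// All numbers are hypothetical.
public class LoadModel {
    static double requestsPerSecond(int concurrentUsers,
                                    double responseTimeSec,
                                    double thinkTimeSec) {
        return concurrentUsers / (responseTimeSec + thinkTimeSec);
    }

    public static void main(String[] args) {
        // 1000 logged-in users, 1 s response time, 9 s think time
        System.out.println(requestsPerSecond(1000, 1.0, 9.0) + " requests/second");
    }
}
```

A load test that ignores think time would hit the same system with ten times the request rate these users would really generate, which is exactly the kind of unrealistic test generator described above.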

    In addition to these general factors, which non-functional testers would know, here is a best practice learned in portal projects: in many cases, it was helpful to do a so-called baseline test and then add as few components as possible at a time.

    This means that you should first do the load test right after the installation of the base software components. This enables you to have a test available that you can compare directly to the tests done in the software laboratories. If these numbers differ substantially from the numbers that you got from the Techline sizing, this may indicate a problem. Are the tests correct? Are there any problems in the environments, for example, on a network or operating system level? Do not continue unless you have some convincing answers.

    The following example describes this clearly. IBM was called to a site because of unresolvable performance problems with its portal. While analyzing the system, it turned out that even a static HTML page on the HTTP server suffered from the same performance problems as the whole portal. In the end, it became clear that a wrong router configuration was the reason that the system was not able to keep up with the requested response time. These are unlucky situations that lead to massive time delays.

    Testing to failure means that you assume that things will break and that you will need to fix them.

    Baseline tests are often not done because non-functional tests start too late in a project phase and creating them is considered a throw-away effort. The test scenario on a default WebSphere Portal installation is different from the test scenarios on the custom portal system, and there is not a lot of reuse possible. We believe it is worth the time nevertheless.

    Continuing from there, you should apply an iterative approach here as well, adding as few components as possible at each step. For example:

    1. Start with a non-clustered environment and an out-of-the-box WebSphere Portal implementation.

    2. Cluster the portal system.

    3. Exchange the portal’s default security system (LDAP with a special schema) with the custom security system, but do not add any portlets yet.

    4. Exchange the portal’s default themes with the custom themes, but do not add any of the portlets yet.

    5. First add those portlets that seem to be critical to you from a performance point of view.

    Do not continue until you are convinced that you know what is happening and why the numbers changed the way they did. Make sure that you document the results in a way that anybody (for example, a performance specialist) will understand it several weeks later.

     

    Testing for the service level agreements

    If you followed the advice in “Defining non-functional requirements as part of service level agreements”, you defined acceptable performance requirements in the service level agreements and used Techline sizing to estimate the proper hardware configuration. Now, you are ready to deploy the first application and perform a stress test, making sure that it meets the performance requirements. Load test the environment to the point of failure. Tweak individual components, including every portlet, to achieve optimum performance.

    Tip: It is good to outperform the negotiated service level agreements, but because tuning requires time and money, consider it as an extra.

     

    Load test tools

     

    Resolving performance problems by code reviews, profiling, and more

    Usually, at the point where a project detects that it will not meet the required performance, code reviews and profiling are requested. We discuss this topic by starting with a couple of lines of code:

    // Found in production: trimming the backing array on every mutation.
    class MyList<E> extends java.util.ArrayList<E>
    {
        public boolean add(E o) { boolean changed = super.add(o); trimToSize(); return changed; }

        public boolean remove(Object o) { boolean changed = super.remove(o); trimToSize(); return changed; }
    }
    

    This might be a bit too simple, but it was found at a large B2C portal project where experienced developers were working, and it had a definite impact on the garbage collector.

    What happened? At every request, a large number of objects were added to a big list of the type MyList. Because the list was always trimmed exactly to size, each additional object required the Java implementation to allocate new memory, copy all elements over to the newly allocated array, and discard the previously used memory. The Java implementation is normally smart enough to allocate a bit more memory in order to make sure that this does not happen the next time an object is added. However, the trimToSize() made sure that the additional bit was cut off again.

    It is very likely that the developer thought that the list was going to be big and called trimToSize() to save memory. What this developer did not know or understand when writing the code was that the method gets called many times during a single request. Just because of this method, each request was allocating megabytes of heap only to throw them away immediately afterward. Because it was a high-volume portal that had to support more than 100 concurrent requests per second, the portal JVMs were straining under the load. No amount of tuning helps when a small line of code is in the wrong place.
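    The straightforward fix, sketched below, is to let ArrayList manage its spare capacity (pre-sizing it when the final size is known) and to trim at most once, after the list is complete.

```java
import java.util.ArrayList;

// The fix for the anti-pattern above: do not trim on every mutation. Let
// ArrayList keep its spare capacity while the list grows, and trim once at
// the end if memory really matters.
public class TrimOnceList {
    static ArrayList<Integer> build(int n) {
        ArrayList<Integer> list = new ArrayList<>(n); // pre-size when the final size is known
        for (int i = 0; i < n; i++) {
            list.add(i); // no per-add reallocation, no trimming
        }
        list.trimToSize(); // at most one trim, after the list is complete
        return list;
    }

    public static void main(String[] args) {
        System.out.println(build(10_000).size());
    }
}
```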

    Developers make mistakes, but unfortunately we do not think that it was only the developer’s fault. To be clear, it is not a good idea to put trimToSize() within the add/remove methods, but we see a different trend: many projects do not seem to be very well organized. Skills, especially development skills, are requested like a shopping list. Developers come in, get set up, write their code, and are pushed to another project. Therefore, components are developed by different people with different perceptions of the code’s usage. In addition, sometimes parts of the development go to off-shore developers, which might lead to a setup through telephone calls and requirement papers that were cobbled together.

    We are not proposing that off-shore development is necessarily a detriment to projects. Instead, we want to describe the risks that you buy into with such an approach. The likelihood that the portal does not perform as assumed by the Techline sizing is higher, as is the risk of project delays. If you do not insist on cleanly working code, you might pragmatically calculate the costs and risks.

    For example, every developer and IT architect will usually tell you that it is not necessary to buy new hardware when the portal does not meet the performance requirements. However, a pure cost calculation might reveal that it is sometimes a valid option, for example, if you are running out of the projected development time and you intend to discard the code or extend it in the next iteration anyway. In that case, you gain the chance to first add the hardware and then solve the problems in the code.

    A code review of the project might help, but its value is usually overestimated. In the example above, an experienced reviewer would have detected that the garbage collector handles a lot of memory per request and would then, for example by profiling, have looked for the classes responsible for allocating that memory during a request. This can require an extensive period of time, the proper tools and environments, and perhaps a bit of luck. Therefore, if you do not have the proper environment ready, you will lose even more time and money.

    Often, code reviews happen on a more general basis. Due to the usually large amount of code, the reviewer will only be able to check certain subsets of the code. The code is compared to general best practices, and some hints and tips are given. IBM has rarely seen this have a major effect on a project. We rather tend to see it as an instrument for project management to underline their opinion that the code is not at fault and to show that they have managed their project well.

    So does profiling help? If it is done well, it can help. An advantage of profiling is that it can be effective even when done by people who do not know what the portal system is supposed to do.

    The correct approach is to profile as early as possible. The developers themselves should profile their code. In reality, they often do not get the time to do it, because time lines are generally tight. It requires some experience to discover which parts of the code should get a high priority regarding profiling and which parts are of less interest. This gets more difficult because, with WebSphere Portal, you have a product that brings much functionality with it, so people get confused by the number of classes and request flows. Profiling the whole product is, however, the wrong approach. While profiling, try to exclude all classes that belong to the WebSphere Portal product; you will not find any problems in these classes. In addition, many profiling tools break if you try to profile a whole WebSphere Portal installation.

    A very easy way to enable profiling is to use the IBM Java built-in tracing facility. It leverages the Java Virtual Machine Profiler Interface (JVMPI) hooks and requires fewer processor and memory resources than a “regular” profiling tool.
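    For example, a method trace for the portal's custom portlet code might be enabled with a generic JVM argument along the following lines. The package name is hypothetical, and the exact -Xtrace option syntax depends on the IBM JDK level, so verify it against the Diagnostics Guide for your JVM before using it:

```
-Xtrace:methods={com/example/portlets/*},print=mt
```

Restricting the trace to your own packages, as shown, follows the advice above to exclude the WebSphere Portal product classes.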

    For a detailed description of the trace features and functionality, refer to the Java Diagnostics Guide

     

    Tuning

    Before you start tuning the portal environment, refer to the IBM WebSphere Portal Tuning Guide

    In general, we recommend that you tune parameters only when you understand their impact and are thus fully aware of what you are doing.

    Furthermore, within a portal system, there are many components other than WebSphere Portal and the applications that might be a possible bottleneck, for example, the network or the operating system. The components around authentication, such as an LDAP server, are often even more important. Database tuning is another key element to remember.

     

    After going live

    If the project does not end the day the portal system goes live, tests need to continue as well.

     


    Staging and preproduction environment

    At this point, we also discuss the staging and preproduction environments included in Figure 4-1. These are combined in some projects, and you will need them for preparing the next iterations before you can move new items to the production environment. On these environments, you also test the deployment of the new items. Therefore, never add anything manually, just as you would not add anything manually to the production environment. This is the last step to ensure that the deployment scripts work correctly.

An example where you would not want to combine the preproduction with the staging environment is when a business process requires a preproduction stage. The staging environment should be dedicated to updates that include a technical change, such as adding another portlet.

Keep the load test environment separate from the staging environment. The environments have different targets and are generally used by different people. Saving costs here can seriously hurt the maturity of the portal system.

One way to use a preproduction or production-test environment is for enhanced demonstrations, for example, to preview possible new features. In this case, you might want to leverage a “portal jail” using Web Services for Remote Portlets to provide these features. A portal jail is an untrusted portlet server that hosts portlets following a less stringent QA process. Using Web Services for Remote Portlets, you still comply with security and quality assurance guidelines, and you have a perfect environment for producing demonstrations for sites or lines of business within the organization. It can also be a very effective way to try out “what if” scenarios or to try out new portlets in a production-like setting.

You might argue that it is cost intensive to maintain these environments at all times and that some consolidation is naturally needed. In cases where everything works perfectly, this might be true, but in reality, our experience is that it does not work out successfully. A distinct load test and staging environment is required.

     

    Resources

An interesting pattern is that after a portal system goes live, the people involved with the project leave it. Their work is done, but the system's work continues: developer and test resources are replaced by maintenance resources.

It is a rare project where the maintenance team or the developers of the next iteration do not complain about insufficient documentation. It is never possible to have enough documentation. So many things happen during a project that even the best documentation might not cover them all, for example, which components were frequently the source of problems, or which back-end systems delivered strange responses even though those issues were never resolved.

Therefore, whenever possible, keep some of the original people on the project. With some leaving for new challenges and others coming in with new ideas, quality stays high.

Although it might be difficult, we also believe that it helps if people are not treated as interchangeable resources that can be removed and added like any other on demand hardware component.

     

    Release planning

    Use the 80/20 rule when planning a portal project. Focus on delivering 80% of the project’s functionality by connecting to existing systems using portlet builders, such as WebSphere Portal Application Integrator, and by avoiding more difficult integrations, such as using the Web Clipping portlet. Rebuild only 20% of the most useful functionality from the existing Web site.

     

    80/20 Rule for WebSphere Portal Projects

     

Pilot (3 to 4 months):

• Time spent connecting to old systems
• Little new functionality
• Limited deployment
• Think about portlet builders
• Avoid Clipping

First release (+3 to 6 months):

• Rebuild the most useful 20%
• Link to the low-usage 80%
• Begin to leverage click-to-action, online awareness, and so on

Delta releases (+3 to 6 months):

• Migrate traditional applications
• Leverage the flexible portal framework and integrated products

Use the 80/20 rule for the project

     

    Deployment

    Here is an example of how we can implement a portal build process across the environments:

1. A developer implements portlets, servlets, Enterprise JavaBeans, and other J2EE artifacts using WebSphere Studio. The source code is delivered into a version control system.

    2. A designer creates themes, skins, HTML pages, portlet JSPs, and other design elements using any editor.

    3. The results are delivered into a version control system.

    4. An administrator creates the content tree (labels, URLs, and pages) using the WebSphere Portal administrative user interface of a development portal.

    5. The resulting content trees and portlet instances are exported using XMLAccess or a script and then delivered into a version control system.

    6. The release manager assembles a consistent release in the version control system and creates the delivery. The release manager executes scripts (for example, ANT) to extract Java sources, design elements, and configurations from the version control system and then runs a build (compile and package).

    7. The operator takes delivery and deploys it onto the staging and production systems. The operator executes ready-made configuration tasks (for example, ANT, XMLAccess configurations, and wsadmin scripts) to deploy the delivery.
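As an illustration of step 6, the release manager's assembly can be scripted so that the delivery is always built from version control, never from a running server. The following is a minimal sketch, assuming the version control workspace holds "portlets", "themes", and "xmlaccess" directories; these names are illustrative, not a product convention:

```python
import os
import zipfile

def assemble_delivery(workspace, delivery_zip):
    """Package portlet WARs, theme assets, and XMLAccess exports from the
    version-control workspace into a single delivery archive (step 6).
    The directory names are illustrative placeholders."""
    wanted = ("portlets", "themes", "xmlaccess")
    with zipfile.ZipFile(delivery_zip, "w") as zf:
        for top in wanted:
            root = os.path.join(workspace, top)
            for dirpath, _dirs, files in os.walk(root):
                for name in files:
                    full = os.path.join(dirpath, name)
                    # store paths relative to the workspace root
                    zf.write(full, os.path.relpath(full, workspace))
    return delivery_zip
```

A real build would also compile and package the Java sources with ANT before archiving; the point of the sketch is only the assembly step.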

     

    Determining what to move

    To deploy a portal release, resources must be synchronized, packaged, and deployed to a portal server. This is a manual process that involves many people and few tools. Resources can be:

    1. Portal configuration (this is stored in the portal database):
      • Portal content tree
      • Portal application configuration, settings, and data
      • Portlet access control information

    2. Portal artifacts (these are stored in the file system):
      • Portal configuration (property files)
      • Theme and skin file assets (JSPs, style sheets, images)
      • Portlet code (Java classes, JSPs, XML files)
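Because the two resource groups live in different places, the deployment process must drive two different mechanisms: database-held configuration typically moves through an XMLAccess import, while file-system artifacts are copied or redeployed. A minimal sketch of that split, using illustrative resource names:

```python
# Illustrative resource names for the two groups above.
DB_RESOURCES = {"content-tree", "portlet-settings", "access-control"}
FS_RESOURCES = {"property-files", "theme-skin-assets", "portlet-code"}

def deployment_targets(resources):
    """Return the set of mechanisms a release needs: database-held
    configuration implies an XMLAccess import, file-system artifacts
    imply a file copy or redeploy."""
    targets = set()
    for resource in resources:
        if resource in DB_RESOURCES:
            targets.add("xmlaccess-import")
        elif resource in FS_RESOURCES:
            targets.add("file-copy")
        else:
            raise ValueError("unknown resource: " + resource)
    return targets
```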

    Important: You must define and test the application deployment process.

     

    Automating custom code deployment

    Although there is no automated method to move portal applications from one environment to another, there are two options:

1. Completely replace the old release with a new release. The drawback of this option is that any data that was customized by the user is lost. Although this option works, we do not recommend it.

    2. Use the XMLAccess tool to load incremental or differential releases.

     

    Staging concepts

A subsequent solution release is staged from the integration system to the staging system and then to the production system. The physical implementation of configuration staging through a series of systems does not literally move configurations between systems. Instead, the process is based on repeatable modifications of portal solution releases on multiple systems. For each solution release, a differential portal solution configuration is imported into the system; artifacts are managed by manual updates and deletions.

What elements of the portal you move depends on the type of release you have. If you have an incremental release, you:

    1. Add new resources to a release.
    2. Update resource attributes (only add properties to lists).

    If you have a differential release, you:

    1. Maintain all functionality of the incremental release.
    2. Delete existing resources.
    3. Update resource attributes (add or delete properties in lists).
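The incremental versus differential distinction can be modeled in a few lines. The following is an illustrative model of those rules, not the XMLAccess tool itself; the dictionary shape is an assumption made for the sketch:

```python
def apply_release(config, release, differential=False):
    """Apply a release to a portal configuration, modeled as a dict of
    resource name -> attribute dict. Incremental releases (the default)
    only add resources and attributes; differential releases may also
    delete existing resources."""
    for name, attrs in release.get("add", {}).items():
        config.setdefault(name, {}).update(attrs)
    if differential:
        for name in release.get("delete", []):
            config.pop(name, None)  # deletions only happen differentially
    return config
```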

    If you have data that has been configured by the user, configure the scope of the portal for a single user.

    The following process is an example of a possible subsequent portal solution staging process. Derivations of this process are possible and expected. It focuses on configuration and artifact management.

Important: Do not release the portal code from the staging or the production environment before sanitizing all logs. Adopt zero tolerance for warnings and errors and stay with it.

     

    Clustering and deployment

    In WAS, a cluster is composed of multiple identical copies of an application server. A cluster member is a single application server in the cluster. WebSphere Portal is installed as an enterprise application server within the WAS infrastructure. All of the clustering features available within the WAS infrastructure are also available and apply to WebSphere Portal. Therefore, a WebSphere Portal cluster is simply a collection of multiple WebSphere Portal servers that are identically configured.

There are many improvements to WebSphere Portal clustering in the latest version. For example, WebSphere Portal configuration tasks are cell aware, meaning that tasks can determine whether the node is federated and then act appropriately. Nodes running on different operating systems are supported in the same cluster. The activate-portlets task can be used to activate portlets across all cluster members at one time.

Setting up clustering is one of those features that can be difficult, even with these improvements and better documentation. If you are new to WAS and have never clustered an application before, we suggest that you enlist the help of IBM Services. A good working knowledge of WAS is needed to achieve success with this endeavor.

    In WebSphere Portal, there are two ways to create a cluster of WebSphere Portal servers. There is an easy way and a more complex way. The first way (easier) is to create a WebSphere cell with multiple federated nodes first and then install WebSphere Portal onto each node in the existing cell. The second way is to start with a set of stand-alone nodes with WebSphere Portal already installed on them and then federate them into a cell.

    There is a caveat to the easier method of using an existing cell of application servers and then installing WebSphere Portal into that environment. When you federate an application server node into a cell, it loses its default configuration and inherits the configuration held by the Network Deployment Manager. For example, if there is an issue with the cell and you have to un-federate the node, the original configuration is loaded from a backup copy and the node reverts to its previous configuration. So, in this example, when you un-federate it, you have a default application server and the WebSphere Portal application is lost. You need to think carefully about this option, even if it is easier to configure up front.

    With the more complex method, where you install WebSphere Portal on each node and then federate it, the WebSphere Portal application becomes part of the node's default configuration and cannot be “lost” even if it gets un-federated.

    See A step-by-step guide to configuring a WebSphere Portal V5.1 cluster using WAS V5.1.1.1

    IBM has seen sites struggle with some of the following issues.

    When attempting to deploy portlets in a clustered environment, you get error messages such as “Cannot install the selected WAR file.” To enable portlet deployment in the cluster, edit the DeploymentService.properties file on each node and set the wps.appserver.name property to the name of the cluster.
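That fix can be scripted as a small helper that rewrites the property, run against the copy of DeploymentService.properties on each node. This is a hedged sketch; the file path is installation specific, and the helper itself is not part of the product:

```python
def point_deployment_at_cluster(props_path, cluster_name):
    """Rewrite wps.appserver.name in a DeploymentService.properties file
    so portlet deployment targets the cluster rather than a single
    application server. Run once per node against that node's copy."""
    lines = []
    with open(props_path) as f:
        for line in f:
            if line.strip().startswith("wps.appserver.name"):
                line = "wps.appserver.name=" + cluster_name + "\n"
            lines.append(line)
    with open(props_path, "w") as f:
        f.writelines(lines)
```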

    Each node in the cluster should have the same synchronization settings to ensure consistency between the WebSphere Portal server configurations on each node.

After federating a WebSphere Portal node and then attempting to access the portal, you receive an Error 503 “Failed to load target servlet [portal]” message. To correct this, update the deployment manager configuration for the new portal node. Also, be sure the CellName property in the wpconfig.properties file is set for the new portal.

You need to enable dynamic caching on the cluster member nodes so that the portal caches are correctly invalidated across the cluster. If you do not, users might have different views or different access rights, depending on which cluster node handles the user’s request. Refer to the guide for more information.

Important: Create a replicator entry on each node to provide replicator failover. This prevents the scenario where the node hosting the replicator goes down and the other nodes cannot start because they cannot access the replicator.

    The more replicators you have defined in the environment, the better the replication fail-over capability will be. In the event that an application server process on which the replicator is defined is unavailable or goes down, there will be other replicators available to fill the gap. However, because additional replicators will impact the overall performance of the environment, carefully plan the total number of replicators needed.

For best performance, you can also provide a completely separate system running a dedicated application server instance as the replicator host. This dedicated application server instance need not have WebSphere Portal installed on it, although it must be in the same cell and in the same replication domain as the WebSphere Portal cluster. For more information about using replicators, refer to the WAS Information Center.

    For more information, also see the excellent IBM Redbook, IBM WebSphere Performance, Scalability, and High Availability WebSphere Handbook Series, SG24-6198

     

    IBM WWCM users

If you intend to use the Web Content Management function included with WebSphere Portal in a cluster environment, additional configuration is required. Refer to the “Cluster Installation Process” topic in the IBM WWCM Version 2.5 Installation Guide for information about using Web Content Management in a cluster (http://www.lotus.com/ldd/doc/uafiles.nsf/docs/WCM25/$File/WebContentManagement-2-5-InstallationGuide.pdf). In addition, pay specific attention to information for the following areas:

    1. Data repository

      Unlike WebSphere Portal nodes that share the same database in a clustered environment, Web Content Management installations require a separate data repository, even when used in a cluster.

    2. Authoring portlet

      Although the Authoring portlet cannot be used to create content in a cluster, still install the Authoring portlet on all cluster nodes to support syndication and caching.

    3. Secondary node

      When adding a secondary node to a cluster, run the update-wcm-cluster-configuration task on the node.

    4. User registry

      If you change the user registry for Web Content Management (for example, by configuring for an LDAP directory), run the update-wcm-wmm task.

    5. Web server

      If you are using an external Web server with the cluster, run the modify-wcm-host task on each node in the cluster.

    To support search in a clustered environment, install and configure search for the remote search service on a WAS node that is not part of the WebSphere Portal cluster. For more information about using search in a cluster, refer to “Search”.

     

    Keeping track of the growth

Your organization will likely have multiple e-business problems to solve, and thus will most likely have multiple portals to host and support. As the expertise matures, so will the portals. Because WebSphere Portal integrates with existing applications, such as customer relationship management (CRM), enterprise resource planning (ERP), and sales force automation (SFA), IBM has seen sites struggle with a rapidly growing, complex infrastructure. Portals have a tendency to grow over time, potentially doubling or even tripling in function and size. A number of best practices for maintaining complex systems exist, and the software architect must take full advantage of these in order to manage this type of complexity and growth. One is creating a portal solution based on a methodology for leveraging large reusable assets. Refer to IBM Redbook Architecting Portal Solutions, SG24-7011 for best practices about design patterns and architecture.

    Another best practice is to apply maintenance on the servers on a timely basis. How will you know if there are updates to apply? It is easy with the IBM support ID and IBM self-help portal. IBM uses WebSphere Portal for the site:

    http://www.ibm.com/support

In “Register on the IBM software support Web site for Passport Advantage” we explain how to register for this site. Be sure to choose the My support link and personalize the preferences. There you can select the IBM products for which you want e-mail notification. Each week, you will receive an e-mail that lists new technotes, releases, and fix packs.

     

    Maintenance

You will need to perform minor maintenance on your WebSphere Portal installation at some point. By minor, we mean small fixes, such as database fix packs, WAS interim fixes, and software point releases, applied while maintaining 24x7 availability of the portal. You can do this, but a bit of planning is needed. Refer to the Information Center for step-by-step procedures:

http://publib.boulder.ibm.com/infocenter/wpdoc/v510/index.jsp?topic=/com.ibm.wp.ent.doc/wpf/clus_upgrade.html

    Assumptions for maintaining 24x7 operation during an upgrade process include:

    1. WAS distributed session support is enabled to recover user session information when a cluster node is stopped for maintenance.
    2. Load balancing must be enabled in the clustered environment, and multiple HTTP servers must be available to provide Web server fail-over support.
    3. The portal cluster has at least two horizontal cluster members.
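With those assumptions in place, maintenance can proceed one cluster member at a time. The following sketch shows the rolling pattern; remove_from_lb, stop, start, and add_to_lb are illustrative stand-ins for the real Web server plug-in and administrative operations, not product APIs:

```python
def rolling_maintenance(nodes, apply_fix):
    """Service a clustered portal one horizontal cluster member at a time.

    While one member is out, the load balancer routes traffic to the
    remaining members and distributed sessions fail over, so the portal
    stays available. This is why at least two horizontal members are
    required."""
    for node in nodes:
        node.remove_from_lb()  # quiesce: no new requests reach this member
        node.stop()            # stop the application server process
        apply_fix(node)        # e.g. apply an interim fix or point release
        node.start()
        node.add_to_lb()       # rejoin before servicing the next member
```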

The situation will also arise when you must perform maintenance on the components that make up the clustered portal environment. A component can be a Web server, a database server, or even a directory server. You will want to service these without bringing the portal down. This is possible, and you can learn how by following the advice in this article:

http://www.ibm.com/developerworks/websphere/library/techarticles/0506_khatri/0506_khatri.html

     

    Monitoring

Your ability to respond to portal performance problems is key to ensuring and maintaining a successful portal project environment. IBM has seen real-time monitoring of WebSphere Portal become one of the biggest concerns in the field. System administrators are pressured to monitor and manage the performance and availability of J2EE and WebSphere applications efficiently. Monitoring tools help operations staff, administrators, and developers resolve bottlenecks. Think about monitoring more than just the portal. Back-end systems such as content management servers and databases need to be monitored, with a focus on database capacity, throughput, and response time. Front-end systems such as HTTP servers and caching proxies cannot be ignored. The key is to determine and isolate the problem, for example, whether the issue is within the portal, one of the custom applications, or the supporting infrastructure.

We strongly encourage you to evaluate, select, and deploy a tool. There are many products and vendors from which to choose, and each one claims to be the best solution. Tools can be broken down into three major categories:

1. Development and profiling/performance solutions:

      • JProbe
      • IBM Rational PurifyPlus

    2. Application server and J2EE monitoring solutions:

      • IBM WebSphere Studio Application Monitor
      • Wily
      • IBM Tivoli Performance Viewer
      • IBM Tivoli Composite Application Monitor
      • IBM Tivoli Web Site Analyzer

    3. Monitoring management frameworks:

      • Candle Omegamon suite
      • Mercury Performance Center

    This section focuses on the application server and J2EE monitoring solutions. There are differences among the tools offered here. Determine the type of tool you are after.

    • Do you just need a tool that simply analyzes Web site usage? Are you already a WebSphere Portal Extend site? If so, IBM Tivoli Web Site Analyzer might be a good place to start.

    • Are you looking for a tool that can integrate into the existing Tivoli Monitoring or Rational tools? Then, one or perhaps both of the IBM Tivoli Composite Application Monitor products is the answer.

    If you purchased WebSphere Portal Extend, you are entitled to install the IBM Tivoli Web Site Analyzer product. All other tools mentioned are available at an additional cost.

    Monitoring tools need access to a wide range of data to provide usable results. There are several ways of accessing data from WebSphere applications:

    1. Performance monitoring infrastructure
    2. JVMPI
    3. Byte code instrumentation
    4. Application request metrics

All the tools discussed in this section use one of these approaches, and some use more than one. Most tools use an agent and manager client/server topology to collect, correlate, and display data.

Performance Monitoring Infrastructure (PMI) is a client/server-based, production-level monitoring solution. The server collects PMI data in memory, for example, the servlet response time or data connection pool usage. The data points are then retrieved using a Web client, a Java client, or a Java Management Extensions (JMX) client. Most of the following tools use this interface for at least part of the data they collect. WAS includes the PMI client Tivoli Performance Viewer.
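PMI itself exposes raw counters; it is the client that reduces them to the summaries an administrator reads. A trivial sketch of that reduction follows; the data shape is an illustrative assumption, and this is not the PMI client API:

```python
def summarize_counter(samples):
    """Reduce raw counter samples (for example, servlet response times in
    milliseconds collected by the server) to the summary statistics a
    monitoring client would display."""
    return {
        "count": len(samples),
        "avg": sum(samples) / float(len(samples)),
        "max": max(samples),
    }
```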

    Important: Depending on what PMI features are turned on, the performance impact of PMI itself can be slight or substantial.

The Java Virtual Machine Profiler Interface (JVMPI) is a JVM-level interface that enables the collection of data about the JVM itself. For example, it can collect data about garbage collection, JVM memory usage, thread information, and object allocation. JVMPI is a two-way function call interface between the JVM API and an in-process profiler agent. The JVM API notifies the profiler agent of various events, such as heap allocations and thread starts. The profiler agent can activate or deactivate specific event notifications, based on the needs of the profiler. The Tivoli Performance Viewer leverages JVMPI to enable more comprehensive performance analysis.

    Important: Java Virtual Machine Profiler Interface moderately increases the performance impact.

Byte code instrumentation can be broken down into two distinct areas: application-level instrumentation and server-level instrumentation. Instrumentation of application classes can be performed at run time as classes are loaded. Data is also collected at the method level. This technique is used by Wily and Tivoli Composite Application Monitor. Server-level instrumentation can be performed on specific WAS classes by adding a monitoring hook. This can be done dynamically or by rebuilding specific classes. This technique is used by Wily, Tivoli Composite Application Monitor, and WebSphere Studio Application Monitor.

     

    IBM Tivoli Web Site Analyzer

    Tivoli Web Site Analyzer is a Web application that captures and analyzes Web site data to provide useful reports about visitor traffic, visitor behavior, site usage, site content, and site structure. Support for WebSphere Portal includes specific report elements that enable you to analyze portal usage data, such as ranking of the portal pages viewed by visitors and portal login trends. Refer here for more information:

    http://www.ibm.com/software/tivoli/resource-center/bsm/dem-web-site-analyzer.jsp

Important: IBM Tivoli Web Site Analyzer was withdrawn from the market on July 13, 2005. However, you can still obtain support for this product until IBM withdraws support for WebSphere Portal Extend V5.x.

     

    IBM Tivoli Performance Viewer

Tivoli Performance Viewer is a Java client that retrieves Performance Monitoring Infrastructure (PMI) data from an application server and displays it in a variety of formats. Most sites use this tool for a first look. You can view data in real time and in chart form, allowing a visual comparison of multiple counters.

Tivoli Performance Viewer provides advice for tuning the system for optimal performance and gives recommendations about inefficient settings. With Tivoli Performance Viewer, you can report on enterprise beans, EJB methods, servlets, the Web container pool, the Object Request Broker (ORB) thread pool, and the connection pool. Refer to the Information Center for more information:

http://publib.boulder.ibm.com/infocenter/wsdoc400/index.jsp?topic=/com.ibm.websphere.iseries.doc/info/ae/ae/tprf_tpvmonitor.html

     

    IBM Tivoli Composite Application Monitor for WebSphere

    IBM Tivoli Composite Application Monitor for WebSphere is the follow-on product to WebSphere Studio Application Monitor that launched on November 8, 2005. It provides enhanced monitoring for WebSphere Portal. It is more robust and consumable than WebSphere Studio Application Monitor and can optionally interface with other Tivoli products. It was designed for application support, test, and development teams to help you gain deep insight into the health of the production and preproduction environments by using key performance metrics that pinpoint the source of bottlenecks or other defects in application code. Improvements include:

    1. Better coverage of portal pages and portlets, including reports for both portlets and portal pages.
    2. Additional and improved portal nested request types and contextual navigation. The six nested request types are...

      • Page Loading
      • Page Rendering
      • Model Building
      • Portal Topology
      • Authentication
      • Authorization
    3. Advanced historical reporting of key portal performance trends with deep-dive tracing for root cause problem determination.

    Tivoli Composite Application Monitor for WebSphere runs on these platforms:

    • IBM AIX 5L V5.2 and V5.3
    • Sun Solaris 8 and 9
    • Microsoft Windows 2000, 2000 AS, and 2003
• Red Hat Enterprise Linux 3.0 (IBM eServer pSeries, xSeries, iSeries, and zSeries)
    • Red Hat Enterprise Linux 4.0 (pSeries, xSeries, iSeries, and zSeries)
    • SUSE Linux Enterprise Server 8 (pSeries, xSeries, iSeries, and zSeries)
    • SUSE Linux Enterprise Server 9 (pSeries, xSeries, iSeries, and zSeries)
    • HP-UX 11i v1

    Tivoli Composite Application Monitor for WebSphere supports the following databases:

    • IBM DB2 UDB V8.1 and V8.2
    • Oracle 9i V2 and 10g.

     

    IBM Tivoli Composite Application Monitor for Response Time Tracking

    Tivoli Composite Application Monitor for Response Time Tracking is the follow-on product for Tivoli Monitoring for Transaction Performance. It can proactively recognize, isolate, and resolve transaction performance problems using robotic and real-time techniques. It is an end-to-end transaction management solution that monitors end-user response time and helps you visualize the transaction’s path through the application systems, including response time contributions of each step.

    Tivoli Composite Application Monitor for Response Time Tracking helps you to adopt an end-user’s perspective to monitor and measure performance. It follows the application’s path to help speed problem resolution. It automatically learns the environment and establishes response time thresholds. It will help you to validate the service level delivered to the end user.

     

    Wily Introscope

    Wily Introscope enables you to monitor complex Web applications end-to-end. Use it to manage mission-critical applications from the browser to application components to back-end systems. Introscope can ensure manageability of the entire portal workflow in production. It can also isolate problems with individual portlets. Introscope provides you with the capability to create custom dashboard views of the entire application infrastructure, including Java applications, application servers, Web servers, messaging middleware, databases, and transaction servers. Refer here for more information:

    http://www.wilytech.com/solutions/products/Introscope.html

     

    SurfAid

SurfAid is a services offering that has three options. The Executive Metrics product offers Web site analytics and reporting. The Publishers product adds the COUNTER code of practice to the metrics. An advanced tool called Analysis adds ad hoc queries and the ability to create dynamic reports to the Metrics product. All SurfAid products work by transferring Web log files to a SurfAid facility on a daily basis. The data is mined and the results are stored in a relational data warehouse. Reports are generated and data is queried using a Web interface. Refer to this link for more information:

    http://www-928.ibm.com/web/home/index.html

     

    Ascera Manager 5 for Portal

    Ascera Manager for Portal is a production monitoring and diagnostic tool. It has a Discovery engine that automatically discovers new WebSphere Portal servers in a cluster. It can optionally integrate with other network management tools such as HP OpenView, Tivoli, CA Unicenter, and BMC Patrol. Ascera’s goal is to monitor complex portal applications from the top down, focusing on business units of work and subprocesses.

     

    Summary

While all of the tools provide good real-time analysis of what is happening, they do not provide any insight into “what is normal” (except for the Tivoli Performance Viewer wizards). There is no easy answer to this question because it can be very application and environment dependent. Good performance evaluation generally requires the help of an experienced WAS or WebSphere Portal person, or both. IBM has not looked deeply at specific performance measures because of the breadth of this subject; deep experience in J2EE, WebSphere, and WebSphere Portal is required to assist with this effort. The performance overhead of most of the tools discussed here appears to be 3-5%. Deeper monitoring (or more frequent sampling) can raise these estimates. IBM has seen empirically that running production systems at very high load while using a monitoring tool can alter system performance and perhaps even cause system crashes.

     

    Sample workshop agenda

    Date and time Topic
    Day 1: Requirements and portal capabilities
    8:45 – 9:00 Introductions and agenda
9:00 – 10:00 Business requirements; portal project team structure
    10:00 – 11:00 Portal application requirements
    11:00 – 12:30 Existing system architecture and integration approaches
    Lunch
    1:30 – 2:30 Technical requirements (performance, availability, and so on)
    2:30 – 4:00 Portal features summary (demonstration optional)
    Day 2: Architecture
9:00 – 12:00 Portal architecture best practices; logical architecture
    Lunch
    1:00 – 4:00 Architecture white boarding session
    Day 3: High-level application design
9:00 – 11:00 Portal application design best practices; development structure, roles, responsibilities
    11:00 – 12:00 Portal design/interaction; specific portlets (list): Lotus collaborative portlets, content management, other “canned” portlets, custom portlets
    Lunch
    1:00 – 4:00 Portal application white boarding
Day 4: Portal development
    9:00 – 12:00 Project plan review and risk assessment
    Lunch
    1:00 – 4:00 Portal operations and deployment considerations (administration, monitoring, portal solution release process)
    Day 5: Follow-up discussions and generate documentation
    9:00 – 12:00 Workshop wrap-up

     

    Sample portal tracking worksheet

In this appendix, we discuss a sample portal tracking plan that you can use for portal planning. Due to the size of the tracking worksheet, we provide a link to the source.

    This worksheet assists in tracking the progress of the project without using a full-fledged project plan or software. This highly customizable document helps you list and estimate pieces of work very early in the project. Then, use this data to assign work to different developers on the team and to track their progress during the project. This approach provides the following benefits:

1. A spreadsheet is very easy to use and maintain, unlike a more formal project plan, which often goes stale and is never updated.
    2. Project managers appreciate when you have initial tasks and estimates early in the project for them to feed into their project plan.
    3. The form can be used to feed into design and to track the progress of the team. The lead developer or architect is usually the one to build and maintain the spreadsheet. This lead can use it to assist in breaking out different components (portlets, services, and other components) that will be needed in the design.
    4. The form is compact (usually one or two pages) and can be carried around and updated with the current status as you talk to team members. Once or twice a week, incorporate the changes into the document and print a new copy. This will probably occur daily in the early stages of the project, that is, in the design phase.
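    The worksheet amounts to a small, flat data structure plus a roll-up. As a minimal sketch, the fragment below models work items and computes a weighted completion figure; the column names (component, owner, estimate_days, pct_complete) are illustrative assumptions, and the actual worksheet is available from the developerWorks link that follows.

```python
# Minimal sketch of a portal tracking worksheet as plain data.
# Column names are illustrative assumptions, not the actual worksheet layout.

def summarize(rows):
    """Roll up estimated days and weighted percent complete across work items."""
    total_days = sum(r["estimate_days"] for r in rows)
    done_days = sum(r["estimate_days"] * r["pct_complete"] / 100 for r in rows)
    return total_days, round(100 * done_days / total_days, 1)

worksheet = [
    {"component": "News feed portlet", "owner": "dev1", "estimate_days": 5, "pct_complete": 60},
    {"component": "Search portlet",    "owner": "dev2", "estimate_days": 3, "pct_complete": 0},
    {"component": "Theme and skins",   "owner": "dev3", "estimate_days": 8, "pct_complete": 25},
]

total, pct = summarize(worksheet)
print(f"{total} days estimated, {pct}% complete")
```

    A lead developer could maintain the same rows in a spreadsheet; the point is only that a flat list of estimated, assignable items is enough to feed both the design breakdown and the project manager's plan.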

    Refer to the following Web page for this worksheet:

    http://www.ibm.com/developerworks/websphere/library/techarticles/0511_bernal/0511_bernal.html#download

     

    Portlet sourcing

    This appendix includes a portlet sourcing exercise and worksheet.

     

    Exercise

    Consider adding the portlet sourcing process to your project. It can save you a lot of time (two months on average).

    Portlet sourcing is a critical activity for a successful project.

    The benefits of portlet sourcing include:

    1. The primary basis for accurate project sizing
    2. A great way to communicate the organization's needs effectively
    3. The primary source of portlet requirements for developers

    This exercise describes the three-step portlet sourcing process:

    1. Start by identifying a default Web page.

    2. Next, identify which areas of that page you want to include in the portal, and give each portlet a label.

    3. Describe each portlet needed for the page. Research and record items such as the data provider, state, and personalization requirements.

    See Appendix B, “Sample portal tracking worksheet” for a worksheet to use in your next project.

    Allow ample time to complete the portlet sourcing exercise. Typically, plan a week for wire frames and a half day per unique portlet. Use wire frames and state transition diagrams to document and validate the user experience.
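    The rule of thumb above (about one week for wire frames plus a half day per unique portlet) translates into a trivial sizing calculation. This sketch assumes a five-day working week; the defaults are only the rough figures stated in the text.

```python
# Rough sizing for the portlet sourcing exercise, based on the rule of
# thumb above: roughly one 5-day week for wire frames plus half a day
# per unique portlet. The 5-day week is an assumption.

def sourcing_days(unique_portlets, wireframe_days=5, days_per_portlet=0.5):
    """Return the estimated working days for the portlet sourcing exercise."""
    return wireframe_days + unique_portlets * days_per_portlet

print(sourcing_days(12))  # 5 + 12 * 0.5 = 11.0 days
```

    Even a rough figure like this helps you argue for enough calendar time in the project plan before development begins.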

    One of the most difficult items to ascertain is how to bring content into the portal. This exercise will help you to identify:

    1. The content source (where the content is and who owns it)
    2. How you are going to get the content into the portal

    Be sure to include end users in the planning sessions and not just the business analyst. This might save rework time because the end users are the audience you will be trying to please in the pilot phase.

     

    Portlet Sourcing Worksheet

    Portlet sourcing worksheet Hints
    Summary information
    Portlet title News feed.
    Portlet ID
    Ownership
    This worksheet completed by John Smith.
    Last worksheet update June 13, 2005.
    Portlet development assigned to
    Portlet codeveloper/reviewer (if any)
    Portlet tester
    Target portlet completion date
    Navigation
    Location (on which pages) Home.
    Primary model Cooperative (affects other portlets' states), or wizard (changes state/appearance based on user interaction).
    Sends/receives messages with which other portlets (Click-To-Action and portlet messaging) No.
    Access control
    Who can see it All users.
    Who can edit “edit” data Administrators.
    Who can modify configuration data Administrators.
    Who can place it on pages Administrators.
    Content
    Single or Multiple items displayed on portlet Multiple.
    If multiple, number of items to display 5-10 news items.
    Sort order of content (or random)
    Filtering: By date No.
    Filtering: By user? (See personalization below.) No.
    Uses people awareness No.
    Uses Click-To-Action No.
    Content includes links to more detailed information. Yes.
    If links, expected behavior when clicked Content appears in new window.
    Customization/personalization
    Display portlet based on LDAP group membership. Everyone sees.
    At portlet content level Using user- configured selections.
    Multiple language support One language.
    Languages
    Multiple device support/markup languages One markup language.
    Markup languages
    Data owner/source
    SLA in place with owner No.
    Risk of change in data format Slight.
    Risk of availability problems Moderate.
    Development
    Portlet source Configured.
    If Portal Catalog, name and navcode from catalog Catalog: Typically minimal configuration required.
    If “Configured,” type of portlet

    Configured: No Java development, but requires significant configuration. For example, Web Clipping.


    If “Developed,” tooling to develop it

    Developed: Developed using Java tooling.


    Needs to support JSR 168 No.
    Needs to support Web Services Remote Portlet No.
    Number of states

    “States” are different appearances of the same portlet as you interact with it; we can have one JSP per state, for example.

    Modes/window states
    View Yes.
    Configure No.
    Edit No.
    Help Yes.
    Print

    Not really a mode, but the ability to print just this portlet.

    No.
    Special (different view) for maximized state

    Maximized state might want to take advantage of more screen space.

    No.
    Special (different view) for solo mode

    “Solo” is the rendering of a portlet in a browser by itself, typically without header and navigation elements.

    No.
    Testing
    Test data source Production data.
    If test data, does that data source have to be created
    Is portlet or data source likely to be performance-sensitive

    A portlet that issues multiple queries against multiple databases, with data that cannot be cached, is performance-sensitive. Be sure to test the portal with and without performance-sensitive portlets to understand their impact.

    Yes.
    Caching
    Cacheable Yes.
    Same cache for all users Cache is same for all users.
    Will you cache data feed Yes.
    Will you allow caching in portlet.xml Yes.
    Will you use the WAS dynamic caching system (dynacache)

    This enables command caching of database queries; dynamic caching of Java objects, JSPs, servlets, and so on; and edge caching.
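    The caching answers recorded in the worksheet above typically end up in the portlet deployment descriptor. As a hedged sketch of what "allow caching in portlet.xml" can look like for a JSR 168 portlet, the fragment below declares time-based expiration of the rendered markup; the portlet and class names and the 300-second expiration are illustrative assumptions.

```xml
<!-- Illustrative fragment of a JSR 168 portlet.xml (portlet 1.0 schema).
     Names and values here are example assumptions. -->
<portlet>
    <portlet-name>NewsFeedPortlet</portlet-name>
    <portlet-class>com.example.portlets.NewsFeedPortlet</portlet-class>
    <!-- Cached markup expires after 300 seconds; 0 disables caching,
         and -1 means the cached markup never expires. -->
    <expiration-cache>300</expiration-cache>
    <supports>
        <mime-type>text/html</mime-type>
        <portlet-mode>view</portlet-mode>
    </supports>
</portlet>
```

    Because the worksheet records the same cache for all users, a shared, expiration-based cache entry like this is usually sufficient; per-user caching requires more careful sizing of the cache.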


     

    Solution Assurance Checklists

    1. Checklist: Requirements
    2. Checklist: Product capability
    3. Checklist: Solution evaluation criteria
    4. Checklist: Design requirements
    5. Checklist: Configuration
    6. Checklist: Solution characteristics
    7. Checklist: Solution components
    8. Checklist: Implementation and operation
    9. Checklist: Services and education
    10. Checklist: Skills assessment
    11. Checklist: Security assessment
    12. Checklist: Performance assessment
    13. Checklist: Development environment
    14. Checklist: Legacy data requirements
    15. Checklist: Hardware configuration
    16. Checklist: Communications and networking
    17. IBM Skills Assessment Ratings

     

    Checklist: Requirements

    Activity: Requirements Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Organization and executive sponsor identified?




    Assumptions and risks understood and documented?




    Present business environment/background understood and documented?




    Baseline requirements:

    • Functional
    • Operational
    • Performance
    • Scalability and workload understood, documented, and signed off?
    • Business process integration requirements understood and documented?
    • Web content management requirements understood and documented?





    Conditions of satisfaction and acceptance criteria are known, documented, and agreed?




    Business benefits and drivers understood and documented?




    Staffing/resources available to support this solution?




    Discuss a readiness plan before implementing WebSphere Portal.




     

    Checklist: Product capability

    Activity: Product capability Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Have the overall architecture and software products needed to support the solution been determined?




    Can WebSphere Portal meet availability, scalability, and workload requirements? Are the requirements documented and signed off?




    Can the published capabilities and functionality of WebSphere Portal server meet general expectations?




    Review the Information Center for product capability. Contact a Techline specialist through the sales or marketing branch to size the environment, or call Techline Phone Support at 1-888-426-5525.




     

    Checklist: Solution evaluation criteria

    Activity: Solution evaluation criteria Y/N Factors increasing risk Factors reducing risk Task owner Due date
    What kind of project is being proposed?

    • First portal installation.
    • Migration of previous version of WebSphere Portal
    • Migration of another vendor’s portal to 5.1.

    If a migration, are plans for migration of processes in place?

    Have all other product-related Solution Assurance Reviews been completed?

    Have all migration tools for WebSphere Portal been reviewed?

    Is there a fallback plan if migration is not successful?






    Will the portal involve:

    • Migration of non-WebSphere Web application to portal model?

    • Migration of WebSphere Web application to portal?

    • New application?

    • Migration of large amounts of data from one database/source to another and is there a plan to handle it?





    What types of functionality will the portal use?

    • Customized out-of-box portlets (excluding Web Clipping).

    • Web Clipping.

    • Inter-portlet communication.

    • Portal document management.

    • Virtual portal. (Note: If realm support is required, be aware that multi-LDAP support will not be available until a post-GA fix pack).

    • Business process integration.

    • IBM WWCM (if the response is Y, include the SAR guide for WWCM V2.5 with this review).

    • Web services

      Support for Web Services Remote Portlet V1.0 is included in WebSphere Portal

    • Web services proxy portlet.

    • Credential vault.

    • HOD portlet. Is the HOD server installed and configured? Is the communications server needed?

    • HATS portlet.

    • Prebuilt portlet from the catalog with access to back-end systems.

    • Custom portlet development with access to back-end systems.

    • Lotus Extended Search.

    • Other portlets shipped with WebSphere Portal (specify).

    • Other portlets from the portlet catalog (specify). Is their functionality (and any limitations) known and understood?





    Are requirements for branding the look and feel of the portal known? Customizing themes and skins might be a non-trivial piece of work.




    Will JCA adapters be required? Are they pre-written or will they need to be developed?




    Will WebSphere MQ connectivity be required?




    Will pervasive devices be used to connect to the portal?




    Are unusually severe consequences possible if the solution implementation is unsuccessful or delayed, or is the solution considered to be mission critical?




    Are you aware of or do you anticipate any conditions that might impair the ability to deliver the solution successfully?




    Are you aware of any conditions that might reduce the likelihood of feeling very satisfied with the solution, even if it is successful?




    Is the solution complex?




    Has a project leader with overall responsibility for coordinating this project been assigned?




    Has an overall technical leader, with overall responsibility for development of the project, been assigned?




    What stage is the project currently in? Consider the scope of the required solution assurance appropriately:

    • Preliminary research
    • Design and architecture
    • Development
    • Test and deployment





     

    Checklist: Design requirements

    Activity: Design requirements Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Does the design meet the functional requirements?




    Does the design meet the performance requirements?




    Does the design meet the availability requirements?




    Does the design meet data backup and recovery requirements?




    Does the design meet upgradability requirements?




    Does the design meet scalability requirements? See the clustering section of the Information Center.




    Has anyone been involved with a similar project before?




    Is a benchmark or proof of concept required? If so, has it been resourced and funded?




    Does the design or solution depend on an unannounced product? List which products.

    What will be the source of best practice guidance used to design this solution? Development guides, IBM Redbooks, labs, and so on.




     

    Checklist: Configuration

    Activity: Configuration Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Have you read the WebSphere Portal Release Notes?




    Have all fixes noted in the release notes been downloaded?




    Proposed IBM software and hardware configured and documented?




    Proposed non-IBM hardware and software configuration documented and verified (might require a safety review and/or disclosure agreement)? Note: If Citrix software is part of the solution, and Internet Explorer running on top of a Citrix server is to be shared between Citrix clients, this is not a supported WebSphere Portal client. We can run Internet Explorer side-by-side with a Citrix client on the user’s workstation.




    Have the configurations been produced or checked by a specialist experienced in using the configurator?




    Have the appropriate versions of all required software been included?




    Has software and hardware compatibility been checked?




    Has the overall architecture and application flow of the solution been determined?




    Have the performance, scalability, load balancing, and high availability requirements been documented?




    Do you have an active network connection that supports TCP/IP?

    • Yes, to an intranet.
    • Yes, to the Internet.





    Are there other software products (excluding prerequisites and corequisites) that will be integrated into the solution (both IBM and non-IBM)?




    Have the security requirements been documented (for example, firewalls, authentication, user authorization, and so on)?




    The root user, or a user with root authority, must be used to install much of the software.




    Does the user who will be performing the installation have permission to create/update users and databases? The user needs the requisite permissions for LDAP and the database.




    Will a proxy server be involved in the installation?




    Communication ports:

    • Are all communication ports that will be used understood, defined, and documented?
    • Will all ports be open during software installation?





     

    Checklist: Solution characteristics

    Activity: Solution characteristics Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Have you built adequate time into the project schedule for load/scale/performance testing?




    Is overall raw performance an issue?




    Which version control system is to be used?

    • CVS
    • ClearCase
    • Other (specify)
    • No version control





    Will workflow be used in WWCM?




    Will workflow be used in Portal Document Manager?




    Is the solution or application required to be up and running 24x7?




    Have you consulted the IBM Support Web site to see if there are any reported problems or available fixes/fix packs that might affect the success of the project?




     

    Checklist: Solution components

    Activity: Solution components Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Have you identified the Web server to be used as part of the solution?






    Have you chosen the database that will host the portal configuration data?

    • Yes, but the specific vendor and version have not been checked against the list of supported databases.
    • Yes and the database is one of the supported databases. Review the requirements for database, instance, and user names, and specific parameters for the appropriate database manager in the WebSphere Portal Information Center. Pay attention to the Oracle pre-WebSphere Portal installation setup.
    • Yes, but the database is not one of the supported versions or databases.





    Have you identified the type, if any, of LDAP that will be used?

    • Yes. Review the LDAP installation section of the WebSphere Portal Information Center.
    • Yes. If using Lotus Sametime, review “Setting up an LDAP connection in a Domino environment” in the Sametime Administrator's Guide at http://www.lotus.com/ldd/notesua.nsf/find/st30
    • Yes. Verify that the WebSphere Portal LDAP schema will fit into an existing LDAP.
    • Yes. A custom user registry (CUR) will be used.





    Have you identified the platforms and operating systems on which the Web server and application server will run?

    • Yes, but the OS version (including required patches) and hardware have not been checked against the list of supported run times.
    • Yes. The OS version (including required patches) and hardware are in the list of versions supported by WebSphere Portal.
    • Yes, but the OS version (including required patches) or hardware is not in the list of officially supported versions.





     

    Checklist: Implementation and operation

    Activity: Implementation and operation Y/N or N/A Factors increasing risk Factors reducing risk Task owner Due date
    Does a project plan exist for the implementation and has it been produced or reviewed by someone with relevant experience? Has a project manager been included?




    Has a post sales support contract been included in the proposal to cover the operational hours?




    Is it explicitly stated who is responsible for systems management, both defining the processes and tools and providing the services (change/problem/performance/availability/capacity/operations)?




    Are there any formal acceptance tests? Who owns them and has sufficient resource been planned to produce, negotiate, and execute them?




    Has a training and education plan been defined and documented?




    What are the plans for a test environment? If there are no plans, mark this as a high-risk item in this document, and document an action plan and owner to address it.




    What are the plans to test this technical solution? Do the plans incorporate use of:

    • A sophisticated load simulation tool
    • Creation and use of load scripts
    • Creation and execution of testing scenarios
    • Adequate knowledge of representative use cases

    If there are no plans, mark this as a high-risk item in this document, and document an action plan and owner to address it.





    Will the solution be stress tested?




    Are there plans for final performance tuning of the application server? Review the Performance Tuning Guide at: http://www.ibm.com/support/search.wss?rs=688&tc=SSHRKX&q=tuning




    Has time for testing been allocated for this in the project plan?




    What plans are in place to generate new test scenarios when new code is written during any stage of the application?




     

    Checklist: Services and education

    Activity: Services and education Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Are there any factors that might prevent the team from successfully completing the proposed project on its own?




    Are services included as part of the solution?




    Who will provide the services if any will be used, and is there a formal commitment to provide the services?




    Has the services provider quoted for the services and has this been included in the proposal? Does the service provider have adequate skills?




    Where more than one provider is responsible, is it clear who is responsible for what? Are completion criteria clear?




    Does the solution include education that addresses the areas of application development, implementation, as well as production, operations, and administration?




    If education is included, has an appropriate education provider been identified and engaged?




    Has IBM Services been involved in the bid process?




    Does the bid cover products other than WebSphere Portal? For example, Tivoli Access Manager, IBM Content Manager, and so on? If so, has a services engagement been included? We strongly recommend that you include a services engagement to ensure a successful installation and rollout.




    Review the support Technotes: http://www.ibm.com/software/genservers/portal/support/




     

    Checklist: Skills assessment

    Activity: Skills assessment Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Have at least some project members successfully completed any earlier projects using WebSphere Portal?

    • Yes. At least some project members have previous WebSphere Portal experience.
    • Yes. At least some project members have previous WebSphere Portal V5.0 experience.
    • Yes. At least some project members have previous WebSphere Portal V4.1 or V4.2 experience.





    Does the architecture of the overall system require complex WebSphere administration (for example, clustering, fail-over management, and performance tuning)?




    What skill level is available for architecting the application server (0-5)?




    What skill level is available for administering the application server (0-5)?




    What skill level is available for administering the Web server (0-5)?




    Does the project team contain members with appropriate Lotus product skills?

    • IBM WWCM
    • IBM Lotus Sametime
    • IBM Lotus QuickPlace
    • Set up and use of Lotus Domino Server
    • LDAP





    Review the Notes/Domino documentation
    Review the Lotus Sametime documentation
    Review the Lotus QuickPlace documentation





    Are Java programming skills required for this project? If so, what skill level is available (0-5)?




    Are JSP/servlet development skills required for this project? If so, what skill level is available (0-5)?




    Are Portlet development and packaging skills required for this project? If so, what skill level is available (0-5)?




    What level of Portlet administration skills are available (0-5)?




    Are EJB development and packaging skills required for this project?

    • If so, what skill level is available (0-5)?
    • If so, have the transactional capabilities been documented?
    • What types of EJBs are in the solution?


        Stateful
        Stateless
        Entity bean managed
        Entity container managed





    Are custom user registry skills required for this job?

    • If so, what skill level is available?
    • If so, does the team have experience working with and coding to the interfaces for the WAS CUR and the WebSphere Member Manager Repository?
    • Contact the WebSphere Portal Development lab through a PMR if more information is needed on the WebSphere Portal Member Manager Repository requirements.





    Are Web service development/deployment skills required for this project? If so, what skill level is available (0-5)?




    Is performance tuning required for this project? If so what skill level is available (0-5)?




    What skill level is available for LDAP administration/integration (0-5)?




    What skill level is available for Netegrity SiteMinder integration (0-5)?




    What skill level is available for Tivoli Access Manager integration (0-5)?




    What skill level is available for other third-party authentication and authorization integration (0-5)?




    Are JCA skills required for this project? If so what skill level is available (0-5)?




    Are WebSphere MQ skills required for this project? If so what skill level is available (0-5)?




    What skill level is available for pervasive device support (transcoding) (0-5)?




    Does the project have at least functional admin skills for each operating system running some solution component (portal server, Web server, app server, admin repository, data store, enterprise server)?

    • Yes, but not on all operating systems.
    • Yes, functional admin skills are available for all necessary operating systems.





    Does the project have at least functional DBA and other required admin skills for each database to be used by the solution (includes admin repository and other application databases)?

    • Yes, but not for all databases.
    • Yes, functional DBA and admin skills are available for all databases to be used.





     

    Checklist: Security assessment

    Activity: Security assessment Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Does the solution require separation of the Web server and application server?




    Will it be necessary to secure resources on the Web server?




    Will further security (such as basic or certificate) other than the form based portal login be required?




    Has the appropriate security for the application server been assessed and decided upon?




    What type of user authentication is required for the project?

    • Using RDBMS (for example, DB2, Oracle); no LDAP.
    • LDAP; state which directory.
    • Supported third party; describe.
    • Unsupported third party; describe.
    • Custom user registry/member repository; describe.





    Will the credential service be used and has the mechanism to communicate with enterprise authorization systems been established?




    Will the credential service use the credential vault?




    Has the level of SSO implementation been agreed upon?




    Does the user registration process need to be customized (for example, validated against third-party process, database, transaction system, and so on) and is it understood how this will be accomplished?




    Will reverse proxies be used? For example, many sites front their portal server with WebSEAL. WebSEAL can function both as a reverse proxy and also has a Tivoli Access Manager plug-in such that it will communicate directly with Tivoli Access Manager for authentication and authorization.




     

    Checklist: Performance assessment

    Activity: Performance assessment Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Has the documentation section about optimizing performance been read?




    Has the Capacity Planning Guide for WebSphere Portal 5.1, WebSphere 5.x, the chosen database, and LDAP server been read?




    Have you considered caching and edge of network servers?




     

    Checklist: Development environment

    Activity: Development environment Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Which integrated development environment (IDE) is to be used for the project?






    Does the chosen development environment provide integrated source level debugger, persistence mapping to legacy data, and easy deployment?




    Does the chosen development environment provide integrated source level debugger and test environment for portlets and easy deployment of portlets?




     

    Checklist: Legacy data requirements

    Activity: Legacy data requirements Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Which of the following enterprise servers are to be accessed by the application?

    • PeopleSoft.
    • Oracle.
    • WebSphere MQ.
    • Third-party Web content management (state which server).
    • SAP.
    • Baan.
    • Siebel.
    • J.D. Edwards.
    • Other; describe.

    Indicate skill level (0-5) for administering each system that will be accessed






    Will this application model business processes that span multiple back ends?




    Is two-phase commit a requirement of any of the business processes being modeled?




    Are the Portlet Builders in the WebSphere Portal Application Integrator package being considered as the integration point with Portal?




     

    Checklist: Hardware configuration

    Activity: Hardware configuration Y/N/NA Factors increasing risk Factors reducing risk Task owner Due date
    Has the hardware configuration been sized and verified?




    Have the following items been taken into consideration?

    • Cost.
    • Number of users, applications, nodes, partitions.
    • Footprint (physical size of the machines).
    • Scalability and high availability.
    • Does each machine meet the component hardware requirement?
    • How many machines will be required?
    • Redundancy required for each machine?





    Does the machine have multiple LPARs? This adds another layer of complexity.




    Have hardware requirements for other software needed to implement the full solution been documented and verified?




    Does the fully configured machine allow for expected growth (vertical scalability)?




    Will the planned configuration provide the required throughput and response time?




    Could the configuration be expanded to meet future growth?




    Are client, server, and host connectivity options planned?




    Are the availability requirements known and documented and does the proposed configuration deliver this?




    Are hardware maintenance requirements known and agreed?




    Have components been identified for the systems backup and recovery procedures?




    Have system management tools and services been proposed?




    Have required system management consoles been configured?




     

    Checklist: Communications and networking


    Activity: Communications and networking Y/N Factors increasing risk Factors reducing risk Task owner Due date
    Is the appropriate network infrastructure in place?

    • LAN
    • WAN
    • router
    • bridge
    • cabling





    Are network schematics documented and available?




    Have the appropriate connectivity options been explored and decided upon?




    Has the communication protocol been decided upon?




    Are additional networking hardware devices (routers, switches, and so on) required to implement the solution?




    If pervasive device support is required, are the appropriate gateways or service providers in place and understood for accessing the secured network?




    Will a Domain Name System (DNS) be employed in the network?




    If firewalls are employed, will specific ports be able to be opened?




    Note: Is anyone aware of any risks or issues that have not been adequately explored? Is there any other information that the SA team members should be aware of?

     

    IBM Skills Assessment Ratings

    Level 5 Comprehensive knowledge with ability to make sound judgments. Can give expert advice and lead others to perform. Extensive and comprehensive experience.
    Level 4 In-depth knowledge and can perform without assistance. Can direct others in performing. Repeated and successful experience.
    Level 3 Can perform with assistance. Has applied knowledge. Has performed with assistance on multiple occasions. Has performed independently in routine situations.
    Level 2 Limited ability to perform. Has general knowledge only. Very limited experience.
    Level 1 Limited knowledge. No experience.
    Level 0 No knowledge. No experience.

    Online Resources