Management Marathon
Profiles of CA and Tivoli users shed light on the tortuous road to network and systems management nirvana.
By Elisabeth Horwitt
Whether you choose Computer Associates International, Inc.'s Unicenter TNG or Tivoli Systems, Inc.'s TME enterprise management framework, count on deployment to be costly, onerous and slow. On the other hand, if implemented properly - and that's a big "if" - the frameworks will help you automate, centralize and streamline your network and systems management operation.
That's the gist of our site visits to two successful implementers of the two major frameworks: Allegiance Corp., a medical products manufacturer in McGaw Park, Ill., which went with CA's Unicenter TNG, and Charles Schwab & Company, Inc., a San Francisco-based brokerage that invested in Tivoli's TME.
In some respects, the users painted a different picture of the frameworks from the one you'll get from most industry analysts, who routinely rap the massive management products as being overly expensive and ineffective.
Yes, the products are expensive, although coming up with specifics on price is nearly impossible given the degree of customization required. Jeff Hersh, manager of Deloitte & Touche Consulting Group, a New York-based network consultancy, says the framework itself can cost between $1 million and $4 million for a Fortune 500 enterprise. Once you buy additional modules and factor in design and deployment, which take anywhere from six to 18 months, the implementation costs quickly surpass the salaries of most professional athletes.
Analysts also knock the frameworks as not living up to their promised levels of integration, ease of use, consistency and distributed capabilities. Allegiance and Schwab would beg to differ, having seen firsthand the efficiencies the frameworks have brought, although they will agree there is more to be done. Isn't there always?
The users would have to agree the products are complex, however. A recent survey by Gartner Group, Inc. found Unicenter TNG and TME installations have only a 30% success rate.
Schwab was one of the unfortunate 70% at one point; it took two tries for the company to get TME right. The Allegiance and Schwab stories should help you understand what these buyers hope to gain and how much effort it will take to make it happen.
SCHWAB
When an enterprisewide software implementation fails, rarely do the implementers immediately turn around and decide to have another shot with the same product. Charles Schwab & Company, Inc. did exactly that, however, after its first deployment of Tivoli Systems, Inc.'s TME enterprise management framework sputtered in 1993.
Back then, the platform never achieved wide usage at Schwab - partly because it was an early version that offered more promise than substance and partly because the original implementers made some mistakes, says Richard Weiss, architect of enterprise management systems at the San Francisco-based brokerage.
Schwab took another shot 18 months ago and implemented TME 2.0, now part of the IBM TME 10 network and systems management family. Despite the company's prior experience, the second deployment was far from a walkover.
"When Tivoli says TME is a framework, it means it,'' says David Bruce, a senior staff member in Schwab's Phoenix data center who was a major participant in the second TME deployment. What comes out of the box are the software equivalents of "concrete and lumber,'' he says, basic building blocks that require a great deal of customization. Two to two and a half full-time IT workers spent about a year defining and designing the TME basics, including standard policies and procedures, thresholds and automated responses, Bruce says.
Weiss declined to say how much Schwab paid for TME, citing special discounts for being a beta user and the fact that the second deployment built on existing TME software. However, the management software alone easily can run into seven figures for a major installation like Schwab's, Weiss confirms. Prices start at $2,000 per managed server and $75 per managed client, he says. Given that Schwab has more than a thousand IBM AIX, Windows NT and Sun Solaris servers located at six computing centers and some 10,000 end users at 300 branches nationwide, that comes out to one hefty software tab.
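To put a rough number on that tab, the per-unit prices Weiss quotes can be multiplied out. The sketch below is only a lower-bound illustration: it assumes exactly 1,000 servers (the article says "more than a thousand"), one managed client per end user, and no discounts or add-on modules, none of which Schwab confirmed.

```python
# Per-unit prices quoted by Weiss in the article.
PRICE_PER_SERVER = 2_000  # dollars per managed server
PRICE_PER_CLIENT = 75     # dollars per managed client


def license_estimate(servers: int, clients: int) -> int:
    """Return the software cost in dollars at the quoted unit prices.

    Ignores discounts, add-on modules and all labor costs.
    """
    return servers * PRICE_PER_SERVER + clients * PRICE_PER_CLIENT


# Assumptions (not confirmed by Schwab): exactly 1,000 servers and
# one managed client for each of the 10,000 end users.
schwab_estimate = license_estimate(servers=1_000, clients=10_000)
print(f"${schwab_estimate:,}")  # $2,750,000
```

Even under these conservative assumptions the result lands in the seven-figure range Weiss describes.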
And the software wasn't the most expensive part of the rollout, Weiss says. Far greater were the human costs of planning, design, implementation and training. "TME in its present form is very powerful but requires a lot of configuration work up front to make it usable,'' Weiss says. "It takes commitment on the part of implementers and a leap of faith by the business people funding it. [IT] has to communicate to business people that this is something that will require a lot of work and will take months before you start to see a return on investment [ROI].'' For proprietary and competitive reasons, Bruce and Weiss refused to divulge ROI figures, other than to say the investment company is achieving the cost efficiencies and proactive management that prompted Schwab's initial TME purchase. Indeed, the benefits have been substantial for other large companies that implemented TME, according to a 1997 report by International Data Corp. Commissioned by Tivoli, the study of 10 global companies found the sites gained a payback on total implementation costs in an average of 115 days. The annual savings per 100 users came to $14,065 in management efficiency, $117,533 in management productivity and $89,769 in availability.
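The three IDC savings categories can be added up to show the combined figure per 100 users. This is a sketch only: the study reported the three numbers, not the helper below, and scaling linearly to other user counts is an assumption made here for illustration.

```python
# Annual savings per 100 users, as reported in the Tivoli-commissioned IDC study.
SAVINGS_PER_100_USERS = {
    "management efficiency": 14_065,
    "management productivity": 117_533,
    "availability": 89_769,
}

total_per_100_users = sum(SAVINGS_PER_100_USERS.values())


def annual_savings(users: int) -> float:
    """Scale the per-100-user total to a user count (linear scaling is an assumption)."""
    return total_per_100_users * users / 100


print(total_per_100_users)    # 221367
print(annual_savings(2_500))  # 5534175.0 for a hypothetical 2,500-user site
```

Management productivity accounts for slightly more than half of the combined total.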
Trying again with TME
Bruce, who joined Schwab when the TME rollout was in progress, is charged with running a Tivoli Managed Region, or group of servers, at the Phoenix data center. There, for example, staffers have programmed TME agents to automatically take care of 95% of the alarms, Bruce reports. As a result, "instead of getting called three times a night, we only get called for the exceptions,'' he says.
Agents also have been programmed to handle housecleaning chores such as periodically cleaning out systems logs and responding to common problems. Relieved of fire fighting and grunt work, six administrators can effectively manage 163 systems, Bruce says.
While keeping people costs down is important, a far more crucial justification for TME, as far as management is concerned, is data center managers' improved ability to meet or exceed service-level agreements (SLAs), Weiss says. Schwab's SLAs basically guarantee users that network performance variables won't fall below certain levels. Optimizing response time and availability has become particularly crucial to Schwab now that more and more customers access its systems directly via the World Wide Web instead of having a service representative do it for them, Weiss says.
Phil Wade, Schwab's vice president of enterprise management systems, oversaw the second TME implementation, while Weiss led the technical work. Joining the company after the first TME try, Wade agreed with his predecessors that Schwab needed a distributed, object-oriented framework to manage its mix of resources. "One of the driving reasons we wanted a product like Tivoli is to get early notification when something goes wrong so we can fix it before it impacts our customers,'' Wade says.
When Schwab chose TME in 1993, Tivoli was one of the few vendors promising anything close to a distributed network and systems management framework. But Weiss says the early version of TME fell short of its promise.
For example, there was no cross-platform management from a central console, he says. It also was impossible to establish interaction between distributed management domains, or "regions,'' as Tivoli calls them. "It was not possible to build a large-scale distributed management system,'' Weiss says. So why did Schwab choose TME the second time around? The investment it already had made in the product was part of the reason, but more important, "Version 2.0 was a big improvement - much more of a framework,'' Weiss says. Furthermore, the first try gave the enterprise management systems group a good grounding in the ins and outs of TME. For those reasons, Schwab never seriously looked at a rival enterprise management framework.
The secret to Schwab's success
The second implementation succeeded thanks to significant improvements to TME and the way IT handled the deployment, according to Weiss. In particular, IT learned that getting management policies and procedures right in advance was at least as important as correctly deploying and operating the actual product.
Policies implement the terms of the all-important SLAs, such as maximum acceptable response time and hours of availability for various critical business applications. The trick is defining what constitutes a problem, who gets notified and how to escalate it, Weiss says. For example, CPU utilization of 90% may be unacceptable for one application, while the threshold is 50% for another. Once the threshold is exceeded, does a management agent initiate action? If not, which person, node or application does it alert?
The management systems, systems engineering, operations and application development staff all cooperated to define policies and processes because they all share responsibility for managing and ensuring optimal availability and response time levels for corporate systems, Weiss says. It took the group about six months to design and set up the basic policies that would be consistent across the enterprise. "We were probably a little lax in the beginning in ensuring uniform implementations across the enterprise,'' Weiss concedes.
From a technical perspective, policy deployment was made fairly easy by TME's object-oriented, agent-based architecture, Weiss says. By subscribing to a group, managed objects can inherit characteristics of other objects in that group, such as a CPU usage threshold for a server or a security level for a user. A group could be defined as all servers supporting a particular mission-critical application or all users at a certain managerial level. Object orientation allows IT to define policy at one location and propagate it to TME agents across the enterprise.
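The idea of defining a policy once and having subscribed objects inherit it can be sketched in a few lines. This is not Tivoli's API; the class names, host names and the escalation delay are hypothetical, chosen only to illustrate the 90% versus 50% threshold example and the notify-then-escalate question Weiss raises.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyGroup:
    """Policy defined once and shared by every object subscribed to the group."""
    name: str
    cpu_threshold: float     # percent CPU utilization above which an alert fires
    notify: str              # first recipient of the alert
    escalate_to: str         # recipient once the problem has persisted
    escalate_after_min: int  # hypothetical escalation delay, in minutes


@dataclass
class ManagedServer:
    hostname: str
    group: PolicyGroup       # a reference, so policy changes reach all subscribers

    def alert_recipient(self, cpu_percent: float, minutes_over: int) -> Optional[str]:
        """Return who should be alerted, or None if the threshold is not exceeded."""
        if cpu_percent <= self.group.cpu_threshold:
            return None
        if minutes_over >= self.group.escalate_after_min:
            return self.group.escalate_to
        return self.group.notify


# Two applications with different tolerances, as in the 90% versus 50% example.
batch = PolicyGroup("batch-reporting", 90.0, "operations", "systems engineering", 30)
trading = PolicyGroup("trading", 50.0, "operations", "systems engineering", 10)

batch_server = ManagedServer("phx-batch-01", batch)
trading_server = ManagedServer("phx-trade-01", trading)

print(batch_server.alert_recipient(70.0, minutes_over=5))     # None
print(trading_server.alert_recipient(70.0, minutes_over=5))   # operations
print(trading_server.alert_recipient(70.0, minutes_over=15))  # systems engineering

# Changing the group policy in one place affects every subscribed server.
trading.cpu_threshold = 75.0
print(trading_server.alert_recipient(70.0, minutes_over=15))  # None
```

Because each server holds a reference to its group rather than a copy of the numbers, a single policy change reaches every subscriber, which is the property that made standardized policies comparatively easy to roll out.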
TME also includes prewritten scripts for basic tasks such as monitoring and automatically responding to key events such as when a threshold is exceeded. Schwab programmers then customized those scripts to the company's management parameters. More difficult was getting individual user groups to implement the standards, Bruce says. Some highly profitable divisions have resisted standardizing, he says. "But they're beginning to understand the business costs of not adhering.'' Consistency of both basic management policies and system configurations is crucial to TME's long-term success at Schwab, Bruce and Weiss agree. "End-to-end systems and network management depends on standardizing the way in which the system is going to tell you when it is down, what it tells you and how,'' Bruce says. If his machine sends a different type of alarm from that sent by the same type of machine in the Denver data center, the systems monitoring and alarm correlation programs won't be able to figure out what's going on, he says. Standardized policies were a comparative snap to implement because TME can propagate the definitions across the enterprise. The next phase of defining and manually configuring customized scripts and thresholds for specific applications and projects was much tougher and more time-consuming, Bruce reports. Even with standardization, the scope of the job is monumental, simply because of the number of different processes TME monitors on an individual server - 40 in all. It takes 20 to 30 minutes to set up each different type of information to gather, Bruce says. For example, agents need to be set up to monitor individual daemons, or processes on a server, and notify somebody if one dies. "For an Oracle [Corp.] application or an operating system, you're talking five or six daemons,'' Bruce says. 
At the Phoenix data center, it takes about 40 hours to set up a standardized TME configuration managing a new project, which might comprise a server running Oracle and a couple of applications, Bruce says. "If you need to customize, you're talking 80 to 120 hours.''
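The daemon monitoring Bruce describes boils down to comparing a watch list against what is actually running and notifying someone about the difference. The following is a minimal sketch of that check, not a TME agent; the server name and daemon names are illustrative, and a real agent would read the running set from the operating system's process table.

```python
def missing_daemons(expected: set, running: set) -> list:
    """Return the expected daemons that are not currently running, sorted by name."""
    return sorted(expected - running)


def notifications(server: str, expected: set, running: set) -> list:
    """Build one notification message per dead daemon."""
    return [f"{server}: daemon '{name}' is not running"
            for name in missing_daemons(expected, running)]


# Hypothetical watch list for a database server; the names are illustrative only.
expected = {"ora_pmon", "ora_smon", "ora_dbw0", "ora_lgwr", "ora_ckpt"}
# In practice this set would come from the operating system's process table.
running = {"ora_pmon", "ora_smon", "ora_dbw0", "ora_ckpt", "sshd"}

for message in notifications("phx-db-01", expected, running):
    print(message)  # phx-db-01: daemon 'ora_lgwr' is not running
```

The check itself is trivial; as Bruce notes, the time goes into deciding, per project, which processes belong on the watch list and who gets the message.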
Room for improvement
In addition to the time it takes to configure and customize TME, the product has other shortcomings. For instance, Bruce is eager to try Tivoli's recently announced lightweight client. The regular TME client software consumes about 80M bytes on every managed machine, while the lightweight client requires significantly fewer resources, he says. It accomplishes this by placing much of the client modules on the server. That takes some weight off the managed systems but adds traffic to the network, Bruce points out. "We wouldn't want to use [the lightweight client] wholesale.''
Weiss puts deeper integration between TME and a broader range of third-party management tools and systems high on his wish list. Tivoli has published APIs that enable third-party vendors to integrate their tools into TME. And like CA, Tivoli has hundreds of vendor partners. "But there is a wide range of integration levels you can build into a product and still be a partner,'' Weiss points out.
Obviously, the TME framework is integrated most tightly with its own core modules. Weiss likes to use the Courier file distribution module to deploy new versions of the framework or various underlying applications. The asset management module can work with Courier to prequalify a target node's configuration, for example. This is by no means a given if you choose a third-party asset management tool, he says.
Third-party software boasting tight integration with TME is beginning to emerge, Weiss says. Known as Plus Modules, these products can take advantage of core TME capabilities. For example, a user at a TME console can interact with a third-party management application to remotely configure a threshold on, for example, a router, or send a command down to take that router offline. One such Plus Module, Legato Systems, Inc.'s Networker, allows users to monitor the status of backup jobs right from TME, while Remedy Corp.'s ARS can open trouble tickets based on events received from TME.
TME provides a satisfactory level of integration with Schwab's network management system, Hewlett-Packard Co.'s OpenView, according to Bruce. The Event Console allows managers to look at both types of alerts. In addition, TME can correlate network and system alerts, letting an administrator know, for instance, that he should be getting alarms from certain machines if a router goes down. Weiss wants more. "What I would like to see is full topology integration, where Tivoli can have access to topology maps and managed objects of OpenView,'' he says. If Tivoli succeeds in providing tight integration between TME and IBM's AIX-based TME 10 NetView, Weiss says he reluctantly would consider migrating to it from OpenView. IBM has been slowly integrating NetView with TME under its TME 10 product umbrella. Right now, however, much of Schwab's energy is concentrated on setting up application management under TME. For example, Bruce's group has developed a utility that automatically monitors application event logs. And Weiss' group has deployed many of the agents to monitor the network devices and servers on which distributed applications run. What the group has been eagerly waiting for is the recently released TME 10 Global Enterprise Manager (GEM), which promises to help TME manage application SLAs in relation to the underlying servers and network systems across the enterprise. GEM can discover application-level details and use the accompanying Applications Policy Manager to propagate policies down to that level. GEM also includes the Application Management Specification, which defines standardized formats for describing an application's management characteristics. AMS could turn out to be the systems management equivalent of Management Information Bases, Weiss says. First, however, Tivoli needs to establish the specification as an industry standard among independent software vendors, where support has been slow in coming. 
Weiss' group is fairly confident, however, that Tivoli will deliver what it takes to integrate application management into the enterprise context. Overall, TME's long-term advantages more than make up for the expense and hassle of deployment, Bruce and Weiss agree. "If you're intelligent about doing your work and use Tivoli to manage all the standard stuff, you only have to take care of a problem once,'' Bruce says.
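The correlation of network and system alerts that Bruce describes earlier in this section, where a downed router explains the alarms from machines behind it, can be illustrated with a small sketch. It is not how TME or OpenView implement correlation; the topology map and host names are hypothetical.

```python
def correlate(alerts: list, behind_router: dict) -> dict:
    """Group alerts by probable root cause.

    Keys are root-cause alerts; values list the alerts explained by that cause.
    A host alert is treated as a symptom when its router is also alerting.
    """
    alerting = set(alerts)
    grouped = {}
    explained = set()
    for router, hosts in behind_router.items():
        if router in alerting:
            symptoms = [host for host in hosts if host in alerting]
            grouped[router] = symptoms
            explained.add(router)
            explained.update(symptoms)
    # Anything not explained by a downed router stands as its own root cause.
    for alert in alerts:
        if alert not in explained:
            grouped[alert] = []
    return grouped


# Hypothetical topology: which servers sit behind which router.
topology = {"phx-router-1": ["phx-db-01", "phx-web-02", "phx-web-03"]}
alerts = ["phx-router-1", "phx-db-01", "phx-web-02", "den-app-07"]

print(correlate(alerts, topology))
# {'phx-router-1': ['phx-db-01', 'phx-web-02'], 'den-app-07': []}
```

An administrator looking at the grouped result sees one router problem and one unrelated server problem instead of four separate alarms.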
ALLEGIANCE
At first glance, the computer operations control center at Allegiance Corp. perfectly exemplifies the swivel chair type of network and systems management that network IS staffers love to hate. On one wall there are rows and rows of management consoles, each tying into a different network or systems management tool or managed system. There's SunNet Manager to monitor the activity of LANs, routers and switches, and multiple system management tools specific to the various platforms in use, including various flavors of Unix, Windows NT, MVS and CICS. Allegiance has some 400 servers at 100 sites nationwide.
To get a composite view of how the different pieces work together, workers at corporate headquarters in McGaw Park, Ill., need to shift from monitor to monitor. Worse, each move requires a mindshift to a different environment and all the commands, icons and operations that accompany it. This is exactly the kind of costly, kludgy and inefficient configuration that corporate IT departments have been trying to get away from for at least the past decade.
However, operations control users won't have to put up with it much longer. Come spring, all of the event and alert streams feeding into the different operations control center consoles will be consolidated onto Allegiance's enterprise systems management platform, Computer Associates International, Inc.'s Unicenter TNG. When that happens, users will be able to monitor system activities enterprisewide from a single 67-inch console; and the dozen or so monitors on the wall will be relegated to backing up TNG in case of failure, according to Tony Navarro, director of enterprise technology services at Allegiance.
It will have taken Allegiance's technical staff almost two years of hard work to reach this crucial - but by no means final - phase of TNG deployment.
Close ties with CA enabled the firm to start trying out a beta version of Unicenter TNG in July 1996, well ahead of the product's general release this past January.
Unveiling Unicenter
The basic setup of the TNG framework didn't take all that long, Navarro reports. It took about five work days to configure the TNG RealWorld console and system alerts, he says. It took another half-day to load and set up the repository, which holds systems configurations, trouble tickets and other key data. It also took about a day to design and set up thresholds on agents for SQL and NT servers. Once Allegiance had created a customized "server build process'' based on Unicenter's Unattended Install program, deployment of those customized agents was a simple matter of requesting TNG to update a particular server.
More time-consuming was "defining [business] processes and procedures'' that in turn determine critical thresholds and automated response scripts for each server, Navarro says. "It takes us about 30 minutes to an hour to set up a single business process view.''
Technical Services had a production version of TNG up and running for operations center people to start using as early as last March. However, it was a limited, initial implementation that primarily provided automated alerts and scripted responses to key events on NT and Unix systems, he adds. The consolidation of alerts from all systems managed by the operations control center is only now nearing completion.
On the network management side, Allegiance's separate Network Control Center still is very much in the process of integrating TNG with existing management tools and applications such as Cisco Systems, Inc.'s CiscoWorks and Bay Networks, Inc.'s Optivity. The rollout of TNG software to remote sites to enable remote management is expected to stretch over the next two and a half years.
Furthermore, the two-year TNG implementation does not include the time that Technical Services spent several years ago implementing the earlier Unicenter systems management platform on which the current TNG installation is built.
The lengthy time frame attests to the complexity of the TNG platform itself as well as to the monumental task for which it is being deployed: consolidated, centralized management of multivendor network, systems and applications enterprisewide. However, Navarro's people have no doubt that the present and future benefits of TNG will be worth the time and trouble.
While Navarro refused to divulge actual return on investment (ROI) figures, he points out that the medical products manufacturer was one of 10 companies recently surveyed by International Data Corp. for a 1997 report on integrated management systems. The CA-sponsored study found integrated network and systems management suites such as Unicenter provided a total annualized savings of $507,000. Of that figure, $368,000 comes from improved availability and $77,000 from automation of management tasks. Corporations gained a payback for such systems in an average of 69 days.
The report also found that implementing Unicenter cut average downtime from 6% to 1%, resulting in average annual savings over a five-year period of $120,000 per 100 users. Automation of security administration and virus detection resulted in initial savings of $29,000 per year for every 100 users, according to the report. On the cost side, businesses invested an average of $59,000 per 100 users in purchasing and setting up Unicenter and training their staffs.
It should be noted that the report is based on the earlier Unicenter, which lacks TNG's object orientation and network and application management capabilities.
Migrating to new management
Allegiance's computer operations department always has sought a central focal point for messages from systems, network and application management tools and domains, says Tom Cesar, a technical analyst and lead TNG implementer at the company. Prior to TNG, Allegiance partially consolidated its systems and network exception messages for the mainframe using CA's NT-based Automation Point software. Automation Point monitors, consolidates and provides automated responses to events on Unix and IBM MVS systems. It is integrated with Unicenter for event correlation, CA says. However, Allegiance intends to complete migration from mainframes to distributed systems sometime this year, which makes Automation Point an increasingly nonviable solution.
Technical Services started shopping for an enterprise systems management platform in 1994. "We didn't want to integrate a bunch of client/server management solutions ourselves. We wanted out-of-the-box integrated product suites,'' Navarro says. The company evaluated platforms from Boole & Babbage, Inc., Digital Equipment Corp., Hewlett-Packard Co., OpenVision and Tivoli Systems, Inc.
One of the big reasons Unicenter won out was that it could manage NT servers immediately. Rivals such as Digital and Tivoli hadn't yet delivered that capability, Navarro says. This was important because Allegiance had just implemented a strategic NT-based just-in-time order processing system.
Still, Unicenter 1.X only tackled the systems side of the enterprise management challenge. As a major CA customer and partner, Allegiance had an active part in Unicenter's metamorphosis into an integrated, distributed network/systems/application management framework. At Allegiance, Unicenter TNG goes well beyond monitoring applications for bad bytes or interrupted processes.
The framework also watches systems and network devices and correlates application activity with other key information, such as server CPU and disk usage levels or network traffic spikes. There are 47 optional TNG management applications to handle functions such as monitoring, inventory, asset management and job scheduling. These CA applications offer a high degree of integration with core TNG components and each other. TNG applications can send alerts to the console for real-time viewing and to the object-oriented common repository for future analysis. What's more, they also can use core TNG administrative tools that handle tasks such as software distribution, automated workload management and calendaring. For example, the systems backup tool and the automated system log cleanup tool can look at the same calendaring function to ensure their schedules don't conflict. While deployment isn't yet complete, TNG's integrated functions already are creating savings for Allegiance in the form of improved productivity. IT has rolled out new environments without adding to the systems management group of five, Navarro says. But TNG really will start to prove its worth when Cesar's group completes consolidation of the various alerts and events coming in from various managed systems onto the main console. "It will give the workers who monitor the network a focal point where they can see everything they need to be concerned about right in front of them,'' Cesar says. Cesar's staff still is working on the three-dimensional graphics in TNG's Real World user interface. "We're still not sure if we really need the 3-D,'' Cesar admits. What the workers do need, however, is the console's ability to zoom down to agents on SAP, NT or Unix systems, enabling them to reconfigure thresholds, reschedule a server job or reconfigure workload parameters, he says. 
This kind of centralized management wasn't possible with Unicenter 1.5's Enterprise Manager, which Cesar says lacks network management altogether. The product only allowed administrators to look at one server or function at a time and didn't permit managers to view routers or hubs from the central screen. Unicenter has come a long way since then, Navarro notes. Indeed, TNG is less a new version of Unicenter than an entirely new architecture, making the decision to migrate to TNG a no-brainer, according to IT staffers. TNG's automated discovery and distribution tools made it fairly easy to set up and deploy the software and agents, Navarro says. After Technical Services installed TNG on top of the old Unicenter 1.X, TNG went out and discovered all the old Unicenter modules and replaced or upgraded them. On the other hand, migrating from a systems management suite to an enterprise management framework constitutes a quantum leap in understanding, according to Cesar. Navarro's people learned TNG through hands-on work with the CA beta version, Navarro says. Operations users got an initial half-day training session, followed by a second session a few months later.
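The shared calendaring function mentioned in this section, which lets the backup tool and the log cleanup tool avoid stepping on each other, amounts to an overlap check against one common schedule. The sketch below illustrates that check under stated assumptions; the job names, dates and durations are hypothetical, and it does not reflect TNG's actual calendar interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass
class ScheduledJob:
    name: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def conflicts(a: ScheduledJob, b: ScheduledJob) -> bool:
    """Two jobs conflict when their time windows overlap (touching ends do not count)."""
    return a.start < b.end and b.start < a.end


def find_conflicts(calendar: list) -> list:
    """Return the names of every pair of jobs on the shared calendar that overlap."""
    return [(a.name, b.name) for a, b in combinations(calendar, 2) if conflicts(a, b)]


# Hypothetical entries on one shared calendar.
backup = ScheduledJob("system backup", datetime(1998, 3, 1, 1, 0), timedelta(hours=2))
cleanup = ScheduledJob("log cleanup", datetime(1998, 3, 1, 2, 30), timedelta(minutes=30))

print(find_conflicts([backup, cleanup]))  # [('system backup', 'log cleanup')]

# Moving the cleanup to start when the backup ends removes the conflict.
cleanup.start = datetime(1998, 3, 1, 3, 0)
print(find_conflicts([backup, cleanup]))  # []
```

The point of a common calendar is that both tools consult the same list of entries, so neither needs to know about the other directly.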
There's more work to do
Cesar's group currently is determining how much disk and RAM space and CPU power TNG uses on various systems, including the managed servers and the central repository node that collects, processes and stores incoming management data. "It's a pretty big drain,'' Cesar says, which is why he upgraded the TNG system to a 200-MHz Pentium Pro dual-processor server.
Cesar's group also still is learning how to configure TNG to meet different users' network management needs. For example, the operations center workers who man the TNG console need to see key alerts that filter up from SNMP traps on Cisco routers or Bay hubs. "Specialized tools like CiscoWorks are beyond them,'' he says. But Allegiance's network managers need the deeper functionality of CiscoWorks and Optivity, Bay's management system. For instance, Optivity can show details such as which port lights are on and off, Navarro says.
Allegiance's Remote Monitoring and SNMP traps allow TNG to obtain error and traffic-level data directly from hubs and routers. However, this is passive, one-way SNMP device polling. CA is working to improve integration between TNG and major network management systems such as CiscoWorks, Optivity and Cabletron Systems, Inc.'s Spectrum, according to a CA spokesman. When that is complete, Navarro says, his group will be able to actively manage a hub or router from TNG.
TNG also has some gaps to fill when it comes to application management. For example, the framework can manage some aspects of Lotus Development Corp.'s Lotus Notes and Microsoft Corp.'s Exchange, but it can't manage messaging systems as networked applications, Navarro notes. He says IT can use TNG's published specifications to write its own agents to manage the e-mail network from TNG if this becomes a priority.
On the other hand, Cesar was pleasantly surprised by the level of performance monitoring TNG's NT SQL agent provides.
TNG also manages all the important aspects of R/3, with one exception: software updates, which are part of a closed SAP architecture. "SAP is going to expose more APIs in the next release," he says. TNG's generic capability of reading virtually any application or system event log is particularly useful to Navarro. IT developed standard specifications that dictate exactly what and how internally developed applications report to the log. "You write to this API to trigger this workload tool or log this event,'' he says. But if you buy something off the shelf, there's not a lot you can do to control what the product reports to TNG via the log, he adds.
Overall, however, "there are no gaping holes, no big-ticket items we want to see [in TNG],'' he says. "Anyway, nothing is ever 100%.''
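Navarro's point about standard specifications for what internally developed applications write to the event log can be illustrated with a small logging contract. The article does not describe Allegiance's actual format, so everything below is hypothetical: the pipe-delimited layout, the severity levels and the function names are assumptions chosen only to show how a fixed format lets a generic log reader interpret every in-house application the same way.

```python
SEVERITIES = ("INFO", "WARNING", "CRITICAL")  # hypothetical severity levels


def format_event(app: str, severity: str, code: int, message: str) -> str:
    """Render one event in a hypothetical 'app|severity|code|message' format."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    if "|" in app or "\n" in message:
        raise ValueError("app name may not contain '|' and message may not contain newlines")
    return f"{app}|{severity}|{code}|{message}"


def parse_event(line: str) -> dict:
    """Parse a line written by format_event; the message itself may contain '|'."""
    app, severity, code, message = line.rstrip("\n").split("|", 3)
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {"app": app, "severity": severity, "code": int(code), "message": message}


line = format_event("order-entry", "CRITICAL", 4012, "queue depth | over limit")
event = parse_event(line)
print(event["severity"], event["code"])  # CRITICAL 4012
```

An off-the-shelf product that writes free-form text to the log offers no such contract, which is the limitation Navarro describes.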
Horwitt is a freelance writer and consultant in Waban, Mass. She can be reached at 75244.1666@compuserve.com.
Copyright, 1995-2001 Network World, Inc. All rights reserved.