Fallon Clinic spent US$24 million on a network upgrade, and director of IT infrastructure Susan Paul was darned sure that money would be well spent. So she followed the 2005 network upgrade - from a small 256kbit/s frame relay connection to 10Mbit/s Ethernet-like transparent LAN service to 26 locations - with another investment in network management software.

Founded in 1929, Fallon Clinic is one of the largest private multi-speciality medical groups in Massachusetts. The organisation, which supports up to 150 servers, 2,000 desktops and 280 doctors servicing those 26 locations, depends on healthcare-specific applications which must meet strict service-level agreements (SLA).

"Some of things we needed to do involved migrating from a shared hub to a switched LAN at each site, installing new servers, updating computers," Paul explains. "But I also immediately recognised he we needed to invest in software to monitor service levels across applications, servers and network elements to prove we were successfully meeting our SLAs."

Fallon Clinic did its due diligence, examining products from vendors including BMC, Concord (acquired by CA), Heroix and NetIQ, and chose two management software applications from Heroix. Paul says she decided to invest in Heroix eQ, an agent-based monitoring product, and Heroix Longitude, an agentless monitoring product, because the software was "easy to use, configurable and customisable."

"I chose Heroix because it seemed best suited to meet my needs to understand how the network and systems were behaving," she says. Paul reports that as recently as the September Labour Day holiday, Heroix software helped her staff keep the network up and services available.

At that time, the Heroix eQ software automatically paged a network engineer about a downed service on the organisation's Exchange servers. It seems a Trojan had been infiltrating the network and slowly installing itself on the company's servers and desktops. Fallon was able to remediate the problem using Heroix by identifying the computers that were infected and pulling them off the network, she says.

Heroix's eQ Suite comprises agents, which reside on every PC, server or application that needs monitoring, plus a server-based console and an SQL database to collect and store agent data. eQ Suite components can run on a variety of systems, from Unix, Linux and Windows to NetWare and OpenVMS. Once the software is installed, users can set the eQ console to warn them of problems via e-mail or pass along an alert to a larger manager-of-managers system such as HP OpenView, CA Unicenter or IBM Tivoli. The console can also kick off an automated response to a problem, such as a system restart or removing certain files to free up space, the company says.
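
As a rough illustration of the alert handling described above (e-mail notification, forwarding to a manager-of-managers system, and an automated response), here is a minimal Python sketch. It is not the Heroix eQ API: the `Alert` and `Console` classes, the handler functions and the host name `mail01` are all hypothetical, and the handlers only return a description of what they would do rather than sending mail or restarting anything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    """A problem reported by a monitoring agent (hypothetical structure)."""
    host: str
    check: str      # what was being monitored, e.g. "exchange_service"
    severity: str   # "warning" or "critical"


@dataclass
class Console:
    """Hypothetical central console that routes alerts to handlers by severity."""
    routes: Dict[str, List[Callable[[Alert], str]]] = field(default_factory=dict)

    def on(self, severity: str, handler: Callable[[Alert], str]) -> None:
        # Register a handler to run for every alert of this severity.
        self.routes.setdefault(severity, []).append(handler)

    def dispatch(self, alert: Alert) -> List[str]:
        # Run the registered handlers in registration order and collect
        # their descriptions of the action taken.
        return [handler(alert) for handler in self.routes.get(alert.severity, [])]


def notify_by_email(alert: Alert) -> str:
    return f"email: {alert.check} problem on {alert.host}"


def forward_upstream(alert: Alert) -> str:
    # Stands in for passing the alert to a manager-of-managers system.
    return f"forward: {alert.check} on {alert.host}"


def restart_service(alert: Alert) -> str:
    # Stands in for an automated response such as a restart.
    return f"restart: {alert.check} on {alert.host}"


console = Console()
console.on("warning", notify_by_email)
console.on("critical", notify_by_email)
console.on("critical", forward_upstream)
console.on("critical", restart_service)

actions = console.dispatch(Alert("mail01", "exchange_service", "critical"))
for action in actions:
    print(action)
```

In this sketch a warning only produces an e-mail, while a critical alert also gets forwarded and triggers the automated response; a real deployment would choose its own severities and actions.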

"We discovered this on Sunday and were able to immediately respond," Paul says. We went through a clean up process so that by Monday night the network was clean and running fine. We would have been totally paralysed when people came back to work Tuesday if we didn't get that alert."

The second application, Longitude, is used to monitor the performance of Fallon Clinic's Citrix servers, Unix application servers and enterprise databases. Longitude is installed on a dedicated server and uses industry-standard APIs to collect data from managed machines. The management software collects data from servers, operating systems and applications, as well as from routers and switches.

Paul, who says the reporting features were key in her selection, also produces weekly performance reports for upper management with Heroix. She'd like to see a bit more flexibility in those features in future releases though.

"I really wanted to be able to customise reporting a little more so I could show executive management just what they wanted to see and not everything," She explains. She says a recent upgrade of the software (Fallon has been using it for about a year) to the newest version could address those issues. "This release seems to have more configurability for the SLA reporting. We are doing some of it manually now because executives don't want to look at the whole thing, and we can provide snapshots."

How Fallon revamped its network

When Fallon Clinic wanted to implement Epicare, an electronic medical record application from Epic Systems, the Massachusetts-based healthcare organisation realised it could no longer depend on home-grown methods and lightweight connections to keep the network up and application performance optimised. Susan Paul, the IT infrastructure director, explained the $24 million investment her organisation made to revamp its network and where Fallon stands today.

First, tell me a bit about yourself and your network at Fallon.

I am the director of IT infrastructure for Fallon. Fallon Clinic has about 26 sites scattered throughout central Massachusetts that connect back to our data centre in Worcester. We have approximately 260 to 280 doctors who service those sites. Over the past three or four years we have been planning out an implementation of an electronic medical record (EMR) application, and as a result of that we needed to do a lot of bolstering of the infrastructure because it was pretty old and had not been well cared for, let's say, in the past.

What areas needed to be pumped up to support the EMR application?

We needed to migrate from a shared hub to a switched LAN at each one of the sites. We upgraded from a small frame relay connection, a 256kbit/s connection, to a 10Mbit/s Ethernet-like transparent LAN service connection to each site. We also put in a full backup line to each site so if our primary lines went down the backup would take over. We installed new computers, and obviously we were bringing in new servers. I also recognised that we needed to upgrade how we monitored service levels.

How many servers and desktops do you support?

We have between 130 and 150 servers and about 2,000 desktops. The 26 sites are connected to each other via 10Mbit/s wide-area connections, and then tied back to the Worcester data centre over a 100Mbit/s connection. The Epicare application is running on a Citrix farm implementation. That entire infrastructure runs at the data centre so it was critical that we have a really solid network in place to be able to get the application out to all the sites.

Do you mean service-level agreement monitoring and reporting?

We have right now an SLA with Epicare for uptime of about 100 percent, and that's why we have so much failover. The Epicare server is clustered, and we have a disaster recovery site if that cluster goes down. As we were planning, I realised I needed to have in place a system that would provide monitoring, capacity planning and trending so that we would be able to understand that problems were starting to happen before we got a phone call from a user that we had a problem.

What product did you implement to report on SLAs?

We put in software from Heroix about a year ago after an RFP process. We picked it because it was very easy to use. We selected it basically because of its ease of use, flexibility in configuration as well as flexibility in presentation and reporting.

How was Heroix easy to use?

Heroix has a central console that populated itself automatically. We could also modify or customise the central console design. One of the reasons this appealed to us is because we do have 24-hour operations, and we have data centre operators that are there overnight. We wanted to give them the ability to be able to look at a high-level picture of the network and see if there was an issue at one of the sites. Then they could drill down into that site to understand what the issue was so if a page didn't go out they would be able to page the person responsible or needed to fix the problem. There was a lot of customisation and it was very flexible with how it would present the data to us. It also learned our network quickly.

How is the Epicare application doing with the revamped network?

We are in phase two of our roll-out. The first phase was practice management, rolling out scheduling and billing to all of the sites. We had a home-grown application running on a mainframe that captured patient data such as labs and medications and telephone message dictations, so we have been slowly converting that off of the mainframe into Epicare. We are at the end of phase two and ready to go into phase three, which is rolling out Epicare to all the sites. And it has been performing great. The nice thing about the network upgrade is that it now works behind the scenes. The network will fail over to the backup and no one even experiences a hiccup. We've had it happen a few times and it's been totally transparent to our end users.

How did you balance upgrading the network and rolling out new applications without disrupting services?

It required a lot of planning. As we were upgrading the network, we brought the new services online and kept the old services for a period of time. On the LAN, we would bring a site down at night and bring it up the next day on the new switched network. We would never do a big bang because you are guaranteed to have a problem without having a failover in place. It's been about a three-year effort, so it definitely didn't happen overnight.