Managing Sun servers is not, according to Sun, difficult. Sun users with years of Solaris experience under their belts would probably agree. However, the learning curve can be steep, so here are a few points to start with.
Historically, Solaris and Sun systems grew up in the workstation environment, where there was little need to consider storage. As a result, disk management and its integration into the operating system have never been a focal point for Sun. Even a simple task such as disk volume management (logically combining two disks to create a file system larger than one physical disk) isn’t yet included in the core Solaris operating system. Sun has promised a new integrated file system, called Dynamic File System, with Solaris 10, but that is still some way off.
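To make the idea of combining two disks concrete, here is a minimal sketch of how a concatenated volume could map a logical block number onto a physical disk. It assumes disks are described only by their size in blocks, and the function name concat_map is hypothetical; it illustrates the principle of concatenation rather than how any Sun product implements it.

```python
from typing import List, Tuple


def concat_map(disk_sizes: List[int], logical_block: int) -> Tuple[int, int]:
    """Map a block of a concatenated volume to (disk_index, physical_block).

    Blocks fill the first disk completely and then continue on the next,
    so the usable size is simply the sum of the member disk sizes.
    """
    if logical_block < 0:
        raise ValueError("block number must be non-negative")
    remaining = logical_block
    for index, size in enumerate(disk_sizes):
        if remaining < size:
            return index, remaining
        remaining -= size
    raise ValueError("block lies beyond the end of the volume")


if __name__ == "__main__":
    disks = [1000, 1500]            # two hypothetical disks, sizes in blocks
    print(sum(disks))               # 2500: volume size in blocks
    print(concat_map(disks, 999))   # (0, 999): last block of the first disk
    print(concat_map(disks, 1000))  # (1, 0): first block of the second disk
```

The file system sees one 2,500-block device and never needs to know where the join between the two disks falls.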
To manage disks under the current versions of Solaris you need an add-on: Solstice Disk Suite for all versions up to Solaris 8, renamed Solaris Volume Manager for Solaris 9. Both are supplied free with the operating system. However, one of the most common mistakes made when setting up a new Solaris box is failing to install this add-on, either at all or at the right time. Solstice Disk Suite or Solaris Volume Manager should be installed immediately after the base operating system and before any other applications or databases are loaded, even if you are not sure you will ever need it. One downside is that it has its own commands and requires technical knowledge over and above standard Solaris system administration skills.
With the significant increase in the volume of data handled by servers, disk configuration has become a key issue in server management. Solstice Disk Suite supports disk mirroring, which protects against a disk failure and maximises uptime, striping for added performance, and RAID configurations. Disk management is a common support issue, especially when a change to the disk configuration is required and it turns out that Disk Suite or Volume Manager was never installed and configured. The fix often involves moving large amounts of data around and a significant amount of additional work for the system engineer.
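As a rough illustration of what striping and RAID do underneath, the sketch below maps a logical block onto a striped (RAID 0) set using a fixed stripe unit, which Solstice Disk Suite calls the interlace, and shows how RAID 5 style XOR parity lets a lost chunk be rebuilt from the survivors. It assumes equal-sized disks and an interlace measured in blocks; the function names are hypothetical, and real volume managers add parity rotation, metadata and error handling that are left out here. Mirroring, the simplest case, just writes every block to each submirror and is not shown.

```python
from functools import reduce
from typing import List, Tuple


def stripe_map(num_disks: int, interlace: int, logical_block: int) -> Tuple[int, int]:
    """Return (disk_index, physical_block) for a block on a striped volume.

    Consecutive chunks of `interlace` blocks go to successive disks in
    round-robin order, so large sequential I/O is spread over all disks.
    """
    chunk, offset = divmod(logical_block, interlace)
    disk = chunk % num_disks        # which disk holds this chunk
    row = chunk // num_disks        # how many full stripes precede it
    return disk, row * interlace + offset


def xor_parity(chunks: List[bytes]) -> bytes:
    """XOR equal-length chunks together, byte by byte, into a parity chunk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


if __name__ == "__main__":
    # Three disks, 16-block interlace: block 40 is chunk 2, so disk 2, block 8.
    print(stripe_map(3, 16, 40))    # (2, 8)

    data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
    parity = xor_parity(data)
    # If one chunk is lost, XOR of the survivors and the parity rebuilds it.
    rebuilt = xor_parity([data[0], data[2], parity])
    print(rebuilt == data[1])       # True
```

The same XOR that builds the parity also reverses it, which is why a RAID 5 set can lose any single disk without losing data.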
When planning a new installation, bear in mind that you may need this application in the future, and install it immediately after the base operating system. This can save hours of frustration and technical support calls later.
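To show why planning ahead helps, the sketch below prints the usual Solaris Volume Manager sequence for mirroring the root slice. The assumptions are loud ones: the boot disk is c0t0d0, the second disk is c0t1d0 and has already been partitioned identically, root is on slice 0, slice 7 was left free at install time for the state database replicas, and the names d10/d11/d12 are only a common convention. Reserving that small slice is far easier during the initial installation than after data has been loaded. The script only prints the plan; swap mirroring and alternate boot device settings are omitted, and every command should be checked against Sun's documentation for your release before it is run as root.

```python
from typing import List


def root_mirror_plan(disk_a: str, disk_b: str, root_slice: int = 0,
                     db_slice: int = 7) -> List[str]:
    """Return the commands for mirroring root, in order (nothing is executed)."""
    a_root = f"{disk_a}s{root_slice}"
    b_root = f"{disk_b}s{root_slice}"
    return [
        # State database replicas: three copies on each disk (-f for the first set).
        f"metadb -a -f -c 3 {disk_a}s{db_slice} {disk_b}s{db_slice}",
        # One-way concatenations to act as submirrors; -f because root is mounted.
        f"metainit -f d11 1 1 {a_root}",
        f"metainit d12 1 1 {b_root}",
        # Create the mirror with only the first submirror attached.
        "metainit d10 -m d11",
        # Update /etc/vfstab and /etc/system so the system boots from the mirror.
        "metaroot d10",
        "init 6",
        # After the reboot, attach the second submirror; it resyncs in the background.
        "metattach d10 d12",
    ]


if __name__ == "__main__":
    for command in root_mirror_plan("c0t0d0", "c0t1d0"):
        print(command)
```

Printing rather than executing keeps the sketch safe to try on any machine, and makes it easy to paste the reviewed commands into a change record.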
Attach a Console
Rarely needed, but absolutely essential in the event of a really serious error, usually hardware-related, when the system dies before it can write its error messages to disk.
A console can be just a simple screen, usually attached via a serial connection, to which the system directs its messages, including those critical error messages. A low-spec laptop or notebook computer running a simple terminal program such as HyperTerminal makes an ideal console, because all the output can be captured in its buffers. Traditional dumb terminals can be used, although data is more likely to be lost when it scrolls off the screen for lack of buffering; still, this is better than nothing.
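If you would rather keep a permanent file than rely on a terminal program's scrollback, a small script can do the capturing. This is a sketch under stated assumptions: it uses the third-party pyserial package (import name serial), expects the laptop's serial port to be given on the command line (for example /dev/ttyS0, or COM1 on Windows), and assumes the Sun console's default settings of 9600 baud, 8 data bits, no parity and 1 stop bit. The capture function is hypothetical and accepts any binary stream, so it can be tried without hardware.

```python
import sys
from datetime import datetime
from typing import BinaryIO, TextIO


def capture(stream: BinaryIO, log: TextIO) -> int:
    """Copy each console line to `log` with a timestamp; return the line count.

    Every line is flushed at once, so the record survives even if the
    capture program itself is interrupted.
    """
    count = 0
    for raw in iter(stream.readline, b""):   # b"" means end of stream
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = datetime.now().isoformat(timespec="seconds")
        log.write(f"{stamp} {text}\n")
        log.flush()
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only touch real hardware when a port name is supplied.
    import serial  # third-party pyserial package

    with serial.Serial(sys.argv[1], baudrate=9600,
                       bytesize=serial.EIGHTBITS,
                       parity=serial.PARITY_NONE,
                       stopbits=serial.STOPBITS_ONE) as console, \
            open("console.log", "a", encoding="utf-8") as log:
        capture(console, log)   # no read timeout, so this runs until stopped
```

Appending to the log rather than overwriting it means a reboot of the laptop does not destroy the messages leading up to the failure.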
According to Sun support engineers, these serious errors don’t happen often, but when they do the console log can prove invaluable in getting a prompt, accurate diagnosis and a speedy resolution, especially if the problem turns out to be intermittent, as is typical of memory or CPU failures.
Depending on the model, your Sun server may contain a built-in console facility. This can be a self-contained, multi-function monitoring card, Remote System Control (RSC), installed in the Solaris box, which automatically captures error messages; on newer systems the facility is integrated, either on the motherboard or as a plug-in card, as part of a complete Lights Out Management (LOM) system. A major benefit of these systems is that the output can be redirected over the network or through a phone line to a diagnostic engineer.
If your system doesn’t have built-in monitoring, find a redundant laptop and attach it as a console: it’s a great insurance policy against that rare but debilitating hardware failure.
OpenBoot PROM (OBP) Diagnostics
OBP is Sun’s equivalent of the PC BIOS and forms the interface between the operating system and all the hardware. From a support point of view it offers a wealth of hardware diagnostics, covering everything from “is the SCSI bus working?” to “can the machine see and talk to the network?” Memory, CPU and system board testing is also included. If a system crashes with a suspected hardware failure, it is often possible to identify the failing component using the relevant OBP diagnostic.
Running the command init 0 as root from the Solaris prompt shuts down the operating system and leaves the machine at the OBP level, presenting the ok prompt; this is the state the system is in before Solaris is loaded. Common commands let you interrogate the network, identify any devices attached to the SCSI bus or run the OpenBoot Diagnostics set, which brings up an interactive menu offering tests for various hardware components (on many systems these are watch-net, probe-scsi and obdiag respectively, although the available commands vary with the OBP version). The results are in plain English and clearly tell you whether a component has passed or failed; there are no cryptic codes that need to be looked up.
A full list of OBP options and commands can be found on Sun’s documentation server. A search for ‘openboot command reference manual’ brings up a list of available documents; the one you need depends on the age and configuration of your system.
OBP diagnostics are a fast way to find faults or eliminate suspect components when working with a support engineer to locate and identify a problem.