DCS Upgrade: How to Reduce Stress During Execution


This guest blog post was written by Sunny R. Desai, an engineer in the DCS/PLC/SCADA department at Reliance Industries Ltd.


The Reliance Industries Hazira complex, which manufactures a wide range of polymers, polyesters, fiber intermediates, and petrochemicals, needed to determine the best way to update the industrial automation system. The complex commissioned a naphtha cracker plant in March 1997 using then state-of-the-art technology, including a UNIX-based control system. Over the years, there had been a progression of vendors developing new systems based on the latest technological platforms and then declaring their older systems obsolete.

Many of the components of the UNIX-based control system had been declared obsolete, and the vendor had withdrawn active support, spares, and engineering resources. Spares were becoming scarce and very expensive. Because the distributed control system (DCS) is critical for plant operation, obsolescence and the unavailability of spares directly affected the availability of the system for plant operation. Electronic components degrade over time, so component failure rates were increasing. All these factors increased the time and effort needed to recover from a failure.

Evaluation philosophy

The company’s evaluation philosophy for developing an upgrade plan was based on the following major criteria:

  • Plant safety and reliability were of prime importance.
  • Efforts should be made to prolong or stretch the system life as long as possible without compromising safety or plant reliability.
  • If the reliability of the existing system could be enhanced with a partial upgrade, it was preferred.
  • If the partial upgrade of the system was not possible or did not improve the reliability of the system, efforts should be made to keep the full upgrade cost to a minimum.


A partial system upgrade was not recommended, because it would improve only the visualization, not the reliability of the controllers, I/O modules, and other hardware components. In addition, given the installed base remaining at the site, other plants would benefit from the spares generated by removing the hardware from this plant. The final recommendation was a full upgrade of the cracker plant control system.

Project scope

The following table shows an inventory of the existing DCS hardware.

Sr. No.   Item                 Quantity
1         Marshalling panels   111
2         System panels        20
3         Total I/O            14,000
4         Alarm consoles       10
5         DCS servers          11
6         Operator stations    18

Update considerations

In the initial stage of the project, it was decided that the new system should be similar to the existing system in terms of the visualization, faceplates, and programming to avoid confusion between an actual problem in the field and a programming error.

The technical requirements the plant considered during the engineering stage included:

  • time scheduling for minimum downtime during plant shutdown
  • interfacing to existing field instruments
  • reliability and safety of process areas
  • redundancy features
  • recreating the control logic of the existing system in the new system
  • converting proportional, integral, derivative (PID) tuning parameters from the existing system
  • graphical design similar to the existing system for ease of operation
  • communication with third-party systems
  • advanced process control (APC) program modification
  • third-party historian program modification
  • secured network architecture

Method followed

Time scheduling was the biggest challenge for the entire team, with a very limited number of days for executing the job, which included removing and fitting new system panels, removing all the components from the marshalling panels, removing and fitting new alarm consoles, formatting and installing DCS servers, replacing old operator stations with new ones, removing existing control network cables and laying new ones, and replacing all the power circuit breakers.

The upgrade had to be performed within the short duration of a shutdown. To limit the cost, the team decided not to replace the I/O panels, only their internal components; this would also minimize the carbon footprint of the project. Removing and fitting components individually would take a lot of time, so the team decided to mount the components on a plate and install the plate in the panels. All the components, such as I/O modules, communication modules, barriers, terminal blocks, and power supplies, were installed on the plate at the vendor’s factory, and factory acceptance testing (FAT) was performed on the same setup.

To meet the desired system reliability, the controllers, communication modules, some analog output cards, the power supplies, and the diode ORing modules (which combine two power supply outputs so that either supply can feed the load) were made redundant and cross-connected, so that the failure of any single device would not interrupt distribution. Power supply units were loaded to less than 35 percent of capacity.

The standard function blocks of the new system did not replicate the functionality of the old system, so many customized blocks were created in the new system, including blocks for APC and digital loops. A team of specialized engineers, including process personnel, tested the logic in the new system. This team visited the vendor a couple of times before the FAT to ensure that the logic worked per the existing control and automation philosophy.

The new controller supported a significantly larger number of I/O than the existing controllers. The plant preferred not to consolidate the I/O into fewer controllers and instead distributed it across the controllers exactly as in the existing system. No controller was designed to handle both third-party communication and I/O communication.

The design basis required all electronic components and cards to comply with the ANSI/ISA-71.04-2013 G3 classification. The installed components had to operate for a minimum of 48 hours under extreme conditions: temperature 0–50°C; relative humidity 10–96 percent at 32°C, noncondensing; maximum vibration 0.2 G at 20–300 Hz; and maximum displacement 0.01 inches at 5–20 Hz. Because modifications are made to the plant continually during normal operation, spare capacity was reserved: controller load was restricted to less than 40 percent, network load to less than 50 percent, free memory to greater than 50 percent, and power supply load to less than 40 percent.
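
The spare-capacity limits above lend themselves to an automated check during FAT, SAT, or periodic health reviews. The sketch below is a hypothetical illustration, not part of the project documentation: the metric names and the dictionary of measured values are assumptions, and reading those values from a real system is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    name: str         # hypothetical metric name
    threshold: float  # percent
    upper: bool       # True: value must stay below threshold; False: must stay above


# Limits taken from the design basis described above.
DESIGN_LIMITS = (
    Limit("controller_load", 40.0, upper=True),
    Limit("network_load", 50.0, upper=True),
    Limit("free_memory", 50.0, upper=False),
    Limit("power_supply_load", 40.0, upper=True),
)


def check_margins(measured: dict[str, float]) -> list[str]:
    """Return a description of every violated limit; an empty list means all margins are met."""
    violations = []
    for lim in DESIGN_LIMITS:
        value = measured[lim.name]
        ok = value < lim.threshold if lim.upper else value > lim.threshold
        if not ok:
            relation = "<" if lim.upper else ">"
            violations.append(f"{lim.name}={value}% (required {relation} {lim.threshold}%)")
    return violations
```

For example, a station reporting 35 percent controller load, 20 percent network load, 62 percent free memory, and 45 percent power supply load returns `['power_supply_load=45.0% (required < 40.0%)']`.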

Given the power supply and diode ORing failures experienced across the site, a redundant power supply with a redundant diode ORing scheme was used for this project. Active diode ORing modules with a load-sharing indication and an alarm relay were also used.
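
The load-sharing indication described above can be pictured as a comparison of the two supply currents. The following is a minimal sketch, assuming the two output currents are available as measurements; the 20 percent imbalance threshold and the status names are hypothetical and do not come from the project or from any vendor's ORing module.

```python
def load_share_status(i_a: float, i_b: float, max_imbalance: float = 0.2) -> str:
    """Classify load sharing between two power supplies feeding one bus through active ORing.

    i_a, i_b: measured output currents of supplies A and B, in amperes.
    max_imbalance: hypothetical alarm threshold on the difference between the two
                   supplies' shares of the total load (0.2 = 20 percentage points).
    """
    total = i_a + i_b
    if total <= 0:
        return "NO_LOAD"
    share_a, share_b = i_a / total, i_b / total
    if min(share_a, share_b) == 0:
        return "SINGLE_SUPPLY"    # one path carries everything: redundancy is lost
    if abs(share_a - share_b) > max_imbalance:
        return "IMBALANCE_ALARM"  # a supply or ORing element may be degrading
    return "OK"
```

With 5.5 A and 4.5 A the status is `OK`; with 8 A and 2 A it is `IMBALANCE_ALARM`, the kind of early warning of a probable failure that the load-sharing indicator is meant to give.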

Exhaustive FAT and SAT procedures were developed, and 100 percent loop testing and redundancy testing of controllers, power supplies, I/O modules, communication networks, and servers were performed. The team also performed 100 percent graphic testing and alarm simulation. All the closed loops were checked, and APC functions and OPC communication were 100 percent tested. During the SAT, the Profibus signature was taken for all the nodes and preserved for future reference.

Consequences and mitigation plan

Following are the key risks involved in the project:

Accurate as-built information unavailable

Consequence: If the available information is not accurate, panel wiring and closed-loop operation in the field will be affected. Also, some field instruments may be missed, affecting the complete plant operation and delaying the plant startup.

Mitigation plan: To provide an accurate project design basis, several walk-downs and manual surveys were performed to verify that the existing documentation was accurate. This activity was performed while the plant was running and did not affect the plant operation. The team took several photographs of the existing wiring in the panels, noted the color code of every field cable, noted every termination where they found a discrepancy with the existing drawing, and prepared a detailed file consisting of all the information collected. This file was the key to the successful implementation of this project.

Manually converting the program to the new system

Consequence: If the program built in the new system does not accurately reproduce the old program, the entire plant operation will be affected. The running process may trip many times, causing safety concerns in the plant and financial loss.

Mitigation plan: The only trusted document available for the programming was the existing program running in the old system. First, it was important to understand the differences between the logic block functions of the old system and those of the new system. The team listed the functional differences between the two and modified the new blocks to behave like the old ones. They decided to include experts on the old system, from the vendor as well as the user side, in the engineering team for the new system. This was one of the crucial moves for ensuring a smooth upgrade. Several weeks were spent on this activity. Once the compatible logic blocks were built in the new system, a trial test was performed to check the operation of the blocks. When the operation was found satisfactory, clearance was given to the vendor to continue building the program. At this point, the team calculated the optimistic and the pessimistic time to complete the project execution, and all worked wholeheartedly to meet the deadline.

Building visualization in the new system

Consequence: If the conversion is not proper, plant operation will be affected, as operators will not be able to take quick actions when required. Because graphics are the main interface between the operation team and the new system, faults in the graphics will directly affect the operation team’s acceptance of the new system.

Mitigation plan: The panel operators were used to the old visualization, so it was decided that the visualization in the system should be a look-alike of the old system, and the latest visualization features should not be used in this upgrade. This would help the operators accept the new system and also avoid confusion between actual problems in the field and improper mapping of tags in the graphics during the plant startup. Once the visualization was built in the new system, the engineering team performed a test. This test included verifying the sketches on the graphics and mapping tags to the graphic element, alarm window, trend window, group graphic windows, group trend window, and faceplate functioning. Once this test was approved, the vendor was permitted to build more graphics.

Converting the PID tuning parameters

A PID controller is a control-loop feedback mechanism commonly used in industrial control systems. It continuously calculates an “error value” as the difference between a measured process variable and a desired set point, and applies a correction based on proportional, integral, and derivative terms of that error. The tuning parameters directly influence the accuracy of the control loop, and thereby the quality of the product. Their values can only be finalized while the plant is running. These values are a critical asset for the plant, as they were refined through years of experience in operating the plant.
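
To make the “error value” concrete, here is a minimal discrete PID sketch in the parallel (noninteracting) form, closed around a hypothetical first-order process. The gains, sample time, and process model are illustrative assumptions; commercial DCS PID blocks add features such as anti-windup, bumpless transfer, and derivative filtering, and they differ in the algorithm form they use.

```python
class PID:
    """Minimal discrete PID controller (parallel form), for illustration only."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative on the first sample

    def update(self, setpoint: float, pv: float) -> float:
        error = setpoint - pv                 # the "error value"
        self.integral += error * self.dt      # accumulated error (integral term)
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 300) -> float:
    """Close the loop around a hypothetical first-order process and return the final PV."""
    pid = PID(kp=2.0, ki=1.0, kd=0.0, dt=0.1)
    tau, gain, pv = 1.0, 1.0, 0.0             # process time constant, gain, initial PV
    for _ in range(steps):
        out = pid.update(setpoint=1.0, pv=pv)
        pv += (gain * out - pv) * pid.dt / tau  # Euler step of tau*dPV/dt = gain*out - PV
    return pv
```

The integral term is what drives the steady-state error to zero here: `simulate()` settles at the set point of 1.0, whereas a proportional-only controller on the same process would leave an offset.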

Consequence: Steady-state operation of the plant will be affected, with a direct influence on the quality of the product.

Mitigation plan: The vendor had developed a mathematical equation and a tool for converting the tuning parameters from the old system to the new system. The vendor had verified this tool on a different conversion project, where the client was satisfied with the results, so it was readily accepted for this project. The team later found the results of this tool to be quite accurate.
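
The vendor's equation and tool are not described here. As a hedged illustration of the kind of mapping such a tool has to perform, the sketch below converts a hypothetical old-system format (proportional band in percent, integral in repeats per minute, series form) to controller gain, minutes per repeat, and ideal form, using the standard series-to-ideal relations. Which conventions the old and new systems actually used is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdealPID:
    kc: float      # dimensionless controller gain
    ti_min: float  # integral time, minutes per repeat
    td_min: float  # derivative time, minutes


def from_series_pb(pb_percent: float, repeats_per_min: float, td_min: float) -> IdealPID:
    """Convert series-form settings (PB %, repeats/min, Td in min) to ideal form.

    Series form:  Kc' * (1 + 1/(Ti' s)) * (1 + Td' s)
    Ideal form:   Kc  * (1 + 1/(Ti s) + Td s)
    """
    kc_series = 100.0 / pb_percent        # proportional band -> gain
    ti_series = 1.0 / repeats_per_min     # repeats/min -> min/repeat (must be > 0)
    factor = 1.0 + td_min / ti_series     # interaction factor (1 + Td'/Ti')
    return IdealPID(
        kc=kc_series * factor,
        ti_min=ti_series + td_min,
        td_min=td_min / factor,
    )
```

`from_series_pb(50.0, 0.5, 0.0)` gives a gain of 2.0 and an integral time of 2.0 minutes per repeat. With derivative action, the interaction factor changes all three values, which is why copying settings one-to-one between systems that use different forms can detune a loop.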

Third-party communication

Online analyzers, machine-condition monitoring systems, turbine-governing systems, antisurge systems, and plant emergency shutdown systems are standalone systems critical for plant operation. They maintain product quality, ensure the safe and efficient operation of the compressors, and keep the plant operating safely. Their readings must be continuously available to the operators. The DCS communicates with these systems using the Modbus protocol, which is tricky and time consuming to configure.

Consequence: If this communication is not working at startup, operators will not be able to see data from the online analyzers, the machine vibration monitoring, and the plant emergency shutdown system. This will also affect the steady-state operation of the plant.

Mitigation plan: It was possible to complete this activity during the preshutdown period of the project. The team arranged a spare controller and developed a test setup where the running third-party system communicated with the new system. All the wiring diagrams, communication settings, and response times were noted and later were directly implemented in the new system. This way it became quite easy to establish the communication with the actual new system.
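
Part of what makes Modbus configuration tricky is that the protocol defines only 16-bit registers, so 32-bit values such as analyzer readings are split across two registers in a device-specific word order. The sketch below assumes Modbus TCP (the article does not say whether TCP or serial RTU was used): it builds a Read Holding Registers (function code 0x03) request and decodes a float in either word order. The transaction ID, unit ID, and register addresses in the tests are hypothetical.

```python
import struct


def read_holding_registers_request(transaction_id: int, unit_id: int,
                                   start_address: int, count: int) -> bytes:
    """Build a Modbus TCP request frame for function code 0x03 (Read Holding Registers)."""
    pdu = struct.pack(">BHH", 0x03, start_address, count)  # function, start, quantity
    # MBAP header: transaction ID, protocol ID (0 = Modbus), byte count of what follows
    # (unit ID + PDU), unit ID. All multi-byte fields are big-endian.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def registers_to_float(first: int, second: int, word_swap: bool = False) -> float:
    """Decode an IEEE 754 float32 held in two consecutive 16-bit registers.

    word_swap=False: first register holds the high-order word.
    word_swap=True:  first register holds the low-order word, as some devices send it.
    """
    hi, lo = (second, first) if word_swap else (first, second)
    return struct.unpack(">f", struct.pack(">HH", hi, lo))[0]
```

If a reading shows up as an implausibly tiny or huge number, a word-order mismatch is a likely suspect: the registers 0x3F80, 0x0000 decode to 1.0 in one order and to a denormal value near 2E-41 in the other. Recording the word order along with the other communication settings on the test setup avoids rediscovering it later.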

Removing old components and fitting new ones in the panel

Consequence: If this activity is not completed in the planned time period, it will affect the entire startup of the plant.

Mitigation plan: Not all the cables in the panels were to be cut and removed. Multicore cables from the field and power cables from the main control board cabinet had to be retained, which made the system changeover during the shutdown period difficult. To meet this challenge, several drawings were prepared for each panel indicating which cables and ducts were to be cut and removed and which were to be retained, where the ferrules were to be changed and the lugs replaced, and which terminal blocks (TBs) carried interpanel wiring. Each panel was marked to distinguish items for removal from items to be retained, and all the electricians were trained to thoroughly understand the drawings. To test how long it would take to remove and fit the components in a panel, a mock operation was performed during the FAT, using a fully loaded spare panel shipped from the site to the FAT area. During the mock operation, the team noted the time required, the challenges faced, and the tools required, and planned improvements. This exercise helped a great deal during the actual replacement of the components in each panel.

Building secured network architecture

Newer control systems are highly network based and use common standards for communication protocols. Many controllers are Internet Protocol addressable. Standard operating systems, such as Windows, are increasingly used in industrial control systems, which are now typically connected to remote controllers via private networks. The ability to access the system as a result of this interoperability exposes network assets to infiltration and subsequent manipulation of sensitive operations. Furthermore, increasingly sophisticated cyberattack tools can exploit vulnerabilities in commercial off-the-shelf system components, communication methods, and common operating systems found in modern control systems.

Consequences: A successful attack could compromise plant safety and cause financial loss. It may affect the entire business operation and can also have a social impact.

Mitigation plan: Considering the current situation around the world and the constant increase in cyberthreats, the team decided on a secured network architecture with twin firewalls installed in the system, preferably of different makes, to minimize network-related threats. Antivirus software was installed on each system and set to use the latest definition files. All the USB ports, CD/DVD drives, the Windows autorun feature, the Windows scheduler, the remote desktop feature, and all spare network ports were disabled, and the auto log-off feature was enabled. These steps reduce the risk of a virus attack on the system.
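
Hardening measures like these are easy to lose track of across many stations. A minimal audit sketch follows, assuming each station's settings have already been collected into a dictionary; the setting names and station names are hypothetical labels for the measures listed above, not Windows registry keys or policy names.

```python
# Hypothetical hardening baseline mirroring the measures listed above.
REQUIRED = {
    "usb_ports_enabled": False,
    "optical_drives_enabled": False,
    "autorun_enabled": False,
    "task_scheduler_enabled": False,
    "remote_desktop_enabled": False,
    "spare_network_ports_enabled": False,
    "auto_logoff_enabled": True,
    "antivirus_definitions_current": True,
}


def audit(node: str, settings: dict[str, bool]) -> list[str]:
    """Compare one station's reported settings with the baseline; return the findings."""
    findings = []
    for key, expected in REQUIRED.items():
        actual = settings.get(key)
        if actual is None:
            findings.append(f"{node}: {key} not reported")
        elif actual != expected:
            findings.append(f"{node}: {key} is {actual}, expected {expected}")
    return findings
```

A station that still has remote desktop enabled would produce one finding, `OS01: remote_desktop_enabled is True, expected False`; a station that reports nothing produces one "not reported" finding per baseline item.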

APC implementation

Consequences: Steady-state operation of the plant will be affected, reducing the efficiency of the process and having a direct financial effect.

Mitigation plan: APC implementation was a challenge in itself, because the new system did not have functionality similar to the old system. All the programming blocks of the old system were studied in detail, and blocks with the same functionality were developed in the new system. APC communicates with the new system via a dedicated OPC station. In the old system, the OPC interface wrote to a data block, but such data blocks were not available in the new system. To meet this requirement, thousands of tags that do not count against the license were created. The communication between the APC and these tags via OPC, and all the programs for placing a loop in APC and removing it from APC, were tested during the FAT; the team performed 100 percent testing. During the FAT, the team also found many data types that APC required and the old system supported but the new system did not. These data types were created in the new system and mapped in the APC program. APC communication was established during the shutdown period, but the APC program in the APC controllers was modified in the post-shutdown period. In the UNIX-based system, a separate PC provided the operator interface to the APC. In the new system, a web browser link gives the operator the APC interface on the DCS screen itself, so a separate PC is no longer required. Once the plant was started, the modified APC program was tested on one of the furnaces, and within a week all the other APC programs were modified. Similar modifications were also made to the real-time optimizer and the third-party historian.

Technological improvements

With this upgrade, we introduced several state-of-the-art automation technologies. Some of the noticeable improvements were:

  • Secured network: The new system network was based on the demilitarized zone (DMZ) architecture implemented using a twin firewall configuration. One firewall isolates the enterprise network from the plant network, and another firewall isolates the plant network from the control network, thereby creating a secured DMZ network. The two firewalls are of different makes and can be configured only from a standalone configuration laptop.
  • Enhanced diagnostic features: Diagnostic logs are generated for each module, master or slave, in the hardware topology; they can be saved and analyzed for root-cause analysis and preventive maintenance. The system also includes a Web-based diagnostic portal that shows warnings and errors for controllers, communication modules, and slave modules.
  • Smart client: This smart client is a true thin-client office workplace that seamlessly retrieves data from the system and connected third-party systems. The smart client is a dashboard visualization application that provides a read-only view into the system and allows the user to call up graphics.
  • Redundancy at the I/O level: For certain super-critical output tags that affect the production and safety of the entire plant, we have installed redundant analog output cards. During normal operation, the cards share the current demanded by the field devices. Each card is capable of supplying the full demand current on its own. When one card fails, the other detects the shortfall in loop current and supplies the full current.
  • Twin active diode-ORing scheme: Dual active diode ORings with a load-sharing indicator are used in this project, which lets us monitor the load sharing between the two power supplies and gives us an opportunity to identify a probable failure. Along with redundant power supplies, we have also installed redundant diode ORings.
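
The twin-firewall DMZ can be modeled as zones joined by firewalls, which makes its key property easy to check: no single firewall connects the enterprise network directly to the control network. The sketch below is a hypothetical model of the topology described above, with invented zone and firewall labels; it is not a firewall configuration.

```python
from collections import deque
from typing import Optional

# Hypothetical model of the topology above: each firewall joins exactly two zones,
# and traffic can move between zones only through a firewall.
FIREWALLS = {
    "FW1": frozenset({"enterprise", "plant"}),  # enterprise network <-> plant network (DMZ)
    "FW2": frozenset({"plant", "control"}),     # plant network (DMZ) <-> control network
}


def firewalls_crossed(src: str, dst: str) -> Optional[list[str]]:
    """Breadth-first search over zones; returns the firewalls crossed, or None if unreachable."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        zone, path = queue.popleft()
        if zone == dst:
            return path
        for name, zones in FIREWALLS.items():
            if zone in zones:
                (other,) = zones - {zone}
                if other not in seen:
                    seen.add(other)
                    queue.append((other, path + [name]))
    return None


def has_direct_enterprise_control_link() -> bool:
    """True if any single firewall joins the enterprise and control zones (should be False)."""
    return any({"enterprise", "control"} <= zones for zones in FIREWALLS.values())
```

Any path from the enterprise zone to the control zone crosses both `FW1` and `FW2`, so an attacker would have to defeat two firewalls of different makes rather than one.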


Proper planning and coordination with the vendor resulted in an efficient installation and commissioning of the new system. All the 14,000 I/O, marshalling panels, system panels, alarm consoles, servers, and operator stations were successfully replaced in a very limited time. It is a success story for the entire team; we completed the project well within the planned period. The project can be summarized with the statement: more stress on planning reduces stress during execution.

About the Author
Sunny R. Desai has an engineering degree specializing in instrumentation and control. Desai is currently an engineer in the DCS/PLC/SCADA department, Central Engineering Service, at Reliance Industries Ltd.



A version of this article originally was published in InTech magazine.

How to Link HAZOP and LOPA to Calculate a Safety Instrumented System


This article was written by Peter Morgan, director and principal consultant of Control System Design Services Inc. 


Although a hazard and operability (HAZOP) analysis identifies failure events or upsets and the severity of the outcomes by engaging knowledgeable plant personnel, the process typically provides only qualitative information about event frequency and mitigated frequencies. This helps plants make decisions for safety improvement, but does not provide the detailed information necessary for a safety integrity level (SIL) determination. The layer of protection analysis (LOPA), on the other hand, is a means to analyze event frequencies and the mitigating effects of protection layers. It does not provide a process for identifying possible failures in the plant and the severity of their consequences. This article shows how the two activities can be easily linked to provide the information required for a safety integrity level determination for a safety instrumented system (SIS), according to the recommendations of ISA-84.00.01-2004 (IEC 61511 Mod).


HAZOP severity levels

HAZOP identifies severity levels for event outcomes (typically four or five).

Likelihood of occurrence

The HAZOP also identifies the frequency or likelihood of events ranging from 1 in 100 years to 1 in 100,000 years (i.e., not likely to ever occur).

It is immediately obvious that this broad categorization of the frequency of initiation events does not provide the precision necessary to evaluate the demand on a safety system. This, however, need not detract from the usefulness of the HAZOP as long as actual event frequencies (if they are known at the time of the HAZOP) are recorded in the LOPA.

The HAZOP process provides an assessment of the effect of individual events and their mitigation through existing safeguards to determine whether a design change or additional layers of protection are required. Strictly speaking, however, when the plant owner establishes a target risk of, for example, 1E-4 per year for events of severity 4, this is the target risk for all events of that severity combined. For example, in a burner management system (BMS) on a boiler, both failure of a feedwater valve and loss of combustion air require the BMS to shut off the fuel supply. In one case, failure of the BMS to act on demand could cause boiler or turbine damage and, in the other case, a boiler explosion. The HAZOP process treats these as quite separate events, but both create demands for action from the BMS. The SIL calculation cannot be done until the HAZOP has been completed and all events in this severity category have been identified and assessed. It is important to note that even if existing safeguards (including the operator) mitigate events of a particular severity so that the residual risk appears acceptable without further mitigation, any such event that places a demand on the SIS must be included in the LOPA and the subsequent SIL calculation.

The example HAZOP worksheet shows just two events to demonstrate the integration of the HAZOP process and the LOPA. Note that it is not uncommon to have to consider thirty or more events as demands on a particular SIS (e.g., in the case of a BMS).

The qualitative assessment of the residual risk for these events indicates that additional protection is required in one case but not necessarily the other. However, the assessment acknowledges that the risk will be further reduced for both events by tripping the boiler on detection of a high drum level through the action of an additional layer of protection (i.e., a BMS in this case). Adding a column to the traditional HAZOP worksheet is a way to flag that these events can be further mitigated by a SIS and that the events are to be included in the SIL calculation.

Note that a HAZOP analysis carried out to establish the safety requirements for a replacement safety system cannot include the existing system in assessing the demands placed on the replacement system. This may be obvious, but it is a trap easily fallen into by those immersed in the normal operation of the plant with all installed systems available.

The LOPA worksheet uses item reference nomenclature (#) that allows each event to be readily identified in the HAZOP by node, deviation, item, and consequence.

Initiating frequency is obtained either directly from the HAZOP or from published device failure statistics from the industry or from equipment manufacturers.

The identified protection layers mitigate the event by reducing the likelihood that the event will occur. Note that the mitigation cannot be dependent on the correct operation of the SIS (BMS in this case). ISA-84.00.01 allows operator action in the mitigation of events (e.g., by responding to alarms), but limits the frequency reduction factor to 0.1.

Intermediate event frequency is the product of the event initiating frequency and the identified mitigation factors; it represents the individual event likelihood after mitigation but without the protection offered by the SIS. Note that these are not required to be determined during the HAZOP, but that assessments by HAZOP participants can be useful and should be recorded if offered.

Compared with the LOPA format in ISA-84.00.01, this analysis adds an entry to the table identifying the SIS inputs (process measurements) that are required for event mitigation. This helps in calculating the required availability of each SIS input to achieve the target probability of failure on demand (PFD) for the entire system.

The mitigated event frequency is the event likelihood with the SIS protection. It is the product of the “intermediate event frequency” and the SIS PFD.
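
The LOPA worksheet columns described above reduce to a few multiplications. The sketch below is an illustration under stated assumptions: protection-layer credits are entered as probabilities of failure on demand, operator response is capped at the 0.1 factor mentioned above, and the item references and numbers in the tests are hypothetical rather than taken from the example worksheet.

```python
from dataclasses import dataclass
from math import prod

OPERATOR_CREDIT_LIMIT = 0.1  # operator response may not be credited better than 0.1


@dataclass(frozen=True)
class LopaRow:
    ref: str                             # item reference: node, deviation, item, consequence
    initiating_freq: float               # initiating event frequency, per year
    ipl_factors: tuple[float, ...] = ()  # PFDs of protection layers other than the SIS
    operator_factor: float = 1.0         # credit for operator response (1.0 = no credit)

    def intermediate_freq(self) -> float:
        """Event likelihood after mitigation but without the SIS."""
        operator = max(self.operator_factor, OPERATOR_CREDIT_LIMIT)  # enforce the 0.1 cap
        return self.initiating_freq * prod(self.ipl_factors) * operator

    def mitigated_freq(self, pfd_sis: float) -> float:
        """Event likelihood with the SIS: intermediate frequency times the SIS PFD."""
        return self.intermediate_freq() * pfd_sis
```

For example, a hypothetical row with an initiating frequency of 0.1 per year, one protection layer with a PFD of 0.1, and an operator credit entered as 0.01 is evaluated with the operator factor capped at 0.1, giving an intermediate frequency of 1E-3 per year; with a SIS PFD of 1E-2 the mitigated frequency is 1E-5 per year.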

SIL calculation

The plant owner establishes target risk for event impacts in each severity level. Published statistics for fatalities in various industries are a basis for establishing target risk for the most serious events, in this case 1E-4 (once in 10,000 years) for severity level 4 events.

For events that can be mitigated by the SIS (BMS in this case), every initiating cause that results in an event outcome of severity level 4 must be considered as a demand on the SIS for the purpose of calculating the required SIL.

Events with impact severity level 3 may also place a demand on the SIS. If the combined frequency of all events in this category is more than one order of magnitude lower than that of severity level 4 events, a SIL determination based on severity level 4 will be sufficient. In other words, a SIL determination based on impact severity level 3, with a target risk of 1E-3 for that severity, would yield a less stringent (higher) target PFD than the one calculated from severity level 4 events. If this is not the case, a SIL calculation based on severity level 3 events should also be performed, and the more stringent result determines the target PFD for the SIS.
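
Rather than relying on the order-of-magnitude rule, one can compute the required PFD for each severity level and let the most stringent (smallest) value govern. In the sketch below, the severity level 4 figures (target 1E-4, combined intermediate frequency 0.046 per year) come from this article; the severity level 3 frequencies are hypothetical.

```python
def governing_pfd(levels: dict[int, tuple[float, float]]) -> tuple[int, float]:
    """Return (severity level, required PFD) for the most stringent severity level.

    levels maps severity level -> (target risk per year,
                                   combined intermediate event frequency per year).
    Required PFD for each level = target risk / combined intermediate frequency.
    """
    required = {sev: target / freq for sev, (target, freq) in levels.items()}
    governing = min(required, key=required.get)  # smallest PFD is the hardest to meet
    return governing, required[governing]
```

With a hypothetical severity level 3 frequency of 0.2 per year, level 3 would need only 5E-3, so level 4 governs at about 2.17E-3. If the level 3 events combined to 0.8 per year instead, level 3 would govern with 1.25E-3.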

$$\text{Target risk} = \mathrm{PFD}_{\mathrm{SIS}} \times \text{Intermediate event frequency}$$

$$\mathrm{PFD}_{\mathrm{SIS}} = \frac{\text{Target risk}}{\text{Intermediate event frequency}}$$

The example LOPA worksheet only shows two severity level 4 events. When all severity level 4 events are included in the analysis, the combined intermediate event frequency for impact severity level 4 is 0.046 per year, so that:

$$\mathrm{PFD}_{\mathrm{SIS}} = \frac{1\times10^{-4}}{0.046} = 2.17\times10^{-3}$$

This places the SIS in a SIL 2 category (PFD between 1E-2 and 1E-3) with a requirement that the overall system PFD is 2E-3 or better.
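
The same arithmetic, together with the low-demand-mode SIL bands (SIL 1: 1E-2 to 1E-1, SIL 2: 1E-3 to 1E-2, SIL 3: 1E-4 to 1E-3, SIL 4: 1E-5 to 1E-4), can be sketched as follows. The function name is hypothetical; the input values are the ones used above.

```python
# Low-demand-mode SIL bands: (SIL, lower PFDavg bound inclusive, upper bound exclusive)
SIL_BANDS = (
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
)


def sil_for_required_pfd(pfd: float) -> int:
    """Return the SIL band that contains the required PFD (0 if no SIL is needed)."""
    if pfd >= 1e-1:
        return 0
    for sil, lower, upper in SIL_BANDS:
        if lower <= pfd < upper:
            return sil
    raise ValueError("required PFD is below the SIL 4 band")


target_risk = 1e-4         # per year, severity level 4 (from the text)
intermediate_freq = 0.046  # per year, all severity level 4 events combined (from the text)
pfd_sis = target_risk / intermediate_freq
print(f"PFD_SIS = {pfd_sis:.2e}, SIL {sil_for_required_pfd(pfd_sis)}")
# prints: PFD_SIS = 2.17e-03, SIL 2
```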

Minor changes to the familiar HAZOP process can increase the utility of the HAZOP in providing information for a layer of protection analysis. The calculation of the required safety integrity level for a new or replacement safety system is simple. When based on a HAZOP and target risk agreed to by the plant owner and operating staff, it provides a credible performance requirement that is both practical and compliant with ISA-84.00.01.

About the Author
Peter Morgan is director and principal consultant of Control System Design Services Inc. He has more than 40 years of experience in the design and commissioning of control systems, control systems performance assessment, and logic design for nuclear and conventional power plants.


A version of this article originally was published in InTech magazine.

Murphy’s Law Is Alive and Well in Industrial Processes

The following tip is from the ISA book by Greg McMillan and Hunter Vegas titled 101 Tips for a Successful Automation Career, inspired by the ISA Mentor Program. This is Tip #23.

We all know Murphy’s famous law, “If anything can go wrong, it will.” I have to believe that Murphy was an automation engineer because I have encountered his law in action on every project I have ever worked on.

I have sat in HAZOPs where the group wanted to discount a scenario because it involved two simultaneous failures. I have also worked in a chemical plant that encountered FIVE simultaneous failures, blew up a vent line, and narrowly missed injuring an operator. Equipment breaks, and people make mistakes. Anticipate it, and design for it.

Concept: Simple systems work reliably. Complicated systems find new and interesting ways to fail. Whenever possible go for the simplest, most robust solution. As an automation engineer, the KISS concept (Keep It Simple Stupid) should be your mantra.


Details: Automation engineers love to create gloriously complex solutions. With so many computers and gadgets available, it is hard NOT to want to incorporate the latest and greatest into a design. However, the true purpose of automation is to control the process. Sometimes it takes a multivariable predictive control model to do that, but many times it can be done with a float switch and a solenoid. Try not to complicate a solution any more than necessary. When you are designing an emergency system to dump a quench chemical into a reactor, consider using gravity rather than special pumps and other equipment. Gravity always works (at least on planet Earth), while pumps and/or electricity can fail—especially under emergency conditions.

Anticipating every failure is difficult, but you must make every effort. What happens if the operator presses the wrong button? What happens if no button is pressed at all? If power is lost, might the instrument air and cooling water systems fail as well? What about steam and nitrogen? What are the ramifications of these multiple failures?

When you are designing a control panel, consider using dual 24VDC power supplies. Feed one from a UPS circuit and the other from a non-UPS circuit. Despite what the name might imply, an Uninterruptible Power Supply becomes an Interruptible Power Supply more often than not. Dual feeds can allow a control panel to continue operating despite the failure of either one.
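
The benefit of dual feeds, and its limit, can be put in numbers. The sketch assumes the feeds fail independently, so that the panel loses power only when all of them are down, and adds an optional common-cause term for anything that takes out every feed at once, such as an upstream loss that affects both the UPS and non-UPS circuits. All unavailability values are hypothetical.

```python
from math import prod


def panel_unavailability(q_feeds: list[float], q_common: float = 0.0) -> float:
    """Probability that a panel fed by parallel 24 VDC feeds has no power.

    q_feeds:  unavailability of each feed (circuit plus its supply), assumed independent.
    q_common: unavailability of anything shared by all feeds (hypothetical common cause).
    """
    q_all_feeds_down = prod(q_feeds)  # independent feeds must all fail together
    # The panel is down if the common cause occurs OR every feed is down.
    return 1 - (1 - q_common) * (1 - q_all_feeds_down)
```

With a hypothetical 1E-3 per feed, one feed gives about 1E-3 and two feeds about 1E-6. Add a 1E-4 common cause and the result rises to about 1.01E-4, dominated by the shared failure, which is why the multiple-failure questions above matter.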

Software design is particularly tricky because there are so many paths that the logic can traverse. Operators are forever using the equipment in ways that were never intended, and if the software is not designed to handle it, the program can hang in unexpected places. During testing, try hitting the wrong buttons and try to force the program through the sequence in a different order to see what happens. While this will drive the programmers crazy, the resulting system will be much more robust. Finding and resolving problems in testing is always better than discovering them at start-up!
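
One way to hit the wrong buttons systematically is to describe the sequence as a state machine, check statically that every state can get back to idle, and then press random buttons many times. The four-state sequence below is a hypothetical example, not taken from any real project.

```python
import random

# Hypothetical sequence: (state, operator command) -> next state.
TRANSITIONS = {
    ("IDLE", "start"): "RUNNING",
    ("RUNNING", "hold"): "HELD",
    ("RUNNING", "finish"): "COMPLETE",
    ("HELD", "resume"): "RUNNING",
    ("HELD", "abort"): "IDLE",
    ("COMPLETE", "reset"): "IDLE",
}
COMMANDS = sorted({cmd for _, cmd in TRANSITIONS})
STATES = {state for state, _ in TRANSITIONS} | set(TRANSITIONS.values())


def step(state: str, command: str) -> str:
    """Apply a command; a command that is not valid in the current state is ignored."""
    return TRANSITIONS.get((state, command), state)


def dead_ends() -> set[str]:
    """States from which IDLE can never be reached again (where the sequence would hang)."""
    def reachable(start: str) -> set[str]:
        seen, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for (src, _), dst in TRANSITIONS.items():
                if src == current and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
    return {s for s in STATES if "IDLE" not in reachable(s)}


def monkey_test(presses: int = 10_000, seed: int = 0) -> set[str]:
    """Press random buttons, including ones that are wrong for the current state."""
    rng = random.Random(seed)
    state, visited = "IDLE", {"IDLE"}
    for _ in range(presses):
        state = step(state, rng.choice(COMMANDS))
        assert state in STATES, f"sequence reached unknown state {state}"
        visited.add(state)
    return visited
```

Adding a transition such as `("RUNNING", "fault"): "FAULTED"` without any way out of `FAULTED` would make `dead_ends()` return `{"FAULTED"}`, which is exactly the kind of hang that careless testing misses.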

Watch-Outs: Never allow the final software quality control testing to be performed by the same person who programmed it. A different person is much more likely to hit the sequences in a different way or throw the system a curve that the programmer had not anticipated. Avoid the temptation to use exotic controls and programming to patch a poorly designed process. You can program around poor mechanical designs, but the project will be more stable if the fundamental problems are resolved.

Exceptions: Sometimes a HAZOP group can lump a series of totally improbable scenarios together and reach outlandish conclusions. However, there ARE certain scenarios that can create a cascade effect. (A loss of power might trip the steam system and take out the cooling water supplies as well.)

Insight: Safety interlock calculations include a testing interval and incorporate the failure modes into the calculations for a very good reason. Untested interlocks have caused hundreds (and probably thousands) of accidents when they failed to perform their function. Be particularly wary of interlocks that involve multiple instruments and/or devices to sense a failure. The probability of failure on demand can be very high, because each additional required device is one more way for the interlock to fail.
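
The effect of the test interval and of the number of devices can be illustrated with the simplified single-channel approximation PFDavg ≈ λDU × TI / 2 (valid when λDU × TI is much smaller than 1), and with the fact that an interlock needing every one of several devices fails if any of them fails. The failure rate used below is hypothetical, and real SIL verification uses more complete models.

```python
HOURS_PER_YEAR = 8760


def pfd_avg_1oo1(lambda_du_per_hour: float, test_interval_years: float) -> float:
    """Simplified average PFD of one device: lambda_DU * TI / 2 (valid when lambda*TI << 1)."""
    return lambda_du_per_hour * test_interval_years * HOURS_PER_YEAR / 2


def pfd_all_required(device_pfds: list[float]) -> float:
    """PFD of an interlock that needs every listed device to work: fails if ANY device fails."""
    p_all_work = 1.0
    for p in device_pfds:
        p_all_work *= 1 - p
    return 1 - p_all_work
```

With a hypothetical λDU of 1E-6 per hour, a yearly test gives a PFD of about 4.38E-3 per device, while a 10-year interval gives 4.38E-2. Four such devices that must all work give about 1.74E-2, roughly four times the single-device value.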

Rule of Thumb: If you are given an option, always choose the simpler solution. When you are designing a system, do not consider operator error and equipment failure to be isolated and unlikely events. They will occur … and usually at the worst time possible.

