- Oracle Enterprise Manager Cloud Control 12c: Managing Data Center Chaos
- Porus Homi Havewala
Team effort
The installation effort itself for each new project is huge.
First, a business project identifies the need for an application server and/or a database server. A formal request is made for these. The request is approved, and arrives at the doorstep (or mailbox) of the Unix/Windows team. Teams such as these are normally present in any reasonably sized IT department.
The Unix/Windows team then procures the necessary servers, gets access, and installs the operating system—frequently following manual checklists where they tick off each step. This team may also need to install additional packages/patches at the OS level, as requested by the project managers, and this information may or may not be available to the management at this stage. It may well be the case that any missing OS-level packages could cause delays and annoyances later down the track, but let us say this information is known, and the additional packages/patches are applied by the Unix/Windows team.
The hardware with the installed operating system is then forwarded to the database team in the IT department, where the DBAs get access to the server and install the database software, such as the Enterprise Edition of the Oracle database.
Certain options for the Oracle database may also need to be installed at this stage, such as the partitioning option or the advanced security option, depending on the requirements of the project and the licenses available with the company. Let us presume there is no standardization at this stage, so everything has to be decided manually, or guessed—if there are no clear instructions from the project side.
At this point, if certain OS packages/patches are missing, the DBAs may redirect the servers back to the Unix/Windows team for the missing components to be installed. Assuming this is done in a day or so, the DBAs then re-attempt the database software installation the next day (if they are lucky).
The database software thus installed may itself need to be patched. For example, if Oracle Database 10g Release 2 (10.2.0.1.0) was installed as the base release on the Solaris Operating System (SPARC) (64-bit), it should then be patched to the terminal patch set of Oracle Database 10.2 on this platform, which is 10.2.0.5. This is always recommended for production databases: not patching to the latest or next-to-latest patch set can lead to issues later on when applying regular security patches, which are often provided only for the latest releases of any version.
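Even confirming the current patch level is a manual, per-database task. A DBA would typically check it with a couple of queries against the standard data dictionary, along the lines of the following sketch (output and components will vary by release):

```sql
-- Check the version and patch level on each database, one at a time.
SELECT banner FROM v$version;

-- Database components, their versions, and their status
SELECT comp_name, version, status
FROM   dba_registry;
```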
After this, the actual database is created. If there are no company standards, databases created by different DBAs, even within the same team, may differ; for example, one database may have the Sample Schemas provided by Oracle installed while another does not. This is normally a simple manual selection in Database Configuration Assistant (DBCA), Oracle's standalone graphical utility for creating new databases: one DBA may decide to tick it, another may not.
There are other examples of similar configuration drift. One DBA may decide to use Oracle Managed Files (OMF) for the data files of the database, another may not; it may depend on each individual's likes and dislikes. For example, I personally have never used Oracle Managed Files, even though there is nothing wrong with using them.
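Differences like these are easy to introduce and tedious to detect. Even spot-checking a single database for them means running ad hoc queries such as the following minimal sketch against the standard data dictionary:

```sql
-- Spot-check two common sources of configuration drift on a single database.

-- Were the Oracle Sample Schemas installed?
SELECT username
FROM   dba_users
WHERE  username IN ('HR', 'OE', 'SH', 'PM', 'IX', 'BI');

-- Is Oracle Managed Files (OMF) in effect? (a non-empty db_create_file_dest)
SELECT name, value
FROM   v$parameter
WHERE  name = 'db_create_file_dest';
```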
Even if there are published company standards for database creation, they may not be followed 100 percent of the time; standards are difficult to enforce when installations are performed manually and no controls are applied during the installation.
There is no separation of roles in a manual installation: there is no DBA Designer (the Senior DBA) who designs the database to be installed, and no DBA Operator (the Junior DBA) who follows that design and installs it. The result simply depends on who performs the installation and who does the better job, the Senior DBA or the Junior DBA. Frequently, the Senior DBA does not even have the time for repetitive installations and leaves them to the Junior DBA, who may then decide to try out different things, possibly for educational purposes.
The same scenario applies to the middleware team when they install application server software, such as Oracle WebLogic Server or Oracle Application Server, or application servers from other vendors. Manual methods lead to major or minor differences in setup and configuration, and without automated configuration-compliance checks there is no practical way to enforce company standards on the middleware servers and domains, either during or after installation.
The DBAs install the database software, patch it to the required level, and then create the databases required by the applications. They repeat this process for development, test, staging, and production environments.
After a reasonably successful installation, the DBA team is also asked to set up backups for each database and to set up standby databases for disaster recovery using a tool such as Oracle Data Guard.
The setup of each backup and standby database is reasonably complicated, especially the latter, and involves a number of detailed steps. Also, each backup and disaster recovery scenario needs to be tested to ensure that it works.
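The testing itself is a series of manual checks repeated on every database. A DBA might start with queries like the following rough sketch against standard views (a real Data Guard and backup verification involves many more steps than this):

```sql
-- A few of the manual checks used to confirm that backups and the standby are healthy.

-- Role and protection mode of the database (PRIMARY or PHYSICAL STANDBY)
SELECT database_role, protection_mode
FROM   v$database;

-- Recent RMAN backup jobs and their outcome
SELECT start_time, end_time, status, input_type
FROM   v$rman_backup_job_details
ORDER  BY start_time DESC;

-- Any gap in redo shipped to the standby (run on the standby database)
SELECT thread#, low_sequence#, high_sequence#
FROM   v$archive_gap;
```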
Where is the time to do all this if everything is done manually?
Once the databases are released to the production environment, either the same team of DBAs or a separate production team looks after the day-to-day workings of each database and tries to ensure that the application is guaranteed a certain level of performance, which is a tall order without automated tools.
Every database sooner or later needs diagnosis and tuning, since databases are not static: they change, their data changes, their users change, and their applications change. More load is placed on the database, and more data is used in queries. All these changes are inevitable and eventually lead to performance issues.
First of all, to gain an understanding of these issues, performance information from the database is collected, manually analyzed, and certain worst-performing SQL statements are identified.
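In a manual environment, collecting this information typically means running ad hoc queries against the dynamic performance views. A top-N query like the following sketch is a common starting point (the exact columns and thresholds a DBA looks at will vary):

```sql
-- A typical manual starting point: the ten most expensive statements by elapsed time.
-- Figures are cumulative since each statement was loaded into the shared pool.
SELECT *
FROM  (SELECT sql_id,
              executions,
              ROUND(elapsed_time / 1000000) AS elapsed_secs,
              buffer_gets,
              SUBSTR(sql_text, 1, 60)       AS sql_text
       FROM   v$sql
       ORDER  BY elapsed_time DESC)
WHERE ROWNUM <= 10;
```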
Each such SQL statement is then painstakingly fixed, often by adding indexes, perhaps without considering that extra indexes place a heavier strain on inserts and updates.
Such manual performance diagnosis (finding out the problem) and tuning (fixing the problem) is obviously a very lengthy and tedious process, and is compounded when there are many tens or even hundreds of databases. A team of DBAs would be needed just to look at and fix day-to-day performance issues.
What about patching these databases?
Oracle releases Patch Set Updates (PSUs) and Critical Patch Updates (CPUs) through My Oracle Support (MOS) every three months, and recommends that all databases be patched regularly with either the CPU or the PSU.
The CPU is a collection of security patches, and the PSU includes the CPU as well as other patches that fix separate issues.
If a company is security conscious and wants to make sure that the latest security patches have been applied to all of its databases, the DBAs will need to apply the CPUs or PSUs every calendar quarter. This involves downloading the appropriate patch, FTPing or SCPing it to each server, and then going through a series of manual steps to apply it.
This is fine if there are only a few databases, but if there are hundreds, compounded by separate environments for development, testing, and production, then the patches must be painstakingly applied to each and every database home and database.
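Even verifying which quarterly patch actually made it onto which database is a per-database check, for example a query against the registry history (a standard dictionary view, shown here only as an illustration):

```sql
-- Verifying which CPU/PSU has been recorded on a given database
SELECT action_time, action, version, comments
FROM   dba_registry_history
ORDER  BY action_time;
```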
It is possible, but done manually it takes a lot of time. The DBAs will never get their sleep in this case, staying behind to complete the manual patching of numerous databases. The repetitive manual steps they follow for this patching are also more prone to mistakes, as the mind itself gets numbed by constant repetition of the same steps.
Let us say, on top of all this, there are development projects that constantly upgrade their applications and send database scripts to the DBAs to make schema changes corresponding to the new application release.
This may include new tables, new columns, a modification of existing columns, or new database structures as well as new/modified users and their privileges on the objects.
To implement these changes at the database level for such application upgrades, the development team supplies the DBA with a script containing a collection of SQL or PL/SQL commands, to be run first in the test environment and then in staging. Finally, the script is run on the production database once the application is found to work well on the test and staging databases.
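Such a release script might look something like the following sketch; the schema, table, and user names here are invented purely for illustration:

```sql
-- A hypothetical release script of the kind a development team might send to the DBA.
-- All object and user names are invented for this example.
CREATE TABLE app_owner.customer_note (
    note_id     NUMBER         NOT NULL,
    customer_id NUMBER         NOT NULL,
    note_text   VARCHAR2(4000),
    created_on  DATE           DEFAULT SYSDATE,
    CONSTRAINT customer_note_pk PRIMARY KEY (note_id)
);

-- Modify an existing column (assumes app_owner.customer already exists)
ALTER TABLE app_owner.customer MODIFY (email VARCHAR2(320));

CREATE INDEX app_owner.customer_note_cust_idx
    ON app_owner.customer_note (customer_id);

-- Grant the application user access to the new table
GRANT SELECT, INSERT, UPDATE ON app_owner.customer_note TO app_user;
```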
First of all, the scripts themselves need to be checked thoroughly by the DBA, and this takes time.
The DBA needs to ensure the script is doing the right thing, which he/she does with a manual check and then perhaps a dry run on a separate test database to flush out script issues: for example, the creation of indexes on nonexistent columns, grants issued before the table is created, errors in SQL syntax, and so on.
The script then has to be modified and re-run before it is finally ready to be executed in the test environment, and the DBA may send a few stern emails to the developer team to be more careful next time.
Once this is done, the project team commences testing the application on the test database. After the green light is received, the DBA is asked to run the script on the staging and production databases.
The DBA then rests easy, thinking that the job is over, but suddenly receives a shock: the developer reports that the application is not working in production, although it works in staging and testing.
The DBA calls his/her home to say that he/she will be coming home late today, then starts to investigate the issue by painstakingly comparing the staging and the production databases. Every table, user, and privilege in the two databases has to be compared until the DBA finds out that certain privileges are missing on the production database, whereas the user had higher privileges on the other databases.
This simple cause has taken hours to discover. The DBA fixes the script, creates a final version for the particular release, runs just the part that grants the extra privileges (by a cut and paste from the final script), and the application finally works.
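Even the comparison itself is an ad hoc exercise; a DBA might piece it together from dictionary queries along these lines, assuming a database link from production back to staging (the link name stage_link is invented for this sketch):

```sql
-- Object privileges present in staging but missing in production
SELECT grantee, owner, table_name, privilege
FROM   dba_tab_privs@stage_link
MINUS
SELECT grantee, owner, table_name, privilege
FROM   dba_tab_privs;

-- The same idea for system privileges and role grants
SELECT grantee, privilege FROM dba_sys_privs@stage_link
MINUS
SELECT grantee, privilege FROM dba_sys_privs;

SELECT grantee, granted_role FROM dba_role_privs@stage_link
MINUS
SELECT grantee, granted_role FROM dba_role_privs;
```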
Also, the only way the history of all these changes can be preserved in this case is by retaining the script files on the database server, versioning them, and storing them in subdirectories. This is sometimes done instead on the DBA's Windows PC.
Trying to look through such files in this directory structure to find a past change is normally a very manual, labour-intensive, and painstaking process. I have done this many times myself, so I am well aware of the issues.
When manual effort is the norm, in such data centers, it is inevitable that the DBAs and other administrators spend much of their time being reactive. Junior and less-trained staff on night shift hours would compound the issue, as would offshore technicians working remotely without a proper understanding of the internal environment.
For example, suppose OS maintenance is outsourced and the administrator is asked to apply an OS upgrade; as part of the process, the administrator decides to change some OS settings under the impression that this will improve performance. The next day, the on-site DBA finds that the standby database has stopped working, and he/she has no way of knowing what has changed, as there is no automatic capture or history of configuration changes in such a manual environment. All the DBA has is the general information that the OS was upgraded, and he/she then has to investigate further by comparing, bit by bit, the configuration of the server on which the database still works with that of the server where the problem occurred.
You can guess how much time and effort this may take.