Archive for the ‘Linux’ Category

What can RHEL learn from video game pirates

Wednesday, September 9th, 2009

After reading Free by Chris Anderson I have some thoughts on how Red Hat Enterprise Linux can claim much of the unpaid Linux market share.

In his book, Free, Chris Anderson tells a story of a game developer who asked Slashdot, a technology forum, why pirates continued to steal his games when he charged just $20 for them. While many open source advocates and others might expect the responses to be filled with arguments against intellectual property the response was very different. Most of the individuals who commented back to the game developer said the piracy was occuring for two reasons. First, because the small amount of copy protection and difficulty navigating his web store provided a barrier to playing the game. Second, the individuals generally didn’t feel that the games being provided were worth $20. This meant the game developer had to create a better game or lower his prices and also provide an easier means of access to the game.

A similar situation exists in the Linux market today. Red Hat Enterprise Linux, a paid Linux distribution held roughly 62.2% of paid market share while CentOS, a free copy of Red Hat Enterprise Linux, and other free Red Hat based Linux distributions hold 50% of the unpaid market. In 2008, according to IDC, unpaid Linux subscriptions accounted for 1.2 million of the 2.8 million servers deployed. According to this data 42% of the server based Linux deployments used unpaid Linux in 2008. So why has this situation occured? It’s difficult to draw a conclusion because there are many different variables and situation, but one possible reason that an organization might run unpaid Linux is similar to why a pirate would decide to use a cracked version of a game. First, because they cannot gain access to Red Hat Enterprise Linux quickly. A system administrator may not feel it’s worth speaking with his manager, going through procurement, and then waiting for his/her Red Hat Network account to be created. These are all barriers to scratching their Linux itch, so to speak. Second, they don’t feel that $2500 is an approrpriate price for an operating system. Maybe they just want to serve up some web services to a LAN in their office.

So how can Red Hat remove these barriers to capture the unpaid Linux market?

First, remove the barrier to Red Hat Enterprise Linux and allow any users to download and use Red Hat Enterprise Linux. Certainly this would not entail providing support to consumers who do not pay for a subscription, however updates, erratta, and other information should be provided. More reasearch is required around the economics that would make this possible, but there is such a small marginal cost to providing updates to an unpaid audience that we could afford to give updates to all consumers. The Fedora project already performs similar functionality and the infrastructure could be leveraged to provide free updates to Red Hat Enterprise Linux as well.

Second, lower the price of supported Red Hat Enterprise Linux to a price that the unpaid Linux market finds more reasonable. Some revenue is better than no revenue. Provide pricing to markets that creates a psychology of responsibility and value. If a consumer is willing to spend a certain amount of money for a copy of Red Hat Enterprise Linux wouldn’t it serve our best interest to provide him/her Red Hat Enterprise Linux? When consumers in the unpaid market decide to move to a paid product it would be in Red Hat’s best interest tf the consumer was already running Red Hat Enterprise Linux. This would provide a clear upgrade path. It is important to note that the demand at free is much higher than even a penny and will cause a huge spike in the demand curve. Both of these theories need to be validated further.

The psychology of free may have also hampered mainstream adoption of Linux on the home users desktop. Could free be associated with lower quality to the average consumer? Does the fact that JBOSS costs 15 percent of the total cost of Websphere cause buyers to assume it is of lower quality? Would raising the price change consumer psychology and increase sales in certain market segments? This is another theory that should be explored.

Update

For some reason I’m not able to reply to comments on my own blog (I’m still figuring out docspace).

Per a new IDC report:

Nonpaid Linux server operating systems continues to increase in importance on the overall market dynamics, with nonpaid Linux server operating system deployments accounting for 43.3% of the worldwide total, up from 41.4% in 2007.

There is no debating that CentOS is an exact binary copy of Red Hat Enterprise Linux, but the psychology of using CentOS instead of RHEL might make a huge difference in the long run.

1. Many users might not realize that CentOS is a copy of RHEL.

2. When a user looks to upgrade to a paid version of Linux they might think a leap from CentOS to another distribution.

3. If a user had the option of downloading Red Hat Enterprise Linux for free they might run it rather than another distribution (excluding CentOS). This would make them more likely to begin paying for it later.

Oracle Business Intelligence 10.1.3.4 in Amazon Web Services

Tuesday, March 10th, 2009

I was able to successfully install Oracle Business Intelligence 10.1.3.4 in Amazon Web Services over the course of a few hours this afternoon and evening. The idea is to allow for customers of a friend’s consulting company to train on the instance. The whole exercise cost me $1.32, check out my statement ;)


Summary of This Month's Activity as of March 11, 2009
Billing Cycle for this Report: March 1 - March 31, 2009
Expand All Expand All | Collapse All Collapse All
Rate Usage Totals
Amazon Elastic Compute Cloud
View/Edit Service
Amazon EC2 running Linux/UNIX
$0.10 per Small Instance (m1.small) instance-hour (or partial hour) 6 Hrs 0.60
$0.100 per GB Internet Data Transfer - all data transfer into Amazon EC2 4.713 GB 0.47
$0.170 per GB Internet Data Transfer - first 10 TB / month data transfer out of Amazon EC2 0.101 GB 0.02
$0.10 per GB-month of provisioned storage 0.228 GB-Mo 0.02
$0.10 per 1 million I/O requests 616,002 IOs 0.06
$0.01 per 1,000 puts (when saving a snapshot) 109 Requests 0.01
View Usage Report 1.18
Amazon Simple Storage Service
View/Edit Service
$0.100 per GB - all data transfer in 1.338 GB 0.13
$0.01 per 1,000 PUT, COPY, POST, or LIST requests 138 Requests 0.01
View Usage Report 0.14
Taxes
Estimated Taxes
(Due April 1, 2009)
0.00
Charges due on April 1, 2009† 1.32

I used a 40GB EBS volume which will store the data persistently for just $4.00 a month! The way I configured everything I can just spin up my Amazon Machine Image whenever I want and attach it to the volume. Within 10 minutes I have a running instance of OBIEE and when it’s not in use I can keep it on the shelf for just $4.00 a month (did I already mention that?). The other very big plus is I can snapshot the EBS volume so as soon as they are finished training I can destroy their volume and then return to the clean starting image when a new training is starting.

For a really professional look I’m going to register an Elastic IP address and have them forward training.domain.com to the public IP address. That way whenever he walks into a customers site he can just tell everyone to open a browser and ask them to go to the friendly URL. Also, I can lock down by public network IP address who is allowed to access the site, so they can make sure only the customers site is able to get to the training page.

One more thing, the same server is also running Oracle 10g XE in case they want to build some fancier data sets for their customers to train in.

Check out the cost breakdown

Data Transfer In – 100GB
Data Transfer Out – 100GB
Persistent Storage – 40GB
3 Snapshots of Persistent Storage – 40GB x 3 = 120GB
Hours of Compute time on a small instance – 80 hours

This would cost $57.00 monthly broken down by:
Compute – $8.00
Data Transfer – $27.00
EBS Volumes – $4.00
EBS Snapshots – $18.00

I received some feedback from a friend at Oracle who informed me I should check out Oracle’s Cloud Licensing document. It looks like they charge per core similar to a physical system with just a few small differences. Check out the document here.

Not bad compared to a lot more under a traditional hosted or data center environment. Plus no capital required! Check out the screenshot below of the web login to the public DNS name in AWS.

OracleBIinCloud

Advanced Intrusion Detection Environment

Monday, March 2nd, 2009

Advanced Intrusion Detection Environment (AIDE) is a file integrity checker for UNIX operating systems. Its purpose to provide reporting on the integrity of data on supported file systems. It is essentially the open source version of Tripwire. While you can use a tool such as Splunk to perform file integrity monitoring it requires more overhead on the server (running splunkd all the time) whereas AIDE is run only when it executed. This post will show you how you can install and setup a quick script to run AIDE on a target host from a trusted host (using ssh keys). By running AIDE multiple times on the target host you can determine what files are changing. Also, by using a trusted host you keep any data off of the target host free from being compromised (in theory).

The Environment

1. Target host
2. Trusted host (authorized to ssh as root using a public key to the target host)

The Concept

Here is a simple diagram of what we are going to setup.

aidedesign

Install AIDE using yum

You can install aide using yum or you can download it from SourceForge


#yum install aide

or

#wget http://superb-west.dl.sourceforge.net/sourceforge/aide/aide-0.13.1.tar.gz
#tar -xvzf aide-0.13.1.tar.gz
....
#cd aide-0.13.1
#./configure
...
#make
...
#make install
...

Install the runaide.ksh script

First, make the directories you need.


# mkdir -p /app/aide
# for x in store conf report bin; do mkdir /app/aide/$x; done

On your trusted host copy the following into a file called runaide.ksh


#!/bin/ksh -x
#
# Author: jameslabocki@gmail.com
# Date: 05.12.08
#
# A simple script that performs the following
# 1. Secure copies an aide configuration to each host
# 2. Executes an aide init via ssh on each host
# 3. Secure copies the aide database to the store directory
# 4. Performs a compare of the two databases and records the results
# 5. Emails the results to a specified email address
#
WHOTOMAIL=linux.admins@domain.com,infosec@domain.com
#
#Important variables :)
TS=`date '+%m%d%y'`
YESTERDAY=`date -d "1 day ago" '+%m%d%y'`
TWODAYS=`date -d "2 days ago" '+%m%d%y'`
ROOTDIR=/app/aide
REPODIR=${ROOTDIR}/store
CONFDIR=${ROOTDIR}/conf
REPORTDIR=${ROOTDIR}/report
BINDIR=${ROOTDIR}/bin
HOSTLIST=${CONFDIR}/hostlist
SSH=/usr/bin/ssh
#
#Almost as important variables :)
AIDE=/usr/sbin/aide
HOSTCONFDIR=/root
HOSTCONFFILE=${HOSTCONFDIR}/aide.cfg
HOSTDBFILE=/var/lib/aide/aide.db.new.gz
#
#Loop through the hosts and create database, copyi
for host in `cat $HOSTLIST`; do
scp ${CONFDIR}/aide.cfg ${host}:${HOSTCONFFILE}
ssh ${host} "${AIDE} --init -c ${HOSTCONFFILE}"
scp ${host}:${HOSTDBFILE} ${REPODIR}/${host}.${TS}.gz
#
if [ ! -f ${REPODIR}/${host}.${YESTERDAY}.gz ]; then
echo "${host}.${YESTERDAY}.gz does not exist yet" |mail -s "Aide Report for ${host}" ${WHOTOMAIL}
else
#Shortcoming in aide having to use config files for everything
cp ${CONFDIR}/aidecompare.cfg ${REPODIR}/${host}.${TS}.cfg
cat ${REPODIR}/${host}.${TS}.cfg |awk '{sub(/todaysdatabase/,'"\"${host}.${TS}.gz\""');print}' |awk '{sub(/yesterdaysdatabase/,'"\"${host}.${YESTERDAY}.gz\""');print}' > ${REPODIR}/${host}.${TS}.cfg.tmp
${AIDE} --compare -c ${REPODIR}/${host}.${TS}.cfg.tmp > ${REPORTDIR}/${host}.${TS}.log
cat ${REPORTDIR}/${host}.${TS}.log |mail -s "Aide Report for ${host}" ${WHOTOMAIL}
fi
#
#remove the db from two days ago
rm ${REPODIR}/${host}.${TWODAYS}.gz
#
done

The script looks for a file named hostlist in /app/aide/conf and then runs through the list of hosts copying over an aide.cfg file and running the aide executable on each host. This means that you can customize the aide.cfg in one place for your environment and not worry about maintaining the configuration file across machines. Once you have a hostlist and a general configuration file you can execute the script.

Further Development
I also setup the reports directory to be served over http through apache and authenticated against Active Directory using mod_authz_ldap. This is nice because you can allow a third party to review the reports on a daily basis.

Here are a few helpful crontab entries after you have the script running.


#Remove old aide reports files
30 1 * * * /usr/bin/find /app/aide/store/ -mtime +10 -exec rm {} \;
30 1 * * * /usr/bin/find /app/aide/report/ -mtime +30 -exec rm {} \;
#Aide job that audits file changes on systems at 3AM each day
0 3 * * * /app/admin/servicedelivery/linux/admin/bin/runaide.ksh

Splunk to Jira for PCI

Wednesday, February 25th, 2009

PCI DSS 10.6 requires that logs are reviewed on a daily basis. If the proper alerting mechanisms are in place you can bypass the need to look through thousands of lines of logs, but you still need a trail of what actions are taken after an alert is received.

PCI DSS 10.6

10.6
Are logs for all system components reviewed at least daily? Log reviews must include those servers that perform security functions like intrusion detection system (IDS) and authentication, authorization, and accounting protocol (AAA) servers (for example, RADIUS). Note: Log harvesting, parsing, and alerting tools may be used to achieve compliance with Requirement 10.6.

I am using XMLRPC and a perl script to automatically create issues in Jira when a Splunk alert is triggered. Although you can use Splunk to perform log review by tagging events I thought it would be more beneficial to use our currently installed work flow system so that we don’t have to allow everyone to log into Splunk and we could use some of the other features inside of Jira as well. So what do you need to do to make this work?

1. Install necessary CPAN modules (assuming you already have perl) on your splunk server.


#cpan -e
cpan> install XMLRPC::Lite
...
cpan> install Data::Dumper
...

1.b Enable XMLRPC in Jira under administration -> Plugins -> RPC Plugin.

splunkjirapci1

2. Copy the following script in $SPLUNK_HOME/bin/scripts/openJiraTicket.pl


#!/usr/bin/perl
#
# * Your search can trigger a shell script.
# * Specify the name of the shell script to run.
# * Place the script in $SPLUNK_HOME/bin/scripts.
# * Command line arguments passed to the script are:
# * $0 = script name.
# * $1 = number of events returned.
# * $2 = search terms.
# * $3 = fully qualified query string.
# * $4 = name of saved splunk.
# * $5 = trigger reason (i.e. "The number of events was greater than 1").
# * $6 = link to saved search.
# * $7 = a list of tags belonging to this saved search.
# * $8 = file where the results for this search are stored (contains raw results).
use strict;
use warnings;
#
use XMLRPC::Lite;
use Data::Dumper;
#
my $summary = $ARGV[3];
my $description = "An alert was triggered in splunk with the following information:" . "\n\nSearch String: " . $ARGV[1] . "\nNumber of Results:" . $ARGV[0] . "\nAlert Name:" . $ARGV[3] . "\nReason Alarm Triggered:" . $ARGV[4] . "\n\nLink to Search:" . $ARGV[5];
#
my $jira = XMLRPC::Lite->proxy('http://jira.domain.com:8080/rpc/xmlrpc');
my $auth = $jira->call("jira1.login", "username", "password")->result();
my $call = $jira->call("jira1.createIssue", $auth, {
'project' => 'AI',
'type' => 30,
'summary' => $summary,
'description' => $description,
'assignee' => 'usertoassignto',
});
my $fault = $call->fault();
if (defined $fault) {
die $call->faultstring();
} else {
print "issue created:\n";
print Dumper($call->result());
}
$jira->call("jira1.logout", $auth);

3. Add a new input to your savedsearches.conf file in $SPLUNK_HOME/etc/system/local/savedsearches.conf. Note the action_script is set to openJiraTicket.pl which means whenever the relation field is met the script openJiraTicket.pl will be executed.


[PCIDSS-10_3_4-LinuxFailedLogins]
action_email = linux.admins@domain.com
action_rss = 0
action_script = openJiraTicket.pl
counttype = number of events
enableSched = 1
quantity = 0
relation = greater than
role = productionLinux
schedule = 0 * * * *
search = index=production pam_unix failure startminutesago=60
sendresults = 1
userid = myuser

4. Now create a few failed login attempts via ssh to a server that is logging to Splunk and you should see an issue get opened in Jira by Splunk.

splunkjirapci2

Some other enhancements that can be made to this setup are:

1. Creating an event inside of Splunk itself when the issue is opened in Jira. Then log when the Jira ticket is closed to Splunk as well and measure the variance in time of the opening and closing of issues as a KPI of the operations staff.

2. Extend the functionality of the perl script with an array (or maybe use sqlite) to match the alert name in Splunk to a component in Jira. This would allow you to compartmentalize the alerts into groups in Jira (i.e windows in one component, linux in another, network in another, etc).

3. Use encrypted password in the perl script and use XMLRPC over ssl. I will be doing this shortly. A quick howto

Encrypt your password file

openssl rand 32 -out key.temp
openssl des3 -e -pass file:key.temp -in passwd -out passwd.enc
openssl des3 -d pass file:key.dec -in passwd.enc -out passwd.dec
openssl des3 -d -pass file:key.dec -in passwd.enc -out passwd.dec
openssl des3 -d -pass file:key.temp -in passwd.enc -out passwd.dec

Unencrypt your password from within the script using the key. I learned chomp is VERY important ;)

if ($enpasswd) {
$passwd = `openssl des3 -d -pass file:/etc/pki/tls/private/dbi.key -in $enpasswd |awk NF`;
chomp($passwd);
}

Configuring Oracle ASM on Enterprise Linux 5

Friday, February 20th, 2009

Configuring Oracle ASMLib on Multipath Disks

There are two issues with using multipath disks in ASM. ASM cannot handle seeing the same disk twice. If it does, it will cause an error. A single disk can appear three times in a multipath configuration:

1. The first path to the disk
2. The second path to the disk
3. The multipath access point

Here’s an example: say a system has one local disk, /dev/sda, and one disk attached via external storage. The machine has two connections, or paths, to that external storage. The Linux SCSI driver will see both paths. They will appear as /dev/sdb and /dev/sdc. The system may access sdb or sdc, the access ends up in the same place.

If multipathing is enabled, there will be a multipath disk, for example /dev/multipatha, that can access both paths. That is, any I/O to multipatha can use either path. If a system were using the sdb path, and that cable is unplugged, the system will get an error. But the multipath disk will know to switch to the sdc path.

Most system software will be unaware of the multipath configuration. It can use any of the paths, sdb, sdc, or multipatha, and it won’t know the difference. ASMLib is the same way. The default configuration does not care which path it uses.

ASMLib will choose only one of the paths, because ASM cannot handle seeing the same disk twice. This solves the first issue. ASM only sees one path, and it is happy. Here is the second issue: which path does ASM see? In its default configuration, ASMLib will choose the first path it finds. This is the first path as reported by Linux. Depending on your storage driver, it could be the multipath, or it could be one of the single paths.

The system administrator wants ASMLib to always use the multipath disk. What’s the point of having it if Oracle is not using it? There is no way, however, for ASMLib to know what a multipath looks like. It must be told via its configuration.

Disk Scan Ordering

Disks are marked for ASMLib using a process described in ASMLib Installation. ASMLib learns what disk are marked during a process called disk scanning. ASMLib runs this scan every time it starts up. The system administrator can also force a scan via the /etc/init.d/oracleasm scandisks command.

ASMLib examines each disk in the system. It checks if the disk has been marked for ASMLib. Any disk that has been marked will be made available to ASMLib. Normally, ASMLib checks each disk in the exact order the OS lists them. This is fine for most systems.

In the last section, we described a case where the OS order isn’t good enough. The system administrator wants ASMLib to see multipath disks before it sees single path disks. This allows ASMLib to choose the multipath disk and make it available to Oracle.

ASMLib allows two modifications to the disk scan order. First, it allows exclusion of certain disks. In other words, ASMLib will ignore those disks completely. Second, the system administrator can specify disks that are to be scanned first. Disks in this list are scanned before the rest of the disks in the system.

A multipath configuration can use either facility. The system administrator could choose to exclude all the single path disks. ASMLib will then ignore them, only scanning the multipath disks. Conversely, the administrator could specify that the multipath disks are scanned first. ASMLib will see them first, and prefer them over the single paths.

Configuring the Scan Order

The Oracle ASMLib configuration file is located at /etc/sysconfig/oracleasm. It contains all the startup configuration that the system administrator specified via the /etc/init.d/oracleasm configure command. That command cannot configure scan ordering.

The configuration file contains many configuration variables. The ORACLEASM_SCANORDER variable specifies disks to be scanned first. The ORACLEASM_SCANEXCLUDE variable specifies the disks that are to be ignored.

The variables take a whitespace-seperated list of prefix strings to match. In other words, if a disk starts with one of the prefix strings, it matches. For example, the prefix string sd will match all SCSI devices. The disk sda starts with sd. Note that these are not globs. They do not use wildcards. They are simple prefixes. Also note that the /dev/ path is not part of the prefix.

NOTE: When scanning, only the device names known by the kernel are scanned. With device-mapper, the kernel sees the devices as /dev/dm-XX. The /dev/mapper/XXX names are created by udev for human readability. Any configuration of ORACLEASM_SCANORDER or ORACLEASM_SCANEXCLUDE must use the dm prefix.

A couple examples:

Multipath Disks First

The system administrator configures ASMLib to scan the multipath disks first. In the ASMLib configuration file she edits the ORACLEASM_SCANORDER variable to look like so:

ORACLEASM_SCANORDER="multipath sd"

During a scan, ASMLib first tries all disks that start with “multipath”. The multipath device /dev/multipatha certainly matches. It is scanned first. Next, ASMLib tries all disks that start with “sd”. This is all the SCSI disks. The local disk /dev/sda will be scanned, but it is not an ASM disk. The single path disks /dev/sdb and /dev/sdc are also scanned. They are ASM disks, but ASMLib will see that it already has a path to that disk. It will ignore them. Finally, ASMLib will scan any other disks that did not match either prefix.

Excluding Single Path Disks

The system administrator configures ASMLib to ignore the single path disks. In the ASMLib configuration, he edits the ORACLEASM_SCANEXCLUDE variable to look like so:

ORACLEASM_SCANEXCLUDE="sdb sdc"

Here, the system administrator has been more specific. ASMLib should ignore exactly the disks /dev/sdb and /dev/sdc. It should not ignore other SCSI disks. While scanning, ASMLib will ignore those paths, only seeing the /dev/multipath disk. Once again, Oracle will use the multipath disk.

EMC PowerPath and ASMLib
A lot of administrators want to use EMC PowerPath for multipathing and ASMLib for disk access.

However, PowerPath and the 2.4 kernels are not supported by EMC. Systems with the Linux 2.6 kernel, such as RHEL 4 or SLES 9, and the 2.0 ASMLib kernel driver are supported. Please reference the EMC Support Matrix to verify that any/all interoperability requirements are met.

If you have ANY issue related to support for ASMLib and PowerPath on systems using the Linux 2.4 kernel, such as RHEL 3 or SLES 8, please contact EMC.

In order to keep ASMLib from scanning local disks first run fdisk to determine how many local disks you have.


fdisk -l |grep sd |grep -v bytes |awk -F" " '{print $1}' |cut -c 6-8 |sort -n |uniq
sda
sdb
sdc
sdd
sde
sdf
sdg

Place the out from the fdisk command in the /etc/sysconfig/oracleasm.

ORACLEASM_SCANEXCLUDE="sda sdb sdc sdd sde sdf sdg"

Warning!
Once this is complete the oracleasm service must be restarted. Make sure this action is coordinated with a DBA so Oracle is shutdown on the node being modified!


[root@hostname~]# /etc/init.d/oracleasm listdisks
QAORADATA
QAORALOG
[root@hostname~]# /etc/init.d/oracleasm querydisk QAORADATA
Disk "QAORADATA" is a valid ASM disk on device [8, 33] <-- local disks so failover won't work
[root@hostname~]# /etc/init.d/oracleasm querydisk QAORALOG
Disk "QAORALOG" is a valid ASM disk on device [8, 49] <-- local disks so failover won't work
[root@hostname~]# /etc/init.d/oracleasm stop
Unmounting ASMlib driver filesystem: [ OK ]
Unloading module "oracleasm": [ OK ]
[root@hostname~]# /etc/init.d/oracleasm start
Loading module "oracleasm": [ OK ]
Mounting ASMlib driver filesystem: [ OK ]
Scanning system for ASM disks: [ OK ]
[root@hostname~]# /etc/init.d/oracleasm listdisks
QAORADATA
QAORALOG
[root@hostname~]# /etc/init.d/oracleasm querydisk QAORADATA
Disk "QAORADATA" is a valid ASM disk on device [120, 33] <-- power device, failover will work
[root@hostname~]# /etc/init.d/oracleasm querydisk QAORALOG
Disk "QAORALOG" is a valid ASM disk on device [120, 1] <-- power device, failover will work
[root@hostname~]# ls -al /dev/emcpowerc1
brw-r----- 1 root disk 120, 33 Mar 27 08:47 /dev/emcpowerc1
[root@hostname~]# ls -al /dev/emcpowera1
brw-r----- 1 root disk 120, 1 Mar 27 08:47 /dev/emcpowera1

oracleasm configure


/etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver. The following questions will determine whether the driver is
loaded on boot and what permissions it will have. The current values
will be shown in brackets ('[]'). Hitting without typing an
answer will keep that current value. Ctrl-C will abort.

Default user to own the driver interface []: 500
Default group to own the driver interface []: 500
Start Oracle ASM library driver on boot (y/n) [n]: y
Fix permissions of Oracle ASM disks on boot (y/n) [y]: y

Workaround for ASM in EL5

For some reason using the oracleasm command doesn't work in RHEL5. I found a work around is to use asmtool to manually force label the ASM disk.


asmtool -C -l /dev/oracleasm/ -n "QAORADATA" -s "/dev/emcpowerc1" -a force=yes
asmtool: Device "/dev/emcpowerc1" is not a partition
asmtool: Continuing anyway

Audit Database Activity with Splunk or Syslog

Friday, February 20th, 2009

Here are the steps I took to audit database activity and get the data into Splunk. You could technically send this to any log repository capable of listening on a TCP socket. The challenge is that many RDBMS don’t allow you to get database information. Oracle is notorious for this (because they want you to purchase Database Vault). Of course if you don’t have money (like me) for that you can use this solution. You really need three things:

1. Install DBI and Oracle:DBD on the host that will be polling the database.
2. Grant a user who will run the script SELECT on dba_audit_trail table (or whatever table you want to poll and pull into splunk).
3. Setup the perl script with the proper inputs and encrypt the password file for no plain text on the file system.

1. Install DBD::Oracle on 64-bit Linux and Oracle 11g

DBD::Oracle 1.21 to install on a 64-bit Linux OS against the Oracle 11 full client. As root, we downloaded DBD::Oracle from CPAN.


# perl -MCPAN -eshell
cpan> install DBI
cpan> get DBD::Oracle
...
# cd /root/.cpan/build/DBD-Oracle-1.21
# export ORACLE_HOME=actual value of Oracle Home
# export ORACLE_SID=actual value of ORACLE_SID
# export ORACLE_USERID=working ORACLE_USERID
# export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME
# perl Makefile.PL
...
# make
...
# make test
...
# make install

Now you can install the dbipoll perl script.

Generate a random key, encode a file with it, and then decode a file with it. Useful to generate a password, encrypt it and then get rid of the plain text password inside scripts.

openssl rand 32 -out key.temp
openssl des3 -e -pass file:key.temp -in passwd -out passwd.enc
openssl des3 -d pass file:key.dec -in passwd.enc -out passwd.dec
openssl des3 -d -pass file:key.dec -in passwd.enc -out passwd.dec
openssl des3 -d -pass file:key.temp -in passwd.enc -out passwd.dec

Verify you are running encrypted. Add the following temporarily to your sqlnet.ora

TRACE_DIRECTORY_CLIENT=/ora-main/app/oracle/product/11.1.0.6/db/network/trace
#TRACE_FILE_CLIENT=client
TRACE_LEVEL_CLIENT=16

Inspect the trace for encryption

[root@mymachine trace]# cat cli_9668.trc |grep encry
(2863411744) [13-FEB-2009 15:37:50:911] naeshow: These are the encryption algorithms that the client will accept:
(2863411744) [13-FEB-2009 15:37:50:911] naeshow: Choice 0: no algorithm; encryption inactive
service: encryption
(2863411744) [13-FEB-2009 15:37:50:934] na_info: Oracle Advanced Security: encryption service for Linux: Version 11.1.0.6.0 - Production
service: encryption
(2863411744) [13-FEB-2009 15:37:50:935] nacomps: service encryption
(2863411744) [13-FEB-2009 15:37:50:940] nacomus: service encryption
service: encryption
service: encryption
(2863411744) [13-FEB-2009 15:37:50:952] naeecom: The server chose the 'RC4_256' encryption algorithm
(2863411744) [13-FEB-2009 15:37:50:952] na_tns: encryption is active, using RC4_256

Note the last two lines indicating we are encrypted.

Here is the dbipoll.pl script.


#!/usr/bin/perl
#
#
# Now accepts arguments for countkey, table, and columns.
# countkey will ideally be a timestamp field, such as EXTENDED_TIMESTAMP in AUD$.
#
# Need to deal with the following variations...
# 1 - Oracle local
# 2 - Oracle remote
# 3 - mysql local
# 4 - mysql remote
# 5 - Sybase local
# 6 - Sybase remote
#
# Oracle connect string: $dbh = DBI->connect("dbi:Oracle:host=$server;sid=$db;port=$dbport", $user, $passwd);
# Oracle connect string without env vars: $dbh = DBI->connect( "dbi:Oracle:$db", $username, $passwd )
# mysql connect string: $dbh = DBI->connect("dbi:mysql:database=$db\;host=$server;port=$dbport","$user","$passwd");
# Sybase connect string: $dbh = DBI->connect("dbi:sybase", $username, $password, $db);
use IO::Socket;
use strict;
use DBI;
use Getopt::Long;
use Sys::Syslog;
#
our ($server,$db,$table,$columns,$user,$enpasswd,$passwd,$dbtype,$tcphost,$tcpport,
$countfile,$countkey,$ofile,$dbport,$baseconnect,$dbh);
#
GetOptions(
"dbtype=s" =>\$dbtype, # Type of DB (Oracle, MySQL, etc.)
"server=s" =>\$server, # IP address of database server.
"dbport=s" =>\$dbport, # Port for connecting to remote DB.
"db=s" =>\$db, # Database to retriever records from.
"user=s" =>\$user, # DB username.
"password=s" =>\$passwd, # Unencrypted password.
"xpassword=s" =>\$enpasswd, # Encrypted password.
"table=s" =>\$table, # Table to retrieve from.
"columns=s" =>\$columns, # Columns to fetch.
"tcphost=s" =>\$tcphost, # Splunk server IP.
"tcpport=s" =>\$tcpport, # Splunk tcp listener port.
"countfile=s" =>\$countfile, # File to store row count status.
"countkey=s" =>\$countkey, # Field to use as count value
"o=s" =>\$ofile, # File for output (default is stdout).
);
#
# Deal with encrypted passwords first...
if (!$passwd) {
if ($enpasswd) {
$passwd = `openssl des3 -d -pass file:/etc/pki/tls/private/dbi.key -in $enpasswd |awk NF`;
chomp($passwd);
}
}
#
if ($dbtype =~ /oracle/i) {
$ENV{'DBI_DRIVER'} = "Oracle";
$dbtype = "Oracle";
$dbh->{InactiveDestroy} = 1;
#$baseconnect = DBI->connect("dbi:$dbtype:$db");
if (!$dbport) {
# Assume a local DB.
$dbh = DBI->connect("dbi:$dbtype:$db", "$user", "$passwd") ||
die( $DBI::errstr . "\n" );
#
} else {
if (!$passwd) {
$dbh = DBI->connect("dbi:Oracle:host=$server;sid=$db;port=$dbport", $user) ||
die( $DBI::errstr . "\n" );
} else {
$dbh = DBI->connect("dbi:Oracle:host=$server;sid=$db;port=$dbport", $user, $passwd) ||
die( $DBI::errstr . "\n" );
}
}
#
}
if ($dbtype =~ /mysql/i) {
$ENV{'DBI_DRIVER'} = "mysql";
$dbtype = "mysql";
#$baseconnect = DBI->connect("dbi:$dbtype:database=$db;host=$server");
if (!$dbport) {
# Assume a local DB.
$dbh = DBI->connect("dbi:$dbtype:host=$server;database=$db","$user","$passwd") ||
die( $DBI::errstr . "\n" );
#$dbh = $baseconnect . "$user" . "$passwd";
} else {
if (!$passwd) {
$dbh = DBI->connect("dbi:$dbtype:database=$db;host=$server;port=$dbport","$user") ||
die( $DBI::errstr . "\n" );
} else {
$dbh = DBI->connect("dbi:$dbtype:database=$db;host=$server;port=$dbport","$user","$passwd") ||
die( $DBI::errstr . "\n" );
}
}
#
}
if (($dbtype =~ /sybase/i) && (!$ENV{'SYBASE'})) {
$ENV{'SYBASE'} = '/usr/local/freetds';
$ENV{'DSQUERY'} = "$server";
$dbtype = "sybase";
#$baseconnect = DBI->connect("dbi:$dbtype", "$db");
if (!$dbport) {
# Assume a local DB.
$dbh = DBI->connect("dbi:$dbtype:database=$db","$user","$passwd") ||
die( $DBI::errstr . "\n" );
} else {
if (!$passwd) {
$dbh = DBI->connect("dbi:$dbtype:database=$db;port=$dbport","$user") ||
die( $DBI::errstr . "\n" );
} else {
$dbh = DBI->connect("dbi:$dbtype:database=$db;port=$dbport","$user","$passwd") ||
die( $DBI::errstr . "\n" );
}
}
}
my $sth;
if ( -f $countfile) {
# Compare vs. file with saved count
open (CF, "<$countfile");
my $filecount = readline CF;
chomp $filecount;
close (CF);
# Adding an ORDER BY clause here may be a requirement
$sth = $dbh->prepare("SELECT $columns FROM $table where $countkey > \'$filecount\' order by $countkey");
} else {
$sth = $dbh->prepare("SELECT $columns FROM $table order by $countkey");
}
#
our $handle;
if (($tcphost) && ($tcpport)) {
# Send directly to splunk server...
# Create tcp socket to send the data to the splunk server
my $remote = IO::Socket::INET->new( Proto => "tcp",
PeerAddr => $tcphost,
PeerPort => $tcpport,
Type => SOCK_STREAM)
or die "cannot connect to tcp daemon on $tcphost";
$remote->autoflush(1);
$handle = $remote;
#$target = "\$remote";
} elsif ($ofile) {
# Write to local outfile...
open (OFILE, ">$ofile");
$handle = *OFILE;
#$target = *OFILE;
} else {
# send to stdout
open (STDOUT, ">&1");
$handle = *STDOUT;
#$target = *STDOUT;
}
#
# Get some Rows...
$sth->execute or die $sth->errstr;
open (CF, ">$countfile");
while(my $hash_ref = $sth->fetchrow_hashref) {
my $output = "";
my $lastrow;
foreach my $key (keys(%$hash_ref)) {
my $str = $hash_ref->{$key};
if ($key ne "EXTENDED_TIMESTAMP") {
if ($str =~ /"/) {
(my $xstr = $str) =~ s/"/\\"/g;
$str = "\"$xstr\"";
} else {
$str = "\"$str\"" if ($str =~ / /);
$str = "\"$str\"" if ($str =~ /,/);
}
$output = $output . $key . "=" . $str . ",";
}
if ($key eq "EXTENDED_TIMESTAMP") {
# if ($key eq "TO_CHAR(EXTENDED_TIMESTAMP,'MM/DD/YYYYHH24:MI:SS')") {
$str =~ s/\"//g;
$lastrow = $str; #if (lc($str) eq lc($countkey));
}
#
}
chop($output);
# Update countfile
# Where did we leave off?...
*CF->autoflush(1);
seek(CF,0,0);
print CF "$lastrow\n";
#
# And send them somewhere.
print $handle $output . "\n";
}
#
$sth->finish();
$dbh->disconnect;
#
close (CF);
close $handle;

Now schedule this to run (I just wrote a wrapper script in ksh exporting the variables to be safe).


#!/usr/bin/ksh
#
export ORACLE_HOME=/ora-main/app/oracle/product/11.1.0.6/db
export ORACLE_SID=rmdev1
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME
#
/usr/bin/dbipoll.pl -dbtype=oracle -server=servername -dbport=1521 -db=development -user=splunk -xpassword=/etc/dbipasswd -table=dba_audit_trail -columns="OS_PROCESS,OS_USERNAME,USERNAME,USERHOST,ACTION,ACTION_NAME,TRANSACTIONID,SCN,SQL_BIND,SQL_TEXT,TO_CHAR(EXTENDED_TIMESTAMP,'MM/DD/YYYY HH24:MI:SS'),EXTENDED_TIMESTAMP" -countfile="/app/oracle/product/monitor/countfile" -countkey="EXTENDED_TIMESTAMP" -tcphost=splunkserver -tcpport=9094

Of course you will also need to setup a tcp input port on your splunk server. Once you have all that in place you can run the following query to see what hosts are performing the largest amount of actions on the database and what users are running the majority of actions on the database. I have removed hostnames and usernames legends from the following for security.

index=indexname sourcetype=databasesource | timechart count(USERNAME) by USERNAME

DBAuditSplunkUsers

index=indexname sourcetype=databasesource |timechart count(USERHOST) by USERHOST

DBAuditSplunkHosts

You can also use this to setup alerts if any users runs an update on a database (useful to catch those sneaky developers). :)

RHCS, Xen, and GFS on RHEL 5.2

Monday, January 12th, 2009

I recently built a Red Hat Cluster with GFS and utilized the Xen kernel to create a more robust virtualization environment. The combination of the three services allows you to have a shared storage platform and monitoring of guest operating systems. Some of the benefits are that if a guest or a host dies the cluster service restarts the guest(s) on another available host that is specified in a failover domain. While it is not as robust as VMWare’s SRM and DRS tools it certainly could be with just a little bit of scripting. It is also more cost effective.

I’m not going to go into the details of how I built the cluster in this post as a quick Google search can provide you with many guide. I would just like to share the setup and the gotchas of the install I ran into and provide a benchmark made using iozone.

The Setup

* 2 x Dell 2950, 2 quad core 2.66GHz, 16GB RAM,
* 2 x QLogic QLE2462 – PCI-Express to 4Gb FC.
* MPIO instead of PowerPath
* EMC CX3-80; RAID 6; SATAII; 14 Disks 1TB volume (not ideal, but what I had to use!). Write caching is enabled on the storage group.

The gotchas

1. If upgrading to Xen kernel.

I realized after upgrading to the xen kernel on the servers and rebooting the ramdisk didn’t include the xenblk driver required so the system kept rebooting. I had to boot into the regular kernel and make a new ramdisk using the following command.


#KERNEL=2.6.18-92.el5xen
#mkinitrd --with=xennet --preload=xenblk /boot/initrd-2.6.18-92.el5xenb.img $KERNEL

Then I changed the grub.conf to use that ramdisk and that problem was solved.

2. Recognizing EMC CX3-80 using MPIO.

Be sure to view the multipath.conf.defaults file and copy the proper configuration to your multipath.conf

view /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.defaults

Added the following to /etc/multipath.conf.


devices {
device {
vendor "DGC"
product ".*"
product_blacklist "LUN_Z"
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
prio_callout "/sbin/mpath_prio_emc /dev/%n"
features "1 queue_if_no_path"
hardware_handler "1 emc"
path_grouping_policy group_by_prio
failback immediate
rr_weight uniform
no_path_retry 60
rr_min_io 1000
path_checker emc_clariion
}
}

After that restart multipathing and you should see your disks using multipath -ll

3. Fencing

I was a little disappointed with how fencing works against Dell’s Remote Access Controller (DRAC). Instead of /sbin/fence_drac using ssh or https to reboot fenced node it uses telnet by default. This is less than ideal for security purposes (telnet sending credentials in clear text), but it will have to work for now. It wouldn’t take too long to re-write the perl script to use ssh and the DRAC is already listening on port 22, but it’d be nice to have it included in the fenced RPM.

Benchmark of GFS Version 1 on Xen host

I performed some basic tuning (gfs_tool gettune, noatime, noquota, glock_purge, etc) on the file system. I ran a benchmark on the file system using iozone. The results are below. Please note the x-axis is kB file, y-axis is kB/sec, and z-axis is kB record.

1. The writer report indicates that best performance occurred with 4MB files with a record size between 256KB and 1024KB.

Click for full size

x-axis is file size, y-axis is record size.

2. The re-writer report indicates the positive effect of cache. Performance is very high, even for large file sizes.

Click for full size

Click for full size

3. The reader report.

Click for full size image

Click for full size image

4. The re-reader report.

Click for full size image

Click for full size image

5. Random read report.

Click for full size image

Click for full size image

6. Random write report.

Click for full size image

Click for full size image

Benchmark of GFS Version 1 from within the guest

Now that I ran a benchmark against the file system from the Xen host I thought it would be interesting to run the same benchmark from within the guest OS to see how the Xen hypervisor effects I/O. Again, note the x-axis is kB file, y-axis is kB/sec, and z-axis is kB record.

1. Guest Writer Report

Click for full size image

Click for full size image

Hadoop

Friday, January 9th, 2009

Hadoop distributed file system opens up some very interesting possibilities for organizations who want to reduce storage costs and processing time. Instead of building a formal data warehouse or n-tier architecture which struggles from the typical bottlenecks the architecture of HDFS is a master/worker architecture. Along with allowing organizations to run a file system across commodity hardware it provides the ability to run map reduce jobs over the cluster. Of course the best part is that it was named after the creators child’s stuffed elephant and falls under the Apache license. :)

HDFS Architecture

HDFS Architecture

I was impressed with some of the example jars included and the ability to index string data in a several ASCII text files in just seconds. I used Michael Noll’s Running Hadoop on Ubuntu Linux as a quickstart guide. He even provides links to examples of a map reduce job written in Python.

Some practical examples of where it could be used are for running analytics on large sets of data (think credit transactions, bank statements or even internal log data). The below diagram depicts ASCII data being dumped into an HDFS tier from a database tier. From there MapReduce jobs can be executed against the HDFS tier with faster results than a typical data warehouse.

Using HDFS and MapReduce for Analytics

Using HDFS and MapReduce for Analytics

I hope to use a large set of unused desktops to build a 25 node cluster shortly. Yahoo’s latest HDFS cluster will have me beat by about 3,975 nodes even if I do though. Check out the largest cluster here.

Hadoop Home

Open-Source Economics

Sunday, May 11th, 2008

A great talk by Yochai Benkler on Open-Source Economics can be found here.