Tax Systems and the American Middle Class

February 20th, 2009

Tax Systems and the American Middle Class

On December 16, 1773 at approximately 7:00PM, 90,000 lbs of tea were thrown overboard from the Dartmouth, Ealeanor, and Beaver in Griffin Wharf. The Boston Tea Party, as it became known, helped spark the American Revolution. The Tea Party’s purpose of showcasing America’s disdain with the British governments unfair taxes did not go unnoticed and Americans have debated taxes ever since. Proponents of tax reform argue that the current system is unfair and is also unnecessarily complex. In order to understand what tax system will provide the most benefit to the majority of Americans in terms of fairness, simplicity, and incentive to continue increasing productivity several questions must be answered. First, what percentage of America is middle class and how is middle class defined in terms of income? Second, what tax systems exist and how does each tax and redistribute income differently? Finally, can the effects of a change to the tax system on tax payer behavior be accurately predicted? For example, if a Value Added Tax (VAT) is introduced can we predict how much consumer spending will decline? Once these elements are examined and a tax system is determined to be the most advantageous to the majority of Americans the implementation costs of this system should be studied for feasibility.

Defining the Middle Class
First, it is important to define the American middle class and determine if it contains the majority of the American population. The American middle class is defined by Gilbert as a broad range of people ranging from families of four earning over $27,500 (the poverty line) to the same family which earns more than $100,000 per year (Gilbert, 1998). This range is further subdivided by many sociologists into what is known as the lower, middle, and upper middle classes. For the sake of this research the middle class will remain general. According to the United States Census Bureau (USCB) slightly over two thirds (67.7 percent) of household’s income fell between $25,000 and $149,000 in 2006 (U.S. Census Bureau). Figure 1 provides a visual representation of the USCB data. The majority of American households are classified as middle class.

figure1.jpg

Figure 1. United States Census Bureau data on 2006 income.

Graduated Income Tax

Several tax systems have been proposed as replacements of the current graduated income tax system. In order to compare the proposed systems it is important to understand how the current tax system in the United States operates. According to Merriam-Webster the simplest definition of income tax is, “A tax on the net income of an individual or business”. The first income tax was enacted in 1894 and contained an exemption for any corporation or association organized exclusively for religious, educational, or charitable purposes (Fremont 56). Since the three percent flat tax on income in 1894 many rules have changed. In fact, since 1942 there have been over 30 major amendments to the tax code many of which introduced multiple changes in a single amendment (Hollenbeck Ninety Years of Individual Income and Tax Statistics). The most recent form 1040 from the Internal Revenue Service (IRS) contains over 75 line items. Items such as moving expenses, student loan interest deductions, and domestic production activities only apply to very specific segments of the population. Furthermore, the 1040 form references over 30 additional forms. Over 130 million income tax returns are filed under this system each year (Hollenbeck Ninety Years of Individual Income and Tax Statistics).

The current individual federal income tax system in the United States ranges from 10 to 35 percent based on a graduated income scale (2007 Federal Tax Rate Schedules). The federal tax rate for families considered middle class ranges from 15 to 28 percent. Using the graduated income tax bracket a family of four earning $25,000 would be taxed $3,750 and a similar family earning $130,000 would be taxed $36,120. The second family earns slightly over five times as much income, but pays nearly ten times as much tax as the first family. This large difference in income to tax ratio occurs within what is considered by most sociologists to be the middle class.

An understanding of how the current graduated income tax effects the middle class has been established. There is a large disparity between the upper and lower middle class in regards to the amount of tax burden carried. Two of the most popular replacements to the current system are the flat tax and the fair tax.

Flat Tax System

A flat tax, sometimes called a flat rate tax or proportional tax, is a tax system in which, “the tax rate remains constant regardless of the amount of the tax base.” (Merriam-Webster Online). Recent flat tax plans by Steve Forbes, Paul Bremer, and Arnold Schwarzenegger have caused conversation about a flat tax among Americans. Additionally, many countries in Eastern Europe have introduced flat tax systems and some experts claim it has boosted their economies (Bartlett). In an article called A Brief Guide to the Flat Tax, Mitchell provides two major advantages the average American would receive from a flat tax system:

the most persuasive feature of a flat tax for many Americans is its fairness. The complicated documents, instruction manuals, and numerous forms that taxpayers struggle to decipher every April would be replaced by a brief set of instructions and two simple postcards. This radical reform appeals to citizens who not only resent the time and expense consumed by filing their own tax forms, but also suspect that the existing maze of credits, deductions, and exemptions gives a special advantage to those who wield political power and can afford expert tax advisers. (3)

The majority of flat tax proposals introduce a rate close to 16 percent for all earned income. Utilizing the same scenario from the graduated income tax example this would mean that a family earning $25,000 would pay $4,000 while the second family earning $130,000 would pay $20,800 in tax. While the $400 increase in tax for the family earning $25,000 is not ideal the reduction of the tax burden by more than $15,000 for the family earning $130,000 is astounding.

A point of contention with the flat tax system is that it will not collect enough tax revenue especially from businesses and upper income earners. This is because although the capital gains tax, interest-income, and dividends tax would be eliminated, the personal income and payroll taxes would still exist (Moore). Most in the middle class would not benefit from the elimination of the capital gains tax, interest-income, and dividends as much as they would the elimination of personal income and payroll taxes.

Furthermore, a flat tax system would not address border-adjusted taxes, the tax built into products exported from our country through corporate taxes. According to Leo Linbeck Jr., a member of American For Fair Taxation:

Border-adjusted taxes are, quite simply, the most potent weapons foreign producers have against U.S. producers and workers. Border-adjusted taxes are consumption taxes removed on export by the producing nation and assessed upon imports as ad valorem taxes.

The border-adjusted tax policy is one of the major flaws of the current graduated income tax system. This can be better understood through a short illustration. If an American company named Megacorp which specializes in making $100 dollar widgets is taxed under the corporate tax of 17 percent it must then pass the cost of the tax onto the consumer through the sale of their product. So, a $100 dollar widget would cost $117. When this widget is exported overseas to Latvia the widget price remains the same. However, Foreigncorp, a Latvian company does not pay a 17 percent corporate tax and within Latvia can sell the same widget for $100. Megacorp has lost 17% of it’s competitiveness due to the border tax. Some critics believe there is no point in altering the tax system in the United States if we do not fix the fundamental problems such as border-adjusted taxes.

Fair Tax System

The fair tax system is based on a national sales tax of 23 percent on new goods and services. There are several concessions for impoverished families and a tax rebate to ensure that necessary goods would not be taxed. According to Neal Boortz, a leading advocate of the plan, the fair tax, “would constitute the biggest transfer of power from politicians to the people since the beginning of this country.” (Boortz 193). Proponents argue this would benefit all Americans in several key ways. First, the fair tax would truly make the tax system participatory rather than mandatory. People who do not spend money on new goods and services would not pay the tax. Second, with the elimination of the corporate tax new corporation growth in the country would be unprecedented. This would lead to more jobs and a stronger economy for America.

Some critics argue that the Fair Tax System would not work because of the tax rebate policy. For example, a family earning $22,400 a year could pay $4,253 in national sales tax if buying only necessities. The same family might actually receive a rebate of $5,152 (Hirsch). This scenario illustrates a situation of income redistribution occurring within the system and is precisely what the fair tax system is supposed to eliminate. Scenarios such as the tax rebate policy are only half of the argument from critics.

Moreover, opponents claim that the Fair Tax System is part of a conspiracy to put more of a tax burden on the poor through taxes on everything purchased. This claim is quickly refuted by Fair Tax proponents. They argue that by encouraging investment and savings without taxation the working class will begin to build wealth. Small business owners would also benefit from the ability to grow their businesses without worry about the tax implications. Perhaps the greatest example is given by Neal Boortz who reminds critics, “As things are now, wealthy Americans with no current income have no income tax bill to pay” (Boortz 197).

Implementing a New Tax System

The effects of changing the current tax system can only be theorized. No fundamental changes to the tax system have occurred since its inception. Much of the anxiety about altering the tax system can be attributed to the fact that over 68 percent of American income comes from salaries and wages. The following examples illustrate the challenges of implementing an alternative tax system. Figure 2 illustrates the sources of total income for individuals in 2005.

figure2.jpg
Figure 2. Internal Revenue Service Statistics on Income

If the Fair Tax System is implemented some critics argue that a black market of goods may develop. Amity Shlaes, a writer for Bloomberg News, argues that a 30 percent tax on goods and services would have a negative effect on the consumer market:

Implement the FairTax, though, and the U.S. will find its tax-scape taking on a certain sleaziness. Vendors will materialize on street corners selling that DVD player without tax.

Even citizens who never thought of breaking the law will snatch up those DVDS. Thirty percent is simply too great a take to ignore. Especially vulnerable will be younger people, who already view property rights as an option, not a given. Think Napster — if you don’t pay for downloads, you certainly won’t feel the need to pay a sales tax six times the one your state charges.

If Shlaes theory is correct this would add an additional cost to the implementation of a Fair Tax system. The cost of policing the sale of goods and services to ensure they are being taxed appropriately would ultimately be passed back to the citizens in the form of a higher tax rate.

Another major concern of the implementation of the Fair Tax System is the double taxation of individuals during the transition period. For example, if a frugal individual worked for ten years without any major expenditures and earned $500,000 total in a ten year period. The $500,000 was taxed under the graduated income tax system and $100,000 in taxes was paid. After the implementation of the Fair Tax System if the individual wanted to purchase a $40,000 vehicle with the money he saved and was already taxed the person would pay a VAT of nearly 30 percent, or roughly $12,000. This is obviously a major concern for individuals with large amounts of savings. Although some savings will be passed onto consumers through the Fair Tax System in the way of lower prices it would not even come close to compensating those individuals that fall into the double taxation scenario.

Recommendations

Upon inspection of several alternative tax systems the Fair Tax System proposal appears to be the most viable candidate as a replacement. This is due to the fact it would create a culture of saving and investment while simultaneously simplifying the complexity of the tax code and promoting economic growth. Although this alternative tax system has been proposed the current graduated tax system cannot be replaced quickly. Research into several areas should be completed prior to any actions to change the graduated income tax. First, factoring geographic location into the effect of each alternative system would have on the middle class needs to be completed. For example, a family of four in rural South Dakota may live comfortably earning $50,000 annually while the same family in New York City would need to earn twice that to maintain a similar standard of living. Second, unintended consequences should be researched more fully for each alternative plan. For example, a study of how a national sales tax under the fair tax plan would impact consumer spending should be performed.

Conclusion

Several alternative tax systems show they may benefit the middle class either primarily through paying less taxes or secondarily through a better national economy. These systems also simplify the amount of paperwork and confusion leading to more faith in the tax system as a transparent and fair entity. The systems, especially the Fair Tax System, should be studied regressively in more detail to determine which system is best for the American middle class.

Works Cited

Gilbert, Dennis (1998). The American Class Structure. New York: Wadsworth Publishing. 0-534-50520-1.

“United States – Income in the past 12 months.” U.S. Census Bureau. U.S. Census Bureau. 7 Mar 2008

<http://factfinder.census.gov/servlet/STTable?_bm=y&-geo_id=01000US& qr_name=ACS_2006_EST_G00_S1901&-ds_name=ACS_2006_EST_G00_>.

Hollenbeck, Scott. “Ninety Years of Individual Income and Tax Statistics, 1916-2005.” Internal Revenue Service. March 2008. Internal Revenue Service. 16 Mar 2008 <http://www.irs.gov/pub/irs-soi/16-05intax.pdf>.

“2007 Federal Tax Rate Schedules.” Internal Revenue Service. 25 November 2007. Internal Revenue Service. 16 Mar 2008 <http://www.irs.gov/formspubs/article/0,,id=164272,00.html>.

“Proportional Tax.” Merriam-Webster Online. Merriam-Webster. 16 Mar 2008 <http://www.merriam-webster.com/dictionary/proportional+tax>.

Bartlett, Bruce. “Flat-Tax Comeback.” National Review Online 10 November 2003 15 March 2008 <http://www.nationalreview.com/nrof_bartlett/bartlett200311100918.asp>.

Mitchell, Daniel. “A Brief Guide to the Flat Tax.” Backgrounder 18667 July, 2005 15 March 2008 <http://www.heritage.org/Research/Taxes/upload/80562_1.pdf>.

Boortz, Neal. The FairTax Book: Saying Goodbye to the Income Tax and the IRS. HarperCollins, 2006.

Hirsch, Bill. “The Fair Tax: Magic or Illusion.” Ocala.com. 01 December 2005. Ocala Star-Banner. 22 Mar 2008 <http://www.ocala.com/apps/pbcs.dll/article?AID=/20051127/OPINION/51126005/1183/news08>.

Moore, Stephen. “Flat and simple, stupid.” High Beam Encylopedia. 2 February 1996. National Review. 22 Mar 2008 <http://www.encyclopedia.com/doc/1G1-18020709.html>.

Shlaes, Amity. “ScareTax, Not FairTax, Is Name for Huckabee Plan.” Bloomberg.com:News. 12 December 2007. Bloomberg. 26 Apr 2008 <http://www.bloomberg.com/apps/news?pid=washingtonstory&sid=aIOIqqEofwFU>.

The Boston Tea Party Historical Society. Date when site was accessed <http://www.boston-tea-party.org>.

Works Consulted

McCaffery, Edward. Fair Not Flat: How to Make the Tax System Better and Simpler. University of Chicago: 2002.

Kemp, “Preferences for funding particular government services from different taxes.” Journal of Economic Psychology 29(2008): 54-72.

Davies, Hoy. “Flat rate taxes and inequality measurement.” Journal of Public Economics 84(2002): 33-46.

Altig, Auerbach. “Simulating Fundamental Tax Reform in the United States.” The American Economic Review 91(2001): 574-595.

Snowdon, “How flat is flat.” International Tax Review 18(2007): 10-13.

Nicodeme, Gaetan. “Flat Tax: Does One Rate Fit All?.” Intereconomics 42.3 (May 2007): 138-142. Business Source Premier. EBSCO.

“How tax complexity can be good for you. ” International Tax Review (2005): 1. ABI/INFORM Global. ProQuest. 17 Feb. 2008 <http://www.proquest.com/>

Erik Fjærli. “Tax Reform and the Demand for Debt. ” International Tax and Public Finance : Special Issue: Public Finance and Financial Markets (Guest 11.4 (2004): 435-467. ABI/INFORM Global. ProQuest. 17 Feb. 2008 <http://www.proquest.com/>

Mintz, Jack M. “The thorny problem of implementing new consumption taxes. ” National Tax Journal 49.3 (1996): 461-474. ABI/INFORM Global. ProQuest. 17 Feb. 2008 <http://www.proquest.com/>

Jeffrey Owens. “Fundamental Tax Reform: An International Perspective. ” National Tax Journal 59.1 (2006): 131-164. ABI/INFORM Global. ProQuest. 17 Feb. 2008 <http://www.proquest.com/>

“SOI Tax Stats.” Internal Revenue Service. January 2006. Internal Revenue Service. 7 Mar 2008

<http://www.irs.gov/taxstats/indtaxstats/article/0,,id=129270.html>

Fremont, Marion. Governing Nonprofit Organizations: Federal and State Law and Regulations. Cambridge: Harvard University Press, 2004.

“Form 1040.” Internal Revenue Service. 2007. Internal Revenue Service. 16 Mar 2008 <http://www.irs.gov/pub/irs-pdf/f1040.pdf?portlet=3>.

Regnier, Pat. “Just how fair is the ‘Fairtax’?.” Money Magazine 7 September 2005 15 March 2008 <http://money.cnn.com/2005/09/06/pf/taxes/consumptiontax_0510/>.

Configuring Oracle ASM on Enterprise Linux 5

February 20th, 2009

Configuring Oracle ASMLib on Multipath Disks

There are two issues with using multipath disks in ASM. ASM cannot handle seeing the same disk twice. If it does, it will cause an error. A single disk can appear three times in a multipath configuration:

1. The first path to the disk
2. The second path to the disk
3. The multipath access point

Here’s an example: say a system has one local disk, /dev/sda, and one disk attached via external storage. The machine has two connections, or paths, to that external storage. The Linux SCSI driver will see both paths. They will appear as /dev/sdb and /dev/sdc. The system may access sdb or sdc, the access ends up in the same place.

If multipathing is enabled, there will be a multipath disk, for example /dev/multipatha, that can access both paths. That is, any I/O to multipatha can use either path. If a system were using the sdb path, and that cable is unplugged, the system will get an error. But the multipath disk will know to switch to the sdc path.

Most system software will be unaware of the multipath configuration. It can use any of the paths, sdb, sdc, or multipatha, and it won’t know the difference. ASMLib is the same way. The default configuration does not care which path it uses.

ASMLib will choose only one of the paths, because ASM cannot handle seeing the same disk twice. This solves the first issue. ASM only sees one path, and it is happy. Here is the second issue: which path does ASM see? In its default configuration, ASMLib will choose the first path it finds. This is the first path as reported by Linux. Depending on your storage driver, it could be the multipath, or it could be one of the single paths.

The system administrator wants ASMLib to always use the multipath disk. What’s the point of having it if Oracle is not using it? There is no way, however, for ASMLib to know what a multipath looks like. It must be told via its configuration.

Disk Scan Ordering

Disks are marked for ASMLib using a process described in ASMLib Installation. ASMLib learns what disk are marked during a process called disk scanning. ASMLib runs this scan every time it starts up. The system administrator can also force a scan via the /etc/init.d/oracleasm scandisks command.

ASMLib examines each disk in the system. It checks if the disk has been marked for ASMLib. Any disk that has been marked will be made available to ASMLib. Normally, ASMLib checks each disk in the exact order the OS lists them. This is fine for most systems.

In the last section, we described a case where the OS order isn’t good enough. The system administrator wants ASMLib to see multipath disks before it sees single path disks. This allows ASMLib to choose the multipath disk and make it available to Oracle.

ASMLib allows two modifications to the disk scan order. First, it allows exclusion of certain disks. In other words, ASMLib will ignore those disks completely. Second, the system administrator can specify disks that are to be scanned first. Disks in this list are scanned before the rest of the disks in the system.

A multipath configuration can use either facility. The system administrator could choose to exclude all the single path disks. ASMLib will then ignore them, only scanning the multipath disks. Conversely, the administrator could specify that the multipath disks are scanned first. ASMLib will see them first, and prefer them over the single paths.

Configuring the Scan Order

The Oracle ASMLib configuration file is located at /etc/sysconfig/oracleasm. It contains all the startup configuration that the system administrator specified via the /etc/init.d/oracleasm configure command. That command cannot configure scan ordering.

The configuration file contains many configuration variables. The ORACLEASM_SCANORDER variable specifies disks to be scanned first. The ORACLEASM_SCANEXCLUDE variable specifies the disks that are to be ignored.

The variables take a whitespace-seperated list of prefix strings to match. In other words, if a disk starts with one of the prefix strings, it matches. For example, the prefix string sd will match all SCSI devices. The disk sda starts with sd. Note that these are not globs. They do not use wildcards. They are simple prefixes. Also note that the /dev/ path is not part of the prefix.

NOTE: When scanning, only the device names known by the kernel are scanned. With device-mapper, the kernel sees the devices as /dev/dm-XX. The /dev/mapper/XXX names are created by udev for human readability. Any configuration of ORACLEASM_SCANORDER or ORACLEASM_SCANEXCLUDE must use the dm prefix.

A couple examples:

Multipath Disks First

The system administrator configures ASMLib to scan the multipath disks first. In the ASMLib configuration file she edits the ORACLEASM_SCANORDER variable to look like so:

ORACLEASM_SCANORDER="multipath sd"

During a scan, ASMLib first tries all disks that start with “multipath”. The multipath device /dev/multipatha certainly matches. It is scanned first. Next, ASMLib tries all disks that start with “sd”. This is all the SCSI disks. The local disk /dev/sda will be scanned, but it is not an ASM disk. The single path disks /dev/sdb and /dev/sdc are also scanned. They are ASM disks, but ASMLib will see that it already has a path to that disk. It will ignore them. Finally, ASMLib will scan any other disks that did not match either prefix.

Excluding Single Path Disks

The system administrator configures ASMLib to ignore the single path disks. In the ASMLib configuration, he edits the ORACLEASM_SCANEXCLUDE variable to look like so:

ORACLEASM_SCANEXCLUDE="sdb sdc"

Here, the system administrator has been more specific. ASMLib should ignore exactly the disks /dev/sdb and /dev/sdc. It should not ignore other SCSI disks. While scanning, ASMLib will ignore those paths, only seeing the /dev/multipath disk. Once again, Oracle will use the multipath disk.

EMC PowerPath and ASMLib
A lot of administrators want to use EMC PowerPath for multipathing and ASMLib for disk access.

However, PowerPath and the 2.4 kernels are not supported by EMC. Systems with the Linux 2.6 kernel, such as RHEL 4 or SLES 9, and the 2.0 ASMLib kernel driver are supported. Please reference the EMC Support Matrix to verify that any/all interoperability requirements are met.

If you have ANY issue related to support for ASMLib and PowerPath on systems using the Linux 2.4 kernel, such as RHEL 3 or SLES 8, please contact EMC.

In order to keep ASMLib from scanning local disks first run fdisk to determine how many local disks you have.


fdisk -l |grep sd |grep -v bytes |awk -F" " '{print $1}' |cut -c 6-8 |sort -n |uniq
sda
sdb
sdc
sdd
sde
sdf
sdg

Place the out from the fdisk command in the /etc/sysconfig/oracleasm.

ORACLEASM_SCANEXCLUDE="sda sdb sdc sdd sde sdf sdg"

Warning!
Once this is complete the oracleasm service must be restarted. Make sure this action is coordinated with a DBA so Oracle is shutdown on the node being modified!


[root@hostname~]# /etc/init.d/oracleasm listdisks
QAORADATA
QAORALOG
[root@hostname~]# /etc/init.d/oracleasm querydisk QAORADATA
Disk "QAORADATA" is a valid ASM disk on device [8, 33] <-- local disks so failover won't work
[root@hostname~]# /etc/init.d/oracleasm querydisk QAORALOG
Disk "QAORALOG" is a valid ASM disk on device [8, 49] <-- local disks so failover won't work
[root@hostname~]# /etc/init.d/oracleasm stop
Unmounting ASMlib driver filesystem: [ OK ]
Unloading module "oracleasm": [ OK ]
[root@hostname~]# /etc/init.d/oracleasm start
Loading module "oracleasm": [ OK ]
Mounting ASMlib driver filesystem: [ OK ]
Scanning system for ASM disks: [ OK ]
[root@hostname~]# /etc/init.d/oracleasm listdisks
QAORADATA
QAORALOG
[root@hostname~]# /etc/init.d/oracleasm querydisk QAORADATA
Disk "QAORADATA" is a valid ASM disk on device [120, 33] <-- power device, failover will work
[root@hostname~]# /etc/init.d/oracleasm querydisk QAORALOG
Disk "QAORALOG" is a valid ASM disk on device [120, 1] <-- power device, failover will work
[root@hostname~]# ls -al /dev/emcpowerc1
brw-r----- 1 root disk 120, 33 Mar 27 08:47 /dev/emcpowerc1
[root@hostname~]# ls -al /dev/emcpowera1
brw-r----- 1 root disk 120, 1 Mar 27 08:47 /dev/emcpowera1

oracleasm configure


/etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver. The following questions will determine whether the driver is
loaded on boot and what permissions it will have. The current values
will be shown in brackets ('[]'). Hitting without typing an
answer will keep that current value. Ctrl-C will abort.

Default user to own the driver interface []: 500
Default group to own the driver interface []: 500
Start Oracle ASM library driver on boot (y/n) [n]: y
Fix permissions of Oracle ASM disks on boot (y/n) [y]: y

Workaround for ASM in EL5

For some reason using the oracleasm command doesn't work in RHEL5. I found a work around is to use asmtool to manually force label the ASM disk.


asmtool -C -l /dev/oracleasm/ -n "QAORADATA" -s "/dev/emcpowerc1" -a force=yes
asmtool: Device "/dev/emcpowerc1" is not a partition
asmtool: Continuing anyway

Audit Database Activity with Splunk or Syslog

February 20th, 2009

Here are the steps I took to audit database activity and get the data into Splunk. You could technically send this to any log repository capable of listening on a TCP socket. The challenge is that many RDBMS don’t allow you to get database information. Oracle is notorious for this (because they want you to purchase Database Vault). Of course if you don’t have money (like me) for that you can use this solution. You really need three things:

1. Install DBI and Oracle:DBD on the host that will be polling the database.
2. Grant a user who will run the script SELECT on dba_audit_trail table (or whatever table you want to poll and pull into splunk).
3. Setup the perl script with the proper inputs and encrypt the password file for no plain text on the file system.

1. Install DBD::Oracle on 64-bit Linux and Oracle 11g

DBD::Oracle 1.21 to install on a 64-bit Linux OS against the Oracle 11 full client. As root, we downloaded DBD::Oracle from CPAN.


# perl -MCPAN -eshell
cpan> install DBI
cpan> get DBD::Oracle
...
# cd /root/.cpan/build/DBD-Oracle-1.21
# export ORACLE_HOME=actual value of Oracle Home
# export ORACLE_SID=actual value of ORACLE_SID
# export ORACLE_USERID=working ORACLE_USERID
# export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME
# perl Makefile.PL
...
# make
...
# make test
...
# make install

Now you can install the dbipoll perl script.

Generate a random key, encode a file with it, and then decode a file with it. Useful to generate a password, encrypt it and then get rid of the plain text password inside scripts.

openssl rand 32 -out key.temp
openssl des3 -e -pass file:key.temp -in passwd -out passwd.enc
openssl des3 -d pass file:key.dec -in passwd.enc -out passwd.dec
openssl des3 -d -pass file:key.dec -in passwd.enc -out passwd.dec
openssl des3 -d -pass file:key.temp -in passwd.enc -out passwd.dec

Verify you are running encrypted. Add the following temporarily to your sqlnet.ora

TRACE_DIRECTORY_CLIENT=/ora-main/app/oracle/product/11.1.0.6/db/network/trace
#TRACE_FILE_CLIENT=client
TRACE_LEVEL_CLIENT=16

Inspect the trace for encryption

[root@mymachine trace]# cat cli_9668.trc |grep encry
(2863411744) [13-FEB-2009 15:37:50:911] naeshow: These are the encryption algorithms that the client will accept:
(2863411744) [13-FEB-2009 15:37:50:911] naeshow: Choice 0: no algorithm; encryption inactive
service: encryption
(2863411744) [13-FEB-2009 15:37:50:934] na_info: Oracle Advanced Security: encryption service for Linux: Version 11.1.0.6.0 - Production
service: encryption
(2863411744) [13-FEB-2009 15:37:50:935] nacomps: service encryption
(2863411744) [13-FEB-2009 15:37:50:940] nacomus: service encryption
service: encryption
service: encryption
(2863411744) [13-FEB-2009 15:37:50:952] naeecom: The server chose the 'RC4_256' encryption algorithm
(2863411744) [13-FEB-2009 15:37:50:952] na_tns: encryption is active, using RC4_256

Note the last two lines indicating we are encrypted.

Here is the dbipoll.pl script.


#!/usr/bin/perl
#
#
# Now accepts arguments for countkey, table, and columns.
# countkey will ideally be a timestamp field, such as EXTENDED_TIMESTAMP in AUD$.
#
# Need to deal with the following variations...
# 1 - Oracle local
# 2 - Oracle remote
# 3 - mysql local
# 4 - mysql remote
# 5 - Sybase local
# 6 - Sybase remote
#
# Oracle connect string: $dbh = DBI->connect("dbi:Oracle:host=$server;sid=$db;port=$dbport", $user, $passwd);
# Oracle connect string without env vars: $dbh = DBI->connect( "dbi:Oracle:$db", $username, $passwd )
# mysql connect string: $dbh = DBI->connect("dbi:mysql:database=$db\;host=$server;port=$dbport","$user","$passwd");
# Sybase connect string: $dbh = DBI->connect("dbi:sybase", $username, $password, $db);
use IO::Socket;
use strict;
use DBI;
use Getopt::Long;
use Sys::Syslog;
#
our ($server,$db,$table,$columns,$user,$enpasswd,$passwd,$dbtype,$tcphost,$tcpport,
$countfile,$countkey,$ofile,$dbport,$baseconnect,$dbh);
#
GetOptions(
"dbtype=s" =>\$dbtype, # Type of DB (Oracle, MySQL, etc.)
"server=s" =>\$server, # IP address of database server.
"dbport=s" =>\$dbport, # Port for connecting to remote DB.
"db=s" =>\$db, # Database to retriever records from.
"user=s" =>\$user, # DB username.
"password=s" =>\$passwd, # Unencrypted password.
"xpassword=s" =>\$enpasswd, # Encrypted password.
"table=s" =>\$table, # Table to retrieve from.
"columns=s" =>\$columns, # Columns to fetch.
"tcphost=s" =>\$tcphost, # Splunk server IP.
"tcpport=s" =>\$tcpport, # Splunk tcp listener port.
"countfile=s" =>\$countfile, # File to store row count status.
"countkey=s" =>\$countkey, # Field to use as count value
"o=s" =>\$ofile, # File for output (default is stdout).
);
#
# Deal with encrypted passwords first...
if (!$passwd) {
if ($enpasswd) {
$passwd = `openssl des3 -d -pass file:/etc/pki/tls/private/dbi.key -in $enpasswd |awk NF`;
chomp($passwd);
}
}
#
if ($dbtype =~ /oracle/i) {
$ENV{'DBI_DRIVER'} = "Oracle";
$dbtype = "Oracle";
$dbh->{InactiveDestroy} = 1;
#$baseconnect = DBI->connect("dbi:$dbtype:$db");
if (!$dbport) {
# Assume a local DB.
$dbh = DBI->connect("dbi:$dbtype:$db", "$user", "$passwd") ||
die( $DBI::errstr . "\n" );
#
} else {
if (!$passwd) {
$dbh = DBI->connect("dbi:Oracle:host=$server;sid=$db;port=$dbport", $user) ||
die( $DBI::errstr . "\n" );
} else {
$dbh = DBI->connect("dbi:Oracle:host=$server;sid=$db;port=$dbport", $user, $passwd) ||
die( $DBI::errstr . "\n" );
}
}
#
}
if ($dbtype =~ /mysql/i) {
$ENV{'DBI_DRIVER'} = "mysql";
$dbtype = "mysql";
#$baseconnect = DBI->connect("dbi:$dbtype:database=$db;host=$server");
if (!$dbport) {
# Assume a local DB.
$dbh = DBI->connect("dbi:$dbtype:host=$server;database=$db","$user","$passwd") ||
die( $DBI::errstr . "\n" );
#$dbh = $baseconnect . "$user" . "$passwd";
} else {
if (!$passwd) {
$dbh = DBI->connect("dbi:$dbtype:database=$db;host=$server;port=$dbport","$user") ||
die( $DBI::errstr . "\n" );
} else {
$dbh = DBI->connect("dbi:$dbtype:database=$db;host=$server;port=$dbport","$user","$passwd") ||
die( $DBI::errstr . "\n" );
}
}
#
}
if (($dbtype =~ /sybase/i) && (!$ENV{'SYBASE'})) {
$ENV{'SYBASE'} = '/usr/local/freetds';
$ENV{'DSQUERY'} = "$server";
$dbtype = "sybase";
#$baseconnect = DBI->connect("dbi:$dbtype", "$db");
if (!$dbport) {
# Assume a local DB.
$dbh = DBI->connect("dbi:$dbtype:database=$db","$user","$passwd") ||
die( $DBI::errstr . "\n" );
} else {
if (!$passwd) {
$dbh = DBI->connect("dbi:$dbtype:database=$db;port=$dbport","$user") ||
die( $DBI::errstr . "\n" );
} else {
$dbh = DBI->connect("dbi:$dbtype:database=$db;port=$dbport","$user","$passwd") ||
die( $DBI::errstr . "\n" );
}
}
}
my $sth;
if ( -f $countfile) {
# Compare vs. file with saved count
open (CF, "<$countfile");
my $filecount = readline CF;
chomp $filecount;
close (CF);
# Adding an ORDER BY clause here may be a requirement
$sth = $dbh->prepare("SELECT $columns FROM $table where $countkey > \'$filecount\' order by $countkey");
} else {
$sth = $dbh->prepare("SELECT $columns FROM $table order by $countkey");
}
#
our $handle;
if (($tcphost) && ($tcpport)) {
# Send directly to splunk server...
# Create tcp socket to send the data to the splunk server
my $remote = IO::Socket::INET->new( Proto => "tcp",
PeerAddr => $tcphost,
PeerPort => $tcpport,
Type => SOCK_STREAM)
or die "cannot connect to tcp daemon on $tcphost";
$remote->autoflush(1);
$handle = $remote;
#$target = "\$remote";
} elsif ($ofile) {
# Write to local outfile...
open (OFILE, ">$ofile");
$handle = *OFILE;
#$target = *OFILE;
} else {
# send to stdout
open (STDOUT, ">&1");
$handle = *STDOUT;
#$target = *STDOUT;
}
#
# Get some Rows...
$sth->execute or die $sth->errstr;
open (CF, ">$countfile");
while(my $hash_ref = $sth->fetchrow_hashref) {
my $output = "";
my $lastrow;
foreach my $key (keys(%$hash_ref)) {
my $str = $hash_ref->{$key};
if ($key ne "EXTENDED_TIMESTAMP") {
if ($str =~ /"/) {
(my $xstr = $str) =~ s/"/\\"/g;
$str = "\"$xstr\"";
} else {
$str = "\"$str\"" if ($str =~ / /);
$str = "\"$str\"" if ($str =~ /,/);
}
$output = $output . $key . "=" . $str . ",";
}
if ($key eq "EXTENDED_TIMESTAMP") {
# if ($key eq "TO_CHAR(EXTENDED_TIMESTAMP,'MM/DD/YYYYHH24:MI:SS')") {
$str =~ s/\"//g;
$lastrow = $str; #if (lc($str) eq lc($countkey));
}
#
}
chop($output);
# Update countfile
# Where did we leave off?...
*CF->autoflush(1);
seek(CF,0,0);
print CF "$lastrow\n";
#
# And send them somewhere.
print $handle $output . "\n";
}
#
$sth->finish();
$dbh->disconnect;
#
close (CF);
close $handle;

Now schedule this to run (I just wrote a wrapper script in ksh exporting the variables to be safe).


#!/usr/bin/ksh
#
export ORACLE_HOME=/ora-main/app/oracle/product/11.1.0.6/db
export ORACLE_SID=rmdev1
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME
#
/usr/bin/dbipoll.pl -dbtype=oracle -server=servername -dbport=1521 -db=development -user=splunk -xpassword=/etc/dbipasswd -table=dba_audit_trail -columns="OS_PROCESS,OS_USERNAME,USERNAME,USERHOST,ACTION,ACTION_NAME,TRANSACTIONID,SCN,SQL_BIND,SQL_TEXT,TO_CHAR(EXTENDED_TIMESTAMP,'MM/DD/YYYY HH24:MI:SS'),EXTENDED_TIMESTAMP" -countfile="/app/oracle/product/monitor/countfile" -countkey="EXTENDED_TIMESTAMP" -tcphost=splunkserver -tcpport=9094

Of course you will also need to setup a tcp input port on your splunk server. Once you have all that in place you can run the following query to see what hosts are performing the largest amount of actions on the database and what users are running the majority of actions on the database. I have removed hostnames and usernames legends from the following for security.

index=indexname sourcetype=databasesource | timechart count(USERNAME) by USERNAME

DBAuditSplunkUsers

index=indexname sourcetype=databasesource |timechart count(USERHOST) by USERHOST

DBAuditSplunkHosts

You can also use this to setup alerts if any users runs an update on a database (useful to catch those sneaky developers). :)

RHCS, Xen, and GFS on RHEL 5.2

January 12th, 2009

I recently built a Red Hat Cluster with GFS and utilized the Xen kernel to create a more robust virtualization environment. The combination of the three services allows you to have a shared storage platform and monitoring of guest operating systems. Some of the benefits are that if a guest or a host dies the cluster service restarts the guest(s) on another available host that is specified in a failover domain. While it is not as robust as VMWare’s SRM and DRS tools it certainly could be with just a little bit of scripting. It is also more cost effective.

I’m not going to go into the details of how I built the cluster in this post as a quick Google search can provide you with many guide. I would just like to share the setup and the gotchas of the install I ran into and provide a benchmark made using iozone.

The Setup

* 2 x Dell 2950, 2 quad core 2.66GHz, 16GB RAM,
* 2 x QLogic QLE2462 – PCI-Express to 4Gb FC.
* MPIO instead of PowerPath
* EMC CX3-80; RAID 6; SATAII; 14 Disks 1TB volume (not ideal, but what I had to use!). Write caching is enabled on the storage group.

The gotchas

1. If upgrading to Xen kernel.

I realized after upgrading to the xen kernel on the servers and rebooting the ramdisk didn’t include the xenblk driver required so the system kept rebooting. I had to boot into the regular kernel and make a new ramdisk using the following command.


#KERNEL=2.6.18-92.el5xen
#mkinitrd --with=xennet --preload=xenblk /boot/initrd-2.6.18-92.el5xenb.img $KERNEL

Then I changed the grub.conf to use that ramdisk and that problem was solved.

2. Recognizing EMC CX3-80 using MPIO.

Be sure to view the multipath.conf.defaults file and copy the proper configuration to your multipath.conf

view /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.defaults

Added the following to /etc/multipath.conf.


devices {
device {
vendor "DGC"
product ".*"
product_blacklist "LUN_Z"
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
prio_callout "/sbin/mpath_prio_emc /dev/%n"
features "1 queue_if_no_path"
hardware_handler "1 emc"
path_grouping_policy group_by_prio
failback immediate
rr_weight uniform
no_path_retry 60
rr_min_io 1000
path_checker emc_clariion
}
}

After that restart multipathing and you should see your disks using multipath -ll

3. Fencing

I was a little disappointed with how fencing works against Dell’s Remote Access Controller (DRAC). Instead of /sbin/fence_drac using ssh or https to reboot fenced node it uses telnet by default. This is less than ideal for security purposes (telnet sending credentials in clear text), but it will have to work for now. It wouldn’t take too long to re-write the perl script to use ssh and the DRAC is already listening on port 22, but it’d be nice to have it included in the fenced RPM.

Benchmark of GFS Version 1 on Xen host

I performed some basic tuning (gfs_tool gettune, noatime, noquota, glock_purge, etc) on the file system. I ran a benchmark on the file system using iozone. The results are below. Please note the x-axis is kB file, y-axis is kB/sec, and z-axis is kB record.

1. The writer report indicates that best performance occurred with 4MB files with a record size between 256KB and 1024KB.

Click for full size

x-axis is file size, y-axis is record size.

2. The re-writer report indicates the positive effect of cache. Performance is very high, even for large file sizes.

Click for full size

Click for full size

3. The reader report.

Click for full size image

Click for full size image

4. The re-reader report.

Click for full size image

Click for full size image

5. Random read report.

Click for full size image

Click for full size image

6. Random write report.

Click for full size image

Click for full size image

Benchmark of GFS Version 1 from within the guest

Now that I ran a benchmark against the file system from the Xen host I thought it would be interesting to run the same benchmark from within the guest OS to see how the Xen hypervisor effects I/O. Again, note the x-axis is kB file, y-axis is kB/sec, and z-axis is kB record.

1. Guest Writer Report

Click for full size image

Click for full size image

Hadoop

January 9th, 2009

Hadoop distributed file system opens up some very interesting possibilities for organizations who want to reduce storage costs and processing time. Instead of building a formal data warehouse or n-tier architecture which struggles from the typical bottlenecks the architecture of HDFS is a master/worker architecture. Along with allowing organizations to run a file system across commodity hardware it provides the ability to run map reduce jobs over the cluster. Of course the best part is that it was named after the creators child’s stuffed elephant and falls under the Apache license. :)

HDFS Architecture

HDFS Architecture

I was impressed with some of the example jars included and the ability to index string data in a several ASCII text files in just seconds. I used Michael Noll’s Running Hadoop on Ubuntu Linux as a quickstart guide. He even provides links to examples of a map reduce job written in Python.

Some practical examples of where it could be used are for running analytics on large sets of data (think credit transactions, bank statements or even internal log data). The below diagram depicts ASCII data being dumped into an HDFS tier from a database tier. From there MapReduce jobs can be executed against the HDFS tier with faster results than a typical data warehouse.

Using HDFS and MapReduce for Analytics

Using HDFS and MapReduce for Analytics

I hope to use a large set of unused desktops to build a 25 node cluster shortly. Yahoo’s latest HDFS cluster will have me beat by about 3,975 nodes even if I do though. Check out the largest cluster here.

Hadoop Home

Open-Source Economics

May 11th, 2008

A great talk by Yochai Benkler on Open-Source Economics can be found here.

Gapminder and Hans Rosling

March 6th, 2008

Hans Rosling is a Professor of International Health at Karolinska Institutet in Stockholm, Sweden. I found his short talk on “Debunking Third World Myths” eye opening and inspiring. It seems to be a good example of where a data warehouse could help us better understand the impact of certain variables on countries in a sociological sense. It amazes me that we don’t have more data like this being translated into an open usable format.

Also, check out Gapminder which is mentioned in the talk. It seems like a good tool for unifying and presenting data.