DATABASE TO SUPPORT PRINCIPAL COMPONENTS ANALYSIS
SITUATION: The data on PCDD and PCDF concentrations in different media came from technical papers published in environmental journals. The data appeared in a wide variety of formats, with the isomers listed in different orders and non-detects represented inconsistently. The pattern recognition program required the data to be complete (all 25 isomers), natural-log transformed, arranged with samples as rows and isomers as columns, and saved in Lotus format. Dr. Michael Ungs had started the project by typing some of the data into the required format. His progress was significantly hindered by the complexity of the data manipulation, which included taking the natural logs of the absolute values of the non-detects. When I learned what he was trying to do, he was concerned: he had made little progress, and the project timeline was very short. In addition, he was scheduled to be out of the office for a few days, after which the pattern recognition analysis was to begin immediately.
ACTION: I proposed developing a database that would give us greater analytical flexibility and the ability to trace every data point to its source. Given the crosstabulation requirement, I chose Paradox as the best program for the project. On analyzing the data, I found that it was more complex than it first appeared. Many of the papers covered more than one type of sample (a study), and each study included several samples. I therefore designed the database so that every value was identified by its paper, study, sample, and isomer number.
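The paper/study/sample/isomer design and the crosstabulation step can be sketched in Python. This is a minimal illustration, not the original Paradox implementation: the record layout, the function names `crosstab_ln` and `to_csv`, and the convention that a non-detect is stored as the negative of its detection limit are all assumptions, and CSV stands in for the Lotus export.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical long-format record: one row per reported value, keyed by
# paper, study, and sample so every number traces back to its source.
# Assumption: a non-detect is stored as the negative of its detection limit.
Record = tuple[str, str, str, int, float]  # (paper, study, sample, isomer, value)


def crosstab_ln(records: list[Record], isomers: list[int]) -> dict:
    """Pivot to samples-as-rows, isomers-as-columns, taking ln(|value|).

    Taking the absolute value means a non-detect enters the table at its
    detection limit. Samples missing any listed isomer are dropped, since
    the pattern recognition step required complete data.
    """
    table = defaultdict(dict)
    for paper, study, sample, isomer, value in records:
        table[(paper, study, sample)][isomer] = math.log(abs(value))
    return {
        key: [row[i] for i in isomers]
        for key, row in table.items()
        if all(i in row for i in isomers)
    }


def to_csv(wide: dict, isomers: list[int]) -> str:
    """Write the pivoted table as CSV (a stand-in for the Lotus export)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["paper", "study", "sample", *isomers])
    for (paper, study, sample), values in sorted(wide.items()):
        writer.writerow([paper, study, sample, *(f"{v:.4f}" for v in values)])
    return buf.getvalue()
```

Keeping the data in long form, one value per row, is what preserves traceability; the wide, log-transformed table is derived from it on demand rather than typed in directly.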
RESULTS: The database more than met the needs of the project. It also made it easy to normalize the individual isomers and the congener groups to the total of all individual isomers and the total of all groups, respectively. Although this normalization did not improve our analysis and received only a one-sentence mention in one of the papers, it was an important option to have, and one that would not have been available without the database. Several journal articles were published based on this project.
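The normalization amounts to dividing each value by the sum of its kind within the same sample. Below is a minimal Python sketch, assuming positive concentrations for a single sample; `normalize_to_total` and the labels used with it are hypothetical.

```python
def normalize_to_total(concentrations: dict[str, float]) -> dict[str, float]:
    """Express each concentration as a fraction of the sample total.

    Assumes all values are positive concentrations from a single sample,
    so the returned fractions sum to 1.
    """
    total = sum(concentrations.values())
    if total <= 0:
        raise ValueError("sample total must be positive")
    return {name: c / total for name, c in concentrations.items()}


# Hypothetical sample: isomers and congener groups are normalized separately.
isomer_profile = normalize_to_total({"isomer_a": 1.0, "isomer_b": 3.0})
group_profile = normalize_to_total({"group_x": 2.0, "group_y": 5.0, "group_z": 3.0})
```

The same function is applied once to a sample's individual isomers and separately to its congener group totals, giving the two normalized profiles.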