Data analysis --- merge

Raw data, the output of an array scanner, usually comes as a flat text file. The raw data file records detailed statistics of the fluorescent intensities: the measurement values as well as quality measures.

The first step in microarray data analysis is to extract data from the raw data files for each individual array and to merge the raw data into one or a few files that are suitable for further analysis.

Most researchers extract only one or two columns from each raw data file and merge those columns across arrays.

MicroHelper provides a tool for merging raw data, provided that all the raw data files have an identical format. Perl Example 5 --- data merge also provides a simple Perl script for merging raw data.
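The extract-and-merge step described above can be sketched in a few lines of Python. This is only an illustration, not the MicroHelper tool or the Perl script: it assumes tab-delimited raw files with a header row, and the column names "ID" and "Intensity" as well as the function names are hypothetical placeholders to be replaced with whatever your scanner software writes.

```python
import csv

def read_column(path, id_col, value_col):
    """Return (ids, values) for one raw data file, in row order."""
    ids, values = [], []
    with open(path, newline="") as handle:
        # Assumption: tab-delimited text with a single header row.
        for row in csv.DictReader(handle, delimiter="\t"):
            ids.append(row[id_col])
            values.append(row[value_col])
    return ids, values

def merge_raw_files(paths, out_path, id_col="ID", value_col="Intensity"):
    """Write one table: the ID column plus one value column per input file."""
    if not paths:
        raise ValueError("no input files given")
    ref_ids = None
    columns = []
    for path in paths:
        ids, values = read_column(path, id_col, value_col)
        if ref_ids is None:
            ref_ids = ids          # the first file defines the row order
        elif ids != ref_ids:
            # Refuse to merge arrays whose rows do not line up.
            raise ValueError(f"{path}: ID column differs from {paths[0]}")
        columns.append(values)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow([id_col] + list(paths))   # file names label the columns
        for i, spot_id in enumerate(ref_ids):
            writer.writerow([spot_id] + [col[i] for col in columns])
```

A call such as merge_raw_files(["array1.txt", "array2.txt"], "merged.txt") would produce one row per spot and one intensity column per array. The sketch stops with an error rather than merging when the ID columns disagree, since silently misaligned rows are hard to detect later.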

Common mistakes to avoid

1. Make sure all data files have an identical format.

2. Data will be in a different format if a text file has been opened and saved as an Excel file.

3. Check the Godlist --- gene info columns. The Godlist may differ between arrays for the following reasons:

  • Mistakes have been found and corrected in the most recent arrays, but not in earlier ones.
  • New clones/controls have been added for recent printing.
  • Technicians may have changed their minds on whether to include controls in the raw data file.
  • Different arrays are actually used for the experiments.

4. For data that have been saved into a database and then extracted from the database, uncaught mistakes in the data extraction process may corrupt the data.

5. Refrain from using copy and paste to move data. Copy and paste is error-prone: both humans and machines can make mistakes with it.
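Several of the checks above (identical format, matching gene info columns) can be automated before merging. The following Python sketch compares every file against the first one. It assumes tab-delimited files with a header row; the gene info column names "ID" and "Name" and the function names are hypothetical and should be adapted to your own files.

```python
import csv

def read_table(path):
    """Return (header, rows) for a tab-delimited file; empty file gives ([], [])."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return [], []
    return rows[0], rows[1:]

def pick(row, indexes):
    """Extract the given columns from a row; short rows yield empty strings."""
    return tuple(row[i] if i < len(row) else "" for i in indexes)

def find_format_mismatches(paths, gene_cols=("ID", "Name")):
    """Compare each file to the first; return a list of problem descriptions."""
    problems = []
    ref_header, ref_rows = read_table(paths[0])
    indexes = [ref_header.index(name) for name in gene_cols]
    for path in paths[1:]:
        header, rows = read_table(path)
        if header != ref_header:
            problems.append(f"{path}: header differs from {paths[0]}")
            continue  # columns cannot be compared reliably
        if len(rows) != len(ref_rows):
            problems.append(f"{path}: {len(rows)} rows, expected {len(ref_rows)}")
        # Line numbers start at 2 because line 1 is the header.
        for line_no, (row, ref_row) in enumerate(zip(rows, ref_rows), start=2):
            if pick(row, indexes) != pick(ref_row, indexes):
                problems.append(f"{path}: gene info differs at line {line_no}")
                break  # report only the first differing line per file
    return problems
```

An empty list means the files agree on the header, the row count, and the chosen gene info columns. It does not prove that the measurement values themselves are intact.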