Perl Example 5 --- data merge


The following Perl script reads the text files in a directory, selects one column from each file, and merges the selected columns into one file.

For example, the data could be raw microarray data, with one file per array. The red/green ratio could be selected from each array's file and merged into one data file.

Note that the program assumes all data files have the same format. The output contains only the selected data columns, so an info column must be appended later. The program reads only text files (e.g., tab-delimited text files), not Excel files. Excel files must first be converted to text by saving them as "tab delimited" or "comma delimited" text files.
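
As a small illustration of the column selection described above, the sketch below splits one delimited line and returns a single field by its zero-based index. The helper name pick_column and the sample row are made up for this example and are not part of the script that follows.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script below): return one field,
# counted from 0, out of a single delimited line.
sub pick_column {
    my ($line, $delimiter, $column_no) = @_;
    chomp $line;    # drop the trailing newline so the last field is clean
    # \Q...\E treats the delimiter literally; -1 keeps trailing empty fields
    my @fields = split /\Q$delimiter\E/, $line, -1;
    return $fields[$column_no];
}

my $sample = "spot_1\t1520\t760\t2.0\n";    # made-up tab-delimited row
print pick_column($sample, "\t", 3), "\n";  # prints 2.0
```

Passing "," as the delimiter handles comma-delimited rows the same way, with the caveat that a comma inside a field would shift the column positions.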

#!perl
# This program reads text data files in a directory, selects one column from each, and merges the data into one file.

#delimiter: tab. Change to "," for comma-delimited files. Be careful: your info column might contain commas.
$delimiter = "\t";

#input name of the data directory from command line
$dir = $ARGV[0];

#input number of the column to be read from command line. 0 for the first column
$columnNo = $ARGV[1];

#input name of the output file from command line
$outfile = $ARGV[2];

#remind the user if the data directory or column number is missing, then stop
if(!$dir or (!$columnNo && $columnNo ne '0') ) {
  print "No input data directory or column number.\nUsage: perl my_perl.pl dir columnNo outfile\n";
  exit;
}

#read the directory 
opendir(DIR, $dir) or die "can not open directory $dir\n";

while($name = readdir(DIR)) {
  #save data file names in an array, don't include . and ..
  push(@files, $name) if( !($name eq '.' || $name eq '..') );
}

closedir(DIR) or  die "can not close directory $dir\n";

#process data
if($dir && ($columnNo || $columnNo eq '0') ) {
  
  #read files
  for($i = 0; $i < @files; $i++) {
    $infile = $dir.'/'.$files[$i];
    
    #each individual file
    open(IN, $infile) or die "can not open $infile\n";

    #row number, 0 for first row. Reset for each file
    $rowNo = 0;

    #read the file
    while( $line = <IN> ) {

      # get rid of the newline character; otherwise data in the last column would be incorrect
      chomp($line);

      # split the row into an array of fields
      @data = split(/$delimiter/, $line);

      # remember data in a "matrix"
      $datamatrix{$i, $rowNo} = $data[$columnNo];

      # add 1 to row number
      $rowNo++;
    }

    close(IN) or die "can not close $infile\n";

   }
}

# print results. If output file name provided print to output file, else to screen
if($outfile) {
  open (OUT, ">$outfile") or die "can not open $outfile\n";

  # first row file names
  for($i = 0; $i < @files; $i++) {
     print OUT $files[$i];
     print OUT "\t" if($i < @files -1);
  }
  print OUT "\n";

  # data 
  for($j = 0; $j < $rowNo; $j++) {
   for($i = 0; $i < @files; $i++) {
     print OUT $datamatrix{$i, $j};
     print OUT "\t" if($i < @files -1);
   }
     print OUT "\n";
  }

  close(OUT) or die "can not close $outfile\n";   
}
else {
  # first row file names
  for($i = 0; $i < @files; $i++) {
     print $files[$i];
     print "\t" if($i < @files -1);
  }
  print "\n";

  # data 
  for($j = 0; $j < $rowNo; $j++) {
   for($i = 0; $i < @files; $i++) {
     print $datamatrix{$i, $j};
     print "\t" if($i < @files -1);
   }
     print "\n";
  }
}
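
When printing, the script uses the row count of the last file it read, which is safe only under the same-format assumption stated at the top. As a hedged sketch of an alternative merge step, the code below takes one array of selected values per file and pads shorter columns with empty cells. The helper name merge_columns and the sample values are hypothetical and not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one array reference per file (each holding the
# selected column) and returns the merged rows as tab-joined strings.
sub merge_columns {
    my @columns = @_;

    # find the longest column so no rows are dropped
    my $max_rows = 0;
    for my $col (@columns) {
        $max_rows = @$col if @$col > $max_rows;
    }

    my @rows;
    for my $j (0 .. $max_rows - 1) {
        # a missing cell becomes an empty string
        push @rows, join("\t", map { defined $_->[$j] ? $_->[$j] : '' } @columns);
    }
    return @rows;
}

# made-up values: the second "file" has one row fewer than the first
my @merged = merge_columns([ '1.2', '0.8', '2.5' ], [ '0.9', '1.1' ]);
print "$_\n" for @merged;
```

Here the third output row ends with an empty cell for the shorter column instead of being silently cut off or misaligned.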