Discussion:
Efficiently parsing a File
Tiago Hori
2014-03-18 17:57:15 UTC
Permalink
Hi Everyone,

I am fairly new at this, so please bear with me. :)

I am building a web app for a project I am working on where I need to
store and process large amounts of data.

The data comes in comma-delimited form: a bunch of headers that I don't
need, followed by the data. Each file contains 96 x 96 entries, and I need
to parse every one of them. Right now I have something working that takes
about 5 minutes to process the whole file. Any tips on how to make this
run more efficiently would be greatly appreciated.

Thanks,

Tiago

Here is the relevant code:


if ($_FILES['data'])
{
    $line = $parts = $name = "";
    $c = 0;
    $filename = $_FILES['data']['name'];
    if (file_exists($filename))
    {
        die("A file with this name already exists in the database <br />");
    }
    else
    {
        move_uploaded_file($_FILES['data']['tmp_name'], $filename);
        echo "Uploaded file '$filename' <br />";
        $fh = fopen($filename, 'r') or
            die("File does not exist or you lack permission to open it");
    }
    $runid = substr($filename, 0, -4);
    while (!feof($fh))
    {
        $line = fgets($fh);
        $parts = explode(',', $line);
        if ($parts[0] == 'ID')
        {
            $id      = sanitizeStrings($parts[0]);
            $assay   = sanitizeStrings($parts[1]);
            $allele1 = sanitizeStrings($parts[2]);
            $allele2 = sanitizeStrings($parts[3]);
            $name    = sanitizeStrings($parts[4]);
            //echo "$name\t$id\t$assay\t$allele1\t$allele2<br />";
            echo <<<_END
<table>
<tr>
<th>$name</th><th>$id</th><th>$assay</th><th>$allele1</th><th>$allele2</th>
</tr>
_END;
        }
        else if (preg_match("/S\d\d-\D\d\d/", $parts[0]))
        {
            $id      = sanitizeStrings($parts[0]);
            $assay   = sanitizeStrings($parts[1]);
            $alleles = sanitizeStrings($parts[9]);
            if (preg_match("/[ATCG]:[ATCG]/", $alleles))
            {
                $genotype = explode(':', $alleles);
                $allele1  = $genotype[0];
                $allele2  = $genotype[1];
            }
            else
            {
                $allele1 = $allele2 = 'No Call';
            }
            $name = sanitizeStrings($parts[4]);
            if ($name != 'Blank')
            {
                $query = "INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2)
                    VALUES ('$runid', '$name', '$id', '$assay', '$allele1', '$allele2')";
                if (queryMysql($query)) $c += 1;
                if ($c < 10)
                {
                    //echo "$name\t$id\t$assay\t$allele1\t$allele2<br />";
                    echo <<<_END
<tr>
<th>$name</th><th>$id</th><th>$assay</th><th>$allele1</th><th>$allele2</th>
</tr>
_END;
                }
            }
        }
    }
    fclose($fh);
    echo "</table>";
}
--
"Education is not to be used to promote obscurantism." - Theodonius
Dobzhansky.

"Gracias a la vida que me ha dado tanto
Me ha dado el sonido y el abecedario
Con él, las palabras que pienso y declaro
Madre, amigo, hermano
Y luz alumbrando la ruta del alma del que estoy amando

Gracias a la vida que me ha dado tanto
Me ha dado la marcha de mis pies cansados
Con ellos anduve ciudades y charcos
Playas y desiertos, montañas y llanos
Y la casa tuya, tu calle y tu patio"

Violeta Parra - Gracias a la Vida

Tiago S. F. Hori. PhD.
Ocean Science Center-Memorial University of Newfoundland
Jim Lucas
2014-03-18 18:38:39 UTC
Permalink
I cleaned up the code a little: moved a few things around, removed some
duplication and a part that didn't seem useful. Give this code a try:

if ( $_FILES['data'] ) {
    $dataSet = array();
    $parts = array();
    $c = 0;
    $filename = $_FILES['data']['name'];
    if ( file_exists($filename) ) {
        die("A file with this name already exists in the database <br />");
    } else {
        move_uploaded_file($_FILES['data']['tmp_name'], $filename);
    }
    $runid = substr($filename, 0, -4);
    echo "Uploaded file '$runid' <br />";
    $fh = fopen($filename, 'r') or
        die("File does not exist or you lack permission to open it");
    echo '<table>';
    while ( !feof($fh) ) {
        $parts = fgetcsv($fh);
        if ( $parts[0] == 'ID' ) {
            $id      = sanitizeStrings($parts[0]);
            $assay   = sanitizeStrings($parts[1]);
            $allele1 = sanitizeStrings($parts[2]);
            $allele2 = sanitizeStrings($parts[3]);
            $name    = sanitizeStrings($parts[4]);
        } else if ( preg_match("/S\d\d-\D\d\d/", $parts[0]) ) {
            $id      = sanitizeStrings($parts[0]);
            $assay   = sanitizeStrings($parts[1]);
            $alleles = sanitizeStrings($parts[9]);
            $allele1 = $allele2 = 'No Call';
            if ( preg_match("/[ATCG]:[ATCG]/", $alleles) ) {
                list($allele1, $allele2) = explode(':', $alleles, 2);
            }
            $name = sanitizeStrings($parts[4]);
            if ($name != 'Blank') {
                $dataSet[] = "('{$runid}', '{$name}', '{$id}', '{$assay}', '{$allele1}', '{$allele2}')";
            }
        }
        echo <<<_END
<tr>
<th>{$name}</th>
<th>{$id}</th>
<th>{$assay}</th>
<th>{$allele1}</th>
<th>{$allele2}</th>
</tr>
_END;

    }
    if ( $dataSet ) {
        queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
            . join(', ', $dataSet));
    }
    fclose($fh);
    echo '</table>';
}
--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Tiago Hori
2014-03-18 21:33:43 UTC
Permalink
Hi Jim,

One quick question:

When you use this call to the MySQL database: "INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES ". join(', ', $dataSet));

There are more than 6 items in the $dataSet array, since the while loop parsed the whole file, correct? I just want to make sure I understand correctly. Does that INSERT command know to start another row in the table after it has inserted the first 6 values?

Thanks!

Tiago
Jim Lucas
2014-03-18 21:40:17 UTC
Permalink
Post by Tiago Hori
Hi Jim,
When you use this call to the mysql database ("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES ". join(', ', $dataSet));
There are more than 6 items in the $dataSet array as the while loop parsed the whole file, correct? I just want to make sure I understand correctly. Does that INSERT command knows to start another row in the table after it inserted the first 6 values?
Yes.

https://dev.mysql.com/doc/refman/5.5/en/insert.html

[quote]
INSERT statements that use VALUES syntax can insert multiple rows. To do this,
include multiple lists of column values, each enclosed within parentheses and
separated by commas. Example:

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
[/quote]
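To make that concrete, here is a tiny throwaway example (the values are made
up) of what the joined string ends up looking like; in my earlier code
$dataSet gets filled row by row inside the loop:

// Each element of $dataSet is one parenthesised value list, so joining them
// produces a single multi-row INSERT statement.
$dataSet = array(
    "('run1', 'fish1', 'S01-A01', 'assay1', 'A', 'T')",
    "('run1', 'fish2', 'S01-A02', 'assay1', 'C', 'G')",
);
$sql = "INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
     . join(', ', $dataSet);
echo $sql;
// INSERT INTO genotyped (...) VALUES ('run1', 'fish1', ...), ('run1', 'fish2', ...)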
--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
Curtis Maurand
2014-03-19 00:45:36 UTC
Permalink
I'm assuming that you have a fixed number of items in each row of the
flat file?
You must have an idea of how long each line in the file is. Pick a buffer
long enough that your lines don't get truncated; fgets() will still stop
at the end of each line.

$handle = @fopen("your.delimited.file", "r");
$counter = 0;
$array = array();
if ($handle)
{
    while (($buffer = fgets($handle, 4096)) !== false)
    {
        $counter++;
        if ($counter > 1) // skip the first line of header info
        {
            $array = explode(",", $buffer);
            // now do what you will with the array
        }
    }
    fclose($handle);
}

You might also look at MySQL's "LOAD DATA INFILE":

LOAD DATA INFILE 'filename' INTO TABLE tbl_name FIELDS TERMINATED BY ','
IGNORE 1 LINES;

Very fast, very efficient. It's way faster than doing individual inserts.
The MySQL user needs to be able to read the file; your script executes the
query and MySQL does the rest.

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

--Curtis
James Moe
2014-03-18 19:01:05 UTC
Permalink
Post by Tiago Hori
Each file is about 1 MB. You are correct. Up to that point there is
a bunch of useless info [...].
Then 5 minutes is a ridiculously long time for processing the file.
Post by Tiago Hori
sanitizeStrings just uses htmlentities, strip_tags, stripslashes and
mysql_real_escape_string.
Is the data so suspect that it must be processed so thoroughly? I
expect it is in sanitizeStrings() where most of the time is spent.
Either sanitize the string in one pass, or only those parts that may
be an issue. Or not at all, having previously verified the integrity
of the data with another method (checksum, secure hash, secure data
connection, etc.).
The other time-consuming part is insertion into the database. You
could accumulate the INSERT statements in memory or a file, then
submit them all at once.

--
James Moe
moe dot james at sohnen-moe dot com
520.743.3936
Christoph Becker
2014-03-18 20:55:12 UTC
Permalink
Post by Tiago Hori
I fairly new at this, so please bear with me. :)
I am building this web app for a project I am working at where I need to
store and process large amounts of data.
The data comes in comma delimited style. There are a bunch of headers that
I don't need and then the data. These files contain 96 times 96 entries
Do you mean 96 records (rows) with 96 columns each, or do you mean 9216
records? What's the size of the file?
Post by Tiago Hori
and
I need to parse each one of those. Right now I have something working that
takes about 5 minutes to parse the whole file.
I doubt that *parsing* the file takes about 5 minutes.
Post by Tiago Hori
Any tips on how to make this
run more efficiently would be greatly appreciated.
Measure first where your script spends its time, before you optimize in
the "wrong" place. A profiler such as Xdebug's or xhprof should come in
handy.

If you don't have a profiler available, I suggest you replace
queryMysql($query) with true, and check how long it takes to process the
file. If that is much faster than before (as I presume it will be), you'll
have to optimize the insertion of the values into the database.
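Even a crude measurement with microtime() would already tell you where the
time goes. A rough sketch (untested; the variable names are made up,
queryMysql(), sanitizeStrings() and $filename are as in your script, and the
stand-in line takes the place of your real parsing code):

$parseTime = $insertTime = 0.0;
$fh = fopen($filename, 'r');
while (($parts = fgetcsv($fh)) !== false) {
    $t = microtime(true);
    $query = "INSERT INTO genotyped (runid) VALUES ('" . sanitizeStrings($parts[0]) . "')"; // stand-in for the real parsing
    $parseTime += microtime(true) - $t;

    $t = microtime(true);
    queryMysql($query);   // replace this call with true to time the loop without the database
    $insertTime += microtime(true) - $t;
}
fclose($fh);
printf("parsing: %.2f s, inserting: %.2f s<br />", $parseTime, $insertTime);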
--
Christoph M. Becker
Tiago Hori
2014-03-18 21:29:56 UTC
Permalink
Thanks, all.

I am looking at all your suggestions. The file has 9216 entries with 10 columns, but I really only need 2 of them.

The crucial one is the one that I have to split with explode using the ":" separator.

The files are about 1 MB; that's why I reached out, since I figured it shouldn't take that long.

Thanks again!

T.
Jim Lucas
2014-03-18 22:23:05 UTC
Permalink
Post by Tiago Hori
Thanks All.
I am looking at all your suggestions. The file has 9216 entries with 10 columns, but I only need 2 of them really.
When you say "need", does that mean that you are only inserting those two rows
into the database, or on the other side of the if..else?
Post by Tiago Hori
The crucial one, is the one that I have to split with explode using the “: “ separator.
Crucial in what way? Is this the side that has the two rows, or the remaining
9214 rows?
Post by Tiago Hori
The files are about 1MB, that’s why I reached out, I figured it shouldn’t take that long.
Correct, it shouldn't. Sounds like something (more than likely the DB
inserts) is taking so long. Depending on how your table is built and
how much data you already have in there, re-indexing the table could be what
is slowing you down.
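If you want a quick way to test that theory, and the table happens to be
MyISAM, you can tell MySQL to put off rebuilding the non-unique indexes until
after the load (InnoDB ignores DISABLE KEYS, so this only proves something on
MyISAM); just a sketch:

queryMysql("ALTER TABLE genotyped DISABLE KEYS");
queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
    . join(', ', $dataSet));   // $dataSet as built in my earlier example
queryMysql("ALTER TABLE genotyped ENABLE KEYS");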
--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
Tiago Hori
2014-03-18 22:31:54 UTC
Permalink
Thanks All.
Post by Tiago Hori
I am looking at all your suggestions. The file has 9216 entries with 10 columns, but I only need 2 of them really.
When you say need, does that mean that you are only inserting those two rows into the database, or on the other side of the if..else ?
No, sorry; that was really unclear. I am inserting 6 values. One is a constant based on the filename (runid), and the other 5 come from columns in the file (well, 3 columns plus one column that I split and insert as two separate values); but I need all rows. These files will belong to projects, and each project will have more than one of these files, so I need the insert time to be as fast as possible.
Post by Tiago Hori
The crucial one, is the one that I have to split with explode using the “: “ separator.
crucial in what way? Is this the side that has the two rows or the remaining 9214 rows?
See above. Sorry for the confusion. This is crucial because it represents the actual data.
Jim Lucas
2014-03-18 22:55:43 UTC
Permalink
Post by Tiago Hori
Thanks All.
Post by Tiago Hori
I am looking at all your suggestions. The file has 9216 entries with 10 columns, but I only need 2 of them really.
When you say need, does that mean that you are only inserting those two rows into the database, or on the other side of the if..else ?
No, sorry; that was really unclear. I am inserting 6 values. One is a constant based on the filename (runid), and the other 5 come from columns in the file (well, 3 columns plus one column that I split and insert as two separate values); but I need all rows. These files will belong to projects, and each project will have more than one of these files, so I need the insert time to be as fast as possible.
Well, now it comes down to how many value sets you can insert in one
statement. The size of your SQL query could get too large and kill things.

You might want to limit it to 100 or 500 sets of values per call.
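Something along these lines would do it, replacing the single queryMysql()
call at the bottom of my earlier example (500 here is an arbitrary batch
size, tune it to taste):

foreach (array_chunk($dataSet, 500) as $batch) {
    queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
        . join(', ', $batch));
}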

Also, the slow part could be your browser rendering the HTML table. Try
leaving out the echo for the rows and see if the page loads any faster.
--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
Christoph Becker
2014-03-18 22:58:43 UTC
Permalink
Post by Tiago Hori
I am looking at all your suggestions. The file has 9216 entries with
10 columns, but I only need 2 of them really.
The crucial one, is the one that I have to split with explode using the “: “ separator.
The files are about 1MB, that’s why I reached out, I figured it
shouldn’t take that long.
I still presume the performance problem is due to the many separate
insert statements (reading such files with fgets() isn't the problem).
There is an article regarding the speed of insertions in MySQL
databases, with several optimization suggestions:

<https://dev.mysql.com/doc/refman/5.7/en/insert-speed.html>

HTH
--
Christoph M. Becker
Tiago Hori
2014-03-18 23:19:06 UTC
Permalink
Jim and Christoph,

Thanks!
Post by Christoph Becker
Post by Tiago Hori
I am looking at all your suggestions. The file has 9216 entries with
10 columns, but I only need 2 of them really.
The crucial one, is the one that I have to split with explode using
the “: “ separator.
The files are about 1MB, that’s why I reached out, I figured it
shouldn’t take that long.
I still presume the performance problem is due to the many separate
insert statements (reading such files with fgets() isn't the problem).
There is an article regarding the speed of insertions in MySQL
<https://dev.mysql.com/doc/refman/5.7/en/insert-speed.html>
HTH
So, I could use Jim's suggestion, but maybe not add the whole 9000 entries at a time, correct? Would it be a good solution to create separate arrays of 500 rows each, or just to create one big array like Jim suggested and then break the insertion into batches of 100 rows?

Thanks!

T,
Christoph Becker
2014-03-19 00:52:08 UTC
Permalink
Post by Tiago Hori
So, I could use Jim suggestion, but maybe not add the whole 9000
entries of time, correct? Would it be a good solution to create
separate arrays with every 500 rows or just create one big array like
Jim suggested and then break the insertion into iterations of 100
rows?
I suggest splitting the whole process into two steps. The first step reads
the file and builds an array of records. The second step commits these
records to the database.

If you do it this way, you can easily change the actual insertion
according to what is fastest (which might change with new versions of
MySQL and PHP); even using LOAD DATA INFILE instead of INSERT could easily
be tested, by storing the array of records in a temporary file
between the two steps.
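Roughly like this (only a sketch; parseRun() and storeRun() are names I just
made up, sanitizeStrings() and queryMysql() are your existing helpers, and
storeRun() could later be replaced by a version that writes a temporary file
and runs LOAD DATA INFILE):

// Step 1: read the file and build plain PHP records; no database involved.
function parseRun($filename, $runid)
{
    $records = array();
    $fh = fopen($filename, 'r');
    while (($parts = fgetcsv($fh)) !== false) {
        if (!preg_match('/S\d\d-\D\d\d/', $parts[0])) {
            continue;                              // skip headers and other junk lines
        }
        if (preg_match('/[ATCG]:[ATCG]/', $parts[9])) {
            list($allele1, $allele2) = explode(':', $parts[9], 2);
        } else {
            $allele1 = $allele2 = 'No Call';
        }
        if ($parts[4] != 'Blank') {
            $records[] = array($runid, $parts[4], $parts[0], $parts[1], $allele1, $allele2);
        }
    }
    fclose($fh);
    return $records;
}

// Step 2: commit the records; this part can change without touching the parser.
function storeRun(array $records)
{
    $values = array();
    foreach ($records as $r) {
        $values[] = "('" . join("', '", array_map('sanitizeStrings', $r)) . "')";
    }
    queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
        . join(', ', $values));
}

storeRun(parseRun($filename, $runid));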
--
Christoph M. Becker
Curtis Maurand
2014-03-19 00:52:52 UTC
Permalink
Post by Tiago Hori
So, I could use Jim suggestion, but maybe not add the whole 9000 entries of time, correct? Would it be a good solution to create separate arrays with every 500 rows or just create one big array like Jim suggested and then break the insertion into iterations of 100 rows?
Thanks!
T,
Write out a temporary flat file formatted the way you need it to look in the
database, delimited however you need it. Then use "LOAD DATA INFILE" to
populate the table. LOAD DATA INFILE is very fast and very efficient; it'll
load a 1 MB file in a fraction of a second.

Delete the flat file when you're done.
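Something like this, assuming you already have the parsed rows in an array of
arrays ($records here is made up) and that your queryMysql() wrapper will run
the statement. Depending on where the temporary file lives relative to the
MySQL server you may need the LOCAL variant, and local_infile has to be
enabled for that:

// Write the parsed rows out as CSV, bulk-load them, then throw the file away.
$tmp = tempnam(sys_get_temp_dir(), 'geno');
$out = fopen($tmp, 'w');
foreach ($records as $row) {     // $row = array(runid, fishid, plateid, assayid, allele1, allele2)
    fputcsv($out, $row);
}
fclose($out);

queryMysql("LOAD DATA LOCAL INFILE '" . addslashes($tmp) . "'
    INTO TABLE genotyped
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    (runid, fishid, plateid, assayid, allele1, allele2)");

unlink($tmp);   // clean up the temporary file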

--Curtis
Tiago Hori
2014-03-19 01:07:38 UTC
Permalink
Thanks everyone!

I will try the temp file with LOAD DATA INFILE!

T.

Sent from my iPhone
Tiago Hori
2014-03-19 13:39:45 UTC
Permalink
Hi All,

Just wanted to say that using LOAD DATA INFILE worked perfectly; the upload is
now practically instantaneous! Thank you. I had to fight AppArmor for most of
the morning, but I finally found a way to let MySQL read from /var/www.

That made me wonder, and bear in mind that I am very new at this: on a
production server, is it the default for MySQL to have access to the website
base directory, or do I have to use something like LOAD DATA LOCAL INFILE?

Thanks!

T.