Discussion:
Efficiently parsing a File
Tiago Hori
2014-03-18 17:57:15 UTC
Permalink
Hi Everyone,

I am fairly new at this, so please bear with me. :)

I am building a web app for a project I am working on where I need to
store and process large amounts of data.

The data comes in comma-delimited form: a bunch of headers that I don't
need, followed by the data. Each file contains 96 x 96 entries, and I need
to parse every one of them. Right now I have something working that takes
about 5 minutes to process the whole file. Any tips on how to make this
run more efficiently would be greatly appreciated.

Thanks,

Tiago

Here is the relevant code:


if ($_FILES['data'])
{
    $line = $parts = $name = "";
    $c = 0;
    $filename = $_FILES['data']['name'];
    if (file_exists($filename))
    {
        die("A file with this name already exists in the database <br />");
    }
    else
    {
        move_uploaded_file($_FILES['data']['tmp_name'], $filename);
        echo "Uploaded file '$filename' <br />";
        $fh = fopen($filename, 'r') or
            die("File does not exist or you lack permission to open it");
    }
    $runid = substr($filename, 0, -4);
    while (!feof($fh))
    {
        $line = fgets($fh);
        $parts = explode(',', $line);
        if ($parts[0] == 'ID')
        {
            $id      = sanitizeStrings($parts[0]);
            $assay   = sanitizeStrings($parts[1]);
            $allele1 = sanitizeStrings($parts[2]);
            $allele2 = sanitizeStrings($parts[3]);
            $name    = sanitizeStrings($parts[4]);
            //echo "$name\t$id\t$assay\t$allele1\t$allele2<br />";
            echo <<<_END
<table>
<tr>
<th>$name</th><th>$id</th><th>$assay</th><th>$allele1</th><th>$allele2</th>
</tr>
_END;
        }
        else if (preg_match("/S\d\d-\D\d\d/", $parts[0]))
        {
            $id      = sanitizeStrings($parts[0]);
            $assay   = sanitizeStrings($parts[1]);
            $alleles = sanitizeStrings($parts[9]);
            if (preg_match("/[ATCG]:[ATCG]/", $alleles))
            {
                $genotype = explode(':', $alleles);
                $allele1  = $genotype[0];
                $allele2  = $genotype[1];
            }
            else
            {
                $allele1 = $allele2 = 'No Call';
            }
            $name = sanitizeStrings($parts[4]);
            if ($name != 'Blank')
            {
                $query = "INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2)
                    VALUES ('$runid', '$name', '$id', '$assay', '$allele1', '$allele2')";
                if (queryMysql($query)) $c += 1;
                if ($c < 10)
                {
                    //echo "$name\t$id\t$assay\t$allele1\t$allele2<br />";
                    echo <<<_END
<tr>
<th>$name</th><th>$id</th><th>$assay</th><th>$allele1</th><th>$allele2</th>
</tr>
_END;
                }
            }
        }
    }
    fclose($fh);
    echo "</table>";
}
--
"Education is not to be used to promote obscurantism." - Theodonius
Dobzhansky.

"Gracias a la vida que me ha dado tanto
Me ha dado el sonido y el abecedario
Con él, las palabras que pienso y declaro
Madre, amigo, hermano
Y luz alumbrando la ruta del alma del que estoy amando

Gracias a la vida que me ha dado tanto
Me ha dado la marcha de mis pies cansados
Con ellos anduve ciudades y charcos
Playas y desiertos, montañas y llanos
Y la casa tuya, tu calle y tu patio"

Violeta Parra - Gracias a la Vida

Tiago S. F. Hori. PhD.
Ocean Science Center-Memorial University of Newfoundland
Jim Lucas
2014-03-18 18:38:39 UTC
Permalink
I cleaned up the code a little: moved a few things around, removed some
duplication and a part that didn't seem useful. Give this code a try:

if ( $_FILES['data'] ) {
    $dataSet = array();
    $parts = array();
    $c = 0;
    $filename = $_FILES['data']['name'];
    if ( file_exists($filename) ) {
        die("A file with this name already exists in the database <br />");
    } else {
        move_uploaded_file($_FILES['data']['tmp_name'], $filename);
    }
    $runid = substr($filename, 0, -4);
    echo "Uploaded file '$runid' <br />";
    $fh = fopen($filename, 'r') or
        die("File does not exist or you lack permission to open it");
    echo '<table>';
    while ( !feof($fh) ) {
        $parts = fgetcsv($fh);
        if ( $parts[0] == 'ID' ) {
            $id      = sanitizeStrings($parts[0]);
            $assay   = sanitizeStrings($parts[1]);
            $allele1 = sanitizeStrings($parts[2]);
            $allele2 = sanitizeStrings($parts[3]);
            $name    = sanitizeStrings($parts[4]);
        } else if ( preg_match("/S\d\d-\D\d\d/", $parts[0]) ) {
            $id      = sanitizeStrings($parts[0]);
            $assay   = sanitizeStrings($parts[1]);
            $alleles = sanitizeStrings($parts[9]);
            $allele1 = $allele2 = 'No Call';
            if ( preg_match("/[ATCG]:[ATCG]/", $alleles) ) {
                list($allele1, $allele2) = explode(':', $alleles, 2);
            }
            $name = sanitizeStrings($parts[4]);
            if ($name != 'Blank') {
                $dataSet[] = "('{$runid}', '{$name}', '{$id}', '{$assay}', '{$allele1}', '{$allele2}')";
            }
        }
        echo <<<_END
<tr>
<th>{$name}</th>
<th>{$id}</th>
<th>{$assay}</th>
<th>{$allele1}</th>
<th>{$allele2}</th>
</tr>
_END;

    }
    if ( $dataSet ) {
        queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
            . join(', ', $dataSet));
    }
    fclose($fh);
    echo '</table>';
}
--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Tiago Hori
2014-03-18 21:33:43 UTC
Permalink
Hi Jim,

One quick question:

When you use this call to the MySQL database: "INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES ". join(', ', $dataSet));

There are more than 6 items in the $dataSet array, since the while loop parsed the whole file, correct? I just want to make sure I understand correctly. Does that INSERT command know to start another row in the table after it has inserted the first 6 values?

Thanks!

Tiago
Jim Lucas
2014-03-18 21:40:17 UTC
Permalink
Post by Tiago Hori
Hi Jim,
When you use this call to the mysql database ("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES ". join(', ', $dataSet));
There are more than 6 items in the $dataSet array as the while loop parsed the whole file, correct? I just want to make sure I understand correctly. Does that INSERT command knows to start another row in the table after it inserted the first 6 values?
Yes.

https://dev.mysql.com/doc/refman/5.5/en/insert.html

[quote]
INSERT statements that use VALUES syntax can insert multiple rows. To do this,
include multiple lists of column values, each enclosed within parentheses and
separated by commas. Example:

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
[/quote]
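To make that concrete, here is a tiny throwaway example (the values are made
up) of what the joined string ends up looking like; in my earlier code
$dataSet gets filled row by row inside the loop:

// Each element of $dataSet is one parenthesised value list, so joining them
// produces a single multi-row INSERT statement.
$dataSet = array(
    "('run1', 'fish1', 'S01-A01', 'assay1', 'A', 'T')",
    "('run1', 'fish2', 'S01-A02', 'assay1', 'C', 'G')",
);
$sql = "INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
     . join(', ', $dataSet);
echo $sql;
// INSERT INTO genotyped (...) VALUES ('run1', 'fish1', ...), ('run1', 'fish2', ...)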
--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
Curtis Maurand
2014-03-19 00:45:36 UTC
Permalink
I'm assuming that you have a fixed number of items in each row of the
flat file?
You must have an idea of how long each line in the file is. Pick a buffer
long enough that your lines don't get truncated; fgets() will still stop
at the end of each line.

$handle = @fopen("your.delimited.file", "r");
$counter = 0;
$array = array();
if ($handle)
{
    while (($buffer = fgets($handle, 4096)) !== false)
    {
        $counter++;
        if ($counter > 1) // skip the first line of header info
        {
            $array = explode(",", $buffer);
            // now do what you will with the array
        }
    }
    fclose($handle);
}

You might also look at MySQL's "LOAD DATA INFILE":

LOAD DATA INFILE 'filename' INTO TABLE tbl_name FIELDS TERMINATED BY ','
IGNORE 1 LINES;

Very fast, very efficient. It's way faster than doing individual inserts.
The MySQL user needs to be able to read the file; your script executes the
query and MySQL does the rest.

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

--Curtis
James Moe
2014-03-18 19:01:05 UTC
Permalink
Post by Tiago Hori
Each file is about 1 MB. You are correct. Up to that point there is
a bunch of useless info [...].
Then 5 minutes is a ridiculously long time for processing the file.
Post by Tiago Hori
sanitizeStrings just uses htmlentities, strip_tags, stripslashes and
mysql_real_escape_string.
Is the data so suspect that it must be processed so thoroughly? I
expect it is in sanitizeStrings() where most of the time is spent.
Either sanitize the string in one pass, or only those parts that may
be an issue. Or not at all, having previously verified the integrity
of the data with another method (checksum, secure hash, secure data
connection, etc.).
The other time-consuming part is insertion into the database. You
could accumulate the INSERT statements in memory or a file, then
submit them all at once.

--
James Moe
moe dot james at sohnen-moe dot com
520.743.3936
Christoph Becker
2014-03-18 20:55:12 UTC
Permalink
Post by Tiago Hori
I fairly new at this, so please bear with me. :)
I am building this web app for a project I am working at where I need to
store and process large amounts of data.
The data comes in comma delimited style. There are a bunch of headers that
I don't need and then the data. These files contain 96 times 96 entries
Do you mean 96 records (rows) with 96 columns each, or do you mean 9216
records? What's the size of the file?
Post by Tiago Hori
and
I need to parse each one of those. Right now I have something working that
takes about 5 minutes to parse the whole file.
I doubt that *parsing* the file takes about 5 minutes.
Post by Tiago Hori
Any tips on how to make this
run more efficiently would be greatly appreciated.
Measure first where your script spends its time, before you optimize in
the "wrong" place. A profiler such as Xdebug's or xhprof should come in
handy.

If you don't have a profiler available, I suggest you replace
queryMysql($query) with true, and check how long it takes to process the
file. If that is much faster than before (as I presume it will be), you'll
have to optimize the insertion of the values into the database.
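Even a crude measurement with microtime() would already tell you where the
time goes. A rough sketch (untested; the variable names are made up,
queryMysql(), sanitizeStrings() and $filename are as in your script, and the
stand-in line takes the place of your real parsing code):

$parseTime = $insertTime = 0.0;
$fh = fopen($filename, 'r');
while (($parts = fgetcsv($fh)) !== false) {
    $t = microtime(true);
    $query = "INSERT INTO genotyped (runid) VALUES ('" . sanitizeStrings($parts[0]) . "')"; // stand-in for the real parsing
    $parseTime += microtime(true) - $t;

    $t = microtime(true);
    queryMysql($query);   // replace this call with true to time the loop without the database
    $insertTime += microtime(true) - $t;
}
fclose($fh);
printf("parsing: %.2f s, inserting: %.2f s<br />", $parseTime, $insertTime);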
--
Christoph M. Becker
Tiago Hori
2014-03-18 21:29:56 UTC
Permalink
Thanks, all.

I am looking at all your suggestions. The file has 9216 entries with 10 columns, but I really only need 2 of them.

The crucial one is the one that I have to split with explode using the ":" separator.

The files are about 1 MB; that's why I reached out, since I figured it shouldn't take that long.

Thanks again!

T.
Jim Lucas
2014-03-18 22:23:05 UTC
Permalink
Post by Tiago Hori
Thanks All.
I am looking at all your suggestions. The file has 9216 entries with 10 columns, but I only need 2 of them really.
When you say "need", does that mean that you are only inserting those two rows
into the database, or on the other side of the if..else?
Post by Tiago Hori
The crucial one, is the one that I have to split with explode using the “: “ separator.
Crucial in what way? Is this the side that has the two rows, or the remaining
9214 rows?
Post by Tiago Hori
The files are about 1MB, that’s why I reached out, I figured it shouldn’t take that long.
Correct, it shouldn't. Sounds like something (more than likely the DB
inserts) is taking so long. Depending on how your table is built and
how much data you already have in there, re-indexing the table could be what
is slowing you down.
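If you want a quick way to test that theory, and the table happens to be
MyISAM, you can tell MySQL to put off rebuilding the non-unique indexes until
after the load (InnoDB ignores DISABLE KEYS, so this only proves something on
MyISAM); just a sketch:

queryMysql("ALTER TABLE genotyped DISABLE KEYS");
queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
    . join(', ', $dataSet));   // $dataSet as built in my earlier example
queryMysql("ALTER TABLE genotyped ENABLE KEYS");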
--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
Tiago Hori
2014-03-18 22:31:54 UTC
Permalink
Thanks All.
Post by Tiago Hori
I am looking at all your suggestions. The file has 9216 entries with 10 columns, but I only need 2 of them really.
When you say need, does that mean that you are only inserting those two rows into the database, or on the other side of the if..else ?
No, sorry; that was really unclear. I am inserting 6 values. One is a constant based on the filename (runid), and the other 5 come from columns in the file (well, 3 columns plus one column that I split and insert as two separate values); but I need all rows. These files will belong to projects, and each project will have more than one of these files, so I need the insert time to be as fast as possible.
Post by Tiago Hori
The crucial one, is the one that I have to split with explode using the “: “ separator.
crucial in what way? Is this the side that has the two rows or the remaining 9214 rows?
See above. Sorry for the confusion. This is crucial because it represents the actual data.
Jim Lucas
2014-03-18 22:55:43 UTC
Permalink
Post by Tiago Hori
Thanks All.
Post by Tiago Hori
I am looking at all your suggestions. The file has 9216 entries with 10 columns, but I only need 2 of them really.
When you say need, does that mean that you are only inserting those two rows into the database, or on the other side of the if..else ?
No, sorry; that was really unclear. I am inserting 6 values. One is a constant based on the filename (runid), and the other 5 come from columns in the file (well, 3 columns plus one column that I split and insert as two separate values); but I need all rows. These files will belong to projects, and each project will have more than one of these files, so I need the insert time to be as fast as possible.
Well, now it comes down to how many value sets you can insert in one
statement. The size of your SQL query could get too large and kill things.

You might want to limit it to 100 or 500 sets of values per call.
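Something along these lines would do it, replacing the single queryMysql()
call at the bottom of my earlier example (500 here is an arbitrary batch
size, tune it to taste):

foreach (array_chunk($dataSet, 500) as $batch) {
    queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
        . join(', ', $batch));
}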

Also, the slow part could be your browser rendering the HTML table. Try
leaving out the echo for the rows and see if the page loads any faster.
--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
Christoph Becker
2014-03-18 22:58:43 UTC
Permalink
Post by Tiago Hori
I am looking at all your suggestions. The file has 9216 entries with
10 columns, but I only need 2 of them really.
The crucial one, is the one that I have to split with explode using the “: “ separator.
The files are about 1MB, that’s why I reached out, I figured it
shouldn’t take that long.
I still presume the performance problem is due to the many separate
insert statements (reading such files with fgets() isn't the problem).
There is an article regarding the speed of insertions in MySQL
databases, with several optimization suggestions:

<https://dev.mysql.com/doc/refman/5.7/en/insert-speed.html>

HTH
--
Christoph M. Becker
Tiago Hori
2014-03-18 23:19:06 UTC
Permalink
Jim and Christoph,

Thanks!
Post by Christoph Becker
Post by Tiago Hori
I am looking at all your suggestions. The file has 9216 entries with
10 columns, but I only need 2 of them really.
The crucial one, is the one that I have to split with explode using
the “: “ separator.
The files are about 1MB, that’s why I reached out, I figured it
shouldn’t take that long.
I still presume the performance problem is due to the many separate
insert statements (reading such files with fgets() isn't the problem).
There is an article regarding the speed of insertions in MySQL
<https://dev.mysql.com/doc/refman/5.7/en/insert-speed.html>
HTH
So, I could use Jim's suggestion, but maybe not add the whole 9000 entries at a time, correct? Would it be a good solution to create separate arrays of 500 rows each, or just to create one big array like Jim suggested and then break the insertion into batches of 100 rows?

Thanks!

T,
Christoph Becker
2014-03-19 00:52:08 UTC
Permalink
Post by Tiago Hori
So, I could use Jim suggestion, but maybe not add the whole 9000
entries of time, correct? Would it be a good solution to create
separate arrays with every 500 rows or just create one big array like
Jim suggested and then break the insertion into iterations of 100
rows?
I suggest splitting the whole process into two steps. The first step reads
the file and builds an array of records. The second step commits these
records to the database.

If you do it this way, you can easily change the actual insertion
according to what is fastest (which might change with new versions of
MySQL and PHP); even using LOAD DATA INFILE instead of INSERT could easily
be tested, by storing the array of records in a temporary file
between the two steps.
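Roughly like this (only a sketch; parseRun() and storeRun() are names I just
made up, sanitizeStrings() and queryMysql() are your existing helpers, and
storeRun() could later be replaced by a version that writes a temporary file
and runs LOAD DATA INFILE):

// Step 1: read the file and build plain PHP records; no database involved.
function parseRun($filename, $runid)
{
    $records = array();
    $fh = fopen($filename, 'r');
    while (($parts = fgetcsv($fh)) !== false) {
        if (!preg_match('/S\d\d-\D\d\d/', $parts[0])) {
            continue;                              // skip headers and other junk lines
        }
        if (preg_match('/[ATCG]:[ATCG]/', $parts[9])) {
            list($allele1, $allele2) = explode(':', $parts[9], 2);
        } else {
            $allele1 = $allele2 = 'No Call';
        }
        if ($parts[4] != 'Blank') {
            $records[] = array($runid, $parts[4], $parts[0], $parts[1], $allele1, $allele2);
        }
    }
    fclose($fh);
    return $records;
}

// Step 2: commit the records; this part can change without touching the parser.
function storeRun(array $records)
{
    $values = array();
    foreach ($records as $r) {
        $values[] = "('" . join("', '", array_map('sanitizeStrings', $r)) . "')";
    }
    queryMysql("INSERT INTO genotyped (runid, fishid, plateid, assayid, allele1, allele2) VALUES "
        . join(', ', $values));
}

storeRun(parseRun($filename, $runid));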
--
Christoph M. Becker
Curtis Maurand
2014-03-19 00:52:52 UTC
Permalink
Post by Tiago Hori
So, I could use Jim suggestion, but maybe not add the whole 9000 entries of time, correct? Would it be a good solution to create separate arrays with every 500 rows or just create one big array like Jim suggested and then break the insertion into iterations of 100 rows?
Thanks!
T,
Write out a temporary flat file formatted the way you need it to look in the
database, delimited however you need it. Then use "LOAD DATA INFILE" to
populate the table. LOAD DATA INFILE is very fast and very efficient; it'll
load a 1 MB file in a fraction of a second.

Delete the flat file when you're done.
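Something like this, assuming you already have the parsed rows in an array of
arrays ($records here is made up) and that your queryMysql() wrapper will run
the statement. Depending on where the temporary file lives relative to the
MySQL server you may need the LOCAL variant, and local_infile has to be
enabled for that:

// Write the parsed rows out as CSV, bulk-load them, then throw the file away.
$tmp = tempnam(sys_get_temp_dir(), 'geno');
$out = fopen($tmp, 'w');
foreach ($records as $row) {     // $row = array(runid, fishid, plateid, assayid, allele1, allele2)
    fputcsv($out, $row);
}
fclose($out);

queryMysql("LOAD DATA LOCAL INFILE '" . addslashes($tmp) . "'
    INTO TABLE genotyped
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    (runid, fishid, plateid, assayid, allele1, allele2)");

unlink($tmp);   // clean up the temporary file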

--Curtis
Tiago Hori
2014-03-19 01:07:38 UTC
Permalink
Thanks everyone!

I will try the temp file with LOAD DATA INFILE!

T.

Sent from my iPhone
Tiago Hori
2014-03-19 13:39:45 UTC
Permalink
Hi All,

Just wanted to say that using LOAD DATA INFILE worked perfectly; the upload is
now practically instantaneous! Thank you. I had to fight AppArmor for most of
the morning, but I finally found a way to let MySQL read from /var/www.

That made me wonder, and bear in mind that I am very new at this: on a
production server, is it the default for MySQL to have access to the website
base directory, or do I have to use something like LOAD DATA LOCAL INFILE?

Thanks!

T.