Daevid Vincent
2013-09-03 20:47:05 UTC
EUREKA!
-----Original Message-----
Sent: Tuesday, September 03, 2013 6:31 AM
To: Daevid Vincent
Subject: Re: [PHP] refernces, arrays, and why does it take up so much
memory?
I think I'm confused about how a reference works.
I have a DB result set in an array I'm looping over. All I simply want to do
is make the array key the "id" of the result set row.
private function _normalize_result_set()
{
    foreach($this->tmp_results as $k => $v)
    {
        $id = $v['id'];
        $new_tmp_results[$id] =& $v; //2013-08-29 [dv] using reference here cuts the memory usage in half!
You are assigning a reference to $v. In the next iteration of the loop, $v
will be pointing at the next item in the array, as will the reference you're
storing here. With this code I'd expect $new_tmp_results to be an array
where the keys (i.e. the IDs) are correct, but the data in each item matches
the data in the last item from the original array, which appears to be what
you describe.
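A minimal sketch of the effect described above, using a hypothetical three-row array in place of the real result set (the row contents are made up purely for illustration):

```php
<?php
// Hypothetical sample rows standing in for $this->tmp_results.
$rows = array(
    array('id' => 10, 'title' => 'first'),
    array('id' => 20, 'title' => 'second'),
    array('id' => 30, 'title' => 'third'),
);

$by_ref = array();
foreach ($rows as $k => $v) {
    // Every entry becomes a reference to the same loop variable $v.
    $by_ref[$v['id']] =& $v;
}
unset($v); // drop the name $v so the next loop starts with a fresh variable

$by_val = array();
foreach ($rows as $k => $v) {
    // Plain assignment: each entry keeps its own row.
    $by_val[$v['id']] = $v;
}

echo $by_ref[10]['title'], "\n"; // prints "third"
echo $by_val[10]['title'], "\n"; // prints "first"
```

The keys come out right in both arrays; only the by-reference version ends up with every entry showing the last row.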
        unset($this->tmp_results[$k]);
Doing this for every loop is likely very inefficient. I don't know how the
inner workings of PHP process something like this, but I wouldn't be
surprised if it's allocating a new chunk of memory for a version of the
array without this element. You may find it better to not unset anything
until the loop has finished, at which point you can just
unset($this->tmp_results).
        /*
        if ($i++ % 1000 == 0)
        {
            gc_enable(); // Enable Garbage Collector
            var_dump(gc_enabled()); // true
            var_dump(gc_collect_cycles()); // # of elements cleaned up
            gc_disable(); // Disable Garbage Collector
        }
        */
    }
    $this->tmp_results = $new_tmp_results;
    //var_dump($this->tmp_results); exit;
    unset($new_tmp_results);
}
private function _normalize_result_set()
{
    // Initialise the temporary variable.
    $new_tmp_results = array();
    // Loop around just the keys in the array.
    foreach (array_keys($this->tmp_results) as $k)
    {
        // Store the item in the temporary array with the ID as the key.
        // Note no pointless variable for the ID, and no use of &!
        $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
    }
    // Assign the temporary variable to the original variable.
    $this->tmp_results = $new_tmp_results;
}
I'd appreciate it if you could plug this in and see what your memory usage
reports say. In most cases, trying to control the garbage collection through
the use of references is the worst way to go about optimising your code. In
my code above I'm relying on PHP's copy-on-write feature where data is only
duplicated when assigned if it changes. No unsets, just using scope to mark
a variable as able to be cleaned up.
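A rough way to see copy-on-write at work is to measure memory around an assignment and around the first write to the copy. This is only a sketch: the array size is arbitrary and the exact byte counts depend on the PHP version, so no figures are claimed here.

```php
<?php
// Build a reasonably large array so the effect is measurable.
$original = range(1, 200000);

$before = memory_get_usage();
$copy = $original;              // no duplication yet: both names share one array
$after_assign = memory_get_usage();

$copy[] = 0;                    // the first write forces PHP to duplicate the array
$after_write = memory_get_usage();

$assign_cost = $after_assign - $before;
$write_cost  = $after_write - $after_assign;

echo "assignment:  ", number_format($assign_cost), " bytes\n";
echo "first write: ", number_format($write_cost), " bytes\n";
```

The assignment should cost next to nothing, while the first write pays for a full copy of the array.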
Where is this result set coming from? You'd save yourself a lot of
memory/time by putting the data into this format when you read it from the
source. For example, if reading it from MySQL,
$this->tmp_results[$row['id']] = $row when looping around the result set.
Also, is there any reason why you need to process this full set of data in
one go? Can you not break it up into smaller pieces that won't put as much
strain on resources?
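The idea of keying rows as they are read can be sketched without a database. Here fetch_rows() is a hypothetical stand-in for a result-set cursor (it is not part of any class in this thread), written as a generator, which assumes PHP 5.5 or later:

```php
<?php
// Hypothetical stand-in for a database cursor: yields one row at a time.
function fetch_rows()
{
    yield array('id' => 7, 'title' => 'alpha');
    yield array('id' => 3, 'title' => 'beta');
    yield array('id' => 9, 'title' => 'gamma');
}

// Key each row by its id as it arrives, so no second pass is needed.
$results = array();
foreach (fetch_rows() as $row) {
    $results[$row['id']] = $row;
}

print_r(array_keys($results));
```

Because only one row is in flight at a time, there is never a second, differently keyed copy of the whole set to build and throw away.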
-Stuart
There were reasons I had the $id; I only showed the relevant parts of the
code for the sake of not overly complicating what I was trying to illustrate.
There is other processing that had to be done in the loop too, and that is
also what I illustrated.
Here is your version effectively:
private function _normalize_result_set() //Stuart
{
    if (!$this->tmp_results || count($this->tmp_results) < 1)
        return;
    $new_tmp_results = array();
    // Loop around just the keys in the array.
    $D_start_mem_usage = memory_get_usage();
    foreach (array_keys($this->tmp_results) as $k)
    {
        /*
        if ($this->tmp_results[$k]['genres'])
        {
            // rip through each scene's `genres` and store them as an array since we'll need'em later too
            $g = explode('|', $this->tmp_results[$k]['genres']);
            array_pop($g); // there is an extra '' element due to the final | character. :-\
            $this->tmp_results[$k]['g'] = $g;
        }
        */
        // Store the item in the temporary array with the ID as the key.
        // Note no pointless variable for the ID, and no use of &!
        $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
    }
    // Assign the temporary variable to the original variable.
    $this->tmp_results = $new_tmp_results;
    echo "\nMEMORY USED FOR STUART's version: ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
    var_dump($this->tmp_results);
    exit();
}
With the genres block commented out (as above):
MEMORY USED FOR STUART's version: -128 PEAK: (90,439,680)
With the processing in the genres block:
MEMORY USED FOR STUART's version: 97,264,368 PEAK: (187,695,104)
So a slight improvement: peak usage is 28,573,696 bytes lower than the original:
MEMORY USED FOR _normalize_result_set(): 97,264,912 PEAK: (216,268,800)
No matter what I tried, however, it frustratingly seems that the simple act
of adding a new hash to the array causes a significant memory jump. That
really blows! Therefore my solution was to not store the $g as ['g'], which
would seem to be the more efficient way (do it once and re-use the array
over and over). Instead I am forced to rip through and explode() inline in
three different places of my code.
We get over 30,000 hits per second, and even with lots of caching, 216MB vs
70-96MB is significant, and the speed hit is only about 1.5 seconds more per
page.
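Since the split now happens inline in several places, a small helper can keep those call sites consistent. This is only a sketch: split_genres() is a hypothetical name, and the sample genres string is made up to match the pipe-terminated format noted in the code comment above.

```php
<?php
// Hypothetical genres value in the pipe-terminated format.
$genres = 'Action|Comedy|Drama|';

// Approach from the thread: explode, then drop the trailing empty element.
$g = explode('|', $genres);
array_pop($g);

// Alternative sketch: trim the trailing pipe first so no empty element appears.
// split_genres() is a hypothetical helper name.
function split_genres($genres)
{
    $trimmed = rtrim($genres, '|');
    return $trimmed === '' ? array() : explode('|', $trimmed);
}

var_dump($g === split_genres($genres)); // bool(true)
```

The helper also returns an empty array for an empty string, where explode() followed by array_pop() would do the same only by accident of the single empty element.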
Here are three distinctly different example pages that exercise different
parts of the code path:
PAGE RENDERED IN 7.0466279983521 SECONDS
MEMORY USED @START: 262,144 - @END: 26,738,688 = 26,476,544 BYTES
MEMORY PEAK USAGE: 69,730,304 BYTES
PAGE RENDERED IN 6.9327299594879 SECONDS
MEMORY USED @START: 262,144 - @END: 53,739,520 = 53,477,376 BYTES
MEMORY PEAK USAGE: 79,167,488 BYTES
PAGE RENDERED IN 7.558168888092 SECONDS
MEMORY USED @START: 262,144 - @END: 50,855,936 = 50,593,792 BYTES
MEMORY PEAK USAGE: 96,206,848 BYTES
Furthermore I investigated what Jim Giner suggested and it turns out there
was a way for me to wedge into our Connection class a way to mangle the
results at that point, which is actually a more elegant solution overall as
we can re-use that in many more places going forward.
/**
 * Execute a database SQL query and return all the results in an associative array
 *
 * @access public
 * @return array or false
 * @param string $sql the SQL code to execute
 * @param boolean $print (false) Print a color coded version of the query.
 * @param boolean $get_first (false) return the first element only. useful for when 1 row is returned such as "LIMIT 1"
 * @param string $key (null) if a column name, such as 'id' is used here, then that column will be used as the array key
 * @author Daevid Vincent [***@sctr.net]
 * @date 2013-09-03
 * @see get_instance(), execute(), fetch_query_pair()
 */
public function fetch_query($sql = "", $print = false, $get_first = false, $key = null)
{
    //$D_start_mem_usage = memory_get_usage();
    if (!$this->execute($sql, $print)) return false;
    $tmp = array();
    if (is_null($key))
        while ($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[] = $arr;
    else
        while ($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[$arr[$key]] = $arr;
    $this->free_result(); // freeing result from memory
    //echo "\nMEMORY USED FOR fetch_query(): ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
    return (($get_first) ? array_shift($tmp) : $tmp);
}
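For code that already holds a plain list of rows and cannot go through fetch_query() with a $key, the built-in array_column() can do the same re-keying in one call. A small sketch, assuming PHP 5.5 or later and using made-up sample rows:

```php
<?php
// Hypothetical rows, as a query helper might return them without a $key.
$rows = array(
    array('id' => 5, 'title' => 'one'),
    array('id' => 8, 'title' => 'two'),
);

// Passing null as the column keeps whole rows; 'id' becomes the array key.
$keyed = array_column($rows, null, 'id');

print_r(array_keys($keyed)); // keys are 5 and 8
```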
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php