References, arrays, and why does it take up so much memory? [SOLVED]
Daevid Vincent
2013-09-03 20:47:05 UTC
EUREKA!
-----Original Message-----
Sent: Tuesday, September 03, 2013 6:31 AM
To: Daevid Vincent
Subject: Re: [PHP] refernces, arrays, and why does it take up so much
memory?
I'm confused on how a reference works I think.
I have a DB result set in an array I'm looping over. All I simply want to do
is make the array key the "id" of the result set row.
private function _normalize_result_set()
{
foreach($this->tmp_results as $k => $v)
{
$id = $v['id'];
$new_tmp_results[$id] =& $v; //2013-08-29 [dv] using a reference here cuts the memory usage in half!
You are assigning a reference to $v. In the next iteration of the loop, $v
will be pointing at the next item in the array, as will the reference you're
storing here. With this code I'd expect $new_tmp_results to be an array
where the keys (i.e. the IDs) are correct, but the data in each item matches
the data in the last item from the original array, which appears to be what
you describe.
unset($this->tmp_results[$k]);
Doing this for every loop is likely very inefficient. I don't know how the
inner workings of PHP process something like this, but I wouldn't be
surprised if it's allocating a new chunk of memory for a version of the
array without this element. You may find it better to not unset anything
until the loop has finished, at which point you can just
unset($this->tmp_results).
/*
if ($i++ % 1000 == 0)
{
gc_enable(); // Enable Garbage Collector
var_dump(gc_enabled()); // true
var_dump(gc_collect_cycles()); // # of elements cleaned up
gc_disable(); // Disable Garbage Collector
}
*/
}
$this->tmp_results = $new_tmp_results;
//var_dump($this->tmp_results); exit;
unset($new_tmp_results);
}
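The effect described above can be reproduced in isolation. This is a minimal sketch, assuming a hypothetical three-row result set ($rows and its contents are made up for illustration and stand in for $this->tmp_results):

```php
<?php
// Hypothetical stand-in for $this->tmp_results.
$rows = array(
    array('id' => 10, 'name' => 'first'),
    array('id' => 20, 'name' => 'second'),
    array('id' => 30, 'name' => 'third'),
);

$byRef = array();
foreach ($rows as $k => $v) {
    // Every element becomes a reference to the same loop variable $v.
    $byRef[$v['id']] =& $v;
}
unset($v); // drop the name $v so later code cannot overwrite the entries

$byValue = array();
foreach ($rows as $row) {
    // Plain assignment: copy-on-write means no row data is duplicated here.
    $byValue[$row['id']] = $row;
}

echo $byRef[10]['name'], "\n";   // third
echo $byValue[10]['name'], "\n"; // first
```

The keys come out right in both arrays, but every entry in $byRef shares one value, the last row, which is the symptom reported with the =& version.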
private function _normalize_result_set()
{
// Initialise the temporary variable.
$new_tmp_results = array();
// Loop around just the keys in the array.
foreach (array_keys($this->tmp_results) as $k)
{
// Store the item in the temporary array with the ID as the key.
// Note no pointless variable for the ID, and no use of &!
$new_tmp_results[$this->tmp_results[$k]['id']] =
$this->tmp_results[$k];
}
// Assign the temporary variable to the original variable.
$this->tmp_results = $new_tmp_results;
}
I'd appreciate it if you could plug this in and see what your memory usage
reports say. In most cases, trying to control the garbage collection through
the use of references is the worst way to go about optimising your code. In
my code above I'm relying on PHP's copy-on-write feature, where assigned data
is only duplicated once it is changed. No unsets, just using scope to mark
a variable as able to be cleaned up.
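Copy-on-write can be observed directly with memory_get_usage(). The following is a rough sketch rather than a benchmark; the array size and variable names are arbitrary choices for illustration, and the exact byte counts depend on the PHP version:

```php
<?php
$before = memory_get_usage();
$a = range(1, 100000);            // one large array
$afterCreate = memory_get_usage();

$b = $a;                          // assignment only: both names share one array
$afterAssign = memory_get_usage();

$b[] = 0;                         // first write forces PHP to separate (copy) $b
$afterWrite = memory_get_usage();

printf("create: %s\n", number_format($afterCreate - $before));
printf("assign: %s\n", number_format($afterAssign - $afterCreate));
printf("write:  %s\n", number_format($afterWrite - $afterAssign));
```

The "assign" figure should be close to zero, while the "write" figure should be roughly the size of the array again, since that is the point at which the data is actually duplicated.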
Where is this result set coming from? You'd save yourself a lot of
memory/time by putting the data in to this format when you read it from the
source. For example, if reading it from MySQL,
$this->tmp_results[$row['id']] = $row when looping around the result set.
Also, is there any reason why you need to process this full set of data in
one go? Can you not break it up in to smaller pieces that won't put as much
strain on resources?
-Stuart
There were reasons I had the $id -- I only showed the relevant parts of the
code for sake of not overly complicating what I was trying to illustrate.
There is other processing that had to be done too in the loop and that is
also what I illustrated.

Here is your version effectively:

private function _normalize_result_set() //Stuart
{
    if (!$this->tmp_results || count($this->tmp_results) < 1)
        return;

    $new_tmp_results = array();

    // Loop around just the keys in the array.
    $D_start_mem_usage = memory_get_usage();
    foreach (array_keys($this->tmp_results) as $k)
    {
        /*
        if ($this->tmp_results[$k]['genres'])
        {
            // rip through each scene's `genres` and store them as an array since we'll need'em later too
            $g = explode('|', $this->tmp_results[$k]['genres']);
            array_pop($g); // there is an extra '' element due to the final | character. :-\
            $this->tmp_results[$k]['g'] = $g;
        }
        */

        // Store the item in the temporary array with the ID as the key.
        // Note no pointless variable for the ID, and no use of &!
        $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
    }

    // Assign the temporary variable to the original variable.
    $this->tmp_results = $new_tmp_results;
    echo "\nMEMORY USED FOR STUART's version: ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
    var_dump($this->tmp_results);
    exit();
}

MEMORY USED FOR STUART's version: -128 PEAK: (90,439,680)

With the processing in the genres block
MEMORY USED FOR STUART's version: 97,264,368 PEAK: (187,695,104)

So a slight improvement over the original, whose peak was 28,573,696 bytes higher:
MEMORY USED FOR _normalize_result_set(): 97,264,912 PEAK: (216,268,800)


No matter what I tried however it seems that frustratingly just the simple
act of adding a new hash to the array is causing a significant memory jump.
That really blows! Therefore my solution was to not store the $g as ['g'] --
which would seem to be the more efficient way of doing this once and re-use
the array over and over, but instead I am forced to inline rip through and
explode() in three different places of my code.

We get over 30,000 hits per second, and even with lots of caching, 216MB vs
70-96MB is significant and the speed hit is only about 1.5 seconds more per
page.

Here are three distinctly different example pages that exercise different
parts of the code path:

PAGE RENDERED IN 7.0466279983521 SECONDS
MEMORY USED @START: 262,144 - @END: 26,738,688 = 26,476,544 BYTES
MEMORY PEAK USAGE: 69,730,304 BYTES

PAGE RENDERED IN 6.9327299594879 SECONDS
MEMORY USED @START: 262,144 - @END: 53,739,520 = 53,477,376 BYTES
MEMORY PEAK USAGE: 79,167,488 BYTES

PAGE RENDERED IN 7.558168888092 SECONDS
MEMORY USED @START: 262,144 - @END: 50,855,936 = 50,593,792 BYTES
MEMORY PEAK USAGE: 96,206,848 BYTES

Furthermore, I investigated what Jim Giner suggested, and it turns out I could
wedge a step into our Connection class to mangle the results at that point,
which is actually a more elegant solution overall since we can re-use it in
many more places going forward.

/**
 * Execute a database SQL query and return all the results in an associative array
 *
 * @access public
 * @return array or false
 * @param string $sql the SQL code to execute
 * @param boolean $print (false) Print a color coded version of the query.
 * @param boolean $get_first (false) return the first element only. useful for when 1 row is returned such as "LIMIT 1"
 * @param string $key (null) if a column name, such as 'id' is used here, then that column will be used as the array key
 * @author Daevid Vincent [***@sctr.net]
 * @date 2013-09-03
 * @see get_instance(), execute(), fetch_query_pair()
 */
public function fetch_query($sql = "", $print = false, $get_first = false, $key = null)
{
    //$D_start_mem_usage = memory_get_usage();
    if (!$this->execute($sql, $print)) return false;

    $tmp = array();

    if (is_null($key))
        while ($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[] = $arr;
    else
        while ($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[$arr[$key]] = $arr;

    $this->free_result(); // freeing result from memory
    //echo "\nMEMORY USED FOR fetch_query(): ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
    return (($get_first) ? array_shift($tmp) : $tmp);
}
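For rows that have already been fetched into a plain list, PHP 5.5 and later can do the same re-keying with the built-in array_column(). A small sketch, with made-up rows standing in for what fetch_array() would return:

```php
<?php
// Hypothetical rows, standing in for what fetch_array() would return.
$rows = array(
    array('id' => 7, 'title' => 'alpha'),
    array('id' => 9, 'title' => 'beta'),
);

// With a null column key, array_column() keeps each whole row
// and uses the 'id' column as the array key.
$byId = array_column($rows, null, 'id');

print_r(array_keys($byId)); // keys are 7 and 9
```

This still builds a second array, so keying the rows while reading them from the database, as fetch_query() does, is likely to remain the cheaper option for large result sets.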
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Stuart Dallas
2013-09-03 21:36:34 UTC
Post by Daevid Vincent
There were reasons I had the $id -- I only showed the relevant parts of the
code for sake of not overly complicating what I was trying to illustrate.
There is other processing that had to be done too in the loop and that is
also what I illustrated.
private function _normalize_result_set() //Stuart
{
if (!$this->tmp_results || count($this->tmp_results) < 1)
return;
$new_tmp_results = array();
// Loop around just the keys in the array.
$D_start_mem_usage = memory_get_usage();
foreach (array_keys($this->tmp_results) as $k)
{
You could save another, relatively small, chunk of memory by crafting your loop with the reset, key, current and next functions (look them up to see what they do). Using those you won't need to make a copy of the array keys as done in the above line. With the amount of data you're dealing with it may be worth investing that time.
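A loop built on the array's internal pointer might look like the sketch below. The $rows data is hypothetical and stands in for $this->tmp_results:

```php
<?php
// Hypothetical stand-in for $this->tmp_results.
$rows = array(
    array('id' => 10, 'name' => 'first'),
    array('id' => 20, 'name' => 'second'),
);

$byId = array();
// reset() rewinds the internal pointer; key() returns null past the end.
for (reset($rows); key($rows) !== null; next($rows)) {
    $row = current($rows);        // the element under the pointer
    $byId[$row['id']] = $row;     // re-key by ID, no array_keys() copy needed
}
```

A plain foreach ($rows as $k => $row) also iterates without building a separate list of keys, so it is worth measuring both before committing to the more verbose form.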
Post by Daevid Vincent
/*
if ($this->tmp_results[$k]['genres'])
{
// rip through each scene's `genres` and store them as an array since we'll need'em later too
$g = explode('|', $this->tmp_results[$k]['genres']);
array_pop($g); // there is an extra '' element due to the final | character. :-\
Then remove that from the string before you explode. Munging arrays is expensive, both computationally and in terms of memory usage.
Post by Daevid Vincent
$this->tmp_results[$k]['g'] = $g;
Get rid of the temporary variable again - there's no need for it.

$this->tmp_results[$k]['g'] = explode('|', trim($this->tmp_results[$k]['genres'], '|'));
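The two approaches give the same array. A quick check, using a made-up genres string with the trailing pipe described above:

```php
<?php
$genres = 'action|comedy|drama|'; // hypothetical value with the trailing pipe

$popped = explode('|', $genres);
array_pop($popped);               // drop the trailing '' element afterwards

$trimmed = explode('|', trim($genres, '|')); // strip the pipe first

var_dump($popped === $trimmed);   // bool(true)
```

Note that trim() removes pipes from both ends of the string; rtrim() would touch only the trailing one.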

If this is going in to a class, and you have control over how it's accessed, you have the ability to do this when the value is accessed. This means you won't need to
Post by Daevid Vincent
}
*/
// Store the item in the temporary array with the ID as the key.
// Note no pointless variable for the ID, and no use of &!
$new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
}
// Assign the temporary variable to the original variable.
$this->tmp_results = $new_tmp_results;
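echo "\nMEMORY USED FOR STUART's version: ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";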
var_dump($this->tmp_results);
exit();
}
MEMORY USED FOR STUART's version: -128 PEAK: (90,439,680)
With the processing in the genres block
MEMORY USED FOR STUART's version: 97,264,368 PEAK: (187,695,104)
So a slight improvement from the original of -28,573,696
MEMORY USED FOR _normalize_result_set(): 97,264,912 PEAK: (216,268,800)
Awesome.
Post by Daevid Vincent
No matter what I tried however it seems that frustratingly just the simple
act of adding a new hash to the array is causing a significant memory jump.
That really blows! Therefore my solution was to not store the $g as ['g'] --
which would seem to be the more efficient way of doing this once and re-use
the array over and over, but instead I am forced to inline rip through and
explode() in three different places of my code.
Consider what you're asking PHP to do. You're taking an element in the middle of an array structure in memory and asking PHP to make it bigger. What's PHP going to do? It's going to copy the entire array to a new location in memory with an additional amount reserved for what you're adding. Note that this is just a guess - it's entirely possible that PHP manages its memory better than that, but I wouldn't count on it.
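One way to test a guess like this is to measure it. Whatever the internal mechanism, every PHP array carries its own bookkeeping overhead, so thousands of small arrays cost more than the strings they came from. A rough sketch with made-up data (the row count and genre values are arbitrary, and the per-row figure depends on the PHP version):

```php
<?php
// Hypothetical data: 10,000 rows, each with a pipe-delimited genres string.
$rows = array();
for ($i = 0; $i < 10000; $i++) {
    $rows[$i] = array('id' => $i, 'genres' => "action|comedy|drama|g$i|");
}
$base = memory_get_usage();

// Add an exploded array to every row, as the ['g'] element does.
foreach (array_keys($rows) as $k) {
    $rows[$k]['g'] = explode('|', trim($rows[$k]['genres'], '|'));
}
$withArrays = memory_get_usage();

printf("extra bytes per row: %d\n", ($withArrays - $base) / count($rows));
```

Multiplying the per-row figure by the real row count gives a quick estimate of what storing ['g'] costs compared with exploding the string on demand.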
Post by Daevid Vincent
We get over 30,000 hits per second, and even with lots of caching, 216MB vs
70-96MB is significant and the speed hit is only about 1.5 seconds more per
page.
Here are three distinctly different example pages that exercise different
parts of the code path:
PAGE RENDERED IN 7.0466279983521 SECONDS
MEMORY PEAK USAGE: 69,730,304 BYTES
PAGE RENDERED IN 6.9327299594879 SECONDS
MEMORY PEAK USAGE: 79,167,488 BYTES
PAGE RENDERED IN 7.558168888092 SECONDS
MEMORY PEAK USAGE: 96,206,848 BYTES
Knowing nothing about your application I'm obviously not in a strong position to comment, but seven seconds to generate a page would be unacceptable to me and any of my clients. I'll put money on it being possible to cut that time by changing your caching strategy. The memory usage is also ridiculous - does a single page really display that amount of data? Granted, there are some applications that cannot be optimised beyond a certain point, but those numbers make me sad!

-Stuart
--
Stuart Dallas
3ft9 Ltd
http://3ft9.com/
Daevid Vincent
2013-09-03 23:03:08 UTC
-----Original Message-----
Sent: Tuesday, September 03, 2013 2:37 PM
To: Daevid Vincent
Subject: Re: [PHP] refernces, arrays, and why does it take up so much
memory? [SOLVED]
Post by Stuart Dallas
$this->tmp_results[$k]['g'] = explode('|', trim($this->tmp_results[$k]['genres'], '|'));
Maybe an option. I'll look into trim() the last "|" off the tmp_results in a
loop at the top. Not sure if changing the variable will have the same effect
as adding one does. Interesting to see...
Post by Stuart Dallas
Knowing nothing about your application I'm obviously not in a strong
position to comment, but seven seconds to generate a page would be
unacceptable to me and any of my clients.
It's a "one time hit" and the rest is served from a cache for the next 24
hours which serves very very fast after that initial rendering. It's just we
have so many thousands of pages that this becomes an issue -- especially
when webcrawlers hit us and thread-out so MANY pages are trying to render at
the same time, especially the ones towards the end where they haven't been
cached since rarely do real people get that far... Like you know, pages 900,
901, 902, etc... with new content each day, page 1 today is now page 2
tomorrow, so it's a constant thorn.
Post by Stuart Dallas
I'll put money on it being possible to cut that time by changing your caching
strategy. The memory usage is also ridiculous - does a single page really
display that amount of data? Granted, there are some applications that cannot
be optimised beyond a certain point, but those numbers make me sad!
HA! It was over 400MB per page a few weeks ago. I keep whittling it down,
but I think I'm hitting the lower limit at this point.

It's a tough balance between database hits, cache hits, network traffic
(memcached), disk i/o, page speed, load balancing, etc. All we can do is try
things and tweak and see what works and what brings the servers to their
binary knees.
Daevid Vincent
2013-09-03 23:21:39 UTC
-----Original Message-----
Sent: Tuesday, September 03, 2013 4:03 PM
Cc: 'Stuart Dallas'
Subject: RE: [PHP] refernces, arrays, and why does it take up so much
memory? [SOLVED]
Post by Stuart Dallas
$this->tmp_results[$k]['g'] = explode('|', trim($this->tmp_results[$k]['genres'], '|'));
Maybe an option. I'll look into trim() the last "|" off the tmp_results in a
loop at the top. Not sure if changing the variable will have the same effect
as adding one does. Interesting to see...
Here are the results of that. Interesting changes. Overall it's a slight
improvement, but most significant on the middle one, so still a worthy
keeper. Odd that it wouldn't be an improvement across the board though. PHP is
kooky.

PAGE RENDERED IN 7.1903319358826 SECONDS
MEMORY USED @START: 262,144 - @END: 27,000,832 = 26,738,688 BYTES
MEMORY PEAK USAGE: 69,992,448 BYTES

PAGE RENDERED IN 6.5189208984375 SECONDS
MEMORY USED @START: 262,144 - @END: 42,729,472 = 42,467,328 BYTES
MEMORY PEAK USAGE: 78,905,344 BYTES

PAGE RENDERED IN 7.5954079627991 SECONDS
MEMORY USED @START: 262,144 - @END: 50,331,648 = 50,069,504 BYTES
MEMORY PEAK USAGE: 96,206,848 BYTES

Old (before the trim() change):

PAGE RENDERED IN 7.0466279983521 SECONDS
MEMORY USED @START: 262,144 - @END: 26,738,688 = 26,476,544 BYTES
MEMORY PEAK USAGE: 69,730,304 BYTES

PAGE RENDERED IN 6.9327299594879 SECONDS
MEMORY USED @START: 262,144 - @END: 53,739,520 = 53,477,376 BYTES
MEMORY PEAK USAGE: 79,167,488 BYTES

PAGE RENDERED IN 7.558168888092 SECONDS
MEMORY USED @START: 262,144 - @END: 50,855,936 = 50,593,792 BYTES
MEMORY PEAK USAGE: 96,206,848 BYTES
Stuart Dallas
2013-09-04 09:21:16 UTC
Post by Daevid Vincent
It's a "one time hit" and the rest is served from a cache for the next 24
hours which serves very very fast after that initial rendering. It's just we
have so many thousands of pages that this becomes an issue -- especially
when webcrawlers hit us and thread-out so MANY pages are trying to render at
the same time, especially the ones towards the end where they haven't been
cached since rarely do real people get that far... Like you know, pages 900,
901, 902, etc... with new content each day, page 1 today is now page 2
tomorrow, so it's a constant thorn.
At 30k requests per second, is it a one-time hit, or is it a 225k hit because in the 7 seconds it takes the cache to be built you have that many clients also building the cache? Or is this already an offline script, in which case how long it takes is largely irrelevant?
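The 225k figure is simply the request rate multiplied by the render time, rounded to about 7.5 seconds. As a worked sketch, under the worst-case assumption that every incoming request wants the same uncached page:

```php
<?php
$hitsPerSecond = 30000; // figure quoted in the thread
$renderSeconds = array(7.0466279983521, 6.9327299594879, 7.558168888092);

// Round figure: 30,000 requests/s over roughly 7.5 s of rendering.
$roundFigure = (int) round($hitsPerSecond * 7.5); // 225000

foreach ($renderSeconds as $seconds) {
    // Requests that arrive while one uncached page is still rendering.
    $arrivals = (int) round($hitsPerSecond * $seconds);
    echo number_format($arrivals), "\n";
}
```

In practice the traffic is spread over many pages, so the real number per page would be far lower; the point is only that a slow cache fill multiplies with the request rate.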

What is your caching strategy? What is cached? At what granularity? When is the cache updated (i.e. on demand or on change)? Why does a page need to retrieve so much data? Can that data not be summarised/processed ahead of demand?
Post by Daevid Vincent
HA! It was over 400MB per page a few weeks ago. I keep whittling it down,
but I think I'm hitting the lower limit at this point.
That is nuts! What's the website?
Post by Daevid Vincent
It's a tough balance between database hits, cache hits, network traffic
(memcached), disk i/o, page speed, load balancing, etc. All we can do is try
things and tweak and see what works and what brings the servers to their
binary knees.
Without knowing anything about the site there's little I can say, but if you want to take this off-list I'm happy to talk about it. I have a fair amount of experience with high-traffic web applications so it's possible I might be able to help.

-Stuart