[GSAS-II] Problem with sequential refinement of large data set

Fri Mar 15 20:22:33 CDT 2019

Hi Ivo (& Isaac),

  I have just checked in a new version of GSAS-II (#3855). I found a number of sections of code where, for every histogram, we looped through the entire GPX file or through complex data structures. This caused the time for these steps to increase as N**2, where N is the number of histograms. For small GPX files this was so quick as to be negligible, but it made a huge difference for your “long.gpx” file.
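
To illustrate the kind of N**2 pattern described above, here is a minimal Python sketch. It is not the actual GSAS-II code: the record layout, the histogram names and the functions find_quadratic and find_indexed are hypothetical stand-ins. The first function rescans the whole list for every histogram; the second builds a lookup table once and then answers each query in constant time.

```python
def find_quadratic(records, names):
    """For each name, scan the full record list: O(N) per lookup, O(N**2) overall."""
    found = []
    for name in names:
        for rec_name, data in records:
            if rec_name == name:
                found.append(data)
                break
    return found


def find_indexed(records, names):
    """Build a dict once (O(N)), then each lookup is O(1): O(N) overall."""
    index = {rec_name: data for rec_name, data in records}
    return [index[name] for name in names]


# Hypothetical stand-in for a project holding 2000 histograms.
records = [("PWDR hist_%05d" % i, i) for i in range(2000)]
names = [name for name, _ in records]

# Both approaches return the same data; only the cost differs.
assert find_quadratic(records, names) == find_indexed(records, names)
```

With a few dozen histograms the two are indistinguishable in practice; with thousands, the nested loop dominates the run time.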

   I changed a fair amount of code (and hopefully did not break anything), but the speed is now way faster for both startup and for fitting each histogram in a very large sequential refinement. On my laptop, the time for each histogram in long.gpx has gone from 80 sec down to 6 sec. I don’t have timing for the refinement startup, but I know that I trimmed at least a few minutes off that. Histogram plotting has gone from ~60 sec. down to ~2 sec. Single-histogram plotting should actually be a bit faster than that, but looking into that will be for another time.

    I think that you will still find a bit of a slowdown with GPX files having thousands of histograms. That is unavoidable, as certain steps require reading the entire GPX file or searching through all the information, but my hope is that those delays occur only at the beginning and end of the sequential refinement and that the speed within will not be much affected. There are also some potential memory issues, in that there are probably some arrays that should be deleted after each histogram (also for another occasion). Nonetheless, GSAS-II should now be good for sequential fitting of at least 10,000 histograms in a single GPX file. Please let me know your experience.

Brian

On Mar 4, 2019, at 3:33 AM, Ivo Alxneit via GSAS-II
<gsas-ii at aps.anl.gov> wrote:

Dear all

I am having a problem with sequential refinements in GSAS-II. I have a
large data set where the changes to the few fitted parameters (scale
factor, one lattice parameter, three background parameters) are minor
and slow. If I fit a single pattern, the fit takes about one second
including saving the data. If I do a sequential refinement of fewer than
100 patterns (starting values are very close to final values in all
patterns), the time to fit one pattern remains about the same. If I work
with 1200 patterns, the time increases to about three seconds. Finally,
if I use the whole series of 8500 patterns, the time to fit a single
pattern increases to about 45 seconds!

From my understanding the time to fit a single pattern in a sequential
refinement should be approximately constant as the fits are independent
of each other. I might expect a small increase of the time because the
data structure (project) becomes larger and so does the file that is
saved after each fit (8500 patterns: 1 GB). The reality, however, shows a
more than linear increase of the time-per-pattern. How can this happen?
Where is more than 98% of the time spent?
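
As a rough check on the "more than linear" observation, the two largest timings quoted above can be used to estimate how the time per pattern scales with the number of patterns N. This is only a sketch: it assumes a simple power law t ~ N**k, uses just two approximate data points, and the function name scaling_exponent is made up for illustration.

```python
import math

# (number of patterns, approximate seconds per pattern) as quoted in the message
timings = [(1200, 3.0), (8500, 45.0)]


def scaling_exponent(p1, p2):
    """Estimate k in t ~ N**k from two (N, t) points on a log-log scale."""
    (n1, t1), (n2, t2) = p1, p2
    return math.log(t2 / t1) / math.log(n2 / n1)


k = scaling_exponent(*timings)
print(f"time per pattern grows roughly as N**{k:.2f}")
```

An exponent of 0 would mean a constant time per pattern, which is what independent fits would suggest. An exponent near or above 1 means that each fit does work proportional to the size of the whole project.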

In the same context: shouldn't parallel execution of a sequential
refinement be "trivial" to implement? Each CPU grabs the next
pattern waiting to be refined and works on it. "Copy previous results"
would turn into "Get results from last refined pattern". These are
starting values, and it should not make much of a difference if they are
from pattern n-1 or n-x (x being typically not too large).
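
The scheme proposed above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not GSAS-II code: fit_pattern is a hypothetical stand-in for a real refinement, each "pattern" is reduced to a single target value, and threads are used only to keep the example short (a real CPU-bound fit would more likely need separate processes).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fit_pattern(target, start, tol=1e-9):
    """Hypothetical stand-in for one refinement: iterate from start towards target."""
    x = start
    while abs(x - target) > tol:
        x -= 0.5 * (x - target)  # toy update step
    return x


class LatestResult:
    """Thread-safe holder for the most recently refined parameters."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        with self._lock:
            self._value = value


def refine_all(patterns, initial, workers=4):
    latest = LatestResult(initial)

    def task(pattern):
        start = latest.get()  # "Get results from last refined pattern"
        result = fit_pattern(pattern, start)
        latest.set(result)
        return result

    # Each worker grabs the next waiting pattern; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, patterns))


# Slowly drifting parameter across 50 hypothetical patterns.
patterns = [1.0 + 0.001 * i for i in range(50)]
results = refine_all(patterns, initial=1.0)
```

Because the starting value only seeds the fit, each result converges to the same answer whether it started from pattern n-1 or n-x; the order in which workers finish affects only how many iterations a fit needs.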

Any insight is appreciated.
--
Dr. Ivo Alxneit
