[GSAS-II] Problem with sequential refinement of large data set

Mon Mar 4 11:03:05 CST 2019

Hi Ivo,

   You do not say what scale/type of computer you are using for this. It would be interesting to know whether this is Windows, Mac or Linux and how much memory the computer has. This could simply be a memory-swap problem, since GSAS-II keeps the entire GPX file in memory, and that creates a much bigger demand with a 1 GB project. If you have access to a big-memory machine, it would be interesting to know if this makes a difference.

   Bob and I spoke about this earlier and he suggested that a likely reason for the slowdown is the way we handle file I/O. After every pattern is fit, the entire GPX file is written and probably reread as well. If you would be willing, we can put some debug statements into the code that would show where the 45 seconds are being consumed. Reorganizing the sequential refinement code to avoid some of this should then be possible. Just to confirm: you see the 45 seconds/pattern with your 1 GB project regardless of whether you set the fitting range to, say, 10 patterns or to the entire range. Correct?
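   As an illustration of the kind of timing that debug statements could provide, here is a minimal sketch in plain Python. It does not use any GSAS-II code: fit_pattern, sequential_refine and make_project are hypothetical stand-ins, and pickling a toy dictionary merely imitates writing the whole project after every pattern. It only shows how per-stage timers would separate fitting time from save time, and how the total amount written grows with the square of the number of patterns when the whole project is saved after each fit.

```python
import pickle
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # stage name -> accumulated wall-clock seconds


@contextmanager
def timed(stage):
    """Accumulate the wall-clock time spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start


def fit_pattern(data):
    # Hypothetical stand-in for the least-squares fit of one pattern.
    return sum(data) / len(data)


def sequential_refine(project):
    """Toy loop: fit each pattern, then serialize the WHOLE project.

    Returns the total number of bytes serialized over the run.
    """
    bytes_written = 0
    for name in sorted(project):
        with timed("fit"):
            project[name]["result"] = fit_pattern(project[name]["data"])
        with timed("save"):
            # Whole-project write after every pattern (assumed behavior).
            bytes_written += len(pickle.dumps(project))
    return bytes_written


def make_project(n_patterns, n_points=200):
    """Build a toy project: one entry of fake data per pattern."""
    return {"PWDR %04d" % i: {"data": [float(j) for j in range(n_points)]}
            for i in range(n_patterns)}


if __name__ == "__main__":
    for n in (10, 50, 200):
        timings.clear()
        total = sequential_refine(make_project(n))
        print(n, "patterns:",
              {stage: round(sec, 4) for stage, sec in timings.items()},
              "bytes written:", total)
```

   In this toy model the "fit" time per pattern stays constant while the "save" time per pattern grows with the size of the project, which is the pattern of behavior suspected above. Whether the real code behaves this way is exactly what the timing runs would establish.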

   Parallel processing is also certainly a possibility. It can already be done with scripting (see O’Donnell, J. H., Von Dreele, R. B., Chan, M. K. Y. & Toby, B. H. (2018). Journal of Applied Crystallography 51, 1244-1250; reprint on request), but the most straightforward approach of simply assigning each processor every n-th pattern would require n copies of the GPX file, which would amplify the memory problems described in the first paragraph.
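   To make the "every n-th pattern" idea concrete, here is a small sketch in plain Python. It is not taken from GSAS-II or from the scripting interface; the function names are hypothetical, and the memory figure is only a rough estimate that assumes each worker holds one full copy of the project and that the in-memory size is comparable to the file size.

```python
def assign_round_robin(pattern_names, n_workers):
    """Worker k gets patterns k, k+n, k+2n, ... (every n-th pattern)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [pattern_names[k::n_workers] for k in range(n_workers)]


def total_memory_gb(project_gb, n_workers):
    """Rough estimate: each worker holds its own full copy of the project."""
    return project_gb * n_workers


if __name__ == "__main__":
    names = ["pattern %d" % i for i in range(10)]
    for k, chunk in enumerate(assign_round_robin(names, 4)):
        print("worker", k, chunk)
    print("approx. memory for 4 copies of a 1 GB project:",
          total_memory_gb(1.0, 4), "GB")
```

   With this assignment, the starting values for a pattern would come from the pattern n positions earlier in the series rather than from its immediate predecessor.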

   Please let us know if you have any experience running this on different types of machines, and if you are willing to do some timing runs for us.

Brian

On Mar 4, 2019, at 3:33 AM, Ivo Alxneit via GSAS-II <gsas-ii at aps.anl.gov> wrote:

Dear all

I am having a problem with sequential refinements in GSAS-II. I have a
large data set where the changes to the few fitted parameters (scale
factor, one lattice parameter, three background parameters) are minor
and slow. If I fit a single pattern, the fit takes about one second
including saving the data. If I do a sequential refinement of fewer than
100 patterns (starting values are very close to final values in all
patterns), the time to fit one pattern remains about the same. If I work
with 1200 patterns, the time increases to about three seconds. Finally,
if I use the whole series of 8500 patterns, the time to fit a single
pattern increases to about 45 seconds!

From my understanding the time to fit a single pattern in a sequential
refinement should be approximately constant as the fits are independent
of each other. I might expect a small increase of the time because the
data structure (project) becomes larger and so does the file that is
saved after each fit (8500 patterns: 1 GB). The reality, however, shows a
more than linear increase of the time-per-pattern. How can this happen?
Where is more than 98% of the time spent?

In the same context: shouldn't parallel execution of a sequential
refinement be "trivial" to implement? Each CPU grabs the next
pattern waiting to be refined and works on it. "Copy previous results"
would turn into "Get results from last refined pattern". These are
starting values and it should not make much of a difference if they are
from pattern n-1 or n-x (x being typically not too large).

Any insight is appreciated.
--
Dr. Ivo Alxneit
Catalysis for Energy Group
Bioenergy and Catalysis Laboratory        phone: +41 56 310 4092
Paul Scherrer Institute                     fax: +41 56 310 2688
CH-5232 Villigen                      gnupg key: 0x515E30C7
Switzerland
https://www.psi.ch/ceg/

_______________________________________________
GSAS-II mailing list
GSAS-II at aps.anl.gov
https://mailman.aps.anl.gov/mailman/listinfo/gsas-ii
