Understanding Fortran Memory & Page Usage

cva · Mar 18, 2011

Hello,

I am am confused about how fortran code is being executed and how are the variables in the memory updated during the execution.

My code in simple terms looks like this:

give some values to a matrix psi(lx,ly) (initialization)

main loop do n=1,nend

nt=psi^3

make Fourier transform of psi and get kpsi
make Fourier transform of nt
calculate new kpsi from a simple old kpsi+ft(nt)
make inverse Fourier transform of kpsi and get the new psi in real space.
enddo

Most of the time is spent to make the 3 Fourier transforms. A smart algorithm would save the intermidiate value of kpsi and use a algorithm like this one:

give some values to a matrix psi(lx,ly) (initialization)
make Fourier transform of psi and get kpsi

main loop do n=1,nend

nt=psi^3

make Fourier transform of nt
calculate new kpsi from a simple old kpsi+FT(nt)
make inverse Fourier transform of kpsi and get the new psi in real space.
enddo

The problem is that it does not work. If I execute the second algorithm it will give rubbish. Between two iteration the cpu saves only the array psi. Second algorithm will work without problem in matlab. How can I force the cpu to save the intermediate value for later reuse. Doing another FT should take more than saving the array in the memory.

Mark44 · Mar 18, 2011

cva said:

Hello,

I am am confused about how fortran code is being executed and how are the variables in the memory updated during the execution.

My code in simple terms looks like this:

give some values to a matrix psi(lx,ly) (initialization)

main loop do n=1,nend

nt=psi^3

What is this? AFAIK there is no ^ operator in Fortran, at least up to Fortran 95. Do you mean it as raising psi to the third power? The exponentiation operator in Fortran is **, but you can't use it to raise a two-dimensional array to a power.

cva said:

make Fourier transform of psi and get kpsi
make Fourier transform of nt
calculate new kpsi from a simple old kpsi+ft(nt)
make inverse Fourier transform of kpsi and get the new psi in real space.
enddo

Most of the time is spent to make the 3 Fourier transforms. A smart algorithm would save the intermidiate value of kpsi and use a algorithm like this one:

give some values to a matrix psi(lx,ly) (initialization)
make Fourier transform of psi and get kpsi

main loop do n=1,nend

nt=psi^3

make Fourier transform of nt
calculate new kpsi from a simple old kpsi+FT(nt)
make inverse Fourier transform of kpsi and get the new psi in real space.
enddo

The problem is that it does not work. If I execute the second algorithm it will give rubbish. Between two iteration the cpu saves only the array psi. Second algorithm will work without problem in matlab. How can I force the cpu to save the intermediate value for later reuse. Doing another FT should take more than saving the array in the memory.

Show us your code.

cva · Mar 18, 2011

Hello,

Thank you for your reply. I was trying to write in a pseudocode. The ** takes power of each element of a matrix.

I uploaded the whole code, but here is the relevant part of the code.

Code:

       open(unit=1,file='hex.in',status='old')
	do j=1,ly
	do i=1,lx
	read(1,*) psi(i,j)
	enddo
	enddo
	close(1) 
	
	do n=0,nend

	call update()

	enddo 	contains

	subroutine update()

        nt=psi**3

	call dfftw_execute(planpsi) ! call to make FT of array psi
	call dfftw_execute(plannt)  ! call to make FT of array nt 
	ckpsi=ckpsi*ucaa+ntc*mucaa
	
	call dfftw_execute(iplanpsi) ! call to make inverse FT of array ckpsi
	
	return

	end subroutine

A smart algorithm would save the value of ckpsi and thus not do one extra FT and reducing the total calculations by about 30%, but somehow the matrix ckpsi is not saved between successive iterations. The subroutine update would look like this:

Code:

	subroutine update()

        nt=psi**3

! Note the missing line call dfftw_execute(planpsi)

	call dfftw_execute(plannt) 
	ckpsi=ckpsi*ucaa+ntc*mucaa
	
	call dfftw_execute(iplanpsi)
	
	return

	end subroutine

Mark44 · Mar 18, 2011

Fortran doesn't have an operator to exponentiate each element of a matrix (2-d array), so if you want to raise a matrix to the third power, you need to do it in a nested do loop. Also, you can't multiply two matrices by using the * operator as you seem to do in this line: ckpsi=ckpsi*ucaa+ntc*mucaa

I'm not sure I understand what you're asking about saving ckpsi between iterations. Your code isn't very close to where it would need to be to compile and run, and in particular there are no declarations for the arrays you are using. Perhaps the answer to your question is to make a global array named ckpsi.

cva · Mar 18, 2011

The line psi**3 is ok. The fortran 90 vompilers reconize psi**3 as a nested loop and the result a matrix with each element with power 3. The A*B statement makes A(i,j)*B(i,j), while the psi**3 will result will do psi(i,j)**3 for each element. Arrays psi and ckpsi are golbal. During the iterations ( the loop with n=1,nend) I cannot reusse the matrix ckpsi.
I have to call the subroutine update() for many times succesive, but if I call the subroutine upodate() once, I can not reuse the variable ckpsi in the next call.

Here is the whole code which works:

Code:

      program si
      implicit none
      include "fftw3.f"       
      integer,parameter :: lx=128,ly=lx
      integer,parameter :: nend=500000,nav=nend/10,nf=50
      double precision, parameter :: pi=3.14159265358979
      double precision, parameter :: sq3=1.732050807568877
      double precision, parameter :: at=7.255197456936871,vmax=0.1
      double precision, parameter :: dt=0.5,VV=1.d0/(lx*ly) 
      double precision, parameter :: dx=at/8.,dy=(at*sq3/2.)/8. 
      double precision, parameter :: r=-0.25,pm=-0.25
      double precision :: psi(lx,ly),bffr(lx,ly),mu(lx,ly),nt(lx,ly)
      double precision :: Ah,ene,eneb,rhos,AA,tmp(lx,ly)
      double precision :: caa(lx/2+1,ly)
      double precision :: qq,ccc,rnd,qx,qy,dpsi
      double precision :: mucaa(lx/2+1,ly),ucaa(lx/2+1,ly)
      double precision :: ttime(2),gas
      integer :: iiss
      integer :: j,n,ii,iii,val(8),seed(1:50),sz
      complex :: ckpsi(lx/2+1,ly)
      complex :: ntc(lx/2+1,ly)
      complex :: bffc(lx/2+1,ly)
      integer*8 planpsi,iplanpsi,plannt,planbff,iplanbff
      character(len=20) :: fname
      
      call dfftw_plan_dft_r2c_2d(planpsi,lx,ly,psi,ckpsi,FFTW_ESTIMATE)  ! create the plan for forward ft of psi
      call dfftw_plan_dft_c2r_2d(iplanpsi,lx,ly,ckpsi,psi,FFTW_ESTIMATE) ! create the plan for inverse ft of psi
      call dfftw_plan_dft_r2c_2d(plannt,lx,ly,nt,ntc,FFTW_ESTIMATE)  ! create the plan for forward ft of nt
      call dfftw_plan_dft_r2c_2d(planbff,lx,ly,bffr,bffc,FFTW_ESTIMATE)  ! create the plan for forward ft of bffr
      call dfftw_plan_dft_c2r_2d(iplanbff,lx,ly,bffc,bffr,FFTW_ESTIMATE) ! create the plan for inverse ft of bffr

	
	call cpu_time(ttime(1))

	write(*,*) 'Begin'

	CALL RANDOM_SEED()
	call random_seed(size=sz)
	CALL RANDOM_NUMBER(rnd)
	CALL DATE_AND_TIME(VALUES=val);
	seed=INT(rnd*389179*val(8))
	call random_seed(put=seed(1:sz))!     k-space coeficients

	do j=1,ly
	if(j.le.ly/2) then
	qy=2*pi*(j-1)/(ly*dy)
	else
	qy=2*pi*(j-1-ly)/(ly*dy)
	endif
	
	do i=1,lx/2
	qx=2*pi*(i-1)/(lx*dx)
	qq=-qx*qx-qy*qy
	ccc=r+1+2*qq+qq*qq	
	caa(i,j)=VV*(ccc)
      ucaa(i,j)=VV/(1.-dt*ccc*qq)
	mucaa(i,j)=VV*qq*dt/(1-dt*ccc*qq)
	enddo
	
	enddo

! Initial configuration 
      
      write(*,*) 'Loading initial configuration from hex.in'
	open(unit=1,file='hex.in',status='old')
	do j=1,ly
	do i=1,lx
	read(1,*) psi(i,j)
	enddo
	enddo
	close(1) 
	
	rnd=sum(psi)/(lx*ly)-pm
	psi=psi-rnd
		

! Main loop      

	write(*,*) 'Commencing the main loop'      	
	
	do n=0,nend

	call update()

	enddo 
	
	dpsi=0
	do n=1,nav
	
	tmp=psi 
	
	call update()
      dpsi=dpsi+sum(abs(psi-tmp))*VV
      enddo
      	
	open(unit=2,file=fname(0))  
	do j=1,ly
	do iii=1,lx
	write(2,*) psi(iii,j) ! ,iii,j
	enddo
	enddo
	close(2)
	
	call energy()
      open(unit=800,file='energy.txt',position='append')
	write(800,*) ene
      close(800)	
      open(unit=800,file='speed.txt',position='append')
	write(800,*) dpsi/(dt*nav)
      close(800)	
	write(*,*) ene,dpsi/dt
	
	
	call cpu_time(ttime(2))
	
	write(*,*) 'Total time', ttime(2)-ttime(1)

	contains

	subroutine energy()

	bffr=psi
	
	call dfftw_execute(planbff)
	
	bffc=bffc*caa   
	call dfftw_execute(iplanbff)

	ene=sum(0.5*psi*bffr+0.25*psi**4)*VV
	
	return

	end subroutine

	subroutine update()

      nt=psi**3

	call dfftw_execute(planpsi)
	call dfftw_execute(plannt)
	ckpsi=ckpsi*ucaa+ntc*mucaa
	
	call dfftw_execute(iplanpsi)
	
	return

	end subroutine

      end program

      function fname(n)
      implicit none
      integer :: n
      character(len=20) fname
      
      select case(n)
      case(0:9)
      write(fname,'(a,i1,a)') '0000000',n,'.txt'
      case(10:99)
      write(fname,'(a,i2,a)') '000000',n,'.txt'
      case(100:999)
      write(fname,'(a,i3,a)') '00000',n,'.txt'
      case(1000:9999)
      write(fname,'(a,i4,a)') '0000',n,'.txt'
      case(10000:99999)
      write(fname,'(a,i5,a)') '000',n,'.txt'
      case(100000:999999)
      write(fname,'(a,i6,a)') '00',n,'.txt'
      case(1000000:9999999)
      write(fname,'(a,i7,a)') '0',n,'.txt'
      case(10000000:99999999)
      write(fname,'(a,i8,a)') n,'.txt'
      end select
      end function

As you can see the program is complete. the core of program is this:

Code:

open(unit=1,file='hex.in',status='old')
do j=1,ly
do i=1,lx
read(1,*) psi(i,j)
enddo
enddo
close(1)

do n=0,nend

call update()

enddocontains

subroutine update()

nt=psi**3

call dfftw_execute(planpsi) ! call to make FT of array psi
call dfftw_execute(plannt) ! call to make FT of array nt
ckpsi=ckpsi*ucaa+ntc*mucaa

call dfftw_execute(iplanpsi) ! call to make inverse FT of array ckpsi

return

end subroutine

While this code works fine, I would like to make faster., by gettting read of one FT in the subroutine update() anssaveing the ckpsi matrix from one call to another. Buevent if the matrix ckpsi is global, still it is not saved from one call to another. So if I woulf put a line like this :

write(*,*) ckpsi

I would get only zeros, even if the program outputs correctly. This make me believe that when the program is executed, in order to make it fast the a copy of the array is made in the cpu cache., but I do not have access to it.

Does anyone know what the command common does?

Mark44 · Mar 18, 2011

A common block is a way to share data among program units. Do a search for "fortran common" and you'll get more information.

Regarding your problem with ckpsi I don't see any place where this array gets initialized, unless it's when one of these subroutines gets called:

Code:

call dfftw_plan_dft_r2c_2d(planpsi,lx,ly,psi,ckpsi,FFTW _ESTIMATE) ! create the plan for forward ft of psi
call dfftw_plan_dft_c2r_2d(iplanpsi,lx,ly,ckpsi,psi,FFT W_ESTIMATE) ! create the plan for inverse ft of psi

In the update subroutine, you have this code:

Code:

ckpsi=ckpsi*ucaa+ntc*mucaa

If ckpsi hasn't been initialized to some known values, you are resetting the values in this array to garbage values.

cva · Mar 18, 2011

Mark44 said:
A common block is a way to share data among program units. Do a search for "fortran common" and you'll get more information.

Regarding your problem with ckpsi I don't see any place where this array gets initialized, unless it's when one of these subroutines gets called:
Code:
call dfftw_plan_dft_r2c_2d(planpsi,lx,ly,psi,ckpsi,FFTW _ESTIMATE) ! create the plan for forward ft of psi
call dfftw_plan_dft_c2r_2d(iplanpsi,lx,ly,ckpsi,psi,FFT W_ESTIMATE) ! create the plan for inverse ft of psi
In the update subroutine, you have this code:
Code:
ckpsi=ckpsi*ucaa+ntc*mucaa
If ckpsi hasn't been initialized to some known values, you are resetting the values in this array to garbage values.

The modified code wold be something like this:

Code:

open(unit=1,file='hex.in',status='old')
do j=1,ly
do i=1,lx
read(1,*) psi(i,j)
enddo
enddo
close(1)call dfftw_execute(planpsi) ! call to make FT of array psi ! Only once

do n=0,nend

call update()

enddocontains

subroutine update()

nt=psi**3

! Note that now there  would be only 2 FT
call dfftw_execute(plannt) ! call to make FT of array nt
ckpsi=ckpsi*ucaa+ntc*mucaa

call dfftw_execute(iplanpsi) ! call to make inverse FT of array ckpsi

return

end subroutine

So first the matrix psi is given some value, than the ckpsi is obntained by a FT and then the subroutine update() is called many times. But is does not work.

I think the memory is allocated, but cpu uses copies of those and the communication and the value in the memory are updated less in order to reduce the communications. In my case I need to save the intermediate values. I need a statement to force that the matrix ckpsi is always update in the main memory. This way I could reuse those values and gat read of a stament and make the code perform one 1/3 less calculations.

Mark44 · Mar 18, 2011

cva said:

So first the matrix psi is given some value, than the ckpsi is obntained by a FT and then the subroutine update() is called many times. But is does not work.

What do you mean, it doesn't work? Please be more specific. It's very difficult for me to get the gist of what this code is doing. The names used are pretty much incomprehensible. No offense to you, but Fortran programmers seem to pick names that don't aid in the understanding of what they are supposed to do. For example, planpsi, iplanpsi, plannt, planbff, iplanbff don't tell me much about what they're supposed to have in them.

cva said:

I think the memory is allocated, but cpu uses copies of those and the communication and the value in the memory are updated less in order to reduce the communications. In my case I need to save the intermediate values. I need a statement to force that the matrix ckpsi is always update in the main memory. This way I could reuse those values and gat read of a stament and make the code perform one 1/3 less calculations.

I think you are talking about the cache (or caches). If data is being loaded from program memory to the CPU it first goes into a cache on the processor. Some of your arrays are large enough that they might not fit into the cache. For instance, your kcpsi array is 65 X 128 times 16 bytes, or 133,120 bytes. If you are updating that array 500,000 times, it doesn't do much good to have it in the cache, because any change to program memory for the array will mean that one or more cache lines need to be refreshed.

rcgldr · Mar 18, 2011

Unless you're running multiple cpu's that do not have automatic memory sychronization, which would be rare, the cache in a cpu isn't going to create a problem in terms of retaining variables. Would could be a problem is a stack based local array that only exists while the routine that declares that array is running.

cva · Mar 18, 2011

rcgldr said:

Unless you're running multiple cpu's that do not have automatic memory sychronization, which would be rare, the cache in a cpu isn't going to create a problem in terms of retaining variables. Would could be a problem is a stack based local array that only exists while the routine that declares that array is running.

Well I need to access this "stack based local array". declaring that array as global does not work. For examplem writing this:

Code:

       do j=1,ly
	do i=1,lx
       psi(i,j)=-0.55*(cos(sqrt(3.)*i*dx/2)*cos(j*dy/2)-0.5*cos(j*dy))-0.25
	enddo
	enddo

	call dfftw_execute_dft_r2c(planpsi,psi,ckpsi) ! Fast Fourier transform 
       write(*,*) real(ckpsi(1,1)/(lx*ly)),sum(psi)/(lx*ly)

outputs the correct numbers -0.25,-0.25
But doing several calls like this

Code:

       do j=1,ly
	do i=1,lx
       psi(i,j)=-0.55*(cos(sqrt(3.)*i*dx/2)*cos(j*dy/2)-0.5*cos(j*dy))-0.25
	enddo
	enddo

     do n=1,100
      nt=psi**3
  	
	call dfftw_execute_dft_r2c(planpsi,psi,ckpsi)
	call dfftw_execute_dft_r2c(plannt,nt,ntc)
	ckpsi=ckpsi*ucaa+ntc*mucaa
	
	call dfftw_execute_dft_c2r(iplanpsi,ckpsi,psi)
       enddo 
       write(*,*) 0,real(ckpsi(1,1)/(lx*ly)),sum(psi)/(lx*ly)

outputs 0,-25. So even if the array psi gets the correct value the array ckpsi never gets written in the main memory. The cpu uses a copy and it never synchronizes. I need to make this synchronization in order to make my code faster.

Is it possible ?

rcgldr · Mar 18, 2011

Either make the array global (don't understand why this is an issue), or create a "master" subroutine that allocates the array, then calls all the functions that use the array, passing a reference to that array in all the called subroutines (fortran normally passes parameters by reference (address)).

cva · Mar 18, 2011

rcgldr said:

Either make the array global (don't understand why this is an issue), or create a "master" subroutine that allocates the array, then calls all the functions that use the array, passing a reference to that array in all the called subroutines (fortran normally passes parameters by reference (address)).

My original code has the arrray global, however if I have a succession of calls (Fourier transforms) I can not access the values of this array . The only explanation is that the cpu does not synchronize.

Could gice a small eample of what do you mean by the second suggestion?

rcgldr · Mar 18, 2011

cva said:

My original code has the arrray global, however if I have a succession of calls (Fourier transforms) I can not access the values of this array. The only explanation is that the cpu does not synchronize.

That seems unlikely, since that would be a cache / memory failure that would cause all sorts of problems. It's more likely that the array is getting overwritten due to indexing other arrays near where that array is located in memory, or that the transform process destroys the input array. If this is the case, create a second array to use as backup array. Copy the array to the backup array at the start of the program, then do the transform, then copy the backup back to the primary array before doing the next transform.

Could give a small eample of what do you mean by the second suggestion?

You shouldn't need to do this. The global array should work. Trying to allocate a large array on the stack could cause other issues.

Mark44 · Mar 18, 2011

See this link, on page 15 - http://star-www.st-and.ac.uk/~spd3/Teaching/AS3013/lectures/AS3013_lecture5.pdf

cva · Mar 19, 2011

rcgldr said:

That seems unlikely, since that would be a cache / memory failure that would cause all sorts of problems. It's more likely that the array is getting overwritten due to indexing other arrays near where that array is located in memory, or that the transform process destroys the input array. If this is the case, create a second array to use as backup array. Copy the array to the backup array at the start of the program, then do the transform, then copy the backup back to the primary array before doing the next transform.

Done that. Createa d a back-up arrray an tried to a line like this
tmp=ckpsi, but as I said several times the contents of ckpsi are not accessible.

In this code with one call, I can access the array ckpsi:

Code:

       do j=1,ly
	do i=1,lx
       psi(i,j)=-0.55*(cos(sqrt(3.)*i*dx/2)*cos(j*dy/2)-0.5*cos(j*dy))-0.25
	enddo
	enddo

	call dfftw_execute_dft_r2c(planpsi,psi,ckpsi) ! Fast Fourier transform 
       write(*,*) real(ckpsi(1,1)/(lx*ly)),sum(psi)/(lx*ly)

outputs the correct numbers -0.25,-0.25
But when doing several calls like, I can not this

Code:

       do j=1,ly
	do i=1,lx
       psi(i,j)=-0.55*(cos(sqrt(3.)*i*dx/2)*cos(j*dy/2)-0.5*cos(j*dy))-0.25
	enddo
	enddo

     do n=1,100
      nt=psi**3
  	
	call dfftw_execute_dft_r2c(planpsi,psi,ckpsi)
	call dfftw_execute_dft_r2c(plannt,nt,ntc)
	ckpsi=ckpsi*ucaa+ntc*mucaa
	
	call dfftw_execute_dft_c2r(iplanpsi,ckpsi,psi)
       enddo 
       write(*,*) 0,real(ckpsi(1,1)/(lx*ly)),sum(psi)/(lx*ly)

It gives 0,-02.5. The code wiorks, but the values of the array ckos are never saved anywhere, and there nothing to copy to the buffer array.

Understanding Fortran Memory & Page Usage

Related to Understanding Fortran Memory & Page Usage

1. What is Fortran memory and how is it different from other programming languages?

2. How does Fortran handle memory allocation and deallocation?

3. What is page usage in Fortran and how does it impact performance?

4. Can I control the memory and page usage in my Fortran program?

5. Are there any best practices for optimizing Fortran memory and page usage?

Similar threads

Hot Threads

Recent Insights