Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace cal_stress in GPU relax calculation,use NUM_STREAM in INPUT file and fix memory leak in force calculation #4200

Merged
merged 42 commits into from
May 27, 2024
Merged
Show file tree
Hide file tree
Changes from 34 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
44ac153
replace ParaV in module gint
A-006 May 6, 2024
5c29f8c
Merge branch 'develop' into replace
A-006 May 6, 2024
0e360ee
Merge branch 'deepmodeling:develop' into replace
A-006 May 7, 2024
43fbbd4
change PV to pv in module gint
A-006 May 7, 2024
e53b13c
Merge branch 'deepmodeling:develop' into replace
A-006 May 11, 2024
1ad5d87
Merge branch 'deepmodeling:develop' into replace
A-006 May 13, 2024
3a67c2b
change GlobalC in module gint
A-006 May 13, 2024
e50ee79
fix LCAO_Orbitals in module gint
A-006 May 13, 2024
83ebe2e
fix error in compile without abacus
A-006 May 13, 2024
b96a678
Merge branch 'deepmodeling:develop' into replace
A-006 May 13, 2024
ff3e262
fix error in init_gpu_gint_variables
A-006 May 13, 2024
7d06aa2
remove GlobalC in grid_technique and grid_bigcell
A-006 May 13, 2024
da3522c
remove GlobalC in gint_tools and vbatch matrix
A-006 May 13, 2024
030217b
Merge branch 'deepmodeling:develop' into replace
A-006 May 14, 2024
4bbbdf3
Merge branch 'develop' into replace
A-006 May 14, 2024
82d9843
Merge branch 'develop' into replace
A-006 May 15, 2024
4e5e256
Merge branch 'develop' into replace
mohanchen May 20, 2024
5366d50
Merge branch 'develop' into replace
A-006 May 21, 2024
15e79e1
Merge branch 'deepmodeling:develop' into replace
A-006 May 21, 2024
939ed76
fix relax have compute stress and change GPU force compute to acclerate
A-006 May 21, 2024
01bde3e
fix num stream in input.md and use num_stream in input
A-006 May 21, 2024
49d6256
Merge branch 'deepmodeling:develop' into replace
A-006 May 22, 2024
4e44406
Merge branch 'deepmodeling:develop' into replace
A-006 May 22, 2024
6ebb7a3
fix error in compute force
A-006 May 22, 2024
db894e4
Merge branch 'develop' into replace
mohanchen May 22, 2024
07de464
Merge branch 'develop' into replace
A-006 May 23, 2024
fbd2f7b
fix memory error in force compute
A-006 May 23, 2024
c0e9990
use std instead of double * and add const
A-006 May 24, 2024
7d7a0b5
Merge branch 'deepmodeling:develop' into replace
A-006 May 24, 2024
20f60ab
fix error in vector use
A-006 May 24, 2024
b9a69cd
Merge branch 'develop' into replace
A-006 May 24, 2024
8626c7a
fix error in compile
A-006 May 24, 2024
5a81037
fix error in compile with force
A-006 May 24, 2024
cfae1f6
fix compile error
A-006 May 24, 2024
6c6f274
Merge branch 'develop' into replace
mohanchen May 25, 2024
c9ad0ee
Merge branch 'develop' into replace
A-006 May 26, 2024
8929180
fix paramter name and function name
A-006 May 26, 2024
7056269
add time ticker and fix nspin transport
A-006 May 26, 2024
7ccc51f
delete printf in files
A-006 May 26, 2024
60b1b4d
fix test bug and fix grid_size
A-006 May 26, 2024
5359609
init nstreams
A-006 May 27, 2024
7340726
Merge branch 'develop' into replace
A-006 May 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions docs/advanced/input_files/input-main.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@
- [search\_radius](#search_radius)
- [search\_pbc](#search_pbc)
- [bx, by, bz](#bx-by-bz)
- [num\_stream] (#num_stream)
- [num\_stream](#num_stream)
- [Electronic structure](#electronic-structure)
- [basis\_type](#basis_type)
- [ks\_solver](#ks_solver)
Expand Down Expand Up @@ -913,7 +913,6 @@ These variables are used to control the numerical atomic orbitals related parame
- **Description**: choose the number of streams in GPU when we compute the `LCAO`. According to different devices , we may have different effects.For most devices,the stream is
enough when the number is bigger then 2.
- **Default** : "4"
[back to top](#full-list-of-input-keywords)
mohanchen marked this conversation as resolved.
Show resolved Hide resolved

## Electronic structure

Expand Down
3 changes: 2 additions & 1 deletion source/module_esolver/esolver_ks_lcao_elec.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,8 @@ void ESolver_KS_LCAO<TK, TR>::set_matrix_grid(Record_adj& ra)
this->pw_rho->nplane,
this->pw_rho->startz_current,
GlobalC::ucell,
GlobalC::ORB);
GlobalC::ORB,
GlobalV::NUM_STREAM);

// (2)For each atom, calculate the adjacent atoms in different cells
// and allocate the space for H(R) and S(R).
Expand Down
61 changes: 32 additions & 29 deletions source/module_hamilt_lcao/module_gint/gint.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,6 @@ Gint::~Gint()
void Gint::cal_gint(Gint_inout* inout)
{
ModuleBase::timer::tick("Gint_interface", "cal_gint");

if(inout->job==Gint_Tools::job_type::vlocal)
{
ModuleBase::TITLE("Gint_interface","cal_gint_vlocal");
Expand Down Expand Up @@ -132,44 +131,49 @@ void Gint::cal_gint(Gint_inout* inout)
{
const int ncyz = this->ny * this->nplane;
int nat = ucell.nat;
const int isforce = inout->isforce;
const int isstress =inout->isstress;
// for (int is = 0; is < GlobalV::NSPIN; ++is)
// {
double *force = new double[ucell.nat * 3];
for (int i = 0; i < nat * 3; i++)
{
force[i] = 0.0;
}
double *stress = new double[6];
for (int i = 0; i < 6; i++)
{
stress[i] = 0.0;
}
GintKernel::gint_gamma_force_gpu(this->DMRGint[inout->ispin],
ucell.omega
/ this->ncxyz,
inout->vl,
force,
stress,
this->nplane,
dr,
rcut,
*this->gridt,
ucell);
for (int iat = 0; iat < nat; iat++)
if (isforce || isstress){
std::vector<double> force(nat * 3, 0.0);
std::vector<double> stress(6, 0.0);
GintKernel::gint_gamma_force_gpu(this->DMRGint[inout->ispin],
ucell.omega
/ this->ncxyz,
inout->vl,
force,
stress,
this->nplane,
dr,
rcut,
isforce,
isstress,
*this->gridt,
ucell);

if (inout->isforce)
{
inout->fvl_dphi[0](iat, 0) += force[iat * 3];
inout->fvl_dphi[0](iat, 1) += force[iat * 3 + 1];
inout->fvl_dphi[0](iat, 2) += force[iat * 3 + 2];
for (int iat = 0; iat < nat; iat++)
{
inout->fvl_dphi[0](iat, 0) += force[iat * 3];
inout->fvl_dphi[0](iat, 1) += force[iat * 3 + 1];
inout->fvl_dphi[0](iat, 2) += force[iat * 3 + 2];
}
}
if (inout->isstress){
inout->svl_dphi[0](0, 0) += stress[0];
inout->svl_dphi[0](0, 1) += stress[1];
inout->svl_dphi[0](0, 2) += stress[2];
inout->svl_dphi[0](1, 1) += stress[3];
inout->svl_dphi[0](1, 2) += stress[4];
inout->svl_dphi[0](2, 2) += stress[5];

}
force.clear();
stress.clear();
}

delete[] force;
delete[] stress;
// }
}
}
Expand Down Expand Up @@ -310,7 +314,6 @@ void Gint::cal_gint(Gint_inout* inout)
this->nplane, this->gridt->start_ind[grid_index], ncyz, dv);

double** DM_in;

if(GlobalV::GAMMA_ONLY_LOCAL)
{
DM_in = inout->DM[GlobalV::CURRENT_SPIN];
Expand Down
12 changes: 7 additions & 5 deletions source/module_hamilt_lcao/module_gint/gint_force.h
Original file line number Diff line number Diff line change
Expand Up @@ -87,11 +87,13 @@ typedef struct
void gint_gamma_force_gpu(hamilt::HContainer<double>* dm,
const double vfactor,
const double* vlocal,
double* force,
double* stress,
std::vector<double>& force,
std::vector<double>& stress,
const int nczp,
double dr,
double* rcut,
const int isforce,
const int isstress,
const Grid_Technique& gridt,
const UnitCell& ucell);

Expand Down Expand Up @@ -208,7 +210,7 @@ void cal_init(ForceStressIat& f_s_iat,
const int cuda_block,
const int atom_num_grid,
const int max_size,
const ForceStressIatGlobal& f_s_iatg);
ForceStressIatGlobal& f_s_iatg);
/**
* @brief GridParameter memCpy,from Host to Device
*
Expand Down Expand Up @@ -247,7 +249,7 @@ void cal_mem_cpy(ForceStressIat& f_s_iat,
* @param atom_num_grid in force calculate,used for Block nums
*/
void cal_force_add(ForceStressIat& f_s_iat,
double* force,
std::vector<double>& force,
const int atom_num_grid);
/**
* @brief Stress Calculate on Host
Expand All @@ -258,7 +260,7 @@ void cal_force_add(ForceStressIat& f_s_iat,
* @param cuda_block in stress compute,used for Block nums
*/
void cal_stress_add(ForceStressIat& f_s_iat,
double* stress,
std::vector<double>& stress,
const int cuda_block);
} // namespace GintKernel
#endif
37 changes: 20 additions & 17 deletions source/module_hamilt_lcao/module_gint/gint_force_gpu.cu
Original file line number Diff line number Diff line change
Expand Up @@ -58,11 +58,13 @@ namespace GintKernel
void gint_gamma_force_gpu(hamilt::HContainer<double>* dm,
const double vfactor,
const double* vlocal,
double* force,
double* stress,
std::vector<double>& force,
std::vector<double>& stress,
const int nczp,
double dr,
double* rcut,
const int isforce,
const int isstress,
const Grid_Technique& gridt,
const UnitCell& ucell)
{
Expand Down Expand Up @@ -110,16 +112,14 @@ void gint_gamma_force_gpu(hamilt::HContainer<double>* dm,
dim3 block_dot_force(cuda_threads);
dim3 grid_dot(cuda_block);
dim3 block_dot(cuda_threads);

para_init(para, iter_num, nbz, gridt);
cal_init(f_s_iat,
para.stream_num,
cuda_block,
atom_num_grid,
max_size,
f_s_iat_dev);
checkCuda(cudaStreamSynchronize(gridt.streams[para.stream_num]));

/*gpu task compute in CPU */
gpu_task_generator_force(gridt,
ucell,
Expand Down Expand Up @@ -150,6 +150,7 @@ void gint_gamma_force_gpu(hamilt::HContainer<double>* dm,
para.stream_num);
checkCuda(cudaStreamSynchronize(gridt.streams[para.stream_num]));
/* cuda stream compute and Multiplication of multinomial matrices */

get_psi_force<<<grid_psi,
block_psi,
0,
Expand Down Expand Up @@ -194,9 +195,8 @@ void gint_gamma_force_gpu(hamilt::HContainer<double>* dm,
atom_pair_num,
gridt.streams[para.stream_num],
nullptr);

checkCuda(cudaStreamSynchronize(gridt.streams[para.stream_num]));
/* force compute in GPU */
if (isforce){
dot_product_force<<<grid_dot_force,
block_dot_force,
0,
Expand All @@ -210,10 +210,9 @@ void gint_gamma_force_gpu(hamilt::HContainer<double>* dm,
nwmax,
max_size,
gridt.psir_size / nwmax);
/* force compute in CPU*/
cal_force_add(f_s_iat, force, atom_num_grid);

}
/*stress compute in GPU*/
if (isstress){
dot_product_stress<<<grid_dot,
block_dot,
0,
Expand All @@ -227,17 +226,21 @@ void gint_gamma_force_gpu(hamilt::HContainer<double>* dm,
para.psir_dm_device,
f_s_iat.stress_device,
gridt.psir_size);
}
/* stress compute in CPU*/
cal_stress_add(f_s_iat, stress, cuda_block);
if (isstress){
cal_stress_add(f_s_iat, stress, cuda_block);
}
if (isforce){
cal_force_add(f_s_iat, force, atom_num_grid);
}
iter_num++;
delete[] f_s_iat.stress_host;
mohanchen marked this conversation as resolved.
Show resolved Hide resolved
delete[] f_s_iat.force_host;
delete[] f_s_iat.iat_host;
}
}
// cudaFree(f_s_iat.stress_device);
// cudaFree(f_s_iat.force_device);
// cudaFree(f_s_iat.iat_device);
delete[] f_s_iat.stress_host;
delete[] f_s_iat.force_host;
delete[] f_s_iat.iat_host;
delete[] denstiy_mat.density_mat_h;
mohanchen marked this conversation as resolved.
Show resolved Hide resolved
/*free variables in CPU host*/
for (int i = 0; i < gridt.nstreams; i++)
{
Expand Down
9 changes: 6 additions & 3 deletions source/module_hamilt_lcao/module_gint/grid_technique.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,8 @@ void Grid_Technique::set_pbc_grid(const int& ncx_in,
const int& nplane,
const int& startz_current,
const UnitCell& ucell,
const LCAO_Orbitals& orb)
const LCAO_Orbitals& orb,
const int num_stream)
{
ModuleBase::TITLE("Grid_Technique", "init");
ModuleBase::timer::tick("Grid_Technique", "init");
Expand Down Expand Up @@ -184,7 +185,7 @@ void Grid_Technique::set_pbc_grid(const int& ncx_in,
#if ((defined __CUDA) /* || (defined __ROCM) */)
if(GlobalV::device_flag == "gpu")
{
this->init_gpu_gint_variables(ucell,orb);
this->init_gpu_gint_variables(ucell,orb,num_stream);
}
#endif

Expand Down Expand Up @@ -638,12 +639,14 @@ void Grid_Technique::cal_trace_lo(const UnitCell& ucell)

#if ((defined __CUDA) /* || (defined __ROCM) */)

void Grid_Technique::init_gpu_gint_variables(const UnitCell& ucell,const LCAO_Orbitals &orb)
void Grid_Technique::init_gpu_gint_variables(const UnitCell& ucell,const LCAO_Orbitals &orb,const int num_stream)
{
if (is_malloced)
{
free_gpu_gint_variables(this->nat);
}
nstreams = num_stream;
streams=new cudaStream_t[nstreams];
mohanchen marked this conversation as resolved.
Show resolved Hide resolved
double ylmcoef[100];
ModuleBase::GlobalFunc::ZEROS(ylmcoef, 100);
for (int i = 0; i < 100; i++)
Expand Down
9 changes: 5 additions & 4 deletions source/module_hamilt_lcao/module_gint/grid_technique.h
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,8 @@ class Grid_Technique : public Grid_MeshBall
const int& nplane,
const int& startz_current,
const UnitCell& ucell,
const LCAO_Orbitals& orb);
const LCAO_Orbitals& orb,
const int num_stream);

/// number of elements(basis-pairs) in this processon
/// on all adjacent atoms-pairs(Grid division)
Expand Down Expand Up @@ -162,8 +163,8 @@ class Grid_Technique : public Grid_MeshBall
int atom_pair_mesh;
int atom_pair_nbz;

const int nstreams = 4;
cudaStream_t streams[4];
int nstreams ;
mohanchen marked this conversation as resolved.
Show resolved Hide resolved
cudaStream_t* streams;
// streams[nstreams]
// TODO it needs to be implemented through configuration files

Expand Down Expand Up @@ -229,7 +230,7 @@ class Grid_Technique : public Grid_MeshBall
matrix_multiple_func_type fastest_matrix_mul;

private:
void init_gpu_gint_variables(const UnitCell& ucell,const LCAO_Orbitals &orb);
void init_gpu_gint_variables(const UnitCell& ucell,const LCAO_Orbitals &orb,const int num_stream);
void free_gpu_gint_variables(int nat);

#endif
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -393,7 +393,7 @@ void cal_init(ForceStressIat& f_s_iat,
const int cuda_block,
const int atom_num_grid,
const int max_size,
const ForceStressIatGlobal& f_s_iat_dev)
ForceStressIatGlobal& f_s_iat_dev)
{
const int iat_min = -max_size - 1;
f_s_iat.stress_host = new double[6 * cuda_block];
Expand Down Expand Up @@ -572,7 +572,7 @@ void cal_mem_cpy(ForceStressIat& f_s_iat,
* @param atom_num_grid in force calculate,used for Block nums
*/
void cal_force_add(ForceStressIat& f_s_iat,
double* force,
std::vector<double>& force,
const int atom_num_grid)
{
checkCuda(cudaMemcpy(f_s_iat.force_host,
Expand Down Expand Up @@ -601,7 +601,7 @@ void cal_force_add(ForceStressIat& f_s_iat,
* @param cuda_block in stress compute,used for Block nums
*/
void cal_stress_add(ForceStressIat& f_s_iat,
double* stress,
std::vector<double>& stress,
const int cuda_block)
{
checkCuda(cudaMemcpy(f_s_iat.stress_host,
Expand Down