This repository has been archived by the owner on Apr 17, 2019. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2
/
petschpddm.tex
94 lines (73 loc) · 8.56 KB
/
petschpddm.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
%!TEX root = ./freefem++doc.tex
\subsection{HPDDM solvers}
The corresponding examples are in the directory \texttt{examples++-hpddm}. Real valued problems (diffusion, heat, elasticity and Stokes) and complex valued problems (Maxwell and Helmholtz) are given in both 2D and 3D. We detail here the 3D elasticity problem and the 3D time-dependent heat problem.
\begin{example}[elasticity-3d.edp]
A three dimensional elasticity problem is defined. The solver is a domain decomposition method. Domain decomposition methods are a natural framework for parallel computers. The scripts in the directory \texttt{examples++-hpddm} run on multicores computers (from 2 to tens of thousands of cores). Recall that like in any MPI code the number of MPI processes, \texttt{mpisize}, is given in the command line via the option \texttt{-np}. We focus on the script \texttt{elasticity-3d.edp} but the other scripts have the same structure. The command line to run the example on four processes with \texttt{ffglut} visualization is:\\
\texttt{ff-mpirun -np 4 elasticity-3d.edp -glut ffglut}
\end{example}
\def\parallelScript{hpddm/elasticity-3d.edp}
\lstinput[linerange=SchwarzMethod-SchwarzMethodEnd]{\parallelScript}
The macro \texttt{build} is of particular interest since it handles the data distribution among the \texttt{mpisize} MPI processes with the following steps:
\begin{itemize}
\item the initial mesh \texttt{ThGlobal} is partitioned by process 0 into \texttt{mpisize} submeshes
\item the partition is broadcasted to every process $i$ for $0 < i < \texttt{mpisize}$. From then on, all tasks are parallel.
\item each process creates the local submesh \texttt{Th} (if the refinement factor \texttt{s} defined via the option \texttt{-split} is larger than 1, each local edge is splitted into $s$ subedges, resulting in each element being split into $s^2$ element in 2D and $s^3$ elements in 3D) so that the collection of these submeshes is an overlapping domain decomposition of a refined mesh. The number of extra layers added to the initial partition is monitored by the option \texttt{-overlap}.
\item connectivity structures are created
\begin{itemize}
\item \texttt{D} is the diagonal of the local partition of unity (see below \S~\ref{sub:linear} for more details)
\item \texttt{arrayIntersection} is the list of neighbors of the current subdomain
\item For \texttt{j} in \texttt{arrayIntersection}, \texttt{restrictionIntersection[j]} is the list of the degrees of freedom that belong to the intersection of the current subdomain with its neighbor \texttt{j}.
\end{itemize}
\end{itemize}
Then, the variational formulation \texttt{vPb} of a three dimensional elasticity problem is used to assemble a local matrix \texttt{Mat}. This matrix along with \texttt{D}, \texttt{arrayIntersection} and \texttt{restrictionIntersection} are arguments for the constructor of the distributed matrix \texttt{A}. This is enough to solve the problem with a one-level additive Schwarz method which can be either ASM or RAS.\\
For some problems it is interesting to use optimized interface conditions. When there are many subdomains, it is usually profitable to add a second level to the solver. Options are set in the sequel of the script:
\lstinput[linerange=OsmTwolevel-OsmTwolevelEnd]{\parallelScript}
In the above line, the first option selects the one-level preconditioner \texttt{ras} (possible choices are \texttt{ras}, \texttt{oras}, \texttt{soras}, \texttt{asm}, \texttt{osm} or \texttt{none}), the second option selects the correction formula for the second level here \texttt{balanced} (possible options are \texttt{deflated}, \texttt{additive} or \texttt{balanced}), the third option selects right preconditioning, the fourth one is verbosity level of HPDDM (different from the one of FreeFem++), the fifth one prints all possible options of HPPDM and the last one specifies the number of coarse degrees of freedom per subdomain of the GENEO coarse space. All other options of \href{https://github.com/hpddm/hpddm/blob/master/doc/cheatsheet.pdf}{cheatsheet of the HPDDM} \cite{Jolivet:2014:HPD} library can be selected via the FreeFem++ function \texttt{set}.\\
% The first part deals with \texttt{OSM} methods. The variational form \texttt{vOptimized} is the one of the elasticity problem with an extra term on the interface labeled \texttt{fakeInterface}. The second part deals with the function \texttt{attachCoarseOperator} which constructs the coarse space based on solving generalized eigenvalue problems in the subdomains.
In the last part of the script, the global linear system is solved by the domain decomposition method defined above.
\lstinput[linerange=SolvePlot-SolvePlotEnd]{\parallelScript}
\subsubsection{Time dependent problem} % (fold)
\label{sub:time_dependent_problem}
\begin{example}[heat-3d.edp]
A three dimensional heat problem
\[
\frac{\partial u}{\partial t} - \Delta u = 1,\ \ \ u(0,\cdot) := 0 \text{ in }\Omega\,.
\]
is discretized by an implicit Euler scheme. At each time step $n$, we shall seek $u^n(x,y,z)$ satisfying for all $w\in H^1(\Omega)$:
\[
\int_\Omega \frac{u^n-u^{n-1}}{\delta t}\,w + \nabla u^n \nabla w = \int_\Omega w ,\ \ \ u^0 := 0 \text{ in }\Omega\,.
\]
so that at each time step a linear system
\[
(M+dt*K) u^n[] = M*u^{n-1}[] + \delta t*F
\]
is solved by a domain decomposition method where $M$ is the mass matrix and $K$ is the rigidity matrix. In order to save computational efforts, the domain decomposition method preconditioner is built only once and then reused for all subsequent solves with matrix $A:=M+dt*K$. The distributed matrix vector product with matrix $M$ is made through the call to the function \texttt{dmv} using the partition of unity associated to matrix $A$.
\end{example}
\def\parallelScript{hpddm/heat-3d.edp}
\lstinput[linerange=SolvePlot-SolvePlotEnd]{\parallelScript}
% subsubsection time_dependent_problem (end)
\subsubsection{Distributed vectors in HPDDM} % (fold)
\label{sub:linear}
We give here some hints on the way vectors are distributed among $np$ processes when using FreeFem++ interfaced with HPDDM. The set of degrees of freedom ${\mathcal N}$ is decomposed into $np$ overlapping sets $({\mathcal N}_i)_{1\le i\le np}$. A MPI-process is in charge of each subset. Let $n:=\#{\mathcal N}$ be the number of degrees of freedom of the global finite element space. Let $R_i$ denote the restriction operator from $\R^n$ onto $\R^{\#{\mathcal N}_i}$. We have also defined local diagonal matrices $D_i\in \R^{\#{\mathcal N}_i}\times \R^{\#{\mathcal N}_i}$ so that we have a partition of unity at the algebraic level:
\begin{equation}
\label{eq:hpddm:14}
{\mathbf U} = \sum_{i=1}^{np} R_i^T\,D_i\,R_i\,{\mathbf U}\ \ \ \ \forall\ {\mathbf U}\in\R^n\,.
\end{equation}
A global vector ${\mathbf U}\in\R^n$ is actually not stored. Rather, it is stored in a distributed way. Each process $i$, $1\le i\le N$, stores the local vector ${\mathbf U}_i:=R_i {\mathbf U}\in \R^{\#{\mathcal N}_i}$.
It is important to ensure that the result of all linear algebra operators applied to this representation are coherent.\\
As an example, consider the scalar product of two distributed vectors ${\mathbf U}, {\mathbf V} \in \mathbb{R}^{n}$. Using the partition of unity~\eqref{eq:hpddm:14}, we have:
\begin{align*}({\mathbf U}, {\mathbf V}) = \left({\mathbf U}, \sum_{i=1}^{np} R_i^T D_i R_i {\mathbf V}\right) &= \sum_{i=1}^{np} (R_i {\mathbf U}, D_i R_i {\mathbf V})\\
&=\sum_{i=1}^{np} \left({\mathbf U}_i, D_i {\mathbf V}_i\right)\,.
\end{align*}
Thus, the formula for the scalar product is:
\begin{equation*}
({\mathbf U}, {\mathbf V}) = \sum_{i = 1}^{np} (R_i {\mathbf U}, D_i R_i {\mathbf V})\,.
\end{equation*}
Local scalar products are performed concurrently. Thus, the implementation is parallel except for the sum which corresponds to a {\tt MPI\_Reduce} call across the $np$ MPI processes. Note also that the implementation relies on the knowledge of a partition of unity so that the FreeFem++ syntax is {\tt dscalprod(D,u,v)}.\\
A {\tt axpy} procedure $y \leftarrow \alpha\,x+y$ for $x,y\in \mathbb{R}^{n}$ and $\alpha\in\R$ is easily implemented concurrently for distributed vectors in the form:
\[
y_i \leftarrow \alpha\,x_i+y_i\,, \forall\ 1\le i \le np\,.
\]
The matrix vector product is more involved and details are given in the SIAM book \href{https://www.ljll.math.upmc.fr/nataf/OT144DoleanJolivetNataf_full.pdf}{An Introduction to Domain Decomposition Methods: algorithms, theory and parallel implementation} \cite{Dolean:2015:IDD} and even more details are given in \href{http://jolivet.perso.enseeiht.fr/thesis.pdf}{P.~Jolivet's PhD manuscrit}.
% subsubsection linear (end)
% subsection ddm_principles (end)