Apply minimum linewidth constraint only to a binary design of the waveguide mode converter #32

Merged (5 commits) on Jan 10, 2023

Conversation

@oskooi (Collaborator) commented Dec 5, 2022

It looks like we are able to generate some useful designs for the broadband waveguide mode converter. Thanks to @mochen4 for assistance in getting this to work. Some additional tuning is likely required to improve the results further.

This required three main changes to the original setup of the multiwavelength minimax topology optimization:

  1. Ensuring that the initial design of the final epoch (in which the minimum linewidth constraint is applied) is binary. We realized that trying to apply the linewidth constraint to a greyscale design will immediately ruin the performance and produce a poor final design. The change involved increasing the number of epochs to include three new β values of 64, 128, and 256.

  2. Turning on subpixel smoothing in the MaterialGrid due to the binarized designs (see the sketch after this list).

  3. Applying damping to the MaterialGrid to penalize intermediate values.
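
To make items 1-3 concrete, here is a minimal sketch of how they map onto Meep's Python API. The names `Nx`, `Ny`, `fcen`, `SiO2`, and `Si` follow `mode_converter_meep_opt.py`; the specific values below are illustrative placeholders, not the tuned settings of this PR.

```python
import numpy as np
import meep as mp

# Illustrative problem parameters (names follow mode_converter_meep_opt.py).
Nx, Ny = 160, 160              # number of design pixels (placeholder)
fcen = 1 / 1.28                # center frequency in 1/um (placeholder)
SiO2 = mp.Medium(index=1.44)
Si = mp.Medium(index=3.45)

# Items 2 and 3: subpixel smoothing of the binarized design and damping of
# intermediate weights are both options of the MaterialGrid.
matgrid = mp.MaterialGrid(
    mp.Vector3(Nx, Ny),
    SiO2,
    Si,
    weights=np.ones((Nx, Ny)),
    do_averaging=True,                 # subpixel smoothing
    damping=0.05 * 2 * np.pi * fcen,   # penalize intermediate weight values
)

# Item 1: extended beta schedule so that the design entering the final epoch
# (where the minimum linewidth constraint is switched on) is already binary.
betas = [8, 16, 32, 64, 128, 256]
```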

The final design using a minimum linewidth constraint of 50 nm (shown below as an image and added to this repository as a CSV file) generated using these changes shows decent performance for the reflectance and transmittance across the six wavelengths:

1. reflectance into mode 1: wavelength (μm), reflectance, reflectance (dB)

refl:, 1.265, 0.000485, -33.139015
refl:, 1.270, 0.001061, -29.741837
refl:, 1.275, 0.002018, -26.950658
refl:, 1.285, 0.005592, -22.523949
refl:, 1.290, 0.008463, -20.724575
refl:, 1.295, 0.012185, -19.141852
worst-case reflectance (dB):, -19.141852

2. transmittance into mode 2: wavelength (μm), transmittance, transmittance (dB)

tran:, 1.265, 0.907546, -0.421314
tran:, 1.270, 0.906566, -0.426007
tran:, 1.275, 0.904175, -0.437477
tran:, 1.285, 0.894339, -0.484978
tran:, 1.290, 0.886602, -0.522712
tran:, 1.295, 0.876944, -0.570282
worst-case transmittance (dB):, -0.570282

A few notes on this data. The optimization history shown below (the maximum over the six wavelengths of the objective function $R+1-T$ vs. iteration number) shows a performance degradation in the final epoch (iterations 300-400). If we inspect the designs at the end of the last two epochs (β of 128 and 256), they differ only slightly, which indicates that even small changes to the design are enough to affect its performance. Also, there are unexpected "spikes" in the optimization history. Could these be due to the inner iterations of the MMA algorithm?

It's possible that these results can be further improved by adjusting the two hyperparameters used in the minimum linewidth constraint function as well as the β scheduling.
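
For reference, the two hyperparameters (presumably a1 and b1, given the glc() code in the diff posted later in this thread) enter the constraint as a shift and a scale. The wrapper below is only a schematic that mirrors the t1/t2 calibration lines in that diff; it is not the repository's code.

```python
def scaled_linewidth_constraints(v, M1, M2, a1, b1):
    """Schematic of the scaled geometric linewidth constraints.

    M1 and M2 are the raw solid/void constraint functions evaluated on the
    design weights v; a1 (scale) and b1 (shift) are the two hyperparameters.
    The constraints are satisfied when both returned values lie at or below
    the epigraph variable t.
    """
    t1 = (M1(v) - b1) / a1
    t2 = (M2(v) - b1) / a1
    return t1, t2
```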

The next steps are to generate designs for additional values of the minimum linewidths beyond 50 nm.

[figure: mode_converter_optimize]

[figure: optimal_design_multi_frq_max_objfunc_hist]

$\beta$ = 128 (next-to-last epoch; not completely binarized yet):
[figure: multifreq_optimal_design_beta128]

$\beta$ = 256 (final epoch):
[figure: multifrq_optimal_design_beta256]

cc @smartalecH

@stevengj (Contributor) commented Dec 6, 2022

Regarding the spikes in the convergence plot: this is a normal feature of nonlinear optimization algorithms. Sometimes, they take too large a step, which turns out to make the objective much worse, and then the algorithm has to somehow "backtrack" and take a smaller step (which in CCSA happens by increasing a penalty term).

This kind of thing is especially prominent in badly conditioned problems (with second derivatives in some directions much larger than in others), corresponding to optimizing along a "narrow ridge" or valley.

@smartalecH commented

So even though your design is "pretty" binary, I'm a little worried we are declaring victory too early, primarily because the linewidth constraints aren't truly in effect. The optimizer is working hard to satisfy them but never gets to a point where it can return to improving performance.

Take a look at some of the convergence plots in our paper (where "larger is better," unlike the current example where "smaller is better"):

[figures: three convergence plots from the paper]

Notice how the optimizer rebounds? This means two things: (1) the GLC constraints are satisfied; (2) the performance is actually being optimized with the constraints active.

Also notice the lower values of $\beta$. I know it's tempting to rely on large values for $\beta$ to enforce binarization. But gradients break down really quickly and make it impossible to optimize (unless you are using our subpixel smoothing trick, which would fix this issue completely).
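
To illustrate the gradient breakdown at large $\beta$: below is a small numpy sketch of the usual smoothed threshold projection (the same functional form as meep.adjoint's `tanh_projection`). Its derivative is proportional to $\beta\,\mathrm{sech}^2(\beta(\rho-\eta))$, which collapses to zero for pixels away from the threshold as $\beta$ grows. This is illustrative only, not code from this PR.

```python
import numpy as np

def tanh_projection(rho, beta, eta=0.5):
    """Smoothed threshold projection of the filtered weights rho onto [0, 1]."""
    num = np.tanh(beta * eta) + np.tanh(beta * (rho - eta))
    den = np.tanh(beta * eta) + np.tanh(beta * (1 - eta))
    return num / den

def tanh_projection_grad(rho, beta, eta=0.5):
    """Derivative d(projection)/d(rho) = beta * sech^2(beta*(rho - eta)) / den."""
    den = np.tanh(beta * eta) + np.tanh(beta * (1 - eta))
    return beta / (np.cosh(beta * (rho - eta)) ** 2 * den)

rho = np.linspace(0, 1, 101)
for beta in [8, 64, 256]:
    g = tanh_projection_grad(rho, beta)
    # For large beta, pixels more than ~1/beta away from eta receive essentially
    # zero gradient, so the optimizer loses its signal without subpixel smoothing.
    print(f"beta={beta}: max grad = {g.max():.1f}, "
          f"grad at rho=0.3: {tanh_projection_grad(0.3, beta):.2e}")
```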

@oskooi (Collaborator, Author) commented Dec 8, 2022

Notice how the optimizer rebounds? This means two things: (1) the GLC constraints are satisfied; (2) the performance is actually being optimized with the constraints active.

Good point. For a minimum lengthscale constraint of 90 nm (results shown below), it does seem that the optimizer improves the objective function in the final epoch (in which the constraint is activated). The final design also shows decent broadband performance.

As a separate issue, when I measure the minimum lengthscale of the 90 nm design using @mawc2019's ruler in this repository, I get a value of 63 nm. Since the design pixels are 10 nm in length, this means there is roughly a three-pixel mismatch between the GLC constraint and the ruler. The image of the final design also does not show sharp corners because the one-pixel outer boundary of the design region has been masked to preclude such features.

[figure: optimal_design_multi_frq_4a_max_objfunc_hist]

[figure: mode_converter_90nm]

[figure: meep_min_length_90nm]

1. reflectance into mode 1

refl:, 1.265, 0.047344, -13.247368
refl:, 1.270, 0.040014, -13.977880
refl:, 1.275, 0.033221, -14.785876
refl:, 1.285, 0.021526, -16.670418
refl:, 1.290, 0.016720, -17.767528
refl:, 1.295, 0.012651, -18.978828
worst-case reflectance (dB):, -13.247368

2. transmittance into mode 2

tran:, 1.265, 0.686657, -1.632605
tran:, 1.270, 0.682052, -1.661827
tran:, 1.275, 0.675988, -1.700610
tran:, 1.285, 0.659137, -1.810243
tran:, 1.290, 0.648375, -1.881739
tran:, 1.295, 0.636363, -1.962949
worst-case transmittance (dB):, -1.962949

CSV file of final design:
mode_converter_90nm.csv

@mawc2019 (Collaborator) commented Dec 8, 2022

As a separate issue, when I measure the minimum lengthscale of the 90 nm design using @mawc2019's ruler in this repository, I get a value of 63 nm.

When I use another version of the ruler, the result is 56.25 nm, which deviates further from the expected value. When I use @mfschubert's version, the result is 100 nm, which matches the expected value well. I will look into this issue in the next few days.

@smartalecH commented

OK, so we are making progress. But as discussed, there are a few things to consider:

(1) In the last epoch, when we "calibrate" the dummy parameter (t), we need to take into account the values of the objective functions (the FOMs) and the GLC constraints themselves.

(2) This is the "fun part" of TO! Until somebody automates this part (any takers?), each problem requires a bit of detective work to nail down the proper hyperparameters. For example, in addition to the dummy parameter, it's always good to look at the gradients of the FOMs and the constraint functions. The gradients will tell you a lot about what's going on. When trying to nail down a proper c0 value (for the GLC), you can look at the indicator functions themselves. I typically save images of all the gradients/indicator functions at each optimization iteration so I can do a post-mortem on the results (a sketch of this bookkeeping follows).
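
A minimal sketch of that bookkeeping (not the repository's code; the array shapes and names are assumptions): dump the FOM gradients, the constraint gradients, and the dummy variable at every iteration so they can be imaged afterwards.

```python
import os
import numpy as np

def save_diagnostics(iteration, fom_grads, glc_grads, t, outdir="diagnostics"):
    """Save per-iteration data for a post-mortem of the optimization run."""
    os.makedirs(outdir, exist_ok=True)
    np.savez(
        os.path.join(outdir, f"iter{iteration:04d}.npz"),
        fom_grads=fom_grads,   # e.g. shape (num_wavelengths, Nx*Ny)
        glc_grads=glc_grads,   # e.g. shape (2, Nx*Ny): solid and void constraints
        t=t,                   # current value of the dummy/epigraph variable
    )

# Afterwards, load a given iteration and image the gradients, e.g.
#   data = np.load("diagnostics/iter0123.npz")
#   plt.imshow(data["fom_grads"][0].reshape(Nx, Ny))
# to look for NaNs, asymmetries, or a constraint gradient that dwarfs the
# FOM gradients (which also helps when choosing c0).
```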

@stevengj (Contributor) commented Dec 8, 2022

  1. @mawc2019's ruler code does not know the boundary conditions. You have to manually extend the image with the waveguides on the two sides in order to take those into account (see the sketch after this list).
  2. It looks like there is a clear lengthscale violation here when you add in the waveguides: a sharp corner at the left (and the right) at the junction with the waveguides. According to @smartalecH, masking by 1 pixel should handle this (it should filter as if that 1 pixel were infinitely extended), but according to @mochen4 there might be a problem with this approach; he padded to 20 pixels, and more generally padding/masking to 1 filter radius should suffice.
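
Regarding point 1, a rough sketch of the manual extension: pad the binary design along the propagation axis with stubs of the input/output waveguides before running ruler.py. The orientation, the port width, and how ruler.py is invoked are assumptions, not code from this repository.

```python
import numpy as np

def pad_with_waveguides(design, port_width_px, pad_px):
    """Extend a binary design of shape (Nx, Ny) with stubs of the input/output
    waveguides on the left and right edges so the ruler sees the actual
    boundary conditions rather than an isolated pattern.

    Assumes the waveguides are centered along y and propagate along x;
    port_width_px is the waveguide width in pixels, pad_px the stub length.
    """
    nx, ny = design.shape
    stub = np.zeros((pad_px, ny), dtype=design.dtype)
    y0 = (ny - port_width_px) // 2
    stub[:, y0:y0 + port_width_px] = 1  # solid waveguide core
    return np.concatenate([stub, design, stub], axis=0)

# Hypothetical usage:
#   design = np.loadtxt("mode_converter_90nm.csv", delimiter=",")
#   padded = pad_with_waveguides(design.round(), port_width_px=40, pad_px=40)
# and then measure the minimum length scale of `padded` with ruler.py.
```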

@mawc2019 (Collaborator) commented

The underestimated minimum length scale given by ruler.py is due to the features in the red circles below.
[image: features circled in red]

If the left and right sides of the pattern are extended as follows, the minimum length scale estimated by ruler.py is 93.75 nm, which is close to the declared value, namely, 90 nm.
[image: design pattern with the left and right sides extended]

@mawc2019's ruler code does not know the boundary conditions. You have to manually extend the image with the waveguides on the 2 sides in order to take those into account.

In the current version of ruler.py, a design pattern is padded by its boundary values, so at least for the design pattern here the code was expected to produce the correct result. However, if a design pattern contains features that are close to the boundaries, something unexpected happens: the narrow zones between those features and the boundaries give rise to an incorrect minimum length scale.

@oskooi (Collaborator, Author) commented Dec 13, 2022

By enlarging the mask at the edge of the design region from one pixel to the filter radius, the measured lengthscales of the final designs (using @mawc2019's ruler) are now more consistent with the imposed lengthscale constraint. Based on this change, as well as adjustment of the constraint-function hyperparameters, I have added designs with lengthscale constraints of 50 nm, 60 nm, 70 nm, 80 nm, and 90 nm. The measured lengthscale is consistently larger than the imposed constraint except for the 80 nm case, for which the measured lengthscale is 75 nm (within one pixel dimension). The performance metrics for each design have been added to the README.md along with an image.
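
A hedged sketch of what the enlarged mask looks like (variable names and the exact border layout are assumptions; the repository's mapping function may differ): instead of fixing only the outermost pixel, fix a border whose width equals the filter radius, solid where the waveguides attach and void elsewhere.

```python
import numpy as np

def apply_border_mask(weights, filter_radius_px, port_width_px):
    """Fix a border of width filter_radius_px around the (Nx, Ny) design
    weights: solid (1) where the input/output waveguides attach on the
    left/right edges, void (0) everywhere else along the border."""
    w = weights.copy()
    nx, ny = w.shape
    r = filter_radius_px
    # void border on all four sides
    w[:r, :] = 0
    w[-r:, :] = 0
    w[:, :r] = 0
    w[:, -r:] = 0
    # solid stubs where the waveguides meet the design region
    y0 = (ny - port_width_px) // 2
    w[:r, y0:y0 + port_width_px] = 1
    w[-r:, y0:y0 + port_width_px] = 1
    return w
```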

@stevengj (Contributor) commented

It would be good to compare your objective function max(R + 1 - T) to the same quantity for Ian's design: ideally we should be doing at least as well as Ian's structure on this metric, since he's optimizing a somewhat different objective. If the max(R + 1 - T) data is not easily accessible for Ian's design, you can bound it above by max(R + 1 - T) ≤ max(R) + max(1 - T).
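
As a tiny worked example of the bound, using the per-wavelength values reported earlier in this thread for the 50 nm design:

```python
import numpy as np

# Reflectance and transmittance of the 50 nm design at the six wavelengths
# (values copied from the tables at the top of this PR).
R = np.array([0.000485, 0.001061, 0.002018, 0.005592, 0.008463, 0.012185])
T = np.array([0.907546, 0.906566, 0.904175, 0.894339, 0.886602, 0.876944])

exact = np.max(R + 1 - T)          # max over wavelengths of R + (1 - T)
bound = np.max(R) + np.max(1 - T)  # upper bound; for this design both maxima
                                   # occur at 1.295 um, so the bound is tight
print(f"max(R + 1 - T) = {exact:.4f}, bound = {bound:.4f}")
```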

@ianwilliamson (Collaborator) commented

It would be good to compare your objective function max(R + 1 - T) to the same quantity for Ian's design

It should be possible to compute this quantity since it depends on the reflection and transmission that are calculated by the script checked into this repository.

Another option for comparison here would be the quantities that the script in this repository reports: worst-case transmission and worst-case reflection.

@smartalecH commented

As discussed, those flat regions in the beginning are odd. They could be due to the subpixel smoothing... they could be due to the damping factor, or some combination of the two?

Also, the hard flat regions at the end of the optimization are somewhat discouraging... my experience has taught me to expect something much smoother and more organic (like asymptotic convergence).

So as discussed, we should look at $t$, the gradients, and the indicator functions to try to gain some intuition...

@oskooi (Collaborator, Author) commented Dec 18, 2022

(1) In the last epoch, when we "calibrate" the dummy parameter (t), we need to take into account the values of the objective functions (the FOMs) and the GLC constraints themselves.

Specifying an initial value for the epigraph variable based on the objective function and the GLC constraints seems to make the final design worse compared to setting the epigraph variable based on the objective function alone (at least for the test case involving a minimum lengthscale of 70 nm).

Why does this happen? As shown in the plot of the epigraph variable and objective function (R+1-T) vs. iteration number (second figure below), this is probably because the value of the GLC constraint is two orders of magnitude larger than the objective function (~100 vs. ~1). With such a large disparity in magnitude, the optimizer is mainly working to satisfy the GLC constraint rather than minimizing the objective function.

As discussed, those flat regions in the beginning are odd. They could be due to the subpixel smoothing... they could be due to the damping factor, or some combination of the two?

I tried modifying the setup such that damping and subpixel smoothing were applied separately rather than together, but this did not seem to affect the results. Specifically, I applied damping with no subpixel smoothing for the first two epochs (β = 8, 16), in which the design is mostly greyscale, and subpixel smoothing with no damping for the remaining epochs (β = 32, 64, 128, 256), in which the design is mostly binarized.

[figure: optimal_design_multi_frq_max_objfunc_hist_1]

[figure: optimal_design_multi_frq_max_objfunc_hist_2]

final design:
[figure: meep_min_length_70nm_f]

reflectance/transmittance spectra of final design (performance is worse than reference design in this PR)

refl:, 1.265, 0.180689, -7.430678
refl:, 1.270, 0.176486, -7.532888
refl:, 1.275, 0.172268, -7.637961
refl:, 1.285, 0.163559, -7.863258
refl:, 1.290, 0.159343, -7.976674
refl:, 1.295, 0.155480, -8.083249
worst-case reflectance (dB):, -7.430678
tran:, 1.265, 0.556878, -2.542402
tran:, 1.270, 0.564747, -2.481457
tran:, 1.275, 0.570994, -2.433684
tran:, 1.285, 0.579970, -2.365943
tran:, 1.290, 0.582900, -2.344057
tran:, 1.295, 0.584649, -2.331048
worst-case transmittance (dB):, -2.542402
diff --git a/waveguide_mode_converter/mode_converter_meep_opt.py b/waveguide_mode_converter/mode_converter_meep_opt.py
index bcfdb6b..3aa26d1 100644
--- a/waveguide_mode_converter/mode_converter_meep_opt.py
+++ b/waveguide_mode_converter/mode_converter_meep_opt.py
@@ -183,7 +183,7 @@ def c(result, x, gradient, eta, beta):
 
     # backpropagate the gradients through mapping function
     my_grad = np.zeros(dJ_du.shape)
-    for k in range(opt.nf):
+    for k in range(len(frqs)):
         my_grad[:, k] = tensor_jacobian_product(mapping, 0)(
             v,
             eta,
@@ -242,6 +242,13 @@ def glc(result, x, gradient, beta):
     gradient[0,1:] = g1.flatten()
     gradient[1,1:] = g2.flatten()
 
+    t1 = (M1(v) - b1) / a1
+    t2 = (M2(v) - b1) / a1
+
+    print(f"glc:, {result[0]}, {result[1]}, {t1}, {t2}")
+
+    return max(t1, t2)
+
 
 def straight_waveguide():
     """Computes the DFT fields from the mode source in a straight waveguide
@@ -301,12 +308,15 @@ def straight_waveguide():
     return input_flux, input_flux_data
 
 
-def mode_converter_optimization(input_flux, input_flux_data):
+def mode_converter_optimization(input_flux, input_flux_data, use_damping,
+                                use_epsavg):
     """Sets up the adjoint optimization of the waveguide mode converter.
 
     Args:
       input_flux: 1darray of DFT fields from normalization run.
       input_flux_data: DFT fields object returned by `get_flux_data`.
+      use_damping: whether to use the damping feature of `MaterialGrid`.
+      use_epsavg: whether to use subpixel smoothing in `MaterialGrid`.
 
     Returns:
       A `meep.adjoint.OptimizationProblem` class object.
@@ -316,8 +326,8 @@ def mode_converter_optimization(input_flux, input_flux_data):
         SiO2,
         Si,
         weights=np.ones((Nx,Ny)),
-        do_averaging=True,
-        damping=0.05*2*np.pi*fcen,
+        do_averaging=True if use_epsavg else False,
+        damping=0.02*2*np.pi*fcen if use_damping else 0,
     )
 
     matgrid_region = mpa.DesignRegion(
@@ -410,8 +420,6 @@ def mode_converter_optimization(input_flux, input_flux_data):
 if __name__ == '__main__':
     input_flux, input_flux_data = straight_waveguide()
 
-    opt = mode_converter_optimization(input_flux, input_flux_data)
-
     algorithm = nlopt.LD_MMA
 
     # number of design parameters
@@ -437,9 +445,10 @@ if __name__ == '__main__':
     epivar_history = []
     cur_iter = [0]
 
+    beta_thresh = 16
     betas = [8, 16, 32, 64, 128, 256]
     max_evals = [80, 100, 120, 130, 150, 100]
-    tol_epi = np.array([1e-4] * opt.nf)
+    tol_epi = np.array([1e-4] * len(frqs))
     tol_lw = np.array([1e-4] * 2)
     for beta, max_eval in zip(betas, max_evals):
         solver = nlopt.opt(algorithm, n + 1)
@@ -452,10 +461,21 @@ if __name__ == '__main__':
             lambda r, x, g: c(r, x, g, eta_i, beta),
             tol_epi,
         )
+
+        opt = mode_converter_optimization(
+            input_flux,
+            input_flux_data,
+            True, # use_damping
+            False if beta <= beta_thresh else True, # use_epsavg
+        )
+
         # apply the minimum linewidth constraint
         # only in the final "epoch" to an initial
         # binary design from the previous epoch
         if beta == betas[-1]:
+            res = np.zeros(2)
+            grd = np.zeros((2,n+1))
+            t = glc(res, x, grd, beta)
             solver.add_inequality_mconstraint(
                 lambda r, x, g: glc(r, x, g, beta),
                 tol_lw,
@@ -464,12 +484,16 @@ if __name__ == '__main__':
         # execute a single forward run before the start of each
         # epoch and manually set the initial epigraph variable to
         # slightly larger than the largest value of the objective
-        # function over the six wavelengths.
+        # function over the six wavelengths (and the lengthscale constraint).
         t0 = opt(
             [mapping(x[1:], eta_i, beta)],
             need_gradient=False,
         )
-        x[0] = np.amax(t0[0]) + 0.1
+        x[0] = np.amax(t0[0])
+        if beta == betas[-1]:
+            x[0] = 1.05 * max(x[0], t)
+        else:
+            x[0] = x[0] + 0.1
         print(f"data:, {beta}, {t0[0]}, {x[0]}")
 
         x[:] = solver.optimize(x)

@mawc2019 (Collaborator) commented

As discussed, those flat regions in the beginning are odd. They could be due to the subpixel smoothing... they could be due to the damping factor, or some combination of the two?

I tried modifying the setup such that damping and subpixel smoothing were applied separately rather than together, but this did not seem to affect the results.

I maintain that the initial flat region is due to incorrect adjoint gradients around the initial guess, i.e., x=0.5*np.ones(Nx*Ny), even with do_averaging=False and damping=0, as tested here.
Fortunately, the adjoint gradients are correct at a random design pattern (and maybe also correct at many other design patterns). Consequently, if the initial guess is a random structure, the initial flat region does not appear, as demonstrated here.
(Maybe in some cases, the initial flat region does not appear even though the adjoint gradients are incorrect, but such a lucky situation does not seem to happen every time.)

@oskooi (Collaborator, Author) commented Dec 19, 2022

Consequently, if the initial guess is a random structure, the initial flat region does not appear

Even with a random rather than a uniform initial structure, the "flat region" at the start of the first epoch is still present.

@smartalecH commented

The best thing to do at this point is to closely look at all the outputs (gradients, indicator functions, FOM values, $t$, etc.) and try to piece together what's going on (and why).

The fact that the dummy parameter started to go down and then flat-lined shows that something is going on. Looking at all the data is the only way to figure out what that something is.

@mawc2019 (Collaborator) commented

Even with a random rather than a uniform initial structure, the "flat region" at the start of the first epoch is still present.

Perhaps my previous test on a random structure was not representative. What is your random seed? I can test the gradients for your random structure.

@smartalecH commented

As discussed, we could try "scheduling" the a1 parameter (like we do with $\beta$). For example, start with a relatively high a1 (e.g., 0.1) and gradually ramp it down (0.01, 0.001, etc.).

I still think there is something else going on here...

@smartalecH commented

We also discussed the importance of checking the gradient as we go... there are some obvious checks you can perform just by looking at it (weird NaNs, asymmetries, etc.), but it's also good to do some actual checks (e.g., with a directional derivative) to gauge things quantitatively.
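
A minimal sketch of such a quantitative check (the f_and_grad callable is a stand-in for a wrapper around the forward/adjoint solves; this is not code from this repository):

```python
import numpy as np

def directional_derivative_check(f_and_grad, x, eps=1e-4, seed=0):
    """Compare the adjoint directional derivative g.d with a central finite
    difference of the objective along a random direction d.

    f_and_grad(x) must return (f, g), where g is the gradient of f at x.
    Note: this costs three extra objective evaluations, so use sparingly.
    """
    rng = np.random.default_rng(seed)
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)
    f0, g = f_and_grad(x)
    fp, _ = f_and_grad(x + eps * d)
    fm, _ = f_and_grad(x - eps * d)
    fd = (fp - fm) / (2 * eps)   # finite-difference estimate
    adj = np.dot(g, d)           # adjoint estimate
    print(f"finite diff: {fd:.6e}, adjoint: {adj:.6e}, "
          f"rel err: {abs(fd - adj) / abs(fd):.3e}")
```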

@stevengj (Contributor) commented

Another thing to do is to call opt.set_param("verbosity", 1) to see the internal progress of the MMA or CCSA algorithm and check whether it is getting stuck on inner iterations (e.g., whether the "rho" penalty parameter is increasing without bound). See the internal parameters for MMA in the NLopt docs.
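
For reference, a minimal usage sketch (this assumes an NLopt version new enough to expose set_param on the Python nlopt.opt object; n is a placeholder):

```python
import nlopt

n = 160 * 160  # number of design parameters (placeholder)
solver = nlopt.opt(nlopt.LD_MMA, n + 1)  # +1 for the epigraph variable
# Print the internal progress of MMA/CCSA, including the "rho" penalty
# parameter, to see whether the algorithm is stuck backtracking on inner
# iterations.
solver.set_param("verbosity", 1)
```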

@oskooi (Collaborator, Author) commented Dec 23, 2022

We also discussed the importance of checking the gradient as we go... there are some obvious checks you can perform just by looking at it (weird NaNs, asymmetries, etc.)

There seems to be an inconsistency between the final design (binarized) and its gradient. While the $L_\infty$ norm of the gradient does not reveal NaNs, etc., the regions in which its magnitude is largest do not seem to correlate with the topology of the final design. In fact, there seems to be practically no correlation between the final design and its gradient (or sensitivity map).

We would expect the gradient map to be mostly nonzero along the boundaries where the design transitions discontinuously from a weight of 0 to 1.

1. final design
[figure: optimal_design_beta256]

2. gradient for a single wavelength [needs to be rotated 90° CW to be consistent with the design in (1)]
[figure: gradient_iter396_frq2]

3. objective function history (maximum over six wavelengths)
[figure: optimal_design_multi_frq_max_objfunc_hist]

@smartalecH commented

Try looking at the gradient before backpropagating through the smoothing filter.

Maybe superimpose it on the structure (and make sure your image rotations are consistent, etc.).

@oskooi (Collaborator, Author) commented Jan 3, 2023

Five major changes which, taken together, seem to produce better results:

  1. The objective function is now [R, 1-T] across the six wavelengths rather than R+1-T (see the sketch after this list).
  2. The edge of the design region is padded by a filter radius rather than a single pixel, which produces measured length scales of the final designs that are more consistent with the imposed constraint.
  3. The initial value of the epigraph variable of the final epoch (in which the linewidth constraint is imposed) now takes into account the value of the constraint itself to ensure a feasible starting point (previously the choice of the initial epigraph variable only involved the objective function).
  4. The GLC hyperparameters have been modified to produce final designs which do not significantly degrade the unconstrained designs at the start of the final epoch.
  5. Damping of the design weights is used for the early epochs in which the design is mostly greyscale and subpixel averaging of the design weights is used in the later epochs in which the design is mostly binarized.
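
Regarding item 1, a hedged sketch of the epigraph form of the new objective: with per-wavelength objectives $R_k$ and $1-T_k$, minimize the dummy variable $t$ subject to $R_k \le t$ and $1-T_k \le t$ for all six wavelengths, passed to NLopt as a vector-valued inequality constraint. The helper evaluate_R_T_and_grads() is hypothetical; the actual script couples this to meep.adjoint.

```python
import nlopt
import numpy as np

num_wavelengths = 6
n = 160 * 160  # number of design parameters (placeholder)

def epigraph_objective(x, grad):
    """Objective is just the dummy (epigraph) variable t = x[0]."""
    if grad.size > 0:
        grad[:] = 0
        grad[0] = 1
    return x[0]

def epigraph_constraints(result, x, gradient):
    """Constraints R_k - t <= 0 and (1 - T_k) - t <= 0 for each wavelength k,
    where v = x[1:] are the design weights. evaluate_R_T_and_grads() is a
    hypothetical stand-in for the forward and adjoint simulations."""
    t, v = x[0], x[1:]
    R, T, dR, dT = evaluate_R_T_and_grads(v)  # R, T: (6,); dR, dT: (6, n)
    result[:num_wavelengths] = R - t
    result[num_wavelengths:] = (1 - T) - t
    gradient[:, 0] = -1.0                     # derivative w.r.t. t
    gradient[:num_wavelengths, 1:] = dR
    gradient[num_wavelengths:, 1:] = -dT

solver = nlopt.opt(nlopt.LD_MMA, n + 1)
solver.set_min_objective(epigraph_objective)
solver.add_inequality_mconstraint(
    epigraph_constraints,
    np.array([1e-4] * (2 * num_wavelengths)),
)
```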

@oskooi (Collaborator, Author) commented Jan 10, 2023

I have added an additional set of results for designs with imposed minimum feature sizes of 100 nm, 125 nm, 150 nm, 175 nm, 200 nm, and 225 nm. The results in the README.md have been updated to include only the worst-case reflectance and transmittance values for each design (rather than the per-wavelength values).

These results, however, are a bit inconsistent: e.g., the design with an imposed minimum feature size of 200 nm has a measured feature size of 325 nm and significantly outperforms the design with an imposed feature size of 175 nm (measured feature size of 175 nm). This suggests that the results can probably be improved further.
