concatenate by channel
johndpope committed Mar 25, 2024
1 parent 00206f0 commit bbbad33
Showing 8 changed files with 1,887 additions and 328 deletions.
2 changes: 1 addition & 1 deletion Net.py
@@ -687,7 +687,7 @@ def forward(self, x):


# given an image - spit out the mask

# I don't think we need this - see https://github.com/johndpope/Emote-hack/issues/28
# Instantiate the model
# model = FaceLocator()

31 changes: 12 additions & 19 deletions README.md
@@ -25,6 +25,8 @@ The heavy lifting now is implementing the denoise of unet/ integrating attention
- **AnimateAnyone** - https://github.com/jimmyl02/animate/tree/main/animate-anyone
3 training stages here
https://github.com/jimmyl02/animate/tree/main/animate-anyone
- **DiffusedHeads** - (no training code) https://github.com/MStypulkowski/diffused-heads

While this uses a pose guider, it's not hard to imagine a DWPose / facial model driving the animation instead. https://www.reddit.com/r/StableDiffusion/comments/1281iva/new_controlnet_face_model/?rdt=50313&onetap_auto=true


@@ -51,6 +53,8 @@ ideally the network would take a sound (wav2vec stuff) - and show an facial expr

## Face Locator:
The face locator is a separate module that learns to detect and localize the face region in a single input image. It takes a reference image as input and outputs the corresponding face region mask. (DRAFTED - train_stage_0.py)
UPDATE - I think we can replace this work with Alibaba's existing trained model (6.8 GB) as a drop-in replacement that provides mask conditioning: https://github.com/johndpope/Emote-hack/issues/28
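
As a rough illustration of what a face region mask looks like, the sketch below turns a face bounding box into a binary mask. The function name `bbox_to_mask` and the `(x0, y0, x1, y1)` box format are assumptions for illustration only, not code from this repository; a real pipeline would take the box from a face detector and would use tensors rather than nested lists.

```python
from typing import List, Tuple


def bbox_to_mask(height: int, width: int,
                 bbox: Tuple[int, int, int, int]) -> List[List[int]]:
    """Return a height x width mask: 1 inside the box, 0 elsewhere.

    bbox is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive
    (an assumed convention for this sketch).
    """
    x0, y0, x1, y1 = bbox
    # Clamp the box to the image bounds so out-of-range boxes are safe.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 4x6 image with a face box covering columns 1-3, rows 1-2.
mask = bbox_to_mask(4, 6, (1, 1, 4, 3))
```

Such a mask could then be resized to the latent resolution and used as conditioning, which is the role the face locator plays in the description above.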


## Speed Encoder:
The speed encoder takes the audio waveform as input and extracts speed embeddings.
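
The model outline in this README lists `SpeedEncoder.__init__(num_speed_buckets, speed_embedding_dim)`, which suggests that a scalar speed is first mapped to a bucket before an embedding is looked up. A minimal sketch of that bucketing step follows, assuming evenly spaced buckets over a fixed range; `speed_to_bucket`, `min_speed` and `max_speed` are hypothetical names and values, not taken from the repository.

```python
def speed_to_bucket(speed: float, num_speed_buckets: int,
                    min_speed: float = -1.0, max_speed: float = 1.0) -> int:
    """Map a scalar speed to a bucket index in [0, num_speed_buckets - 1].

    Buckets are evenly spaced over [min_speed, max_speed]; values outside
    that range are clamped to the first or last bucket. The range itself
    is an assumption made for this sketch.
    """
    if num_speed_buckets < 1:
        raise ValueError("num_speed_buckets must be at least 1")
    if max_speed <= min_speed:
        raise ValueError("max_speed must be greater than min_speed")
    clamped = min(max(speed, min_speed), max_speed)
    bucket_width = (max_speed - min_speed) / num_speed_buckets
    index = int((clamped - min_speed) / bucket_width)
    # speed == max_speed lands exactly on the upper edge; keep it in range.
    return min(index, num_speed_buckets - 1)
```

The returned index would typically select a row of a learned embedding table of width `speed_embedding_dim`.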
@@ -130,29 +134,14 @@ Note: The sample includes rich tagging. For more details, see `./data/test.json`


### Models / architecture

(flux)



```javascript

- ✅ FramesEncodingVAE
- __init__(input_channels, latent_dim, img_size, reference_net)
- reparameterize(mu, logvar)
- forward(reference_image, motion_frames, speed_value)
- vae_loss(recon_frames, reference_image, motion_frames, reference_mu, reference_logvar, motion_mu, motion_logvar)

- DownsampleBlock
- __init__(in_channels, out_channels)
- forward(x)

- UpsampleBlock
- __init__(in_channels, out_channels)
- forward(x1, x2)

- ✅ ReferenceNet
- __init__(vae_model, speed_encoder, config)
- forward(reference_image, motion_frames, head_rotation_speed)
- __init__(self, config, reference_unet, denoising_unet, vae, dtype)
- forward(self, reference_image, motion_features, timesteps)

- ✅ SpeedEncoder
- __init__(num_speed_buckets, speed_embedding_dim)
@@ -216,5 +205,9 @@ Note: The sample includes rich tagging. For more details, see `./data/test.json`
- has some training code
```


magicanimate code: it has custom blocks for the UNet, which may be very useful when wiring up the attention layers in the UNet.
```javascript
- EMOAnimationPipeline (copied from magicanimate)
- has some training code / this should not need the text encoder / CLIP, to align with the EMO paper.
```

1 change: 1 addition & 0 deletions configs/training/stage0.yaml
@@ -11,6 +11,7 @@ training:
learning_rate: 1.0e-5
num_epochs: 2
use_gpu_video_tensor: True
video_data_dir: '/home/oem/Downloads/CelebV-HQ/celebvhq/35666'
solver:
gradient_accumulation_steps: 1
mixed_precision: 'fp16'
1 change: 1 addition & 0 deletions configs/training/stage1.yaml
@@ -13,6 +13,7 @@ training:
num_epochs: 2
use_gpu_video_tensor: True
prev_frames: 2 # Add this line to specify the number of previous frames to consider
video_data_dir: '/home/oem/Downloads/CelebV-HQ/celebvhq/35666'

solver:
gradient_accumulation_steps: 1
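
The two config hunks above add a `video_data_dir` key pointing at a machine-specific absolute path. A minimal sketch of how a training script might consume that key follows, assuming the YAML has already been parsed into a nested dict with `video_data_dir` under `training`, and that the clips are `.mp4` files. `list_training_videos` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path
from typing import Any, Dict, List


def list_training_videos(config: Dict[str, Any],
                         suffix: str = ".mp4") -> List[Path]:
    """Return the sorted video files found directly in video_data_dir.

    Assumes config["training"]["video_data_dir"] holds the directory path,
    as the hunks above suggest. Fails early if the directory is missing,
    which is likely when a config with a hard-coded home path is reused
    on another machine.
    """
    data_dir = Path(config["training"]["video_data_dir"])
    if not data_dir.is_dir():
        raise FileNotFoundError(f"video_data_dir does not exist: {data_dir}")
    return sorted(
        p for p in data_dir.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
```

Failing early with a clear message is a deliberate choice here: a missing dataset directory is easier to diagnose at startup than as an empty data loader later in training.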