Reproducing Experiments: Memories of Forgotten Concepts
Resources and Dev Log
- Paper: Memories of Forgotten Concepts;
- The original repository for the project;
- My Fork (note: I have no plan to merge or make contributions to the original one);
- I did this because I had to make a presentation & reproduce experiments for my exam, see: CV - Exam - Paper Presentation
- I use this note to keep the tasks decoupled from the presentation
Problem: I’m required to use Google Colab, and the project didn’t run properly when I followed the original instructions. The aim of this project is to make changes, document and refactor them, and experiment with new tools.
Lesson Learned:
- I made myself more comfortable with the Python Conda tool.
- Refactoring an `environment.yml` requires a lot of time, because each time you have to re-run and re-install everything. Also, conda seems slower than pip. Using `conda install -n myenv package_name` seems to help. It is useful to check package names and versions on conda-forge.org.
- Right now, the conda environment crashes on Google Colab, so I’m installing packages with pip; pip is simply faster and gives fewer issues.
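A minimal `environment.yml` sketch of the kind described above (the environment name, package names, and versions here are illustrative, not the project's actual pins; check them on conda-forge):

```yaml
name: myenv
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pip
  # heavier or Colab-problematic packages go through pip, which is faster
  - pip:
      - gdown
```

Create it with `conda env create -f environment.yml`; adding a single package later via `conda install -n myenv package_name` avoids rebuilding the whole environment from scratch.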
- Google Drive is very punitive about sharing large files with others. After just 2-3 tries using `gdown` I received a message saying I have to wait 24 hours. Thankfully I have the data saved on my Drive and I can use Colab to load it. In fact, to avoid wasting more time (after all, the scripts work), I will load from Colab for the presentation. I read that you can technically authenticate via the Google APIs, but I haven’t tried.
- I found some inefficient code, e.g. in some parts the authors forgot to use `torch.no_grad()`, so it crashed on an L4 GPU.
- I had to use an A100 to compute many memories of an ablated image. Nonetheless, it still takes about 2 hours per image, so I stopped after the first one.
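The `torch.no_grad()` fix looks roughly like this (the `model` and `x` below are placeholders, not the authors' actual code): without it, every forward pass records the autograd graph and keeps activations in GPU memory, which is what exhausts a smaller card like the L4.

```python
import torch

model = torch.nn.Linear(16, 16)   # stand-in for the real diffusion model
x = torch.randn(4, 16)

# Inference only: no_grad() tells autograd not to record operations,
# so no activation graph is kept and GPU memory stays flat.
with torch.no_grad():
    y = model(x)

assert not y.requires_grad  # output is detached from autograd
```

The same effect can be had by decorating an inference function with `@torch.no_grad()`.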
- Next time, to save time: as soon as you generate images, save them to Google Drive.
- Google Colab is very limiting: even though I pay for Pro, I have to keep pretending to use it to avoid getting disconnected. For the next project I need a better solution for idle computing; it would save a lot of time.
Tasks
- Reproduce code on Colab
- Fork repository
- Add changes (saved to Drive) to the repository
- Use `environment.yml` instead of `requirements.txt`
- Make a scripts folder and add scripts to download the data, using code from the sketchbook on Drive
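A sketch of what such a download script could look like, using `gdown`'s Python API (the file ID and output path are hypothetical placeholders, and `gdown` must be installed, e.g. via pip):

```python
"""scripts/download_data.py -- fetch shared Google Drive files (sketch)."""


def download_data(file_id: str, output_path: str):
    """Download one Google Drive file by its ID and return the local path."""
    import gdown  # imported here so the module loads even without gdown installed

    # gdown resolves the Drive ID and streams the file to output_path
    return gdown.download(id=file_id, output=output_path, quiet=False)


# usage (hypothetical ID -- replace with the dataset's real Drive ID):
# download_data("FILE_ID_GOES_HERE", "data/dataset.zip")
```

Keeping the download in a script (instead of notebook cells) makes the Colab setup reproducible and easy to rerun after a disconnect.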
- Edit the requirements file: use an `environment.yml` file with conda instead, see Python Conda
- Add `gdown` to the configuration file
- See if you can fix the “where to run it” section and improve the README documentation steps
- Add a notebooks folder, put the notebooks to run the experiments there, plus some additional explanation and images
- For the presentation, ideally I would like to reproduce a single experiment, i.e. take a single image, apply diffusion and inverse diffusion, and reproduce it
- Add a project structure section to the README, and also cite which parts of the code are taken from which repository, and where exactly
- In the clean Colab notebook, add at the start some variables with the paths from which to recover files (from my Google Drive).
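That first cell could look like this (the folder names under `MyDrive` are hypothetical; `/content/drive` is where Colab mounts Drive after `drive.mount("/content/drive")`):

```python
from pathlib import Path

# Where Colab exposes Google Drive after mounting it
DRIVE_ROOT = Path("/content/drive/MyDrive")

# Hypothetical project folders -- rename to match the actual Drive layout
DATA_DIR = DRIVE_ROOT / "memories_of_forgotten_concepts" / "data"
RESULTS_DIR = DRIVE_ROOT / "memories_of_forgotten_concepts" / "results"
```

Every later cell then builds file paths from these variables, so moving the files means editing a single cell.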