Reproducing the Experiments of “Memories of Forgotten Concepts”

Resources and Dev Log

Problem: I’m required to use Google Colab, and the project didn’t run properly when I followed the original instructions. The aim of this project is to make changes, document and refactor them, and experiment with new tools.

Lessons Learned:

  • I became more comfortable with the Conda tool for Python.
  • Refactoring an environment.yml takes a lot of time, because every change means re-running and re-installing everything. Conda also seems slower than pip. Using conda install -n myenv package_name helps. It is useful to check package names and versions on conda-forge.org.
    • Right now the conda environment crashes on Google Colab, so I’m installing packages with pip; pip is simply faster and causes fewer issues.
  • Google Drive is very punitive about sharing large files with others: after just 2-3 tries with gdown I received a message that I had to wait 24 hours. Thankfully I had saved the data from my Drive, so I can load it in Colab. In fact, to avoid wasting more time (after all, the scripts work), I will load from Colab for the presentation. I read that you can technically authenticate via the Google APIs, but I haven’t tried it.
  • I found some inefficient code, e.g. in some parts the authors forgot to use torch.no_grad(), so it crashed on an L4 GPU.
  • I had to use an A100 (Colab’s closest equivalent to an H100) to generate many memories of an ablated image. Nonetheless, it still takes about 2 hours per image, so I stopped after the first one.
  • Next time, to save time, save images to Google Drive as soon as they are generated.
  • Google Colab is very limiting: even though I pay for Pro, I have to keep simulating activity to avoid getting disconnected. For the next project I need a better solution for idle computing; it would save a lot of time.
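A minimal sketch of what the refactored environment.yml might look like (the environment name, Python version, and package choices are illustrative, not the project’s actual pins):

```yaml
name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  # Packages that were slow or flaky through conda can be moved to the
  # pip subsection, which (as noted above) behaved better on Colab:
  - pip:
      - gdown
      - torch
```

`conda env create -f environment.yml` builds the environment, and `conda install -n myenv package_name` adds a package to it afterwards without recreating everything from scratch.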
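On the torch.no_grad() point above: wrapping inference-only code in no_grad() stops PyTorch from building the autograd graph, which is what slowly exhausts GPU memory over repeated forward passes. A minimal sketch (the linear model and tensor sizes are placeholders, not the paper’s actual diffusion code):

```python
import torch

# Placeholder model standing in for the real diffusion components.
model = torch.nn.Linear(64, 64)
x = torch.randn(8, 64)

# Without no_grad(), each forward pass records the autograd graph
# (activations kept for a backward pass that never happens).
y_with_graph = model(x)

# With no_grad(), no graph is recorded: inference only, far less memory.
with torch.no_grad():
    y_inference = model(x)

# The inference output carries no gradient bookkeeping.
assert y_with_graph.requires_grad and not y_inference.requires_grad
```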

Tasks

  • Reproduce code on Colab
  • Fork repository
  • Add changes (saved to drive) to repository
  • Use environment.yml instead of requirements.txt
  • Make a scripts folder with scripts to download the data, reusing code from the sketchbook on Drive
  • Edit the requirements file: use an environment.yml file with conda instead (see Python Conda)
  • Add gdown to configuration file
  • See if you can fix the “where to run it” instructions and improve the README documentation steps
  • Add a notebooks folder with the notebooks to run, plus some additional explanation and images
  • For the presentation, ideally reproduce one single experiment, i.e. one single image: apply diffusion and inverse diffusion and reproduce it
  • Add a project structure section to the README, and also cite which parts of the code are taken from which repository, and where exactly
  • In the clean Colab notebook, add some variables at the start with the paths from which to recover files (from my Google Drive)
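The last task could look like this at the top of the clean notebook. The folder names under MyDrive are hypothetical placeholders for my actual Drive layout:

```python
from pathlib import Path

# In Colab, mount Drive first:
# from google.colab import drive; drive.mount('/content/drive')

# Hypothetical layout: adjust these to the real Drive folders.
DRIVE_ROOT = Path("/content/drive/MyDrive/forgotten-concepts")
DATA_DIR = DRIVE_ROOT / "data"
RESULTS_DIR = DRIVE_ROOT / "results"
CHECKPOINTS_DIR = DRIVE_ROOT / "checkpoints"

print(DATA_DIR)  # /content/drive/MyDrive/forgotten-concepts/data
```

Keeping every path in one cell means the rest of the notebook never hard-codes a Drive location, so moving the files later only requires editing this cell.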