Thread 'Geppetto Test 001 - Segnalazioni e Commenti'

Message boards : Number crunching : Geppetto Test 001 - Segnalazioni e Commenti
entity

Joined: 6 Feb 25
Posts: 8
Credit: 12,258
RAC: 876
Message 29 - Posted: 8 Feb 2025, 18:13:17 UTC - in response to Message 27.  

Observation: running more than one of these concurrently causes BOINC to set all other work to "Waiting to Run". This is on a 128-thread EPYC server. Evidently BOINC reserves 64 threads for each running work unit, since they are specified as 64 CPUs. None of the work units actually uses that many threads during execution. Has anyone tried running these on a system with fewer than 64 threads? I'm running the geppetto-hf-inference application.
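A common volunteer-side workaround (not mentioned in this thread; the values below are illustrative, and only the app name is taken from the post) is an `app_config.xml` in the project directory, which caps how many CPUs BOINC budgets per MT task and how many tasks run at once:

```xml
<!-- Sketch: ~/boinc/projects/boinc.llmentor.org_LLMentorGrid/app_config.xml
     Budgets 8 CPUs per geppetto-hf-inference task instead of 64 and allows
     at most two at a time. Numbers are illustrative, not recommendations. -->
<app_config>
  <app>
    <name>geppetto-hf-inference</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app_version>
    <app_name>geppetto-hf-inference</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>8</avg_ncpus>
  </app_version>
</app_config>
```

After saving the file, use Options → Read config files in the BOINC Manager (or restart the client) for it to take effect.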
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 31 - Posted: 8 Feb 2025, 19:57:52 UTC - in response to Message 26.  

In reply to manalog's message of 8 Feb 2025:
This looks like a problem related to your Ubuntu distribution: perhaps it's too old (20.04) and Docker cannot be found in the archives.
Did you try to
sudo apt-get update
before?

Otherwise, you can use the official installation procedure from the Docker webpage: https://docs.docker.com/engine/install/ubuntu/:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin


However, I suspect the wrapper is not going to run on Ubuntu 20.04: it is currently compiled against a newer version of GLIBC and will probably refuse to run on such an old distribution. Solving this problem is on my list; I need to recompile the wrapper so that it can run even on older distributions.


This was my issue I mentioned above. The first PC installed OK, then I copied the command to the others without checking that it actually completed.
Running sudo apt-get update first and then the install worked.
I also had to sudo the usermod command.

A boinc app shouldn't require an OS upgrade, even a free one.
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 32 - Posted: 8 Feb 2025, 19:59:44 UTC - in response to Message 29.  

In reply to entity's message of 8 Feb 2025:
Observation: Running more than 1 concurrently causes boinc to change all other work to "Waiting to Run". This is on a 128 thread EPYC server. Evidently, boinc reserves 64 threads for each running work unit since they are specified as 64 CPUs. None of the work units actually use that many threads during execution. Has anyone tried running these on a system with less than 64 threads? I'm running the geppetto-hf-inference application.


In progress = 1904
Your queue = 1027

So, no, we haven't been able to run any.
entity

Joined: 6 Feb 25
Posts: 8
Credit: 12,258
RAC: 876
Message 33 - Posted: 8 Feb 2025, 20:22:58 UTC - in response to Message 32.  

minerva-7bI-q6-llama-inference_162315_1739037874.610214_137_0:

Ended with the following computational error:

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

I have libbz2 installed:

./usr/lib64/libbz2.so.1
./usr/lib64/libbz2.so.1.0.8

Is it looking for a 32bit version?
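One way to answer the 32-bit/64-bit question, and to see exactly which shared libraries the binary fails to resolve, is `file` plus `ldd`. A generic sketch (`/bin/ls` stands in for the project binary path, which on this host would be something like `~/boinc/projects/boinc.llmentor.org_LLMentorGrid/llama-boinc`):

```shell
# Inspect a binary's word size and shared-library resolution.
# Replace BIN with the real llama-boinc path before using this.
BIN=/bin/ls
# "ELF 64-bit" vs "ELF 32-bit" answers the bitness question
# (guarded in case the `file` utility is not installed):
command -v file >/dev/null && file "$BIN"
# Any line containing "not found" names a missing library:
ldd "$BIN" | grep "not found" || echo "all shared libraries resolved"
```

If `ldd` reports `libbz2.so.1.0 => not found` while only `libbz2.so.1` and `libbz2.so.1.0.8` exist, the binary was linked against a soname that this distribution does not ship, which matches the symlink diagnosis below in the thread.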
[VENETO] sabayonino

Joined: 6 Feb 25
Posts: 3
Credit: 114,786
RAC: 7,494
Message 35 - Posted: 8 Feb 2025, 22:03:29 UTC

Hi guys
All Minerva WUs fail with the same error (unknown to me).
http://boinc.llmentor.org/LLMentorGrid/result.php?resultid=41462

<core_client_version>8.1.0</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>

</stderr_txt>
]]>
[VENETO] sabayonino

Joined: 6 Feb 25
Posts: 3
Credit: 114,786
RAC: 7,494
Message 36 - Posted: 8 Feb 2025, 22:08:23 UTC - in response to Message 33.  
Last modified: 8 Feb 2025, 22:12:33 UTC

In reply to entity's message of 8 Feb 2025:
minerva-7bI-q6-llama-inference_162315_1739037874.610214_137_0:

Ended with the following computational error:

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

I have libbz2 installed:

./usr/lib64/libbz2.so.1
./usr/lib64/libbz2.so.1.0.8

Is it looking for a 32bit version?


I think the libbz2 shared library should be linked to libbz2.so.1.0.8, or libbz2.so.1.0 is missing (inside the Docker image??)

lrwxrwxrwx root root 15 B  Thu Oct 31 20:27:30 2024  /usr/lib64/libbz2.so.1 ⇒ libbz2.so.1.0.8
lrwxrwxrwx root root 15 B  Thu Oct 31 20:27:30 2024  /usr/lib64/libbz2.so.1.0 ⇒ libbz2.so.1.0.8
.rwxr-xr-x root root 77 KB Thu Oct 31 20:27:32 2024  /usr/lib64/libbz2.so.1.0.8


Or create a symlink pointing to it:
# ln -sf /usr/lib64/libbz2.so.1.0.8 /usr/lib64/libbz2.so.1.0


Or, from inside the /usr/lib64 directory:
[/usr/lib64] # ln -sf /usr/lib64/libbz2.so.1.0.8 libbz2.so.1.0
manalog
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 22 Jan 25
Posts: 12
Credit: 56
RAC: 5
Message 37 - Posted: 8 Feb 2025, 23:54:04 UTC
Last modified: 8 Feb 2025, 23:57:31 UTC

Now the Minerva workunits should be fixed: they should be short (around 20 minutes on a rented 4-core EPYC server), but I still have to fix the estimated FLOPS (so they report themselves as much, much longer). The only dependency of these workunits is "libgomp1".
sudo apt-get install libgomp1


Even this dependency is temporary: my goal is to switch to static binaries so that every program can run regardless of Linux version/distribution. However, this is the most I was able to do in this (full) day of work on the project.

So to people who report problems such as:
- Old GLIBC
- Missing libraries
It's fine to report them, but rest assured they are at the top of the list of problems to be fixed. I think I will recompile all the binaries to support at least Debian 9, so even older versions of Ubuntu such as 20.04 will be supported. Unfortunately, I compiled everything on my personal machines with all libraries up to date, and I didn't think about the linking issue.


To sum up, we currently have two apps in testing:

GEPPETTO DOCKER

This application does inference on a very small GPT-2-like model using Docker + the Hugging Face Transformers library. This is not going to be the preferred way to do inference, because it's slower. However, it's very important for me to have a working Docker + Python environment, because most of the interpretability studies will run this way; it would be hard to rewrite the C++ code from scratch every time... If I find a way to nicely distribute Python packages, then we could ditch Docker entirely. I will try to get in contact with "pianoman" from MLC@Home; I remember he did it somehow.
During the alpha stage of this app, the most frequent errors I encountered on your hosts were:
* Issues with Docker: in roughly 80% of cases they can be solved by following the instructions on the homepage. In the next days I am going to open a new official thread collecting all the Docker issues (that is, the container not starting at all) not related to the instructions mentioned on the homepage. Unfortunately I saw some of these weirder cases in this forum, in the BOINC.Italy forum and on some hosts, and I still have to investigate what is happening.
* Older GLIBC. As mentioned, this must be solved and it's easy to do. I just have to recompile everything; I have a slow machine and slow internet at home, so I need to go back to the office during the working days and do it from there.

MINERVA LLAMA.CPP
No docker required for this application!
This application does inference on a much larger model, a 7B instruction-tuned model (details will follow) with q6 quantization. We are no longer using Python + Docker but an amazing C++ program that I adapted to communicate with BOINC. So the integration should be much better (fraction-done reporting, checkpointing...), and the goal is that no dependencies at all will be needed.
If you tried it in the last hour, please try it again: I updated the app and now it should work.

The only requirement should be libgomp1 on a clean Ubuntu 24.04 setup (the only system I have tried yet, a rented server).
The libbz2 error reported by entity is strange; it should not happen, so I need to investigate.

Please try this new app. Checkpoints may not work very well and the fraction-done reporting is a bit dumb at the moment, but it's something, and the good news is that within a few weeks this app could start being useful to the scientific community: I have some colleagues who need to run inference experiments for their work, so I will propose that they use the new BOINC app to run them.

Be careful! As opposed to Geppetto, which could run even on toasters, Minerva requires at least 6 GB of RAM + disk + network to hold the model. So if you think your machine cannot handle it, you can untick the app in the preferences.
A question going the opposite way: do you think a more "monstrous" app, with something like a 16 GB RAM + disk + network basic requirement, would be too much for BOINC, or something that could be run by users with powerful machines? Because scientifically it would be very useful, even if fewer people run it.

entity

Joined: 6 Feb 25
Posts: 8
Credit: 12,258
RAC: 876
Message 38 - Posted: 9 Feb 2025, 4:11:24 UTC - in response to Message 36.  

In reply to [VENETO] sabayonino's message of 8 Feb 2025:
In reply to entity's message of 8 Feb 2025:
minerva-7bI-q6-llama-inference_162315_1739037874.610214_137_0:

Ended with the following computational error:

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

I have libbz2 installed:

./usr/lib64/libbz2.so.1
./usr/lib64/libbz2.so.1.0.8

Is it looking for a 32bit version?


I think that libbz2 shared libraries should be linked to libbz2.so.1.0.8 or libbz2.so.1.0 is missing (inside the docker image ??)

lrwxrwxrwx root root 15 B  Thu Oct 31 20:27:30 2024  /usr/lib64/libbz2.so.1 ⇒ libbz2.so.1.0.8
lrwxrwxrwx root root 15 B  Thu Oct 31 20:27:30 2024  /usr/lib64/libbz2.so.1.0 ⇒ libbz2.so.1.0.8
.rwxr-xr-x root root 77 KB Thu Oct 31 20:27:32 2024  /usr/lib64/libbz2.so.1.0.8


Or create a symlink pointed to :
# ln -sf /usr/lib64/libbz2.so.1.0.8 /usr/lib64/libbz2.so.1.0


Or inside the /usr/lib64 directory
[/usr/lib64] # ln -sf /usr/lib64/libbz2.so.1.0.8 libbz2.so.1.0


Thanks for the information.
I was thinking about adding a link, but was hesitant to do so out of concern that I might be creating a maintenance headache. Sooner or later libbz2.so.1.0.8 will become libbz2.so.1.0.9, and the .8 version will probably go away, leaving a dead link. When the issue comes up I probably won't remember why that link was there. I'm going to ponder this a little more...
Conan

Joined: 7 Feb 25
Posts: 5
Credit: 0
RAC: 0
Message 41 - Posted: 9 Feb 2025, 7:07:36 UTC - in response to Message 33.  

In reply to entity's message of 8 Feb 2025:
minerva-7bI-q6-llama-inference_162315_1739037874.610214_137_0:

Ended with the following computational error:

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

I have libbz2 installed:

./usr/lib64/libbz2.so.1
./usr/lib64/libbz2.so.1.0.8

Is it looking for a 32bit version?


I have just had 97 work units fail looking for this file, which is already installed on the computer.

I have set No New Work till I can sort it out.

Conan
[VENETO] sabayonino

Joined: 6 Feb 25
Posts: 3
Credit: 114,786
RAC: 7,494
Message 45 - Posted: 9 Feb 2025, 8:14:14 UTC - in response to Message 38.  

Thanks for the information,
I was thinking about adding a link but was hesitant to do that out of concern I might be creating a maintenance headache. Sooner or later libbz2.so.1.0.8 will become libbz2.so.1.0.9 and the .8 version will probably go away leaving a dead link. When the issues come up I probably won't remember why that link was there. I'm going to ponder this a little more...

You're right.
It also depends on how the package manager handles any dead links or libraries within the system tree.
WTBroughton

Joined: 9 Feb 25
Posts: 1
Credit: 11,103
RAC: 1,055
Message 46 - Posted: 9 Feb 2025, 9:49:40 UTC

I set up Linux Mint 22 in VirtualBox on a Windows 11 machine (Ryzen 9). I allocated 10 cores and 32 GB of memory. It's been running Minerva tasks overnight with no errors. I know very little about Linux, so I have unticked the other applications that require Docker, but I will monitor your progress and add them in as necessary.
Good luck with the project it looks very promising.
fzs600

Joined: 8 Feb 25
Posts: 3
Credit: 265,425
RAC: 14,480
Message 65 - Posted: 10 Feb 2025, 16:31:13 UTC - in response to Message 46.  

On this PC, all geppetto-hf-inference v1.00 (mt) tasks have been awaiting validation for 2 days.
http://boinc.llmentor.org/LLMentorGrid/results.php?hostid=65&offset=0&show_names=0&state=2&appid=
kotenok2000

Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 80 - Posted: 14 Feb 2025, 22:30:32 UTC - in response to Message 37.  

You could also ask on the GPUGRID Discord server. They run Python workunits.
kotenok2000

Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 82 - Posted: 16 Feb 2025, 8:11:46 UTC - in response to Message 37.  

Turns out I am still in the MLC@Home Discord server. I have mentioned Pianoman there.
manalog
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 22 Jan 25
Posts: 12
Credit: 56
RAC: 5
Message 83 - Posted: 16 Feb 2025, 10:21:44 UTC

Hi!
The tests went great.
During this week, I worked on the next scientific project: we are going to use the "Transformer Lens" library to do some analysis of transformers' internals. The topic is complicated and I need time to set everything up; the good thing is that it should be easy to integrate this library into BOINC, because I can re-use the Docker app that I used for "Geppetto".
First we are going to run tests on GPT-2 small, just to replicate results already reported in scientific papers. Then we will move on to other models and topics to finally make some new science.
When these new WUs are out, I am going to write a good explanation so everyone can understand what we are doing.


During this week you will also see the following problems solved:
- missing libraries (very easy to fix)
- llama.cpp threads issue: because I don't have a machine with many threads to test on, I didn't notice that the llama.cpp application, when running on CPU (I usually ran the app on the university server with a GPU instead), needs a command-line parameter to set the number of threads. I think it's easy to hook this command-line parameter up to the number of threads decided by the BOINC client (the machine's max threads, or the user's preferences).
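The threads fix described above could be sketched like this. `-t`/`--threads` is llama.cpp's real flag; the binary name, the model filename, and the use of `nproc` (standing in for the CPU count the BOINC client actually grants the task) are placeholders:

```shell
# Sketch: pick a thread count the way a BOINC wrapper might, then build
# the llama.cpp command line. nproc stands in for the CPU count granted
# by the BOINC client; llama-boinc and the model path are placeholders.
NTHREADS=$(nproc)
CMD="./llama-boinc -m minerva-7bI-q6.gguf -t $NTHREADS"
echo "$CMD"    # a real wrapper would exec this instead of echoing it
```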

Regarding GPUs, the answer is a big yes. GPU support is definitely on the list and has been planned since the beginning. It shouldn't be too hard to add GPU support in Docker, but I am going to do it only after all the previous points are solved, because it adds another layer of complexity and it's harder for me to debug, as I don't have an Nvidia CUDA GPU on my personal machine. I need to do it on the HPC server or rent one.

Please let me know if someone reaches Pianoman: somehow MLC@Home was able to run PyTorch apps without Docker, and that would be very good for this project (much easier for volunteers).

Regarding validation of Geppetto: yes, the validator was turned off. I will now run it manually, so you are going to see the last WUs validated soon.
kotenok2000

Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 84 - Posted: 18 Feb 2025, 14:48:33 UTC
Last modified: 18 Feb 2025, 14:51:25 UTC

GPUGRID uses Python environments too.
They used Anaconda for it.
I will ask on their Discord server for help.
<job_desc>
<unzip_input>
<zipfilename>windows_x86_64__cuda1121.zip</zipfilename>
</unzip_input>
<task>
<application>Library/usr/bin/tar.exe</application>
<command_line>xjvf input.tar.bz2</command_line>
<setenv>PATH=$PWD/Library/usr/bin</setenv>
<weight>1</weight>
</task>
<task>
<application>C:/Windows/system32/cmd.exe</application>
<command_line>/c call Scripts\activate.bat && Scripts\conda-unpack.exe && run.bat</command_line>
<setenv>CUDA_DEVICE=$GPU_DEVICE_NUM</setenv>
<setenv>TMPDIR=$PWD</setenv>
<setenv>TEMP=$PWD</setenv>
<setenv>TMP=$PWD</setenv>
<setenv>HOME=$PWD</setenv>
<setenv>PATH=$PWD\Library\bin</setenv>
<setenv>SystemRoot=C:\Windows</setenv>
<setenv>ComSpec=C:\Windows\system32\cmd.exe</setenv>
<stdout_filename>run.log</stdout_filename>
<weight>1000</weight>
<fraction_done_filename>progress</fraction_done_filename>
</task>
</job_desc>
boboviz

Joined: 6 Feb 25
Posts: 3
Credit: 262
RAC: 20
Message 85 - Posted: 21 Feb 2025, 7:43:13 UTC - in response to Message 83.  

Regarding GPUs, the answer is a big yes. GPU support is definitely in the list and was planned since beginning.


This is great news.
But, if possible, I'd prefer a Windows app first... :-)
kotenok2000

Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 86 - Posted: 25 Feb 2025, 10:36:50 UTC
Last modified: 25 Feb 2025, 10:44:09 UTC

I have found this:
https://github.com/BOINC/boinc/wiki/PythonApps

https://bitbucket-archive.softwareheritage.org/projects/je/jeremycowles/pyboinc.html

©2025 Matteo Rinaldi