Thread 'Geppetto Test 001 - Segnalazioni e Commenti'

Message boards : Number crunching : Geppetto Test 001 - Segnalazioni e Commenti
entity

Joined: 6 Feb 25
Posts: 8
Credit: 12,258
RAC: 876
Message 29 - Posted: 8 Feb 2025, 18:13:17 UTC - in response to Message 27.  

Observation: running more than one of these concurrently causes BOINC to set all other work to "Waiting to Run". This is on a 128-thread EPYC server. Evidently BOINC reserves 64 threads for each running work unit, since they are specified as 64 CPUs. None of the work units actually uses that many threads during execution. Has anyone tried running these on a system with fewer than 64 threads? I'm running the geppetto-hf-inference application.
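A common volunteer-side workaround (not mentioned in this thread; the values below are illustrative, and only the app name is taken from the post) is an `app_config.xml` in the project directory, which caps how many CPUs BOINC budgets per MT task and how many tasks run at once:

```xml
<!-- Sketch: ~/boinc/projects/boinc.llmentor.org_LLMentorGrid/app_config.xml
     Budgets 8 CPUs per geppetto-hf-inference task instead of 64 and allows
     at most two at a time. Numbers are illustrative, not recommendations. -->
<app_config>
  <app>
    <name>geppetto-hf-inference</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app_version>
    <app_name>geppetto-hf-inference</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>8</avg_ncpus>
  </app_version>
</app_config>
```

After saving the file, use Options → Read config files in the BOINC Manager (or restart the client) for it to take effect.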
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 31 - Posted: 8 Feb 2025, 19:57:52 UTC - in response to Message 26.  

In reply to manalog's message of 8 Feb 2025:
This looks like a problem related to your Ubuntu distribution: perhaps it's too old (20.04) and Docker cannot be found in the archives.
Did you try to
sudo apt-get update
before?

Otherwise, you can use the official installation procedure from the Docker webpage: https://docs.docker.com/engine/install/ubuntu/:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin


However, I suspect the wrapper is not going to run on Ubuntu 20.04: it is currently compiled against a newer version of GLIBC and will probably refuse to run on such an old distribution. Solving this problem is on my list; I need to recompile the wrapper so that it can run even on older distributions.


This was my issue I mentioned above. The first PC installed OK, then I copied the command to the others without checking that it actually completed.
Running sudo apt-get update first and then the install worked.
I also had to sudo the usermod command.

A boinc app shouldn't require an OS upgrade, even a free one.
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 32 - Posted: 8 Feb 2025, 19:59:44 UTC - in response to Message 29.  

In reply to entity's message of 8 Feb 2025:
Observation: Running more than 1 concurrently causes boinc to change all other work to "Waiting to Run". This is on a 128 thread EPYC server. Evidently, boinc reserves 64 threads for each running work unit since they are specified as 64 CPUs. None of the work units actually use that many threads during execution. Has anyone tried running these on a system with less than 64 threads? I'm running the geppetto-hf-inference application.


In progress = 1904
Your queue = 1027

So, no, we haven't been able to run any.
entity

Joined: 6 Feb 25
Posts: 8
Credit: 12,258
RAC: 876
Message 33 - Posted: 8 Feb 2025, 20:22:58 UTC - in response to Message 32.  

minerva-7bI-q6-llama-inference_162315_1739037874.610214_137_0:

Ended with the following computational error:

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

I have libbz2 installed:

./usr/lib64/libbz2.so.1
./usr/lib64/libbz2.so.1.0.8

Is it looking for a 32bit version?
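One way to answer the 32-bit/64-bit question, and to see exactly which shared libraries the binary fails to resolve, is `file` plus `ldd`. A generic sketch (`/bin/ls` stands in for the project binary path, which on this host would be something like `~/boinc/projects/boinc.llmentor.org_LLMentorGrid/llama-boinc`):

```shell
# Inspect a binary's word size and shared-library resolution.
# Replace BIN with the real llama-boinc path before using this.
BIN=/bin/ls
# "ELF 64-bit" vs "ELF 32-bit" answers the bitness question
# (guarded in case the `file` utility is not installed):
command -v file >/dev/null && file "$BIN"
# Any line containing "not found" names a missing library:
ldd "$BIN" | grep "not found" || echo "all shared libraries resolved"
```

If `ldd` reports `libbz2.so.1.0 => not found` while only `libbz2.so.1` and `libbz2.so.1.0.8` exist, the binary was linked against a soname that this distribution does not ship, which matches the symlink diagnosis below in the thread.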
[VENETO] sabayonino

Joined: 6 Feb 25
Posts: 3
Credit: 114,786
RAC: 7,494
Message 35 - Posted: 8 Feb 2025, 22:03:29 UTC

Hi guys
All Minerva WUs fail with the same error (unknown to me).
http://boinc.llmentor.org/LLMentorGrid/result.php?resultid=41462

<core_client_version>8.1.0</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>

</stderr_txt>
]]>
[VENETO] sabayonino

Joined: 6 Feb 25
Posts: 3
Credit: 114,786
RAC: 7,494
Message 36 - Posted: 8 Feb 2025, 22:08:23 UTC - in response to Message 33.  
Last modified: 8 Feb 2025, 22:12:33 UTC

In reply to entity's message of 8 Feb 2025:
minerva-7bI-q6-llama-inference_162315_1739037874.610214_137_0:

Ended with the following computational error:

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

I have libbz2 installed:

./usr/lib64/libbz2.so.1
./usr/lib64/libbz2.so.1.0.8

Is it looking for a 32bit version?


I think the libbz2 shared library should be linked to libbz2.so.1.0.8, or libbz2.so.1.0 is missing (inside the Docker image??)

lrwxrwxrwx root root 15 B  Thu Oct 31 20:27:30 2024  /usr/lib64/libbz2.so.1 ⇒ libbz2.so.1.0.8
lrwxrwxrwx root root 15 B  Thu Oct 31 20:27:30 2024  /usr/lib64/libbz2.so.1.0 ⇒ libbz2.so.1.0.8
.rwxr-xr-x root root 77 KB Thu Oct 31 20:27:32 2024  /usr/lib64/libbz2.so.1.0.8


Or create a symlink pointing to it:
# ln -sf /usr/lib64/libbz2.so.1.0.8 /usr/lib64/libbz2.so.1.0


Or, from inside the /usr/lib64 directory:
[/usr/lib64] # ln -sf /usr/lib64/libbz2.so.1.0.8 libbz2.so.1.0
manalog
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 22 Jan 25
Posts: 12
Credit: 56
RAC: 5
Message 37 - Posted: 8 Feb 2025, 23:54:04 UTC
Last modified: 8 Feb 2025, 23:57:31 UTC

Now the Minerva workunits should be fixed: they should be short (around 20 minutes on a rented 4-core EPYC server), but I still have to fix the estimated FLOPS (so they report themselves as much, much longer). The only dependency of these workunits is "libgomp1".
sudo apt-get install libgomp1


Even this dependency is temporary: my goal is to switch to static binaries so that every program can run regardless of Linux version/distribution. However, this is the most I was able to do in this (full) day of work on the project.

So to people who report problems such as:
- Old GLIBC
- Missing libraries
It's fine to report them, but rest assured they are at the top of the list of problems to be fixed. I think I will recompile all the binaries to support at least Debian 9, so even older versions of Ubuntu such as 20.04 will be supported. Unfortunately, I compiled everything on my personal machines with all libraries up to date, and I didn't think about the linking issue.


To sum up, we currently have two apps in testing:

GEPPETTO DOCKER

This application does inference on a very small GPT-2-like model using Docker + the Hugging Face Transformers library. This is not going to be the preferred way to do inference, because it's slower. However, it's very important for me to have a working Docker + Python environment, because most of the interpretability studies will run this way; it would be hard to rewrite the C++ code from scratch every time... If I find a way to nicely distribute Python packages, then we could ditch Docker entirely. I will try to get in contact with "pianoman" from MLC@Home; I remember he did it somehow.
During the alpha stage of this app, the most frequent errors I encountered on your hosts were:
* Issues with Docker: in roughly 80% of cases they can be solved by following the instructions on the homepage. In the next days I am going to open a new official thread collecting all the Docker issues (that is, the container not starting at all) not related to the instructions mentioned on the homepage. Unfortunately I saw some of these weirder cases in this forum, in the BOINC.Italy forum and on some hosts, and I still have to investigate what is happening.
* Older GLIBC. As mentioned, this must be solved and it's easy to do. I just have to recompile everything; I have a slow machine and slow internet at home, so I need to go back to the office during the working days and do it from there.

MINERVA LLAMA.CPP
No docker required for this application!
This application does inference on a much larger model, a 7B instruction-tuned model (details will follow) with q6 quantization. We are no longer using Python + Docker but an amazing C++ program that I adapted to communicate with BOINC. So the integration should be much better (fraction-done reporting, checkpointing...), and the goal is that no dependencies at all will be needed.
If you tried it in the last hour, please try it again: I updated the app and now it should work.

The only requirement should be libgomp1 on a clean Ubuntu 24.04 setup (the only system I have tried yet, a rented server).
The libbz2 error reported by entity is strange; it should not happen, so I need to investigate.

Please try this new app. Checkpoints may not work very well and the fraction-done reporting is a bit dumb at the moment, but it's something, and the good news is that within a few weeks this app could start being useful to the scientific community: I have some colleagues who need to run inference experiments for their work, so I will propose that they use the new BOINC app to run them.

Be careful! As opposed to Geppetto, which could run even on toasters, Minerva requires at least 6 GB of RAM + disk + network to hold the model. So if you think your machine cannot handle it, you can untick the app in the preferences.
A question going the opposite way: do you think a more "monstrous" app, with something like a 16 GB RAM + disk + network basic requirement, would be too much for BOINC, or something that could be run by users with powerful machines? Because scientifically it would be very useful, even if fewer people run it.

entity

Joined: 6 Feb 25
Posts: 8
Credit: 12,258
RAC: 876
Message 38 - Posted: 9 Feb 2025, 4:11:24 UTC - in response to Message 36.  

In reply to [VENETO] sabayonino's message of 8 Feb 2025:
In reply to entity's message of 8 Feb 2025:
minerva-7bI-q6-llama-inference_162315_1739037874.610214_137_0:

Ended with the following computational error:

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

I have libbz2 installed:

./usr/lib64/libbz2.so.1
./usr/lib64/libbz2.so.1.0.8

Is it looking for a 32bit version?


I think that libbz2 shared libraries should be linked to libbz2.so.1.0.8 or libbz2.so.1.0 is missing (inside the docker image ??)

lrwxrwxrwx root root 15 B  Thu Oct 31 20:27:30 2024  /usr/lib64/libbz2.so.1 ⇒ libbz2.so.1.0.8
lrwxrwxrwx root root 15 B  Thu Oct 31 20:27:30 2024  /usr/lib64/libbz2.so.1.0 ⇒ libbz2.so.1.0.8
.rwxr-xr-x root root 77 KB Thu Oct 31 20:27:32 2024  /usr/lib64/libbz2.so.1.0.8


Or create a symlink pointed to :
# ln -sf /usr/lib64/libbz2.so.1.0.8 /usr/lib64/libbz2.so.1.0


Or inside the /usr/lib64 directory
[/usr/lib64] # ln -sf /usr/lib64/libbz2.so.1.0.8 libbz2.so.1.0


Thanks for the information.
I was thinking about adding a link, but was hesitant to do so out of concern that I might be creating a maintenance headache. Sooner or later libbz2.so.1.0.8 will become libbz2.so.1.0.9, and the .8 version will probably go away, leaving a dead link. When the issue comes up I probably won't remember why that link was there. I'm going to ponder this a little more...
Conan

Joined: 7 Feb 25
Posts: 5
Credit: 0
RAC: 0
Message 41 - Posted: 9 Feb 2025, 7:07:36 UTC - in response to Message 33.  

In reply to entity's message of 8 Feb 2025:
minerva-7bI-q6-llama-inference_162315_1739037874.610214_137_0:

Ended with the following computational error:

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

I have libbz2 installed:

./usr/lib64/libbz2.so.1
./usr/lib64/libbz2.so.1.0.8

Is it looking for a 32bit version?


I have just had 97 work units fail looking for this file, which is already installed on the computer.

I have set No New Work till I can sort it out.

Conan
[VENETO] sabayonino

Joined: 6 Feb 25
Posts: 3
Credit: 114,786
RAC: 7,494
Message 45 - Posted: 9 Feb 2025, 8:14:14 UTC - in response to Message 38.  

Thanks for the information,
I was thinking about adding a link but was hesitant to do that out of concern I might be creating a maintenance headache. Sooner or later libbz2.so.1.0.8 will become libbz2.so.1.0.9 and the .8 version will probably go away leaving a dead link. When the issues come up I probably won't remember why that link was there. I'm going to ponder this a little more...

You're right.
It also depends on how the package manager handles any dead links or libraries within the system tree.
WTBroughton

Joined: 9 Feb 25
Posts: 1
Credit: 11,103
RAC: 1,055
Message 46 - Posted: 9 Feb 2025, 9:49:40 UTC

I set up Linux Mint 22 in VirtualBox on a Windows 11 machine (Ryzen 9). I allocated 10 cores and 32 GB of memory. It's been running Minerva tasks overnight with no errors. I know very little about Linux, so I have unticked the other applications that require Docker, but I will monitor your progress and add them in as necessary.
Good luck with the project it looks very promising.
fzs600

Joined: 8 Feb 25
Posts: 3
Credit: 265,425
RAC: 14,480
Message 65 - Posted: 10 Feb 2025, 16:31:13 UTC - in response to Message 46.  

On this PC, all geppetto-hf-inference v1.00 (mt) tasks have been awaiting validation for 2 days.
http://boinc.llmentor.org/LLMentorGrid/results.php?hostid=65&offset=0&show_names=0&state=2&appid=
kotenok2000

Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 80 - Posted: 14 Feb 2025, 22:30:32 UTC - in response to Message 37.  

You could also ask on the GPUGRID Discord server. They run Python workunits.
kotenok2000

Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 82 - Posted: 16 Feb 2025, 8:11:46 UTC - in response to Message 37.  

Turns out I am still in the MLC@Home Discord server. I have mentioned Pianoman there.
manalog
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 22 Jan 25
Posts: 12
Credit: 56
RAC: 5
Message 83 - Posted: 16 Feb 2025, 10:21:44 UTC

Hi!
The tests went great.
During this week, I worked on the next scientific project: we are going to use the "Transformer Lens" library to do some analysis of transformers' internals. The topic is complicated and I need time to set everything up; the good thing is that it should be easy to integrate this library into BOINC, because I can re-use the Docker app that I used for "Geppetto".
First we are going to run tests on GPT-2 small, just to replicate results already reported in scientific papers. Then we will move on to other models and topics to finally make some new science.
When these new WUs are out, I am going to write a good explanation so everyone can understand what we are doing.


During this week you will also see the following problems solved:
- missing libraries (very easy to fix)
- llama.cpp threads issue: because I don't have a machine with many threads to test on, I didn't notice that the llama.cpp application, when running on CPU (I usually ran the app on the university server with a GPU instead), needs a command-line parameter to set the number of threads. I think it's easy to hook this command-line parameter up to the number of threads decided by the BOINC client (the machine's max threads, or the user's preferences).
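The threads fix described above could be sketched like this. `-t`/`--threads` is llama.cpp's real flag; the binary name, the model filename, and the use of `nproc` (standing in for the CPU count the BOINC client actually grants the task) are placeholders:

```shell
# Sketch: pick a thread count the way a BOINC wrapper might, then build
# the llama.cpp command line. nproc stands in for the CPU count granted
# by the BOINC client; llama-boinc and the model path are placeholders.
NTHREADS=$(nproc)
CMD="./llama-boinc -m minerva-7bI-q6.gguf -t $NTHREADS"
echo "$CMD"    # a real wrapper would exec this instead of echoing it
```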

Regarding GPUs, the answer is a big yes. GPU support is definitely on the list and has been planned since the beginning. It shouldn't be too hard to add GPU support in Docker, but I am going to do it only after all the previous points are solved, because it adds another layer of complexity and it's harder for me to debug, as I don't have an Nvidia CUDA GPU on my personal machine. I need to do it on the HPC server or rent one.

Please let me know if someone reaches Pianoman: somehow MLC@Home was able to run PyTorch apps without Docker, and that would be very good for this project (much easier for volunteers).

Regarding validation of Geppetto: yes, the validator was turned off. I will now run it manually, so you are going to see the last WUs validated soon.
kotenok2000

Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 84 - Posted: 18 Feb 2025, 14:48:33 UTC
Last modified: 18 Feb 2025, 14:51:25 UTC

GPUGRID uses Python environments too.
They used Anaconda for it.
I will ask on their Discord server for help.
<job_desc>
<unzip_input>
<zipfilename>windows_x86_64__cuda1121.zip</zipfilename>
</unzip_input>
<task>
<application>Library/usr/bin/tar.exe</application>
<command_line>xjvf input.tar.bz2</command_line>
<setenv>PATH=$PWD/Library/usr/bin</setenv>
<weight>1</weight>
</task>
<task>
<application>C:/Windows/system32/cmd.exe</application>
<command_line>/c call Scripts\activate.bat && Scripts\conda-unpack.exe && run.bat</command_line>
<setenv>CUDA_DEVICE=$GPU_DEVICE_NUM</setenv>
<setenv>TMPDIR=$PWD</setenv>
<setenv>TEMP=$PWD</setenv>
<setenv>TMP=$PWD</setenv>
<setenv>HOME=$PWD</setenv>
<setenv>PATH=$PWD\Library\bin</setenv>
<setenv>SystemRoot=C:\Windows</setenv>
<setenv>ComSpec=C:\Windows\system32\cmd.exe</setenv>
<stdout_filename>run.log</stdout_filename>
<weight>1000</weight>
<fraction_done_filename>progress</fraction_done_filename>
</task>
</job_desc>
boboviz

Joined: 6 Feb 25
Posts: 3
Credit: 262
RAC: 20
Message 85 - Posted: 21 Feb 2025, 7:43:13 UTC - in response to Message 83.  

Regarding GPUs, the answer is a big yes. GPU support is definitely in the list and was planned since beginning.


This is great news.
But, if possible, I'd prefer a Windows app first... :-)
kotenok2000

Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 86 - Posted: 25 Feb 2025, 10:36:50 UTC
Last modified: 25 Feb 2025, 10:44:09 UTC

I have found this:
https://github.com/BOINC/boinc/wiki/PythonApps

https://bitbucket-archive.softwareheritage.org/projects/je/jeremycowles/pyboinc.html

©2025 Matteo Rinaldi