Thread 'Minerva 7B Instruct Q6 inference via LLama.cpp'

Message boards : Number crunching : Minerva 7B Instruct Q6 inference via LLama.cpp
Conan

Joined: 7 Feb 25
Posts: 5
Credit: 0
RAC: 0
Message 39 - Posted: 9 Feb 2025, 5:20:45 UTC
Last modified: 9 Feb 2025, 5:36:03 UTC

The default setup seems to be to use all available threads.

I just downloaded one and it wants all 24 threads that I have on my Ryzen 5900X and Ryzen 7900X.
On my Ryzen 7950X3D it wants all 32 threads.

A bit like YAFU was until I set an app_config.xml file to limit it.
Seems I will have to do the same here.

Also, it is a massive 5.66 GB download for this application, which is huge when you have a few computers to update.

Conan
ID: 39
Conan

Joined: 7 Feb 25
Posts: 5
Credit: 0
RAC: 0
Message 42 - Posted: 9 Feb 2025, 7:17:16 UTC
Last modified: 9 Feb 2025, 7:26:32 UTC

I am also having issues running these work units: libbz2.so.1.0 is not found even though it is installed.

I haven't got any time at the moment to sort that out as I am going away for a week, so I will look at it when I get back.

I will set "no new work" in the meantime so I don't trash work units.

An OS upgrade may be in order. I am on Fedora 37 or 38 and the current release is 42, so I'm a few versions behind. It might also fix a "curl" issue I have when trying to update drivers and such, which stops me from reaching third-party repos.

Conan
ID: 42
mikey
Joined: 7 Feb 25
Posts: 8
Credit: 33,666
RAC: 2,909
Message 43 - Posted: 9 Feb 2025, 7:38:17 UTC - in response to Message 42.  

I am still getting "glibc is too old" errors. I thought you were going to make the new tasks work okay with older versions?

<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)

</stderr_txt>
]]>

Could you post how to update it without updating the whole OS? I use Linux Mint 21.3 on this PC, but other PCs use newer or older versions.
ID: 43
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 47 - Posted: 9 Feb 2025, 13:05:47 UTC

Yeah, quite a large download, but I've never seen 20 MB/s per file from a project before, which is quite nice.
ID: 47
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 48 - Posted: 9 Feb 2025, 13:42:36 UTC

App config to run more than one task at once. Update the cmdline/avg_ncpus values to the desired thread count. Ideally only 4 threads should be set, as that's what the app actually uses, but I didn't have enough memory with other VirtualBox tasks in memory.

<app_config>
    <app_version>
        <app_name>minerva-7bI-q6-llama-inference</app_name>
        <plan_class>mt</plan_class>
        <cmdline>-t 8</cmdline>
        <avg_ncpus>8</avg_ncpus>
    </app_version>
</app_config>
ID: 48
boboviz

Joined: 6 Feb 25
Posts: 3
Credit: 262
RAC: 20
Message 50 - Posted: 9 Feb 2025, 14:13:33 UTC
Last modified: 9 Feb 2025, 14:13:50 UTC

I created a little VM with Linux Mint (2-core CPU, 5.8 GB of RAM).
All Minerva WUs error out with:
EXIT_MEM_LIMIT_EXCEEDED
<message>
working set size > client RAM limit: 5399.29MB > 5366.79MB</message>
llm_load_tensors: tensor 'token_embd.weight' (q6_K) (and 290 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
llm_load_tensors: CPU_Mapped model buffer size = 5789.55 MiB


Indeed, during processing, BOINC Manager said "Waiting for more memory".
But if I look at the task manager, the app is using less than 600 MB of RAM.

Do I have to increase the virtual machine's memory?
ID: 50
manalog
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 22 Jan 25
Posts: 12
Credit: 56
RAC: 5
Message 51 - Posted: 9 Feb 2025, 16:04:14 UTC

Hello everyone :) I will quickly answer the latest messages:


@Conan: unfortunately, large files are really unavoidable when dealing with transformer models. Consider that the model I am proposing you run is a 7B model quantized (a form of compression that reduces numerical precision) to 6 bits so that it occupies less than 6 GB. The original uncompressed model, which would be preferable to run, would occupy 16 GB. And that is just a 7B model; SOTA models have more than 70B parameters and unfortunately are not yet suitable for BOINC. The good news is that when the interpretability studies start we are going to analyse several models: 128M, 300M, 1B, 2B and 7B, so apart from the 7B all the others will have a much smaller footprint. If you think it would be useful, I could find a way to split the 6 GB file into smaller chunks. Would that help? The other good thing is that this big file is downloaded only once; after that, the workunits are very small in both download and upload.
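
For what it's worth, chunking and reassembly can be done with standard tools; a hedged sketch (all file names here are illustrative, not the project's actual ones):

```shell
# Server side: cut the model into 512 MB pieces and record a checksum
split -b 512M minerva-7b-q6.gguf minerva-7b-q6.gguf.part-
sha256sum minerva-7b-q6.gguf > minerva-7b-q6.gguf.sha256

# Client side: reassemble once all parts are downloaded, then verify
cat minerva-7b-q6.gguf.part-* > minerva-7b-q6.gguf
sha256sum -c minerva-7b-q6.gguf.sha256
```

Each chunk could then be a separate download, so a failed transfer only retries one piece instead of the whole 6 GB.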
Multithreading is also inherently tied to transformer models, which shine with parallelism. Of course I could make a single-threaded version of the app, but performance would suffer. Perhaps, again if you prefer, I could serve the smaller models (<1B) in a single-threaded fashion so that people with tighter constraints could choose to run only those smaller apps.
Regarding libbz2, your idea to wait a week is fine: it's definitely solvable (I want to either make the binary static or ship the libraries alongside it), but it needs some tinkering and a good Internet connection, so I have to do it in my office rather than at home.
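
(In case it helps in the meantime: on Fedora the bzip2 library's soname is libbz2.so.1, while binaries built on Debian/Ubuntu ask for libbz2.so.1.0, so a compatibility symlink is a common stop-gap until the binary is statically linked. A hedged sketch; the library path may differ on your system:)

```shell
# See where the bzip2 library actually lives
ldconfig -p | grep libbz2
# Add a symlink under the Debian-style name, then refresh the linker cache
sudo ln -s /usr/lib64/libbz2.so.1 /usr/lib64/libbz2.so.1.0
sudo ldconfig
```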

The same answer goes for @mikey: I will solve the GLIBC problem by compiling everything in a virtual machine or Docker container with an older version of Debian and then distributing the executables built there. That way they will be compatible with any Linux distribution released in the last 10 years.

@mmonnin: heh, yes, because I am renting the server from a cloud provider ;) After a few months I will have to decide: the options are either a server room at my university (good, even 40 MB/s) or, for the first year, a home server, which would mean no more than 1 MB/s... let's hope for the first option :) There is a conference coming up where I can submit a paper about the project, and in that case (though it's hard to write a paper in such a short time) the good server becomes more likely.

@boboviz: The app is using less than 600 MB because it hasn't loaded the model yet; it needs more than 6 GB to work correctly. So you need to increase the RAM of the virtual machine and then the maximum amount of RAM BOINC may use in the BOINC Manager preferences.
ID: 51
Ian-n-Steve C.

Joined: 9 Feb 25
Posts: 5
Credit: 15,141
RAC: 1,469
Message 54 - Posted: 9 Feb 2025, 17:37:57 UTC - in response to Message 51.  

To expand a bit for anyone who doesn't know some of the common LLM verbiage:

"7B", "70B", etc. refer to the model size in billions of parameters. It's how big the model is and correlates with how much space the model will take up.

Are there any plans to utilize GPUs? If you move to CUDA, the processing power of modern Nvidia GPUs will far outweigh CPUs on low-precision inferencing.
ID: 54
boboviz

Joined: 6 Feb 25
Posts: 3
Credit: 262
RAC: 20
Message 56 - Posted: 9 Feb 2025, 21:17:54 UTC - in response to Message 51.  

@boboviz: The app is using less than 600MB because it hasn't loaded the model yet, but it needs more than 6GB to work correctly. So you need to increase the RAM of the virtual machine and then the maximum amount of ram used by Boinc in the Boinc manager preferences


With 8 GB of RAM assigned to the VM the situation is better.
Sometimes the WUs still stop, but after some time they restart.
ID: 56
zombie67 [MM]
Joined: 8 Feb 25
Posts: 5
Credit: 80,626
RAC: 6,291
Message 58 - Posted: 10 Feb 2025, 3:42:36 UTC

I am still confused on one point. How many threads does the app actually use? If the max is really only 4, then why is the task using all 8 of the threads on my machine? Shouldn't the max be automatically set from the server side? If it has to be set on the user side for some reason, a pull-down menu in the users project preferences page should be the way to do it. Many projects have this. Until then, assuming the number 4 is the actual max for the app, then I will use the app_config.xml setting from earlier in this thread.
ID: 58
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 59 - Posted: 10 Feb 2025, 9:56:17 UTC

On my system each task uses 4 threads, so with my app_config above only half of the allotted threads were being used. I have since changed it to use 4 threads and the CPU is 100% busy.
ID: 59
manalog
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 22 Jan 25
Posts: 12
Credit: 56
RAC: 5
Message 60 - Posted: 10 Feb 2025, 11:41:15 UTC - in response to Message 58.  

The app by default uses all the available threads on the user's machine. This is not a bug but the default behaviour of BOINC's "mt" plan class. However, since I understand this might not be optimal for every user, I really like the idea of a pull-down menu in the project preferences.

Could you please give me a link to a project that does this? If it's easy to implement, I can do it today or tomorrow. Better yet, if any of you knows the BOINC server configuration parameters, you could show me the exact configuration that enables this behaviour. I like it!
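
For the server-side route, a thread cap can also live in the scheduler's plan-class spec; a hedged sketch (element names and values should be checked against the BOINC server documentation for plan_class_spec.xml before use):

```xml
<plan_class_specs>
    <plan_class>
        <name>mt</name>
        <min_ncpus>2</min_ncpus>
        <max_threads>4</max_threads>
    </plan_class>
</plan_class_specs>
```

With something like this, the scheduler would never budget more than 4 CPUs per task, regardless of the host's core count.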
ID: 60
kotenok2000
Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 61 - Posted: 10 Feb 2025, 12:33:39 UTC - in response to Message 60.  

Ask on the BOINC Network Discord server:
https://discord.gg/qQYfRG64z4
ID: 61
Ian-n-Steve C.

Joined: 9 Feb 25
Posts: 5
Credit: 15,141
RAC: 1,469
Message 62 - Posted: 10 Feb 2025, 13:52:03 UTC
Last modified: 10 Feb 2025, 14:19:11 UTC

I ran a task on my system yesterday. I did not make any customizations with app_config.

It was set to use only 4 threads by default by the project and was passing the --nthreads 4 parameter in the app_version in client_state.xml.

It only used 4 threads, not all of them.
ID: 62
mikey
Joined: 7 Feb 25
Posts: 8
Credit: 33,666
RAC: 2,909
Message 63 - Posted: 10 Feb 2025, 15:07:12 UTC - in response to Message 60.  

In reply to manalog's message of 10 Feb 2025:
The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration.

Could you please give me a link of a project that does it? If it's easy to implement, I can do it today or tomorrow. Better, if some of you knows the configuration parameters of the Boinc server, if you can directly show me the correct configuration to enable this behaviour. I like it!


Prime Grid http://www.primegrid.com/home.php does it, as does LHC https://lhcathome.cern.ch/lhcathome.
Here is LHC's way:

Resource share ---
Use CPU
Use AMD GPU
Run test applications?
Is it OK for LHC@home and your team (if any) to email you?
Emails will be sent from boinc-server-admin@cern.ch; make sure your spam filter accepts this address.
Should LHC@home show your computers on its web site?
General terms-of-use for this BOINC project.
Do you consent to exporting your data to BOINC statistics aggregation Web sites?
Default computer location ---
Run only the selected applications SixTrack: yes
sixtracktest: yes
CMS Simulation: no
Theory Simulation: no
ATLAS Simulation: yes
If no work for selected applications is available, accept work from other applications?
Run native if available? (Not recommended for Windows)
Max # jobs No limit
Max # CPUs 3

ID: 63
Spot T

Joined: 6 Feb 25
Posts: 1
Credit: 0
RAC: 0
Message 64 - Posted: 10 Feb 2025, 15:30:13 UTC - in response to Message 60.  

In reply to manalog's message of 10 Feb 2025:
The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration.

Could you please give me a link of a project that does it? If it's easy to implement, I can do it today or tomorrow. Better, if some of you knows the configuration parameters of the Boinc server, if you can directly show me the correct configuration to enable this behaviour. I like it!


MilkyWay has a pull down menu to set 'Max # of threads for each MilkyWay@home task'.
ID: 64
Ian-n-Steve C.

Joined: 9 Feb 25
Posts: 5
Credit: 15,141
RAC: 1,469
Message 66 - Posted: 10 Feb 2025, 16:45:02 UTC
Last modified: 10 Feb 2025, 16:52:04 UTC

The input parameters don't work properly on the app.

It doesn't matter what --nthreads value you put; the app still only uses 4 threads, while BOINC thinks it's using however many threads your system has.

--nthreads 2 -> results in 4 threads being used.
--nthreads 32 -> results in 4 threads being used.

I'm guessing it's hard-coded in the binary somewhere.

If you want BOINC's accounting to match reality, you can apply this app_config.xml so you can properly run multiple tasks according to your preferences. Running more tasks doesn't use more RAM; it seems they all share the same data in memory.

<app_config>
<app_version>
    <app_name>minerva-7bI-q6-llama-inference</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
</app_version>
</app_config>
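
To verify what a task actually uses, you can count the kernel threads of the running process; a hedged sketch (the process name is guessed from the stderr paths earlier in the thread):

```shell
# Count the threads (nlwp = number of lightweight processes) of the task
pid=$(pgrep -f llama-boinc | head -n1)
ps -o nlwp= -p "$pid"
```

Note that nlwp counts idle helper threads too, so check it while the task is busy.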
ID: 66
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 67 - Posted: 10 Feb 2025, 22:12:46 UTC - in response to Message 60.  

In reply to manalog's message of 10 Feb 2025:
The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration.

Could you please give me a link of a project that does it? If it's easy to implement, I can do it today or tomorrow. Better, if some of you knows the configuration parameters of the Boinc server, if you can directly show me the correct configuration to enable this behaviour. I like it!


PrimeGrid - https://www.primegrid.com/
Amicable - https://sech.me/boinc/Amicable/
Milkyway - https://milkyway.cs.rpi.edu/milkyway/
Yafu - https://yafu.myfirewall.org/yafu/
ID: 67
mikey
Joined: 7 Feb 25
Posts: 8
Credit: 33,666
RAC: 2,909
Message 68 - Posted: 11 Feb 2025, 12:28:23 UTC - in response to Message 67.  

MilkyWay does it like this in the venue section:

Resource share ---
Use CPU
Use AMD GPU
Use NVIDIA GPU
Is it OK for MilkyWay@home and your team (if any) to email you?
Should MilkyWay@home show your computers on its web site?
Default computer location ---
Color scheme for graphics Tahiti Sunset
Maximum CPU % for graphics
0...100 20
Run only the selected applications Milkyway@home N-Body Simulation: no
Milkyway@home N-Body Simulation with Orbit Fitting: yes
If no work for selected applications is available, accept work from other applications? no
Max # of simultaneous MilkyWay@home tasks No limit
Max # of threads for each MilkyWay@home task 1

ID: 68
Drago75

Joined: 12 Feb 25
Posts: 1
Credit: 11,320
RAC: 1,073
Message 78 - Posted: 14 Feb 2025, 15:24:35 UTC

Ubuntu 20.04 doesn't work.
Ubuntu 24.04 works without any problems. To get Docker ready to run, only these two steps were necessary on my machines:

sudo apt-get install docker.io
sudo ln -s /usr/bin/docker /bin/unknown

No restart of BOINC was necessary.
The other two lines didn't work or had no effect.
ID: 78

©2025 Matteo Rinaldi