Thread 'Minerva 7B Instruct Q6 inference via LLama.cpp'

Message boards : Number crunching : Minerva 7B Instruct Q6 inference via LLama.cpp
Conan

Joined: 7 Feb 25
Posts: 5
Credit: 0
RAC: 0
Message 39 - Posted: 9 Feb 2025, 5:20:45 UTC
Last modified: 9 Feb 2025, 5:36:03 UTC

The default setup seems to be to use all available threads.

I just downloaded one and it wants all 24 threads that I have on my Ryzen 5900X and Ryzen 7900X.
On my Ryzen 7950X3D it wants all 32 threads.

A bit like YAFU was until I set an app_config.xml file to limit it.
Seems I will have to do the same here.

Also, it is a massive 5.66 GB download for this application, which is huge when you have a few computers to update.

Conan
ID: 39
Conan

Joined: 7 Feb 25
Posts: 5
Credit: 0
RAC: 0
Message 42 - Posted: 9 Feb 2025, 7:17:16 UTC
Last modified: 9 Feb 2025, 7:26:32 UTC

I am also having issues running these work units: libbz2.so.1.0 is not found even though it is installed.

I haven't got any time at the moment to sort that out as I am going away for a week, so I will look at it when I get back.

I will set "no new work" in the meantime so I don't trash work units.

An OS upgrade may be in order. I am on Fedora 37 or 38 and the current release is 42, so I'm a few versions behind. It might also fix a "curl" issue I have when trying to update drivers and such, which stops me from reaching third-party repos.

Conan
ID: 42
mikey
Joined: 7 Feb 25
Posts: 8
Credit: 33,666
RAC: 2,909
Message 43 - Posted: 9 Feb 2025, 7:38:17 UTC - in response to Message 42.  

I am still getting "glibc is too old" errors. I thought you were going to make the new tasks work okay with older versions?

<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)

</stderr_txt>
]]>

Could you post how to update it without updating the whole OS? I use Linux Mint 21.3 on this PC, but other PCs use newer or older versions.
ID: 43
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 47 - Posted: 9 Feb 2025, 13:05:47 UTC

Yeah, quite a large download, but I've never seen 20 MB/s per file from a project before, which is quite nice.
ID: 47
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 48 - Posted: 9 Feb 2025, 13:42:36 UTC

App config to run more than one task at once. Update the cmdline/avg_ncpus values to the desired thread count. Ideally only 4 threads should be set, as that's what the app actually uses, but I didn't have enough memory with other VirtualBox tasks in memory.

<app_config>
    <app_version>
        <app_name>minerva-7bI-q6-llama-inference</app_name>
        <plan_class>mt</plan_class>
        <cmdline>-t 8</cmdline>
        <avg_ncpus>8</avg_ncpus>
    </app_version>
</app_config>
ID: 48
boboviz

Joined: 6 Feb 25
Posts: 3
Credit: 262
RAC: 20
Message 50 - Posted: 9 Feb 2025, 14:13:33 UTC
Last modified: 9 Feb 2025, 14:13:50 UTC

I created a little VM with Linux Mint (2-core CPU, 5.8 GB of RAM).
All Minerva WUs error out with:
EXIT_MEM_LIMIT_EXCEEDED
<message>
working set size > client RAM limit: 5399.29MB > 5366.79MB</message>
llm_load_tensors: tensor 'token_embd.weight' (q6_K) (and 290 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
llm_load_tensors: CPU_Mapped model buffer size = 5789.55 MiB


Indeed, during processing, BOINC Manager said "Waiting for more memory".
But if I look at the task manager, the app is using less than 600 MB of RAM.

Do I have to increase the virtual machine's memory?
ID: 50
manalog
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 22 Jan 25
Posts: 12
Credit: 56
RAC: 5
Message 51 - Posted: 9 Feb 2025, 16:04:14 UTC

Hello everyone :) I will quickly answer the latest messages:


@Conan: unfortunately, large files are really unavoidable when dealing with transformer models. Consider that the model I am proposing you run is a 7B model quantized (a form of compression that reduces numerical precision) to 6 bits so that it occupies less than 6 GB. The original uncompressed model, which would be preferable to run, would occupy 16 GB. And that is just a 7B model; SOTA models have more than 70B parameters and unfortunately are not yet suitable for BOINC. The good news is that when the interpretability studies start we are going to analyse several models: 128M, 300M, 1B, 2B and 7B, so apart from the 7B all the others will have a much smaller footprint. If you think it would be useful, I could find a way to split the 6 GB file into smaller chunks. Would that help? The other good thing is that this big file is downloaded only once; after that, the workunits are very small in both download and upload.
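
For what it's worth, chunking and reassembly can be done with standard tools; a hedged sketch (all file names here are illustrative, not the project's actual ones):

```shell
# Server side: cut the model into 512 MB pieces and record a checksum
split -b 512M minerva-7b-q6.gguf minerva-7b-q6.gguf.part-
sha256sum minerva-7b-q6.gguf > minerva-7b-q6.gguf.sha256

# Client side: reassemble once all parts are downloaded, then verify
cat minerva-7b-q6.gguf.part-* > minerva-7b-q6.gguf
sha256sum -c minerva-7b-q6.gguf.sha256
```

Each chunk could then be a separate download, so a failed transfer only retries one piece instead of the whole 6 GB.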
Multithreading is also inherently tied to transformer models, which shine with parallelism. Of course I could make a single-threaded version of the app, but performance would suffer. Perhaps, again if you prefer, I could serve the smaller models (<1B) in a single-threaded fashion so that people with tighter constraints could choose to run only those smaller apps.
Regarding libbz2, your idea to wait a week is fine: it's definitely solvable (I want to either make the binary static or ship the libraries alongside it), but it needs some tinkering and a good Internet connection, so I have to do it in my office rather than at home.
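
(In case it helps in the meantime: on Fedora the bzip2 library's soname is libbz2.so.1, while binaries built on Debian/Ubuntu ask for libbz2.so.1.0, so a compatibility symlink is a common stop-gap until the binary is statically linked. A hedged sketch; the library path may differ on your system:)

```shell
# See where the bzip2 library actually lives
ldconfig -p | grep libbz2
# Add a symlink under the Debian-style name, then refresh the linker cache
sudo ln -s /usr/lib64/libbz2.so.1 /usr/lib64/libbz2.so.1.0
sudo ldconfig
```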

The same answer goes for @mikey: I will solve the GLIBC problem by compiling everything in a virtual machine or Docker container with an older version of Debian and then distributing the executables built there. That way they will be compatible with any Linux distribution released in the last 10 years.

@mmonnin: heh, yes, because I am renting the server from a cloud provider ;) After a few months I will have to decide: the options are either a server room at my university (good, even 40 MB/s) or, for the first year, a home server, which would mean no more than 1 MB/s... let's hope for the first option :) There is a conference coming up where I can submit a paper about the project, and in that case (though it's hard to write a paper in such a short time) the good server becomes more likely.

@boboviz: The app is using less than 600 MB because it hasn't loaded the model yet; it needs more than 6 GB to work correctly. So you need to increase the RAM of the virtual machine and then the maximum amount of RAM BOINC may use in the BOINC Manager preferences.
ID: 51
Ian-n-Steve C.

Joined: 9 Feb 25
Posts: 5
Credit: 15,141
RAC: 1,469
Message 54 - Posted: 9 Feb 2025, 17:37:57 UTC - in response to Message 51.  

To expand a bit for anyone who doesn't know some of the common LLM verbiage:

"7B", "70B", etc. refer to the model size in billions of parameters. It's how big the model is and correlates with how much space the model will take up.

Are there any plans to utilize GPUs? If you move to CUDA, the processing power of modern Nvidia GPUs will far outweigh CPUs on low-precision inferencing.
ID: 54
boboviz

Joined: 6 Feb 25
Posts: 3
Credit: 262
RAC: 20
Message 56 - Posted: 9 Feb 2025, 21:17:54 UTC - in response to Message 51.  

@boboviz: The app is using less than 600MB because it hasn't loaded the model yet, but it needs more than 6GB to work correctly. So you need to increase the RAM of the virtual machine and then the maximum amount of ram used by Boinc in the Boinc manager preferences


With 8 GB of RAM assigned to the VM the situation is better.
Sometimes the WUs still stop, but after some time they restart.
ID: 56
zombie67 [MM]
Joined: 8 Feb 25
Posts: 5
Credit: 80,626
RAC: 6,291
Message 58 - Posted: 10 Feb 2025, 3:42:36 UTC

I am still confused on one point. How many threads does the app actually use? If the max is really only 4, then why is the task using all 8 of the threads on my machine? Shouldn't the max be automatically set from the server side? If it has to be set on the user side for some reason, a pull-down menu in the users project preferences page should be the way to do it. Many projects have this. Until then, assuming the number 4 is the actual max for the app, then I will use the app_config.xml setting from earlier in this thread.
ID: 58
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 59 - Posted: 10 Feb 2025, 9:56:17 UTC

On my system each task uses 4 threads, so with my app_config above only half of the allotted threads were being used. I have since changed it to use 4 threads and the CPU is 100% busy.
ID: 59
manalog
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 22 Jan 25
Posts: 12
Credit: 56
RAC: 5
Message 60 - Posted: 10 Feb 2025, 11:41:15 UTC - in response to Message 58.  

The app by default uses all the available threads on the user's machine. This is not a bug but the default behaviour of BOINC's "mt" plan class. However, since I understand this might not be optimal for every user, I really like the idea of a pull-down menu in the project preferences.

Could you please give me a link to a project that does this? If it's easy to implement, I can do it today or tomorrow. Better yet, if any of you knows the BOINC server configuration parameters, you could show me the exact configuration that enables this behaviour. I like it!
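
For the server-side route, a thread cap can also live in the scheduler's plan-class spec; a hedged sketch (element names and values should be checked against the BOINC server documentation for plan_class_spec.xml before use):

```xml
<plan_class_specs>
    <plan_class>
        <name>mt</name>
        <min_ncpus>2</min_ncpus>
        <max_threads>4</max_threads>
    </plan_class>
</plan_class_specs>
```

With something like this, the scheduler would never budget more than 4 CPUs per task, regardless of the host's core count.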
ID: 60
kotenok2000
Joined: 8 Feb 25
Posts: 6
Credit: 52,226
RAC: 4,100
Message 61 - Posted: 10 Feb 2025, 12:33:39 UTC - in response to Message 60.  

Ask on the BOINC Network Discord server:
https://discord.gg/qQYfRG64z4
ID: 61
Ian-n-Steve C.

Joined: 9 Feb 25
Posts: 5
Credit: 15,141
RAC: 1,469
Message 62 - Posted: 10 Feb 2025, 13:52:03 UTC
Last modified: 10 Feb 2025, 14:19:11 UTC

I ran a task on my system yesterday. I did not make any customizations with app_config.

It was set to use only 4 threads by default by the project and was passing the --nthreads 4 parameter in the app_version in client_state.xml.

It only used 4 threads, not all of them.
ID: 62
mikey
Joined: 7 Feb 25
Posts: 8
Credit: 33,666
RAC: 2,909
Message 63 - Posted: 10 Feb 2025, 15:07:12 UTC - in response to Message 60.  

In reply to manalog's message of 10 Feb 2025:
The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration.

Could you please give me a link of a project that does it? If it's easy to implement, I can do it today or tomorrow. Better, if some of you knows the configuration parameters of the Boinc server, if you can directly show me the correct configuration to enable this behaviour. I like it!


Prime Grid http://www.primegrid.com/home.php does it, as does LHC https://lhcathome.cern.ch/lhcathome.
Here is LHC's way:

Resource share ---
Use CPU
Use AMD GPU
Run test applications?
Is it OK for LHC@home and your team (if any) to email you?
Emails will be sent from boinc-server-admin@cern.ch; make sure your spam filter accepts this address.
Should LHC@home show your computers on its web site?
General terms-of-use for this BOINC project.
Do you consent to exporting your data to BOINC statistics aggregation Web sites?
Default computer location ---
Run only the selected applications SixTrack: yes
sixtracktest: yes
CMS Simulation: no
Theory Simulation: no
ATLAS Simulation: yes
If no work for selected applications is available, accept work from other applications?
Run native if available? (Not recommended for Windows)
Max # jobs No limit
Max # CPUs 3

ID: 63
Spot T

Joined: 6 Feb 25
Posts: 1
Credit: 0
RAC: 0
Message 64 - Posted: 10 Feb 2025, 15:30:13 UTC - in response to Message 60.  

In reply to manalog's message of 10 Feb 2025:
The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration.

Could you please give me a link of a project that does it? If it's easy to implement, I can do it today or tomorrow. Better, if some of you knows the configuration parameters of the Boinc server, if you can directly show me the correct configuration to enable this behaviour. I like it!


MilkyWay has a pull down menu to set 'Max # of threads for each MilkyWay@home task'.
ID: 64
Ian-n-Steve C.

Joined: 9 Feb 25
Posts: 5
Credit: 15,141
RAC: 1,469
Message 66 - Posted: 10 Feb 2025, 16:45:02 UTC
Last modified: 10 Feb 2025, 16:52:04 UTC

The input parameters don't work properly on the app.

It doesn't matter what --nthreads value you put; the app still only uses 4 threads, while BOINC thinks it's using however many threads your system has.

--nthreads 2 -> results in 4 threads being used.
--nthreads 32 -> results in 4 threads being used.

I'm guessing it's hard-coded in the binary somewhere.

If you want BOINC's accounting to match reality, you can apply this app_config.xml so you can properly run multiple tasks according to your preferences. Running more tasks doesn't use more RAM; it seems they all share the same data in memory.

<app_config>
<app_version>
    <app_name>minerva-7bI-q6-llama-inference</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
</app_version>
</app_config>
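
To verify what a task actually uses, you can count the kernel threads of the running process; a hedged sketch (the process name is guessed from the stderr paths earlier in the thread):

```shell
# Count the threads (nlwp = number of lightweight processes) of the task
pid=$(pgrep -f llama-boinc | head -n1)
ps -o nlwp= -p "$pid"
```

Note that nlwp counts idle helper threads too, so check it while the task is busy.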
ID: 66
mmonnin

Joined: 8 Feb 25
Posts: 11
Credit: 506,452
RAC: 41,992
Message 67 - Posted: 10 Feb 2025, 22:12:46 UTC - in response to Message 60.  

In reply to manalog's message of 10 Feb 2025:
The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration.

Could you please give me a link of a project that does it? If it's easy to implement, I can do it today or tomorrow. Better, if some of you knows the configuration parameters of the Boinc server, if you can directly show me the correct configuration to enable this behaviour. I like it!


PrimeGrid - https://www.primegrid.com/
Amicable - https://sech.me/boinc/Amicable/
Milkyway - https://milkyway.cs.rpi.edu/milkyway/
Yafu - https://yafu.myfirewall.org/yafu/
ID: 67
mikey
Joined: 7 Feb 25
Posts: 8
Credit: 33,666
RAC: 2,909
Message 68 - Posted: 11 Feb 2025, 12:28:23 UTC - in response to Message 67.  

MilkyWay does it like this in the venue section:

Resource share ---
Use CPU
Use AMD GPU
Use NVIDIA GPU
Is it OK for MilkyWay@home and your team (if any) to email you?
Should MilkyWay@home show your computers on its web site?
Default computer location ---
Color scheme for graphics Tahiti Sunset
Maximum CPU % for graphics
0...100 20
Run only the selected applications Milkyway@home N-Body Simulation: no
Milkyway@home N-Body Simulation with Orbit Fitting: yes
If no work for selected applications is available, accept work from other applications? no
Max # of simultaneous MilkyWay@home tasks No limit
Max # of threads for each MilkyWay@home task 1

ID: 68
Drago75

Joined: 12 Feb 25
Posts: 1
Credit: 11,320
RAC: 1,073
Message 78 - Posted: 14 Feb 2025, 15:24:35 UTC

Ubuntu 20.04 doesn't work.
Ubuntu 24.04 works without any problems. To get Docker ready to run, only these two steps were necessary on my machines:

sudo apt-get install docker.io
sudo ln -s /usr/bin/docker /bin/unknown

No restart of BOINC was necessary.
The other two lines didn't work or had no effect.
ID: 78

©2025 Matteo Rinaldi