Message boards : Number crunching : Minerva 7B Instruct Q6 inference via LLama.cpp
Send message Joined: 7 Feb 25 Posts: 5 Credit: 0 RAC: 0 |
The default setup seems to be to use all available threads. I just downloaded one and it wants all 24 threads on my Ryzen 5900X and Ryzen 7900X; on my Ryzen 7950X3D it wants all 32 threads. A bit like YAFU was until I set up an app_config.xml file to limit it. Seems I will have to do the same here. Also, a massive 5.66 GB file to download for this application is huge when you have a few computers to update. Conan |
Send message Joined: 7 Feb 25 Posts: 5 Credit: 0 RAC: 0 |
I am also having issues running these work units because libbz2.so.1.0 is not found, even though it is installed. I haven't got any time at the moment to sort that out as I am going away for a week, so I will look at it when I get back. I will set No New Work in the meantime so I don't trash work units. An OS upgrade may be in order? I am on either Fedora 37 or 38 and current Fedora has gone to 42, so a few behind; it might also fix a "curl" issue I have when trying to update drivers and such, which stops me getting to 3rd-party repos. Conan |
Send message Joined: 7 Feb 25 Posts: 8 Credit: 33,666 RAC: 2,909 |
I am still getting glibc-is-too-old errors. I thought you were going to make the new tasks work okay with older versions?

<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message> process exited with code 1 (0x1, -255)</message>
<stderr_txt>
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)
../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by ../../projects/boinc.llmentor.org_LLMentorGrid/llama-boinc_3)
</stderr_txt>
]]>

Could you post how to update it without updating the whole OS? I use Linux Mint 21.3 on this PC, but other PCs use newer or older versions. |
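For anyone who wants to check which glibc their box ships before attaching it, a quick sketch (assumes a glibc-based distro; `getconf` is part of glibc, so on musl-based systems like Alpine this won't report a glibc version at all):

```shell
# Print the system glibc version; the errors above mean the app needs GLIBC_2.38 or newer
ldd --version | head -n1
getconf GNU_LIBC_VERSION
```

If the reported version is older than 2.38, the stock llama-boinc binary will fail exactly as shown in the stderr above.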
Send message Joined: 8 Feb 25 Posts: 11 Credit: 506,452 RAC: 41,992 |
Yeah quite a large download but I've never seen 20MB/s per file from a project which is quite nice. |
Send message Joined: 8 Feb 25 Posts: 11 Credit: 506,452 RAC: 41,992 |
An app_config to run more than one task at once. Update the cmdline/avg_ncpus number to the desired thread count. Ideally only 4 threads should be set, as that's what the app actually uses, but I didn't have enough memory with other VirtualBox tasks in memory.

<app_config>
  <app_version>
    <app_name>minerva-7bI-q6-llama-inference</app_name>
    <plan_class>mt</plan_class>
    <cmdline>-t 8</cmdline>
    <avg_ncpus>8</avg_ncpus>
  </app_version>
</app_config>
Send message Joined: 6 Feb 25 Posts: 3 Credit: 262 RAC: 20 |
I created a little VM with Linux Mint (2-core CPU, 5.8 GB of RAM). All Minerva WUs error out with EXIT_MEM_LIMIT_EXCEEDED. Indeed, during processing, BOINC Manager says "Waiting for more memory". But if I look at the task manager, the app is using less than 600 MB of RAM. Do I have to increase the virtual memory? |
Send message Joined: 22 Jan 25 Posts: 12 Credit: 56 RAC: 5 |
Hello everyone :) , I will quickly answer the last messages:

@Conan: unfortunately, large files are really unavoidable when dealing with transformer models. Consider that the model I am proposing you run is a 7B model quantized (a sort of compression that reduces precision) to 6 bits in order to occupy less than 6 GB. The original uncompressed model, which would be preferable to run, would occupy 16 GB. And we are talking about a 7B model; SOTA models have more than 70B parameters and unfortunately are not yet suitable for BOINC. The good news is that when the interpretability studies start, we are going to analyse several models: 128M, 300M, 1B, 2B and 7B, so apart from the 7B all the others will have a much smaller footprint. If you think it could be useful, I could find a way to split the 6 GB file into smaller chunks. Would that help? The other good thing is that this big file is downloaded only once; after that, the workunits are very small in both download and upload. Also, multithreading is inherent to transformer models, which shine with parallelism. Of course I could make a single-threaded version of the app, but performance would be degraded. Perhaps, again if you prefer, I could serve the smaller models (<1B) in a single-threaded fashion so that people with more constraints could decide to run only these smaller apps. Regarding libbz2, your idea to wait a week is good: it's definitely solvable (I want either to make the binary static or to deliver the libraries with it), but it needs some tinkering and a good Internet connection, so I have to do it in my office and not at home. The same goes for @Mickey: I will solve the GLIBC problem by compiling everything in a virtual machine or Docker container with an older version of Debian and then distributing the executable compiled there. That way the binaries will be compatible with any Linux distribution released in the last 10 years.
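The size figures above can be sanity-checked with some back-of-the-envelope arithmetic (a rough sketch; real quantized model files carry extra metadata and mixed-precision tensors, so the bits-per-weight values here are approximations, not exact Q6 figures):

```python
def model_size_gib(params_billion, bits_per_weight):
    """Rough on-disk/in-RAM size: parameters * bits, converted to GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# A 7B model at ~6.6 bits/weight (roughly a Q6-style quant) vs. 16-bit floats
print(round(model_size_gib(7, 6.6), 1))   # ~5.4 GiB, close to the 5.66 GB download
print(round(model_size_gib(7, 16), 1))    # ~13 GiB, in the ballpark of the quoted 16 GB
```

This also shows why a 70B model is out of reach for most BOINC hosts: even at 6 bits per weight it needs over 50 GiB of RAM.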
@mmonnin: eheh, yes, because I am renting the server from a cloud provider ;) After some months I will have to decide: the options are either a server room at my university (good, even 40 MB/s) or, for the first year, a home server, which would mean no more than 1 MB/s... let's hope for the first option :) There is a conference soon where I can submit a paper about the project, and in that case (though it's hard to write a paper in such a short time) the good server becomes more probable.

@boboviz: The app is using less than 600 MB because it hasn't loaded the model yet, but it needs more than 6 GB to work correctly. So you need to increase the RAM of the virtual machine and then the maximum amount of RAM used by BOINC in the BOINC Manager preferences. |
Send message Joined: 9 Feb 25 Posts: 5 Credit: 15,141 RAC: 1,469 |
To expand a bit for anyone who doesn't know some of the common LLM verbiage: "7B", "70B", etc. refer to the model size in billions of parameters. It's how big the model is and correlates with how much space the model will take up. Are there any plans to utilize GPUs? If you move to CUDA, the processing power of modern Nvidia GPUs will far outweigh CPUs on low-precision inferencing. |
Send message Joined: 6 Feb 25 Posts: 3 Credit: 262 RAC: 20 |
@boboviz: The app is using less than 600MB because it hasn't loaded the model yet, but it needs more than 6GB to work correctly. So you need to increase the RAM of the virtual machine and then the maximum amount of ram used by Boinc in the Boinc manager preferences

With 8 GB of virtual RAM the situation is better. Sometimes the WUs stop again, but after some time they restart. |
Send message Joined: 8 Feb 25 Posts: 5 Credit: 80,626 RAC: 6,291 |
I am still confused on one point. How many threads does the app actually use? If the max is really only 4, then why is the task using all 8 of the threads on my machine? Shouldn't the max be set automatically from the server side? If it has to be set on the user side for some reason, a pull-down menu on the user's project preferences page would be the way to do it; many projects have this. Until then, assuming 4 really is the app's maximum, I will use the app_config.xml setting from earlier in this thread. |
Send message Joined: 8 Feb 25 Posts: 11 Credit: 506,452 RAC: 41,992 |
On my system each task uses 4 threads, so with my app_config above only half the threads were being used. I have since changed it to use 4 threads and the CPU is 100% busy. |
Send message Joined: 22 Jan 25 Posts: 12 Credit: 56 RAC: 5 |
The app by default uses all the available threads on the user's machine. This is not a bug but the default behaviour of BOINC's "mt" plan class. However, since I understand this might not be optimal for every user, I really like the idea of a pull-down menu in the project preferences. Could you please give me a link to a project that does it? If it's easy to implement, I can do it today or tomorrow. Better still, if any of you know the BOINC server configuration parameters, you could show me directly the correct configuration to enable this behaviour. I like it! |
Send message Joined: 8 Feb 25 Posts: 6 Credit: 52,226 RAC: 4,100 |
Ask on Boinc Network discord server https://discord.gg/qQYfRG64z4 |
Send message Joined: 9 Feb 25 Posts: 5 Credit: 15,141 RAC: 1,469 |
I ran a task on my system yesterday. I did not make any customizations with app_config. It was set to use only 4 threads by default from the project and was passing the --nthreads 4 parameter in the app_version in client_state. It only used 4 threads, not all of them. |
Send message Joined: 7 Feb 25 Posts: 8 Credit: 33,666 RAC: 2,909 |
In reply to manalog's message of 10 Feb 2025: The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration.

PrimeGrid http://www.primegrid.com/home.php does it, as does LHC https://lhcathome.cern.ch/lhcathome. Here is LHC's preferences page:

Resource share
Use CPU
Use AMD GPU
Run test applications?
Is it OK for LHC@home and your team (if any) to email you? Emails will be sent from boinc-server-admin@cern.ch; make sure your spam filter accepts this address.
Should LHC@home show your computers on its web site?
General terms-of-use for this BOINC project. Do you consent to exporting your data to BOINC statistics aggregation Web sites?
Default computer location
Run only the selected applications
  SixTrack: yes
  sixtracktest: yes
  CMS Simulation: no
  Theory Simulation: no
  ATLAS Simulation: yes
If no work for selected applications is available, accept work from other applications?
Run native if available? (Not recommended for Windows)
Max # jobs: No limit
Max # CPUs: 3 |
Send message Joined: 6 Feb 25 Posts: 1 Credit: 0 RAC: 0 |
In reply to manalog's message of 10 Feb 2025: The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration. MilkyWay has a pull down menu to set 'Max # of threads for each MilkyWay@home task'. |
Send message Joined: 9 Feb 25 Posts: 5 Credit: 15,141 RAC: 1,469 |
The input parameters don't work properly in the app: it doesn't matter what --nthreads value you put, the app still only uses 4 threads while the system thinks it's using however many threads your system has. --nthreads 2 results in 4 threads being used; --nthreads 32 also results in 4 threads being used. I'm guessing it's hard-coded in the binary somewhere. If you want BOINC to behave in a way that matches reality, you can apply this app_config.xml so you can properly run multiple tasks according to your preferences. Running more tasks doesn't use more RAM; it seems like they all use the same data in memory.

<app_config>
  <app_version>
    <app_name>minerva-7bI-q6-llama-inference</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>
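If you want to verify the real thread count yourself instead of trusting the BOINC Manager display, a quick sketch (the `llama-boinc` process name is taken from the stderr paths earlier in the thread; `nlwp` is the kernel's count of lightweight processes, i.e. threads, so it is independent of whatever BOINC thinks):

```shell
# NLWP = number of threads the kernel sees for the process
PID=$(pgrep -f llama-boinc | head -n1)
ps -o nlwp= -p "$PID"
```

If this prints 4 regardless of the --nthreads value, the limit is indeed baked into the binary.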
Send message Joined: 8 Feb 25 Posts: 11 Credit: 506,452 RAC: 41,992 |
In reply to manalog's message of 10 Feb 2025: The app by default uses all the available threads on the users' machine. This is not a bug but the default behaviour of boinc's plan class "mt". However, because I understand that this might not be optimal for every user, I really like this idea of the pull down menu in the project configuration. PrimeGrid - https://www.primegrid.com/ Amicable - https://sech.me/boinc/Amicable/ Milkyway - https://milkyway.cs.rpi.edu/milkyway/ Yafu - https://yafu.myfirewall.org/yafu/ |
Send message Joined: 7 Feb 25 Posts: 8 Credit: 33,666 RAC: 2,909 |
MilkyWay does it like this in the venue section:

Resource share
Use CPU
Use AMD GPU
Use NVIDIA GPU
Is it OK for MilkyWay@home and your team (if any) to email you?
Should MilkyWay@home show your computers on its web site?
Default computer location
Color scheme for graphics: Tahiti Sunset
Maximum CPU % for graphics (0...100): 20
Run only the selected applications
  Milkyway@home N-Body Simulation: no
  Milkyway@home N-Body Simulation with Orbit Fitting: yes
If no work for selected applications is available, accept work from other applications? no
Max # of simultaneous MilkyWay@home tasks: No limit
Max # of threads for each MilkyWay@home task: 1 |
Send message Joined: 12 Feb 25 Posts: 1 Credit: 11,320 RAC: 1,073 |
Ubuntu 20.04 doesn't work; Ubuntu 24.04 works without any problems. To get Docker ready to run, only these two steps were necessary on my machines:

sudo apt-get install docker.io
sudo ln -s /usr/bin/docker /bin/unknown

No restart of BOINC necessary. The other two lines didn't work or had no effect. |
©2025 Matteo Rinaldi