TBEG project unavailable

Tomas Brada
Joined: 14 Jan 19
Posts: 119
Credit: 574
RAC: 0
Message 5305 - Posted: 24 Feb 2020, 18:38:46 UTC

Dear Crunchers and Natalia!
Basically, the database was corrupted again.
Do not panic! I have a very recent backup (from a few minutes before the incident) and the recently returned results are also archived.
I tried to restore the backup; it ran for a few hours and then got corrupted again. And then again on another attempt. Additionally, I tried to install MariaDB on my PC, and it just refused to work. Thank you, crunchers, for your contributions.
I am very angry about this and I need to take a break.

Natalia Makarova
Project scientist
Joined: 6 Apr 17
Posts: 12858
Credit: 0
RAC: 0
Message 5306 - Posted: 24 Feb 2020, 19:04:06 UTC - in response to Message 5305.  
Last modified: 24 Feb 2020, 19:27:52 UTC

Tomas Brada,
I am very sorry.
I once asked you: "Why does the database get corrupted?"
You did not answer that question.
The database corruption keeps recurring.
Perhaps the cause needs to be found: why does the database periodically get corrupted?
If this were an isolated incident, it would be understandable: some random hardware failure.
But when it happens periodically, the cause must be found and eliminated.
A project run like this is no good at all!

> I have a very recent backup (from a few minutes before the incident) and the recently returned results are also archived.
> I tried to restore the backup; it ran for a few hours and then got corrupted again. And then again on another attempt.

This is also hard to understand. Why did the database backup fail to restore, not once but twice?
One could assume that the backup is corrupted too. Is that possible?
What is to be done in that case?

Unfortunately, history is repeating itself. The database of the Stop@home project was also corrupted.
As I understand it, there was no backup.
The project was shut down.

I have become disillusioned with BOINC projects.
See the message
https://boinc.progger.info/odlk/forum_thread.php?id=1&postid=5301#5301

I suggest that you stop struggling with BOINC and switch to the manual subproject, if you want to help me with my research.
XAVER is doing this successfully, and Demis did so earlier.
I have efficient algorithms for searching for ODLS, as you have seen for yourself.
I am badly short of computing resources!
My pleas to the administrators of the ODLK and ODLK1 projects have had no effect. A brute-force algorithm is running there.
Well, let it run until the end of time.

By the way, about the database of ODLS results in the TBEG project.
I processed the results to the very end; I collected the last batch yesterday.
I have the entire database of solutions.
The first part (more than a million CFs of ODLS) has already been published by me.
The second part currently contains 717728 CFs of ODLS.
Tomorrow I will publish the second part of the database.
With that, I am closing the book on the CF ODLS database of your project.
If you continue the project, please also take on the processing of the results. This can be done automatically.

Natalia Makarova
Project scientist
Joined: 6 Apr 17
Posts: 12858
Credit: 0
RAC: 0
Message 5315 - Posted: 25 Feb 2020, 16:25:36 UTC
Last modified: 25 Feb 2020, 17:00:26 UTC

Here is the promised second part of the TBEG project database:
https://cloud.mail.ru/public/5gJK/58nEGzmF5

The results of TBEG project for 4/11/2019 - 24/02/2020.

©2020 A.Belyshev & N.Makarova & T.Brada

717,728 unique CFs ODLS

Created 25/02/2020

I already explained why there are three authors when I published the first part of the database.

In total, the two parts of the TBEG project database currently contain 1745285 CFs of ODLS.

Tomas Brada,
as I wrote above, I am stopping the processing of the CF ODLS database of your project.
If you continue the project, please set up automatic processing of the results.
This is very easy to do; you are well acquainted with it.
Continue the database that I started.
It is hard for me to work with such quantities of squares on my PC with 2 GB of RAM.
That is why, before the experiment was launched, I set the condition that there would be automatic processing of the results. So far there is none.

PS. The first part of the TBEG project database is published here:
https://boinc.progger.info/odlk/forum_thread.php?id=104&postid=4754#4754

Tomas Brada
Joined: 14 Jan 19
Posts: 119
Credit: 574
RAC: 0
Message 5317 - Posted: 25 Feb 2020, 21:30:18 UTC

We will see how it turns out. Regarding the backup: I am very sure that my backup of the project is not corrupted. It is the server that is having problems, probably the disk, which corrupts the active version of the database. The DB has checksums.
Good news: the backup was imported into a database installed on my computer, and the SPT results are slowly but surely being processed.
I will still take a break from working on the server. Maybe a week or two.
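
The idea of trusting a backup because it is checksummed can be illustrated with a minimal sketch. This is not the procedure used on the TBEG server. It assumes a backup dump stored as a single file and a SHA-256 digest recorded when the backup was made; the function names `sha256_of` and `backup_is_intact` and the file name in the usage note are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a multi-gigabyte dump never
        # has to fit into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, expected_hex):
    """Compare the file's digest with the digest recorded at backup time."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `backup_is_intact("dump.sql", recorded_digest)` would return `False` if even one byte of the hypothetical `dump.sql` had changed since `recorded_digest` was taken. Such a check only shows that the file is unchanged; it says nothing about whether the database was consistent when the dump was made.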

UBT - Timbo
Joined: 21 Jun 17
Posts: 4
Credit: 3,156,400
RAC: 0
Message 5318 - Posted: 26 Feb 2020, 11:42:27 UTC - in response to Message 5317.  

Hi Tomas

So, the backup is ok but the server is broken?

Now the tricky question: When do you think the project will be online again?

I am sure that many (including me) have completed tasks that are now just sitting on their computers waiting to be uploaded...maybe you can make a new (temporary) server available and just allow everyone to upload their tasks ? (But not to download new tasks).

And perhaps, you can send out a message via BOINC Manager to let people know what is going on?

regards
Tim

Tomas Brada
Joined: 14 Jan 19
Posts: 119
Credit: 574
RAC: 0
Message 5319 - Posted: 26 Feb 2020, 12:48:39 UTC - in response to Message 5318.  

Hi Timbo!

> So, the backup is ok but the server is broken?
Yes, the backup is consistent. The backup imported into my computer also looks OK; it is currently being processed. It is likely a failing disk, but I am not sure. It could also be memory or the MySQL software. I need to buy a new disk.

> Now the tricky question: When do you think the project will be online again? I am sure that many (including me) have completed tasks that are now just sitting on their computers waiting to be uploaded...maybe you can make a new (temporary) server available and just allow everyone to upload their tasks ? (But not to download new tasks).

It will be online when it is done, no ETA. I could enable file uploads, but I would risk corruption of them as well. I will think about it.

> And perhaps, you can send out a message via BOINC Manager to let people know what is going on?
No, this is not possible right now.

Thank you, Timbo, for writing.

UBT - Timbo
Joined: 21 Jun 17
Posts: 4
Credit: 3,156,400
RAC: 0
Message 5320 - Posted: 26 Feb 2020, 15:38:19 UTC - in response to Message 5319.  
Last modified: 26 Feb 2020, 15:39:39 UTC

Hi Tomas

Many thanks for the swift reply.

I will pass on the info to our Team so that the current members involved with your project (just 4 including myself) are aware of the situation.

I can't do anything to help inform the other 333 active members who have joined the project though :-(

re: Server "fix".
Good luck getting the server fixed. If it is local to you, then it could be very simple to get working again, if it just needs a new HDD...as I am sure that not only do you have DB backups but also OS backups as well (which will allow the original project messageboard, teams and members databases to be restored to a new HDD).

If it were me, my plan would be to test that the server itself is OK by running some basic tests from a USB memory stick or from a simple Linux installer on a CD-ROM/DVD-ROM (but removing the original HDD first !!).

Once I knew the server was OK, I would install a new HDD and restore the OS backup. If I ever suspect an HDD problem, it is cheaper in the long run to just get a new (and probably larger) HDD...trying to save a few dollars on the old HDD doesn't make sense if it fails again later on.

Just my 2c !!

regards
Tim

Tomas Brada
Joined: 14 Jan 19
Posts: 119
Credit: 574
RAC: 0
Message 5321 - Posted: 26 Feb 2020, 17:47:39 UTC - in response to Message 5320.  

I agree, the HDD will be tossed.
The static system files are installed on btrfs in RAID1 across an SD card and the HDD. Btrfs is a checksummed filesystem, and it reports no errors, so I know that it is not corrupted. The dynamic data files (BOINC upload/download and misc) are on btrfs in single mode on the HDD with metadata dup; again, it is checksummed and reports no errors. The database (BOINC results, science results, including the message boards, Nextcloud) was on InnoDB on ext4 in writeback mode on the HDD. InnoDB is a checksummed database, and it reported no errors at the time of the backup, so I know the backup is correct. It did report out-of-bounds errors and crashed. smartctl did NOT report errors and InnoDB also did NOT report checksum errors, which is strange.
I have incremental semi-automatic backups set up over ssh. Even the boot partition was backed up for fast recovery in case of card or disk failure. I had a spare card and a spare disk on hand. And it did happen once: a disk failed back in the alpha stage, but at that time the DB and data partitions were small and RAIDed over to the card, so it was a quick swap. And that is why I have no spare disk now.
The requirements for the disk are minor. It just has to be 2.5" SATA. The DB is under 8 GB and the data is under 32 GB. I am considering getting an SSD instead, because small SSDs cost the same as small spinning-rust drives today, but I had a bad experience with a similar small SSD.
While we are here, the processing has finished and I will show some stats. The counts include duplicates and disabled groups, but the SPT(24), SPT(17), STPT(16) and TPT(10*2) are rare and exciting.
ok=357052 inval=270
Count SPT: 9:1415304 10:0 11:320979 12:15032 13:6131 14:6598462 15:60 16:530544 17:1 18:25702 19:0 20:1314 21:0 22:69 23:0 24:4
Count STPT: 8:40655218 9:0 10:472777 11:0 12:22856 13:0 14:25 15:0 16:2
Count TPT: 6:2107451 7:98346 8:3084 9:88 10:2
# DB select
kind    k       start                   ofs
spt     17      589492143270716899      24 30 60 6 72 12 6 12
spt     24      528050771271601307      14 6 66 6 22 26 4 30 8 4 18 8
spt     24      587950582712698157      2 22 12 6 24 30 14 10 2 54 18 28
spt     24      22930603692243271       70 6 42 18 20 4 18 24 20 16 12 128
tpt     10      3324648277099157        52 16 10 64 10 16 58 22 28
tpt     10      31910610414019031       34 22 94 28 10 10 58 106 46
stpt    16      2640138520272677        2 10 2 16 2 22 2 34
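
The "Count" lines above pair a tuple length k with the number of results found, written as k:count. A minimal sketch of reading one such line into a dictionary follows; it assumes exactly the layout shown above, and the function name `parse_counts` is hypothetical rather than part of the project's tooling.

```python
def parse_counts(line):
    """Split a line such as 'Count TPT: 6:2107451 7:98346' into
    the kind ('TPT') and a dict mapping k to its count."""
    # Everything before the first colon is the label, e.g. 'Count TPT'.
    label, _, rest = line.partition(":")
    kind = label.split()[-1]
    counts = {}
    for token in rest.split():
        # Each token has the form 'k:count'.
        k, _, n = token.partition(":")
        counts[int(k)] = int(n)
    return kind, counts


kind, counts = parse_counts("Count TPT: 6:2107451 7:98346 8:3084 9:88 10:2")
# Keep only the lengths that actually occurred.
found = {k: n for k, n in counts.items() if n > 0}
print(kind, found, sum(found.values()))
```

The same function handles the SPT and STPT lines, where filtering out the zero entries leaves only the lengths that were actually found.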

Natalia Makarova
Project scientist
Joined: 6 Apr 17
Posts: 12858
Credit: 0
RAC: 0
Message 5323 - Posted: 27 Feb 2020, 1:55:44 UTC - in response to Message 5321.  

> While we are here, the processing has finished and I will show some stats. The counts include duplicates and disabled groups, but the SPT(24), SPT(17), STPT(16) and TPT(10*2) are rare and exciting.
> ok=357052 inval=270
> Count SPT: 9:1415304 10:0 11:320979 12:15032 13:6131 14:6598462 15:60 16:530544 17:1 18:25702 19:0 20:1314 21:0 22:69 23:0 24:4
> Count STPT: 8:40655218 9:0 10:472777 11:0 12:22856 13:0 14:25 15:0 16:2
> Count TPT: 6:2107451 7:98346 8:3084 9:88 10:2
> # DB select
> kind    k       start                   ofs
> spt     17      589492143270716899      24 30 60 6 72 12 6 12
> spt     24      528050771271601307      14 6 66 6 22 26 4 30 8 4 18 8
> spt     24      587950582712698157      2 22 12 6 24 30 14 10 2 54 18 28
> spt     24      22930603692243271       70 6 42 18 20 4 18 24 20 16 12 128
> tpt     10      3324648277099157        52 16 10 64 10 16 58 22 28
> tpt     10      31910610414019031       34 22 94 28 10 10 58 106 46
> stpt    16      2640138520272677        2 10 2 16 2 22 2 34

Discussion here:
https://boinc.progger.info/odlk/forum_thread.php?id=49&postid=5322#5322

Natalia Makarova
Project scientist
Joined: 6 Apr 17
Posts: 12858
Credit: 0
RAC: 0
Message 5339 - Posted: 1 Mar 2020, 6:01:38 UTC

nabializm,
your post has been hidden as off-topic.

Tomas Brada
Joined: 14 Jan 19
Posts: 119
Credit: 574
RAC: 0
Message 5342 - Posted: 1 Mar 2020, 19:32:14 UTC
Last modified: 1 Mar 2020, 20:00:41 UTC

The server received an upgrade: one SSD to replace the presumably faulty HDD. An SSD was chosen because it costs the same as a hard drive. I have migrated the logical volumes over to the new drive, updated the kernel, updated the operating system, recompiled the latest BOINC server, and I am currently testing the new drive. I have a small suspicion that there is electrical interference or a loose cable somewhere.
Anyway, I re-enabled the file upload handler, so the last outstanding files can be uploaded even before the database and full operation are resumed.

Natalia Makarova
Project scientist
Joined: 6 Apr 17
Posts: 12858
Credit: 0
RAC: 0
Message 5343 - Posted: 2 Mar 2020, 1:03:26 UTC - in response to Message 5342.  

Tomas Brada,
thank you very much for the information.
Is there hope that the project will return soon?

Tomas Brada
Joined: 14 Jan 19
Posts: 119
Credit: 574
RAC: 0
Message 5349 - Posted: 2 Mar 2020, 20:07:48 UTC
Last modified: 2 Mar 2020, 20:08:03 UTC

Announcement: the forum and server pages are now back online.

Tomas Brada
Joined: 14 Jan 19
Posts: 119
Credit: 574
RAC: 0
Message 5509 - Posted: 26 Apr 2020, 18:24:29 UTC

Announcement: a power outage took out the server's boot filesystem. We are restoring it from backup and will take the time to install a battery backup.

Natalia Makarova
Project scientist
Joined: 6 Apr 17
Posts: 12858
Credit: 0
RAC: 0
Message 6163 - Posted: 2 Aug 2020, 11:43:42 UTC

Is anyone alive here?
For about three days now the TBEG project has not opened for me; this is what comes up:

[screenshot missing]

Is it only me who has this problem?
Please, someone, respond.
My new article "SOLS and SODLS"
in Russian
https://yadi.sk/d/nvdI6TgBrKv72A
in English https://yadi.sk/d/VeY9bx6_q6CcZg

Natalia Makarova
Project scientist
Joined: 6 Apr 17
Posts: 12858
Credit: 0
RAC: 0
Message 6165 - Posted: 3 Aug 2020, 3:04:44 UTC

Today the TBEG project opens for me.
However, I do not see anything new there yet.