Re: [Isis-fish-devel] Problem with ISIS simulations on Caparmor
Bernard PREVOSTO wrote:
Hello,
Indeed, I remember: on Saturday we had a problem on the disk server, which caused problems with PBS when it restarted. I saw that these 3 jobs were queued but were not restarting, so I killed them. Sorry for not letting you know.
No problem. We were wondering whether the problem came from our side (too much disk space used, too much memory, too much computing time, ...) since this was our first large-scale test. Apparently that was not the case, so it is rather good news. I will therefore relaunch these simulation plans.
Jean Couteau
Code Lutin
Bernard
Jean Couteau wrote:
Tina ODAKA wrote:
you said you submitted 3 jobs, can you send me all 3 job IDs? thanks, tina
the 3 jobs are :
99338[].service4 99198[].service4 99346[].service4
Jean
Jean Couteau wrote:
Tina ODAKA wrote:
hi jean, you need to check what the 'pbs job id' was, that is, the number you get when you do qsub xxx. With this number you (or I) can type tracejob -n days xxx (days is the number of days back to the day you submitted; for example, today is the 4th and you submitted on the 28th, so it is 7) to see when the job died and why.
Ok, so I got this:
poussin@service4:~> tracejob -n 7 99338[].service4
Job: 99338[].service4
11/28/2009 12:36:28 S Job Modified at request of Scheduler@service4.ice.ifremer.fr
11/28/2009 12:36:28 A user=poussin group=emh jobname=simulation-as_S queue=sequentiel ctime=1259336077 qtime=1259336077 etime=1259336077 start=0 array_indices=0-1574 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00
11/28/2009 12:37:07 L Considering job to run
11/28/2009 12:37:07 L Queue sequentiel per-user job limit reached
11/28/2009 12:37:13 S delete job request received
11/28/2009 12:37:13 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:37:13 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:20 S delete job request received
11/28/2009 12:37:20 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:37:20 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:21 S dequeuing from sequentiel, state 7
11/28/2009 12:37:21 A user=poussin group=emh jobname=simulation-as_S queue=sequentiel ctime=1259336077 qtime=1259336077 etime=1259336077 start=0 array_indices=0-1574 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00 session=0 end=1259411841 Exit_status=0
poussin@service4:~> tracejob -n 7 99198[].service4
Job: 99198[].service4
11/28/2009 12:36:17 L Considering job to run
11/28/2009 12:36:17 L Queue sequentiel per-user job limit reached
11/28/2009 12:36:24 S delete job request received
11/28/2009 12:36:24 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:36:24 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:13 S delete job request received
11/28/2009 12:37:13 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:37:13 S dequeuing from sequentiel, state 7
11/28/2009 12:37:13 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:13 A user=poussin group=emh jobname=simulation-as_r queue=sequentiel ctime=1259331670 qtime=1259331671 etime=1259331671 start=0 array_indices=0-1499 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00 session=0 end=1259411833 Exit_status=0
11/28/2009 12:37:20 S delete job request received
11/28/2009 12:37:20 S Unknown Job Id
poussin@service4:~> tracejob -n 7 99346[].service4
Job: 99346[].service4
11/28/2009 12:37:07 L Considering job to run
11/28/2009 12:37:07 L Queue sequentiel per-user job limit reached
11/28/2009 12:37:13 S delete job request received
11/28/2009 12:37:13 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:37:13 S dequeuing from sequentiel, state 1
11/28/2009 12:37:13 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:20 S delete job request received
11/28/2009 12:37:20 S Unknown Job Id
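Tina's rule for the tracejob -n argument (submitted on the 28th, today is the 4th, so 7) can be written down as a small calculation. This is only a sketch: the helper name tracejob_days is made up for illustration, and it assumes that -n counts calendar days backwards with today included, which is what her example implies.

```python
from datetime import date


def tracejob_days(submitted: date, today: date) -> int:
    """Value to pass to `tracejob -n`, counting both the submission day and today.

    Assumption: `-n 1` means "today's logs only", so the submission day
    itself has to be included in the count.
    """
    if submitted > today:
        raise ValueError("submission date is after today")
    return (today - submitted).days + 1


# Dates taken from the thread: submitted 28 Nov 2009, checked on 4 Dec 2009.
print(tracejob_days(date(2009, 11, 28), date(2009, 12, 4)))  # prints 7
```

The result would then be used as in the transcripts above, e.g. tracejob -n 7 followed by the job id.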
A new simulation plan was stopped today:
poussin@service4:~> tracejob 102815[].service4
Job: 102815[].service4
12/04/2009 11:15:54 S enqueuing into sequentiel, state 1 hop 1
12/04/2009 11:15:54 S Job Queued at request of poussin@service4.ice.ifremer.fr, owner = poussin@service4.ice.ifremer.fr, job name = simulation-as_r, queue = sequentiel
12/04/2009 11:15:54 S Job Modified at request of Scheduler@service4.ice.ifremer.fr
12/04/2009 11:15:54 A queue=sequentiel
12/04/2009 11:15:54 A user=poussin group=emh jobname=simulation-as_r queue=sequentiel ctime=1259925354 qtime=1259925354 etime=1259925354 start=0 array_indices=0-1499 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00
12/04/2009 12:20:15 L Queue sequentiel per-user job limit reached
12/04/2009 12:20:22 L Considering job to run
12/04/2009 12:22:50 S dequeuing from sequentiel, state 7
12/04/2009 12:22:50 A user=poussin group=emh jobname=simulation-as_r queue=sequentiel ctime=1259925354 qtime=1259925354 etime=1259925354 start=0 array_indices=0-1499 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00 session=0 end=1259929370 Exit_status=0
What happened? 80 of 1500 simulations ran.
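The final accounting (A) record of job 102815 packs its details into key=value fields, with ctime and end given as Unix timestamps. The sketch below pulls those fields apart to show how long the array job existed and how many sub-jobs it declared. The helper names parse_accounting and array_size are made up for illustration, and reading end minus ctime as "time between creation and the end record" is an assumption about the PBS field meanings, not something stated in the thread.

```python
import re
from datetime import timedelta


def parse_accounting(record: str) -> dict:
    """Collect the key=value fields of one tracejob 'A' record into a dict."""
    return dict(re.findall(r"(\S+?)=(\S+)", record))


def array_size(indices: str) -> int:
    """Number of sub-jobs in a simple 'first-last' array_indices range."""
    first, last = (int(part) for part in indices.split("-"))
    return last - first + 1


# Last 'A' record of job 102815[].service4, copied from the trace in the thread.
record = (
    "12/04/2009 12:22:50 A user=poussin group=emh jobname=simulation-as_r "
    "queue=sequentiel ctime=1259925354 qtime=1259925354 etime=1259925354 "
    "start=0 array_indices=0-1499 Resource_List.mem=3gb Resource_List.ncpus=1 "
    "Resource_List.nodect=1 Resource_List.place=pack "
    "Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00 "
    "session=0 end=1259929370 Exit_status=0"
)

fields = parse_accounting(record)
lifetime = int(fields["end"]) - int(fields["ctime"])  # seconds
total = array_size(fields["array_indices"])

print(timedelta(seconds=lifetime))       # prints 1:06:56
print(total)                             # prints 1500
print(f"{80 / total:.1%} of sub-jobs")   # 80 ran according to the message
```

The lifetime agrees with the timestamps printed in the trace (11:15:54 to 12:22:50), and the array size agrees with the 1500 simulations mentioned in the message.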
This time it had nothing to do with us. Nothing to report on the system side either.
Bernard
Jean Couteau wrote:
A new simulation plan was stopped today:
poussin@service4:~> tracejob 102815[].service4
Job: 102815[].service4
12/04/2009 11:15:54 S enqueuing into sequentiel, state 1 hop 1
12/04/2009 11:15:54 S Job Queued at request of poussin@service4.ice.ifremer.fr, owner = poussin@service4.ice.ifremer.fr, job name = simulation-as_r, queue = sequentiel
12/04/2009 11:15:54 S Job Modified at request of Scheduler@service4.ice.ifremer.fr
12/04/2009 11:15:54 A queue=sequentiel
12/04/2009 11:15:54 A user=poussin group=emh jobname=simulation-as_r queue=sequentiel ctime=1259925354 qtime=1259925354 etime=1259925354 start=0 array_indices=0-1499 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00
12/04/2009 12:20:15 L Queue sequentiel per-user job limit reached
12/04/2009 12:20:22 L Considering job to run
12/04/2009 12:22:50 S dequeuing from sequentiel, state 7
12/04/2009 12:22:50 A user=poussin group=emh jobname=simulation-as_r queue=sequentiel ctime=1259925354 qtime=1259925354 etime=1259925354 start=0 array_indices=0-1499 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00 session=0 end=1259929370 Exit_status=0
What happened? 80 of 1500 simulations ran.
-- Bernard PREVOSTO - DOP/DCB/IDM/RIC IFREMER Centre de Brest Tel: 02 98 22 45 43 - Fax: 02 98 22 45 46 Email: Bernard.Prevosto@ifremer.fr
Bernard PREVOSTO wrote:
This time it had nothing to do with us. Nothing to report on the system side either.
Bernard
We have identified the problem: we thought we had cleaned up the disk space, but some large 'hidden' files were still there. It was a quota problem.
Jean Couteau
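For anyone hitting the same quota issue, here is a minimal sketch of how leftover large files could be tracked down. It assumes 'hidden' means dot-files or files under dot-directories, which the thread does not actually specify; the function names largest_files and is_hidden are made up for illustration.

```python
import os


def largest_files(root, top=10):
    """Return the `top` largest files under `root` as (size_in_bytes, path) pairs.

    os.walk does not skip dot-files or dot-directories, so files that a
    plain `ls` would hide are included.
    """
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # File vanished or is unreadable: skip it.
                continue
    return sorted(sizes, reverse=True)[:top]


def is_hidden(path, root):
    """True if any path component below `root` starts with a dot."""
    relative = os.path.relpath(path, root)
    return any(part.startswith(".") for part in relative.split(os.sep))
```

Calling largest_files on the home directory and flagging entries for which is_hidden is true would show whether dot-directories are what is eating the quota.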
participants (2)
- Bernard PREVOSTO
- Jean Couteau