
20001218: write errors to mcidas directory after 17:52 today



>From: Robert Mullenax <address@hidden>
>Organization: UCAR/Unidata
>Keywords: 200012190130.eBJ1Uqo23841

Robert,

>Even though I have data flowing and the GEMPAK decoders are writing
>output, I am getting constant write error from the McIDAS
>XCD DDS and HRS decoders.

Where were you seeing the errors?

>Data stopped being decoded at 17:52
>today (I know there was a McIDAS feed problem)

Yes, but that was earlier, and the XCD decoders do not work with the
Unidata-Wisconsin image products; the ldm-mcidas decoders do.

>and now
>I have deleted the queue, made a new one, checked for permission
>problems, stopped and restarted the LDM, but no dice..still continuous
>write errors.

It would have been nice to see a sample of those errors.

>This is on the same disk that the GEMPAK data is being
>written to and there were no changes at all to the system.

Weird.

>It just stopped working.  This is on our Sparc system
>psnldm.nsbf.nasa.gov.
>
>Help!!?

More below.

>From address@hidden Mon Dec 18 18:38:51 2000

>Okay, after doing an ldm stop again (third time) and an ldm clean
>it is working now.  The question is what happened in the first place..

That was what I was going to ask.

>I saw this the other day on the x86 system in New Mexico.  I remade the
>queue and that fixed it. The SPARC is running McIDAS-X 7.6/ldm-5.1.2/
>Solaris 7 and the x86 Solaris 8 with the same Unidata versions.

The XCD decoders should not care about the LDM queue.  The sequence of
events is:

o LDM gets products from upstream sites
o pqact sends products to either ingebin.k or ingetext.k depending on
  what kind of products we are talking about (binary/HRS or textual/DDS)
o ingebin.k and ingetext.k write the products they get from pqact to
  a spool: ingetext.k to the daily .XCD file; ingebin.k to HRS.SPL
o the XCD data monitors work their way through the spool to decode
  products into McIDAS files

The write error would have to come from ingetext.k and/or ingebin.k
having execute problems or not being able to write to their respective
spool files.  Are you sure that no changes were made to the McIDAS
binaries during this process?
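
The two failure modes named above (an ingester that cannot execute, or a spool that cannot be written) can be checked with something like the following. This is only a rough sketch: the ingester names come from this message, but `check_ingesters` and both directory arguments are hypothetical and would need to point at the real McIDAS binary and spool locations.

```python
import os

# Ingester names are taken from the message above; everything else
# (function name, directory arguments) is a hypothetical illustration.
INGESTERS = ("ingetext.k", "ingebin.k")


def check_ingesters(bin_dir, spool_dir):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name in INGESTERS:
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{path}: missing")
        elif not os.access(path, os.X_OK):
            problems.append(f"{path}: not executable by this user")
    if not os.path.isdir(spool_dir):
        problems.append(f"{spool_dir}: spool directory missing")
    elif not os.access(spool_dir, os.W_OK):
        problems.append(f"{spool_dir}: not writable by this user")
    return problems
```

It should be run as the same user that runs the LDM, since `os.access` tests the permissions of the calling user.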

Tom

>From address@hidden Mon Dec 18 19:38:37 2000

Sorry, Tom I did not give you much to work on.  I am in a slight
panic mode trying to get things ready for Australia..(this working
two jobs thing can get hectic).  Here is what I saw this evening
in ldmd.log after I noticed the errors, remade the queue, and
stopped and restarted the LDM.  Later on I got HDS errors as well:


Dec 19 01:00:00 psnldm 140.172.240.73[3163]: run_requester: 20001218233000.988 TS_ENDT {{HDS,  ".*"}}
Dec 19 01:00:00 psnldm cirrus[3165]: run_requester: Starting Up: cirrus.al.noaa.gov
Dec 19 01:00:00 psnldm cirrus[3165]: run_requester: 20001218233000.999 TS_ENDT {{FSL2|IDS|DDPLUS,  ".*"},{MCIDAS,  "^pnga2area Q[01]"
Dec 19 01:00:01 psnldm pqact[3159]: child 3164 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3161 exited with status 127
Dec 19 01:00:01 psnldm 140.172.240.73[3163]: FEEDME(140.172.240.73): OK
Dec 19 01:00:01 psnldm cirrus[3165]: FEEDME(cirrus.al.noaa.gov): OK
Dec 19 01:00:01 psnldm pqact[3159]: child 3167 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3169 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3171 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3173 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3175 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3177 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3180 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3183 exited with status 127
Dec 19 01:00:01 psnldm pqact[3159]: child 3185 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: pbuf_flush (4) write: Broken pipe
Dec 19 01:00:02 psnldm pqact[3159]: pipe_dbufput: xcd_runDDS write error
Dec 19 01:00:02 psnldm pqact[3159]: pipe_prodput: trying again
Dec 19 01:00:02 psnldm pqact[3159]: child 3187 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: child 3189 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: child 3191 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: child 3193 exited with status 127
Dec 19 01:00:02 psnldm pqact[3159]: child 3195 exited with status 127
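
For a log this repetitive, a tally is easier to read than the raw lines. Below is a minimal sketch, assuming the pqact message formats are exactly as they appear in the excerpt above; `summarize` and the two regular expressions are illustrative names, not part of the LDM. Note that 127 is the status a POSIX shell conventionally returns when a command cannot be found, which may or may not be what pqact is reporting here.

```python
import re
from collections import Counter

# Patterns modelled on the ldmd.log lines quoted above (an assumption:
# other LDM versions may word these messages differently).
EXIT_RE = re.compile(r"pqact\[\d+\]: child (\d+) exited with status (\d+)")
WRITE_ERR_RE = re.compile(r"pqact\[\d+\]: pipe_dbufput: (\S+) write error")


def summarize(lines):
    """Count child exit statuses and write errors per pqact action."""
    statuses = Counter()
    failed_actions = Counter()
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            statuses[int(m.group(2))] += 1
            continue
        m = WRITE_ERR_RE.search(line)
        if m:
            failed_actions[m.group(1)] += 1
    return statuses, failed_actions


sample = [
    "Dec 19 01:00:01 psnldm pqact[3159]: child 3164 exited with status 127",
    "Dec 19 01:00:01 psnldm pqact[3159]: child 3161 exited with status 127",
    "Dec 19 01:00:02 psnldm pqact[3159]: pbuf_flush (4) write: Broken pipe",
    "Dec 19 01:00:02 psnldm pqact[3159]: pipe_dbufput: xcd_runDDS write error",
]
statuses, failed_actions = summarize(sample)
print(dict(statuses), dict(failed_actions))
```

Passing an open file object instead of `sample` works the same way, since the function only iterates over lines.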

Shortly after this I did an ldmadmin stop, clean, start and
all is well now (and continues to be fine).  The strange
thing is that even though DDS is running (a ps -eaf shows
ingetext.k DDS and I have new obs) the XCD_START.LOG
in ~mcidas/workdata only shows the HRS starting.  I am
sure no changes were made to the system.. It just started
spewing errors and then stopped after doing an ldmadmin clean
after having stopped and started a couple of times.  I checked
the inge*.k binaries and they were from April 28, 2000 and
have not been messed with.  So I am really stumped.


Robert

>From address@hidden Mon Dec 18 20:21:40 2000

Tom,

I went over to wxmcidas, which had been fine, and found it was now
doing the same thing after I switched its feed to a host other
than psnldm.  I have found the problem.  After doing ldmadmin
stop a couple of times and clean, I did an ldmadmin ps which
said no ldm running, etc.  However, look at this:
/usr/local/ldm/logs% ps -lu ldm
F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY      TIME CMD
8 S  1002  1671  1670  0  99 20 e17347d8    652 e1734844 ?        0:00 startxcd
8 S  1002  1501  1498  0  40 20 e172f7e0  87231 e172fa0c ?        0:03 pqbinsta
8 S  1002  1682  1671  0  40 20 e1b34140    652 e15b4bf6 ?        0:13 startxcd
8 S  1002  1673  1670  0  40 20 e17a47f8  87252 e17a4a24 ?       12:39 pqbinsta
8 S  1002  1687  1682  0  40 20 e17230a8   4669 e15a00f6 ?       782:51 dmgrid.k
8 R  1002 24678 14922  0  51 20 e19e6860    484          pts/2    0:00 tcsh
8 S  1002 18637  1682  0  40 20 e1b81158    920 e19aa6f6 ?        0:02 dmmisc.k
8 S  1002 17789  1682  0  40 20 e1eb4860    918 e126d676 ?        0:15 dmsfc.k
8 S  1002 18635  1682  0  41 20 e1b52840    902 e195fb96 ?        0:09 dmsyn.k
8 S  1002 22471  1682  0  40 20 e0dd7760    854 e19f93d6 ?        0:02 dmraob.k

I can't kill these off except by killing them one by one.
I have had trouble with ldm-5.1.2 getting it to stop, but have
not seen this before.
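
Picking the leftovers out of a listing like the one above can be scripted. The following is a minimal sketch, assuming the Solaris `ps -l` column order shown (PID in the fourth column, PPID in the fifth, command last); `leftover_processes` and its `ignore` default are hypothetical, and the function only reports processes, it does not kill anything.

```python
def leftover_processes(ps_output, ignore=("tcsh", "ps")):
    """Return (pid, ppid, cmd) for each listed process not in `ignore`.

    Assumes `ps -l` style columns as in the listing above:
    F S UID PID PPID ... CMD. The header line and blank lines are skipped.
    """
    procs = []
    for line in ps_output.splitlines():
        fields = line.split()
        # The header has "PID" in the fourth column, so isdigit() skips it.
        if len(fields) < 6 or not fields[3].isdigit() or not fields[4].isdigit():
            continue
        pid, ppid, cmd = int(fields[3]), int(fields[4]), fields[-1]
        if cmd not in ignore:
            procs.append((pid, ppid, cmd))
    return procs


sample = """\
F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY      TIME CMD
8 S  1002  1671  1670  0  99 20 e17347d8    652 e1734844 ?        0:00 startxcd
8 S  1002  1687  1682  0  40 20 e17230a8   4669 e15a00f6 ?       782:51 dmgrid.k
8 R  1002 24678 14922  0  51 20 e19e6860    484          pts/2    0:00 tcsh
"""
for pid, ppid, cmd in leftover_processes(sample):
    print(pid, ppid, cmd)
```

The PPID column is kept because it shows which processes descend from `startxcd`, which is useful when deciding what to stop first.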

Robert
