Issues with running a second node on another processor

Question


I am trying to run a second node on a different processor, either an ARM or a second x86_64. I have a DomMgr running on one x86_64 and am attempting to start a node on either another x86_64 or the ARM using nodeBooter. The DevMgr starts and registers with the DomMgr, but when it starts the GPP device it logs "Requesting IDM CHANNEL IDM_Channel" and then immediately "terminate called after throwing an instance of 'CORBA::OBJECT_NOT_EXIST'". The DomMgr printed "Domain Channel: IDM_Channel created" to the console. Is it supposed to register that channel in the NameService, and if not, why does the remote DevMgr get an invalid object reference when it tries to get it?

I did not realize I could clarify my question by editing it to add new findings. I'll do that from now on.

By using ORBtraceLevel on the remote DevMgr I found that I had a different problem on my remote x86-based DevMgr than on my ARM-based one, even though the normal error messages were the same. The x86 case was simply that my exported DevMgr DCD used the same name and id as one running locally on the domain. Once I fixed that, the x86-based remote DevMgr started its GPP device and registered with no problem.
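For reference, a minimal way to turn that tracing on (traceLevel is a standard omniORB configuration option; the file location assumes a default install):

# Append to /etc/omniORB.cfg on the node you want to trace (0 = quiet, 10 = very verbose)
echo "traceLevel = 10" | sudo tee -a /etc/omniORB.cfg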

But this is NOT the problem in the ARM-based case. With traceLevel=10 I started the DevMgr on both my x86 (successfully) and my ARM, and compared the outputs. First I should mention that my ARM is running Ubuntu 16.04 on a Raspberry Pi 3. The CPU is 64-bit, but no 64-bit distribution of either Ubuntu or CentOS is available for it, so the OS is 32-bit Ubuntu for now. I know that REDHAWK 2.0 says it now supports only 64-bit CentOS, so perhaps that is the problem, although I was able to build REDHAWK with no trouble and most of it works fine. But trace does show two warnings

WARN Device_impl:172 - Cannot set allocation implementation: Property ### is 
of type 'CORBA::Long' (not 'int')

which do not appear in the x86 case and which I believe are due to the different sizes of int. If I do not start an event service on the domain, these same warnings appear but I am able to start the GPP fine and run waveforms. So I do not know whether this is related to my OBJECT_NOT_EXIST error in the GPP or not, but I thought I should mention it.
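A quick way to confirm the word-size difference between the nodes (getconf is standard POSIX, nothing REDHAWK-specific):

getconf LONG_BIT    # prints 64 on the x86_64 nodes, 32 on this ARM build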

Trace shows one successful

Creating ref to remote: REDHAWK.DEV.IDM.Channel
target id   :IDL:omg.org/CosEventChannelAdmin/EventChannel:1.0
most derived id:
Adding root/Files<3> (activating) to object table.

but in the second case it immediately shows

Adding root<3> (activating) to object table.

followed by

throw OBJECT_NOT_EXIST from GIOP_C.cc:281 (NO,OBJECT_NOT_EXIST_NoMatch)
throw OBJECT_NOT_EXIST from omniOrbRef.cc:829 (NO,OBJECT_NOT_EXIST_NoMatch)

and then GPP terminates with signal 6.

The successful x86 trace shows the same "Creating ref" and "Adding root<3>" lines, but then has

Creating ref to remote: root/REDHAWK_DEV.IDM_Channel <...>

Could this be related to 32-bit vs. 64-bit, or why else would this happen only on the ARM-based GPP?

Note that iptables is configured to accept any traffic from my subnet on the x86 machines and is not running at all on the ARM. There are plenty of successful connections, including queries with nameclt, so this is not (as far as I can tell) a network connection issue.
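The kind of checks that succeed are along these lines (the domain host address is illustrative; 2809 is the omniNames port, 11169 the omniEvents port):

nameclt list                 # remote NameService resolves and lists its contents
nc -zv 172.17.0.14 2809      # omniNames port reachable from the ARM
nc -zv 172.17.0.14 11169     # omniEvents port reachable from the ARM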


Answer 1:


What version of REDHAWK are you running? What OS? Can you provide a list of all the omni RPMs you have installed on your machine?




Answer 2:


It sounds like something is misconfigured on your system, perhaps iptables or SELinux? Let's walk through a quick example to show the minimum configuration and running processes needed for a multi-node system. If this does not clear things up, I'd suggest rerunning the domain and device managers with TRACE-level debugging enabled and examining the output for any anomalies, or temporarily disabling SELinux and iptables to rule them out.
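For example, to take both out of the picture temporarily (standard CentOS commands; restore both afterwards):

sudo service iptables stop   # stop the firewall until restarted
sudo setenforce 0            # SELinux permissive until reboot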

I'll use a REDHAWK 2.0.1 Docker image to walk through the example. The installation steps used to build this image can be found here.

  • First we'll drop into a REDHAWK 2.0.1 environment with nothing running and label this container as our domain manager:
[youssef@axios(0) docker-redhawk]$docker run -it --name=domainMgr axios/redhawk:2.0
  • Let's confirm that almost nothing is running on this container
[redhawk@ce4df2ff20e4 ~]$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
redhawk      1     0  0 12:55 ?        00:00:00 /bin/bash -l
redhawk     27     1  0 12:57 ?        00:00:00 ps -ef
  • Let's take a look at the current omniORB configuration file. This will be the box where we run omniNames, omniEvents, and the domain manager:
[redhawk@ce4df2ff20e4 ~]$ cat /etc/omniORB.cfg 
InitRef = NameService=corbaname::127.0.0.1:2809
supportBootstrapAgent = 1
InitRef = EventService=corbaloc::127.0.0.1:11169/omniEvents
  • Since this is the machine we are running omniNames and omniEvents on, the loopback address (127.0.0.1) is fine; however, other machines will need to reference this machine either by its hostname (domainMgr) or its IP address, so we note its IP now:
[redhawk@ce4df2ff20e4 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:0E  
          inet addr:172.17.0.14  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe11:e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:468 (468.0 b)  TX bytes:558 (558.0 b)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  • Note that it has only a single interface, so we do not need to specify an endPoint. However, specifying the Unix-socket endpoint would provide a performance boost for any locally running components; a sketch is below.
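A minimal sketch of what those omniORB.cfg additions could look like (the socket path is illustrative; both endPoint forms are standard omniORB options):

endPoint = giop:tcp:172.17.0.14:
endPoint = giop:unix:/var/run/omniORB/redhawk.sock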

  • We can now start up omniNames, omniEvents, and the domain manager, and after each step see what is running. The "extra operand" output from omniNames is expected on newer versions of CentOS 6 and is an issue with the omniNames init script.

[redhawk@ce4df2ff20e4 ~]$ sudo service omniNames start
Starting omniNames: /usr/bin/dirname: extra operand `2>&1'
Try `/usr/bin/dirname --help' for more information.
                                                           [  OK  ]

[redhawk@ce4df2ff20e4 ~]$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
redhawk      1     0  0 12:55 ?        00:00:00 /bin/bash -l
omniORB     50     1  0 13:01 ?        00:00:00 /usr/bin/omniNames -start -always -logdir /var/log/omniORB/ -errlog /var/log/omniORB/error.log
redhawk     53     1  0 13:01 ?        00:00:00 ps -ef
[redhawk@ce4df2ff20e4 ~]$ sudo service omniEvents start
Starting omniEvents                                        [  OK  ]

[redhawk@ce4df2ff20e4 ~]$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
redhawk      1     0  0 12:55 ?        00:00:00 /bin/bash -l
omniORB     50     1  0 13:01 ?        00:00:00 /usr/bin/omniNames -start -always -logdir /var/log/omniORB/ -errlog /var/log/omniORB/error.log
root        69     1  0 13:01 ?        00:00:00 /usr/sbin/omniEvents -P /var/run/omniEvents.pid -l /var/lib/omniEvents -p 11169
redhawk     79     1  0 13:01 ?        00:00:00 ps -ef
  • I'm going to start up the domain manager in the foreground and grab the output of ps -ef via a "docker exec domainMgr ps -ef" in a different terminal
[redhawk@ce4df2ff20e4 ~]$ nodeBooter -D
2016-06-22 13:03:21 INFO  DomainManager:257 - Loading DEFAULT logging configuration. 
2016-06-22 13:03:21 INFO  DomainManager:368 - Starting Domain Manager
2016-06-22 13:03:21 INFO  DomainManager_impl:208 - Domain Channel: ODM_Channel created.
2016-06-22 13:03:21 INFO  DomainManager_impl:225 - Domain Channel: IDM_Channel created.
2016-06-22 13:03:21 INFO  DomainManager:455 - Starting ORB!
[youssef@axios(0) docker-redhawk]$docker exec domainMgr ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
redhawk      1     0  0 12:55 ?        00:00:00 /bin/bash -l
omniORB     50     1  0 13:01 ?        00:00:00 /usr/bin/omniNames -start -always -logdir /var/log/omniORB/ -errlog /var/log/omniORB/error.log
root        69     1  0 13:01 ?        00:00:00 /usr/sbin/omniEvents -P /var/run/omniEvents.pid -l /var/lib/omniEvents -p 11169
redhawk     80     1  0 13:03 ?        00:00:00 DomainManager DEBUG_LEVEL 3 DMD_FILE /domain/DomainManager.dmd.xml DOMAIN_NAME REDHAWK_DEV FORCE_REBIND false PERSISTENCE true SDRROOT /var/redhawk/sdr
redhawk     93     0  1 13:03 ?        00:00:00 ps -ef
  • So we can see that we have omniNames, omniEvents, and the DomainManager binaries running. Time to move on to a new node for the device manager.

  • In a new terminal I create a new container and call it deviceManager

[youssef@axios(0) docker-redhawk]$docker run -it --name=deviceManager axios/redhawk:2.0
  • Confirm nothing is really running, then take a look at the omniORB configuration file.
[redhawk@765ce325f145 ~]$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
redhawk      1     0  0 13:05 ?        00:00:00 /bin/bash -l
redhawk     28     1  0 13:06 ?        00:00:00 ps -ef
[redhawk@765ce325f145 ~]$ cat /etc/omniORB.cfg 
InitRef = NameService=corbaname::127.0.0.1:2809
supportBootstrapAgent = 1
InitRef = EventService=corbaloc::127.0.0.1:11169/omniEvents
  • We need to point the NameService and EventService entries at either our domain manager's hostname (domainMgr) or its IP address (172.17.0.14); I will go with the IP address:
[redhawk@765ce325f145 ~]$ sudo sed -i 's,127.0.0.1,172.17.0.14,g' /etc/omniORB.cfg

[redhawk@765ce325f145 ~]$ cat /etc/omniORB.cfg 
InitRef = NameService=corbaname::172.17.0.14:2809
supportBootstrapAgent = 1
InitRef = EventService=corbaloc::172.17.0.14:11169/omniEvents
  • We can confirm this worked by using nameclt list, which shows the omniNames entries for the event channel factory and the domain:
[redhawk@765ce325f145 ~]$ nameclt list
EventChannelFactory
REDHAWK_DEV/
  • Finally we can start up the device manager and inspect the running processes in a new shell via "docker exec deviceManager ps -ef"
[redhawk@765ce325f145 ~]$ nodeBooter -d /var/redhawk/sdr/dev/nodes/DevMgr_12ef887a9000/DeviceManager.dcd.xml 
2016-06-22 13:09:09 INFO  DeviceManager:446 - Starting Device Manager with /nodes/DevMgr_12ef887a9000/DeviceManager.dcd.xml
2016-06-22 13:09:09 INFO  DeviceManager_impl:367 - Connecting to Domain Manager REDHAWK_DEV/REDHAWK_DEV
2016-06-22 13:09:09 INFO  DeviceManager:494 - Starting ORB!
2016-06-22 13:09:09 INFO  Device:995 - DEV-ID:DCE:c5029226-ce70-48d9-9533-e025fb9c2a34 Requesting IDM CHANNEL IDM_Channel
2016-06-22 13:09:09 INFO  redhawk::events::Manager:573 - PUBLISHER - Channel:IDM_Channel Reg-Id21f4e766-c5c6-4c5b-8974-337736e71f87 RESOURCE:DCE:c5029226-ce70-48d9-9533-e025fb9c2a34
2016-06-22 13:09:09 INFO  DeviceManager_impl:1865 - Registering device GPP_12ef887a9000 on Device Manager DevMgr_12ef887a9000
2016-06-22 13:09:09 INFO  DeviceManager_impl:1907 - Device LABEL: GPP_12ef887a9000  SPD loaded: GPP' - 'DCE:4e20362c-4442-4656-af6d-aedaaf13b275
2016-06-22 13:09:09 INFO  GPP:658 - initialize()
2016-06-22 13:09:09 INFO  redhawk::events::Manager:626 - SUBSCRIBER - Channel:ODM_Channel Reg-Id0d18c1f4-71bf-42c2-9a2d-416f16af9fcf resource:DCE:c5029226-ce70-48d9-9533-e025fb9c2a34
2016-06-22 13:09:09 INFO  GPP_i:679 - Component Output Redirection is DISABLED.
2016-06-22 13:09:09 INFO  GPP:1611 - Affinity Disable State,  disabled=1
2016-06-22 13:09:09 INFO  GPP:1613 - Disabling affinity processing requests.
2016-06-22 13:09:09 INFO  GPP_i:571 - SOCKET CPUS USER    SYSTEM  IDLE    

2016-06-22 13:09:09 INFO  GPP_i:577 - 0      8    0.00    0.00    0.00    
2016-06-22 13:09:09 INFO  GPP:616 -  initialize CPU Montior --- wl size 8
2016-06-22 13:09:10 INFO  GPP_i:602 - initializeNetworkMonitor: Adding interface (docker0)
2016-06-22 13:09:10 INFO  GPP_i:602 - initializeNetworkMonitor: Adding interface (em1)
2016-06-22 13:09:10 INFO  GPP_i:602 - initializeNetworkMonitor: Adding interface (lo)
2016-06-22 13:09:10 INFO  GPP_i:602 - initializeNetworkMonitor: Adding interface (tun0)
2016-06-22 13:09:10 INFO  GPP_i:602 - initializeNetworkMonitor: Adding interface (vboxnet0)
2016-06-22 13:09:10 INFO  GPP_i:602 - initializeNetworkMonitor: Adding interface (veth70de860)
2016-06-22 13:09:10 INFO  GPP_i:602 - initializeNetworkMonitor: Adding interface (vethd0227d6)
2016-06-22 13:09:10 INFO  DeviceManager_impl:2087 - Registering device GPP_12ef887a9000 on Domain Manager
[youssef@axios(0) docker-redhawk]$docker exec deviceManager ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
redhawk      1     0  0 13:05 ?        00:00:00 /bin/bash -l
redhawk     35     1  0 13:09 ?        00:00:00 DeviceManager DCD_FILE /nodes/DevMgr_12ef887a9000/DeviceManager.dcd.xml DEBUG_LEVEL 3 DOMAIN_NAME REDHAWK_DEV SDRCACHE /var/redhawk/sdr/dev SDRROOT /var/redhawk/sdr
redhawk     40    35  1 13:09 ?        00:00:00 /var/redhawk/sdr/dev/devices/GPP/cpp/GPP PROFILE_NAME /devices/GPP/GPP.spd.xml DEVICE_ID DCE:c5029226-ce70-48d9-9533-e025fb9c2a34 DEVICE_LABEL GPP_12ef887a9000 DEBUG_LEVEL 3 DOM_PATH REDHAWK_DEV/DevMgr_12ef887a9000 DCE:218e612c-71a7-4a73-92b6-bf70959aec45 False DCE:3bf07b37-0c00-4e2a-8275-52bd4e391f07 1.0 DCE:442d5014-2284-4f46-86ae-ce17e0749da0 0 DCE:4e416acc-3144-47eb-9e38-97f1d24f7700  DCE:5a41c2d3-5b68-4530-b0c4-ae98c26c77ec 0 DEVICE_MGR_IOR IOR:010000001900000049444c3a43462f4465766963654d616e616765723a312e3000000000010000000000000070000000010102000c0000003137322e31372e302e313500a49e00001c000000ff4465766963654d616e61676572fef58d6a570100002300000000000200000000000000080000000100000000545441010000001c00000001000000010001000100000001000105090101000100000009010100
redhawk    398     0  0 13:09 ?        00:00:00 ps -ef

So we've successfully spun up two machines on the same network with unique IP addresses, designated one as the domain manager / omniNames / omniEvents server and the other as a device manager / GPP node. At this point we could connect to the domain manager either via the IDE or through a Python interface and launch waveforms; we would expect those waveforms to launch on the sole device manager node.
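For instance, from any host whose omniORB.cfg points at the domain machine, a quick check through REDHAWK's Python interface (a sketch; attach, devMgrs, and devs are part of ossie.utils.redhawk, but verify against your installed version) should show the remote GPP:

python << 'EOF'
from ossie.utils import redhawk
dom = redhawk.attach('REDHAWK_DEV')   # attach to the running domain by name
for devMgr in dom.devMgrs:            # one entry per registered node
    for dev in devMgr.devs:
        print dev._get_label()        # expect GPP_12ef887a9000 from the new node
EOF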



Source: https://stackoverflow.com/questions/37846192/issues-with-running-a-second-node-on-another-processor
