OpenSplice DDS Forum


About rccampbe

  • Birthday 03/11/1983

Profile Information

  • Company
    General Dynamics
  1. I've got a Java application running OSPL 6.4.140407 on RHEL6. It's got numerous topics/publishers/subscribers/readers/writers, but only one domain participant. All of the messaging works fine and there are no problems until I try to clean up at application shutdown time. I call delete_contained_entities on my domain participant and end up getting a segmentation fault with the following backtrace that repeats for the 16k other frames:

         #16124 0x00007f90230f2885 in _c_freeReferences () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16125 0x00007f90230f2d5b in c_free () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16126 0x00007f90230fefb8 in c_clear () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16127 0x00007f90230f294a in _c_freeReferences () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16128 0x00007f90230f2d5b in c_free () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16129 0x00007f90230f2721 in _freeReference () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16130 0x00007f90230f2885 in _c_freeReferences () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16131 0x00007f90230f2d5b in c_free () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16132 0x00007f90230f2721 in _freeReference () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16133 0x00007f90230f2885 in _c_freeReferences () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16134 0x00007f90230f2d5b in c_free () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16135 0x00007f90230fefe8 in c_clear () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16136 0x00007f90230f294a in _c_freeReferences () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16137 0x00007f90230f2d5b in c_free () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16138 0x00007f90230f2721 in _freeReference () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16139 0x00007f90230f2a5a in _c_freeReferences () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16140 0x00007f90230f2763 in _freeReference () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16141 0x00007f90230f2885 in _c_freeReferences () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16142 0x00007f90230f2d5b in c_free () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16143 0x00007f9023152029 in v_handleRelease () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16144 0x00007f902317ec42 in u_handleRelease () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16145 0x00007f902317cc33 in u_entityRelease () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16146 0x00007f902317de9b in u_entityDeinit () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16147 0x00007f9023178658 in u_dispatcherDeinit () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16148 0x00007f9023186aba in u_readerDeinit () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16149 0x00007f90231767e0 in u_dataReaderFree () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16150 0x00007f902319446f in _DataReaderFree () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16151 0x00007f90231babf8 in _SubscriberDeleteContainedEntities () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16152 0x00007f90231bac2c in gapi_subscriber_delete_contained_entities () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16153 0x00007f90231985d0 in _DomainParticipantDeleteContainedEntitiesNoClaim () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16154 0x00007f902319cbac in gapi_domainParticipantFactory_delete_contained_entities () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libddskernel.so
         #16155 0x00007f902348109b in Java_org_opensplice_dds_dcps_DomainParticipantFactoryImpl_jniDeleteContainedEntities () from /usr/local/OpenSplice/HDE/x86_64.linux/lib/libdcpssaj.so

     I'll work on boiling this down to a manageable use case for reproducing it, but I thought this might be hitting a known issue someone could enlighten me about.

     Thanks,
     Rob
  2. I managed a workaround that was acceptable for our purposes: adding a delay between subscription (DataReader set_listener/enable) and subsequent publication on the same topic. It appears that the asynchronous operations started by a subscription need a little time to finish before the calling thread moves on to publishing. I won't be able to test it with 6.5, but if you'd like to recreate it, I'd suggest a loop in which you create the necessary DDS objects, call the DataReader's set_listener (with the ANY mask) and enable functions, then the DataWriter's write function, then clean up the DDS objects. Run it 50 times and I bet you see it.
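The reproduction recipe and the delay workaround from post 2 can be sketched as a small Java harness. This is only an outline under stated assumptions: `DdsRound` and `ReproLoop` are hypothetical names, not OpenSplice classes, and the four steps would have to be filled in with the real DDS calls (creating the entities, `set_listener` with the ANY mask plus `enable`, `write`, and cleanup).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for one round of DDS work; not part of the OpenSplice API. */
interface DdsRound {
    void createEntities(); // e.g. topic, publisher, subscriber, reader, writer
    void subscribe();      // e.g. DataReader set_listener (ANY mask) followed by enable
    void publish();        // e.g. DataWriter write on the same topic
    void cleanUp();        // delete the entities created for this round
}

final class ReproLoop {
    /**
     * Runs the round repeatedly. A settleMillis of 0 mirrors the reproduction
     * attempt; a positive value inserts the delay used as the workaround.
     * Returns the number of rounds that completed.
     */
    static int run(DdsRound round, int iterations, long settleMillis) {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            round.createEntities();
            try {
                round.subscribe();
                if (settleMillis > 0) {
                    try {
                        TimeUnit.MILLISECONDS.sleep(settleMillis);
                    } catch (InterruptedException e) {
                        // Preserve the interrupt and stop early; cleanUp still runs.
                        Thread.currentThread().interrupt();
                        return completed;
                    }
                }
                round.publish();
            } finally {
                round.cleanUp();
            }
            completed++;
        }
        return completed;
    }
}
```

Calling `ReproLoop.run(round, 50, 0)` corresponds to the suggested 50-iteration reproduction, and a positive `settleMillis` corresponds to the workaround. The post does not say how long a delay is sufficient, so that value would have to be found by experiment.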
  3. I'm not pursuing this any further. The build made it far enough to give me shared libraries with debug symbols, which is all I needed. But I still don't know why the build failed in that manner.
  4. Running a Java application on RHEL6 x86_64, I'm seeing a non-deterministic seg fault occur about 80% of the time during some automated Jenkins testing. The same tests running on a RHEL6 x86 platform have not caused any seg faults. Here's some of the GDB analysis:

         Program terminated with signal 6, Aborted.
         #0 0x0000003323632925 in raise () from /lib64/libc.so.6
         Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.2.x86_64
         (gdb) where
         #0 0x0000003323632925 in raise () from /lib64/libc.so.6
         #1 0x000000332363408d in abort () from /lib64/libc.so.6
         #2 0x00007f47a50719c5 in os::abort(bool) () from /usr/local/java/jdk1.7.0_80/jre/lib/amd64/server/libjvm.so
         #3 0x00007f47a51f2607 in VMError::report_and_die() () from /usr/local/java/jdk1.7.0_80/jre/lib/amd64/server/libjvm.so
         #4 0x00007f47a50768af in JVM_handle_linux_signal () from /usr/local/java/jdk1.7.0_80/jre/lib/amd64/server/libjvm.so
         #5 <signal handler called>
         #6 0x00007f47a4e8a759 in jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) () from /usr/local/java/jdk1.7.0_80/jre/lib/amd64/server/libjvm.so
         #7 0x00007f47a4e995a1 in jni_CallVoidMethod () from /usr/local/java/jdk1.7.0_80/jre/lib/amd64/server/libjvm.so
         #8 0x00007f478b19b10e in saj_dataReaderListenerOnLivelinessChanged (listenerData=0x7f47a064b610, dataReader=0x7f47a0663a98, status=0x7f478968ed50) at /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/java/common/c/code/saj_dataReaderListener.c:150
         #9 0x00007f478aeb2484 in _StatusNotifyLivelinessChanged (status=0x7f47a06517d0, source=0x7f47a0663a98, info=0x7f478968ed50) at /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/gapi/code/gapi_status.c:960
         #10 0x00007f478ae8a320 in _DataReaderNotifyListener (_this=0x7f47a0651390, triggerMask=20480) at /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/gapi/code/gapi_dataReader.c:1564
         #11 0x00007f478aeb198a in _StatusNotifyEvent (status=0x7f47a06517d0, events=640) at /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/gapi/code/gapi_status.c:618
         #12 0x00007f478ae94f4d in _EntityNotifyInitialEvents (_this=0x7f47a0651390) at /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/gapi/code/gapi_entity.c:208
         #13 0x00007f478ae92391 in listenerEventThread (arg=0x7f47a062eeb8) at /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/gapi/code/gapi_domainParticipant.c:2913
         #14 0x00007f478adb1e56 in os_startRoutineWrapper (threadContext=0x7f47a07c3bc0) at /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/abstraction/os/include/../posix/code/os_thread.c:292
         #15 0x0000003323a079d1 in start_thread () from /lib64/libpthread.so.0
         #16 0x00000033236e8b5d in clone () from /lib64/libc.so.6
         (gdb) up 9
         #9 0x00007f478aeb2484 in _StatusNotifyLivelinessChanged (status=0x7f47a06517d0, source=0x7f47a0663a98, info=0x7f478968ed50) at /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/gapi/code/gapi_status.c:960
         960 /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/gapi/code/gapi_status.c: No such file or directory.
         in /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/src/api/dcps/gapi/code/gapi_status.c
         (gdb) info args
         status = 0x7f47a06517d0
         source = 0x7f47a0663a98
         info = 0x7f478968ed50
         (gdb) info locals
         target = 0x7f47a0663a98
         entity = 0x7f478ae8935b
         listenerData = 0x7f47a064b610
         callback = 0x7f478b19b070 <saj_dataReaderListenerOnLivelinessChanged>
         (gdb) p listenerData
         $1 = (c_voidp) 0x7f47a064b610
         (gdb) p status->callbackInfo.listenerData
         $2 = (void *) 0x7f47a0660fc0
         (gdb) p ((saj_listenerData)listenerData)->jlistener
         $3 = (jobject) 0x0
         (gdb) p ((saj_listenerData)status->callbackInfo.listenerData)->jlistener
         $4 = (jobject) 0x7f47a0601c88

     And here's the relevant portion of code from saj_dataReaderListener.c:

         void
         saj_dataReaderListenerOnLivelinessChanged(
             void* listenerData,
             gapi_dataReader dataReader,
             const gapi_livelinessChangedStatus* status)
         {
             saj_listenerData ld;
             JNIEnv *env;
             jobject jstatus;
             jobject jdataReader;
             saj_returnCode rc;

             ld = saj_listenerData(listenerData);
             env = *(JNIEnv**)os_threadMemGet(OS_THREAD_JVM);
             rc = saj_statusCopyOutLivelinessChangedStatus(env, (gapi_livelinessChangedStatus *)status, &jstatus);

             if(rc == SAJ_RETCODE_OK){
                 jdataReader = saj_read_java_address(dataReader);
                 (*env)->CallVoidMethod(env, ld->jlistener, // <---- Line 150, stack trace frame #8, ld->jlistener is NULL
                                        GET_CACHED(listener_onLivelinessChanged_mid),
                                        jdataReader, jstatus);
             }
         }

     And gapi_status.c:

         void
         _StatusNotifyLivelinessChanged (
             _Status status,
             gapi_object source,
             gapi_livelinessChangedStatus *info)
         {
             gapi_object target;
             _Entity entity;
             c_voidp listenerData;
             gapi_listener_LivelinessChangedListener callback;

             target = _StatusFindTarget(status, GAPI_LIVELINESS_CHANGED_STATUS);
             if (target) {
                 /* get target listener and listener data. */
                 if ( target != source ) {
                     entity = gapi_entityClaim(target, NULL);
                     if (entity) {
                         callback = _EntityStatus(entity)->callbackInfo.on_liveliness_changed;
                         listenerData = _EntityStatus(entity)->callbackInfo.listenerData;
                         _EntityRelease(entity);
                     } else {
                         OS_REPORT(OS_ERROR, "_StatusNotifyLivelinessChanged", 0, "Failed to claim target.");
                         callback = NULL;
                     }
                 } else {
                     callback = status->callbackInfo.on_liveliness_changed;
                     listenerData = status->callbackInfo.listenerData;
                 }
                 if (callback) {
                     /* Temporary unlock status entity and call listener. */
                     _EntitySetBusy(status->entity);
                     _EntityRelease(status->entity);
                     callback(listenerData, source, info); // <---- Line 960, stack trace frame #9
                     _EntityClaim(status->entity);
                     _EntityClearBusy(status->entity);
                 }
             }
         }

     GDB shows source and target as equal, so I would expect the code path that sets listenerData = status->callbackInfo.listenerData to execute. But printing listenerData and status->callbackInfo.listenerData shows that they differ. The listenerData that is actually used has a NULL jlistener field, and that NULL is what gets passed to the JVM, hence the JVM crash. Given that this is a non-deterministic bug, my only thought is that something is not thread-safe: status->callbackInfo.listenerData is set appropriately somewhere else while this thread is waiting at the _EntitySetBusy statement, which I'm assuming is waiting to grab a mutex.
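The suspicion in post 4 is that the listener pointer and the moment it is used are not consistent with each other. As a purely illustrative sketch, and not OpenSplice's code or a confirmed diagnosis, the Java class below shows one common way to rule out that class of problem: the callback and its data are replaced together as one immutable pair, the dispatcher reads the pair once, and a missing listener is skipped instead of being invoked. `ListenerSlot` and `Binding` are hypothetical names introduced here.

```java
import java.util.function.BiConsumer;

/** Hypothetical holder for a listener callback plus its data; illustration only. */
final class ListenerSlot<D, E> {
    /** Immutable pair, so a reader can never see a callback with someone else's data. */
    private static final class Binding<D, E> {
        final BiConsumer<D, E> callback;
        final D data;

        Binding(BiConsumer<D, E> callback, D data) {
            this.callback = callback;
            this.data = data;
        }
    }

    // volatile: a published Binding is fully visible to the dispatching thread.
    private volatile Binding<D, E> binding;

    /** Installs a listener, or clears it when either argument is null. */
    void set(BiConsumer<D, E> callback, D data) {
        binding = (callback == null || data == null)
                ? null
                : new Binding<D, E>(callback, data);
    }

    /** Dispatches one event; returns false when no listener is installed. */
    boolean notifyListener(E event) {
        Binding<D, E> snapshot = binding; // single read of the pair
        if (snapshot == null) {
            return false; // nothing to call, so never dispatch on a null listener
        }
        snapshot.callback.accept(snapshot.data, event);
        return true;
    }
}
```

The design choice is that no lock is held while the callback runs, which matches the "temporary unlock" intent in the quoted C code, while the single read of an immutable pair keeps the callback and its data from drifting apart.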
  5. I'm trying to build a debug version of OpenSplice, so I grabbed the 6.4 Community Edition source from here and tried to build it on CentOS 6. Sourcing the configure script went fine, but running 'make' I get an error like this:

         magic_make.pl: Processing directory isocpp/...
         magic_make.pl: MPC File date : 1396950106 DCPS_ISO_Cpp.mpc
         magic_make.pl: 1396950106 > 0 Detected MPC rebuild required...
         magic_make.pl: Generating MPC build file(s): mwc.pl --type make --src-co isocpp/
         MPC_ROOT was set to /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/submods/MPC_ROOT.
         ERROR: Unable to find the MPC modules in /home/rob/Downloads/opensplice-OSPL_V6_4_OSS_RELEASE/submods/MPC_ROOT.
         Your MPC_ROOT environment variable does not point to a valid MPC location.
         magic_make.pl: ERROR: Trying to run: mwc.pl --type make --src-co isocpp/ !!!

     My next thought is to manually download/build/install MPC to get the build to work, but it seems like MPC would have been checked in the configure script if it were an external build dependency. Any other ideas?

     Thanks
  6. No general answer as to non-sample-loss reasons for the on_sample_lost method getting called? Ah, well. I've got 2 processes communicating over the network, both running on RHEL6 x86. Process A is a Java application and has only data readers (1 per message type); process B is in C++ and has a reader and a writer for each message, and it receives its own messages that it publishes. I don't see any on_sample_lost callbacks on the C++ side, but I do on the Java side.
  7. A little more info: I'm getting the on_sample_lost callbacks in Java, but not in C++. And I'm tracking the number of messages published and received and they are the same, so there is no real message loss going on, just a whole bunch of misleading on_sample_lost callbacks (one for every message sent). As I would expect, the SampleLostStatus is being incremented for every message as well.
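The bookkeeping described in post 7 (comparing published and received counts against what on_sample_lost reports) can be sketched as a small Java tally. This is a minimal sketch under assumptions: `LossTally` is a hypothetical helper, the published count has to be obtained from the publishing process by some out-of-band means such as its logs, and the value passed to `recordSampleLost` is assumed to be the cumulative `total_count` of the DDS `SampleLostStatus`.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that compares real sample counts with reported losses. */
final class LossTally {
    private final AtomicLong published = new AtomicLong();
    private final AtomicLong received = new AtomicLong();
    private final AtomicLong reportedLost = new AtomicLong();

    /** Adds samples known to have been written by the publisher. */
    void recordPublished(long count) {
        published.addAndGet(count);
    }

    /** Call once per sample actually taken or read by the subscriber. */
    void recordReceived() {
        received.incrementAndGet();
    }

    /** Call from on_sample_lost with the cumulative total from the status. */
    void recordSampleLost(long totalCount) {
        reportedLost.set(totalCount);
    }

    /** Samples that were published but never received (never negative). */
    long missing() {
        return Math.max(0L, published.get() - received.get());
    }

    /** Reported losses that no genuinely missing sample accounts for. */
    long unexplainedLostReports() {
        return Math.max(0L, reportedLost.get() - missing());
    }
}
```

In the situation from the post, every sample arrives and every sample also triggers a callback, so `missing()` would stay at 0 while `unexplainedLostReports()` would track the number of samples sent.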
  8. I just upgraded from 6.3 to 6.4 and I'm getting a lot of on_sample_lost callbacks fired that I never did before. I don't *think* I'm losing messages and the 6.4 docs say "NOTE: This operation is not yet implemented. It is scheduled for a future release." So does anyone have any info on this change? Is this actually a sign of lost messages? Or is it something akin to the on_data_available callbacks with NO_DATA specified, which serve as a sign of something else entirely?
  9. Proper cleanup solved the issue. Thanks, Hans.
  10. Hold that thought. The clean-up code is in a destructor, but I don't think that object is being deleted at shutdown time. Sorry about that. I'll try a good clean up and get back to you.
  11. B shuts down cleanly as opposed to 'exit'ing. But perhaps my destruction of entities is incomplete. Here's the explicit clean up I'm doing at shutdown time of B:

          CommandTopicPublisher->delete_datawriter(CommandTopicDataWriter);
          CommandTopicSubscriber->delete_datareader(CommandTopicDataReader);
          DomainParticipant->delete_topic(CommandTopic);
          DomainParticipant->delete_subscriber(CommandTopicSubscriber);
          DomainParticipant->delete_publisher(CommandTopicPublisher);

      Is there a way for me to verify when the message is acked by B?
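The follow-up posts note that this cleanup lived in a destructor that was not running at shutdown. One way to make teardown explicit is a small registry of cleanup steps that is run on purpose at shutdown, newest first. The sketch below is in Java to match the other examples on this page, even though B is a C++ application; `Teardown` is a hypothetical helper, and each registered step would wrap a real call such as the `delete_datawriter` or `delete_topic` calls quoted above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical registry of cleanup steps, executed in reverse registration order. */
final class Teardown {
    private final Deque<Runnable> steps = new ArrayDeque<Runnable>();

    /** Register a cleanup step right after the matching entity is created. */
    synchronized void register(Runnable step) {
        steps.push(step); // newest step ends up at the head
    }

    /**
     * Runs every registered step, newest first, and keeps going when a step
     * fails. Returns the number of steps that threw.
     */
    synchronized int runAll() {
        int failures = 0;
        while (!steps.isEmpty()) {
            try {
                steps.pop().run();
            } catch (RuntimeException e) {
                failures++; // a failed delete should not block the remaining ones
            }
        }
        return failures;
    }
}
```

If the entities are created in the order publisher, subscriber, topic, reader, writer, then `runAll()` deletes them as writer, reader, topic, subscriber, publisher, which is the same order as the explicit cleanup in the post.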
  12. I've got a test setup running on Linux where a Java test application (A) spawns instances of a C++ application (B), injects messages to test the application, then injects a "shutdown command" to kill B, before starting a new instance of B to continue with a new test. 'A' has a command data writer configured to be Volatile, Reliable and to Keep All History. It is used like this:

          long handle = getDataWriter().register_instance(ddsCommand);
          int ret = getDataWriter().write(ddsCommand, DDS.HANDLE_NIL.value);
          getDataWriter().unregister_instance(ddsCommand, handle);

      These tests ran without issue on 6.3, but I just upgraded to 6.4 and now test B1 succeeds, then shuts down appropriately, but B2 receives all the commands that were already delivered to B1, including the shutdown command, which kills the test prematurely and causes failures. Any thoughts on configuration settings or QoS changes I could make to prevent these old, volatile messages from being delivered to new subscribers?

      ospl.xml:

          <OpenSplice>
            <Domain>
              <Name>1</Name>
              <Id>1</Id>
              <SingleProcess>true</SingleProcess>
              <Service name="ddsi2">
                <Command>ddsi2</Command>
              </Service>
              <Service name="durability">
                <Command>durability</Command>
              </Service>
              <Service enabled="false" name="cmsoap">
                <Command>cmsoap</Command>
              </Service>
              <Listeners>
                <StackSize>131072</StackSize>
              </Listeners>
            </Domain>
            <DDSI2Service name="ddsi2">
              <General>
                <NetworkInterfaceAddress>AUTO</NetworkInterfaceAddress>
                <AllowMulticast>true</AllowMulticast>
                <EnableMulticastLoopback>true</EnableMulticastLoopback>
                <CoexistWithNativeNetworking>false</CoexistWithNativeNetworking>
                <StartupModeDuration>0</StartupModeDuration>
              </General>
              <Compatibility>
                <!-- see the release notes and/or the OpenSplice configurator on DDSI interoperability -->
                <StandardsConformance>lax</StandardsConformance>
                <!-- the following one is necessary only for TwinOaks CoreDX DDS compatibility -->
                <!-- <ExplicitlyPublishQosSetToDefault>true</ExplicitlyPublishQosSetToDefault> -->
              </Compatibility>
            </DDSI2Service>
            <DurabilityService name="durability">
              <Network>
                <Alignment>
                  <TimeAlignment>false</TimeAlignment>
                  <RequestCombinePeriod>
                    <Initial>2.5</Initial>
                    <Operational>0.1</Operational>
                  </RequestCombinePeriod>
                </Alignment>
                <WaitForAttachment maxWaitCount="10">
                  <ServiceName>ddsi2</ServiceName>
                </WaitForAttachment>
              </Network>
              <NameSpaces>
                <NameSpace name="defaultNamespace">
                  <Partition>*</Partition>
                </NameSpace>
                <Policy alignee="Initial" aligner="true" durability="Durable" nameSpace="defaultNamespace"/>
              </NameSpaces>
            </DurabilityService>
            <TunerService name="cmsoap">
              <Server>
                <PortNr>Auto</PortNr>
              </Server>
            </TunerService>
          </OpenSplice>
  13. Did anyone ever find a solution to this? It's the same issue I described here: http://forums.opensplice.org/index.php?/topic/2308-startup-delay-issues/ I'm not explicitly setting the initial RequestCombinePeriod and the default is 0.5 sec, so I don't think it's a durability-alignment issue.
  14. Could you provide a link for this info? All the OpenSplice bugzilla links I have found are broken and point here: http://dev.opensplice.org/cgi-bin/bugzilla3 Thanks, Rob
  15. I found the more appropriate looking DDSI2 Thread configuration section 2.5.6 in the deployment guide. Perhaps increasing the StackSize of the 'recv' thread could help?