[TriEmbed] Suggestions Needed to Improve Quality / Reliability

Chip McClelland chip at mcclellands.org
Sun Jan 22 14:37:36 CST 2017


All, 

I have been installing trail counters at Umstead Park for some time. This has been an ideal situation, as I often run or ride in the park and it allows me to keep an eye on things. Lately I have been working to improve the reliability of my counters, and I feel like I have hit a wall. I am not writing in the hope that you can find the source of my issue, but in the hope that you can suggest approaches I could use to better isolate the root cause.

Background and what I know thus far:
	- My sensor has two microcontrollers (Arduino and Simblee) that share an i2c bus. As they are both masters, I have a scheme to allow one or the other to take control. I mention this because it may be the source of my issue. More on how I architected this here: <https://www.hackster.io/chipmc/arduino-i2c-multi-master-approach-why-and-how-93f638?ref=user&ref_id=6903&offset=0>
	- The Arduino manages the sensor and the Simblee manages communications using Bluetooth LE.  
	- I have tested the software extensively and have a test rig that exercises the full functionality of the system with randomly timed “events”. With this approach I can only occasionally reproduce the error condition, typically after more than 100,000 “events”. Because of the timing in the system, this requires testing continuously for days on end. In the parks, the sensors typically run 25-30 days before they lock up, during which time they log around 30,000 “events”.
	- When I do get the system to lock up, this is what I see:
		- The Arduino is the one that is locked up, not the Simblee
		- When the Arduino locks up, the Simblee is unable to take control of the i2c bus
		- The Arduino is in the “sleep” state and will not wake up, even with the pin change interrupt set and even if I manually bring the interrupt pin low
		- The Arduino will not reset even if I manually bring the reset pin low; this is odd, and I have come to call it the "fugue state".
		- The Simblee will be visible on Bluetooth LE but will lock up when I attempt to connect and it tries to read the i2c bus (I do not have a timeout on taking the bus)
		- The only way to recover is to power cycle the device
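
The Simblee hang described above comes from waiting on the bus with no timeout. A minimal C sketch of a bounded wait follows, assuming the multi-master scheme exposes some "bus is free" test and a millisecond clock such as millis(); `wait_for_bus`, `bus_wait_expired` and the two hook types are hypothetical names, not taken from the linked repositories. It would not cure the Arduino lock-up, but it should keep the Simblee responsive (and able to report the fault over Bluetooth LE) while the bus is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: on the Simblee these would read whatever signal the
   multi-master scheme uses to show the bus is free, and return millis(). */
typedef bool (*bus_free_fn)(void);
typedef uint32_t (*clock_ms_fn)(void);

/* True once at least timeout_ms has passed since start_ms. Unsigned
   subtraction keeps this correct when the millisecond counter wraps. */
bool bus_wait_expired(uint32_t start_ms, uint32_t now_ms, uint32_t timeout_ms)
{
    return (uint32_t)(now_ms - start_ms) >= timeout_ms;
}

/* Poll until the bus is free. Returns true if it became free within
   timeout_ms, false if the wait timed out and the caller should give up
   (for example, report an error instead of blocking forever). */
bool wait_for_bus(bus_free_fn bus_free, clock_ms_fn now_ms, uint32_t timeout_ms)
{
    assert(bus_free != NULL && now_ms != NULL);
    uint32_t start = now_ms();
    while (!bus_free()) {
        if (bus_wait_expired(start, now_ms(), timeout_ms)) {
            return false;
        }
    }
    return true;
}
```

A timed-out wait is also a useful diagnostic in its own right: if the Simblee logs when and how often it fails to get the bus, that record may help narrow down when the Arduino stops releasing it.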

I am not sure how best to proceed. I could spend more time and effort trying to find the root cause, OR I could design an external watchdog that would simply recognize a lock-up and cycle the power. In my application, the cost of a reset every 20-30 days is acceptable, but part of me says that is the easy way out.
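
For the external watchdog option, a minimal sketch of the supervisor logic in C follows. It assumes the Arduino toggles a spare pin as a heartbeat and that a separate small supervisor (another microcontroller or a dedicated watchdog part) samples that pin and switches power to the board. Because the Arduino normally sleeps until an event, it would also have to wake periodically (for example on a timer) to toggle the pin, and the timeout must be longer than that wake interval. Since pulling reset low does not recover the "fugue state", the sketch cuts power rather than asserting reset. `watchdog_t`, `watchdog_init` and `watchdog_sample` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical supervisor state for one monitored board. */
typedef struct {
    uint32_t last_edge_ms;  /* time the heartbeat pin last changed level */
    uint32_t timeout_ms;    /* longest allowed silence before a power cycle */
    bool last_level;        /* heartbeat level seen on the previous sample */
    uint32_t resets;        /* number of power cycles triggered so far */
} watchdog_t;

void watchdog_init(watchdog_t *wd, uint32_t now_ms, uint32_t timeout_ms, bool level)
{
    assert(wd != NULL && timeout_ms > 0);
    wd->last_edge_ms = now_ms;
    wd->timeout_ms = timeout_ms;
    wd->last_level = level;
    wd->resets = 0;
}

/* Call periodically with the current time and heartbeat pin level.
   Returns true when the caller should cut power to the monitored board. */
bool watchdog_sample(watchdog_t *wd, uint32_t now_ms, bool level)
{
    if (level != wd->last_level) {
        /* An edge counts as a heartbeat; a pin stuck high or low does not. */
        wd->last_level = level;
        wd->last_edge_ms = now_ms;
        return false;
    }
    if ((uint32_t)(now_ms - wd->last_edge_ms) >= wd->timeout_ms) {
        /* Restart the window so the rebooted board gets a full timeout
           to resume its heartbeat before another power cycle. */
        wd->last_edge_ms = now_ms;
        wd->resets++;
        return true;
    }
    return false;
}
```

Counting edges rather than a level means a heartbeat pin frozen high or low, as it would be if the Arduino stalls, still reads as a lock-up. The `resets` counter keeps a record of how often the watchdog fires, so taking the "easy way out" need not close the door on finding the root cause.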

So, I am open to any suggestions you might have, even if they are of the “have you checked to see if it is plugged in” variety. If you want to see the code, it is in two parts: Arduino <https://github.com/chipmc/Connected-Logger-Arduino> and Simblee <https://github.com/chipmc/Trail-Counter-Simblee>. Any input is appreciated.

Thanks,

Chip
 

