[TriEmbed] Suggestions Needed to Improve Quality / Reliability

Carl Nobile carl.nobile at gmail.com
Mon Jan 23 08:53:38 CST 2017


Chip,

When I read your email late last night the first thing that crossed my mind
was also memory leaks, so I concur with John.

1. Some things to look for are files that are opened but never closed, e.g.
SD card storage if you're using it.
2. Off-by-one errors, which can cause buffer overruns (well known to cause
lock-ups).
3. Edge-case conditionals that go off to never-never land if not written
correctly.
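To make items 2 and 3 concrete, here is a minimal C sketch of a bounded append to a fixed-size event log. LOG_SIZE, EventLog, log_append() and log_sum() are hypothetical names, not taken from Chip's repositories. The point is the `<` versus `<=` bound, and handling the full-buffer edge case explicitly instead of letting it fall through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size event log; the names are illustrative only. */
#define LOG_SIZE 8

typedef struct {
    uint16_t entries[LOG_SIZE];
    size_t count; /* number of valid entries, 0..LOG_SIZE */
} EventLog;

/* Append one value; returns false when the log is full.
   The classic off-by-one is testing `count <= LOG_SIZE`, which lets
   entries[LOG_SIZE] be written and silently corrupts whatever sits
   after the array in RAM. */
bool log_append(EventLog *lg, uint16_t value) {
    if (lg->count < LOG_SIZE) {
        lg->entries[lg->count++] = value;
        return true;
    }
    /* Edge case handled explicitly: a full log is a defined outcome,
       not a fall-through into undefined behaviour. */
    return false;
}

/* Sum the valid entries; the loop bound is `< count`, never `<= count`. */
uint32_t log_sum(const EventLog *lg) {
    assert(lg->count <= LOG_SIZE); /* invariant maintained by log_append() */
    uint32_t sum = 0;
    for (size_t i = 0; i < lg->count; i++) {
        sum += lg->entries[i];
    }
    return sum;
}
```

On an AVR there is no memory protection, so the `<=` version typically does not fault right away. It quietly overwrites a neighbouring variable, which is one way a bug like this can stay hidden for a long time.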

~Carl


On Mon, Jan 23, 2017 at 9:23 AM, John Vaughters via TriEmbed <triembed at triembed.org> wrote:

> Chip,
>
> If the symptom occurs at regular intervals, whether regular in events
> or regular in time, I would consider a memory leak. I do not have
> recommendations on how to check for that, but I did a quick search and there
> appear to be some suggestions out there. I have never searched for a memory
> leak in an Arduino before; I am just throwing out an idea based on what
> appear to be regular intervals and a lock-up condition.
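One way to look for a leak is to build the event-handling logic on a desktop machine and route its allocations through counting wrappers. You then check that the live-allocation count returns to its starting value after each batch of events. This is a sketch under that assumption. counted_malloc(), counted_free(), handle_event() and leak_after_events() are hypothetical, and handle_event() only stands in for the real handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counting wrappers: route allocations through these in a
   host (desktop) build to see whether everything allocated while handling
   an event is freed again. */
static long outstanding_allocs = 0;

void *counted_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) {
        outstanding_allocs++;
    }
    return p;
}

void counted_free(void *p) {
    if (p != NULL) {
        /* Freeing more than was allocated points at a double free. */
        assert(outstanding_allocs > 0);
        outstanding_allocs--;
        free(p);
    }
}

long allocs_outstanding(void) {
    return outstanding_allocs;
}

/* Hypothetical stand-in for a real event handler. When `leak` is
   nonzero it "forgets" to free its scratch buffer. */
void handle_event(int leak) {
    char *scratch = counted_malloc(32);
    if (scratch == NULL) {
        return;
    }
    scratch[0] = 'x';
    if (!leak) {
        counted_free(scratch);
    }
}

/* Run `events` events and return how many allocations are still live
   afterwards. A leak shows up as a number that grows with `events`. */
long leak_after_events(int events, int leak) {
    long before = allocs_outstanding();
    for (int i = 0; i < events; i++) {
        handle_event(leak);
    }
    return allocs_outstanding() - before;
}
```

A clean handler reports 0 no matter how many events run. A leaking one reports a count proportional to the number of events, which would fit a lock-up that arrives after a roughly constant number of events.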
>
> If you want to create a cheap reboot device, consider an ATtiny that wakes up
> the device and looks for a digital I/O line to toggle on and off within a
> certain period of time, and cycles power if it does not. I know this is just
> more power to engineer for, which is a major negative for your product.
> Another option is a low-cost, low-power 555 timer set to trigger every so
> many hours or days, whatever you need. I am not sure which idea would give
> you the best power solution.
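The heartbeat part of the ATtiny idea comes down to a small piece of decision logic. Here is a minimal sketch, assuming the main board toggles a spare output as a heartbeat and the supervisor samples that line against a millisecond clock. HeartbeatMonitor, hb_init() and hb_sample() are hypothetical. Pin reads and driving the power switch are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heartbeat monitor for an external watchdog: the supervised
   board toggles a spare output; if no edge is seen within timeout_ms, the
   supervisor should cycle power. Only the decision logic is shown. */
typedef struct {
    uint32_t timeout_ms;   /* longest allowed gap between heartbeat edges */
    uint32_t last_edge_ms; /* time of the most recent edge (or reset) */
    bool last_level;       /* last sampled level of the heartbeat pin */
} HeartbeatMonitor;

void hb_init(HeartbeatMonitor *hb, uint32_t now_ms, bool level,
             uint32_t timeout_ms) {
    assert(timeout_ms > 0);
    hb->timeout_ms = timeout_ms;
    hb->last_edge_ms = now_ms;
    hb->last_level = level;
}

/* Feed one sample of the heartbeat pin; returns true when power should be
   cycled. The unsigned subtraction keeps the elapsed-time check correct
   when a 32-bit millisecond counter wraps around. */
bool hb_sample(HeartbeatMonitor *hb, uint32_t now_ms, bool level) {
    if (level != hb->last_level) {
        hb->last_level = level;
        hb->last_edge_ms = now_ms;
        return false;
    }
    if ((uint32_t)(now_ms - hb->last_edge_ms) >= hb->timeout_ms) {
        /* Restart the window so power is cycled once per timeout,
           not on every later sample. */
        hb->last_edge_ms = now_ms;
        return true;
    }
    return false;
}
```

The wraparound handling matters if the clock is something like Arduino's millis(), a 32-bit counter that rolls over after about 49.7 days. That is longer than the 25-30 day lock-up interval, but well within a deployment that runs for months.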
>
> Just throwing out ideas.
>
> Good Luck,
>
> John Vaughters
>
> On Sunday, January 22, 2017 3:37 PM, Chip McClelland via TriEmbed <triembed at triembed.org> wrote:
>
>
> All,
>
> I have been installing trail counters at Umstead park for some time.  This
> has been an ideal situation as I often run or ride in the park and it
> allows me to keep an eye on things.  For some time now, I have been working
> to improve the reliability of my counters, and I feel like I have hit a
> wall.  I am writing to you all not in the hope that you can find the source
> of my issue, but in the hope that you can suggest approaches I could use to
> better isolate the root cause.
>
> Background and what I know thus far:
> - My sensor has two microcontrollers (Arduino and Simblee) that share an
> i2c bus.  As they are both masters, I have a scheme to allow one or the
> other to take control.  I mention this because it may be the source of my
> issue.  More on how I architected this is here:
> <https://www.hackster.io/chipmc/arduino-i2c-multi-master-approach-why-and-how-93f638?ref=user&ref_id=6903&offset=0>
>
> - The Arduino manages the sensor and the Simblee manages communications
> using Bluetooth LE.
> - I have tested the software extensively and have a test rig that can
> exercise the full functionality of the system with randomly timed “events”.
> I can only occasionally reproduce the error condition this way, and it
> typically takes over 100,000 “events” before it happens.  Because of the
> timing in the system, this requires testing the system continuously for
> days on end.  In the parks, the sensors typically run 25-30 days before they
> lock up, during which time they log around 30,000 “events”.
> - When I do get the system to lock up, this is what I see:
>   - The Arduino is the one that is locked up, not the Simblee.
>   - When the Arduino locks up, the Simblee is unable to take control of the
>     i2c bus.
>   - The Arduino is in the “sleep” state and will not wake up, even with the
>     pin change interrupt set and *even if I manually bring the interrupt pin
>     low*.
>   - The Arduino will not reset *even if I manually bring the reset pin low*.
>     This is odd, and it is what I have come to call the "fugue state".
>   - The Simblee will be visible on Bluetooth LE but will lock up when I
>     attempt to connect and it tries to read the i2c bus (I do not have a
>     timeout on taking the bus).
>   - The only way to recover is to power cycle the device.
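Since there is no timeout on taking the bus, a bounded wait would at least let the Simblee report the fault instead of hanging. This is a minimal C sketch, assuming the multi-master scheme exposes some "bus is free" signal. acquire_bus(), the BusFreeFn/ClockFn hooks and the SimBus simulation are all hypothetical. They are not the arbitration scheme described in the hackster.io write-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: in firmware these would read whatever signal shows
   the other controller has released the bus, and a millisecond clock.
   They are function pointers so the logic can be tested off-target. */
typedef bool (*BusFreeFn)(void *ctx);
typedef uint32_t (*ClockFn)(void *ctx);

typedef enum { BUS_ACQUIRED, BUS_TIMEOUT } BusResult;

/* Wait for the bus to become free, but give up after timeout_ms instead
   of blocking forever; on BUS_TIMEOUT the caller can report the fault
   or start a recovery path rather than locking up. */
BusResult acquire_bus(BusFreeFn bus_free, ClockFn now_ms, void *ctx,
                      uint32_t timeout_ms) {
    assert(bus_free != NULL && now_ms != NULL);
    uint32_t start = now_ms(ctx);
    while (!bus_free(ctx)) {
        if ((uint32_t)(now_ms(ctx) - start) >= timeout_ms) {
            return BUS_TIMEOUT;
        }
    }
    return BUS_ACQUIRED;
}

/* Simulated bus for an off-target check: the bus is free once the fake
   clock reaches free_after_ms, and each clock read advances it by 1 ms. */
typedef struct {
    uint32_t t_ms;
    uint32_t free_after_ms; /* UINT32_MAX effectively means "never" */
} SimBus;

bool sim_bus_free(void *ctx) {
    SimBus *s = ctx;
    return s->t_ms >= s->free_after_ms;
}

uint32_t sim_clock(void *ctx) {
    SimBus *s = ctx;
    return s->t_ms++;
}
```

With SimBus set to free the bus after 10 ms, acquire_bus() returns BUS_ACQUIRED. With free_after_ms set to UINT32_MAX, it returns BUS_TIMEOUT once timeout_ms has elapsed.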
>
> I am not sure how to best proceed.  I could spend more time / effort
> trying to figure out what the root cause is OR I could design an external
> watchdog that would simply recognize a lock up and reset the power.  In my
> application, the cost of a reset every 20-30 days is acceptable but, part
> of me says that is the easy way out.
>
> So, I am open to any suggestions you might have even if they are the “have
> you checked to see if it is plugged in” variety.  If you want to see the
> code, it is in two parts: Arduino
> <https://github.com/chipmc/Connected-Logger-Arduino> and Simblee
> <https://github.com/chipmc/Trail-Counter-Simblee>   Any input is
> appreciated.
>
> Thanks,
>
> Chip
>
>
> _______________________________________________
> Triangle, NC Embedded Computing mailing list
> TriEmbed at triembed.org
> http://mail.triembed.org/mailman/listinfo/triembed_triembed.org
> TriEmbed web site: http://TriEmbed.org <http://triembed.org/>
>


-- 
-------------------------------------------------------------------------------
Carl J. Nobile (Software Engineer)
carl.nobile at gmail.com
-------------------------------------------------------------------------------