Memory corruption when accessing the network from a Python user thread #6492

@jdstroy

Hi,

I have a Kano Pixel, which runs on the ESP32, with a few peripherals attached to it (a NeoPixel R/G/B array, an analog potentiometer, about 7 momentary buttons, and a battery). I flashed MicroPython on it (esp32-idf3-20200320-v1.12-262-g19ea30bdd.bin), and wrote a little bit of code to exercise most of the peripherals and the WLAN controller.

My code looks something like this (pseudocode):

fn timesync() {
  while(True) {
    time.sleep(60);
    ntptime.settime()
  }
}

fn mainloop() {
  while(True) {
    data = https_get(resource);
    draw_neopixel(data);
    http_post(confirmation_url, data.version);
    sleep();
  }
}

setup_network();
while(not connected() or not ready) {
  sleep();
}

thread_create(timesync);
thread_create(mainloop);

When I run the code of mainloop from the main thread, all is well. But when I run mainloop on a Python thread, in short order: 1. network connectivity is lost (partly due to corrupted DNS settings; I dumped the configuration via the .ifconfig() function and observed the DNS server set to a bogus value), 2. communication between the STA and the AP is dropped, and 3. the firmware crashes and prints a stack trace on the serial console. Sometimes, rather than crashing, the platform locks up and the WDT kicks in instead.

I tried saving the value of ifconfig() after DHCP autoconfiguration and forcing it back to that setting when this problem occurs (i.e. by doing x = ifconfig() as part of initialization, and then periodically calling ifconfig(x)), but this only fixes DNS resolution; sometimes connectivity isn't restored regardless, even when connecting by IP address. Furthermore, conditions 2 and 3 still occur.
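For illustration, the save-and-restore workaround described above can be sketched as follows. This is a minimal sketch, not the code from the actual application: the helper names save_ifconfig and restore_ifconfig are hypothetical, and wlan is assumed to be an active network.WLAN(network.STA_IF) object whose ifconfig() returns and accepts an (ip, netmask, gateway, dns) tuple.

```python
def save_ifconfig(wlan):
    """Return the (ip, netmask, gateway, dns) tuple currently in effect."""
    return wlan.ifconfig()


def restore_ifconfig(wlan, saved):
    """Re-apply `saved` if the live settings differ from it.

    Returns True if the settings were rewritten, False if they were intact.
    """
    if wlan.ifconfig() != saved:
        wlan.ifconfig(saved)
        return True
    return False


# Assumed usage on the device (hypothetical, not run here):
#   import network
#   wlan = network.WLAN(network.STA_IF)
#   saved = save_ifconfig(wlan)        # once, after DHCP has completed
#   restore_ifconfig(wlan, saved)      # periodically, e.g. before each request
```

As noted above, this only papers over the corrupted DNS entry; it does nothing about the dropped association or the crash.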

I'm able to induce a less extreme version of this with a short SSCCE (below). These messages show up on the console when running the SSCCE on a Python user thread, but not when running it on the main Python thread:

I (996486) wifi:bcn_timout,ap_probe_send_start
I (998996) wifi:ap_probe_send over, resett wifi status to disassoc
I (998996) wifi:state: run -> init (c800)
I (998996) wifi:pm stop, total sleep time: 6266378 us / 16415665 us

I (998996) wifi:new:<10,0>, old:<10,0>, ap:<255,255>, sta:<10,0>, prof:1
I (999006) wifi: STA_DISCONNECTED, reason:200
beacon timeout
I (999136) wifi:new:<10,0>, old:<10,0>, ap:<255,255>, sta:<10,0>, prof:1
I (999136) wifi:state: init -> auth (b0)
I (999136) wifi:state: auth -> assoc (0)
I (999136) wifi:state: assoc -> run (10)
I (999156) wifi:connected with [redacted], aid = 3, channel 10, BW20, bssid = [redacted]
I (999166) wifi:security type: 3, phy: bgn, rssi: -52
I (999166) wifi:pm start, type: 1

I (999166) network: CONNECTED
I (999196) wifi:AP's beacon interval = 102400 us, DTIM period = 3
I (1000126) event: sta ip: 192.168.29.35, mask: 255.255.255.0, gw: 192.168.29.1
I (1000126) network: GOT_IP

SSCCE:

import _thread as thread
def xtest():
 import urequests as requests
 import ujson as json
 while True:
  try:
   response = requests.post(url="https://dweet.io/dweet/for/example", headers = {'content-type': 'application/json'}, data=json.dumps({"hello": 0, "world": 1}))
   response.json()
   response.close()
  except:
   pass

thread.start_new_thread(xtest, ())

An example backtrace message I get when running my actual application (instead of the SSCCE above):

Backtrace: 0x400920a7:0x3ffd92a0 0x400923c5:0x3ffd92c0 0x400923dc:0x3ffd92e0 0x4009be1e:0x3ffd9300 0x4009d8d4:0x3ffd9320 0x4009d88a:0x3ffd9340 0x4013f79f:0x3ffd940c

I suspect memory corruption, since the microcontroller locks up, shows bogus DNS configuration data, and crashes.

Here's an example of what gets dumped out when I run my actual application:

>>> Guru Meditation Error: Core 0 panic'ed ()
Core0 register dump:
: 0x4043PS      06d4  : 0x816eA1      fd00  A2      fcec  : 0x0fffA4      0400  : 0x0000
: 0x4427A7      0000  : 0x818cA9      fd00  A10     0007  : 0x0000A12     000b  : 0x0000
: 0x0000A15     0000  : 0x0001EXCCAUSE0005  EXCVADDR0000  : 0x40c6LEND    0047  : 0x0000
ets Jun  8 2016 00:22:57

rst:0x8 (TG1WDT_SYS_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5008
ho 0 tail 12 room 4
load:0x40078000,len:10600
ho 0 tail 12 room 4
load:0x40080400,len:5684
entry 0x400806bc
W (65) boot: PRO CPU has been reset by WDT.
W (65) boot: WDT reset info: PRO CPU PC=0x400853ad
W (65) boot: WDT reset info: APP CPU PC=0x4009241a

>>> thread.start_new_thread(sync_display, ())
>>> ***ERROR*** A stack overflow in task mp_thread has been detected.
abort() was called at PC 0x400923dc on core 1

ELF file SHA256: 0000000000000000000000000000000000000000000000000000000000000000

Backtrace: 0x400920a7:0x3ffd7bb0 0x400923c5:0x3ffd7bd0 0x400923dc:0x3ffd7bf0 0x4009be1e:0x3ffd7c10 0x4009d8d4:0x3ffd7c30 0x4009d88a:0x3ffd7c34

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5008
ho 0 tail 12 room 4
load:0x40078000,len:10600
ho 0 tail 12 room 4
load:0x40080400,len:5684
entry 0x400806bc
