05-12-2013 11:02 AM - edited 05-12-2013 11:11 AM
I know BlackBerry has a "critical" ticket on this and want to put this out there as my observations.
I am running an "unofficial" load on my Z-10, for what it's worth, and after loading the last one the behavior changed materially.
This is what I now observe and what I believe it points to. I've been chasing this issue for a while, trying to both provoke it and evade it, as it appears to be the main problem with the handset that is upsetting people. There are posts all over here and Crackberry on this, and the finger-pointing by both carriers and vendors is getting out of hand.
I have a somewhat unusual and "stress-capable" configuration -- I have my own IPSEC/IKEv2 gateway, which I want to use all the time except when I'm on my "home" network, where it's not necessary. That is currently not possible to configure, incidentally, and is something BlackBerry needs to fix -- they desperately need a flag in the WiFi profiles that says "this connection is secure; EXEMPT auto-vpn if I'm connected on this network." That's missing, which means I must set up auto-vpn for everything including my "home" network, which is stupid, as it causes the phone to spend battery power encrypting data that never leaves the LAN and thus (assuming WPA2/AES encryption) is by definition secure.
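The missing flag described above could behave roughly like the sketch below. This is only an illustration of the decision logic: the `WifiProfile` class, its `trusted` field, and `should_start_vpn` are hypothetical names, not anything in BlackBerry 10.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WifiProfile:
    ssid: str
    # Hypothetical "this connection is secure" flag; not a real BB10 setting.
    trusted: bool = False


def should_start_vpn(auto_vpn_enabled: bool,
                     active_profile: Optional[WifiProfile]) -> bool:
    """Decide whether the auto-VPN should come up.

    active_profile is None when the device is on the cellular network.
    """
    if not auto_vpn_enabled:
        return False
    # Exempt networks the user has marked as secure (e.g. the home LAN).
    if active_profile is not None and active_profile.trusted:
        return False
    return True
```

With a flag like this, the VPN would still come up on cellular and on any WiFi network not marked as trusted, and only the "home" profile would skip it.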
Anyway, what I have now is a situation where with 10.1.0.1762 if I have auto-connect enabled the device will wedge and restart within a few hours -- reliably. It also blocks on bringing the VPN up at times, claiming to have a timeout while negotiating the connection. This is known false as I can watch the server's negotiation in real-time and it completes and transitions to the online state with routing table entries inserted and the SPI mapping complete.
In other words there is a deadlock in the phone's network stack. Of importance is that the deadlock is happening before the phone's route table entries get inserted, as the phone remains able to talk on the network during the period that it is waiting for the deadlock to resolve (and that ultimately leads to a message if you're on the VPN page of "Timeout".)
The deadlock condition can be provoked quite-reliably by coming out of and back into a known WiFi access point's range with AutoVPN enabled. Again, the AP in question (which I have access to directly since I own it, obviously) thinks the remote is present but the phone doesn't see it. Once this happens you can sometimes clear the condition by turning off WiFi, but often you have to go into Airplane mode and back out to clear it.
This strongly implicates a deadlock problem in the network stack, where an upcall into the stack is blocked waiting for a resource it needs exclusive access to (e.g. to insert a route into the table) but never obtains.
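A toy model of that failure mode, for anyone who wants to see the shape of it. This is a sketch under stated assumptions, not the phone's actual code: `RouteTable`, `leak_lock` and `demo` are made-up names, and the "stuck holder" simply stands in for whatever code path takes the lock and never gives it back.

```python
import threading


class RouteTable:
    """Hypothetical route table guarded by one exclusive lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.routes = []

    def insert(self, route, timeout_s):
        # Give up if exclusive access is not obtained in time,
        # which is what the user would see as "Timeout".
        if not self._lock.acquire(timeout=timeout_s):
            return "Timeout"
        try:
            self.routes.append(route)
            return "Connected"
        finally:
            self._lock.release()

    def leak_lock(self):
        # Model a buggy path that takes the lock and never releases it.
        self._lock.acquire()


def demo():
    table = RouteTable()
    before = table.insert("0.0.0.0/0 via vpn0", timeout_s=0.1)
    holder = threading.Thread(target=table.leak_lock)
    holder.start()
    holder.join()
    after = table.insert("0.0.0.0/0 via vpn0", timeout_s=0.1)
    return before, after, len(table.routes)
```

Calling `demo()` gives "Connected" for the first insert and "Timeout" for the second, with only one route in the table. That matches the symptom: the far end has finished negotiating, but the local route never goes in.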
Now here's where the reboot part comes from -- we know the phone has a watchdog timer in it. If that deadlock happens when the process trying to access the network is a critical system resource -- say, for instance, a poll from the cellular network -- that process would block indefinitely. This will produce what looks like a "freeze" in the phone, which is exactly what happens if you are lucky enough to catch it when it occurs. I have had this happen while I have the browser open and it appears to "freeze" for 10-15 seconds during which nothing works -- including the power switch -- at which point the phone reboots (presumably due to the watchdog timer firing.)
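The watchdog theory can be sketched the same way. The code below is a simplified simulation, assuming a watchdog that must be "kicked" periodically by a critical task. The 10-second timeout is an assumption chosen to match the freeze length observed above; the real value is unknown, and `Watchdog` and `simulate` are illustrative names.

```python
class Watchdog:
    """Hypothetical watchdog: fires if not kicked within timeout_s."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_kick = 0.0

    def kick(self, now):
        self.last_kick = now

    def expired(self, now):
        return now - self.last_kick > self.timeout_s


def simulate(timeout_s, block_at, ticks):
    """Step simulated time one second at a time.

    The critical task kicks the watchdog every tick until it blocks at
    block_at (None means it never blocks). Returns the simulated time at
    which the watchdog fires, or None if it never does.
    """
    dog = Watchdog(timeout_s)
    for now in range(ticks + 1):
        if block_at is None or now < block_at:
            dog.kick(float(now))
        if dog.expired(float(now)):
            return now
    return None
```

With `simulate(10, 5, 30)` the task blocks at t=5 and the watchdog fires at t=15, while `simulate(10, None, 30)` never fires. From the user's side, the gap between the block and the reset is the "freeze".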
Yesterday I turned off all auto-VPN connections. The crashes ceased; I haven't had one since. This is also new; I occasionally got them on 10.0 (base T-Mobile load) although they were infrequent. While 24 hours is not enough to know for sure that I won't see any more of them the change in behavior is important in that whatever was tickling them has gone away by disabling the VPN option.
There is a related component to this in that if I am on a phone call while the phone is either on WiFi or HSPA+, I could also provoke a situation where data becomes unavailable. This should not happen but it did quite-reliably under 10.0. (If you're on EDGE or GPRS, however, simultaneous data access does not work by design as the Z-10, like all other consumer terminals, does not dual-attach even though GSM's protocol does allow it. This is also true for CDMA; be aware that so-called "LTE" service is in some cases LTE only for data and the network is configured to handle voice calls by falling back! This behavior in mixed CDMA/LTE networks is very carrier-dependent.)
Under 10.1.0.1762 I have not yet been able to reproduce this; on every call where I've tried to access a network application while on either HSPA or WiFi for data, it has not blocked.
I have turned on diagnostics and hopefully these are being sent back up to BlackBerry when I provoke these crashes. If I can somehow be of help to BlackBerry in running down these issues I'm willing to do so; the change in behavior (and it's quite material) over the last few "leaked" releases tells me that they're narrowing the focus on the problem. Unfortunately, tracing this sort of thing is difficult without debugger access into the kernel and a way to temporarily turn off the watchdog while the hangs happen, and none of us "mere mortals" have that. Having done this sort of diagnostic on Unix device drivers in the past, I'm aware of how much "fun" it can be trying to find the cause of these sorts of deadlocks.
I'm quite confident at this point, however, that the root cause of the resets is a deadlock in the networking code. Frequent cell network search/reselect behavior could easily tickle that sort of problem, just as the relatively frequent renegotiation of an IPSEC/IKEv2 connection does. (The VPN connection renegotiates hourly by default, and also any time the endpoint changes IP address, which of course happens any time you go in or out of range of a WiFi signal or roam from one head-end cell connection to another and the cell system assigns a new IP to your device.)
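The two renegotiation triggers mentioned above can be written down as a small predicate. This is only a sketch of the rule as described in this post (hourly by default, plus any endpoint IP change); the function and parameter names are hypothetical and do not come from any IKEv2 implementation.

```python
REKEY_INTERVAL_S = 3600  # "hourly by default", per the description above


def needs_renegotiation(now_s, last_negotiated_s,
                        negotiated_ip, current_ip,
                        interval_s=REKEY_INTERVAL_S):
    """Return True if the tunnel would have to be renegotiated."""
    # Trigger 1: the local endpoint address changed (WiFi <-> cellular,
    # or the cell network handed out a new address).
    if current_ip != negotiated_ip:
        return True
    # Trigger 2: the negotiated lifetime has run out.
    return now_s - last_negotiated_s >= interval_s
```

Every time this returns True the phone goes back through the negotiation and route-insertion path, so a device that moves between networks a lot gets many more chances to hit the deadlock than one that sits still.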
I don't know if BlackBerry is aware of all of the above, but I do know that they monitor this forum -- which is why I'm posting this here, in the hope that perhaps there's a nugget of useful data in here that will help them run the problem down.
05-12-2013 11:15 AM
"They" do monitor this forum, but mostly it is community managers and not the OS developers per se that you are expecting.
If you have an incident ticket open with BlackBerry Support on this issue, you'd best forward that information via email.
Just my advice... good luck.
PIN: C0001B7B4
PIN: C0005A9AA
05-12-2013 11:18 AM
There is no means provided by BlackBerry to directly open an incident ticket with them that I'm aware of.
If there was or is I'd be happy to do so.
05-12-2013 11:21 AM
Sorry, I thought we'd been over this with you.
Through T-Mobile. Call T-Mobile tech support (with whom you have a contract and whom you are paying for technical support on this device). Ask them to ESCALATE this issue to BlackBerry tech support for you. You will be given an incident number and telephone contacts.
If T-Mobile refuses, demand it. Remind them of your support contract, or ask for a supervisor.
Call the T-Mobile support number. Dropping by a store won't work.