Tales of nondeterministic builds
• Mark Eschbach
Back to the nose at the grindstone: my goal is to create a test of a successful merge. First up is to fabricate a repository for testing.
Gah! Interrupted to resolve Travis Errors
2017-03-02 16:52:40.957 xcodebuild[5825:23114] iOSSimulator: task_name_for_pid(mach_task_self(), 5836, &task) returned 5
2017-03-02 16:52:40.957 xcodebuild[5825:23114] iOSSimulator: Falling back to DISPATCH_SOURCE_TYPE_PROC
Two thoughts: these messages are approximately in UTC, and I’ve encountered this before. Sigh. Let’s see if I can track that down. After the message, the process directing the build is stuck waiting for the tests to run from the faulted subprocess.
Google is a wasteland of bad search results for the full message. The first true hit that begins to address the issue talks about a security vulnerability in the Mach kernel. Someone installed the retry handler from Travis a while ago. It looks like it has worked for a while. Unfortunately we don’t capture metrics, so I wonder how long this has been sweeping things under the rug.
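One way to keep retries from hiding flakiness is to record how many attempts each command needed. This is only a rough sketch, not the Travis retry handler: retry_counted, the RETRY_LOG variable, and the log format are all made up for illustration.

```shell
# Hypothetical helper: retry a command up to N times and append the number of
# attempts to a log file so flaky runs stay visible.
RETRY_LOG="${RETRY_LOG:-retry-metrics.log}"   # assumed location, not from the real build

retry_counted() {
  local max_attempts=$1; shift
  local attempt=1
  local status=0
  while :; do
    # Run the command and capture its exit status without tripping `set -e`.
    "$@" && status=0 || status=$?
    if [ "$status" -eq 0 ] || [ "$attempt" -ge "$max_attempts" ]; then
      break
    fi
    attempt=$((attempt + 1))
  done
  # One line per invocation: timestamp, attempts used, final status, command.
  printf '%s attempts=%d status=%d cmd=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$attempt" "$status" "$*" >> "$RETRY_LOG"
  return "$status"
}
```

Something like retry_counted 3 ./run-tests.sh (a placeholder script name) would then leave a trail showing how often the second or third attempt was the one that passed.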
The retry handler has been swallowing the log dumps post exit, so I’ll have to pull it until later. It looks like additional information is logged at $(BUILD_ROOT)/target/Logs/Test/{some-uuid}/Session-{ProjIdent}-{Year}-{Month}-{Day}_{Rand}.log. To gather more information I would like to extract those logs. Hmm, time to up my shell-fu to find a list of those files. StackOverflow to the rescue? I didn’t have much luck with that; however, shell globbing worked for me:
# The glob is stored unexpanded and expands when ${dirlist} is used unquoted
dirlist=/tmp/*
for f in ${dirlist} ; do echo "$f"; done
So that will make the situation easier. And after all that, the build doesn’t fault on me once I add the diagnostic code. I’m hoping it was a perfect storm I managed to trigger; otherwise this is kind of sad.
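Applied to the session logs, the same globbing trick might look roughly like this. It is a sketch only: the directory layout comes from the path pattern mentioned earlier, while the function name dump_session_logs and its argument are invented here.

```shell
# Print every test session log found under a build root.
# dump_session_logs is a made-up name; pass the build root as the argument.
dump_session_logs() {
  local build_root=$1
  local found=0
  local f
  for f in "$build_root"/target/Logs/Test/*/Session-*.log; do
    # With no matches the pattern is left unexpanded, so skip non-files.
    [ -f "$f" ] || continue
    found=1
    echo "==== $f ===="
    cat "$f"
  done
  [ "$found" -eq 1 ] || echo "no session logs under $build_root" >&2
}
```

Calling it from the failure path of the build script would put the logs straight into the CI output instead of leaving them on a worker that is about to be thrown away.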
Back to the regularly scheduled program
Alrighty. So, commit history. Hmm, this might be hard to represent in this post. Maybe that one-line git output would be good? Ah! Buried in the documentation: git log --pretty=format:"%h %s" --graph. Good enough for now. I was wrong: it will only print out the graph from the current head. That wouldn’t be a problem except I want to see more than that. --graph is awesome by itself. I might start using that for more complicated things.
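For seeing more than the current head, git log accepts --all, which walks every ref rather than just HEAD. A small sketch, with graph_all as a made-up wrapper name and an optional repository path as its argument:

```shell
# Show the commit graph for every ref, not just the current HEAD.
# graph_all is a hypothetical wrapper; the optional argument is a repository path.
graph_all() {
  git -C "${1:-.}" log --graph --all --pretty=format:"%h %s"
}
```

With that, commits on a side branch still show up in the graph while a different branch is checked out.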
Stories of intermittent failure
At least this time I was able to get some actionable information out of the build logs:
dyld: Library not loaded: @rpath/SwiftyPaperTrail.framework/SwiftyPaperTrail
Referenced from: /Users/travis/Library/Developer/CoreSimulator/Devices/37AEE2A0-8EE3-46E6-AB3C-3CE266C97946/data/Containers/Bundle/Application/E73926B4-B2D7-4B34-9A8F-567F912A9293/DevBuild.app/DevBuild
Reason: no suitable image found. Did find:
/Users/travis/Library/Developer/CoreSimulator/Devices/37AEE2A0-8EE3-46E6-AB3C-3CE266C97946/data/Containers/Bundle/Application/E73926B4-B2D7-4B34-9A8F-567F912A9293/DevBuild.app/Frameworks/SwiftyPaperTrail.framework/SwiftyPaperTrail: required code signature missing for ‘/Users/travis/Library/Developer/CoreSimulator/Devices/37AEE2A0-8EE3-46E6-AB3C-3CE266C97946/data/Containers/Bundle/Application/E73926B4-B2D7-4B34-9A8F-567F912A9293/DevBuild.app/Frameworks/SwiftyPaperTrail.framework/SwiftyPaperTrail’
Also, for some reason the subshell I set up to handle the error didn’t pass variables in. I fixed it by using a function; however, I’ll probably want to revisit that in the future. I wish I had come across this when I was originally setting up the systems.
After much searching I’ve found a lot of confusion in the community. Unfortunately this seems to be a bit of a black art, with a lot of people just trying completely random things. I’m guilty of it too. At the core of the problem seems to be a synchronization error; the only evidence I have is the nondeterministic behavior and the heavy load on the CI system.
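One plausible explanation for the variable problem, and it is only a guess, is that the handler ran in a separate shell process (something like sh -c), which only sees exported variables. A function runs in the current shell and sees everything. The names LOG_DIR, child_view, and function_view below are invented for the sketch:

```shell
# Assumption: the "subshell" was a separate shell process, not a ( ... ) group.
unset LOG_DIR
LOG_DIR=/tmp/build-logs          # set but deliberately not exported

# A child shell process only inherits exported variables, so LOG_DIR is empty there.
child_view() {
  sh -c 'echo "child sees: [$LOG_DIR]"'
}

# A function runs in the current shell and sees the variable directly.
function_view() {
  echo "function sees: [$LOG_DIR]"
}

child_view      # prints: child sees: []
function_view   # prints: function sees: [/tmp/build-logs]
```

Exporting the variable would also make it visible to the child shell, at the cost of leaking it into every other process the build starts.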
I’ll return to the automerge system tomorrow.