
Splunk development adventures in Splunk 8.0 with Python3 on.



Splunk Enterprise 8.0 is coming! And it has Python3!

Disclaimer: If you have never written any Python in Splunk and don’t plan on it, this is probably not the blog post for you.

Still there? OK, the short version is that Splunk Enterprise 8 ships both Python2 and Python3. The core Python pieces inside Splunk itself will pretty much all be running in Python3 out of the box; conversely, pretty much everything in the app layers still runs in Python2 by default in 8.0. And last but not least, at some point in the future Splunk will change this and the app layers will also default to Python3.

(For more, read Splunk’s official post and the main docs page about this.)

App developers can then test their apps by flipping python3 on for individual pieces. As they get to feel better about such things they can ship their app with python3 on for those pieces, eventually for all pieces. And on the other side, Splunk admins out in the real world can be forced for security reasons to just flip the entire thing to force Python3 everywhere for all apps and all pieces everywhere (and then run and hide in the broom closet as appropriate).

But I digress. We’ve been working on this for a while and I got a chance to play with the Splunk Enterprise 8.0 beta recently, and I thought I’d share some high level notes on what was involved in actually making the code in our apps run happily in both Python2 and Python3.

So what are the differences? There are lots of resources out there but here’s a handy and pretty compact cheatsheet from the python-future project. I have heard very recently that Splunk engineering has written a porting guide but I have not seen it unfortunately so I can’t tell you how useful it is.

If this is the first you’re reading about these differences, yes they are terrifying. That’s the bad news. The good news is that the python parts of an actual Splunk app, at least most Splunk apps, are usually not doing heavy lifting at low layers where all the nasty stuff is, so at least in terms of lines of diffs, almost all the changes are going to be exception syntax.

eg:
except SomeExceptionClass, e:
is now
except SomeExceptionClass as e:

Find all those and change them. The first line there only runs in Python2; the second works in both 2 and 3. Likewise, if you used the old syntax when raising exceptions you’ll have to change all that too. It’s a bit anticlimactic really.
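A minimal sketch of the syntax that runs under both interpreters, for catching and for raising. The function name and the error message are made up for illustration:

```python
def parse_port(value):
    """Convert a config value to an int port, using 2-and-3 compatible syntax."""
    try:
        port = int(value)
    except ValueError as e:  # Python2-only form was: except ValueError, e:
        # Python2-only form was: raise RuntimeError, "message"
        raise RuntimeError("bad port %r: %s" % (value, e))
    return port
```

Calling `parse_port("8089")` returns the integer, and a non-numeric value raises `RuntimeError` in either Python version.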

And conversely, the painful parts you might hit will be anything that actually deals with low-level streams, or things like base64 encoding, or StringIO. In these areas you’ll hit some bumps for sure. More painful than that, if you’re using any class or function with the word unicode in it, OK… you’re gonna have a bad day. Take a moment of silence for Splunk Engineering, who had to port all the existing SplunkWeb code in all its internationalized and localized glory.

So OK, how to find them all. Surely not by hand.

In the normal Python world everything says just import from future, or use a compatibility library like six – great ideas. Yay. None of these ideas seem to work in Splunk’s Python2.7, however. We don’t really have control over that thing. For instance you can’t “just use pip install”. Or fine, we can hack it locally, but you can’t then have all the customers of your app do that – it would be a support nightmare. It’s quite possible that I’m just an idiot and there was a simple way around this, but for what it’s worth I tried some of these roads and I couldn’t make any of them work so…

So… what then. What I did is I fell all the way back to installing a separate instance of python3 and just pointing pylint at all the source code and letting it yell at me – this actually went a long way.
It’s annoying that Pylint would bury egregious Python3 syntax errors and runtime errors in among pedantic little PEP8 things I didn’t want to care about, but I got over that and we got along fine. You can customize Pylint to just never tell you about whole vast swaths of best-practice things and… OK, I didn’t really do that very much. In fact I sort of contracted Pylint’s PEP8++ pedantry as a disease, but that’s another story.

Next up was manually checking whether the libraries we were using actually list themselves as Python3 compatible, and updating or replacing as necessary. Or hey, just point Pylint at them too. Rinse, repeat.

What happened when I actually ran all this in python3 in Splunk 8.0

First, since out of the box everything runs in python2 still, to test I went into etc/system/local/server.conf and in the [general] stanza I set python.version = python3.
And on the whole it worked pretty well: going from firing up 8.0 to having *almost* everything run fine took less than a day.

First speedbump – static files that have one or more characters that don’t actually match their declared encodings. Who cares? Do we care? I certainly didn’t before. In Splunk 7.X and earlier, SplunkWeb didn’t care about these either. However, SplunkWeb in Python3 will not serve these files and the browser gets a 500 error instead. This tends to have a somewhat fatal effect on whatever client-side code was hoping to load that library. If things are inexplicably broken on the client side, look for 500 errors in the Network tab of the browser’s developer tools. If you find some, you can do a very fun binary search by deleting half of the file, hitting the bump endpoint and seeing if the 500 error went away. Or search for the names of French people in code comments. Or both. You may not have any of these in any apps; I don’t have a good sense of how many developers make this mistake, but fwiw this was happening for us in two separate third-party JavaScript libraries.
(Update – I filed this as a bug and I’ve heard it might be getting fixed before GA. I’ll try to update if that’s confirmed.)
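If the binary search gets old, a small script can narrow things down. This is only a sketch: it assumes the files are supposed to be UTF-8 (the actual declared encoding in your app may differ), and the function name and the default list of extensions are invented for illustration.

```python
import io
import os


def find_undecodable_files(root, encoding="utf-8",
                           extensions=(".js", ".css", ".html")):
    """Return (path, byte_offset) for each static file that fails to decode.

    The encoding and extensions defaults are assumptions; adjust to your app.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Read raw bytes so the check behaves the same in Python2 and 3.
            with io.open(path, "rb") as handle:
                data = handle.read()
            try:
                data.decode(encoding)
            except UnicodeDecodeError as e:
                # e.start is the offset of the first offending byte.
                bad.append((path, e.start))
    return bad
```

Pointing it at an app's static directory (for example `appserver/static`) gives a list of suspect files plus the byte offset where decoding first fails, which is usually enough to find the offending comment.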

Second Speedbump – oh good lord we were still using the Splunk Python SDK in this little spot here, and it was an old version of the SDK. I could probably have updated to a newer version of the SDK but I didn’t even check – the piece in question was only using it to iterate over the installed apps, and so I was able to throw it away and replace it with a raw rest call in just a few minutes.

Third Speedbump – I had made sure that I had a nice new copy of PyYaml shipping, but I had failed to notice that the PyYaml project actually has two totally separate module subdirectories in lib that you’re supposed to import from, depending on Python2 or Python3. Oops. So I had to update the files in the app, then do a conditional import, and then get over the vague sense of shame over having done a conditional import.

Worse, it was in a PersistentServerConnectionApplication instance, which isn’t one of the places where importing “just works”, so you have to explicitly append to the path (as was the case in 7.x too though – now I just had to do it twice). Here’s what it looked like:


import os
import sys

# APP (the app's directory name under etc/apps) is assumed to be defined earlier in the file.
if sys.version_info.major >= 3:
    sys.path.append(os.path.join(os.environ['SPLUNK_HOME'], "etc", "apps", APP, "bin", "yaml3"))
    import yaml3 as yaml
elif sys.version_info.major == 2:
    sys.path.append(os.path.join(os.environ['SPLUNK_HOME'], "etc", "apps", APP, "bin", "yaml2"))
    import yaml2 as yaml

Next was dict.iteritems() which is gone in Python3. For some reason pylint hadn’t caught this but luckily all our actual usage could be replaced by just dict.items(), and it was all in our unit test code anyway. Apparently there’s a performance hit when that code runs in Python2 but I don’t think it’s going to matter here.
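For reference, the replacement looks like this. The helper name and the sample data are hypothetical. In Python2 `items()` builds a full list rather than an iterator, which is where the performance hit comes from:

```python
def format_fields(fields):
    """Render a dict as sorted key=value strings; runs on Python2 and Python3."""
    # Was: for key, value in fields.iteritems()
    return ["%s=%s" % (key, value) for key, value in sorted(fields.items())]
```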

Next was some base64 string stuff that was in our custom licensing endpoint (Sideview apps have license strings that end-users paste in, that represent either trial/full, term/perpetual license/support etc…). This one wasn’t particularly fun but it was only an hour or so of work to fix technically.
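The general shape of the base64 problem is that in Python3 `base64.b64encode` and `base64.b64decode` work on bytes, not text, so you have to encode and decode explicitly at the edges. The sketch below is not the actual Sideview licensing code; the function names and the sample string are made up, and it assumes the payload is UTF-8 text.

```python
import base64


def encode_license(text):
    """Text in, base64 text out, with explicit bytes conversions at each edge."""
    raw = text.encode("utf-8")           # text -> bytes
    return base64.b64encode(raw).decode("ascii")  # base64 bytes -> text


def decode_license(encoded):
    """Inverse of encode_license: base64 text in, original text out."""
    raw = base64.b64decode(encoded.encode("ascii"))
    return raw.decode("utf-8")
```

A round trip such as `decode_license(encode_license("trial:30d"))` gives back the original string.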

Next I was surprised to find I’ve been using the traceback library slightly wrong for years and python2 just never cared – traceback.format_exc() does not take the exception instance as an argument. It just picks up the exception from the stack. Now in Python3 if you give it the exception instance it gets mad at you. The more you know. This makes sense in hindsight and it’s just a bad habit I must have picked up in my very first days writing python. But if I picked up this habit from the Splunk world you may have too.
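A minimal example of the correct usage, with a made-up function name. Note that `format_exc()` is called with no arguments, inside the `except` block:

```python
import traceback


def describe_failure():
    """Return the formatted traceback of a deliberately failing call."""
    try:
        int("not a number")
    except ValueError:
        # Correct: no arguments. The current exception is picked up automatically.
        # Wrong (the old habit): traceback.format_exc(e)
        return traceback.format_exc()
```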

Next, it’s pretty common for Splunk views to get saved with an “encoding” attribute on the <?xml> declaration, but it turns out that in Python3 only, the XMLParser class in lxml.etree throws an exception here. I couldn’t find out how to make it forgive or ignore the attribute, nor find an easy drop-in replacement, so for now I just used a regex to throw the attribute away before parsing.
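Something along these lines would do it. To be clear, this is an illustrative regex and not necessarily the one used in our apps, and the function name is invented. It only touches an `encoding` attribute inside a leading XML declaration:

```python
import re

# Matches the encoding="..." attribute inside a leading <?xml ... ?> declaration.
_ENCODING_ATTR = re.compile(
    r'^(\s*<\?xml[^>]*?)\s+encoding\s*=\s*(["\'])[^"\']*\2',
    re.IGNORECASE,
)


def strip_declared_encoding(xml_text):
    """Remove the encoding attribute from the XML declaration, if there is one."""
    return _ENCODING_ATTR.sub(r"\1", xml_text, count=1)
```

The cleaned string can then be handed to the parser as before; text without a declaration, or with a declaration that has no encoding attribute, passes through unchanged.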

And that’s it so far.

I hope this is illuminating to someone! Please contact us if you have anything to add or correct about this, and I will update this as necessary, and/or post it somewhere better.

Addendum – for a frame of reference, the apps in question have about 88,000 lines of Python source combined.



