How do I protect Python code?
I am developing a piece of software in Python that will be distributed to my employer's customers. My employer wants to limit the usage of the software with a time restricted license file.
If we distribute the .py files or even .pyc files it will be easy to (decompile and) remove the code that checks the license file.
Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the "novel ideas".
Is there a good way to handle this problem? Preferably with an off-the-shelf solution.
The software will run on Linux systems (so I don't think py2exe will do the trick).
Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use an exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.
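To get a feel for how much survives compilation, here is a minimal sketch using the standard `dis` module. The `check_license` function and its message are made up for illustration; the point is only that names and string constants sit in the code object in plain view.

```python
import dis

# Hypothetical license check, compiled the same way a .pyc would be.
source = '''
def check_license(days_left):
    if days_left <= 0:
        raise SystemExit("License expired")
    return True
'''

namespace = {}
exec(compile(source, "<licensed_module>", "exec"), namespace)
func_code = namespace["check_license"].__code__

# Constants and variable names are stored unencrypted in the code object.
print(func_code.co_consts)
print(func_code.co_varnames)

# The disassembly shows the comparison and the raise, ready to be patched out.
dis.dis(func_code)
```

Anyone who can load the .pyc can do the same, which is why stripping the .py files alone buys very little.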
Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.
If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.
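For reference, the logic of a time-restricted license check is small. Below is a rough sketch in Python for readability; in the setup suggested above, the verification part would live in the C extension. Everything here is an assumption made for illustration: the JSON layout, the `SECRET_KEY`, and the function names. Note that an HMAC uses a shared secret, so the key ships with the product; a real scheme would more likely verify an asymmetric signature.

```python
import hashlib
import hmac
import json
from datetime import date

# Hypothetical shared secret. Anything shipped to the customer can be extracted.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_license(customer, expires, key=SECRET_KEY):
    """Vendor side: produce the text of a license file."""
    payload = json.dumps(
        {"customer": customer, "expires": expires.isoformat()}, sort_keys=True
    )
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"payload": payload, "signature": signature})


def license_is_valid(license_text, today, key=SECRET_KEY):
    """Customer side: check the signature first, then the expiry date."""
    try:
        doc = json.loads(license_text)
        payload = doc["payload"]
        expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, doc["signature"]):
            return False
        expires = date.fromisoformat(json.loads(payload)["expires"])
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed or tampered file: treat as no license.
        return False
    return today <= expires


license_text = sign_license("ACME Corp", date(2030, 1, 1))
print(license_is_valid(license_text, date.today()))
```

The check also trusts the system clock, which the customer controls, so treat it as a speed bump rather than a lock.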
"Is there a good way to handle this problem?" No. Nothing can be protected against reverse engineering. Even the firmware on DVD machines has been reverse engineered and the AACS Encryption key exposed. And that's in spite of the DMCA making that a criminal offense.
Since no technical method can stop your customers from reading your code, you have to apply ordinary commercial methods.
Licenses. Contracts. Terms and Conditions. This still works even when people can read the code. Note that some of your Python-based components may require that you pay fees before you sell software using those components. Also, some open-source licenses prohibit you from concealing the source or origins of that component.
Offer significant value. If your stuff is so good -- at a price that is hard to refuse -- there's no incentive to waste time and money reverse engineering anything. Reverse engineering is expensive. Make your product slightly less expensive.
Offer upgrades and enhancements that make any reverse engineering a bad idea. When the next release breaks their reverse engineering, there's no point. This can be carried to absurd extremes, but you should offer new features that make the next release more valuable than reverse engineering.
Offer customization at rates so attractive that they'd rather pay you to build and support the enhancements.
Use a license key which expires. This is cruel, and will give you a bad reputation, but it certainly makes your software stop working.
Offer it as a web service. SaaS involves no downloads to customers.
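As an illustration of the web service option, here is a minimal sketch using only the standard library. `proprietary_score`, the JSON shape, and the port are placeholders, not anything from the question; the point is that the customer only ever sees requests and responses, never the code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def proprietary_score(values):
    """Stand-in for the algorithm you do not want to ship (hypothetical)."""
    return sum(v * v for v in values) / max(len(values), 1)


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            values = json.loads(self.rfile.read(length))["values"]
            body = json.dumps({"score": proprietary_score(values)}).encode("utf-8")
            status = 200
        except (ValueError, KeyError, TypeError):
            body = b'{"error": "bad request"}'
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the service on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ScoreHandler).serve_forever()
```

Calling `serve()` starts the server; a production deployment would add authentication and TLS, and that is also where license enforcement would go.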
Python is not the tool you need
You must use the right tool to do the right thing, and Python was not designed to be obfuscated. It's the contrary; everything is open or easy to reveal or modify in Python because that's the language's philosophy.
If you want something you can't see through, look for another tool. This is not a bad thing, it is important that several different tools exist for different usages.
Obfuscation is really hard
Even compiled programs can be reverse-engineered so don't think that you can fully protect any code. You can analyze obfuscated PHP, break the flash encryption key, etc. Newer versions of Windows are cracked every time.
Having a legal requirement is a good way to go
You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it's just a casual legal issue.
Code protection is overrated
Nowadays, business models tend to go for selling services instead of products. You cannot copy, pirate, or steal a service. Maybe it's time to consider going with the flow...
Compile python and distribute binaries!
That way, no Python (byte) code is left and you've done a reasonable amount of obfuscation, about as much as anyone (i.e. your employer) could expect from regular code, I think. (.NET or Java are less safe than this, as their bytecode is not obfuscated and can relatively easily be decompiled into reasonable source.)
Cython is getting more and more compatible with CPython, so I think it should work. (I'm actually considering this for our product. We're already building some third-party libs as pyd/dlls, so shipping our own Python code as binaries is not an overly big step for us.)
See This Blog Post (not by me) for a tutorial on how to do it. (thx @hithwen)
You could probably get Cython to store the C-files separately for each module, then just concatenate them all and build them with heavy inlining. That way, your Python module is pretty monolithic and difficult to chip at with common tools.
You might be able to build a single executable if you can link to (and optimize with) the python runtime and all libraries (dlls) statically. That way, it'd sure be difficult to intercept calls to/from python and whatever framework libraries you use. This cannot be done if you're using LGPL code though.
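For what it's worth, the basic Cython route can be sketched as a small build script. This assumes Cython, setuptools, and a C compiler are installed; `find_modules`, `build`, and the package name are names I made up, and skipping `__init__.py` is just one common choice, not a requirement.

```python
from pathlib import Path


def find_modules(package_dir):
    """Return the .py files to compile, leaving __init__.py files as plain Python."""
    return sorted(
        str(path)
        for path in Path(package_dir).rglob("*.py")
        if path.name != "__init__.py"
    )


def build(package_dir):
    """Compile every collected module to a native extension next to its source."""
    # Imported lazily so find_modules() works even without Cython installed.
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        name="protected_app",  # hypothetical project name
        ext_modules=cythonize(
            find_modules(package_dir),
            compiler_directives={"language_level": "3"},
        ),
        script_args=["build_ext", "--inplace"],
    )
```

Calling `build("mypackage")` from the project root should leave a .so (or .pyd on Windows) beside each module; you would then ship those and drop the corresponding .py files from the distribution.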
I understand that you want your customers to use the power of Python but do not want to expose the source code.
Here are my suggestions:
(b) Use cython instead of Python
(c) In both (a) and (b), it should be possible to distribute the libraries as licensed binary with a Python interface.
Have you had a look at pyminifier? It minifies, obfuscates, and compresses Python code. The example output looks nasty enough to put off casual reverse engineering.
$ pyminifier --nonlatin --replacement-length=50 /tmp/tumult.py
#!/usr/bin/env python3
ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲמּ=ImportError
ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ燱=print
ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ=False
ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ澨=object
try:
 import demiurgic
except ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲמּ:
 ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ燱("Warning: You're not demiurgic. Actually, I think that's normal.")
try:
 import mystificate
except ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲמּ:
 ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ燱("Warning: Dark voodoo may be unreliable.")
ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲﺬ=ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ
class ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ(ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ澨):
 def __init__(self,*args,**kwargs):
  pass
 def ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ클(self,dactyl):
  ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ퐐=demiurgic.palpitation(dactyl)
  ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ=mystificate.dark_voodoo(ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ퐐)
  return ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ
 def ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ(self,whatever):
  ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ燱(whatever)
if __name__=="__main__":
 ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ燱("Forming...")
 ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲﺃ=ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ("epicaricacy","perseverate")
 ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲﺃ.ﺭ异ﭞﰣﺁں뻛嬭ﱌꝪﴹ뙫퉊ﳦﲣפּܟﺶﶨࠔﶻ䉊ﰸﭳᣲ("Codswallop")
# Created by pyminifier (https://github.com/liftoff/pyminifier)
Is your employer aware that he can "steal" back any ideas that other people get from your code? I mean, if they can read your work, you can read theirs. Maybe looking at how you can benefit from the situation would yield a better return on your investment than fearing how much you could lose.
[EDIT] Answer to Nick's comment:
Nothing gained and nothing lost. The customer has what he wants (and paid for it since he did the change himself). Since he doesn't release the change, it's as if it didn't happen for everyone else.
Now if the customer sells the software, they have to change the copyright notice (which is illegal, so you can sue and will win -> simple case).
If they don't change the copyright notice, the second-level customers will notice that the software originally comes from you and wonder what is going on. Chances are that they will contact you, and so you will learn about the reselling of your work.
Again we have two cases: The original customer sold only a few copies. That means they didn't make much money anyway, so why bother. Or they sold in volume. That means better chances for you to learn about what they do and do something about it.
But in the end, most companies try to comply with the law (once their reputation is ruined, it's much harder to do business). So they will not steal your work but work with you to improve it. So if you include the source (with a license that protects you from simple reselling), chances are that they will simply push back the changes they made, since that makes sure the change is in the next version and they don't have to maintain it. That's win-win: you get changes, and they can make a change themselves if they really, desperately need it, even if you're unwilling to include it in the official release.