The recent release of PyML, a Python toolbox for interpretable machine learning, has sparked significant discussion within the developer community, particularly regarding its source code availability and licensing practices.
Source Code Transparency Concerns
Although PyML is advertised as an Apache-licensed project on both GitHub and PyPI, community members have discovered that its source code is not publicly available. Installing the package yields only precompiled Cython extension modules plus the minimal Python code needed to make the imports work. This lack of transparency has raised serious concerns among developers about the project's true nature and intentions.
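One way to see this for yourself is to list what the installed distribution actually ships. A minimal sketch using only the standard library, assuming the distribution is registered under the name "pyml" (adjust to the actual PyPI name):

    import importlib.metadata

    # List every file the installed distribution ships; a package that withholds its
    # source will show mostly compiled extensions (.so/.pyd) and very few .py files.
    # "pyml" is an assumed distribution name, not confirmed from the discussion.
    files = importlib.metadata.files("pyml") or []
    py_files = [f for f in files if str(f).endswith(".py")]
    ext_files = [f for f in files if str(f).endswith((".so", ".pyd"))]

    print(f"{len(py_files)} Python source files, {len(ext_files)} compiled extensions")
    for f in ext_files:
        print("  ", f)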
Security Implications
Security-minded developers in the community have flagged this as a potential risk. As several of them noted, shipping an open-source license while withholding the source code can be read as an attempt to gain trust while concealing problematic code inside the compiled .so files distributed in the wheels. The practice contradicts standard open-source norms and leaves users with no practical way to audit what the binaries actually do.
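For anyone who wants to look at a release before it ever touches an environment, a wheel is just a zip archive and can be inspected directly. A minimal sketch with a hypothetical wheel filename (fetch the real file with pip download <package> --no-deps):

    import zipfile

    # Wheels are plain zip archives, so their contents can be listed without installing.
    # The filename below is hypothetical; substitute the wheel you actually downloaded.
    wheel_path = "pyml-0.1.0-cp311-cp311-manylinux_2_17_x86_64.whl"

    with zipfile.ZipFile(wheel_path) as whl:
        for name in whl.namelist():
            if name.endswith((".so", ".pyd")):
                print("compiled extension:", name)
            elif name.endswith(".py"):
                print("python source:     ", name)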
Technical Features and Applications
The toolbox itself offers promising capabilities for interpretable machine learning, including support for a range of interpretable models and functionality for comparing them. Community discussions have highlighted interesting applications, particularly around FIGS (Fast Interpretable Greedy-Tree Sums). Some users have shared experiences with FIGS successors, noting gaps between the theoretical benchmarks discussed by researchers at Berkeley and real-world performance.
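FIGS itself is available independently in the open-source imodels package from the Berkeley group, so it can be tried without relying on PyML. A minimal sketch, assuming imodels' usual scikit-learn-style interface:

    from imodels import FIGSClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Cap the total number of rules so the fitted tree-sum stays small and readable.
    model = FIGSClassifier(max_rules=10)
    model.fit(X_train, y_train)

    print("test accuracy:", round(model.score(X_test, y_test), 3))
    print(model)  # imodels estimators typically print their learned rules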
Naming Confusion
The project's name has also generated discussion, with community members pointing out potential confusion with Physics-informed Machine Learning (PiML) and some amusing linguistic coincidences in German and Dutch. The episode underlines how much care name selection deserves in internationally distributed software projects.
Conclusion
While PyML presents potentially valuable tools for interpretable machine learning, the community's primary concern centers on the lack of source code transparency. This situation serves as a reminder of the importance of open-source principles and the need for careful evaluation of dependencies in machine learning projects. Developers are advised to exercise caution when considering this toolbox for production use until these transparency issues are addressed.