Doctesting for PyData Libraries
Published October 30, 2023
Sheila-nk
Sheila Kahwai
Discovering the PyData World
Hey there, my name is Sheila Kahwai, and before this internship, I was a PyData newbie! Yes, I hadn't dipped my toes into the world of NumPy, and my first time locally building SciPy happened to be a month into my internship.
Even though I had no experience working with SciPy or NumPy, I knew I had the potential to create something valuable for the PyData community. So, when I was assigned the task of building a pytest plugin, something I am all too familiar with, I thought, "Maybe a month tops, right? Quick operation, in and out!" Lol, was I in for a surprise!
It was a journey filled with unexpected roadblocks. There were moments I thought I was seeing the light at the end of the tunnel only to realize that the tunnel had light wells. But through it all, I remained positive because my primary goal was to learn and grow, and this internship was an endless source of knowledge and personal growth.
Navigating the Doctesting Landscape
Let's dive into the technical stuff now. The refguide-check tool is a utility in SciPy and NumPy that checks docstrings. One of its essential functions is doctesting: running docstring examples to verify they are accurate and valid. Docstring examples are critical because they serve as documentation, showing users how to use your code. However, having them is not enough; they must also be accurate.
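As a quick illustration of what's being tested, here is what a docstring example looks like to the standard library's `doctest` module (the `add` function is made up for illustration):

```python
# The standard doctest workflow: each line after a >>> prompt is executed,
# and its printed output is compared, character for character, with the
# text that follows it in the docstring.

def add(a, b):
    """Add two numbers.

    Examples
    --------
    >>> add(2, 3)
    5
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # reports any example whose actual output differs
```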
NumPy and SciPy use a modified form of doctesting in their refguide-check utilities. My mentor, Evgeni Burovski, managed to isolate this functionality into a separate package called "scpdt". Scpdt is not your ordinary doctesting tool. It has the following capabilities:
- Floating-Point Awareness: Scpdt is acutely aware of floating-point intricacies. For example, it recognizes that `1/3` isn't precisely equal to `0.333` due to floating-point precision. It incorporates a core check using `np.allclose(want, got, atol=..., rtol=...)`, allowing users to control the absolute and relative tolerances (see the sketch after this list).
- Human-Readable Skip Markers: Scpdt introduces user-friendly skip markers like `# may vary` and `# random`. These markers differ from the standard `# doctest: +SKIP` in that they selectively skip the output verification while ensuring the example source remains valid Python code.
- Handling NumPy's Output Formatting: NumPy has a unique output formatting style, such as abbreviating large arrays and adding whitespace, that can confound standard doctesting, which is whitespace-sensitive. Scpdt ensures accurate testing despite these quirks.
- User Configurability: Through a `DTConfig` instance, users can tailor the behavior of doctests to meet their specific needs.
- Flexible Doctest Discovery: One can use `testmod(module, strategy='api')` to check only a module's public objects, which is ideal for complex packages. The default `strategy=None` mirrors the standard doctest module's behavior.
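To make this concrete, here is a minimal sketch of these features in action. The `demo` function is made up, the expected outputs are indicative, and the default tolerances and the top-level `testmod` import are assumptions, so treat this as a sketch rather than the package's definitive behavior:

```python
# A sketch of scpdt-style doctests. Plain doctest compares output strings
# character by character; scpdt compares numbers via np.allclose and
# honors the human-readable skip markers.

import numpy as np

def demo():
    """Illustrate scpdt's relaxed checks.

    Examples
    --------
    >>> 1 / 3              # passes under np.allclose despite the rounding
    0.333
    >>> np.random.rand(3)  # random
    array([0.41, 0.72, 0.01])
    """

if __name__ == "__main__":
    import sys
    from scpdt import testmod

    # strategy='api' would check only a module's public objects;
    # the default strategy=None mirrors the stdlib doctest module.
    testmod(sys.modules[__name__])
```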
But here's the twist: Scpdt could only perform doctesting on SciPy's and NumPy's public modules through a helper script, and that wasn't ideal. So, guess who stepped in to bridge the gap?
Bridging the Gap with Pytest
Pytest already has a doctesting module, but unfortunately, it doesn't meet the specific needs of the PyData libraries. Therefore, the crucial task was to ensure pytest could leverage the power of Scpdt for doctesting. This involved overriding some of doctest's functions and classes to incorporate scpdt's alternative doctesting objects. It also meant modifying pytest's behavior by implementing hooks, primarily for initialization and collection.
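In pytest terms, that wiring looks roughly like the sketch below. This is not the plugin's actual code: the `--scpdt-doctests` flag and the `ScpdtDoctestModule` class are illustrative stand-ins for the real initialization and collection hooks:

```python
# A simplified sketch of the hook pattern described above -- not the
# plugin's real code. pytest_addoption handles initialization;
# pytest_collect_file reroutes collection to a scpdt-aware collector.

import pytest

class ScpdtDoctestModule(pytest.Module):
    """Stand-in for a collector that swaps in scpdt's doctest machinery."""

    def collect(self):
        # The real plugin would yield doctest items built with scpdt's
        # parser, runner, and checker; this placeholder yields nothing.
        return []

def pytest_addoption(parser):
    # Initialization hook: register an opt-in flag for scpdt doctests.
    parser.addoption("--scpdt-doctests", action="store_true",
                     help="collect doctests with the scpdt-aware collector")

def pytest_collect_file(file_path, parent):
    # Collection hook: route Python files to the custom collector.
    if parent.config.getoption("--scpdt-doctests") and file_path.suffix == ".py":
        return ScpdtDoctestModule.from_parent(parent, path=file_path)
```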
Once all the technical juggling was done, it was time for what my mentor called "dogfooding" (a term he picked up from Joel Spolsky's essay). The term simply means putting your own product to the test by using it, and I had to make sure the plugin functioned as expected. I did this by running doctests locally on SciPy's modules. It was an eye-opener, exposing issues like faulty collection: for example, the plugin wasn't collecting compiled functions or NumPy universal functions (ufuncs) for doctesting.
With the bugs and vulnerabilities exposed during this process, I was able to refine the plugin further. I then created a pull request to demonstrate how the pytest plugin could be seamlessly integrated into SciPy. The process is fairly straightforward:
- Installation: Install the plugin via pip.
- Configuration: Customize your doctesting through a `conftest.py` file (see the sketch after this list).
- Running Doctests in SciPy: If you're running doctests on SciPy, execute the command `python dev.py test --doctests` in your shell.
- Running Doctests on Other Packages: If you're not working with SciPy, use the command `pytest --pyargs <your-package> --doctest-modules` to run your doctests.
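For the configuration step, the customization lives in `conftest.py`. The attribute names below (`dt_config`, `atol`, `rtol`, `default_namespace`) are assumptions based on the `DTConfig` description above, so consult the plugin's README for the exact interface:

```python
# conftest.py -- a sketch of per-project doctest configuration.
# Attribute names are assumptions; check the plugin README for the exact
# hook it reads the configuration from.

import numpy as np
from scpdt import DTConfig

dt_config = DTConfig()
# Loosen the absolute/relative tolerances that np.allclose uses when
# comparing floating-point doctest output.
dt_config.atol = 1e-8
dt_config.rtol = 1e-2
# Make numpy available in every doctest without an explicit import.
dt_config.default_namespace = {"np": np}
```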
Voila! 🎉
Future Goals
I am currently in the process of integrating the plugin into SciPy; for more details, you can check out the PR. Looking ahead, our goal is to publish the plugin on PyPI and extend its integration to NumPy and other PyData libraries.
If you run into challenges with floating-point arithmetic, face output issues related to whitespace and array abbreviations, need to validate example source code without output testing, or simply desire a customized doctesting experience, consider giving this plugin a try.
The Journey's End
Throughout this incredible journey, I cherished every moment spent working, learning from my mentors: Evgeni Burovski and Melissa Weber Mendonça, and being part of the Quansight team. I'm incredibly grateful for this opportunity, and I look forward to continuing my contributions to the pytest plugin even after the internship.
Curious? Check out the plugin repository on GitHub. Feel free to contribute – the more, the merrier! 🚀🐍
Stay tuned for more exciting developments!