Welcome to the scikit-validate documentation!¶
scikit-validate¶
Free software: Apache Software License 2.0
Documentation: https://scikit-validate.readthedocs.io.
Overview¶
scikit-validate is a validation package for science output developed within F.A.S.T. (the Faster Analysis Software Taskforce). It provides commands for monitoring and comparing analysis outputs and computing-resource usage (e.g. CPU time and RAM), as well as commands for summarising the findings.
It is meant to provide analysis groups and small experiments with some of the fundamental features needed to validate (i.e. compare to a reference) the outcomes of their code, and to provide easy access to the results.
Features¶
Collect metrics in JSON output:
- measure file metrics (e.g. size)
- measure execution time and memory usage
- compare to previous executions
compare ROOT files & plot discrepancies
create validation reports
Credits¶
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
Make a new RELEASE¶
export RELEASE=X.Y.Z
export CHANGELOG_GITHUB_TOKEN=<your github token>
make release
Installation¶
Stable release¶
To install scikit-validate, run this command in your terminal:
$ pip install skvalidate
This is the preferred method to install scikit-validate, as it will always install the most recent stable release.
If you don’t have pip installed, the Python installation guide can walk you through the process.
From sources¶
The sources for scikit-validate can be downloaded from the GitLab repo.
You can either clone the public repository:
$ git clone https://gitlab.cern.ch/fast-hep/public/scikit-validate.git
Once you have a copy of the source, you can install it with:
$ python setup.py install
Quickstart¶
Storing resource usage¶
Especially when constrained by the available computing resources, it is good to keep an eye on the resource usage of your analysis/application. For this purpose we provide the sv_execute command, which wraps the script/executable, periodically samples its memory usage, and reports the time taken at the end.
The following will execute stress --cpu 1 --io 1 --vm 1 --vm-bytes 128M --timeout 10s --verbose and write the resource usage to the resource_metrics.json file.
sv_execute -m resource_metrics.json \
    -- \
    stress --cpu 1 --io 1 --vm 1 --vm-bytes 128M --timeout 10s --verbose
In the first part, sv_execute -m resource_metrics.json, we set the parameters for sv_execute. The -- marks the end of the sv_execute parameters; everything after it is treated as the command (and its parameters) to be executed.
The standard output of the called command is unaffected:
...
stress: dbug: [1844521] allocating 134217728 bytes ...
stress: dbug: [1844521] touching bytes in strides of 4096 bytes ...
...
>>> Ran command: "stress --cpu 1 --io 1 --vm 1 --vm-bytes 128M --timeout 10s --verbose"
>>> in 11.424817s and used 93.3 MB of memory.
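Under the hood the idea is a simple polling loop: start the command as a subprocess and sample its memory usage until it exits. The following Python sketch illustrates the concept; it is not the package's actual implementation and assumes the psutil package is available.

import subprocess
import time

import psutil  # assumed third-party dependency for this sketch

def run_and_monitor(cmd, interval=0.1):
    """Run cmd as a subprocess and sample its memory usage until it exits."""
    start = time.time()
    proc = subprocess.Popen(cmd)
    process = psutil.Process(proc.pid)
    peak_rss = 0
    while proc.poll() is None:  # still running
        try:
            peak_rss = max(peak_rss, process.memory_info().rss)
        except psutil.NoSuchProcess:  # command exited between samples
            break
        time.sleep(interval)
    return {'time_in_s': time.time() - start, 'peak_rss_in_bytes': peak_rss}

metrics = run_and_monitor(['stress', '--cpu', '1', '--timeout', '10s'])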
Storing file information¶
Comparing two ROOT files¶
Adding high-level information¶
Usage¶
After installation scikit-validate will provide several commands, all starting with sv_:
sv_file_info¶
The first subcommand simply records the size of a given file in a JSON file:
sv_file_info --help
Usage: sv_file_info [OPTIONS] [INPUT_FILES]...
Script to record file metrics.
For testing pick or create a file:
# create 10 MB file
dd if=/dev/zero of=test.file bs=10485760 count=1
sv_add_file_metrics test.file -m metrics.json
If the output file, default metrics.json, already exists it will be read
first and results will be appended.
Options:
-m, --metrics-file TEXT file for JSON output
--help Show this message and exit.
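The measurement itself amounts to little more than reading the file size, as in this Python sketch (illustrative only; the JSON layout shown is not a guaranteed format):

import json
import os

def file_metrics(path):
    """Record the file size in MB, keyed by file name."""
    size_in_mb = os.path.getsize(path) / 1024.0 / 1024.0
    return {path: {'size_in_mb': size_in_mb}}

with open('metrics.json', 'w') as f:
    json.dump(file_metrics('test.file'), f, indent=2)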
sv_execute¶
This subcommand will execute the parameters passed to it as a shell command and monitor its resource usage. At the moment only (simple) CPU time and RAM usage are supported:
sv_execute --help
Usage: sv_execute [OPTIONS] COMMAND
Command that wraps and monitors another command.
For testing install 'stress' package and run
sv_execute -m resource_metrics.json -- \
stress --cpu 1 --io 1 --vm 1 --vm-bytes 128M --timeout 10s --verbose
If the output file, default resource_metrics.json, already exists it will
be read first and results will be appended.
If a single string argument is provided as the command then it will be
split using white-space, however if multiple arguments are provided then
no additional splitting is performed. In this case though, use `--`
before the command so that options are passed to the command, rather than
this script.
Options:
-m, --metrics-file PATH
--memprof-file PATH
--sample-interval FLOAT Sampling period (in seconds), defaults to 0.1
--help Show this message and exit.
sv_get_artifact_url¶
Reads the environment variables set in a Gitlab CI job and constructs a URL for a given existing file or folder.
e.g.:
sv_get_artifact_url output/test_file
will return ${CI_PROJECT_URL}/-/jobs/${CI_JOB_ID}/artifacts/file/output/test_file
while:
sv_get_artifact_url output
will return ${CI_PROJECT_URL}/-/jobs/${CI_JOB_ID}/artifacts/browse/output
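The construction itself is straightforward; a rough Python sketch of the idea (illustrative only; it relies on the CI_PROJECT_URL and CI_JOB_ID variables that Gitlab CI sets automatically):

import os

def artifact_url(path):
    """Build an artifact URL: 'browse' for folders, 'file' for single files."""
    kind = 'browse' if os.path.isdir(path) else 'file'
    return '{0}/-/jobs/{1}/artifacts/{2}/{3}'.format(
        os.environ['CI_PROJECT_URL'], os.environ['CI_JOB_ID'], kind, path)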
sv_get_target_branch¶
Script to extract the target branch for a given project and commit hash.
Meant to be run within a Gitlab CI job; it needs the following ENV variables defined:
CI_PROJECT_ID (automatic from CI job)
CI_COMMIT_SHA (automatic from CI job)
CI_API_TOKEN (to be set in the Gitlab project: settings -> pipelines -> add variable)
Related issue: https://gitlab.com/gitlab-org/gitlab-ce/issues/15280
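As a rough illustration of the idea (not necessarily what the script does internally, especially given the linked issue; it assumes the requests package and a GitLab API endpoint that lists the merge requests containing a commit):

import os

import requests  # assumed third-party dependency for this sketch

def target_branch(api_url='https://gitlab.cern.ch/api/v4'):
    """Find the target branch of a merge request containing this commit."""
    url = '{0}/projects/{1}/repository/commits/{2}/merge_requests'.format(
        api_url, os.environ['CI_PROJECT_ID'], os.environ['CI_COMMIT_SHA'])
    response = requests.get(url, headers={'PRIVATE-TOKEN': os.environ['CI_API_TOKEN']})
    merge_requests = response.json()
    return merge_requests[0]['target_branch'] if merge_requests else None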
sv_merge_json¶
Merges the dictionaries in <N> JSON files into one output file. Uses dict.update(), so the last occurrence of a key takes precedence. Usage:
sv_merge_json [OPTIONS] [INPUT_FILES]... OUTPUT
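A minimal Python sketch of these merge semantics (assuming each input file contains a single JSON object; the file names are illustrative):

import json

def merge_json(input_files, output):
    merged = {}
    for path in input_files:
        with open(path) as f:
            merged.update(json.load(f))  # later files win on key collisions
    with open(output, 'w') as f:
        json.dump(merged, f, indent=2)

merge_json(['resource_metrics.json', 'file_metrics.json'], 'all_metrics.json')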
sv_remove_from_env¶
Removes a path from an environment variable, e.g.
sv_remove_from_env /a/b/c:/a/b/d:/d/b/a /a/b
will result in /d/b/a. Recommended use is to clean up ENV variables:
PATH=`sv_remove_from_env /a/b/c:/a/b/d:/d/b/a /a/b`
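The semantics can be pictured with a short Python sketch (an illustration, not the actual implementation): every entry that starts with the given prefix is dropped.

def remove_from_env(value, prefix, separator=':'):
    """Drop every entry of a separator-joined path list that starts with prefix."""
    kept = [entry for entry in value.split(separator)
            if not entry.startswith(prefix)]
    return separator.join(kept)

print(remove_from_env('/a/b/c:/a/b/d:/d/b/a', '/a/b'))  # -> /d/b/a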
sv_metric_diff¶
Usage: sv_metric_diff [OPTIONS] FILE_UNDER_TEST REFERENCE_FILE
Display the difference between two metric (JSON) files.
Examples:
sv_metric_diff skvalidate/data/examples/performance_metrics*.json
sv_metric_diff skvalidate/data/examples/file_metrics*.json
Options:
-o, --output-format [console|csv|markdown]
--help Show this message and exit.
Example output:
sv_metric_diff skvalidate/data/examples/file_metrics*
+-----------------------------------------+------------+---------+-------------+--------+-----------+--------+
| file | metric | value | ref value | diff | diff_pc | unit |
|-----------------------------------------+------------+---------+-------------+--------+-----------+--------|
| continuous_integration_101.bin | size_in_mb | 81 | 39.6 | 41.4 | 104.545 | MB |
| continuous_integration_101.root | size_in_mb | 14.3 | 9.4 | 4.9 | 52.1277 | MB |
| continuous_integration_101_mctruth.root | size_in_mb | 90.3 | 31.9 | 58.4 | 183.072 | MB |
+-----------------------------------------+------------+---------+-------------+--------+-----------+--------+
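Here diff_pc is the difference relative to the reference value, i.e. diff / (ref value) * 100; for the first row, 41.4 / 39.6 * 100 ≈ 104.545%.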
sv_root_diff¶
Calculates the difference between two ROOT (https://root.cern.ch/) files. If a difference is present, the command will create plots for the distributions that differ:
sv_root_diff file_under_test reference_file --out-dir <path to output folder (for plots etc)>
Example output 1 (plot omitted): test.a only exists in the reference file.
Example output 2 (plot omitted): test.y exists in both files, but with a different random seed.
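Conceptually the comparison resembles the following Python sketch (illustrative only; it assumes flat numeric branches and the uproot and numpy packages, and it omits the plotting that the real command performs):

import numpy as np
import uproot  # assumed third-party dependency for this sketch

def root_diff(file_under_test, reference_file, tree='test'):
    """Report branches that exist only on one side or whose contents differ."""
    a = uproot.open(file_under_test)[tree].arrays(library='np')
    b = uproot.open(reference_file)[tree].arrays(library='np')
    only_in_test = sorted(set(a) - set(b))
    only_in_ref = sorted(set(b) - set(a))
    differing = sorted(k for k in set(a) & set(b)
                       if len(a[k]) != len(b[k]) or not np.allclose(a[k], b[k]))
    return only_in_test, only_in_ref, differing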
sv_version¶
sv_version
scikit-validate version: 0.3.7
sv_version --plain
0.3.7
run-clang-tidy¶
From https://github.com/llvm-mirror/clang-tools-extra/blob/master/clang-tidy/tool/run-clang-tidy.py
Runs clang-tidy in parallel for the code base:
run-clang-tidy <path to code base>
Reporting¶
Report.yml¶
template¶
download¶
The download entry for a section allows you to specify files that need to be downloaded either from a web URL or from a job in the pipeline the report runs in. For the latter, the path is of the form protocol://<name of CI job>/<path to file>; e.g. gitlab://test/output/t.png will download output/t.png from the Gitlab CI pipeline job test.
The entry in the Report.yml can then be written as:
download:
  <output path>: <url>
  images/t.png: gitlab://test/output/t.png
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions¶
Report Bugs¶
Report bugs at https://gitlab.cern.ch/fast-hep/public/scikit-validate/issues.
If you are reporting a bug, please include:
Your operating system name and version.
Any details about your local setup that might be helpful in troubleshooting.
Detailed steps to reproduce the bug.
Fix Bugs¶
Look through the GitLab issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Implement Features¶
Look through the GitLab issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
Write Documentation¶
scikit-validate could always use more documentation, whether as part of the official scikit-validate docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://gitlab.cern.ch/fast-hep/public/scikit-validate/issues.
If you are proposing a feature:
Explain in detail how it would work.
Keep the scope as narrow as possible, to make it easier to implement.
Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!¶
Ready to contribute? Here’s how to set up skvalidate for local development.
Fork the skvalidate repo on GitLab.
Clone your fork locally:
$ git clone git@gitlab.cern.ch:your_name_here/skvalidate.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv skvalidate
$ cd skvalidate/
$ python setup.py develop
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
$ flake8 skvalidate tests
$ python setup.py test or py.test
$ tox
To get flake8 and tox, just pip install them into your virtualenv.
Commit your changes and push your branch to GitLab:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a merge request through the GitLab website.
Merge Request Guidelines¶
Before you submit a merge request, check that it meets these guidelines:
The merge request should include tests.
If the merge request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
The merge request should work for Python 2.7, 3.4, 3.5 and 3.6, and for PyPy. Check https://travis-ci.org/kreczko/skvalidate/pull_requests and make sure that the tests pass for all supported Python versions.
Tips¶
To run a subset of tests:
$ py.test tests.test_skvalidate
Deploying¶
A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst). Then run:
$ bumpversion patch # possible: major / minor / patch
$ git push
$ git push --tags
Travis will then deploy to PyPI if tests pass.
Credits¶
Development Lead¶
Faster Analysis Software Taskforce (FAST) <fast-hep@cern.ch>
Contributors¶
None yet. Why not be the first?
skvalidate¶
skvalidate package¶
Top-level package for scikit-validate.
Subpackages¶
skvalidate.commands package¶
Package for skvalidate commands.
All modules in this folder are automatically loaded as commands available through skvalidate.
Submodules¶
skvalidate.compare package¶
Submodules¶
skvalidate.gitlab package¶
skvalidate.io package¶
skvalidate.operations package¶
skvalidate.report package¶
Submodules¶
skvalidate.software package¶
Submodules¶
skvalidate.vis package¶
Submodules¶
Submodules¶
skvalidate.clang_tidy module¶
Parallel clang-tidy runner.
Runs clang-tidy over all files in a compilation database. Requires clang-tidy and clang-apply-replacements in $PATH.
Example invocations.
Run clang-tidy on all files in the current working directory with a default set of checks and show warnings in the cpp files and all project headers.
run-clang-tidy.py $PWD
Fix all header guards.
run-clang-tidy.py -fix -checks=-*,llvm-header-guard
Fix all header guards included from clang-tidy and header guards for clang-tidy headers.
run-clang-tidy.py -fix -checks=-*,llvm-header-guard extra/clang-tidy -header-filter=extra/clang-tidy
Compilation database setup: http://clang.llvm.org/docs/HowToSetupToolingForLLVM.html
skvalidate.clang_tidy.apply_fixes(args, tmpdir)¶
Call clang-apply-fixes on a given directory. Deletes the dir when done.
skvalidate.clang_tidy.check_clang_apply_replacements_binary(args)¶
Check if invoking supplied clang-apply-replacements binary works.
skvalidate.clang_tidy.find_compilation_database(path)¶
Adjust the directory until a compilation database is found.
skvalidate.cli module¶
skvalidate.git module¶
skvalidate.skvalidate module¶
Main module.