root_numpy: The interface between ROOT and NumPy

Edmund Noel Dawe¹, Piti Ongmongkolkul², and Giordon Stark³

¹ The University of Melbourne
² Mahidol University International College
³ The University of Chicago

11 May 2017

Paper DOI: http://dx.doi.org/10.21105/joss.00307
Software Repository: https://github.com/scikit-hep/root_numpy
Software Archive: http://dx.doi.org/10.5281/zenodo.842249

Summary

root_numpy interfaces NumPy (Walt, Colbert, and Varoquaux 2011) with CERN’s ROOT (Antcheva et al. 2009) software framework, making it possible to analyse ROOT data within the broad ecosystem of scientific Python packages. At its core are functions for converting between ROOT TTrees and structured NumPy arrays. root_numpy can convert TTree branches (columns) of fundamental types and strings, as well as variable-length and fixed-length multidimensional arrays and (nested) std::vectors. It can also create columns in the output NumPy array from mathematical expressions, as ROOT’s TTree::Draw() does. root_numpy’s internals are written in Cython (Behnel et al. 2011) and installed as compiled C++ extensions, and they handle data at speeds comparable to ROOT, as shown in Figure 1. root_numpy can also convert between ROOT histograms and NumPy arrays, and sample or evaluate ROOT functions as NumPy arrays. Finally, root_numpy interfaces NumPy with TMVA (Speckmayer et al. 2010), ROOT’s machine learning toolkit, while also allowing ROOT users to take advantage of scikit-learn (Pedregosa et al. 2011) and TensorFlow (Abadi et al. 2015).
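To make the target data structure concrete, the following minimal sketch builds by hand a structured NumPy array of the kind a converted TTree would resemble. It uses NumPy only and does not call root_numpy or ROOT. The branch names (pt, eta, hits), the values, and the derived column are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical stand-in for a converted TTree: two scalar branches and one
# variable-length branch. Each branch becomes a named field of the array.
events = np.zeros(3, dtype=[("pt", np.float32),
                            ("eta", np.float32),
                            ("hits", object)])
events["pt"] = [10.0, 25.5, 40.0]
events["eta"] = [0.0, 1.0, -0.5]

# A variable-length branch is represented here as one NumPy array per entry,
# stored in an object-typed field.
hits = [np.array([1, 2], dtype=np.int32),
        np.array([], dtype=np.int32),
        np.array([7, 8, 9], dtype=np.int32)]
for i, h in enumerate(hits):
    events["hits"][i] = h

# A column derived from a mathematical expression over existing branches.
pz = events["pt"] * np.sinh(events["eta"])

# A row selection, analogous in spirit to a selection string in TTree::Draw().
selected = events[events["pt"] > 20]

# Per-entry length of the variable-length branch.
n_hits = np.array([len(h) for h in events["hits"]])

print(events.dtype.names)
print(pz)
print(selected["pt"])
print(n_hits)
```

With ROOT and root_numpy installed, an array of this general shape would instead come from a call such as root_numpy.tree2array(tree, branches=[...], selection="..."), where the expression and the cut are passed as strings. The exact dtypes produced for a given tree depend on its branch types.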

Figure 1: Benchmarking root_numpy’s tree2array() function against ROOT’s TTree::Draw(). One panel shows time [s] against the number of entries for a tree containing a single branch; the other shows time [s] against the number of branches for a tree containing 1M entries per branch. Benchmark environment: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz, storage M4-CT256M4SSD2, ROOT-6.10/02, Python-2.7.13, NumPy-1.13.1.

References

Abadi, Martín, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, et al. 2015. “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.” http://tensorflow.org/.

Antcheva, Ilka, Maarten Ballintijn, Bertrand Bellenot, Marek Biskup, Rene Brun, Nenad Buncic, Philippe Canal, et al. 2009. “ROOT - a C++ Framework for Petabyte Data Storage, Statistical Analysis and Visualization.” Computer Physics Communications 180 (12): 2499–2512. doi:10.1016/j.cpc.2009.08.005.

Behnel, Stefan, Robert Bradshaw, Craig Citro, Lisandro Dalcin, Dag Sverre Seljebotn, and Kurt Smith. 2011. “Cython: The Best of Both Worlds.” Computing in Science and Engineering 13: 31–39. doi:10.1109/MCSE.2010.118.

Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30. http://scikit-learn.org.

Speckmayer, P., A. Hocker, J. Stelzer, and H. Voss. 2010. “The Toolkit for Multivariate Data Analysis, TMVA 4.” J. Phys. Conf. Ser. 219: 032057. doi:10.1088/1742-6596/219/3/032057.

Walt, Stéfan van der, S. Chris Colbert, and Gaël Varoquaux. 2011. “The NumPy Array: A Structure for Efficient Numerical Computation.” Computing in Science & Engineering 13: 22–30. doi:10.1109/MCSE.2011.37.
