Arrow
Apache Arrow is a cross-language development platform for in-memory data. It uses a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
CUDA¶
Arrow is also available with CUDA.
where X.Y.Z represents the desired version.
Python Bindings¶
The module contains bindings for multiple Python versions. To find out which Python versions are compatible, run
where X.Y.Z represents the desired version.
Or search for pyarrow directly by running
PyArrow¶
The Arrow Python bindings (also named PyArrow) have first-class integration with NumPy, Pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
-
Load the required modules.
where X.Y.Z represents the desired version.
-
Import PyArrow.
If the command displays nothing, the import was successful.
For more information, see the Arrow Python documentation.
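As a minimal sketch of the import check above, the following Python snippet reports whether pyarrow can be found and, if so, passes a built-in Python list through an Arrow array and back. The helper name `check_pyarrow` is hypothetical (it is not part of PyArrow), and the sketch assumes the required modules have already been loaded.

```python
import importlib.util


def check_pyarrow():
    """Return the pyarrow version string, or None if pyarrow cannot be found."""
    # find_spec only locates the package; it does not import it.
    if importlib.util.find_spec("pyarrow") is None:
        return None
    import pyarrow as pa

    # Build an Arrow array from built-in Python objects and convert it back.
    values = pa.array([1, 2, 3])
    assert values.to_pylist() == [1, 2, 3]
    return pa.__version__


if __name__ == "__main__":
    version = check_pyarrow()
    if version is None:
        print("pyarrow not found: load the arrow module first")
    else:
        print(f"pyarrow {version} imported successfully")
```

Unlike a bare import, this sketch prints a message in both cases, which may be easier to read in a job log.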
Fulfilling Other Python Package Dependencies¶
Some other Python packages require PyArrow in order to be installed.
With the arrow module loaded, your package dependency for pyarrow will be satisfied.
If pip list shows an entry for pyarrow, then it is available and visible to pip. If there is no entry, pyarrow is not available.
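The same check can be approximated from within Python. The sketch below uses the standard library's `importlib.metadata` to look up the installed distribution metadata for the current interpreter, which is roughly what pip reports. The function name `installed_version` is a hypothetical helper, and the assumption that the arrow module exposes pyarrow's metadata in this way is not confirmed by this page.

```python
from importlib import metadata


def installed_version(distribution="pyarrow"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # No metadata found: the package is not visible to this interpreter.
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("pyarrow is not visible; dependent packages may fail to install")
    else:
        print(f"pyarrow {version} is visible")
```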
Apache Parquet Format¶
The Parquet file format is available.
To import the Parquet module, execute the previous steps for pyarrow, then run
If the command displays nothing, the import was successful.
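Beyond checking the import, a short round trip can confirm that Parquet files can actually be written and read. The following is a minimal sketch, assuming pyarrow is importable; the function name `parquet_roundtrip`, the file name, and the sample columns are illustrative only.

```python
import os
import tempfile


def parquet_roundtrip(columns):
    """Write a dict of columns to a temporary Parquet file and read it back."""
    # Imported here so the function can be defined before the arrow module is loaded.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table(columns)
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.parquet")
        pq.write_table(table, path)
        # Convert to plain Python objects before the temporary directory is removed.
        return pq.read_table(path).to_pydict()


if __name__ == "__main__":
    data = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
    print(parquet_roundtrip(data) == data)
```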
R Bindings¶
The Arrow package exposes an interface to the Arrow C++ library to access many of its features in R. This includes support for analyzing large, multi-file datasets (open_dataset()), working with individual Parquet files (read_parquet(), write_parquet()) and Feather files (read_feather(), write_feather()), as well as lower-level access to the Arrow memory and messages.
Installation¶
-
Load the required modules.
-
Specify the local installation directory.
-
Export the required variables to ensure you are using the system installation.
-
Install the bindings.
Usage¶
After the bindings are installed, they have to be loaded.
-
Load the required modules.
-
Load the library.
For more information, see the Arrow R documentation.
Troubleshooting¶
This is a Normal Error Generated by This Dummy Wheel¶
See "This is a normal error generated by this dummy wheel".
ModuleNotFoundError: No Module Named 'pyarrow'¶
When importing pyarrow, you may get the following error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pyarrow'
This usually means one of the following two cases:
1. A module for Arrow was not loaded
2. A Python module was not loaded
Module Arrow Not Loaded¶
Find a compatible arrow module and load it. See PyArrow.
Python Module Not Loaded¶
If you activate a virtual environment without first loading a Python module, the Python bindings will not be available, so pyarrow will not be found.
To remedy:
- Deactivate any Python virtual environment.
Note
If you had a virtual environment activated, it is important to deactivate it first, then load the module, before reactivating your virtual environment.
-
Load the module.
-
Check that it is visible by pip
and is accessible for your currently loaded Python module. If no errors are raised, then everything is OK!
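When it is unclear which of the two cases applies, a small diagnostic can help. The sketch below, which uses only the Python standard library, reports which interpreter is running, whether a virtual environment is active, and whether pyarrow can be located. The function name `diagnose` and the choice of fields are assumptions made for illustration.

```python
import importlib.util
import sys


def diagnose():
    """Collect a few facts that help explain a missing pyarrow import."""
    return {
        # Path of the interpreter that is currently running.
        "python": sys.executable,
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # True when the import system can locate the pyarrow package.
        "pyarrow_found": importlib.util.find_spec("pyarrow") is not None,
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If `in_virtualenv` is true and `pyarrow_found` is false, the remedy above (deactivate, load the module, reactivate) is a reasonable first thing to try.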