Michael Rajkowski

July 27, 2018

In “Python Data Science and the Rocket MultiValue Database (Part 1 of 3)” I provided an introduction to Numpy and showed how to convert a Numpy array to a u2py.DynArray.

In this section I will go a bit further and show:

  • How to write the numpy data to a Rocket MultiValue file
  • How to read back the MultiValue data, and instantiate a Numpy array
  • A brief introduction to Pandas
  • How to move a Numpy array to a Pandas Data Frame

How to write the Numpy data to a Rocket MultiValue file

Before we can begin, we need to create a Rocket MultiValue file and some dictionary items.  Note that, for brevity, I will copy existing dictionary items into the new file's dictionary rather than creating them by hand.  (In part three of the series, I complete our journey by creating a Python class for persisting your data science objects in the MultiValue database.)

UniData Example:

CREATE.FILE U2DS 3 11
Create file D_U2DS, modulo/3,blocksize/1024
Hash type = 0
Create file U2DS, modulo/11,blocksize/1024
Hash type = 3
Added "@ID", the default record for UniData to DICT U2DS.
:COPY FROM DICT VOC TO DICT U2DS F1 F2 F3 F4
4 records copied

 

Note that you will need to modify the new dictionary items to be defined as MultiValued.  ( Change attribute 7 from S to M )

Now that you have a file, let’s go into Python, build a Numpy array, and then store it in the MultiValue file.

 

: PYTHON
python> import u2py
python> import numpy as np
python> import pandas as pd

 

Start with a Numpy Array ( built with simple sample data )

 

python> theData = [ [ 101, 102, 103, 104 ], [ 201, 202, 203, 204], [ 301, 302, 303, 304 ], [ 401, 402, 403, 404 ] ]
python> theData
[[101, 102, 103, 104], [201, 202, 203, 204], [301, 302, 303, 304], [401, 402, 403, 404]]
python> myArray = np.array( theData )
python> myArray
array([[101, 102, 103, 104],
       [201, 202, 203, 204],
       [301, 302, 303, 304],
       [401, 402, 403, 404]])

 

We can transform our 4×4 Numpy array before persisting it to our MultiValue database.  Note that np.transpose returns the transposed array; myArray itself is left unchanged.

 

python> np.transpose(myArray)
array([[101, 201, 301, 401],
       [102, 202, 302, 402],
       [103, 203, 303, 403],
       [104, 204, 304, 404]])

 

For our example we will put the transposed array data back into a Python nested list

 

python> asNestedList = np.transpose(myArray).tolist()
python> asNestedList
[[101, 201, 301, 401], [102, 202, 302, 402], [103, 203, 303, 403], [104, 204, 304, 404]]
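The transposed nested list does not strictly require Numpy. As a cross-check, here is a minimal sketch using only the standard library; the variable names are illustrative and not part of the session above. It should produce the same nested list:

```python
# Same sample data as theData in the session above.
the_data = [
    [101, 102, 103, 104],
    [201, 202, 203, 204],
    [301, 302, 303, 304],
    [401, 402, 403, 404],
]

# zip(*rows) pairs up the i-th element of every row, i.e. it yields the columns.
transposed = [list(column) for column in zip(*the_data)]

print(transposed)
```

Numpy remains the better choice once you need numeric operations on the data; this only shows what the transposition does to the list structure.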

 

Here I’ll write the data to the MultiValue file.

Since the u2py.DynArray can be instantiated from a Python nested list, we can create a dynamic array, and store it in the file we created earlier.

 

python> rec = u2py.DynArray(asNestedList)
python> rec
<u2py.DynArray value=b'101\xfd201\xfd301\xfd401\xfe102\xfd202\xfd302\xfd402\xfe103\xfd203\xfd303\xfd403\xfe104\xfd204\xfd304\xfd404'>
python> file = u2py.File("U2DS")
python> file.write("mike", rec)
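The value shown for rec is the nested list flattened into a single byte string. Inner-list items are joined by the value mark (character 253) and the inner lists themselves by the field mark (character 254). The sketch below imitates that layout in plain Python so you can see the mapping. It is only an illustration under that assumption, not how u2py is implemented, and the helper names encode_nested_list and decode_to_nested_list are hypothetical.

```python
FIELD_MARK = b"\xfe"  # character 254, separates fields (attributes)
VALUE_MARK = b"\xfd"  # character 253, separates values within a field


def encode_nested_list(rows):
    """Join a two-level nested list into a MultiValue-style byte string."""
    return FIELD_MARK.join(
        VALUE_MARK.join(str(value).encode("ascii") for value in row)
        for row in rows
    )


def decode_to_nested_list(raw):
    """Split the byte string back into a nested list of strings."""
    return [
        [value.decode("ascii") for value in field.split(VALUE_MARK)]
        for field in raw.split(FIELD_MARK)
    ]


nested = [[101, 201, 301, 401], [102, 202, 302, 402]]
encoded = encode_nested_list(nested)
print(encoded)  # b'101\xfd201\xfd301\xfd401\xfe102\xfd202\xfd302\xfd402'
print(decode_to_nested_list(encoded))
```

Decoding yields strings rather than integers, because the stored record is just delimited text.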

 

Now I’ll verify the data made it to the file.

 


python> u2py.run("LIST U2DS F1 F2 F3 F4")
LIST U2DS F1 F2 F3 F4 10:48:56 Jul 06 2018 1
U2DS...... F1........ F2............. F3............. F4.............

mike       101        102             103             104
           201        202             203             204
           301        302             303             304
           401        402             403             404
1 record listed

How to read back the MultiValue data, and instantiate a numpy array

The next step in our example is to extract the data from the MultiValue database for use in further data science processing.

 

python> myDynArray = file.read("mike")
python> myDynArray
<u2py.DynArray value=b'101\xfd201\xfd301\xfd401\xfe102\xfd202\xfd302\xfd402\xfe103\xfd203\xfd303\xfd403\xfe104\xfd204\xfd304\xfd404'>
python> myNestedList = myDynArray.to_list()

 

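As mentioned earlier, you can instantiate a Numpy array from a nested list.  Note that the values read back from the file are strings, so the resulting array has a string dtype rather than an integer one.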

 

python> npArray = np.array(myNestedList)
python> npArray
array([['101', '201', '301', '401'],
       ['102', '202', '302', '402'],
       ['103', '203', '303', '403'],
       ['104', '204', '304', '404']],
      dtype='<U3')
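Because the array holds strings, numeric work needs a conversion first. One way to do it, sketched here with sample data standing in for the list returned by to_list(), is astype(int). Transposing the result then recovers the layout of the original 4×4 array. The variable names are illustrative only.

```python
import numpy as np

# Stand-in for the nested list of strings read back from the file.
my_nested_list = [
    ["101", "201", "301", "401"],
    ["102", "202", "302", "402"],
    ["103", "203", "303", "403"],
    ["104", "204", "304", "404"],
]

str_array = np.array(my_nested_list)  # string dtype, as in the session above
int_array = str_array.astype(int)     # parse every element as an integer

# Undo the earlier transpose to get back the original row layout.
original = int_array.T
print(original[0].tolist())  # [101, 102, 103, 104]
```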

 

Introduction to Pandas

Pandas is an open source Python module widely used in data science.  It loads data into an easy-to-use structure, the Data Frame, on which you can perform operations across large data sets.

Since we started our discussion with Numpy arrays, we will instantiate our Pandas Data Frame from the Numpy array.

Note that while Numpy handles the array of information, Pandas also lets you define the column headers.

 

python> pdDataFrame = pd.DataFrame(npArray, columns=['f1','f2','f3','f4'])
python> pdDataFrame
    f1   f2   f3   f4
0  101  201  301  401
1  102  202  302  402
2  103  203  303  403
3  104  204  304  404
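The Data Frame above still holds strings, since it was built from the string array. If you want to run calculations on it, a conversion such as astype(int) is one option. The sketch below uses sample data and illustrative variable names rather than the live session objects.

```python
import pandas as pd

# Stand-in for the string data read back from the MultiValue file.
rows = [
    ["101", "201", "301", "401"],
    ["102", "202", "302", "402"],
    ["103", "203", "303", "403"],
    ["104", "204", "304", "404"],
]

df = pd.DataFrame(rows, columns=["f1", "f2", "f3", "f4"])

# Convert every column from strings to integers.
numeric_df = df.astype(int)

# Column arithmetic now works numerically: 101 + 102 + 103 + 104.
print(numeric_df["f1"].sum())  # 410
```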

 

You now have a Pandas Data Frame to examine.  Note that the underlying Numpy array is available as the values attribute of the Data Frame.  It can be used like any other Numpy array, including being converted to a nested Python list.

 

python> pdDataFrame.values
array([['101', '201', '301', '401'],
       ['102', '202', '302', '402'],
       ['103', '203', '303', '403'],
       ['104', '204', '304', '404']], dtype=object)

 

Note that we can also get the values as a Nested List:

 

python> pdDataFrame.values.tolist()
[['101', '201', '301', '401'], ['102', '202', '302', '402'], ['103', '203', '303', '403'], ['104', '204', '304', '404']]

 

We can also extract the column names in the same way:

 

python> pdDataFrame.columns.tolist()
['f1', 'f2', 'f3', 'f4']

 

In “Python Data Science and the Rocket MultiValue Database (Part 3 of 3)” I will show some of the things you can do with Pandas, and create a simple object for managing the storage and retrieval of the data in a Rocket MultiValue database.

