To run oracle commands on oracle server using pyspark .
For EMR
First install software
sudo su
pip install cx_Oracle==6.0b1
Function 1 : To run select command in oracle and print result , we could store this in RDD or DF and use it further as well.
## defining a function
def run_select_oracle(sql_file) :
## Opening file passed into function
file_open=open(sql_file)
## Opening file to read
read_file=file_open.read()
## Defining connections to connect to oracle
connection=cx_Oracle.connect('schema_name/user@ip:port/database')
## Defining cursor so that data can be stored in there for temp
cursor = connection.cursor()
## actually running a command
res_cursor=cursor.execute(read_file)
## Creating empty list to store result
results = []
## Loop to get values row by row
for row in res_cursor.fetchall():
## Appending values in list defined above
results.append(row)
## displaying values
print(results)
## closing connection so that no orphan connections are there
connection.close()
## calling function with file name
run_select_oracle("/home/user/path_to_sql/file_name.sql")
Function 2 : Insert command using similar function , just make sure to add commit post insert else they won't be visible to any one .
def run_insert_oracle(sql_file):
file_open=open(sql_file)
read_file=file_open.read()
connection=cx_Oracle.connect('schema_name/user@ip:port/database')
cursor = connection.cursor()
res_cursor=cursor.execute(read_file)
connection.commit()
connection.close()
run_insert_oracle("/home/user/path_to_sql/file_name.sql")
Comments