top of page
  • Writer's pictureJatin Madaan

Accessing Oracle using PySpark .


To run oracle commands on oracle server using pyspark .


For EMR


First install software


sudo su

pip install cx_Oracle==6.0b1


Function 1 : To run select command in oracle and print result , we could store this in RDD or DF and use it further as well.


## defining a function

def run_select_oracle(sql_file) :

## Opening file passed into function

file_open=open(sql_file)

## Opening file to read

read_file=file_open.read()

## Defining connections to connect to oracle

connection=cx_Oracle.connect('schema_name/user@ip:port/database')

## Defining cursor so that data can be stored in there for temp

cursor = connection.cursor()

## actually running a command

res_cursor=cursor.execute(read_file)

## Creating empty list to store result

results = []

## Loop to get values row by row

for row in res_cursor.fetchall():

## Appending values in list defined above

results.append(row)

## displaying values

print(results)

## closing connection so that no orphan connections are there

connection.close()


## calling function with file name

run_select_oracle("/home/user/path_to_sql/file_name.sql")



Function 2 : Insert command using similar function , just make sure to add commit post insert else they won't be visible to any one .



def run_insert_oracle(sql_file):

file_open=open(sql_file)

read_file=file_open.read()

connection=cx_Oracle.connect('schema_name/user@ip:port/database')

cursor = connection.cursor()

res_cursor=cursor.execute(read_file)

connection.commit()

connection.close()


run_insert_oracle("/home/user/path_to_sql/file_name.sql")

482 views0 comments

Comments


bottom of page