Welcome to Blog by

Jatin Madaan

Apr 28, 2019 · 1 min read

Hive on Spark simple program

## PySpark code to run a SQL command. Code: ## Importing HiveContext >>>> from pyspark.sql import HiveContext ## Create a SqlContext...

Jatin Madaan

Apr 28, 2019 · 1 min read

Sed command to remove blank lines from a file in Unix
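The post's own command is not shown in this listing, so the following is only a minimal sketch of one common approach, assuming a POSIX `sed`. The file name `input.txt` and its contents are hypothetical sample data.

```shell
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
printf 'first\n\nsecond\n   \nthird\n' > "$workdir/input.txt"   # hypothetical sample file

# Delete every line that is empty or contains only whitespace.
cleaned=$(sed '/^[[:space:]]*$/d' "$workdir/input.txt")
printf '%s\n' "$cleaned"
```

To treat only completely empty lines as blank, the pattern `/^$/d` could be used instead.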

Jatin Madaan

Apr 5, 2019 · 1 min read

Create a Unix file with an old timestamp
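One way this is commonly done is with `touch -t`, which sets a file's modification time explicitly. This is a sketch under that assumption, not necessarily the command used in the post; the file names and the chosen date are hypothetical.

```shell
workdir=$(mktemp -d)
old_file="$workdir/old_file.txt"   # hypothetical file name

# -t takes [[CC]YY]MMDDhhmm[.SS]; this sets 1 January 2019, 00:00 local time.
touch -t 201901010000 "$old_file"

# Create a second file with the current time for comparison.
touch "$workdir/new_file.txt"

# List regular files that are NOT newer than old_file (only old_file itself).
older=$(find "$workdir" -type f ! -newer "$old_file")
printf '%s\n' "$older"
```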

Jatin Madaan

Apr 3, 2019 · 1 min read

Tee command in Unix to add a string to multiple files at once
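A minimal sketch of the idea, assuming the goal is to append the same line to several files: `tee -a` writes its standard input to every file named on the command line. The file names and the appended string are hypothetical.

```shell
workdir=$(mktemp -d)
printf 'a\n' > "$workdir/one.txt"   # hypothetical sample files
printf 'b\n' > "$workdir/two.txt"

# -a appends instead of overwriting; stdout is discarded because
# tee would otherwise also echo the string to the terminal.
echo "# reviewed" | tee -a "$workdir/one.txt" "$workdir/two.txt" > /dev/null

last_one=$(tail -n 1 "$workdir/one.txt")
last_two=$(tail -n 1 "$workdir/two.txt")
printf '%s\n%s\n' "$last_one" "$last_two"
```

Without `-a`, `tee` would replace the contents of each file rather than add to them.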

Jatin Madaan

Apr 3, 2019 · 1 min read

Sed command to add characters at the start and end of each line of a file in Unix
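As a rough illustration (the post's actual command is not visible here), two `sed` substitutions can anchor on `^` and `$` to insert text at the beginning and end of every line. The characters `<` and `>` and the file `names.txt` are hypothetical choices.

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/names.txt"   # hypothetical sample file

# s/^/</ inserts "<" at the start of each line,
# s/$/>/ appends ">" at the end of each line.
wrapped=$(sed -e 's/^/</' -e 's/$/>/' "$workdir/names.txt")
printf '%s\n' "$wrapped"
```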

Jatin Madaan

Mar 28, 2019 · 1 min read

Load data into Hive table from a file on local system

To load data from a CSV file (it can be pipe-, tab-, or comma-separated): Step 1: Create a table with the same delimiter as the file. Command:...

Jatin Madaan

Mar 10, 2019 · 1 min read

Accessing Oracle using PySpark

To run Oracle commands on an Oracle server using PySpark. For EMR, first install the software: sudo su, then pip install cx_Oracle==6.0b1 Function 1:...

Jatin Madaan

Mar 10, 2019 · 1 min read

Getting last year in Hive, using a dual table as in Oracle

There is a simple command for this, in case it is required, although it runs a MapReduce job: last_year=$(hive -e "select...

Jatin Madaan

Mar 10, 2019 · 1 min read

AWS S3 file copy

To copy files to the local machine we can use the command: aws s3 cp s3://bucket_name/folder_name/file_name.txt . Note that there is a dot at the end to...

Jatin Madaan

Mar 10, 2019 · 1 min read

Hadoop fs commands on S3

We can run almost all hadoop fs commands on the S3 file system as well. E.g.: hadoop fs -du -s -h s3://bucket_name/folder_name 10.1 G ...

Jatin Madaan

Mar 9, 2019 · 1 min read

Hive SQL return code check

When running a Hive query with hive -e or hive -f, merely writing rc=$? below the hive command will not help; it will only tell if...

Jatin Madaan

Feb 21, 2019 · 1 min read

Unix command to change prompt from default to PWD

Jatin Madaan

Feb 18, 2019 · 1 min read

How to open an ipynb file using Jupyter Notebook

Jatin Madaan

Feb 18, 2019 · 0 min read

Difference Between Unix & Linux

Jatin Madaan

Feb 14, 2019 · 1 min read

Unix command to delete all files that are more than x days old
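A typical way to do this is `find` with `-mtime`, sketched below on a throwaway directory. This is an assumption about the approach, since the post body is not shown; the `days` value and the file names are hypothetical.

```shell
workdir=$(mktemp -d)
days=7                                     # hypothetical value of x

touch -t 201901010000 "$workdir/stale.log" # backdated file, far older than 7 days
touch "$workdir/fresh.log"                 # created just now

# -mtime +N matches files whose age, counted in whole 24-hour periods,
# is greater than N. Matching files are passed to rm in batches.
find "$workdir" -type f -mtime +"$days" -exec rm -f {} +

remaining=$(ls "$workdir")
printf '%s\n' "$remaining"
```

Running the same `find` without the `-exec` part first is a cheap way to preview what would be deleted.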

Jatin Madaan

Feb 7, 2019 · 1 min read

Running an Informatica workflow in a loop for different dates

Get parameters such as workflow_name, start_date, end_date, and parameter_file as input from a file. Loop through the dates to get all date values...

Jatin Madaan

Feb 7, 2019 · 1 min read

AWS key for Terminal on Mac

To connect to an AWS cluster (EMR or EC2) via Terminal on a Mac, first make sure you download the pem file from your AWS account. Once the file has been...

Jatin Madaan

Feb 6, 2019 · 1 min read

Unix Command for different loops:

Until loop: until [[ $flag -gt 1 ]]; do [code]; done
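A small runnable sketch of an until loop, which repeats its body until the condition becomes true. The counter logic is hypothetical filler for the `[code]` placeholder, and the test uses the portable `[ ... -gt ... ]` form so the comparison is numeric rather than string-based.

```shell
flag=0
iterations=0

# The body runs while the condition is false and stops once flag exceeds 1.
until [ "$flag" -gt 1 ]; do
    iterations=$((iterations + 1))
    flag=$((flag + 1))
done

echo "ran $iterations times, flag=$flag"
```

Inside `[[ ... ]]`, the `>` operator compares strings lexicographically, so `-gt` is the safer choice when the values are numbers.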

Jatin Madaan

Feb 6, 2019 · 1 min read

Unix Command to Create alias in bash profile

alias sr="cd /[path_to_folder]"
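To show how such a line ends up in a profile, here is a hedged sketch that appends the alias to a temporary stand-in file rather than the real `~/.bash_profile`, then sources it. The target directory `/tmp` and the variable names are hypothetical.

```shell
profile=$(mktemp)    # stand-in for ~/.bash_profile, so the real one is untouched
target_dir=/tmp      # hypothetical folder the alias should jump to

# Append the alias definition to the profile file.
echo "alias sr=\"cd $target_dir\"" >> "$profile"

# Source the file so the alias is defined in the current shell.
. "$profile"

# Print the definition to confirm it was registered.
alias sr
```

In a real setup the line would go into `~/.bash_profile` and take effect in new login shells, or immediately after sourcing that file.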

Jatin Madaan

Feb 6, 2019 · 2 min read

Simple PySpark code to read a sequential file, run a SQL query, and store a text file in HDFS

import time import sys import subprocess ## Getting start time of a job start_time= time.time() ## importing spark and Hive Context to...
