Saturday, August 3, 2019

How to run Athena queries from Apache NiFi


2. Place the jar file on nifi servers and make nifi user as owner.

3. Create the DBCPConnectionPool service




JDBC url: jdbc:awsathena://AwsRegion=[REGION];UID=[ACCESS KEY];PWD=[SECRET KEY];S3OutputLocation=[LOCATION]

jdbc:awsathena://AwsRegion=us-east-1;S3OutputLocation=s3://aws-athena-query-results-127370262072-us-east-1

for database user and password, provide access key and secret key.

4. Configure the ExecuteSQL processor like below








Wednesday, May 1, 2019

ERROR 9999 (28000): An internal error has occurred. Please retry or report your issues.

I was working on a project for deployment in Microsoft Azure and created an Azure Mysql database.
But while connecting to the database using the MySQL client on a RedHat server I was getting the following error.

ERROR 9999 (28000): An internal error has occurred. Please retry or report your issues.


Normally on the cloud, the first thing I look at checking if the port is open, I used Telnet to check port 3306 is open for the MySQL database. 


[root@cloudera-machine cloudera]# mysql -h mysql001.mysql.database.azure.com -u cloudera -P 3306 -p

Enter password:
ERROR 9999 (28000): An internal error has occurred. Please retry or report your issues.

Then I found to connect to database, we have to use the full username of the pattern youruser@azuredbserver and enclose it in double quotes to work.
[root@cloudera-machine cloudera]# mysql -h mysql001.mysql.database.azure.com -u "cloudera@mysql001" -P 3306 -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 61601
Server version: 5.6.39.0 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.



MySQL [(none)]>

Thursday, November 1, 2018

Azure CLI for listing VM images

Using az command line to List VM images

We need to have details of VM images to use them programmatically through ansible or other tools.
below I have listed a few examples of Azure CLI commands to list the Publishers, Offers, SKUs, Versions and Image details in general.
  • To list publishers
bash-3.2$ az vm image list-publishers -l eastus --query "[?starts_with(name, 'cloudera')]"
[
  {
    "id": "/Subscriptions/da35404a-2612-4419-baef-45fcdce6045e/Providers/Microsoft.Compute/Locations/eastus/Publishers/cloudera",
    "location": "eastus",
    "name": "cloudera",
    "tags": null
  }
]
  • To list all offers
bash-3.2$ az vm image list-offers -l eastus -p cloudera
[
  {
    "id": "/Subscriptions/da35404a-2612-4419-baef-45fcdce6045e/Providers/Microsoft.Compute/Locations/eastus/Publishers/cloudera/ArtifactTypes/VMImage/Offers/cloudera-altus-centos-os",
    "location": "eastus",
    "name": "cloudera-altus-centos-os",
    "tags": null
  },
...
  {
    "id": "/Subscriptions/da35404a-2612-4419-baef-45fcdce6045e/Providers/Microsoft.Compute/Locations/eastus/Publishers/cloudera/ArtifactTypes/VMImage/Offers/test-image",
    "location": "eastus",
    "name": "test-image",
    "tags": null
  }
]

  • To list all SKUs
bash-3.2$ az vm image list-skus -l eastus -p cloudera -f cloudera-centos-os
[
  {
    "id": "/Subscriptions/da35404a-2612-4419-baef-45fcdce6045e/Providers/Microsoft.Compute/Locations/eastus/Publishers/cloudera/ArtifactTypes/VMImage/Offers/cloudera-centos-os/Skus/6_7",
    "location": "eastus",
    "name": "6_7",
    "properties": {
      "automaticOSUpgradeProperties": {
        "automaticOSUpgradeSupported": false
      }
    },
    "tags": null
  },
......
  {
    "id": "/Subscriptions/da35404a-2612-4419-baef-45fcdce6045e/Providers/Microsoft.Compute/Locations/eastus/Publishers/cloudera/ArtifactTypes/VMImage/Offers/cloudera-centos-os/Skus/7_4",
    "location": "eastus",
    "name": "7_4",
    "properties": {
      "automaticOSUpgradeProperties": {
        "automaticOSUpgradeSupported": false
      }
    },
    "tags": null
  }
]

  • to list all images
bash-3.2$ az vm image list  -l eastus -p cloudera -f cloudera-centos-os -o tsv --all
cloudera-centos-os cloudera 6_7 cloudera:cloudera-centos-os:6_7:1.0.0 1.0.0
cloudera-centos-os cloudera 6_7 cloudera:cloudera-centos-os:6_7:1.0.1 1.0.1
cloudera-centos-os cloudera 6_7 cloudera:cloudera-centos-os:6_7:2.0.1 2.0.1
cloudera-centos-os cloudera 6_7 cloudera:cloudera-centos-os:6_7:2.0.4 2.0.4
cloudera-centos-os cloudera 6_8 cloudera:cloudera-centos-os:6_8:1.0.0 1.0.0
cloudera-centos-os cloudera 6_8 cloudera:cloudera-centos-os:6_8:2.0.1 2.0.1
cloudera-centos-os cloudera 6_8 cloudera:cloudera-centos-os:6_8:2.0.4 2.0.4
cloudera-centos-os cloudera 7_2 cloudera:cloudera-centos-os:7_2:1.0.2 1.0.2
cloudera-centos-os cloudera 7_2 cloudera:cloudera-centos-os:7_2:1.0.5 1.0.5
cloudera-centos-os cloudera 7_2 cloudera:cloudera-centos-os:7_2:1.0.7 1.0.7
cloudera-centos-os cloudera 7_2 cloudera:cloudera-centos-os:7_2:1.0.8 1.0.8
cloudera-centos-os cloudera 7_2 cloudera:cloudera-centos-os:7_2:2.0.3 2.0.3
cloudera-centos-os cloudera 7_2 cloudera:cloudera-centos-os:7_2:2.0.4 2.0.4
cloudera-centos-os cloudera 7_4 cloudera:cloudera-centos-os:7_4:2.0.7 2.0.7
  • to list details of a specific image
bash-3.2$ az vm image show -l eastus -p cloudera -f cloudera-centos-os -s 7_2 --version 1.0.2
{
  "dataDiskImages": [],
  "id": "/Subscriptions/da35404a-2612-4419-baef-45fcdce6045e/Providers/Microsoft.Compute/Locations/eastus/Publishers/cloudera/ArtifactTypes/VMImage/Offers/cloudera-centos-os/Skus/7_2/Versions/1.0.2",
  "location": "eastus",
  "name": "1.0.2",
  "osDiskImage": {
    "operatingSystem": "Linux"
  },
  "plan": {
    "name": "7_2",
    "product": "cloudera-centos-os",
    "publisher": "cloudera"
  },
  "tags": null
}

Wednesday, October 31, 2018

Create Azure VM using Ansible - With Extension

Extensions are similar to user data from AWS EC2 instances on Azure VM.
Below is an example of creating a virtual machine on Azure using Ansible.
Apart from creating a VM, it also demos creating an Extension, I am using a Custom Script as an extension so I can perform the Cloudera Director installation on the VM.

It can be used to run any shell script on the machine.

This play has below steps:

1. Create a public IP
2. Create a NIC
3. Create a Virtual Machine
4. Create an Extension

---
- name: Deploy Azure ARM template.
  hosts: localhost
  connection: local
  gather_facts: false
  vars_files:
    - var.yml

  tasks:
  - name: Create a public ip address
    azure_rm_publicipaddress:
      resource_group: '{{ cd_resource_group }}'
      name: '{{ cd_public_ip }}'
      allocation_method: Static
      domain_name: '{{ cd_vmname }}'
      state: present
    register: output_ip_address

  - name: Dump public IP for VM which will be created
    debug:
      msg: "The public IP is {{ output_ip_address.state.ip_address }}."

  - name: Create NIC
    azure_rm_networkinterface:
      name: '{{ cd_nic_name }}'
      resource_group: '{{ cd_resource_group }}'
      virtual_network_name: '{{ cd_vnet }}'
      subnet_name: '{{ cd_subnet_name }}'
      security_group_name: '{{ cd_security_group }}'
      public_ip_address_name: '{{ cd_public_ip }}'

  - name: Create VM
    azure_rm_virtualmachine:
      resource_group: '{{ cd_resource_group }}'
      name: '{{ cd_vmname }}'
      network_interfaces: '{{ cd_nic_name }}'
      admin_username: centos
      ssh_password_enabled: true
      admin_password: xxxxxxx
      # ssh_public_keys:
      #   - path: /home/centos/.ssh/authorized_keys
      #     key_data: <your-key-data>
      vm_size: Standard_A4_v2
      state: present
      image:
        offer: CentOS
        publisher: OpenLogic
        sku: '7.5'
        version: latest a

  - name: Create VM Extension
    azure_rm_virtualmachine_extension:
       name: '{{ cd_vm_extention }}'
       location: eastus
       resource_group: '{{ cd_resource_group }}'
       virtual_machine_name: '{{ cd_vmname }}'
       publisher: Microsoft.Azure.Extensions
       virtual_machine_extension_type: CustomScript
       type_handler_version: 2.0
       settings: '{
                  "commandToExecute": "./director-install.sh",
                  "fileUris": ["https://raw.githubusercontent.com/samcloudera/random/master/director-install.sh"
                  ]
                }'
       auto_upgrade_minor_version: true
       state: present


Below is the playbook for deleting the Azure virtual machine.

It has below plays:

1. Deleting the extension
2. Kill VM
3. Delete the NIC
4. Delete the Public IP


---
- name: Deploy Azure ARM template.
  hosts: localhost
  connection: local
  gather_facts: false
  vars_files:
    - var.yml

  tasks:
  - name: Delete VM Extension
    azure_rm_virtualmachine_extension:
      name: '{{ cd_vm_extention }}'
      location: eastus
      resource_group: '{{ cd_resource_group }}'
      virtual_machine_name: '{{ cd_vmname }}'
      state: absent

  - name: Kill VM
    azure_rm_virtualmachine:
      resource_group: '{{ cd_resource_group }}'
      name: '{{ cd_vmname }}'
      state: absent
      remove_on_absent:
#          - network_interfaces
          - virtual_storage
#          - public_ips

  - name: Delete network interface
    azure_rm_networkinterface:
      resource_group: '{{ cd_resource_group }}'
      name: '{{ cd_nic_name }}'
      state: absent

  - name: Delete public ip
    azure_rm_publicipaddress:
      resource_group: '{{ cd_resource_group }}'
      name: '{{ cd_public_ip }}'

      state: absent

Wednesday, July 19, 2017

Use of jq in aws cli

I wrote a post with use of query option in aws cli for filtering.
Another powerful approach for filtering is jq.

jq is resourceful and helps in getting all the operations done, which query options fails

for example, I was struggling to pass a shell variable to cli command which I was able to do though jq

below command lets me get the exact name of the emr cluster and passing cluster ID as shell variable.

aws emr list-clusters --region $region |jq -r --arg cid "$cid"  '.Clusters[] | select(.Id == $cid ) | .Name'


Example to query the instaances tag

aws ec2 describe-instances  --region us-east-1 --instance-ids i-02426396b622c2b23   2> /dev/null | jq -r '.Reservations[].Instances[].Tags[] | select(.Key=="owner") | .Value'

Thursday, June 29, 2017

Restart Hadoop Services on EMR

https://aws.amazon.com/premiumsupport/knowledge-center/restart-service-emr/


to check all the services which you can manage

initctl status

 [root@host conf.dist]# initctl list
rc stop/waiting
tty (/dev/tty3) start/running, process 4452
tty (/dev/tty2) start/running, process 4450
tty (/dev/tty1) start/running, process 4448
tty (/dev/tty6) start/running, process 4458
tty (/dev/tty5) start/running, process 4456
tty (/dev/tty4) start/running, process 4454
gmetad start/running, process 10054
update-motd stop/waiting
hadoop-mapreduce-historyserver start/running, process 19861
hadoop-yarn-timelineserver start/running, process 19056
hive-server2 start/running, process 17579
plymouth-shutdown stop/waiting
control-alt-delete stop/waiting
hive-hcatalog-server start/running, process 18745
rcS-emergency stop/waiting
kexec-disable stop/waiting
quit-plymouth stop/waiting
hadoop-yarn-resourcemanager start/running, process 19206
hadoop-kms start/running, process 11467
rcS stop/waiting
hue start/running, process 19671
prefdm stop/waiting
hadoop-httpfs start/running, process 11750
init-system-dbus stop/waiting
print-image-id stop/waiting
oozie start/running, process 15441
ganglia-rrdcached start/running, process 9933
elastic-network-interfaces stop/waiting
splash-manager stop/waiting
hadoop-yarn-proxyserver start/running, process 18922
start-ttys stop/waiting
spark-history-server start/running, process 20044
hadoop-hdfs-namenode start/running, process 12003
rcS-sulogin stop/waiting
serial (ttyS0) start/running, process 4447
gmond start/running, process 10073



to restart a service
[root@server conf.dist]# status hive-hcatalog-server
hive-hcatalog-server start/running, process 18745
[root@server conf.dist]# restart hive-hcatalog-server
hive-hcatalog-server start/running, process 18745
[root@server conf.dist]# restart hive-server2
hive-server2 start/running, process 17579




show databases;

Wednesday, June 28, 2017

Remote JDBC connection to Hive on EMR

1. SSH tunneling :
ssh -o ServerAliveInterval=10 -N -L 10000:localhost:10000 username@masternode


for automating this tunning
we can use autossh

2. use local url which will use above tunnel:
jdbc:hive2://localhost:10000/default