Oklahoma State University - Teradata University Network

Oklahoma State University Spears School of Business Dep. of Mgmt. Sc. & Info. Sys.

Big Data/Advanced Analytics Technologies

Getting Started with Aster Express and AppCenter

Ramesh Sharda and Pankush Kalgotra

Acknowledgements: This material has been adapted from various sources within Teradata Aster. Specifically, we acknowledge assistance from Mark Ott, Gregory Bethardy, John Thuma, and Susan Baskin.


Purpose: This document describes the step-by-step process for getting started with Aster Express. It begins with downloading the Aster Express VMs and continues through running your first SQL-MapReduce (SQL-MR) query. It also includes instructions for installing and using AppCenter for visualizations.

Hardware and Software Requirements Check 

4 GB memory | 20 GB free disk space | 64-bit capable CPU



Operating System: Microsoft Windows Vista or Windows 7



VMware Player or VMware Workstation: Download and install the latest (free) version of the VMware Player



7-Zip: Extract (or uncompress) the Aster Express package using 7-Zip, available from: http://www.7-zip.org/


Steps to be followed:

1. Download Aster Express from https://downloads.teradata.com/download/aster/asterexpress
2. Unzip the downloaded file with 7-Zip. It contains two VM images: the Aster queen and the Aster worker.

3. Install VMware Workstation and open the two images in it.
4. Before turning on the two machines (Aster queen and worker), you have to configure some settings. The two screenshots below show how to verify the IP address and subnet mask of the VMware network adapter; they should be 192.168.100.1 and 255.255.255.0, respectively.


5. You also have to verify that the subnet IP and mask configured in VMware Workstation match. In VMware Workstation, go to Edit > Virtual Network Editor > VMnet8 and make sure the subnet IP is the same as in the screenshot below.
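Steps 4 and 5 come down to one condition: the host adapter address, the subnet mask, and the VM addresses (the queen and worker IPs given in step 8) must all describe the same subnet. The minimal Python sketch below checks that arithmetic with the standard ipaddress module. It does not read VMware's settings; the values are simply copied from this guide.

```python
import ipaddress

# Values from this guide: host adapter address, subnet mask,
# and the queen/worker addresses used in later steps.
ADAPTER_IP = "192.168.100.1"
SUBNET_MASK = "255.255.255.0"
VM_HOSTS = {"queen": "192.168.100.100", "worker": "192.168.100.150"}


def same_subnet(adapter_ip: str, mask: str, hosts: dict[str, str]) -> dict[str, bool]:
    """Return, for each VM, whether its IP lies in the adapter's subnet."""
    # strict=False builds the network from a host address plus a dotted mask.
    network = ipaddress.IPv4Network(f"{adapter_ip}/{mask}", strict=False)
    return {name: ipaddress.IPv4Address(ip) in network for name, ip in hosts.items()}


if __name__ == "__main__":
    net = ipaddress.IPv4Network(f"{ADAPTER_IP}/{SUBNET_MASK}", strict=False)
    print(net)  # 192.168.100.0/24
    print(same_subnet(ADAPTER_IP, SUBNET_MASK, VM_HOSTS))
```

If either VM reports False, the VMnet8 subnet or the adapter mask differs from the values above.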


6. Before turning on the machines, assign memory to them as follows:
   a. On a 4 GB RAM host, assign 2 GB RAM to the worker and 1 GB to the queen.
   b. On an 8 GB RAM host, assign 3 GB RAM to the worker and 2 GB to the queen.
   c. On a 16 GB RAM host, assign 3 GB to the queen and 6 GB to the worker.
7. Now turn on the VM images in VMware Workstation. Both are SLES (SUSE Linux Enterprise Server) machines with the GUI turned off or not installed, so you will only see a terminal when they start, as shown in the screenshot below. The root username is “root” and the password is “aster”. A second user, “aster”, also has the password “aster”. Both users exist on both machines (queen and worker).
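The memory suggestions in step 6 can be written as a small lookup that also shows how much RAM is left for the host operating system. In this minimal sketch, the rule for hosts that fall between the listed sizes (use the largest listed profile that fits) is an assumption; the guide only gives the 4, 8 and 16 GB cases.

```python
# Suggested VM memory in GB, keyed by host RAM in GB (values from step 6).
SUGGESTED_GB = {
    4: {"queen": 1, "worker": 2},
    8: {"queen": 2, "worker": 3},
    16: {"queen": 3, "worker": 6},
}


def plan_memory(host_gb: int) -> dict[str, int]:
    """Pick the largest listed profile that fits the host (an assumption for
    sizes the guide does not list) and report the RAM left for the host."""
    fitting = [size for size in SUGGESTED_GB if size <= host_gb]
    if not fitting:
        raise ValueError("The guide assumes at least 4 GB of host memory")
    plan = dict(SUGGESTED_GB[max(fitting)])  # copy, so the table is not modified
    plan["host_remaining"] = host_gb - plan["queen"] - plan["worker"]
    return plan


if __name__ == "__main__":
    for size in (4, 8, 16):
        print(size, plan_memory(size))
```

On a 4 GB host only 1 GB remains for Windows, which is why the later AppCenter step recommends the worker on such machines.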


8. After turning on the machines, you have to activate the cluster in the Aster Management Console (AMC). Open a web browser and go to 192.168.100.100, the queen's IP address. (The worker's IP address is 192.168.100.150; you can check each address with the ifconfig command in the terminal of each machine.) The AMC asks for a username and password:
   a. Username: db_superuser
   b. Password: db_superuser
db_superuser is a database administrator user. Another database user, “beehive”, has the password “beehive”. When you log in with the db_superuser credentials, the AMC opens; there you can manage, configure, and monitor Aster Database activities.

9. To activate the Aster cluster, click Admin > Cluster Management > Activate Cluster.
10. Now the cluster is set up and you can start working with the databases. To access the databases, use ACT (Aster Command Tool). You can run it from the queen's command-line terminal or through PuTTY, which you can download from: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html


11. Open PuTTY and enter the queen's IP address, 192.168.100.100, to log in as root with the password aster (root/aster). (You can also log in to the worker by entering the worker's IP address.)

12. You are now in the queen's command-line terminal. To access the database, we will use act. Enter the database with this command:

    act -U db_superuser -d beehive

Here, -U specifies the database user (db_superuser) and -d specifies the database name (beehive). When prompted for the password, type db_superuser.


Note: you can also use the other database user, beehive/beehive, instead of db_superuser.
13. Now that you are connected to the database, you can explore the datasets with SQL queries. Type \dt to list the tables in the database, and type SQL queries such as select * from tablename; to see the data. To exit the database, type \q.
14. Now we are ready to work with the data. We will use the bank click data that is already present in the /home/aster/demo directory of the queen.

Unzip the bank_web_data.zip file using these commands:

    cd /home/aster/demo/
    unzip bank_web_data.zip

You now have a file named bank_web_data.txt in the /home/aster/demo directory. To see the first 10 lines of the data, type:

    head -n 10 bank_web_data.txt

The first row lists four variables: customer_id, sessionid, page and datestamp. The dataset contains modified information on web clicks on a bank website (Account Summary, FAQ, etc.). In addition to each customer's unique ID, the web page accessed and the timestamp, it contains an important variable named sessionid, which groups successive clicks into a single session. A session is defined as “a sequence of clicks by a particular user where no more than n seconds pass between successive clicks.”
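The session definition quoted above can be illustrated with a minimal Python sketch that numbers one customer's clicks into sessions. The dataset already ships with sessionid computed, and its actual n is not stated, so the 600-second gap and the click times in the example are hypothetical.

```python
from datetime import datetime


def assign_sessions(clicks, gap_seconds):
    """Number one customer's clicks into sessions.

    clicks: list of (timestamp, page) tuples in any order.
    A new session starts when more than gap_seconds pass between
    successive clicks. Returns (session_number, timestamp, page) tuples
    in time order, with session numbers starting at 1.
    """
    sessions = []
    session_no = 0
    previous = None
    for ts, page in sorted(clicks):
        if previous is None or (ts - previous).total_seconds() > gap_seconds:
            session_no += 1  # first click, or gap too long: start a new session
        sessions.append((session_no, ts, page))
        previous = ts
    return sessions


if __name__ == "__main__":
    # Hypothetical clicks for one customer; n = 600 seconds is an assumption.
    clicks = [
        (datetime(2015, 3, 1, 9, 0, 0), "Home"),
        (datetime(2015, 3, 1, 9, 0, 40), "FAQ"),
        (datetime(2015, 3, 1, 9, 2, 0), "Account Summary"),
        (datetime(2015, 3, 1, 9, 45, 0), "Home"),
    ]
    for row in assign_sessions(clicks, gap_seconds=600):
        print(row)
```

The first three clicks fall within 600 seconds of each other and form session 1. The 43-minute pause before the last click starts session 2.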


15. Now we will upload the bank dataset into the beehive database. As in any SQL database, we first define the table schema and then load the records. Connect to the beehive database with act:

    act -U db_superuser -d beehive

Then run the SQL query below, which creates a table named class_bank. It includes some Aster-specific clauses, such as DISTRIBUTE BY, STORAGE ROW and COMPRESS, that control how the table is managed. Refer to the Aster Analytics Foundation Guide for more information: http://www.info.teradata.com/eDownload.cfm?itemid=122580002

    CREATE TABLE class_bank (
        customer_id INTEGER NULL,
        session_id  INTEGER NULL,
        page        VARCHAR(100) NULL,
        datestamp   TIMESTAMP WITHOUT TIME ZONE NULL
    )
    DISTRIBUTE BY HASH (customer_id)
    STORAGE ROW
    COMPRESS LOW;

The query above creates an empty table in the beehive database.
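DISTRIBUTE BY HASH (customer_id) spreads the table's rows across the workers by hashing customer_id, so all rows for one customer end up in the same place. That co-location generally lets a later PARTITION BY customer_id run without reshuffling data. The minimal sketch below illustrates the idea. It uses zlib.crc32 as a stand-in hash and does not reproduce Aster's internal hash function or its placement of rows on vworkers; the rows are hypothetical.

```python
import zlib
from collections import defaultdict


def route_rows(rows, key, num_workers):
    """Group rows by the worker that hashing `key` would assign them to.

    zlib.crc32 is only a stand-in for Aster's own hash function. The
    property that matters is that equal keys always map to the same bucket.
    """
    placement = defaultdict(list)
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % num_workers
        placement[bucket].append(row)
    return dict(placement)


if __name__ == "__main__":
    # Hypothetical rows shaped like class_bank.
    rows = [
        {"customer_id": 101, "session_id": 1, "page": "Home"},
        {"customer_id": 101, "session_id": 1, "page": "FAQ"},
        {"customer_id": 202, "session_id": 7, "page": "Home"},
        {"customer_id": 303, "session_id": 2, "page": "Account Summary"},
    ]
    for bucket, bucket_rows in sorted(route_rows(rows, "customer_id", 2).items()):
        print(bucket, [r["customer_id"] for r in bucket_rows])
```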


16. The class_bank table now exists but contains no records (check with a SELECT query). We now have to load the data into it using ncluster_loader.

First exit the database by typing \q. Then, from the /home/aster/demo directory (where bank_web_data.txt is located), enter the following ncluster_loader command in the terminal:

    ncluster_loader -U db_superuser -w db_superuser -d beehive --skip-rows 1 class_bank bank_web_data.txt

The arguments of the ncluster_loader command are:
    -U db_superuser {database user id}
    -w db_superuser {database password}
    -d beehive {database where the table exists}
    --skip-rows 1 {skip the first row because it contains column headings}
    class_bank {target table}
    bank_web_data.txt {input file}
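Before or after loading, you can preview the file locally to see how many data rows remain once the header is skipped (the effect of --skip-rows 1) and whether any line has the wrong number of fields. This is a minimal sketch, not part of ncluster_loader. The tab delimiter is an assumption about bank_web_data.txt, and the function name preview_load is hypothetical.

```python
import csv
import io


def preview_load(stream, delimiter="\t", skip_rows=1, expected_fields=4):
    """Count the data rows a loader would see after skipping header rows.

    Mirrors the intent of ncluster_loader's --skip-rows option on a local
    copy of the file. The tab delimiter is an assumption; change it if the
    file uses another separator.
    Returns (number_of_well_formed_rows, line_numbers_with_wrong_field_count).
    """
    reader = csv.reader(stream, delimiter=delimiter)
    good, bad = 0, []
    for line_no, fields in enumerate(reader, start=1):
        if line_no <= skip_rows:
            continue  # header row(s), like --skip-rows 1
        if len(fields) == expected_fields:
            good += 1
        else:
            bad.append(line_no)
    return good, bad


if __name__ == "__main__":
    # Tiny in-memory example; for the real file use open("bank_web_data.txt", newline="").
    sample = io.StringIO(
        "customer_id\tsessionid\tpage\tdatestamp\n"
        "1\t10\tHome\t2015-03-01 09:00:00\n"
        "1\t10\tFAQ\n"
    )
    print(preview_load(sample))  # (1, [3])
```

The first number can be compared with the result of SELECT COUNT(*) FROM class_bank; after the load.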

The screenshot shows that approximately 1 million records were loaded. You can explore them with SQL queries by connecting to the database again.
Important: You can always write queries in the terminal, but it is not very user-friendly. A better option is a client application that connects to the database and runs all your queries; we use TERADATA STUDIO for this.
17. Download Teradata Studio from https://downloads.teradata.com/download
18. Install it on your local Windows machine.
19. Open Teradata Studio. The screenshot below shows the Teradata Studio user interface and its panels.

In the leftmost panel, under Data Source Explorer, you can create a new database connection.



Projects can be saved and accessed through the Project Explorer on the left window.



Queries are written on the wide window in the middle (make sure Query Development is selected on the right corner).



At the top, select the connection profile, that is, the database where the tables are stored.



The results of the queries are produced in the Teradata Result Set Viewer, as shown in the screenshot below.

(Screenshot callouts: Create database connections here; Access project here; Select database where tables are stored; Write queries here; Query output here.)

20. To connect to the beehive database, follow the steps shown in the screenshot below.


21. Under Database Connections, you can see the Aster_class connection; under its public schema, you can see the class_bank table that we just created.

22. You can write queries in Teradata Studio to see the data. For example:

    SELECT * FROM class_bank LIMIT 10;
    SELECT COUNT(*) FROM class_bank;

23. Now we will run an interesting SQL-MR function: nPath. Refer to the Aster Analytics Foundation Guide for more information about SQL-MR functions.
Use case: Provide the bank's Digital Marketing Department with the 40 most traveled web paths.
In the bank web clicks dataset (class_bank), a user performs a sequence of activities within one session. For example, a user may access the Home page, then the FAQ page, and then the Account Summary page. In this use case, our objective is to find the top 40 paths followed by the customers. Aster provides an SQL-MR function named nPath that takes the class_bank dataset as input and generates all the paths. The query below uses nPath to find the 40 most traveled web paths.

    CREATE DIMENSION TABLE bank_npath AS
    SELECT DISTINCT path, COUNT(*) cnt
    FROM nPath (
        ON class_bank
        PARTITION BY customer_id, session_id
        ORDER BY datestamp
        MODE (NONOVERLAPPING)
        PATTERN ('PAGE+')
        SYMBOLS (TRUE AS PAGE)
        RESULT (ACCUMULATE (page OF ANY (PAGE)) AS path)
    ) n
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 40;

The screenshot below shows how to run it. After the query runs successfully, run the following to see the results:

    SELECT * FROM bank_npath;
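To see what the nPath query computes, here is a minimal Python sketch that approximates it on in-memory rows. It is an approximation written for illustration, not Aster's implementation. The bracketed path text stands in for ACCUMULATE's output format and is an assumption. Ties in the count are ordered arbitrarily by the real query's ORDER BY 2 DESC, so only the counts should be compared.

```python
from collections import Counter
from itertools import groupby


def top_paths(clicks, limit=40):
    """Approximate the bank_npath query on in-memory rows.

    clicks: iterable of (customer_id, session_id, datestamp, page) tuples.
    Rows are partitioned by (customer_id, session_id) and ordered by
    datestamp. Because SYMBOLS (TRUE AS PAGE) makes every row a PAGE and
    PATTERN ('PAGE+') is greedy, each session should yield one match
    covering all of its clicks, so one path per session is accumulated.
    Returns up to `limit` (path, count) pairs, most frequent first.
    """
    ordered = sorted(clicks, key=lambda r: (r[0], r[1], r[2]))
    counts = Counter()
    for _, session_rows in groupby(ordered, key=lambda r: (r[0], r[1])):
        # Bracketed, comma-separated text is an assumed stand-in for
        # nPath's ACCUMULATE output format.
        path = "[" + ", ".join(r[3] for r in session_rows) + "]"
        counts[path] += 1  # COUNT(*) ... GROUP BY path
    return counts.most_common(limit)  # ORDER BY cnt DESC LIMIT n


if __name__ == "__main__":
    # Hypothetical clicks; integers stand in for datestamp values.
    rows = [
        (1, 1, 1, "Home"), (1, 1, 2, "FAQ"), (1, 1, 3, "Account Summary"),
        (2, 5, 1, "Home"), (2, 5, 3, "Account Summary"), (2, 5, 2, "FAQ"),
        (3, 7, 1, "Home"), (3, 7, 2, "FAQ"),
    ]
    for path, cnt in top_paths(rows):
        print(cnt, path)
```

Partitioning by both customer_id and session_id means two customers that reuse the same session number are still counted as separate sessions.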


24. That covers SQL-MR. Now let us visualize the results using AppCenter, a web-based application. You can install it on the queen or the worker. The steps below install it on the queen, and the same steps apply if you install it on the worker. Important: if your machine has 4 GB of RAM, install AppCenter on the worker; otherwise, install it on the queen.
25. First, transfer the AppCenter installation file (AppCenter-Installer-6.00.00.00.run) to the queen machine using WinSCP (https://winscp.net/eng/download.php). Make sure you know where the file is stored; in the screenshot, it is in the /home/aster directory.

26. Now open PuTTY and log in to the queen as root/aster.
27. We need the directories /tmp and /data. /tmp normally exists already, but you will have to create /data. Use the following command (the -p option skips any directory that already exists):

    sudo mkdir -p /tmp /data

Then type ls / to confirm that the directories were created.


28. Now run the installer. Type the following commands:

    cd /home/aster
    sudo chmod +x AppCenter-Installer-6.00.00.00.run
    sudo ./AppCenter-Installer-6.00.00.00.run

The installer first asks whether to upgrade from Asterlens to AppCenter; press Y. During the process, it asks for an AppCenter admin password. Choose any password, but remember it. Follow the instructions until you see the screen confirming that the installation is complete.

29. Now open a web browser and go to 192.168.100.100:10 to access AppCenter. Log in with the username admin and the password you chose during installation. Note: If you installed AppCenter on the worker, use the worker's address, 192.168.100.150, instead.

Note: To uninstall AppCenter, use the following command:


30. Now we will visualize the nPath output table bank_npath from the use case.
Step 1: Click Build an App.
Step 2: Enter the information shown in the screenshot.

Step 3: Click Logic and then Generate Visualization Code, as shown in the screenshot. Then save the app.


Step 4: When you save it, AppCenter asks for the database configuration. Click the link marked in the screenshot.

Step 5: Because no database is configured yet, you have to configure one the first time. Click Create a new database connection.

Step 6: Fill in the connection details as shown in the screenshot, test the connection, and save it.


Step 7: Under database connections, select beehive and save.

Step 8: Click the Bigdata app you just created and run it, typing any name when prompted.

Step 9: See the visualization by clicking on the title.


The path visualization above shows the 40 most traveled web paths in the dataset. The width of each link represents how many sessions followed that sequence (the cnt values in bank_npath). This analysis can help the website manager improve the design of the website for a better customer experience.
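One way a path diagram like this can derive link widths from bank_npath is to split each path into consecutive page-to-page transitions and add up cnt per transition. This guide does not describe how AppCenter does this internally, so the minimal sketch below is a hypothetical illustration. It assumes paths look like "[Home, FAQ, Account Summary]" and that page names contain no commas.

```python
from collections import Counter


def path_links(path_counts):
    """Turn (path, cnt) rows from bank_npath into weighted flow links.

    Each link is keyed by (step, from_page, to_page), so the same page at
    different positions stays a separate node, as in many Sankey-style path
    diagrams. A link's weight is the sum of cnt over all paths that make
    that transition at that step.
    """
    links = Counter()
    for path, cnt in path_counts:
        pages = [p.strip() for p in path.strip("[]").split(",")]
        for step, (src, dst) in enumerate(zip(pages, pages[1:]), start=1):
            links[(step, src, dst)] += cnt
    return dict(links)


if __name__ == "__main__":
    # Hypothetical bank_npath rows.
    rows = [("[Home, FAQ, Account Summary]", 2), ("[Home, FAQ]", 1)]
    for link, weight in path_links(rows).items():
        print(link, weight)
```

In the example, the Home to FAQ link is wider (weight 3) than the FAQ to Account Summary link (weight 2), because two distinct paths share the first transition.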

Conclusions

This document explained the step-by-step process to get started with Teradata Aster, Teradata Studio and AppCenter. We presented a use case on a bank web clicks dataset. Similarly, you can upload your own dataset and perform Big Data analytics using Teradata Aster. The steps to be followed are:

turn on the Aster queen and worker VMs;
activate the cluster using the Aster Management Console (AMC) at 192.168.100.100;
move the dataset to the queen using any file-transfer tool (here we used WinSCP);
access the database using act (Aster Command Tool) in PuTTY and create a table schema;
load the records into the table using ncluster_loader;
use PuTTY or Teradata Studio to access the database;
run SQL-MR queries in Teradata Studio or PuTTY;
create visualizations using AppCenter.
