Sunday, July 29, 2012

Publishing Planning Application modifications using ODI (Just an idea)

Planning dimensions get modified constantly during the development phase, and when multiple developers are working on the same Planning cube, nobody knows who is adding or deleting what. So one fine day you open the Financial Report you had created, only to find that it doesn't work because a member that existed before is no longer available.
You can use the HSP_AUDIT_RECORDS table of your Planning application to find out who did it!
Setting audit trails in Hyperion Planning is an age-old way of doing this. Read on here  http://docs.oracle.com/cd/E17236_01/epm.1112/hp_admin_11122/frameset.htm?launch.html
(search for HSP_AUDIT_RECORDS)

What's difficult, however, is presenting this information to users who are not given details of the underlying database schema. This is where ODI comes in: you can print the records to a flat file, journalize the table to track additions, and fire off emails containing this information the moment someone modifies the Planning app.
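Here is a minimal sketch of that flat-file idea in Python, using an in-memory SQLite table to stand in for the Planning schema. The column names below are simplified placeholders, not the exact HSP_AUDIT_RECORDS layout, and the rows are invented for illustration:

```python
import csv
import sqlite3

# In-memory stand-in for the Planning application's relational schema; the
# column names below are simplified, not the exact HSP_AUDIT_RECORDS layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HSP_AUDIT_RECORDS "
             "(user_name TEXT, type TEXT, action TEXT, time_posted TEXT)")
conn.executemany(
    "INSERT INTO HSP_AUDIT_RECORDS VALUES (?, ?, ?, ?)",
    [("dev1", "Dimension Administration", "Delete", "2012-07-28 10:15"),
     ("dev2", "Dimension Administration", "Add", "2012-07-28 11:02")])

# Dump the audit trail to a flat file, the way an ODI interface or
# procedure could before journalizing or mailing it out
with open("audit_trail.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["user", "type", "action", "time"])
    writer.writerows(conn.execute(
        "SELECT user_name, type, action, time_posted "
        "FROM HSP_AUDIT_RECORDS ORDER BY time_posted"))
```

In a real setup the SELECT would run against the Planning application's actual schema, and ODI would take care of the file output and the mail step.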

That is just one of the many things you can do with ODI and the Planning tables. I will be posting more such things as I get time.

Monday, July 23, 2012

Weblogic and HTTP servers for beginners

One thing that has always been a source of confusion for beginners is the Weblogic server. I will come to that in a moment, but first, the purpose of this post: it's about the new feature of ODI 11g that can use the Weblogic Server for running the ODI agent. A comprehensive doc for setting this up is available here
http://www.oracle.com/webfolder/technetwork/tutorials/obe/fmw/odi/odi_11g/setup_jee_agent/setup_jee_agent.htm
I will post my experiences of setting up the ODI agent later since I am still not through it.

So what is an Application Server and an HTTP server?
For beginners: suppose you want to buy a camera on ebay. You open the website's page on your machine (the client). Where does the information you see on your screen, about your desired camera, come from?
How does the site calculate the monthly installments for you? Does it use your machine's calculator app to do that? Definitely not.

There is an application server (usually a big machine residing far away from your home) which accepts your (client) requests, processes them, and sends the desired results over to your machine. To standardize/streamline/secure this communication, there is a middle-man called the HTTP server (HS). So your request is given to the HTTP server, the HTTP server forwards it to the appropriate application server (AS), and the AS, after processing the request, replies to the HS with a 'response'. This response appears on your (client) machine as the end result: in our case, the image of your desired camera and the calculated installment!
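If you want to see that request/response loop in miniature, here is a sketch using Python's built-in http.server. One process plays the server that computes the installment while the client only sends a request and renders the answer; in real deployments the HTTP server and application server are of course separate components, so this collapses both roles into one toy program:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

class InstallmentHandler(BaseHTTPRequestHandler):
    """Server side: the installment is computed here, not on the client."""

    def do_GET(self):
        qs = parse_qs(urlparse(self.path).query)
        price = float(qs["price"][0])
        months = int(qs["months"][0])
        body = f"{price / months:.2f}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), InstallmentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: only sends a request and displays the response
url = f"http://127.0.0.1:{server.server_port}/?price=600&months=12"
monthly = urllib.request.urlopen(url).read().decode()
print(monthly)  # the server machine, not your calculator app, did the math
server.shutdown()
```

The client never runs the arithmetic; it just gets back the finished answer, which is the whole point of the client/server split described above.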

Though this is a very simple example, over time the roles and responsibilities of application servers and HTTP servers have grown; the core functionality, however, remains the same. For more details, you can refer to the web :)

Now tell me more about Weblogic ...
It's basically an application, usually running on big machines, which does the server-side processing. So the Weblogic Server is an application server. What is my application? It can be anything that you want to host on this machine: a simple webpage or a complex site like ebay or facebook.
In our case the application is going to be the Oracle Data Integrator component called the 'Agent'.

Weblogic was originally called BEA Weblogic, but when Oracle acquired BEA it became Oracle Weblogic.
Weblogic's direct competitors are other application servers such as IBM WebSphere and JBoss; web servers like Apache HTTP Server and Microsoft IIS play the HTTP-server role described above rather than competing with it.
Oracle Weblogic Server is usually abbreviated to WLS in AS jargon.

And what is Oracle HTTP server ?
Like I said above, it is the middle-man which directs client requests to the Weblogic AS and responds to clients with the response from WLS. It is lovingly called OHS.

Things get really messy when you dig deeper into WLS and OHS, but once you understand the way they work, it's quite easy to conceptualize things.
To really get the hang of these two, you should try installing and configuring the Oracle EPM system all by yourself, face the difficulties of starting and stopping services, and read the log files and Oracle forums. Some things are better learnt the hard way :)

How do I install Agent using Weblogic ?
Coming back to the purpose of this post: the 'newest' and most irritating way of installing an agent is through Weblogic. It's also called the Java EE agent. For installing this agent, you need to install all the features that come with the setup (check Java EE) and make sure that the Weblogic server is installed on the same machine. The path where you install ODI must be under the same Middleware home as the Weblogic installation.
Find out how it is done here
http://www.oracle.com/webfolder/technetwork/tutorials/obe/fmw/odi/odi_11g/setup_jee_agent/setup_jee_agent.htm
and if you are lucky, you might get the agent installed in the first shot. So best of luck !!!

Friday, July 20, 2012

How to add a new JDBC driver in ODI 11g

What is a JDBC driver anyway?
When you want to make a connection between two machines, where one is acting as a database and the other may or may not be a database but just a simple computer, it is done by specifying connection URLs. A driver is just code that enables connections to different databases.
The language in which the application's code is written decides which database connection driver you will be using. Read more on JDBC drivers here http://en.wikipedia.org/wiki/JDBC_driver
Now, coming to the JDBC drivers, look at the example of a driver below
com.microsoft.sqlserver.jdbc.SQLServerDriver

The first part (com.microsoft.sqlserver) tells us which technology you will be using for the connectivity; here it is Microsoft SQL Server. It means there is some code on the db side which helps bridge the connection with the application.
The next part (jdbc) tells us what technology is used to establish the connection on the application side; jdbc means Java Database Connectivity. The last part, SQLServerDriver, gives us the exact driver class name.

Another example of SQL driver is
weblogic.jdbc.sqlserver.SQLServerDriver
Here it states that the connectivity of MS SQL Server to the application (ODI, which uses Java) will be established through the Weblogic Application Server (sometimes called the middle layer).
[It might sound geeky for beginners, so don't ponder much on this]
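Just to make the naming convention concrete, here is a small illustrative snippet that splits such class names apart. This is only string inspection for the sake of the explanation; Java itself treats the whole thing as one fully qualified class name:

```python
def describe_driver(class_name):
    """Split a JDBC driver class name into the parts discussed above.

    Purely illustrative string inspection; Java treats the whole thing
    as a single fully qualified class name.
    """
    parts = class_name.split(".")
    return {
        "vendor": ".".join(parts[:parts.index("jdbc")]),  # who supplies the bridge
        "api": "jdbc",                                    # Java Database Connectivity
        "driver_class": parts[-1],                        # the exact driver class
    }

print(describe_driver("com.microsoft.sqlserver.jdbc.SQLServerDriver"))
print(describe_driver("weblogic.jdbc.sqlserver.SQLServerDriver"))
```

The first call reports Microsoft as the vendor prefix; the second reports weblogic, matching the middle-layer example above.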

Since ODI is coded in Java, it uses JDBC to connect with other databases. These driver files (jars) are stored in a particular location where ODI looks them up.
So tomorrow, if you come up with your own database and want to connect it to ODI, where would you put the driver?

The location is
C:\Users\username\AppData\Roaming\odi\oracledi\userlib

And one more thing: how does ODI know that there are additional drivers (code/jars) you need to refer to?
In the same path above there is a file named additional_path.txt, where you specify the full path and the driver file name. See the contents I put in after adding a sqljdbc4.jar driver file to the folder above.
(Though sometimes ODI is clever enough to pick up the drivers without updating this file.)

; Additional paths file

; You can add here paths to additional libraries

; Examples:

; C:\ java\libs\myjar.jar

; C:\ java\libs\myzip.zip

; C:\java\libs\*.jar will add all jars contained in the C:\java\libs\ directory

; C:\java\libs\**\*.jar will add all jars contained in the C:\java\libs\ directory and subdirectories


C:\Users\username\AppData\Roaming\odi\oracledi\userlib\*.jar


And don't forget to restart ODI (close ODI Studio and open it again) after you add your driver.
Note: For Experts
This location is not just for driver files; you can also put some of your own Java classes here if you want. You can put groovy scripts and external jars that might be used if you are going to customize anything in ODI.
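As a rough illustration of the file's documented semantics (';' starts a comment, plain lines are literal paths, '*' lines are glob patterns), here is a sketch in Python. It mimics how such lines could be resolved against a userlib-style folder; it is not ODI's actual parser:

```python
import tempfile
from pathlib import Path

def resolve_additional_paths(lines, base):
    """Resolve additional_path.txt-style lines against a base directory.

    ';' starts a comment, lines containing '*' are glob patterns, anything
    else is taken as a literal path. A sketch of the documented semantics,
    not ODI's actual parser.
    """
    jars = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # comments and blank lines are ignored
        if "*" in line:
            jars.extend(sorted(base.glob(line)))  # e.g. *.jar or **/*.jar
        else:
            jars.append(base / line)
    return jars

# Demo: a throwaway stand-in for the userlib folder
userlib = Path(tempfile.mkdtemp())
(userlib / "sqljdbc4.jar").touch()
(userlib / "extra").mkdir()
(userlib / "extra" / "myjar.jar").touch()

content = ["; Additional paths file", "**/*.jar"]
jars = resolve_additional_paths(content, userlib)
print(sorted(j.name for j in jars))
```

The '**/*.jar' pattern picks up jars in the folder and its subfolders, which is exactly what the commented examples in the real file describe.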

Monday, July 16, 2012

ETL and E-LT ? Please Help !!

If you are going to face an interview for a job related to Data Integration, get ready for this question. People are going to ask it even if they themselves don't know the answer. :D
Data integration primarily involves pulling data, doing some cleanup activities and then putting it somewhere.
In DI jargon, the three steps mentioned above are Extraction, Transformation and Loading. So that's ETL for you. You can read a lot about this on Wikipedia.

The question to be asked is: where do all the clean-up activities happen, or where does this 'Transformation' take place?
Once you have pulled the data from the source, there has to be a temporary area to stage the information, right?
As an example, you can't change the format of the date column in your source directly, because then you won't be left with the original data. It always makes sense to do all the transformations on a 'copy' of the source, and fire all the SQL queries on that 'copy', before putting it in the target.
Now, where this 'copy' or temporary 'staging' area is created defines the approach: ETL or E-LT.

The Neanderthal approach to this staging area was for each DI tool to have its own proprietary staging area, which would have huge storage capacity and high processing speed and could handle millions of rows of data efficiently. But the catch was: you had to learn the syntax (query) for pulling data from the source and putting it in the staging area, learn the syntax for transforming the data in staging, and then learn the syntax for putting the transformed data into your target. And with so many database technologies emerging, each with minor tweaks in syntax, it became difficult to learn everything together. Besides that, these proprietary staging areas were costly and didn't quite live up to expectations as far as bulk loading capabilities were concerned.

So the geeks at Sunopsis thought, 'Let's get rid of this staging area and use something else that is cost-effective and efficient.' Then they thought, 'What if we use the source database/schema or the target database/schema itself for the transformation?' For example: I have a table called EMPLOYEE, and I create a temporary table called T$EMPLOYEE in the same schema, on which I do the transformation before loading the data into the target database. I can save millions of dollars by not buying a proprietary staging area.
Using this idea, they created a new data integrator called Sunopsis Data Integrator and marketed it with the word 'E-LT', claiming that it didn't need a proprietary staging schema (I will come to that later). Sunopsis DI allowed users to select whether the staging area should be the source db or the target db. Another major factor that separated this tool from others was its declarative approach using a graphical UI (more on that later). This was a major hit, and Oracle saw it as an opportunity for growth. The result: Sunopsis was bought by Oracle and soon became Oracle Data Integrator.
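The T$EMPLOYEE idea can be sketched with an in-memory SQLite database standing in for the target schema. The table names follow the example above and are purely illustrative (ODI's real work tables use its C$/I$ naming, and the SQL is generated by Knowledge Modules):

```python
import sqlite3

# One SQLite database stands in for the target schema: in E-LT, the work
# table lives right next to the target table instead of on a separate server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE EMPLOYEE (name TEXT, hire_date TEXT)")      # source table
db.execute("CREATE TABLE DIM_EMPLOYEE (name TEXT, hire_date TEXT)")  # target table
db.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
               [("ANNE", "29/07/2012"), ("RAJ", "16/07/2012")])

# Extract + Load into a temporary work table (named after the example above)
db.execute('CREATE TABLE "T$EMPLOYEE" AS SELECT * FROM EMPLOYEE')

# Transform in place with plain SQL: dd/mm/yyyy -> yyyy-mm-dd
db.execute('''UPDATE "T$EMPLOYEE"
              SET hire_date = substr(hire_date, 7, 4) || '-' ||
                              substr(hire_date, 4, 2) || '-' ||
                              substr(hire_date, 1, 2)''')

db.execute('INSERT INTO DIM_EMPLOYEE SELECT * FROM "T$EMPLOYEE"')
db.execute('DROP TABLE "T$EMPLOYEE"')  # the work table is cleaned up afterwards

print(db.execute("SELECT * FROM DIM_EMPLOYEE ORDER BY name").fetchall())
```

The transformation runs as SQL inside the database that already holds the data, which is the whole E-LT pitch: no separate proprietary engine in the middle.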

Can the Staging Area be completely eliminated ?
The answer is no. Though Oracle/Sunopsis Data Integrator claims to use the E-LT approach, there are some bitter times when you have to use a separate staging area: when neither your source nor your target has the ability to perform SQL activities. Example: if you want to read data from a flat (text) file and put it into another text file, you will have to use a staging schema which can do SQL. The Sunopsis Memory Engine, now the In-Memory Engine, is their own staging technology and can be used in such cases, along with other technologies. This is basically your JVM in action, and it is known that this staging area cannot handle large volumes of data. Personally, this is the last staging-area technology I would use for my integration projects.
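That flat-file-to-flat-file case can be sketched like this, with SQLite's ':memory:' database standing in for the Memory Engine. This is only an analogy for where the SQL runs, not what ODI actually executes:

```python
import csv
import io
import sqlite3

# Neither a text-file source nor a text-file target can run SQL, so an
# in-memory engine stages the rows; sqlite3 ':memory:' is only a stand-in
# for ODI's Memory Engine here.
source = io.StringIO("name,salary\nanne,1000\nraj,2000\n")

stage = sqlite3.connect(":memory:")
stage.execute("CREATE TABLE STG (name TEXT, salary REAL)")
stage.executemany("INSERT INTO STG VALUES (?, ?)",
                  [(r["name"], float(r["salary"]))
                   for r in csv.DictReader(source)])

# The 'T' of E-LT happens inside the staging engine, with plain SQL
rows = stage.execute(
    "SELECT upper(name), salary + 100 FROM STG ORDER BY name").fetchall()

target = io.StringIO()           # stand-in for the output flat file
csv.writer(target).writerows(rows)
print(rows)
```

Everything lives in the JVM-like memory of one process here, which also illustrates why this approach struggles once the row counts get large.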

Wednesday, July 11, 2012

I need a raise - Upgrading from ODI version 10 to 11g

This is one question almost everyone who has worked on version 10 thinks they have an answer to, until they try doing it and find out it doesn't work.
Migrating ODI from version 10 to 11 is not as simple as zipping the repositories using the 'Export' command and importing them into a newer repository. Even exporting individual objects in XML format is not going to work.

Basically, you need to modify the tables in the underlying schemas of the Master and Work repositories so that they are version 11 'compliant'. And how do you do that?
ua.bat, or the Upgrade Assistant, will help you do this. To know how it works, odiexperts have put it nicely.
Here is the link
http://odiexperts.com/11g-oracle-data-integrator-upgrading-from-10g-part-411g/
And you will find  ua.bat file here
C:\oracle\product\11.1.1\Oracle_ODI_1\bin (ODI_HOME/bin)
One point to note here is that the agents created in version 10 might not work right after you upgrade the repositories and import them; there are some things you will need to do differently.
Check the other posts related to the ODI agent to find out what can be done.
I would say this post is rather obsolete, considering nobody uses ODI 10 now. Those who start with version 11g might want to ignore this topic.

Best of luck

Welcoming the new Agent with OPMN

Having seen how to create an agent service on Windows, it must be clear why Oracle has done away with the idea of a Windows service using wrapper classes: the most obvious reason is to coax developers into using OPMN, which stands for Oracle Process Manager and Notification server. You can find a lot of documentation on this; one link is here http://docs.oracle.com/cd/B13597_05/core.904/b12057/intro.htm.
But to put it in layman's language, it is Oracle's very own task/process manager, just like Windows Services (services.msc).
The very first advantage of this technique is that it eliminates the need for expertise in creating services on different kinds of OS (Windows, Linux etc.), like the one we saw in the previous post.
Besides that, OPMN is robust and reliable.

To create an agent service using OPMN, you can refer to Oracle's very own yet so incomplete documentation here
http://www.oracle.com/webfolder/technetwork/tutorials/obe/fmw/odi/odi_11g/odi_opmn/odi_opmn.htm
or you can refer to John Goodwin's blog where he has put it in a very fine way
http://john-goodwin.blogspot.in/2010/12/managing-odi-11g-standalone-agents.html

What can go wrong while configuring ?
If you follow the steps mentioned in John's blog exactly as written, everything is bound to work properly. Still, I faced some issues starting it in the first shot; I am listing a few ...

- Please don't miss changing \ to / in the agentcreate.properties file; I bet most of us will miss it the first time

- After getting those Java errors while adding the agent, I checked whether the environment variables were being set when running the script odi_opmn_addagent.bat. For some reason they weren't getting updated, so I had to add them manually using Windows->Control Panel->System and Security->System->Advanced system settings->Environment variables. However, I feel this problem is very specific to my machine.

- The agent service will install, start and show its status as alive with the 'opmnctl status' command, but when you test it in Studio, the test fails. To debug, or to find the real status of the agent, you need to check the log files in the location
C:\oracle\Middleware\user_projects\epmsystem1\diagnostics\logs\OPMN\opmn ; my log file's name was
console~ODIAGENT1~odiagent~1.log.
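When 'opmnctl status' insists the agent is alive, a few lines of Python can fish the real errors out of such a log. The log content below is mocked up from the messages quoted later in this post:

```python
from pathlib import Path

# Mocked-up OPMN log, named like the real one; the two messages are
# copied from the log lines quoted further down in this post
log = Path("console~ODIAGENT1~odiagent~1.log")
log.write_text(
    "INFO: Scheduler started for work repository WORKREP99 on Agent ODIAGENT1.\n"
    "SEVERE: Agent ODIAGENT1 encountered an error: ODI-1405: "
    "Agent ODIAGENT1 start failure: the agent is not defined in the "
    "topology for master repository.\n"
)

# Keep only the lines that matter when the service claims to be alive
errors = [line for line in log.read_text().splitlines()
          if line.startswith("SEVERE") or "ODI-" in line]
for line in errors:
    print(line)
```

Scanning for the SEVERE level and the ODI- error codes gets you to the failure much faster than reading the whole log.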

If you are trying this out for the first time, it is advisable to have only one Work Repository associated with a Master. I had three repositories (WORK, WORKREP99 and WORK2), and it added to the confusion of specifying the name everywhere.

I got this error when checking the logs (the test failed, but opmn said the service was alive !!)

SEVERE: Agent ODIAGENT1 encountered an error: ODI-1405: Agent ODIAGENT1 start failure: the agent is not defined in the topology for master repository.
ODI-1405: Agent ODIAGENT1 start failure: the agent is not defined in the topology for master repository.
I ran odiparams.bat again and tried my luck. This time it started, but when I checked the log, it had connected to another work repository named WORKREP99, whereas I had specified WORK in odiparams.bat and all the other files.

INFO: Scheduler started for work repository WORKREP99 on Agent ODIAGENT1.
Obviously, logging into ODI using this WORKREP99 worked and the test was successful.
I will be digging deeper to find out what's going wrong and why it is picking up another work repository.

Tuesday, July 10, 2012

Bringing back the good old Agent in ODI 11g

Most of those who have worked with ODI 10.x will know how good the agent was. Creating, starting, stopping and converting it into a service was very easy in the older versions.
With ODI using Weblogic to create agent services in the new version, things have become tricky.

If you skipped the master repository creation step while installing, the agent name you gave during the ODI installation never shows up automatically in the 'Physical Architecture' tab in Topology. You can see a batch file named 'agent_AGENT_NAME.bat' created in the 'C:\oracle\product\11.1.1\Oracle_ODI_1\oracledi\agent\bin' folder, but running this batch file directly will give you nothing but an error. You have to modify and run the odiparams.bat file before running the agent file.
Modifying the odiparams.bat file: the file is located in the path mentioned above. Search for the lines below


REM #
REM # Repository Connection Information
REM #
set ODI_MASTER_DRIVER=oracle.jdbc.OracleDriver
set ODI_MASTER_URL=jdbc:oracle:thin:@localhost:1521:XE
set ODI_MASTER_USER=MASTER
set ODI_MASTER_ENCODED_PASS=a7yXZz0C1gPf2GxKzUNPxEJMp


REM #
REM # User credentials for agent startup program
REM #
set ODI_SUPERVISOR=SUPERVISOR
set ODI_SUPERVISOR_ENCODED_PASS=a7yabVOzj7C5oY503v.Ei6xKy
REM #
REM # User credentials for ODI tools
REM #
set ODI_USER=%ODI_SUPERVISOR%
set ODI_ENCODED_PASS=%ODI_SUPERVISOR_ENCODED_PASS%


REM #
REM # Work Repository Name
REM #
set ODI_SECU_WORK_REP=WORK
The values above (the master repository driver, URL, user and encoded password; the SUPERVISOR credentials; and the work repository name) need to be changed as per your Master Rep and Work Rep details (just like the good old days)

This time you won't find the 'agent encode' command working; use the encode.bat file to encode the passwords, e.g. encode The_password
Add the agent using Topology manager and relate it to a context.
This will start the agent successfully. But the moment you close the command prompt window or try to exit, it will terminate the batch script and your agent will fail.

To create a service of ODI agent in the newer versions refer to the following link

https://blogs.oracle.com/dataintegration/entry/running_odi_11gr1_standalone_a

I tried this on my laptop and it worked until I shut down my machine. Since then, the service has not started once, though I feel redoing the steps mentioned above would solve the problem. I won't be doing that, since it's not necessary when we can use OPMN to create agent services. Read the next blog post to find out ...

Essbase Dataload using ODI - will my dataload ever finish ??

In older versions of ODI (10.x), while loading data into Essbase, you might find that it takes an eternity to load, however big or small the volume might be. Oracle claims this issue has been resolved in version 11, but I am not sure and will give it a try when I get an opportunity.

Row-by-row insertion: ODI's older version uses an outdated KM for loading data into Essbase. It loads data in bulk from the staging area into the target until it encounters an erroneous record (you are trying to insert a record which doesn't have a matching member in Essbase, or the data is not numeric). After that, it takes each row, validates it, and inserts it into the cube one at a time. If you have selected to ignore errors in records while loading data, you might have to wait till doomsday for the dataload to finish.
The best way out is to not ignore erroneous records and to cancel the dataload on the first error record itself; moreover, it doesn't make sense to ignore records with errors.
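The behaviour described above can be modelled with a toy example, with SQLite standing in for the Essbase target. This is not the actual KM code, only an illustration of why "ignore errors" turns one failed bulk insert into a long row-by-row crawl:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE SALES (member TEXT NOT NULL, amount REAL NOT NULL)")

rows = [("Jan", 100.0), ("Feb", None), ("Mar", 300.0)]  # Feb is the bad record

def load(rows, ignore_errors):
    """Bulk insert first; on the first bad record either stop (fast) or
    fall back to validating and inserting row by row (slow)."""
    try:
        with db:
            db.executemany("INSERT INTO SALES VALUES (?, ?)", rows)
        return ("bulk", len(rows))
    except sqlite3.IntegrityError:
        if not ignore_errors:
            raise                          # fail fast on the first error record
        loaded = 0
        for row in rows:                   # the slow path the post warns about
            try:
                with db:
                    db.execute("INSERT INTO SALES VALUES (?, ?)", row)
                loaded += 1
            except sqlite3.IntegrityError:
                pass                       # skip the erroneous record
        return ("row-by-row", loaded)

print(load(rows, ignore_errors=True))
```

With ignore_errors=True the load degrades to one statement per row; with ignore_errors=False the first bad record aborts the load immediately, which is the recommendation above.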

Though it remains to be validated, I personally feel that dataload speeds in ODI version 11 have improved compared to version 10. The comparison seems fair because we had migrated from ODI 10 to 11 while keeping the source, staging and target untouched.

Why is it necessary to start all processes as Administrator ?

If you have tried starting the EPM services by double-clicking the batch scripts in Windows (start.bat, startEssbase.bat etc.), you probably got no error: the black screen just disappears as if nothing happened.
Running the script from a command prompt gives you the same result; the batch job exits peacefully, but when you actually check whether the services have started, you will be surprised to see that nothing has worked as planned.
The best approach is to start the services with 'Run as administrator', or to run the command prompt in administrator mode.
The installation also needs to be done by an administrator ('Run as administrator') to get the best results.

What about Linux ??
Oracle says that you need to create a new user, and it is not recommended to perform the installation or start any services as the super user. So never sudo ...

I have come across this issue once: the services were being started by the super user, but the user who had installed them was different, and they failed to start.
Going through the logs with the help of Oracle revealed that some file attributes (ownership) are changed on the fly every time the services are stopped/started, which results in some files not being openable by the service trying to open them.
Solution: if the services were started by some other user and you now need to restore file access, change the ownership of the 'Oracle' folder recursively using the 'chown' command.


Friday, July 6, 2012

Data Integration ?

Data Integration, simply put, is a way of pulling data from somewhere and putting it somewhere else. I must warn you that it's easier said than done.

The most common uses of Data Integration are
1. Building a central data repository: suppose you have lots of databases spread across various geographies and different platforms, and the business needs a central database which will contain all the data of all geographies in a uniform way.
2. Populating databases used by decision support systems from transaction systems: you need to build a data warehouse which will contain information brought in from transaction systems, which is then used by BI tools for reporting and analysis.
3. Metadata management: updating columns, adding new members in a multidimensional database etc. can also come under data integration, although there is some disagreement on this.

There are many other cases where data integration tools can be used; the ones listed above are those you will face most of the time.

Practical Issues of Data Integration
You can find various links regarding this on the net; I will be putting them here again in my own words.

Format of data: the date column in one table might be formatted differently from the date column in another. So while integrating both tables (union), we need to make sure the date format is uniform in the target database.
Formatting is perhaps the most common issue faced in integration. It can be as simple as a date format or as complex as particular rules that have to be written in SQL.
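For instance, a tiny normalization routine along these lines. The format list is illustrative for whatever feeds you integrate, and note that ambiguous values such as 01/02/2012 depend on the order in which you try the formats:

```python
from datetime import datetime

def normalize_date(value):
    """Try the source formats we expect and emit one uniform target format.

    The format list is illustrative; order matters for ambiguous values.
    """
    for fmt in ("%d/%m/%Y", "%m-%d-%Y", "%Y.%m.%d"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # not this source's format, try the next one
    raise ValueError(f"unrecognized date format: {value!r}")

print([normalize_date(v) for v in ("29/07/2012", "07-29-2012", "2012.07.29")])
```

Three differently formatted source values all land in one target format, which is exactly the uniformity the union needs.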

Mapping: a source data value, say INDIA, needs to be stored as REGION10 in the target; source value US = REGION4 in the target, AUSTRALIA = REGION3, and likewise. This involves lookup tables which map a source value to a particular target value.
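A lookup of this kind is just a keyed table. A minimal sketch, using the hypothetical values from the example above (in practice this would be a lookup table joined in SQL rather than an in-code dictionary):

```python
# Hypothetical lookup table; in practice this lives in the database
# and gets joined in SQL rather than hard-coded
REGION_LOOKUP = {"INDIA": "REGION10", "US": "REGION4", "AUSTRALIA": "REGION3"}

def map_region(source_value):
    """Map a source value to its target region, refusing unmapped values."""
    try:
        return REGION_LOOKUP[source_value]
    except KeyError:
        # an unmapped value should be surfaced, not silently loaded
        raise ValueError(f"no target mapping for source value {source_value!r}")

print([map_region(v) for v in ("INDIA", "US", "AUSTRALIA")])
```

Raising on unmapped values instead of passing them through keeps bad source data from quietly landing in the target.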

Data integration time, size and method: the time taken for the integration, the method of loading (row by row or bulk loads) and the size of the data to be loaded decide the effectiveness of your integration tool.

Scheduling: the data integration process needs to be scheduled properly so that there are no inconsistencies. E.g., if you start a data integration process while the source transaction systems are still being updated, the target may or may not have the correct data.

Consistency: the data needs to be consistent across source and target at all times. Changed data capture is a very important aspect of the field of data integration.
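A naive way to see what 'changed data' means is to diff two keyed snapshots. Real CDC tools (including ODI's journalizing) read database logs or journals instead of comparing snapshots; this sketch only illustrates the three kinds of change being captured:

```python
def diff_snapshots(old, new):
    """Naive changed-data capture by comparing two keyed snapshots.

    Real CDC reads database logs/journals; this only shows what
    'changed data' means: inserts, updates and deletes since old.
    """
    inserts = {k: v for k, v in new.items() if k not in old}
    deletes = {k: v for k, v in old.items() if k not in new}
    updates = {k: (old[k], v) for k, v in new.items()
               if k in old and old[k] != v}
    return inserts, updates, deletes

before = {1: "INDIA", 2: "US", 3: "AUSTRALIA"}
after = {1: "INDIA", 2: "USA", 4: "UK"}
print(diff_snapshots(before, after))
```

Only the insert, the update and the delete need to travel to the target; the unchanged row does not, which is why CDC keeps source and target consistent without full reloads.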