Data is an asset. As the volume of data collected through a variety of channels grows, we need a segmented approach that gives you a real-time view of business performance so you can make intelligent business decisions. Data integration is a strategic function of advanced analytics: it transforms data between operational, transactional, and analytical target systems, creating multi-dimensional views that align with business objectives.
We at PRONIX provide IBM InfoSphere DataStage, an ETL (extract, transform, load) tool that offers a graphical framework for designing and running jobs that transform and cleanse your data.
Its main components are:
a. Administrator: used for administration tasks such as setting up DataStage users, setting purging criteria, and creating and moving projects.
b. Manager: the main interface to the DataStage Repository, which stores and manages reusable metadata. Through the DataStage Manager, you can view and edit the contents of the Repository.
c. Designer: a design interface for creating DataStage applications that specify the data source, the required transformations, and the destination of the data.
d. Director: used to validate, schedule, execute, and monitor DataStage server jobs and parallel jobs.
The DataStage architecture is divided into Shared Services and Runtime Architecture:
1. Shared Services: the shared services are:
a. Unified parallel processing: IBM's parallel engine uses the multi-processor capabilities of the hardware platform to handle data processing needs as diverse as analyzing large databases.
b. Unified metadata: IBM InfoSphere Information Server is built on a unified metadata infrastructure, which enables a shared understanding between the technical and business domains.
i. Dynamic metadata includes design-time information.
ii. Operational metadata includes performance monitoring data, audit logs, and data-profiling samples.
iii. A common metadata repository provides persistent storage and lets you navigate, query, and update metadata, which reduces development time.
c. Common connectivity provides data browsing and sampling, runtime dynamic metadata access, error handling, and high-performance runtime data access, which together make it possible to solve large-scale business problems.
d. Common services are built on a set of shared services that centralize core tasks: administrative tasks such as security, user administration, logging, and reporting, as well as design, execution, and metadata services.
e. Unified user interface: a graphical interface and tool framework made up of the IBM InfoSphere Information Server console and the IBM InfoSphere Information Server Web console. It provides rich client interfaces for highly detailed development work and thin clients that run in web browsers for administration, both built around visual controls and user experience.
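To make item a (unified parallel processing) more concrete: DataStage parallel jobs split data into partitions that are processed side by side, and hash partitioning on a key is one of the standard methods. The following is a minimal plain-Python sketch of that idea, not DataStage code. The function names, the `customer`/`amount` fields, and the use of threads to stand in for processing nodes are all hypothetical choices for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its key (hash partitioning)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 gives the same value on every run, unlike Python's salted str hash().
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def aggregate_partition(rows):
    """Work done independently on one partition: total amount per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals


def parallel_totals(rows, num_partitions=4):
    """Partition by customer, aggregate each partition concurrently, then merge."""
    parts = hash_partition(rows, "customer", num_partitions)
    result = {}
    # Threads stand in for processing nodes here (hypothetical simplification);
    # a real parallel engine runs partitions in separate processes or on separate nodes.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for partial in pool.map(aggregate_partition, parts):
            # Every key lands in exactly one partition, so merging is a plain union.
            result.update(partial)
    return result
```

For example, `parallel_totals([{"customer": "A", "amount": 10}, {"customer": "B", "amount": 5}, {"customer": "A", "amount": 7}])` yields totals of 17 for A and 5 for B. Partitioning on the aggregation key is the design choice that matters: it keeps all rows for one customer together, so no partition needs data from another.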
2. Runtime Architecture: the runtime architecture activities are:
a. Orchestrate Shell Script (OSH) defines the execution flow for extracting, cleansing, transforming, integrating, and loading data into target files in IBM InfoSphere DataStage, using the Information Server engine.
b. Application programming interfaces (APIs) support a variety of interface styles, including standard request-reply, service-oriented, event-driven, and scheduled task invocation.
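The extract, cleanse, transform, and load flow described in item a can be illustrated with a minimal plain-Python sketch. It is not OSH and does not use any DataStage API; the sample CSV, the `sales` table, and the function names are hypothetical, and the "integrating" step (for example, joining several sources) is omitted to keep it short. An in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data: one row has no id, one has a blank amount.
RAW = """id,name,country,amount
1, Alice ,us,10.50
2,Bob,DE,
, Carol ,fr,3
3,Dan ,fr,7.25
"""


def extract(text):
    """Extract: read source records from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleanse: trim whitespace and reject rows that lack a key."""
    cleaned = []
    for row in rows:
        row = {k: (v or "").strip() for k, v in row.items()}
        if not row["id"]:
            continue  # a real job might route rejects to a reject link instead
        cleaned.append(row)
    return cleaned


def transform(rows):
    """Transform: convert types and standardize codes."""
    return [
        (int(r["id"]), r["name"], r["country"].upper(), float(r["amount"] or 0))
        for r in rows
    ]


def load(records, conn):
    """Load: write the records to the target table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return len(records)


def run_job(text, conn):
    """Run the stages in order, as a job's execution flow would."""
    return load(transform(cleanse(extract(text))), conn)
```

Running `run_job(RAW, sqlite3.connect(":memory:"))` loads three rows; the row without an id is rejected during cleansing. Keeping each stage a separate function mirrors how a graphical job chains stages, so any one step can be changed without touching the others.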
DataStage includes components and stages that enable integration between InfoSphere Information Server and Apache Hadoop. The Oozie Workflow Activity stage enables integration between Oozie, which manages Hadoop jobs, and InfoSphere DataStage.