IBM InfoSphere DataStage Interview Questions - Set Q




IBM InfoSphere DataStage is an ETL tool used for data integration, transformation, and data warehousing. This set of interview questions covers Slowly Changing Dimensions, Change Data Capture, surrogate keys, the Hash File stage, and shared and local containers.





Question 01: What is Slowly Changing Dimension (SCD)?

Answer:
Slowly Changing Dimension (SCD) is a data warehousing technique for managing changes to dimension attributes over time. Depending on business requirements, the old value is either overwritten or preserved as history.


Question 02: What are the types of SCD?

Answer:

  • Type 1 → Overwrite
  • Type 2 → Maintain history
  • Type 3 → Partial history

Question 03: What is SCD Type 1?

Answer:
In Type 1, old data is overwritten with new data. No history is maintained.
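
The overwrite behaviour can be illustrated outside DataStage with a minimal Python sketch. The function name `apply_scd_type1` and the columns `customer_id` and `city` are hypothetical and chosen only for illustration; this is not how a DataStage job is written, only the logic it would implement.

```python
def apply_scd_type1(dimension, incoming, key="customer_id"):
    """Overwrite matching dimension rows in place; insert rows with new keys."""
    by_key = {row[key]: row for row in dimension}
    for new_row in incoming:
        existing = by_key.get(new_row[key])
        if existing is None:
            # Key not seen before: add it as a new dimension row.
            added = dict(new_row)
            dimension.append(added)
            by_key[new_row[key]] = added
        else:
            # Key already present: the old attribute values are lost.
            existing.update(new_row)
    return dimension


dim = [{"customer_id": 1, "city": "Pune"}]
apply_scd_type1(dim, [{"customer_id": 1, "city": "Mumbai"},
                      {"customer_id": 2, "city": "Delhi"}])
```

After the call, customer 1 shows only "Mumbai"; nothing in the table records that the city was once "Pune".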


Question 04: What is SCD Type 2?

Answer:
Type 2 maintains full history by inserting a new record for each change and closing out the previous one. Each record typically carries:

  • Effective date
  • End date
  • Active flag
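
A minimal Python sketch of the Type 2 logic, under stated assumptions: the column names (`sk`, `effective_date`, `end_date`, `active_flag`), the open-ended end date of 9999-12-31, and the function name `apply_scd_type2` are all hypothetical conventions picked for the example, not DataStage settings.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed convention for "no end date yet"


def apply_scd_type2(dimension, incoming, load_date,
                    key="customer_id", tracked=("city",)):
    """Expire the active row and insert a new version when a tracked column changes."""
    next_sk = max((r["sk"] for r in dimension), default=0) + 1
    current = {r[key]: r for r in dimension if r["active_flag"] == "Y"}
    for new_row in incoming:
        old = current.get(new_row[key])
        if old is not None and all(old[c] == new_row[c] for c in tracked):
            continue  # no change in tracked columns, keep the active row
        if old is not None:
            # Close out the previous version.
            old["end_date"] = load_date
            old["active_flag"] = "N"
        version = {
            "sk": next_sk,
            key: new_row[key],
            **{c: new_row[c] for c in tracked},
            "effective_date": load_date,
            "end_date": HIGH_DATE,
            "active_flag": "Y",
        }
        dimension.append(version)
        current[new_row[key]] = version
        next_sk += 1
    return dimension


dim = [{"sk": 1, "customer_id": 1, "city": "Pune",
        "effective_date": date(2023, 1, 1), "end_date": HIGH_DATE,
        "active_flag": "Y"}]
apply_scd_type2(dim, [{"customer_id": 1, "city": "Mumbai"}], date(2024, 6, 1))
```

The "Pune" row stays in the table with an end date and flag "N", and the "Mumbai" row becomes the active version with a new surrogate key.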

Question 05: What is SCD Type 3?

Answer:
Type 3 stores limited history by keeping the previous value in an additional column alongside the current value.
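
As a rough illustration, the sketch below keeps one previous value per attribute. The `previous_city` column and the function name `apply_scd_type3` are hypothetical names used only for this example.

```python
def apply_scd_type3(row, new_value, column="city"):
    """Shift the current value into a 'previous_' column, then store the new value."""
    previous_column = f"previous_{column}"
    if row[column] != new_value:
        row[previous_column] = row[column]
        row[column] = new_value
    return row


customer = {"customer_id": 1, "city": "Pune", "previous_city": None}
apply_scd_type3(customer, "Mumbai")
apply_scd_type3(customer, "Delhi")
```

After the second change the row holds "Delhi" and "Mumbai"; the original "Pune" is gone, which is why Type 3 is described as limited history.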


Question 06: How is SCD implemented in DataStage?

Answer:
Using:

  • Surrogate Key Generator
  • Lookup stage
  • Transformer logic
  • SCD stage (in some versions)

Question 07: What is Effective Date in SCD?

Answer:
The date when a record becomes active.


Question 08: What is Expiry Date?

Answer:
The date when a record becomes inactive.


Question 09: What is Current Flag?

Answer:
A column (commonly Y/N) that indicates whether a record is the currently active version.


Question 10: When to use SCD Type 2?

Answer:
When full historical tracking is required.



🔹 Change Data Capture (CDC)

Question 11: What is CDC?

Answer:
Change Data Capture (CDC) identifies and captures only the rows that were inserted, updated, or deleted since the last extraction, instead of reloading the full data set.


Question 12: Why use CDC?

Answer:

  • Improves performance
  • Reduces data load
  • Enables incremental processing

Question 13: Types of CDC?

Answer:

  • Timestamp-based
  • Trigger-based
  • Log-based

Question 14: What is Timestamp-based CDC?

Answer:
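Uses a last-updated timestamp column to extract rows changed since the previous run. Hard deletes are not detected this way, because a deleted row is no longer there to be read.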
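
A minimal sketch of the idea in Python, assuming each source row has a `last_updated` column and that the job stores the timestamp of its previous run (the "watermark"). The names `extract_changes` and `last_run` are hypothetical.

```python
from datetime import datetime


def extract_changes(rows, last_run):
    """Return rows modified after last_run, plus the watermark for the next run."""
    changed = [r for r in rows if r["last_updated"] > last_run]
    new_watermark = max((r["last_updated"] for r in changed), default=last_run)
    return changed, new_watermark


rows = [
    {"id": 1, "last_updated": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "last_updated": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "last_updated": datetime(2024, 1, 3, 9, 0)},
]
changed, watermark = extract_changes(rows, datetime(2024, 1, 1, 12, 0))
```

In a real job the same filter would usually be pushed into the source SQL as a WHERE clause on the timestamp column, with the watermark passed in as a job parameter.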


Question 15: What is Log-based CDC?

Answer:
Reads the database transaction logs to detect changes, without querying the source tables.


Question 16: How is CDC implemented in DataStage?

Answer:

  • Using CDC stages
  • Using SQL queries
  • Using comparison logic
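
The comparison approach can be sketched in Python as a comparison of a "before" snapshot with an "after" snapshot on a key column. This is only an illustration of the logic; the function name `classify_changes` and the sample columns are hypothetical.

```python
def classify_changes(before, after, key="id"):
    """Compare two snapshots and split the differences into inserts, updates, deletes."""
    old = {r[key]: r for r in before}
    new = {r[key]: r for r in after}
    inserts = [new[k] for k in new if k not in old]
    deletes = [old[k] for k in old if k not in new]
    updates = [new[k] for k in new if k in old and new[k] != old[k]]
    return inserts, updates, deletes


before = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": "C"}]
after = [{"id": 1, "name": "A"}, {"id": 2, "name": "B2"}, {"id": 4, "name": "D"}]
inserts, updates, deletes = classify_changes(before, after)
```

Unlike the timestamp approach, a full comparison also detects deletes, at the cost of reading both data sets.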

Question 17: What is Delta Load?

Answer:
Loading only the data that changed since the previous load.


Question 18: Difference between Full Load and Incremental Load?

Answer:

Full Load    | Incremental Load
-------------|-----------------
All data     | Changed data
Slow         | Fast

Question 19: What is Soft Delete in CDC?

Answer:
Marking records as deleted instead of removing them.
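
A minimal sketch, assuming the target table has `is_deleted` and `deleted_on` columns (both hypothetical names) and that the CDC feed supplies the keys deleted at the source:

```python
from datetime import date


def soft_delete(target, deleted_keys, deleted_on, key="id"):
    """Flag rows whose keys were deleted at the source instead of removing them."""
    for row in target:
        if row[key] in deleted_keys and not row.get("is_deleted"):
            row["is_deleted"] = True
            row["deleted_on"] = deleted_on
    return target


target = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
soft_delete(target, {2}, date(2024, 6, 1))
```

The row for id 2 stays in the target, so downstream queries need to filter on the flag to exclude it.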


Question 20: What are challenges in CDC?

Answer:

  • Handling deletes
  • Data consistency
  • Performance


🔹 Surrogate Keys

Question 21: What is a Surrogate Key?

Answer:
A system-generated unique identifier for dimension tables.


Question 22: Why use Surrogate Keys?

Answer:

  • Avoid dependency on business keys
  • Improve performance
  • Handle SCD

Question 23: What is Natural Key?

Answer:
A business-defined key (e.g., Employee ID).


Question 24: Difference between Surrogate and Natural Key?

Answer:

Surrogate Key    | Natural Key
-----------------|-----------------
System-generated | Business-defined
Numeric          | Can be string

Question 25: How to generate Surrogate Keys in DataStage?

Answer:
Using:

  • Surrogate Key Generator stage
  • Sequence logic
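
The sequence logic amounts to continuing from the last key already used in the dimension. A minimal Python sketch, with hypothetical names (`assign_surrogate_keys`, `employee_sk`) and an assumed last used key of 100:

```python
from itertools import count


def assign_surrogate_keys(rows, last_used_key=0, sk_column="employee_sk"):
    """Attach the next values of a running sequence to each incoming row."""
    keys = count(last_used_key + 1)
    return [{sk_column: next(keys), **row} for row in rows]


incoming = [{"employee_id": "E-17"}, {"employee_id": "E-42"}]
keyed = assign_surrogate_keys(incoming, last_used_key=100)
```

The natural key (`employee_id`) is kept on the row; the surrogate key is an extra numeric column that carries no business meaning.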

Question 26: What is Key Management?

Answer:
Ensuring that keys stay unique and are assigned in a controlled sequence across loads.


Question 27: What is Gap in Surrogate Keys?

Answer:
Missing numbers in the key sequence, typically caused by job failures or by key values that were reserved but never loaded.
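
One way a gap can arise is sketched below, assuming keys are reserved in blocks from a shared source and a failed run never loads the block it reserved. The `KeySource` class is hypothetical and only models the behaviour.

```python
class KeySource:
    """Hands out consecutive key values and remembers the last one reserved."""

    def __init__(self, last_used=0):
        self.last_used = last_used

    def reserve(self, count):
        start = self.last_used + 1
        self.last_used += count
        return list(range(start, self.last_used + 1))


source = KeySource()
loaded = source.reserve(3)        # first run loads its keys
failed = source.reserve(3)        # second run reserves keys, then fails before loading
loaded += source.reserve(2)       # third run continues after the reserved block

missing = sorted(set(range(1, source.last_used + 1)) - set(loaded))
```

Here `missing` contains the keys reserved by the failed run. Gaps like this are normally harmless, because a surrogate key only needs to be unique, not contiguous.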


Question 28: How to avoid duplicate keys?

Answer:

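  • Use a single key source, such as a database sequence or the Surrogate Key Generator stage
  • Maintain metadata about the last key value used, so each run continues from it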

Question 29: Can Surrogate Keys be reused?

Answer:
No, they should be unique and not reused.


Question 30: What is Composite Key?

Answer:
A key made up of two or more columns that together uniquely identify a row.



🔹 Hash File Stage

Question 31: What is Hash File Stage?

Answer:
The Hashed File stage (often called Hash File stage) is a server job stage used for fast key-based lookup and storage using hashed indexing.


Question 32: Why use Hash File Stage?

Answer:

  • Fast access
  • Efficient lookup
  • Well suited to reference data that is looked up repeatedly by key

Question 33: How does Hash File work?

Answer:
Applies a hashing algorithm to the key to decide where a record is stored, so the record can be retrieved without scanning the whole file.
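
The general principle can be sketched in Python. This is not the DataStage file format; the `HashedStore` class, the bucket count, and the use of CRC32 as the hash function are assumptions made for illustration.

```python
import zlib


class HashedStore:
    """Toy key-value store that places each record in a bucket chosen by hashing the key."""

    def __init__(self, bucket_count=4):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash of the key decides which bucket holds the record.
        return self.buckets[zlib.crc32(key.encode("utf-8")) % len(self.buckets)]

    def write(self, key, record):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)  # same key: replace the record
                return
        bucket.append((key, record))

    def read(self, key):
        # Only one bucket is searched, not the whole store.
        for existing_key, record in self._bucket(key):
            if existing_key == key:
                return record
        return None


store = HashedStore()
store.write("C001", {"city": "Pune"})
store.write("C002", {"city": "Delhi"})
store.write("C001", {"city": "Mumbai"})
city = store.read("C001")["city"]
```

Writing the same key twice replaces the earlier record, which is why key selection matters: rows that share a key value overwrite each other.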


Question 34: What are types of Hash Files?

Answer:

  • Static
  • Dynamic

Question 35: What is Dynamic Hash File?

Answer:
A hashed file that automatically resizes as the volume of data grows or shrinks.


Question 36: What is Primary Key in Hash File?

Answer:
Key used for hashing and lookup.


Question 37: What is Overflow in Hash File?

Answer:
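Occurs when a bucket (group) is full and further records that hash to it must be placed in overflow space, which slows down access.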


Question 38: How to improve Hash File performance?

Answer:

  • Proper key selection
  • Adequate sizing
  • Avoid overflow


🔹 Shared Containers

Question 39: What is Shared Container?

Answer:
Reusable job logic stored separately and used across multiple jobs.


Question 40: Why use Shared Containers?

Answer:

  • Reusability
  • Maintainability
  • Standardization

Question 41: How to create Shared Container?

Answer:

  • Create a job and select the stages and links that hold the reusable logic
  • Convert the selection to a shared container

Question 42: Can Shared Containers have parameters?

Answer:
Yes, they can accept parameters.


Question 43: What is advantage of Shared Container?

Answer:
The logic is defined in one place, so a fix or enhancement is made once instead of in every job.


Question 44: What is disadvantage of Shared Container?

Answer:

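  • Dependency issues: jobs depend on a component maintained outside the job
  • A change affects every job that uses the container, and those jobs must be recompiled to pick it up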


🔹 Local Containers

Question 45: What is Local Container?

Answer:
A grouping of stages and links that exists only inside a single job, used mainly to simplify a complex job design.


Question 46: Difference between Shared and Local Container?

Answer:

Shared               | Local
---------------------|----------------
Reusable across jobs | Within job only
Stored separately    | Inside job

Question 47: When to use Local Container?

Answer:

  • For a small group of related stages that clutters the job design
  • When the logic is needed only within the same job

Question 48: Can Local Containers be converted to Shared?

Answer:
Yes, a local container can be converted to a shared container so that other jobs can use it.


Question 49: What is best practice for Containers?

Answer:

  • Use Shared for common logic
  • Use Local for small logic

Question 50: Real-world use of Containers?

Answer:

  • Standard transformations
  • Data cleansing logic
  • Reusable pipelines
