📚 Chapter 6: Data Types in Programming Languages
Source Information: This study material is compiled from a lecture audio transcript and a copy-pasted text document, both covering Chapter 6 topics on Data Types.
1. Introduction to Data Types 💡
In programming, a data type 📚 fundamentally defines a collection of data objects and a set of predefined operations that can be performed on those objects. It's a blueprint for how data is stored and manipulated.
- Descriptor: This refers to the collection of attributes for a variable. A data type descriptor is a structured memory representation that outlines a variable's or data structure's attributes, including its type, size, memory layout, and address.
- Object: An object represents an instance of a user-defined (or abstract) data type.
- Design Issue: A crucial consideration for all data types is determining which operations are defined for them and how these operations are precisely specified.
2. Primitive Data Types ✅
Primitive data types are foundational and are not defined in terms of other data types. Some are direct reflections of hardware capabilities, while others require minimal non-hardware support for their implementation.
2.1. Integer Types
- Almost always an exact reflection of hardware, making their mapping trivial.
- Languages may offer various integer sizes.
- Example: Java's signed integer types include
byte,short,int, andlong.
2.2. Floating-Point Types
- Model real numbers, but only as approximations.
- Languages for scientific use typically support at least two types, such as
floatanddouble. - Often align with hardware implementations.
- Standard: IEEE Floating-Point Standard 754 is widely adopted.
2.3. Complex Types
- Some languages (e.g., C99, Fortran, Python) support a complex type.
- Each value consists of two floats: a real part and an imaginary part.
- Example (Python):
(7 + 3j), where7is the real part and3is the imaginary part.
2.4. Decimal Types
- Essential for business applications, especially for handling money.
- Examples: COBOL, C# (offers a
decimaldata type). - Store a fixed number of decimal digits in coded form (BCD).
- Advantage: High accuracy.
- Disadvantages: Limited range, can waste memory.
2.5. Boolean Types
- The simplest primitive type.
- Range of values: Two elements,
trueandfalse. - Can be implemented as bits but are often stored as bytes for efficiency.
- Advantage: Enhances code readability.
2.6. Character Types
- Stored as numeric codings.
- Common Coding: ASCII (7-bit).
- Alternative (16-bit): Unicode (UCS-2), includes characters from most natural languages. Originally used in Java, also supported by C# and JavaScript.
- Extended (32-bit): UCS-4 Unicode, supported by Fortran (since 2003).
3. Character String Types 📝
Character string types represent sequences of characters.
3.1. Design Issues
- Is it a primitive type or a special kind of array?
- Should the length of strings be static or dynamic?
3.2. Typical Operations
- Assignment and copying
- Comparison (
=,>, etc.) - Concatenation
- Substring reference
- Pattern matching
3.3. Character String Length Options
String length can be managed with either static or dynamic options, primarily differing in when the length is determined and whether it can change.
-
Static String Length:
- Definition: Length is fixed at compile time and cannot change. Memory is allocated for a predetermined maximum size.
- Characteristics: Memory efficient (fixed), potentially better performance due to no runtime allocation/deallocation overhead.
- Example:
chararrays in C/C++.
-
Dynamic String Length: Allows strings to change size during runtime, offering flexibility with some overhead.
- Limited Dynamic Length:
- Definition: Strings can have varying lengths up to a fixed maximum size. Actual content length is often indicated by a special terminating character (e.g., null character
\0in C-style strings). - Characteristics: A balance between static allocation and dynamic content. Requires manual management in some languages (C) to avoid buffer overflows.
- Example: C-style strings (null-terminated
char*arrays) in C and C++.
- Definition: Strings can have varying lengths up to a fixed maximum size. Actual content length is often indicated by a special terminating character (e.g., null character
- Arbitrary Dynamic Length:
- Definition: Strings can grow and shrink as needed at runtime with no fixed maximum length (limited only by available memory). Requires dynamic storage allocation and deallocation.
- Characteristics: Maximum flexibility but incurs overhead of memory management operations.
- Example:
std::stringin C++,Stringin Java, and strings in Python, JavaScript.
- Limited Dynamic Length:
3.4. Character String Type Evaluation (Core Concepts)
- Definition: A character string is a sequence of code units (characters); its length is determined by the number of code units.
- Types: Can be fixed-length or variable-length. Variable-length strings are common in modern programming, often stored as objects with built-in manipulation methods.
- Characters vs. Strings: A single character (
char) is a primitive type, distinct from a string (a sequence or array of characters). - Encoding: Computers represent characters as numerical values based on standards like ASCII or Unicode.
3.5. Character String Implementation
Strings are typically implemented as an array of characters in memory, with details varying by language concerning memory management, mutability, and encoding.
- Null-terminated arrays: In C/C++, strings are
chararrays terminated by\0. Space-efficient but requires manual memory management. - Object-oriented types: Modern languages (C++, Java, Python, Swift) provide
Stringclasses that abstract the implementation. These often include:- Automatic memory management.
- Explicit length storage (e.g., an integer) for constant-time length retrieval.
- Rich APIs for operations like concatenation, comparison, searching.
- Immutability: In languages like Java and Python, strings are often immutable. Operations that appear to modify a string actually create a new string object. This simplifies concurrent programming and allows optimizations.
- Variable vs. Fixed Length Encoding: Modern strings handle large character sets (Unicode). Implementations vary in how they store these characters internally.
3.6. Compile- and Run-Time Descriptors
- Compile-time descriptor: For static strings.
- Run-time descriptor: For limited dynamic strings.
4. User-Defined Ordinal Types: Enumeration Types 📊
User-defined ordinal types allow programmers to create their own named, ordered sets of countable values.
4.1. Enumeration Types
- An enumerated type explicitly lists all possible values as symbolic constants.
- Values are typically assigned an ordinal value starting from zero by default (can sometimes be overridden).
- Characteristics: Values are ordered and can be compared. Functions can often determine the ordinal value (
Ord), successor (Succ), or predecessor (Pred) of an element.
4.2. Evaluation of Enumerated Types
- Improved Readability and Self-Documentation: Use meaningful names (e.g.,
MON,TUE) instead of raw integers, making code more understandable. - Enhanced Type Safety: Restrict variables to a specific set of predefined values. Compilers can enforce strong type-checking, preventing invalid assignments.
5. Array Types 📈
An array is an aggregate of data elements where an individual element is identified by its position relative to the first element. Arrays are categorized by dimensionality and memory management.
5.1. Array Design Issues
- What types are legal for subscripts (indices)? (e.g., integer types, enums in C/C++)
- Are subscript expressions range-checked? (Safe languages like Java/C#/Python do; C/C++ typically do not).
- When are subscript ranges bound?
- When does allocation take place?
- Are ragged (rows with varying elements) or rectangular (all rows/columns same size) multidimensional arrays allowed?
- What is the maximum number of subscripts?
- Can array objects be initialized?
- Are any kind of slices supported?
5.2. Array Indexing (Subscripting)
Indexing is a mapping from indices to elements: array_name(index_value_list) -> an element.
- Index Syntax: Fortran and Ada use parentheses; most other languages use brackets.
5.2.1. By Indexing Origin (Start Value)
- Zero-based Indexing: First element at index 0 (e.g., C, C++, Java, Python, C#).
- One-based Indexing: First element at index 1 (e.g., Fortran, MATLAB, Julia).
- n-based (Arbitrary) Indexing: User-defined starting index (e.g., Pascal, Ada).
5.2.2. By Subscript Expression Form
- Scalar Subscript: A single integer value to access one element (e.g.,
arr[5]). - Array Subscript (Advanced Indexing): Using another array as an index to select multiple elements.
- Subscript Range (Slicing): A range of indices to select a subset (e.g.,
start:end). - Asterisk/Colon Subscript: Denotes all elements within a specific dimension (e.g.,
arr[*, 0]).
5.2.3. By Data Type (in C/C++)
- Integer Subscripts: Most common.
- Enumeration Types: Enums can index arrays, using their underlying integer value.
- Char Types: Characters can be used as subscripts, converted to their ASCII value.
5.2.4. Special Subscript Types
- Negative Subscripts: In Python, index from the end of the array backward (
-1is the last element). - Boolean Subscripts (Masking): Using a boolean array to select elements where the corresponding value is true (common in NumPy/MATLAB).
5.3. Array Categories by Binding and Allocation 1️⃣2️⃣3️⃣4️⃣5️⃣
Arrays are categorized based on when their subscript ranges are bound and when their memory is allocated:
-
Static Array:
- Binding: Subscript ranges and memory allocation bound before runtime (compile time).
- Pros/Cons: Highly efficient (no runtime overhead); size is fixed for entire program execution.
- Example:
static int arr[10]in C.
-
Fixed Stack-Dynamic Array:
- Binding: Subscript ranges statically bound; memory allocated from the stack when declaration is reached during execution.
- Pros/Cons: Space-efficient (subprograms reuse stack space); size fixed once allocated.
- Example: Standard local arrays in C or C++ (without
static).
-
Stack-Dynamic Array:
- Binding: Both subscript ranges and memory allocation dynamically bound at elaboration time (when code is reached). Size remains fixed for variable's lifetime.
- Pros/Cons: Flexible (size not needed until use); faster than heap allocation.
- Example: Variable-length arrays (VLAs) in Ada or C99.
-
Fixed Heap-Dynamic Array:
- Binding: Subscript ranges and storage bound when requested by program at runtime from the heap. Size remains fixed once allocated.
- Pros/Cons: Size determined at runtime; heap allocation is slower than stack.
- Example: C++ arrays created with
new int[size]or Java'sArrayList.
-
Heap-Dynamic Array:
- Binding: Subscript ranges and storage can change any number of times during the array's lifetime.
- Pros/Cons: Maximum flexibility (arrays can grow/shrink); highest runtime overhead.
- Example: Arrays in Python, JavaScript, and Perl.
5.4. Array Initialization
Some languages allow initialization at the time of storage allocation.
- C, C++, Java, C#:
int list[] = {4, 5, 7, 83}; - C/C++ Character Strings:
char name[] = "freddie"; - C/C++ Arrays of Strings:
char *names[] = {"Bob", "Jake", "Joe"}; - Java String Objects:
String[] names = {"Bob", "Jake", "Joe"}; - Python (List Comprehensions):
list = [x ** 2 for x in range(12) if x % 3 == 0]results in[0, 9, 36, 81].
5.5. Heterogeneous Arrays
- Arrays where elements need not be of the same type.
- Supported by: Perl, Python, JavaScript, Ruby.
- Benefits: Convenience (grouping related items), Polymorphism (calling methods on common base class elements).
- Considerations: Performance (mixing types can affect efficiency), Type Safety (may require checking element types upon retrieval).
5.6. Array Operations
Fundamental actions to manage and manipulate data in arrays.
5.6.1. Basic Data Structure Operations
- Traversal: Visiting every element.
- Insertion: Adding an element at a specific index (may require shifting).
- Deletion: Removing an element (may require shifting).
- Search: Finding the index of a specific value (Linear Search, Binary Search).
- Update: Modifying an element's value at a given index.
- Access/Lookup: Retrieving a value directly using its index (O(1) efficiency).
5.6.2. Advanced & Numerical Operations
- Sorting: Arranging elements in a specific order (e.g., Quick Sort, Merge Sort).
- Mathematical Operations: Element-by-element arithmetic between arrays.
- Reductions: Aggregating elements into a single value (Sum, Product, Min, Max).
- Broadcasting: Performing operations between arrays of different shapes by "stretching" the smaller array (common in NumPy).
- Concatenation & Splitting: Joining arrays or breaking one array into multiple parts.
5.7. Rectangular and Jagged Arrays
- Rectangular Array: A multi-dimensioned array where all rows have the same number of elements, and all columns have the same number of elements.
- Jagged Array: A multi-dimensioned array where rows can have varying numbers of elements (appears as arrays of arrays).
- Support: C, C++, Java support jagged arrays. F# and C# support both rectangular and jagged arrays.
5.8. Slices
- A slice is a substructure of an array; it's a referencing mechanism.
- Useful in languages with array operations.
- Example (Python):
vector = [2, 4, 6, 8, 10, 12, 14, 16]vector[3:6]is a three-element array[8, 10, 12].mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]mat[0][0:2]is[1, 2].
- Example (Ruby):
list.slice(2, 2)returns the third and fourth elements oflist.
5.9. Implementation of Arrays
- An access function maps subscript expressions to an address in the array.
- Single-dimensioned arrays:
address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size) - Multi-dimensioned arrays:
- Row major order: Elements stored row by row (most languages).
- Column major order: Elements stored column by column (Fortran).
- General formula for
a[i,j](row major):Location(a[i,j]) = address of a[row_lb,col_lb] + (((i - row_lb) * n) + (j - col_lb)) * element_size(wherenis the number of columns).
6. Associative Arrays 🔑
An associative array is an unordered collection of data elements indexed by an equal number of values called keys. User-defined keys must be stored.
6.1. Design Issues
- What is the form of references to elements?
- Is the size static or dynamic?
6.2. Language Support
- Built-in type in Perl, Python, Ruby, and Lua.
6.3. Example (Perl)
- Names begin with
%; literals are delimited by parentheses. %hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65);- Subscripting uses braces and keys:
$hi_temps{"Wed"} = 83; - Elements can be removed with
delete:delete $hi_temps{"Tue"};








