Overview of Oracle architecture and performance tuning. Part 1.

Short overview about most popular topics as "task-answer" format.

CHAPTER 1. "Oracle DB architecture"
CHAPTER 2. "Tuning. Basic concepts of tuning"
CHAPTER 3. "Tuning. Statistics. Execution plan."
CHAPTER 4. "Table and index scan"
CHAPTER 5. "Join methods and types"

CHAPTER 1. "Oracle DB architecture"
Task 1.1. Architecture. Part 1.
    1. What is a user process? How many user processes can be?
    2. What is a server process? How many server processes can be?
    3. What memory area does the server process work with? What about this area and
    user?
    4. Where are stores execution plans? Explain.
    5. How can we name the memory areas from points 3 and 4? Why are these areas
    used? What kind of data is stored in these areas?
Solution:
    --1
    User process - is a process creates by Oracle to run user`s application, send
    SQL commands to the database and get data back. Can be multiple user processes.
    --2
    Server process - after user process send commands to the database server, the
    server takes the control and pushes the statement to the Program Global Area
    (PGA). Can be multiple server processes.
    --3
    PGA (Program Global Area or Process Global Area or Private Global Area) - memory
    area that is dedicated to a single user. Every user has and can read only its own
    PGA. PGA store specific SQL (for your session) and other data to be executed.
    Can include multiple Private SQL areas.
    --4
    In 'shared SQL area' from SGA (System Global Area or Shared Global Area). SGA is
    shared by all of the users. Oracle check same SQL statement and instead create
    same execution plan takes existing execution plan from 'shared SQL area'.
    --5
    SGA and PGA - memory caches. Uses for performance because reading from memory is
    faster than from disks. In PGA stores private or session-specific data, in SGA -
    common or shared data.

Task 1.2. Architecture. Part 2.
    1. What is include SHARED POOL? Explain these elements.
    2. What is a BUFFER CACHE? Explain it.
    3. What is a REDO LOG BUFFER? Explain it.
    4. What is JAVA POOL and STREAMS POOL?
    5. Explain disk areas and operations with memory areas: BUFFER CACHE, PGA, REDO
    LOG BUFFER.
Solution:
    --1
    SHARED POOL include:
    - DATA DICTIONARY CACHE (ROW CACHE). This cache have information (from data
    dictionary views, that copied into this cache for faster reading) about the
    database, its structures, and its users for checking correctness of query;
    - RESULT CACHE. Store result of queries if RESULT_CACHE clause/hint defined. It
    is mean if run same query again (and data in table not changed) then result
    return immediately from RESULT CACHE. If data changed then remove previous result
    from RESULT CACHE and store new;
    - LIBRARY CACHE. Can include multiple Shared SQL areas. Can delete unused,
    create new or increase size of one of Shared SQL areas;
    - OTHER.
    --2
    BUFFER CACHE - is a memory area that stored data blocks for a short period time.
    When you run query or make DML-operation the related data block is checked if it
    is in the buffer cache or not. If not, it is read from the disk and written into
    buffer cache. And after the related operation is done on that block, that block
    is written into the disks again. So Oracle use buffer cache for performance, data
    concurrency and consistency.
    --3
    Redo log buffers keep the initial status of the changed values with some specific
    info about these values. This buffer allows turn back values to its initial
    status using redo log data, for example when system failure.
    --4
    Java Pool has the Java classes that the database needs to use. Streams pool
    handles the streams.
    --5
    - Disk area with data. Include all database objects: tables, packages, ... When
    read data from disks or change table (or any operations) then this data stored in
    BUFFER CACHE. And then stored back to disks after commit changes.
    Disk area can write data to PGA, but PGA can`t write data to disks directly.
    - Redo log disk area. Redo log data (files) from redo log buffer can be stored
    (when buffer overflow) in the redo log disk area and deleted from redo log
    buffer.

Task 1.3. Database Data Blocks.
    1. How does Oracle store a table in a database?
    2. What mean block? What does it consist of? How much block can be?
    3. How much rows can be in a block?
    4. Which size block can have? What a default size? When specifies size of block?
    5. What does the block include? Explain each element.
    6. When we can specify size of spaces after rows? How does size affect
    performance?
    7. How to Oracle finds a rows?
    8. What row area include?
Solution:
    --1
    Oracle stores table in multiple blocks.
    --2
    It is a logical unit that consists of multiple Operating System blocks. Can be
    millions or billions blocks.
    --3
    A block can have: whole table; couple rows of a table; rows of different tables
    when multiple tables are clustered.
    --4
    2KB - 32KB. Default - 8KB. Size specifies by DBA on the database instalation.
    --5
    - Block header: block type information (row block or index block); table
    information; row directory - contain addresses (ROWID) of each row in this block.
    - Rows area (or index area if store index) with spaces after each row. If row`s
    size increases, then data writes in these spaces. If space`s size not enough, but
    block have enough size, then row deleted from here and written into the big space
    area. If in block not enough space, then row written into another block.
    - Extra space area after all rows. This area for new inserts.
    --6
    In PCTFREE (% (suppose 20%) free space of block for only future UPDATEs, another
    80% for updates and inserts) or PCTUSE (max filled % of block for allows INSERTs)
    parameters. Size = 0 decrease a performance because may be changed the place of
    the row.
    --7
    Find block with row, then go to address of that row using ROWID.
    --8
    Row area include:
    - Row header:
        - Row overhead;
        - Quantity of columns in that row;
        - Cluster key ID (if clustered);
        - ROWID of chained row pieces (if any);
    - Column data (for each column):
        - Column length (how many bytes will read from column, Oracle understand
        start and end of reading);
        - Column value.

Task 1.4. PGA.
    1. Which are memory areas include in PGA? Explain what they store.
    2. What operations are stored and performed in PGA?
    3. How does PGA size affect performance? How to specify it?
Solution:
    --1
    1) SESSION AREA. Database creates a session when user connects to the database.
    This area stores: session information of each user, session variables, login
    information, session status, etc. For each connection creates area and do not
    forget close unnecessary connections (in Java, .NET apps).
    2) PRIVATE SQL AREA:
    - persistent area - stores the bind variable values used in implicit or explicit
    cursors (cursor opens for each query) and removed them when the cursor is closed;
    - runtime area - stores the execution state information (did we start reading
    the tables, or what percentage we read till now, or did we finish reading, etc.);
    3) CURSOR AREA stores the information of the cursors.
    4) SQL WORK AREA used to operate the data returned from the discs. Consist of:
    - sort area. Data read from the discs is sorted (if query have order by, group
    by, roll-up or window commands);
    - hash join area. For hash join operations with data.
    - bitmap merge area. For bitmap merge operations with data.
    - bitmap create area. For bitmap create operations with data.
    --2
    All the session-specific operations are stored in the PGAs. All the
    session-specific data operations are performed in the PGAs.
    --3
    Small size decrease performance. Big size increase. Specify PGA size manually or
    can let Oracle to automatically change it (recommends).

Task 1.5. SHARED POOL.
    1. Which area contain SHARED POOL? For which sessions SHARED POOL visible?
    2. What does the DATA DICTIONARY CACHE store and perform?
    3. What does the RESULT CACHE store and perform? What happen if reach maximum
    memory?
    4. What does the LIBRARY CACHE store and perform?
    5. How to Oracle perform query using caches from points 2, 3, 4?
Solution:
    --1
    SGA. For all sessions.
    --2
    Data dictionary cache stores the definitions of the database objects and their
    permissions for checking columns, tables, privs, etc.
    --3
    - Stores result of commonly user queries and use this result instead reading from
    disks.
    - Stores results and parameters of functions. Return this result if function
    calls with these parameters again.
    When it reaches to the maximum memory, it deletes the least used one and inserts
    the new one. It can be done by Oracle automatically or programmer can command
    do not store result of query.
    --4
    Include Shared SQL Area where stores execution plans. Instead of directly reading
    from disks Oracle create execution plan how to read data from the disks, BUFFER
    CACHE. Oracle use stored execution plans and it is improve performance.
    In addition to the queries and their execution plans the LIBRARY CACHE stores the
    procedures and packages, and the control structures like locks.
    --5
    In SHARED POOL Oracle checks query in DICTIONARY CACHE. If query is proper and
    if execution plans exists in Shared SQL Area then Oracle go to RESULT_CACHE and
    return result if exists. If result not exists, then Oracle go to disc or BUFFER
    CACHE using this plan. If LIBRARY CACHE not exists execution plan, then Oracle
    create new and go to disk or BUFFER CACHE with new plan.

Task 1.6. BUFFER CACHE.
    1. Which memory area in SGA is larger than the BUFFER CACHE?
    2. What does the BUFFER CACHE store? Is the data private for each user or is it
    available to all users?
    3. How to Oracle work with BUFFER CACHE? What processes are used? How to Oracle
    clears BUFFER CACHE? Why Oracle use BUFFER CACHE? What is mean 'dirty blocks'?
    4. What is mean in-memory database?
Solution:
    --1
    Nothing. BUFFER CACHE is a largest area in SGA.
    --2
    Stores the copies of the most recently used and the most touched ones data (or
    index data) blocks that are read from the database discs.
    Data from BUFFER CACHE available for all users.
    --3
    When user run a query then Oracle, using 'server process', read data from disk
    and copy into the BUFFER CACHE. If any user want reach this data (this query, or
    may be one table or index from query) again, then Oracle read data from BUFFER
    CACHE. If some of data not exists in BUFFER CACHE, then Oracle read from disks
    only absent blocks.
    Oracle can write data from BUFFER CACHE to the disks using 'writer process', for
    example if you make UPDATE statement on a block in BUFFER CACHE and when you
    COMMIT, then these blocks are written to the disks. This changed and not commited
    blocks called as "dirty blocks".
    Oracle deletes first stored block if it is not frequently read from the cache.
    Otherwise it deletes the least used block of the oldest.
    Oracle use BUFFER CACHE because:
    - reading from BUFFER CACHE faster then reading from disks; also reading from
    BUFFER CACHE uses if RESULT CACHE have not saved result;
    - in BUFFER CACHE makes faster join operations (nested loop, hash join, etc.);
    - using BUFFER CACHE makes faster writing all the changed blocks to disks in one
    step, not writing one by one.
    --4
    Databases with big memory cache that allowed store required data in memory.

Task 1.7. REDO LOG BUFFER.
    1. What is a REDO LOG BUFFER for?
    2. Explain how to work REDO LOG BUFFER and which data it is operate? What
    processes are used?
    3. What is mean circle buffer?
    4. How it influence to performance?
Solution:
    --1
    For handles the data loss when system crashes or something that. It helps recover
    the database or a table to a previous state using redo log data.
    --2
    When we change table data by an insert, update, delete, create, alter, drop, the
    Oracle creates REDO LOG ENTRIES (in REDO LOG BUFFER) for some time - information
    about the changes to the database.
    When COMMIT, or REDO LOG BUFFER gets full, or buffer gets one-third full or every
    3 seconds the 'LOG WRITER PROCESS (LGWR)' gives a SCN (system change number) to
    these REDO LOG ENTRIES and write them from REDO LOG BUFFER to the REDO LOG FILES
    in the REDO LOG DISCS.
    Oracle can recover the database, table or period of time to its previous state
    from REDO LOG FILES using 'server process'.
    If make rollback then changes from REDO LOG FILES from disk will be deleted.
    --3
    When REDO LOG BUFFER is full, then Oracle starts writing to it starting from
    beginning.
    --4
    Each creation of REDO LOG ENTRY have a cost.

Task 1.8. UNDO.
    1. If use UPDATE what happens with changed data before commit and after. What is
    mean UNDO data? What can other users see?
    2. Explain logical containers that include blocks.
Solution:
    --1
    Before commit:
        - original (old, before modifying) data copies to UNDO segment in UNDO
        tablespace, it is called 'undo data'. Undo blocks uses for ROLLBACK, read
        consistency (other users see only this 'undo data', but we can see changed
        blocks for your session from BUFFER CACHE).
        - another copy of this data written into the BUFFER CACHE and all the changes
        are done in these blocks (the changes applied to the REDO LOG BUFFER before
        the BUFFER CACHE).
    After commit:
        - original (old, before modifying) data stores in UNDO segments some time for
        flashback feature.
    In both cases new change details (new data) stores in REDO LOG BUFFER. Other
    users can see new data after commit.
    --2
    Blocks > Extents > Segments > Tablespaces

Task 1.9.1. Architecture of Oracle database storage.
    1. What is mean database storage. Explain storage elements and write extensions
    and what are they store.
Solution:
    --1
    Storage is a disks. Storage include:
    1) Control files - information about database physical structure. Without this
    files cannot access to data in database. Extension is ".CTL".
    2) Data files - stores tables, procedures, etc. ".DBF".
    3) Online redo log files - stores redo log entries received from REDO LOG BUFFER.
    ".LOG".
    4) Archived redo log files - to provide more space in Online redo log files redo
    log entries constantly moves here. ".LOG".
    5) Backup files - files on backup disks stores copies of actual data. Database
    can work with it when accident occurs or use it for recovery. Should include:
    ".DBF", ".CTL", ".LOG" files or may be used data from dump file - ".DMP".
    6) Parameter file - stores the configure data of the instance. "SPFILE.ORA" - on
    server, PFILE (for example "INIT.ORA") - on client.
    7) Password file - passwords of the admin users: sysdba, sysoper and sysasm.
    "PWD.ORA".
    8) Alert log (stores the log of messages and errors occured in the database) -
    "ALERT.LOG" and trace files (stores the error trace info) - ".TRC".

Task 1.9.2. Logical and physical database structure.
    1. List and explain elements of database logical structure (indicate physical
    structure).
Solution:
    --1
    Oracle data block (physically stores in different OS blocks) size 2-32KB.
    1) Extent - includes several consecutive data blocks. Stores specific type of
    information.
    2) Segment - includes multiple extents. Used for storing big data (tables,
    indexes, etc). Can be: data segments, index segments, undo segments, temporary
    segments (created to provide a temporary work area for the SQL statements to
    complete the executions). Data and index can`t be stored in the same segments.
    3) Tablespace - includes multiple segments. Two types:
    - TEMPORARY. Contains temporary data (stores in the temporary files) stored only
    for duration of a session.
    - PERMANENT (physically stores in ".DAT", ".DBF" files, these files stores in OS
    blocks). Contains persistent schema objects. UNDO tablespace never have 0
    segments.
    Database must have at least 2 tablespaces: SYSTEM, SYSAUX. Benefits:
    - can make special operations for different tablespaces (enable online, offline
    status; make one for recovery; export, import data of another tablespace);
    - can assign user to tablespace with special size;
    4) Database - includes multiple tablespaces.
    Schema in database is logical container of an objects created by an user.


CHAPTER 2. "Tuning. Basic concepts of tuning"
Task 2.1. Tuning. Part 1.
    1. What does 'runaway query' mean?
    2. After what events makes tuning?
    3. List the reasons of performance loss and write examples.
    4. List characteristics of bad SQL and explain.
    5. What is mean DB TIME?
Solution:
    --1
    Runaway query is a query whose execution time is taking longer than the execution
    time estimated by the optimizer.
    --2
    - after checking the top consuming queries frequently;
    - after any compliants of bad performance.
    --3
    1) Structural changes (change column, drop index);
    2) Changes at the data (table become big or small);
    3) Application changes (changed varchar2 parameter to number, stopped use bind
    variables);
    4) Old statistics (changed gather statistic from one per day to one per week);
    5) Database upgrades (new database upgrades);
    6) Changed database parameters (DBA increase or decrease PGA, SGA);
    7) Operating system and hardware changes (decreased size of memory).
    --4
    1) Unnecessary parse time (validate query, then allocate PRIVATE SQL AREA for
    this statement, then generating the execution plan);
    2) Unnecessary I/O operations (read 100 blocks instead of 1 to get 1 row, for
    example have not index);
    3) Unnecessary CPU time (unnecessary joins (or wrong methods), sorting);
    4) Unnecessary waits (CPU wait, network wait).
    --5
    Time on wait (CPU) + Time on execution.

Task 2.2. Tuning. Part 2.
    1. Explain storing data using VARCHAR2.
    2. Explain stages of processing SQL statement. What is mean (write synonyms) SOFT
    and HARD PARSE? When does the database performs HARD PARSE always? Explain how
    the SQL statement is stored and where.
Solution:
    --1
    Data in VARCHAR2 occupy memory as defined, for example 2 bytes, not 4000 bytes.
    Also it is true for NUMBER, CLOB, etc.
    --2
    1) Syntax check - if incorrect, then return an error immediately.
    2) Semantic check. Checks info about tables, columns in DATA DICTIONARY CACHE.
    Return an error immediately.
    3) Privilege check. Checks in DATA DICTIONARY CACHE user privileges on objects
    from query. Return an error immediately.
    4) Allocate PRIVATE SQL AREA. Creates this area in PGA for stores query result
    (cursor). Stores rowid of execution plans stored in LIBRARY CACHE. If SQL
    statement runs again, then existing execution plan will be found rapidly by rowid
    (default keep 50 rowid`s) in the LIBRARY CACHE (instead of search hash).
    5) Finding existing execution plan in SHARED SQL AREA from SHARED POOL.
    SQL statement does not stores directly in SHARED SQL AREA (because it finite).
    For that SQL code Oracle creates hash value and store it in LIBRARY CACHE.
    Remember that small changes in query (for example swap columns in SELECT) makes
    another hash.
    5.1) SOFT PARSE (LIBRARY CACHE HIT) - if 1, 2, 3 stages passed and in stage 5
    found execution plan.
    5.2) HARD PARSE will be if related execution plan is not found. Database always
    performs a HARD PARSE of DDL. That have stages:
    - allocate memory in SHARED SQL AREA for write new execution plan. It have cost,
    because have some processes or if SHARED SQL AREA is full then need delete aged
    one;
    - optimization. Optimizer calculates several execution plans (if current best
    cost is small, the oprimizer ends explores alternative plans, because futher cost
    improvments is not significant) and select best in one second (time can be
    increased if change optimization level);
    - row source generation process - gets the execution plan and generates
    operations (how to get data from tables, join, sort, etc.) from plan.
    6) Execute SQL code and return result to the user.

Task 2.3. Optimizer.
    Suppose that you have query:
    SELECT * FROM MY_TABLE WHERE MY_COLUMN = 2;
    1. What is the Optimizer?
    2. List the steps that the optimizer performs when executing the query. What is
    mean selective query?
    3. List an optimizer components. Explain. When is execution plan saved in memory?
    4. What is mean cost basically?
Solution:
    --1
    Optimizer is a software used to find the best execution plan - those plan allows
    execute query with minimum cost.
    --2
    1) Checks schema indexes from query.
    2) Generates possible access paths (use indexes or read whole table).
    3) Checks statistics for choose the best plan. It is mean checks statistics of
    MY_TABLE for MY_COLUMN to determine selective MY_COLUMN or not. Selective query
    is a query that returns 1%-4% of table and can be used index. If returns 25% of
    table - unselective (will be read whole table).
    4) Use plan for row source generator.
    --3
    1) Query transformer. Transform query to the best query for better performance.
    2) Estimator. Estimates cost based on statistics from data dictionary (it is not
    real cost, because statistics gathered in specific time, and data can be
    changed).
    3) Plan generator:
        - generates plans using different access paths, join methods, join orders;
        - for each of these plans estimates cost using estimator (from point 2);
        - at the end chooses best plan with lowest cost and passes that plan to the
        row source generator.
    After row source generation execution plan stores in the SHARED SQL AREA.
    --4
    Cost = CPU usage + memory usage + disk usage for reading 1 block and then
    multiply it with blocks to be read.

Task 2.4. Query transformer.
    Do next operations:
        DROP TABLE TAB1;
        CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
                           COL2 NUMBER UNIQUE,
                           COL3 VARCHAR2(100));
        BEGIN
            FOR i IN 1..1000000 LOOP
                INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
            END LOOP;
            COMMIT;
        END;
        DROP TABLE TAB2;
        CREATE TABLE TAB2 (COL2 NUMBER);
        INSERT INTO TAB2(COL2) VALUES(500000);
        INSERT INTO TAB2(COL2) VALUES(600000);
        INSERT INTO TAB2(COL2) VALUES(700000);
        COMMIT;
    Then do next:
    1. What in WHERE clause of query can prevent use index or reduce performance?
    Write examples. Also write examples how to correct that query.
    2. How to optimizer choose best query from point 1? When does the optimizer not
    transform a query?
Solution:
    --1
    --1.1. Using OR in where clause may not use index:
    SELECT *
        FROM TAB1
            WHERE COL1 = 500000
               OR COL2 = 1200000
    --use UNION ALL for activate indexes
    SELECT * FROM TAB1 WHERE COL1 = 500000
    UNION ALL
    SELECT * FROM TAB1 WHERE COL1 <> 500000 AND COL2 = 1200000
    --1.2. Nested query can reduce performance
    SELECT * FROM TAB1
        WHERE COL2 IN (SELECT COL2 FROM TAB2)
    --Use JOIN statement for better cost
    SELECT t1.*
        FROM TAB1 t1
            JOIN TAB2 t2
            ON t1.COL2 = t2.COL2
    --2
    Optimizer calculate costs of queries and choose best query. Optimizer does not
    transform query if a table is very small, instead it just uses full table scan.

Task 2.5. Selectivity. Cardinality.
    Do next operations:
        DROP TABLE TAB1;
        CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
                           COL2 NUMBER UNIQUE,
                           COL3 VARCHAR2(100));
        BEGIN
            FOR i IN 1..1000000 LOOP
                INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
            END LOOP;
            COMMIT;
        END;
    Then do next:
    1. What is selectivity? What is high and low selectivity? Where is it used?
    Where is selectivity displays in execution plan?
    2. What is cardinality? Where is it used? What are common reasons of estimating
    cardinality incorrectly?
    3. How to Oracle estimates selectivity and cardinality? What is dictionary`s view
    helps us calculates it? How to estimate selectivity and cardinality for TAB1:
    - WHERE COL1 = 5;
    - WHERE COL1 = 5 AND COL2 = 8;
Solution:
    --1
    Selectivity = number of rows returning from the query / total number of rows.
    Selectivity close to 0 - high (good) selectivity, if close to 1, then low (bad).
    Selectivity uses in: estimating in I/O cost; influence on sort cost (if returns
    many rows, then sort cost will be high).
    Selectivity is not visible in execution plan.
    --2
    Cardinality is the number of rows returned by each operation in an execution
    plan. Optimizer uses cardinality for choosing join method, estimate sort cost.
    Missing statistics, inaccurate statistics or complex predicates.
    --3
    --At the points 1 and 2 is exact selectivity and cardinality
    Selectivity = 1 / optimizer calculates distinct values for column from WHERE.
    --If a histogram exists on the predicate column, then the estimator uses it
    --instead of the number of distinct values
    SELECT COLUMN_NAME, NUM_DISTINCT --NULL values ignores for calculate NUM_DISTINCT
        FROM ALL_TAB_COLUMNS
            WHERE TABLE_NAME = 'TAB1'
    --Each column can have different selectivity (high or low).
    -- WHERE COL1 = 5;
    Selectivity = 1 / COUNT_OF_DISTINCT_VALUES_FROM_COL1
    -- WHERE COL1 = 5 AND COL2 = 8; --need multiply selectivities
    Selectivity = (1 / COUNT_OF_DISTINCT_VALUES_FROM_COL1) *
                    (1 / COUNT_OF_DISTINCT_VALUES_FROM_COL2)
    Cardinality = total number of rows * selectivity.

Task 2.6. Cost. Row source generator.
    1. What is mean estimation cost, for example if cost = 100?
    2. What is use estimator to estimate the cost?
    3. Write formula of calculating the cost. Explain elements.
    4. What other costs are calculated? What is the most expensive cost?
    5. What does creates a plan generator and a row source generator?
    6. What is produce row source generator? What is mean row source?
Solution:
    --1
    Oracle must read 100 single blocks to execute our query.
    --2
    Disk I/O usage, CPU usage, memory usage.
    --3
    Cost =
    (
     (Single-block I/O Cost = number_of_single-block_reads * single-block_read_time)
     +
     (Multiblock I/O Cost = number_of_multiblock_reads * multiblock_read_time)
     +
     (CPU Cost = number_of_CPU_cycles / CPU_speed)
    ) / single-block_read_time
    Single-block reading - reads block by block. Multiblock reading - reads multiple
    blocks (when full table read or read a range of table) at a time to speed up the
    reading. Number of CPU cycles - number of cycles for sorting, hashing, caching
    for our execution. CPU Cost may vary if CPU busy.
    --4
    Memory reading cost. I/O cost is a haviest cost.
    --5
    Plan generator generates best plan and transfer it to row source generator. Row
    source generator creates final execution plan.
    --6
    Generator generates iterative execution plan with row source tree. Tree includes
    step by step row sources - areas with row sets: table, view, result of join or
    grouping operation. Each step returns a row set, and that row set is used by the
    next step.

Task 2.7. Steps for analyze bad performance query.
    1. What are steps can be included in collecting data? Explain.
    2. List tools for collecting performance data.
    3. What are steps can be included in analyze of the query text?
    4. List tools for get the execution plan.
    5. What are steps can be included in analyzing execution plan?
    6. What could be possible solutions of a bad performance query?
Solution:
    --1
    1) Examine execution plan;
    2) Information of objects in the query: tables, indexes, columns, types, ...;
    3) Collect both statistics: object and system. If old then update it or use
    dynamic sampling for better plan. Or not our query, but another queries or
    applications may highly load CPU;
    4) Checks histograms. Histogram allows understand how many rows really contains
    our distinct value in column: 3%, or 10%, or 60% of table;
    5) Parameter settings. Changed by DBA.
    --2
    SQLT, DBMS_STATS, TKPROF, AWR report, etc.
    --3
    1) Check unnecessary business data, especially if PGA is small;
    2) Check the predicates;
    3) Check constructions: subqueries, joins, IN, OR.
    --4
    SQL trace, TKPROF, V$SQL_PLAN, DBMS_MONITOR, AWRSQRPT.SQL, etc.
    --5
    1) Checks access paths. Read whole table, use index, index type, ...;
    2) Checks join order, join types. Can replace tables, change join type;
    3) Compare actual and estimated number of rows. Update statistics for proper
    selectivity, use hints for better plan;
    4) Find steps where cost and the logical reads have big difference.
    --6
    1) Updating statistics. Update manually or makes another update frequency (per
    day, per 12 hour, etc.);
    2) Use dynamic statistics;
    3) Create or re-create indexes, choose proper index;
    4) Create index-organized tables;
    5) Using hints.
    6) Others (eliminate implicit data type conversions, creating function-based
    indexes, use parallel execution, etc.).

CHAPTER 3. "Tuning. Statistics. Execution plan."
Task 3.1. Statistics. Part 1.
    1. What is difference between explain plan and execution plan?
    2. List types of statistics: explain purpose and where stores.
    3. What does the optimizer statistics include?
    4. How to Oracle collects optimizer statistics automatically and when by default?
Solution:
    --1
    Explain plan - based on estimates using statistics. Execution plan can be vary
    after execute query if estimates wrong and after execute will be written to
    SHARED SQL AREA.
    --2
    1) Optimizer statistics - created for the purposes of query optimization and are
    stored in the data dictionary.
    2) Performance statistics - visible through V$ views.
    --3
    Table statistics:
        Number of rows
        Number of blocks
        Average row length
    Column statistics:
        Number of distinct values (NDV) in column
        Number of nulls in column
        Data distribution (histogram)
    Index statistics:
        Number of leaf blocks
        Levels
        Clustering factor
    System statistics:
        I/O performance and usage
        CPU performance and usage
    --4
    Gathers missing and stale statistics with the job GATHER_STATS_JOB on all objects
    in the database. Every night from 10 P.M. to 6 A.M. and all day on weekends.

Task 3.2. Statistics. Part 2.
    Do next operations:
        DROP TABLE TAB1;
        CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
                           COL2 NUMBER UNIQUE,
                           COL3 VARCHAR2(100));
        BEGIN
            FOR i IN 1..1000000 LOOP
                INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
            END LOOP;
            COMMIT;
        END;
    Then do next:
    1. What is default value of CPU single block reading?
    2. How to collect system statistics manually and for which user allowed by
    default? Which object stores information about system statistics and explain why
    some values can be NULL?
    3. Lists procedures to gather optimizer statistics manually for: database, SYS
    schemas, one schema, one table, one index. Explain it. Which user can run it?
Solution:
    --1
    12 milliseconds.
    --2
    --SYS user can execute it by default. Do not forget connect by SYS to proper
    --container, for example: ALTER SESSION SET CONTAINER=ORCLPDB;
    EXEC DBMS_STATS.GATHER_SYSTEM_STATS('Start');
    --some values from PVAL1, PVAL2 can be NULL if we not add additional parameters
    --in GATHER_SYSTEM_STATS procedure
    SELECT * FROM SYS.AUX_STATS$;
    --3
    --Gather statistics (may be a long time) available by SYS user:
    --for all objects in all schemas in the database
    EXEC DBMS_STATS.GATHER_DATABASE_STATS;
    --for schemas SYS and SYSTEM (these statistics for database itself)
    EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;
    --for a one schema
    EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME => 'HR');
    --for a one table
    --CASCADE => TRUE - gather statistics for all indexes of table
    EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'HR', TABNAME => 'TAB1',
                                       CASCADE => TRUE);
    --for a one index
    EXEC DBMS_STATS.GATHER_INDEX_STATS(OWNNAME => 'HR', INDNAME => 'IND1');
    --For HR. For certain object (not recommended, for backward compatibility)
    ANALYZE TABLE TAB1 COMPUTE STATISTICS;

Task 3.3. Statistics views.
    1. List dictionary views to see optimizer statistics. Explain for which objects
    these statistics and list important columns for table statistics.
    2. List views of system and session statistics.
Solution:
    --1
    SELECT TABLE_NAME,
            LAST_ANALYZED --last date of DBMS_STATS.GATHER_TABLE_STATS for table
        FROM SYS.DBA_TABLES --tables
    SELECT PARTITION_NAME,
            PARTITION_POSITION,--plan displays these numbers that server read in step
            NUM_ROWS, --quantity of rows in table or partition
            BLOCKS, --quantity of blocks in table or partition
            AVG_ROW_LEN --the average row length in bytes, including row overhead
        FROM SYS.DBA_TAB_STATISTICS
    SELECT * FROM SYS.DBA_TAB_COL_STATISTICS--table columns
    SELECT * FROM SYS.DBA_INDEXES           --indexes
    SELECT * FROM SYS.DBA_CLUSTERS          --clusters
    SELECT * FROM SYS.DBA_TAB_PARTITIONS    --data of the partitions based on tables
    SELECT * FROM SYS.DBA_IND_PARTITIONS    --data of the indexes based on partitions
    SELECT * FROM SYS.DBA_PART_COL_STATISTICS--column statistics for partitions
    --2
    SELECT * FROM SYS.V$SYSSTAT --system
    SELECT * FROM SYS.V$SESSTAT --session

Task 3.4. Generating execution plan.
    Do next operations:
        DROP TABLE TAB1;
        CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
                           COL2 NUMBER UNIQUE,
                           COL3 VARCHAR2(100));
        BEGIN
            FOR i IN 1..1000000 LOOP
                INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
            END LOOP;
            COMMIT;
        END;
    Then do next:
    1. How to generate execution plan manually for 'SELECT * FROM TAB1' using EXPLAIN
    PLAN? Explain where we can see result and what main parts in here?
    2. Explain using AUTOTRACE for point 1. What privs user must have? How to display
    AUTOTRACE status? How to disable AUTOTRACE? What are views uses in statistics?
    When AUTOTRACE may work incorrectly?
Solution:
    --1
    --Plan table is a GLOBAL TEMPORARY TABLE
    EXPLAIN PLAN FOR SELECT * FROM TAB1; --explain plan will be saved into PLAN_TABLE
    --Result, complex for reading
    SELECT * FROM PLAN_TABLE;
    --Readable result based on selected rows from PLAN_TABLE. Here HASH value +
    --binded explain plan, from LIBRARY CACHE.
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
    --Also we can assign ID to query and find it in PLAN_TABLE
    EXPLAIN PLAN
        SET STATEMENT_ID = 'ID1'
            FOR SELECT * FROM TAB1;
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(STATEMENT_ID => 'ID1'))
    SELECT * FROM PLAN_TABLE WHERE STATEMENT_ID = 'ID1';
    --Or we can put result into our created table (usually not used)
    EXPLAIN PLAN
        SET STATEMENT_ID = 'ID1'
            INTO MY_TABLE
                FOR SELECT * FROM TAB1;
    --2
    --Autotrace - generates: rows, explain plan (from PLAN_TABLE), actual statistics.
    --User (for example HR) must have privs:
    --GRANT SELECT_CATALOG_ROLE TO HR; GRANT SELECT ANY DICTIONARY TO HR;
    --in SQL PLUS (in SQL developer use F6 after highlighting query, also choose
    --'Fetch all rows' in SQL Developer preferences for Autotrace) you must run
    --query (or DML) after these commands for return result:
    SET AUTOTRACE ON; --produce result: rows, explain plan, statistics
    SET AUTOTRACE ON EXPLAIN; --rows and explain plan
    SET AUTOTRACE ON STATISTICS; --rows and statistics
    SET AUTOTRACE TRACE; SET AUTOTRACE TRACEONLY; --explain plan, statistics
    SET AUTOTRACE TRACE EXPLAIN; SET AUTOTRACE TRACEONLY EXPLAIN; --explain plan
    SET AUTOTRACE TRACE STATISTICS; SET AUTOTRACE TRACEONLY STATISTICS; --statistics
    SHOW AUTOTRACE; --display status
    SET AUTOTRACE OFF; --Successful disabling if have not any messages after execute
    SELECT * FROM V$STATNAME; --statistics names
    SELECT * FROM V$MYSTAT; --statistics values
    --AUTOTRACE may work incorrectly producing nonexact execution plan when use bind
    --variables.

Task 3.5. Dynamic performance views.
    Do next operations:
        DROP TABLE TAB1;
        CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
                           COL2 NUMBER UNIQUE,
                           COL3 VARCHAR2(100));
        BEGIN
            FOR i IN 1..1000000 LOOP
                INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
            END LOOP;
            COMMIT;
        END;
    Then do next:
    1. Where are the performance views stored?
    2. List and explain what stores popular dynamic performance views (except
    V$SQL_PLAN) using query 'SELECT * FROM TAB1 ORDER BY COL1 DESC' and demonstrate
    how to find exact query in these views.
    3. Explain V$SQL_PLAN using query 'SELECT * FROM TAB1 ORDER BY COL1 DESC' and
    demonstrate how to find exact query in this view.
    4. What is difference between PLAN_TABLE and V$SQL_PLAN?
Solution:
    --1
    In SHARED SQL AREA.
    --2
    --Views will be updated after executing query SELECT * FROM TAB1
    --Statistics about memory usage for each executed query:
    SELECT * FROM V$SQLAREA
        WHERE UPPER(SQL_TEXT) LIKE UPPER('%SELECT * FROM TAB1 ORDER BY COL1 DESC%');
    --Displays information about work areas (hash join area, sort area, etc.) used by
    --SQL cursors and helps answer on next questions:
    --What are the top 10 work areas that require the most cache area?
    --For work areas allocated in AUTO mode, what percentage of work areas are
    --running using maximum memory?
    SELECT * FROM V$SQL_WORKAREA
        WHERE SQL_ID = (SELECT SQL_ID
                            FROM V$SQLAREA
                                WHERE UPPER(SQL_TEXT) LIKE
                                    UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'));
    --For simply checking costs, parse counts, logical and physical reads, etc. This
    --view can updates every 5 sec. and it is allows analyze a long-running query:
    SELECT * FROM V$SQL
        WHERE UPPER(SQL_TEXT) LIKE UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC');
    --Provides execution statistics at the row source level (it is mean for each
    --step) for each child cursor:
    ALTER SESSION SET STATISTICS_LEVEL = ALL;
    SELECT * FROM TAB1 ORDER BY COL1 DESC;
    SELECT * FROM V$SQL_PLAN_STATISTICS
        WHERE SQL_ID = (SELECT SQL_ID
                            FROM V$SQLAREA
                                WHERE UPPER(SQL_TEXT) LIKE
                                    UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'));
    --Memory usage statistics for row sources that use SQL memory (sort or hash-join)
    --and displays comparison of optimizer estimates with the actual execution
    --statistics for the row sources.
    SELECT * FROM V$SQL_PLAN_STATISTICS_ALL
        WHERE SQL_ID = (SELECT SQL_ID
                            FROM V$SQLAREA
                                WHERE UPPER(SQL_TEXT) LIKE
                                    UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'));
    --3
    --Contains execution plans including the estimated statistics for each row
    --source:
    SELECT * FROM V$SQL_PLAN
        WHERE SQL_ID = (SELECT SQL_ID
                            FROM V$SQLAREA
                                WHERE UPPER(SQL_TEXT) LIKE
                                    UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'));
    SELECT * FROM
        TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
                        (SELECT SQL_ID
                            FROM V$SQLAREA
                                WHERE UPPER(SQL_TEXT) LIKE
                                    UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'))
                                        ));
    --4
    In PLAN_TABLE stores the estimated plan, in V$SQL_PLAN - actual plan.

Task 3.6. Reading execution plan.
    Do next operations:
        DROP TABLE TAB1;
        CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
                           COL2 NUMBER UNIQUE,
                           COL3 VARCHAR2(100));
        BEGIN
            FOR i IN 1..1000000 LOOP
                INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
            END LOOP;
            COMMIT;
        END;
    Then do next:
    1. What columns displays execution plan using DBMS_XPLAN.DISPLAY_CURSOR for table
    with partitions? Explain each of them. What else include in execution plan except
    columns? Explain.
    2. What is a difference between access predicate and filter predicate?
    3. In which directions to read the query plan? Which data will operate Oracle on
    each step?
    4. What is mean driving table? How to Cost Based Optimizator determines that
    table?
    5. What is cost can be if displayed cost for 3 steps = 15 + 1 + 0? Explain.
Solution:
    --1
    ID - row number of execution plan.
    OPERATION - name of operation.
    NAME - row source name (name of table, index, etc.).
    ROWS - estimated cardinality (quantity of rows returns by operation).
    BYTES - amount of bytes read from disk or memory.
    COST (%CPU) - cost + % of CPU usage.
    TIME - time to perform the operation.
    PSTART - Oracle start read data from this partition id.
    PSTOP - Oracle stop read data to this partition id.
    Predicate information displays row condtitions from operation. Will be retrieved
    rows only satisfy these conditions.
    --2
    Access - we only "access" relevant data (for example when using an index we
    "access" only to interest rows). Filter - when we get more data (for example all
    rows) than we need and then filter it.
    --3
    Right to left and bottom to top using indentation level. Oracle will operate by
    data retrieved from previous step only.
    --4
    Is a first table in execution plan. CBO uses the statistics and data dictionary
    to determine which table to be used as the driving table.
    --5
    15. 0 mean that cost is very small, but not zero. 1 also mean that cost small and
    sometimes Oracle does not calculate small cost for final cost.

CHAPTER 4. "Table and index scan"
Task 4.1. Index types.
    1. List types of indexes and it`s subtypes. Explain what does it store.
    2. List attributes of indexes.
    3. Explain structure of index types from point 1. What is mean index entry?
    4. How to Oracle can read data between leafs?
    5. Which indexes cannot be used with LOB column and which can be used?
Solution:
    --1
    B-TREE indexes:
    - normal (default): stores actual value of column(s);
    - function-based: contain result of the stored function based on column(s);
    - index-organized table (IOT): stores all columns from table in index. Related
    data gets from the index directly.
    BITMAP indexes.
    B-TREE and BITMAP indexes can be both function based.
    --2
    - Key compression - compress indexes;
    - Reverse key (uses only with equality operator) - stores the column values as
    reversed;
    - Ascending - descending ordered indexes. Orders values of column(s) in index by
    ASC or DESC.
    --3
    'B-TREE index' have: root, branches, leafs. Root and branches contain intervals
    of value(s) of column(s). Leafs have rows, every row have index entry = value of
    column(s) + rowid.
    'BITMAP index' (uses for columns with low selectivity, can stores NULL, can also
    reduce storage space compared to other indexes).
    Root and branches are the same as in BTREE type, but leafs stores a key (value
    of column) + interval (start ROWID and end ROWID). Also have bitmap column that
    have values 1 or 0. 1 mean that row with ROWID from interval have the key,
    0 - not. So when using that type, then index returns key, then retrieves batch
    of ROWIDs for this key, then retrieves rows using ROWIDs.
    --4
    Oracle can read next leaf using double links between leafs.
    --5
    You cannot build a B-tree or bitmap index on a LOB column, but can create a
    function-based or a domain index.

Task 4.2. Table and index access paths.
    1. What is mean table or index access paths?
    2. List and explain table access paths.
    3. List and explain index access paths.
Solution:
    --1
    It is mean how to read data from: table or index.
    --2
    'Table access full' (Full Table Scan) - reads all rows from a table and then
    returns only required rows.
    'Table Access by Rowid' - optimizer returns row from block using ROWID (it is a
    fastest way).
    'Sample Table Scan' - retrieves a random sample of data from a table or complex
    select statement (in a query you must manually instruct optimizer use it).
    --3
    'Index Unique Scan' - returns 1 rowid, all column(s) in WHERE must have "=" sign
    and primary key (or unique constraints or unique index). It is fastest way after
    table access by ROWID.
    'Index Range Scan' - optimizer use it if one or more leading columns of an index
    are specified in conditions. It will read index leafs starting from the leading
    entry and continue reading using links between leafs.
    'Index Full Scan' - scan all index data if need order result or if use GROUP BY
    clause. Or does not read every block in the index, but processes all blocks until
    find first leaf block.
    'Index Fast Full Scan' - data gets from index directly if query have columns only
    from index.
    'Index Skip Scan' - optimizer use it if initial column of a composite index is
    "skipped" or not specified in the query. Optimizer skips the blocks which do not
    have any key for our query, and reads only the blocks which have the keys.
    'Index Join Scan' - optimizer use it if a hash join of multiple indexes together
    returns all columns requested by a query and all data is retrieved directly from
    the indexes.

Task 4.3. Table access full (FULL TABLE SCAN). Table access BY ROWID. SAMPLE TABLE
scan.
    1. List typical reasons for FULL TABLE SCAN.
    2. How to instruct Oracle read 4 blocks at a same time?
    3. List typical reasons for table access by ROWID.
    4. How to named SAMPLE TABLE scan in execution plan? Explain two forms, what can
    be faster?
Solution:
    --1
    1) Have not suitable index;
    2) The query predicate applies a function to the indexed column;
    3) A SELECT COUNT(*) query is issued, and an index exists, but the indexed column
    contains nulls;
    4) Low selectivity;
    5) Table statistics are stale (table was small, but become big);
    6) Table is small;
    7) The table has a high degree of parallelism;
    8) The query uses a full table scan hint;
    9) In predicate uses not leading column from B-tree index.
    --2
    DB_FILE_MULTIBLOCK_READ_COUNT = 4;
    --3
    1) We can use ROWID in WHERE clause for fastest row access.
    2) After a scan of one or more indexes.
    --4
    TABLE ACCESS SAMPLE.
    --Two forms (SAMPLE BLOCK can be faster):
    SELECT t1.* FROM MY_TABLE SAMPLE (1) t1 --database reads random 1% of rows
    SELECT * FROM MY_TABLE SAMPLE BLOCK (1)--database reads random 1% of table blocks

Task 4.4. B-Tree Index Access Paths. Part 1.
    1. List typical reasons for INDEX UNIQUE SCAN.
    2. List rules for using INDEX UNIQUE SCAN.
    3. For which types of index available INDEX RANGE SCAN?
    4. List typical reasons for INDEX RANGE SCAN.
    5. Explain sorting for INDEX RANGE SCAN.
    6. When INDEX RANGE SCAN not be uses.
Solution:
    --1
    Want return 1 row from indexed column with unique values.
    --2
    1)SQL must have all columns from Primary Key or Unique Key indexes and have not
    another index (for example non-unique) on these columns with column order as in
    these Primary Key or Unique Key (or you can use INDEX() hint if have 2 or more
    indexes in these columns to choose proper index).
    2) All columns from index must used with '=' operator.
    --3
    B-TREE, BITMAP. Unique, non-unique. Function-based.
    --4
    One or more leading columns of an index are specified in conditions with '<',
    '>' signs, LIKE, BETWEEN or use '=' operator for rows with duplicates. Even if
    returned 1 row.
    --5
    If in GROUP BY, ORDER BY use indexed columns RANGE SCAN will not do any sort,
    because index sorted (not important ASCENDING or DESCENDING index). But it should
    not have NULLs.
    --6
    When condition begin since '%' , for example: WHERE MY_COLUMN LIKE '%A'

Task 4.5. B-Tree Index Access Paths. Part 2.
    1. List typical reasons for INDEX FULL SCAN.
    2. List typical reasons for INDEX FAST FULL SCAN.
    3. For which indexes INDEX FAST FULL SCAN can be.
    4. What is difference between INDEX FULL SCAN and INDEX FAST FULL SCAN?
Solution:
    --1
    1) When in ORDER BY clause we have only non-nullable columns from index and want
    return all rows.
    2) GROUP BY clause include only indexed columns and want return all rows.
    3) If in query use a SORT-MERGE JOIN.
    4) A predicate references on non-leading column in the index and have GROUP BY,
    ORDER BY with columns from index.
    5) No predicate is specified, but all of the following conditions are met:
        - All columns in the table and in the query are in the index;
        - At least one indexed column is not null.
    --2
    1) Query uses columns only from index and read data directly from index.
    2) Use hint INDEX_FFS.
    --3
    B-TREE and BITMAP.
    --4
    1) After INDEX FULL SCAN can read from table also. INDEX FAST FULL SCAN reads
    only from index.
    2) INDEX FULL SCAN reads blocks sequentially. INDEX FAST FULL SCAN reads multiple
    index leaf blocks at a time, but in unsorted order.
    3) INDEX FULL SCAN usually slower than INDEX FAST FULL SCAN, but sometimes if
    need sort then will be faster.
    4) INDEX FULL SCAN can eliminate sorting, INDEX FAST FULL SCAN - cannot.

Task 4.6. B-Tree Index Access Paths. Part 3.
    1. List typical reasons for INDEX SKIP SCAN.
    2. List typical reasons for INDEX JOIN SCAN.
    3. What is alternative name of INDEX JOIN SCAN? Explain joining two indexes on a
    single table.
    4. When will INDEX JOIN SCAN not be performed?
    5. What will be as result of INDEX JOIN operation?
Solution:
    --1
    1) The leading column of composite index is not specified in the query predicate.
    2) Few distinct values exist in the leading column of the composite index, but
    many distinct values exist in the second, third, ... column of the index.
    --2
    1) A hash join of multiple (2, 3, ...) indexes retrieves all data (columns)
    requested by the query, without requiring table access.
    2) The cost of retrieving rows from the table is higher than reading the indexes
    without retrieving rows from the table.
    --3
    INDEX HASH JOIN (HASH JOIN). Before INDEX JOIN Oracle can use any index scans
    (for example INDEX FAST FULL SCAN, INDEX RANGE SCAN), and then using a hash join
    on the rowids obtained from these scans to return the rows.
    --4
    If write pseudocolumn ROWID in SELECT clause.
    --5
    Temporary view store result of join with values from indexes, not from table.

CHAPTER 5. "Join methods and types"
Task 5.1. Join methods. Part 1.
    1. What is join combines and returns?
    2. What is mean join method? List these methods and explain when it is efficient
    and why. How much reads for driving table and joining table?
    3. What is mean join type? List these types.
    4. What are alternative names of driving table?
Solution:
    --1
    A join combines the output from exactly two row sources, such as tables or views,
    and returns one row source.
    --2
    Join method mean how to join two row sources.
    'NESTED LOOP JOIN'. Efficient when joining row sources are small and have indexes
    on the join conditions, because there is no sort, no hash table. Driving table
    reads once. Joining table reads for every row of the driving table.
    'SORT MERGE JOIN'. As NESTED LOOP, but efficient if tables is big and (or) one
    side is sorted - because two sides will be sorted. Both row sources reads once.
    'HASH JOIN'. Efficient than SORT MERGE JOIN if both row sources are not sorted
    and very big. Driving table reads once. Joining table reads once.
    'CARTESIAN JOIN' (MERGE JOIN CARTESIAN). Efficient when one or more of the tables
    does not have any join conditions. Driving table reads once. Joining table reads
    for every row of the driving table.
    --3
    The join condition determines the join type. INNER JOIN, OUTER JOIN, SEMIJOIN,
    ANTIJOIN, CARTHESIAN JOIN.
    --4
    External table, outer table.

Task 5.2. Join methods. Part 2.
    1. List typical reasons for NESTED LOOP JOIN.
    2. What is advantage of NESTED LOOP JOIN versus SORT MERGE JOIN, HASH JOIN and
    why?
    3. Which hint is used for NESTED LOOP JOIN? Explain which table at the hint will
    be as driving table. How to specify any table as driving in NESTED LOOP JOIN?
    4. What is mean NESTED NESTED LOOPS JOIN?
    5. How is work NESTED LOOP JOIN?
Solution:
    --1
    - Two row sources are small;
    - One row source is small and bigger row source has an index.
    --2
    NESTED LOOP returns first row faster (returns immediately) because SORT MERGE
    JOIN needs to sort and HASH JOIN needs to create a hash table.
    --3
    --The optimizer try use the smallest row source as driving table: can be table1
    --or table2. Use LEADING hint to specify the driving table.
    USE_NL(table1 table2)
    /*+ LEADING(tab2) USE_NL(tab1 tab2) */ --specify tab2 as the driving table
    --4
    The outer loop of a nested loop can itself be a row source generated by a
    different nested loop. Oracle implement it for nested loop joins since 11g to
    reduce overall latency for physical I/O.
    --5
    Firstly optimizer for each row from DRIVING table reads each row from INNER table
    and if values matched then save that row from DRIVING table + ROWIDs from INNER
    table in memory. Secondly using these ROWIDs join rows from INNER table.

Task 5.3. Join methods. Part 3.
    1. List typical reasons for SORT MERGE JOIN.
    2. What are two operations includes? How is works SORT MERGE JOIN?
    3. What hint is used for SORT MERGE JOIN?
    4. Which memory area uses for sorting and what about cost?
    5. Why sometimes you may not see sorting in the execution plan?
Solution:
    --1
    - Two row sources are big and the join condition between two tables is not an
    equijoin;
    - Because of sorts required by other operations, the optimizer finds it cheaper
    to use a SORT MERGE JOIN;
    - When the hash table does not fit completely in memory - use SORT MERGE JOIN.
    --2
    SORT JOIN - always sorts unsorted joining keys from row sets. MERGE JOIN -
    joins row sources basing its start position on the match made in the previous
    iteration. For each row in the first data set, the database finds (since next row
    after previous matching) a starting row in the second data set, and then reads
    the second data set until it finds a nonmatching row.
    --3
    USE_MERGE(table1 table2)
    --4
    Uses sort area in PGA and if not enough size, then Oracle write sorted data into
    the disks (it is increases cost).
    --5
    - Data already sorted;
    - If an index exists, then the database can avoid sorting the first data set.
    However, the database always sorts the second data set, regardless of indexes.

Task 5.4. Join methods. Part 4.
    1. List typical reasons for HASH JOIN.
    2. When is a HASH JOIN most cost effective?
    3. How is works HASH JOIN?
    4. What hint is used for HASH JOIN?
    5. List typical reasons for CARTHESIAN JOIN.
    6. When is a CARTHESIAN JOIN most cost effective?
    7. How is works CARTHESIAN JOIN?
Solution:
    --1
    - Row sources are large, have only equijoin and not sorted.
    --2
    Most cost effective when the smaller data set fits in memory. Because the hash
    table is in the PGA, Oracle can access rows without latching them in buffer
    cache.
    --3
    The database performs a full scan of the smaller row source, called the 'build
    table', and then applies a hash function to the join key in each row to build a
    'hash table' (stores rows with ROWID and hash; one hash (if mulltiple rows have
    same hash) can include multiple ROWIDs - called "slot") in the PGA. For second
    table uses typically full scan and for each row applies same hash function to the
    join column(s) and checkes this hash against the hash table. If hashes equal then
    Oracle, using ROWID from hash table, finds row in first table and joins row from
    second table.
    --4
    USE_HASH(table1 table2)
    --5
    - No join condition exists.
    - The ORDERED hint specifies a table before its join table is specified: uses
    ORDERED TABLE1, TABLE2, TABLE3, but no join condition connects TABLE1 and TABLE2.
    --6
    Not effective.
    --7
    The optimizer joins every row from one data source with every row from the other
    data source.

Task 5.5. Join types. Part 1.
    1. How to works SEMIJOIN?
    2. What are constructions can be used with SEMIJOIN? What is construction
    preferable replace by another construction for using SEMIJOIN?
    3. List join methods that can be used with SEMIJOIN. Which is method generally
    used? What are hints force Oracle to use SEMIJOIN?
    4. What phrase can you see in execution plan if use SEMIJOIN?
    5. How to works ANTIJOIN?
    6. What are constructions can be used with ANTIJOIN? Explain difference between
    these constructions.
    7. List join methods that can be used with ANTIJOIN. Which is method generally
    used? What are hints force Oracle to use ANTIJOIN?
    8. What phrase can you see in execution plan if use ANTIJOIN?
    9. List join types that can be used with HASH JOIN.
Solution:
    --1
    For each row from DRIVING table Oracle match each row from JOINING table until
    first match. After first match Oracle returns matched row immediately without
    matching remaining rows from JOINING table.
    --2
    IN and EXISTS can be used with SEMIJOIN. Replacing IN by EXISTS is recommended.
    --3
    SEMIJOIN can be used with all join methods. Generally uses the NESTED LOOPS
    method. Place hints into EXIST subquery: HASH_SJ (hash semijoin), MERGE_SJ (merge
    join semijoin), and NL_SJ (nested loop semijoin).
    --4
    We can see, for example, NESTED LOOPS (SEMI).
    --5
    For each row from DRIVING table Oracle match each row from JOINING table until
    first match (if matched, then immediatelly start match next row from DRIVING
    table against all rows from JOINING table) or if all rows from JOINING table does
    not match with row, then returns this nonmatched row from DRIVING table in
    result.
    --6
    NOT IN, NOT EXISTS: produces different results if columns from condition have
    NULL.
    --7
    Generally uses the NESTED LOOPS method, also optimizer may choose another method.
    Use in subquery (in inner row source) hints: HASH_AJ, NL_AJ, MERGE_AJ.
    --8
    We can see, for example, NESTED LOOPS (ANTI).
    --9
    All equijoin types.

: Category: Overviews; Last Updated: 01 October 2024; Hits: 372

Overview of Oracle architecture and performance tuning. Part 1.

Most Read

Latest Posts

Added Overview of Oracle architecture and performance tuning. Part 2.

Website updated

New chapter added to Overview of Oracle architecture and performance tuning. Part 1.

Added new info to Overview of Oracle architecture and performance tuning. Part 1.

New overview about Oracle architecture and performance tuning. Part1.

Minor updates for Oracle articles

Website updated

Minor update for all Oracle articles

Added overview about group, analytical, and hierarchical queries

Site description