Short overview about most popular topics as "task-answer" format.
CHAPTER 1. "Oracle DB architecture"
CHAPTER 2. "Tuning. Basic concepts of tuning"
CHAPTER 3. "Tuning. Statistics. Execution plan."
CHAPTER 4. "Table and index scan"
CHAPTER 5. "Join methods and types"
CHAPTER 1. "Oracle DB architecture"
Task 1.1. Architecture. Part 1.
1. What is a user process? How many user processes can be?
2. What is a server process? How many server processes can be?
3. What memory area does the server process work with? What about this area and
user?
4. Where are stores execution plans? Explain.
5. How can we name the memory areas from points 3 and 4? Why are these areas
used? What kind of data is stored in these areas?
Solution:
--1
User process - is a process creates by Oracle to run user`s application, send
SQL commands to the database and get data back. Can be multiple user processes.
--2
Server process - after user process send commands to the database server, the
server takes the control and pushes the statement to the Program Global Area
(PGA). Can be multiple server processes.
--3
PGA (Program Global Area or Process Global Area or Private Global Area) - memory
area that is dedicated to a single user. Every user has and can read only its own
PGA. PGA store specific SQL (for your session) and other data to be executed.
Can include multiple Private SQL areas.
--4
In 'shared SQL area' from SGA (System Global Area or Shared Global Area). SGA is
shared by all of the users. Oracle check same SQL statement and instead create
same execution plan takes existing execution plan from 'shared SQL area'.
--5
SGA and PGA - memory caches. Uses for performance because reading from memory is
faster than from disks. In PGA stores private or session-specific data, in SGA -
common or shared data.
Task 1.2. Architecture. Part 2.
1. What is include SHARED POOL? Explain these elements.
2. What is a BUFFER CACHE? Explain it.
3. What is a REDO LOG BUFFER? Explain it.
4. What is JAVA POOL and STREAMS POOL?
5. Explain disk areas and operations with memory areas: BUFFER CACHE, PGA, REDO
LOG BUFFER.
Solution:
--1
SHARED POOL include:
- DATA DICTIONARY CACHE (ROW CACHE). This cache have information (from data
dictionary views, that copied into this cache for faster reading) about the
database, its structures, and its users for checking correctness of query;
- RESULT CACHE. Store result of queries if RESULT_CACHE clause/hint defined. It
is mean if run same query again (and data in table not changed) then result
return immediately from RESULT CACHE. If data changed then remove previous result
from RESULT CACHE and store new;
- LIBRARY CACHE. Can include multiple Shared SQL areas. Can delete unused,
create new or increase size of one of Shared SQL areas;
- OTHER.
--2
BUFFER CACHE - is a memory area that stored data blocks for a short period time.
When you run query or make DML-operation the related data block is checked if it
is in the buffer cache or not. If not, it is read from the disk and written into
buffer cache. And after the related operation is done on that block, that block
is written into the disks again. So Oracle use buffer cache for performance, data
concurrency and consistency.
--3
Redo log buffers keep the initial status of the changed values with some specific
info about these values. This buffer allows turn back values to its initial
status using redo log data, for example when system failure.
--4
Java Pool has the Java classes that the database needs to use. Streams pool
handles the streams.
--5
- Disk area with data. Include all database objects: tables, packages, ... When
read data from disks or change table (or any operations) then this data stored in
BUFFER CACHE. And then stored back to disks after commit changes.
Disk area can write data to PGA, but PGA can`t write data to disks directly.
- Redo log disk area. Redo log data (files) from redo log buffer can be stored
(when buffer overflow) in the redo log disk area and deleted from redo log
buffer.
Task 1.3. Database Data Blocks.
1. How does Oracle store a table in a database?
2. What mean block? What does it consist of? How much block can be?
3. How much rows can be in a block?
4. Which size block can have? What a default size? When specifies size of block?
5. What does the block include? Explain each element.
6. When we can specify size of spaces after rows? How does size affect
performance?
7. How to Oracle finds a rows?
8. What row area include?
Solution:
--1
Oracle stores table in multiple blocks.
--2
It is a logical unit that consists of multiple Operating System blocks. Can be
millions or billions blocks.
--3
A block can have: whole table; couple rows of a table; rows of different tables
when multiple tables are clustered.
--4
2KB - 32KB. Default - 8KB. Size specifies by DBA on the database instalation.
--5
- Block header: block type information (row block or index block); table
information; row directory - contain addresses (ROWID) of each row in this block.
- Rows area (or index area if store index) with spaces after each row. If row`s
size increases, then data writes in these spaces. If space`s size not enough, but
block have enough size, then row deleted from here and written into the big space
area. If in block not enough space, then row written into another block.
- Extra space area after all rows. This area for new inserts.
--6
In PCTFREE (% (suppose 20%) free space of block for only future UPDATEs, another
80% for updates and inserts) or PCTUSE (max filled % of block for allows INSERTs)
parameters. Size = 0 decrease a performance because may be changed the place of
the row.
--7
Find block with row, then go to address of that row using ROWID.
--8
Row area include:
- Row header:
- Row overhead;
- Quantity of columns in that row;
- Cluster key ID (if clustered);
- ROWID of chained row pieces (if any);
- Column data (for each column):
- Column length (how many bytes will read from column, Oracle understand
start and end of reading);
- Column value.
Task 1.4. PGA.
1. Which are memory areas include in PGA? Explain what they store.
2. What operations are stored and performed in PGA?
3. How does PGA size affect performance? How to specify it?
Solution:
--1
1) SESSION AREA. Database creates a session when user connects to the database.
This area stores: session information of each user, session variables, login
information, session status, etc. For each connection creates area and do not
forget close unnecessary connections (in Java, .NET apps).
2) PRIVATE SQL AREA:
- persistent area - stores the bind variable values used in implicit or explicit
cursors (cursor opens for each query) and removed them when the cursor is closed;
- runtime area - stores the execution state information (did we start reading
the tables, or what percentage we read till now, or did we finish reading, etc.);
3) CURSOR AREA stores the information of the cursors.
4) SQL WORK AREA used to operate the data returned from the discs. Consist of:
- sort area. Data read from the discs is sorted (if query have order by, group
by, roll-up or window commands);
- hash join area. For hash join operations with data.
- bitmap merge area. For bitmap merge operations with data.
- bitmap create area. For bitmap create operations with data.
--2
All the session-specific operations are stored in the PGAs. All the
session-specific data operations are performed in the PGAs.
--3
Small size decrease performance. Big size increase. Specify PGA size manually or
can let Oracle to automatically change it (recommends).
Task 1.5. SHARED POOL.
1. Which area contain SHARED POOL? For which sessions SHARED POOL visible?
2. What does the DATA DICTIONARY CACHE store and perform?
3. What does the RESULT CACHE store and perform? What happen if reach maximum
memory?
4. What does the LIBRARY CACHE store and perform?
5. How to Oracle perform query using caches from points 2, 3, 4?
Solution:
--1
SGA. For all sessions.
--2
Data dictionary cache stores the definitions of the database objects and their
permissions for checking columns, tables, privs, etc.
--3
- Stores result of commonly user queries and use this result instead reading from
disks.
- Stores results and parameters of functions. Return this result if function
calls with these parameters again.
When it reaches to the maximum memory, it deletes the least used one and inserts
the new one. It can be done by Oracle automatically or programmer can command
do not store result of query.
--4
Include Shared SQL Area where stores execution plans. Instead of directly reading
from disks Oracle create execution plan how to read data from the disks, BUFFER
CACHE. Oracle use stored execution plans and it is improve performance.
In addition to the queries and their execution plans the LIBRARY CACHE stores the
procedures and packages, and the control structures like locks.
--5
In SHARED POOL Oracle checks query in DICTIONARY CACHE. If query is proper and
if execution plans exists in Shared SQL Area then Oracle go to RESULT_CACHE and
return result if exists. If result not exists, then Oracle go to disc or BUFFER
CACHE using this plan. If LIBRARY CACHE not exists execution plan, then Oracle
create new and go to disk or BUFFER CACHE with new plan.
Task 1.6. BUFFER CACHE.
1. Which memory area in SGA is larger than the BUFFER CACHE?
2. What does the BUFFER CACHE store? Is the data private for each user or is it
available to all users?
3. How to Oracle work with BUFFER CACHE? What processes are used? How to Oracle
clears BUFFER CACHE? Why Oracle use BUFFER CACHE? What is mean 'dirty blocks'?
4. What is mean in-memory database?
Solution:
--1
Nothing. BUFFER CACHE is a largest area in SGA.
--2
Stores the copies of the most recently used and the most touched ones data (or
index data) blocks that are read from the database discs.
Data from BUFFER CACHE available for all users.
--3
When user run a query then Oracle, using 'server process', read data from disk
and copy into the BUFFER CACHE. If any user want reach this data (this query, or
may be one table or index from query) again, then Oracle read data from BUFFER
CACHE. If some of data not exists in BUFFER CACHE, then Oracle read from disks
only absent blocks.
Oracle can write data from BUFFER CACHE to the disks using 'writer process', for
example if you make UPDATE statement on a block in BUFFER CACHE and when you
COMMIT, then these blocks are written to the disks. This changed and not commited
blocks called as "dirty blocks".
Oracle deletes first stored block if it is not frequently read from the cache.
Otherwise it deletes the least used block of the oldest.
Oracle use BUFFER CACHE because:
- reading from BUFFER CACHE faster then reading from disks; also reading from
BUFFER CACHE uses if RESULT CACHE have not saved result;
- in BUFFER CACHE makes faster join operations (nested loop, hash join, etc.);
- using BUFFER CACHE makes faster writing all the changed blocks to disks in one
step, not writing one by one.
--4
Databases with big memory cache that allowed store required data in memory.
Task 1.7. REDO LOG BUFFER.
1. What is a REDO LOG BUFFER for?
2. Explain how to work REDO LOG BUFFER and which data it is operate? What
processes are used?
3. What is mean circle buffer?
4. How it influence to performance?
Solution:
--1
For handles the data loss when system crashes or something that. It helps recover
the database or a table to a previous state using redo log data.
--2
When we change table data by an insert, update, delete, create, alter, drop, the
Oracle creates REDO LOG ENTRIES (in REDO LOG BUFFER) for some time - information
about the changes to the database.
When COMMIT, or REDO LOG BUFFER gets full, or buffer gets one-third full or every
3 seconds the 'LOG WRITER PROCESS (LGWR)' gives a SCN (system change number) to
these REDO LOG ENTRIES and write them from REDO LOG BUFFER to the REDO LOG FILES
in the REDO LOG DISCS.
Oracle can recover the database, table or period of time to its previous state
from REDO LOG FILES using 'server process'.
If make rollback then changes from REDO LOG FILES from disk will be deleted.
--3
When REDO LOG BUFFER is full, then Oracle starts writing to it starting from
beginning.
--4
Each creation of REDO LOG ENTRY have a cost.
Task 1.8. UNDO.
1. If use UPDATE what happens with changed data before commit and after. What is
mean UNDO data? What can other users see?
2. Explain logical containers that include blocks.
Solution:
--1
Before commit:
- original (old, before modifying) data copies to UNDO segment in UNDO
tablespace, it is called 'undo data'. Undo blocks uses for ROLLBACK, read
consistency (other users see only this 'undo data', but we can see changed
blocks for your session from BUFFER CACHE).
- another copy of this data written into the BUFFER CACHE and all the changes
are done in these blocks (the changes applied to the REDO LOG BUFFER before
the BUFFER CACHE).
After commit:
- original (old, before modifying) data stores in UNDO segments some time for
flashback feature.
In both cases new change details (new data) stores in REDO LOG BUFFER. Other
users can see new data after commit.
--2
Blocks > Extents > Segments > Tablespaces
Task 1.9.1. Architecture of Oracle database storage.
1. What is mean database storage. Explain storage elements and write extensions
and what are they store.
Solution:
--1
Storage is a disks. Storage include:
1) Control files - information about database physical structure. Without this
files cannot access to data in database. Extension is ".CTL".
2) Data files - stores tables, procedures, etc. ".DBF".
3) Online redo log files - stores redo log entries received from REDO LOG BUFFER.
".LOG".
4) Archived redo log files - to provide more space in Online redo log files redo
log entries constantly moves here. ".LOG".
5) Backup files - files on backup disks stores copies of actual data. Database
can work with it when accident occurs or use it for recovery. Should include:
".DBF", ".CTL", ".LOG" files or may be used data from dump file - ".DMP".
6) Parameter file - stores the configure data of the instance. "SPFILE.ORA" - on
server, PFILE (for example "INIT.ORA") - on client.
7) Password file - passwords of the admin users: sysdba, sysoper and sysasm.
"PWD.ORA".
8) Alert log (stores the log of messages and errors occured in the database) -
"ALERT.LOG" and trace files (stores the error trace info) - ".TRC".
Task 1.9.2. Logical and physical database structure.
1. List and explain elements of database logical structure (indicate physical
structure).
Solution:
--1
Oracle data block (physically stores in different OS blocks) size 2-32KB.
1) Extent - includes several consecutive data blocks. Stores specific type of
information.
2) Segment - includes multiple extents. Used for storing big data (tables,
indexes, etc). Can be: data segments, index segments, undo segments, temporary
segments (created to provide a temporary work area for the SQL statements to
complete the executions). Data and index can`t be stored in the same segments.
3) Tablespace - includes multiple segments. Two types:
- TEMPORARY. Contains temporary data (stores in the temporary files) stored only
for duration of a session.
- PERMANENT (physically stores in ".DAT", ".DBF" files, these files stores in OS
blocks). Contains persistent schema objects. UNDO tablespace never have 0
segments.
Database must have at least 2 tablespaces: SYSTEM, SYSAUX. Benefits:
- can make special operations for different tablespaces (enable online, offline
status; make one for recovery; export, import data of another tablespace);
- can assign user to tablespace with special size;
4) Database - includes multiple tablespaces.
Schema in database is logical container of an objects created by an user.
CHAPTER 2. "Tuning. Basic concepts of tuning"
Task 2.1. Tuning. Part 1.
1. What does 'runaway query' mean?
2. After what events makes tuning?
3. List the reasons of performance loss and write examples.
4. List characteristics of bad SQL and explain.
5. What is mean DB TIME?
Solution:
--1
Runaway query is a query whose execution time is taking longer than the execution
time estimated by the optimizer.
--2
- after checking the top consuming queries frequently;
- after any compliants of bad performance.
--3
1) Structural changes (change column, drop index);
2) Changes at the data (table become big or small);
3) Application changes (changed varchar2 parameter to number, stopped use bind
variables);
4) Old statistics (changed gather statistic from one per day to one per week);
5) Database upgrades (new database upgrades);
6) Changed database parameters (DBA increase or decrease PGA, SGA);
7) Operating system and hardware changes (decreased size of memory).
--4
1) Unnecessary parse time (validate query, then allocate PRIVATE SQL AREA for
this statement, then generating the execution plan);
2) Unnecessary I/O operations (read 100 blocks instead of 1 to get 1 row, for
example have not index);
3) Unnecessary CPU time (unnecessary joins (or wrong methods), sorting);
4) Unnecessary waits (CPU wait, network wait).
--5
Time on wait (CPU) + Time on execution.
Task 2.2. Tuning. Part 2.
1. Explain storing data using VARCHAR2.
2. Explain stages of processing SQL statement. What is mean (write synonyms) SOFT
and HARD PARSE? When does the database performs HARD PARSE always? Explain how
the SQL statement is stored and where.
Solution:
--1
Data in VARCHAR2 occupy memory as defined, for example 2 bytes, not 4000 bytes.
Also it is true for NUMBER, CLOB, etc.
--2
1) Syntax check - if incorrect, then return an error immediately.
2) Semantic check. Checks info about tables, columns in DATA DICTIONARY CACHE.
Return an error immediately.
3) Privilege check. Checks in DATA DICTIONARY CACHE user privileges on objects
from query. Return an error immediately.
4) Allocate PRIVATE SQL AREA. Creates this area in PGA for stores query result
(cursor). Stores rowid of execution plans stored in LIBRARY CACHE. If SQL
statement runs again, then existing execution plan will be found rapidly by rowid
(default keep 50 rowid`s) in the LIBRARY CACHE (instead of search hash).
5) Finding existing execution plan in SHARED SQL AREA from SHARED POOL.
SQL statement does not stores directly in SHARED SQL AREA (because it finite).
For that SQL code Oracle creates hash value and store it in LIBRARY CACHE.
Remember that small changes in query (for example swap columns in SELECT) makes
another hash.
5.1) SOFT PARSE (LIBRARY CACHE HIT) - if 1, 2, 3 stages passed and in stage 5
found execution plan.
5.2) HARD PARSE will be if related execution plan is not found. Database always
performs a HARD PARSE of DDL. That have stages:
- allocate memory in SHARED SQL AREA for write new execution plan. It have cost,
because have some processes or if SHARED SQL AREA is full then need delete aged
one;
- optimization. Optimizer calculates several execution plans (if current best
cost is small, the oprimizer ends explores alternative plans, because futher cost
improvments is not significant) and select best in one second (time can be
increased if change optimization level);
- row source generation process - gets the execution plan and generates
operations (how to get data from tables, join, sort, etc.) from plan.
6) Execute SQL code and return result to the user.
Task 2.3. Optimizer.
Suppose that you have query:
SELECT * FROM MY_TABLE WHERE MY_COLUMN = 2;
1. What is the Optimizer?
2. List the steps that the optimizer performs when executing the query. What is
mean selective query?
3. List an optimizer components. Explain. When is execution plan saved in memory?
4. What is mean cost basically?
Solution:
--1
Optimizer is a software used to find the best execution plan - those plan allows
execute query with minimum cost.
--2
1) Checks schema indexes from query.
2) Generates possible access paths (use indexes or read whole table).
3) Checks statistics for choose the best plan. It is mean checks statistics of
MY_TABLE for MY_COLUMN to determine selective MY_COLUMN or not. Selective query
is a query that returns 1%-4% of table and can be used index. If returns 25% of
table - unselective (will be read whole table).
4) Use plan for row source generator.
--3
1) Query transformer. Transform query to the best query for better performance.
2) Estimator. Estimates cost based on statistics from data dictionary (it is not
real cost, because statistics gathered in specific time, and data can be
changed).
3) Plan generator:
- generates plans using different access paths, join methods, join orders;
- for each of these plans estimates cost using estimator (from point 2);
- at the end chooses best plan with lowest cost and passes that plan to the
row source generator.
After row source generation execution plan stores in the SHARED SQL AREA.
--4
Cost = CPU usage + memory usage + disk usage for reading 1 block and then
multiply it with blocks to be read.
Task 2.4. Query transformer.
Do next operations:
DROP TABLE TAB1;
CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
COL2 NUMBER UNIQUE,
COL3 VARCHAR2(100));
BEGIN
FOR i IN 1..1000000 LOOP
INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
END LOOP;
COMMIT;
END;
DROP TABLE TAB2;
CREATE TABLE TAB2 (COL2 NUMBER);
INSERT INTO TAB2(COL2) VALUES(500000);
INSERT INTO TAB2(COL2) VALUES(600000);
INSERT INTO TAB2(COL2) VALUES(700000);
COMMIT;
Then do next:
1. What in WHERE clause of query can prevent use index or reduce performance?
Write examples. Also write examples how to correct that query.
2. How to optimizer choose best query from point 1? When does the optimizer not
transform a query?
Solution:
--1
--1.1. Using OR in where clause may not use index:
SELECT *
FROM TAB1
WHERE COL1 = 500000
OR COL2 = 1200000
--use UNION ALL for activate indexes
SELECT * FROM TAB1 WHERE COL1 = 500000
UNION ALL
SELECT * FROM TAB1 WHERE COL1 <> 500000 AND COL2 = 1200000
--1.2. Nested query can reduce performance
SELECT * FROM TAB1
WHERE COL2 IN (SELECT COL2 FROM TAB2)
--Use JOIN statement for better cost
SELECT t1.*
FROM TAB1 t1
JOIN TAB2 t2
ON t1.COL2 = t2.COL2
--2
Optimizer calculate costs of queries and choose best query. Optimizer does not
transform query if a table is very small, instead it just uses full table scan.
Task 2.5. Selectivity. Cardinality.
Do next operations:
DROP TABLE TAB1;
CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
COL2 NUMBER UNIQUE,
COL3 VARCHAR2(100));
BEGIN
FOR i IN 1..1000000 LOOP
INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
END LOOP;
COMMIT;
END;
Then do next:
1. What is selectivity? What is high and low selectivity? Where is it used?
Where is selectivity displays in execution plan?
2. What is cardinality? Where is it used? What are common reasons of estimating
cardinality incorrectly?
3. How to Oracle estimates selectivity and cardinality? What is dictionary`s view
helps us calculates it? How to estimate selectivity and cardinality for TAB1:
- WHERE COL1 = 5;
- WHERE COL1 = 5 AND COL2 = 8;
Solution:
--1
Selectivity = number of rows returning from the query / total number of rows.
Selectivity close to 0 - high (good) selectivity, if close to 1, then low (bad).
Selectivity uses in: estimating in I/O cost; influence on sort cost (if returns
many rows, then sort cost will be high).
Selectivity is not visible in execution plan.
--2
Cardinality is the number of rows returned by each operation in an execution
plan. Optimizer uses cardinality for choosing join method, estimate sort cost.
Missing statistics, inaccurate statistics or complex predicates.
--3
--At the points 1 and 2 is exact selectivity and cardinality
Selectivity = 1 / optimizer calculates distinct values for column from WHERE.
--If a histogram exists on the predicate column, then the estimator uses it
--instead of the number of distinct values
SELECT COLUMN_NAME, NUM_DISTINCT --NULL values ignores for calculate NUM_DISTINCT
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'TAB1'
--Each column can have different selectivity (high or low).
-- WHERE COL1 = 5;
Selectivity = 1 / COUNT_OF_DISTINCT_VALUES_FROM_COL1
-- WHERE COL1 = 5 AND COL2 = 8; --need multiply selectivities
Selectivity = (1 / COUNT_OF_DISTINCT_VALUES_FROM_COL1) *
(1 / COUNT_OF_DISTINCT_VALUES_FROM_COL2)
Cardinality = total number of rows * selectivity.
Task 2.6. Cost. Row source generator.
1. What is mean estimation cost, for example if cost = 100?
2. What is use estimator to estimate the cost?
3. Write formula of calculating the cost. Explain elements.
4. What other costs are calculated? What is the most expensive cost?
5. What does creates a plan generator and a row source generator?
6. What is produce row source generator? What is mean row source?
Solution:
--1
Oracle must read 100 single blocks to execute our query.
--2
Disk I/O usage, CPU usage, memory usage.
--3
Cost =
(
(Single-block I/O Cost = number_of_single-block_reads * single-block_read_time)
+
(Multiblock I/O Cost = number_of_multiblock_reads * multiblock_read_time)
+
(CPU Cost = number_of_CPU_cycles / CPU_speed)
) / single-block_read_time
Single-block reading - reads block by block. Multiblock reading - reads multiple
blocks (when full table read or read a range of table) at a time to speed up the
reading. Number of CPU cycles - number of cycles for sorting, hashing, caching
for our execution. CPU Cost may vary if CPU busy.
--4
Memory reading cost. I/O cost is a haviest cost.
--5
Plan generator generates best plan and transfer it to row source generator. Row
source generator creates final execution plan.
--6
Generator generates iterative execution plan with row source tree. Tree includes
step by step row sources - areas with row sets: table, view, result of join or
grouping operation. Each step returns a row set, and that row set is used by the
next step.
Task 2.7. Steps for analyze bad performance query.
1. What are steps can be included in collecting data? Explain.
2. List tools for collecting performance data.
3. What are steps can be included in analyze of the query text?
4. List tools for get the execution plan.
5. What are steps can be included in analyzing execution plan?
6. What could be possible solutions of a bad performance query?
Solution:
--1
1) Examine execution plan;
2) Information of objects in the query: tables, indexes, columns, types, ...;
3) Collect both statistics: object and system. If old then update it or use
dynamic sampling for better plan. Or not our query, but another queries or
applications may highly load CPU;
4) Checks histograms. Histogram allows understand how many rows really contains
our distinct value in column: 3%, or 10%, or 60% of table;
5) Parameter settings. Changed by DBA.
--2
SQLT, DBMS_STATS, TKPROF, AWR report, etc.
--3
1) Check unnecessary business data, especially if PGA is small;
2) Check the predicates;
3) Check constructions: subqueries, joins, IN, OR.
--4
SQL trace, TKPROF, V$SQL_PLAN, DBMS_MONITOR, AWRSQRPT.SQL, etc.
--5
1) Checks access paths. Read whole table, use index, index type, ...;
2) Checks join order, join types. Can replace tables, change join type;
3) Compare actual and estimated number of rows. Update statistics for proper
selectivity, use hints for better plan;
4) Find steps where cost and the logical reads have big difference.
--6
1) Updating statistics. Update manually or makes another update frequency (per
day, per 12 hour, etc.);
2) Use dynamic statistics;
3) Create or re-create indexes, choose proper index;
4) Create index-organized tables;
5) Using hints.
6) Others (eliminate implicit data type conversions, creating function-based
indexes, use parallel execution, etc.).
CHAPTER 3. "Tuning. Statistics. Execution plan."
Task 3.1. Statistics. Part 1.
1. What is difference between explain plan and execution plan?
2. List types of statistics: explain purpose and where stores.
3. What does the optimizer statistics include?
4. How to Oracle collects optimizer statistics automatically and when by default?
Solution:
--1
Explain plan - based on estimates using statistics. Execution plan can be vary
after execute query if estimates wrong and after execute will be written to
SHARED SQL AREA.
--2
1) Optimizer statistics - created for the purposes of query optimization and are
stored in the data dictionary.
2) Performance statistics - visible through V$ views.
--3
Table statistics:
Number of rows
Number of blocks
Average row length
Column statistics:
Number of distinct values (NDV) in column
Number of nulls in column
Data distribution (histogram)
Index statistics:
Number of leaf blocks
Levels
Clustering factor
System statistics:
I/O performance and usage
CPU performance and usage
--4
Gathers missing and stale statistics with the job GATHER_STATS_JOB on all objects
in the database. Every night from 10 P.M. to 6 A.M. and all day on weekends.
Task 3.2. Statistics. Part 2.
Do next operations:
DROP TABLE TAB1;
CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
COL2 NUMBER UNIQUE,
COL3 VARCHAR2(100));
BEGIN
FOR i IN 1..1000000 LOOP
INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
END LOOP;
COMMIT;
END;
Then do next:
1. What is default value of CPU single block reading?
2. How to collect system statistics manually and for which user allowed by
default? Which object stores information about system statistics and explain why
some values can be NULL?
3. Lists procedures to gather optimizer statistics manually for: database, SYS
schemas, one schema, one table, one index. Explain it. Which user can run it?
Solution:
--1
12 milliseconds.
--2
--SYS user can execute it by default. Do not forget connect by SYS to proper
--container, for example: ALTER SESSION SET CONTAINER=ORCLPDB;
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('Start');
--some values from PVAL1, PVAL2 can be NULL if we not add additional parameters
--in GATHER_SYSTEM_STATS procedure
SELECT * FROM SYS.AUX_STATS$;
--3
--Gather statistics (may be a long time) available by SYS user:
--for all objects in all schemas in the database
EXEC DBMS_STATS.GATHER_DATABASE_STATS;
--for schemas SYS and SYSTEM (these statistics for database itself)
EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;
--for a one schema
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME => 'HR');
--for a one table
--CASCADE => TRUE - gather statistics for all indexes of table
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'HR', TABNAME => 'TAB1',
CASCADE => TRUE);
--for a one index
EXEC DBMS_STATS.GATHER_INDEX_STATS(OWNNAME => 'HR', INDNAME => 'IND1');
--For HR. For certain object (not recommended, for backward compatibility)
ANALYZE TABLE TAB1 COMPUTE STATISTICS;
Task 3.3. Statistics views.
1. List dictionary views to see optimizer statistics. Explain for which objects
these statistics and list important columns for table statistics.
2. List views of system and session statistics.
Solution:
--1
SELECT TABLE_NAME,
LAST_ANALYZED --last date of DBMS_STATS.GATHER_TABLE_STATS for table
FROM SYS.DBA_TABLES --tables
SELECT PARTITION_NAME,
PARTITION_POSITION,--plan displays these numbers that server read in step
NUM_ROWS, --quantity of rows in table or partition
BLOCKS, --quantity of blocks in table or partition
AVG_ROW_LEN --the average row length in bytes, including row overhead
FROM SYS.DBA_TAB_STATISTICS
SELECT * FROM SYS.DBA_TAB_COL_STATISTICS--table columns
SELECT * FROM SYS.DBA_INDEXES --indexes
SELECT * FROM SYS.DBA_CLUSTERS --clusters
SELECT * FROM SYS.DBA_TAB_PARTITIONS --data of the partitions based on tables
SELECT * FROM SYS.DBA_IND_PARTITIONS --data of the indexes based on partitions
SELECT * FROM SYS.DBA_PART_COL_STATISTICS--column statistics for partitions
--2
SELECT * FROM SYS.V$SYSSTAT --system
SELECT * FROM SYS.V$SESSTAT --session
Task 3.4. Generating execution plan.
Do next operations:
DROP TABLE TAB1;
CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
COL2 NUMBER UNIQUE,
COL3 VARCHAR2(100));
BEGIN
FOR i IN 1..1000000 LOOP
INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
END LOOP;
COMMIT;
END;
Then do next:
1. How to generate execution plan manually for 'SELECT * FROM TAB1' using EXPLAIN
PLAN? Explain where we can see result and what main parts in here?
2. Explain using AUTOTRACE for point 1. What privs user must have? How to display
AUTOTRACE status? How to disable AUTOTRACE? What are views uses in statistics?
When AUTOTRACE may work incorrectly?
Solution:
--1
--Plan table is a GLOBAL TEMPORARY TABLE
EXPLAIN PLAN FOR SELECT * FROM TAB1; --explain plan will be saved into PLAN_TABLE
--Result, complex for reading
SELECT * FROM PLAN_TABLE;
--Readable result based on selected rows from PLAN_TABLE. Here HASH value +
--binded explain plan, from LIBRARY CACHE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
--Also we can assign ID to query and find it in PLAN_TABLE
EXPLAIN PLAN
SET STATEMENT_ID = 'ID1'
FOR SELECT * FROM TAB1;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(STATEMENT_ID => 'ID1'))
SELECT * FROM PLAN_TABLE WHERE STATEMENT_ID = 'ID1';
--Or we can put result into our created table (usually not used)
EXPLAIN PLAN
SET STATEMENT_ID = 'ID1'
INTO MY_TABLE
FOR SELECT * FROM TAB1;
--2
--Autotrace - generates: rows, explain plan (from PLAN_TABLE), actual statistics.
--User (for example HR) must have privs:
--GRANT SELECT_CATALOG_ROLE TO HR; GRANT SELECT ANY DICTIONARY TO HR;
--in SQL PLUS (in SQL developer use F6 after highlighting query, also choose
--'Fetch all rows' in SQL Developer preferences for Autotrace) you must run
--query (or DML) after these commands for return result:
SET AUTOTRACE ON; --produce result: rows, explain plan, statistics
SET AUTOTRACE ON EXPLAIN; --rows and explain plan
SET AUTOTRACE ON STATISTICS; --rows and statistics
SET AUTOTRACE TRACE; SET AUTOTRACE TRACEONLY; --explain plan, statistics
SET AUTOTRACE TRACE EXPLAIN; SET AUTOTRACE TRACEONLY EXPLAIN; --explain plan
SET AUTOTRACE TRACE STATISTICS; SET AUTOTRACE TRACEONLY STATISTICS; --statistics
SHOW AUTOTRACE; --display status
SET AUTOTRACE OFF; --Successful disabling if have not any messages after execute
SELECT * FROM V$STATNAME; --statistics names
SELECT * FROM V$MYSTAT; --statistics values
--AUTOTRACE may work incorrectly producing nonexact execution plan when use bind
--variables.
Task 3.5. Dynamic performance views.
Do next operations:
DROP TABLE TAB1;
CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
COL2 NUMBER UNIQUE,
COL3 VARCHAR2(100));
BEGIN
FOR i IN 1..1000000 LOOP
INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
END LOOP;
COMMIT;
END;
Then do next:
1. Where are the performance views stored?
2. List and explain what stores popular dynamic performance views (except
V$SQL_PLAN) using query 'SELECT * FROM TAB1 ORDER BY COL1 DESC' and demonstrate
how to find exact query in these views.
3. Explain V$SQL_PLAN using query 'SELECT * FROM TAB1 ORDER BY COL1 DESC' and
demonstrate how to find exact query in this view.
4. What is difference between PLAN_TABLE and V$SQL_PLAN?
Solution:
--1
In SHARED SQL AREA.
--2
--Views will be updated after executing query SELECT * FROM TAB1
--Statistics about memory usage for each executed query:
SELECT * FROM V$SQLAREA
WHERE UPPER(SQL_TEXT) LIKE UPPER('%SELECT * FROM TAB1 ORDER BY COL1 DESC%');
--Displays information about work areas (hash join area, sort area, etc.) used by
--SQL cursors and helps answer on next questions:
--What are the top 10 work areas that require the most cache area?
--For work areas allocated in AUTO mode, what percentage of work areas are
--running using maximum memory?
SELECT * FROM V$SQL_WORKAREA
WHERE SQL_ID = (SELECT SQL_ID
FROM V$SQLAREA
WHERE UPPER(SQL_TEXT) LIKE
UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'));
--For simply checking costs, parse counts, logical and physical reads, etc. This
--view can updates every 5 sec. and it is allows analyze a long-running query:
SELECT * FROM V$SQL
WHERE UPPER(SQL_TEXT) LIKE UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC');
--Provides execution statistics at the row source level (it is mean for each
--step) for each child cursor:
ALTER SESSION SET STATISTICS_LEVEL = ALL;
SELECT * FROM TAB1 ORDER BY COL1 DESC;
SELECT * FROM V$SQL_PLAN_STATISTICS
WHERE SQL_ID = (SELECT SQL_ID
FROM V$SQLAREA
WHERE UPPER(SQL_TEXT) LIKE
UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'));
--Memory usage statistics for row sources that use SQL memory (sort or hash-join)
--and displays comparison of optimizer estimates with the actual execution
--statistics for the row sources.
SELECT * FROM V$SQL_PLAN_STATISTICS_ALL
WHERE SQL_ID = (SELECT SQL_ID
FROM V$SQLAREA
WHERE UPPER(SQL_TEXT) LIKE
UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'));
--3
--Contains execution plans including the estimated statistics for each row
--source:
SELECT * FROM V$SQL_PLAN
WHERE SQL_ID = (SELECT SQL_ID
FROM V$SQLAREA
WHERE UPPER(SQL_TEXT) LIKE
UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'));
SELECT * FROM
TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
(SELECT SQL_ID
FROM V$SQLAREA
WHERE UPPER(SQL_TEXT) LIKE
UPPER('SELECT * FROM TAB1 ORDER BY COL1 DESC'))
));
--4
In PLAN_TABLE stores the estimated plan, in V$SQL_PLAN - actual plan.
Task 3.6. Reading execution plan.
Do next operations:
DROP TABLE TAB1;
CREATE TABLE TAB1 (COL1 NUMBER PRIMARY KEY,
COL2 NUMBER UNIQUE,
COL3 VARCHAR2(100));
BEGIN
FOR i IN 1..1000000 LOOP
INSERT INTO TAB1(COL1, COL2, COL3) VALUES(i, i*2, i||'text');
END LOOP;
COMMIT;
END;
Then do next:
1. What columns displays execution plan using DBMS_XPLAN.DISPLAY_CURSOR for table
with partitions? Explain each of them. What else include in execution plan except
columns? Explain.
2. What is a difference between access predicate and filter predicate?
3. In which directions to read the query plan? Which data will operate Oracle on
each step?
4. What is mean driving table? How to Cost Based Optimizator determines that
table?
5. What is cost can be if displayed cost for 3 steps = 15 + 1 + 0? Explain.
Solution:
--1
ID - row number of execution plan.
OPERATION - name of operation.
NAME - row source name (name of table, index, etc.).
ROWS - estimated cardinality (quantity of rows returns by operation).
BYTES - amount of bytes read from disk or memory.
COST (%CPU) - cost + % of CPU usage.
TIME - time to perform the operation.
PSTART - Oracle start read data from this partition id.
PSTOP - Oracle stop read data to this partition id.
Predicate information displays row condtitions from operation. Will be retrieved
rows only satisfy these conditions.
--2
Access - we only "access" relevant data (for example when using an index we
"access" only to interest rows). Filter - when we get more data (for example all
rows) than we need and then filter it.
--3
Right to left and bottom to top using indentation level. Oracle will operate by
data retrieved from previous step only.
--4
Is a first table in execution plan. CBO uses the statistics and data dictionary
to determine which table to be used as the driving table.
--5
15. 0 mean that cost is very small, but not zero. 1 also mean that cost small and
sometimes Oracle does not calculate small cost for final cost.
CHAPTER 4. "Table and index scan"
Task 4.1. Index types.
1. List types of indexes and it`s subtypes. Explain what does it store.
2. List attributes of indexes.
3. Explain structure of index types from point 1. What is mean index entry?
4. How to Oracle can read data between leafs?
5. Which indexes cannot be used with LOB column and which can be used?
Solution:
--1
B-TREE indexes:
- normal (default): stores actual value of column(s);
- function-based: contain result of the stored function based on column(s);
- index-organized table (IOT): stores all columns from table in index. Related
data gets from the index directly.
BITMAP indexes.
B-TREE and BITMAP indexes can be both function based.
--2
- Key compression - compress indexes;
- Reverse key (uses only with equality operator) - stores the column values as
reversed;
- Ascending - descending ordered indexes. Orders values of column(s) in index by
ASC or DESC.
--3
'B-TREE index' have: root, branches, leafs. Root and branches contain intervals
of value(s) of column(s). Leafs have rows, every row have index entry = value of
column(s) + rowid.
'BITMAP index' (uses for columns with low selectivity, can stores NULL, can also
reduce storage space compared to other indexes).
Root and branches are the same as in BTREE type, but leafs stores a key (value
of column) + interval (start ROWID and end ROWID). Also have bitmap column that
have values 1 or 0. 1 mean that row with ROWID from interval have the key,
0 - not. So when using that type, then index returns key, then retrieves batch
of ROWIDs for this key, then retrieves rows using ROWIDs.
--4
Oracle can read next leaf using double links between leafs.
--5
You cannot build a B-tree or bitmap index on a LOB column, but can create a
function-based or a domain index.
Task 4.2. Table and index access paths.
1. What is mean table or index access paths?
2. List and explain table access paths.
3. List and explain index access paths.
Solution:
--1
It is mean how to read data from: table or index.
--2
'Table access full' (Full Table Scan) - reads all rows from a table and then
returns only required rows.
'Table Access by Rowid' - optimizer returns row from block using ROWID (it is a
fastest way).
'Sample Table Scan' - retrieves a random sample of data from a table or complex
select statement (in a query you must manually instruct optimizer use it).
--3
'Index Unique Scan' - returns 1 rowid, all column(s) in WHERE must have "=" sign
and primary key (or unique constraints or unique index). It is fastest way after
table access by ROWID.
'Index Range Scan' - optimizer use it if one or more leading columns of an index
are specified in conditions. It will read index leafs starting from the leading
entry and continue reading using links between leafs.
'Index Full Scan' - scan all index data if need order result or if use GROUP BY
clause. Or does not read every block in the index, but processes all blocks until
find first leaf block.
'Index Fast Full Scan' - data gets from index directly if query have columns only
from index.
'Index Skip Scan' - optimizer use it if initial column of a composite index is
"skipped" or not specified in the query. Optimizer skips the blocks which do not
have any key for our query, and reads only the blocks which have the keys.
'Index Join Scan' - optimizer use it if a hash join of multiple indexes together
returns all columns requested by a query and all data is retrieved directly from
the indexes.
Task 4.3. Table access full (FULL TABLE SCAN). Table access BY ROWID. SAMPLE TABLE
scan.
1. List typical reasons for FULL TABLE SCAN.
2. How to instruct Oracle read 4 blocks at a same time?
3. List typical reasons for table access by ROWID.
4. How to named SAMPLE TABLE scan in execution plan? Explain two forms, what can
be faster?
Solution:
--1
1) Have not suitable index;
2) The query predicate applies a function to the indexed column;
3) A SELECT COUNT(*) query is issued, and an index exists, but the indexed column
contains nulls;
4) Low selectivity;
5) Table statistics are stale (table was small, but become big);
6) Table is small;
7) The table has a high degree of parallelism;
8) The query uses a full table scan hint;
9) In predicate uses not leading column from B-tree index.
--2
DB_FILE_MULTIBLOCK_READ_COUNT = 4;
--3
1) We can use ROWID in WHERE clause for fastest row access.
2) After a scan of one or more indexes.
--4
TABLE ACCESS SAMPLE.
--Two forms (SAMPLE BLOCK can be faster):
SELECT t1.* FROM MY_TABLE SAMPLE (1) t1 --database reads random 1% of rows
SELECT * FROM MY_TABLE SAMPLE BLOCK (1)--database reads random 1% of table blocks
Task 4.4. B-Tree Index Access Paths. Part 1.
1. List typical reasons for INDEX UNIQUE SCAN.
2. List rules for using INDEX UNIQUE SCAN.
3. For which types of index available INDEX RANGE SCAN?
4. List typical reasons for INDEX RANGE SCAN.
5. Explain sorting for INDEX RANGE SCAN.
6. When INDEX RANGE SCAN not be uses.
Solution:
--1
Want return 1 row from indexed column with unique values.
--2
1)SQL must have all columns from Primary Key or Unique Key indexes and have not
another index (for example non-unique) on these columns with column order as in
these Primary Key or Unique Key (or you can use INDEX() hint if have 2 or more
indexes in these columns to choose proper index).
2) All columns from index must used with '=' operator.
--3
B-TREE, BITMAP. Unique, non-unique. Function-based.
--4
One or more leading columns of an index are specified in conditions with '<',
'>' signs, LIKE, BETWEEN or use '=' operator for rows with duplicates. Even if
returned 1 row.
--5
If in GROUP BY, ORDER BY use indexed columns RANGE SCAN will not do any sort,
because index sorted (not important ASCENDING or DESCENDING index). But it should
not have NULLs.
--6
When condition begin since '%' , for example: WHERE MY_COLUMN LIKE '%A'
Task 4.5. B-Tree Index Access Paths. Part 2.
1. List typical reasons for INDEX FULL SCAN.
2. List typical reasons for INDEX FAST FULL SCAN.
3. For which indexes INDEX FAST FULL SCAN can be.
4. What is difference between INDEX FULL SCAN and INDEX FAST FULL SCAN?
Solution:
--1
1) When in ORDER BY clause we have only non-nullable columns from index and want
return all rows.
2) GROUP BY clause include only indexed columns and want return all rows.
3) If in query use a SORT-MERGE JOIN.
4) A predicate references on non-leading column in the index and have GROUP BY,
ORDER BY with columns from index.
5) No predicate is specified, but all of the following conditions are met:
- All columns in the table and in the query are in the index;
- At least one indexed column is not null.
--2
1) Query uses columns only from index and read data directly from index.
2) Use hint INDEX_FFS.
--3
B-TREE and BITMAP.
--4
1) After INDEX FULL SCAN can read from table also. INDEX FAST FULL SCAN reads
only from index.
2) INDEX FULL SCAN reads blocks sequentially. INDEX FAST FULL SCAN reads multiple
index leaf blocks at a time, but in unsorted order.
3) INDEX FULL SCAN usually slower than INDEX FAST FULL SCAN, but sometimes if
need sort then will be faster.
4) INDEX FULL SCAN can eliminate sorting, INDEX FAST FULL SCAN - cannot.
Task 4.6. B-Tree Index Access Paths. Part 3.
1. List typical reasons for INDEX SKIP SCAN.
2. List typical reasons for INDEX JOIN SCAN.
3. What is alternative name of INDEX JOIN SCAN? Explain joining two indexes on a
single table.
4. When will INDEX JOIN SCAN not be performed?
5. What will be as result of INDEX JOIN operation?
Solution:
--1
1) The leading column of composite index is not specified in the query predicate.
2) Few distinct values exist in the leading column of the composite index, but
many distinct values exist in the second, third, ... column of the index.
--2
1) A hash join of multiple (2, 3, ...) indexes retrieves all data (columns)
requested by the query, without requiring table access.
2) The cost of retrieving rows from the table is higher than reading the indexes
without retrieving rows from the table.
--3
INDEX HASH JOIN (HASH JOIN). Before INDEX JOIN Oracle can use any index scans
(for example INDEX FAST FULL SCAN, INDEX RANGE SCAN), and then using a hash join
on the rowids obtained from these scans to return the rows.
--4
If write pseudocolumn ROWID in SELECT clause.
--5
Temporary view store result of join with values from indexes, not from table.
CHAPTER 5. "Join methods and types"
Task 5.1. Join methods. Part 1.
1. What is join combines and returns?
2. What is mean join method? List these methods and explain when it is efficient
and why. How much reads for driving table and joining table?
3. What is mean join type? List these types.
4. What are alternative names of driving table?
Solution:
--1
A join combines the output from exactly two row sources, such as tables or views,
and returns one row source.
--2
Join method mean how to join two row sources.
'NESTED LOOP JOIN'. Efficient when joining row sources are small and have indexes
on the join conditions, because there is no sort, no hash table. Driving table
reads once. Joining table reads for every row of the driving table.
'SORT MERGE JOIN'. As NESTED LOOP, but efficient if tables is big and (or) one
side is sorted - because two sides will be sorted. Both row sources reads once.
'HASH JOIN'. Efficient than SORT MERGE JOIN if both row sources are not sorted
and very big. Driving table reads once. Joining table reads once.
'CARTESIAN JOIN' (MERGE JOIN CARTESIAN). Efficient when one or more of the tables
does not have any join conditions. Driving table reads once. Joining table reads
for every row of the driving table.
--3
The join condition determines the join type. INNER JOIN, OUTER JOIN, SEMIJOIN,
ANTIJOIN, CARTHESIAN JOIN.
--4
External table, outer table.
Task 5.2. Join methods. Part 2.
1. List typical reasons for NESTED LOOP JOIN.
2. What is advantage of NESTED LOOP JOIN versus SORT MERGE JOIN, HASH JOIN and
why?
3. Which hint is used for NESTED LOOP JOIN? Explain which table at the hint will
be as driving table. How to specify any table as driving in NESTED LOOP JOIN?
4. What is mean NESTED NESTED LOOPS JOIN?
5. How is work NESTED LOOP JOIN?
Solution:
--1
- Two row sources are small;
- One row source is small and bigger row source has an index.
--2
NESTED LOOP returns first row faster (returns immediately) because SORT MERGE
JOIN needs to sort and HASH JOIN needs to create a hash table.
--3
--The optimizer try use the smallest row source as driving table: can be table1
--or table2. Use LEADING hint to specify the driving table.
USE_NL(table1 table2)
/*+ LEADING(tab2) USE_NL(tab1 tab2) */ --specify tab2 as the driving table
--4
The outer loop of a nested loop can itself be a row source generated by a
different nested loop. Oracle implement it for nested loop joins since 11g to
reduce overall latency for physical I/O.
--5
Firstly optimizer for each row from DRIVING table reads each row from INNER table
and if values matched then save that row from DRIVING table + ROWIDs from INNER
table in memory. Secondly using these ROWIDs join rows from INNER table.
Task 5.3. Join methods. Part 3.
1. List typical reasons for SORT MERGE JOIN.
2. What are two operations includes? How is works SORT MERGE JOIN?
3. What hint is used for SORT MERGE JOIN?
4. Which memory area uses for sorting and what about cost?
5. Why sometimes you may not see sorting in the execution plan?
Solution:
--1
- Two row sources are big and the join condition between two tables is not an
equijoin;
- Because of sorts required by other operations, the optimizer finds it cheaper
to use a SORT MERGE JOIN;
- When the hash table does not fit completely in memory - use SORT MERGE JOIN.
--2
SORT JOIN - always sorts unsorted joining keys from row sets. MERGE JOIN -
joins row sources basing its start position on the match made in the previous
iteration. For each row in the first data set, the database finds (since next row
after previous matching) a starting row in the second data set, and then reads
the second data set until it finds a nonmatching row.
--3
USE_MERGE(table1 table2)
--4
Uses sort area in PGA and if not enough size, then Oracle write sorted data into
the disks (it is increases cost).
--5
- Data already sorted;
- If an index exists, then the database can avoid sorting the first data set.
However, the database always sorts the second data set, regardless of indexes.
Task 5.4. Join methods. Part 4.
1. List typical reasons for HASH JOIN.
2. When is a HASH JOIN most cost effective?
3. How is works HASH JOIN?
4. What hint is used for HASH JOIN?
5. List typical reasons for CARTHESIAN JOIN.
6. When is a CARTHESIAN JOIN most cost effective?
7. How is works CARTHESIAN JOIN?
Solution:
--1
- Row sources are large, have only equijoin and not sorted.
--2
Most cost effective when the smaller data set fits in memory. Because the hash
table is in the PGA, Oracle can access rows without latching them in buffer
cache.
--3
The database performs a full scan of the smaller row source, called the 'build
table', and then applies a hash function to the join key in each row to build a
'hash table' (stores rows with ROWID and hash; one hash (if mulltiple rows have
same hash) can include multiple ROWIDs - called "slot") in the PGA. For second
table uses typically full scan and for each row applies same hash function to the
join column(s) and checkes this hash against the hash table. If hashes equal then
Oracle, using ROWID from hash table, finds row in first table and joins row from
second table.
--4
USE_HASH(table1 table2)
--5
- No join condition exists.
- The ORDERED hint specifies a table before its join table is specified: uses
ORDERED TABLE1, TABLE2, TABLE3, but no join condition connects TABLE1 and TABLE2.
--6
Not effective.
--7
The optimizer joins every row from one data source with every row from the other
data source.
Task 5.5. Join types. Part 1.
1. How to works SEMIJOIN?
2. What are constructions can be used with SEMIJOIN? What is construction
preferable replace by another construction for using SEMIJOIN?
3. List join methods that can be used with SEMIJOIN. Which is method generally
used? What are hints force Oracle to use SEMIJOIN?
4. What phrase can you see in execution plan if use SEMIJOIN?
5. How to works ANTIJOIN?
6. What are constructions can be used with ANTIJOIN? Explain difference between
these constructions.
7. List join methods that can be used with ANTIJOIN. Which is method generally
used? What are hints force Oracle to use ANTIJOIN?
8. What phrase can you see in execution plan if use ANTIJOIN?
9. List join types that can be used with HASH JOIN.
Solution:
--1
For each row from DRIVING table Oracle match each row from JOINING table until
first match. After first match Oracle returns matched row immediately without
matching remaining rows from JOINING table.
--2
IN and EXISTS can be used with SEMIJOIN. Replacing IN by EXISTS is recommended.
--3
SEMIJOIN can be used with all join methods. Generally uses the NESTED LOOPS
method. Place hints into EXIST subquery: HASH_SJ (hash semijoin), MERGE_SJ (merge
join semijoin), and NL_SJ (nested loop semijoin).
--4
We can see, for example, NESTED LOOPS (SEMI).
--5
For each row from DRIVING table Oracle match each row from JOINING table until
first match (if matched, then immediatelly start match next row from DRIVING
table against all rows from JOINING table) or if all rows from JOINING table does
not match with row, then returns this nonmatched row from DRIVING table in
result.
--6
NOT IN, NOT EXISTS: produces different results if columns from condition have
NULL.
--7
Generally uses the NESTED LOOPS method, also optimizer may choose another method.
Use in subquery (in inner row source) hints: HASH_AJ, NL_AJ, MERGE_AJ.
--8
We can see, for example, NESTED LOOPS (ANTI).
--9
All equijoin types.