VMware Architecture
Introduction
Why are virtual machines interesting?
- They allow transcending of standard interfaces (which often seem to be an obstacle to innovation)
- They enable innovation in flexible, adaptive software & hardware, security, network computing (and others)
- They involve computer architecture in a pure sense
- Virtualization will be a key part of future computer systems
- A fourth major discipline? (with HW, System SW, Application SW)
August 2005
Abstraction
- Computer systems are built on levels of abstraction
- Higher levels of abstraction hide details at lower levels
- Example: files are an abstraction of a disk
[Figure: software layers (application programs, libraries, operating system with drivers, scheduler, and memory manager) above hardware (controllers, main memory); the file abstraction is marked in the software layers]
August 2005 VM Intro (c) 2005, J. E. Smith
Virtualization
Similar to abstraction, except that details are not necessarily hidden
[Figure: virtualization applied to files]
The Machine
Execution Hardware
Memory Translation
Main Memory
Virtual Machines
- Add virtualizing software to a host platform and support a guest process or system on a virtual machine (VM)
- Example: system virtual machine
[Figure: guest applications and OS run on the virtual machine; the virtualizing software (VMM) and the hardware "machine" form the host]
System VMs
- Provide a system environment
- Constructed at ISA level
- Persistent
- Examples: IBM VM/360, VMware, Transmeta Crusoe
[Figure: guest process on the host platform]
Process VMs
- Execute application binaries with an ISA different from the hardware platform
- Couple at ABI level via runtime system
- Not persistent
[Figure: the guest application process and the runtime form the virtual machine, running on the host machine hardware]
- Guest processes may intermingle with host processes
- As a practical matter, guest and host OSes are often the same
- Same-ISA dynamic optimizers are a special case
- Examples: IA-32 EL, FX!32, Dynamo
[Figure: host processes and guest processes (each guest with its own runtime) on the host OS, interacting through process creation, file sharing on disk, and network communication]
HLL VMs
- Java and CLI are recent examples
- Binary class files are distributed
- ISA is part of binary class format
- OS interaction via APIs (part of VM platform)
[Figure: Java binary classes target the Java VM architecture, with VM implementations for a Sparc workstation, an x86 PC, and an Apple Mac]
Co-Designed VMs
- Perform both translation and optimization
- VM provides interface between standard ISA software and implementation ISA
- Primary goal is performance or power efficiency
- Use proprietary implementation ISA
- Transmeta Crusoe and IBM Daisy are the best-known examples
[Figure label: VLIW]
Composition
[Figure: composition of VMs; labels: apps 1, OS 1, apps 2, OS 1, ISA 2]
Composition: Example
Layers, from top to bottom:
- Java application
- JVM
- Linux x86
- VMware
- Windows x86
- Code Morphing
- Crusoe VLIW
Summary (Taxonomy)
- VM type (process or system)
- Host/guest ISA same or different
[Figure: taxonomy tree split into process VMs and system VMs; example labels: Java VM, MS CLI]
Tutorial Topics
- Introduction & VM Overview
- Emulation: Interpretation & Binary Translation
- Process VMs & Dynamic Translators
- HLL VMs
- Co-Designed VMs
- System VMs
Key VM Technologies
- Emulation: a binary in one ISA is executed on a processor supporting a different ISA
- Dynamic optimization: a binary is improved for higher performance
  - May be done as part of emulation
  - May optimize same ISA (no emulation needed)
[Figure: emulation example (x86 apps and Windows on Alpha) and optimization example (HP apps and HP-UX on the HP PA ISA)]
Definitions
NOTE: there are no standard definitions
- Emulation: a method for enabling a (sub)system to present the same interface and characteristics as another. E.g. the execution of programs compiled for instruction set A on a machine that executes instruction set B.
  - Interpretation: relatively inefficient, instruction-at-a-time
  - Binary translation: block-at-a-time, optimized for repeated instruction executions
Definitions
- Guest: environment that is being supported by the underlying platform
- Host: underlying platform that provides the guest environment
[Figure: Guest supported by Host]
Definitions
- Source: original instruction set or binary, i.e. the instruction set to be emulated
- Target: instruction set being executed by the processor performing emulation, i.e. the underlying instruction set, or the binary that is actually executed
[Figure: Source emulated by Target]
Interpreters
- Often simplified
- Some performance tradeoffs are different
  - E.g. the significance of using an intermediate form
Interpreter State
[Figure: interpreter state held in memory: code, data, stack, a register context block (entries through Reg n-1), and the interpreter code]
Decode-Dispatch Interpretation
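while (!halt) {
    inst = code(PC);                  /* fetch the next source instruction */
    opcode = extract(inst, 31, 6);    /* decode: extract the opcode field */
    switch (opcode) {                 /* dispatch to the matching routine */
    case LdWordAndZero: LdWordAndZero(inst); break;
    case ALU:           ALU(inst);           break;
    case Branch:        Branch(inst);        break;
    . . .
    }
}
[Figure label: instruction function list]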
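The decode-dispatch loop above can be tried out as a small runnable toy. This is only a sketch in Python, not the tutorial's interpreter: the three-opcode ISA, the tuple instruction encoding, and the names `run`, `LOADI`, `ADD`, and `HALT` are assumptions invented for the illustration.

```python
# Toy decode-dispatch interpreter. The ISA is hypothetical:
# each instruction is a tuple (opcode, operand, ...).
LOADI, ADD, HALT = 0, 1, 2

def run(code, num_regs=4):
    """Interpret `code` one instruction at a time and return the registers."""
    regs = [0] * num_regs
    pc = 0
    halt = False
    while not halt:
        inst = code[pc]          # fetch
        opcode = inst[0]         # decode
        pc += 1
        if opcode == LOADI:      # dispatch to the routine for this opcode
            _, rd, imm = inst
            regs[rd] = imm
        elif opcode == ADD:
            _, rd, ra, rb = inst
            regs[rd] = regs[ra] + regs[rb]
        elif opcode == HALT:
            halt = True
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return regs

program = [(LOADI, 0, 2), (LOADI, 1, 40), (ADD, 2, 0, 1), (HALT,)]
```

Every source instruction pays for a fetch, a decode, and a dispatch branch before any useful work happens, which is the overhead the following slides quantify.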
Decode-Dispatch: Efficiency
Decode-dispatch loop
- Mostly serial code
- Several jumps/branches (some hard to predict)
- Approximately 20 target instructions
- Several loads/stores
- Several shift/mask steps
Example: DEC/Compaq FX!32, software-pipelined decode-dispatch loop
Binary Translation
[Figure: source register state and program counter mapped onto target registers; labels: R1, R2, program counter, R3, R5, R6, reg n, RN+4]
Target register usage:
- points to x86 register context block
- points to x86 memory image
- contains x86 ISA PC value
- holds x86 register %eax
- holds x86 register %edx
- etc.
PowerPC Target
addi    ;add 4 to %eax
lwzx    ;load operand from memory
add     ;perform add of %edx
stwx    ;store %edx value into memory
mr      ;move update value into %eax
addi    ;update PC (9 bytes)
Dynamic Translation
First interpret
- And perform code discovery as a byproduct
Translate code
- Incrementally, as it is discovered
- Place translated blocks into code cache
- Save source-to-target PC mappings in lookup table
Emulation process
- Execute translated block to end
- Look up next source PC in table
  - If translated, jump to target PC
  - Else interpret and translate
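The emulation process listed above can be sketched as a loop around a code cache. This is a minimal illustration under strong assumptions: a "source block" is reduced to the source PC of its successor, a "translated block" is a Python closure, and the interpretation phase is skipped so that every miss goes straight to translation. The names `make_emulator`, `translate`, and `stats` are made up for the example.

```python
# Sketch of the emulation loop: look up the source PC (SPC) in a table;
# on a miss, translate the block and place it in the code cache.

def make_emulator(source_blocks):
    """source_blocks maps each SPC to the SPC of the block that follows it."""
    code_cache = {}              # SPC -> translated block (the lookup table)
    stats = {"translations": 0, "executions": 0}

    def translate(spc):
        stats["translations"] += 1
        next_spc = source_blocks[spc]
        return lambda: next_spc  # stand-in for translated target code

    def run(spc, max_blocks):
        for _ in range(max_blocks):
            if spc is None:
                break
            block = code_cache.get(spc)
            if block is None:    # miss: translate and cache
                block = translate(spc)
                code_cache[spc] = block
            stats["executions"] += 1
            spc = block()        # execute translated block to its end
        return stats

    return run
```

Running a two-block loop for ten block executions translates each block once and reuses the cached translations afterwards, which is where binary translation recovers its startup cost.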
Dynamic Translation
[Figure: dynamic translation flow from the source binary; a lookup hit continues with translated code, a miss invokes the translator]
Flow of Control
[Figure: control alternates between the Emulation Manager and translation blocks]
- Can always update SPC (source PC) as part of translated code
- Better to place SPC in stub
General method:
- Translator returns to EM (Emulation Manager) via BL
- Source PC placed in stub immediately after BL
- EM can then use link register to find source PC and hash to next target code block
[Figure: code blocks, Emulation Manager, and hash table]
Example
x86 Binary
4FD0: addl   %edx,(%eax)   ;load and accumulate sum
      movl   (%eax),%edx   ;store to memory
      sub    %ebx,1        ;decrement loop count
      jz     51C8          ;branch if at loop end
4FDC: add    %eax,4        ;increment %eax
      jmp    4FD0          ;jump to loop top
51C8: . . .

PowerPC Translation
9AC0: lwz    r16,0(r4)     ;load value from memory
      add    r7,r7,r16     ;accumulate sum
      stw    r7,0(r5)      ;store to memory
      addic. r5,r5,-1      ;decrement loop count, set cr0
      beq    cr0,pc+12     ;branch if loop exit
      bl     F000          ;branch & link to EM
      4FDC                 ;save source PC in link register
9AE4: bl     F000          ;branch & link to EM
      51C8                 ;save source PC in link register
9C08: stw    r7,0(r6)      ;store last value of %edx
      xor    r7,r7,r7      ;clear %edx
      bl     F000          ;branch & link to EM
      6200                 ;save source PC in link register
Example
PowerPC Translation
Same translated blocks as in the previous example (9AC0, 9AE4, 9C08).

HASH TABLE
SPC    TPC    link
51C8   9C08   //////

Emulation Manager
F000:  mflr   r20              ;retrieve address in link register
       lwz    r20,0(r20)       ;load SPC from stub
       slwi   r21,r20,16       ;perform halfword shift left
       xor    r21,r21,r20      ;perform XOR hash
       srwi   r21,r21,12       ;finish hash - logical shift
       lwzux  r26,r21,r30      ;access at hash address w/update
                               ;r30 points to map table base
       cmpw   CR0,r26,r20      ;compare for hit
       beq    CR0,run          ;use target address
       b      lookup_translate ;else follow hash chain
run:   lwz    r27,4(r21)       ;read target address from table
       mtlr   r27
       blr                     ;branch to next translated block
lookup_translate: follow hash chain; if hit, branch to TPC; if miss, branch to translate

Sequence of events:
1. Translated basic block is executed
2. Branch is taken to stub
3. Stub BAL to Emulation Mgr.
4. EM loads SPC from stub, using link
5. EM hashes SPC and does lookup
6. EM loads SPC from hash tbl; compares
7. Branch to transfer code
8. Load TPC from hash table
9. Jump indirect to next translated block
10. Continue execution
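The Emulation Manager lookup in this example (shift, XOR, shift, then compare and follow the chain) can be mirrored in a few lines. This is a sketch, not the tutorial's data layout: the Python dictionary of lists stands in for the in-memory map table and its hash chains, and `em_hash` and `MapTable` are names chosen for the illustration. The arithmetic assumes 32-bit registers, as on the PowerPC target.

```python
MASK32 = 0xFFFFFFFF

def em_hash(spc):
    """Mirror the slwi/xor/srwi sequence on a 32-bit value."""
    h = (spc << 16) & MASK32     # slwi r21,r20,16
    h ^= spc                     # xor  r21,r21,r20
    return h >> 12               # srwi r21,r21,12

class MapTable:
    def __init__(self):
        self.buckets = {}        # hash -> list of (SPC, TPC); the list acts as the chain

    def insert(self, spc, tpc):
        self.buckets.setdefault(em_hash(spc), []).append((spc, tpc))

    def lookup(self, spc):
        for entry_spc, tpc in self.buckets.get(em_hash(spc), []):
            if entry_spc == spc:     # compare for hit
                return tpc
        return None                  # miss: the caller would translate
```

With the example's mappings inserted (source 4FD0 to target 9AC0, source 51C8 to target 9C08), a lookup of 51C8 returns 9C08, and a lookup of an untranslated address such as 6200 reports a miss.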
Translation Chaining
[Figure: without chaining, each translation block returns to the EM; with chaining, translation blocks branch directly to one another]
- Form of inline caching
- Example code: say Rx holds source branch address
    if (Rx == addr_1) goto target_1
    else if (Rx == addr_2) goto target_2
    else if (Rx == addr_3) goto target_3
    else (look up Rx in the map table) ; do it the slow way
- addr_i are predicted addresses (in probability order), determined via profiling
- target_i are corresponding target code blocks
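The inline-caching idea on this slide (compare the branch address against a few predicted addresses before falling back to the table lookup) can be sketched as follows. The function names `make_predicted_jump` and `slow_lookup` are hypothetical, and targets are plain Python values rather than code addresses.

```python
def make_predicted_jump(predictions, slow_lookup):
    """predictions: list of (addr_i, target_i) pairs, most probable first."""
    def jump(rx):
        for addr, target in predictions:   # inline compare chain
            if rx == addr:
                return target
        return slow_lookup(rx)             # do it the slow way
    return jump
```

The chain only pays off when profiling orders the predictions well, since each mispredicted compare adds work in front of the lookup that would have been done anyway.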
- Lazy evaluation as needed
- Floating point, decimal, MMX
- Address resolution: byte vs word addressing
- Address alignment: natural vs arbitrary
- Byte order: big/little endian
Emulation Summary
Decode/Dispatch Interpretation
- Memory requirements: low
- Startup: fast
- Steady-state performance: slow
- Portability: excellent
[Figure: dispatch loop calling interpreter routines]
Emulation Summary
Binary Translation
- Memory requirements: high
- Startup: very slow
- Steady-state performance: fast
- Portability: poor
[Figure: binary translator]
- Perform guest/host mapping at ABI level
- Encapsulate guest process in process-level runtime
- Issues
  - Memory architecture
  - Exception architecture
  - OS call emulation
  - Overall VM architecture
  - High performance implementations
  - System environments
[Figure: host processes and guest processes (each guest with its own runtime) on the host OS, with process creation, file sharing on disk, and network communication]
Process VM Architecture
[Figure: process VM components: application memory image, initialization (initialize signals), profile data, interpreter, OS call emulator, exception emulation]
Runtime Components
Initialization
- Allocate memory
- Initialize runtime data structures
- Initialize all signals
Code cache manager
- Implement replacement algorithm when cache fills
- Flush when required (e.g. self-modifying code)
OS Call Emulator
- Translate OS calls
- Translate OS responses
Exception Emulator
- Handle signals
  - If registered by source code, pass to emulated source handler
  - If not registered, emulate host response
- Form precise state
State Mapping
- Guest register space fits inside host register space
- Guest memory space fits inside host memory space
- This best case does not always happen, but often does (x86 on RISC)
[Figure: guest data and guest code within the host address space]
Software Mapping
- Similar to hardware page tables/TLBs
- Slow, but can always be made to work
[Figure: runtime software translating guest addresses through a mapping table]
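A software mapping table of the kind described here can be sketched as a page-granular dictionary. This is an illustration only: the 4 KB page size, the class name `GuestMemoryMap`, and the use of a Python dict in place of a real table walk are all assumptions.

```python
PAGE_SIZE = 4096   # assumed page size

class GuestMemoryMap:
    def __init__(self):
        self.table = {}                    # guest page number -> host page base address

    def map_page(self, guest_page, host_base):
        self.table[guest_page] = host_base

    def translate(self, guest_addr):
        """Translate one guest address; every guest load/store would pay this cost."""
        page, offset = divmod(guest_addr, PAGE_SIZE)
        if page not in self.table:
            raise KeyError(f"guest page {page:#x} not mapped")
        return self.table[page] + offset
```

The extra lookup on every memory access is why this approach is slow, and the fact that any guest page can be placed anywhere is why it can always be made to work.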
- Several instructions per load/store
- If guest address space + runtime fit within host space
[Figure, panel (a): runtime software adds a base address to each guest address]
- Runtime must be protected from the guest process
- VM software mapping can be easily used
  - Place (and check) protection info in mapping table
- Runtime must be able to set privileges
- Protection faults should be reported to the runtime (so it can respond as the guest OS would later)
- Requires some support from the host OS
Host OS Support
- A system call where the runtime can set protection levels
- A signal mechanism where protection faults trap to a handler in the runtime
- SimOS example
  - Map guest space to a file (map, unmap, read-only mapping)
  - Signal (SIGSEGV) delivered to VM software on fault
[Figure: guest pages by mapping state; labels: references succeed, references cause page faults, free pages, read-only mappings]
Self-Modifying Code
- Write-protect original code to catch code self-modification
- Exception handler invalidates old translations
- Be sure to make forward progress
- Pseudo self-modifying code may require optimizations
- Sometimes can rely on the source binary to indicate self-modification, e.g. SPARC flush
[Figure: the translator reads the write-protected original code and data and produces translated code]
Self-referencing code
- Original copy is maintained by the translator
- All reads are with respect to the original copy, so correct data is returned
[Figure: translator, original code and data, translated code]
Exceptions: Interrupts
- Precise state is easier than for traps (because there is more flexibility with respect to location)
- Problem: translated blocks may execute for an unbounded time period
- Solution:
  - Interrupt signal goes to runtime
  - Runtime unchains translation block currently executing (eliminates loops)
  - Runtime returns control to current translation
  - Translation soon reaches end (and precise state is available)
Exceptions: Traps
- Runtime registers all trap conditions as signals
- Semantic matching: if the trap is architecturally similar in target and source, then the trap/signal may be used
- Otherwise an interpretive method must be used
- Interpretation: easy, source PC is maintained
- Binary translation: more difficult, source PC only available at translation block boundaries
  - Trap PC is in terms of target code
  - Target PC must be mapped back to correct source PC
- Solution
  - Use side table and reverse translate
  - Can be combined with PC mapping table
  - Requires search of table to find trapping block
  - Reconstruct block translation to identify specific source PC of trapping instruction
PC Side Table
[Figure: PC side table. Each code cache block (A, B, ..., N) has a side table entry holding its start target PC and block formation info, linking it back to the source code]
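The first half of the side-table method (search the table to find the block that contains the trapping target PC) can be sketched with a sorted list and binary search. The class and method names are made up for the example, the block formation info is reduced to the block's source start PC, and the final step of re-deriving the exact trapping source instruction is not modeled.

```python
import bisect

class SideTable:
    """Maps a trapping target PC back to the translation block that contains it."""
    def __init__(self):
        self.target_starts = []   # sorted start target PCs of translated blocks
        self.source_starts = []   # source start PC of each block, same order

    def add_block(self, target_start, source_start):
        i = bisect.bisect_left(self.target_starts, target_start)
        self.target_starts.insert(i, target_start)
        self.source_starts.insert(i, source_start)

    def find_block(self, trap_tpc):
        """Return (source start PC, target start PC) of the block holding trap_tpc."""
        i = bisect.bisect_right(self.target_starts, trap_tpc) - 1
        if i < 0:
            raise LookupError("trap PC precedes every translated block")
        return self.source_starts[i], self.target_starts[i]
```

Once the block is known, the runtime would reconstruct that block's translation to pin down which source instruction corresponds to the trapping target instruction.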
- Simple if target code updates register state in the same order as source code
- Register state mapping can be used to generate source register values
- Implement software version of reorder buffer or checkpoints
- Simple if target code updates memory state in the same order as source code
- Restricts optimizations (more difficult to back up than register state)
- Most process VMs maintain original store order
OS Call Emulation
[Figure: OS call emulation in translated code; label: binary translation]
OS Call Emulation
- Syntactic translation only
  - E.g. pass arguments in stack rather than registers
- Semantic translation/matching required
  - Similar to inter-OS porting
  - May be difficult (or impossible)
- OS deals with the real world
  - What if the source OS supports a type of device that the target does not?
Important tradeoff
- Startup time: cost of converting code for emulation
- Steady state: cost of emulating
- Interpretation: low startup, high steady-state cost
- Binary translation: high startup, low steady-state cost
[Figure: cost (vertical axis, 0 to 2500) versus N, the number of times emulated (horizontal axis, 0 to 100), with one curve for interpretation and one for binary translation]
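The tradeoff can be made concrete with a linear cost model: total cost = startup cost + N × per-execution cost, so the two methods break even at N = (S_trans − S_interp) / (C_interp − C_trans). The sketch below uses that model; the numeric costs are hypothetical placeholders in arbitrary units and are not read off the slide's graph.

```python
def total_cost(startup, per_execution, n):
    """Cost of emulating a piece of code n times under a linear model."""
    return startup + per_execution * n

def break_even(interp, trans):
    """interp, trans: (startup, per_execution) pairs. Returns N where costs are equal."""
    (s_i, c_i), (s_t, c_t) = interp, trans
    if c_i <= c_t:
        raise ValueError("translation must be cheaper per execution to break even")
    return (s_t - s_i) / (c_i - c_t)

# Hypothetical costs in arbitrary units (assumptions for illustration only).
INTERP = (0, 20)
TRANS = (1000, 2)
```

Below the break-even count interpretation is cheaper; above it the translator's startup cost has been amortized, which motivates the staged strategy that follows.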
Staged Emulation
- Runtime = program runtime + translation overhead
- Higher optimization: shorter program runtime
- Lower optimization: lower overhead
[Figure: staged emulation components: interpreter, profile data, code cache, emulation manager, translator/optimizer]
Staged Emulation
General Strategy
1. Begin interpreting
2. For code executed above a threshold, use simple translation/optimization
3. For translated code executed above a threshold, optimize more
   etc.
Examples:
- Shade uses 1 and 2
- Wabi uses 2 and 3
- FX!32 uses 1 and 3
- IA32-EL, UQDBT use 2 and 3
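The three-stage strategy can be sketched as a per-block execution counter with two thresholds. The class name, the stage labels, and the default threshold values are assumptions for illustration; real systems pick thresholds empirically and, as the examples above show, often implement only two of the three stages.

```python
INTERPRET = "interpret"
SIMPLE = "simple translation"
OPTIMIZED = "optimized translation"

class StagedEmulator:
    def __init__(self, translate_threshold=50, optimize_threshold=500):
        # Threshold values are illustrative assumptions.
        self.translate_threshold = translate_threshold
        self.optimize_threshold = optimize_threshold
        self.counts = {}             # source PC -> executions seen so far

    def execute(self, spc):
        """Count one execution of the block at `spc`; return the stage used for it."""
        n = self.counts.get(spc, 0)
        self.counts[spc] = n + 1
        if n >= self.optimize_threshold:
            return OPTIMIZED
        if n >= self.translate_threshold:
            return SIMPLE
        return INTERPRET
```

Rarely executed code never leaves the cheap stage, so optimization effort is spent only where repeated execution can repay it.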
- Variable-sized blocks
- Dependences among blocks due to linking
- No backing store; re-generating is expensive
- LRU replacement is typically not used (fragmentation problems)
- Simple, basic algorithm
- Gets rid of stale links if control flow changes
- High overhead for re-translating after flush
Pre-emptive Flush
[Figure: rate of new translations over time]
Coarse-Grain FIFO
- Large fixed-size blocks
- Only backpointers among replacement blocks need to be maintained, OR linking between large blocks can be prohibited
[Figure: code cache divided into FIFO blocks A, B, ..., D, with backpointer tables]
System Environment
- High level of interoperability
- Seamless access to both guest and host processes
- Works best with same OS
[Figure: guest processes (each with a runtime) and a host process on the host OS, with process creation and file sharing on disk]
Encapsulation
- At creation, by the loader
- DLLs at load time
Creation
- Host can create guest
- Guest can create host
DLLs
- Guest can use guest or host
- Host uses only host
[Figure: guest and host processes creating one another and using guest and host DLLs]
Loaders
- One for host processes
- One for guest processes
Approaches
- Modify kernel loader
  - Identifies type of binary, calls correct loader
  - Requires modification of kernel loader
- Add code to guest binary when installed
  - Invokes guest loader
  - Requires local installation of guest binary
- Modify host process create_process API
  - Invokes guest loader for guest binaries
  - Modifies create_process in host binaries
  - Used in FX!32
Persistence
- One ABI instantiation
  - Re-translate each time an ABI is initiated
- Multiple ABI instantiations
  - Save translation/profile data on disk
- Is it faster to optimize or read from disk?
  - A lot of instructions can execute in a few milliseconds
Example: FX!32
- Follows typical model
- But translations/optimizations are done between executions
  - First execution of binary: interpret and profile
  - Translate and optimize off line
  - Later execution(s): use translated version, continue profiling
- Persistence
  - Translations and profile data are saved on disk between runs
- Very time-consuming optimization with x86 source
- Hybrid static/dynamic binary translation
Performance
- Comparing a 200 MHz Pentium Pro and a 500 MHz 21164
- Goal: same as high-end x86
- Byte benchmark integer: 40% faster than Pentium Pro
- Floating point: 30% slower than Pentium Pro
- Achieves 70% of native Alpha performance
Optimization Example
Basic Block 1
    ...
    R3 <- ...
    R7 <- ...
    R1 <- R2 + R3
    Br L1 if R3 == 0
Superblock
    ...
    R3 <- ...
    R7 <- ...
    Br L2 if R3 != 0
    R1 <- 0
    ...
Compensation code
    R1 <- R2 + R3
Profiling
- Predictability allows statistics gathered now to guide optimizations applied to future execution
- Profiling in a VM differs from traditional profiling used for compiler feedback
Types of Profiles
Block profiles
- Identify hot code blocks
- Fewer nodes than edges
Edge profiles
- Give a more precise idea of program flow
- Block profile can be derived from edge profile (not vice versa)
[Figure: the same control flow graph annotated once with block counts and once with edge counts]
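The claim that a block profile can be derived from an edge profile follows from a simple observation: a block executes once for every traversal of an edge into it. A minimal sketch, assuming a single entry block whose entry count is known separately (the function name and the sample graph in the usage note are made up, not taken from the slide's figure):

```python
from collections import defaultdict

def block_profile(edge_counts, entry_block, entry_count):
    """Derive per-block execution counts from an edge profile.

    edge_counts maps (src, dst) -> number of times the edge was followed.
    """
    counts = defaultdict(int)
    counts[entry_block] += entry_count       # entries from outside the graph
    for (_, dst), n in edge_counts.items():
        counts[dst] += n                     # one execution per incoming traversal
    return dict(counts)
```

For a hypothetical diamond with edges A->B taken 30 times, A->C 70 times, B->D 30 times, and C->D 70 times, and A entered 100 times, this gives A 100, B 30, C 70, D 100. The reverse derivation fails because block counts alone cannot say how D's 100 executions split between its two predecessors in general graphs.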
Collecting Profiles
Instrumentation-based
- Software probes
  - Slows down program more
  - Requires less total time
- Hardware probes
  - Less overhead than software
  - Less well-supported in processors
  - Typically event counters
Sampling-based
- Interrupt at random intervals and take sample
- Slows down program less
- Requires longer time to get same amount of data
- Not useful during interpretation
[Figure: procedure inlining; a call to proc xyz between blocks K and L is replaced inline by the procedure's blocks (X ... Z)]
[Figure: control flow graph of blocks A through G with edge counts, partitioned into Trace 1, Trace 2, and Trace 3]
- One entry, multiple exits
- May contain redundant blocks (tail duplication)
- Commonly used in optimizing VMs
[Figure: superblocks formed from a control flow graph, with tail-duplicated blocks]
Superblock Formation
Start points
- When block use reaches a threshold
- Profile all blocks (UQDBT)
- Profile selected blocks (Dynamo)
  - Profile only targets of backward branches (close loops)
  - Profile exits from existing superblocks
Continuation
- Use hottest edges above a threshold (UQDBT)
- Follow current control path (most recent edge) (Dynamo)
End points
- Start point of this superblock
- Start point of some other superblock
- When a maximum size is reached
- When no edge above threshold can be found (UQDBT)
- When an indirect jump is reached (depends on whether inlining is enabled)
1. Collect basic blocks using profile information
2. Convert to intermediate form; place in buffer
3. Schedule and optimize
4. Generate target code
5. Add compensation code; place in code cache
[Figure: blocks A, B, C collected, optimized, and emitted with compensation code]
IA-32 EL
- Previous approach was in hardware
- OS-independent section (BTgeneric)
- OS-dependent section (BTlib)
- Two stages
  - Fast binary translation (cold code)
  - Optimized binary translation (hot code)
IA-32 Optimizations
Floating point/MMX
- IPF uses large flat register file
- IA-32 uses stack register file
- IA-32 TAG indicates valid entries
- IA-32 aliases MMX regs to FP regs
- Speculate common-case usage and put guard code at beginning of block
- Examples:
  - TOS (Top of Stack) same for all block executions
  - No invalid accesses (indicated by TAG)
  - 99-100% accurate
Data misalignment
- Similar to FX!32
- See paper for details
IA-32 EL Performance
- Provides 65% performance (Gmean)
- mcf performs better because it has a 32-bit data footprint rather than 64 bits
IA-32 EL Performance
- SPEC: mostly in hot code; very little overhead
- Sysmark: only 45% hot code; 22% in OS (IPF code)
Same-ISA Optimization
- Many binaries are un-optimized or are at a low optimization level
- Translation at basic block level is identity translation
- Initial sample-based profiling is attractive
  - Original code can be used, running at native speeds
- Patch code cache regions into original code
  - Replace original code with branches into code cache (saves some code duplication)
- Can avoid hash table lookup on indirect jumps
[Figure: blocks A and B in the original code patched with a link into the code cache]
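[Figure: Dynamo control flow]
1. Interpret until taken branch
2. Look up branch target in cache
   - Hit: jump to top of fragment in cache
   - Miss: check start-of-trace condition
3. If the start-of-trace condition holds, increment counter associated with branch target address; otherwise keep interpreting
4. If the counter value exceeds the hot threshold, continue until the end-of-trace condition holds; otherwise keep interpreting
5. At end of trace, emit into cache, link with other fragments, and recycle the associated counter
An OS signal is routed to the signal handler.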
Superblock Selection
- Does not use hardware counters, PC sampling, or path sampling
- Interpreter performs MRET (Most Recently Executed Tail)
  - Associate a counter with superblock-start points
  - If counter exceeds threshold then trigger instruction collection
  - At superblock-end, collected instructions are hot superblock
- Concept: when an instruction becomes hot, the very next sequence will also be hot
- Simple, small counter overhead
- No profiling on fragments
  - No overheads
  - Problem if branch behavior changes
  - Fragment cache is occasionally flushed
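The MRET scheme described above can be sketched as a counter per superblock-start point plus a recording mode. This is an illustration, not Dynamo's implementation: the class name, the callback shape (`on_block` called once per interpreted block, with flags saying whether the block is a start or end point), and the threshold value are assumptions.

```python
class MretSelector:
    def __init__(self, hot_threshold=50):    # threshold value is an assumption
        self.hot_threshold = hot_threshold
        self.counters = {}                    # start-point address -> count
        self.recording = None                 # blocks collected so far, or None
        self.superblocks = {}                 # start address -> list of block addresses

    def on_block(self, addr, is_start_point=False, is_end_point=False):
        """Call once per interpreted block, in execution order."""
        if self.recording is None:
            if not is_start_point or addr in self.superblocks:
                return
            n = self.counters.get(addr, 0) + 1
            self.counters[addr] = n
            if n <= self.hot_threshold:
                return
            del self.counters[addr]           # recycle the counter
            self.recording = []               # start point is hot: begin collecting
        self.recording.append(addr)
        if is_end_point:                      # the most recently executed tail is the superblock
            self.superblocks[self.recording[0]] = self.recording
            self.recording = None
```

Only start points carry counters, which keeps profiling overhead small; the price is that the selected path is whatever happened to execute right after the counter tripped.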
Prototype Implementation
Conservative optimizations
- Allow recovery of state for synchronous traps
Aggressive optimizations
- Do not allow recovery of state
- Include
  - Dead code removal
  - Code sinking
  - Loop invariant code motion
Start in aggressive mode, switch to conservative mode if suspicious code sequence is encountered
Bail out for ill-behaved code
Performance
- Compare with +O2
- Biggest gain from inlining and improved code layout
- Conservative opts help about as much as aggressive
- Some benchmarks bail out
[Figure: percent speedup relative to native +O2 execution, per benchmark and on average, for aggressive, conservative, and no optimization; vertical axis from -10 to 25]
Performance
[Figure: per-benchmark and average comparison of Native +O2, Native +O3, Native +O4, Native +O4 +P, Dynamo +O2, Dynamo +O3, Dynamo +O4, and Dynamo +O4 +P; vertical axis up to 500]
Performance Conclusions
- Mostly useful for code optimized at low levels
- Dynamo ran on a processor that stalled indirect jumps
  - Baseline is slow compared with most superscalar processors
  - Dynamo removes indirect jumps via procedure inlining and inlined software jump prediction
- On other modern processors there is a significant performance loss due to indirect jumps
HLL VMs
- Major difference is specification level: virtual instruction set + libraries, instead of ISA and OS interface
Traditional:
HLL Program -> Compiler front-end -> Intermediate Code -> Compiler back-end -> Object Code (ISA) -> Loader -> Memory Image
HLL VM:
HLL Program -> Compiler -> Portable Code (Virtual ISA) -> VM loader -> Virt. Mem. Image -> VM Interpreter/Translator -> Host Instructions
UCSD P-Code
- Primitive libraries
- Machine-independent object file format
- Stack-based ISA
- A set of byte-oriented pseudo-codes
- Virtual machine definition of pseudo-code semantics
- Stack-based ISA
- Standard libraries
- BUT, the objective is application portability, not compiler portability
- Untrusted software (this is the internet, after all)
- Robustness (generally a good idea) => object-oriented programming
- Bandwidth is a consideration
- Good performance must be maintained
- Examples: Java VM, Microsoft Common Language Infrastructure (CLI)
Terminology
- The instruction part of the ISA
- ISA + libraries; a higher-level ABI
Robustness: Object-Orientation
Objects
- Data-carrying entities
- Dynamically allocated
- Must be accessed via pointers or references
Methods
- Procedures that operate on objects
- Method operating on an object is like sending a message
Classes
- A type of object and its associated methods
- Object created at runtime is an instance of the class
- Data associated with a class may be dynamic or static
Security
- A key aspect of modern network-oriented VMs
- Rely on protection sandbox
- Must protect:
[Figure: an application from a remote system running on the local system]
Protection Sandbox
- Remote resources: protected by remote system
- Local resources: protected by security manager
- VM software: protected via static/dynamic checking
[Figure: class files pass through the trusted loader and become loaded methods; local files are reached through native methods and standard libraries under the trusted security agent]
- Tethered by references
- In architecture, memory is unbounded in size; in reality it is limited
Garbage creation
- During program execution, many objects are created then abandoned (become garbage)
Collection
- Due to limited memory space, garbage should be collected so memory can be re-used
- Forcing the programmer to explicitly free objects places more burden on the programmer
- Can lead to memory leaks, reducing robustness
- To improve robustness, have the VM collect garbage automatically
Network Friendliness
- Load only classes that are needed
- Spread loading out over time
- Use stack-oriented ISA (as in Pascal)
- Metadata also consumes bandwidth, however
- Overall, it is probably a wash
Java ISA
Includes
Implied Registers
- PC
- Stack pointer, etc.
Stack
- Locals
- Operands
Heap
- Objects
- Arrays (intrinsic objects)
Constant pool holds immediates and other constant information
Data Accessing
[Figure: data accessing; an instruction stream of opcodes with operands, implied operands, and indexes into the constant pool and the stack frame (locals, operands), which in turn reference objects and arrays in the heap]
Instruction Set
- One-byte opcode
- Zero or more operands
  - Opcode indicates how many
- Operands come from:
  - Instruction
  - Current constant pool
  - Current frame local variables
  - Values on operand stack
- Distinguish storage types and computation types
Instruction formats:
    opcode
    opcode index
    opcode index1 index2
    opcode data
    opcode data1 data2
Instruction Types
- Pushing constants onto the stack
- Moving local variable contents to and from the stack
- Managing arrays
- Generic stack instructions (dup, swap, pop & nop)
- Arithmetic and logical instructions
- Conversion instructions
- Control transfer and function return
- Manipulating object fields
- Method invocation
- Miscellaneous operations
- Monitors
Data Movement
[Figure: data movement among global storage, operands, the operand stack, and the ALU]
Bytecode Example
public int perimeter();
PC    instruction
 0:   iconst_2
 1:   aload_0
 2:   getfield #2;    //Field: sides reference
 5:   iconst_0
 6:   iaload
 7:   aload_0
 8:   getfield #2;
11:   iconst_1
12:   iaload
13:   iadd
14:   imul
15:   ireturn
Stack Tracking
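To follow the operand stack through the perimeter() bytecode, the sequence can be replayed by a tiny stack machine. This is a sketch, not a JVM: the object is modeled as a Python dict, constant pool entry #2 is replaced by the field name "sides", and only the opcodes used in the example are handled.

```python
def perimeter_bytecode(this):
    """Replay the perimeter() bytecode; return (result, stack depth after each step)."""
    code = [
        ("iconst", 2), ("aload_0",), ("getfield", "sides"), ("iconst", 0), ("iaload",),
        ("aload_0",), ("getfield", "sides"), ("iconst", 1), ("iaload",),
        ("iadd",), ("imul",), ("ireturn",),
    ]
    stack, trace = [], []
    for op, *args in code:
        if op == "iconst":
            stack.append(args[0])              # push constant
        elif op == "aload_0":
            stack.append(this)                 # push reference held in local 0
        elif op == "getfield":
            stack.append(stack.pop()[args[0]]) # replace object ref by field value
        elif op == "iaload":
            index = stack.pop()
            array = stack.pop()
            stack.append(array[index])
        elif op == "iadd":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "imul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == "ireturn":
            return stack.pop(), trace
        trace.append(len(stack))               # operand stack depth after this instruction
```

For an object whose sides array is [3, 4], the replay computes 2 * (3 + 4) and never needs more than four operand stack slots, the kind of bound a verifier can establish statically.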
Exception Table
- Exceptions identified by table in class file
  - Address range where checking is in effect
  - Target if exception is thrown
- Operand stack is emptied
- Pop stack frame and check calling method
- Default handlers at main

From   To   Target   Type
8      12   96       Arithmetic Exception
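A lookup against such a table can be sketched as a linear scan. The dictionary layout and the function name are made up for the example, and Python's `ArithmeticError` stands in for the Java exception type. The range test treats the start as inclusive and the end as exclusive, which is the convention the JVM class file format uses for its exception table.

```python
def find_handler(exception_table, pc, exc_type):
    """Return the handler target for an exception thrown at `pc`, or None."""
    for entry in exception_table:
        # Range is [from, to): start inclusive, end exclusive.
        if entry["from"] <= pc < entry["to"] and issubclass(exc_type, entry["type"]):
            return entry["target"]
    return None   # no entry: the VM would pop the frame and check the calling method

# The single entry shown on the slide, with a Python stand-in for the type.
TABLE = [{"from": 8, "to": 12, "target": 96, "type": ArithmeticError}]
```

A `None` result corresponds to the "pop stack frame and check calling method" step, repeated up the call chain until a handler or the default handler at main is reached.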
Binary Classes
- Formal ISA specification
- Magic number and header
- Major regions preceded by counts
Regions shown:
- Access Flags
- This Class
- Super Class
- Interface Count
- Interfaces
- Field Count
- Field Information
- Methods Count
- Methods
- Attributes Count
- Attributes
- An abstract entity that gives meaning to class files
- Has many concrete implementations
  - Hardware
  - Interpreter
  - JIT compiler
- Persistence
  - An instance is created when an application starts
  - Terminates when the application finishes
Memory
[Figure: memory (method area, heap, Java stacks, native method stacks), garbage collector, and execution engine (PCs and implied registers), connected by addresses]
Method area
- Type information provided by class loader
Heap area
- Contains objects created by program
PCs and implied registers
- Every created thread gets a set
Java stacks
- Every created thread gets one
- Divided into frames
- Contains state of method invocations for the thread
- Local variables, parameters, return value, operand stack
Native method stacks
- Special area for implementation-dependent native methods
- Finds and imports binary information describing type
- Verifies correctness of type
- Allocates and initializes memory for class variables
- Resolves symbolic references to direct references
- Invokes initialization code
- A trusted class containing check methods
- Attached when Java program starts
  - Cannot be removed or changed
- Files, types of access, etc.
- Operation
  - Native methods that involve resource accesses (e.g. I/O) first call check method(s)
Verification
- To ensure security and protection
Internal checks
- Checks for magic number
- Checks for truncation or extra bytes
  - Each component specifies a length
- Make sure components are well-formed
Bytecode checks
- Check valid opcodes
- Perform full path analysis
  - Regardless of path to an instruction, contents of operand stack must have same number and types of items
  - Checks arguments of each bytecode
  - Check no local variables are accessed before assigned
  - Makes sure fields are assigned values of proper type
- E.g. Java can call C program (and vice versa)
- Native routines allow access of Java data
[Figure: Java side (Java HLL program, objects, arrays) and native side (C program, native machine code, native data structures); Java invokes native methods, and native code reaches Java objects through JNI get/put while using load/store on native data]
Interpretation
- Simple, fast startup, but slow
JIT compilation
- Compile each method when first touched
- Simple, static optimizations
Hot-Spot Compilation
- Find frequently executed code
- Apply more aggressive optimizations on that code
- Typically phased with interpretation or JIT
Dynamic Compilation
- Based on Hot-Spot compilation
- Use runtime information to optimize
Microsoft CLI
- Common Language Infrastructure
- Part of .NET framework
- Allows multiple HLLs and multiple platforms
- Allows both verifiable and unverifiable modules (class files)
  - Verifiability is different from validity
  - Unverifiable modules must be trusted by user
  - Verifiable and unverifiable modules can be mixed (but the program becomes unverifiable)
[Figure: four programs are each compiled into a module (three verifiable, one unverifiable); the modules run on an x86 platform or an IA-64 platform]
- Object oriented
- Stack-based ISA
- Some differences
  - Broader in scope
  - ISA not designed for interpretation
  - Module can be valid (but not verifiable), verifiable, or invalid
  - Support for C-like pointers and un-typed memory blocks (not verifiable)
Memory architecture
- Object model is less implementation-dependent
- No compatibility problems due to size limitations/differences
Memory protection
- Pointers very carefully controlled
- No rogue load/stores
Precise exceptions
- Exception checking is explicit (no masks)
- Operand stack imprecise within a method
- Locals imprecise if exception goes to higher level
- No registers
- No condition codes
- Code discovery
  - Restricted, explicit control flow
  - All code can be discovered at method entry
- Simply doesn't exist
Co-Designed VMs
- Design hardware and VM software concurrently and cooperatively
- Use proprietary target ISA
  - Or modified ISA
- Not compatibility
[Figure: applications and OS above the VMM (emulation/translation software and cached translated code), above the hardware]
Concealed Memory
[Figure: VM software resides in concealed memory, separate from conventional memory]
Precise Exceptions
- All conventional software is unaware of the underlying VM
- Code may undergo heavy-duty re-organization
  - E.g. CISC to VLIW
- Have VMM periodically checkpoint state
  - Consistent with a point in original binary
  - On fault, roll back and interpret original binary
- Keep out-of-order results in scratch registers, update architected registers in order
  - I.e. software renaming
[Figure: checkpointing. A checkpoint is set before each translation block (B, ..., N); on a trap the checkpoint is restored and execution continues from the corresponding source code]
- As in Transmeta Crusoe
- Shadow copies of registers
- Gated store buffer
- Code divided into translation groups
- At commit point: make shadow copy, release gated stores, and establish new gate for stores
- On exception: restore from shadow copy, squash gated stores, and establish new gate for stores
  - Flush store buffer
  - Back up with shadow registers
  - Interpret forward until trap occurs
- Advantage: larger precise interrupt units => coarser-grain optimizations, dead code elimination, etc.
- Disadvantage: store buffer size limits translation unit size
[Figure: Crusoe x86 registers with their shadow copies, at commit and at exception]
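The commit and rollback behavior described above can be modeled in software to make the two paths explicit. This is a sketch of the idea only, not of Crusoe's hardware: the class name is invented, registers and memory are Python dicts, and loads that would need to see pending gated stores are not modeled.

```python
class CheckpointedState:
    def __init__(self, regs, memory):
        self.regs = dict(regs)        # working registers
        self.shadow = dict(regs)      # shadow copy taken at the last commit point
        self.memory = dict(memory)
        self.store_buffer = []        # gated stores, held back until commit

    def write_reg(self, name, value):
        self.regs[name] = value

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def commit(self):
        self.shadow = dict(self.regs)             # make shadow copy
        for addr, value in self.store_buffer:     # release gated stores
            self.memory[addr] = value
        self.store_buffer = []                    # establish new gate

    def rollback(self):
        self.regs = dict(self.shadow)             # restore from shadow copy
        self.store_buffer = []                    # squash gated stores
```

After a rollback the state matches the last commit point exactly, so the VM can interpret the original code forward from there until the trap recurs with precise state.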
- Major difference with respect to process VMs
- All page faults in guest must be accurately emulated
- Data accesses are no problem
  - Detected via page table/TLB
- Fetches are from code cache, not guest memory
  - Code cache pages are not related to guest pages
Page Crossings
[Figure: page crossings. Translated blocks in the code cache (A through L) span guest pages; where a block crosses into a new guest page, the translated code probes the page table. If the page is correctly mapped, execution continues; if not, control jumps to the VMM]
Input/Output
[Figure: Crusoe memory: a 512 KByte compressed VMM is decompressed into 2 MBytes of VMM memory; also shown are an 8 KB local instruction memory and a 64 KB D-cache]
[Figure: Crusoe processor; labels: L1 I-cache (64 Kbytes, 8-way, 64-byte lines), local program memory (8 Kbytes), FP unit, ALU0, 64 GPRs with shadow GPRs]
Crusoe Translation
Staged optimization
Alias Hardware
- Special load opcode records load address and loaded data size in table
- Special store opcode (store-under-alias-mask)
  - Checks specified (via mask) loads in table
  - If conflict, triggers re-do of loads
- Hardware independence
  - Demonstrated by move to PowerPC
- Support robust/well-integrated software
- Re-define conventional software boundaries
  - Divided OS into implementation-independent/dependent parts
- Architect object-orientation
- System/38: proprietary implementation ISA
- AS/400: first, extend proprietary ISA; then transition to PowerPC ISA
[Figure: on both the proprietary (IMPI) platform and the PowerPC platform, OS/400 and user applications run on MI & LIC, above a translator and the implementation-dependent OS]