Exam RL 2022 Sample
Introduction
Question 1
Reinforcement Learning. Which of the following statements is true?
a. Reinforcement Learning learns a function from labeled examples in a pre-existing dataset.
b. Reinforcement Learning learns the inherent relations between items in a
dataset.
c. Reinforcement Learning uses a number to score the quality of a state.
d. Reinforcement Learning environments are always programmed in Gym.
Tabular Value-Based
Question 2
What is the correct expression for the greedy policy?
a. $\pi(s) = \arg\max_a \sum_{s'} p(s' \mid s, a)\, Q(s, a)$
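For reference, a minimal sketch of acting greedily with respect to a tabular Q-function (identifiers illustrative):

import numpy as np

def greedy_policy(Q, state):
    # Q is indexed as Q[state, action]; the greedy policy simply
    # picks the action with the highest Q-value in the given state.
    return int(np.argmax(Q[state]))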
Question 3
You are teaching an algorithm to choose the correct action based on the state
of an environment. Because you lack a dataset of states paired with the correct
actions, you update your values immediately, based on the state and reward
pairs you receive from the environment. What kind of learning is this?
a. Online Reinforcement Learning
b. Supervised Learning
c. Offline Reinforcement Learning
d. Transfer Reinforcement Learning
Question 4
A function that determines the next state, based on the current state of the
environment and the action taken, can be called:
a. Value-Function
b. Target-Function
c. Loss-Function
d. Transition-Function
Question 5
You have changed the gravity constant in the code of the Cartpole environment.
What is affected by this?
a. Reward-Function
b. State Space
c. Transition-Function
d. Action Space
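A quick illustration with Gym's CartPole: overriding the gravity attribute (9.8 by default in the classic-control source) changes the dynamics, i.e. the transition function, while the reward function and the state and action spaces stay the same:

import gym

env = gym.make("CartPole-v1")
env.unwrapped.gravity = 19.6  # double the default gravity; only the dynamics change
print(env.observation_space, env.action_space)  # spaces are unchanged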
Question 6
Mike is facing a problem. Due to some data corruption, his (S, a, r, S’) tuple
only retained S’ and a. He would like to use this data to recalculate S. What
do you tell him?
a. That is impossible!
Question 7
Consider the Q-table below (entries are Q-values). The agent just took action
2 from state s1 to state s2 and received a reward of 1 (the episode has not
terminated yet). According to the behavior policy, the agent will take action 2
in s2 for the next step. The discount factor is set to 1 and the learning rate
to 0.5. What will the value for (s1, action 2) be if you are using SARSA for
updating?
state | action 1 | action 2
s1    |    3     |    5
s2    |    5     |    3
a. 8
b. 5.5
c. 4.5
d. 7
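As a worked check: SARSA bootstraps from the action the behavior policy will actually take next, so the target uses $Q(s_2, a_2) = 3$ from the table above:

$Q(s_1, a_2) \leftarrow Q(s_1, a_2) + \alpha \, [\, r + \gamma \, Q(s_2, a_2) - Q(s_1, a_2) \,] = 5 + 0.5 \, (1 + 1 \cdot 3 - 5) = 4.5$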
Question 8
What is the difference between tabular Q-learning and SARSA?
d. The behavior policy and the target policy are different in Q-learning, but
they are the same in SARSA.
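For reference, the two textbook update rules differ only in the bootstrap target:

Q-learning: $Q(s, a) \leftarrow Q(s, a) + \alpha \, [\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,]$ (the target policy is greedy)

SARSA: $Q(s, a) \leftarrow Q(s, a) + \alpha \, [\, r + \gamma \, Q(s', a') - Q(s, a) \,]$, where $a'$ is the action the behavior policy actually takes in $s'$.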
Deep Value-Based
Question 9
Why is diversity important in learning?
a. Through de-correlation it improves stability in reinforcement learning
b. Through de-correlation it improves stability in supervised learning
Question 10
Which of the following DQN extensions addresses overestimated action values?
a. Double DQN
b. Dueling DQN
c. Distributional DQN
d. Prioritized Experience Replay
Question 11
Zhao is implementing a replay buffer for DQN and was wondering whether you
had some tips regarding sampling methods. Your recommendation is:
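For context, a minimal replay buffer with uniform random sampling, the standard DQN baseline (class and method names here are illustrative):

import random
from collections import deque

class ReplayBuffer:
    # Fixed-capacity store of (state, action, reward, next_state, done) tuples.
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive transitions, which stabilizes the network updates.
        return random.sample(self.buffer, batch_size)

Prioritized experience replay instead samples transitions in proportion to their TD error.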
Question 12
Which statement about the benefit of using DQN compared with tabular Q-
learning is true? (Pick the most convincing reason.)
a. DQN can better deal with high-dimensional input.
b. DQN outperforms tabular Q-learning.
c. DQN is faster.
d. DQN is more data-efficient.
Policy-Based
Question 13
A3C. Which is true?
a. A3C is an efficient, distributed implementation of Actor Critic
Question 14
Pick the correct statement.
a. Value-based RL is primarily applicable to discrete action spaces; policy-based RL is applicable to both discrete and continuous action spaces.
b. Value-based RL is primarily applicable to continuous action spaces; policy-based RL is applicable to both discrete and continuous action spaces.
c. Policy-based RL is primarily applicable to discrete action spaces; value-based RL is applicable to both discrete and continuous action spaces.
d. Policy-based RL is primarily applicable to continuous action spaces; value-based RL is applicable to both discrete and continuous action spaces.
Model-Based
Question 15
Latent model. Which is true?
a. Latent variables are confounding variables; latent models in reinforcement
learning are based on these variables
b. Latent models train the model on value prediction
c. Latent models forego the actual models, and can therefore miss the policy
d. Latent models are like heuristics, they are based on rules of thumb
Two-Agent Self-Play
Question 16
Tabula Rasa vs Supervised. Which is true?
a. AlphaGo uses grandmaster games to learn by supervised learning. It also
uses supervised learning for MCTS rollouts. AlphaGo Zero is based on
this approach, and uses self-play.
b. AlphaGo uses grandmaster games to learn by supervised learning. It also
uses supervised learning for MCTS rollouts, and reinforcement learning
based on self-play games. AlphaGo Zero is based on this approach, and
uses self-play.
c. AlphaGo uses grandmaster games to learn by supervised learning. It also
uses supervised learning for MCTS rollouts. AlphaGo Zero is not based
on this approach, and only uses self-play.
d. AlphaGo Zero is a clean sheet software engineering design, which caused
the term: Tabula Rasa.
Question 17
How does UCT trade off exploration and exploitation, and which inputs does it use?

$\mathrm{UCT}(j) = \dfrac{w_j}{n_j} + C_p \sqrt{\dfrac{\ln n}{n_j}}$
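A sketch making the inputs explicit: $w_j$ is the accumulated win count of child $j$, $n_j$ its visit count, $n$ the visit count of the parent, and $C_p$ the exploration constant (function name illustrative):

import math

def uct(w_j, n_j, n, c_p=1.4):
    # Exploitation term: average result of child j so far.
    # Exploration term: grows with parent visits n, shrinks with child visits n_j,
    # so rarely tried children eventually get selected again.
    # Unvisited children (n_j == 0) are normally expanded before UCT applies.
    return w_j / n_j + c_p * math.sqrt(math.log(n) / n_j)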
Multi-Agent
Question 18
Counterfactual Regret Minimization. Which is true?
a. CFR achieves the minimax point in a two-agent competitive situation
b. The strongest two-agent Poker program, Libratus, is based on CFR. CFR
is a probabilistic algorithm.
c. The strongest multi-agent Poker program, Pluribus, is based on CFR
d. All of the above are true
Hierarchical
Question 19
What is intrinsic motivation?
a. An inner drive to explore
b. Named so to contrast it with classic extrinsic motivation (the conventional
RL reward signal)
c. Often related to model curiosity
d. All of the above
Meta-Learning
Question 20
What is pretraining?
a. Pretraining is what comes before posttraining: it initializes the network
weights
Future
- No questions in the sample exam -
Answers
1c
2d
3a
4d
5c
6a
7c
8d
9a
10a
11c
12a
13d
14a
15b
16c
17a
18d
19d
20b