Machine learning terms/Reinforcement Learning
Revision as of 16:45, 26 February 2023
- action
- agent
- Bellman equation
- critic
- Deep Q-Network (DQN)
- DQN
- environment
- episode
- epsilon greedy policy
- experience replay
- greedy policy
- Markov decision process (MDP)
- Markov property
- policy
- Q-function
- Q-learning
- random policy
- reinforcement learning (RL)
- replay buffer
- return
- reward
- state
- state-action value function
- tabular Q-learning
- target network
- termination condition
- trajectory