...

  • Supervised learning algorithms make predictions based on a set of labelled examples.
  • Classification: When the data are used to predict a categorical variable, supervised learning is also called classification. This is the case when assigning a label or indicator, such as "dog" or "cat", to an image. When there are only two labels, this is called binary classification; when there are more than two categories, it is called multi-class classification.
  • Regression: When predicting continuous values, the problem becomes a regression problem.
  • Forecasting: This is the process of making predictions about the future based on past and present data. It is most commonly used to analyze trends. A common example is estimating next year's sales based on the sales of the current and previous years.
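As a rough illustration of the classification/regression distinction, here is a minimal sketch (with made-up toy data, not from these notes) that applies the same nearest-neighbour idea to both tasks; the only difference is whether the target is a label or a continuous value.

```python
# Toy illustration (hypothetical data): the same 1-nearest-neighbour predictor
# used for classification (categorical target) and regression (continuous target).

def nearest_neighbor(train, x):
    """Return the target of the training point whose feature is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Classification: targets are labels ("cat" / "dog").
clf_data = [(1.0, "cat"), (2.0, "cat"), (8.0, "dog"), (9.0, "dog")]
print(nearest_neighbor(clf_data, 1.5))   # predicts a label

# Regression: targets are continuous values (e.g. weight in kg).
reg_data = [(1.0, 4.2), (2.0, 4.8), (8.0, 30.1), (9.0, 32.5)]
print(nearest_neighbor(reg_data, 8.5))   # predicts a number
```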

...

Model-Free

  • Foregoes the potential gains in sample efficiency that come from using a model.
  • Easier to implement and tune.

Model-Based

  • Allows the agent to plan ahead and consider the possible outcomes of a range of possible choices.
  • A ground-truth model for a task is generally not available; if an agent wants to use a model, it has to learn one purely from experience, which is fundamentally hard.
  • Requires being willing to spend lots of training time; high computation.
  • Can fail due to over-exploitation of bias in the learned model.

When would model-free learners be appropriate?

If you don't have an accurate model provided as part of the problem definition, then model-free approaches are often superior.

Model-based agents that learn their own models for planning have the problem that inaccuracies in these models can cause instability (the inaccuracies compound the further into the future the agent looks).
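A toy sketch of that compounding effect (the 5% per-step model error and the growth dynamics are assumed numbers, purely for illustration):

```python
# Hypothetical setup: a learned dynamics model that is only 5% wrong per step.
# Rolling it forward during planning compounds the error multiplicatively.
true_factor = 1.00    # true dynamics: the state stays constant
model_factor = 1.05   # learned model: overestimates growth by 5% per step

state_true, state_model = 1.0, 1.0
errors = []
for step in range(20):
    state_true *= true_factor
    state_model *= model_factor
    errors.append(abs(state_model - state_true))

# Small after one step, large after twenty: planning horizons magnify model bias.
print(errors[0], errors[-1])
```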

If you have a real-world problem in an environment without an explicit known model at the start, then the safest bet is to use a model-free approach such as DQN or A3C.

The distinction between model-free and model-based reinforcement learning algorithms is analogous to habitual versus goal-directed control of learned behavioral patterns. Habits are automatic: behavior patterns triggered by appropriate stimuli (think: reflexes). Goal-directed behavior, by contrast, is controlled by knowledge of the value of goals and of the relationship between actions and their consequences.

What's the difference between model-free and model-based reinforcement learning?

An RL problem is formulated as a Markov Decision Process (MDP).

The MDP represents the dynamics of the environment: the way the environment will react, at any given state, to the possible actions the agent might take.

Transition Function: Fn(state, action) → P(all next possible states)

Reward Function: Fn(state) → Reward

The Transition Function and the Reward Function together constitute the "model" of the RL problem.

Therefore, MDP is the problem and Policy Function is its solution.
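As a minimal sketch, here is a hypothetical two-state MDP written out as code. The transition and reward functions together form the "model"; the policy maps states to actions and is one possible solution. All state names, actions, and probabilities are made up for illustration.

```python
import random

STATES = ["low", "high"]
ACTIONS = ["wait", "work"]

def transition(state, action):
    """Fn(state, action) -> probability distribution over all next possible states."""
    if action == "work":
        return {"high": 0.8, "low": 0.2}
    return {"high": 0.1, "low": 0.9}

def reward(state):
    """Fn(state) -> reward."""
    return 10.0 if state == "high" else 0.0

def policy(state):
    """A (fixed) solution to the MDP: maps the agent's state to an action."""
    return "work" if state == "low" else "wait"

# Sample one environment step using the model.
probs = transition("low", policy("low"))
next_state = random.choices(list(probs), weights=list(probs.values()))[0]
print(next_state, reward(next_state))
```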

Model-based RL can be of two types: either the algorithm learns the model itself, or the model is provided.

Examples of Model: Rules of Games, Rules of Physics, etc.


In the absence of a known MDP model, the agent interacts with the environment and observes its responses. Model-free algorithms have no transition or reward function; they purely sample and learn from experience. They rely on real samples from the environment and never use generated predictions of the next state and next reward to alter behavior.

A model-free algorithm either estimates a "value function" or the "policy" directly from experience. A value function can be thought of as a function that evaluates a state (or an action taken in a state), for all states. From this value function, a policy can then be derived.
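As a minimal sketch of this, here is tabular Q-learning (one standard model-free algorithm) on a made-up 1-D chain environment. The agent never queries a transition or reward function; it estimates an action-value function purely from sampled steps, and a greedy policy is then derived from those values. The environment and all constants here are assumptions for illustration.

```python
import random

random.seed(0)  # for reproducibility of this sketch

N_STATES, GOAL = 5, 4   # a 1-D chain: states 0..4, reward only at the goal
ALPHA, GAMMA = 0.5, 0.9
Q = {(s, a): 0.0 for s in range(N_STATES) for a in (-1, 1)}

def env_step(state, action):
    """The environment, a black box to the agent: move left/right along the chain."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, (1.0 if next_state == GOAL else 0.0)

for episode in range(500):
    s = 0
    for _ in range(50):                 # cap episode length
        a = random.choice((-1, 1))      # behaviour policy: uniform random
        s2, r = env_step(s, a)
        # Q-learning update: move Q(s, a) toward the sampled one-step target.
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, -1)], Q[(s2, 1)]) - Q[(s, a)])
        s = s2
        if s == GOAL:
            break

# The greedy policy derived from the learned value function heads toward the goal.
print(max((-1, 1), key=lambda a: Q[(0, a)]))
```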

Examples: a robot hand trying to solve a Rubik's Cube, a robot learning to walk, etc. Such tasks can be learned with a model-free approach.


There is no direct distinction between model-free and model-based by application. The choice of algorithm depends on the model: if a model is provided, or can be learned, use a model-based approach; otherwise use a model-free one.



What to Learn in Model-Free RL

...

  1. Do you have very high computation power?
  2. Do you have lots of time to train an agent?
  3. Do you need your model to be self-explanatory, i.e. can humans understand the reasoning behind the predictions and decisions it makes?
  4. Do you need your model to be easy to implement and maintain?
  5. Is it possible to try the problem several times and afford to make many mistakes?
  6. In your situation, is active, online learning possible, i.e. can the algorithm explore new regions of the data space while acting, and then learn from those conditions and data?
  7. In your situation, can the algorithm take sequential actions and complete the task?
  8. Is it possible to define a policy function, i.e. the actions the agent takes as a function of its state and the environment?
  9. Is it possible to define a function that receives feedback from actions, such that the feedback helps the agent learn and take new actions?
  10. Can you simulate an environment for the task, so that the algorithm can try many times and make mistakes in order to learn?

Model-Based vs. Model-Free:

Do you have a probability function that helps you select the next action based on the current state and action?

If a model is not available, is it possible to train such a model?