|
How AI
Would Work
by
Paul Almond
Introduction
This article will outline how I think artificial intelligence (AI)
should work. In previous articles I have proposed an approach for
making AI, but I have not yet outlined it in a single article.
This article will give an overview of the proposed AI system. This
could be met with the challenge that it is easy to specify things
in general terms like this, but there are the previous, more detailed
articles. This article is not a substitute for these, but it should
make my approach more accessible.
This article will also provide a discussion of how the carpet
texture problem can be resolved in this system. I have not previously discussed
this.
Readers may wish to compare the concepts in this article, and preceding
articles, with the hierarchical system proposed by Jeff
Hawkins [1,2].
The Basic Idea
The approach involves a probabilistic hierarchy of meaning
extraction by partial model algorithms from inputs and outputs in which low
level probability values imply higher level probability values.
Meaning extraction is abstraction and is the finding of patterns.
Downward transfer of probabilities does the reverse: it transmits
information down the hierarchy, causing high level probability
values to influence low level probability values and fill in details.
Planning uses the same probabilistic hierarchy of meaning extraction.
There is no separate planning system, as in the Hawkins system,
in which there is a separate action planning hierarchy "closely-coupled" with
the meaning extraction hierarchy, in which actions are planned
in an abstract way and then become more detailed during transmission
down the hierarchy: in this system there is only one hierarchy
that does both modelling and planning. Planning is not a distinct
process from modelling. This approach is based on the idea that
planning of behaviour is prediction of behaviour: a system that
makes a predictive model of the world, when allowed to observe
its own behaviour as just another part of reality, will automatically
make models that incorporate planning of its future actions. The
system's future behaviour is based on predictive modelling from
its past behaviour, with the addition of a separate system, the
output vectoring system, constrained by its predictions of its
own future behaviour, giving it a "direction"(but not
doing the real planning, which is done in the probabilistic modelling
hierarchy).
The use of computing resources in the modelling hierarchy needs
to be prioritized; otherwise the system would expend resources
on extracting irrelevant patterns from reality. This is dealt with
by the designation of some of the system's own outputs as special
control outputs which, instead of being directed at the outside
world, are turned inwards and used to control the prioritization
of computing resources in the probabilistic hierarchy.
Now that I have summarized the approach I will provide a short
discussion of each of the concepts on which the AI system is based.
I will then expand on this in the rest of the article.
A Short Overview of the AI System
Conceptual Hierarchy
The conceptual hierarchy is an idealized structure used to represent
reality with hierarchical extraction of meaning by partial model
algorithms. Each partial model algorithm can be applied for different
indices. Each meaning is a binary digit (bit). The conceptual hierarchy
is based on input and output events that are fed into the bottom
level an AI system and involves hierarchical extraction of meaning.
It should be noted that outputs are fed into the bottom level of
the system, together with inputs, so that the system's own actions
are part of what it observes to make its model of reality: this
is important with regard to how the system plans. A single input
or output in the hierarchy is actually an input or output event – an instance of a conventional input or output occurring at an
instant in time. That is to say, a conventional input changing
over time is represented in the hierarchy by a number of inputs
– each one corresponding to its state at a single instant. Because
the conceptual hierarchy is based on inputs and outputs at different
instants of time it contains temporal patterns. The conceptual
hierarchy cannot actually be used in an AI system.
Functional Hierarchy
The functional hierarchy is a probabilistic version of the conceptual
hierarchy that is actually used in an AI system. Each meaning bit
in the conceptual hierarchy has a probability in the functional
hierarchy. The functional hierarchy uses a probabilistic version
of the meaning extraction implied by partial models in the conceptual
hierarchy to transmit information up the hierarchy, producing higher
levels of abstraction. Information is transmitted down the hierarchy
by downward transfer of probabilities, a process by which probability
values at high levels of the hierarchy affect those at low levels,
equivalent to "filling in the details." Some of the
input and output values in the bottom level of the functional hierarchy
will relate to input and output events that have not occurred yet,
but downward transfer of probabilities will influence their probability
values. This is equivalent to the hierarchy making predictions
of future inputs and outputs.
Planning of Actions
There is no division between planning and modelling. Planning
of behaviour is prediction of behaviour. Planning of actions occurs
in the functional hierarchy itself: there is no separate planning
system. As downward transfer of probabilities will affect the probabilities
for future input and output events in the bottom level of the hierarchy,
the natural behaviour of the system involves predicting its own
future outputs along with events in reality. The system's previous
behaviour establishes patterns that direct its future behaviour.
For this to work the system needs to be set on the path of desirable
behaviour: it needs a direction. This is provided by the output
vectoring system, which performs a tree search of possible future
outputs to determine the outputs to make now which put the system
in the best situation for later obtaining a high score from a situational
evaluation function. The search by the output vectoring system
for optimum, immediate behaviour is constrained by the probability
values for the output events in the bottom level of the functional
hierarchy. This is how the functional hierarchy's probabilistic
predictions of the system's future behaviour are used to direct
the planning of future behaviour.
This approach to planning can be shown by a simple example.
Suppose we have a conventional AI system, controlling some sort
of robot, and its modelling system determines, based on modelling
from previous data, that there is a 60% chance of rain tomorrow.
In a conventional system, this information would be passed to a
planning system, which may then decide to take an umbrella out.
In this sort of system, however, things are different. As well
as determining that there is a 60% chance of rain, the modelling
system may also determine, based on the same kind of modelling
of previous observations that gave rise to the prediction of rain,
that there is an 80% chance that the system will take an umbrella
out tomorrow. This prediction of the taking out of an umbrella
is equivalent to the decision to do it! This prediction of future
action is part of the information sent to the output vectoring
system to constrain it. The output vectoring system has most of
the work done for it when given predictions like this. This example
is simplistic and, in reality, the predictions would not refer
specifically to future actions like taking umbrellas: they would
refer to future output bit values.
Use of Outputs to Control Internal Prioritization
One issue needing addressing is the carpet texture problem – the
problem of how to prioritize processing resources in the functional
hierarchy. The approach taken is to designate some of the system's
outputs as special prioritization control outputs which, instead
of being used for motor outputs in the outside world, are used
to control prioritization of computing resources in the functional
hierarchy, determining which partial model algorithms are in use
for particular indices. This approach deals with the carpet texture
problem by using same planning capabilities that are used to manipulate
the system's situation in the outside world.
See figure 1 for an overview of the AI system.

Now that I have
provided a short discussion of each of the concepts in the
AI system, I
will provide a more detailed description – although it
will still be an overview.
The Conceptual Hierarchy
What the Conceptual Hierarchy Is
The conceptual hierarchy is an idealized, layered structure used
to describe reality. Each layer consists of meanings extracted
from the layer below. Extraction of meaning is abstraction and
pattern detection. Each extracted meaning is a single bit.
Meaning Extraction Algorithms
The way in which meanings are extracted is specified by partial
model algorithms (meaning extraction algorithms). A partial
model algorithm processes bits in a layer and generates a value
of "0" or "1" for
a bit in the layer above. A single partial model algorithm may
be used for a number of bits with different indices when the meaning
for those bits is extracted in the same general way. Partial models
algorithms are described in a previous article Occam's
Razor Part 6: Partial Models as "Envelopes" [3].
Inputs and Outputs in the Conceptual Hierarchy
The bottom level of the conceptual hierarchy is the interface
between the hierarchy and reality. It consists of input and output
bits.
Each input or output bit corresponds to an input or output event.
An input or output event is not the same as a conventional input
or output in a computer: it is the state of one of the system's
inputs or outputs at a single instant in time.
This means that any conventional input or output which changes
over time – such as a light sensor which gives an input of "1" when
light strikes it and "0" otherwise – is represented over
a period of time in the hierarchy by a number of input or output
bits, each of which corresponds to the state of the input or output
at a single instant and has a value of "0" or "1."
All the inputs that the system can receive and the outputs that
it can make are represented in the system by input bits and output
bits.
No Uncertainty in the Conceptual Hierarchy
Because all of the input and output events that occur, at any
time, are in the bottom level of the hierarchy it is incoherent
to consider
anything happening to change the information in the hierarchy
later.
Temporal Nature of Meanings in the Hierarchy
All bits at higher levels in the hierarchy are derived by partial
model algorithms from the input and output bits at the bottom
level and these describe a temporal situation – the change
in the inputs
and outputs over time – so the meanings corresponding
to bits in the higher levels in the hierarchy are also temporal.
Inclusion of Outputs into the Hierarchy
The bottom level of the hierarchy contains output bits as well
as input bits. The hierarchy is therefore extracting temporal
meanings from input events and output events. As far as the hierarchy
is
concerned, output events are treated in the same way as input
events.
The bottom level of the hierarchy contains all the observations
of reality that the system makes. This includes any inputs, but
the system also observes its own outputs and these observations
are just another form of input into the bottom level.
No Distinction Between Inputs and Outputs At Higher Levels
Because the bottom level of the hierarchy contains bits corresponding
to input events and output events, all meaning at higher levels
is derived from input events and output events. Partial model
algorithms do not distinguish between inputs and outputs.
A single extracted meaning bit at a high level could be derived
from a number of input bits and a number of output bits and another
high level extracted meaning bit could be extracted from a number
of lower level extracted meanings, each being derived from a
number of input bits and output bits. It is incoherent to talk
about whether
a particular high level meaning is extracted from inputs or outputs.
Above the bottom level of the hierarchy the idea of differentiation
between inputs and outputs is incoherent: there are just extracted
meanings.
Static Nature of the Conceptual Hierarchy
The inputs and outputs in the bottom level of the hierarchy do
not change as the inputs and outputs of a real AI system would
change. This is because each input or output bit in the hierarchy
corresponds to an input or output event and the bits corresponding
to any future input or output events are already in the hierarchy.
The bottom level of the hierarchy does not describe the states
of the system's inputs or outputs at some instant of time, but
at all instants of time.
The bottom level of the hierarchy is atemporal. Higher level
meanings are just extracted from this so the entire conceptual
hierarchy
is hierarchical and does not change over time: the hierarchy
is static.
Although the hierarchy as a whole is atemporal it does contain
temporal information.
Idealism of the Conceptual Hierarchy
The conceptual hierarchy does not exist in a computer: it is
purely a conceptual structure. It assumes omniscience of the
computer
containing it and leaves no room for uncertainty and its static
nature leaves no scope for a system containing it even to act.
Ideas like this are discussed in more detail in Occam's Razor
Part 7: Hierarchy and Ontology [4].
The Functional Hierarchy
What the Functional Hierarchy Is
The functional hierarchy is a representation of the conceptual
hierarchy which can be implemented in a computer. It forms the
basis of AI systems. The purpose of the functional hierarchy
is to provide a model of reality. As shall be discussed later,
modelling
encompasses planning so the functional hierarchy is the basis
of everything in an AI system.
Probabilistic Nature of the Functional Hierarchy
Whereas the conceptual hierarchy has no uncertainty, the functional
hierarchy contains uncertainty. Where the conceptual hierarchy
contains bits, each with a value of "0" or "1" to
describe input or output events at the bottom level or extracted
meanings at higher levels, the functional hierarchy contains a
probability for each of these bits. Each probability value in the
functional hierarchy is the probability that the corresponding
bit in the conceptual hierarchy is "1."
Where the Uncertainty Comes From
In the conceptual hierarchy, the values of all input and output
bits are known, but this is impossible for the functional hierarchy:
some of the input and output bits correspond to future input
and output events and their values cannot be known yet.
Input and output bits for future input or output events can
only be described by probabilities. Any meanings extracted
from such
bits by partial models must also be probabilistic. Because
of this injection of uncertainty, all input, output and extracted
meaning
bits in the functional hierarchy are described by probability
values rather that values of "0" and "1."
If an input or output bit corresponds to a future input or
output event then its default probability is 0.5. This can
be changed,
however, by downward transfer of probabilities – a process
that will be discussed later.
Probabilistic Interpretation of Partial Model Algorithms
Partial model algorithms do not have any probabilistic nature:
a partial model algorithm simply processes the bits on one level
and generates an extracted meaning bit at a higher level. With
the functional hierarchy, however, we no longer have actual bit
values, but just probabilities. This means that partial model
algorithms cannot be executed: we do not know the bit values
that they take
as data.
Instead of executing partial model algorithms, we make a probabilistic
interpretation of them. This means examining the probability
values for the bits from which the partial model algorithm
will extract
its meaning and the code in the partial model algorithm itself
and determining the probability that the partial model algorithm
would return a value of "1" for the higher level
extracted meaning bit. This probability is then placed in the
functional
hierarchy where the corresponding higher level, extracted meaning
bit would be in the conceptual hierarchy.
Fixing of Input and Output Bits
Input and output bits corresponding to future input or output
events have uncertainty, but there is no uncertainty about
input or output
bits corresponding to past input or output events: the value
that the bit had in the event is obviously known as the event
has happened.
In the functional hierarchy, input or output bits corresponding
to input or output events in the past are fixed, meaning that
they can only have one of two probability values: 0 if the
bit is known
to have a value of "0" or 1 if the bit is known to have
a value of "1."
When a bit has been fixed its probability value cannot change
again: its actual value is now known. A probability value of
0 or 1 leaves
no scope for further change, as it implies that there is no certainty
about the bit's value.
Downward Transfer of Probabilities
Low level probabilities imply high level probabilities and probabilistic
interpretation of partial model algorithms deals with this, but
high level probabilities also imply low level probabilities:
information about probability values needs to be transmitted
down the hierarchy
and this is downward transfer of probabilities.
While transfer of probability information up the hierarchy generates
abstract, high level meanings, these high level meanings imply
the filling in of blanks at a low level.
As an example, if we have the sequence 0,1,0,1,0,? Then a high
level, abstract meaning would be that this is a sequence of
bits that alternates: 010101010101010101010, etc. Such a high
level
meaning simply describes the entire sequence as an object and
makes no reference to individual numbers in the sequence. This
higher
level meaning implies that the next number in the sequence
is "1":
the high level meaning influences what we know about things
at a lower level.
If we did not know for certain that this was the sequence,
but had only a probability of this, then our assertion that
the next
digit is "1" would be probabilistic, and if we already
had a probability for the next digit being "1" then
the probability that this is a sequence of alternating bit
values would
influence this probability. This is what is going on with downward
transfer of probabilities.
Downward Transfer of Probabilities as Prediction
Because the hierarchy contains temporal meanings then downward
transfer of probabilities also has a temporal aspect: some of
the probability values affected by downward transfer of probabilities
will be input and output bits corresponding to future input and
output events. This is equivalent to making predictions about
the
future inputs that will be received and the outputs that will
be made.
The system is trying to predict its own outputs
Downward transfer of probabilities, when it affects the bottom
level of the hierarchy, is equivalent to the prediction of future
inputs and outputs. It may seem normal for the system to try
to predict its future inputs as this just means predicting what
is
going to happen in the world, but what about the system trying
to predict its own outputs? This may seem strange, because the
system is in control of its future outputs: it is trying to predict
its own behaviour.
It is natural for the system to try to predict its own behaviour.
This is an inevitable result of feeding the system's outputs
into the bottom level of the hierarchy so that it can observe
them.
The system needs to predict what will happen in the worl, and
its own behaviour is part of the world. If we declared it invalid
for
the system to do this then this would be a contrivance. A lot
of the future inputs that the system will receive will be contingent
on its future behaviour anyway, so trying to predict these would
really involve indirect prediction of the system's own behaviour.
Rather than try to dismiss the issue of the system predicting
its own behaviour in some contrived way, the explicit inclusion
of
the system's own behaviour into its modelling of the future,
by inclusion of output events in the bottom level of the hierarchy,
is necessary to allow planning of behaviour to work: there is
a
close link between prediction of behaviour and planning of behaviour
and this will be discussed later in this article.
Relationship Between the Conceptual and Functional Hierarchies
The conceptual and functional hierarchies are basically the same
hierarchy. The functional hierarchy is the conceptual hierarchy
expressed probabilistically because of limited knowledge about
it. This limitation in knowledge comes from some of the bottom-level
input and output bits relating to input and output events that
will occur in the future.
When input and output bits in the functional hierarchy become
fixed they always agree with the conceptual hierarchy. This is
the closest
we can get to actually seeing the conceptual hierarchy.
An aim of probabilistic interpretation of partial model algorithms
and downward transfer of probabilities is to reduce the uncertainty
in the functional hierarchy as much as possible to make it as
similar as possible to the conceptual hierarchy.
Differences Between the Functional Hierarchy and Bayesian Networks
In Bayesian networks the relationships between variables are
probabilistic. The conceptual hierarchy does not feature probability:
abstract
meanings are generated by partial model algorithms in a deterministic
way. Probability is introduced in the functional hierarchy, but
there is still no explicitly stated probabilistic relationship
between variables.
Uncertainty is only introduced into the system in input and output
bits corresponding to future input and output events and this
uncertainty is propagated through the system through the processes
of probabilistic
interpretation of partial model algorithms and downward transfer
of probabilities.
Use of this sort of probabilistic hierarchy is described in Occam's
Razor Part 8: Modelling in Artificial Intelligence [5].
Planning of Actions
How Planning Works
Planning is the process by which the system determines what outputs
it should make. An important part of the planning process is
the hierarchy's natural tendency to predict the system's own
outputs,
which was mentioned previously and is a consequence of including
output bits into the bottom level of the hierarchy from which
higher level meanings are derived.
What I will now describe is only one approach to planning. Other
methods may be possible, but they should all have the same general
characteristics of using the statistical predictions by the functional
hierarchy of the system's own behaviour to provide statistical
constraint in some kind of secondary system which searches for
optimum behaviour.
Planning in this system involves four main elements:
- • the functional hierarchy
• the output vectoring system
• constraint of the output vectoring system
• the situational evaluation function
The functional hierarchy has already been described. Descriptions
of the other elements will now follow:
Output Vectoring System
The output vectoring system performs a tree search involving possible
future outputs. This tree search has some similarity with the tree
searches performed by chess algorithms [6,7].
The purpose of the tree search is to determine the consequences
of different sequences of outputs. This is not so that a particular
sequence of outputs can be selected, but so that the best value
for the next output to be made can be determined. Each output alters
the situation in reality and alters the affects of any subsequent
outputs, so we have to model future sequences of outputs, just
as chess algorithms have to model future sequences of moves.
The tree search run by the output vectoring system involves making
outputs in chronological order. This means that the output vectoring
system must consider the output bit corresponding to the next output
that the system will make, followed by the output bit corresponding
to the next output event that the system will make after this and
so on.
Each output bit corresponds to a node in the search tree
and each decision about whether the output should be "0" or "1" corresponds
to a branch. The output bit corresponding to the next output event
to occur is the top node in the search tree. From this node, branches
for "0" and "1" values being selected
for that output go to other nodes. At the end of each of
these branches
is the next node, corresponding to the next output bit
in chronological order.
Each path down through the search tree terminates at some node.
In the simplest search trees, the criterion for termination is
search depth; for example, a simple chess algorithm may explore
hypothetical futures three moves ahead. The criterion for termination
of this search tree is not simply based on search depth, but involves
a form of constraint of the output vectoring system that depends
on the functional hierarchy.
Simulation and Propagation
The above planning process is one of simulation of the
effects on the hierarchy. At each node there is a choice of two
values for a particular output event –"0" or "1."
When a particular output is chosen – that is to say,
when the decision has been taken to follow a branch in the
search
tree
– the effects
of following that branch are simulated in the functional
hierarchy. This means that:
• The probability value of the relevant
output bit in the functional hierarchy is set to one of two values – 0
if the selected output
is "0" or 1 if the selected output is "1."
• The effects of this probability change are propagated
up the hierarchy by meaning extraction by partial
models and down the
hierarchy by downward transfer of probabilities. This will
automatically cause probability values to change in the bottom
level
for future
input and output events – equivalent to predictions
of the future experiences and behaviour of the
system being made based
on the
decision to make that output.
All of these changes are simulated:
they are not permanent in the functional hierarchy. Any change to
an output value, and the changes
propagated up and down the hierarchy as a result,
only apply below the node in the search tree for which the changes
were made. This
is the situation that exists in chess algorithms:
hypothetical moves made at some node in a search tree are only in
effect below
that node.
One way of dealing with this is to make the changes
due to selection of an output value and its propagation
in the original functional
hierarchy, but also to store the previous probability
values, so the changes can be undone when reverting
to a previous node in
the search tree, a type of method widely used in
chess algorithms.
Constraint of the Output Vectoring System
Constraint of the search tree involves allowing it
to reach greater depths on those paths that are more
likely to occur, based on the
predictions of the system's own future outputs that
have been made in the functional hierarchy.
A value known as the cut-off probability is defined.
The cut-off probability has a real value between
0 and 1.
A value known as the running probability is maintained
during the search. The running probability is initially
set to 1.
Each time the search tree goes down a new branch
(that is to say, each time we add a new output to
the sequence), we multiply the
running probability by the probability of the corresponding
output bit having that value, according to the bottom
level of the hierarchy,
as follows:
If a search tree branch involves making an output
of "1" (assigning
a probability of 1 to the relevant output bit) then the running
probability is multiplied by the probability stored for this output
bit in the functional hierarchy. If the search tree branch involves
making an output of "0" (assigning a probability of "0" to
the relevant output bit) then the running probability is multiplied
by the probability of the output having a value of "0" -
which is 1 minus the probability stored for this output
bit in the functional hierarchy. The resultant value
becomes the new
running probability.
Any path through the search tree terminates when
the running probability is less than the cut-off
probability. Following any termination
of the search tree the desirability of the hierarchy
at the terminating node is assessed by the situational
evaluation function.
Situational Evaluation Function
The situational evaluation function is an algorithm
that examines the probabilities in input values at
the bottom level of the hierarchy
and returns a score indicating the desirability of
the situation. A similar concept is used in chess
algorithms [8]. The situational
evaluation function can examine the probabilities
for input bits relating to input events in the past,
present or future.
The situational evaluation function is similar to
the evaluation functions used in chess algorithms,
except that instead of being
applied to a chess position it is applied to the
input bits in the bottom level of the hierarchy.
The purpose of the situational evaluation function
is to allow the desirability of the system's situation
to be assessed after
the making of hypothetical sequences of outputs.
The score returned by the situational evaluation
function at a terminal node is "backed-up" the search
tree. This means that it is sent up to the node above.
This is an approach common
in chess. There will have been two branches leading down
from that node – one for the "0" output
being selected and one for the "1" output –
so two backed-up scores will be received by the node.
The higher
of these two scores is selected
as the score for that node and this score in turn is
backed up so that, ultimately, scores are available for
values of "0" and "1" being
selected for the earliest (next) output event bit in
the search tree.
Where Planning Occurs
It may seem that the output vectoring system is functioning
as a planning system. In fact, in another article [9]
I also referred
to it as "the planning system", but most
of the system's planning is occurring outside the output
vectoring
system. The
system's predictions of its own future behaviour, and
the use of these predictions to constrain the output
vectoring
system's
search
for optimum behaviour, mean that the predictions of
behaviour in the hierarchy are driving planning. Planning
and prediction
are
closely linked and we can say, for an AI system:
Planning of behaviour is prediction of behaviour.
Justification of the Planning Approach
Why should this approach cause desirable behaviour?
The hierarchical system automatically generates predictions
of the system's future
observations. If the system were observing some other
intelligent agent in the environment then the hierarchy
would automatically
predict the future intelligent behaviour of this
other agent. There is nothing special about agents
in the environment, so a predictive
modelling system like the functional hierarchy should
model their future behaviour along with anything
else in reality. This would
be done simply by modelling, without recourse to
any special planning system. This is significant
because prediction of the future behaviour
of another agent's planning means that the modelling
system has the capability to plan. All that the system
needs to do to exploit
this is to do it for its own behaviour. This is done
by treating the system's outputs as inputs and feeding
them into the bottom
level of the hierarchy, so that the system can observe
its behaviour and will automatically predict its
future behaviour – which is
equivalent to planning its future behaviour.
The Role of the Output Vectoring System in Improving Behaviour
An obvious question about this is why planning of
behaviour based on prediction of future behaviour
should be of any use. If the
system were already acting intelligently then modelling
of its future behaviour by the hierarchy should predict
(and therefore
plan) future intelligent behaviour, but what if the
system is initially behaving stupidly? Would modelling
of its future behaviour not
just produce more stupid behaviour?
This is where the output vectoring system comes in.
Even if the functional hierarchy is not making any
useful predictions
about
the future behaviour of the system, the output vectoring
system will still attempt to make the best outputs:
it just will not
have the benefit of any effective constraint. This
will still be better
than nothing and after this behaviour has occurred
for a while the functional hierarchy will be predicting
it
and probability
values for the outputs in the bottom level of the
hierarchy will be constraining the output vectoring
system's
search appropriately.
This means that behaviour that was previously the
best behaviour that the output vectoring system could
find
is now the "default" behaviour
for the system, meaning that there is an inbuilt preference
in the output vectoring system for it. The output vectoring
system
will still search for the optimum behaviour, but the
type of behaviour previously found to be optimum is
now readily
available
to it so
it can use its processing resources to find an improvement
in this behaviour. This means that the behaviour can
get slightly
better.
This does not need to stop here. When the system
has been managing to make improvements for a while
then this improvement itself is
part of the behaviour of the system and the functional
model will start predicting future outputs in which
this improvement continues! The functional hierarchy will eventually start improving
the system's behaviour by itself, because the best
model of what it is going
to observe next, based on what it has observed previously,
is a smarter system. It should be clear from this
that, although vital,
the role of the output vectoring system in what is
going on is actually quite shallow.
Prediction of behaviour is not simplistic
What I have said here could be misunderstood as saying
that the system stores past behaviour and repeats
it in a simple way, and
that the system can only continue past behaviour
and not originate any new behaviour.
This would involve a misunderstanding of what it
would mean to say that the future behaviour is a
continuation of past behaviour.
It simply means that the future outputs and past
outputs are related to each other by being described
by the same predictive model in
the functional hierarchy. There is no reason why
this model should be simple with the future outputs
being easily discernible from
past outputs. The model could actually be complicated.
Some people would still argue with this, saying that,
even if the future behaviour and past behaviour are
linked by
a complicated
model it is still the same model, so the system can
never really originate anything new, but this is
just semantics
with the meaning
of "originate anything new". Such people should probably
reject the idea of computational intelligence in general. In any
computer system, future behaviour must be linked to past behaviour
by being generated by the same algorithm and there is no reason
to think that our own behaviour is not similarly limited. Some
people may talk about "algorithms that can change themselves" or "computers
that can alter their own hardware" but thinking that such
things are profound is a fallacy, as in any such system an invariant
algorithm could be used to describe the process of such an algorithm
changing itself. There is no reason to think that "original" human
thoughts do not occur within the same limitations,
a subject discussed by Hofstadter [10].
No Separate Planning System
There is no distinction between modelling and planning.
Planning has been absorbed into the modelling system
and the functional
hierarchy is used for both prediction and planning.
This is different to other approaches in which there
are separate planning and modelling systems. It should
not
be confused with
approaches in which the planning system and modelling
system use the same principles and are closely linked together.
As an example,
the approach suggested by Hawkins uses a "sensory" hierarchy
based on extraction of meanings from inputs to produce
a model, with successive layers of the hierarchy providing
more abstraction.
A similar "action" hierarchy deals with actions
and generates actions in an abstract form. These are
then passed down the hierarchy
and become less abstract, eventually reaching the system's
outputs where they are realized as the system's actions.
The two hierarchies
are supposed to be "closely-coupled" so that
there is lot of involvement between meanings abstracted
from inputs at an
given level of the hierarchy, on the way up, and actions
on the same level getting "unabstracted" on
the way down. With the approach that I propose there
is no such need for any "close-coupling" between
input and output hierarchies as the distinction between
modelling and planning is removed, leaving only one hierarchy.
What is
done in the Hawkins hierarchy by the separate planning
system is done
in this system by downward transfer of probabilities
in the hierarchy of extracted meanings – and
there is nothing
particularly special
about planning: the probabilistic predictions of the
system's behaviour, used in constraint of the output
vectoring system,
are just one
aspect of the system's predictions of future observations
(in this case, its predictions of its future observations
of its
own inputs)
and the hierarchy is not doing anything specifically
to make these predictions.
Planning is merely an incidental function of a predictive,
probabilistic modelling system in which the system's
observations include its
own outputs.
Some readers may point out that planning does not
just happen in the functional hierarchy because the
output vectoring system is
needed and this is a separate system, but the output
vectoring system does not really do much compared
to the hierarchy. It exists
to give a direction to the system's behaviour by
relating it to the criteria for desirability and
by the time probability values
have been sent down to the bottom level of the hierarchy
the system has really already decided what it is
going to do.
Conversion and Limits on Performance
I have described the way that planning works as allowing
the system to incorporate previous patterns of improvement
in its behaviour
into the functional hierarchy, so that continuation
of
the improvement becomes automatic without the need for
the output vectoring system
to provide it. This may sound suspiciously like a
claim for a "free
lunch" situation involving an unlimited runaway
process of improvement, but this is not the case: the
system is
ultimately dependent on the functional hierarchy.
An AI machine must construct a model and convert this
modelling information into planning and actions. When there is
any distinction
between planning and modelling, conversion is an
explicit process and is a bottleneck through which
all the information must go.
The sophistication of the modelling system is irrelevant
if the only affect that it can have on planning is
that which is permitted
by an inefficient conversion process.
We can therefore think of this in terms of modelling-planning
conversion efficiency. Modelling-planning conversion
efficiency is a measure
of the amount of sophistication in the model that
gets through to be expressed in terms of planning
and acting.
Conversion efficiency is total when planning is completely
integrated into modelling, as has been attempted
with this system. There is
a small loss of conversion efficiency due to the
need for the output vectoring system, but I do not
see any way of avoiding at least
some minimal system like this. The extent of the
integration does, however, mean that conversion efficiency
is near total.
Total conversion efficiency, combined with a functional
hierarchy with unlimited performance, would lead
to unlimited performance
in planning, but this should not surprise us. That
is what "total
conversion efficiency" would have to mean. It
is a situation that cannot arise in reality.
In this system we have near total conversion efficiency,
with a modelling system (the functional hierarchy)
that has limited of
processing power. The system's planning capability
will be limited by the computing resources available
to the functional hierarchy
and, to a lesser degree, the output vectoring system.
This approach to planning is discussed in Occam's
Razor Part 9: Representation and Planning of Actions
in Artificial Intelligence [9].
Use of Outputs to Control Internal Prioritization
The Carpet Texture Problem
The carpet texture problem is the problem of how
an AI system should prioritize use of its computing
resources. It gets it name from
the idea of a machine analysing the pattern in a
carpet and finding many interesting, but irrelevant,
things about it. Just because
an AI system can analyze things, this does not mean
that it should analyse everything. There are two
obvious questions which a solution
to the carpet texture problem needs to address in
this system:
• What partial model
algorithms should be made? There is an infinite set of partial model algorithms
available and it is only feasible
to generate some of them. The use of a hierarchical
system allows us to rely on shorter partial
models, but we still have the problem
of deciding which ones to make.
• What partial model algorithms should be used? It
will not be appropriate for every partial model
algorithm to be used in every situation.
The same partial model algorithm can be applied
to the data which it processes for more than
one index,
so a partial model algorithm
may only be relevant for some indices at various
times.
It could be argued that both of the
first of these questions are really contained in the second. Instead
of thinking
about "making" partial
model algorithms we can imagine the entire infinite set of partial
model algorithms being available to the computer, and the decision
to use one of them in a particular situation and at a particular
index, and select it from the set of all possible partial model
algorithms, being really equivalent to the decision to "make
it". The general question is this: how should
use of computing resources be prioritized in the functional
hierarchy?
Equivalence of the Carpet Texture Problem and Planning of External
Behaviour
If the system does not use adequate computing resources
where it should then there will be much uncertainty
in areas critical to
its behaviour and it will fail to find behaviour
that leads to high situational evaluation function
scores. If the system uses
inadequate resources for trivial purposes then
this will reduce the resources available for more
important uses, again reducing
its score. The criterion for optimum use of computing
resources is that it would allow the system to
achieve optimum scores in
the output vectoring system's situational evaluation
function, because this is all that the system is
intended
to do.
The same thing can be said about the system's behaviour
in general! Optimum making of outputs by the
system maximizes situational evaluation
function scores and optimum prioritization of
use of computing resources in the functional hierarchy
also maximizes situational
evaluation function scores.
The criteria for optimum behaviour by the system
and optimum prioritization of use of computing
resources in the functional hierarchy are the
same.
and therefore:
The same process should be used to optimize the
system's behaviour as is used to prioritize the
use computing
resources in the functional
hierarchy.
which means:
Prioritization of the use of computing resources
in the functional hierarchy should be by means
of outputs designated for this purpose.
Use of Outputs
As well as outputs that affect things outside
the system, some of the outputs should be set
aside
for controlling prioritization
of computing resources in the functional hierarchy.
These outputs would control express decisions
about the prioritization of the
use of different partial model algorithms, with
differant indices, in the functional hierarchy.
The functional
hierarchy would use
its limited computing resources to do as much
processing as it can, with the prioritization
set by these
outputs and changing
continuously as the outputs change.
The advantage of this is that no separate method
is required for the system to determine the prioritization
to be used.
The existing
methods of planning will automatically deal with
it.
The system does not even need to "know" which
of its outputs are conventional and which control prioritization
in the functional
hierarchy: as far as the system is concerned, outputs
relating
to prioritization may as well be external outputs.
Such an approach would need some work. The way
in which the prioritization control outputs actually
control the prioritization would need
to be specified. For it to be efficient, a single
control output would need to be able to control
a
number of partial model algorithms.
There are some issues to be resolved here, so
at this stage I will just take the general position
that I have outlined.
Turning Complexity Inwards
The carpet texture problem is a serious issue
and any general purpose AI proposal that does
not address
it will fail. Saying that the
carpet texture problem is about increasing efficiency
trivializes the role that prioritization of computing
resources has in generating
a lot of the complexity in an AI system.
The planning capability intrinsic to the functional
hierarchy of the AI system already directs complexity
outwards
by imposing some
of the information in the hierarchical model
on the environment in the form of actions. The
suggested
approach to the
carpet texture problem is one of turning some of this complexity
inwards and back
into the functional hierarchy itself, by diverting
some of the system's "outputs" into control of internal
prioritization – which in turn contributes to
the more efficient planning of
these outputs.
Analogy
This approach of turning complexity inwards has
a broad analogy in Paulo Soleri's "arcology" philosophy of
urban planning [11] and the emphasis on miniaturization
and inward
focusing
of complexity in nature [12] that Soleri points out.
I am not saying
that too much should be read into this – just
that people familiar with these ideas will recognize
some general
similarity.
Conclusion
I have outlined how AI could work. Previous articles,
to which I have referred, have described these
concepts in more detail.
Here are some of the main characteristics of
the AI system that have been discussed in this
article:
• The system is based on the functional
hierarchy – a
probabilistic hierarchy of extracted meanings.
• The functional hierarchy is based on the conceptual
hierarchy, which has no probabilistic nature.
• Meanings extracted in the functional and conceptual
hierarchies are temporal.
• Separate system is used to plan outputs. This
is done in the functional hierarchy itself
as part
of the system's modelling. There is a
separate system needed - the output vectoring
system - to give the system's behaviour a direction
and
relate it to the concept
of desirability.
• Planning of behaviour is prediction of behaviour.
• Planning is merely an incidental function of
a predictive, probabilistic modelling system
in which
the system's
observations include its
own outputs.
• When the system has been managing to make improvements
for a while then this improvement itself
is part of the behaviour of the system
and the functional model will start predicting
future outputs in which this improvement continues! The
functional hierarchy will
eventually start improving the system's behaviour
by itself, because the best model of what it
is going to observe next, based on what
it has observed previously, is a smarter system.
• The carpet texture problem is solved by designating
some of the system's outputs as prioritization
control outputs. These outputs
are used to control prioritization of the use
of computing resources in the functional hierarchy.
There are issues still to be resolved:
The way in which the hierarchy and the partial
models works, as described in the previous
articles [4,5],
seems too rigid to me,
in the sense of relying on matrices with fixed
numbers of dimensions, so further consideration
of the ontology
is required. This is really
an issue of indexing used to differentiate
between different applications of, and meanings
produced
by, the same partial model algorithm.
The current indexing system lacks flexibility.
I have said in previous articles that the way
in which partial model algorithms are constructed
needs to be considered. In this
article I have discussed the carpet texture
problem and the two issues are related. If
we consider
the set of all partial model
algorithms being available to an AI system
then the consideration of which partial model
algorithms
to
use deals with the consideration
of which partial model algorithms to make.
The issue of how prioritization of computing
resources
in the
functional hierarchy will be controlled
by the system's own outputs needs further consideration.
References
[1] Hawkins, J., Blakeslee, S. (2004). On Intelligence.
New York: Henry Holt.
[2] Web Reference: George, D., Hawkins, J.
(?). Belief Propagation and Wiring Length Optimization
as Organizing
Principles for Cortical
Microcircuits. Retrieved 24 April 2006 from http://www.stanford.edu/~dil/invariance/Download/CorticalCircuits.pdf.
[3] Web Reference: Almond, P. (2006). Occam's
Razor Part 6: Partial Models as "Envelopes". Retrieved
1 March 2006 from http://www.paul-almond.com/OccamsRazorPart06.htm.
[4] Web Reference: Almond, P. (2006). Occam's
Razor Part 7: Hierarchy and Ontology. Retrieved
30 April
2006 from http://www.paul-almond.com/OccamsRazorPart07.htm.
[5] Web Reference: Almond, P. (2006). Occam's
Razor Part 8: Modelling in Artificial Intelligence.
Retrieved
9 June 2006 from http://www.paul-almond.com/OccamsRazorPart08.pdf.
[6] Levy, D.N.L. (1984). The Chess Computer
Handbook. London: Batsford. Chapter 3, pp38-52.
[7] Heinz, E, A. (2000). Scalable Search in
Computer Chess: Algorithmic Enhancements and
Experiments
at High Search Depths. Vieweg Verlag.
Chapter 0, pp11-18.
[8] Levy, D.N.L. (1984). The Chess Computer
Handbook. London: Batsford. Chapter 2, pp7-37.
[9] Web Reference: Almond, P. (2006). Occam's
Razor Part 9: Representation and Planning of
Actions
in Artificial Intelligence. Retrieved 29
July 2006 from http://www.paul-almond.com/OccamsRazorPart09.pdf.
[10] Hofstadter, D. R. (2000). Godel, Escher,
Bach: an Eternal Golden Braid. London: Penguin
Books.
Chapter 15, pp465-479. (Originally
published:1980. New York: Vintage)
[11] Soleri, P., Strohmeir, J. (2001). The
Urban Ideal: conversations with Paolo Soleri.
Berkeley:
Berkeley Hills Books. pp94-95.
[12] Soleri, P. (1973). The Bridge Between
Matter and Spirit is Matter Becoming Spirit:
The Arcology
of Paolo Soleri. New York:
Anchor Press/Doubleday. pp22-23. Top of page
|