## Publication

# Similarity metrics within a point of view

### Journal Contribution - Journal Article

In vector space based approaches to natural language processing, similarity is commonly measured by taking the angle between two vectors representing words or documents in a semantic space. This is natural from a mathematical point of view, as the angle between unit vectors is, up to constant scaling, the only unitarily invariant metric on the unit sphere. However, similarity judgement tasks reveal that human subjects fail to produce data that satisfy the symmetry and triangle inequality requirements for a metric space. A possible conclusion, reached in particular by Tversky et al., is that some of the most basic assumptions of geometric models are unwarranted in the case of psychological similarity. This result would impose strong limits on the validity and applicability of vector space based (and hence also quantum-inspired) approaches to the modelling of cognitive processes. This paper proposes a resolution to this fundamental criticism of the applicability of vector space models of cognition. We argue that pairs of words imply a context, which in turn induces a point of view that allows a subject to estimate semantic similarity. Context is introduced here as a point of view vector (POVV), and the expected similarity is derived as a measure over the POVVs. Different pairs of words invoke different contexts and therefore different POVVs. Hence the triangle inequality ceases to be a valid constraint on the angles. We test the proposal on a few triples of words and outline further research.
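The abstract makes two geometric claims that are easy to check numerically. First, the plain angle between vectors is a metric, so it always satisfies the triangle inequality. Second, once each pair of words is judged from its own context, the three angles no longer need to satisfy it. The Python sketch below illustrates both. The function names `angle`, `triangle_violation` and `context_angle`, the two-dimensional toy vectors and the weight vectors are all hypothetical and chosen only for illustration. The per-pair coordinate reweighting is a deliberately simple stand-in for "context". It is not the POVV construction of the paper, whose details the abstract does not give.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def angle(u: Vector, v: Vector) -> float:
    """Angle in radians between two nonzero vectors.

    After both vectors are normalised to unit length, this is the geodesic
    (great-circle) distance on the unit sphere, which is a true metric.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # Clamp so that rounding cannot push the cosine outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def triangle_violation(d_ab: float, d_bc: float, d_ac: float) -> float:
    """How far d_ac exceeds d_ab + d_bc; a value <= 0 means the inequality holds."""
    return d_ac - (d_ab + d_bc)


def context_angle(u: Vector, v: Vector, weights: Vector) -> float:
    """Hypothetical context-dependent angle (NOT the paper's POVV method):
    rescale each coordinate by a pair-specific weight, then measure the angle."""
    return angle([w * x for w, x in zip(weights, u)],
                 [w * y for w, y in zip(weights, v)])


# Toy "word" vectors (invented for illustration).
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

# One shared space: angles of 45, 45 and 90 degrees, so the triangle
# inequality holds (here with equality, i.e. a violation of 0 up to rounding).
plain = triangle_violation(angle(a, b), angle(b, c), angle(a, c))

# Each pair evokes its own (invented) context weighting.
ctx = triangle_violation(
    context_angle(a, b, (1.0, 0.1)),  # context stressing the first axis
    context_angle(b, c, (0.1, 1.0)),  # context stressing the second axis
    context_angle(a, c, (1.0, 1.0)),  # neutral context
)

print(f"context-dependent violation: {ctx:.3f}")  # about 1.371 > 0: inequality fails
```

Because each pair is measured in a differently reweighted space, the three angles no longer come from a single metric. This mirrors, in a very simplified form, the abstract's argument that different word pairs invoke different contexts. The reweighted angle is still symmetric, so this toy example addresses only the triangle inequality, not the asymmetry of human judgements.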

Journal: Lecture Notes in Computer Science

ISSN: 0302-9743

Pages: 13-24

Publication year: 2011

Keywords: Similarity, Semantic Space, triangle inequality, metric, context, POVV