This site, MSR Continuous-Space Text Representations (MSRCSTR), demonstrates several vector space models developed at Microsoft Research. You can test these models by entering word pairs or text chunks. The models can also be accessed programmatically through a WCF Web service, and some are available for download.


Methods/Models Included in This Demo

 

Word Relation


Models in this category take two words as input and return a score indicating how strongly the two words hold a particular relation. Relations in this demo include:
  • WordSim

    This method judges the general semantic relatedness between two words. The detail of this approach can be found at:
    In this work, we showed that by first creating vector space models from heterogeneous information sources and then combining the cosine scores computed using the individual vector space models, the combined score correlates well with human judgments. This demo implements all the vector space models except the one created using Bing.

  • Polarity Inducing Latent Semantic Analysis (PILSA)

    This method measures the degree of synonymy or antonymy between two words. The sign of the score indicates whether the two words are closer to synonyms (+) or antonyms (-), and the absolute value indicates the degree. The detail of this approach can be found at:
    We included two versions of the PILSA model in this demo. PILSA (Original) is the version that uses SVD. PILSA+S2Net is the version refined using a discriminative training method.
     
  • Multi-Relational Latent Semantic Analysis (MRLSA)

    This model extends PILSA and can incorporate multiple relations. The basic idea is to capture the raw data in a tensor and then derive the word vectors and relation matrices using tensor decomposition. More detail can be found at:
    We included two versions of the MRLSA model in this demo, MRLSA (Antonym) and MRLSA (Hyponym), which are tuned for identifying antonyms and hyponyms, respectively.
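
The WordSim idea described earlier in this list, computing a cosine score in each vector space model and then combining the scores, can be sketched in a few lines of C#. This is only an illustration under stated assumptions: the combination is taken to be a plain unweighted average (the page does not say how the demo combines scores), and the names WordSimSketch, Cosine, and AverageCosine, as well as the toy vectors, are hypothetical and not part of the service API.

```csharp
using System;
using System.Linq;

public static class WordSimSketch
{
    // Cosine similarity of two equal-length vectors; 0 when either vector is all zeros.
    public static double Cosine(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.Sqrt(normA * normB);
    }

    // Assumption: the per-model cosine scores are combined by an unweighted mean.
    // word1Vectors[m] and word2Vectors[m] hold the two words' vectors in model m.
    public static double AverageCosine(double[][] word1Vectors, double[][] word2Vectors)
    {
        if (word1Vectors.Length != word2Vectors.Length || word1Vectors.Length == 0)
            throw new ArgumentException("Need one vector per model for each word.");

        return word1Vectors.Zip(word2Vectors, (u, v) => Cosine(u, v)).Average();
    }

    public static void Main()
    {
        // Toy vectors for one word pair in two hypothetical models.
        double[][] word1 = { new[] { 1.0, 0.0 }, new[] { 1.0, 0.0 } };
        double[][] word2 = { new[] { 1.0, 0.0 }, new[] { 0.0, 1.0 } };

        // Model 1 gives cosine 1 and model 2 gives cosine 0, so the average is 0.5.
        Console.WriteLine("Combined relatedness = {0:0.000}", AverageCosine(word1, word2));
    }
}
```

In the toy data the two models disagree completely, and the averaged score lands halfway between them, which is the behavior one would expect from a simple score combination.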

Top Word Relation


This set of demos provides a different way to use the word relation models. Given a target word, a demo finds the words in the vocabulary that hold the relation with the target, based on the scores output by the model. Choices of the models include: PILSA (Original) and PILSA (S2Net) for both synonyms and antonyms.
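
As a rough illustration of the ranking step described above, the following C# sketch scores every vocabulary word against the target and keeps the k highest-scoring ones. The names TopWordRelationSketch and TopRelated, the scoring delegate, and the toy scores are all hypothetical; the actual service may rank and filter differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopWordRelationSketch
{
    // Returns the k vocabulary words with the highest relation score to the target.
    // "score" stands in for any word relation model (hypothetical delegate).
    public static List<KeyValuePair<string, double>> TopRelated(
        string target,
        IEnumerable<string> vocabulary,
        Func<string, string, double> score,
        int k)
    {
        return vocabulary
            .Where(w => w != target)   // never return the target itself
            .Select(w => new KeyValuePair<string, double>(w, score(target, w)))
            .OrderByDescending(p => p.Value)
            .Take(k)
            .ToList();
    }

    public static void Main()
    {
        // Made-up scores against the target "hot", for illustration only.
        var toyScores = new Dictionary<string, double>
        {
            { "warm", 0.8 }, { "cold", -0.9 }, { "table", 0.1 }
        };
        Func<string, string, double> score = (target, word) => toyScores[word];

        foreach (var item in TopRelated("hot", toyScores.Keys, score, 2))
            Console.WriteLine("{0}\t{1}", item.Key, item.Value);
    }
}
```

With these toy scores the two entries returned are "warm" followed by "table", since "cold" has the lowest score.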

Relational Similarity


This set of demos takes various kinds of word vectors and applies directional similarity to measure whether two word pairs are analogous. The basic idea is to first create an offset vector for each word pair, using the difference of the two word vectors. The degree of analogy (i.e., relational similarity) is then the cosine score of the two offset vectors. More detail of this approach can be found at:
Using the directional similarity method for measuring relational similarity, when given an analogy question with three words like king:queen = man:?, the model can help find the correct word, woman.
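
The offset-and-cosine computation described above can be sketched as follows. This is a minimal illustration with hand-made two-dimensional vectors, not the demo's implementation: the names DirectionalSimilaritySketch, Offset, DirSim, and SolveAnalogy are hypothetical, and the analogy search simply tries every remaining vocabulary word as the fourth term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DirectionalSimilaritySketch
{
    // Cosine similarity of two equal-length vectors; 0 when either vector is all zeros.
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.Sqrt(normA * normB);
    }

    // Offset vector of a word pair: vector(first) - vector(second).
    public static double[] Offset(double[] first, double[] second)
    {
        if (first.Length != second.Length)
            throw new ArgumentException("Vectors must have the same length.");
        return first.Zip(second, (x, y) => x - y).ToArray();
    }

    // Relational similarity of the pairs (a, b) and (c, d): cosine of their offsets.
    public static double DirSim(double[] a, double[] b, double[] c, double[] d)
    {
        return Cosine(Offset(a, b), Offset(c, d));
    }

    // Answers a:b = c:? by picking the word whose pair with c is most analogous to (a, b).
    // Throws if the vocabulary contains no word other than a, b, and c.
    public static string SolveAnalogy(IDictionary<string, double[]> vectors,
                                      string a, string b, string c)
    {
        return vectors.Keys
            .Where(w => w != a && w != b && w != c)
            .OrderByDescending(w => DirSim(vectors[a], vectors[b], vectors[c], vectors[w]))
            .First();
    }

    public static void Main()
    {
        // Toy vectors: first dimension ~ royalty, second dimension ~ gender.
        var vectors = new Dictionary<string, double[]>
        {
            { "king",  new[] { 1.0,  1.0 } },
            { "queen", new[] { 1.0, -1.0 } },
            { "man",   new[] { 0.0,  1.0 } },
            { "woman", new[] { 0.0, -1.0 } },
            { "apple", new[] { 0.5,  0.5 } }
        };

        Console.WriteLine("king:queen = man:{0}",
            SolveAnalogy(vectors, "king", "queen", "man"));
    }
}
```

In this toy space the offsets king - queen and man - woman point in exactly the same direction, so "woman" receives the highest cosine score and is returned as the answer.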
 

Text Similarity


The models in this section aim to measure the general semantic similarity between two text chunks. They include:
  • S2Net (Web) uses a parallel corpus of queries and clicked document titles to train a projection matrix that maps the raw TFIDF vectors to low-dimensional real-valued word vectors. For comparison, we also include TFIDF (Web), which uses the original TFIDF term vectors directly. The detailed description of the corpus and model can be found at:
  • S2Net (EnEs Wiki) is a cross-lingual model trained on parallel English and Spanish Wikipedia documents. Similarly, TFIDF (EnEs Wiki) uses the raw TFIDF vectors directly. Detail of this model can be found at:
  • C-DSSM is a deep Siamese neural network model. It uses letter n-gram hashing at the bottom layer, with several non-linear layers on top of that. Trained on the same dataset, C-DSSM extends and significantly outperforms S2Net (Web) on a search ranking task. More detail of this model can be found at:
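
To illustrate why a projection matrix can help over raw TFIDF vectors, as in the S2Net versus TFIDF comparison above, here is a small C# sketch. It assumes the projection has the linear form y = Aᵀx, where A has one row per vocabulary term; the hand-written matrix below stands in for a trained one, and the names ProjectionSketch and Project are hypothetical.

```csharp
using System;

public static class ProjectionSketch
{
    // Cosine similarity of two equal-length vectors; 0 when either vector is all zeros.
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.Sqrt(normA * normB);
    }

    // Maps a raw term vector x (one entry per vocabulary term) to a
    // low-dimensional vector y = A^T x, where A is (terms x dims).
    public static double[] Project(double[,] a, double[] x)
    {
        int terms = a.GetLength(0), dims = a.GetLength(1);
        if (x.Length != terms)
            throw new ArgumentException("Term vector length must match the matrix rows.");

        var y = new double[dims];
        for (int t = 0; t < terms; t++)
            for (int j = 0; j < dims; j++)
                y[j] += a[t, j] * x[t];
        return y;
    }

    public static void Main()
    {
        // Toy vocabulary order: car, automobile, banana.
        double[] text1 = { 1.0, 0.0, 0.0 };   // a text containing only "car"
        double[] text2 = { 0.0, 1.0, 0.0 };   // a text containing only "automobile"

        // Hand-made stand-in for a learned projection: "car" and "automobile"
        // map to the same dimension, "banana" to a different one.
        double[,] projection = { { 1.0, 0.0 }, { 1.0, 0.0 }, { 0.0, 1.0 } };

        Console.WriteLine("Raw term-vector cosine: {0}", Cosine(text1, text2));
        Console.WriteLine("Projected cosine:       {0}",
            Cosine(Project(projection, text1), Project(projection, text2)));
    }
}
```

The two toy texts share no terms, so their raw cosine is 0, while their projected vectors coincide and the cosine becomes 1. A trained projection would be far less clear-cut, but the mechanism is the same.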


Web Service

All functions/methods presented in this demo can be accessed through a WCF Web service. The WCF Service Endpoint is: http://msrcstr.cloudapp.net:8080/Service.svc. You'll need to register here to get the API key before using it.
 

Sample Code


using System;
using System.Collections.Generic;

namespace CSTR_WCF_Client_Test
{
    // Interactive console client for the MSRCSTR WCF service.
    // CSTR.ServiceClient and CSTR.MODEL are expected to come from the client
    // proxy generated for the service endpoint (namespace CSTR).
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 3)
            {
                PrintUsage();
                return;
            }

            // Map command-line model names to the service's MODEL enum values.
            var dtModelMap = new Dictionary<string, CSTR.MODEL>
            {
                { "WordSim_Wiki", CSTR.MODEL.WordSim_Wiki },
                { "WordSim_LA", CSTR.MODEL.WordSim_LA },
                { "WordSim_Encarta", CSTR.MODEL.WordSim_Encarta },
                { "WordSim_WordNet", CSTR.MODEL.WordSim_WordNet },
                { "WordSim_Avg", CSTR.MODEL.WordSim_Avg },
                { "PILSA_Original", CSTR.MODEL.PILSA_Original },
                { "PILSA_S2Net", CSTR.MODEL.PILSA_S2Net },
                { "S2Net_Web_CLICK_QRY_TITLE", CSTR.MODEL.S2Net_Web_CLICK_QRY_TITLE },
                { "S2Net_Web_RAW_TFIDF", CSTR.MODEL.S2Net_Web_RAW_TFIDF },
                { "S2Net_EnEs", CSTR.MODEL.S2Net_EnEs },
                { "S2Net_EnEs_RAW_TFIDF", CSTR.MODEL.S2Net_EnEs_RAW_TFIDF },
                { "MRLSA_Ant", CSTR.MODEL.MRLSA_Ant },
                { "MRLSA_IsA", CSTR.MODEL.MRLSA_IsA },
                { "CDSSM", CSTR.MODEL.CDSSM }
            };

            // Reject unknown model names instead of failing with KeyNotFoundException.
            CSTR.MODEL model;
            if (!dtModelMap.TryGetValue(args[1], out model))
            {
                Console.WriteLine("Unknown model: {0}", args[1]);
                PrintUsage();
                return;
            }

            CSTR.ServiceClient cstr = new CSTR.ServiceClient();

            string apikey = args[0].ToLower().Trim();

            switch (args[2])
            {
                case "WordRelation": TestWordPairs(cstr, apikey, model); break;
                case "TopWordRel": TopWordRel(cstr, apikey, model); break;
                case "BotWordRel": BotWordRel(cstr, apikey, model); break;
                case "TextRel": TextRel(cstr, apikey, model); break;
                case "WordDirSim": WordDirSim(cstr, apikey, model); break;
                case "TopWordRelSim": TopWordRelSim(cstr, apikey, model); break;
                case "InVoc": InVoc(cstr, apikey, model); break;
                default:
                    Console.WriteLine("Unknown function: {0}", args[2]);
                    PrintUsage();
                    break;
            }
        }

        static void PrintUsage()
        {
            Console.WriteLine("Usage: CSTR_WCF_Client_Test.exe API_Key Model Function");
            Console.WriteLine("Model choices:");
            Console.WriteLine(
              "\tWordSim_Wiki, WordSim_LA, WordSim_Encarta, WordSim_WordNet, WordSim_Avg,\n" +
              "\tPILSA_Original, PILSA_S2Net, MRLSA_Ant, MRLSA_IsA,\n" +
              "\tS2Net_Web_CLICK_QRY_TITLE, S2Net_Web_RAW_TFIDF, S2Net_EnEs, S2Net_EnEs_RAW_TFIDF,\n" +
              "\tCDSSM");
            Console.WriteLine("Function choices: \n" +
              "\tWordRelation, TopWordRel, BotWordRel, TextRel, WordDirSim, TopWordRelSim, InVoc");
        }

        // Prompts the user and splits the reply on spaces and tabs.
        // Returns null at end of input so that callers can stop looping.
        static string[] ReadFields(string prompt)
        {
            Console.Write(prompt);
            string line = Console.ReadLine();
            if (line == null)
                return null;
            return line.Split(" \t".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        }

        // Scores the relation between two words.
        static void TestWordPairs(CSTR.ServiceClient cstr, string apikey, CSTR.MODEL model)
        {
            string[] f;
            while ((f = ReadFields("Please enter two words: ")) != null)
            {
                if (f.Length >= 2)
                {
                    Console.WriteLine("Word Relation Score = {0:0.000}",
                                    cstr.WordRel(apikey, model, f[0], f[1]));
                }
            }
        }

        // Lists the words returned by TopWordRel for the given word.
        static void TopWordRel(CSTR.ServiceClient cstr, string apikey, CSTR.MODEL model)
        {
            string[] f;
            while ((f = ReadFields("Please enter one word: ")) != null)
            {
                if (f.Length >= 1)
                {
                    foreach (var item in cstr.TopWordRel(apikey, model, f[0]))
                    {
                        Console.WriteLine("{0}\t{1}", item.Key, item.Value);
                    }
                    Console.WriteLine();
                }
            }
        }

        // Lists the words returned by BotWordRel for the given word.
        static void BotWordRel(CSTR.ServiceClient cstr, string apikey, CSTR.MODEL model)
        {
            string[] f;
            while ((f = ReadFields("Please enter one word: ")) != null)
            {
                if (f.Length >= 1)
                {
                    foreach (var item in cstr.BotWordRel(apikey, model, f[0]))
                    {
                        Console.WriteLine("{0}\t{1}", item.Key, item.Value);
                    }
                    Console.WriteLine();
                }
            }
        }

        // Checks whether a word is in the model's vocabulary.
        static void InVoc(CSTR.ServiceClient cstr, string apikey, CSTR.MODEL model)
        {
            string[] f;
            while ((f = ReadFields("Please enter one word: ")) != null)
            {
                if (f.Length >= 1)
                {
                    Console.WriteLine("InVoc({0}) = {1}", f[0], cstr.InVoc(apikey, model, f[0]));
                    Console.WriteLine();
                }
            }
        }

        // Scores the directional similarity of two word pairs (w1:w2 and w3:w4).
        static void WordDirSim(CSTR.ServiceClient cstr, string apikey, CSTR.MODEL model)
        {
            string[] f;
            while ((f = ReadFields("Please enter four words: ")) != null)
            {
                if (f.Length >= 4)
                {
                    Console.WriteLine("Sim({0}:{1}, {2}:{3}) = {4}",
                        f[0], f[1], f[2], f[3],
                        cstr.WordDirSim(apikey, model, f[0], f[1], f[2], f[3]));
                    Console.WriteLine();
                }
            }
        }

        // Lists the words returned by TopWordRelSim for the three given words.
        static void TopWordRelSim(CSTR.ServiceClient cstr, string apikey, CSTR.MODEL model)
        {
            string[] f;
            while ((f = ReadFields("Please enter three words: ")) != null)
            {
                if (f.Length >= 3)
                {
                    foreach (var item in cstr.TopWordRelSim(apikey, model, f[0], f[1], f[2]))
                    {
                        Console.WriteLine("{0}\t{1}", item.Key, item.Value);
                    }
                    Console.WriteLine();
                }
            }
        }

        // Scores the similarity of two text chunks, each entered on its own line.
        static void TextRel(CSTR.ServiceClient cstr, string apikey, CSTR.MODEL model)
        {
            while (true)
            {
                Console.Write("Please enter the first text chunk: ");
                string txt1 = Console.ReadLine();
                if (txt1 == null)
                    return;

                Console.Write("Please enter the second text chunk: ");
                string txt2 = Console.ReadLine();
                if (txt2 == null)
                    return;

                Console.WriteLine("The similarity score is: {0}.",
                    cstr.TextRel(apikey, model, txt1, txt2));
                Console.WriteLine();
            }
        }
    }
}


Questions?

Please contact Scott Yih if you have any questions.