danishnxt/NLP-Project-VisualQuesGen

Visual Question Generation task adapted from Microsoft -> https://arxiv.org/abs/1603.06059

Work began -> Summer 2018 (at the time, the code and exact technical specs for the task were unavailable)

  • Danish Farid (currently @ U Waterloo) - LUMS (SBASSE)
  • Rabeez Riaz (currently @ McKinsey & Co.) - LUMS (SBASSE)

Work conducted:

1. Downloading dataset from -> https://www.microsoft.com/en-us/download/details.aspx?id=53670
    - A custom script was used to avoid downloading images from dead links (each URL was first checked with an HTTP GET request) - see the sketch below
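    A minimal Python sketch of the link-checking idea, assuming the `requests`
    library; the function and variable names here are illustrative, not the
    ones from our actual script:

        import requests

        def fetch_live_images(rows, timeout=5):
            """Yield (image_id, raw bytes) only for URLs that still resolve."""
            for image_id, url in rows:
                try:
                    resp = requests.get(url, timeout=timeout)
                except requests.RequestException:
                    continue  # dead host / DNS failure -> skip this link
                if resp.status_code == 200 and resp.content:
                    yield image_id, resp.content  # link alive, keep the image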

2. Data cleaning along standard methods:
    - Running a final check for rows with malformed question data -> found a few rows without valid questions
    - Removing quotation marks and excess formatting
    - Normalizing letter case
    - Adding start and end tokens
    - Building a vocabulary dictionary for [word -> integer] mapping (numerical data)
    - Converting all lists of strings to lists of integers using the vocab dictionary
    - Converting raw (variable-length) sentences into constant-length slices (length determined by the max sentence length)
        - An RNN is trained on a single sample at multiple time slices,
        so we slowly "revealed" each training sentence one token at a time, padding
        the remaining positions with stop tokens (see the sketch after the example):
        
        Example dataset
        Max length: 5 tokens (plus the start token)
        
        __image: img_1.jpeg (one image)__ 
        
        Before: 
            "What is that man?"
        
        After (each token is then mapped to its integer representation; shown here as words for readability):
            0/ <START> <STOP> <STOP> <STOP> <STOP> <STOP>
            1/ <START> what <STOP> <STOP> <STOP> <STOP>
            2/ <START> what is <STOP> <STOP> <STOP>
            3/ <START> what is that <STOP> <STOP>
            4/ <START> what is that man <STOP>
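
    A minimal Python sketch of the [word -> integer] vocabulary and the
    token-by-token "reveal" above; token names and the max length follow the
    example, everything else (function names, cleaning rules) is illustrative:

        START, STOP = "<START>", "<STOP>"

        def tokenize(question):
            # lowercase and drop quotation/question marks (the step-2 cleaning)
            return question.lower().replace('"', '').replace('?', '').split()

        def build_vocab(questions):
            vocab = {STOP: 0, START: 1}          # word -> integer mapping
            for q in questions:
                for word in tokenize(q):
                    vocab.setdefault(word, len(vocab))
            return vocab

        def reveal_slices(question, vocab, max_len=5):
            """Constant-length integer rows, revealing one token at a time."""
            words = tokenize(question)
            rows = []
            for k in range(len(words) + 1):      # k = number of revealed words
                tokens = [START] + words[:k]
                tokens += [STOP] * (1 + max_len - len(tokens))  # pad with stops
                rows.append([vocab[w] for w in tokens])
            return rows

        vocab = build_vocab(["What is that man?"])
        for row in reveal_slices("What is that man?", vocab):
            print(row)                           # 5 rows of 6 integers each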
    

3. Rebuilding the Microsoft model
    - Piecing together the VQG Microsoft model from the overall information in the MS paper, using the Keras
       documentation to find ways to implement its features
    - Tried several methods for feeding the auxiliary picture data to the RNN
        -> including embedding the image features into each row of the sequence data (see the sketch below)
    - Found a way to include the image features as auxiliary (side) information
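
    A hedged Keras sketch of that per-row embedding variant: the 500-d image
    feature vector is tiled across every timestep (RepeatVector) and
    concatenated with the word embeddings. Written against tf.keras for
    self-containedness; all hyperparameters besides the 500-d feature size
    are illustrative:

        from tensorflow.keras import layers, Model

        MAX_LEN, VOCAB_SIZE, IMG_FEAT = 6, 10000, 500

        tokens = layers.Input(shape=(MAX_LEN,), name="question_tokens")
        img = layers.Input(shape=(IMG_FEAT,), name="image_features")

        emb = layers.Embedding(VOCAB_SIZE, 256)(tokens)       # (batch, 6, 256)
        img_seq = layers.RepeatVector(MAX_LEN)(img)           # (batch, 6, 500)
        merged = layers.Concatenate()([emb, img_seq])         # (batch, 6, 756)

        next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(
            layers.LSTM(512)(merged))
        model = Model([tokens, img], next_word)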


4. Model definition
    Overall model hierarchy

        1. Image and sentence information
        2. VGG-16 with its last layer removed - (INPUT: image) - (OUTPUT: 500-length feature vector)
        3. RNN layer (LSTM/GRU units) ->
            (X data: sentence pushed through an embedding), (Y data: next word), (side information: picture features)
    
    For every training sample:

    -> The image is run through a VGG-16 with its last layer removed, so every image returns
    a high-level feature vector (length = 500).

    -> This feature vector is used as "side information" to the RNN: it seeds the recurrent state, which is then recursively modified over each recurrent step (see the sketch below).
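
    A sketch of this arrangement, assuming tf.keras: VGG-16 minus its
    classification layer (its penultimate layer is 4096-d, so a Dense
    projection gives the 500-d vector), with the image features seeding the
    LSTM's initial state. This is one plausible wiring of the "side
    information"; sizes other than the 500-d vector are illustrative:

        from tensorflow.keras import layers, Model
        from tensorflow.keras.applications import VGG16

        IMG_FEAT, MAX_LEN, VOCAB_SIZE, HIDDEN = 500, 6, 10000, 500

        # Image branch: VGG-16 with its final classification layer removed.
        vgg = VGG16(weights="imagenet", include_top=True)
        backbone = Model(vgg.input, vgg.layers[-2].output)    # 4096-d features

        image = layers.Input(shape=(224, 224, 3), name="image")
        feats = layers.Dense(IMG_FEAT, activation="relu")(backbone(image))

        # Sentence branch: integer-encoded question slices through an embedding.
        tokens = layers.Input(shape=(MAX_LEN,), name="question_tokens")
        emb = layers.Embedding(VOCAB_SIZE, 256)(tokens)

        # RNN: the image features act as side information by seeding the state.
        rnn = layers.LSTM(HIDDEN)(emb, initial_state=[feats, feats])
        next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(rnn)

        model = Model([image, tokens], next_word)
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")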
    
5. Testing
    -> Model trained on a 1080Ti for 25 epochs -> details and results in the notebook
    -> Other details available in the reports

6. Other work conducted
    -> We noticed that questions are not entirely open-ended sentences and that there must be a significant pattern to exploit (a limited set of question forms to expect); a question starting with the word "Watermelon" is unlikely, so the model need not account for such cases.

    A word map in the report -> {} shows a visualization of our question vocabulary corpus, indicating that at the first position questions were mostly composed of only a few distinct words.

    We tried to exploit this property by also providing a 1-dim "prevalence" vector to our model as auxiliary information, with the expectation that this might speed up model training and guide the optimization algorithm to a better minimum sooner (a sketch of this prior follows below).

    Due to the limited availability of compute resources we were not able to test this extensively, but after a few tests our models reached a lower loss value faster (approx. 15% sooner) or at the same rate as they did without this "prior" vector, depending on how the prior information was provided.
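
    An illustrative sketch of that prevalence prior, assuming NumPy and the
    integer-encoded rows from step 2; function and variable names are ours for
    this example, not from the project code:

        import numpy as np

        def first_word_prior(encoded_questions, vocab_size):
            """Frequency of each vocab word in the first question position.

            encoded_questions: one full (un-revealed) integer row per question.
            """
            prior = np.zeros(vocab_size, dtype=np.float32)
            for row in encoded_questions:
                prior[row[1]] += 1       # row[0] is <START>; row[1] is word 1
            return prior / max(prior.sum(), 1.0)  # normalize to a distribution

    This fixed vector was then supplied alongside the image features as extra
    auxiliary input; the exact wiring varied between the runs mentioned above.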


7. Possible future work ->
    -> A heavy emphasis on this "prior" dataset information holds potential to speed up models being trained on a certain general type of data - "questions", for instance, as a subset or a certain 'type' of sentence.







