Thursday, October 31, 2013

11-7 Vo, T., & Gedeon, T. Reading Your Mind: EEG during Reading Task


  1. 1. Why do the authors compare each participant against the whole group? Is this a valid baseline, or should they have compared each person against a personal baseline?

    2. In the intro and conclusion the authors say that EEG has the potential to allow researchers to distinguish between different types of human brain activity. How could distinguishing between activities help researchers? How could it be further combined with eye-tracking studies?

    1. I had a similar question to your first, and it does seem like they avoided giving their reason here. I would think that since it's a stochastic signal, it's erroneous to treat two different people's data sets as comparable, but I didn't see that they gave any reason why this would be acceptable.

  2. 1. This paper reads more like an attempt to make use of EEG. The results show that there are certain relationships between EEG signals from the human brain and the way a person reads information. I think that in order to reach a more concrete conclusion, we need more experiments, such as reading different topics with different reading methods, combined with participants’ prior knowledge of the corresponding topics.

    2. From Table 1 we can see that the accuracy for the questions was very high; the lowest accuracy already reached 0.918. This indicates that all the problems must have been simple for the participants. Based on this, I am not quite convinced by the experimental results, since it seems to me that the participants already knew the questions very well before the experiments and thus might not have learned them gradually from the reading.

    3. The question posed in this paper seems to be a genuinely novel research topic. Thus I am wondering what people have learned from it and how we can apply it in reality. So I searched Google Scholar to see what other papers referenced this paper. However, I found only one paper that cited it, and it was a self-citation. I think one reason is that not many researchers have this kind of EEG device. Another reason is that people are still not sure how to make use of the conclusions from this kind of research.

  3. 1. While reading this, I found myself very aware of my own eye movements, thinking about how they compare to the eye movements of other people reading the same article and what that indicates. I can imagine the participants being similarly distracted, especially with the electrodes on their heads and the other equipment used. How could the laboratory setting or the awareness of the experiment interfere with natural eye movement?

    2. On a similar note, what comparisons between participants or general significance can be drawn from Table 1? The authors mention looking further into the background of the individual who performed worse than the others (P5). How could this kind of experiment connect psychology and IR?

    3. This paper is almost too short; I would like a better understanding of what the authors meant by "Accuracy," "Specificity," and "Sensitivity."

  4. 1. In the preliminary analysis, a difference is observed in the time-frequency distributions of EEG signals captured from a person reading a relevant versus an irrelevant piece of text. What does the author refer to when he mentions a "relevant piece of text"? Does it indicate topical relevance or relevance with respect to the participant? How was the distribution identified?

    2. In the experiment that was performed, 10 paragraphs constituted the dataset. Ideally the topic was hand-picked and then relevant and irrelevant paragraphs were chosen for the experiment. But were the participants domain experts in that topic or ordinary users? This would make a lot of difference to their EEG readings. Also, the fact that the participants knew their EEG readings were being recorded for the study adds additional pressure. This factor could also significantly contribute to the accuracy of the readings and results. Did the authors analyze how much these factors contributed to their results?

    3. The results show that the overall dataset yielded lower accuracy than the individual datasets. It is mentioned that this was expected because EEG signals vary with the time of day at which they were sampled, the mood of the person, and other personal characteristics. How, then, were these factors accounted for in this study? When it is clearly known that they vary from individual to individual, how can one come to a consensus about the experiment, and on what grounds can its validity be established?

    1. Your first question makes a good point--the author does not seem to define what type of relevance he is describing, and I didn't even see the specific query that the participants were supposed to be answering through the search. I think EEG would only be able to determine personal relevance, since each person has to be compared against himself or herself as the baseline. It seems that how interested the participant is in the topic could also affect the EEG, and the results might not be as distinct for documents that are not obviously relevant.

  5. 1. This paper presents an interesting study of using electroencephalography (EEG) to predict relevance. My first question is: how is EEG measured? Is EEG sufficient to represent real brain activity? Nineteen participants were recruited for this study. What were their genders, educational levels, and physical and mental states? Since EEG measures brain activity, all of this background information may strongly affect the measurement.

    2. An artificial neural network (ANN) is used as the classifier in this study. Why was an ANN used rather than other ML methods (GMM, SVM, KNN, etc.)? Why were 20 hidden neurons used in the ANN architecture? Is there any study that tries to optimize the ANN architecture? Several paragraphs from the paper “Keyboard before Head Tracking Depresses User Success in Remote Camera Control” were used for the study; why was this paper chosen?

    3. For the signal processing procedure, why were 1024 samples per second collected initially and then downsampled to 256 samples per second? Why was one second used as the sampling window? Also, a binary relevance model was used in this study; is there any similar study that uses a graded relevance model?

  6. 1. The paper says that since the EEG signal is stochastic, the participants' EEG data should be processed individually. Why is this? It goes on to say that in spite of this they process the data both individually and for all participants as a whole, but they don't say why they think this is valid. What made them think that comparing the participants' data sets would be valid/useful?

    2. The text that they chose was very technical and specific, and the participants were presumably graduate students who understood the information. Would similar patterns emerge from reading more everyday material?

  7. What is the background of the participants in this research? The papers they read require some professional knowledge; without a matching educational background, the participants would have difficulty understanding them, which is likely to affect their relevance judgments.

    Why are more drops in amplitude found on the spectrographs of EEG signals recorded while reading irrelevant paragraphs? Is there another explanation for this phenomenon? For instance, when people get excited and alert, the signal is low in amplitude; therefore, if the way information was displayed in this research was poorly designed, that could also lead to the drop in amplitude.

    In the introduction, the author proposes that EEG could replace gaze tracking. However, I think EEG can provide only very limited information about the reading task. So, would it be better to employ both EEG and an eye tracker in the research?

  8. This comment has been removed by the author.

  9. In section 3, the authors use EEG equipment with the 19 participants in the experiment. But the authors have not explicitly mentioned how the equipment was installed and configured for these participants (except for a picture in Fig. 2). If the equipment is installed in an intrusive way (e.g., wearing a helmet as in Fig. 2), it might affect the experimental results, as it introduces discomfort and fatigue as confounding variables.

    In section 2.2, the authors state there is a lack of apparent difference that could distinguish the two classes (i.e., time-frequency distributions of EEG signals from a person reading relevant text and from a person reading irrelevant text). Does this imply a null hypothesis? The authors have not listed any strong evidence against it.

    In section 3.3, the authors choose a standard ANN configuration without providing any theoretical, experimental, or external reference support. One might argue that the neural network setup and the ML training algorithm chosen are not appropriate for classifying the dataset captured in the experiment.

  10. 1. When performing EEG analysis, how can we ascertain the distinction between genuine EEG activity and activity introduced by a variety of external influences? Wouldn't such artifacts affect the outcome of the EEG recording? How can we hope to always recognize them and ensure they are not misinterpreted?

    2. There seems to be no specific structured methodology and no particular generalization paradigm when implementing ANNs. If they are used to study EEG signals, which have a vast number of intrinsic characteristics and vary from individual to individual, wouldn't the classifier we generate be prone to overfitting?

    3. In order to analyse EEG samples, the variation in the samples makes a lot of data pre-processing necessary. Given our problem statement, which normalization methods would provide us with the best results? Also, how could we extrapolate this model to deal with real-time data analysis?
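    Since normalization comes up here, a minimal sketch of one common candidate is per-participant z-score normalization of feature vectors. To be clear, the paper does not say which normalization (if any) it used; the function and toy data below are purely illustrative.

```python
# Hedged sketch: per-participant z-score normalization, ONE plausible way
# to reduce inter-individual EEG variability before a group-level
# classifier. Illustrative only; not the authors' actual preprocessing.
from statistics import mean, stdev

def zscore_per_participant(samples):
    """samples: list of equal-length feature vectors from a single
    participant. Returns the vectors normalized feature-by-feature
    to zero mean and unit variance."""
    columns = list(zip(*samples))                 # one tuple per feature
    stats = [(mean(c), stdev(c)) for c in columns]
    return [[(x - mu) / sigma for x, (mu, sigma) in zip(row, stats)]
            for row in samples]

# Toy example: two features on very different scales
data = [[100.0, 0.1], [110.0, 0.2], [120.0, 0.3]]
normed = zscore_per_participant(data)             # features now comparable
```

    Normalizing each participant against his or her own statistics is one way to address the "vary from individual to individual" worry above, since it removes per-person offset and scale before any cross-participant training.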

  11. 1. This research shows about 95% average accuracy of their method using participants they have not screened or profiled (p. 401). Don’t you think some indication of the participant profile was necessary - maybe an indication of the recruiting process - since there could be some commonality between the participants that has led to these results? For example, maybe they were all recruited from the same school or organization, or have the same professional or educational background?

    2. The researchers feel that increasing the amount of training (along with further optimization of the ANN configuration) can help improve accuracy. However, did they debrief the participants to see what they thought of the current setup? While reading is an everyday activity, reading with something plugged into your head can be challenging, especially paragraphs of scientific research as demonstrated in Fig 2a (p. 399).

    3. While the researchers explain accuracy, I’m still confused about what they mean by ‘specificity’ and ‘sensitivity’ (Tables 1 and 2) with respect to this experiment, and how they established these values. While accuracy and specificity can be purely experimental metrics, sensitivity seems to relate to human behavior, which increases my concern about the lack of participant profiles in this study.

  12. 1. The author provides a number of reasons why he feels EEG can be a good alternative to the current state of the art: gaze-tracking. However, there are a lot of physical limitations such as head movements and it is intrusive to the user in terms of the equipment used. In general, I do not see the average consumer wanting to have to deal with some of these limitations. Although there seems to be advantages EEG can provide over gaze-tracking, does it outweigh the inconvenience to the user? Although these experiments are done in academia, I can see gaze-tracking used in industry since there is no burden to the user, but I don’t see EEG methods being desirable.

    2. For their experimental design, the authors had a handful of relevant and not relevant passages that are presented to the users in a random order. The authors noted that for a valid experiment, they needed not relevant passages that were not an obvious giveaway. As I mentioned in a question for another paper, we have mentioned multiple times in class how subjective relevance judgments are. How can the authors guarantee their relevant and not relevant judgments would be universal to the study participants? The authors just mentioned that care was given to make sure the non-relevant passages were not obvious to the user but no further explanation is given.

    3. During the evaluation, the authors created results first by focusing on the individual and then by focusing on the group as a whole. Based on the data and the author’s comments, it seems that this approach is truly best when you tailor the experience to the individual user. However, this can be an expensive decision. In order to apply the concept based on its current state, one would have to customize the process for each user. Is this customization a minimal cost compared to the benefits derived from the overall approach and the benefits of EEG? Or is this another reason to stick with gaze-tracking? The authors mention that their accuracy is almost on par with that of gaze-tracking, which implies the approach does not perform as well.

  13. 1. The scale of the experiment was too small to make generalizable claims: only 19 participants reading 10 paragraphs each. The participants were instructed not to look for relevant information, but the paragraphs were hand-picked to increase the likelihood of varying observations. Isn’t this a very easy setting in which to detect signals?

    2. In comparison with gaze tracking, the sensitivity of EEG shows a significant drop; with skewed class distributions, as in the case of relevance, isn’t this a concern? The paper does not address this drop.

    3. Generalization among participants is not evident. How do the signals differ if participants view the same set of documents? What do the signals mean semantically? The signals represent person specific encodings on which an ANN is trained.

  14. This comment has been removed by the author.

    1. The authors say that the increase/decrease in the amount of skipping forward and back-tracking activities found in the gaze correlate with the increase/decrease of the cognitive load in reading. Though the rationale behind this is understandable, I don’t understand how a general comparison can be made with respect to different users. This information is instructive but I don’t think we could use this information unless there is a way (a system that uses the cognitive load information) to derive useful insights.

      The chin rest was necessary so that head movement would not affect the EEG signals. However, the paper does not answer whether such an ‘unnatural‘ restriction causes changes in the EEG signals that would otherwise be different in a natural scenario. It may be true that the experimental setup (fixed head position) gave us some interesting results, but can we extend it to the normal scenario? Furthermore, even if a correlation between brain activity and relevance is observed, how can we use this method in the real world?

      I could not understand how an epoch was defined. Is it a fixed amount of time or is the time taken to read each word considered an epoch? Moreover, the experimental scale seems too small (with just 19 volunteers) to conclude with confidence what the authors concluded.

  15. 1. The authors used 19 volunteers for their study, but what were the methods they used to pick these participants? What determined that these 19 people were the ones they would use in their study?

    2. The authors bring up terms like sensitivity and specificity in their data section, but they fail to describe what exactly they are representing with these terms.

    3. The authors mention that they processed the original signals they collected, downsampling them from 1024 Hz to 256 Hz. Is there any interesting or relevant information in the raw data that may have been lost once it was processed?

  16. 1) Vo and Gedeon go into detail about how they conducted the experiment, but they do not give any data about the participants. Wouldn't it be necessary for them to provide some data about the participants in order to consider possible human factors that might have skewed the results?

    2) Can you explain what the author means with the specificity and sensitivity columns in Table 1?

    3) Given the unnatural setup of the experiment, is it possible for the classifier to generalize something else other than “the reading task”?

  17. 1. The authors of this paper take the EEG signal noise produced by the eye muscles as “good noise” and propose not to eliminate the effect of eye movements from their analysis of reading tasks. (p.397) My question is whether we can separate the EEG signal and the eye-movement noise into two variables and analyze them separately?
    2. In the experiment, the authors ask participants to read 10 paragraphs, 7 of which are taken from the same paper and 3 from various sources. However, only 5 of the 7 paragraphs from the same paper are topically relevant and the other 2 are irrelevant. (p.398) Does it make sense that the 2 irrelevant paragraphs are taken from the same paper as the 5 relevant paragraphs?
    3. The participants in the experiment are not questioned about the paragraphs they read at the end of the trial. (p.399) Does this mean individual subjects’ judgements and feelings about the paragraphs won’t affect the analysis results?

  18. 1. The authors record brain waves with the EEG equipment, using 16 channels marked and placed according to the 10-20 system. (p.399) What is the meaning of the “10-20 system”? What do the 16 channels (Fp1, Fp2, F4, Fz, F3, T7, C3, Cz, C4, T8, P4, Pz, P3, O1, Oz, O2) mean?
    2. The authors draw the signal processing procedure in Figure 3. (p.400) But it is still not clear how the process (Raw EEG Signals -> Downsample and Lowpass Filter -> Frequency Domain Conversion -> Bin Peak Values -> Classification Samples) is carried out.
    3. The artificial neural network (ANN) setup constructed in this experiment is a feed-forward, back-propagation network. (p.400) What does that mean?
    4. When evaluating the data by individual participant, the authors run the ANN with 10-fold cross-validation. (p.401) How do they do that? How are the ANN classification results (accuracy, specificity and sensitivity) measured or calculated?
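    To make point 2 above concrete, the Fig. 3 chain can be read as: downsample, convert each one-second window to the frequency domain, then keep per-band peak values as features. The sketch below is my own guess at what such a chain might look like; the block-average filter, the naive DFT (the paper uses an FFT), and the conventional delta/theta/alpha/beta band edges are all assumptions, not the authors' parameters.

```python
# Illustrative sketch of a Fig.-3-style pipeline: downsample 1024 Hz ->
# 256 Hz, take a magnitude spectrum of one-second windows, then keep the
# peak magnitude per EEG band as classifier features. All parameters here
# are assumed for illustration, not taken from the paper.
import cmath
import math

def downsample(signal, factor=4):
    """Crude anti-aliasing (block average) plus decimation,
    e.g. 1024 Hz -> 256 Hz when factor=4."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]

def dft_magnitudes(window):
    """Naive O(n^2) DFT magnitude spectrum of one analysis window
    (an FFT would be used in practice)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(window))) / n
            for k in range(n // 2)]

def bin_peaks(mags, fs, bands=((1, 4), (4, 8), (8, 13), (13, 30))):
    """Peak magnitude inside each band (Hz); band edges are the
    conventional delta/theta/alpha/beta ranges, assumed here."""
    n = 2 * len(mags)
    freqs = [k * fs / n for k in range(len(mags))]
    return [max((m for f, m in zip(freqs, mags) if lo <= f < hi),
                default=0.0)
            for lo, hi in bands]

# One second of a synthetic 10 Hz (alpha-band) sine sampled at 1024 Hz
fs_raw, fs = 1024, 256
raw = [math.sin(2 * math.pi * 10 * t / fs_raw) for t in range(fs_raw)]
window = downsample(raw)             # 256 samples = one second at 256 Hz
features = bin_peaks(dft_magnitudes(window), fs)
# The largest feature falls in the alpha (8-13 Hz) band, index 2.
```

    The resulting short feature vector per window is then what a classifier such as the paper's ANN would be trained on.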

  19. Since the relevant and non-relevant paragraphs were presented in random order, did the users have to infer relevance from the whole corpus of documents and decide on relevance based on the 5 paragraphs that “shared” relevance?

    I know it says they were instructed to read the text as they would any other text, and that no questions about what they read would be asked at the end; but if they had to decide on relevance, how would they decide on relevance after the initial document, assuming it was one of the documents the experimenters designated as irrelevant?

    On page 402, the article mentions how EEG signals are almost as accurate as gaze-tracking devices. However given the environment needed to accurately record EEG signals, would the use of a simpler machine like the Emotiv Insight be a better alternative both for dealing with the general noise of recording the signals and by allowing experiments to take place in actual environments which the users find themselves in on a daily basis?

  20. 1. I had a bit of a beef with the experimental setup in this paper. The authors say that the head position of a participant is secured with a chin rest and that the participant is not allowed to move their heads because the EEG signals are sensitive to this. Is this an inherent feature of EEG or is it because the instruments are sensitive? If the former, how can we draw conclusions about this experiment since in real life participants will be moving their heads and not just keeping them statue-still?
    2. The authors used an ANN, but it was unclear to me why they made the choices they did. For example, they said they had 20 hidden neurons. Was this an ad hoc choice, or did they try cross-validating on prior data to see how many hidden neurons would work best?
    3. The authors did not explain the meanings of 'accuracy', 'specificity' and 'sensitivity' in Table 1. While qualitatively it is easy to see what each of these means, the quantitative interpretation is not there, and we don't know how they are measuring these things. Moreover, there are no statistical significance results. Can we really accept these results as they have been presented?

  21. While "group" accuracy is lower than individual accuracy for the EEG measurement, I am curious what types of variation you might see both between individuals and between "relevant" and "irrelevant" paragraphs. For instance, while there might be user-based differences in the types of signals researchers should expect for both relevant and irrelevant readings, are there always distinctions between the relevant and irrelevant processes in the subjects' brains? Are these distinctions just different from user to user, but could we still build a profile of a user's relevant/irrelevant responses with high accuracy? If so, why not cluster users and see if we can both predict relevance/irrelevance as well as profile the users' brains? It is unclear how ANN helps the researchers reach their findings in a way that other methods could not do, and I finished the piece feeling that they did not provide enough information on their methods as a whole.

    Do the experimental conditions (e.g., wearing an EEG monitor and looking at an old computer screen) accurately simulate those experienced by users performing search? If not, what might some other uses of this research be?

    Speaking of the importance of the research, this paper points out that there are some connections between EEG data and users' reading of relevant/irrelevant documents in a laboratory, but what is the topic of interest here? Is it that one might be able to predict a relevant/irrelevant document based on a user's brain activity, or is it the brain activity itself when examining a relevant or irrelevant document? In other words, why build a predictive model that takes EEG data as an input unless we are going to be strapping EEGs to our heads when we use computers? It seems like, if the interest is in how our brains perceive relevant and irrelevant information, the paper should include findings and suggested conclusions regarding the brain activity of the users itself.

  22. I would have liked to have seen more specific results in regards to the % of paragraphs identified as relevant or not. There seemed to be a lack of specific results in this paper.

    Later in the paper there is a reference to users being able to 'see the big picture' in order to figure out whether a document was relevant. How might this study be skewed by whether or not a user can 'see the big picture', when the purpose of the study was to use EEG to determine relevance?

    If EEG signals are by nature "stochastic", how can the results be repeatable? Is there some measure that other experimenters can compare against to validate these findings?

  23. 1. In the experiment participants are asked to read different paragraph sections and their brain activity is monitored. It isn’t clear from the experimental setup what exactly is being measured from the activity. What is the goal here in measuring brain waves? Is the assumption that similar passages evoke similar brain waves?

    2. In the table on pg 402. we see high classification accuracy and specificity for the individually trained ANN. Lower accuracy and specificity rates are seen for the generalized ANN, but the performance is nonetheless very good. Is there reason to believe that the classification process described in this paper is applicable in more IR specific scenarios?

    3. Is the experiment describing a pattern-matching scenario, where participants are attempting to judge which paragraphs are about the same thing? In what sense are the paragraphs representative of the same text category, besides being from the same document? What does it mean for a sentence to be too general?

  24. 1. How relevant or useful will this research be from an IR point of view? The study seems interesting from an EEG/medical/neuroscience point of view, but the paper could have been much more convincing about how it can contribute to IR. Do you think that a real-time feedback system would make more sense?

    2. With only 19 participants, the ANN system has almost 95% accuracy. Does this essentially represent the normal distribution? Is the study statistically significant? What prevented the authors from conducting the study with more people, which could have presented a better and more convincing picture?

    3. How does the research community deal with inter-disciplinary terms? Is there an Information Retrieval equivalent of the terms specificity and sensitivity? Or are these IR terms that we have seen less frequently in the readings? Do you also think that this research would have complemented the eye-tracker research? If both the eye tracker's 'fixation' and the EEG data correlate with excitement, then we might have significant evidence regarding the relevance of the reading task.

  25. 1- I missed the meaning of the results because I did not understand what the ANN classification results described. The authors listed measures for accuracy, specificity, and sensitivity, but I don’t know if that is the accuracy of the participant, the apparatus, the experiment overall, or something else. Apparently the ANN setup constructed for this experiment is a feed-forward, back-propagation network. Knowing this does not help me.

    2- I would like some more details on the experimental setup. Ten paragraphs, 5 relevant and 5 not, were selected to be read by participants. The division of relevant and irrelevant documents was slightly irregular: one source provided both relevant and irrelevant paragraphs, and the relevant documents related to each other while the irrelevant documents did not. After explaining these divisions in detail, the authors state that “care was taken to make sure that this fact was not obvious to experimental participants.” It is not clear whether they mean that the division between relevant and irrelevant documents was obscured, or that the participants did not know at all that some documents were relevant and some were irrelevant. This could affect results significantly. If a person is just reading information, he may focus on or become engaged with any paragraph for any reason. If he knows some are relevant and some are not, he may react differently; if he is told to identify relevant documents, he may react differently again. The authors should be clear about what they are measuring.

    3- I also think the authors should make their goal of capturing an engagement level more explicit. Relevance can be distinct from engagement. A reader may become highly engaged in an irrelevant document if he finds it interesting or humorous. A better way to measure engagement would have been to let a reader read a variety of things, measure their brain, and ask them which documents they found engaging.

  26. Q. As posted for the study “Understanding Relevance: An fMRI Study”: using these mechanical devices for testing introduces another source of error by making the user’s surroundings significantly different from what he is comfortable with. The author fails to mention how it was ensured that the results were normalised with respect to the error introduced for this reason.

    Q. The appliance used for this testing seems to be very uncomfortable, and very sensitive to any kind of movement by the user. This makes me doubt the reliability of the tests even further. How was it ensured which of the test results are actually accurate and which are artifacts of the user moving slightly?

    Q. The author mentions that the transformation applied to the signals was chosen because the algorithm is very efficient: “FFT because it is a very efficient transformation algorithm”. But how was it ensured that the transformations don’t introduce further error, hide important results, or exaggerate the test results? The author doesn’t mention whether other methods were tried first before judging that the FFT was best for the given data. The description of the experiment leaves various questions unanswered; for example, it is not clear what type of data was given to users or what they were asked to judge. These details would have helped in understanding the results.

  27. 1) In terms of their experimental setup, the authors state that care was taken to ensure that the irrelevance was masked to the users. How would the authors ensure this? Would it not be inherently obvious that some text is not related to other text?

    2) I am not well versed in how EEG works, but I was wondering how they filter out the eye-muscle movements that might arise from the user’s state of stress/adrenaline/excitement caused by having electrodes attached to their head?

    3) Were all of the paragraphs presented to the user at once or were the users asked to signal when they were ready for the next paragraph? How did they adjust to the fact that someone’s eyes might veer to the last paragraph, or return to a previous paragraph? In this regard, it might be interesting to see how all 19 users performed on the exact same piece of text. This might yield more results regarding why the prediction for that single user was so inaccurate.

  28. 1. In this experiment the authors are trying to see if there is a difference in brainwave activity when examining relevant documents versus irrelevant documents. They do this by having 19 different participants read paragraphs while hooked up to EEG equipment. However, they do not mention what backgrounds these users have or their familiarity with the paper they are reading. Could this lead to a form of bias, as the brain activity of someone knowledgeable about the subject material might differ when deciding whether something was relevant?
    2. Also, in their experimental design they never state whether they informed the users which topic the paragraphs they are reading are supposed to be relevant to. Could this also lead to a bias, as the brainwave results might change while the user was figuring out what the paragraphs had in common?
    3. This article presents a new method for gaining relevance feedback about what a user is reading without requiring extra effort from the user, namely EEG brainwave scans. However, the other articles we read this week also put forth methods for doing the same thing, namely eye-movement tracking and fMRI scans. Which of these methods do you think would be best for judging whether a user considers a document relevant, and which do you think would be easiest to implement?