Visual Attention Lab

Publications | Presentations | Posters

  • Click on the titles to view and download the posters.

VSS 2025

  • "Error-prone states in visual search"

  • Jeremy Wolfe1 (jwolfe@bwh.harvard.edu), Jeunghwan Choi2; 1Brigham and Women's Hospital / Harvard Medical School, 2Graduate Program in Cognitive Science, Yonsei University
  • In visual search, why do people miss targets that are clearly visible? In a search for a T among Ls, observers will reliably miss 5-10% of targets. When “retrospectively visible” targets are missed in tasks like breast cancer screening, radiologists can be sued. For TvsL search, Li et al (2024) found errors to be largely random with respect to the specific stimulus. That is, missing the target on one trial tended not to alter the probability of missing the same target in the same display a second time. Of course, you are more likely to miss hard-to-see targets, but if some class of stimuli produces, say, 20% errors, the specific 20% seems to occur randomly. What about the state of the observer? Certainly, errors fall with learning and rise with fatigue, but within a relatively steady state, does anything modulate the probability of error? Bruno et al (2024) have proposed that there is an electrophysiologically identifiable brain state (perhaps related to ‘mind-wandering’) that is associated with being temporarily more error prone. Introspectively, it sometimes feels like errors occur in clumps. If this is the case in visual search, then the probability of another error should be elevated after an error. To assess this, we reanalyzed several visual search datasets. We computed the distribution of lags between successive errors. Random production of errors predicts a geometric distribution of lags. If one error marks entry into an error-prone state, then the likelihood of another error should rise after the error. Chi-sq tests reveal significant over-representation of errors shortly following other errors, especially for tasks without feedback. This effect appears to be smaller or non-existent for experiments where feedback may disrupt any error-prone state. Of course, in the real world, reliable error feedback is often lacking. (A minimal sketch of this lag analysis appears after this entry.)
  • Acknowledgements: Grant support from NSF 2146617 & NIH-NEI EY017001
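
The abstract above does not include analysis code; the following is a minimal, hypothetical Python sketch of the lag analysis it describes: collect the lags between successive errors, compare the observed lag counts with the geometric distribution implied by a constant error probability, and test the difference with a chi-square goodness-of-fit test. The trial layout, bin count, and simulated error rate are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy import stats

def error_lag_test(errors, max_lag=10):
    """Compare lags between successive errors to a geometric distribution.

    errors: 1-D boolean array, True where the trial was an error,
            in the order the trials were run (illustrative input layout).
    """
    errors = np.asarray(errors, dtype=bool)
    idx = np.flatnonzero(errors)
    lags = np.diff(idx)                      # trials between successive errors
    p = errors.mean()                        # overall error probability

    # Observed counts for lags 1..max_lag, plus one overflow bin for longer lags.
    observed = np.array([(lags == k).sum() for k in range(1, max_lag + 1)]
                        + [(lags > max_lag).sum()], dtype=float)

    # Geometric prediction: P(lag = k) = (1 - p)**(k - 1) * p; overflow gets the rest.
    geom_p = np.array([(1 - p) ** (k - 1) * p for k in range(1, max_lag + 1)])
    geom_p = np.append(geom_p, 1.0 - geom_p.sum())
    expected = geom_p * lags.size

    # Degrees of freedom are not adjusted for the estimated p; fine for a sketch.
    chi2, pval = stats.chisquare(observed, expected)
    return chi2, pval

# Illustrative use on a simulated session with a constant 8% error rate.
rng = np.random.default_rng(0)
sim_errors = rng.random(2000) < 0.08
print(error_lag_test(sim_errors))
```
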
  • "Analyzing the Sources of Error in Visual Search of Whole Slide Images in Pathology"

  • Veronica Thai1 (thai.53@osu.edu), Meng Ling1, Jeremy Wolfe2, Zaibo Li3, Jian Chen1; 1The Ohio State University, 2Brigham and Women's Hospital, 3The Ohio State University Wexner Medical Center
  • Errors are a problem in pathology: false negatives lead to disease not being treated, and false positives lead to unnecessary treatment and the resulting risk to the patient. We used eye tracking to investigate pathologists’ search patterns and behaviors, with the goal of understanding the nature of errors in search for cancer in whole slide images (WSIs) of lymph nodes. Ten pathologists of varying experience levels diagnosed and annotated a set of 60 lymph node WSIs (45 with metastases, 15 benign) while we recorded their gaze and mouse behaviors. Our pathologists had 100% accuracy on the benign slides; there were no false positives in this data set. The false negative error rate ranged from 17.8% to 73.3% (46.1% avg). Using eye movement scanpaths, we categorized these errors as “search”, “recognition”, and “decision” errors, following the taxonomy introduced by Kundel et al (1978). The majority (67.5%) of false negatives can be labeled search errors, defined as cases where pathologists never fixated on a tumor region. Another 11.7% were recognition errors, where the eyes landed briefly on or near the malignancy but the pathologist did not report it. Decision errors, where a pathologist scrutinized the malignancy but decided it was benign, accounted for 20.4%. For all experience groups except residents, longer viewing time was associated with higher accuracy. Curiously, for residents, the reverse was true: longer viewing time led to lower accuracy. Additionally, residents made more use of zooming than more experienced groups (avg 34.9 vs. 17.2 zooms). They also tended to view at higher magnification (avg 22.64x vs 15.42x). Compared to more experienced pathologists, residents spent more of their viewing time zooming rather than panning (residents: 22.0% zoom, 10.0% pan; non-residents: 15.2% zoom, 13.4% pan).
  • "How consistent are you? Idiosyncratic polar angle biases in visual search for differentstimuli."

  • Cailey Tennyson1 (ctennyson1@bwh.harvard.edu), Injae Hong1, Rawan Ne'meh2, Jeremy Wolfe1,3; 1Brigham and Women's Hospital, 2Hariri High School 2, Beirut, Lebanon, 3Harvard Medical School
  • In a 250-300msec fixation, you can attend to ~4-6 nearby items. If there are more candidates in the neighborhood, choices must be made. In prior work, observers moved their eyes to a fixation spot when it moved to a new location. A ring of one T and seven Ls was flashed around the spot after 300 msec. Observers made 4AFC assessments of the T’s orientation. Flash duration was adjusted to produce ~25% errors. There was an average non-uniformity in the distribution of errors, with more errors on the vertical meridian. More interestingly, there were often significant idiosyncratic deviations from that average. To find out if patterns of deviations were stimulus-specific, we repeated the TvsL condition and added a search for a hammer among seven different tool silhouettes. Observers ran 360 trials for each task in four 180-trial sessions. The TvsL condition replicated prior results with 17 of 20 observers showing significant deviations from the average result. Moreover, patterns were quite consistent with an r=0.6 correlation of session 1 with session 2. Tools showed a more dramatic average deviation with ~26% of all errors at the bottom of the ring of locations. Only 10 of 20 observers showed significant idiosyncratic deviations from this pattern. Observers were strongly consistent between sessions (r=0.73). There was also a reasonable correlation between the TvsL and Tool tasks (r=0.39). We conclude that there are idiosyncratic variations in the deployment of attention in the vicinity of the current point of fixation. These appear to be quite different from the inhomogeneities in sensitivity to visual stimuli (e.g., Himmelberg, Winawer, & Carrasco, TINS, 2023). The choice of stimuli modulates the patterns of errors for reasons that are not entirely clear. These idiosyncratic relative attentional blindspots at different locations relative to fixation could contribute to errors in visual search.
  • Acknowledgements: NEI EY017001, NSF 2146617
  • "To choose or not to choose: Voluntary task switching without cost in visual search"

  • Ava Mitra1, Jeremy Wolfe1,2, Injae Hong1; 1Brigham and Women's Hospital, 2Harvard Medical School
  • In visual search tasks in the lab, participants are typically required to perform blocks of the same type of search task repeatedly (e.g., find a T among Ls). In real-world searches, however, you look for your keys, then your jacket, then the doorknob, and so on. You rarely search for your keys 100 times in a row. Real-world searches can offer a degree of choice that is not typically a feature of laboratory tasks (i.e., what do I want to look for next?). For instance, given a worklist of cases, should a radiologist be allowed to determine the order in which they are read? The current study aimed to investigate whether search performance, especially reaction time (RT) and miss rate, would be affected by manipulations of trial ordering and participants’ choice. Fifty observers completed 100 trials of each of four search tasks: T among Ls, a shape search for bumpy targets among smoother distractors, a color × color conjunction search, and a search for any animal among other objects. Each participant was randomly assigned to one of five conditions: four fixed blocks of 100 trials, blocks whose order could be chosen, a random mixture of all four tasks, free choice of which task came next, and a ‘yoked’ condition where trials were presented in the order that someone else had chosen. Interestingly, when given the choice, participants rarely switched between tasks, choosing to run blocks of the same task. Choice made no significant difference to the RTs across conditions, though, unsurprisingly, tasks differed in difficulty. Similarly, there were only very minor differences in errors between choice conditions. While task switching imposes a cost in other situations, it does not appear to interfere with the performance of these visual search tasks.
  • Acknowledgements: NEI EY017001, NSF 2146617, NCI CA207490
  • "Pathologists’ Routine Fixations Can Be Used to Supervise Lymph Node Deep Learning Models"

  • Meng Ling1 (ling.253@osu.edu), Veronica Thai1, Shuning Jiang1, Rui Li1, Wei-Lun Chao1, Yan Hu1, Anil Parwani1, Raghu Machiraju1, Srinivasan Parthasarathy1, Zaibo Li1, Jeremy Wolfe2, Jian Chen1; 1Ohio State University, 2Harvard University
  • Locating cancerous tissues in large, high-spatial-resolution whole-slide images (WSIs) is hindered by a lack of training data to supervise deep convolutional neural network (DCNN) algorithms. Patch-based human annotation is time- and labor-intensive. Additionally, training DCNNs would be improved by seeing data from diverse stimuli from routine clinical settings. Expert pathologists are trained professionals who know where to look to locate cancerous tissue in giga-pixel WSIs. Thus, we could acquire theoretically unlimited training samples by harvesting pathologists’ routine examinations of WSIs. To validate the reliability of this idea, we collected eye-tracking data from 10 pathologists, each viewing 60 slides from the CAMELYON16 dataset. These data were entered into DeepPFNet: our automated human-intelligence-based data preparation pipeline used to supervise AI to identify tumors. Specifically, we computed a pathologist’s fixation map (PFMap) over each WSI and trained a DCNN using tumor tiles sampled from these maps and benign tiles sampled from benign slides’ tissue area. (A schematic sketch of fixation-based tile sampling appears after this entry.) We validated DeepPFNet in experiments that examined effectiveness and scalability. Our experiments show that models trained using DeepPFNet can achieve accuracy significantly higher than random sampling (F1 = 0.84, AUC = 0.91), and that increasing the number of slides sampled leads to significant improvement (ΔF1 = 0.08, ΔAUC = 0.13). DeepPFNet models have better accuracy (F1 = 0.84, AUC = 0.93) than those using clustering (F1 = 0.70, AUC = 0.82) or viewport (F1 = 0.68, AUC = 0.88) approaches. We used the DeepPFNet model to classify tiles from the training WSIs and expanded the sampling maps, significantly improving the pipeline (ΔF1 = 0.04, ΔAUC = 0.03). Fixation-based fine-tuning of weakly supervised learning improved slide-level classification accuracy (ΔF1 = 0.01, ΔAUC = 0.01). Finally, applying the PFMap on benign slides to sample benign tiles can improve sensitivity (Δsensitivity = 0.04) but decreases accuracy (ΔF1 = -0.06, ΔAUC = -0.03).
  • Acknowledgements: OSU Translational Data Analytics Institute (TDAI) Research Pilot Award
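
The DeepPFNet pipeline itself is not specified in the abstract; the sketch below only illustrates, under stated assumptions, the general idea of harvesting a pathologist's fixation map (PFMap) to pick training-tile locations: bin fixations into tile-sized cells, smooth them into a density map, and keep tiles whose density exceeds a threshold. The tile size, smoothing scale, threshold, and fake slide dimensions are arbitrary illustrative choices, and the actual DCNN training step is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sample_tiles_from_fixations(fix_xy, slide_shape, tile=256, sigma=2.0, thresh=0.5):
    """Return (row, col) pixel coordinates of tile-sized regions that a
    smoothed fixation-density map marks as 'looked at'.

    fix_xy      : (N, 2) array of fixation (x, y) pixel positions on the slide
    slide_shape : (height, width) of the slide image in pixels
    """
    h, w = slide_shape
    grid_h, grid_w = h // tile, w // tile

    # Accumulate fixations into a coarse grid of tile-sized bins.
    density = np.zeros((grid_h, grid_w))
    cols = np.clip(fix_xy[:, 0] // tile, 0, grid_w - 1).astype(int)
    rows = np.clip(fix_xy[:, 1] // tile, 0, grid_h - 1).astype(int)
    np.add.at(density, (rows, cols), 1)

    # Smooth to approximate a fixation map, then normalize to [0, 1].
    density = gaussian_filter(density, sigma=sigma)
    if density.max() > 0:
        density /= density.max()

    # Keep tile coordinates whose smoothed density exceeds the threshold.
    keep_r, keep_c = np.nonzero(density >= thresh)
    return [(int(r) * tile, int(c) * tile) for r, c in zip(keep_r, keep_c)]

# Illustrative use with fake fixations on a 10,000 x 12,000 pixel slide region.
rng = np.random.default_rng(1)
fake_fixations = rng.normal(loc=(6000, 4000), scale=800, size=(200, 2))
print(len(sample_tiles_from_fixations(fake_fixations, slide_shape=(10000, 12000))))
```
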
  • "Localization biases in the periphery are idiosyncratic: Evidence from over 9000 observers"

  • Anna Kosovicheva1 (a.kosovicheva@utoronto.ca), Ido Ziv Li1, Jihahm Yoo2, Jeremy M. Wolfe3,4; 1University of Toronto Mississauga, 2Korea Science Academy of KAIST, 3Brigham and Women's Hospital, 4Harvard Medical School
  • Accurately registering the location of an object is a fundamental visual process. Previous studies have emphasized commonalities across individuals in the effect of the polar angle of the target relative to fixation in visual localization. However, there is also evidence that individuals exhibit consistent, idiosyncratic patterns of directional, angular error when reporting target locations in the periphery, and such patterns of error are weakly correlated between observers. This evidence comes from small-scale laboratory studies, involving dozens of participants, which may be underpowered to detect subtle consistencies across the population. We examined the consistency of individuals’ localization errors using a large-scale dataset from an online game (over 9,400 observers and 4.5 million trials across 639,000 sessions). On each trial, participants were instructed to identify a symbol in the center of the screen and were simultaneously shown a peripheral target. The target could appear with 0-10 distractor items. The eccentricities of the peripheral targets and distractors varied randomly and independently across trials. Participants clicked on the location of the peripheral target and, if correct, were asked to identify the central symbol among 5 alternatives. We analyzed trials where participants correctly identified the symbol and localized the target. We divided trials into bins based on their polar angle. We then calculated pairwise correlations in the angular (directional) click error between participants relative to display center (clockwise vs. counterclockwise). We found that directional localization errors were, on average, uncorrelated between all possible pairs of participants. Between-subject correlations of localization errors were normally distributed and centered around 0. However, within individuals, errors were non-random. Split-half correlation yielded a reliable, positive correlation (r = 0.41, t(9411) = 185.82, p < 0.001). (A minimal sketch of the split-half analysis appears after this entry.) These results align with the findings of small-scale laboratory studies and suggest that consistent idiosyncratic localization errors in the visual periphery are uncorrelated at the population level.
  • Acknowledgements: This work was supported by an NSERC Discovery Grant to AK, and NIH EY017001 and NSF 2146617 to JMW.
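
The analysis code is not part of the abstract; here is a minimal sketch of a split-half reliability computation of the kind described: within one observer, split trials at random into two halves, compute the mean signed angular click error per polar-angle bin in each half, and correlate the two binned profiles. The data layout, bin count, and simulated bias are illustrative assumptions, not the authors' analysis.

```python
import numpy as np

def split_half_reliability(polar_angle, angular_error, n_bins=16, seed=0):
    """Correlate per-bin mean signed angular error between two random halves
    of one observer's trials.

    polar_angle   : trial-wise target polar angle in degrees [0, 360)
    angular_error : trial-wise signed click error in degrees (CCW positive)
    """
    rng = np.random.default_rng(seed)
    half = rng.random(polar_angle.size) < 0.5          # random split of trials
    edges = np.linspace(0, 360, n_bins + 1)
    bins = np.digitize(polar_angle, edges) - 1

    def profile(mask):
        # Mean error per polar-angle bin; assumes every bin has trials in each half.
        return np.array([angular_error[mask & (bins == b)].mean()
                         for b in range(n_bins)])

    return np.corrcoef(profile(half), profile(~half))[0, 1]

# Illustrative use: a simulated observer with a weak angle-dependent bias.
rng = np.random.default_rng(2)
angles = rng.uniform(0, 360, 4000)
bias = 3.0 * np.sin(np.radians(angles) * 2)            # fake idiosyncratic bias
errors = bias + rng.normal(0, 8, angles.size)
print(split_half_reliability(angles, errors))
```
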
  • "Patches half-empty: How to forage when some patches contain only distractors"

  • Injae Hong1, Jeremy M. Wolfe1,2; 1Brigham and Women's Hospital, 2Harvard Medical School
  • In two eye tracking experiments, 20 participants searched for a T among Ls on a 1/f^1.3 noise background (target prevalence 50%). The contrast of an item was defined as the difference between its grayscale value [0-255] and the average local background grayscale [75-180]. Absolute target contrast was 15, 45, 75, or 105 in Experiment 1 and 15, 45, or 75 in Experiment 2. Distractor contrast varied freely. We studied the benefits of cueing all item locations with yellow boxes drawn around each item. In both experiments, 24-item search displays were presented twice. Experiment 1 had 384 unique displays. Half of the displays were first presented cued and then uncued (Cue–NoCue); for the other half this order was reversed (NoCue–Cue). Experiment 2 had 288 unique displays. Half the displays were presented NoCue–Cue, the other half NoCue–NoCue. Both experiments found the same result for present trials: cueing reduced the error rates for the lowest contrast targets (15) without increasing RTs. In Experiment 1, cueing did not change RTs or accuracy for higher contrast targets (45, 75, 105). In Experiment 2, cueing did not improve accuracy for higher contrasts (45, 75) either, but RTs slowed down. On absent trials, neither experiment found any effect of cueing on either accuracy or RTs. Looking at eye movement data, we can characterize errors as “search” errors, where the observer never fixates the target, and “recognition” errors, where the target is fixated but the eyes move away without registering it successfully. Cueing all potential targets reduced misses by reducing “search” errors in both experiments. Specifically, low contrast targets were more likely to be visited. We conclude that cueing directs attention to items, rather than enhancing their processing once attention arrives. Encouragingly, the accuracy gain from cueing does not necessarily come with an RT cost.
  • Acknowledgements: JH and AL were supported by UKRI grant ES/X000443/1. JMW was supported by NIH-NEI: EY017001, NSF: 2146617, and NIH-NCI: CA207490
  • "It's All About Semantics: How Semantic Categories Shape Memory Partitioning in Hybrid Visual Search"

  • Nurit Gronau1, Makaela Nartker2, Sharon Yakim1,3, Igor Utochkin4, Jeremy M. Wolfe5,6; 1The Open University of Israel, 2University of Texas at Austin, 3The Hebrew University, 4University of Chicago, 5Brigham and Women's Hospital, 6Harvard Medical School
  • In many everyday situations, we search our visual surroundings for one of many types of possible targets held in memory, a process known as hybrid search (e.g., searching for items on a shopping list). In some cases, only a portion of the memorized list is relevant to a specific visual context, so restricting the memory search to the relevant subset would be beneficial (e.g., there’s no need to search for carrots in the dairy section). Previous research has shown that participants often fail to "partition" memory into distinct, useful subsets on a trial-by-trial basis. However, given the known role of semantic content in long-term memory organization, we hypothesized that clearly semantically-defined subsets could facilitate flexible memory partitioning in dynamic hybrid search situations. Experiment 1 revealed that, indeed, semantic characteristics (i.e., object category), but not perceptual features (e.g., arbitrary color), can serve as a strong basis for flexible memory partitioning. Experiments 2 and 3 further demonstrated that this memory partitioning is cost-free and independent of the nature of surrounding visual distractors (i.e., whether the distractors are categorically homogeneous or heterogeneous). These findings demonstrate that confining memory search to a relevant subset of items can be highly effective when the subsets are defined by clear semantic categories. The results underscore the importance of conceptual information in the organization of activated long-term memory (aLTM), the portion of LTM relevant to the current task, and its role in enabling flexible, trial-by-trial memory selection. Additionally, our findings highlight the relationship between visual search and memory search: despite the activation of multiple sets or categories that could interfere with each other at the attentional (visual search) level, category-based memory partitioning seems to remain relatively immune to interference from other categories during this type of memory search.
  • Acknowledgements: Israeli Science Foundation (ISF) grant 1622/15 (to NG) and by the National Institutes of Health (NIH) grant EY017001 (to JMW).
  • "A salient, expected target in an unexpected setting can produce inattentional blindness"

  • Daniel Ernst1 (daniel.ernst@uni-bielefeld.de), Gernot Horstmann1, Johan Hulleman2, Jeremy M. Wolfe3; 1Bielefeld University, 2The University of Manchester, 3Brigham and Women's Hospital/Harvard Medical School
  • In visual search experiments, search trials are often blocked by difficulty. This enables observers (Os) to adjust their strategies to an expected level of difficulty. However, in natural search settings, observers may not be able to anticipate how difficult the next search will be, even when the search target remains constant. For instance, search for a cancer on one CT may be easier or harder than previous searches for that same cancer. Search strategies geared towards harder search may be suboptimal when the target is now more salient, and this might lead to an increased number of miss errors. To test this, 25 Os searched for a circle among Landolt Cs (gap size of 0.09 degrees) in an initial block of 32 difficult search trials. Here, Os produced 37% miss errors. On trial 33, Os still searched for the same circle, but the gap size of the Landolt Cs was now 0.45 degrees. Normally, this new task would have been easy. However, when surprised by this easy search, Os failed to take advantage of the target’s higher salience and still produced 36% misses. For subsequent presentations of these easy search displays after the surprise trial, miss error rates dropped to 5%. The 36% of Os who responded “target absent”, even though a salient target was present, can be said to have experienced a form of inattentional blindness (IB). Notice that, in contrast to traditional IB experiments, the missed IB stimulus here was no gorilla. It was the absolutely task-relevant, unaltered, and expected target of the search. The same target had already been searched for 32 times. Nevertheless, on the first trial when the distractors were unexpectedly changed to make the task easier, observers failed to adapt. We will discuss the relationship of this form of IB to more traditional versions.
  • Acknowledgements: Daniel Ernst was supported by grant ER 962/2-1; Johan Hulleman was supported by UKRI grant ES/X000443/1; Jeremy Wolfe was supported by grants NEI EY017001, NSF 2146617, NCI CA207490

VSS 2024

  • Advantages and Disadvantages of Sequential vs. Simultaneous Search in Simulated Breast Cancer Screening

  • Injae Hong1, Chirag Maheshwari2, Jeremy M. Wolfe1,3; 1Brigham and Women's Hospital, 2Cypress Ridge High School, 3Harvard Medical School
  • Radiologists screening mammograms for breast cancer are required to search for different signs of cancer such as masses, calcifications, and structural distortions. This search should be fast, accurate, and complete. Search for more than one type of target is known as “hybrid search”. Hybrid search can impose a cost on performance that increases with the number of different types of possible targets. Accuracy might be improved by splitting a hybrid search into multiple simple searches for a single target type. This study investigated whether splitting search might be a useful intervention to improve target detection in breast cancer screening. Non-experts searched for either masses or calcifications or both in simulated 2D mammograms (Experiment 1) or while scrolling through 3D volumes of simulated digital breast tomosynthesis images (Experiment 2). Masses and calcifications were independently present with 60% prevalence. If a target was present, participants clicked on the target and labeled the item as a mass or a calcification using a key press. They received feedback after pressing the space bar to complete the trial. There were four types of task: search for calcifications alone, search for masses alone, search for both types of targets simultaneously, or search for both types, one after the other, sequentially. The results showed that sequential search was advantageous compared to simultaneous search. In particular, there was a reduced level of “satisfaction of search” (SoS) errors in rare cases when more than one target was present. SoS errors declined from 60% to 22% (Experiment 1) and from 67% to 32% (Experiment 2). However, the reduction of SoS errors on a few trials comes at a substantial cost: 20% and 28% increases in time across all trials. The standard simultaneous method of searching might be adequate, but splitting hybrid search into several simple searches may reduce errors in some important cases.
  • Acknowledgements: CA207490
  • Stepping into the Same River Twice: Are Miss Errors in Visual Search Deterministic or Stochastic?

  • Aoqi Li1 (aoqi.li@manchester.ac.uk), Johan Hulleman1, Wentao (Taylor) Si2, Jeremy Wolfe3,4; 1University of Manchester, 2Bates College, 3Brigham and Women's Hospital, 4Harvard Medical School
  • Observers make errors in visual search, whether in a lab experiment or a real-life task. Those errors can be categorized as “deterministic” or “stochastic”. If errors are deterministic, an error committed once will definitely be repeated. Alternatively, errors can be “stochastic”: occurring randomly with some probability. An error would lie in between these extremes if it is likely, but not guaranteed, to occur a second time. To identify the nature of miss errors in a simple T-vs-L visual search task, we presented each search display twice in a random sequence. The miss rate, P1, for the first copy of the display and the miss rate, P2, for the second copy were calculated, as was the proportion of cases where both copies were missed, P12. Purely stochastic errors would predict that P12 = P1*P2. Purely deterministic errors would lead to P12 = min(P1, P2). If errors are a mix of stochastic and deterministic, P12 will fall between these two predictions. (A minimal sketch of these benchmark predictions appears after this entry.) In Experiment 1, where the letters were clearly visible, the errors were almost completely stochastic. An error made on the first appearance of a display did not predict that an error would be repeated on the second appearance. In Experiments 2a and 2b, where the visibility of the letters was manipulated, the errors became a mix of stochastic and deterministic. Lower contrast targets produced more deterministic errors. In Experiments 3a, 3b and 3c, we tested several interventions with the goal of finding a 'mindless' intervention that could effectively reduce errors without needing to know the answer in advance. An almost mindless intervention that knew the location but not the identity of items (Exp 3c) succeeded in reducing deterministic errors. This gives some insights into possible methods for reducing errors in important real-life visual search tasks, where search items may not be clearly defined and visible.
  • Acknowledgements: JMW was supported by NIH-NEI: EY017001, NSF: 2146617, and NIH-NCI: CA207490. AL and JH were supported by UKRI grant ES/X000443/1.
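
As a worked illustration of the two benchmarks named in the abstract (not the authors' code), the sketch below computes the observed repeat-miss rate P12 for twice-shown displays and compares it with the purely stochastic prediction P1*P2 and the purely deterministic prediction min(P1, P2). The mixture index and the fake data are illustrative assumptions.

```python
import numpy as np

def repeat_error_analysis(miss_first, miss_second):
    """Compare observed repeat-miss rate with stochastic and deterministic benchmarks.

    miss_first, miss_second : boolean arrays, one entry per twice-shown display,
    True if that copy of the display produced a miss (illustrative data layout).
    """
    p1 = miss_first.mean()                 # miss rate on first presentation
    p2 = miss_second.mean()                # miss rate on second presentation
    p12_observed = (miss_first & miss_second).mean()

    p12_stochastic = p1 * p2               # independent errors
    p12_deterministic = min(p1, p2)        # the same displays are always missed

    # Where the observed value sits between the two benchmarks (0 = stochastic,
    # 1 = deterministic); only meaningful when the benchmarks differ.
    span = p12_deterministic - p12_stochastic
    mix = (p12_observed - p12_stochastic) / span if span > 0 else np.nan
    return p12_observed, p12_stochastic, p12_deterministic, mix

# Illustrative use with fake data for 400 twice-shown displays.
rng = np.random.default_rng(3)
first, second = rng.random(400) < 0.10, rng.random(400) < 0.10
print(repeat_error_analysis(first, second))
```
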
  • Don’t talk to me! Relevant sound disrupts visual search, irrelevant sound does not

  • Jan Philipp Röer1 (jan.roeer@uni-wh.de), Ian M. Thornton2, Ava Mitra3, Nathan Trinkl3, Jeremy M. Wolfe3,4; 1Witten/Herdecke University, 2University of Malta, 3Brigham and Women's Hospital, 4Harvard Medical School
  • Visual search experiments are usually conducted in quiet environments to ensure that participants can fully concentrate on the task. The real world, however, is rarely as quiet as the laboratory. We are more or less constantly exposed to auditory information, some of which we choose to attend to, some of which we try our best to ignore. In three experiments, we examined the effects of background sound on visual search. We used the Multi-Item LOcalization (MILO) task, in which participants clicked through items labeled 1-8 in numerical order as quickly as possible while hearing auditory information through headphones. In Experiment 1, participants needed to engage with the auditory information. In the “listening” condition, participants listened to a news report while performing the MILO task. They were subsequently quizzed about the news. In the “counting” condition, participants counted how many times a specific number was mentioned during a sports commentary. Both conditions significantly disrupted visual search performance compared to a quiet control condition. In Experiment 2, auditory distractors were meaningless sequences of random words that had previously been shown to disrupt visual-verbal working memory. Participants were informed that any background sound was irrelevant and asked to ignore it. It appears that they were able to do so, because there was no effect of auditory distraction on visual search performance. In Experiment 3, we increased the difficulty of the search task by using a “shuffle” manipulation in which the subsequent items in a sequence were randomly repositioned after each localizing response. Even so, search performance again proved to be robust against irrelevant sound. The overall pattern of results suggests that visual search performance can be effectively shielded from auditory distraction, but only if we can choose to ignore the sound and not if we actively listen to it.
  • Acknowledgements: NEI EY017001
  • Idiosyncratic Search: Biases in the deployment of covert attention.

  • Nathan Trinkl1 (ntrinkl@bwh.harvard.edu), Ava Mitra1, Jeremy Wolfe1,2; 1Brigham and Women's Hospital, 2Harvard Medical School
  • Eye tracking of visual search tasks shows that the probability that the eyes will move from the current fixation to a nearby target on the next saccade is only ~50%. How can observers fail to find clearly identifiable targets close to fixation (even if they find them later)? One possibility is that processing within the Functional Visual Field (FVF) around fixation is not homogeneous. If so, is that inhomogeneity random or systematic? To answer this question, we asked observers to move their eyes to a cue. 300 msec after cue onset, a ring of 7 black Ls and one black T was briefly flashed. Observers made 4AFC decisions about the orientation of the T. After each response, a new fixation location appeared, and this process repeated for two blocks of 360 trials. We found reliably idiosyncratic patterns of accuracy as a function of radial angle (10 of 16 observers were significantly different from the normalized group average accuracy, assessed by Chi-sq, p<0.001; four more at p<0.05). Is idiosyncratic accuracy a function of idiosyncratic deployment of attention or of retinotopic variation in basic visual processing? To test this, we made the T red, allowing it to summon attention without the need for search. Duration was staircased to produce ~25% errors. This eliminated systematic idiosyncrasies in accuracy (only 1 of 20 observers with p<0.05). Did the original idiosyncrasies depend on making successive saccades to new fixation points? We repeated the experiment with fixation held at a single location. Idiosyncratic patterns were seen, though they seemed weaker than with a moving fixation (7 of 20 observers with p<0.001; three more with p<0.05). We do not yet know if the idiosyncratic patterns for one observer would be the same with saccades and with steady fixation. These results suggest that attentional deployment is systematically inhomogeneous in the immediate vicinity of fixation.
  • Acknowledgements: National Science Foundation (NSF), Grant #2146617 & Economic and Social Research Council (ESRC) — UK Research and Innovation (UKRI), Grant: ES/X000443/1
  • Inattentional blindness for a salient target in visual search: Finding a surprisingly easy target can be surprisingly hard

  • Daniel Ernst1 (daniel.ernst@uni-bielefeld.de), Gernot Horstmann1, Johan Hulleman2, Jeremy Wolfe3; 1Bielefeld University, 2The University of Manchester, 3Brigham and Women’s Hospital/Harvard Medical School
  • In natural search settings, observers often do not know how difficult it will be to find the next target. The next security threat or the next potential cancer in a lung CT might be quite salient or very subtle. In the lab, visual search experiments typically involve multiple successive trials of constant difficulty. This allows participants to anticipate the salience of the next target. How will observers respond if the current target salience suddenly deviates from the target salience of all preceding trials? It would be unremarkable to find that it is harder to detect a low salience target when the observer expects a high salience one. More interestingly, here we report that observers are impaired when they are surprised with a high salience target after a series of difficult searches. In Experiment 1, observers searched for a hard-to-detect O target (always present, compound search) among C distractors with small gaps for 32 trials. On the 33rd trial, the gaps of the C distractors were large, making the O target much more salient. Yet, search was considerably less efficient on this surprise trial (147 msec/item) than on subsequent trials with the same high target-distractor dissimilarity (66 msec/item). In Experiment 2, observers reported the presence or absence of an O in a display with a short presentation duration. After multiple hard trials, observers frequently showed inattentional blindness towards an unexpectedly salient target that was reported almost perfectly when the identical target was presented on repeated trials. Gaze data suggest that observers adopted a tightly focused attentional window during the initial, hard search. This strategy made them surprisingly ‘blind’ to targets that were unexpectedly highly detectable. The wrong attentional set may be one explanation for situations where we “look but fail to see” obvious stimuli.
  • Acknowledgements: Daniel Ernst was supported by grant ER 962/2-1; Johan Hulleman was supported by UKRI grant ES/X000443/1; Jeremy Wolfe was supported by grants NEI EY017001, NSF 2146617, NCI CA207490
  • VowelWorld 2.0: Using artificial scenes to study semantic and syntactic scene guidance

  • Yuri Markov1 (yuamarkov@gmail.com), Melissa Le-Hoa Vo1, Jeremy M Wolfe2; 1Goethe University Frankfurt, Scene Grammar Lab, Germany, 2Brigham and Women’s Hospital, Harvard Medical School
  • Scene guidance is difficult to investigate in realistic scenes because it is hard to systematically control complex, realistic images. Parameters like set size are often ambiguous in real or even VR scenes. We created VowelWorld 2.0, a new version of VowelWorld (Vo & Wolfe, 2013), in which we control various parameters of a highly artificial “scene”. Scenes are 20x20 grids of colored cells with 120 cells containing letters. Participants search for a vowel, present on 67% of trials. Each scene contains three big disks (2x2 cells) with consonants on them. These serve as “anchor objects”, which are known to predict target locations in real-world searches (Vo, 2021). An additional 96 cells feature rings which are grouped into larger analogs of surfaces. A vowel’s placement could follow three rules. Color rule (semantic): certain targets were associated with one background color “gist” (e.g., A’s appear in red scenes). Structure rule (syntactic): vowels were placed near or inside the small rings. Anchor rule (syntactic): vowels were close to a big circle containing a neighboring consonant (e.g., “B” implies “A”). Two vowels followed all three rules, two vowels followed the color and structure rules, and one vowel was placed randomly. On half of the trials, participants were precued with a specific vowel. Otherwise, participants searched for any vowel. For the first three blocks, participants attempted to learn the rules from experience. Then, we explained the rules. Participants failed to fully learn the rules but did benefit from the learned anchor rule (shorter RTs). Knowing the rules markedly speeded performance for vowels that followed only the color and structure rules. Anchor rule vowels showed less improvement over initial learning. Knowing the rules had a major impact on ending absent trials. Future work will systematically vary the predictability of different rules to test under which circumstances rule learning becomes more or less optimal.
  • Acknowledgements: This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 222641018 SFB/TRR 135 TP C7 granted to MLHV and the Hessisches Ministerium für Wissenschaft und Kunst (HMWK; project ‘The Adaptive Mind’).
  • What, where, when did I find this? Associative learning in hybrid search.

  • Iris Wiegand1 (iris.wiegand@ru.nl), Jeremy M. Wolfe2, Joseph R. Maes1, Roy P.C. Kessels1; 1Donders Institute for Brain, Cognition and Behaviour, 2Brigham and Women's Hospital, Harvard University
  • In “hybrid” visual and memory search, observers look for multiple, previously memorized target objects among distractors. Hybrid search is akin to many real-world searches, such as looking for items on your mental shopping list in the grocery store. Like their real-world counterparts, hybrid searches occur in spatial and temporal contexts that we encounter repeatedly. In several experiments, we investigated whether observers would incidentally learn and utilize spatial and temporal associations in hybrid search. Specifically, we examined learning of four different types of regularities: 1) target item sequences (e.g., the banana always follows the yoghurt), 2) target location sequences (e.g., a target in the lower left corner always follows a target in the upper right corner), 3) target item-location associations (the banana is always in the upper right corner), and 4) target item-location sequences (the banana in the upper right corner always follows the yoghurt in the lower left corner). Learning would be reflected in a decrease in search times. Our results show only weak incidental learning for the temporal sequences of target items or target locations alone, even after many repetitions of the sequence. By contrast, learning of target item-location associations was fast and effectively reduced search times. Furthermore, the experiments show a reliable effect of temporal sequence learning for target item-location associations. These findings suggest that spatiotemporal learning in hybrid search is hierarchical and conditional: only if spatial and non-spatial target features are bound do temporal associations bias attention, pointing the observer to the task-relevant features expected to occur next.

VSS 2023

    • "The FORAGEKID Game: Using Hybrid Foraging to Study Executive Functions and Search Strategies During Development"

    • Beatriz Gil-Gómez de Liaño1 (bgil.gomezdelianno@uam.es), Jeremy M. Wolfe2; 1Universidad Autónoma de Madrid, 2BWH-Harvard Medical School
    • Searching for friends in the park, finding specific Lego blocks for a building project, or looking for recipe ingredients in the fridge: each of these is a "hybrid search" typical of everyday life. Hybrid search is searching for instances of multiple targets held in memory. Hybrid Foraging (HF) is a continuous version in which observers search for multiple exemplars of those multiple target types. HF draws on a wide array of cognitive functions beyond those studied in classic search and can be used as a "one-stop shop" to study those functions within a single task as they develop and interact over the lifespan. We study cognitive development using our FORAGEKID-HF video game. Observers search through diverse moving real-world toys or simpler colored squares and circles. They are asked to collect targets from a memorized set as quickly as possible while not clicking on distractors. We have tested large samples of children, adolescents, and young adults (4-25 years old) running different versions of FORAGEKID. Foraging rate data can be used to assess the development of selective attention under different memory target-load conditions (here, 2 versus 7 targets). Cognitive flexibility and search strategies can be measured by analyzing switch costs when observers change from collecting one target type to collecting another. The organization of search can be studied by examining target-search paths using different measures, e.g., best-r, inter-target distances, etc. Finally, decision-making processes are illustrated by quitting rules: when do observers choose to move from one screen to a fresh screen? Changes in "travel costs" (time to move from one screen to the next) impact quitting rules differentially across the lifespan. Here, we show data supporting FORAGEKID as a serious but enjoyable game that can effectively assess and potentially train a range of attentional and executive functions over the lifespan.
    • Acknowledgements: European Union’s Horizon 2020, Marie Sklodowska-Curie Action FORAGEKID 793268 & Ministerio de Ciencia e Innovación de España: PID2021-122621OB-I00.
    • "Research on re-search: Foraging in the same patch twice"

    • Injae Hong1 (ihong1@bwh.harvard.edu), Jeremy M. Wolfe1,2; 1Brigham and Women's Hospital, 2Harvard Medical School
    • When humans forage for multiple targets in a succession of ‘patches’, the optimal strategy is to leave a patch when the instantaneous rate of return falls below the average rate of return (Marginal Value Theorem: Charnov, 1976). Human behavior has been shown to be, on average, near optimal in basic foraging tasks. (A minimal sketch of this leaving rule appears after this entry.) Suppose, however, that foragers are allowed to return to previously foraged patches. What strategy would foragers adopt when they revisit patches that they left previously, either compulsorily or voluntarily? Our computer-screen patches contained “ripe” and “unripe” berries, each defined by overlapping color distributions (d’ = 2.5). Observers attempted to collect ripe berries as fast as possible. One group of observers was forced to leave each patch after 10 seconds and was then brought back to forage those patches for an additional 5 minutes. A second group foraged and moved to new patches whenever they wished, before being brought back to pick the “leftovers” for 5 minutes. A control group foraged at will with no revisiting. The observers who were forced to leave the patches behaved like control observers, continuing where they left off when brought back to a patch and ending at about the same rate. Observers who had already voluntarily left a patch did continue to pick when brought back to that patch. However, the patches having been depleted, that picking was less productive. There appeared to be a small jump in the rate of foraging when these observers returned to their patches. It would be interesting to see if those observers would have bothered to pick on the second visit if they were not required to do so.
    • Acknowledgements: NSF 2146617
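
As a toy illustration of the Marginal Value Theorem leaving rule mentioned in the abstract (not the authors' model), the sketch below assumes a patch whose instantaneous rate of return decays exponentially and has the forager leave as soon as that rate falls below an assumed long-run average rate. The decay constant and the rates are arbitrary illustrative values.

```python
import math

def mvt_leave_time(initial_rate=2.0, decay=0.15, avg_rate=0.6, dt=0.1, max_time=60.0):
    """Time at which a Marginal Value Theorem forager leaves a depleting patch.

    The instantaneous rate decays exponentially as the patch is depleted
    (an arbitrary illustrative depletion model); the forager leaves once that
    rate falls below the assumed long-run average rate of return.
    """
    t = 0.0
    while t < max_time:
        if initial_rate * math.exp(-decay * t) < avg_rate:
            return t            # patch no longer beats the environment-wide average
        t += dt
    return max_time             # never dropped below the average within max_time

# Illustrative use: richer patches (higher initial rate) are worth staying in longer.
for rate in (1.0, 2.0, 4.0):
    print(rate, round(mvt_leave_time(initial_rate=rate), 1))
```
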
    • "How Blind is Inattentional Blindness in Mixed Hybrid search?"

    • Ava Mitra1 (amitra@bwh.harvard.edu), Jeremy M. Wolfe1,2; 1Brigham and Women's Hospital, 2Harvard Medical School
    • In day-to-day visual search tasks, we may search for instances of multiple types of targets (e.g., searching for specific road signs while also scanning for pedestrians, animals, and traffic cones). In the lab, the “mixed hybrid search task” is a model system developed to study such tasks, in which you look for general categories of items (e.g., things you don’t want to hit with your car) alongside specific items (e.g., the sign for your exit). Previous hybrid visual search studies have shown that observers are much more likely to miss the more general “categorical” targets than the specific targets, even though it is quite clear that categorical and specific items are equally likely to be attended in this paradigm. If an item is attended but missed, do observers have any access to the information that may have been accumulating about that target? Twelve participants searched arrays for two specific items (e.g., this shoe and this table) while also searching for unambiguous instances of two categorical target types (e.g., ANY animal and ANY car). In order to look for the existence of sub-threshold information about missed targets, we borrowed methods from the inattentional blindness literature. We asked two 2AFC questions after every miss error and after 5% of target-absent trials. Question 1: Do you think you missed an item? Question 2: If you did miss something, which of these two items was it? On trials where participants asserted that they had NOT missed an item (“No” to Question 1), participants correctly selected the right item ~63% of the time against a 50% chance level (p<0.018). Interestingly, this ability to identify the missed target was only seen following missed categorical targets, not missed specific targets. Knowledge about the target’s identity can linger even after that target is missed.
    • Acknowledgements: NSF grant 2146617, NIH-NEI grant EY017001, NIH-NCI grant CA207490
    • "Image memorability modulates image recognition, but not image localization in space and time"

    • Nathan Trinkl1 (ntrinkl@bwh.harvard.edu), Jeremy M. Wolfe1,2; 1Brigham and Women's Hospital, 2Harvard Medical School
    • Acknowledgements: National Science Foundation (NSF), Grant #1848783
    • "Modestly related memories for when and where an object was seen in a Massive Memory paradigm."

    • Jeremy Wolfe1,2 (jwolfe@bwh.harvard.edu), Claire Wang3, Nathan Trinkl1, Wanyi Lyu4; 1Brigham and Women's Hospital, 2Harvard Medical School, 3Phillips Academy, Andover, MA, 4York University, Toronto
    • We know that observers can typically discriminate old images from new ones with over 80% accuracy even after seeing hundreds of objects for just 2-3 seconds each (“Massive Memory”). What do they know about WHERE and WHEN they saw each object? From previous work, we know that observers can remember the locations of 50-100 out of 300 items (Spatial Massive Memory – SMM). In a different study, observers could mark temporal locations within 10% of the actual time of the item's original appearance (Temporal Massive Memory – TMM). Are SMM and TMM related? In new experiments, 64 observers saw 50 items, each sequentially presented in a random location in a 7x7 grid. They subsequently saw 100 items (50 old). Four sets of instructions were used: (1) the Mere Identity instruction asked 16 observers just to remember the items; (2) the Spatial instruction asked 16 observers to also remember item locations; (3) the Temporal instruction asked 14 observers to remember when items appeared; (4) the Full instruction (13 observers) combined the Spatial and Temporal instructions. At test, observers in all conditions were told to click on the original location of old items and to indicate when they saw them on a time bar. ~12% of observers appeared to guess on the spatial task and ~50%(!) guessed on the timing task. Interestingly, just 6% guessed on both, exactly as would be predicted if the choice to guess was independent for space and time. Overall, space and time scores were strongly correlated for Full instructions (r-sq=.64, p=0.001) and Temporal instructions (r-sq=.31, p=0.04), and marginally correlated for Spatial instructions (r-sq=.20, p=0.08). The Mere Identity correlation was not significant (r-sq=.03, p=0.40). Effects of instruction on performance were generally not significant. Observers can have quite good memory for when and where they saw an object. Those memories seem to be modestly correlated with each other.
    • Acknowledgements: This work was supported by NSF grant 1848783 to JMW