I am a Ph.D. student in the Data Mining Group @ UIUC (2017-2022, expected), advised by Professor Jiawei Han. Before that, I was a member of the Knowledge Engineering Group (KEG) of Tsinghua University, supervised by Professor Jie Tang. In 2016, I was a research intern at Cornell University, working with Professor Thorsten Joachims and his group.
I’m obsessed with exciting problems in knowledge mining from unstructured data, with real applications in real-world scenarios (e.g., with limited resources and limited human annotation). I worked on the information extraction module of the Aminer system at Tsinghua University (ASONAM’16, SNAM’18). I spent three summers at Google, building practical tools and systems for relation extraction, news story headline generation (WWW’20), and accelerating large-scale language model pre-training (NAACL’21). A full paper list can be found on Google Scholar. Please feel free to drop me an email about any interesting topic!
Publications
- Phrase-aware Unsupervised Constituency Parsing (ACL’22)
- [code&data] Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, and Jiawei Han
- Automated Taxonomy Discovery and Exploration (ICDM’21) tutorial
- [tutorial slides] Jiaming Shen, Xiaotao Gu, Yu Meng, Jiawei Han
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging (KDD’21)
- [code&data] Xiaotao Gu*, Zihan Wang*, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang
- On the Transformer Growth for Progressive BERT Training (NAACL’21) short paper
- [code&data] Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, and Jiawei Han
- Generating Representative Headlines for News Stories (WWW’20)
- [code&data] X. Gu, Y. Mao, J. Han, J. Liu, H. Yu, Y. Wu, C. Yu, D. Finnie, J. Zhai, N. Zukoski
- Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning (EMNLP’20)
- [code&data] Deren Lei, Gangrong Jiang, Xiaotao Gu, Kexuan Sun, Yuning Mao, Xiang Ren
- Alleviate Dataset Shift Problem in Fine-grained Entity Typing with Virtual Adversarial Training (IJCAI’20)
- [code&data] Haochen Shi, Siliang Tang, Xiaotao Gu, Bo Chen, Xiang Ren
- Learning Dynamic Context Augmentation for Global Entity Linking (EMNLP’19)
- [code&data] X. Yang, X. Gu, S. Lin, S. Tang, Y. Zhuang, F. Wu, Z. Chen, G. Hu, X. Ren
- OAG: Toward Linking Large-scale Heterogeneous Entity Graphs (KDD’19)
- [code&data] F. Zhang, X. Liu, J. Tang, Y. Dong, P. Yao, J. Zhang, X. Gu, Y. Wang, B. Shao, R. Li, and K. Wang
- Improving Distantly-supervised Entity Typing with Compact Latent Space Clustering (NAACL’19)
- [code&data] Bo Chen, Xiaotao Gu, Yufeng Hu, Siliang Tang, Yueting Zhuang, Xiang Ren
- Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling (EMNLP’18)
- [code&data] Liyuan Liu, Xiang Ren, Jingbo Shang, Xiaotao Gu, Jian Peng, and Jiawei Han
- Learning Named Entity Tagger using Domain-Specific Dictionary (EMNLP’18)
- [code&data] Jingbo Shang, Liyuan Liu, Xiaotao Gu, Xiang Ren, Teng Ren, and Jiawei Han
- End-to-End Reinforcement Learning for Automatic Taxonomy Induction (ACL’18)
- [code&data] Yuning Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu, Jiawei Han
- Large-scale Validation of Counterfactual Learning Methods: A Test-Bed (NIPS’16 Workshop)
- [code&data] D. Lefortier, A. Swaminathan, X. Gu, T. Joachims, and M. de Rijke
- Web User Profiling using Data Redundancy (ASONAM’16)
- [code&data] [Extended Journal] Xiaotao Gu, Hong Yang, Jie Tang*, Jing Zhang