I have been working with Symbolic Aggregate approXimation (SAX), a symbolic representation of time series that comes with a MINDIST distance measure which lower-bounds the Euclidean distance. The original code was implemented in MATLAB (Jessica Lin, Li Wei), and I have been using a Python module called saxpy. While saxpy is nice, it is also very slow when it comes to mining large data sets. I have therefore re-implemented saxpy (I call it saxpyFast) to speed up the computation of SAX and MINDIST, using integer arrays as the internal symbolic representation instead of strings, along with MATLAB-like compact matrix operations in NumPy. This makes the computation roughly 4x faster. See my implementation and demo cases here, and a discussion with the original author of saxpy here.
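To make the integer-array idea concrete, here is a minimal, hedged sketch of the approach (not the actual saxpyFast code): SAX z-normalizes a series, reduces it with Piecewise Aggregate Approximation (PAA), and maps each segment mean to a symbol via Gaussian breakpoints; MINDIST then compares two words cell by cell. The function names and the fixed alphabet size of 4 are my own choices for illustration.

```python
import numpy as np

# Standard SAX breakpoints for an alphabet of size 4: they cut the
# standard normal distribution into 4 equiprobable regions.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])

def sax_word(series, n_segments):
    """Z-normalize, apply PAA, and map each segment mean to an integer
    symbol in 0..3 (integer array instead of a character string)."""
    z = (series - series.mean()) / series.std()
    paa = np.array([seg.mean() for seg in np.array_split(z, n_segments)])
    return np.searchsorted(BREAKPOINTS, paa)

def mindist(a, b, n):
    """MINDIST between two SAX words (length w) from series of length n.
    The per-cell distance is 0 for identical or adjacent symbols, and
    otherwise the gap between the relevant breakpoints."""
    w = len(a)
    lo, hi = np.minimum(a, b), np.maximum(a, b)
    cell = np.zeros(w)
    far = hi - lo > 1
    cell[far] = BREAKPOINTS[hi[far] - 1] - BREAKPOINTS[lo[far]]
    return np.sqrt(n / w) * np.sqrt((cell ** 2).sum())
```

With integer symbols, the cell distances fall out of plain vectorized indexing into the breakpoint array, which is where most of the speedup over string-based comparison comes from.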
Prof. Hal Daumé III of UMD visited GUCL to give a talk at the CS department, and we had a good meeting with him afterwards, discussing our work.
CS COLLOQUIUM: PROF. HAL DAUMÉ III (UMD), FRIDAY, OCTOBER 14, 11:00AM–12:00PM, STM 326
Learning Language through Interaction
Machine learning-based natural language processing systems are amazingly effective, when plentiful labeled training data exists for the task/domain of interest. Unfortunately, for broad coverage (both in task and domain) language understanding, we're unlikely to ever have sufficient labeled data, and systems must find some other way to learn. I'll describe a novel algorithm for learning from interactions, and several problems of interest, most notably machine simultaneous interpretation (translation while someone is still speaking).
This is all joint work with some amazing (former) students He He, Alvin Grissom II, John Morgan, Mohit Iyyer, Sudha Rao and Leonardo Claudino, as well as colleagues Jordan Boyd-Graber, Kai-Wei Chang, John Langford, Akshay Krishnamurthy, Alekh Agarwal, Stéphane Ross, Alina Beygelzimer and Paul Mineiro.
Bio: Hal Daumé III is an associate professor in Computer Science at the University of Maryland, College Park. He holds joint appointments in UMIACS and Linguistics. He was previously an assistant professor in the School of Computing at the University of Utah. His primary research interest is in developing new learning algorithms for prototypical problems that arise in the context of language processing and artificial intelligence. This includes topics like structured prediction, domain adaptation and unsupervised learning; as well as multilingual modeling and affect analysis. He associates himself most with conferences like ACL, ICML, NIPS and EMNLP. He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. He still likes math and doesn't like to use C (instead he uses O'Caml or Haskell).
[Amir Zeldes] XRENNER (eXternally configurable REference and Non Named Entity Recognizer) is now on PyPI. This is the coreference resolution tool described in the paper my mentor Amir Zeldes and I co-authored at the NAACL 2016 CORBON workshop. You can now install it with `pip install xrenner`. LINK: https://pypi.python.org/pypi/xrenner/