Professionally, I work on the agency data analytics team within the Technology and Innovation division of NASA’s Office of the Chief Information Officer. I’m also the technical program manager for a portion of NASA’s open-data activities. More information is available on the resume page.
A few personal opinions that define how I approach data science include:
- I have a soft spot for things at the intersection of metadata, natural language processing, semantic tooling, and user interfaces, as technology improvements have opened up a lot of possibilities there that have yet to be built.
- I love finding ways to use data gathered for one purpose for a completely new purpose. It’s like finding free money.
- I think data visualization is vastly undervalued because so much of the understanding we get from it happens faster than we can consciously notice.
- I think we focus too much on applying machine learning to things humans already do fairly often, and not enough on tasks machine learning would be great at that we never do because they would be too boring or time-consuming.
- Understanding how people interact with the analysis, tools, and products we create, and how things get built or not built within the larger organizational context, is more often the controlling variable in a successful product than the technical details or prediction accuracy.
In my personal time, I co-organize the Houston Data Visualization Meetup. Until recently, I also managed the social media of the Gulf Coast Section of SEPM (a sedimentary geology society). I attend hackathons, like the Houston NASA Space Apps hackathon and geoscience hackathons run by Agile Scientific.
Side projects are a major way I add to my skills, so I always have several in development or on the to-do list.
- Assembled a Raspberry Shake, a personal seismometer.
- Built an internet-connected pumpkin for Halloween that talked to small children.
- Presented a talk on the changing data visualization landscape in large organizations (1974-2016).
- Competed in a machine-learning contest to predict well log facies put on by a geophysics journal.
- Created an augmented reality webpage / business card using AR.js, which leverages three.js, aframe.js, and ARtoolkit.
- Used a support vector machine (SVM) approach to identify direct returns, reflections, multiples, and coherent noise in seismic gathers as part of a geoscience hackathon organized by Agile Scientific and Total. I participated virtually while the rest of my team was physically present in Paris.
- Helped build map applications to assist in the spreading and collection of accurate information about shelters and flooding post-Hurricane Harvey in collaboration with a large number of other volunteers via SketchCity, a civic tech organization.
- Built version zero of an application that can take in any Google Forms results CSV file, pick the right charts, and create a data visualization in which clicking on an answer to one question filters the answers to every other question.
- Participated in a geoscience hackathon run by Agile Scientific before the annual Society of Exploration Geophysicists conference in Houston. Built a Python-based machine-learning model to mimic geologists’ stratigraphic picks of the top of the McMurray Formation in Canada. Still working on this project as I get spare cycles.
- Building a “where science happens in Houston” map that leverages web-scraping and machine-learning.
- Played around with using three.js to make three-dimensional data visualizations from car-based lidar data.
- Did a quick bit of machine learning in a Jupyter notebook to answer the question, “software engineer or data scientist?”
- Presented work on using machine learning to predict well log tops at the annual AAPG conference in San Antonio in May of 2019. You can find the Predictatops package on GitHub.
- Participated in the two-day Glasstire Datahack. Glasstire is an art website with 18 years of art event data for the city of Houston. I combined the art dataset with an older, half-cleaned dataset of locations for companies advertising science jobs in Houston to create a visualization of the distribution of art and science in the city. I used a random forest model to clean the science data. Links: Datahack code repository, old science city repository, hackathon product.
- Playing around in ObservableHQ.com with the notion of an explorable explanation for basic stratigraphy concepts related to sea level and shoreface depositional environments.
- Created the repository “geoVec-playground”, which takes a GloVe model trained on thousands of geoscience papers (with a focus on soils) and represents the words in a 3D embedding space using Google’s standalone Embedding Projector, as a way to explore the model and gain a better understanding of its performance. I was also interested in how to convert pre-trained GloVe models to a TensorFlow word2vec-style format, so I can reuse some tools across different word embeddings.
- Wrote the Medium blog post “Alternatives to Iris: Finding Drop-in Replacements for Overused Example Datasets” as a thought experiment. It is a step towards the goal I really want to reach: programmatically finding datasets that could be used for specific workflows or end projects, based on characteristics of the known datasets used in those workflows.
- Finally got around to trying some generative art code via some quick notebooks on Observable here and here. I was pleasantly surprised by how easy it was after the first little learning hump, so I will try more of this in the future.
Side projects over the last couple of months
- Was recognized as a “Featured Creator” on Observablehq.com, a website / tool for data visualization.
- Wrote this Medium post about an in-progress side project to create an easily forkable repository template that visualizes the de facto community of related code and developers that is sometimes described by Awesome lists.
- Finally attempted to organize my side projects a bit with a README that splits out what is active, stalled, or just in idea phase.
- Whenever I try to get on the water, be it surfing, paddle boarding, fishing, or sailing in the ocean, bay, lake, or river, I check a variety of different sources of information. There’s a folder on my phone with 10 different pages. I am brainstorming ways to combine some of those into one page via iframes or data visualizations. The first draft is a single GitHub Pages page in a repo called water-check-houston.
- Doing a bit of research into applying deep learning and image recognition to generate data products that describe wave properties and surfers.