Bot/Scraper Writer

Job description

 

Wanted: great (white hat) scraper/bot writers to help save the world


OpenCorporates is growing and looking for more great bot and scraper coders to help fulfil its mission to open up the world's official public information on companies. This is of vital importance today, giving visibility to hundreds of thousands of users around the world; tomorrow, with an explosion in the number, speed and complexity of companies, it will be essential for fair and free societies.


We write, run and maintain hundreds of scrapers and bots – bots that integrate with APIs, bots that download open data dumps, and bots that make sense of messy data and put it into our standardised schema, working with our expert Data Analysts.


We're particularly looking for highly talented bot writers who both understand how to extract data from legacy, messy or plain broken public websites, AND who want to help achieve our critical public-benefit mission.

What you'll be doing

  • Support & expand our data pipeline. You'll write bots to source publicly available data (scraping websites, consuming data published via APIs or CSV, or extracting data from PDFs) to create new data feeds, and also help solve problems with our existing feeds. (A brief sketch of this kind of bot follows this list.)

  • Maintain high data quality. You'll compare datasets to their source to verify that the information is complete and error-free. You'll also suggest ways to make our processes more efficient.
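To give a flavour of this kind of work, here is a minimal, hypothetical Ruby sketch of a bot that consumes a published CSV and maps each row into a simple standardised record. The URL, column names and schema fields are illustrative assumptions, not our actual pipeline code.

    # Hypothetical bot: fetch an open-data CSV and emit one standardised
    # JSON record per company. URL and field names are illustrative only.
    require 'csv'
    require 'json'
    require 'open-uri'

    SOURCE_URL = 'https://example.gov/companies.csv' # hypothetical open-data dump

    URI.open(SOURCE_URL) do |io|
      CSV.new(io, headers: true).each do |row|
        record = {
          company_number: row['RegistrationNumber'],
          name:           row['CompanyName'],
          jurisdiction:   'example_jurisdiction',
          incorporated:   row['IncorporationDate']
        }
        puts JSON.generate(record) # one JSON line per company
      end
    end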

Above all, we are looking for smart people who we think will fit in well. During the interview process you should be able to demonstrate that:


  • You're good at writing bots. Extracting information buried deep within government websites can often be challenging, but you know the right techniques to obtain the data without hacking or bombarding our sources with requests (see the sketch after this list).

  • You know data. Our product is data – arranging it, linking it, making it accessible – so you should enjoy dealing with it, including handling complex concepts, managing big datasets and keeping its quality high.

  • You understand process. The OpenCorporates data pipeline has a variety of data flows and processes which need to work together seamlessly. You must understand and be able to deal with the challenges this raises.

  • You have a keen eye for detail. Accuracy, attention to detail and the ability to spot trends are key to keeping data quality levels high.

  • You know how to work in a team. The problems we deal with require a lot of collaboration and communication.
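As a deliberately simplified illustration of the "polite" techniques mentioned above, the hypothetical Ruby sketch below identifies itself with a User-Agent and throttles its requests rather than bombarding the source. The URLs and CSS selector are made up for illustration, and the Nokogiri gem is assumed to be installed.

    # Hypothetical example of polite scraping: identify the bot and pause
    # between requests rather than hammering the source site.
    require 'net/http'
    require 'nokogiri'

    PAGES      = (1..5).map { |n| "https://example.gov/register?page=#{n}" } # hypothetical URLs
    USER_AGENT = 'ExampleBot/1.0 (contact@example.org)'
    DELAY_SECS = 2 # seconds to wait between requests

    PAGES.each do |url|
      uri = URI(url)
      response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
        request = Net::HTTP::Get.new(uri)
        request['User-Agent'] = USER_AGENT
        http.request(request)
      end
      next unless response.is_a?(Net::HTTPSuccess)

      doc = Nokogiri::HTML(response.body)
      doc.css('table.companies td.name').each { |cell| puts cell.text.strip } # hypothetical selector

      sleep DELAY_SECS # throttle so we don't bombard the source
    end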

 

Requirements

Required skills

  • At least 2 years’ experience writing bots, scrapers or similar.

  • Querying & maintaining structured data: SQL (SQLite/MySQL), JSON, XML

  • Regular expressions

  • Git or other version control software

  • Linux environment

  • Proficiency in Ruby

Desirable skills / experience

  • ETL processes and data pipelines

  • Root cause analysis & data remediation experience

  • Excellent verbal and written communication skills

This is a full-time position, based either in Shoreditch, London, UK, or remote, although we would consider a part-time arrangement for the right applicant. Unfortunately, we are unable to offer visa or relocation assistance at this time. Strictly no recruitment agencies.



Salary range: £38k–£55k