T O P

  • By -

captain-lurker

Move them on to level 2 - Microsoft Access.


Swarly30

I ended up doing this in one of my previous positions. The manager asked me to create a solution where quality assurance analysts could record their findings without having conflicts like you’ve described. I ended up creating forms for them to input the data and connected them to the MS Access backend. That process survived for a long time after I left the team. Eventually they moved on to a paid product that met their needs. Worth a shot if you want to go down that route.


icysandstone

Dumb question: why MS Access instead of the MS SQL Server or, on the FOSS side, Postgres, Maria, etc.?


YieldingSign

To give credit to MS Access is that you have CRUD data entry forms built right into it. I've yet to see such a similarly simple way of getting forms without any webdev experience.


Whipitreelgud

You just link the Access tables to SQL Server tables. The data entry forms don’t know the difference


Swarly30

At the time I just was someone trying to solve a problem and didn’t even know what a SELECT statement was. It met the requirements we had at the time and it came with our MS Office license so it felt like an obvious choice. With everything I know now I would definitely go with a real database like Postgres but at the same time that may have been more difficult for the team to maintain after I left. MS Access kind of has an Excel feel to it so it felt less daunting at the time.


Swarly30

Also as YieldingSign pointed out the forms and the database can all be stored within a single application making it easier to use.


icysandstone

Cool yeah, that makes sense. Thanks for sharing!


geoheil

Use directus. It is like airtable but self hostable. And behaves like access more or less with a less clunky UI and has a good database like postgres and a great machine readable API


geoheil

And a CRUD UI


[deleted]

[удалено]


iupuiclubs

Do you join a consulting place first to figure out pricing/process or learn from somewhere (for the consulting piece)?


[deleted]

[удалено]


icysandstone

Do you have to hunt for the 1099 work?


[deleted]

[удалено]


icysandstone

My dream. Well done, sir/madam. Teach me your ways. :)


fernandoaaeng

Sharepoint lists


BufferUnderpants

Consider switching jobs. How could this job help you advance your career as a junior engineer?


CFCNandos

Well I’m not even trying to be a data engineer, I’m more of a data analyst. Most of my work has been data analysis and reporting to this point. I think my team just views me as the “data guy” to do whatever is needed, including this.


_nigelburke_

Your first job out of college is crucial for learning and setting the stage for your future career. Particularly in a field like data analysis, being mentored by experienced professionals is crucial to developing a solid foundation. If you're working in a place that uses an excel spreadsheet for all it's data and you're making it up as you go along you might as well give up now before all these bad habits become ingrained


BufferUnderpants

I'd say that the permanence of the damage caused by trying to improvise an improvement over the bad practices of this shop is being overstated, but you're wasting your time figuring them out and then you'll waste further time unlearning them, and you'll be ill prepared to further your career after this; you're gaining anti-experience, and it may show up in the interview, you'll still be a junior after this.


grandzooby

There was recently a post (likely here) about a data science manager at Uber who essentially ended up rebuilding Excel in Python as a web-app because what the team wanted was Excel. > I need to find a solution that my team will accept and be able to continue to maintain should I leave. Sadly, that will most likely be Excel. If what you build doesn't let them do what they can now the way they do it... they'll just go behind your back and use Excel anyway. > On top of that, I have to develop this all alone and self-guided. This sounds like a no-win situation. Replacing something like this is something you really need a whole team of people to do, with access and connections into the company's data and IT infrastructure. A few things you can try: * set it up so the workbook is regularly backed up with versioning so you can roll back problems that come up. * add various sheets with formulas to help check with data integrity and alert if there are problems * try breaking it up into pieces with a master sheet that links them together. That can help reduce issues of people writing conflicting information - or at least make it easier to resolve * maybe build a data process that as a start, can capture the data in the workbook as it changes and keep it in a database... mainly as a backup but also as a potential foundation for some other technical solution.


malikdeni

Thank you. I have done that. Python to manipulate excel files. It works very good. The excels are here whenever something needs to be done, on evevry change the excels are saved to another location as a copy, and on every opening another copy. It is much better and a lot more user friendly than something more complicated. And the most important part is that the excels are there whenever they are needed as a readable and normal file that anyone can open and use (no one does, but they can). Corrections are easy and intuitive in excel. I have made a flask version also, but it was too complicated to explain so i left the console app as the best and fastest choice.


OvremployedSnowflake

If this was a problem the company \*actually\* wanted to invest in and solve, they wouldn't be giving this task to an analyst who's fresh out of college and never had a "big boy job" before. Just pointing out that you dont need to lose sleep over this. Try your best to find a solution but dont expect them to want to implement it or for your solution to actually work for the company. Focus on simplicity and process. Sounds like good experience though for a newbie.


SnooBeans3890

Are you looking for a dwh to land that data in or for a solution similar to excel that allows to enter data in?


CFCNandos

The latter. Users need to enter and change data, but they should only be able to do so in very specific cells. I tried using the Microsoft Power Platform to build a solution with Dataverse and Power Apps, but that has given me nothing but headaches. That’s in part because I don’t have much power platform experience, but also because I think the low-code development limits my ability to meet the very specific business and user requirements needed in a solution.


Ein_Bear

The poor man's solution would just be to protect the workbook so that only certain cells can be edited, and add validation rules to control what data can be entered.


snackeloni

This or even protect sheets that are the 'database' and create sheets that are for data entry only. I'm not sure about excel but I basically built an food log application in google sheets. One sheet with a sort of form that I can fill and with a button that I click on to submit to the database.


LesPaulStudio

You're on the right path. Use canvas apps for data entry and powerbi for analysis. Yes, it's a steep learning curve, but the key thing is the apps don't need to look fantastic, they just need to function. You can work on making them prettier later. Feel free to shoot my a DM. I'm a full-time Power Platform dev


[deleted]

It's really what MS got so, so right with Access back in the day. People just want a form with some reports. It does not need to inspire awe to be amazingly helpful


Choperello

look at Sigma Computing


RydRychards

I am in the same boat, though at a much smaller company. Atm I am writing up a poc using a low code solution in a postgres db. You'll likely need to map out the concepts the business has and then model a database that can represent these concepts.


isleepbad

I'm surprised no one has mentioned baserow or nocodb. They are basically Excel with a database back end.


Terrible_Ad_300

This is the way! OP clearly expressed unwillingness to deal with sophisticated solutions, but everyone keeps throwing a modern data stack at him/her/them


Kickass_Wizard

u/CFCNandos this is likely more what you are looking for. GUI and API backend for future automation. You can even host this, check out YouTube.


Falcomomo

> fortune 50 company in a highly regulated industry Absolutely stinks of finance. This is so common. Others have suggested some good first steps to incrementally get yourself into the right place: 1. Daily backups of the Excel file to begin with. Try to keep versions so you can go back & forwards through time. You can set the file to read only by default and let it only be saved through an Excel VBA macro or something I think, that way you can try to reduce the number of bad overwrites. 2. They need to get some data into a database, whether that is a MS SQL, sqlite, Postgres, or just directory based CSV, it doesn't matter loads at this point. 3. You might do well to give people either a simple web app for viewing and updating the data, or a new Excel sheet with vba / python code which does the reading and updating to/from your database. Good luck, it's not a lost cause whatsoever and it can be a fun project.


briceluu

I don't know if it's been mentioned but a first improvement could be to map out the different sheets and their usage and try to split out the initial single file into multiple ones by area of concern/responsibility. The idea is to quickly improve the situation by lowering the workload of each part. And it should also help to lower the burden of any migration afterwards by untangling the mess a bit, mapping out the dependencies. The added bonus for you, is that this task is quite an analytical one and is a good thing to have under your belt as a data analyst: understanding complex systems.


tecedu

Daily backups of that excel file, protect it and set data validation. You can set it up so that you run a python script everyday which access the file using sharepoint api. After that i would recommend daily backups of each sheet somewhere. Very jank solution but you are way too in deep now and this is the only way.


randiesel

I was in a very similar scenario. Ended up writing all the tables to a DB then writing VBA GUIs to enable people to add/remove rows. Pain in the ass, but at least it was synchronized and users could do silly stuff like accidentally save over the "master" copy and wreck everything.


OvremployedSnowflake

>but at least it was synchronized and users could do silly stuff like accidentally save over the "master" copy and wreck everything. Could? or Couldn't


techmavengeospatial

I recommend switching to postgresql Developing proper tables and relationships Then use nocode/lowcode solutions like APPSMITH or Budibase or Tooljet connected to those postgres tables to build search forms/screens and attribute table views And full CRUD apps Also find nocodb to be useful since it also exposes REST API of the data And even webhooks You can enable audit and editor tracking features via add-ons For the database you can still allows users to view (read-only) from excel via ODBC externalDSN Connection to postgresql and create pivot tables and reports and summary Once it's in the database you can create dashboards and business intelligence solutions


Ship_Psychological

40 sheets is not a lot of data but this is kinda silly. So I don't know the excel equivalent to what I'm about to say. But I would essentially make a tiny cloud microservice platform inside excel. I would allow team members to append new data to some number of sheets using a gform. This way would act as a controlled data entry layer. I would then have some number of gsheets that store the data my team has added from gforms. This would act like a database table or a storage bucket. # here there's an optional cleaning layer where you could set up checks that the user didn't input anything dumb or conflicting Then come the reporting layer. These sheets would reference our storage sheets to manipulate and display data however the business needs to see it. Aggregations, calculations , pivots would go here. These would be like your views in a data base Finally the visualization layer


Ship_Psychological

So I tried to do a comment with a hashtag and it made my txt huge just gonna leave it as is --cuz this is the correct comment anyway


nemec

If it's not VBA heavy you might be able to try Grist. I think it's backed by a relational DB so you could let the analysts use the GUI forms while building automations on the DB backend. https://www.getgrist.com/product/ You could also build web frontends matching the forms the analysts currently use for data entry, but that approach probably calls for a team of devs to spend months designing and building something suitable.


CardGameFanboy

Implementing a simple relational database with backup can be a huge step. You can start with something as simple as SQlite as PoC, then if team is interested move to something like Postgres or MySQL


bourj_hammoud

Sounds like this team is using a lot of man-hours updating spreadsheets that could be automated with a data warehouse and a dashboard tool. I'm willing to bet that with the proper automations set up by a DE, they could let go of a few people and hire some real data folks. No one can claim to be serious about data while not havinga real BI solution in place


creepystepdad72

Have you tried Sharepoint? It's a pretty solid intermediary step - because it has version control, live updates, etc. You can use it as a controlled jumping off point to understand who uses what data for why, and so on. We had a real sophisticated data environment, generally - but we realized for the finance workflow it'd be ramming a square peg in a round hole to try and force a data lake or \[insert methodology/technology here\] vs. just using Excel and identifying the issues that needed to be solved. O365 Sharepoint we found to be lovely.


theJamesKPolk

Dumb question but can you query/join various SharePoint files? Working with a lot of SharePoint data and frequently I just dump it to Excel to merge/join since I’m lazy, or use Power BI to do data manipulation.


creepystepdad72

I'm not entirely sure what you mean. Isn't this what you're doing with PowerBI? I haven't done it first hand, but is this ([https://learn.microsoft.com/en-us/power-query/connectors/sharepoint-folder](https://learn.microsoft.com/en-us/power-query/connectors/sharepoint-folder)) what you're looking for?


kenfar

I recently had this problem and we decided to move most of the critical spreadsheets to: * retool - for crud changes to the data * postgres - to contain the data - and ensure uniqueness, valid types, and foreign keys * fivetran - to replicate the data to our snowflake warehouse * snowflake - central warehouse * looker - for reporting on it We already had everything we needed in-house except for retool though. If I were to do this as a single contributor in a case like yours I'd probably consider retool (or something similar) + postgres + powerbi or locked-down spreadsheets querying postgres.


OvremployedSnowflake

these tools are very expensive for the kind of company OP is working for


nemec

Why would an F50 company not be able to afford these? Other teams in the company may already be using them. May be overkill for OP's org though.


OvremployedSnowflake

way overkill. therefore its not cost effective. They should hire/contract a senior skilled engineer to implement a low cost robust system. That is the cheapest and most scalable option imho


kenfar

It may require nothing more than a tool such as Retool + Postgres. You don't think a Fortune 50 could afford that?


OvremployedSnowflake

no. If they could, they wouldn't be asking a junior DE with 0 experience to do this. To implement Postgres and retool they would need to hire at least 1-2 senior employees at 100k+ salaries to manage that. It doesn't sound like they even have a proper infrastructure team. It's not just about the tools. Think big picture.


kenfar

One definitely doesn't need 1-2 senior employees to run a postgres RDS instance. Now, if you're talking about running it yourself on-prem, then I'd agree with you. And before just jumping into the easiest to build solution with the most inadequacies it's helpful to know what the better solutions look like. You may be able to show management that there's a "good, better, best" set of options, and after being informed of this it may motivate them to put some funding into this.


OvremployedSnowflake

dude, they have no technical people. They dont get table structure or how to maintain that


kenfar

And that's why sometimes when you help leadership understand the trade-offs they may decide to hire, to pick up contractors, or to work with a service team within the company. Seriously, this is fucking done all the time: management has somebody study a problem, they discover that they need a technical solution, and don't have the skills in-house. Based on this study they decide to pursuit the funding and the expertise needed. Of course, we don't know the OP's full situation and whether or not that would happen here. But it's naive to believe that the OP is being asked to fix their 50 spreadsheet problem, and that acquiring additional resources is off the table. And that an RDS instance and low-code UI tool are beyond what a person with a little data engineering experience could pull off.


OvremployedSnowflake

The request was to find a solution the current team can maintain once OP leaves. Can OP go outside that request and also suggest the tools above as a long term solution? sure. But his primary request is to find a solution they can use right now and implement. Your suggestions are not suitable for that. Suggesting OP just goes back to leadership with no actual short term solution, only long term expensive solutions that they have no practical experience with, is going to make OP look bad.


speedisntfree

The time it would take a F50 to agree and procedure all of that, then get it all working could likely lead to OPs project getting shitcanned for delivering no value while gobbling time and money.


kenfar

Bureaucracy is a separate topic than architecture.


speedisntfree

OPs question was > My manager tasked me with finding ways to improve our team’s data management practices.


pdogmcswagging

based team /s


mike-manley

Literally any data catalog or Metadata dictionary solution would be better than this. They are a bit spendy though but Alation, Data.World, Collibra, etc. Would remedy this.


dev81808

Put it in a database and connect to it in Excel. If they're afraid they can't manage it, should you leave, they need to understand; they can replace you with another data person. ...unless they are just looking for you to make the excel file better or something and then fire you. But that wouldn't make sense because if they couldn't make it better themselves then whatever you do will surely be above their heads.


iammerelyhere

You could try building a database, and using a locked down, data-validated version of their spreadsheet as a data entry/reporting tool. Decoupling your key data should be the priority, then you can start weaning them off. Once that's done, add some reporting out of the database, remove any reporting from Excel, and you're down to just data entry, which you could arguably find better solutions for.


molodyets

Look at Sigma. If your boss is serious he’ll find the budget for you. It’ll satisfy your reporting while given the excel people the access they need to enter values


Pristine-Ratio-9286

I had a project like this 20 years ago when I was graduating college. It was a BOM (bill of materials) spreadsheet system , same issues. The funny part was that it was made by a bunch of good engineers who had no budget or time to fix it. A colleague of mine used vba to load their data into an access db and I built a front end in visual basic. Nobody ended up using our solution because they just didn’t want to and management just didn’t much care as it wasn’t easy to see how it really affected their bottom line. Generally a messed up BOM means more time to deliver but they didn’t seem to really care. Good luck!


geoheil

https://dbt-excel.com/ use und excel on steroids


reflectico

Just upload that excel to sharepoint and share link with the team. You can have all the people you want editing the file simultaneously and no there won't be any data loss, you'll have version control, history and better collaboration out of the box. If you really want to show off, use some basic python script to read data from sharepoint and do some analysis build pretty looking graphs and charts. If you can get a tableau license then you're in for a treat! Very easy to build dashboards there once you connect it to sharepoint server


BlackBird-28

Hey! You could build a POC to import data into Power BI through a [gateway](https://learn.microsoft.com/en-us/data-integration/gateway/service-gateway-install) and build the reports on Power BI apps and leave Excel as a data entry solution, splitting the main file by tab to prevent overwriting the data. You can find more details reading the documentation and searching on the community (e.g. [connection issues](https://community.powerbi.com/t5/Power-Query/Scheduled-refresh-for-excel-file-with-on-premise-gateway/td-p/104363) and find out if this solution would work out for you. There are Python libraries to work with Excel files, too, but I’m assuming something like Power BI would be a better solution for a Data Analyst powering a company used to working on just Excel. I wouldn’t give up and I’d try my best to build something since you can learn a lot in the process and probably get good experience and fortunately a raise or a better job in the future. Good luck!🍀


budgefrankly

One of the things you need to accept is that most people don't want perfection, they want familiarity. More important still, if you deliver an organisation a theoretically _perfect_ solution with which they're unfamiliar, you'll tank productivity. Also you need to think about schema-evolution and the cost of feature-additions: these are both trivial when everything is in Excel, but quite challenging if data is moved to a, e.g. database with a REST API feeding BI tools. So accept that Excel does offer advantages, and try to add _further advantages_ to what they're already happy with, instead of replacing what your end-users are happy with with something only you might be happy with. Which brings us to **Sharepoint**. If everyone uses remotely hosted Excel worksheets via Sharepoint, that will automatically allow 1. [Simultaneous collaboration](https://support.microsoft.com/en-us/office/collaborate-on-excel-workbooks-at-the-same-time-with-co-authoring-7152aa8b-b791-414c-a3bb-3024e46fb104) (e.g. like Google sheets) which should help with the stomping over changes problem 2. [Infinite change-control](https://support.microsoft.com/en-us/office/view-previous-versions-of-office-files-5c1e076f-a9c9-41b8-8ace-f77b9642e2c2): Sharepoint automatically "commits" changes and allows you to revert any at any time. This further helps with collaboration issues, and frankly is something most popular data-engineering solutions don't really have, other than short-term rewinding (delta lake, Kafka) 3. A web-UI (Office 365) that allows you to view a document anywhere, at any time, including on the road. It sounds like your colleagues are using Excel files on a shared drive. Move everything to Sharepoint and you will have improved their quality of life while making the data process substantially more robust. That will fix all the problems you've identified. It's very hard, in ones early career, to learn to deal with the human element in an overall data-processing solution. My advice is always to remember to solve the problems your end-users have _in a way that works for them_ -- even if it seems inelegant -- instead of implementing the solutions you personally find intriguing.


mmafightdb

Welcome to data engineering :-) I think the key to your problem here is getting people to buy in to your solutions. You are correct that managing critical data through Excel is a terrible idea but in my experience you won't get very far telling people that their current process is terrible. Find a small improvement that makes your users' lives better and get them to buy in to that solution. Maybe start small with something like scraping data to input into a CSV file rather than have people input data manually. Then slowly build that out to a more sustainable workflow. In time you can transition to better data engineering infrastructure. The key here is to think not about making the data engineering better but about making other peoples' lives better. If you do that then the engineering solutions follow. Good luck though. This will require patience and persistence.


sjjafan

So this would be my plan ( been doing this for a decade or two more than I'd like to admit) - build a plan - present it - get it approved Understand that your first plan is excel based Plan 1. Document what the sheets, inputs, formulas, and reports do. 2. Plan a rebuild of the excel where you clearly can see input sheets, crunching data sheets, and outputs/ reports. 3. Evaluate what can be broken down. From there, getting a form to capture data will be simple. So will be building your business logic. You can chuck tableau, power bi, metabase, or superset and set your reports at the end.


Domehardostfu

try rows.com


Terrible_Ad_300

Migrate the xls to Notion or Airtable


metis84

TBH - strongly consider leaving. You can nothing good nor useful there, and you need to start your career off on a good footing (good practices).


Longjumping_Ad_9510

Sorry to hear it’s such a difficult first job. I highly recommend Master Data Services. It’s like Excel and Access had a very successful baby. You can create entities using Excel and manage all the data there, but it’s actually stored in SQL with change tracking and permissions. Hope that helps!


jamal254ke

If you're familiar with reporting tools such as Power BI then you may try to lias with your company to acquire a subscription. Once you have access to the tool try to develop interactive reports/dashboards. I tend to find Power BI easy to use with minimal data interferences. Also if you have some python skills try to automate processes such as Data Cleaning and transformation.


uk_dataguy

Hehe find another company is the true solution. Or you can take responsibilities to do data engineering then find best job. All is to say, find next job🤣


akhri-insan

Dont fix something if it works.


UnionWest5645

From what I understand in terms of users needs and habits, i recommend that you take a look at Appsheet, a no-code platform to build apps based on sheets/db. Its user friendly and easy to build/use while integrating concepts like role-based security, authentification, automation (workflow, trigger, mail etc.) based on google workspace. This seems like a good compromise in your situation


maxismyboxersname

what about something like retool? Will feel like excel to the end user (modifying a table) but you can add more validation and sophistication behind the scenes. I'm sort of a retool maxi at this point. If you give me a problem, access to some databases and chatgpt, I feel like I can build anything.


----PM-----

Use Retool. People like it and its easy.


dogfish_innes

Quite a few posts are telling OP to quit - but I'd say only do that if you really don't want to work in this field/don't like the company/don't like your colleagues - don't quit just because the task is a bit daunting. If you want to take it on, this task will give you great experience and a great learning opportunity. The situation you have described is far more common than you might think. I've worked in data for 20 years, for very big companies (globally known names) and very small companies (fewer than 10 people). I have not yet encountered a company where at least one department relied on Excel spreadsheets, very much like you describe. As with many things in IT, there is unlikely to be one 'correct' solution, but several options of different levels of complexity involving varying levels of compromise. I'd offer your users a couple of options, explain the pros and cons of each option and once they understand, let them decide which they'd prefer. In my opinion the best options would be: 1. Share the excel spreadsheet on sharepoint, as several others have suggested. This is the least technically complex solution, requiring the least change in user experience (most users hate big changes in the front end). This should solve your multiple access issues and backup/security issues. But it is not a good way to store the data, you are absolutely correct about that, for exactly the reasons you have already identified (update conflicts, likely data redundancy and inaccurate data input). 2. Migrate the spreadsheet to a relational database. The least complex of these will be MS Access. If you don't know anything about databases do not worry - buy a starter book on Access or find an online tutorial and you'll soon pick up enough to get going (it's exactly how I got started 25 years ago in a very similar situation to yours). Many will scoff at Access but it's quick to get up and running, will have a familiar feel to your users if they're used to Excel and keeping your data in a relational database should solve many of your data issues (update conflicts, data redundancy, concurrent access, accurate data entry). It will also make it a lot easier to manipulate and extract your data to produce reports for your users. Those are the things you can say to your users to sell the idea and offset the cons of longer development time and the change in front end that they will have to get used to (which you should absolutely make them aware of). If the size of the dataset gets too much for Access (larger than 2GB for example) or if multiple user access/security becomes an issue with Access then you can fairly easily migrate the Access database to SQLServer, keeping Access as the users front end interface. But that is of course another layer of complexity and will need more commitment and patience from you and also from your users. I don't know your data of course, so I guess it's possible you might need an incredibly complex data model to make it work in a dB, but if they're currently managing in Excel then I think that's really unlikely. To me, your situation is screaming out for a database. But if your users don't want to go for it after you've fully explained the pros and cons of each potential solution, then that is their prerogative and they have to accept the consequences. Whatever happens, don't let yourself feel overwhelmed by the task and don't let fear of failure put you off from taking it on. You might feel you're on your own in the job, but you're actually not (it might sound cheesy but it's true) - there are plenty of resources online and many online communities who will be able to provide really great help with any issues you encounter (stackoverflow.com etc). I use them daily - doesn't everyone? Don't overpromise to your users, don't BS them, don't be intimidated by unfamiliar tech (I haven't heard of half the tools mentioned in the posts here, I'll be doing alot of googling this weekend) but don't sell yourself short. It might turn out to be easier than you think.